Tuesday 14 October 2008

Creating a GeoFOSS marketing pipeline

Petrol stations are strategically placed around the globe to be accessible to the world's extensive fleet of motorised transport. Every petrol station (called a gas station in the US) has a petrol pump, a cash register and a shop attendant. This distribution pipeline for petrol can be used effectively for other products too. Cars need oil and drivers get thirsty, so why not package up oil and canned drinks and sell them as well? You see, it is easy to add extra packaged products to an existing pipeline.
The key elements in this pipeline are:
  • The distribution channel (a chain of petrol stations)
  • Packaged products. Coke is easier to sell in a can than poured from a jug
  • A company that produces the product, like the Coca-Cola Company.
  • Promotion and Price. These should also be mentioned to complete the four Ps of a marketing strategy: Product, Price, Promotion and Place (the distribution pipeline).
Geospatial and software conferences are an attractive marketing pipeline for Geospatial Free and Open Source Software (GeoFOSS). These conferences attract key purchasing decision makers, they occur around the world using a standard format of speakers plus exhibition booths, and GeoFOSS has a worldwide pool of enthusiastic champions willing to staff booths and give presentations.
To realise the potential of this conference pipeline, we need a conference exhibition pack. This pack could include a stack of demo DVDs, fliers, an OSGeo banner or two, GeoFOSS demos running on a computer, and train-the-trainer material for GeoFOSS exhibitors. My feeling is that the demo DVD is the most valuable sales tool in the pack and should be given our initial priority. The demo DVD can run on a computer in an exhibition booth, and I've heard many people walking away from GeoFOSS workshops with a demo DVD saying they will try it at home with their own data.

The GeoFOSS exhibition pack will be a win for multiple parties:
  1. OSGeo exhibitors will have professional material to present, which will lift their professionalism and sway more users toward GeoFOSS.
  2. GeoFOSS software developers will have an international marketing team selling their projects. The offer to developers is: "Get your packages onto the demo DVD, get your marketing material sorted, and your products will be marketed internationally by a team of GeoFOSS evangelists."
  3. GeoFOSS Sponsors will have an extensive marketing channel which will justify sponsorship in return for advertising. We need sponsors because OSGeo's modest marketing budget won't cover costs for a presence at all key international, geospatial conferences.
Once established, GeoFOSS products can be marketed through other channels as well (e.g. the demo DVD can be handed out to new university students), and other products can be marketed through GeoFOSS channels (e.g. the OGC is considering developing a persistent, standards-based Geospatial Integration Showcase).
So I propose that we, the GeoFOSS community, build our marketing pipeline, starting with a GeoFOSS exhibition pack and GeoFOSS demo DVD. The marketing pipeline will have a major impact on the GeoFOSS market and will spawn numerous related projects which make use of this direct line to users.

Friday 10 October 2008

Community Schemas: Making sense out of disparate datasets

With so many organisations publishing geospatial datasets using standards-based web services, a raft of new opportunities for large-scale data analysis is opening up. The challenge now is integrating datasets that use different terms and attributes to describe the same data. For example, "water quality" (good, medium, bad) in one database might equate to "pollution level" (1, 2, 3, 4, 5) in another.

Communities, like the hydrology community backing the Australian Water Data Infrastructure Project (AWDIP), are solving these data integration issues by defining a community schema for their domain, then ensuring all agencies publish data using that schema.

Community Schemas describe a rich set of semantics for a domain using the basic building blocks provided by Geography Markup Language (GML). This allows a community to define schemas appropriate to its data and to use them for data transfer within the community. The schemas can then be referenced to ensure consistent structure and taxonomy across related datasets, improving the community's ability to share data. For example, a hydrology schema may define a class called Water Use with acceptable terms of irrigation, domestic and industrial. A dataset published with an invalid Water Use of "farming" will not validate, and the user will know to correct the mistake.
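To make that validation step concrete, here is a much-simplified sketch in Python using lxml (assuming lxml is installed). The schema fragment below is hypothetical and only captures the enumerated Water Use terms; a real community schema would be a full GML application schema, but the principle is the same.

    from lxml import etree

    # Hypothetical, much-simplified schema: the community agrees that
    # Water Use may only take one of three terms.
    WATER_USE_XSD = b"""
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="WaterUse">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="irrigation"/>
            <xs:enumeration value="domestic"/>
            <xs:enumeration value="industrial"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
    </xs:schema>
    """

    schema = etree.XMLSchema(etree.fromstring(WATER_USE_XSD))

    # A record using an agreed term validates; "farming" does not.
    print(schema.validate(etree.fromstring(b"<WaterUse>domestic</WaterUse>")))  # True
    print(schema.validate(etree.fromstring(b"<WaterUse>farming</WaterUse>")))   # False
    for error in schema.error_log:
        print(error.message)  # reports that "farming" is not an accepted value

The data provider can run this kind of check before publishing, and a consumer can run the same check on anything they harvest, so both sides share one definition of what a valid record looks like.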

Publishing data through a standardised community schema means:

  • A range of applications and data analysis projects can be developed because extensive, high-quality data becomes available and cost-effective

  • Clear data definitions reduce data misinterpretation

Defining a community schema for a domain is non-trivial, as it requires participating parties to create and agree upon a data architecture, vocabularies and an interchange protocol. Luckily, the first community schema projects have left a trail of reusable building blocks and processes for future efforts. For instance, the Observations and Measurements (O&M) specification was developed as part of the Sensor Web Enablement (SWE) suite and has since been used as a component in Geoscience Markup Language (GeoSciML), Water Markup Language (WaterML) and others. Other building blocks include Geography Markup Language (GML), SensorML, CityGML and the ANZLIC profile of ISO 19115 for metadata.

The other critical component in the development of a community schema is buy-in, governance and testing from the user community. This is often an international effort. GeoSciML, a schema for geology, has participants from BGS (United Kingdom), BRGM (France), CSIRO (Australia), GA (Australia), GSC (Canada), GSV (Australia), APAT (Italy), JGS (Japan), SGU (Sweden) and USGS (USA) and the OGC (International).

Communities need to adopt a governance structure to resolve the inevitable disagreements over technical details. The GeoSciML community, which started in 2003 and is now onto its third schema iteration, has organised working groups for information model development, computational model development, vocabulary definition, use case definition, testing the schemas in a formal test bed, and promoting the schema through an outreach working group. A key element in the success of GeoSciML is that custodianship of geologic information is managed by similar agencies (geological surveys) in most jurisdictions, and these have a history of collaboration.

Most agencies will first encounter a community schema when they are asked to deploy their datasets using one. Spatial data is collected by numerous agencies, for various purposes, following different collection guidelines. Storage models tend to reflect the original data use and are rarely designed for data exchange. When data is published through web services, the schema usually reflects the storage model. This works fine for the original application but is an integration nightmare when trying to share data between agencies. Changing the storage format usually isn't desirable if it breaks legacy applications or introduces sub-optimal performance. Hence, it is necessary to differentiate between the storage and exchange formats. The storage format can be defined by the custodian who generates and maintains the data; the challenge is to map that storage model to a community schema. Again, prior projects have built a suite of tools to help out.
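Before looking at those tools, a minimal sketch (in Python, with entirely hypothetical column names, code lists and element names) shows what such a storage-to-exchange mapping involves. In practice the mapping targets the community's GML application schema and is usually configured in the WFS server rather than hand-coded.

    from xml.etree import ElementTree as ET

    # One row as it might sit in an agency's internal storage model.
    storage_row = {"site_id": "401012", "wq_class": "good", "use_code": "IRR"}

    # Crosswalks agreed by the community: internal codes -> community vocabulary
    # (echoing the "water quality" / "pollution level" example above).
    POLLUTION_LEVEL = {"good": "1", "medium": "3", "bad": "5"}
    WATER_USE = {"IRR": "irrigation", "DOM": "domestic", "IND": "industrial"}

    def to_exchange_record(row):
        """Map a storage-model row onto a hypothetical community-schema record."""
        record = ET.Element("MonitoringSite")
        ET.SubElement(record, "siteIdentifier").text = row["site_id"]
        ET.SubElement(record, "pollutionLevel").text = POLLUTION_LEVEL[row["wq_class"]]
        ET.SubElement(record, "waterUse").text = WATER_USE[row["use_code"]]
        return record

    print(ET.tostring(to_exchange_record(storage_row), encoding="unicode"))

The storage model stays exactly as the custodian designed it; only the published view is reshaped to the community schema.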

Funding from Australia's National Collaborative Research Infrastructure Strategy (NCRIS), CSIRO and DPI Victoria has been used to add community schema support to GeoServer, an open source WFS and WMS server. Deegree, another open source WFS/WMS server, is also being investigated. FullMoon, an open source tool managed by CSIRO, supports transforming UML data models into various GML application schemas. Duckhawk, an open source WFS and WMS robustness and validation testing tool, was developed for the Australian Water Data Infrastructure Project (AWDIP) to test WaterML.

There is a common theme developing around community schemas: many of the tools being developed are open source. Spatial Data Infrastructures (SDIs) increase in value as more agencies contribute data to them, and the users who benefit most from the infrastructure are often not the agencies that collect or manage the data. This results in large organisations developing SDIs to aggregate and serve data from numerous smaller, more specialised agencies with different priorities, budgets and timelines. To encourage these smaller agencies to manage and publish their data using community schemas, SDI sponsoring organisations like NCRIS are developing open source tools that reduce the financial barriers small agencies face in getting their data online.

Australia, like the rest of the world, has a huge variety of datasets, each fulfilling its own purpose. While data integration is non-trivial, we now have the knowledge, tools and processes to integrate disparate datasets, which will enable more powerful analysis and new business opportunities for all participating parties.

Credits:

I'd like to thank Stefan Hansen, Software Developer at LISAsoft and technical lead on the Duckhawk WFS conformance and performance testing framework, who helped research this article. Thanks also to Rob Atkinson and Simon Cox, who provided a lot of background on CSIRO's involvement with Community Schemas.

A version of this article will be published by Position Magazine in their December 2008 edition.

Thursday 9 October 2008

Wiki paralysis

Wikis are good for collecting information from a community, but they are limited when it comes to editing and reviewing, an important stage in the writing process.
Sure, wikis have tools that allow people to change others' contributions and review a history of changes, but I've yet to see a wiki editor with "Track Changes" similar to Word or OpenOffice, and consequently we don't edit wikis as much as we should. In particular, we don't remove irrelevant content, an essential step in producing clear, concise articles.
The problem is that wikis don't help us honour the unwritten social law of reviewing:
"It is OK to suggest changes to an author but it is disrespectful to change content without the author's blessing."
The writing workflow should be:
  1. Author writes
  2. Reviewer suggests changes
  3. Author accepts or rejects changes
  4. Publish
The messy wiki editing process tends to be:
  1. Author 1 writes a page and asks others to extend it.
  2. Authors 2, 3 and 4 add content (making sure not to remove prior content). The page is published after each update.
  3. Time elapses
  4. Author 5 wants to clean up the page
  5. If dedicated, Author 5 tracks down all the previous authors to ask permission to consolidate their text.
  6. Author 5 rewrites the page.
The problem is that we often get "wiki paralysis" at the clean-up stage, and consequently many wiki pages are long, repetitive and disjointed.