In January we brought together coders, developers, designers, business developers, domain experts and scientists for the Future Food Hack 2015 at Wageningen University & Research Centre. The hackathon was a ‘32-hour get-away’ for interdisciplinary teams trying to create impact with open data in agriculture and nutrition.
So, how far did we get? Did we change the world, find solutions for intractable issues and develop easy-money-generating apps? Not quite (yet). We did however learn a lot about the issues at stake and what it takes to develop innovative software solutions. We hope that by sharing our experience and insights, we’ll collectively be able to build on top of them.
First up: the Semagrow challenge, which deals with the wondrous world of Linked Open Data and the Semantic Web. We realize that for most people this is unfamiliar territory. In that sense, it was our least ‘accessible’ challenge. But as our retrospective will point out, getting a bit more familiar with it would benefit both the experts and the rest of us.
ps – with thanks to Rob Knapen for editing
Semagrow is an EU-funded research project that targets the problem of the growing supply of large, dynamic, unstructured and heterogeneous data. It has developed a distributed infrastructure that allows transparent querying of large heterogeneous data sources. The infrastructure is a layer that sits on top of existing data repositories and networks. This way data providers can publish data in their own preferred format, and consumers can query all of it with SPARQL, the standard Semantic Web query language. For the hackathon the Semagrow approach was applied to the specific challenge of making the most of publicly available food safety data.
The Food Safety challenge: putting Semagrow to the test
The early detection of foodborne disease outbreaks and the drawing of conclusions from food safety alerts have become a major societal challenge. It is really important that we can take full advantage of existing publicly available data about foodborne diseases. This requires accessing the data, analyzing and correlating it to find potential food safety issues, validating those issues, creating dashboard-like visualizations, and sending out alert notifications. Currently, decision makers in both the public and private sector, food scientists, microbiologists and epidemiologists cannot take full advantage of existing data on foodborne diseases, mainly for two reasons:
- part of the information remains unstructured or is still enclosed in internal databases and
- the information is stored in custom and non-standard schemas and thus it is not shared globally in an interoperable way.
The technology developed in the Semagrow project was showcased and demonstrated during the hackathon, so participants could experiment with its components and tools (for more info, check the Semagrow presentation).
A four-member team investigated Semagrow technology to see how it helps in coping with big open agricultural data. The key goal was to test the concept of federated query processing for Linked Data, i.e. that you can generate relevant insights by combining multiple sources of data. They took “Salmonella” as a literal and queried an emulated version of Semagrow. The nice thing about Linked Data is that you can keep on asking the data for connections to other data. Federated query processing further allows for the transparent querying of heterogeneous and distributed data sources. In effect, you get access to numerous data sources from around the world through a single access point. The Semagrow infrastructure routes your query to the proper servers, then collects and combines all the data returned into a unified answer, while monitoring performance and completeness.
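To make the routing-and-combining idea a bit more concrete, here is a minimal sketch in Python. It is not Semagrow's code and it does not speak SPARQL: the three in-memory "sources", their names, the predicates and every record in them are made up for illustration. It only shows the general pattern of skipping sources that cannot answer a query, asking the rest, and merging what comes back.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class Source:
    """A stand-in for one remote data source (hypothetical, in-memory)."""
    name: str
    triples: List[Triple]

    def predicates(self) -> Set[str]:
        # A tiny "summary" of the source: which predicates it knows about.
        return {p for _, p, _ in self.triples}

    def match(self, s: Optional[str], p: Optional[str], o: Optional[str]) -> List[Triple]:
        # None acts as a wildcard for that position.
        return [t for t in self.triples
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)]


def federated_match(sources, s=None, p=None, o=None):
    """Route one triple pattern to the relevant sources and merge the answers."""
    consulted, results = [], []
    for source in sources:
        # Source selection: skip sources that never use the requested predicate.
        if p is not None and p not in source.predicates():
            continue
        consulted.append(source.name)
        for triple in source.match(s, p, o):
            if triple not in results:  # drop duplicates across sources
                results.append(triple)
    return results, consulted


# Made-up example data; none of this comes from the hackathon.
SOURCES = [
    Source("alerts", [("alert:1", "mentions", "Salmonella"),
                      ("alert:1", "reportedIn", "NL"),
                      ("alert:2", "mentions", "Listeria")]),
    Source("products", [("alert:1", "concernsProduct", "eggs")]),
    Source("literature", [("paper:7", "mentions", "Salmonella")]),
]

# First question: who mentions the literal "Salmonella"?
hits, asked = federated_match(SOURCES, p="mentions", o="Salmonella")
# Follow-up question: what else do we know about the first hit?
follow_up, _ = federated_match(SOURCES, s="alert:1")

print(hits, asked)
print(follow_up)
```

The second call is the "keep on asking" part: once a query returns an identifier such as alert:1, that identifier becomes the starting point for the next query, and the answer may come from a different source than the first one did.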
Although some connections were found, what the available taxonomy did not show was that there are several species of Salmonella. So here we see that Linked Data performs poorly when it depends only on text-oriented environments. We need an anchor point in the real world to which data can be connected, i.e. we should make more use of shared ontologies instead of controlled vocabularies, and mark up more data with e.g. microformats so that it becomes easier for machines to process semantically. Here’s a short presentation of the team’s approach and results.
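A small sketch of why this matters, again in Python and again with made-up data. The record labels, the "ex:" concept identifiers and the hierarchy below are assumptions for illustration, not the taxonomy the team actually used. The first search only looks at text; the second follows "broader than" links in a shared concept hierarchy, so a record tagged with a specific serovar is still found when you ask for Salmonella in general.

```python
# Hypothetical records carrying only a free-text label.
RECORDS = [
    {"id": "r1", "label": "Salmonella"},
    {"id": "r2", "label": "S. Typhimurium"},
    {"id": "r3", "label": "Salmonella bongori"},
    {"id": "r4", "label": "Listeria monocytogenes"},
]

# The same records, annotated with concept identifiers (illustrative names).
ANNOTATIONS = {
    "r1": "ex:Salmonella",
    "r2": "ex:Typhimurium",
    "r3": "ex:SalmonellaBongori",
    "r4": "ex:ListeriaMonocytogenes",
}

# A tiny shared hierarchy: each concept points to its broader concept.
BROADER = {
    "ex:SalmonellaEnterica": "ex:Salmonella",
    "ex:SalmonellaBongori": "ex:Salmonella",
    "ex:Typhimurium": "ex:SalmonellaEnterica",
    "ex:ListeriaMonocytogenes": "ex:Listeria",
}


def text_search(records, term):
    """Match on the label text only, case-insensitively."""
    return [r["id"] for r in records if term.lower() in r["label"].lower()]


def is_a(concept, ancestor):
    """Walk up the hierarchy until we hit the ancestor or run out of parents."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = BROADER.get(concept)
    return False


def concept_search(annotations, ancestor):
    """Match every record whose concept falls under the given ancestor."""
    return [rid for rid, concept in annotations.items() if is_a(concept, ancestor)]


by_text = text_search(RECORDS, "Salmonella")
by_concept = concept_search(ANNOTATIONS, "ex:Salmonella")
print(by_text)
print(by_concept)
```

The text search misses the record labelled "S. Typhimurium" because the word Salmonella never appears in it, while the concept search finds it through the hierarchy. That is the kind of anchor point a shared ontology provides and a plain list of terms does not.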
Linked Data is very exciting since it allows combining knowledge from several data sources into one. However, it is still messy, and it will only work if the crowd is heavily involved in cleaning it up or if open data is provided in a more usable form from the start. Right now Semagrow still seems to be a scientific solution that has not yet been embraced by its users. However, including hackathons in this kind of research project is certainly a good way to make such a connection, get real community feedback, and increase the longevity of the project’s results.
Ps- If you’d like to know more about LOD, this is a good place to start!