Humbly Report: Sean Bechhofer

Semantics 'n' stuff

The Eurovision Workflow Contest


Ever wondered where the workflows that are most downloaded or viewed in myExperiment come from? Wonder no longer! Here’s a nifty visualisation using Google’s Public Data Explorer:

myExperiment Statistics

What’s it doing?

myExperiment is a Virtual Research Environment that supports users in the sharing of digital items associated with their research. Initially targeted at scientific workflows, it’s now being used to share a number of different contribution types. For this particular example, however, I focused on workflows. Workflows are associated with an owner, and owners may also provide information about themselves, for example which country they’re in. The site also keeps statistics about views and downloads of the items. This dataset allows exploration of the relationship between these various factors.
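To make the shape of that data concrete, here is a minimal Python sketch. It assumes each workflow can be flattened into a record with its owner, the owner's (optional) country, its format, and its view and download counts. The `Workflow` class and its field names are hypothetical and are not myExperiment's actual schema. The `totals_by_country` function (also hypothetical) rolls the records up per country, which is the kind of summary the visualisation is built on.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Workflow:
    """One workflow record; field names are illustrative, not myExperiment's schema."""
    title: str
    owner: str
    owner_country: Optional[str]  # owners may not have supplied a country
    workflow_format: str          # e.g. "Taverna 2" (illustrative value)
    views: int
    downloads: int


def totals_by_country(workflows):
    """Sum workflow counts, views and downloads per owner country.

    Workflows whose owner gave no country are grouped under "Unknown".
    """
    totals = defaultdict(lambda: {"workflows": 0, "views": 0, "downloads": 0})
    for wf in workflows:
        key = wf.owner_country or "Unknown"
        totals[key]["workflows"] += 1
        totals[key]["views"] += wf.views
        totals[key]["downloads"] += wf.downloads
    return dict(totals)
```

Keeping an explicit "Unknown" bucket, rather than silently dropping those rows, makes it visible how much of the data has no country attached.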

How does it work?

The myExperiment data is made available through a SPARQL endpoint, which supports the construction of client applications that consume the metadata myExperiment provides. A few simple SPARQL queries (thanks to David Newman for SPARQLing support) allowed me to grab summary information about the number of workflows in the repository, their formats, and which countries the users came from. The myExperiment endpoint will deliver these results as CSV files, so it's then just a case of packaging them up with some metadata and uploading them to the Public Data Explorer. Hey presto, pretty pictures!
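As an illustration of the query step, here is a sketch of how a "workflows per country" query might be sent to a SPARQL endpoint over HTTP GET. Several details are assumptions. The endpoint URL and the `ex:` vocabulary are placeholders and not the real myExperiment ones. The `format=csv` parameter is also an assumption, because endpoints differ in how they let you request CSV. Only the `query` parameter comes from the standard SPARQL Protocol. The `GROUP BY`/`COUNT` aggregate is SPARQL 1.1 syntax.

```python
import urllib.request
from urllib.parse import urlencode

# Placeholder endpoint and vocabulary: substitute the real myExperiment
# endpoint URL and the predicates its RDF actually uses.
ENDPOINT = "http://example.org/myexperiment/sparql"

WORKFLOWS_BY_COUNTRY = """
PREFIX ex: <http://example.org/vocab#>
SELECT ?country (COUNT(?workflow) AS ?workflows)
WHERE {
  ?workflow a ex:Workflow ;
            ex:owner ?owner .
  ?owner ex:country ?country .
}
GROUP BY ?country
ORDER BY DESC(?workflows)
"""


def query_url(endpoint: str, query: str, result_format: str = "csv") -> str:
    """Build a GET URL for a SPARQL query.

    "query" is the standard SPARQL Protocol parameter; "format" is an
    assumed, endpoint-specific way of asking for CSV results.
    """
    return endpoint + "?" + urlencode({"query": query, "format": result_format})


def fetch_csv(url: str) -> str:
    """Download the query results as text (requires network access)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")
```

For example, `fetch_csv(query_url(ENDPOINT, WORKFLOWS_BY_COUNTRY))` would return the CSV text once `ENDPOINT` and the vocabulary point at a real service. Similar queries, grouping by format instead of country, would supply the other summaries.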

The Explorer expects time series data in order to do its visualisations. The data I’m displaying is “snapshot” data, so there’s only one timepoint in the time series — 2011. We can still get some useful visualisations out though, allowing us to explore the relationships between country of origin, formats, and numbers of downloads and views.
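One simple way to turn a snapshot into a one-point time series is to add a constant year column to every row before upload. The sketch below does this with Python's standard `csv` module. The column name `year` and the function `add_snapshot_year` are hypothetical, so match them to whatever your dataset metadata declares as the time dimension.

```python
import csv
import io


def add_snapshot_year(csv_text: str, year: str = "2011") -> str:
    """Prepend a constant 'year' column so snapshot data forms a one-point time series."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header = next(reader, None)
    if header is None:  # empty input: nothing to convert
        return ""
    writer.writerow(["year"] + header)
    for row in reader:
        if row:  # skip blank lines
            writer.writerow([year] + row)
    return out.getvalue()
```

Using the csv module rather than string concatenation means that values containing commas, such as some country names, keep their quoting.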

This is, to quote Peter Snow on Election Night, "just a bit of fun": I've made little attempt to clean the data, and there will be users who have not supplied a country, so the data is not complete and shouldn't necessarily be taken as representative of the productivity of the countries involved! In addition, the data doesn't use country identifiers that the Explorer knows about, and has no lat/long information, so mapping isn't available (there are also some interesting potential issues with the use of England and United Kingdom). However, it's a nice example of plumbing existing pieces of infrastructure together in a lightweight way. Although this was produced in batch mode, in principle it should be easy to do dynamically.
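A first step towards mappable data would be to normalise the free-text country names to standard codes. Below is a sketch using ISO 3166-1 alpha-2 codes. The lookup table is a small illustrative sample, not a complete mapping. Folding "England", "Scotland" and "Wales" into `GB` is one possible answer to the England versus United Kingdom question, and it is a modelling choice rather than a given. The helper names are hypothetical.

```python
# Illustrative sample only: a real table would need to cover every
# free-text value users actually entered.
ISO_CODES = {
    "united kingdom": "GB",
    "uk": "GB",
    "england": "GB",   # folding constituent countries into GB is a choice
    "scotland": "GB",
    "wales": "GB",
    "germany": "DE",
    "netherlands": "NL",
    "spain": "ES",
    "united states": "US",
    "usa": "US",
}


def normalise_country(name):
    """Return an ISO 3166-1 alpha-2 code, or None if the name is missing or unknown."""
    if not name:
        return None
    return ISO_CODES.get(name.strip().lower())


def split_known(names):
    """Partition raw names into mapped codes and unmapped values for manual review."""
    codes, unmapped = [], []
    for name in names:
        code = normalise_country(name)
        if code is None:
            unmapped.append(name)
        else:
            codes.append(code)
    return codes, unmapped
```

Reporting the unmapped values separately, instead of discarding them, shows how much cleaning would remain before the data could be placed on a map.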

So, Terry, the results please…

“Royaume Uni, douze points”.

Boom Bang-a-Bang!

Written by Sean Bechhofer

March 16, 2011 at 5:30 pm

Posted in rdf, visualisation


One Response


  1. Updated figures for Feb 2012.

    Sean Bechhofer

    February 8, 2012 at 9:59 am

