What’s it doing?
myExperiment is a Virtual Research Environment that supports users in the sharing of digital items associated with their research. Initially targeted at scientific workflows, it’s now being used to share a number of different contribution types. For this particular example, however, I focused on workflows. Workflows are associated with an owner, and owners may also provide information about themselves, for example which country they’re in. The site also keeps statistics about views and downloads of the items. This dataset allows exploration of the relationship between these various factors.
How does it work?
The myExperiment data is made available as a SPARQL endpoint, supporting the construction of client applications that can consume the metadata provided by myExperiment. A few simple SPARQL queries (thanks to David Newman for SPARQLing support) allowed me to grab summary information about the numbers of workflows in the repository, their formats, and which country the users came from. The myExperiment endpoint will deliver this information as CSV files, so it’s just then a case of packaging these results up with some metadata and then uploading to the Public Data Explorer. Hey presto, pretty pictures!
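As a rough illustration of the query step, here is a minimal Python sketch. The endpoint address, the `ex:` prefix and every property name in the query are assumptions made for the example, not the real myExperiment vocabulary, and the CSV text is a canned stand-in with made-up numbers rather than real output. How to ask for CSV is endpoint-specific, so that part is left out.

```python
import csv
import io
from urllib.parse import urlencode

# Assumed endpoint address; check the myExperiment documentation for the real one.
ENDPOINT = "http://rdf.myexperiment.org/sparql"

# Hypothetical vocabulary: the ex: prefix and the property names below are
# placeholders, not the actual myExperiment ontology.
QUERY = """
PREFIX ex: <http://example.org/placeholder#>
SELECT ?country (COUNT(?workflow) AS ?workflows)
WHERE {
  ?workflow a ex:Workflow ;
            ex:owner ?user .
  ?user ex:country ?country .
}
GROUP BY ?country
"""


def build_request_url(endpoint, query):
    """Encode the query as a GET request ('query' is the SPARQL protocol parameter)."""
    return endpoint + "?" + urlencode({"query": query})


def parse_csv_results(text):
    """Turn a CSV response body into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


# Canned response standing in for what an endpoint might return (made-up numbers).
sample = "country,workflows\nUnited Kingdom,3\nEngland,1\n"

url = build_request_url(ENDPOINT, QUERY)
rows = parse_csv_results(sample)
```

Fetching `url` with any HTTP client and passing the response body to `parse_csv_results` would complete the loop. The sample rows also show how England and United Kingdom can turn up as separate values.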
The Explorer expects time series data in order to do its visualisations. The data I’m displaying is “snapshot” data, so there’s only one timepoint in the time series — 2011. We can still get some useful visualisations out though, allowing us to explore the relationships between country of origin, formats, and numbers of downloads and views.
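One way to give snapshot data a time dimension is to stamp every row with the same year before packaging it up. The sketch below assumes a CSV with a header row. The column names and the sample row are illustrative, and the metadata that the Explorer needs alongside the CSV is not shown.

```python
import csv
import io

SNAPSHOT_YEAR = "2011"  # the single timepoint mentioned in the text


def add_year_column(csv_text, year=SNAPSHOT_YEAR):
    """Return the CSV text with an extra 'year' column holding the same value in every row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames) + ["year"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["year"] = year
        writer.writerow(row)
    return out.getvalue()


# Illustrative input only; the column names and values are made up.
snapshot = "country,format,downloads\nUnited Kingdom,format-a,10\n"
stamped = add_year_column(snapshot)
```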
This is, to quote Peter Snow on Election Night, “just a bit of fun” — I’ve made little attempt to clean the data, and there will be users who have not supplied a country, so the data is not complete and shouldn’t necessarily be taken as completely representative of productivity of the countries involved! In addition, the data doesn’t use country identifiers that the data visualiser knows about, and has no lat/long information, so mapping isn’t available (there are also some interesting potential issues with the use of England and United Kingdom). However, it’s a nice example of plumbing existing pieces of infrastructure together in a lightweight way. Although this was produced in a batch mode, in principle this should be easy to do dynamically.
So, Terry, the results please…
“Royaume Uni, douze points”.
Continuing the theme of reflecting on SKOS, the question of organisation is next. SKOS provides an RDF vocabulary for describing Knowledge Organisation Systems and there’s an assumption that SKOS is RDF from the ground up. The use of RDF brings advantages, but there are also limitations, in particular when we consider issues of containment. This is something that I wrestled with in the past when building the OWL-API libraries to support OWL. In the RDF/XML serialisations of OWL, there was no explicit connection between the axioms stated in an ontology and the Ontology object itself. This can cause difficulties in the face of owl:imports, as there was also no explicit link between the location where an RDF graph that represents an ontology is retrieved from and the URI of the Ontology itself. This was partly solved by the use of physical and logical URIs, but the question of containment is still there.
There is a similar, but perhaps more easily stated, issue with SKOS. Consider, for example, the following fragment from the IVOAT thesaurus:
asteroid is a narrower term of rotating body. In the SKOS version of this thesaurus, we have two concepts, one of which is http://www.ivoa.net/rdf/Vocabularies/IVOAT#asteroid, with triples asserting the appropriate labels, the fact that these concepts occur in the IVOAT scheme, and the narrower relationship.
What we don’t have here, however, is the assertion that the narrower relationship occurs within the ConceptScheme. The same also holds of the labels — the labelling of the concept is not explicitly bound to the concept scheme.
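To make the gap concrete, here is a small Python sketch that models the triples as plain tuples. The asteroid URI comes from the text. The URI for the rotating body concept and the scheme URI are assumptions made for the example.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
IVOAT = "http://www.ivoa.net/rdf/Vocabularies/IVOAT#"

asteroid = IVOAT + "asteroid"
rotating_body = IVOAT + "rotatingBody"  # assumed local name, not taken from the thesaurus
scheme = "http://www.ivoa.net/rdf/Vocabularies/IVOAT"  # assumed scheme URI

# Each statement is a (subject, predicate, object) tuple.
triples = {
    (asteroid, SKOS + "prefLabel", "asteroid"),
    (rotating_body, SKOS + "prefLabel", "rotating body"),
    (asteroid, SKOS + "inScheme", scheme),
    (rotating_body, SKOS + "inScheme", scheme),
    (rotating_body, SKOS + "narrower", asteroid),
}


def statements_mentioning(resource, statements):
    """Return the statements that have the resource as subject or object."""
    return {t for t in statements if resource in (t[0], t[2])}


about_scheme = statements_mentioning(scheme, triples)
narrower = (rotating_body, SKOS + "narrower", asteroid)

# Only the two inScheme statements mention the scheme; the narrower
# statement and the labels are in the graph but not tied to it.
print(narrower in triples, narrower in about_scheme)
```

The concepts are linked to the scheme, but nothing in the data says that the relationship between them, or their labels, belongs to it.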
Now, this isn’t really a failing of SKOS, but is rather a consequence of the use of RDF for the representation. Solutions to this could involve reification (bleuurgh) or the use of named graphs to identify the triples associated with a ConceptScheme. At the time of the SKOS Recommendation, however, no standard was available.
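The named graph idea can be sketched by adding a fourth element to each statement that records which graph it belongs to. This is an illustration in plain Python tuples rather than any particular RDF library. The graph names and the rotating body URI are assumptions made for the example.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
IVOAT = "http://www.ivoa.net/rdf/Vocabularies/IVOAT#"

asteroid = IVOAT + "asteroid"
rotating_body = IVOAT + "rotatingBody"  # assumed local name

# Hypothetical graph names: one for the scheme's own statements,
# one for statements made by somebody else.
SCHEME_GRAPH = "http://example.org/graphs/ivoat"
OTHER_GRAPH = "http://example.org/graphs/someone-else"

# Each statement is (subject, predicate, object, graph).
quads = [
    (asteroid, SKOS + "prefLabel", "asteroid", SCHEME_GRAPH),
    (rotating_body, SKOS + "narrower", asteroid, SCHEME_GRAPH),
    (asteroid, SKOS + "altLabel", "minor planet", OTHER_GRAPH),  # made-up label
]


def triples_in_graph(statements, graph):
    """Return the (s, p, o) triples asserted within the given named graph."""
    return [(s, p, o) for (s, p, o, g) in statements if g == graph]


scheme_triples = triples_in_graph(quads, SCHEME_GRAPH)
```

With the graph name attached, the narrower relationship and the label are explicitly bound to the scheme's graph, and statements from elsewhere stay distinguishable.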
Does this really matter? Is it an issue? So far, a lot of SKOS publication seems to be organisations exposing their own vocabularies, with instances of skos:Concept appearing in a single skos:ConceptScheme with semantic relationships asserted “within” that scheme and thus under the control of the Scheme “owner”. That may not be too difficult to manage. Things will get more interesting once we have greater use of the SKOS mapping relationships, which are intended for use between Concepts in different ConceptSchemes. Such mappings are likely to present different and potentially conflicting points of view or opinions, and we will then require more details of the provenance of the assertions.