Humbly Report: Sean Bechhofer

Semantics 'n' stuff


Voles on the Line


Rodent at Heaton Chapel (artist’s impression)

A friend of mine, Di Maynard, who works in computational linguistics and NLP, alerted me to cheapbotsdonequick last week, a service that makes it really easy to set up a Twitter bot. It hooks up to a Twitter account and will tweet generated messages at regular intervals. The message content is generated via a system called tracery, which uses a grammar to specify rules for string generation. There are a number of bots around that use this service, including some that generate SVG images (@softlandscapes is my favourite). I thought this looked like an interesting and fun idea to explore.

I’d done some earlier Raspberry Pi-based experiments hooking up to real-time rail information, so I decided to stick with the train theme and develop a bot tweeting “status updates” for Northern Rail. These wouldn’t quite be real updates though.

A tracery grammar contains simple rules that are expanded to produce a final result. Each rule can have a number of different alternatives, which are chosen at random. See the tracery tutorial for more information. For my grammar, I produced a number of templates for simple issues, e.g.

high volumes of X reported at Y

plus some consequences such as re-routing or disruption to catering services. The grammar allows us to put together templates plus rules about capitalisation or plurals etc.
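To make the idea of rule expansion concrete, here is a minimal sketch in Python. It is not the tracery library itself, just a toy expander that mimics its #symbol# syntax and two of its modifiers. The rule names, the short word lists and the naive plural rule are all made up for illustration.

```python
import random
import re

# Hypothetical miniature grammar in the style of tracery: each symbol maps to
# a list of alternatives, and #symbol# marks a place to expand.
GRAMMAR = {
    "origin": ["#report.capitalize#."],
    "report": [
        "high volumes of #hazard.s# reported at #station#",
        "#station# closed due to #hazard.s#",
    ],
    "hazard": ["Orkney vole", "Cretan frog"],
    "station": ["Heaton Chapel", "Wressle", "Lostock Gralam"],
}

# Simplified stand-ins for tracery's modifiers (assumption: real tracery
# handles plurals more carefully than just appending "s").
MODIFIERS = {
    "capitalize": lambda text: text[:1].upper() + text[1:],
    "s": lambda text: text + "s",
}

TAG = re.compile(r"#([^#]+)#")


def expand(grammar, text, rng):
    """Replace every #symbol.modifier# tag in text, recursively."""

    def replace(match):
        symbol, *mods = match.group(1).split(".")
        # Pick one alternative at random, then expand any tags inside it.
        result = expand(grammar, rng.choice(grammar[symbol]), rng)
        for mod in mods:
            result = MODIFIERS[mod](result)
        return result

    return TAG.sub(replace, text)


def generate(grammar, rng=None):
    """Produce one message, starting from the conventional 'origin' symbol."""
    if rng is None:
        rng = random.Random()
    return expand(grammar, "#origin#", rng)


if __name__ == "__main__":
    print(generate(GRAMMAR))
```

Each run picks alternatives at random, so the same small grammar yields a different report each time.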

For the terminals of the grammar (the things that appear as X or Y), I pulled lists from an external, third-party data source: dbpedia. For those who aren’t aware of dbpedia, it’s a translation of (some of) the data in Wikipedia into a nicely structured form (RDF), which is then made available via a query endpoint. In this case, I used dbpedia’s SPARQL endpoint to query for words to use as terminals in the grammar. There are other open data sources I could have used, but this was one I was familiar with.
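As a rough sketch of what such a query might look like from Python: the code below builds a request against the public dbpedia endpoint and pulls the labels out of the standard SPARQL JSON results format. The query is an assumption for illustration only. In particular, the category dbc:Rodents_of_Europe and the use of dct:subject are guesses at how the data is organised, not necessarily the query used for the bot.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "https://dbpedia.org/sparql"

# Assumed query: the category name and the dct:subject property are
# illustrative guesses, not taken from the bot's actual code.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX dbc:  <http://dbpedia.org/resource/Category:>

SELECT DISTINCT ?label WHERE {
  ?thing dct:subject dbc:Rodents_of_Europe ;
         rdfs:label ?label .
  FILTER (lang(?label) = "en")
}
"""


def build_url(query, endpoint=ENDPOINT):
    """Encode the query as a GET request URL (SPARQL protocol 'query' parameter)."""
    return endpoint + "?" + urlencode({"query": query})


def labels_from_results(results):
    """Extract the ?label values from a SPARQL JSON results document."""
    return [binding["label"]["value"] for binding in results["results"]["bindings"]]


def fetch_labels(query):
    """Run the query over the network and return the list of labels."""
    request = Request(
        build_url(query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urlopen(request) as response:
        return labels_from_results(json.load(response))
```

Changing the language tag in the FILTER is all it takes to ask for the names in another language, where dbpedia has them.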

This allowed me to get hold of the stations managed by Northern Rail, plus some “causes” of disruption, which I chose to be European rodents, amphibians, common household pests and weather hazards. The final grammar was produced programmatically (using Python).
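Producing the grammar programmatically might look something like the sketch below: take the lists of names, drop them in as the terminal rules, and dump the whole thing as JSON, which is the form tracery grammars are written in. The function name, rule names and templates here are hypothetical, and the two short lists stand in for the query results. The .capitalize and .s modifiers are from tracery's standard English set.

```python
import json


def build_grammar(stations, hazards):
    """Assemble a tracery-style grammar: a dict of symbol -> list of alternatives."""
    return {
        "origin": ["#issue.capitalize#. #consequence.capitalize#."],
        "issue": [
            "high volumes of #hazard.s# reported at #station#",
            "#station# closed due to #hazard.s#",
        ],
        "consequence": [
            "replacement bus service from #station#",
            "expect disruption to catering services",
        ],
        # Terminals: de-duplicated and sorted so the output is stable.
        "station": sorted(set(stations)),
        "hazard": sorted(set(hazards)),
    }


if __name__ == "__main__":
    # Placeholder lists; in practice these would come from the dbpedia queries.
    grammar = build_grammar(
        stations=["Wressle", "Lostock Gralam", "Heaton Chapel"],
        hazards=["Oriental cockroach", "Orkney vole", "Cretan frog"],
    )
    print(json.dumps(grammar, indent=2))
```

Swapping in a different set of stations or hazards is then just a matter of passing different lists.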

The grammar then produces a series of reports, for example:

Wressle closed due to Oriental cockroaches. Replacement bus service from Lostock Gralam.

The bot is currently set up to tweet at regular intervals, and to date has picked up 6 followers, five of whom aren’t me! You can find it at @chromaticwhale. Code is available on GitHub.

So, is there anything to this other than some amusement value? Well, not really, but there are perhaps a couple of points of interest. First off, it’s an illustration of the way in which we can make use of third-party, open information sources. This is nice because:

  • I don’t need to think about lists of European rodents and amphibians or stations served by Northern Rail.
  • The actual content of the lists was unseen by me, so the combinations thrown up are unexpected and keep me amused.
  • I can substitute in a different collection of stations or hazards and extend when I get bored of hearing about Cretan frogs and Orkney voles.
  • The data sources use standardised vocabulary for the metadata (names etc.) so it’s easy to pull out names of things (potentially in other languages).

I teach an undergraduate unit on Fundamentals of Computation that focuses largely on defining languages through the use of automata, regular expressions and grammars. The grammars here are (more or less) context-free grammars, so this gives an amusing example of what we can do with such a construct.

I am now awaiting the first irate email from a traveller who “didn’t go for the train because you said the station was closed due to an infestation of Orkney Voles”.

Written by Sean Bechhofer

November 16, 2016 at 4:24 pm

Posted in rdf