My roots are in the confluence of databases, operating systems and programming languages. For many years this manifested itself in work on persistent systems. Recently I have been working in three areas, all of which still relate to these themes.

The Digitising Scotland project is creating a linked pedigree of the Scottish people from the mid-1800s onwards. This is being created from digitised birth, death and marriage records. A group at St Andrews, of which I am a part, is engaged in linking these records. Related to this, we are creating a population simulator to provide ground truth for linkage experiments. This work is largely being conducted by Tom Dalton, who is supervised by Graham Kirby and myself.

As part of the Digitising Scotland project we have found many problems with current approaches to linkage. To address these deficiencies I have been working with Richard Connor from Stirling on Similarity Search. We are developing new Similarity Search algorithms over Metric Spaces to enable efficient search for linkage matching.
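To illustrate why metric spaces help here, the following is a minimal sketch of pivot-based range search, a standard textbook technique and not the algorithms we are developing. It assumes edit distance over names as the metric; the `PivotIndex` class and all names in it are hypothetical and chosen for illustration only. Because the distance obeys the triangle inequality, precomputed distances to a few pivot records give a lower bound on the distance between a query and each candidate, so many candidates can be rejected without computing the full distance.

```python
from typing import Callable, List, Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings; a proper metric over strings."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


class PivotIndex:
    """Hypothetical illustration: filter candidates using pivot distances."""

    def __init__(self, items: Sequence[str], pivots: Sequence[str],
                 distance: Callable[[str, str], int]) -> None:
        self.items = list(items)
        self.pivots = list(pivots)
        self.distance = distance
        # Precompute the distance from every item to every pivot.
        self.table = [[distance(item, p) for p in self.pivots]
                      for item in self.items]
        self.evaluated = 0  # full distance computations in the last search

    def range_search(self, query: str, radius: int) -> List[Tuple[str, int]]:
        """Return all (item, distance) pairs with distance <= radius."""
        query_to_pivots = [self.distance(query, p) for p in self.pivots]
        self.evaluated = 0
        results = []
        for item, row in zip(self.items, self.table):
            # Triangle inequality: |d(q, p) - d(o, p)| <= d(q, o).
            # If this lower bound exceeds the radius, o cannot match.
            if any(abs(qp - op) > radius
                   for qp, op in zip(query_to_pivots, row)):
                continue
            self.evaluated += 1
            d = self.distance(query, item)
            if d <= radius:
                results.append((item, d))
        return results
```

Calling `PivotIndex(names, pivots, levenshtein).range_search("mcdonald", 1)` returns the same matches as an exhaustive scan of `names`; the `evaluated` counter shows how many full distance computations survived the pivot filter. The choice and number of pivots is a tuning decision that this sketch does not address.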

Lastly, my interest in operating systems continues in work with Ward Jaradat (now at Adobe) and Jon Lewis. We are working on a unikernel operating system called Stardust, which is capable of supporting Java applications. The Stardust unikernel is now operational and supports a full networking stack and POSIX threads.

Current Projects and activities

Digitising Scotland and associated activities

I am co-investigator on the Digitising Scotland project, in which we aim to create a linked genealogy of Scottish historical records, with Chris Dibben and Lee Williamson at Edinburgh and Graham Kirby at St Andrews. We have focused on automatic classification of certain fields within the records (cause of death and occupation); now we are starting to experiment with various probabilistic linkage approaches. This work also includes Eilidh Garrett and Alice Reid at Cambridge, and Peter Christen at ANU.

This work is also related to a work package on linkage methodology within the ESRC-funded Administrative Data Research Centre - Scotland, with Graham Kirby, Peter Christen and Alasdair Gray.

As part of this project we are working with Tom Dalton, whom I co-supervise, and who is trying to create a statistically realistic synthetic linked genealogy of Scottish historical records to use as a test case for linkage algorithms. This will be our third attempt at creating such a genealogy; the problem is surprisingly difficult.

Older projects

The Sea of Stuff

With Graham Kirby and my PhD student Simone Conte, I am working with a group from Adobe in Edinburgh (Adrian O’Lenskie and Ian Patterson) on something we call the Sea of Stuff. The Sea of Stuff is a potentially infinite amalgam of storage repositories, with potentially different storage costs, access latencies, etc., supporting addressable persistent data. The problem we are addressing is how to better manage data across many different devices, such as tablets and laptops, and across various cloud and storage providers. We are also interested in the provenance of data in such a system; in particular, how data originated, how and when it was changed, by whom, and what processes have been applied to it.


With Simon Dobson and Shyam Reyal we have developed a system called NOMAD (NMR Online Management and Datastore), which manages the five Bruker liquid state NMR machines in the School of Chemistry. This is an ongoing tech transfer experiment, which was conceived to address the needs of the School of Chemistry whilst allowing us to experiment with data provenance and management. The system offers an end-to-end management system for the NMR machines, from scientists walking in with their samples, to programming the results, to producing publication-ready URLs for use in papers.


This project started as a part of the SICSA Smart Tourism Initiative funded by the SFC, which brought together universities from across SICSA, tourism organisations, and industry to address some of the key challenges in the sector. The questions we address are: how can layered accessible interpretation be delivered on unstaffed sites, particularly those with no utilities, broadband or 3G access? And how can data about visits to unstaffed sites be recorded and feedback from these visits be gathered? The result was Qraqrbox, which allows a visitor to a tourist site to freely view the same rich information at the site and at home, despite not having any Internet connection at the remote site. Contextual information is delivered using a specialised low-power server that requires no external power supply, instead relying on renewables (primarily solar), for use in isolated areas where there is no mains power or 3G service. This stand-alone server can also be configured to provide a captured local Internet for locations such as museums, where power is not a problem but a self-contained service is required.