Building the Database

The Mapping Museums project is an interdisciplinary collaboration between Arts and Computer Science and, as such, is challenging in many ways, as discussed in the earlier blog post on “Interdisciplinarity”. The project follows an iterative and collaborative methodology, because data collection often produces new knowledge that needs to be modelled and retained. This incremental accumulation of data and knowledge makes flexibility important: we need to be able to respond to frequent changes.

We therefore use a semantic database to store and describe our data. Semantic databases, also known as triple stores, hold pieces of information as triplets of the form Subject-Predicate-Object. For example, the fact that the Science Museum is located in London would be stored as the triplet Science Museum-hasLocation-London. The data model that describes the entities (such as museums and locations) and the relationships between them (such as hasLocation) is sometimes called an ontology.
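To make the triplet idea concrete, here is a minimal Python sketch of an in-memory store. It is not how our database works internally; the `Triple` class, the `match` helper and the `isA` facts are hypothetical names introduced only for illustration. Only the Science Museum-hasLocation-London triplet comes from the example above.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# A tiny in-memory "triple store": simply a set of triplets.
store = {
    Triple("Science Museum", "hasLocation", "London"),
    # Hypothetical extra facts, added only for illustration.
    Triple("Science Museum", "isA", "Museum"),
    Triple("London", "isA", "Location"),
}


def match(triples, subject: Optional[str] = None,
          predicate: Optional[str] = None, obj: Optional[str] = None):
    """Return the triplets that agree with every field that is not None."""
    return sorted(
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    )


# "Which things have a location, and where?"
located = match(store, predicate="hasLocation")
```

Adding knowledge is just `store.add(Triple(...))`: a new predicate needs no change to the triplets already stored.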

This kind of data model can easily be extended with new triplets as new data and knowledge accrue. It can also easily be integrated with other already existing ontologies, for example relating to geographical regions and types of museums. Equally important, it allows us to describe in fine detail the different relationships between entities.

In our project, the data is first recorded in Excel spreadsheets and then converted into a triplet format for loading into our database. We encode the metadata, e.g. the data types and relationships, directly within the spreadsheets as additional header rows, so as to keep the model and the data “in sync”.
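As a rough illustration of metadata carried in header rows, the sketch below reads a small CSV export in which the first three rows hold the column name, the data type and the relationship. This three-row layout, the column names and the "Example Museum" record are assumptions made for the example, not the project's actual spreadsheet format or data.

```python
import csv
import io

# Hypothetical sheet: row 1 = column names, row 2 = data types,
# row 3 = relationships (predicates); data starts on row 4.
SHEET = """name,location,year_opened
string,string,integer
hasName,hasLocation,hasYearOpened
Example Museum,Exampletown,1975
"""

# Map the type names used in the sheet to Python converters.
CASTERS = {"string": str, "integer": int}


def read_sheet(text):
    """Split a sheet into column metadata and typed data records."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    names, types, predicates = rows[:3]
    columns = [
        {"name": n, "type": t, "predicate": p}
        for n, t, p in zip(names, types, predicates)
    ]
    records = [
        {c["name"]: CASTERS[c["type"]](value) for c, value in zip(columns, row)}
        for row in rows[3:]
    ]
    return columns, records


columns, records = read_sheet(SHEET)
```

Because the types and relationships travel in the same file as the values, adding a column updates the model and the data in one place.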

In more detail, the processing of the Excel spreadsheets comprises several steps:

  1. The spreadsheet is converted into a CSV (comma-separated values) file.
  2. The metadata is converted into a graph, defined in the Graffoo language.
  3. This graph is processed into a number of templates, to be used for converting the data into RDF (Resource Description Framework) and RDFS (RDF Schema).
  4. These templates are used to convert each row of the CSV file into a set of triplets to be loaded into the database (which is stored using Virtuoso).
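The last step above can be sketched in a few lines of Python. This is a simplified stand-in, not our pipeline: the templates here are hand-written strings rather than ones generated from a Graffoo graph, the `http://example.org/` namespace and the `slug` helper are hypothetical, and N-Triples is assumed as the output syntax.

```python
import csv
import io
import re

BASE = "http://example.org/"  # hypothetical namespace, for illustration only

CSV_TEXT = "name,hasLocation\nScience Museum,London\n"

# One template per column that should become a triplet (assumed format).
TEMPLATES = {
    "hasLocation": '<{subject}> <' + BASE + 'def/hasLocation> "{value}" .',
}


def slug(text):
    """Turn a museum name into a URI-friendly identifier."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def escape(value):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def row_to_triples(row):
    """Apply every template to one CSV row, skipping empty cells."""
    subject = BASE + "museum/" + slug(row["name"])
    return [
        template.format(subject=subject, value=escape(row[column]))
        for column, template in TEMPLATES.items()
        if row.get(column)
    ]


def convert(csv_text):
    """Convert every row of a CSV document into N-Triples lines."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lines.extend(row_to_triples(row))
    return lines


triples = convert(CSV_TEXT)
```

Each resulting line is one triplet in a form that an RDF store can bulk-load.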

Once the database has been created, we use it to support a web-based user interface that allows users to explore the data.

By using semantic technologies to describe and store the data, we can support a flexible user interface that will allow users to explore spatial and temporal relationships in the data in order to begin to answer the research questions around independent museum development in the UK.

© Nick Larsson, August 2017
