Getting started: Building the database

The Mapping Museums project aims to identify trends in the growth of independent museums from 1960 to 2020. In order to conduct our analysis we need to be able to interrogate longitudinal data for a number of museum variables, including years of opening and closure, size, and status change. At present, no such database exists that would allow us to do so. Ironically, for a sector committed to the preservation of cultural memory, documenting the institutions that participate in these activities is seemingly much less of a priority (see ‘Problems with the Data’ post). Thus, the first objective of the project was to create a functional database that catalogued all of the museums that have existed in the UK since 1960.

Before we began building this database we first considered the logistics of the process, namely the point during our timeframe when it would be best to begin to collect the data. Should we put together a snapshot of the nation’s museums as of 2016 (estimated at 2,500 at the outset of the project) and work backwards, or begin with a baseline of around 900 museums that existed in 1960 and work forwards? The former would give us a solid foundation but might require tortuous weaving back through name changes and amalgamations; the latter would give us fewer museums to start with, but might be easier as we attempted to record individual museum trajectories.

The solution was a compromise based on time and the availability of data. Between 1994 and 1999 the Museums and Galleries Commission ran a programme that produced the Digest of Museum Statistics (DOMUS). It involved annual reporting from museums that participated in the scheme, in the form of lengthy postal surveys. The information captured included address, registration status, visitor numbers and many other characteristics. While some limitations with the data have been highlighted in retrospective analyses (specifically by Sara Selwood in 2001), the baseline data that DOMUS provided was sufficient for our needs.

Using this as a starting point enabled us to begin with detailed information on nearly 2,000 museums. This snapshot of the museum sector in the late 1990s provided us with the flexibility to work both forwards and backwards in time. In particular, having records of museums at an interstitial stage of their development has been helpful in tracking (often frequent) changes of name, status and location, as well as amalgamations.

The major problem with the DOMUS survey was accessing the data and formatting it for our use. After the project was wound up in 1999 the mass of information it had generated was deposited at the National Archives. However, given the complex nature of the data, there was no way of hosting a functional (i.e. searchable) version of the database. Consequently, it was archived as a succession of data sheets – in a way, flat-packed, with instructions as to how the sheets related to one another.

The first task was to reassemble DOMUS from its constituent parts. This meant trying to interpret what the multiple layers of documents deposited in the archive actually referred to. While the archival notes helped, there was still a great deal of deductive work to do.

Once we had identified the datasheet with the greatest number of museums to use as our foundation, the next step was to match up the associated data held in auxiliary sheets and bring it together in a single Excel master sheet. To do so we used the internal DOMUS numbers (present within each document) to connect the various data, creating a single row of data for each individual museum. We slowly rebuilt the dataset in this way.
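
For readers who would like to try something similar outside Excel, the matching step can be sketched in a few lines of Python. This is only an illustration of the idea, not the procedure we followed: the column names (domus_no, name, visitors) and the sample rows are invented, and we assume each auxiliary sheet holds at most one row per DOMUS number.

```python
def merge_sheets(master_rows, auxiliary_rows, key="domus_no"):
    """Attach the columns of an auxiliary sheet to the master rows that share a key.

    Every master row is kept, whether or not the auxiliary sheet has a match.
    """
    # Index the auxiliary sheet by the shared identifier for quick lookup.
    aux_by_key = {row[key]: row for row in auxiliary_rows}

    merged = []
    for row in master_rows:
        combined = dict(row)  # copy, so the original sheet is left untouched
        extra = aux_by_key.get(row[key], {})
        for column, value in extra.items():
            if column != key:
                combined[column] = value
        merged.append(combined)
    return merged


# Invented sample data; the real sheets use different headings.
master = [
    {"domus_no": "1", "name": "Example Town Museum"},
    {"domus_no": "2", "name": "Sample Village Heritage Centre"},
]
visitors = [{"domus_no": "1", "visitors": "12000"}]

merged = merge_sheets(master, visitors)
```

Keeping every master row, even when an auxiliary sheet has nothing to add, means the foundation list never shrinks as further sheets are folded in.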

In some instances the splitting of the data – while presumably logical from an archival perspective – was frustrating from a practical standpoint. A particularly exasperating example was that museum addresses were stored in a separate sheet from the museums they belonged to, and had to be reconnected using a unique numerical reference termed ADDRID. While the process was relatively straightforward, there was always a degree of anxiety concerning the integrity of the data during the transfers, and so regular quality checks were carried out during the work.
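
The kind of check that eases this anxiety can be sketched in Python as follows. Apart from ADDRID, every column name and value here is made up for illustration, and the checks shown (no duplicate ADDRID, no museum rows lost, unmatched references reported) are examples of what such a quality check might cover rather than a record of the checks we ran.

```python
def attach_addresses(museums, addresses):
    """Join address rows onto museum rows via ADDRID.

    Returns the joined rows plus a list of ADDRID values that had no
    matching address, so that they can be followed up by hand.
    """
    address_by_id = {}
    for addr in addresses:
        if addr["ADDRID"] in address_by_id:
            # Two addresses with the same reference would make the join ambiguous.
            raise ValueError(f"duplicate ADDRID {addr['ADDRID']}")
        address_by_id[addr["ADDRID"]] = addr

    joined, unmatched = [], []
    for museum in museums:
        addr = address_by_id.get(museum["ADDRID"])
        if addr is None:
            unmatched.append(museum["ADDRID"])
            joined.append(dict(museum))
        else:
            # Museum values take precedence if a column name appears in both sheets.
            joined.append({**addr, **museum})

    # Integrity check: the join must neither drop nor duplicate museums.
    assert len(joined) == len(museums)
    return joined, unmatched


# Hypothetical rows for illustration only.
museums = [
    {"name": "Example Town Museum", "ADDRID": "101"},
    {"name": "Sample Village Heritage Centre", "ADDRID": "999"},
]
addresses = [{"ADDRID": "101", "town": "Exampleton"}]

joined, unmatched = attach_addresses(museums, addresses)
```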

The next step was to clean up the reassembled sheet. First, we removed anything from the data that was not a single museum (e.g. references to overarching bodies such as the Science Museum Group). Second, we reviewed the amassed data columns to assess their usefulness and determine what could be cut and what should be retained. Thus, old data codes, fax numbers and company numbers were deleted, while any information that could potentially be of use, like membership of Area Museum Councils, was retained. We also replaced the column headings, which were written in concise programming terminology, with more intelligible wording.
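
As a rough sketch, the same three clean-up operations could be expressed in Python like this. The terse headings (MUSNAME, FAXNO, COMPNO, OLDCODE, AMC) are invented stand-ins for the DOMUS originals, and the list of non-museum entries would in practice be compiled by hand.

```python
# Hypothetical headings; the actual DOMUS column names differ.
DROP_COLUMNS = {"FAXNO", "COMPNO", "OLDCODE"}
RENAME = {"MUSNAME": "Museum name", "AMC": "Area Museum Council"}
NOT_MUSEUMS = {"Science Museum Group"}


def clean_rows(rows):
    """Remove non-museum rows, drop unwanted columns and rename the rest."""
    cleaned = []
    for row in rows:
        if row["MUSNAME"] in NOT_MUSEUMS:
            continue  # an overarching body, not a single museum
        cleaned.append(
            {
                RENAME.get(column, column): value
                for column, value in row.items()
                if column not in DROP_COLUMNS
            }
        )
    return cleaned


# Invented sample rows.
rows = [
    {"MUSNAME": "Example Town Museum", "FAXNO": "01234 567890", "AMC": "Example AMC"},
    {"MUSNAME": "Science Museum Group", "FAXNO": "", "AMC": ""},
]

cleaned = clean_rows(rows)
```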

This formatting helped shape the data into a usable form, but the final step was to put our own mark on it. Thus we devised specific project codes for the museums, which were useful for recording the source of the data and managing it effectively moving forwards. To tag the museums we decided on a formula that indicated the project name, the original data source, and the museum’s number in that data source (e.g. mm.DOMUS.001). Once our database is finalised, each entry will be ascribed a unique, standardised survey code.
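
The tagging formula is simple enough to express as a one-line function. The sketch below is in Python; the function name is our own invention, and the three-digit zero padding is an assumption based on the example code above (larger numbers are simply written out in full).

```python
def project_code(source, number, project="mm", width=3):
    """Build a code of the form project.source.number, e.g. mm.DOMUS.001.

    The padding width is an assumption taken from the example in the text.
    """
    return f"{project}.{source}.{int(number):0{width}d}"


first_code = project_code("DOMUS", 1)
later_code = project_code("DOMUS", "1848")
```

Here first_code is "mm.DOMUS.001" and later_code is "mm.DOMUS.1848".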

Ultimately, the DOMUS data has acted as the bedrock of our database. It provided a starting point of 1,848 museums, and thus the majority of our entries have their basis in DOMUS records (which have been updated where applicable). One of our initial achievements is that the DOMUS data is now re-usable in some form, and this may be an output of the project at a later date.

A wider lesson from this process is the importance not only of collecting data, but of ensuring that it is documented in a way that allows researchers to access it easily in the future. When our data comes to be archived in the course of time, the detailed notes that we have kept about this process – of which this blog will form a part – should provide a useful guide so that our methods and outputs can be clearly understood. Hopefully this will allow the history of the sector that we are helping to build to be used, revisited, and revised for years to come.

© Jamie Larkin June 2017
