Categories
Research Process

Modelling Patchy Data

How do researchers manage when they have missing data? One of the initial aims of the Mapping Museums project was to establish an authoritative dataset of all the museums open between 1960 and 2020, and to record information on their location, governance, accreditation status, subject matter, opening and closing dates, and visitor numbers. Having this material would provide the first step in constructing a nuanced, evidence-based history of the development of the museum sector during the period, and so the research team began to compile information from numerous sources: surveys conducted by government bodies, by the Association of Independent Museums, and the Museums Association; lists of museums held by the national organisations for the arts; guidebooks; and websites. The researchers also got in touch with dozens of tourist boards and local history groups, and hundreds of curators and volunteers to follow up leads or information. All this material was cross-checked within the team, and then reviewed by experts from the Museum Development Network.

We now have a rigorously researched list of museums in the UK from 1960-2020. Even so, there is still a considerable amount of missing data. When the first phase of data collection was finished we had identified almost 4,000 museums and had established the following coverage of their key attributes:

  • Museum opening dates: 88%
  • Museum closing dates: 6%
  • Governance: 92%
  • Visitor numbers: 67%

The question then was, how were we to represent and model the missing dates, governance, and visitor numbers within our analysis?

At the same time as collecting data, we started to build a knowledge base that allows users to explore the data. The system is designed so that users can browse in a structured way through the categories of accreditation, governance, location, size, subject classification, year of opening and year of closing, and see the results on a map or in a list view. Alternatively, they can submit a detailed search that allows them to filter results by combinations of the categories above, or they can generate visualisations of how the different types of museums have emerged over time and create tables showing how the various categories inter-relate. At any point, it is possible to scrutinise the details of individual venues.

One option for dealing with missing information was to exclude museums with missing data from the relevant searches. The problem with that approach is that incomplete data tends to be associated with small, unaccredited museums or with museums that have since closed, and so excluding them on this basis would bias our analysis in favour of extant, established museums, which would be counter to the purposes of the project as a whole. Thus, when we could not identify a museum’s governance, we assigned it a value of Unknown. The advantage of an explicit Unknown category is that the missing data is made apparent, and the problem of data patchiness is exposed rather than hidden.

We took a different approach to opening and closing dates because we often had rough information about these rather than no information at all – for example, we might know that a museum had closed at some point in the 1990s. This approximate information would be lost if we just categorised a date as ‘unknown’. Therefore, we decided to use a date range of the form (earliest possible year, latest possible year) to capture imprecise knowledge about museum opening/closing dates. These date ranges are used in different ways across the different facilities provided by our system:

  • In the Browse facility, we take museums’ opening/closing dates to be the midpoint of the specified date range.
  • In the Visualise facility, event occurrences are ‘spread’ equally over a date range. For example, if a museum is known to have opened between 1965 and 1969, then the count of one museum opening is spread over that time period (i.e. a count of 0.2 is assigned to each of the five years 1965, 1966, 1967, 1968, 1969).
  • In the Search facility, the user has the option of searching by definite dates so that the results exclude all the museums with date ranges attached, or by possible dates, in which case the results include museums where the date range intersects with the specified period. This allows for a much more nuanced analysis.
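
As an illustration of the Browse and Visualise treatments above, here is a minimal Python sketch. It is not the project's actual code: the function names `midpoint_year` and `spread_counts` are hypothetical, and rounding the midpoint down is an assumption, since the text does not say how ranges with an even number of years are handled.

```python
from collections import defaultdict
from fractions import Fraction


def midpoint_year(earliest, latest):
    """Return one representative year for a date range (Browse-style)."""
    # Assumption: round down when the range has an even number of years.
    return (earliest + latest) // 2


def spread_counts(date_ranges):
    """Spread one event per museum equally over every year in its range."""
    counts = defaultdict(Fraction)
    for earliest, latest in date_ranges:
        n_years = latest - earliest + 1
        for year in range(earliest, latest + 1):
            # Each year receives an equal share of this single event.
            counts[year] += Fraction(1, n_years)
    return dict(counts)


# One museum opened at some point in 1965-1969, another definitely in 1967.
openings = [(1965, 1969), (1967, 1967)]
per_year = spread_counts(openings)
print(float(per_year[1965]))  # 0.2
print(float(per_year[1967]))  # 1.2
```

Exact fractions are used so that the yearly shares always sum back to the number of museums, which would not be guaranteed with floating-point addition.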

Looking in more detail at how Search works, opening and closing dates are stored as a pair of years (f,t) in our database, where f and t may be the same year if we know the year of opening/closing for certain. So, for example, the pair (1965,1969) would be stored for a museum known to have opened between 1965 and 1969, and the pair (2011,2011) would be stored for a museum known to have closed in 2011. Our system’s Search facility supports modal logic operators that allow the user to query whether a particular museum definitely or possibly opened/closed in a given year. In particular, suppose a given museum ‘m’ is recorded as having opened in year ‘f’ at the earliest and year ‘t’ at the latest, and suppose a researcher wishes to find out whether museum m opened before, on, or after a specified year ‘d’. Then the following comparison operators are supported by our system to allow the researcher to determine whether this is definitely the case:

Comparison operator   Meaning                           Implementation logic
(f,t) = d             DEFINITELY ON A SPECIFIC YEAR     f = d AND t = d
(f,t) < d             DEFINITELY BEFORE                 t < d
(f,t) <= d            DEFINITELY BEFORE OR INCLUDING    t <= d
(f,t) > d             DEFINITELY AFTER                  f > d
(f,t) >= d            DEFINITELY AFTER OR INCLUDING     f >= d
(f,t) != d            DEFINITELY APART FROM             t < d OR f > d

And the following comparison operators are supported to allow the researcher to determine whether this is possibly the case:

Comparison operator   Meaning                           Implementation logic
(f,t) = d             POSSIBLY ON A SPECIFIC YEAR       f <= d AND d <= t
(f,t) < d             POSSIBLY BEFORE                   f < d
(f,t) <= d            POSSIBLY BEFORE OR INCLUDING      f <= d
(f,t) > d             POSSIBLY AFTER                    t > d
(f,t) >= d            POSSIBLY AFTER OR INCLUDING       t >= d
(f,t) != d            POSSIBLY APART FROM               NOT (f = d AND t = d)

The same comparison operators are available for interrogating closing dates.
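
The definite and possible comparisons described above can be sketched in Python as follows. This is an illustrative transcription of the implementation logic given in the text, not the project's own code: the names `DEFINITELY`, `POSSIBLY` and `compare`, and the `mode` parameter, are hypothetical.

```python
# Each entry maps a comparison operator to its test on the stored pair (f, t)
# and the query year d, following the implementation logic given in the text.
DEFINITELY = {
    "=": lambda f, t, d: f == d and t == d,
    "<": lambda f, t, d: t < d,
    "<=": lambda f, t, d: t <= d,
    ">": lambda f, t, d: f > d,
    ">=": lambda f, t, d: f >= d,
    "!=": lambda f, t, d: t < d or f > d,
}

POSSIBLY = {
    "=": lambda f, t, d: f <= d <= t,
    "<": lambda f, t, d: f < d,
    "<=": lambda f, t, d: f <= d,
    ">": lambda f, t, d: t > d,
    ">=": lambda f, t, d: t >= d,
    "!=": lambda f, t, d: not (f == d and t == d),
}


def compare(date_range, op, d, mode="definitely"):
    """Evaluate `date_range op d` in the chosen mode (hypothetical helper)."""
    f, t = date_range
    table = DEFINITELY if mode == "definitely" else POSSIBLY
    return table[op](f, t, d)


opened = (1965, 1969)  # opened at some point between 1965 and 1969
closed = (2011, 2011)  # known to have closed in 2011

print(compare(opened, "<", 1970))              # True: latest year is before 1970
print(compare(opened, "=", 1967))              # False: 1967 is not certain
print(compare(opened, "=", 1967, "possibly"))  # True: 1967 lies within the range
print(compare(closed, "=", 2011))              # True: the year is known exactly
```

Note that each possible operator is the negation of the opposite definite operator: a museum possibly opened in year d exactly when it did not definitely open in some other year.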

We employed a further strategy for visitor numbers, which is the least complete category and has discontinuities that make it difficult to compare like with like. Our primary objective was to use visitor number data to provide an indication of the size of the museum and, given the patchiness of the information, we decided to have a category of Unknown and also to group numbers into size categories of Large, Medium and Small, where Large and Small also have sub-categories. This approach enabled us to include data from the Association of Independent Museums and Arts Council England, which generally provide visitor number ranges rather than precise figures, and to use predictive analysis to establish broad size ranges. It also allowed us to circumvent some of the methodological problems of having figures collected by different means and from across the decades. Users can browse or search according to these size categories and, in addition, they can search according to precise date-stamped visitor numbers where available.

In conclusion, in the Mapping Museums project we have managed data patchiness in a variety of ways: designing a flexible knowledge base that can be modified and added to as required; representing absence rather than ignoring unknown information; using date ranges and providing users with the option of searching by definite or possible dates; and apportioning the probability of an opening/closing event occurrence over the estimated time interval for statistical analysis. Rather than implying that all visitor numbers data are of equal reliability, we created size categories for a large number of museums, and provided the means to search the definite but incomplete data that was available.

Fiona Candlin, Alex Poulovassilis
September 2018

Upload!

Fiona Candlin

On Friday 26th January, the Mapping Museums project reached the end of its first phase, and for us, it felt like a momentous date. For the last fifteen months Dr Jamie Larkin and I have been compiling a huge dataset of all the museums that were open at any point between 1960 and now. That information has now been finalised and handed over to the computer science researcher to be uploaded. In the coming weeks, we will be able to start analysing our material and generating findings about the past sixty years of museum practice in the UK.

The dataset of museums synthesises information from a wide variety of different sources. We started with DOMUS (The Digest of Museum Statistics), which was a huge survey of museums conducted in the mid-1990s, and with the 1963 Standing Committee Review of Provincial Museums. These captured a large number of museums that were open in the mid to late twentieth century but have since closed. We then added current records and information from the Arts Council England (ACE) accreditation scheme, and from the national records gathered by Museum Galleries Scotland (MGS), the Welsh Museums Libraries Archives Division (MALD), and the Northern Ireland Museums Council (NIMC), since these lists include non-accredited museums. The Association of Independent Museums (AIM) gave us their membership records, and in the University of Leicester Special Collections library we also managed to find the results of a very old survey that they had conducted in the 1980s. This was research gold, for it identified very small museums that are extremely difficult to trace once they have closed.

We included around half of the historic houses that are listed in the Historic Houses Association guidebook, and a number of properties that are managed by English Heritage, Historic Environment Scotland, or CADW. Deciding which venues reasonably constituted museums was a difficult process and one that we undertook in consultation with senior managers and curators of those associations, colleagues from the Museum Development Network, and the ACE accreditation team, although the final decisions were our own.

In the course of researching my last book Micromuseology: an analysis of independent museums, I had compiled a list of very small idiosyncratic museums, and these were added into our rapidly growing list, as was a surprisingly long list of museums that were listed online but not in any of our other sources. We then checked our dataset against the Museums Association ‘Find A Museum Service’ and against two huge gazetteers, The Directory of Museums and Living Displays and The Cambridge Guide to the Museums of Britain and Ireland, edited by Kenneth Hudson and Ann Nicholls in 1985 and 1987 respectively. Finally, we consulted the Museums Association Yearbook at five-yearly intervals from 1960 until 1980, and a variety of publications that listed historic houses that were open to the public. In all cases, any venues that we had previously missed were added.

Having established a long list of museums, we needed to ensure that we had a correct address and the opening and closing dates for each venue. We also wanted to establish its governance: whether it was national, local authority, university, or independent and, if the latter, whether it was managed by a charitable trust or by a private group. Finding this information necessitated months of emailing and telephone calls, and we often ended up speaking to the children of people who had founded museums, or to members of local history associations in the relevant area. Even so, the process of compiling our dataset was not yet finished, for we also needed to classify each museum by subject matter. In order to do this we devised our own classification system and considered each venue on an individual basis. It is little wonder that major museum surveys are infrequently undertaken.

The next phase of the research is analysing the data, so watch this space for updates. The first findings on museum opening and closure will be presented at ‘The Future of Museums in a Time of Austerity’ symposium at Birkbeck on February 24th 2018. We will also be tweeting about interesting aspects of our analysis, so don’t forget to follow us @museumsmapping on Twitter.

Copyright Fiona Candlin January 2018.