Position paper: From libraries as patchwork to datasets as assemblages?


My position paper for Always Already Computational: Collections as Data. Every attendee wrote one – read the others at Collections as Data – National Forum Position Statements.

From libraries as patchwork to datasets as assemblages?

Dr Mia Ridge, Digital Curator, British Library

The British Library’s collections are vast, and vastly varied, with 180-200 million items in most known languages. Within that, there are important, growing collections of manuscript and sound archives, printed materials and websites, each with its own collecting history and cataloguing practices. Perhaps 1-2% of these collections have been digitised, a process spanning many years and many distinct digitisation projects, and an ensuing patchwork of imaging and cataloguing standards and licences. This paper represents my own perspective on the challenges of providing access to these collections and others I’ve worked with over the years.

Many of the challenges relate to the volume and variety of the collections. The BL is working to rationalise the patchwork of legacy metadata systems into a smaller number of strategic systems.[1] Other projects are ingesting masses of previously digitised items into a central system, from which they can be displayed in IIIF-compatible players.[2]

The BL has had an ‘open metadata’ strategy since 2010, and published a significant collection of metadata, the British National Bibliography, as linked open data in 2011.[3] Some digitised items have been posted to Wikimedia Commons,[4] and individual items can be downloaded from the new IIIF player (where rights statements allow). The BL launched a data portal, https://data.bl.uk/, in 2016. It’s work-in-progress – many more collections are still to be loaded, the descriptions and site navigation could be improved – but it represents a significant milestone many years in the making. The BL has particularly benefitted from the work of the BL Labs team in finding digitised collections and undertaking the paperwork required to make them freely available. The BL Labs Awards have helped gather examples of creative, scholarly and entrepreneurial re-use of digitised collections, and BL Labs Competitions have led to individual case studies in digital scholarship while helping the BL understand the needs of potential users.[5] Most recently, the BL has been working with the BBC’s Research and Education Space project,[6] adding linked open data descriptions about articles to its website so they can be indexed and shared by the RES project.

In various guises, the BL has spent centuries optimising the process of delivering collection items on request to the reading room. Digitisation projects are challenging for systems designed around the ‘deliverable item’: the digital user may wish to access or annotate a specific region of a page of a particular item, but the manuscript itself may be catalogued (and therefore addressable) only at the archive box or bound volume level. The visibility of research activities with items in the reading rooms is not easily achieved for offsite research with digitised collections. Staff often respond better to discussions of the transformational effect of digital scholarship in terms of scale (e.g. it’s faster and easier to access resources) than to discussions of newer methods like distant reading and data science.
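To make the idea of addressing ‘a specific region of a page’ concrete, here is a minimal sketch of how a region request can be expressed with the IIIF Image API URL template. The server address and image identifier are hypothetical, invented for illustration, and are not real BL endpoints; the function name is my own.

```python
from urllib.parse import quote


def iiif_region_url(base, identifier, x, y, w, h, size="max",
                    rotation=0, quality="default", fmt="jpg"):
    """Build an IIIF Image API URL for a rectangular region of one image.

    Follows the Image API URL template:
    {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    """
    # Region is given as pixel offsets and dimensions within the full image
    region = f"{x},{y},{w},{h}"
    # Identifiers may contain characters such as '/' that must be percent-encoded
    safe_id = quote(identifier, safe="")
    # "max" is the full-size keyword in version 3 of the API;
    # servers on older versions may expect "full" instead
    return f"{base.rstrip('/')}/{safe_id}/{region}/{size}/{rotation}/{quality}.{fmt}"


# Hypothetical server and identifier, for illustration only
url = iiif_region_url("https://iiif.example.org/images", "ms_12345/f001r",
                      100, 200, 640, 480)
print(url)
```

The point of the sketch is that the region is addressable only once each page image has its own identifier, which is exactly what box-level or volume-level cataloguing does not provide.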

The challenges the BL faces are not unique. The cultural heritage technology community has been discussing the issues around publishing open cultural data for years,[7] in part because making collections usable as ‘data’ requires cooperation, resources and knowledge from many departments within an institution. Some tensions are unavoidable in enhancing records for use externally – for example, curators may be reluctant or short of the time required to pin down their ‘probable’ provenance or date range, let alone guess at the intentions of an earlier cataloguer or learn how to apply modern ontologies in order to assign an external identifier to a person or date field.

While publishing data ‘as is’ in CSV files exported from a collections management system might have very little overhead, the results may not be easily comprehensible, or may require so much cleaning to remove missing, undocumented or fuzzy values that the resulting dataset barely resembles the original. Publishing data benefits from workflows that allow suitably cleaned or enhanced records to be re-ingested, and export processes that can regularly update published datasets (allowing errors to be corrected and enhancements shared), but these are all too rare. Dataset documentation may mention the technical protocols required but fail to describe how the collection came to be formed or what was excluded from digitisation or from the publishing process, or to mention the backlog of items without digital catalogue records, let alone digitised images. Finally, users who expect beautifully described datasets with high quality images may be disappointed when their download contains digitised microfiche images and sparse metadata.
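As a rough illustration of what ‘missing or fuzzy values’ look like in practice, the sketch below profiles a CSV export without changing it, so the counts could go into dataset documentation. The sample rows, column names and the list of uncertainty markers are all invented for illustration; a real catalogue will have its own conventions.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical sample standing in for a collections management system export
SAMPLE = """id,title,date,maker
1,Letter to a friend,1850,Jane Smith
2,Map of the coast,c. 1790?,
3,,[n.d.],Unknown
4,Ship's log,1801-1805,probably J. Brown
"""

# Assumed markers of uncertainty; adjust to the cataloguing practice in question
FUZZY = re.compile(r"\?|\bc(?:irca|\.)|\bprobably\b|\bunknown\b|n\.d\.",
                   re.IGNORECASE)


def profile(rows):
    """Count empty and fuzzy values per column without altering the data."""
    missing, fuzzy = Counter(), Counter()
    for row in rows:
        for field, value in row.items():
            value = (value or "").strip()
            if not value:
                missing[field] += 1
            elif FUZZY.search(value):
                fuzzy[field] += 1
    return missing, fuzzy


missing, fuzzy = profile(csv.DictReader(io.StringIO(SAMPLE)))
print("missing:", dict(missing))
print("fuzzy:", dict(fuzzy))
```

Reporting these counts alongside a dataset, rather than silently stripping the values, keeps the published data recognisable as the original while warning users what to expect.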

Rendering collections as datasets benefits from an understanding of the intangible and uncertain benefits of releasing collections as data and of the barriers to uptake, ideally grounded in conversations with or prototypes for potential users. Libraries not used to thinking of developers as ‘users’ or lacking the technical understanding to translate their work into benefits for more traditional audiences may find this challenging. My hope is that events like this will help us deal with these shared challenges.

[1] The British Library, ‘Unlocking The Value: The British Library’s Collection Metadata Strategy 2015 – 2018’.

[2] The International Image Interoperability Framework (IIIF) standard supports interoperability between image repositories. Ridge, ‘There’s a New Viewer for Digitised Items in the British Library’s Collections’.

[3] Deliot et al., ‘The British National Bibliography: Who Uses Our Linked Data?’

[4] https://commons.wikimedia.org/wiki/Commons:British_Library

[5] http://www.bl.uk/projects/british-library-labs, http://labs.bl.uk/Ideas+for+Labs

[6] https://bbcarchdev.github.io/res/

[7] For example, the ‘Museum API’ wiki page listing machine-readable sources of open cultural data was begun in 2009 http://museum-api.pbworks.com/w/page/21933420/Museum%C2%A0APIs following discussion at museum technology events and on mailing lists.

Photo of beach view
The view from UC Santa Barbara is alright, I suppose

Workshop: Information Visualisation, CHASE Arts and Humanities in the Digital Age 2017

I ran a full-day workshop on Information Visualisation for the CHASE Arts and Humanities in the Digital Age training programme at Birkbeck, London, in February 2017. The abstract:

Visualising data to understand it or convince others of an argument contained within it has a long history. Advances in computer technology have revolutionised the process of data visualisation, enabling scholars to ask increasingly complex research questions by analysing large scale datasets with freely available tools.

This workshop will give you an overview of a variety of techniques and tools available for data visualisation and analysis in the arts and humanities. The workshop is designed to help participants plan visualisations by discussing data formats used for the building blocks of visualisation, such as charts, maps, and timelines. It includes discussion of best practice in visual design for data visualisations and practical, hands-on activities in which attendees learn how to use online tools such as Viewshare to create visualisations.

At the end of this course, attendees will be able to:

  • Create a simple data visualisation
  • Critique visualisations in terms of choice of visualisation type and tool, suitability for their audience and goals, and other aspects of design
  • Recognise and discuss how data sets and visualisation techniques can aid researchers

Please remember to bring your laptop.


Exercises for CHASE’s AHDA 2017 Introduction to Information Visualisation

  • Exercise 1: comparing n-gram tools
  • Exercise 2: Try entity extraction
  • Exercise 3: exploring scholarly data visualisations
  • Viewshare Exercise 1: Ten minute tutorial – getting started
  • Viewshare Exercise 2: Create new views and widgets

Talk: Planning for big data (lessons from cultural heritage)

I was invited to give an hour-long talk for the Association for Project Management’s Knowledge Management SIG event on ‘What does big data mean for project and knowledge managers?’. I shared lessons from work in cultural heritage, including the British Library and Cooper Hewitt Design Museum, on ‘Planning for Big Data’.

Talk: ‘Small ontologies, loosely joined’: linked open data for the First World War, DH2015

I presented a paper, ‘Small ontologies, loosely joined’: linked open data for the First World War, in a panel on Linked Open Data and the First World War at Digital Humanities 2015 (based on my experiences as a Fellow at Trinity College Dublin working on histories of World War One with the CENDARI project).

Workshop: Information Visualisation, CHASE Arts and Humanities in the Digital Age

I’ve been asked to give a workshop on Information Visualisation for the CHASE Arts and Humanities in the Digital Age training programme in June 2015.

The workshop will introduce students to the use of visualisations for understanding, analysing and presenting large-scale datasets in the Humanities, enabling scholars to ask increasingly complex research questions.

Slides, sample data and instructions for exercises are downloadable here: CHASE InfoVis Handouts 2015.

Links for the various exercises are collected below for ease of access.

Exercise 1: Exploring network visualisations

Exercise 2: Comparing N-gram tools



Exercise 3: Trying entity recognition

Exercise 4: Exploring scholarly data visualisations

Exercise 5: create a chart using Google Fusion Tables

Google Fusion Tables: https://www.google.com/fusiontables/data?dsrcid=implicit

An Excel version of this exercise is available at http://www.openobjects.org.uk/2015/03/creating-simple-graphs-with-excels-pivot-tables-and-tates-artist-data/

Exercise 6: Geocoding data and creating a map using Google Fusion Tables

Google Fusion Tables: https://www.google.com/fusiontables/data?dsrcid=implicit

Exercise 7: Applying data visualisation to your own work

Explore more visualisations:

Sketch ideas for visualisations:

Try visualising data in different tools:

Try visualising existing data

Lecture: ‘A pilot with public participation in historical research: linking lived experiences of the First World War’, Trinity College Dublin

Trinity lecture poster

As part of my Visiting Research Fellowship at Trinity College Dublin’s Long Room Hub I gave a lecture on ‘A pilot with public participation in historical research: linking lived experiences of the First World War’.

The abstract and podcast are below, and there’s further information about my CENDARI Fellowship here.

Abstract: The centenary of World War One and the digitisation of records from a range of museums, libraries and archives has inspired many members of the public to research the lives of WWI soldiers. But it is not always easy to find or interpret military records. What was it like to be in a particular battalion or regiment at a particular time? Can a ‘collaborative collection’ help provide context for individual soldiers’ experience of the war by linking personal diaries, letters and memoirs to places, people and events? What kinds of digital infrastructure are needed to support research on soldiers in the Great War? This lecture explores the potential for collaborating with members of the general public and academic or amateur historians to transcribe and link disparate online collections of World War One material. What are the challenges and opportunities for participatory digital history?

Thursday, 04 December 2014 | 13:00 | Trinity Long Room Hub

A lecture by Visiting Research Fellow at the Trinity Long Room Hub, Mia Ridge (The Open University). Mia is a Transnational Access fellow, funded by the CENDARI project (Collaborative European Digital Archive Infrastructure).

Keynote: ‘Collaborative collections through a participatory commons’, 2014 National Digital Forum conference

I was delighted to be invited to present at New Zealand’s 2014 National Digital Forum conference in Wellington. I was asked to speak on my work on the ‘participatory commons’. As a focus for explaining the need for a participatory commons, I asked, ‘What could we create if museums, libraries and archives pooled their collections and invited specialists and enthusiasts to help link and enhance their records?’.

As a conceptual framework rather than a literal technical architecture, every bit of clearly licensed content with (ideally) structured data published around it makes a contribution to ‘the commons’. In my keynote I explored some reasons why building tightly-focused projects on top of that content can help motivate participation in crowdsourcing and citizen history, and some reasons why it’s still hard (hint: it needs great content supported by relevant structured data), using my TCD/CENDARI research project on ‘lived experiences of World War One’ as an example.

The video is now online.

Conference paper: Play as Process and Product: On Making Serendip-o-matic

The abstract for our Digital Humanities 2014 conference paper is below. Scott’s posted his notes from the first part; my notes for the middle part, ‘How did ‘play’ shape the design and experience of creating Serendip-o-matic?’, are on Open Objects; and Brian’s are to follow.

Play as Process and Product: On Making Serendip-o-matic

Amy Papaelias, State University of New York at New Paltz

Brian Croxall, Emory University

Mia Ridge, The Open University

Scott Kleinman, California State University, Northridge


Who says scholarship can’t be playful? Serendip-o-matic is a “serendipity engine” that was created in less than a week by twelve digital humanities scholars, developers, and librarians. Designed to replicate the surprising experience of discovering an unexpected source while browsing library stacks or working in an archive, the visual and algorithmic design of Serendip-o-matic emphasizes playfulness. And since the tool was built by a group of people who were embarking on a difficult task but weren’t yet sure of one another’s names, the process of building Serendip-o-matic was also rather playful, encouraging participants to take risks, make mistakes, and learn something new. In this presentation, we report on how play shaped the creation, design, and marketing of Serendip-o-matic. We conclude by arguing for the benefits of more playful work in academic research and scholarship, as well as considering how such “play” can be evaluated in an academic context.



CENDARI Visiting Research Fellowship: ‘Bridging collections with a participatory Commons: a pilot with World War One archives’

I’ve been awarded a CENDARI Visiting Research Fellowship at Trinity College Dublin for a project called ‘Bridging collections with a participatory Commons: a pilot with World War One archives’. Here’s Trinity’s page about my Fellowship, which runs until mid-December. I’ve decided to be brave and share my thoughts and actions throughout the process, so I thought I’d start as I mean to go on and post my proposal (1500 words, below). CENDARI is a ‘research infrastructure project aimed at integrating digital archives for the medieval and World War One eras’ which ‘aims to leverage innovative technologies to provide historians with the tools by which to contextualise, customise and share their research’ (source) so this research fellowship very neatly complements my PhD research.

You can contact me by leaving a comment below, or via my contact page. If you’d like to follow my progress, you can sign up for (very infrequent) updates at MailChimp: http://eepurl.com/VUXEL or keep an eye out for posts tagged ‘CENDARI Fellowship’ on my blog, Open Objects.

Updates so far:


Keynote ‘Enriching cultural heritage collections through a Participatory Commons’ at Sharing is Caring

Photo of glider plane against blue sky
Image: Library of Congress

I was invited to Copenhagen to talk about my research on crowdsourcing in cultural heritage at the 3rd international Sharing is Caring seminar on April 1. I’ve posted my notes on Open Objects: Enriching cultural heritage collections through a Participatory Commons platform: a provocation about collaborating with users.

Much of this comes from my PhD research and my previous work in museums, and I’m grateful to everyone who’s commented in person or on twitter so far, particularly as it helps me understand the best ways to explain the Participatory Commons and the research underlying it for different audiences.