Hidden REF nomination for crowdsourcing at the British Library

In September 2021 I was delighted to be nominated for a Hidden REF award. The Hidden REF is a project that celebrates the work of people who are vital to the success of research but who may go unrecognised by traditional academic criteria for research outputs.

I'm sharing a copy of the nomination for LibCrowds, the platform and community on which In the Spotlight, a project crowdsourcing the transcription of historical playbills, was built:

LibCrowds is a platform dedicated to hosting crowdsourcing projects aimed at enhancing access to British Library collections. Since launching in 2015, it has hosted 171 projects, drawing in 265,000 contributions from nearly 3,000 registered volunteers, and many more anonymous individuals. The crowdsourcing projects greatly enhance the discoverability of library collections.

Our community of volunteers have contributed to projects such as Georeferencer, providing more accurate, diverse metadata about digitised historic maps; In the Spotlight, transcribing 18th–19th century playbills (making them more findable and searchable); and Convert-a-Card, retro-converting printed card catalogues into electronic records, particularly improving access to Chinese and Indonesian collections.

The platform is carefully designed for productivity: it's easy to use and makes it easy to interact with images. However, engagement with collections is also a key outcome. LibCrowds has built a strong community. Our surveys indicate that most contributors participate because it's enjoyable, and some take a personal interest in the subject matter. They can discuss discoveries with others through a forum, and can easily share images via social media.

LibCrowds has enabled important research findings. For instance, the playbills project has allowed research on plays that were once important but later waned in popularity, and has revealed details about marginalised groups, including women and Black actors. We are aware of multiple doctoral students working on aspects of theatrical history, and of researchers in several universities who have used the transcribed collections in their publications.

The scholarly and professional literature recognises LibCrowds as an extremely valuable case study of a successful crowdsourcing project. It's referenced in dozens of articles and conference papers. Recently, insights from LibCrowds have been integral to the planning of research in the Library and the Alan Turing Institute's Living with Machines project, which uses crowdsourcing to engage the public with data science methods and produce effective and timely results about 19th-century newspapers.

2019: an overview(ish)

A very incomplete page…

Projects: Living with Machines

  • Continued recruiting the project team
  • Set up the project website (graphic identity and WordPress template by an agency, working with the project team)
  • Helped devise the communications strategy

Publications

Ridge, M. (forthcoming). Crowdsourcing in cultural heritage: A practical guide to designing and running successful projects. In K. Schuster & S. Dunn (Eds.), Routledge Handbook of Research Methods in Digital Humanities. Routledge.

Talks and teaching

June: I was at Indiana University-Purdue University Indianapolis to teach Collections as Data with Thomas Padilla for the HILT digital humanities summer school.

An invited talk on 'Voyages of discovery with digital collections' for the Eskenazi Museum of Art, Indiana University, Bloomington, June 2019

Blog posts

Other

Peer reviewer, Digital Humanities 2019

2018: an overview

2017-18 was a bit of an odd year and I've subsequently reduced the number of invitations I accept each year.

Projects

2018 finished with a bang, as the press release for the British Library/Alan Turing Institute's Living with Machines project went live. I'd been working on the proposal since early 2017. In this project, we're experimenting with 'radical collaborations' around applying data science methods to historical newspaper collections to advance the potential of digital history.

Talks and teaching

January: a lecture on 'Scholarly crowdsourcing: from public engagement to creating knowledge socially' for the Introduction to Digital Humanities Masters course at King's College London, and an 'Overview of Information Visualisation' for the CHASE Winter School: Introduction to Digital Humanities.

February: a full-day workshop on Information Visualisation for PhD students in the Digital Humanities for CHASE.

March: a talk on 'Crowdsourcing: the British Library experience' for CILIP's Multimedia Information & Technology (MmIT) Group's event on 'The wisdom of the crowd? Crowdsourcing for information professionals'.

April: I gave a talk on 'Challenges and opportunities in digital scholarship' for a British Library Research Collaboration Open House, and took part in a panel on ‘Sharing knowledge through online engagement’ around Art UK's Art Detective project at the Association of Art Historians (AAH) conference at the Courtauld Institute of Art.

May: I was in Rotterdam for a EuropeanaTech panel on User Generated & Institutional Data Transcription projects, and gave a talk on 'Open cultural data in the GLAM sector' for a CPD25 workshop, 'The GLAM sector: what can we learn from Galleries, Libraries, Archives and Museums?'.

June: with Thomas Padilla I co-taught 'Collections as data' for the HILT Digital Humanities Summer School, June 4–8, 2018, University of Pennsylvania. I then went on to Oberlin College to give a keynote on 'Digital collections as departure points' at the Academic Art Museums and Libraries Summit.

September: a talk on 'Crowdsourcing at the British Library: lessons learnt and future directions' at the Digital Humanities Congress, University of Sheffield, 6–8 September 2018. I also gave a 'provocation' for the Building Library Labs event, 'A modest proposal: crowdsourcing is good for all of us'.

November: I travelled to Bonn to give a keynote on 'Libraries and their Communities: Participation from Town Halls to Mobile Phones' for the 2018 SWIB (Semantic Web in Libraries) conference, and gave a preview talk on Living with Machines for the British Library Labs 2018 Symposium.

Publications

An article, 'Breathing life into digital collections at the British Library', for ACCESS / Journal of the Australian School Library Association, 2018.

A chapter for a Routledge publication on research methods in the Digital Humanities, called 'Crowdsourcing in cultural heritage: a practical guide to designing and running successful projects' (in process).

Other

I was a peer reviewer for conference proposals and articles for museum studies and digital humanities events and journals.

I also gave internal talks on IIIF and the Universal Viewer and taught Data Visualisation and Crowdsourcing workshops on the British Library's Digital Scholarship Training Programme.

I wrote a number of blog posts, newsletters and press releases for work, and have collected some of those blog posts and newsletter updates at Updates from Digital Scholarship at the British Library.

Blog posts: 'Notes from “AI, Society & the Media: How can we Flourish in the Age of AI”' and 'Cross-post: Seeking researchers to work on an ambitious data science and digital humanities project'.

2017: an overview

This page is a work in progress…

2017 was an unexpectedly challenging year, as much of it was taken up with treatment for cancer. (I'm fine now.)

In February 2017 I did a workshop in Edinburgh for Dr Anouk Lang's Beyond the Black Box: Building Algorithmic and Statistical Literacy through Digital Humanities Tools and Resources and in Santa Barbara for Always Already Computational: Library Collections as Data. I keynoted at DIGIKULT 2017 in Sweden in March, and in June I was in Sydney for the Future Library Congress at EduTECH. I was in Taiwan in August and in October I spoke at the German Historical Institute in Washington, DC and gave a keynote on crowdsourcing in Angers, France.

Position paper: From libraries as patchwork to datasets as assemblages?


My position paper for Always Already Computational: Collections as Data. Every attendee wrote one – read the others at Collections as Data – National Forum Position Statements.

From libraries as patchwork to datasets as assemblages?

Dr Mia Ridge, Digital Curator, British Library

The British Library's collections are vast, and vastly varied, with 180-200 million items in most known languages. Within that, there are important, growing collections of manuscript and sound archives, printed materials and websites, each with its own collecting history and cataloguing practices. Perhaps 1-2% of these collections have been digitised, a process spanning many years and many distinct digitisation projects, and resulting in a patchwork of imaging and cataloguing standards and licences. This paper represents my own perspective on the challenges of providing access to these collections and others I've worked with over the years.

Many of the challenges relate to the volume and variety of the collections. The BL is working to rationalise the patchwork of legacy metadata systems into a smaller number of strategic systems.[1] Other projects are ingesting masses of previously digitised items into a central system, from which they can be displayed in IIIF-compatible players.[2]

The BL has had an 'open metadata' strategy since 2010, and published a significant collection of metadata, the British National Bibliography, as linked open data in 2011.[3] Some digitised items have been posted to Wikimedia Commons,[4] and individual items can be downloaded from the new IIIF player (where rights statements allow). The BL launched a data portal, https://data.bl.uk/, in 2016. It's a work in progress (many more collections are still to be loaded, and the descriptions and site navigation could be improved), but it represents a significant milestone many years in the making. The BL has particularly benefitted from the work of the BL Labs team in finding digitised collections and undertaking the paperwork required to make them freely available. The BL Labs Awards have helped gather examples of creative, scholarly and entrepreneurial re-use of digitised collections, and BL Labs Competitions have led to individual case studies in digital scholarship while helping the BL understand the needs of potential users.[5] Most recently, the BL has been working with the BBC's Research and Education Space project,[6] adding linked open data descriptions about articles to its website so they can be indexed and shared by the RES project.

In various guises, the BL has spent centuries optimising the process of delivering collection items on request to the reading room. Digitisation projects are challenging for systems designed around the 'deliverable item': the digital user may wish to access or annotate a specific region of a page of a particular item, but the manuscript itself may be catalogued (and therefore addressable) only at the archive box or bound volume level. Offsite research with digitised collections also lacks the visibility of research with items in the reading rooms. Staff often respond better to discussions of the transformational effect of digital scholarship in terms of scale (e.g. it's faster and easier to access resources) than to discussions of newer methods like distant reading and data science.

The challenges the BL faces are not unique. The cultural heritage technology community has been discussing the issues around publishing open cultural data for years,[7] in part because making collections usable as 'data' requires cooperation, resources and knowledge from many departments within an institution. Some tensions are unavoidable in enhancing records for external use. For example, curators may be reluctant, or lack the time, to pin down a 'probable' provenance or date range, let alone guess at the intentions of an earlier cataloguer or learn how to apply modern ontologies in order to assign an external identifier to a person or date field.

While publishing data 'as is' in CSV files exported from a collections management system might have very little overhead, the results may not be easily comprehensible, or may require so much cleaning to remove missing, undocumented or fuzzy values that the resulting dataset barely resembles the original. Publishing data benefits from workflows that allow suitably cleaned or enhanced records to be re-ingested, and from export processes that can regularly update published datasets (allowing errors to be corrected and enhancements shared), but these are all too rare. Dataset documentation may mention the technical protocols required but fail to describe how the collection came to be formed or what was excluded from digitisation or from the publishing process, let alone mention the backlog of items without digital catalogue records, much less digitised images. Finally, users who expect beautifully described datasets with high-quality images may be disappointed when their download contains digitised microfiche images and sparse metadata.

Rendering collections as datasets is helped by an understanding of the intangible and uncertain benefits of releasing collections as data, and of the barriers to uptake, ideally grounded in conversations with, or prototypes for, potential users. Libraries that are not used to thinking of developers as 'users', or that lack the technical understanding to translate this work into benefits for more traditional audiences, may find this challenging. My hope is that events like this will help us deal with these shared challenges.

[1] The British Library, ‘Unlocking The Value: The British Library’s Collection Metadata Strategy 2015 – 2018’.

[2] The International Image Interoperability Framework (IIIF) standard supports interoperability between image repositories. Ridge, ‘There’s a New Viewer for Digitised Items in the British Library’s Collections’.

[3] Deliot et al., ‘The British National Bibliography: Who Uses Our Linked Data?’

[4] https://commons.wikimedia.org/wiki/Commons:British_Library

[5] http://www.bl.uk/projects/british-library-labs, http://labs.bl.uk/Ideas+for+Labs

[6] https://bbcarchdev.github.io/res/

[7] For example, the 'Museum API' wiki page listing machine-readable sources of open cultural data (http://museum-api.pbworks.com/w/page/21933420/Museum%C2%A0APIs) was begun in 2009, following discussion at museum technology events and on mailing lists.

The view from UC Santa Barbara is alright, I suppose