Chapter: 'The contributions of family and local historians to British history online'

Participatory Heritage, edited by Henriette Roued-Cunliffe and Andrea Copeland, has just been published by Facet.

A pre-print is online at https://hcommons.org/deposits/item/hc:38017

My chapter is 'The contributions of family and local historians to British history online'. My abstract:

Community history projects across Britain have collected and created images, indexes and transcriptions of historical documents ranging from newspaper articles and photographs, to wills and biographical records. Based on analysis of community- and institutionally-led participatory history sites, and interviews with family and local historians, this chapter discusses common models for projects in which community historians cooperated to create digital resources. For decades, family and local historians have organised or contributed to projects to collect, digitise and publish historical sources about British history. What drives amateur historians to voluntarily spend their time digitising cultural heritage? How do they cooperatively or collaboratively create resources? And what challenges do they face?

My opening page:

In 1987, the Family History Department of the Church of Jesus Christ of Latter-day Saints began a project with the British Genealogical Records Users Committee to transcribe and index the 1881 British census. Some community history societies were already creating indexes for the 1851 census, so they were well placed to take on another census project. Several tons of photocopies were distributed to almost 100 family history societies for double transcription and checking; later, a multi-million-dollar mainframe computer created indexes from the results (Young, 1996, 1998a; Tice, 1990). This ‘co-operative indexing’ took eight years – the process of assigning parts for transcription alone occupied 43 months – and while the project was very well received, in 1998 it was concluded that ‘a national project of this scope has proved too labour intensive, time consuming and expensive’ to be repeated (Young, 1998b). However, many years later, the US 1940 census was indexed in just four months by over 160,000 volunteers (1940 US Census Community Project, 2012), and co-operative historical projects flourish.

This example illustrates the long history of co-operative transcription and indexing projects, the significant contribution they made to the work of other historians and the vital role of community history organizations and volunteers in participatory heritage projects. The difference between the reach and efficiency of projects initiated in the 1980s and the 2010s also highlights the role of networked technologies in enabling wider participation in cooperative digitization projects. This chapter examines the important contributions of community historians to participatory heritage, discussing how family and local historians have voluntarily organized or contributed to projects to collect, digitize and publish historical sources about British history. This insight into grassroots projects may be useful for staff in cultural heritage institutions who encounter or seek to work with community historians.

The questions addressed in this chapter are drawn from research which sought to understand the impact of participatory digital history projects on users. This research involved reviewing a corpus of over 400 digital history projects, analysing those that aimed to collect, create or enhance records about historical materials. The corpus included both community- and institutionally led participatory history sites. Points of analysis included ‘microcopy’ (small pieces of text such as slogans, instructions and navigation) and the visible affordances, or website interface features, that encourage, allow or disable various participatory functions.

Bio

Mia Ridge is a Digital Curator in the British Library’s Digital Scholarship team. She has a PhD in digital humanities (2015, Department of History, Open University) entitled Making Digital History: the impact of digitality on public participation and scholarly practices in historical research. Previously, she conducted human-computer interaction-based research on crowdsourcing in cultural heritage.

ISBN: 9781783301232

Keynote: 'From Strings to Things', LODLAM Melbourne workshop

Culture Victoria's Eleanor Whitworth (@elewhitworth) and Museum Victoria's Ely Wallis (@elyw) organised a LODLAM workshop at Melbourne Museum on April 17, 2012.  There's now an event report on the Culture Victoria blog, Linked Open Data – Melbourne Workshop.

I was asked to introduce the basics of linked open data, describe some relevant work in the international museums, libraries and archives sector and include examples of the types of data held by memory organisations.  These are my talk notes, though I probably improvised a bit (i.e. rambled) on the day.

From strings to things

Linked Open Data in Libraries, Archives and Museums workshop, Victorian Cultural Network, Melbourne Museum, April 2012

Event tag: #lodlam

Mia Ridge @mia_out

Introduction

Hello, and thank you for having me.  It’s nice to be back where my museum career started at the end of the 90s.

I’ll try to keep this brief and relatively un-technical.  Today is about what linked open data (LOD) can do for your organisations and your audiences.  We’re focusing on practical applications, pragmatic solutions and quick wins rather than detail and three-letter acronyms.  If at any point today people drift into jargon (technical or otherwise), please yell out and ask for a quick definition.  If you want to find out more, there’s lots of information online and I’ve provided a starter reading list at the end.

Why do we need LOD? (Or ‘James Cook’ = explorer guy?)

Computers are dumb.  Well, they’re not as smart as us, anyway.  Computers think in strings (and numbers) where people think in ‘things’.  If I say ‘Captain Cook’, we all know I’m talking about a person, and that it’s probably the same person as ‘James Cook’.  The name may immediately evoke dates, concepts around voyages and sailing, exploration or exploitation, locations in both England and Australia… but a computer knows none of that context and by default can only search for the string of characters you’ve given it.  It also doesn’t have any idea that ‘Captain Cook’ and ‘James Cook’ might be the same person because the words, when treated as a string of characters, are completely different.  But by providing a link, like the DBpedia link http://dbpedia.org/page/James_Cook that unambiguously identifies ‘James Cook’, a computer can ‘understand’ any reference to Captain Cook that also uses that link.  (DBpedia is a version of Wikipedia that contains data structured so that computers know what kind of ‘thing’ a given page is about.)

So in short, linked open data is a way of providing information in formats that computers can understand so that they can help us make better connections between concepts and things.
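To make that concrete, here’s a minimal sketch (my illustration, not part of the original talk) using Python’s rdflib library: a hypothetical collection record keeps its human-readable label (the string) but also asserts a link to the shared DBpedia identifier (the thing). The record URI and the choice of properties are assumptions made for the sake of the example.

```python
from rdflib import Graph, URIRef, Literal
from rdflib.namespace import RDFS, OWL

g = Graph()

# A hypothetical record in your own collection, published at a stable URI...
record = URIRef("http://example.org/collection/people/captain-cook")

# ...with a human-readable label (the 'string')...
g.add((record, RDFS.label, Literal("Captain Cook")))

# ...and an unambiguous link to the shared identifier for the 'thing'
# (DBpedia's resource URI for James Cook).
g.add((record, OWL.sameAs, URIRef("http://dbpedia.org/resource/James_Cook")))

# Serialise as Turtle so other systems (and people) can see the link.
print(g.serialize(format="turtle"))
```

Once the record carries that link, anyone else who uses the same URI is, by definition, talking about the same person.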

The shiny version for visual people

[Europeana video: http://vimeo.com/36752317]

This video was produced by Europeana to help explain LOD to their content providers and gives a nice visual overview of some of the relevant technical and licensing concepts.

What is Linked Open Data?

In the words of LODLAM’s Jon Voss, “Linked Open Data refers to data or metadata made freely available on the World Wide Web with a standard markup format”[1].

LOD is a combination of technical and licensing/legal requirements.

‘Linked’ refers to ‘a set of best practices for publishing and connecting structured data on the Web’[2].  It’s about publishing your content at a proper permanent web address and linking to pre-existing terms to define your concepts.  Publishing using standard technologies makes your data technically interoperable. When you use the same links for concepts as other people, your data starts to be semantically interoperable.

‘Open’ means data that is ‘freely available for reuse in practical formats with no licensing requirements’[3]. More on that in a moment!

If you can’t do open data (the full content, or ‘digital surrogates’ like photographs or texts) then at least open up the metadata (data about the content). The distinction between data and metadata is useful to discuss early with other museum staff, as it’s easy to end up talking at cross-purposes.

But beyond those definitions, linked open data is about enabling connections and collaboration through interoperability.  Interoperable licences, semantic connections and re-usable data all have a part to play.

5 stars

In 2010 Tim Berners-Lee proposed a 5 star system to help people understand what they need to do to get linked data[4].

Some of these are quick wins – you’re probably already doing them, or almost doing them.  Things get tricky around the 4th star when you move from having data on the web to being part of the web of data[5].

To take an example… rather than just hoping that a search engine will pick up on the string ‘James Cook’ and return the right record to the user typing that in, you can link to a URI that tells a search engine that you’re talking about Captain James Cook the explorer, not James Cook the wedding photographer, the footballer or the teen drama character.  Getting to this point means being able to match terms in your databases to terms in other vocabularies, or creating links that others can point to for terms unique to your project, but it means you’ve moved from string to thing.

Now you’ve got that fourth star, you can start to pull links back into your dataset.  Because you’ve said you’re talking about a specific type of thing – a person identified by a specific URI – you can avoid accidentally pulling in references to the TV soap character or things held at James Cook University, or about cooks called James.
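As a rough illustration of ‘pulling links back’ (again my own sketch, not part of the original slides): once your record points at the DBpedia URI for James Cook, you can query DBpedia’s public SPARQL endpoint for statements about that URI. This assumes the SPARQLWrapper Python library and a reachable endpoint; the query itself is deliberately generic.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# DBpedia's public SPARQL endpoint.
sparql = SPARQLWrapper("http://dbpedia.org/sparql")

# Ask for statements about the James Cook resource: birth place,
# associated ships and voyages, and so on, depending on the data held.
sparql.setQuery("""
    SELECT ?property ?value WHERE {
        <http://dbpedia.org/resource/James_Cook> ?property ?value .
    } LIMIT 25
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["property"]["value"], "->", binding["value"]["value"])
```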

From string to thing

So James Cook has gone from an unknown string to a person embedded in a web of connections through his birth place and date, his associations with ships, places, people, objects and topics… Through linked open data we could bring together objects from museums across the UK, Australia and New Zealand, and present multiple voices about the impact and meaning of his life and actions.

What is LODLAM?

While there had been a lot of work around the world on open data and the semantic web in various overlapping GLAM, academic and media circles in previous years, the 2011 LODLAM Summit brought a lot of those people together for the first time.  100 international attendees met in San Francisco in June 2011.  The event was organised by Jon Voss (@jonvoss) with Kris Carpenter Negulescu of the Internet Archive, and sponsored by the Alfred P. Sloan Foundation, the National Endowment for the Humanities and the Internet Archive.

Since then, there have been a series of events and meetups around the world, often tying in with existing events like the ‘linking museums’ meetups[6] I used to run in London, a meeting at the National Digital Forum in New Zealand or the recent meetup at the Digital Humanities Australasia conference in Canberra.  There’s also an active Twitter hashtag (#lodlam) and a friendly, low-traffic mailing list: http://groups.google.com/group/lod-lam.

4 stars

One result of the summit was a 4 star scheme for 'openness' in GLAM data… I’m not going to go into details, but this scheme was produced by considering the typical requirements of memory institutions against the requirements of linked open data.  The original post and comments are worth a read for more information.  It links ‘openness’ to ‘usefulness’ and says: “the more stars the more open and easier the metadata is to use in a linked data context[7]”.

Note that even 1 star data must allow commercial use.  Again, if you can’t licence your content openly, you might be able to licence your metadata, or licence a subset of your data (less commercially valuable images, for example).

So, enough background, now a quick look at some examples…

Example projects

The national libraries of Sweden, Hungary, Germany, France, the Library of Congress, and the British Library have committed resources to linked data projects[8]. Through Europeana, the Amsterdam Museum has published “more than 5 million RDF triplets (or "facts") describing over 70,000 cultural heritage objects related to the city of Amsterdam. Links are provided to the Dutch Art and Architecture Thesaurus (AATNed), Getty's Union List of Artists Names (ULAN), Geonames and DBPedia, enriching the Amsterdam dataset”[9].

BBC

The BBC has been using semantic web/linked data techniques for a number of years[10] and apparently devotes 20% of its entire digital budget to activities underpinned by semantic web technologies[11].

Online BBC content is traditionally organised by programme (or ‘brand’), but they wanted to help people find other content on the sites related to their interests, whether gardening, cooking or Top Gear presenters.  For the BBC Music site, they decided to use the web as their content management system, having their editors contribute to Wikipedia and the music database MusicBrainz, then pulling that content back onto the BBC site.

The example shown here is the BBC Wildlife Finder, which provides a web identifier for every species, habitat and adaptation the BBC is interested in. Data is aggregated from different sources, “including Wikipedia, the WWF’s Wildfinder, the IUCN’s Red List of Threatened Species, the Zoological Society of London’s EDGE of Existence programme, and the Animal Diversity Web. BBC Wildlife Finder repurposes that data and puts it in a BBC context, linking out to programme clips extracted from the BBC's Natural History Unit archive.”

They’re also using a controlled vocabulary of concepts and entities linked to DBpedia, providing a common point of reference across their sites.

British Museum

The British Museum launched a linked data service[12] in 2011.

In a blog post[13] they pointed out that their current interfaces can’t meet the needs of all possible audiences, but their linked data service “allows external IT developers to create their own applications that satisfy particular requirements, and these can be built into other websites and use the Museum’s data in real time – so it never goes out of date. … If more organisations release data using the same open standards then more effort can go into creative and innovative uses for it rather than into laborious data collection and cleaning.”
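By way of illustration (my own sketch, not from the Museum’s documentation), an external developer might fetch results from such a service over plain HTTP using the standard SPARQL protocol. The endpoint path and the query below are assumptions for illustration; check the service’s own documentation for the real endpoint and its data model.

```python
import requests

# Assumed SPARQL endpoint path for the collection service (check the documentation).
ENDPOINT = "http://collection.britishmuseum.org/sparql"

# A deliberately generic query: fetch a handful of statements to see what's there.
query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"

response = requests.get(
    ENDPOINT,
    params={"query": query},
    headers={"Accept": "application/sparql-results+json"},
)
response.raise_for_status()

for row in response.json()["results"]["bindings"]:
    print(row["s"]["value"], row["p"]["value"], row["o"]["value"])
```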

Civil War 150

Combining different datasets about the American Civil War, this site, Hidden Patterns of the Civil War[14], collects visualisations that allow you to explore maps of slave markets and emancipation, or to compare the language of Civil War era town newspapers on different sides of the issue.  This is possible because standalone information and images can be connected and combined in entirely new ways. As one writer said, “Just the ability to search across historical collections is a radical development, as search engines typically aren't able to crawl databases. Part of what linked data does is expose metadata that's been pretty much hidden up until now”[15].

Key issues

There are lots of things to consider before publishing linked open data, but by putting them upfront it’s easier to identify which are real show-stoppers and which are just design issues.  Concerns include loss of provenance, context, attribution, income…

GLAM data is messy – it’s incomplete, inconsistent, inaccurate, out of date – sometimes it’s almost non-existent.  The backlog of data to be tidied up ‘one day’ is huge, so let’s find ways of sharing data before the data is perfect.

GLAMs should be in this space to help shape tools and datasets to their needs… The LODLAM 4 star system includes attribution because the people there knew it was important.  Sooner or later someone working on historical place names discovers that Geonames doesn’t handle changes over time, and we’re still working on a solution for that.  Your average programmer may not realise that dates may need to stretch back several millennia or record precise BC dates, let alone all the issues we face around fuzzy and uncertain dates and date ranges…

Over to you…

Consider these suggestions according to whether technical or licensing issues are easier for you to tackle…

Give people something to link to – publish open data, ideally with an open licence. Linkable data is as important as linked data – lots of datasets are being produced, but the power of this will really be felt when they can be connected to other data sets.

Drink your own champagne – use linked data to solve internal problems.  Even if you can’t share the data, you can share what you’ve learned in the process of using linked data.  Using data services internally is the best way to ensure their sustainability and usability.

Thank you!

Find out more via the reading list at http://bit.ly/lodlamlinks and share your thoughts.

[1] http://www.museumsandtheweb.com/mw2012/papers/radically_open_cultural_heritage_data_on_the_w

[2] http://linkeddata.org/faq

[3] http://berglondon.com/blog/2010/10/01/open-data-for-the-arts-human-scale-data-and-synecdoche/

[4] http://inkdroid.org/journal/2010/06/04/the-5-stars-of-open-linked-data/

[5] http://lab.linkeddata.deri.ie/2010/star-scheme-by-example/

[6] http://museum-api.pbworks.com/w/page/26458584/July%202010%20meetup

[7] http://lod-lam.net/summit/2011/06/06/proposed-a-4-star-classification-scheme-for-linked-open-cultural-metadata/

[8] http://www.clir.org/pubs/reports/pub152/Stanford%20Linked%20Data%20Workshop%20Report%20FINAL%20111024.htm

[9] http://www.europeana.eu/portal/thoughtlab_linkedopendata.html

[10] http://www.w3.org/2001/sw/sweo/public/UseCases/BBC/

[11] http://archiveshub.ac.uk/linkinglives/?p=256

[12] http://collection.britishmuseum.org/

[13] http://blog.britishmuseum.org/2011/09/16/the-british-museum-has-created-a-semantic-web-endpoint/

[14] http://dsl.richmond.edu/civilwar/

[15] http://radar.oreilly.com/2011/04/linked-data-civil-war.html

2010: an overview

An incomplete, retrospective list of work, talks and more in 2010…

In April I gave a talk and wrote a long paper, Cosmic Collections: Creating a Big Bang, at Museums and the Web in Denver (then got stuck in the US while the Icelandic volcano dust kept flights to Europe grounded). My abstract: "'Cosmic collections' was a Web site mashup competition held by the Science Museum in late 2009 to encourage members of the public to create new interfaces with newly accessible collections data prepared for the Cosmos & Culture exhibition. The paper reports on the lessons learned during the process of developing and running the competition, including the organisational challenges and technical context. It discusses how to create room for experimentation within institutional boundaries, the tools available to organise and publicise such an event on a limited budget, the process of designing a competition, and the impact of the competition. It also investigates the demand for museum APIs."

In June 2010 I went to Science Hack Day at the Guardian and worked on 'The Revolutionaries' with Premasagar Rose, Ian Wooten, Tom Morris, Inayaili de León, Andy McMillan and Richard Boulton – and it won a prize for the hack most useful in education! Prem wrote a blog post about it: Science Hack Day and The Revolutionaries.

In July I organised a meetup about 'Linking museums: machine-readable data in cultural heritage'.

In September I gave a talk at OpenTech 2010 on 'Museums meet the 21st century'.

I wrote a chapter called 'All change please: your museum and audiences online' for the book Museums Forward: social media and the web, edited by Gregory Chamberlain.

I created a cartoon character called Dora.

In late 2010 I was madly working on my MSc dissertation on crowdsourcing games for museums, which included a lot of research, design and code: metadata crowdsourcing games for museums.

Linking Museums meetup

Somehow I ended up organising a meetup about 'Linking museums: machine-readable data in cultural heritage'.  I've written about it for the UK MCG blog and there's a write-up of 'linking museums' from various people on the 'Museums and the machine-processable web' wiki.  If you're interested in 'helping museums make content re-usable; helping programmers access museum content', the wiki is a good place to join in.

I've also shared some thoughts on publishing re-usable object data and subject authorities from the Science Museum on the wiki.

2009: an overview

An incomplete, retrospective list of work, talks and more in 2009…

February – I did a talk: "Happy developers + happy museums = happy punters" at JISC's Dev8D; I blogged a transcript.

At some point in early 2009 I started the Museum API wiki, which still exists at http://museum-api.pbworks.com.

In April I was inspired by the Museums and the Web international conference to set up 'the MW2009 challenge' – 'take something from all the conversation at Museums and the Web 2009 and do something with them. So – pick one task.  To keep the momentum going, you should do it while it's still April 2009.'

In June I gave a talk Bubbles and Easter eggs – Museum Pecha Kucha at the British Museum in London. In July I repeated Bubbles, icebergs and Easter eggs at the Melbourne Museum Pecha Kucha.

September – I had an article published on the Museum-iD website, Learning lessons from a decade of museum websites. It was based on a paper I gave at the Museum-iD seminar on “Museum as Media Company: Social Media, Broadcasting & The Web” about ‘the role of the web at the Science Museum’.

November – the 'Cosmic Collections' crowdsourced web mashup competition I ran got some press on two web developer sites! Yahoo Developer Network: A new API and hack competition – this time not from a tech company but by a museum! Programmable Web: Science Museum Opens API and Challenges Developers to Mashup the Cosmos

December – I was invited to Oslo to give a lecture on Social Media in the ABM (MLA) Sector: Opportunities and Challenges and curated a session at the UK Museums on the Web conference on ‘Sensory’. I also spoke with Elizabeth Lomas and Benjamin Ellis on Continued Communication: maximising your communications in a Web 2.0 world at the Online Information 2009 conference. The paper I wrote with Elizabeth and Ben is online at Continued communication – maximising the business potential of communications through Web 2.0.