I've been awarded a CENDARI Visiting Research Fellowship at Trinity College Dublin for a project called 'Bridging collections with a participatory Commons: a pilot with World War One archives'. Here's Trinity's page about my Fellowship, which runs until mid-December. I've decided to be brave and share my thoughts and actions throughout the process, so I thought I'd start as I mean to go on and post my proposal (1500 words, below). CENDARI is a 'research infrastructure project aimed at integrating digital archives for the medieval and World War One eras' which 'aims to leverage innovative technologies to provide historians with the tools by which to contextualise, customise and share their research' (source) so this research fellowship very neatly complements my PhD research.
You can contact me by leaving a comment below, or via my contact page. If you'd like to follow my progress, you can sign up for (very infrequent) updates at MailChimp: http://eepurl.com/VUXEL or keep an eye out for posts tagged 'CENDARI Fellowship' on my blog, Open Objects.
Updates so far:
- my first update was 'Defining the scope: week one as a CENDARI Fellow' (September 26);
- my second was 'Linking lived experiences of WWI through battalions?' and the related post 'Private Albert Henry Bailey, service number 13/970a' (October 10).
- I've also posted "'Linking lived experiences of the First World War': possible goals and a bunch of technical questions" (October 15) and the related 'In which I am awed by the generosity of others, and have some worthy goals' (October 17)
- 'Moving forward: modelling and indexing WWI battalions' (October 31)
- I gave a keynote at New Zealand's National Digital Forum conference on 'Collaborative collections through a participatory commons'; the video gives a lot of background to the project
- 'Three ways you can help with 'In their own words: collecting experiences of the First World War' (and a CENDARI project update)' (December 4) summarised a few weeks' work and decisions
- I gave a lecture at Trinity College Dublin on ‘A pilot with public participation in historical research: linking lived experiences of the First World War’
- I set up a wiki to link personal accounts to specific military units: http://collaborativecollections.org/WorldWarOne – this is where most of my work happened in the last month or so of the project
My original post is below – see the links above for more recent updates:
My motives in posting my proposal are partly selfish – it's an ambitious project which requires tackling community building, user experience design, historical materials and programming, and I'll be drawing on the expertise of many people, starting now! Specific questions I'd love help with are:
- do you have any family or local records relating to World War One that you'd like to share through this project?
- do you know of relevant personal records held by museums, libraries or archives that are either already digitised or could be digitised within my timeframe?
- do you have suggestions for specific software applications or code libraries that would be useful for this project?
- can you offer or help negotiate access to official records?
- on a lighter note, who or what should I see, meet or do while I'm in Dublin?
Bridging collections with a participatory Commons: a pilot with World War One archives
I propose a pilot project to test the potential for a ‘participatory Commons’ based on World War One collections. A participatory Commons aggregates collections from memory institutions – archives, libraries, museums – and ‘shoebox archives’ of diaries, letters, photos, etc from the public, and enhances those records with the help of the public and historians. ‘Crowdsourcing’, or asking the public to help with a larger project such as digitising documents by undertaking ‘micro-tasks’ such as transcribing small sections of text, is an increasingly common method for engaging the public in ‘citizen history’. Historians can also contribute by sharing the content or knowledge around the personal record collections they create while conducting archival research. The coming centenary of WWI will create huge levels of public interest, and participatory projects such as this are ideally situated to convert this interest into action, however small.
My project would create a prototype participatory Commons in which a combination of text mining, named entity recognition and crowdsourcing will be used to link official and unofficial WWI archival collections by matching names, dates, places and events in private diaries, letters, photos and ephemera with those recorded in official records.
Specific technical and historical research questions include:
- What customisation does software for entity recognition require to recognise historic personal and place names?
- What are the best interface and interaction design strategies for encouraging members of the public to transcribe, index and describe historic documents?
- What logistical and intellectual issues arise for researchers using a participatory Commons?
- Would outputs like maps, geospatial search and place name indexes help researchers discover content from unexpected sources and from archives in other languages?
- What are the benefits of combining official accounts with the lived experiences of WWI as represented by personal diaries, photos and letters?
This project will use named entity recognition software to extract names, places, dates, concepts and events from digitised text. These entities will be used in searches across multiple publicly available archival datasets (a technique successfully applied with multilingual data in the Serendip-o-matic project). Matching entities in this way can support the authentication of content from unofficial archives and form the basis for visualisations such as timelines and maps. Natural language processing techniques such as term frequency-inverse document frequency (TF-IDF) weighting could then be used to computationally ‘recommend’ further related resources to researchers.
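To make the 'recommend' idea a little more concrete, here is a minimal sketch of how term frequency and inverse document frequency could rank related texts. It is an illustration under stated assumptions rather than the project's implementation: the function names (`tokenize`, `tfidf_vectors`, `cosine`, `recommend`) are hypothetical, the three sample texts are invented, and a real pipeline would need proper tokenisation, stop-word handling and multilingual support.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, weighting tf by log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(docs, query_index, top_n=3):
    """Return indexes of the documents most similar to docs[query_index]."""
    vectors = tfidf_vectors(docs)
    scores = [
        (cosine(vectors[query_index], vec), i)
        for i, vec in enumerate(vectors)
        if i != query_index
    ]
    scores.sort(reverse=True)
    return [i for score, i in scores[:top_n] if score > 0]


# Invented sample texts: a personal diary, an official-style entry, an unrelated item.
docs = [
    "Marched to Albert today, billeted near the church. Shelling at night.",
    "Battalion marched to Albert and went into billets. Enemy shelling after dark.",
    "Harvest festival held in the parish hall, tea was served afterwards.",
]
related = recommend(docs, 0, top_n=1)
```

With these sample texts the official-style entry ranks first for the diary extract, because the two share several terms that the unrelated item lacks.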
By linking named people to military units, it should be possible to link individual diaries, letters and photographs to the movements of those units during the course of WWI on maps and timelines. The exact methods will depend in part on the progress of large-scale projects around the centenary of World War I in producing digitised sources. For example, entities such as dates, places, people and activities within The National Archives’ collection of British Army war diaries 1914-1922 (series WO 95) are being transcribed into structured metadata through a citizen history project, ‘Diaries of the First World War’. Possible matches from this dataset could be generated against people, places and events detected in content from unofficial archives, and presented to users of the Commons for disambiguation and confirmation.
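As a rough sketch of how possible matches might be generated for people to confirm, the example below compares a place and date detected in a personal diary against structured unit diary entries, tolerating spelling variants and small date differences. Everything here is an assumption for illustration: the `UnitEntry` and `Mention` record shapes, the thresholds and the sample data are invented, and the structured war diary metadata may look quite different in practice.

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher


@dataclass(frozen=True)
class UnitEntry:
    """Hypothetical structured record transcribed from an official unit diary."""
    unit: str
    day: date
    place: str


@dataclass(frozen=True)
class Mention:
    """Hypothetical place-and-date mention detected in an unofficial source."""
    source: str
    day: date
    place: str


def place_similarity(a, b):
    """Similarity between two place names, from 0.0 to 1.0, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_matches(mention, entries, max_days=3, min_similarity=0.8):
    """Return entries close to the mention in date and place, best first.

    The results are candidates for a person to confirm or reject,
    not authoritative links.
    """
    candidates = []
    for entry in entries:
        day_gap = abs((entry.day - mention.day).days)
        similarity = place_similarity(mention.place, entry.place)
        if day_gap <= max_days and similarity >= min_similarity:
            candidates.append((similarity, -day_gap, entry))
    # Prefer closer spellings, then closer dates.
    candidates.sort(key=lambda c: (c[0], c[1]), reverse=True)
    return [entry for _, _, entry in candidates]


# Invented sample data, for illustration only.
entries = [
    UnitEntry("Example Battalion A", date(1916, 7, 1), "Armentières"),
    UnitEntry("Example Battalion B", date(1916, 7, 2), "Arras"),
    UnitEntry("Example Battalion A", date(1916, 9, 15), "Armentières"),
]
mention = Mention("hypothetical family diary", date(1916, 7, 2), "Armentieres")
matches = candidate_matches(mention, entries)
```

Here only the first entry survives both filters: the unaccented spelling is close enough to match, while the other entries are either a different place or months away. Keeping the thresholds as parameters would let them be tuned against real transcriptions.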
In addition to the archives within CENDARI institutions, such as Trinity’s World War I and King’s Serving Soldier collections, should additional institutional sources be needed, options include the Liddle Collection at Leeds University, the Private Papers held by the Imperial War Museum, LSE’s Women’s Library and Jisc’s World War One resources. Unofficial sources include community-created collections online such as the WWI Document Archive and the Guardian newspaper’s Witness assignment collecting ‘letters, photographs and stories’ about World War I.
Pre-Fellowship: finalise a list of publicly available digital collections and negotiate access to restricted collections.
Week 1: select up to three sets of diaries or correspondence to use as test cases throughout the project. Selectively transcribe some text while close reading to familiarise myself with the sources and understand how names, places, events and other items of interest are described in personal writing.
Week 2: place images of the documents on an existing crowdsourcing platform like FromThePage, Scripto or the Zooniverse software; begin using social media and specialist contacts to recruit the public to help transcribe content.
Weeks 3-10: iteratively design, develop, release and evaluate prototype interfaces as per Methods listed, using informal agile project management. Continue community engagement work to motivate and learn from the participating public and historians.
Weeks 11-12: write report; post informal summary for participant community. Finalise documentation and code to support maintenance and sustainability.
I would produce traditional, electronic and public outputs during my Fellowship. Throughout the Fellowship I would post about my research questions, methods and reflections on the process on my widely read blog, Open Objects, and share prototype visualisations and interfaces online in order to gather feedback from my peers and potential users. I would share source code and documentation for any software libraries on GitHub.
Final electronic outputs would include a prototype for searching digitised records for named individuals in WWI archives. Entity recognition techniques also enable visualisations such as timelines and maps, so to the extent supported by the data, visualisations based on specific items will be created to situate them within the larger temporal and geographic space of WWI.
Finally, I will write an article for a relevant peer-reviewed journal such as Digital Humanities Quarterly and submit a proposal for the international Digital Humanities conference.
Context of the Research Project in my research to date
My PhD research outlines an approach for uniting collections from memory institutions (museums, libraries, archives) and ‘shoebox’ archives of diaries, letters, photos, ephemera and objects from the public into a shared Commons. In the tradition of ‘history from below’, a participatory Commons platform would enable historians and the interested public to collect, describe, index and transcribe historic material. More broadly, my PhD aims to understand the impact of digitality on humanities scholarship by comparing the practices and attitudes of academic and ‘amateur’ family or local historians regarding evaluating, using and contributing to scholarly crowdsourcing. As the subject of intensive academic, popular and amateur research, the history of WWI is a perfect test for my proposed methodologies and research questions. This project would complement my current research by providing an opportunity to build prototype interfaces that put the theoretical findings of my PhD into practice.
My research showed historians can be sceptical about material from the public. Can text mining and linked data technologies help authenticate these documents by linking them to records held in official archives? And can this be supplemented by capturing the expert judgements about the reliability of historic sources made by experienced historians in a participatory Commons?
This pilot would help answer questions that have resulted from my PhD research, including the impact of interface design on contribution levels and quality, and the interaction design required to guide contributors through the process of imaging, uploading and describing the provenance of items from personal ‘shoebox’ or research collections. Given that historians search for personal names in non-specialist search engines like Google, would search engine optimisation and structured metadata help make name-based content more discoverable? And would this in turn help engage wider audiences and supplement traditional archive directories and research guides?
Relevance to the CENDARI project
Digital access to fragmented archives is at the core of this project, as it is to the CENDARI project. Understanding how people assess the quality of resources and potential barriers to their decision to contribute to and use such repositories is vital if they are to be widely used. This project would help understand the specialist user requirements for research infrastructures, and any design solutions that emerge as aggregated collections are used and evaluated over the length of my Fellowship would be relevant to the CENDARI project. My past experience with and proposed use of named entities, search across heterogeneous datasets and user annotations would also benefit CENDARI.
The process of populating a Commons platform by working with WWI archives and researchers would yield new insights and provide useful lessons for the CENDARI project as well as my own research. An understanding of computational techniques for linking material held in different archival systems, and the testing of methods for including contextual content drawn from multiple collections for specific collection items, would be useful for many heritage platforms.