New data paper and datasets from crowdsourcing on Living with Machines

After lots of hard work by me, Nilo Pedrazzini, Miguel V., Arianna Ciula and Barbara McGillivray, we have a data paper in the Journal of Open Humanities Data: Language of Mechanisation Crowdsourcing Datasets from the Living with Machines Project.

And huge thanks to the thousands of Zooniverse volunteers who annotated 19th-century newspaper articles to create the datasets we've published alongside the data paper!

Abstract: We present the ‘Language of Mechanisation’ datasets with examples of re-use in visualisations and analysis. These reusable CSV files, published on the British Library’s Research Repository, contain automatically-transcribed text from 19th century British newspaper articles. Volunteers on the Zooniverse crowdsourcing platform took part in tasks that asked ‘How did the word x change over time and place?’ They annotated articles with pre-selected meanings (senses) for the words coach, car, trolley and bike.

The datasets can support scholarship on a range of historical and linguistic research areas, including research on crowdsourcing and online volunteering behaviours, data processing and data visualisation methodologies.
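
As a rough illustration of the kind of re-use the abstract mentions, the Python sketch below tallies annotated senses of a word by decade. It isn't taken from the paper: the column names (word, sense, year) and the sample rows are invented for the example, so check the published CSV files for their actual structure before adapting it.

```python
import csv
import io
from collections import Counter

# Invented sample rows for illustration only. The column names
# (word, sense, year) are assumptions, not the published schema.
SAMPLE_CSV = """word,sense,year
car,railway carriage,1852
car,railway carriage,1858
car,motor vehicle,1899
coach,horse-drawn vehicle,1851
"""


def senses_by_decade(csv_file, target_word):
    """Count annotated senses of target_word, grouped by decade."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        if row["word"] == target_word:
            decade = int(row["year"]) // 10 * 10
            counts[(decade, row["sense"])] += 1
    return counts


counts = senses_by_decade(io.StringIO(SAMPLE_CSV), "car")
for (decade, sense), n in sorted(counts.items()):
    print(f"{decade}s  {sense}: {n}")
```

To try it on a real file, you could swap the `io.StringIO(...)` argument for `open("your_file.csv", newline="", encoding="utf-8")` and rename the columns to match.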

The two datasets described are at:

Workshop: Information Visualisation, CHASE Arts and Humanities in the Digital Age 2017

I ran a full-day workshop on Information Visualisation for the CHASE Arts and Humanities in the Digital Age training programme at Birkbeck, London, in February 2017. The abstract:

Visualising data to understand it or convince others of an argument contained within it has a long history. Advances in computer technology have revolutionised the process of data visualisation, enabling scholars to ask increasingly complex research questions by analysing large-scale datasets with freely available tools.

This workshop will give you an overview of a variety of techniques and tools available for data visualisation and analysis in the arts and humanities. The workshop is designed to help participants plan visualisations by discussing data formats used for the building blocks of visualisation, such as charts, maps, and timelines. It includes discussion of best practice in visual design for data visualisations and practical, hands-on activities in which attendees learn how to use online tools such as Viewshare to create visualisations.

At the end of this course, attendees will be able to:

  • Create a simple data visualisation
  • Critique visualisations in terms of choice of visualisation type and tool, suitability for their audience and goals, and other aspects of design
  • Recognise and discuss how data sets and visualisation techniques can aid researchers

Please remember to bring your laptop.

Slides

Exercises for CHASE's ADHA 2017 Introduction to Information Visualisation

  • Exercise 1: comparing n-gram tools
  • Exercise 2: Try entity extraction
  • Exercise 3: exploring scholarly data visualisations
  • Viewshare Exercise 1: Ten minute tutorial – getting started
  • Viewshare Exercise 2: Create new views and widgets

Exercises for CHASE's Introduction to Information Visualisation

These exercises were prepared for the CHASE Arts and Humanities in the Digital Age event's workshop on Information Visualisation but they're also useful for people who want to learn more about data visualisations in cultural heritage and the humanities.

Exercise 1: compare simple text tools

Time: c. 5 minutes.

Goal: compare the ability of two different tools to help you understand a new text corpus

1. Load the word cloud site (Wordle)

2. Then, grab some text:

  • Open another browser tab
  • Go to http://pastebin.com/Nd0a86tm
  • Select and copy the 8 lines of text. The easiest way is to click into the box under 'RAW Paste Data'
  • Paste them into the text box on the Wordle site and hit 'go'
  • You can customise your visualisation using the menu. Which options create a more informative visualisation?

3. Load the word tree site

  • Go to http://www.jasondavies.com/wordtree/
  • Paste the text into the 'Paste Text' box and hit 'Generate WordTree!' (Grab the text again from Step 2 if necessary)
  • You can click on words on the screen – which words produce the most options?

4. Discuss

Bearing in mind that this is an unusual corpus, which tool gave you a better sense of its content? Why?

Are these tools better for exploring or explaining data? Why?

If tidying up the data provided – removing punctuation, making spelling consistent, etc – would improve the visualisation, then try editing the text and re-running the visualisation. Did it help? What else could you do?
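
If you'd rather tidy the text with a script than by hand, here's a minimal Python sketch of that step. It's an illustration rather than how either site works internally: the function names and the sample sentence are made up, and it only approximates what each tool shows by counting word frequencies (the basis of a word cloud) and the words that follow a chosen word (the first branches of a word tree).

```python
import re
from collections import Counter


def tidy(text):
    """Lower-case the text, replace punctuation with spaces, collapse whitespace."""
    text = text.lower()
    # Note: this also removes apostrophes, so "don't" becomes "don t".
    text = re.sub(r"[^\w\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def word_counts(text):
    """Count how often each word appears in the tidied text."""
    return Counter(tidy(text).split())


def following_words(text, root):
    """Count the words that come directly after root in the tidied text."""
    words = tidy(text).split()
    return Counter(nxt for cur, nxt in zip(words, words[1:]) if cur == root)


# Made-up sample sentence for illustration.
sample = "The Coach arrived; the coach, late again, left. THE COACH!"
print(word_counts(sample).most_common(3))
print(following_words(sample, "coach"))
```

Comparing the counts before and after tidying is a quick way to see whether inconsistent capitalisation or punctuation was splitting one word into several.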

Exercise 2: exploring scholarly data visualisations

Time: c. 10-15 minutes.

Goal: get hands-on experience and practice critical analysis.

Pair up with your neighbour to explore and discuss one of the visualisations listed below.

Instructions

  1. In your browser, go to one of the sites below
  2. Take a few minutes to explore the visualisation
  3. Then discuss with your neighbour:
    • What do you think is being presented here?
    • Can you easily see where to start and how to use it?
    • What stories or trends can you start to see?
    • Does it work better at one scale over another?
    • Do you find it more effective at aggregate or detail level?
    • Does it present an argument or provide a space for you to explore and develop one?
    • What arguments (statements about the data) does the site present?
    • What have you learned from the visualisation that you might not have learned from looking at the data or reading a description of it?
  4. Be prepared to report back to the group, e.g. summarise the site's purpose, visualisation formats and data types, or share unresolved questions or the most interesting parts of your discussion.

 

University of Richmond, 'Visualizing Emancipation'

http://www.americanpast.org/emancipation/

Further information: http://dirt.terrypbrock.com/2012/04/visualizing-emancipation-examining-its-process-through-digital-tools/

Stanford 'Mapping the Republic of Letters'

http://www.stanford.edu/group/toolingup/rplviz/rplviz.swf

Further information: http://openglam.org/2012/03/21/mapping-the-republic-of-letters/, http://danbri.org/words/2010/11/22/603

Locating London's Past

http://www.locatinglondon.org/

GAPVis Ancient Places

http://gap.alexandriaarchive.org/gapvis/index.html#index

Further information: http://googleancientplaces.wordpress.com/

Digital Harlem :: Everyday Life 1915-1930

http://digitalharlem.org/

Further information: http://digitalharlemblog.wordpress.com/ and http://writinghistory.trincoll.edu/evidence/robertson-2012-spring/

Digital Public Library of America's timeline, map, bookshelf

http://dp.la/

Further information: https://dp.la/about and http://dp.la/info/news/blog/

Orbis

http://orbis.stanford.edu/

Further information: http://hestia.open.ac.uk/updating-orbis/

Lost Change

http://tracemedia.co.uk/lostchange/

Further information: http://blog.britishmuseum.org/2014/02/19/lost-change-mapping-coins-from-the-portable-antiquities-scheme/

The State of the Union in Context

http://benschmidt.org/poli/2015-SOTU

Further exercises

Learn more: explore and analyse more visualisations

Sketch out ideas for a visualisation

  • Work out what data you need and the best way to prepare and present it. http://www.dear-data.com has some lovely examples of creative sketches.

Create your own visualisations

These sites can be used with your own or public data:

If you have sensitive data, you must check whether any data you load will be made public.

Workshop: Information Visualisation, CHASE Arts and Humanities in the Digital Age

I've been asked to give a workshop on Information Visualisation for the CHASE Arts and Humanities in the Digital Age training programme in June 2015.

The workshop will introduce students to the use of visualisations for understanding, analysing and presenting large-scale datasets in the Humanities, enabling scholars to ask increasingly complex research questions.

Slides, sample data and instructions for exercises are downloadable here: CHASE InfoVis Handouts 2015.

Links for the various exercises are collected below for ease of access.

Exercise 1: Exploring network visualisations

Exercise 2: Comparing N-gram tools

Books

Newspapers

Exercise 3: Trying entity recognition

Exercise 4: Exploring scholarly data visualisations

Exercise 5: Creating a chart using Google Fusion Tables

Google Fusion Tables: https://www.google.com/fusiontables/data?dsrcid=implicit

An Excel version of this exercise is available at https://www.openobjects.org.uk/2015/03/creating-simple-graphs-with-excels-pivot-tables-and-tates-artist-data/

Exercise 6: Geocoding data and creating a map using Google Fusion Tables

Google Fusion Tables: https://www.google.com/fusiontables/data?dsrcid=implicit

Exercise 7: Applying data visualisation to your own work

Explore more visualisations:

Sketch ideas for visualisations:

Try visualising data in different tools:

Try visualising existing data