I was invited to present at Speaking in Code, an NEH-funded symposium and summit to 'give voice to what is almost always tacitly expressed in our work: expert knowledge about the intellectual and interpretive dimensions of DH code-craft, and unspoken understandings about the relation of that work to ethics, scholarly method, and humanities theory'. I've been writing about this for a while, so this event was both personally and professionally important.
From my opening slide:
'There's a fundamental tension between available tools and cultural heritage data: we're trying to fit a square peg into a round hole. Do you craft the tools to the data or the data to the tools?
So what do you do with square pegs and round holes? You can chop off the interesting edges to fit something into a round hole, you can reduce the size of the entire peg so it'll slip through, or you can make a new bespoke hole that'll fit your peg. But then how do we make the choices we've made obvious to people who encounter the data we've squeezed through various holes? Making the flattenings and exclusions that shape a dataset visible is particularly important if people are using these collections in scholarly work.
The choices you make will depend on your resources and skills, the audience for and the purpose of the final product… I'll look at some examples of visualisations for exploring collections where I had to tidy the mess to make them work, and an example of designing software to cope with the messy reality it was trying to reflect.
I want to set the scene with my own experiences with cultural heritage data, but am curious to hear about your own experiences with messy data in your respective fields, and the solutions you've explored for dealing with it and conveying your decisions.'