'Introduction to Digital Humanities' was a new postgraduate course run by Dr Suzanne Paylor at Birkbeck College that examined the impact of technology on humanities research practice. It combined aspects of media studies, humanities computing and literary studies to foster an appreciation of the core methods and practical, political/philosophical and pedagogical issues in digital humanities.
I wrote and taught the following four classes in the Spring 2007 and Winter 2008 terms.
Class: Creating Digital Resources II: database design for the digital humanities
"Introduction to Digital Humanities", Birkbeck, Spring/Summer Term 2007, May 29, 2007 and December 2008
Class: New Working Models
"Introduction to Digital Humanities", Birkbeck, Spring/Summer Term 2007, May 15, 2007 and November 2008
Class: Creating Digital Resources
"Introduction to Digital Humanities", Birkbeck, Spring/Summer Term 2007, May 1, 2007 and November 2008
Class: Introduction to Databases
"Introduction to Digital Humanities", Birkbeck, Spring/Summer Term 2007, February 27, 2007 and October 2008
Presentation at Computer Applications and Quantitative Methods in Archaeology UK conference, January 24 – 26, 2007, Tudor Merchants Hall, Southampton
Buzzword or benefit? The possibilities of Web 2.0 for the cultural heritage sector
Mia Ridge, Museum of London
Computer Applications and Quantitative methods in Archaeology UK Chapter Meeting, January 2007
Buzzword or benefit?
Yes, 'Web 2.0' is a buzzword, eagerly seized on by marketers and venture capitalists; but the technologies, methodologies and development philosophies it describes are of great potential benefit.
While "the participatory Web" might more accurately describe the benefits for cultural heritage organisations, 'Web 2.0' is a useful shorthand or umbrella term for a set of related ideas about how we develop for and use the web.
What is Web 2.0?
Wikipedia, a free encyclopedia written by volunteers and itself a Web 2.0 site, defines Web 2.0 as:
"Web 2.0, a phrase coined by O'Reilly Media in 2004, refers to a perceived or proposed second generation of Internet-based services—such as social networking sites, wikis, communication tools, and folksonomies—that emphasize online collaboration and sharing among users."
Source: http://en.wikipedia.org/wiki/Web_2, January 2007
But what does that really mean?
The following characteristics of Web 2.0 sites may be particularly relevant for archaeology:
Services (applications) should get better the more people use them
Users add value through implicit and explicit content generation
Design for a seamless experience on PCs, handheld and mobile devices
Release web applications and services early and often
Take advantage of the Long Tail
Unique, hard to recreate data is a competitive advantage
Build for and with an 'architecture of participation'.
O'Reilly presents the following examples for 'Web 1.0' and 'Web 2.0':
(Table of paired 'Web 1.0' and 'Web 2.0' examples; entries include upcoming.org and EVDB, domain name speculation, search engine optimization, cost per click, and content management systems.)
It may be almost provocative to present the above table, particularly as the reader may not agree with the labelling of all the examples.
'Web 2.0' sites include those that profit from an architecture of participation and user-generated content such as Amazon; those that use tagging or 'social bookmarking' to label content like photos or URLs such as Flickr, YouTube, last.fm, Del.icio.us or Digg; sites that use technologies such as syndication (particularly RSS) – for example, using a site such as Bloglines to track new additions to the Portable Antiquities Scheme's 'Finds of Note' service; podcasts and blogs; social network sites such as Myspace and Facebook; and sites that use application programming interfaces and web services such as Google maps mashups. Even pre-Web 2.0 sites that require audience participation like AmIHotorNot or Kitten War (where you are presented with images of two kittens and you 'vote' by clicking on the cuter photo) created the right environment for the participatory web to thrive. Radical trust, as seen in Wikipedia, is another Web 2.0-ish idea.
Concepts discussed in this paper include folksonomies and 'the long tail'. Folksonomies are defined as 'collaborative categorization schemes' or 'user generated taxonomies' (http://en.wikipedia.org/wiki/Folksonomy).
The theory of the Long Tail states "products that are in low demand or have low sales volume can collectively make up a market share that rivals or exceeds the relatively few current bestsellers and blockbusters" (http://en.wikipedia.org/wiki/Long_tail). This has implications for organisations looking at the development and usage of online collections.
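As a rough illustration of the Long Tail in an online collection, the sketch below compares the views received by a handful of very popular objects with the combined views of everything else. The view counts and the function name are invented for illustration and are not taken from any real collection.

```python
def tail_share(view_counts, head_size):
    """Return the fraction of all views that fall outside the
    `head_size` most-viewed items (the 'long tail')."""
    ranked = sorted(view_counts, reverse=True)
    total = sum(ranked)
    if total == 0:
        return 0.0
    return sum(ranked[head_size:]) / total


# Hypothetical collection: 5 very popular objects and 1,000 rarely viewed ones.
views = [2000, 1500, 1200, 900, 800] + [10] * 1000

share = tail_share(views, head_size=5)
print(f"Share of views outside the top 5 objects: {share:.0%}")
```

In this made-up case the rarely viewed objects together account for more views than the five most popular ones, which is the pattern the Long Tail theory describes.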
Heritage sector examples
Only a few examples are presented today but increasingly cultural heritage organisations are publishing sites that use some 'Web 2.0' technologies or methodologies. When looking at other projects from within our sector it is important to ask, who is doing it well and what can we learn from them? How does the site meet the requirements of its audiences, and what can we learn from projects that did not quite work?
Folksonomies and the Long Tail: The Powerhouse Museum and STEVE
"Steve" is a collaborative research project exploring the potential for user-generated descriptions of works of art and while it is mostly used in American art museums, it provides a good model for the implementation of tagging and folksonomies in the heritage sector. Folksonomies can be important for cultural heritage organisations because "allowing users to describe collections—using their own vernacular or language – may help other users find things that interest them". This may improve access to and encourage engagement with cultural content.
As the project website says, "social tagging offers professional and volunteer staff insights into the ways that visitors experience objects in your collection. It enables museums to transcend traditional distinctions by department or medium so you can better serve your publics" (http://www.steve.museum/).
The Powerhouse Museum
The Powerhouse Museum in Sydney re-launched their collection database online in June 2006.
Sebastian Chan from the Powerhouse Museum has a blog called 'Fresh + New' in which he talks about some of the work the Powerhouse is doing, including on-going tweaks to their OPAC 2.0 collection search.
In a post introducing their 'Collection 2.0' in June 2006 he said:
"Improving on previous collection search tools, OPAC 2.0 tracks and responds to user behaviour recommending 'similar' objects increase serendipitous discovery and encouraging browsing of our collection. It also keeps track of searches and dynamically ranks search results based on actual user interactions. Over time, this artificial intelligence will improve as it learns from users, and will allow for dynamic recommendations.
OPAC 2.0 also incorporates a folksonomy engine allowing users to tag objects for later recall by themselves or others."
They have encouraged 'serendipitous discovery' by implementing functionality such as the presentation of 'other search terms similar to X' and 'others who searched for X looked at' alongside search results, as well as showing the tags other users have added, so that 'almost every object view 'suggests' other objects to view'. This recommendation function is similar to Amazon's 'Customers who bought this item also bought'.
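The 'others who searched for X looked at' idea can be sketched as a simple co-occurrence count over a search log. This is a minimal illustration and not the Powerhouse Museum's actual implementation; the log format, function names and object identifiers are all assumptions.

```python
from collections import Counter, defaultdict


def build_recommendations(search_log):
    """Map each search term to a Counter of the objects viewed after it.

    `search_log` is an iterable of (search_term, viewed_object_id) pairs,
    a hypothetical format standing in for real application logs.
    """
    viewed_after = defaultdict(Counter)
    for term, object_id in search_log:
        # Normalise terms so 'Clay Pipe' and 'clay pipe' are counted together.
        viewed_after[term.strip().lower()][object_id] += 1
    return viewed_after


def others_looked_at(viewed_after, term, limit=3):
    """Return up to `limit` object ids most often viewed after `term`."""
    counts = viewed_after.get(term.strip().lower(), Counter())
    return [object_id for object_id, _ in counts.most_common(limit)]


# Hypothetical log entries.
log = [
    ("clay pipe", "obj-101"),
    ("Clay Pipe", "obj-101"),
    ("clay pipe", "obj-202"),
    ("teapot", "obj-303"),
]
recommendations = build_recommendations(log)
suggested = others_looked_at(recommendations, "clay pipe")
print(suggested)
```

Because the counts are built from ordinary searching and browsing, the suggestions improve as more people use the site, without anyone being asked to contribute anything explicitly.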
User-created tags are available as 'tag clouds', seen on the right-hand side of the screenshot, and are presented alongside the traditional museum-generated taxonomy.
The site has been a huge success. In August 2006 Chan posted:
"In just six weeks visitation to the Museum's website increased over 100%… In the 6 weeks from June 14-July 31 OPAC2.0 on its own received 239,001 visitors … who performed a total of 386,199 successful searches leading to object views … and over 1.2 million individual object views."
"The average number of successful searches per visit is 1.62.
The average number of objects viewed per visit is 5.02.
Contrast this with the single view per visit that objects on our previous 'packaged collection' received and the change is particularly marked."
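The quoted averages can be re-derived from the totals in the post, as this quick check shows. Two assumptions are made: the '239,001 visitors' figure is used as the number of visits, and the object-view total is given only as 'over 1.2 million', so it is treated as an approximation.

```python
visits = 239_001                  # quoted as "239,001 visitors"
successful_searches = 386_199
object_views_approx = 1_200_000   # the post says "over 1.2 million"

searches_per_visit = successful_searches / visits
views_per_visit_approx = object_views_approx / visits

print(round(searches_per_visit, 2))      # 1.62
print(round(views_per_visit_approx, 2))  # 5.02
```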
At the 2006 National Digital Forum in New Zealand Chan reported that 95% of all available objects were visited at least once in the first ten weeks and that over 3,000 user tags had been added.
From the same post, this statement shows the effect of the long tail: "Now from our total object view figures we can determine that even the most popular object … represents only 0.1% of all views."
User-generated content: Çatalhöyük and Wessex Archaeology
User-generated content
User-generated content and the 'participatory web' are a huge part of the success of Web 2.0. User-generated content (UGC) may be one of the biggest opportunities and one of the biggest challenges for heritage organisations.
For the purposes of this paper, user-generated content is considered as either 'implicitly generated' or 'explicitly generated' content. Implicitly generated UGC is created by the actions of users as they go about their normal business of viewing pages, selecting search results or making purchases. This can be harnessed and used to generate things like general 'most viewed' lists or Amazon.com's 'people who bought this book also bought…' feature, which is based on data generated by people with similar search and browsing patterns. The more a visitor uses Amazon and the more data they generate about their interests and preferences, the closer the matches get. Amazon 'Lists' and customer reviews are examples of explicitly generated content.
It is important to note that user-generated content is not written by random voices from an undifferentiated mass of users. Most sites require users to create a login and content is usually associated with a user name. Reputation and trust are important, whether 'Real Name' reviewers on Amazon, established authors on Wikipedia, or eBay sellers with good feedback. Amazon reviews are a good example of a reputation system – reviews can be rated by other users and Amazon gives a special status to the Top 1000 Reviewers.
Self-identifying and intentional users of cultural heritage websites include 'lifelong learners', subject specialists, potential clients (whether for archaeological work, or image or content licensing) and schools audiences. Other audiences include commercial contractors or clients and unintentional users who 'consume' our data via another interface entirely.
The benefits of the 'participatory Web' for both users and organisations go beyond greater visibility of cultural content and associated organisations – for example, specialist users may be able to add comments with information about parallels between objects in collection records held by different archaeological units or museums, or offer source references for more precise dating of objects. Web 2.0 technologies and practices may also help organisations engage with non-traditional audiences who encounter archaeological content in social network contexts or via web services.
The Çatalhöyük project team have put excavation photos online at the photo-sharing site Flickr, and created a Flickr group for Çatalhöyük photos. Users can also search for photos geotagged in the area or tagged with relevant keywords (for example, http://flickr.com/photos/tags/catalhoyuk/). Some Flickr users have added a lot of text and tags to their photos, suggesting a high level of engagement with the material. Ideally, this user-generated content could be integrated more closely with the main site at http://www.catalhoyuk.com.
Visitors to the Çatalhöyük website can download raw data from the project database and produce their own interpretations but currently there aren't any methods for visitors to inform the project and other visitors about how they used the data and any interpretative or scientific conclusions reached. Web 2.0 technologies and methodologies might provide ideal solutions. For example, linking photos visible on Flickr with the online excavation and finds database could be a good way to encourage users to comment on and tag the photos and contribute their interpretation of the team's data.
Wessex Archaeology have recently implemented a project to integrate their web pages with photos hosted on Flickr and found usage increased greatly:
"Over the 12 months before September 2005, our gallery was viewed by an average of 480 people per month. 6 months after we began using Flickr, in March 2006, we received 5664 visitors (based upon sessions) to the FAlbum gallery script."
What are the benefits of Web 2.0 for cultural heritage organisations?
How do the technologies and methodologies described as Web 2.0 apply to those working in archaeology or the cultural heritage sector? What can be picked from the hype and applied for the benefit of our organisations and our online audiences?
The specific benefits of Web 2.0 will depend on the goals of the organisation, but these may apply:
Wider dissemination of archaeological knowledge
Greater and more meaningful audience engagement
Re-use of existing content
Publicity and positive branding for the organisation
Better publication of a wider range of archaeological data
Accessible primary data and transparent processes
The Web 2.0 approach to standards-based presentation is advantageous for content producers and consumers. It encourages platform independence: audiences may well be on mobile devices rather than a traditional PC. It is also a chance to view the creation of attractive, innovative, accessible and standards-compliant sites as a challenge. Finally, standards compliance is increasingly a requirement for project funding from public bodies.
The use of open standards in application development and architecture is also beneficial. Many Web 2.0-ish applications are Open Source or run on Open Source platforms. One immediate benefit is that Open Source applications and platforms are free to use. Another is that they have years of development and testing with vociferous and demanding real world users behind them, so they tend to be robust and stable solutions.
The use of existing Web 2.0 infrastructures saves development costs. Audiences benefit from familiar interfaces and pre-existing logins, which may further encourage active engagement with 'cultural' content. Satisfied users will lead their social networks to your content, and create links and tags that will refer other users to the same service.
The use of Web 2.0 practices provides an impetus for organisations to think about the best licence under which to release content. Would they and their users benefit if content is released under a Creative Commons or Copyleft licence?
What are the potential disadvantages and barriers to Web 2.0?
If data is to be published through a particular API or standard, the standard must be accessible to content creators and publishers as well as being realistically implementable. When there are competing standards, the choice between one and the other should be made carefully or deferred until a de facto standard emerges.
Some organisations will not have a dedicated IT department, or the technologies of Web 2.0 might not match the skillsets or training of existing staff. Some organisations do not have any control over their servers or architecture.
Organisations must be prepared for the possible challenge of corrections or re-interpretations from their audiences. Curators, archaeologists and other specialists are used to being the experts, disseminating their knowledge to the eager masses. How will they react to the idea of the masses writing back or expecting active engagement with the organisation? Further, are kids using organisations' pictures as the background to their Myspace page brats who are stealing bandwidth and intellectual property, or are they possibly members of a new generation of active and meaningfully engaged museum visitors?
Content may be placed in proximity with inappropriate or commercial content such as ads on commercial sites and the implications of this must be considered. Organisational branding might be affected by direct publication methods such as blogs.
Copyright clearance must be confirmed for object metadata, images and GIS data. Licensing requirements for any software used, and client confidentiality, should also be considered.
Some existing social networking sites such as Myspace or Flickr tend to require active engagement and participation on the part of the institution or publisher to respond to audience engagement appropriately and get the full benefits of the site. This can have implications for the resources required to participate. In some cases it may not be appropriate for an organisation to comment on user-generated content and this must be considered.
Any user-generated content may require moderation and this also has resourcing implications. For example, at the Museum of London it has been noticed that every new collection site increases the number of curator enquiries. User forums and tagging may require on-going monitoring and moderation or the generation of new interpretation or research.
It is worth considering whether releasing content to distributed services will impact reported visitor numbers to traditional websites or have an 'opportunity cost'. The usage of content released to web services may not be quantifiable.
Resistance may be encountered to the idea of committing resources to 'unproven' technologies. Mapping providers could disappear or start charging to use their web services or APIs, or change their licensing terms. Changes to the economic model might mean that existing services become unaffordable. Careful investigation can alleviate these concerns.
Overcoming barriers and using Web 2.0
While there is research to be done on the best practices for Web 2.0 and the cultural heritage sector, the potential benefits for organisations and their audiences outweigh the risks and disadvantages. Organisations that develop Web 2.0-ish applications and data will contribute to the future understanding of the best ways to serve the diverse organisations and audiences in cultural heritage.
Ideally, organisations should consider why they want to take advantage of Web 2.0 technologies and practices, investigate the requirements of their audiences and consider the content available to them, then begin with small-scale projects and build on the response as appropriate. To quote the programming adage, "release early and often". Monitor usage and adjust project requirements accordingly.
Suggestions for getting started include:
Look at digitising and publishing existing copyright-free audio or video content as a podcast or on the video-sharing site YouTube.
It is easy and free or cheap to create a blog or a Flickr account to test uptake and usage.
Publish your events data in the events microformat so they can be included in social event sites such as upcoming.org.
Create an RSS feed for your News page.
Review the terms visitors have used in search engines to find your site.
Review terms used in internal site searches and any labels applied to content.
Set up a wiki or forum for people to comment on your data.
Consider publishing existing database(s) of records enhanced for online publication as an XML feed or CSV download.
Investigate creating a Google maps mashup or interface, or geotag photos and publish them on Flickr.
Embrace your 'long tail'.
Encourage users to tag images or records in languages other than English to start enabling multi-lingual access to your content.
Tag your favourite specialist sites on a social bookmarking site.
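To give a sense of how small the 'RSS feed for your News page' suggestion can be, here is a minimal sketch that builds an RSS 2.0 document with Python's standard library. The site name, URLs and news items are hypothetical placeholders, and a real feed would normally be generated from whatever system already holds the news stories.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss(title, link, description, items):
    """Return an RSS 2.0 feed as a string.

    `items` is a list of dicts with 'title', 'link' and 'published'
    (a timezone-aware datetime) keys; this structure is an assumption.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires a title, link and description for the channel.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for item in items:
        entry = ET.SubElement(channel, "item")
        ET.SubElement(entry, "title").text = item["title"]
        ET.SubElement(entry, "link").text = item["link"]
        # pubDate uses the RFC 822 style date format.
        ET.SubElement(entry, "pubDate").text = format_datetime(item["published"])
    return ET.tostring(rss, encoding="unicode")


# Hypothetical news items for an example organisation.
news = [
    {"title": "New finds added to the online catalogue",
     "link": "http://www.example.org/news/new-finds",
     "published": datetime(2007, 3, 1, 9, 0, tzinfo=timezone.utc)},
    {"title": "Open day at the archive",
     "link": "http://www.example.org/news/open-day",
     "published": datetime(2007, 2, 20, 9, 0, tzinfo=timezone.utc)},
]
feed_xml = build_rss("Example Unit News", "http://www.example.org/news",
                     "Latest news from an example archaeological unit", news)
print(feed_xml)
```

Visitors could then subscribe to the resulting feed in a reader such as Bloglines instead of checking the News page by hand.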
Organisations without dedicated IT resources may feel that Web 2.0 applications are out of their reach, but this does not have to be the case. Take advantage of existing models, particularly from commercial sites that have User Interface and Information Architect specialists – examine how Amazon separates user-generated content from 'official' content from publishers and authors, or how Flickr presents user comments compared to labels given by the content owner.
Match the technology to the content – you probably wouldn't publish your full finds catalogue as an RSS feed but if you regularly add small numbers of items a syndication or subscription model might be appropriate. Ensure that any content created can be exported into an interoperable or portable format in case the application or supporting organisation fails, and take regular back-ups. Investigate using open standards and Open Source Software where feasible, particularly where this avoids lock-in to proprietary systems.
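As one way of keeping content portable, records can be written out as CSV using only Python's standard library. This is a sketch under stated assumptions: the field names and sample records are hypothetical, and a real export would read from the organisation's own database.

```python
import csv
import io


def records_to_csv(records, fieldnames):
    """Serialise a list of dict records to CSV text with a header row.

    Keys not listed in `fieldnames` are ignored; missing keys are
    written as empty cells.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical finds records, including user-contributed tags.
finds = [
    {"id": "F001", "description": "Clay pipe bowl, decorated", "tags": "pipe; clay"},
    {"id": "F002", "description": "Glass bottle fragment", "internal_note": "not for export"},
]
csv_text = records_to_csv(finds, fieldnames=["id", "description", "tags"])
print(csv_text)
```

The `csv` module quotes values that contain commas, so descriptions survive a round trip into a spreadsheet or another database.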
Reduce moderation requirements by experimenting to set the right barriers to entry to suit specific audiences and content. Allow users to report offensive content for review by a moderator. Implement formal or informal peer review and reputation models to help the useful content rise and the non-useful content filter down. For example, tag clouds give prominence to the most popular tags while less popular tags become less visible; forum postings can be given 'karma' ratings by other visitors.
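The tag cloud behaviour described above can be sketched by scaling each tag's display size to how often it has been applied. The font-size range, the linear scaling and the sample tags are illustrative choices, not a description of any particular site's implementation.

```python
from collections import Counter


def tag_cloud_sizes(tags, min_size=10, max_size=32):
    """Map each distinct tag to a font size between min_size and max_size,
    scaled linearly by how many times the tag was applied."""
    counts = Counter(tag.strip().lower() for tag in tags)
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    spread = high - low
    sizes = {}
    for tag, count in counts.items():
        if spread == 0:
            # Every tag is equally popular, so use a mid-range size.
            sizes[tag] = (min_size + max_size) // 2
        else:
            scaled = min_size + (count - low) * (max_size - min_size) / spread
            sizes[tag] = round(scaled)
    return sizes


# Hypothetical tags applied by visitors to a set of records.
tags = ["pipe", "Pipe", "pipe", "pipe", "roman", "roman", "glass"]
sizes = tag_cloud_sizes(tags)
print(sizes)
```

Popular tags end up large and prominent while rarely used tags shrink, so the display itself acts as a light form of peer review.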
Consider using user-centred design methodologies, lightweight programming models and agile methods to reduce development time, therefore reducing the investment required. Take advantage of existing applications, services, APIs, architectures and design patterns wherever possible.
Finally, design for extensibility. Applications and infrastructure should be sustainable, interoperable and re-usable.
"Companies that succeed will create applications that learn from their users, using an architecture of participation to build a commanding advantage not just in the software interface, but in the richness of the shared data."
Tim O'Reilly, "What Is Web 2.0: Design Patterns and Business Models for the Next Generation of Software".
While the term 'Web 2.0' may just be more hype, the examples discussed have shown that there are real benefits to the technologies and methodologies described as 'Web 2.0'.
Chan, Sebastian. "Initial impacts of OPAC2.0 on Powerhouse Museum online visitation", http://www.powerhousemuseum.com/dmsblog/index.php/2006/08/10/initial-impacts-of-opac20-on-powerhouse-museum-online-visitation/
Chan, Sebastian. "Opening the gates: new opportunities in online collections", http://ndf.natlib.govt.nz/about/forum2006-files/schan.pdf via http://ndf.natlib.govt.nz/about/projects.htm#_ndf2006
Chan, Sebastian. "Powerhouse Museum launches Web 2.0-styled collection search", http://www.powerhousemuseum.com/dmsblog/index.php/2006/06/08/powerhouse-museum-launches-web-20-styled-collection-search/
Goskar, Tom. "Wessex Archaeology And Flickr: How We Use Web 2.0", http://www.24hourmuseum.org.uk/nwh/ART41987.html
SCPR Annual Conference, September 16, 2006
London Archaeological Archive and Research Centre, Mortimer Wheeler House
The paper discusses the process from initial specification through requirements gathering, database design, development of the database application and website, to publication online. This was later published in the Newsletter of the Society for Clay Pipe Research.
Update, December 2011: if you're interested in clay pipes, you may be interested in Locating London's Past. The site also has an article that explains how Museum of London Archaeology (MoLA) Datasets – including clay pipes and glass – have been incorporated into the site. NB: other than adding these links, I haven't updated the original 2006 paper, so it doesn't include any enhancements made for this new work. On a personal note, it's lovely to see that the sites, and the backend work behind them, still have value.
Update, November 2012: the Society for Clay Pipe Research's Newsletter featured as Guest Publication in the BBC's Have I Got News For You. Fame, at last!
Update, January 2015: possibly the best clay pipe ever?
Super mudlarking find on the foreshore of the Thames today. Beautiful clay pipe depicting woman on the toilet! pic.twitter.com/Nb17supwCW
As the Çatalhöyük Archive Report 2006 is only available online as a large PDF, I've copied the report below, but you can find additional reporting of my work in specialist reports like the Figurines report. I also contributed to the Çatalhöyük blog during the 2006 season.