Buzzword or benefit? The possibilities of Web 2.0 for the cultural heritage sector

Mia Ridge, Museum of London

Computer Applications and Quantitative methods in Archaeology UK Chapter Meeting, January 2007

[DRAFT]

Buzzword or benefit?

I'm going to mix things up and give you my conclusion now: yes, 'Web 2.0' is a buzzword, eagerly seized on by marketers and venture capitalists, but the technologies, methodologies and development philosophies it describes are of great potential benefit.

Many ideas described as 'Web 2.0' are not new, and you have probably already implemented some as part of other projects, so I don't think I'll be saying anything too shocking today. However, although 'participatory Web' might be a better phrase, I suggest that 'Web 2.0' is useful as an umbrella term: a shorthand for a set of related ideas about how we develop for and use the web.

I would like to start by saying that I think of this paper as the start of a conversation between peers rather than a pronouncement. I am definitely not here to tell you what you should be doing, particularly as some of you have already been doing really exciting things, and I don't believe anyone should implement new technologies without a compelling (and ideally user-centred) reason for doing so.

I would like to get people in the cultural heritage sector excited about the possibilities of these technologies and methodologies, and maybe inspire you to try some Web 2.0 ideas out. This paper was inspired by the conversations I've been having as part of the Semantic Web Think Tank (SWTT), an AHRC-funded think tank. You can follow the progress of the SWTT at http://culturalsemanticweb.wordpress.com.

What is Web 2.0?

I quite like the irony of using a Web 2.0 site to define Web 2.0, so first, Wikipedia's definition:

"Web 2.0, a phrase coined by O'Reilly Media in 2004, refers to a perceived or proposed second generation of Internet-based services—such as social networking sites, wikis, communication tools, and folksonomies—that emphasize online collaboration and sharing among users."

Source: http://en.wikipedia.org/wiki/Web_2, January 2007

[insert image http://en.wikipedia.org/wiki/Image:Web20_en.png, caption: The same page also has a lovely image of a tag cloud of buzzwords]:

But what does that really mean?

I think the following points are particularly relevant for archaeology:

General examples

Or, 'methods and technologies commonly considered Web 2.0-ish'.

O'Reilly presents the following examples for 'Web 1.0' and 'Web 2.0':

Web 1.0 --> Web 2.0
DoubleClick --> Google AdSense
Ofoto --> Flickr
Akamai --> BitTorrent
mp3.com --> Napster
Britannica Online --> Wikipedia
personal websites --> blogging
evite --> upcoming.org and EVDB
domain name speculation --> search engine optimization
page views --> cost per click
screen scraping --> web services
publishing --> participation
content management systems --> wikis
directories (taxonomy) --> tagging ("folksonomy")
stickiness --> syndication

I am being almost deliberately provocative in presenting the above, particularly as I don't think all the examples are dichotomous, but it's an interesting table.

Relevant examples could include:

- blogs;
- sites that profit from an architecture of participation and user-generated content, such as Amazon;
- sites like Flickr, YouTube and last.fm that use tagging (labelling content such as photos or URLs);
- social bookmarking sites like Del.icio.us or Digg, which build folksonomies (user-generated shared vocabularies);
- syndication and RSS (Bloglines for reading, Feedburner for tracking);
- podcasts;
- social networks (Myspace, Facebook, Faceparty);
- APIs and web services driving applications like Google Maps mashups;
- and the small-'s', small-'w' semantic web.

Even seemingly pre-Web 2.0 sites like AmIHotorNot or Kitten War (where you are presented with images of two kittens and you 'vote' by clicking on the cuter photo) are examples of the participatory web. Radical trust, as seen in Wikipedia, is another lovely Web 2.0-ish idea. It works for Wikipedia and it may well work for us too.

[insert image kittenwar.png, caption 'Screenshot of kittenwar.com, a participatory site']

Heritage sector examples

I'm only presenting a few examples today, but more and more cultural heritage organisations are publishing sites that use some 'Web 2.0' technologies or methodologies. When looking at other projects from within our sector, I think it's important to ask who is doing it well and what we can learn from them. Equally important, does the site actually work for its audiences, and what can we learn from projects that don't work?

The Powerhouse Museum: folksonomies and the Long Tail

[insert image Powerhouse1.png, caption 'Screenshot of the Powerhouse Museum Collection interface, showing 'user keywords' and traditional museum-generated categories.']

The Powerhouse Museum in Sydney re-launched their collection database online in June 2006.

Sebastian Chan from the Powerhouse Museum has a blog called 'Fresh + New' in which he talks about some of the work the Powerhouse is doing, including on-going tweaks to their OPAC 2.0 collection search.

In a post introducing their 'Collection 2.0' in June 2006 he said:

"Improving on previous collection search tools, OPAC 2.0 tracks and responds to user behaviour recommending ’similar’ objects increase serendipitous discovery and encouraging browsing of our collection. It also keeps track of searches and dynamically ranks search results based on actual user interactions. Over time, this artificial intelligence will improve as it learns from users, and will allow for dynamic recommendations.

OPAC 2.0 also incorporates a folksonomy engine allowing users to tag objects for later recall by themselves or others. "

They have encouraged 'serendipitous discovery' by presenting 'other search terms similar to X' and 'others who searched for X looked at' alongside search results. They also show the tags other users have added, so that "almost every object view 'suggests' other objects to view". This recommendation functionality is similar to Amazon's 'Customers who bought this item also bought'.
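The 'others who searched for X looked at' idea can be sketched as simple co-occurrence counting over anonymous search logs. The Python below is a minimal illustration only. It assumes a hypothetical log of (search term, object IDs viewed) pairs, and the object IDs are invented. It is not the Powerhouse Museum's actual implementation.

```python
from collections import Counter, defaultdict

def build_term_index(sessions):
    """Map each (lower-cased) search term to a Counter of object IDs viewed after it."""
    index = defaultdict(Counter)
    for term, viewed_ids in sessions:
        index[term.lower()].update(viewed_ids)
    return index

def others_who_searched_for(term, index, limit=3):
    """Return the objects most often viewed by visitors who searched for `term`."""
    counts = index.get(term.lower(), Counter())
    # most_common() orders equal counts by first appearance.
    return [object_id for object_id, _ in counts.most_common(limit)]

# Hypothetical anonymous log: (search term, object IDs viewed from the results).
sessions = [
    ("locomotive", ["obj-1", "obj-2"]),
    ("Locomotive", ["obj-1"]),
    ("locomotive", ["obj-1", "obj-3"]),
    ("telescope", ["obj-4", "obj-2"]),
]

index = build_term_index(sessions)
print(others_who_searched_for("locomotive", index))  # ['obj-1', 'obj-2', 'obj-3']
```

Real systems would also weight recent activity and filter out robots. This sketch only shows the counting step that makes the recommendations improve as more visitors use the site.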

User-created tags are shown as 'tag clouds' on the right-hand side of the screen, alongside the traditional museum-generated taxonomy.
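A tag cloud is essentially a frequency count mapped to a display size. The sketch below assumes a plain list of user-entered tags. It uses logarithmic scaling, a common choice but not necessarily the Powerhouse's, so that one very popular tag doesn't dwarf the rest. The function name and pixel range are hypothetical.

```python
import math
from collections import Counter

def tag_cloud(tags, min_size=10, max_size=28):
    """Return (tag, font size in px) pairs in alphabetical order.

    Tags are normalised to lower case. More popular tags get larger sizes on a
    logarithmic scale, so one very common tag doesn't dwarf the others.
    """
    counts = Counter(t.strip().lower() for t in tags if t.strip())
    if not counts:
        return []
    lo = math.log(min(counts.values()))
    hi = math.log(max(counts.values()))
    spread = (hi - lo) or 1.0  # all tags equally popular: avoid dividing by zero
    return [
        (tag, round(min_size + (max_size - min_size) * (math.log(n) - lo) / spread))
        for tag, n in sorted(counts.items())
    ]

# Hypothetical tags added by visitors to a single object.
user_tags = ["steam", "Steam", "steam ", "steam", "railway", "railway", "clock"]
print(tag_cloud(user_tags))  # [('clock', 10), ('railway', 19), ('steam', 28)]
```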

And it's been a huge success.  In August 2006 Chan posted:

"In just six weeks visitation to the Museum’s website increased over 100% (excluding spiders and bots). In the 6 weeks from June 14-July 31 OPAC2.0 on its own received 239,001 visitors (excluding internal museum users) who performed a total of 386,199 successful searches leading to object views (we currently track anonymous data on search terms linked to object views to provide the necessary data for our recommendation engine) and over 1.2 million individual object views."

...

"Now from our total object view figures we can determine that even the most popular object - the steam locomotive no 3830 - represents only 0.1% of all views. Because OPAC2.0 has only be online for 7 weeks we are yet to reach a point where ALL possible objects have been viewed at least once - but we are already at 75%."

...

"The average number of successful searches per visit is 1.62.
The average number of objects viewed per visit is 5.02.

Contrast this with the single view per visit that objects on our previous ‘packaged collection’ received and the change is particularly marked."
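As a quick sanity check, the quoted averages can be roughly reproduced by dividing the six-week totals by the 239,001 visitors. This assumes the averages were calculated that way, which the figures suggest but the quotation does not state. It also treats 'over 1.2 million' object views as exactly 1.2 million.

```python
# Six-week totals quoted from Chan (August 2006).
visitors = 239_001
successful_searches = 386_199
object_views = 1_200_000  # quoted as "over 1.2 million"; treated as a round figure here

searches_per_visitor = successful_searches / visitors
views_per_visitor = object_views / visitors

print(f"{searches_per_visitor:.2f}")  # 1.62
print(f"{views_per_visitor:.2f}")     # 5.02
```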

At the 2006 National Digital Forum in New Zealand, Chan reported that 95% of all available objects had been visited at least once in the first ten weeks and that over 3,000 user tags had been added.

Steve

"Steve" is another example of a Web 2.0-ish development and while it's mostly used in American art museums, it's been a big driver in the implementation of tagging in the heritage sector generally.

""Steve" is a collaborative research project exploring the potential for user-generated descriptions of the subjects of works of art to improve access to museum collections and encourage engagement with cultural content." (http://sourceforge.net/projects/steve-museum)

As they say, "Social tagging offers professional and volunteer staff insights into the ways that visitors experience objects in your collection. It enables museums to transcend traditional distinctions by department or medium so you can better serve your publics."

Çatalhöyük

One example I'm personally involved with is the Çatalhöyük project. It's a Neolithic site in Turkey that generates a lot of interest, partly because of the history of the site and partly because it's being dug and recorded with post-processual methodologies. We've taken advantage of some Web 2.0-ish applications, but we still have work to do on integrating user-generated content with the general site databases.

The project team have put photos online at Flickr and created a Flickr group for Çatalhöyük photos. You can also search for photos geotagged in the area or tagged with relevant keywords (http://flickr.com/photos/tags/catalhoyuk/). Some people have added a lot of text to their photos, so it would be wonderful to integrate that content more closely with the main site at http://www.catalhoyuk.com/.

Linking photos hosted on Flickr with the online excavation and finds database could be a good way to encourage users to comment on and tag the photos and contribute their own interpretations of the team's data. However, it would also require resources to select the best photos from the vast mass of excavation and lab photos. I'm also not sure whether tagging or commenting on photos offers the same richness of experience as commenting directly on an excavation or find record.
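Searching Flickr by tag or location can also be done programmatically, which is one route to pulling tagged photos back into a project site. The sketch below only builds a request URL for Flickr's `flickr.photos.search` REST method and makes no network call. It assumes the endpoint and parameter names (`tags`, `tag_mode`, `has_geo`, `extras`, `format`, `nojsoncallback`) as published in Flickr's API documentation, which should be checked before use, along with Flickr's terms. The API key is a placeholder and the function name is hypothetical.

```python
from urllib.parse import urlencode

# Flickr's REST endpoint (assumption: check Flickr's current API documentation).
FLICKR_REST_ENDPOINT = "https://api.flickr.com/services/rest/"

def flickr_tag_search_url(api_key, tags, geotagged_only=False, per_page=50):
    """Build a flickr.photos.search request URL for photos carrying any of `tags`."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,      # placeholder: your own Flickr API key
        "tags": ",".join(tags),  # comma-separated tag list
        "tag_mode": "any",       # match photos with at least one of the tags
        "extras": "geo,tags",    # ask for location and tag data with each photo
        "per_page": per_page,
        "format": "json",
        "nojsoncallback": 1,     # plain JSON rather than a JSONP wrapper
    }
    if geotagged_only:
        params["has_geo"] = 1    # only photos that have a location
    return FLICKR_REST_ENDPOINT + "?" + urlencode(params)

url = flickr_tag_search_url("YOUR_API_KEY", ["catalhoyuk"], geotagged_only=True)
print(url)
```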

We have a project forum but, although I haven't conducted a formal analysis, it does not seem to have been a success: the view numbers are reasonable but new posts are rare. I'm not sure what the barriers to participation are in this case. Is the hassle of creating an account too much? Is it the unfamiliar or unknown context (who is reading? who else is posting?), is it inertia, or do people not feel empowered to post?

Visitors to the Çatalhöyük website can download raw data and produce their own interpretations. However, as we're still investigating other ways to let people add their own content to the site, we don't know how people are using the data. We need to do some analysis of our users and their requirements but, as always, the implementation itself is a resourcing issue.

Overall, I feel we haven't yet taken full advantage of the database and the possibilities of Web 2.0 but preliminary investigation suggests that we can use existing technologies to reach diverse audiences.

User generated content

Participation, and the 'participatory web', are a huge part of Web 2.0. I think user-generated content (UGC) is one of the biggest opportunities, and one of the biggest challenges, for heritage organisations. Do we want to hear what 'they out there' have to say about our archaeology, finds and interpretation? Is letting the user within the walls of the institution a mistake or an opportunity?

UGC can also be called 'harnessing collective intelligence', which isn't to be confused with the power of the mob or even the 'wisdom of the crowd'. 

UGC can be either 'implicit' or 'explicit'. Implicit UGC is created by the actions of users as they go about their normal business of viewing pages, selecting search results and making purchases. This can be harnessed to generate things like:

- 'most viewed' lists, which are a bit of a sledgehammer approach;
- Amazon's 'people who bought this book also bought...', which is smarter because it matches you to a niche of people a bit like you.

The more you use Amazon, and the more data you generate about your interests and preferences, the closer the matches get. That is perfect if you're trying to increase sales or views of content.

Amazon 'Lists' and reviews are explicitly generated content.  Amazon reviews are also a good example of a reputation system - reviews can be rated by other users and Amazon gives a special status to the Top 1000 Reviewers.

It's important to note that user-generated content isn't written by random voices from an undifferentiated mass of users. Reputation and trust are important, whether through 'Real Name' reviewers on Amazon, established authors on Wikipedia, or eBay sellers with good feedback.

So who are these users and how might they participate and actively engage with our content?  I suspect some of our self-identifying and intentional users include 'lifelong learners', subject specialists, potential clients (whether for archaeological work, or image or content licensing) and schools audiences.  The 'participatory Web' can benefit all these users and our organisations can benefit too.

The benefits go beyond greater visibility - for example, finds specialists may be able to add comments with information about parallels between objects in your collection and objects in another collection, or offer source references for more precise dating of one of your objects.

One of the claimed advantages of Web 2.0 is that it can attract non-traditional museum audiences, and it would be interesting to see to what extent this is true. Search engine traffic can bring non-traditional audiences who are searching for a particular item or piece of information, and search engines can now include content from YouTube and Flickr. As organisations, we have an opportunity to support and encourage visitors who come across archaeological content through a general search.

I have a feeling (though I haven't done any research on this) that if you build stand-alone interfaces for each data set, your users will be restricted to specialists, apart from search engine users who wander in by mistake. If you publish content in an existing interface, or release it via established web services, you may reach a much wider audience.

What are the benefits of Web 2.0 for us?

So how does Web 2.0 apply to those of us working in archaeology?  What can we pick out of the hype and actually apply for the benefit of our organisations and our online audiences?  Our audiences might be commercial contractors or clients, visitors to our websites or people who 'consume' our data via another interface entirely.

The specific benefits of Web 2.0 will depend on the goals of your organisation, but these may apply:

The Web 2.0 approach to standards-based presentation benefits both content producers and consumers. It encourages platform independence: beyond the browser or platform wars, your audience may well be using mobile devices rather than a traditional PC. It's also a chance to treat creating attractive, innovative, standards-compliant sites as a challenge. Finally, standards compliance is increasingly a requirement for project funding from public bodies.

The use of open standards in application development and architecture is also beneficial. Many Web 2.0-ish applications are Open Source or run on Open Source platforms. One immediate benefit is that Open Source applications and platforms are generally free of licence fees. Another is that many have years of development and testing by vociferous and demanding real-world users behind them, so they tend to be robust and stable solutions.

You can save development costs by using existing Web 2.0 infrastructures. You can also take advantage of users' familiarity and experience with their interfaces, existing logins, etc. to encourage take-up and active participation. (Users can interact with your content before they've even realised they're viewing 'cultural' content!)

I think there's also a benefit in the impetus Web 2.0 gives organisations to think about the best licence under which to release content. Do you need to hold onto it, or would your organisation and its users benefit if you 'let it go' under a Creative Commons or Copyleft licence?

Happy users will lead their social network (i.e. friends) to your content, as well as creating links and tags that will refer other users of the same service.  Network effects help users find relevant data, and follow paths created by previous visitors.

What are the potential disadvantages?

It would be unfair if I didn't mention some of the possible disadvantages of leaping aboard the Web 2.0 bandwagon.

You could waste time and resources chasing new technologies that may or may not be around in a few years.  Your mapping partner could disappear or start charging to use their web services or APIs, or change their licensing terms.  You could always update your services to use a new API but if the market model changes, access may not be affordable.

Ideally, you should have legal advice before publishing content on sites with licences that claim the right to re-publish your content or content generated in response to your content in case this restricts your ability to use that content elsewhere.

If you choose to publish your data through a particular API or standard, the standard must be accessible to content creators and publishers as well as being realistically implementable.  When there are competing standards, you should choose carefully.

Some existing social networking sites such as Myspace or Flickr tend to require active engagement and participation on the part of the institution or publisher to get the most from the site. Your organisation may not have the resources to do this, or it may not be appropriate for it to comment on other users' content. Either way, a half-hearted presence may compromise your brand on the site. Research could be done on how effectively museums are already using sites such as Myspace.

User-generated content may require moderation and this has resourcing implications.  At the Museum of London we've noticed that every new site increases the number of curator enquiries - FAQs help but there's still an increase in enquiries that should be planned for.

Your content may be placed in proximity with inappropriate or commercial content such as ads on commercial sites.

Distributed services may reduce the visitor numbers you can report. This is an issue for museums whose funding is affected by the visitor numbers they report to the DCMS (Department for Culture, Media and Sport).

There may be an 'opportunity cost' in releasing content.  If we release our data, haven't we lost our competitive advantage?  I haven't seen evidence to suggest this but it's worth mentioning.

If a similar organisation has a critical mass of user-generated content, how will that affect your offering?  Could a partnership be an appropriate solution?

What happens to content that's 'leaked out' via web services and has lost its original context? Do we need to worry about it?

If you build it, will they come? What if you had a Web 2.0 party and nobody came? I doubt this would happen, but you can limit the risk. Use user-centred design methodologies, lightweight programming models and agile methods to reduce development time and therefore the investment required. 'Release early and often', monitor usage, and adjust project requirements accordingly.

Finally, are you prepared for the possibility of corrections or re-interpretations from your audience?  How will your organisation deal with this challenge?

Barriers to the world of Web 2.0

Institutional

You may find resistance to the idea of committing resources to 'unproven' technologies. Sometimes it doesn't hurt to let the dust settle or the hype fade. IT is subject to fashion trends just like any other field; the trick is knowing when a technology or industry has matured.

Are kids who use one of your pictures as the background to their Myspace page brats stealing your bandwidth and intellectual property, or are they possibly members of a new generation of active and meaningfully engaged museum visitors? How will your organisation settle this question?

Do you need to report online visitor numbers to funding bodies or within your organisation? This needs thought if you're considering releasing content via web services: once your content is consumed by other services, you won't have direct access to usage figures. The SWTT is considering pushing for a change in the visitor number/funding model for DCMS museums, so let me know if you have any thoughts on that.

Your marketing department may worry about how direct publication methods like blogs might affect organisational branding.

They might say, "the web is full of idiots, why would we want to let them comment on our data?" This is where peer review and reputation models are so valuable: they help useful content rise and less useful content sink.

- Some Amazon reviews are comically bad, but they can be rated down and good reviews rated up.
- Wiki entries can be vandalised, but if active editors receive email notification of changes, the damage will probably be fixed within hours.
- Some content tags may seem a little random, but tag clouds give prominence to the most popular tags, so less popular tags become less visible.

I think we can even learn from 'bad' data - it can be a fascinating insight into our audiences.
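One well-known way to let well-rated content rise without letting a single early vote dominate is to rank items by the lower bound of the Wilson score confidence interval. That is a swapped-in illustrative technique, not a description of how Amazon or any other site mentioned here actually ranks reviews. The comment IDs and vote counts below are hypothetical.

```python
import math

def wilson_lower_bound(helpful, total, z=1.96):
    """Lower bound of the Wilson score interval for the share of 'helpful' ratings.

    z=1.96 gives roughly 95% confidence. Items with few ratings get a cautious
    (low) score, so a single early vote can't push them to the top.
    """
    if total == 0:
        return 0.0
    p = helpful / total
    z2 = z * z
    centre = p + z2 / (2 * total)
    margin = z * math.sqrt((p * (1 - p) + z2 / (4 * total)) / total)
    return (centre - margin) / (1 + z2 / total)

# Hypothetical visitor comments: (comment id, helpful votes, total votes).
comments = [("A", 1, 1), ("B", 40, 50), ("C", 2, 10)]
ranked = sorted(comments, key=lambda c: wilson_lower_bound(c[1], c[2]), reverse=True)
print([c[0] for c in ranked])  # ['B', 'A', 'C']
```

Comment A has a perfect record from one vote. It still ranks below B, whose 40 helpful votes out of 50 are much stronger evidence of quality.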

Perhaps more challenging is a possible consequence of allowing the visitor behind the scenes. Museum curators, archaeologists and other specialists are used to being the experts, disseminating their knowledge to the eager masses. How will they react to the idea of the masses writing back? I haven't come across resistance of this kind so far, but I suspect the participatory web will challenge our institutions at some point.

Technical

You might not have a dedicated IT department, or you might be that IT department yourself and the technologies of Web 2.0 might not match your skillset or training.  You might not have any control over your servers or architecture.

Make sure you can retrieve your data from any service you start using. Nothing would be more disheartening than creating content, having your users create content, and then being unable to get it out of a proprietary or licensed system. For the same reason, make sure your data is backed up regularly.
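One low-tech safeguard is to export content regularly to open formats such as JSON and CSV, which almost any future system can read. This minimal sketch uses only Python's standard library. The record structure and field names are hypothetical. In practice the records would come from whatever export or API the service provides.

```python
import csv
import io
import json

def export_records(records, fieldnames):
    """Return (json_text, csv_text) copies of a list of dict records."""
    json_text = json.dumps(records, ensure_ascii=False, indent=2)
    buffer = io.StringIO()
    # Keys not listed in fieldnames are skipped; missing keys become empty cells.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return json_text, buffer.getvalue()

# Hypothetical user comments attached to find records.
user_comments = [
    {"comment_id": 1, "object_id": "obj-1", "author": "visitor42",
     "text": "A close parallel is held in another collection."},
    {"comment_id": 2, "object_id": "obj-3", "author": "finds_specialist",
     "text": "Compare the rim form, see reference."},
]
json_text, csv_text = export_records(user_comments, ["comment_id", "object_id", "author", "text"])
print(csv_text)
```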

Legal

Copyright - do you own it for your objects, images, GIS data and interpretation?  Are there any licensing issues with the software used to analyse or publish your data?  Are there any issues with client confidentiality?

Resources (or lack of)

This is fairly self-explanatory.  But you don't need a dedicated IT department or web programmer, for the reasons I'll explain below.

However, if you don't own the copyright or have licences for your content, clearing rights will become a resourcing issue. That said, archaeological data is generally free of the copyright issues that plague museums.

You can experiment to set the right barriers to entry for your particular audiences and content to reduce moderation requirements.  This is another reason for monitoring the usage of your content and services.
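One way to make these barriers easy to experiment with is to express them as a couple of settings rather than hard-coding them. The policy below is entirely hypothetical, and the default threshold is a starting point to adjust as usage monitoring shows how much moderation is actually needed.

```python
def moderation_decision(has_account, approved_posts, *, allow_anonymous=True, trusted_after=3):
    """Decide what happens to a new contribution under a hypothetical moderation policy.

    Returns "reject", "hold" (queue for a moderator) or "publish".
    """
    if not has_account and not allow_anonymous:
        return "reject"   # highest barrier: contributors must create an account
    if not has_account or approved_posts < trusted_after:
        return "hold"     # anonymous or new contributors are pre-moderated
    return "publish"      # established contributors go live immediately

print(moderation_decision(False, 0))                         # hold
print(moderation_decision(False, 0, allow_anonymous=False))  # reject
print(moderation_decision(True, 5))                          # publish
```

The trade-off runs in both directions. Requiring accounts reduces the moderation load but may be exactly the hassle that deters contributors. Allowing anonymous posts lowers the barrier but grows the review queue.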

Overcoming barriers and going Web 2.0

Think about why you're doing it and who you're doing it for, and then try something small.  Be prepared to be surprised about how your content or application is used - hopefully in a good way! 

Match the technology to the content. You probably wouldn't publish your full finds catalogue as RSS, but if you regularly add small numbers of items, a subscription model might be appropriate. The Portable Antiquities Scheme (PAS) has an RSS feed for reported items.
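Producing such a feed needn't be complicated. This minimal sketch builds an RSS 2.0 document for recently added finds using Python's standard library. It includes the three channel elements RSS 2.0 requires (title, link, description) and an item for each find. The finds data and example.org URLs are hypothetical, and this is not how the PAS feed is actually produced.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime

def finds_feed(title, link, description, finds):
    """Build a minimal RSS 2.0 document (as a string) listing recently reported finds."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required channel elements in RSS 2.0.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for find in finds:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = find["title"]
        ET.SubElement(item, "link").text = find["url"]
        ET.SubElement(item, "guid").text = find["url"]
        # RSS expects RFC 822-style dates, which format_datetime produces.
        ET.SubElement(item, "pubDate").text = format_datetime(find["reported"])
    return ET.tostring(rss, encoding="unicode")

# Hypothetical finds records and URLs.
recent_finds = [
    {"title": "Copper-alloy brooch", "url": "http://example.org/finds/101",
     "reported": datetime(2007, 1, 15, 10, 0, tzinfo=timezone.utc)},
    {"title": "Medieval silver penny", "url": "http://example.org/finds/102",
     "reported": datetime(2007, 1, 16, 14, 30, tzinfo=timezone.utc)},
]
feed = finds_feed("New finds", "http://example.org/finds/", "Recently reported finds", recent_finds)
print(feed)
```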

Try to use Open standards and Open Source Software where possible. This helps you avoid lock-in to proprietary systems.

Don't re-invent the wheel, but do be careful where you invest your data.

When developing applications or services, consider the Unix model of "doing one thing and doing it well".

Take advantage of existing models, particularly from commercial sites that employ User Interface and Information Architecture specialists. For example, examine how Amazon separates user-generated content from 'official' content from publishers and authors, or how Flickr presents user comments compared to labels given by the content owner.

Conclusion

"Companies that succeed will create applications that learn from their users, using an architecture of participation to build a commanding advantage not just in the software interface, but in the richness of the shared data."

Tim O'Reilly, "What Is Web 2.0: Design Patterns and Business Models for the Next Generation of Software".

While the term 'Web 2.0' may just be more hype, I hope that some of the examples discussed have shown that there are real benefits to the technologies and methodologies commonly considered Web 2.0.

There's still a lot of research to be done on best practice for Web 2.0 in the cultural heritage sector, but I think the potential benefits outweigh the risks and disadvantages. Your Web 2.0-ish applications and data will contribute to our future understanding of the best ways to serve our diverse organisations and audiences.

Bibliography

Chan, Sebastian. "Initial impacts of OPAC2.0 on Powerhouse Museum online visitation", http://www.powerhousemuseum.com/dmsblog/index.php/2006/08/10/initial-impacts-of-opac20-on-powerhouse-museum-online-visitation/

Chan, Sebastian. "Opening the gates: new opportunities in online collections", http://ndf.natlib.govt.nz/about/forum2006-files/schan.pdf via http://ndf.natlib.govt.nz/about/projects.htm#_ndf2006

Chan, Sebastian. "Powerhouse Museum launches Web 2.0-styled collection search", http://www.powerhousemuseum.com/dmsblog/index.php/2006/06/08/powerhouse-museum-launches-web-20-styled-collection-search/

Goskar, Tom. "Wessex Archaeology And Flickr: How We Use Web 2.0", http://www.24hourmuseum.org.uk/nwh/ART41987.html

O'Reilly, Tim. "Web 2.0 Compact Definition: Trying Again", http://radar.oreilly.com/archives/2006/12/web_20_compact.html

O'Reilly, Tim. "What Is Web 2.0: Design Patterns and Business Models for the Next Generation of Software", http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html

Powerhouse Museum, Sydney, "Collection 2.0", http://www.powerhousemuseum.com/collection/database/

Semantic Web Think Tank blog, "UK Museums and the Semantic Web; A not so formal ongoing commentary and discussion on the AHRC-funded thinktank", http://culturalsemanticweb.wordpress.com

Sierra, Kathy. "Why Web 2.0 is more than a buzzword", http://headrush.typepad.com/creating_passionate_users/2006/11/why_web_20_is_m.html

Wikipedia, Web 2.0, http://en.wikipedia.org/wiki/Web_2

Sites mentioned in this paper:

http://www.amazon.co.uk/

http://www.bloglines.com/

http://www.catalhoyuk.com/

http://www.del.icio.us/

http://www.digg.com/

http://www.facebook.com/

http://www.faceparty.com/

http://www.feedburner.com/

http://www.flickr.com/

http://www.hotornot.com/

http://www.kittenwar.com/

http://www.myspace.com/

http://www.reddit.com/

Updates and related content

http://openobjects.blogspot.com/