April 11-14, 2007
San Francisco, California

Tagging and Searching – Serendipity and museum collection databases

Sebastian Chan, Powerhouse Museum, Australia

http://www.powerhousemuseum.com

Abstract

In mid-2006 the Powerhouse Museum launched a new on-line catalogue (http://www.powerhousemuseum.com/collection/database). Inspired and informed by the explosion of Web2.0 sites and services, the new collection database aimed not only to provide a 'better,' more usable museum catalogue, but also to explore ways to leverage user interest and community knowledge.

Internally called OPAC2.0, the new catalogue put more than 70% of the Museum's collection on-line. In order to operate effectively, OPAC2.0 collects detailed information about search terms and object relationships as well as tagging and controlled vocabulary usage patterns. With these and other evaluation tools built in to the structure of the site from day one, OPAC2.0 has been conceptualised as an ongoing project requiring continual enhancements and usability modifications.

This paper examines the OPAC2.0 project and its impact on the Museum. It presents initial usage patterns, search trends, and social tagging trends over the first 6 months of operation (from June 14 to December 31, 2006). In particular, the paper explores the impact of opening up the collection and driving traffic down its 'long tail'; the tag structures submitted by users through the folksonomy engine; and the internal Museum changes that have come about as a result of unprecedented user access and, importantly, user input and engagement.

Keywords: Web 2.0, evaluation, user analysis, folksonomy, user engagement, databases

Introduction

Museum collections are increasingly being digitised and made available to the public on-line. Whilst many organisations make their collections available, most have been slow to implement or consider the user-centred navigation and exploration tools that have been available on many consumer Web sites such as Amazon for years. Even fewer have optimised their collections for Google (http://www.google.com) and other search engines. In many cases the lack of extra feature sets is the result of using out-of-the-box software provided by collection management system vendors whose primary functionality focus remains internal museum record keeping.

In many museums, too, the organisational focus on exhibitions and 'expert-led' storytelling through narrative arrangement of objects remains a priority. Translated to the Web, this often results in stand-alone, project-funded, virtual exhibitions or small 'curated' sets of objects deemed to be of interest to the public, even though the space limitations of the gallery are no longer present on the Web.

In Sydney, the Powerhouse Museum (http://www.powerhousemuseum.com) promotes itself as a museum of 'design, science and social history'. It hosts the annual Sydney Design festival, and in recent years its major exhibition focus has been oriented around themes of design, technology and contemporary 'popular culture'. These factors, combined with the sheer enormity of the collection (some 400,000 objects), mean that many areas of the Museum's collection have not been accessible to the public.

The Powerhouse Museum's Web site was relaunched in 2004 after a two-year user study recommended that the new-look Web site focus on the needs of users wanting to visit the Museum (Sumption, 2003). Amongst the changes were a visitor-centred navigation system and the inclusion of 'people-centric' images of the Museum on the home page and subsequent page headers - visitors enjoying the 'experience' of the Museum rather than collection objects, for example. Coinciding with the new Web site has been promotion, both on-line and in traditional media, of the Museum as a 'destination' and an 'experience'.

On the new-look Web site, usage of non-visit-related resources was seasonal and influenced by high school semesters and key annual education dates, such as state-wide examinations for which the Museum had developed specialist but popular on-line study guides. Similarly, collection resources on the Web site were limited to static pages with 150 key objects, 700 object records held in our Sydney 2000 Olympic and Paralympic Games collection microsite (http://www.powerhousemuseum.com/sydney2000games), 200 Tyrrell Collection historical images (http://www.powerhousemuseum.com/tyrrell), and, most recently, 150 photographs and objects from the Hedda Morrison collection (http://www.powerhousemuseum.com/heddamorrison).

In June 2006, the Museum launched a new means for browsing and searching almost all of its collection database. Although a large part of the Museum's catalogue had previously been searchable through the museum-to-museum professional portal Australian Museums On Line (now Collections Australia Network), the object records in that catalogue search had not been updatable since 2001, were out of date, and lacked images. Using it effectively also required a familiarity with collecting and museological practice.

Internally known by the rather dry project name OPAC2.0 (on-line public access catalogue), the new collection search on the Powerhouse Museum's own Web site implemented a number of key features aimed at making the Museum's collection considerably more usable, 'browsable' and 'discoverable'.

This paper examines the usage data accumulated and logged through these features and explores the questions of where visitors actually go within a diverse collection, and how they use new features like folksonomies (user keywords, tagging) and search recommendations.

About OPAC2.0

OPAC2.0 was built in-house using PHP and Microsoft SQL. The project developed out of several internal experiments: an information kiosk in our permanent design gallery (Inspired! Design across time) connected to the Museum's collection management system (KE Software's EMu product); the Electronic Swatchbook project, which trialled user responses to some basic user tagging; and an on-line virtual photographic gallery, the Hedda Morrison collection, which trialled public access to an EMu-connected collection and object-related user feedback. Development of OPAC2.0 has been modular, experimental and 'unfinished', and for the first 6 months of operation the public site was still tagged as being in 'beta'.

Fig 1.1 Inspired! Design across time in-gallery kiosk

Fig 1.2 Electronic swatchbook showing swatch zoom detail (http://www.powerhousemuseum.com/electronicswatchbook/)

Fig 1.3 Hedda Morrison photographic collection microsite (http://www.powerhousemuseum.com/heddamorrison)

Back End

OPAC2.0 periodically harvests selected fields from object records stored in the Museum's collection management system (KE EMu). These records are stored in a Microsoft SQL database with full-text indexing on all descriptive fields (statement of significance, history, description, marks, credit line and dimensions). Each record also contains reference pointers to associated multimedia objects (currently zoomable images only, but the same application is capable of video or other multimedia). The multimedia objects are currently duplicated on a Web server rather than served directly from KE EMu, which adds a layer of security and enhances performance. Whilst the KE EMu product does have its own Web interface, it was decided to create a new one to allow for flexibility in terms of implementation, additional user tracking and other features, and for speed and scalability.

Front End

The site's front end comprises three main pages - a home/search page, a search results page, and an object view page. The site implements a series of features to enhance navigability. Each of these three base-template pages is structured to maximise pivot or exit points (Vander Wal, 2006). On each page is a large search box allowing searches to be performed at any time and at any point in the site. The right-hand column of the search results page allows for further filtering of results, whilst the same column position on the object view page can contain user keywords, related searches, subject keywords and related objects, depending upon the object being shown.

Fig 2.1 Home page showing search box at upper left, category navigation at bottom left, and tag cloud at bottom right (http://www.powerhousemuseum.com/collection/database)

Fig 2.2 Search results showing similar searches, search results in user keywords, search results in general catalogue, and applicable filters to refine searching

Fig 2.3 Object view showing user keyword pane, related searches, related subjects and similar object pivot points (http://www.powerhousemuseum.com/collection/database/?irn=246808)

User Tracking And Intentional Data

User tracking is a key component of the features to enhance discovery in the collection. User tracking currently provides the basis of 'related search' recommendations and is being examined for application within an enhanced future search ranking algorithm.

When a visitor searches the database, intentional data is recorded – the search term entered and the object selected by the user are recorded in a search table along with a date stamp. IP address, session cookies or other user data are currently not recorded. Only searches that result in a view of an object record are recorded. Those that return no results, or that do not result in the viewing of an object record, are currently not recorded.

When an object is viewed, a counter for that object record is incremented, allowing for a continuous counter of view popularity. Whilst it is possible to add extra granularity to user tracking, it was decided to slowly increase logging capabilities to ensure future scalability.
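The logging rules described above can be illustrated with a minimal sketch. It is not the Museum's PHP and Microsoft SQL code: it uses Python with an in-memory SQLite database, and the table names, column names and the lower-casing of search terms are all assumptions made for illustration. The point it shows is that a search is written to the log only when it ends in an object view, that no IP address or session data is stored, and that every object view increments a counter.

```python
import sqlite3
from datetime import datetime, timezone


def open_log(path=":memory:"):
    """Create the two hypothetical tracking tables used in this sketch."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE search_log (term TEXT, object_id INTEGER, viewed_at TEXT)"
    )
    db.execute(
        "CREATE TABLE object_views (object_id INTEGER PRIMARY KEY, views INTEGER NOT NULL)"
    )
    return db


def record_object_view(db, object_id, search_term=None):
    """Count one object view; log the search term only if a search led here."""
    # Make sure a counter row exists, then increment it.
    db.execute(
        "INSERT OR IGNORE INTO object_views (object_id, views) VALUES (?, 0)",
        (object_id,),
    )
    db.execute(
        "UPDATE object_views SET views = views + 1 WHERE object_id = ?",
        (object_id,),
    )
    if search_term:
        # Only term, object and date stamp are kept: no IP address or cookie.
        term = search_term.strip().lower()  # normalisation is an assumption
        stamp = datetime.now(timezone.utc).isoformat()
        db.execute(
            "INSERT INTO search_log (term, object_id, viewed_at) VALUES (?, ?, ?)",
            (term, object_id, stamp),
        )
    db.commit()


db = open_log()
record_object_view(db, 246808, "Minton")  # arrived from an onsite search
record_object_view(db, 246808)            # arrived from an external link
```

Because the function is only called when an object record is displayed, searches that return nothing, or whose results are never clicked, leave no trace in the log, which matches the behaviour described above.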

Opensearch

In addition to search engine optimisation, the Museum's collection is now directly searchable using the Opensearch (http://www.opensearch.org) protocol. Opensearch allows Internet users to search the collection through their browsers or a search aggregator such as A9.com without visiting the Museum's site. Search results are available as an RSS feed, allowing users to 'subscribe' to be notified automatically of new objects in their area of interest, as indicated by their search term. Although traffic using this feature has been very low to date, it is expected that as new Opensearch aggregators emerge and other similar organisations take up Opensearch, it will grow in importance. It also means that partner sites such as Collections Australia Network can use the OPAC2.0 Opensearch feed to return current search results to their own users as the Museum's collection changes, without needing to harvest records.
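As a rough illustration of how an aggregator might consume such a feed without harvesting records, the sketch below parses an RSS response carrying OpenSearch response elements. The sample feed is invented for this example (its titles and links are placeholders, not Museum records), and the use of the OpenSearch 1.1 namespace is an assumption; the paper does not say which version OPAC2.0 implements.

```python
import xml.etree.ElementTree as ET

# Namespace of the OpenSearch 1.1 response elements (assumed version).
OPENSEARCH_NS = {"os": "http://a9.com/-/spec/opensearch/1.1/"}

# Invented sample response; titles and links are placeholders only.
SAMPLE_FEED = """<rss version="2.0"
     xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <channel>
    <title>Collection search: minton</title>
    <opensearch:totalResults>2</opensearch:totalResults>
    <item>
      <title>Placeholder object A</title>
      <link>http://example.org/collection/?irn=1</link>
    </item>
    <item>
      <title>Placeholder object B</title>
      <link>http://example.org/collection/?irn=2</link>
    </item>
  </channel>
</rss>"""


def parse_results(feed_xml):
    """Return (total_results, [(title, link), ...]) from an RSS search feed."""
    channel = ET.fromstring(feed_xml).find("channel")
    total = int(
        channel.findtext("os:totalResults", default="0", namespaces=OPENSEARCH_NS)
    )
    items = [
        (item.findtext("title"), item.findtext("link"))
        for item in channel.findall("item")
    ]
    return total, items


total, items = parse_results(SAMPLE_FEED)
```

An aggregator would fetch the feed over HTTP for each user query rather than reading a stored string, so its results stay as current as the source catalogue.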

Impact On Web Site Traffic To Powerhouse Museum

OPAC2.0 went live on June 14, 2006, and since then overall visitation to the Museum's Web site has climbed from 228,246 visits in May to 571,432 in December. Of these total visitation figures, on-line collection visitation was 17,394 (7.85% of total visitation) for May, which grew to 355,180 (62.15% of total visitation) in December. The rapid visitor increase, the broadening of accessible content, and the widening intentions of site visitors have seen the average time spent on the site fall from 9:42 minutes in May to a still sizeable 5:00 minutes in December.

Changing Geography Of Search Traffic

Coinciding with the launch of OPAC2.0, the Museum undertook an internal exercise in search engine optimisation. As a result of this, the entire collection database can be spidered by search engines. Since optimisation and the exposure of the collection, Internet users searching for many common topics and consumer items come across Powerhouse Museum objects in their first page of Google search results. This has led to a considerable rise in traffic to the Museum's Web site originating from Google and other search engines, and a broadening of search terms used to arrive at the Web site.

From May to December 2006, total search traffic to the Museum's Web site quadrupled from 45,873 to 193,655 visits. The geography of these searchers has also changed. In May, Google Australia (http://www.google.com.au) represented 51.21% of search traffic, followed by Google USA (http://www.google.com) at 23.78%. By the end of December this had changed. The origin of searches was now dominated by Google USA (44.27%); Google Australia had fallen to 26.92%, and non-English language Google searches had risen to direct a further 15.62% of traffic.

Interestingly, in December, traffic from Google Australia was still dominated by the search phrase “Powerhouse Museum”, making up 11.94% of traffic, but on Google USA the same phrase sits at the head of a very long tail of different terms at just 1.22%.

Impact On Public Collection Enquiries

The increased interest in the Museum's collection has also translated into a significant rise in public enquiries received by email, up from 266 in the 6 month period January-June 2006 to 670 for July to December 2006. The nature of these enquiries has changed, with an increase in requests for valuations and identifications, as well as many enquiries providing information about objects to assist the Museum in improving its collection documentation. This has put increased pressure on scant curatorial and research resources. Proportional to the collection visits, however, enquiries per collection visit have actually decreased from 1 in 539.50 to 1 in 2310.29.

Enquiries have also become more demanding. With considerably richer collection information now available on the Web site for a greater range of objects, enquiries are often more detailed and require more research on the part of curatorial staff. These enquiries can be as specific as locating a highly specialised manufacturing code or serial number on the interior of a piece of machinery, or the markings on a medal or coin – all details of interest to collectors. Sometimes this information requires the examination of the object out of storage, or retrieval from un-digitised collection research files dating back many years. There are other instances, for example in areas where specialist Museum staff have retired or no longer work for the organisation, when the Museum is unable to offer assistance.

The number of requests for valuations has also increased, largely as a result of the popularity of private selling of antiques on eBay; the Museum has had to clearly state on the Enquiry Form on the Web site that valuations cannot be given.

Occasionally, however, the exposure of these records, and their increased searchability, especially through Google, has led to the Museum receiving additional contextual information about objects, or corrections to research. Sometimes this additional information comes from international experts, collectors and researchers, but at other times, as was the case with a convict love token, it comes through community members researching their family histories – and simply searching for a series of family names in Google. Whilst this kind of interest in the collection is extremely valuable to the Museum, the time required to validate and check additional material evidence can place strain on resources that are geared towards exhibition production and other areas of research.

Serendipity and OPAC2.0

Serendipity is the art of chancing upon things, of making unexpected discoveries. An environment can be designed so as to encourage or discourage serendipity. In many areas of research, serendipity is a key factor in advancing knowledge. When a visitor to a museum or library browses along shelves or showcases, serendipity plays a considerable role in the 'user experience'. Even in highly specialised exhibitions and in fine-grained Dewey Decimal System library bookshelf browsing, visitors inevitably come across objects or publications that are of interest to them but that they were unaware of prior to visiting. Whilst most on-line catalogue searches present users with opportunities for serendipity on their search results pages, very few do so at an object view level. With increasing percentages of visitors using third party search applications such as Google to find deep content (Nordbotten, 2000), it is increasingly important to offer visitors opportunity for serendipity at the object record level, to retain their attention and encourage them to explore the Web site.

The Powerhouse Museum's OPAC2.0 enhances serendipity in three key ways:

Object And Subject Taxonomies

Object and subject taxonomies, developed over the past 20 years for the Museum's collection management system, are utilised to recommend related objects to users on the basis of formal museum classification systems. This allows users to be confident that an object classified as a chair is also related to another object classified as a chair and belonging to the same higher level category of 'furniture'. More than 90% of objects searchable through OPAC2.0 are classified in this way.

Augmented Serendipity – User Keywords

Tagging, or the ability to add user-created keywords to data, has been popularised over the past few years by Web sites such as Flickr (http://www.flickr.com) with images and Delicious (http://del.icio.us) with hyperlinks. Collected together, user tags form folksonomies, dynamic community-created classification systems. Seeing this potential, museums have begun to trial tagging, most notably in the steve project (Trant, 2006) and with the Smithsonian Photography collection. Each Web site that uses tagging does so in different ways and for different purposes. Marlow, Naaman, Boyd & Davis (2006) propose a useful framework for understanding the different implementations of tagging and their use.

In the case of OPAC2.0 the use of user keywords to tag collection objects has been conceived of as a means to achieve better resource discovery. In this sense, 'augmented serendipity' is provided by user-added keywords (or 'tags'), which allow related objects to be recommended on the basis of how other users of the Web site have classified them.

This type of serendipity is described as 'augmented' because it is explicitly provided by other users of the site and has the effect of a publicly editable and dynamic thesaurus. These user keywords are most often generally descriptive, allowing users to discover objects that are difficult to discover through the Museum's formal classification systems.

For example, a search for 'model train' would usually neglect to find objects formally classified as 'model locomotive'. However, as users have tagged several of these model locomotives as 'model train,' they are now discoverable using that search term. This sort of tagging can be described as community synonym generation. Other objects such as some of the Museum's numismatics and weapons collections have been tagged using the specialist languages of coin and gun collectors, allowing them to be discovered more easily by fellow collectors. Furthermore, these tags can be spidered by search engines as pointers to object records, allowing for their discovery and use outside of the OPAC2.0 site itself.
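The 'model train' case can be sketched in a few lines. This is an illustrative model only: the record structure, the object numbers and the simple substring match (standing in for full-text search) are assumptions, not a description of how OPAC2.0 is built. It shows how a query that misses the formal catalogue fields can still reach an object through a user keyword, with the two kinds of hit kept separate.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class ObjectRecord:
    irn: int                      # hypothetical record number
    classification: str           # formal museum classification
    description: str              # formal descriptive text
    user_tags: Set[str] = field(default_factory=set)


def search(records: List[ObjectRecord], query: str) -> Tuple[List[int], List[int]]:
    """Return (catalogue hits, user keyword hits) for a query."""
    q = query.strip().lower()
    catalogue_hits, tag_hits = [], []
    for rec in records:
        # Substring match stands in for the catalogue's full-text index.
        if q in rec.classification.lower() or q in rec.description.lower():
            catalogue_hits.append(rec.irn)
        # A user keyword matches only when the whole tag equals the query.
        if q in {tag.lower() for tag in rec.user_tags}:
            tag_hits.append(rec.irn)
    return catalogue_hits, tag_hits


records = [
    ObjectRecord(1, "model locomotive", "Scale model of a steam locomotive",
                 {"model train"}),
    ObjectRecord(2, "chair", "Bentwood chair"),
]

by_tag = search(records, "model train")          # found via user keyword only
by_class = search(records, "model locomotive")   # found via formal classification
```

Here the query 'model train' returns no catalogue hits but one user keyword hit, which is the community synonym effect described above.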

A particularly memorable user keyword for a Sylvester the Cat soft toy is 'puddy tat' (http://www.powerhousemuseum.com/collection/database/?irn=11831). A museum's official collection system would be highly unlikely to employ such a term; however, an Internet searcher can easily make the semantic connection between this object and its user keyword.

In total, 3,928 tags were submitted between June 14 and December 31, 2006. Of these, 537 were deleted, edited for spelling, or removed by other users and the system administrator. In the time period under study, 2,246 objects were tagged with 3,391 tags (avg=1.51, sd=1.01, n=3391).

OPAC2.0 offers only a basic instruction to users wishing to add keywords to objects: “Tagging helps others locate this material more easily. Please check your spelling. Use comma to separate multiple tags”. Tags are immediately visible after being added, and any user can remove tags, including those submitted by other users. Tags appear on the site as hyperlinks and can be clicked to trigger a search for that user keyword.
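A small sketch of how comma-separated tag input of this kind might be split into individual keywords follows. The whitespace clean-up and the case-insensitive removal of duplicates are assumptions for illustration; the paper does not describe how OPAC2.0 normalises submitted tags.

```python
def parse_tags(raw):
    """Split a comma-separated submission into a list of clean tags."""
    seen = set()
    tags = []
    for part in raw.split(","):
        tag = " ".join(part.split())   # trim and collapse inner whitespace
        key = tag.lower()              # compare case-insensitively (assumption)
        if tag and key not in seen:
            seen.add(key)
            tags.append(tag)           # keep the user's own capitalisation
    return tags


submitted = parse_tags("squeezebox,  accordion , ,mother of  pearl, Accordion")
```

Empty entries and repeated tags are dropped, so a stray trailing comma or a repeated word does not create extra keywords.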

None of the most tagged objects are on public display within the Museum.

The most tagged object, a badge commemorating Lambton Bowling Club's 75th Anniversary (registration number 89/591), has been tagged 10 times with a mix of aesthetic (kangaroo, marsupial, parrot, rosella), material (enamel), descriptive (membership badges, bowling club, anniversary, badge) and social (Australiana) terms. The formal object record for this badge has two images but no statement of significance, no production or history notes, and only a very basic one-line description and dimensions. In this case, the user keywords offer alternative means to discover the object given the lack of other object data.

The two objects tagged with 9 user keywords each are equally sparse in terms of their formal collection object records. A mantel clock dating from 1881 (registration number H8942) has a collection object record copied from a paper stockbook entry dating from before the 1960s, a singular black and white image, dimensions and the most bare of descriptive statements. It has been tagged with keywords suggesting that the taggers hold additional or updated aesthetic and material information about the object (Belgium, black 1882, black, black face, gold face, lions head, rams head) and only two descriptive tags (roman numerals, chime clock).

The Flutina, forerunner of the accordion (registration number H3788), has a sparse collection record but 4 images, descriptive text, and dimensions. This object has useful descriptive tags (squeezebox, accordion, instrument, decorative), social tags (music, play) and material tags (wood, mother of pearl) provided by users.

Frictionless Serendipity – User Tracking

'Frictionless serendipity' is provided by user tracking; the aggregate behaviour of visitors to the site is used to make further recommendations based on actual behaviour. This is described as 'frictionless' because the visitor/user need do nothing other than navigate the site to lay trails of 'intentional data' for recommendations to occur. Aggregating this intentional data allows for search term recommendations, building dynamic relationships between terms that reveal both aggregated synonyms and free associations. Thus a searcher for 'minton' currently gets suggestions for other searches of 'mintons', 'bone china', 'british', 'porcelain' and 'peacock', based on the terms other searchers of the term 'minton' have used and the objects they have viewed.

Frictionless serendipity also allows the site to distinguish between different meanings of homonyms based on user behaviour. Thus a searcher of the term 'cricket' is offered other words associated with popular Australian sports such as 'football' and 'rugby', rather than terms associated with 'cricket' as a term for a grasshopper. In simple terms, this is possible because at this time, the majority of users who have searched for 'cricket' have chosen to look at sport-related objects which have also been looked at by other users searching for 'football' and 'rugby'. These relationships are dynamic and change over time. If OPAC2.0 were to be implemented in a natural history museum with a collection of grasshoppers, then the associated homonyms would change.
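One way to derive such 'related search' suggestions from logged term and object pairs is sketched below. The scoring rule (weighting each shared object by how often both terms led to it) and the sample log are assumptions for illustration, not the Museum's algorithm. Two terms become related when searches for them ended in views of the same objects.

```python
from collections import Counter, defaultdict


def related_searches(log, term, limit=5):
    """Suggest terms whose searches led to the same objects as `term`.

    `log` is an iterable of (search_term, object_id) pairs, one per
    successful search (a search that ended in an object view).
    """
    objects_by_term = defaultdict(Counter)
    terms_by_object = defaultdict(Counter)
    for logged_term, object_id in log:
        objects_by_term[logged_term][object_id] += 1
        terms_by_object[object_id][logged_term] += 1

    scores = Counter()
    for object_id, weight in objects_by_term[term].items():
        for other, count in terms_by_object[object_id].items():
            if other != term:
                # Shared objects viewed often under both terms score higher.
                scores[other] += weight * count
    return [other for other, _ in scores.most_common(limit)]


# Invented log: object 1 and 2 are sport-related, object 3 is an insect.
sample_log = [
    ("cricket", 1), ("cricket", 1), ("cricket", 2),
    ("football", 1), ("rugby", 1), ("rugby", 2),
    ("grasshopper", 3), ("football", 4),
]

suggestions = related_searches(sample_log, "cricket")
```

In this sample, 'cricket' is linked to 'rugby' and 'football' but not to 'grasshopper', because no one searching for 'cricket' viewed the insect object. If visitors' choices shifted, the suggestions would shift with them, which is the dynamic behaviour described above.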

User tracking is currently processing 26,078 unique terms and relationships derived from 2,811,540 successful onsite searches since June 2006.

It should be reiterated here that OPAC2.0 only counts successful searches. A successful search is defined as one that results in the viewing of an object record. Searches that display no results, or that show a result listing but are not followed by the selection and viewing of an object record, are not counted. Google searches are not included either. Term/object relationships are based on an aggregate figure of user selections by search term. As search results are displayed in an order generated by Microsoft Search, which is used for full-text indexing, these term/object relationships are biased by the Microsoft search algorithm.

Top Onsite Searches

Other than external entry to an OPAC object, users can search the database directly by free text search, user keyword or subject keyword.

Of the 2,811,540 successful on site searches since June 2006, there were :

User keyword and subject searching require less user effort than a free text search, and this may explain their popularity as navigation devices. All three options are available at an object level.

Narratives Of Search

Anyone who watched the outcry over the release by AOL of semi-anonymous data about their subscribers' searches in 2006 knows the value of search data (Hoover, 2006). Likewise, Google, Yahoo and Microsoft all use information gathered from searches performed through their Web sites to generate revenue through targeted advertising.

With user tracking data it is possible to examine the different paths taken by users for particular search terms and type of search. Whilst Doolan, Peacock and Ellis (2004) are correct in their assertion that “keyword search is often barely adequate as a way of finding information”, when combined with an interface that allows for multiple pivot or exit points as well as serendipitous discovery of closely and partially related content, the user experience can move from one of dead-ends into one of many paths.

Whilst individual Web users create their own, often highly personal 'search narratives,' when these are aggregated it is possible to observe usage trends and patterns of behaviour.

Free text search term    Frequency of object view
red                      10851
fashion                  8791
suit                     6488
aboriginal               6105
ring                     6099

Fig 3.1 Table showing top 5 free text search terms.

Since the search term 'fashion' is both a popular free text search and a subject keyword, it is possible to observe different use patterns and object choices for the same term. As a free text search term, the most frequently viewed objects for 'fashion' are shown in Fig 3.2.

Placement in results                    Registration number   Short description
25                                      2002/96/1             Women's outfit designed by sass & bide
32                                      2005/234/1            Women's outfit designed by Akira Isogawa
35                                      2002/95/1             'Speedy' travelling bag designed by Marc Jacobs and Stephen Sprouse for Louis Vuitton
19                                      2001/19/1             Women's outfit designed by Michelle Jank
Not in first page of results (top 50)   2001/105/1            Unisex shoes designed by Royal Elastics

Fig 3.2 Most popular objects for free text search for 'fashion'

As expected, results are dominated by objects by designers well known both locally, within Australia, and internationally. Generally one would expect that search engine users will select results that appear towards the top of a result listing, and that if presented with a choice between an item with a thumbnail image preview over one without, the thumbnail will be chosen.

However, this is not the case, suggesting users are scanning results or filtering them using other means. The highest placed object appears in 8th position on a search, and one does not appear on the first page of results at all.

Other Emerging Search Patterns

Fig 3.3 lists the most popular objects by view. The top 5 are dominated by objects for which external search engine traffic is the greatest source, suggesting a broad general audience arriving on the site with very diverse intentions. That three of the top five are evening dresses, none of which are on public display, might indicate their high ranking in Google for general Web users looking to purchase such a garment on-line. The locomotive and Catalina flying boat are iconic Powerhouse Museum objects and on permanent display.

Views    Registration number   Object
11142    2005/1/1              Evening dress, beaded pink chiffon trimmed with charms
6248     94/129/1              Evening dress, womens, 'Chocolate box', plastic / fabric
5036     88/4                  Steam locomotive, No. 3830, iron/steel/brass, New South Wales
4775     95/23/1               Dress, evening, silk / polyester, designed by Jenny Bannister
4708     B1495                 Aircraft, flying boat, Catalina, PB2B-2, "Frigate Bird II", VH-ASA

Fig 3.3 Most viewed objects on OPAC2.0

Fig 3.4 lists the most popular objects viewed by using the onsite OPAC2.0 search box. This second listing suggests a more museum-centric user, and the popular objects reflect the Museum's public focus on design and technology – decorative arts represented by a glass bowl and Florence Broadhurst designed wallpaper, and technology by the Catalina flying boat and the Sony AIBO robot dog.

Views    Registration number   Object
1238     2004/36/1             Bowls & compotes, (15), 'Carnival', press-moulded
1023     2000/12/1             Robot, dog, Aibo, Entertainment Robot ERS-110 / M
1019     B1495                 Aircraft, flying boat, Catalina, PB2B-2, "Frigate Bird'
881      2001/127/1            Messenger bag, canvas/ nylon/ plastic,
812      2002/77/1             Wallpaper roll, 'Peacock' design, Broadhurst

Fig 3.4 Most popular objects by free text search

Search Narratives Case Study: Lisa Ho Dress Worn By Delta Goodrem

Fig. 4.1 Dress designed by Lisa Ho, worn by Delta Goodrem. Collection: Powerhouse Museum, Sydney. Photographer: Jean-François Lanzarone, Powerhouse Museum.

The most viewed object, a dress designed by Lisa Ho and worn by Australian pop star and celebrity Delta Goodrem (registration number 2005/1/1), is almost twice as popular (11,142 views) as the next most viewed object, an evening dress designed by Melbourne designer Jenny Bannister (6,248 views). Of these 11,142 views, only 1,124 (10.09%) have been the result of searches within OPAC, with the remainder of traffic coming from external Web links and Google searches.

The Delta Goodrem dress was collected by the Museum as an example of contemporary fashion and in particular the influence of 'celebrity' on style, and thus it is entirely appropriate that this object be discovered by users searching in Google or linked from Goodrem fan sites. It also has a relatively complete object record with three available zoomable images (including one of Delta Goodrem wearing the dress), a full statement of significance, object description, production notes and history notes (totalling 865 words) written by curator Glynis Jones. The object has never been on public display. However, this is not the whole story.

This object has had many user keywords attached to it - the names of other celebrities (Nelly Furtado, Nicole Kidman, and 'Fiona Biggest Loser', which has subsequently been deleted by another user) who we might assume may have worn a similar or the same garment. Amongst the keywords is 'tennis', which is possibly associated with this object as a result of Goodrem's former relationship with Australian tennis player Mark Philippoussis. 'Delta Goodrem' has also been added as a user keyword. The object also has four subject keywords attached to it - 'Australian Music Awards', 'Australian music', 'Australian Television Personalities' and 'The Powerhouse Museum Sydney Morning Herald Fashion of the Year'.

term frequency
tennis 227
elegant 140
dress 128
delta goodrem 102
pink 77
collette dinnigan 54
lisa ho 44
nicole kidman 26
evening dresses 25
nelly furtado 22

Fig 4.2 Most frequent search terms for object 2005/1/1 (Lisa Ho dress worn by Delta Goodrem)

Four of the top ten searches for this object are user keywords, together accounting for 377 (33.54%) of the 1,124 searches within OPAC that led to it. The remainder are words or phrases that feature prominently in the object description and text.
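The user keyword share can be checked directly against the term table in Fig 4.2. The following Python sketch sums the frequencies of the four terms identified above as user keywords (tennis, delta goodrem, nicole kidman, nelly furtado) and expresses them as a proportion of the 1,124 views that resulted from searches within OPAC. The variable names are illustrative only and are not drawn from the OPAC2.0 code.

```python
# Top ten search terms for object 2005/1/1, as listed in Fig 4.2.
top_terms = {
    "tennis": 227,
    "elegant": 140,
    "dress": 128,
    "delta goodrem": 102,
    "pink": 77,
    "collette dinnigan": 54,
    "lisa ho": 44,
    "nicole kidman": 26,
    "evening dresses": 25,
    "nelly furtado": 22,
}

# Terms the text identifies as user keywords attached to this object.
user_keywords = {"tennis", "delta goodrem", "nicole kidman", "nelly furtado"}

# Views of the object that resulted from searches within OPAC.
opac_search_views = 1124

# Sum the searches made via user keywords and express them as a percentage.
user_keyword_searches = sum(top_terms[term] for term in user_keywords)
user_keyword_share = 100 * user_keyword_searches / opac_search_views

print(f"{user_keyword_searches} searches ({user_keyword_share:.2f}%)")
# prints "377 searches (33.54%)"
```

The same calculation could be repeated for any object whose search terms have been recorded, which would allow the contribution of user keywords to be compared across the collection.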

In the case of the Delta Goodrem dress, user keywords operate as an important way to discover the object once users are already on the site. However, the majority of traffic to this object is from Google searching and external links. Whilst it is not currently possible to break down these external referrals in specific detail, the search terms 'dress' and 'dresses' represent over 40,000 external search referrals to the Museum's Web site; 'delta' and 'goodrem' are not represented in the top 200 referrals from June 2006 to December 2006.

The tagging of this object is an example of a community of users, in this case, likely Delta Goodrem fans, using the tagging facility as a means of adding associated information to the object record, and in so doing making it more discoverable. Within the Delta Goodrem community it would appear from looking at fan sites and forums that information on this particular dress is highly sought after – and that the Powerhouse Museum is the known holder of this information.

Search Narratives Case Study : 1923 Carnival Glass Bowl

Fig 5.1 1923 ‘Carnival’ glass bowl, 1923. Collection: Powerhouse Museum, Sydney. Photography: Powerhouse Museum

A different user profile emerges when examining the object most searched for using free text search, a 1923 Carnival glass bowl (2004/36/1 Bowls & compotes, (15), 'Carnival').

The bowl has a long collection record of 1,423 words with detailed statement of significance, history notes, production notes and description. It has been tagged with three user keywords (candle holders, carnival glass, piping) and is associated with three subject areas (Australian flora and fauna in applied art, Glass manufacture, Australian commercial glass).

Of this bowl's 2,033 views, 1,733 (85.24%) resulted from searches within OPAC, indicating a much greater proportion of site-specific traffic and a comparatively low level of external interest through Google or links from other sites. Of these searches, 1,238 (71.43%) were free text; the remainder came from user keywords and subject keywords.
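The traffic breakdown for the bowl can be reproduced from the three counts given above. The sketch below, again in Python with illustrative variable names, derives the proportion of views that came from searches, the proportion of those searches that were free text, and the number of searches left over for user and subject keywords. Percentages are shown to one decimal place.

```python
# Counts reported for object 2004/36/1 (1923 Carnival glass bowl).
total_views = 2033
search_views = 1733          # views that resulted from a search
free_text_searches = 1238    # searches made using free text

# Proportion of all views that came from searches.
search_share = 100 * search_views / total_views

# Proportion of those searches that were free text.
free_text_share = 100 * free_text_searches / search_views

# Searches attributable to user keywords and subject keywords combined.
keyword_searches = search_views - free_text_searches

print(f"From searches: {search_share:.1f}% of views")
print(f"Free text: {free_text_share:.1f}% of searches")
print(f"User and subject keyword searches: {keyword_searches}")
```

Set beside the roughly 10% search-driven share for the Delta Goodrem dress, these figures make the contrast between the two objects explicit.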

term frequency
glassware 589
bowls 167
carnival glass 123
interesting 121
glass 86
bowl 70
emu 40
carnival 34
Glass manufacture 32
Australian commercial glass 22

Table 5.2 Most frequent search terms for object 2004/36/1 (1923 Carnival glass bowl)

This object is typical of an object whose interest is currently confined to a niche or specialist audience, and, it seems, one accustomed to visiting a museum Web site and collection database with the intention of searching.

The top search terms are heavily weighted towards one term, glassware, with other thematic and subject terms also featuring prominently (Carnival glass, Glass manufacture, Australian commercial glass). These are normal terms for subject specialists, researchers, and those who traditionally would be expected to use a collection search facility on a museum Web site. Strangely, the search term 'interesting' also scores highly, perhaps indicating that other audiences are also discovering this object in their own ways.

Conclusion And Further Research

The Powerhouse Museum's OPAC2.0 project demonstrates the importance of museums making their collections accessible through major search engines. It also demonstrates the possibilities for audience research and improved navigation of collections through the detailed recording and analysis of user traffic and behaviour data beyond traditional logfile data. User tagging and folksonomies can be used to improve navigation and discoverability but work most effectively when matched with detailed collection records and balanced with the structural benefits of formal taxonomies. When combined with these features, search tracking can provide a means to improve serendipitous discovery and enhance the ability of users to find related objects and explore deeper into a collection.

However, this raises many new questions.

Significant further research is required into the specifics of how users search museum collections and the impact of information architecture and visual design on their behaviour. Comparisons with projects such as Steve.museum and their open data sharing models offer much in this regard. This research needs to be matched against other research examining the influence of Google and other large search engines on museum collections.

The Powerhouse Museum hopes to continue to work and share data with other organisations and researchers to assist in the development of shared data about how museum users search and discover in these vast collections, and how, in some instances, new audiences and traditional museum audiences might use these collections to create meaning.

Acknowledgements

OPAC2.0 and its ongoing development and research would not have been possible without Giv Parvaneh, who contributed considerable programming and conceptual expertise to the project, and Luke Dearnley, who has helped develop the analysis tools on the site.

References

Peacock, D., D. Ellis, & J. Doolan (2004). “Searching For Meaning, Not Just Records”. In D. Bearman & J. Trant (Eds.), Museums and the Web 2004: Proceedings. Washington DC / Arlington VA, USA: Archives & Museum Informatics. Also available at http://www.archimuse.com/mw2004/papers/peacock/peacock.html

Hoover, J. (2006). “AOL Search-Term Data Was Anonymous, But Not Innocuous.” Information Week, 14 August 2006. Available http://www.informationweek.com/software/showArticle.jhtml?articleID=191901983&subSection=Development. Accessed 28 January 2007.

Marlow, C., M. Naaman, D. Boyd, & M. Davis (2006). “HT06: Tagging Paper, Taxonomy, Flickr, Academic Article, ToRead”. Proceedings of Hypertext 2006. New York: ACM Press. Also available at http://www.danah.org/papers/Hypertext2006.pdf

Nordbotten, J. (2000). “Entering Through the Side Door - a Usage Analysis of a Web Presentation.” In D. Bearman & J. Trant (Eds.) Museums and the Web 2000 Proceedings. CD ROM. Archives & Museum Informatics, 2000. Also available at http://www.archimuse.com/mw2000/papers/nordbotten/nordbotten.html

Sumption, K. (2003). “Powerhouse Museum Web site Evaluation.” Digital Cultural Content Forum, Pistoia, Italy. Available on-line at http://www.culturalcontentforum.org/meetings/italy-2003/agenda.html. Accessed 5 Dec 2006.

Trant, J. (2006). “Social Classification and Folksonomy in Art Museums: early data from the steve.museum tagger prototype”. ASIST-CR Social Classification Workshop, Austin, TX, USA. Nov 4, 2006. Also available http://www.archimuse.com/papers/asist-CR-steve-0611.pdf

Trant, J. (2006). “Exploring the potential for social tagging and folksonomy in art museums: Proof of concept.” New Review of Hypermedia and Multimedia, Vol 12, No 1, June 2006. Also available: http://www.archimuse.com/papers/steve-nrhm-0605preprint.pdf

Vander Wal, T. (2006). Folksonomy to Improve Information Architecture. Presentation to OZ IA, Sydney, Australia. 30 Sep 2006. Podcast available at http://podworkx.com/ozia2006/2006/10/18/ThomasVanderWalFolksonomyToImproveIA.aspx

Cite as:

Chan, S., Tagging and Searching – Serendipity and museum collection databases, in J. Trant and D. Bearman (eds.). Museums and the Web 2007: Proceedings, Toronto: Archives & Museum Informatics, published March 1, 2007 Consulted http://www.archimuse.com/mw2007/papers/chan/chan.html

Editorial Note