Archives & Museum Informatics: Museums and the Web 2007: Mini-Workshop: Searching and Annotating Virtual Heritage Collections with Semantic-Web Techniques

Searching and Annotating Virtual Heritage Collections with Semantic-Web Techniques

Alia Amin, Centrum voor Wiskunde en Informatica, The Netherlands
Victor de Boer, Universiteit van Amsterdam, The Netherlands
Lynda Hardman, CWI, The Netherlands
Guus Schreiber, Free University Amsterdam, The Netherlands
Mark van Assem, Vrije Universiteit Amsterdam, The Netherlands
Michiel Hildebrand, CWI, The Netherlands
Marco de Niet, Digital Heritage Netherlands, The Netherlands
Borys Omelayenko, Vrije Universiteit Amsterdam, The Netherlands
Jacco van Ossenbruggen, Centrum voor Wiskunde en Informatica (CWI), The Netherlands
Jos Taekema, Digital Heritage Netherlands, The Netherlands
Bob Wielinga, Universiteit van Amsterdam, The Netherlands
Anna Tordai, Vrije Universiteit Amsterdam, The Netherlands
Jan Wielemaker, Universiteit van Amsterdam, The Netherlands
Marie-France van Orsouw, Dutch Institute for Cultural Heritage ICN, The Netherlands
Annemiek Teesing, Dutch Institute for Cultural Heritage ICN, The Netherlands

The main objective of this work, which is performed in the context of the MultimediaN E-Culture project, is to demonstrate how novel semantic-web and presentation technologies can be deployed to provide better indexing and search support within large virtual collections of cultural-heritage resources. The architecture is fully based on open web standards, in particular XML, SVG, RDF/OWL and SPARQL. This paper gives some details about the internals of the demonstrator. The online version of the demonstrator can be found at http://e-culture.multimedian.nl/demo/search. Readers are encouraged to take a look at the demonstrator before reading on. We suggest consulting the tutorial (linked from the online demo page), which provides a sample walk-through of the search.

[NOTE for running the demonstrator: Make sure your browser has adequate SVG support, see the demonstrator FAQ for details. Firefox 2 is expected to make the plug-in installations unnecessary (you can try the beta-release). As a project we are committed to web standards (such as SVG) and are not willing to digress to (and spend time on) special-purpose solutions.]

The technical baseline of the demo is formed by SWI-Prolog and its (Semantic) Web libraries. From the user perspective, the architecture provides (i) annotation facilities for web resources representing images, and (ii) search and presentation/visualization facilities for finding images. Currently, the demonstrator hosts four thesauri: the three Getty vocabularies, i.e., the Art & Architecture Thesaurus (AAT), the Union List of Artist Names (ULAN) and the Thesaurus of Geographic Names (TGN), and the lexical resource WordNet, version 2.0. The Getty thesauri were converted from their original XML format into an RDF/OWL representation. The RDF/OWL conversion of WordNet is documented in a publication of the W3C Semantic Web Best Practices and Deployment Working Group. The architecture is independent of the particular thesauri being used. We are currently adding the Dutch version of AAT, among other things to support a multi-lingual interface. Integration of other (multi-lingual) thesauri is planned.

Using multiple vocabularies is a baseline principle of our approach. It also raises the issue of alignment between the vocabularies: semantic interoperability increases when semantic links between vocabularies are added. Within the Getty vocabularies one set of links is systematically maintained: places in ULAN (e.g., the place of birth of an artist) refer to terms in TGN. Within the project we are adding further sets of links. One example is links between art styles in AAT (e.g., "Impressionism") and artists in ULAN (e.g., "Monet"). The project has worked on deriving these links semi-automatically from texts on art history.

For annotation and search purposes the tool provides the user with a description template derived from the VRA 3.0 Core Categories. The VRA template is defined as a specialization of the Dublin Core set of metadata elements, tailored to the needs of art images. The VRA Core Categories follow the "dumb-down" principle, i.e., a tool can interpret the VRA data elements as Dublin Core data elements.
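The "dumb-down" principle can be sketched as a lookup from specialized elements to the Dublin Core elements they refine. The mapping table below is an assumption made for illustration only: the element names and pairings are hypothetical and are not copied from the VRA specification.

```python
# Hypothetical specialization table: VRA-style element -> Dublin Core element.
# The pairs are illustrative and not copied from the VRA specification.
VRA_TO_DC = {
    "vra:title": "dc:title",
    "vra:creator": "dc:creator",
    "vra:material": "dc:format",
    "vra:technique": "dc:format",
    "vra:stylePeriod": "dc:coverage",
}


def dumb_down(record):
    """Re-express a VRA-style record using only Dublin Core elements.

    Elements that are already Dublin Core are kept, VRA elements are
    replaced by the DC element they specialize, and anything else is dropped.
    """
    result = {}
    for element, values in record.items():
        if element.startswith("dc:"):
            target = element
        else:
            target = VRA_TO_DC.get(element)
        if target is None:
            continue  # no known Dublin Core generalization
        result.setdefault(target, []).extend(values)
    return result
```

A tool that understands only Dublin Core can apply such a mapping and still make use of the annotation, at the cost of losing the finer distinctions (here, material and technique both collapse into one format element).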

In principle, every web resource with a URI can be included and annotated in the virtual collection of our demonstrator. As a test set of data we have included three web collections: (i) the Artchive collection (http://www.artchive.com/, 4,000 images of paintings), (ii) the ARIA collection (http://rijksmuseum.nl/aria/, images of some 750 masterpieces), and (iii) the RMV collection (http://www.rmv.nl, 80,000 images of ethnographic objects). Parsing techniques were used to convert the original textual metadata into semantic metadata (i.e., entries in the vocabularies). We are incorporating more collections.
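The conversion from textual to semantic metadata can be pictured with a much simpler stand-in than the parsing techniques actually used: an exact lookup of normalized labels against the vocabularies. The label index, identifiers, and field names below are hypothetical and serve only to show the idea of replacing a string by a vocabulary entry.

```python
# Hypothetical label index: normalized label -> vocabulary entry identifier.
# The identifiers are made up and are not real AAT, ULAN, or TGN record IDs.
LABEL_INDEX = {
    "monet": "ulan:monet",
    "claude monet": "ulan:monet",
    "impressionism": "aat:impressionism",
    "paris": "tgn:paris",
}


def normalize(text):
    """Lower-case and collapse whitespace so label variants compare equal."""
    return " ".join(text.lower().split())


def to_semantic(record):
    """Replace literal field values by vocabulary entries where a label matches.

    Unmatched values are kept as plain strings, so no metadata is lost.
    """
    return {
        field: LABEL_INDEX.get(normalize(value), value)
        for field, value in record.items()
    }
```

Real collection metadata needs more than this (ambiguous names, spelling variants, several values per field), which is presumably where the parsing effort goes; the sketch covers only the final lookup step.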

Mini-Workshop: Search & Annotation [Contributed Content]

Keywords: virtual collection, semantic web, search strategies, demonstrator, web standards
