Sessions
April 15-18, 2009
Indianapolis, Indiana, USA

Sessions: Abstract

hoard.it: Aggregating, displaying and mining object-data without consent

Dan Zambonini, Box UK, United Kingdom
Mike Ellis, Eduserv, United Kingdom
http://hoard.it

We rapidly developed a prototype system that aggregates data, including object and event records, from museum and related Web sites. By screen-scraping the existing pages of 17 Web sites, we collected tens of thousands of data records without any technical agreement, investment or consent from the participating institutions. In this paper, we examine the reasons for and benefits of aggregating this type of data, how our approach differs from other funded projects that have similar aspirations, and the relative strengths and weaknesses of each. We present an analysis of the data, showing how the aggregate data set varies by assorted parameters, including location and date. We relate our work to the bigger picture of on-line data publishing, such as Semantic Web technologies, and suggest how the grand vision of the Semantic Web may be achievable without the complexity.
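As a rough illustration of the screen-scraping idea described in the abstract, the following minimal sketch pulls a few fields out of an object page and tallies records by one field. It is not the hoard.it implementation: the class names in `EXAMPLE_TEMPLATE`, the sample page, and the helper names `scrape` and `count_by` are all hypothetical, and a real site would need its own template.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical per-site template: CSS class found on the page -> record field.
EXAMPLE_TEMPLATE = {"obj-title": "title", "obj-date": "date", "obj-place": "location"}


class RecordScraper(HTMLParser):
    """Collects the text of elements whose class appears in the template."""

    def __init__(self, template):
        super().__init__()
        self.template = template
        self.record = {}
        self._field = None   # field currently being captured, if any
        self._tag = None     # tag name that opened the capture
        self._depth = 0      # nesting depth of that same tag name
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if self._field is not None:
            # Already capturing: only track nesting of the same tag name.
            if tag == self._tag:
                self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        for cls in classes:
            if cls in self.template:
                self._field = self.template[cls]
                self._tag = tag
                self._depth = 1
                self._chunks = []
                return

    def handle_endtag(self, tag):
        if self._field is not None and tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                # Join the captured text and normalise whitespace.
                text = " ".join("".join(self._chunks).split())
                self.record[self._field] = text
                self._field = None

    def handle_data(self, data):
        if self._field is not None:
            self._chunks.append(data)


def scrape(html, template):
    """Return one record (a dict) extracted from a single page."""
    parser = RecordScraper(template)
    parser.feed(html)
    parser.close()
    return parser.record


def count_by(records, field):
    """Tally aggregated records by one field, e.g. location or date."""
    return Counter(r.get(field, "unknown") for r in records)


# Made-up page standing in for a museum object record.
PAGE = """
<html><body>
<h1 class="obj-title">Bronze <em>figurine</em></h1>
<span class="obj-date">1850</span>
<span class="obj-place">London</span>
</body></html>
"""

record = scrape(PAGE, EXAMPLE_TEMPLATE)
print(record)
print(count_by([record], "location"))
```

Keeping the site-specific knowledge in a small template, separate from the parsing code, is one plausible way to cover many sites without any change on the institutions' side; the trade-off is that a template breaks whenever a site changes its markup.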

Session: Technology Strategies [Technology]

Keywords: collections, aggregation, API, data, scraping, Semantic Web, top down