Michael K. Buckland, Fredric C. Gey, and Ray R. Larson, University of California, Berkeley, USA

http://ecai.org/imls2004

Abstract

Learning, if it is to be more than memorizing, requires an understanding of context. A networked environment greatly increases the range and variety of accessible resources. A series of studies concerned with making better use of existing descriptive metadata is summarized: mapping between different topical vocabularies; the use and improvement of place name gazetteers; named time period directories for better chronological search and temporally-dynamic map displays; and structured mark-up for biographical texts. Embedding live queries within links and the use of intelligibly structured URLs provide substantial but inexpensive enhancements to search support. A series of modest improvements in standards and best practices will, individually and cumulatively, improve our collective ability to find materials of different kinds related to individual museum objects or for any other purpose.

Keywords: interoperability, gazetteers, metadata, search support, timelines, vocabulary mapping, mapping

Introduction

Most well-edited resources have some kind of indexing or categorization, but the “vocabularies” of categories, codes, and terms used vary widely and can be quite complex. In practice, efficient and effective searching and selection require some familiarity with whatever vocabulary is being used.

The Web increases the number and variety of resources that are accessible, but which are more or less difficult to search effectively because their vocabularies are unfamiliar. Since the manner in which remote resources are provided is beyond one’s control, the practical challenge is to find ways to support effective remote use.

We have been engaged in a series of studies of the issues to be addressed: mapping from familiar to unfamiliar vocabularies (Buckland, Chen, Chen & others, 1999), the inherent problems of searching across and between different media types (e.g. text, images, sound, and numeric data series) (Buckland, Gey & Larson, 2002; Buckland, Chen, Gey & Larson, 2006), improved geographic search (Buckland, Gey & Larson, 2004; Buckland, Chen, Gey & others, 2007), time period directories (Petras, Larson & Buckland, 2006), search by time, place, topic, and person (Buckland & Lancaster, 2004; 2006), and, currently, biographical text (Bringing, 2006).

Underlying this work is the importance of context. The difference between mere memorizing and understanding is that to understand anything you need to know about the context: the who, what, where, when, why, and how of whatever it is. So, for the learner, whether a student, teacher, or curator, the ability to search for related contextualizing resources is important.

Museum Objects and the Search for Context

Ordinarily, search support involves accepting a textual query and matching it against a collection of text resources, either textual documents or textual metadata representing other kinds of objects. In information retrieval theory it is accepted that a document can be used as a query. Any search system that can accept a textual query will not know or care whether the query is a newly-formed query or a fragment (or even the entirety) of a pre-existing text document. A function that is sometimes explicitly provided is a “nearest neighbor” search: What other document(s) most closely resemble this one?
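The idea of using a document as a query can be illustrated with a small sketch. It is only an illustration under simple assumptions: records are compared as bags of words using cosine similarity, the function names (`term_vector`, `cosine`, `nearest_neighbors`) are hypothetical, and the catalogue descriptions are invented. A production system would add term weighting, stemming, and an index.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lower-case the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_neighbors(query_text, collection, k=3):
    """Rank records in `collection` (id -> text) by similarity to the query text."""
    query = term_vector(query_text)
    scored = [(cosine(query, term_vector(text)), rec_id)
              for rec_id, text in collection.items()]
    # Highest score first; ties broken by record id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(rec_id, round(score, 3)) for score, rec_id in scored[:k]]


# Invented catalogue descriptions, used only for illustration.
records = {
    "obj-1": "Bronze oil lamp, Roman, first century, found near Pompeii",
    "obj-2": "Photograph of a lighthouse on the California coast",
    "obj-3": "Roman terracotta oil lamp with gladiator relief",
}

# Any text, including the full description of an existing record, can be the query.
matches = nearest_neighbors("Roman oil lamp in bronze", records, k=2)
```

The search function makes no distinction between a freshly typed query and the text of an existing record, which is the point made above.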

Another important principle in documentation theory is that any object considered to be signifying something can be considered a “document”, and from about 1930, European documentalists considered museum objects to be documents just as much as books and images (Buckland, 1997). In her manifesto of 1951, now finally available in English, Suzanne Briet (1951, 7; 2006, 10) explained that if a stone in a mineralogical museum or a captive antelope in a zoo is examined as evidence, it has become as much a “document” as a book or a journal article. Birger Hjørland (1997, 111) added that a stone, like any other document, would mean different things to different people, according to their interests and perspectives, and Geoffrey Bowker puts it nicely: Stones lead a double life. On the one hand they just do what rocks do, ordinarily just sitting on or in the ground, but, when examined, they reveal the history of the earth (Bowker, 2005, 36). In other words, for informational and educational purposes it is inappropriate to restrict the notion of a “document” to only printed or textual objects. In this view, a museum object is, functionally, as much a “document” as a text, an image, or a digital file. There may be technical difficulties in matching, say, biological specimens or material culture with related items in dissimilar forms, such as books, photographs, numeric data series, and sound recordings, but that is exactly what is needed to discover and select contextualizing resources.

In a recent project, we took contextualizing as a design goal, with two applications in mind. One challenge was to consider K-12 history and social science teaching and to ask what kind of search support would enable a teacher to find additional resources explaining the background on any topic, person, institution, or event mentioned in the assigned textbook. The other challenge was museum-related: Suppose one selected a museum at random and then, from its collections, picked an object at random, what kind of search support would facilitate discovering contextualizing resources in other institutions’ collections? What other objects like it can be identified elsewhere in archives, databases, libraries, or museums? What else is known to come from the same place and time or has the same purpose? What literature, archival records, images, sound recordings, or other documents relate to it specifically or, more generally, to the kind of object it is? Our assumption was that this kind of inquiry is an important part of what museum curators do, so the task was to investigate what courses of action might make the apparatus of metadata, search engines, and interoperability more supportive of their efforts.

Our project, entitled “Support for the Learner: What, Where, When and Who,” built directly on our earlier work and was supported in part by a grant from the Institute of Museum and Library Services for the period October 2004 through December 2006. It became apparent early in the project that the two design challenges were too ambitious for the time and resources available. Nevertheless, they remained inspirational while we worked on more elementary steps that, we thought, would help to build the bases for eventually addressing these challenges in a practical way. The rest of this paper summarizes what we learned and what we recommend in terms of new techniques and improved “best practices” that could move us all closer to achieving success with these two design challenges.

Recommendations for Innovation and Best Practices

WHAT – Mapping between vocabularies.

It is widely understood and accepted that the inherent ambiguity and instability of language need to be “controlled” in indexes, so, for example, synonyms should be explicitly related and hierarchical and other relationships indicated. However, in a Web environment multiple resources use quite different metadata vocabularies, so mapping between related terms in different vocabularies becomes very important, yet it has received relatively little attention. Manually relating the terms in two or more vocabularies is extremely labor-intensive and inherently obsolescent as each vocabulary evolves. There are, however, techniques using statistical association and natural language processing which can generate inexpensive mappings if a corpus of records is available as a training set (Buckland, Chen, Chen & others, 1999; Buckland, Gey & Larson, 2002; Buckland, Chen, Gey & Larson, 2006).
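As a rough illustration of how a training corpus can yield a mapping, the sketch below counts how often each free-text word co-occurs with each assigned heading and ranks headings by the summed conditional frequency. This simple score is a stand-in chosen for brevity, not the statistic used in the cited studies; the function names and the four training records are invented.

```python
import re
from collections import Counter, defaultdict


def train(records):
    """Count how often each free-text word co-occurs with each assigned heading."""
    word_counts = Counter()
    pair_counts = defaultdict(Counter)
    for text, headings in records:
        # Each word is counted once per record.
        for word in set(re.findall(r"[a-z]+", text.lower())):
            word_counts[word] += 1
            for heading in headings:
                pair_counts[word][heading] += 1
    return word_counts, pair_counts


def suggest(query, model, k=2):
    """Rank headings by the summed estimate of P(heading | word) over query words."""
    word_counts, pair_counts = model
    scores = Counter()
    for word in set(re.findall(r"[a-z]+", query.lower())):
        for heading, n in pair_counts.get(word, {}).items():
            scores[heading] += n / word_counts[word]
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [heading for heading, _ in ranked[:k]]


# Invented training records: (title text, headings assigned by a cataloguer).
training = [
    ("Restoring a coastal lighthouse lamp", ["Lighthouses"]),
    ("Lighthouse keepers of the Pacific coast", ["Lighthouses", "Lighthouse keepers"]),
    ("Village school architecture", ["School buildings"]),
    ("Rural school teachers and pupils", ["Schools"]),
]
model = train(training)
suggestions = suggest("school architecture", model, k=1)
```

Because the mapping is derived from the records themselves, it can be regenerated cheaply whenever either vocabulary changes.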

WHERE – Place, space, and gazetteers

Place names are notoriously ambiguous, multiple, unstable, and/or vague, but improved “best practices” could substantially improve support for geographical searching. It is useful to distinguish place, a cultural concept, from space, a physical construct. Place name gazetteers are best known from their appearance as large pages of small type at the back of atlases, where they also serve as indexes to the maps. But gazetteers exist in their own right as, in effect, bilingual dictionaries linking places and spaces. The importance of gazetteers is that they allow named places to be located (or represented) on a map (Hill, 2006).

Museums and others who deal with historical material have the additional problem that the common practice of using the names and boundaries of today’s political jurisdictions may make little sense when dealing with past times. Support for geographic search could be improved if a few additional steps were taken:

Authority lists of place names, as found in libraries, should either have geographical coordinates added (latitude and longitude), effectively making them into gazetteers, or, better, be linked to authoritative gazetteers maintained elsewhere.
Gazetteer entries should (but rarely do) include an indication of when that name was in use.
Now that catalogs are no longer made and presented in card form, but by computer interface, there is no reason not to provide map interfaces. The geographical distribution of retrieved records can be shown as an aid to a more refined selection. Also, drawing a region of interest on a map interface is a convenient way of expressing the geographical scope of a search: find all ceramic items in the collection from this area.
Understanding and explaining change over time is important for museums. Maps on paper have limited ability to show chronological changes. But digital maps can show changes over time dynamically (Zerneke, Buckland & Carl, 2006).
Gazetteer entries use geographical description codes (also known as feature types) to indicate the kind of place named: castle, lighthouse, lake, city, etc. These feature type codes can be linked with corresponding subject headings. Comparison of the National Geospatial-Intelligence Agency’s (NGA) Geographical Description Codes (GDC) with Library of Congress Subject Headings (LCSH) reveals differences in style, emphasis, scope, and scale. Nevertheless, in most cases there are sensible matches. For example, the NGA code “School” means a school building, and corresponds to LCSH “School buildings” (for the physical feature) and to LCSH “Schools” (for schools as institutions). Mapping between these two vocabularies allows one to move from the literature on, say, the topic of lighthouses to locating (through a gazetteer and a map) instances of lighthouses on the ground. Moving in the other direction, if you find an actual lighthouse, you could search for literature about that particular lighthouse, lighthouses in that region, or lighthouses generally.
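The steps listed above can be sketched together in a few lines. This is a minimal illustration, not a description of any existing gazetteer: the `GazetteerEntry` fields, the function names, and the feature-type table are hypothetical, and the sample coordinates and dates are approximate. The region search uses a plain latitude and longitude box and does not handle boxes that cross the 180th meridian.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GazetteerEntry:
    name: str
    feature_type: str                  # kind of place named, e.g. "city"
    lat: float                         # decimal degrees, north positive
    lon: float                         # decimal degrees, east positive
    in_use_from: Optional[int] = None  # first year the name was in use, if known
    in_use_to: Optional[int] = None    # last year in use; None means still current


# Hypothetical table linking feature types to subject headings.
FEATURE_TO_HEADINGS = {
    "school": ["School buildings", "Schools"],
    "lighthouse": ["Lighthouses"],
}


def name_in_use(entry, year):
    """True if the name was in use in the given year (unknown bounds are open)."""
    after_start = entry.in_use_from is None or entry.in_use_from <= year
    before_end = entry.in_use_to is None or year <= entry.in_use_to
    return after_start and before_end


def search_region(gazetteer, south, west, north, east, year=None):
    """Names of entries inside a box drawn on a map, optionally limited to a year."""
    return [e.name for e in gazetteer
            if south <= e.lat <= north and west <= e.lon <= east
            and (year is None or name_in_use(e, year))]


def headings_for(entry):
    """Subject headings under which literature about this kind of place is filed."""
    return FEATURE_TO_HEADINGS.get(entry.feature_type, [])


# Illustrative entries; coordinates and dates are approximate.
gazetteer = [
    GazetteerEntry("Constantinople", "city", 41.01, 28.98, 330, 1930),
    GazetteerEntry("Istanbul", "city", 41.01, 28.98, 1930, None),
    GazetteerEntry("Pigeon Point Lighthouse", "lighthouse", 37.18, -122.39, 1872, None),
]

names_in_1500 = search_region(gazetteer, 40, 28, 42, 30, year=1500)
```

Recording when a name was in use lets the same map region return different names for different dates, and the feature type gives a route from a place on the ground to the literature about places of that kind.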

WHEN – Use of events to denote time

Clocks and calendars provide the obvious way to measure and record time, but in both speech and writing, events, rather than calendar dates, are widely used to denote points or periods of time. The events are commonly wars (“Civil war,” “World War II”), reigns, dynasties, and administrations (“a Louis XIV clock,” “under Clinton”), cataclysmic events (“after the Lisbon earthquake”), or personal milestones (“after graduation”). This use of events to denote time tends to be situational, multiple, ambiguous, and unstable. A “civil war weapon” would date from the seventeenth century in England or the twentieth century in Spain. “The Great War” was renamed the “First World War” and, in some quarters, the Vietnam War is coyly referred to as the Vietnamese Conflict. These characteristics resemble those of place names, so we developed an analogous solution: a named time period directory modeled on the design of place name gazetteers (Petras, Larson & Buckland, 2006). Each entry gives the name of the period, a code for the type of period (war, dynasty, cataclysm, etc.), the corresponding calendar dates, and, just as a gazetteer entry should specify when a name was in use, the geographical context in which the period name is used, thus:

Place name gazetteer

Place name—Type—Geographical markers (lat. & long.)—When in use

Named period directory

Name of period—Type—Chronological markers (calendar)—Where used

The geographical markers of latitude and longitude supplied in the gazetteer not only allow individual places to be located on a map, but they also allow places named to be related to each other geospatially (near, between, outside, south of, etc.). Correspondingly, relating each named period to the chronological markers (calendar dates) allows events to be positioned on a time line or in a chronology, and thereby temporal relationships can be identified: What else happened during this period? The inclusion of both place and time aspects in both gazetteers and time period directories allows geo-temporal relationships. What else was near this place around that time? Descriptive metadata systems can be seen as a form of infrastructure, and these kinds of links facilitate the construction of metadata infrastructures (Buckland, 2006).
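A named period directory of this shape can be sketched as a small data structure. The field and function names are hypothetical and the three entries are illustrative. Dates are held at year granularity, which is an assumption made for brevity, so two periods that merely share a calendar year are reported as overlapping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NamedPeriod:
    name: str
    period_type: str   # e.g. "war", "reign", "cataclysm"
    start_year: int    # chronological markers (calendar)
    end_year: int
    where_used: str    # geographical context in which the name applies


def lookup(directory, name, where=None):
    """Entries matching a period name, optionally narrowed by geographical context."""
    return [p for p in directory
            if p.name.lower() == name.lower()
            and (where is None or p.where_used == where)]


def overlapping(directory, period):
    """What else happened during this period? Entries whose dates overlap it."""
    return [p for p in directory
            if p is not period
            and p.start_year <= period.end_year
            and period.start_year <= p.end_year]


# Illustrative entries only.
directory = [
    NamedPeriod("Civil War", "war", 1642, 1651, "England"),
    NamedPeriod("Civil War", "war", 1936, 1939, "Spain"),
    NamedPeriod("Reign of Louis XIV", "reign", 1643, 1715, "France"),
]

ambiguous = lookup(directory, "civil war")             # two candidate periods
english = lookup(directory, "civil war", "England")[0]  # place resolves the ambiguity
contemporaries = [p.name for p in overlapping(directory, english)]
```

The place field does for period names what the "when in use" field does for place names: it narrows an ambiguous name to the intended entry before the calendar dates are used for comparison.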

WHO – Names and activities

The desirability of disambiguating different persons with the same name and of connecting different names for the same person is well understood and widely implemented. In contrast, although biographical text is important in many contexts, the standards and best practices for representing what people do and the events in people’s lives are seriously inadequate. (For a useful survey, see Text Encoding Initiative Consortium, 2006.)

In current work we are examining the feasibility of representing the individual’s activities by expressing life activities as a set of separate activities or events, in whatever level of detail is desired, and then encoding each activity as a 4W-tuple of what kind of activity, where it took place, when and for how long it occurred, and who else was involved. The hope is that this approach could be generally acceptable across different communities and that the vocabularies already established within each community could be used to express what (topical subject headings), where (place name gazetteers), when (named time period directories), and who (name authority files and biographical dictionaries). If it is successful, the basis for a great deal of interoperability within and between communities could emerge.
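One possible encoding of such a 4W-tuple is sketched below. The field layout and helper names are assumptions made for illustration, since the approach is described above only in outline, and the life events are invented. In practice each field would hold an identifier from the relevant community vocabulary rather than a bare string.

```python
from typing import List, NamedTuple, Tuple


class Activity(NamedTuple):
    what: str              # kind of activity, from a topical vocabulary
    where: str             # place name, resolvable through a gazetteer
    when: Tuple[int, int]  # (start year, end year), giving time and duration
    who: Tuple[str, ...]   # other people involved, from a name authority file


def during(life: List[Activity], year: int) -> List[Activity]:
    """Activities under way in a given year."""
    return [a for a in life if a.when[0] <= year <= a.when[1]]


def at_place(life: List[Activity], place: str) -> List[Activity]:
    """Activities that took place at a given place."""
    return [a for a in life if a.where == place]


# An invented life, broken into separate activities.
life = [
    Activity("Education", "Paris", (1910, 1914), ("Teacher A",)),
    Activity("Employment", "Paris", (1924, 1954), ("Colleague B",)),
    Activity("Travel", "New York", (1951, 1952), ()),
]

in_1951 = [a.what for a in during(life, 1951)]
```

Because each element of the tuple points into an existing vocabulary, the same records can be searched by topic, place, time, or associated person without a separate biographical schema for each community.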

Search Support

So far we have been concerned with improvements in description. Searching can also be facilitated.

Mark-up commonly incorporates embedded links to related sources which provide further explanation or validation. At best, these links become obsolescent as newer publications appear. An alternative is to provide a dynamic link in the form of a search query. As an example, ECAI Iraq (http://ecai.org/iraq/), a temporal-spatial portal into existing digital resources about the history, cultural sites, archaeological excavations, and heritage preservation initiatives relating to Iraq, contains a series of Web pages for individual historic sites (Electronic Cultural Atlas Initiative, 2003). Clicking on the first three links on each site page automatically generates search queries, using the Z39.50 search and retrieve protocol, for material concerning the site in the library catalogues of the University of California, the research libraries of the United Kingdom, and the Library of Congress. The merit of this approach is that the material retrieved will be as up-to-date as those libraries’ cataloguing rather than entries from a static, obsolescing bibliography.
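The general idea of a link that carries a query can be sketched as follows. This is not how the ECAI Iraq pages are implemented: Z39.50 is not an HTTP protocol, so the sketch assumes a hypothetical Web gateway in front of the catalogue, and the gateway address, parameter names, and site name are all invented for illustration.

```python
import html
from urllib.parse import urlencode


def search_link(base_url, **fields):
    """Build a URL whose target is a live search rather than a fixed list of records."""
    return base_url + "?" + urlencode(fields)


def anchor(label, url):
    """Wrap the URL in an HTML link, escaping characters that are special in mark-up."""
    return '<a href="{}">{}</a>'.format(html.escape(url, quote=True),
                                        html.escape(label))


# Hypothetical catalogue gateway and parameter names.
GATEWAY = "http://catalog.example.org/search"

link = search_link(GATEWAY, subject="Nineveh", type="any")
markup = anchor("Search the catalogue for this site", link)
```

Each time the link is followed the search runs again, so the results reflect the current state of the catalogue rather than the date on which the page was written.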

The hierarchical structures of URLs are usually opaque and of little meaning, but clearly structured URLs can greatly simplify searching from remote locations. The Timeline of Art History section of the Metropolitan Museum Web site (http://www.metmuseum.org/toah/) is an excellent example of helpful design. This section uses eleven defined time periods (from “20,000-8,000 BC” through “1900 AD – present”) and, for places, a hierarchical structure starting with nine major regions. Each time period and each geographical area has a simple, easily-discerned code, clearly visible in the /toah URLs. These codes can be mapped to the categories for time and place in any local system and inserted algorithmically into a link. If one were interested in the art of southern India during the eighth century CE and knew that it would be categorized at the Met Web site by “06” for the time period 500-1,000 AD and “sss” for South South Asia, one could insert “sss” and “06” into the /toah URL extensions to form http://www.metmuseum.org/toah/ht/06/sss/ht06sss.htm, which, for this topic, is a good starting point for searching the Met’s rich collections. Mapping local metadata to the /toah codes and inserting them in a link allows one to go directly from inside any local Web page to the appropriate page within the Met’s Web site.
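The insertion of codes into a link can be sketched as below. The URL pattern is inferred from the single example given above, so whether it holds for every period and region is an assumption. The two lookup tables are hypothetical local mappings, and only the codes “06” and “sss” come from the text.

```python
# Hypothetical mappings from a local system's categories to /toah codes.
# Only "06" and "sss" are taken from the example in the text.
PERIOD_CODES = {"500-1000 AD": "06"}
REGION_CODES = {"Southern India": "sss"}

# Pattern inferred from http://www.metmuseum.org/toah/ht/06/sss/ht06sss.htm
TOAH_PATTERN = "http://www.metmuseum.org/toah/ht/{period}/{region}/ht{period}{region}.htm"


def toah_url(period_code, region_code):
    """Insert a time period code and a region code into the /toah URL pattern."""
    return TOAH_PATTERN.format(period=period_code, region=region_code)


def link_for(local_period, local_region):
    """Go from local metadata categories to the corresponding /toah page."""
    return toah_url(PERIOD_CODES[local_period], REGION_CODES[local_region])


url = link_for("500-1000 AD", "Southern India")
```

A local category with no entry in the tables raises a `KeyError`, which is a reasonable signal that the mapping needs extending before a link is published.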

The Wikipedia provides a similarly useful design for biographical searching. Creating a URL by adding “Firstname_surname” to the stem http://en.wikipedia.org/wiki/ to form, for example, http://en.wikipedia.org/wiki/John_smith will lead directly to biographical article(s) for any person(s) with that name, if any, in the Wikipedia.
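The same construction for personal names is shown below as a minimal sketch. The function name is hypothetical; spaces inside a name are replaced by underscores and other characters are percent-encoded, which is an added assumption about how names outside plain ASCII should be handled.

```python
from urllib.parse import quote

WIKIPEDIA_STEM = "http://en.wikipedia.org/wiki/"


def wikipedia_url(first_name, surname):
    """Form a link by appending "Firstname_surname" to the Wikipedia stem."""
    title = "{}_{}".format(first_name.strip(), surname.strip()).replace(" ", "_")
    # quote() leaves letters, digits, and underscores as they are and
    # percent-encodes other characters as UTF-8.
    return WIKIPEDIA_STEM + quote(title)


url = wikipedia_url("John", "smith")
```

The link only leads somewhere useful if an article or redirect with that title exists, so a local system would normally generate it from an authority-controlled form of the name.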

Conclusion

Learning, if it is to be more than memorizing, requires an understanding of context. The Web greatly increases the range of accessible resources capable of providing context. However, Internet-accessible resources vary greatly in the metadata vocabularies used to describe and index their resources and in the search support provided. The use of standards should, of course, be encouraged, but mandating the use of the same systems or the same vocabularies is neither possible nor desirable. However, the wider adoption of improved best practices can greatly facilitate the incremental development of interoperability.

Searching by topic needs mapping between different topical vocabularies, and automated methods can often provide inexpensive but useful mappings. Place name gazetteers are pivotal for geographic search because they link places with spaces and enable both map visualizations and analysis of spatial relationships between places. A named time period directory can play an analogous role for events, dates, and chronological relationships. Searching for people by name is a well-understood problem, but relating the events in their lives to contextualizing resources needs more development. Searching can be supported by embedding live queries in mark-up and by using meaningful codes in structured URLs.

Picking a museum object at random and finding what materials of different kinds elsewhere are most closely related to it and to its context requires substantially better indexing and interoperability than is currently provided, but a series of improvements in standards and best practices will, individually and cumulatively, help us to advance in that direction.

Acknowledgments

Aitao Chen, Kimberly Carl, Ruth Mostern, Linda-Cathryn Muehlinghaus, Vivien Petras, Jeanette Zerneke, and others contributed to the work reported, which was funded in part by the Institute of Museum and Library Services through National Leadership Grant for Libraries no. LG-02-04-0041-04 to the Electronic Cultural Atlas Initiative at the University of California, Berkeley.

References

Additional information on the topics discussed can be found on and through the “Support for the Learner: What, Where, When and Who” project website at http://ecai.org/imls2004.

Bowker, G. C. (2005). Memory practices in the sciences. Cambridge, MA: MIT Press.

Briet, S. (1951). Qu’est-ce que la documentation? Paris: EDIT. Also available at http://martinetl.free.fr/briet.pdf, consulted January 30, 2007.

Briet, S. (2006). What is documentation? Lanham, MD: Scarecrow Press. Transl. of French edition, Paris, 1951. Also available at http://ella.slis.indiana.edu/~roday/what is documentation.pdf, consulted January 28, 2007.

Bringing lives to light: Biography in context. (2006). [Project website]. Consulted January 28, 2007. Available at http://ecai.org/imls2006/

Buckland, M. K. (1997). What is a “document”? Journal of the American Society for Information Science 48, no. 9 (Sept 1997): 804-809. Also available at http://www.sims.berkeley.edu/~buckland/whatdoc.html, consulted January 28, 2007.

Buckland, M. K. (2006). Description and search: Metadata as infrastructure. Brazilian Journal of Information Science vol 0 (2006). Consulted January 28, 2007. Available at http://www.ischool.berkeley.edu/~buckland/Brazil06.pdf

Buckland, M. K. (2007). Naming in the library: Marks, meaning and machines. In Nominalization, nomination and naming in texts, ed. by Christian Todenhagen & Wolfgang Thiele. Tübingen, Germany: Stauffenburg Verlag, forthcoming.

Buckland, M. K., A. Chen, H.-M. Chen, Y. Kim, B. Lam, R. R. Larson, B. Norgard & J. Purat. (1999). Mapping entry vocabulary to unfamiliar metadata vocabularies. D-Lib Magazine 5, no. 1 (Jan 1999). Consulted January 28, 2007. Available at http://www.dlib.org/dlib/january99/buckland/01buckland.html

Buckland, M. K., A. Chen, F. C. Gey & R. R. Larson. (2006). Search across different media: Numeric data sets and text files. Information Technology and Libraries 25, no 4 (Dec 2006): 181-189.

Buckland, M. K., A. Chen, F. C. Gey, R. R. Larson, R. Mostern & V. Petras. (2007). Geographic search: Catalogs, gazetteers, and maps. College & Research Libraries Forthcoming Sept 2007.

Buckland, M. K., F. C. Gey & R. R. Larson. (2002). Seamless searching of numeric and textual resources. Final report on Institute of Museum and Library Services National Library Leadership Grant No. 178. Berkeley: University of California, School of Information Management and Systems. Consulted January 28, 2007. Available at http://metadata.sims.berkeley.edu/papers/SeamlessSearchFinalReport.pdf

Buckland, M. K., F. C. Gey & R. R. Larson. (2004). Going places in the catalog: Improved geographic access: Final report. Consulted January 28, 2007. Available at http://ecai.org/imls2002/imls2002-final_report.pdf

Buckland, M. K. & L. R. Lancaster. (2004). Combining time, place, and topic: The Electronic Cultural Atlas Initiative. D-Lib Magazine 10, no. 5 (May 2004). Consulted January 28, 2007. Available at http://www.dlib.org/dlib/may04/buckland/05buckland.html

Buckland, M. K. & L. R. Lancaster. (2006). Advances in discovery: The Electronic Cultural Atlas Initiative experience. First Monday. 11, no. 8 (August 2006). Consulted January 28, 2007. Available at http://www.firstmonday.org/issues/issue11_8/buckland/index.html

Electronic Cultural Atlas Initiative. (2003). ECAI Iraq. Consulted January 28, 2007. Available at http://ecai.org/iraq/

Hill, L. L. (2006). Georeferencing: the geographic associations of information. Cambridge, MA: MIT Press.

Hjørland, B. (1997). Information seeking and subject representation. Westport, CT: Greenwood Press.

Petras, V., R. R. Larson & M. K. Buckland. (2006). Time period directories: A metadata infrastructure for placing events in temporal and geographic context. In Opening information horizons: Joint Conference on Digital Libraries (JCDL), Chapel Hill, NC, June 11-15, 2006. Consulted January 28, 2007. Available at http://metadata.sims.berkeley.edu/tpdJCDL06.pdf

Text Encoding Initiative Consortium. (2006). Report on XML mark-up of biographical and prosopographical data. Consulted January 28, 2007. Available at http://www.tei-c.org/Activities/PERS/persw02.xml

Zerneke, J. L., M. K. Buckland & K. Carl. (2006). Temporally dynamic maps: The Electronic Cultural Atlas Initiative experience. Human IT 8.3: 83–94. Consulted January 28, 2007. Available at http://www.hb.se/bhs/ith/3-8/jzmbkc.pdf

Cite as:

Buckland, M., et al., Access to Heritage Resources Using What, Where, When, and Who, in J. Trant and D. Bearman (eds.). Museums and the Web 2007: Proceedings, Toronto: Archives & Museum Informatics, published March 1, 2007 Consulted http://www.archimuse.com/mw2007/papers/buckland/buckland.html

Editorial Note