October 24-26, 2007
Toronto, Ontario, Canada

Paper: Constructing an Integrated Digital Archive Using Ontology and the User Community

Norio Togiya and Akira Baba, The University of Tokyo, Japan

Abstract

In conventional digital archive systems, different archives tend to be created if the types or series of materials are different. In this study, an integrated digital archive system has been developed to aggregate all types or series of materials into one archive. An archive is built and operated with a function to visualize a variety of relationships between the aggregated materials through the utilization of ontology as well as a community function that enables the users to feed back various information about the materials.

Keywords: ontology, digital archive, historical material, on-line community

Introduction

Digitizing cultural resources such as historical information primarily provides the following three benefits:

  1. temporal and physical barriers to viewing can be removed,
  2. more information can be collated, and
  3. multiple individuals can edit and modify information.

Various achievements have been gained with regard to removing physical and temporal barriers to viewing, while collation of information and collaborative editing remain important areas for focusing future efforts. By utilizing the characteristics and advantages of collating information and collaborative editing, it will also be possible to develop ways to resolve a variety of issues that historically arise when handling physical materials in the course of research.

Traditionally, physical materials tend to be stored separately from each other depending on their form or content. It has also been difficult to edit or modify metadata in a multifaceted manner in paper or other catalogs when needed. Consequently, as shown by the social-historical research of the Ecole des Annales and others, many problems arise when analyzing a variety of materials from a macro perspective, or when handling materials such as artwork that may have a variety of interpretations. Examples include the inability to collate all materials in one place and the inability to record and maintain updates.

However, the collation of information in digital form makes it possible to store a greater variety of information from a macro perspective and to build relationships between these materials from a multifaceted perspective. Collaborative editing enables the information to be edited by more people in a multifaceted way. The primary objective of this study is to build a digital archive based on these two principles. To realize this objective, the archive for this study was built on three key pillars: versatile metadata capable of storing information in an integrated manner, ontology for categorizing diverse materials according to a variety of relationships, and a community through which information can be freely edited.

In this paper, we explain the process of constructing an integrated digital archive that can respond to information given by users. In Section 1, the concept of digital archives is explained. In Section 2, a practical composition method of metadata is given. In Section 3, the process of constructing ontology is given. In Section 4, community functions in the digital archive are described. Finally, in Section 5, the summary and future tasks are discussed.

1. Concept of digital archives

1.1 Basic policy of digital archives

Various types of digital archives have been constructed so far. The following definitions of a digital archive have been used in this study:

  1. The digitization of cultural and historical materials to preserve cultural and historical heritage.
  2. The materials are ordered and arranged to trace the various contexts of the materials, for example, original order, human relationships, historical incidents, and other spatial and temporal conditions regarding the inception of the materials.

The basic elements of an archive are the preservation function and the manifestation of the context of the material. The digital content or database system called a “digital archive” in this study achieves both these features.

An issue with these archives is that they have been constructed separately according to the type of material and group of materials. However, digital archives also have the characteristic that various types of data can be handled as digital data in a cross-sector manner and it is desirable that documents can be synthetically handled across era, type, and genre.

Generally, materials stored in an archive are selected from a list or according to search strings; however, for historical materials, it is ideal that they be selected through an understanding of the relationship between various data items such as figures, social organizations, historical events, and eras related to the material. Moreover, for metadata associated with a particular material, the content needs to be corrected and added after disclosure. It is particularly desirable that metadata information be checked from various viewpoints not only by those who construct the archive but also by users. Users may add new facts and interpretations to image materials; consequently, they should be analyzed from many viewpoints and using various related materials. For this reason, it is desirable to provide a function to forward the opinions and remarks made regarding the materials by users back to archive administrators. It is considered that providing this function may improve the accuracy of the content.

For these reasons, this study 1) proposes the synthetic storage of various types of materials into one archive; 2) enables the selection of materials through an understanding of the relationship between figures, organizations, places, and things regarding the materials; and 3) provides tools with which users can provide or exchange information on a material and its metadata. The goal of this study is to construct an archive that satisfies the abovementioned features.

1.2 Target materials

The materials to be handled in this study are “historical picture materials” centering around photographs taken by Hikoma Ueno, a photographer who worked in Nagasaki from the end of the Edo Period to the Meiji Period; materials worked upon by Shogorou Tsuboi, an anthropologist and archeologist during the Meiji Period who was one of the discoverers of Yayoi ware; materials on the Tsuboi House, a house from the Taisho Period constructed by Seitaro Tsuboi, a famous petrologist and the son of Shogorou Tsuboi; and Sekisui-zu, a map drawn by Sekisui Nagakubo, a geographer of the Edo Period. These materials have been held in separate archives in the past; however, in this study, they are integrated into one archive.

1.3 General structure of digital archives

The general structure and basic functions of the archive were designed to achieve the goals described in Section 1.1 before the archive was constructed. The basic functions are “Material list” (Fig 1), “Ontology map” (Fig 2 and Fig 3), “Map and chronology”, “Search”, and “Community” (Fig 4). “Material list” provides a function to view all the materials regardless of their groups and types. Images are viewed with iPallet/Lime [http://www.ipallet.org], a viewer developed by iPallet that scales highly accurate images to a desired size. Users can also add electronic tags or annotations to the material (Fig 5). “Ontology map” provides a function to select materials through an understanding of the relationships between various items. These items comprise the people, organizations, times, places, objects, and abstractions related to the materials. The relationships between these items are visualized as a chart from which materials can be selected. “Map and chronology” provides a function to place the materials on a map and a chronological table regardless of their types and groups. With these functions, different types of materials can be grouped according to new viewpoints such as geographical conditions and proximity in time. “Search” enables searching across all four material groups and displays information on related items. “Community” enables comments to be entered on the entire material group, on each material, and on each item, and provides a discussion function regarding these materials. These are all of the basic functions provided by this archive. The metadata design, the relating of the various characteristics of the materials, and the design of the community functions that realize them are described in the subsequent sections.

Figure 1. Main interface of “Material list”

Figure 2. Ontology map

Figure 3. Search window

Figure 4. Community function

Figure 5. Annotation on the digitized material

2. Metadata

The most important element in integrating various types of materials is the design of a versatile metadata element set. As mentioned above, four different groups of materials and various types of materials, for example, documents, maps, photographs, and books, were integrated into this archive. Therefore, we had to design a versatile metadata element set that can be applied to all of these types. In our previous projects, a versatile metadata element set based on the Dublin Core (http://www.dublincore.org) was designed for each group of materials [Togiya, 2005a] [Togiya, 2005b] [Togiya, 2004a] [Togiya, 2004b].

The basic construction of these element sets was carried over into a single versatile metadata element set based on the Dublin Core in order to integrate the various types of materials efficiently. The metadata consists of three parts. Part A employs the Dublin Core to store the minimum information about the material, for example, title, creator, and date. Parts B and C contain elements describing the physical object and the digital data, respectively. In this way, a single metadata file can concurrently contain three divisions: A, information about the contents of the resource (i.e., the Dublin Core); B, information about the physical object; and C, information about the digital data (Table 1). This versatile metadata element set made it possible to show all the materials in the material list window.
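The three-part record can be illustrated with a small data structure. The following Python sketch is only an illustration under stated assumptions: the class name `MaterialRecord`, its field names, and the sample values are hypothetical and are not taken from the archive's implementation; only the division into parts A, B, and C and the element names follow Table 1.

```python
from dataclasses import dataclass, field

# Element names follow Table 1; class and field names are hypothetical.
PART_A = {"title", "creator", "subject", "description", "publisher",
          "contributor", "date", "type", "format", "identifier", "source",
          "language", "relation", "spatial", "temporal", "rights"}
PART_B = {"form", "comment", "rights", "identifier"}
PART_C = {"producer", "date", "specification", "identifier", "comment",
          "rights"}


@dataclass
class MaterialRecord:
    content: dict = field(default_factory=dict)   # Part A: Dublin Core
    physical: dict = field(default_factory=dict)  # Part B: physical object
    digital: dict = field(default_factory=dict)   # Part C: digital data

    def unknown_elements(self):
        """Return, per part, element names that Table 1 does not define."""
        return {
            "A": sorted(set(self.content) - PART_A),
            "B": sorted(set(self.physical) - PART_B),
            "C": sorted(set(self.digital) - PART_C),
        }

    def list_entry(self):
        """Minimum information needed to show the material in a list."""
        return (self.content.get("identifier", ""),
                self.content.get("title", ""),
                self.content.get("creator", ""))


# Invented sample values: a photograph and a map share one record shape.
photo = MaterialRecord(
    content={"identifier": "photo-001", "title": "Sample photograph",
             "creator": "Hikoma Ueno"},
    physical={"form": "print (assumed)"},
    digital={"specification": "TIFF (assumed)"},
)
sekisui_map = MaterialRecord(
    content={"identifier": "map-001", "title": "Sekisui-zu",
             "creator": "Sekisui Nagakubo"},
)

for record in (photo, sekisui_map):
    print(record.list_entry())
```

Because every material type uses the same part A, a list view only needs to read those elements, while parts B and C can remain empty or differ freely between material types.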

3. Ontology

3.1 Ontology definition

An important goal of this archive is to select and view each material through an understanding of the various materials and the relationships between the characteristics related to them. To achieve this, we decided to use ontology to relate the various items. In this project, ontology is characterized by the following four aspects:

  1. defining concepts, characteristics, and meanings of various items,
  2. aiming at systematizing various concepts and items in the world,
  3. aiming at generality, which makes materials reusable, and shared knowledge,
  4. describing items with rules and in a language based on certain regulations.

The word ontology derives from the Greek for “being” or “existence” and has long been used in philosophy. However, with the development of artificial intelligence and knowledge engineering since the late 20th century, it began to be used as a term referring to a “semantic systematization method” for various items in the world, intended to facilitate their understanding by machines. Meanwhile, as the Internet spread in the late 1990s, Tim Berners-Lee proposed the Semantic Web [Berners-Lee, 2001] as a more effective method of connecting fragments of information on the Web, and ontology came to refer to the methodology used to describe content created on the Web with more regulated rules and language.

Various types of ontologies have been developed through the course of time, such as “Upper ontology” aiming at the systematization of various items in the world based on philosophical discussions, “Domain ontology” developed to express the edifice of knowledge in a specific domain (mainly for industrial use), and “Web ontology” developed to systematize information and knowledge on the Web. [Mizoguchi, 2005]

“Upper ontology” and “Domain ontology” address items 1), 3), and 4) with the aim of achieving item 2) defined above. “Web ontology” initially had the practical purpose of developing item 4), that is, defining the relationships between information items when creating content, using item 1) to achieve item 3); this can be seen in the combination of the Dublin Core and RDF technologies. The ontology of CIDOC CRM [http://cidoc.ics.forth.gr/], which is used to describe the metadata of cultural resources, seems to belong to this type. Recent advances made in “Web ontology” aim at the semantic systematization of content on the Web, which was its initial objective. In this context, it may be said that it has advanced in the area of item 2) by replacing the world with the Web. However, since the system of concepts on the Web is a reflection of the system of things in the real world, it remains to be seen how the two will be unified.

In anticipation of such a union in the future, this study systematizes historical information (item 2) by means of item 1), and aims at item 3) by using item 4) in a form that can later be unified with Web ontology and similar systems. For now, we are mainly focusing on implementing items 1) and 2).

3.2 Outline of ontology

As described above, various characteristics related to each material, such as figures, organizations, geographic names, objects, and abstraction, are described and these are individually related. There are very few precedents in which ontology is used for the systematization of historical items. However, the University of Maryland has attempted to express the relationship between figures and places in stories with ontology as a knowledge comprehension tool for a database collecting oral histories, etc. [White, 2006] In the Historical Event Markup and Linking (HEML) project [http://www.heml.org/], various historical events in the world are described in original XML. The former involves only historical events in certain areas, while the latter involves even extensive descriptions in the history of the world. We decided to systematize these extensive items, including abstract objects such as learning, thoughts, objects, actions, and social organizations in this project to target not only specific historical materials, particularly modern Japanese materials, but also any material and historical event.

General ontologies do not include proper nouns or figures that actually existed, and stop at the level of general concepts. However, we decided to construct an expandable ontology that connects figures, organizations, and geographical names that existed in history to the general concepts as historical instances.

3.3 Outline of ontology

We decided to incorporate the upper ontology system of Guarino [Guarino, 1997], which is constructed on philosophical discussions, as a foundation to systematize various items. Upper ontology is a concept system that separates the concepts of many items, as described above, into elements such as “foundation (time and space)”, “concrete objects (objects and processes)”, “abstract objects”, “quality”, “quantity”, “roles”, and “relationships”. These elements are connected by relational labels such as “is a” and “part of”. Based on the existing upper ontology, the digital archive in this study is constructed as a system that defines various concepts and items in the material world, centered on the following concepts: “A: Space”, “B: Time”, “C: Concrete objects”, “D: Abstract objects”, “E: Attributes”, “F: Quantity”, “G: Roles”, “H: Events”, “I: Expression forms”, “J: Actions”, “K: Society”, and “L: Phenomena”. Relationships among these concepts were constructed using the relational labels shown in Table 2. With these, the relationships among various characteristics such as time, space, figures, organizations, material, and abstract objects were constructed.

More concretely, the residential areas, activity fields, years of birth and death, social organizations, and historical events of the main figures in the four material groups were related. In addition, relationships were described between these figures and their parents, brothers/sisters, colleagues, and masters/pupils, as well as between the classification systems of the academic fields related to the materials, the figures related to those fields, and the objects studied. Furthermore, the relationships between various matters regarding the materials, such as historical events, related eras, and relationships of succession between regions and organizations, were described.
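Relationships of this kind can be pictured as labeled links between items. The following Python sketch is a minimal illustration, not the archive's actual data model (the ontology was stored in an XML database): the `Ontology` class and its methods are hypothetical, the labels are a small subset of the relational labels in Table 2, and the classification of the sample items (for example “Academic field”) is an assumption made for the example.

```python
from collections import defaultdict

# A few labels copied from Table 2; the full set is larger.
LABELS = {"isA", "partOf", "locatedIn", "majorIn", "rel:parentOf"}


class Ontology:
    """Hypothetical in-memory store of (subject, label, object) links."""

    def __init__(self):
        self._out = defaultdict(list)  # subject -> [(label, object), ...]

    def relate(self, subject, label, obj):
        if label not in LABELS:
            raise ValueError(f"unknown relational label: {label}")
        self._out[subject].append((label, obj))

    def objects(self, subject, label):
        """Objects linked from `subject` by `label`."""
        return [o for (l, o) in self._out.get(subject, []) if l == label]

    def is_a(self, item, concept):
        """True if `concept` is reachable from `item` via isA links."""
        seen, stack = set(), [item]
        while stack:
            current = stack.pop()
            for parent in self.objects(current, "isA"):
                if parent == concept:
                    return True
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return False


onto = Ontology()
# General concepts (the intermediate class is an assumption).
onto.relate("Anthropology", "isA", "Academic field")
onto.relate("Academic field", "isA", "D: Abstract objects")
# Historical instances attached to the general concepts.
onto.relate("Shogorou Tsuboi", "majorIn", "Anthropology")
onto.relate("Shogorou Tsuboi", "rel:parentOf", "Seitaro Tsuboi")

print(onto.is_a("Anthropology", "D: Abstract objects"))
print(onto.objects("Shogorou Tsuboi", "rel:parentOf"))
```

Restricting links to a fixed label vocabulary keeps the instance level (named figures and places) expandable without changing the general concept level.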

Meanwhile, the items related to each material were stored as keywords in the “Dc:subject” element of that material's metadata. In this way, the items and materials were related through metadata.
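The link between items and materials through the subject element amounts to an inverted index. The sketch below is a hedged illustration in Python: the function name `build_subject_index`, the material identifiers, and the sample keyword lists are all hypothetical; only the use of the “Dc:subject” element as the carrier of item names comes from the text.

```python
from collections import defaultdict


def build_subject_index(records):
    """Map each ontology item to the set of material ids that name it."""
    index = defaultdict(set)
    for material_id, metadata in records.items():
        for item in metadata.get("Dc:subject", []):
            index[item].add(material_id)
    return dict(index)


# Invented sample metadata; identifiers and keywords are placeholders.
records = {
    "photo-001": {"Dc:subject": ["Hikoma Ueno", "Nagasaki"]},
    "tsuboi-014": {"Dc:subject": ["Shogorou Tsuboi"]},
    "house-003": {"Dc:subject": ["Seitaro Tsuboi", "Shogorou Tsuboi"]},
}

index = build_subject_index(records)
print(sorted(index["Shogorou Tsuboi"]))
```

With such an index, materials from different groups that share an item (here a common figure) can be retrieved together.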

3.4 Using ontology in archives

The constructed ontology was stored in an XML database based on the iPallet/Kumu system, and the relationships between the items were visualized as a chart, as shown in Fig 2. Where materials related to an item exist, they are displayed on the chart, and clicking on one displays the material. This enables browsing of the materials through an understanding of the relationships between the various items, and relates materials belonging to different groups, such as the materials on Shogorou Tsuboi and the historical pictures, through common figures.

When a material is displayed, the words for the items related to it are shown in the “Key Concept” field of its metadata, as shown in Fig 3. Clicking such a word displays the correlation chart to help understand its relationship with any other related item.

Displaying the items related to a search word in the correlation chart, as shown in Fig 4, gives access to the materials related to those items as well as to the entered word. Materials can thus be selected through an understanding of the relationships between the items.
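The search behavior described here, in which a search word also leads to the materials of related items, can be sketched as a one-step expansion over the ontology links. This Python example is an assumption-laden illustration: the function names, the choice to ignore link direction, the one-hop limit, and the sample data are all hypothetical and do not describe the iPallet/Kumu implementation.

```python
def neighbours(triples, item):
    """Items directly linked to `item`, ignoring link direction."""
    linked = set()
    for subject, _label, obj in triples:
        if subject == item:
            linked.add(obj)
        elif obj == item:
            linked.add(subject)
    return linked


def materials_for(term, triples, subject_index):
    """Materials tagged with `term`, grouped with those of linked items."""
    results = {}
    for item in sorted({term} | neighbours(triples, term)):
        hits = subject_index.get(item, set())
        if hits:
            results[item] = sorted(hits)
    return results


# Invented sample links and index; labels are taken from Table 2.
triples = [
    ("Shogorou Tsuboi", "rel:parentOf", "Seitaro Tsuboi"),
    ("Shogorou Tsuboi", "majorIn", "Anthropology"),
]
subject_index = {
    "Shogorou Tsuboi": {"tsuboi-014"},
    "Seitaro Tsuboi": {"house-003"},
}

print(materials_for("Seitaro Tsuboi", triples, subject_index))
```

In this sample, a search for one figure also surfaces a material from a different group through the parent link, which is the cross-group browsing the chart is meant to support.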

4. Community

This archive provides the functionality to correct errors in, and add information to, each stored item and its metadata through a membership-based community. An example approximating this function is the Drexel Digital Museum Project [Martin, 2004] in the United States. In that project, an archive of images (e.g., garments) allows comments to be entered about each material and also provides a community function. Digital archives with this function are not common.

However, new information and facts to be corrected are often found. In addition, transcription errors may be found in literary materials. For this reason, a feedback function is provided in the community to improve the reliability of the information contained in the metadata.

Two functions are provided in the community: 1) a BBS “Chat Room” in which users can post messages and opinions and 2) an electronic note function allowing users to make comments on the material. The chat room includes topics that span materials as well as three main categories in which comments on the materials and items are made. With the electronic note function, selecting the region of a material to be commented on opens a form in which the user can write and save a comment. Other members can view and discuss these comments.
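The electronic note function, in which a comment is attached to a selected region of a material and can then be discussed, can be sketched as follows. This Python example is illustrative only: the class names `Note` and `AnnotatedMaterial`, the rectangular pixel region, and the sample users and comments are assumptions, not a description of the archive's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Note:
    author: str
    text: str
    region: tuple  # (x, y, width, height) in image pixels (assumed)
    replies: list = field(default_factory=list)

    def covers(self, x, y):
        """True if the point (x, y) lies inside this note's region."""
        left, top, width, height = self.region
        return left <= x < left + width and top <= y < top + height


@dataclass
class AnnotatedMaterial:
    material_id: str
    notes: list = field(default_factory=list)

    def add_note(self, author, text, region):
        """Save a comment on the selected region and return it."""
        note = Note(author, text, region)
        self.notes.append(note)
        return note

    def notes_at(self, x, y):
        """Notes whose region contains the given point."""
        return [n for n in self.notes if n.covers(x, y)]


# Invented sample: one member comments, another replies.
material = AnnotatedMaterial("photo-001")
note = material.add_note("member_a",
                         "The date in the metadata may be wrong.",
                         (100, 50, 200, 120))
note.replies.append(("member_b", "A related material suggests the same."))

print(len(material.notes_at(150, 100)))
print(len(material.notes_at(0, 0)))
```

Keeping replies with the note ties the discussion to the exact region concerned, which makes it easier for an administrator to trace a suggested metadata correction back to its evidence.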

The provision of these functions facilitated active discussion among the users: nearly 300 comments were posted during testing. Errors in the metadata and ontology were pointed out in the chat room and corrected.

5. Summary and future tasks

We tested the abovementioned digital archive [http://cr-arch.chi.iii.u-tokyo.ac.jp/] for 18 days, from October 3 to 20, 2006. Approximately 50 researchers and students who studied the stored materials participated in this test. This section provides the results of this test and future tasks. We successfully constructed the archive to achieve three goals: 1) systematically handling materials of various types and in different groups, 2) selecting materials through an understanding of the relationships between items related to these materials, and 3) commenting on and discussing various pieces of information on the materials. This received uniform ratings from the users.

Meanwhile, testing revealed various tasks. One task in integrating various material groups is that not all users want to view all the material groups. Consequently, it should be possible to set the range of materials displayed according to user needs, while keeping the function to view various materials in a cross-sector manner at the center.

In addition, a more easily viewable interface should be constructed for users who want to view various materials. In this study, four material groups were integrated; however, if more groups are added in the future, the interface may become more complicated. This requires constructing a more easily viewable interface with which more material groups can be integrated.

Meanwhile, regarding ontology, items appearing frequently in the materials tended to have many links to other items, rendering the chart difficult to view. At present, it also takes a long time to display the materials, and we often felt that operation was sluggish. As a result, the interface needs improvement.

The ontology constructed in this study was only at the stages of items 1) and 2) defined in Section 3.1, but it should advance to items 3) and 4) in the future. With regard to metadata, a part of the Dublin Core should be related to the descriptions adapted for the Web ontology. In addition, data collaboration with other digital archives using the metadata adapted for the Web ontology should be investigated. Furthermore, the ontology for the description of metadata and the ontology on historical items should be merged. These subjects are shown in Fig 6.

Figure 6
Figure 6. Tasks on future ontologies

For the community function, only some of the users tended to join the discussions. As the discussions progressed, these users generated an atmosphere that made other users feel uncomfortable about joining in; a congenial environment in which various types of people can easily join in should therefore be provided. A moderator who organizes the content of the discussions is required to operate the chat room, but this is time-consuming. It is desirable that a suitable user becomes a moderator naturally in the course of the proceedings. To achieve this, it is important to construct a community in which users can divide roles among themselves.

With the community function, only the archive administrator responded to the feedback from the users in this test. However, a workflow that can handle investigations requiring a certain period of time and involving more people should be established in the future. Operationally speaking, establishing an operational flow for community and metadata updates seems to be a major task.

The above outlines this study. The three elements presented in this study may be important issues in constructing digital archives in the future in terms of how knowledge information is stored in the archive. This may require establishing better technology and methodology with further demonstrations in the future.

Metadata Elements

A: Information about contents of resource (Dublin Core)

  Element name (English)   Element description
  title                    Title of the resource
  creator                  Name of the creator of the resource
  subject                  Theme of the resource
  description              Simple explanation of the resource contents
  publisher                Publisher and editor of the resource
  contributor              Contributor to the creation of the resource
  date                     Year the resource was created
  type                     Not used
  format                   Not used
  identifier               Identification number of the resource
  source                   Source of the resource
  language                 Language(s) used on the resource
  relation                 Relationship with other resources
  coverage
    spatial                Location handling the resource
    temporal               Time period handled on the resource
  rights                   Rights of the resource

B: Information about physical objects

  form                     Detailed information on the physical shape of the resource
  comment                  Detailed explanation of the contents of the resource
  rights                   Holder of the resource
  identifier               Identification number to manage the corresponding physical shape of the resource

C: Information about digital data

  producer                 Person responsible for digitization of the resource
  date                     Date of digitization
  specification            Information on digital data specifications
  identifier               Identification number of the digital data
  comment                  Commentary information on the digital data
  rights                   Rights and person in charge of the digital data and its disclosure

Table 1. Description rules for each metadata element

Relational Labels

  Category               Labels
  Basic                  isA, temporaryIsA, partOf, temporarypartOf
  Physical relations     consistOf, connectedTo, contain, own, interconnect, branchOf, ingredientOf
  Spatial relations      locatedIn, locationOf, adjacentTo, surround, traverse
  Functional relations   treat, interactWith, visitTo, succeedTo, researchIn, majorIn, liveIn, resultsOf, co-occursWith
  Conceptual relations   analyzes, derivativeOf, developmentalformOf, methodOf, issueIn, relatedTo
  Human relations        rel:worksWith, rel:spouseOf, rel:siblingOf, rel:parentOf, rel:knowsOf, rel:knowsInPassing, rel:knowsByReputation, rel:hasMet, rel:isACompetitorOf

Table 2. Relational labels

References

Berners-Lee, T. (2001), The Semantic Web, Scientific American 284(5), 34–44

Guarino, N. (1997), Some Organizing Principles for a Unified Top-level Ontology, Working Notes of AAAI Spring Symposium on Ontological Engineering

Martin, K. (2004), The Role of Standards in Creating Community, Proceedings of the 13th International World Wide Web Conference, 35–41

Mizoguchi, R. (2005), Ontology Kougaku, Ohmu Publishing Co., Tokyo, 3–10

Togiya, N., Kuramochi, M., Baba, A. (2005a), Designing Digital Archive about Historical Photograph, Digital Libraries No.27&28, 40–48

Togiya, N., Baba, A. (2005b), Constructing Real Digital Archive of Architectural Material: Case Study of Seitaro Tsuboi’s Residence, Journal of the Japan Society for Archival Science No.4, 50–70

Togiya, N., Tsuda, M., Baba, A. (2004a), Providing Metadata to Historical Material on Viewer Application iPalletnexus, Proceedings of the International Conference on Dublin Core and Metadata Applications, 187–194

Togiya, N., Baba, A. (2004b), Practice and Verification of Providing Metadata to Modern Private Material, Proceedings of Computers and Humanities, 91–98

White, R. W., Song, H., Liu, J. (2006), Concept Maps to Support Oral History Search and Use, Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries, 192–194

Acknowledgements

We would like to thank “21st Century COE Program: Next Generation Ubiquitous Information Society Infrastructure in the University of Tokyo (Leader: Dr. Ken Sakamura)” and the members of the program, Motoi Kuramochi, Sonia Oshinma, Daisuke Ymashita, Yoshihumi Matsuda, Kenichi Tsuchida, Mamiko Ito, Shinji Yamane, Toshitaka Nakamura, Yuta Tkahashi, Shinsuke Kobayashi, Masataka Yoshida, and Tsutomu Soeno. Further, we would like to thank Takeshi Ozawa, Ichirou Ueno, Atsushi Sekikawa, Masamichi Tsuboi, Naomichi Tsuboi, Hashimu Nagakubo, Harutaka Miyoshi, Yasunori Yamamoto, Isao Yokoyama, The Sanno Institute of Management, National Museum of Ethnology, Takahagi Historical and Folklore Museum, Nagakubo Sekisui Kensho-kai, and The Minato City Local History Museum for providing the materials.

Finally, we would like to give special thanks to Mitsuhiro Tsuda, ipallet, Horiuchi Color Ltd., and Toppan Printing Co., Ltd. for co-researching and developing the digital archive.

Cite as:

Togiya, N., and A. Baba, Constructing an Integrated Digital Archive Using Ontology and the User Community, in International Cultural Heritage Informatics Meeting (ICHIM07): Proceedings, J. Trant and D. Bearman (eds). Toronto: Archives & Museum Informatics. 2007. Published October 24, 2007 at http://www.archimuse.com/ichim07/papers/togiya/togiya.html