MW2001

Archives & Museum Informatics
158 Lee Avenue
Toronto, Ontario
M4E 2P3 Canada
info@archimuse.com
www.archimuse.com


Published: March 15, 2001.

Papers

Building on the mda SPECTRUM-XML DTD for Collections Management Data Interchange

Bert Degenhart Drenth, ADLIB Information Systems, The Netherlands

Abstract

The Museum Documentation Association, working in collaboration with CIMI and other organizations, has developed an XML DTD based on SPECTRUM, an established museum process and documentation standard. Of particular interest to the museum community is the potential this DTD offers for the exchange of collections-based information. Imagine being able to migrate data from your institution's collections management system to another with the click of a button. This application is the basis for CIMI's test bed this year. CIMI member Bert Degenhart Drenth, Managing Director of ADLIB Information Systems, will discuss the impact of the SPECTRUM XML DTD on data migration.

Keywords: SPECTRUM, XML, DTD, Data Interchange, Interoperability, mda, CIMI, test bed, CIDOC Conceptual Reference Model

Introduction

SPECTRUM is a publication of the UK mda. It covers the complete spectrum of activities that are performed to document and manage a museum collection. The SPECTRUM architecture consists of two components: procedures and units of information. The SPECTRUM standard defines 20 procedures, each of which describes the steps that need to be carried out to perform a specific task in a museum. Examples of such tasks are the entry of objects into the museum, loans of objects (in and out) and risk assessment. A separate chapter is devoted to each of these procedures. At the end of each chapter there is a list of data elements, called 'units of information' in SPECTRUM jargon. The 20 procedures form the first half of the SPECTRUM publication; the second half is an overview of all the units of information, their definition and usage, and how they are linked with other units of information. The SPECTRUM standard is widely used in the UK and has been adopted in quite a few other countries (especially in Europe, e.g. Belgium, Germany and the Netherlands). SPECTRUM is available from the mda in printed or electronic form; the electronic version is called SPECTRUM interactive. More information about SPECTRUM can be found at http://www.mda.org.uk/spectrum.htm.

Although SPECTRUM is an essential instrument for museums to organise their collection management, it does not contain any information on how to implement the procedures and units of information in an automated system. This task is left to the system vendor. As a result, different interpretations of SPECTRUM exist and have led to different system solutions. The SPECTRUM-XML work was intended to fill this gap by defining a universal interchange format, both for systems that are based on SPECTRUM and for non-SPECTRUM systems that export data to SPECTRUM-based systems. In the long run a standard exchange format could save huge amounts of time and money when museums migrate from one collection management system to another.

XML is short for 'Extensible Markup Language'. It is a descendant of SGML and as such a sibling of the well-known HTML. Where HTML is used to define the layout of a page on the World Wide Web, XML is used to define the content of a document: in HTML you can specify that an area of text should be displayed in bold characters and perhaps printed in red, whereas in XML you can specify that an area of text is a title or a subject. XML and HTML can work hand in hand: when a document is marked up in XML, a transformation process can turn the document into HTML so that it becomes a readable page on the web. This transformation is performed through a so-called 'XSL style sheet'. The important thing to remember here is that XML is not a replacement for HTML.
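
The division of labour between XML (content) and HTML (layout) can be sketched in a few lines of Python. This is an illustration only: instead of a real XSL style sheet and XSLT processor it uses a hand-written mapping built with Python's standard xml.etree.ElementTree module, and the element names 'record', 'title' and 'subject' are hypothetical, not taken from SPECTRUM-XML.

```python
import xml.etree.ElementTree as ET

# A small XML record that marks up *content* (what the text is), not layout.
# The element names are hypothetical.
XML_SOURCE = """<record>
  <title>Model of the Spaans karveel</title>
  <subject>Ship models</subject>
</record>"""

def record_to_html(xml_text: str) -> str:
    """Turn a content-oriented XML record into a presentational HTML fragment.

    This hand-written mapping stands in for what an XSL style sheet would
    normally express declaratively: each content element gets a layout.
    """
    record = ET.fromstring(xml_text)
    body = ET.Element("div")
    # The title is rendered as a heading ...
    heading = ET.SubElement(body, "h1")
    heading.text = record.findtext("title")
    # ... and the subject as bold red text inside a paragraph.
    para = ET.SubElement(body, "p")
    bold = ET.SubElement(para, "b", {"style": "color: red"})
    bold.text = record.findtext("subject")
    return ET.tostring(body, encoding="unicode")

print(record_to_html(XML_SOURCE))
```

The XML record says only what each piece of text is; every decision about how it looks lives in the transformation, which is why the same XML can feed several different web layouts.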

Expressing museum data in XML is a very 'natural' process: XML allows data elements (fields) to be repeated any number of times and to be nested. This means that data elements can contain other data elements, which can in turn contain other data elements, and so on, much like a set of Russian Matryoshka dolls.
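
Both properties, repetition and nesting, become visible as soon as an XML record is read into a tree. The record below is made up for illustration and is not guaranteed to be valid against the SPECTRUM-XML DTD; the sketch uses Python's standard xml.etree.ElementTree module, and the helper name `depth` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: 'title' is repeated, and 'actor' nests a 'person',
# which nests an 'identification', which nests a 'name'.
RECORD = """<object>
  <title>Spaans karveel</title>
  <title>Mataro model</title>
  <actor role="source">
    <person><identification><name>Beuningen, D.G. van</name></identification></person>
  </actor>
</object>"""

def depth(element: ET.Element) -> int:
    """Return the nesting depth of an element (an element without children has depth 1)."""
    return 1 + max((depth(child) for child in element), default=0)

root = ET.fromstring(RECORD)
titles = [t.text for t in root.findall("title")]  # a repeated field comes back as a list
print(titles)
print(depth(root))
```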

The CIMI Dublin Core test bed was probably one of the first applications of XML for museum data interchange. Although the use of XML in this project revealed some initial problems, it made it possible to join 200,000 metadata records from a wide variety of sources. This certainly proved the power of XML as a data transport syntax.

XML has gained enormous momentum. Some three years ago it was very hard to find books on XML in bookstores. These days you will probably find a whole bookcase devoted to the subject, including titles like 'XML for dummies' (ISBN: 0-7645-0360-X).

Major software vendors, including Microsoft and Oracle, have implemented support for XML in their products, such as Microsoft Internet Explorer 5.x, Access 2000 and Oracle 8i. Collection management software producers have also picked up this trend and have implemented, or are implementing, support for XML in their systems. All in all it seems fair to say that "XML is here to stay".

DTD is short for Document Type Definition. XML alone cannot guarantee that two applications can exchange data in a sensible way. This is like two people speaking the same language: the mere fact that they both speak English does not necessarily mean that they can have a meaningful conversation. Just imagine a computer programmer having a conversation with an art historian... Besides a common grammar (or syntax), both parties need at least a basic understanding of the meaning of what is said. This is where a DTD comes in: a DTD defines which data elements (units of information) are allowed in an XML document and how they are structured. Different applications of XML therefore have different DTDs. DTDs can be very simple (e.g. the CIMI Dublin Core test bed DTD) or rather complex, such as the SPECTRUM-XML DTD.
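
What a DTD adds on top of plain XML can be illustrated with a toy content model. The sketch below is not a DTD validator (Python's standard library does not validate against DTDs; a validating parser would be used for that). It only checks which child elements each element may contain, and the model itself is a simplified assumption based on the element names given later in this paper, not an extract from the SPECTRUM-XML DTD. The names `ALLOWED_CHILDREN` and `check` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Toy stand-in for a DTD: for each element, the set of child elements it may
# contain. A real DTD also fixes order, cardinality and attributes; this sketch
# ignores those, and elements missing from the table are left unconstrained.
ALLOWED_CHILDREN = {
    "interchange": {"metadata", "spectrum"},
    "metadata": set(),
    "spectrum": {"activity", "concept", "object", "organisation",
                 "people", "person", "place", "reference"},
}

def check(element: ET.Element, model: dict) -> list:
    """Return a list of problems; an empty list means the tree fits the toy model."""
    problems = []
    allowed = model.get(element.tag)
    for child in element:
        if allowed is not None and child.tag not in allowed:
            problems.append(f"<{child.tag}> not allowed in <{element.tag}>")
        problems.extend(check(child, model))
    return problems

good = ET.fromstring("<interchange><metadata/><spectrum><object/></spectrum></interchange>")
bad = ET.fromstring("<interchange><spectrum><painting/></spectrum></interchange>")
print(check(good, ALLOWED_CHILDREN))
print(check(bad, ALLOWED_CHILDREN))
```

Both documents are well-formed XML; only the agreed content model tells the receiving application that `<painting>` is not something it should expect.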

DTDs have a rather awkward syntax that can be hard to read. This is one of the reasons why the World Wide Web Consortium (W3C) has been working on an alternative to DTDs under the name XML schema. The XML schema work has currently reached Candidate Recommendation status, the stage before becoming an official W3C standard. This stage was supposed to end on December 15th 2000 (http://www.w3.org/xml/activity.html). Because the XML schema proposal was not stable at the start of the SPECTRUM XML work, the group decided to start with the older DTD structure in this project. In the future the DTD could always be converted to an XML schema.

Creation of the SPECTRUM XML DTD

The first ideas to create a DTD for SPECTRUM date back to mid-1999, when a group of system vendors, together with the mda, started to discuss the subject at CIMI meetings. This was probably inspired by the use of XML in the CIMI Dublin Core test bed. The project started with the set-up of a private E-mail discussion list on the 29th of September, 1999. During the first months of the project all communication took place through this list.

The first face-to-face meeting was organised on the 31st of March 2000 in the offices of ADLIB Information Systems in Swindon (UK). Participants were the different system vendors (ADLIB, SSL and Gallery Systems), the mda (Gordon McKenna and Edmund Lee), Richard Light (independent XML consultant) and Nicholas Crofts from Geneva. As one of the architects of the CIDOC Conceptual Reference Model (CRM), Nick Crofts proved to be a very useful participant: after his presentation it was agreed that the ideas in the CIDOC CRM could serve as guidelines in the design of the SPECTRUM-XML DTD.

One of the most important outcomes of this was that the DTD would not necessarily be object-centric. In other words, the DTD should support other views on the data, e.g. starting from a person, an exhibition or a loan. Another outcome was that the group wanted to be able to re-use similar data structures that appear in different places in SPECTRUM. An example of this is person data, where a person has a relation with an object, e.g. being the donor or the creator of the object. In this case the person can be seen as an 'actor' on the object, and the 'actor' and the object are connected through an 'event', e.g. the creation or the donation of the object. Borrowing this idea from the CIDOC CRM proved to be an important decision that influenced the rest of the DTD design process.
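
The actor/event pattern can be sketched as a small data model. The Python classes, function names and the persons 'A. Maker' and 'B. Donor' below are hypothetical and only illustrate the idea; they are not part of SPECTRUM-XML or the CIDOC CRM. Note how one Person structure serves both the creator and the donor, and how the same data can be read from the object's side or from the person's side.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Person:
    name: str

@dataclass
class Actor:
    role: str        # how the person takes part, e.g. "creator" or "donor"
    person: Person

@dataclass
class Activity:
    kind: str        # the event linking actor and object, e.g. "creation"
    actors: List[Actor] = field(default_factory=list)

@dataclass
class MuseumObject:
    number: str
    activities: List[Activity] = field(default_factory=list)

def people_in_role(obj: MuseumObject, role: str) -> List[str]:
    """Object-centred view: who acted on this object in the given role?"""
    return [a.person.name for act in obj.activities
            for a in act.actors if a.role == role]

def objects_for(person: Person, objects: List[MuseumObject]) -> List[str]:
    """Person-centred view: which objects is this person connected to?"""
    return [o.number for o in objects for act in o.activities
            if any(a.person == person for a in act.actors)]

maker = Person("A. Maker")
donor = Person("B. Donor")
model_ship = MuseumObject("M4", [
    Activity("creation", [Actor("creator", maker)]),
    Activity("donation", [Actor("donor", donor)]),
])
print(people_in_role(model_ship, "donor"))
print(objects_for(maker, [model_ship]))
```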

Subsequent meetings were held on the 23rd of May, 2000, with the bulk of the work undertaken at a special "Lock-in" meeting on the 6th and 7th of July 2000 at the offices of SSL in London. The final meeting was held on the 18th of December 2000, again in London. The number of meetings was low in relation to the amount of work that had to be done. This was only possible through the use of the E-mail list and, even more importantly, the effort of Gordon McKenna from the mda. Lots of useful comments also came from people outside the "core" group, notably from Martin Dörr and Nicholas Crofts (CRM), John Perkins and Angela Spinazze (CIMI). The 'wrap up' meeting on the 18th of December 2000 produced three deliverables: the DTD itself, a user guide for the DTD and a mapping from the SPECTRUM units of information to the DTD.

The structure of the SPECTRUM-XML DTD

XML documents always contain a 'master' element that contains all the other elements. This special element is called the 'root element'. The root element in the SPECTRUM-XML DTD is named 'interchange' to indicate that the XML document is intended for data interchange purposes. The root element contains two sub-elements (or children) with the names 'metadata' and 'spectrum'. The metadata element can contain information about the XML document, such as who created it, for what purpose or when. The metadata element can contain a complete Dublin Core description. The 'spectrum' element contains the actual 'records' or 'descriptions'.

<?xml version="1.0"?>
<!DOCTYPE interchange SYSTEM "spectrum-xml.dtd">
<interchange>
  <metadata>
    <!-- ... metadata description here ... -->
  </metadata>
  <spectrum>
    <!-- ... real data here ... -->
  </spectrum>
</interchange>

Code snippet that illustrates the top-level structure of a SPECTRUM-XML compliant XML document.
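
Generating this envelope programmatically is straightforward. The following sketch uses Python's standard xml.etree.ElementTree module; because ElementTree does not write DOCTYPE declarations, the XML declaration and the DOCTYPE line (with the DTD file name from the snippet above) are prepended by hand. The function names `make_interchange` and `serialize` are hypothetical.

```python
import xml.etree.ElementTree as ET

def make_interchange() -> ET.Element:
    """Build an empty envelope: <interchange> with <metadata> and <spectrum> children."""
    root = ET.Element("interchange")
    ET.SubElement(root, "metadata")  # information about the document itself
    ET.SubElement(root, "spectrum")  # the actual records go here
    return root

def serialize(root: ET.Element) -> str:
    """Return the document text, including the XML declaration and DOCTYPE line."""
    header = ('<?xml version="1.0"?>\n'
              '<!DOCTYPE interchange SYSTEM "spectrum-xml.dtd">\n')
    return header + ET.tostring(root, encoding="unicode")

doc = make_interchange()
print(serialize(doc))
```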

The SPECTRUM-XML DTD contains four different types of elements:

  1. 'Wrapper' elements. These are the three highest-level elements, 'interchange', 'metadata' and 'spectrum'. The wrapper elements serve as 'envelopes' around the real data. This is necessary to enable multiple top-level elements and metadata in the XML document.
  2. 'Top level' elements. The top-level elements contain the museum's entities. There are eight top-level elements: 'activity', 'concept', 'object', 'organisation', 'people', 'person', 'place' and 'reference'. Top-level elements can contain other top-level elements: for instance, a 'place' can be contained in an 'object' to represent 'associated places'.
  3. 'Structural' elements. There are six structural elements: 'actor', 'address', 'administration', 'description', 'entity-link' and 'identification'. These elements act as glue to keep information together. A good example of this is the 'actor' element when it connects a 'person' to an 'activity'.
  4. 'Data' elements. These are the lowest-level elements, which contain the actual data. There are seven data elements defined: 'aspect-numeric', 'aspect-term', 'aspect-text', 'date', 'name', 'note' and 'title'.
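
The four categories can be captured in a simple lookup table, for instance to let a program reading a SPECTRUM-XML document tell which kind of element it is dealing with. The sketch below merely restates the lists above as Python sets; the names `ELEMENT_TYPES` and `element_type` are hypothetical, and the table is only as complete as those lists.

```python
# The four element categories listed above, as sets of element names.
ELEMENT_TYPES = {
    "wrapper": {"interchange", "metadata", "spectrum"},
    "top-level": {"activity", "concept", "object", "organisation",
                  "people", "person", "place", "reference"},
    "structural": {"actor", "address", "administration", "description",
                   "entity-link", "identification"},
    "data": {"aspect-numeric", "aspect-term", "aspect-text", "date",
             "name", "note", "title"},
}

def element_type(tag: str) -> str:
    """Return the category of an element name, or 'unknown' if it is not listed."""
    for category, names in ELEMENT_TYPES.items():
        if tag in names:
            return category
    return "unknown"

print(element_type("actor"))
print(element_type("place"))
```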

The contents of the elements can be qualified using attributes and their values. The possible attributes are also defined in the DTD. Examples of attributes are 'lang' (language) and 'role'.

It is outside the scope of this paper to list all the elements and their possible attributes, but a simple example may clarify some of the principles described above:

<?xml version="1.0"?>
<!DOCTYPE interchange SYSTEM "spectrum-xml.dtd">
<interchange>
  <metadata>
    <!-- ... metadata description here ... -->
  </metadata>
  <spectrum>
    <object>
      <identification>
        <ref-id type="object-number">M4</ref-id>
      </identification>
      <description>
        <aspect-text lang="eng">Model of the "Spaans karveel", also known as the Mataro model</aspect-text>
      </description>
      <activity type="acquisition">
        <actor role="source">
          <person>
            <identification>
              <name>Beuningen, D.G. van</name>
            </identification>
          </person>
        </actor>
      </activity>
    </object>
  </spectrum>
</interchange>
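
Reading such a document back is equally simple. The sketch below parses a trimmed copy of the example above (without the XML declaration and DOCTYPE line, and with the metadata left empty) using Python's standard xml.etree.ElementTree module, and uses ElementTree's limited XPath support to select elements by their attributes. The function name `summarise` and the shape of the returned dictionaries are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example document above.
EXAMPLE = """<interchange>
  <metadata/>
  <spectrum>
    <object>
      <identification>
        <ref-id type="object-number">M4</ref-id>
      </identification>
      <description>
        <aspect-text lang="eng">Model of the "Spaans karveel", also known as the Mataro model</aspect-text>
      </description>
      <activity type="acquisition">
        <actor role="source">
          <person>
            <identification><name>Beuningen, D.G. van</name></identification>
          </person>
        </actor>
      </activity>
    </object>
  </spectrum>
</interchange>"""

def summarise(xml_text: str) -> list:
    """Extract a few units of information from every object, selecting by attribute values."""
    root = ET.fromstring(xml_text)
    summaries = []
    for obj in root.iterfind("spectrum/object"):
        summaries.append({
            "number": obj.findtext("identification/ref-id[@type='object-number']"),
            "description": obj.findtext("description/aspect-text[@lang='eng']"),
            "sources": [n.text for n in obj.iterfind(
                "activity[@type='acquisition']/actor[@role='source']"
                "/person/identification/name")],
        })
    return summaries

print(summarise(EXAMPLE))
```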

Deployment of the SPECTRUM-XML DTD

So far the SPECTRUM-XML DTD has been developed in isolation. The next step will be to test the DTD in practical applications. All members of the SPECTRUM-XML working group are also CIMI members, and CIMI has already expressed an interest in organising a test bed around the SPECTRUM-XML DTD. An obvious advantage of this co-operation between the SPECTRUM-XML group and CIMI is that the project's visibility is raised from that of a UK-based project to a worldwide one. Furthermore, CIMI has considerable experience in organising test beds. There are two possible applications for the DTD: data exchange between the collection management systems of different vendors, and the definition of a record syntax for distributed searching applications.

The first application is the most straightforward: export data from the collection management system of vendor A and import them into the system of vendor B. At first glance it seems simple to prove that this application works: if the data that have been imported into system B are exported again using the SPECTRUM-XML DTD, one might think that a simple file comparison would prove that the round trip lost no data and kept the data structure intact. This is not quite true: the SPECTRUM DTD offers too much freedom in the expression of data to make this test possible without further restricting the test conditions. For example, system A could export the data with the 'object' element as the top element, whereas system B could export the same data with the 'person' element as the top element (an object and its maker(s) vs. a maker and his/her object(s)). The formal validation of the export/import application therefore requires additional test conditions. This is where CIMI can offer its test bed experience.
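
The round-trip problem can be made concrete. In the sketch below, two hypothetical exports express the same fact (object M4 was made by a person named 'A. Maker'), once starting from the object and once starting from the person. A naive text comparison reports a difference, whereas comparing the extracted facts does not. The nesting of the person-centred export, the activity type 'production', the role 'maker' and the function name `facts` are assumptions made for this illustration, not taken from the DTD, and the fact extraction only handles these two shapes.

```python
import xml.etree.ElementTree as ET

# System A: object-centred export (object > activity > actor > person).
EXPORT_A = """<spectrum><object>
<identification><ref-id>M4</ref-id></identification>
<activity type="production"><actor role="maker"><person>
<identification><name>A. Maker</name></identification>
</person></actor></activity>
</object></spectrum>"""

# System B: person-centred export (person > activity > actor > object).
EXPORT_B = """<spectrum><person>
<identification><name>A. Maker</name></identification>
<activity type="production"><actor role="maker"><object>
<identification><ref-id>M4</ref-id></identification>
</object></actor></activity>
</person></spectrum>"""

def facts(xml_text: str) -> set:
    """Reduce an export to (object number, activity type, role, person name) tuples."""
    found = set()
    for record in ET.fromstring(xml_text):  # each top-level record
        for activity in record.findall("activity"):
            for actor in activity.findall("actor"):
                # Depending on the export's point of view, the object or the
                # person is either the record itself or nested inside the actor.
                if record.tag == "object":
                    number = record.findtext("identification/ref-id")
                    name = actor.findtext("person/identification/name")
                elif record.tag == "person":
                    number = actor.findtext("object/identification/ref-id")
                    name = record.findtext("identification/name")
                else:
                    continue  # other record types are out of scope for this sketch
                found.add((number, activity.get("type"), actor.get("role"), name))
    return found

print(EXPORT_A == EXPORT_B)                # naive file comparison: the files differ
print(facts(EXPORT_A) == facts(EXPORT_B))  # normalised comparison: the same facts
```

A test bed would have to agree on normalisation rules of this kind before a round trip between two systems can be declared loss-free.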

The second application is probably more exciting than the first. Unfortunately a DTD alone is not sufficient for this purpose: in addition to the definition of the record syntax, one also needs a definition of the query syntax. The simplest way to implement this is probably to express queries as CGI (Common Gateway Interface) strings. If several system vendors or their clients identify the need for this, a project similar to SPECTRUM-XML could be organised.
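
As an illustration of the CGI approach, the sketch below encodes search criteria as a URL query string with Python's standard urllib.parse module and decodes them again as the receiving program would. The endpoint URL, the function names and all parameter names (including 'format=spectrum-xml' to request SPECTRUM-XML records) are hypothetical; no query syntax has been defined for SPECTRUM-XML.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical search service; example.org is a reserved example domain.
BASE_URL = "http://museum.example.org/cgi-bin/search"

def build_query(**criteria: str) -> str:
    """Client side: encode search criteria as a CGI query string."""
    params = dict(criteria)
    params["format"] = "spectrum-xml"  # hypothetical: ask for SPECTRUM-XML records back
    return BASE_URL + "?" + urlencode(params)

def parse_query(url: str) -> dict:
    """Server side: decode the criteria again (first value of each parameter)."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}

url = build_query(name="Beuningen, D.G. van", activity="acquisition")
print(url)
print(parse_query(url))
```

The query travels as an ordinary URL, while the answer would come back as a SPECTRUM-XML document; only the query parameters would still need to be standardised.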

CIMI's involvement

The usefulness of SPECTRUM-XML can hopefully be tested in 2001. This would mean that during the course of 2001 various implementors of collection management systems will implement export/import functionality in their systems. This functionality will then become available to their clients, resulting in a situation referred to as 'standards by stealth'. The individual museum does not need to worry about the intricacies of the DTD: its system will contain the functionality by default, and the user will be able to inform the mda/CIMI work at a very low cost. The potential of this approach goes well beyond any of CIMI's earlier test beds, for the simple reason that it uses mainstream technology for which hardly any usage barriers exist. This distinguishes CIMI's XML projects from, say, the CIMI Z39.50 test bed. Although the CIMI Z39.50 test bed proved without doubt that the technology worked, the actual implementation of Z39.50 was blocked by its steep learning curve and the lack of support in off-the-shelf products. This is entirely different for XML-based solutions: a full range of products and components can be found almost at the nearest street corner.

A future vision

Will XML and SPECTRUM-XML be the answer to all known problems in the area of collection management? The answer is probably "no", although for the first time there is a standard available that "naturally" fits museums' data. What remains important is that data (and therefore information) will be "conserved" in a system-neutral form that guarantees their future use. What most museums probably don't see is that information about their collections is just as important as the collections themselves. So far I have not seen a job description for an information curator. Using this analogy, one could see the SPECTRUM-XML developments as an information insurance policy: at least one can guarantee that data can leave a specific system and be re-used in other (newer) systems. Organisations like the mda and CIMI are key in this development. Through test beds and projects like SPECTRUM-XML these organisations can support and promote the implementation of these insurance policies as best practice. Hopefully, the capability to export and import data in a form like SPECTRUM-XML will in future be one of the top requirements in tender documents for collection management systems.