March 22-25, 2006
Albuquerque, New Mexico

Papers: Simple Cultural Organisation System (SCOS):
An Interoperable Cultural Taxonomy

Liddy Nevile, Behzad Kateli, Sarah Pulis, La Trobe University, Australia

http://www.latrobe.edu.au/cs/
Abstract

In 2005, we presented a paper about using Semantic Web annotations to enable discipline or community specific annotations in a cultural setting. We described using formats such as vCard and Friend-of-a-Friend (FOAF) to enter information about the creator of the annotations so the annotations could be sorted and used in a variety of ways. This year, we look at the development of a taxonomy for annotations that will enable the matching of annotations to perspectives such as community or disciplinary bias, cultural authority, locations, and time. Such criteria can strongly influence cultural interpretation and should be explicit so that they can be taken into account as collections of annotations become the rich descriptive data of museums. Interpretive 'bias', once identified, can be used to help understand perspectives that will, in themselves, give insights into cultural material.

Keywords: annotations, thesaurus, SKOS, culture, Semantic Web

Introduction

Having resources, annotations, information about the annotator, and a taxonomy of relevant terms greatly increases the richness of the description of a resource. Museum visitors can pursue idiosyncratic interests more easily when a system does not need to anticipate their queries but can accommodate them. In this paper, we anticipate that users of resources may want to organise those resources according to different, perhaps idiosyncratic, structures and thesauri. We anticipate what we think of as different ontologies for the same resources.

Simple Knowledge Organisation System (SKOS) (W3C, 2004) is a model for expressing the basic structure and content of concept schemes (thesauri, classification schemes, subject heading lists, taxonomies, terminologies, glossaries and other types of controlled vocabulary). Simple Cultural Organisation System (SCOS) is proposed as a SKOS implementation. It uses the SKOS model for expressing the basic structure and content of cultural artefacts, knowledge practices and community schemes (such as roles, authority structures, expertise, or terminologies). SKOS (and SCOS) descriptions are machine-readable and expressed in Semantic Web formats so they can be related to other Semantic Web information.

The main advantage of SCOS over more traditional representation systems is that it enables a graphical interface. A SCOS thesaurus is easy to construct with no technical knowledge; it is expressed in a powerful, interoperable format that can be exported and integrated with other systems.

In the museum world, we can expect at least the following use cases. Imagine a museum collection of plant, animal and insect specimens from a remote region of Australia.

  1. A senior member of an Indigenous community describes its contents. She is the most authoritative person in her community with respect to traditional medicine and health practices. She accesses the existing descriptions of the collection and adds annotations about using the medicines and the practices with which she is familiar. She notes connections between the species present and customary laws.
  2. Some time later, a Chinese visitor to the museum adds his annotations to the descriptions. He is a medical researcher who specialises in tropical diseases; he has significant knowledge about the use of herbal medicines and the associated Chinese practices and their integration with medical practices. He makes different connections between the specimens and the ways in which they can be used.
  3. Some time still later, visitors to the museum (possibly via the museum Web site) express their interest in the variety of ways in which the species represented have been used in medical health contexts. The visitors want to know not only about the species and their use, but also about the authority of the information that is found at the museum. They particularly want to be able to determine what information will be accurate with respect to the community's practices and what information will be accurate in scientific terms with respect to the effectiveness of the remedies.

This paper is about enabling authors and their audiences to classify cultural information in flexible yet complex ways using simple techniques. It is the authors' intention to enable simple re-use of descriptions which are useful to those who are describing or annotating descriptions of resources, specimens for example, but to make it also easy for them to make significantly different connections between those resources, according to local criteria.

In effect, this paper is about extending what has become known as a typical situation in which those describing resources often want to maintain local specificity while enjoying global inter-operability with other descriptions and descriptive systems. In this case, the desire is to be able to reorganise the same resources for different local specificities, while maintaining global inter-operability. The authors contend that if they can describe the descriptions in an appropriate way, they will be able to mix and match them and easily replace one specific organisational system with another when the user's interests call for that change.

The Quinkan Matchbox draft taxonomy schema discussed in this paper was never intended to be anything other than a draft. It is a base upon which an interoperable taxonomy might be built. This paper is motivated by concern with one aspect of what might become a process to be used by Indigenous and other people to customise the draft schema into one of value to them for repatriation and refreshment of Quinkan culture.

Background

Several years ago, Sophie Lissonnet (2004) developed a draft metadata schema for collection of tangible (and intangible) Indigenous cultural resources. In doing so, she increasingly encountered the richness of the culture she intended to serve as experts from a variety of disciplines and backgrounds contributed to the wealth of the fabric that encapsulates the knowledge, understandings, objects and practices of the culture. The aim of the schema was to find a way to describe any resources that might be included within the collection in such a way as to enable their discovery within conventional collections, such as those held by museums, and communities, or distributed outdoors across time and space (as is a significant collection of Quinkan Rock Art, which some may consider to be artefacts).

In coming to terms with the complexity of the particular culture, known as the Quinkan culture but significantly encompassing the culture of 5 main family groups and many individuals, such problems as the following aroused confusion:

  • Australian Aboriginal people respect the Elders of their community and recognize different rights of expression with respect to objects in their world;
  • 'country' is known as the bush by most (city-dwelling) Australians but the bush is a single instance of a plant for others outside the Australian vernacular and for Aboriginal people, 'country' is not an entity but part of them, or they are part of it.

Lissonnet's work established a draft of a descriptive structure and encoding that would allow classification of Quinkan culture using the schema to interoperate with well-established classification schemes such as those of museums, libraries and the available relevant government records. What it could not do is provide a method to maintain a suitable, flexible-yet-complex-and-rich subject classification scheme that would allow for a range of descriptions of the subject of a resource. This paper proposes a way of doing that.

Traditional Australian Aboriginal customary laws do not understand family structures in the same way as the invading Europeans understood them in the nineteenth century. According to the European perspective, children without a mother or a father were orphans, but there was no such role within the traditional Aboriginal families. Many Aboriginal children became wards of the state, to be treated in effect as the property of the state. In many cases their parents also were treated as property. Understanding the social structure of early Australia is a complex business and depends upon understanding the many perspectives from which it was seen and can today be seen. Traditional Elders, now recognised for their high status within their communities and lauded for their knowledge, were the victims of what appears clearly as abuse today but was the norm in former times.

Finding simple ways to record things such as the roles of people so that the many ways of understanding their position within changing society can be analysed is one goal of the work described in this paper. It is thought to be a problem analogous to finding a simple way of recording the relationships between biological species and health practices. It is a thesaurus problem with the added complexity that it should be machine-readable for interoperability purposes and ease of use.

Subject Classification

Those who work currently in classification theory have a range of ways of approaching the classification problem. Librarians have for many years worked with what they call thesauri: what we might call taxonomies of knowledge that are enriched with terms that are in common use so that classifiers can determine how to choose a relevant term for a given resource. Such systems also enable users easily to find the resources they want using common or preferred terms. Librarians place on shelves books usually ordered by subject.

The simple version of this approach might be known as a linear approach: even though the resource can have several subjects, the subjects themselves are either broader or narrower than those related to them, and the schema can be presented as a hierarchy. We will refer to it as a linear hierarchy because it can be expressed numerically (e.g. DDC, LCSH, …). Books can be placed on shelves one after the other, in a long line (spread across many bookshelves in huge libraries at times, of course). Catalogue records, on the other hand, often go further by adding a 'see also' that would suggest consideration of a different subject area for discovery of a book with a certain subject (not that this would necessarily make it available to the librarian to place the book in one location or the other).

Fig 1: a typical thesaurus hierarchy.

Librarians traditionally worked with books, of course, and their main problem was to determine on which shelf to place a book, given that there could only be one location as there was only one book. Now, freed from the uniqueness of the classification given that computers can allow for a resource to have several 'locations' in cyberspace, the cataloguer can broaden the classification system by placing the resource within several classifications; that is, in several locations in the linear classification structure. The problem is that we want to place the resource within several classifications that are neither linear nor equivalent. It is not unusual for the classifications within one system to be just a little broader than in another, or equally, just a little narrower.

We can imagine our version of the family classifications needing to accommodate orthogonal classifications. In this case, we would want to accommodate the classification of one community and the classification of another. In some cases, there will be no problem with doing this as the classification will match, but in others it will be significantly different.

Let us take the example of a system that divides a class into 4 groups and another that divides the same class into five. The merging of these two classes is not possible without significant loss, i.e. accommodation by 'squeezing' or 'stretching' a class which is achieved by 'adjusting' or throwing away some of the information. If, on the other hand, we can maintain two different classification systems in parallel, we will maintain all the information that belongs to both groups in the correct structure.
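
The idea of keeping two classification systems in parallel can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the specimen identifiers and the class names (A1 to A4, B1 to B5) are invented, and the sketch is not part of SCOS itself. It shows that when each resource keeps its placement in both systems, either view can be rebuilt without squeezing or stretching any class.

```python
from collections import defaultdict

# Hypothetical data: two communities classify the same five specimens.
# All identifiers and class names below are invented for illustration.
scheme_a = {  # a system that divides the class into 4 groups
    "specimen1": "A1", "specimen2": "A2", "specimen3": "A3",
    "specimen4": "A4", "specimen5": "A4",
}
scheme_b = {  # a system that divides the same class into 5 groups
    "specimen1": "B1", "specimen2": "B2", "specimen3": "B3",
    "specimen4": "B4", "specimen5": "B5",
}


def group_by_class(scheme):
    """Invert a specimen -> class mapping into class -> sorted specimens."""
    groups = defaultdict(list)
    for specimen, cls in scheme.items():
        groups[cls].append(specimen)
    return {cls: sorted(members) for cls, members in groups.items()}


# Keep the systems in parallel: every specimen records both placements.
parallel = {s: {"A": scheme_a[s], "B": scheme_b[s]} for s in scheme_a}

# Either view can be recovered in full from the parallel record.
view_a = group_by_class({s: p["A"] for s, p in parallel.items()})
view_b = group_by_class({s: p["B"] for s, p in parallel.items()})
```

In this sketch, `view_a` still has four classes and `view_b` still has five; nothing had to be merged or discarded to hold both.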

Fig 2: the merging of two slightly different sets of classes for a single concept

To do this, we need a way of describing what we are doing so it can be unravelled automatically, especially when we have branches emerging at many points. So we have hierarchies that change; where a subject belongs in the hierarchy may change, etc, and all these variations need to be captured. At some times, there will be coincidence of subjects, but at others there will be extreme differences. We need a very simple way of describing a very messy set of relationships.

Fig 3: showing a thesaurus that involves a number of references across the classes

Simple Knowledge Organisation System (SKOS) provides one way of describing complex subject systems so that machines and humans can work with them. SKOS is a Semantic Web language, meaning that it is a constrained language built within the Resource Description Framework (RDF) (W3C 1). Because it uses RDF rules about grammar and vocabularies, SKOS is merely an extension of already established ways of talking about resources; in this case, organisational systems such as a subject classification system. So SKOS can be used to describe the system used to describe the subjects of resources. It follows that what is proposed as the Simple Cultural Organisation System (SCOS) will be a way of describing a subject classification system for describing the subject of cultural resources. By virtue of its derivation from SKOS, SCOS has many qualities that make it suitable for this task. As an RDF language, SCOS has qualities that make it easy to build classification systems using graphical interfaces, supporting both involvement of non-literate communities and easy internationalisation.

After working in the familiar area of classification of the subject of resources, we move on to work on the areas of interpretation. Can we also use SCOS to describe the authority of the interpretation, or the disciplinary perspective being taken on the cultural object? In the case of the Quinkan culture, there are many opinions expressed about what some call the artefacts, the Rock Art of the region. Not everyone thinks of the Rock Art as artefacts, and some community members will want to speak about the paintings according to their audience at the time. Can SCOS help determine for whom the interpretation is intended and with what authority it is being offered?

In the cases proposed, the problem will be that a number of different organisational systems will be expected according to those who are viewing the cultural resources, as well as a system for organising those providing the interpretations. The aim of SCOS, then, is to provide a simple way of specifying these organisational systems that supports the underlying complexity. Being able to switch from one system to another depends on there being machine-readable representations of the systems, so that this process can be automated.

Let us now look more closely at what SKOS offers.

Simple Knowledge Organization System (SKOS)

SKOS is designed to provide a way to encapsulate and publish concept schemes. A concept scheme is defined as "A set of concepts, optionally including statements about semantic relationships between those concepts." Miles states that "'Concept scheme' is a blanket term for thesauri, classification schemes, taxonomies, subject headings, terminologies [and] other types of controlled vocabularies…" (Miles, 2005).

The benefits to be derived from SKOS include:

  • Ease of combination with other meta-information standards
  • Flexibility and ease of extension, to cope with variations in structure and style

SKOS is built within RDF, so the benefits also include:

  • Use of graphs (serialisation is done for you but is human readable)
  • Very simple statements (triples)
  • Easy integration with other RDF

Other RDF information is of interest because it is authored by many people (often with critical reviews as for many Wikipedia entries), can be combined to provide information that the original author does not have, and has a longevity beyond the life of the active participation of any single author. Such benefits are explained in many contexts, including an earlier paper by the authors (Nevile & Lissonnet, 2003).

The SKOS process is a simple one of 'identify, describe and publish' a set of concepts. Miles' example is:

Step 1: identify concepts, e.g.

  • http://www.example.com/concepts#love
  • http://www.example.com/concepts#awe
  • http://www.example.com/concepts#joy

Step 2: describe the concepts using an RDF graph

Step 3: publish…

Create an RDF/XML serialisation (concepts.rdf) and put this file on an HTTP server (http://www.example.com/concepts), or load the statements into a dedicated RDF server (Joseki, Sesame, Kowari …).
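
The three steps can be sketched with nothing more than Python's standard XML library. This is a minimal illustration, not the tutorial's own code: the concept URIs come from the example above, while the preferred labels and the function names are assumptions made for the sketch. A dedicated RDF toolkit would normally be used instead of building the XML by hand.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("skos", SKOS_NS)

# Step 1: identify the concepts (URIs from the example above).
BASE = "http://www.example.com/concepts#"
# The labels are assumed for illustration; the example gives only URIs.
CONCEPTS = {"love": "love", "awe": "awe", "joy": "joy"}


def describe(concepts):
    """Step 2: describe each concept as a skos:Concept with a prefLabel."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    for local_name, label in concepts.items():
        concept = ET.SubElement(
            root,
            f"{{{SKOS_NS}}}Concept",
            {f"{{{RDF_NS}}}about": BASE + local_name},
        )
        pref = ET.SubElement(concept, f"{{{SKOS_NS}}}prefLabel")
        pref.text = label
    return root


def serialise(root):
    """Step 3 (first half): produce the RDF/XML text to be published."""
    return ET.tostring(root, encoding="unicode")


rdf_xml = serialise(describe(CONCEPTS))
# Publishing would then mean saving rdf_xml as concepts.rdf on a Web server.
```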

SKOS provides for the identification of concepts and their relationship with other concepts and for the association of words (labels) with those concepts.

Given that the most precise explanation in SKOS is a concept rather than a term, it is easy to provide a variety of terms for a concept, and thus include terms from different languages. It is also easy to provide preferred and alternative and potentially misspelt terms (usually hidden from view) so that classification entries and searches can be easily handled. In addition, as well as word labels, icons or images can be used to identify concepts. The only limit is that the 'term' (or label) should be machine addressable so it must be an image or icon that can be de-referenced on the Web (it need not be a Web object; just identified on the Web). As in most thesauri, it is a rule of SKOS that there should be only one preferred label (per language) for each concept. But it is not a constraint that all concepts should be defined within the same thesaurus. A concept found within one thesaurus can easily be linked to the same concept in another thesaurus, or, indeed, related to it as narrower, broader or in some other way related.
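
The labelling rules described here lend themselves to a simple automated check. The following Python sketch is an assumption-laden illustration rather than a SKOS tool: the label data is invented (only the concept names `cave` and `rockShelter` echo the Quinkan example later in this paper), and the labels are held as plain tuples instead of RDF. It reports concepts with more than one preferred label in a language, and shows how a search can match alternative and hidden (for example misspelt) labels as well as preferred ones.

```python
from collections import Counter

# (concept, label type, language, label text); all values are invented.
LABELS = [
    ("cave", "prefLabel", "en", "Cave"),
    ("cave", "altLabel", "en", "Cavern"),
    ("cave", "hiddenLabel", "en", "Cav"),        # a misspelling, kept hidden
    ("cave", "prefLabel", "fr", "Grotte"),       # a second language is fine
    ("rockShelter", "prefLabel", "en", "Rock shelter"),
    ("rockShelter", "prefLabel", "en", "Shelter"),  # breaks the rule
]


def duplicate_pref_labels(labels):
    """Return (concept, language) pairs with more than one preferred label."""
    counts = Counter(
        (concept, lang)
        for concept, kind, lang, _ in labels
        if kind == "prefLabel"
    )
    return sorted(key for key, n in counts.items() if n > 1)


def find_concepts(labels, text):
    """Find concepts by any label type, ignoring case."""
    wanted = text.casefold()
    return sorted(
        {concept for concept, _, _, label in labels if label.casefold() == wanted}
    )
```

With this data, `duplicate_pref_labels(LABELS)` flags only the English labels of `rockShelter`, while a search for "grotte" or for the misspelt "cav" still leads to the concept `cave`.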

So we can see that SKOS provides us with an easy way to work on a subject classification system, piece by piece, or in chunks, or by combining several systems. SKOS has a number of rules that provide for greater precision and interoperability, but they will not be discussed at this point. Instead, an example of developing a simple thesaurus in SKOS, a simple SCOS, follows.

A simple Quinkan Culture SCOS

One of the aspects of Quinkan culture that concerned Lissonnet was the description of physical sites where there was Rock Art. The types of sites were usually classified in the available governmental records as being a cave, a rock overhang, open, or a rock shelter. The other properties of sites of interest in the available records were the originator, the site attributes, the materials at the site and the structure of the site.

A standard thesaurus and a SKOS thesaurus look pretty much the same, in the sense that they are human readable:

<skos:Concept rdf:about="site">
	<skos:prefLabel>Quinkan site</skos:prefLabel>
	<skos:narrower rdf:resource="originator"/>
	<skos:narrower rdf:resource="siteType"/>
	<skos:narrower rdf:resource="siteAttribute"/>
	<skos:narrower rdf:resource="siteMaterial"/>
	<skos:narrower rdf:resource="siteStructure"/>
</skos:Concept>        
<skos:Concept rdf:about="originator">
	<skos:prefLabel>Originator</skos:prefLabel>
	<skos:altLabel>Original user</skos:altLabel>
	<skos:broader rdf:resource="site"/>
</skos:Concept>
<skos:Concept rdf:about="siteType">
	<skos:prefLabel>Site type</skos:prefLabel>
	<skos:narrower rdf:resource="cave"/>
	<skos:narrower rdf:resource="open"/>
	<skos:narrower rdf:resource="overhang"/>
	<skos:narrower rdf:resource="rockShelter"/>
</skos:Concept>
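
Because the encoding is ordinary XML, the hierarchy can also be read back by a program. The Python sketch below is a hedged illustration: it wraps part of the extract above in an `rdf:RDF` element with namespace declarations (the wrapper is an assumption, since the extract shows only the concepts), and the function names are invented for this example. A real application would more likely use an RDF library than a plain XML parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Part of the extract above, inside an assumed rdf:RDF wrapper.
DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:skos="{SKOS}">
<skos:Concept rdf:about="site">
  <skos:prefLabel>Quinkan site</skos:prefLabel>
  <skos:narrower rdf:resource="originator"/>
  <skos:narrower rdf:resource="siteType"/>
</skos:Concept>
<skos:Concept rdf:about="siteType">
  <skos:prefLabel>Site type</skos:prefLabel>
  <skos:narrower rdf:resource="cave"/>
  <skos:narrower rdf:resource="open"/>
  <skos:narrower rdf:resource="overhang"/>
  <skos:narrower rdf:resource="rockShelter"/>
</skos:Concept>
</rdf:RDF>"""


def narrower_map(xml_text):
    """Map each described concept to the concepts listed as narrower."""
    root = ET.fromstring(xml_text)
    result = {}
    for concept in root.findall(f"{{{SKOS}}}Concept"):
        about = concept.get(f"{{{RDF}}}about")
        result[about] = [
            n.get(f"{{{RDF}}}resource")
            for n in concept.findall(f"{{{SKOS}}}narrower")
        ]
    return result


def descendants(tree, concept):
    """List every concept below the given one, depth first."""
    found = []
    for child in tree.get(concept, []):
        found.append(child)
        found.extend(descendants(tree, child))
    return found


tree = narrower_map(DOC)
below_site = descendants(tree, "site")
```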

Figure 4 shows, in graphical form, the information about the SCOS system being considered. It avoids contact with the encoding while showing the structure of the particular thesaurus.

Figure 5 shows the concepts and their relationships as defined within the system. Both images are the representation that is available in the software package RDFAuthor (Steer, 2003) and show the same sort of information.

The round objects represent digital resources, in some cases concepts, in others values, or, as they are described in RDF language, the subjects or objects within the triples represented in the graphs. All relationships in the graph involve triples, the classic structure of RDF information. This is because RDF requires the use of directed triples, in which a predicate names the relationship between a subject and an object:

<subject> <predicate> <object>
as in
<Sunrise experimental thesaurus> <has language> <en>

It is a feature of RDF that these sentences are so simple - they both avoid ambiguity and, therefore, lend themselves to machine processing.
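
The simplicity of these sentences is easy to see when they are written as data. In standard RDF terminology the three parts of a triple are the subject, the predicate (the relationship) and the object. The Python sketch below is an illustration only: it holds triples as plain tuples rather than in an RDF store, uses the sentence above and one relationship from the site extract as its data, and the helper name `objects` is an assumption of this example.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One directed RDF-style statement."""
    subject: str
    predicate: str
    obj: str


# Statements taken from the text; a set is used because a graph has no order.
GRAPH = {
    Triple("Sunrise experimental thesaurus", "has language", "en"),
    Triple("siteType", "skos:narrower", "cave"),
    Triple("siteType", "skos:narrower", "open"),
}


def objects(graph, subject, predicate):
    """Return every object linked to the subject by the predicate."""
    return sorted(
        t.obj for t in graph
        if t.subject == subject and t.predicate == predicate
    )


language = objects(GRAPH, "Sunrise experimental thesaurus", "has language")
site_types = objects(GRAPH, "siteType", "skos:narrower")
```

Each question put to the graph is just a pattern over the three positions, which is why such statements lend themselves to machine processing.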

Fig 4: A graphical representation of the Dublin Core terms used to describe the Sunrise experimental thesaurus. The round objects are digital resources and the square objects are literals. This is a typical RDF graph.

Note that in this case, the experimental thesaurus is on-line and can be accessed as a digital resource; the relationship is fully resolved by reference to what is known as a namespace, where its meaning and use are defined; and the third element of the triple (the object, in RDF terms) is, in this case, just the letters 'en'. The value 'en' is known as a literal as it is not a resource that itself can be described and related to other resources. It could be a resource and would be if it were, for instance, a reference to the 'English language' entry in Wikipedia, at http://en.wikipedia.org/wiki/English_language, or any other resource.

The interesting aspect of integration and interoperability provided by RDF encoding is that as soon as a resource is linked to another resource that itself has relationships, those remote relationships become part of the world of the original resource. A reference to 'English language' in Wikipedia, for example, will bring into the world of the original resource all the references from the Wikipedia article. This may include information that the creator of the original resource does not even know.
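
The effect of linking to a remote resource can be imitated with two small sets of triples. In the Python sketch below, everything except the Wikipedia URL is hypothetical: the identifier `sunrise`, the property names and the 'remote' statements are invented to stand in for whatever another author might publish. The point is only that, once the two sets are combined, statements the original author never wrote become reachable from the original resource.

```python
from collections import deque

ENGLISH = "http://en.wikipedia.org/wiki/English_language"

# What the thesaurus author wrote (identifier and property name invented).
LOCAL = {("sunrise", "has language", ENGLISH)}

# Hypothetical statements published elsewhere about the linked resource.
REMOTE = {
    (ENGLISH, "ex:label", "English"),
    (ENGLISH, "ex:spokenIn", "ex:Australia"),
    ("ex:Australia", "ex:label", "Australia"),
}


def reachable(graph, start):
    """Return every triple reached from start by following object links."""
    seen, queue, found = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        for s, p, o in sorted(graph):
            if s == node:
                found.append((s, p, o))
                if o not in seen:
                    seen.add(o)
                    queue.append(o)
    return found


alone = reachable(LOCAL, "sunrise")
combined = reachable(LOCAL | REMOTE, "sunrise")
```

On its own the local graph yields one statement; combined with the remote statements it yields four.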

Another relevant factor is that as the relationships are formed at the level of concept as well as terms or icons or images, and those relationships can be described, it is possible to determine if the relationship is real. For example, where a term is used in two contexts, the word may be the same but its meaning may be understood very differently in each context. If the concept can be checked, this should be discoverable and the relationship between the term, as used, and the concepts and contexts can be clarified.

Finally, the graphical nature of the representations shown is considered valuable. It is not just that the thesaurus can be represented this way, but that it can be built this way.

Fig 5: a graphical representation of the concepts and their relations and descriptions within the Sunrise experimental thesaurus. The round objects are digital resources and the square objects are literals. This is a typical SKOS graph; that is, RDF SKOS.

At the bottom right hand corner of Figure 5 we see a relationship known as a 'depiction'. This is a description of the relationship of an image to the resource being described by all the other relationships. This image is a resource and so is sure to have its own descriptive information. It is easy to imagine that the sort of Dublin Core based information used to describe the Sunrise experimental thesaurus (in Figure 4) might also be used to describe the image. (In fact, the image is from a journal article; there is a considerable amount of information available about it, but not in machine-readable form.)

To build the description of the image is fairly simple using RDF authoring software. The software will produce the necessary encoding of the information as the graph is extended. If the image is in fact a cultural resource, it will, of course, be better to describe it not just using the Dublin Core properties but also relating the subject of the image to others within the cultural system. We can imagine an extension of the graph as shown in Figure 6.

Fig 6: a diagram showing the extension of information about an image object, including both Dublin Core RDF and SKOS descriptions of the subject of the image. In this case, the image is used as a 'label' for the concept because it is a clear image of something that is a good, recognisable example of the concept.

A small example of how SCOS might help with identifying the authority of interpretations and information provided for a cultural collection follows. First, there is an extract of the text version, and then two images that show the relationships and entities.

<skos:Concept rdf:about="AnnoSourceUser">
    <skos:example>John Smith, an academic specialising in the history
    of landscapes.</skos:example>
    <skos:scopeNote>An annotator is a person who adds information to
    a record about Quinkan Culture using AnnoSource</skos:scopeNote>
    <skos:prefLabel>AnnoSource Annotator</skos:prefLabel>
    <skos:broader rdf:resource="http://purl.org/dc/terms/contributor"/>
    <skos:narrower rdf:resource="anthropologist"/>
    <skos:narrower rdf:resource="geologist"/>
    <skos:narrower rdf:resource="archaeologist"/>
    <skos:narrower rdf:resource="student"/>
    <skos:narrower rdf:resource="Elder"/>
</skos:Concept>
<skos:Concept rdf:about="archaeologist">
    <skos:definition>A person who studies human cultures through the
    recovery, documentation and analysis of material remains and
    environmental data, including architecture, artifacts, biofacts,
    human remains, and landscapes. (from Wikipedia)</skos:definition>
    <foaf:depiction rdf:resource="http://www.nature.com/news/2004/041025/images/morwood.jpg"/>
    <skos:prefLabel>Archaeologist</skos:prefLabel>
    <skos:narrower rdf:resource="architectureArcheologist"/>
    <skos:narrower rdf:resource="artifactsArcheologist"/>
    <skos:narrower rdf:resource="biofactsArcheologist"/>
    <skos:narrower rdf:resource="humanRemainsArcheologist"/>
    <skos:narrower rdf:resource="landscapesArcheologist"/>
</skos:Concept>
Fig 7: a graphical representation of the thesaurus extract showing the semantics.

Fig 8: the structure of the thesaurus.
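
One way such a role thesaurus might be put to work is in selecting annotations by the kind of annotator who wrote them. The Python sketch below is an assumption-based illustration, not a description of AnnoSource: the role hierarchy is copied from the `skos:narrower` statements in the extract, but the annotation records and the function names are invented. Asking for a broad role also returns annotations made under any narrower role.

```python
# Role hierarchy copied from the skos:narrower statements in the extract.
NARROWER = {
    "AnnoSourceUser": [
        "anthropologist", "geologist", "archaeologist", "student", "Elder",
    ],
    "archaeologist": [
        "architectureArcheologist", "artifactsArcheologist",
        "biofactsArcheologist", "humanRemainsArcheologist",
        "landscapesArcheologist",
    ],
}

# Hypothetical annotation records; ids and role assignments are invented.
ANNOTATIONS = [
    {"id": 1, "role": "Elder"},
    {"id": 2, "role": "landscapesArcheologist"},
    {"id": 3, "role": "student"},
    {"id": 4, "role": "archaeologist"},
]


def roles_under(role, narrower=NARROWER):
    """Return the role together with every role narrower than it."""
    roles = {role}
    for child in narrower.get(role, []):
        roles |= roles_under(child, narrower)
    return roles


def annotations_by(role, annotations=ANNOTATIONS):
    """Return ids of annotations whose author holds the role or a narrower one."""
    wanted = roles_under(role)
    return [a["id"] for a in annotations if a["role"] in wanted]
```

Here `annotations_by("archaeologist")` returns the annotations of both the archaeologist and the landscapes archaeologist, while `annotations_by("Elder")` returns only the Elder's.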

SCOS As SKOS And Interoperability With Other Ontologies

Miles et al. (2005) talk about a sledgehammer and a nutcracker, both of which are used for describing ontologies: OWL is the sledgehammer, more powerful than many tasks need, and SKOS is the nutcracker.

In 2004, W3C released an overview document that described the features of their new Web Ontology Language (OWL) (W3C 2).

The OWL Web Ontology Language is designed for use by applications that need to process the content of information instead of just presenting information to humans. OWL facilitates greater machine interpretability of Web content than that supported by XML, RDF, and RDF Schema (RDF-S) by providing additional vocabulary along with a formal semantics. OWL has three increasingly-expressive sublanguages: OWL Lite, OWL DL, and OWL Full.

SKOS, as we have seen, is useful for developing thesauri within a context. It will be even more useful if that specialised or local thesaurus can be related automatically to other thesauri. If SKOS thesauri, then, can be re-expressed in OWL, this will be possible. As Miles and his colleagues argue, OWL is more comprehensive as a tool than is necessary in many situations.

At the time of writing, there are a number of examples of SKOS, RDF and OWL ontologies being used together. A clear transition from a SKOS thesaurus to an OWL ontology does not yet appear to be available although it is seen to be a goal of continuing SKOS development work.

Conclusion

In this paper we have considered the role of an emerging language for its ease of use and suitability for developing thesauri for cultural descriptions. We have demonstrated the use of SKOS in a simple cultural description setting and have foreshadowed its use for describing the contributors of cultural interpretations. This extends work done previously on the process of annotating cultural interpretations, often stored as metadata, in a system known as AnnoSource (Nevile & Kateli, 2005).

Acknowledgements

A huge thank you to Alistair Miles for his excellent tutorial on SKOS delivered at the DC 2005 Conference in Madrid and the inspiration to work on SKOS. Thank you also to Sophie Lissonnet for her excellent work on an application profile for Quinkan Culture.

References

Lissonnet, S (2004). (re)collections: developing a metadata application profile for the Quinkan Culture Matchbox. Unpublished Masters thesis, James Cook University, Townsville (Australia).

Miles, A. (2005). SKOS Core tutorial (DC-2005 Madrid), accessed 2006-1-14. http://isegserv.itd.rl.ac.uk/cvs-public/skos/press/dc2005/tutorial.ppt

Miles et al. (2005). SKOS, a Language to Describe Simple Knowledge Structures for the Web, accessed 2006-1-14. http://idealliance.org/proceedings/xtech05/slides/miles/SKOS-xtech2005.ppt

Nevile, L. & B. Kateli (2005). Interpretation and Personalisation: Enriching Individual Experience by Annotating On-line Materials. In J. Trant & D. Bearman (Eds.) Museums and the Web 2005 Proceedings. CD ROM. Archives & Museum Informatics, 2005. Available: http://www.archimuse.com/mw2005/papers/kateli/kateli.html

Nevile, L. & S. Lissonnet (2003). Dublin Core: The base for an Indigenous culture environment? In D. Bearman & J. Trant (Eds.) Museums and the Web 2003 Proceedings. CD ROM. Archives & Museum Informatics, 2003. Available: http://www.archimuse.com/mw2003/papers/nevile/nevile.html

Steer, D. (2003). RDFAuthor, Last modified: Sat Aug 2 11:11:34 BST 2003, accessed 2006-1-14. http://rdfweb.org/people/damian/RDFAuthor/

W3C (2004) last modified: 2005/11/30 08:39:23, accessed 2006-1-14. http://www.w3.org/2004/02/skos/core/

W3C 1. Last modified 2005/10/04 20:08:21, accessed 2006-1-14. http://www.w3.org/RDF/

W3C 2. Last modified 2004/02/10, accessed 2006-1-14. http://www.w3.org/TR/owl-features/

Cite as:

Nevile, L., Kateli, B., and Pulis, S., Simple Cultural Organisation System (SCOS) - An Interoperable Cultural Taxonomy, in J. Trant and D. Bearman (eds.). Museums and the Web 2006: Proceedings, Toronto: Archives & Museum Informatics, published March 1, 2006 at http://www.archimuse.com/mw2006/papers/nevile/nevile.html