Research Issues in Metadata
David Bearman
Electronic Records Meeting
Pittsburgh, PA
May 29, 1997
My research consists of model building which enables the construction of theories and parallel implementations based on shared assumptions. Some of these models are now being tested in applications, so this report reflects both what we don't yet know from abstract constructs and questions being generated by field testing.
ASSUMPTIONS:
As a simplifying assumption, let us imagine that we know:
CURRENT STATE:
All of these OPEN ISSUES are framed by the following conclusions, stated in several papers
OUTSTANDING ISSUES:
Let me briefly summarize a number of premises and conclusions and mention the major questions associated with each before reviewing where we stand and the still unresolved questions.
INVIOLABLE LINKAGE: Metadata which is required for evidence must continue to be associated with the record to which it relates over time and neither it nor the record content can be alterable.
CONCLUSION: Methods for keeping metadata over time and retaining its inviolable connection to the content of records are crucial.
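One way to picture what an inviolable connection demands is a tamper-evident seal computed over the metadata and the record content together. The sketch below is an illustration only and is not drawn from the models discussed in this paper. It assumes SHA-256 over a canonical JSON rendering of the metadata, and the names `seal_record` and `verify_record` are hypothetical. Note that it detects alteration after the fact; it does not prevent it.

```python
import hashlib
import json


def _digest(content: bytes, metadata: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so that the same
    # metadata always yields the same bytes. json.dumps escapes control
    # characters, so the NUL separator cannot occur inside the JSON part.
    canonical = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8") + b"\x00" + content).hexdigest()


def seal_record(content: bytes, metadata: dict) -> dict:
    """Bind metadata to content by storing one digest computed over both."""
    return {
        "content": content,
        "metadata": dict(metadata),
        "seal": _digest(content, metadata),
    }


def verify_record(record: dict) -> bool:
    """Return True only if neither the content nor the metadata has changed."""
    return _digest(record["content"], record["metadata"]) == record["seal"]
```

A change to either half breaks the seal, which is the property the premise asks for: the metadata cannot be separated from, or altered independently of, the content without the change being detectable.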
To date we have only identified three implementations which, logically, could allow metadata to retain this inviolable connection. Metadata can be:
METADATA STRUCTURES: Metadata content was defined in order to satisfy a range of functional requirements of records, hence it ought to have a structure which enables it to serve these functions effectively and in concrete network implementations.
CONCLUSION: Clusters of metadata are required to operate together. Clusters of metadata are required by different processes which take place at different times, for different software clients, and within a variety of processes. Distinct functions will need access to specified metadata substructures and must be able to act on these appropriately. Structures have been proposed in the Reference Model for Business Acceptable Communications.
QUESTION: Are the BAC structures workable? Complete? Extensible in ways that are known to be required? For example, metadata required for "recordness" is created at the time of the creation of the records, but other metadata, as premised by the Warwick Framework, may be created subsequently. Are these packets of metadata orthogonal with respect to recordness? If not, how are conflicts dealt with?
QUESTION: Not all metadata references fixed facts. Thus, for example, we have premised that proper reference to a retention schedule is a citation to an external source rather than a date given within the metadata values of a record. Similar external references are required for administration of shifting access permissions. What role can registries (especially rights clearinghouses) play in a world of electronic records? How well do existing languages for permission management map to the requirements of records administration, privacy and confidentiality protection, security management, records retention and destruction, etc.?
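The difference between citing a retention schedule and storing a date can be sketched as follows. This is a minimal illustration under stated assumptions: the registry is a plain dictionary, the schedule identifier `FIN-07` and its seven-year period are invented, and the names `Record`, `add_years` and `disposal_date` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    identifier: str
    created: date
    retention_citation: str  # a reference to an external schedule entry, not a date


# Hypothetical external registry: schedule entry -> retention period in years.
schedule_registry = {"FIN-07": 7}


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 February to 28 February if needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def disposal_date(record: Record, registry: dict) -> date:
    """Resolve the citation against the registry at the time of the query."""
    return add_years(record.created, registry[record.retention_citation])
```

Because the record carries only the citation, a revision to the schedule in the registry changes the resolved disposal date without any alteration to the record's own metadata.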
QUESTION: Not all records will be created with equally perfect metadata. Indeed, risk-based decisions taken by organizations in structuring their records' capture are likely to result in conscious decisions to exclude certain evidential metadata. What are the implications of incomplete metadata on an individual organization level and on a societal level? Does the absence of data as a result of policy need to be noted? And if so, how?
QUESTION: Since metadata has owners, how do owners administer records metadata over time? In particular, since records contain records, how are the layers of metadata exposed for management and administrative needs? If internal metadata documenting dependencies can slip through the migration process, we will end up with records that cannot serve as evidence. If protected records within unprotected records are not protected, we will end up with insecure records environments, and so on.
METADATA SEMANTICS: Metadata required for recordness must, logically, be standard; that required for administration of recordkeeping systems is extensible and locally variable.
CONCLUSION: Records metadata must be semantically homogeneous but it is probably desirable for it to be syntactically heterogeneous and for a range of protocols to operate against it. Records metadata management system requirements have both an internal and external aspect; internally they satisfy management requirements while externally they satisfy on-going recordness requirements.
QUESTION: In principle, the BAC could be expressed as Dublin metadata and insofar as it cannot be, the Dublin metadata will be inadequate for evidence. What other syntax could be used? How could these be comparatively tested?
QUESTION: Could Dublin Core metadata, if extended by qualifying schema, serve the requirements of recordness? Records are, after all, documents in the Dublin sense of fixed information objects. What would the knowledge representation look like?
METADATA SOURCES/RULES: Metadata adheres to records after a precipitating record-event (action).
CONCLUSION: The metadata has to come either from a specific user/session or from rules defined to extract data either from a layer in the application or a layer between the application and the recording event.
QUESTION: Strategies for metadata capture currently locate the source of metadata in the API layer or the communications system (using data provided by the application, where an analysis supports defining which data are needed and where they can be obtained), in the user interface layer, or in the business rules defined for specified types of communication pathways. Can all the required metadata be obtained by some combination of these sources? In other words, can all the metadata be acquired from sources other than content created by the record-creator for the explicit and sole purpose of documentation (since such data is both suspect in itself and the demand for it is annoying to the end user)?
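A test of this question could start from something as simple as the sketch below, which merges values offered by several capture sources and reports which required elements remain unfilled. The element names in `REQUIRED`, the source labels in the usage example and the function `capture_metadata` are all hypothetical; they are not the BAC element set.

```python
# Hypothetical set of elements a record must carry; illustrative names only.
REQUIRED = ("creator", "recipient", "transaction_type", "timestamp", "application")


def capture_metadata(sources):
    """Merge metadata from capture sources given in order of precedence.

    `sources` is a list of (source_name, values) pairs. The first source to
    supply a required element wins. Returns the captured values, the source
    each value came from, and the list of required elements still missing.
    """
    captured, provenance = {}, {}
    for source_name, values in sources:
        for element, value in values.items():
            if element in REQUIRED and element not in captured:
                captured[element] = value
                provenance[element] = source_name
    missing = [element for element in REQUIRED if element not in captured]
    return captured, provenance, missing
```

For example:

```python
sources = [
    ("communications system", {"timestamp": "1997-05-29T10:00:00", "recipient": "accounts"}),
    ("application interface", {"application": "purchasing", "creator": "j.smith"}),
    ("business rules", {"transaction_type": "purchase order"}),
]
captured, provenance, missing = capture_metadata(sources)
```

An empty `missing` list for a given class of transaction would suggest that nothing has to be demanded from the record-creator; a non-empty list identifies exactly which elements no surrounding layer could supply.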
QUESTION: Does the capture of metadata from the surrounding software layers require the implementation of a business-application specific engine, or can we design generic tools that provide the means by which even legacy computing systems can create evidential records if the communication process captures the interchange arising from a record-event and binds it with appropriate metadata?
CONTEXTUAL METADATA & SEMANTICS: Contextual metadata locates the record-transaction within a specific business application context.
CONCLUSION: A representation of the business context must exist from which the record-creating event can obtain metadata values.
QUESTION: What kinds of representations of business processes and structures can best carry contextualizing metadata at this level of granularity and simultaneously serve end user requirements? Are the discovery and documentation representations of provenance going to have to be different?
QUESTION: Can a generic level of representation of context be shared? Do standards such as STEP provide adequate semantic rules to enable some meaningful exchange of business context information?
RECORDS MANAGEMENT WITH METADATA: Structural metadata of records will enable "least-loss" migration of evidence across time.
CONCLUSION: Structural metadata must both define the dependent structures and identify them to a records management environment which is "patrolling" for dependencies which are becoming risky in the evolving environment in order to identify needs for migration.
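The "patrolling" role can be sketched as a scan over the structural metadata of each record against a table of current risk estimates. This is a minimal illustration: the numeric risk scale, the threshold, the dependency names in the usage example and the names `Record` and `patrol` are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    identifier: str
    # Structural metadata: the formats, encodings or software the record depends on.
    dependencies: list = field(default_factory=list)


def patrol(records, risk_levels, threshold):
    """Return {record identifier: [risky dependencies]} for records needing migration.

    `risk_levels` maps a dependency name to a risk estimate between 0 and 1.
    A dependency absent from the table is treated as risk 0.0, which is an
    assumption of this sketch; a real environment might instead flag it.
    """
    flagged = {}
    for record in records:
        risky = [d for d in record.dependencies if risk_levels.get(d, 0.0) >= threshold]
        if risky:
            flagged[record.identifier] = risky
    return flagged
```

For example:

```python
records = [
    Record("rec-001", ["WordPerfect 5.1", "ASCII"]),
    Record("rec-002", ["ASCII"]),
]
risk_levels = {"WordPerfect 5.1": 0.8, "ASCII": 0.1}
needs_migration = patrol(records, risk_levels, threshold=0.5)
```

The scan only works if every dependency is both declared in the record's structural metadata and named in a way the patrolling environment recognizes, which is the two-part requirement stated in the conclusion above.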
QUESTION: Are the risk times for such dependencies essentially universal, even if slightly different degrees of risk are experienced locally? If so, does this mean that migrations could reasonably take place universally?
QUESTION: Using past experiences of expired standards as an indicator, can the defined structural metadata support necessary migrations? Are the formal standards of the source and target environments adequate for actual record migration to occur?
QUESTION: What metadata is required to document a migration itself?
SOCIETAL/LEGAL FOUNDATION: One benefit of the BAC is that records conforming to it can be retained in any BAC-conformant environment. A legally recognized system of this sort could significantly reduce risk.
CONCLUSION: BAC-conformant environments could reduce overheads if standards supported the uniform management of records from the point of issue to the point of receipt. Could redundancy now imposed by both paper and electronic processes be dramatically reduced if records referenced other records?
QUESTION: Reduction of redundancy requires record users to impose post-creation metadata locks on records created with different retentions and access controls. To what extent is the Warwick Framework relevant to these packets, and can architectures be created to manage these without their costs exceeding the savings?
IMPLICATIONS FOR RESEARCH: The various questions posed above are now sufficiently formulated that their resolution depends on development and implementation of test environments. Test environments have the disadvantage of not taking advantage of the economies of scale that would occur if the standards proposed by the BAC and other metadata models were incorporated as native elements within computing communications environments (so economic outcomes will still have to be projected), but they could support answering a range of practical questions, including the issues about multiple and differing approaches to the storage and migration of metadata. These proofs-of-concept should be formulated and staged to address issues in a way that assures that high-cost tests are not conducted before lesser-cost tests have been passed.
A number of issues about proper implementation depend on the evolution (currently very rapid) of metadata strategies in the broader Internet community. Issues such as unique identification of records, external references for metadata values, models for metadata syntax, etc. cannot be resolved for records without reference to the ways in which the wider community is addressing them. Studies that are supported for metadata capture methods need to be aware of, and flexible in reference to, such developments.
"Item Level Control and Electronic Recordkeeping", Archives and Museum Informatics, vol. 10, no. 3, p. 195-245
"The State of Electronic Records Management Worldwide", Archives and Museum Informatics, vol. 10, no. 1, p. 3-40
with Wendy Duff, "Grounding Archival Description in the Functional Requirements for Evidence", Archivaria, no. 41, Spring 1996, p. 275-303
"Archiving and Authenticity", Research Agenda for Networked Cultural Heritage (Santa Monica CA, Getty Art History Information Program, 1996), p. 63-67
"Standards for Networked Cultural Heritage", Archives and Museum Informatics, vol. 9, no. 3, p. 279-307
with Ken Sochats, "Metadata Requirements for Evidence", in press, in "Automating 21st Century Science - The Legal, Regulatory, Technical and Social Aspects of Electronic Laboratory Notebooks and Collaborative Computing in R&D", ed. Rich Lysakowski and Steve Schmidt, TeamScience Publishing, 1996