Item Level Control and Electronic Recordkeeping

by David Bearman, Archives & Museum Informatics

Introduction: Archival Principles and Collective Control

Archivists are keenly aware of the huge volumes of records created by the day-to-day activity of a modern organization. They understand both that records are essential to the conduct of all the business processes of the organization and that the vast majority of records should be disposed of when no longer required by the specific activity for which they were created. Ideally each record should be kept only for the amount of time deemed necessary for corporate purposes but be immediately available during that time as evidence of the transaction for which it was created. Practically speaking, this has meant that records managers and archivists have had to make sure that organizations segregated records according to the length of time they were expected to be of continuing value. In the paper world, this has been achieved by establishing discrete filing systems for the records of different transactions, and retaining these, under administrative controls and in proximity to where they are used, as long as they are serving a current business purpose. Records are arranged within these recordkeeping systems in the order in which the business process they serve needs to retrieve them, and whole systems, or discrete physical parts of such systems, are retired to records storage, accessioned to archives, or destroyed when their "schedule" dictates. For as long as the records are kept, the "original order" physically imposed by the office filing procedure is the basic method for supporting access within the recordkeeping system, a valuable clue to the meaning of records and an essential component of their evidential value. This approach to records is neither necessary nor desirable for electronic records.

The essential difference between electronic and paper records is that the former are only logical things while paper records are usually thought of only as physical things.[2] Physical things can be stored in only one place and in one observable order; logical things can be physically housed in many places but seen to be together. They can appear to have different arrangements depending on the views accorded to their users. In other words, the properties of logical things are associated with them through formal, defined, logical relations while the properties of physical things are associated with them as material objects with concrete locations, attachments and marking. Archival and records management methods have been developed to manage physical things, which has both limited and, in some cases, simplified them. Archival theories have been developed to validate these practices and are therefore based on the assumptions inherent in managing physical things. As a consequence, archivists have elevated pragmatic responses to the nature of physical things to the level of ideology. The more we examine electronic records, the more instances of such an elevation of practice to theory are revealed.

For example, archivists have traditionally insisted that the internal arrangement of records within series must be retained undisturbed from the way the records creators kept them. The theoretical importance of this was that the order in which recordkeeping systems retained physical records dictated how they could be used within the office of origin and therefore provided evidence of the conduct of the business processes which created them. When records were re-filed into some order considered more "convenient" by a subsequent custodian (as when an alphabetical or chronological order was imposed where none previously existed) the evidence which record order had conveyed about the business process context and the methods of retrieval supported by the original office was lost. Even re-filing items that had been misplaced in the original files into their "correct" locations distorted the record and could have implications for inferences made about the records as evidence. What the physical order was preserving was the logical associations between records, but this was not understood.

In electronic environments the methods by which the originating office can use the records are not a reflection of the physical storage order but are instead established by the capabilities of the software environments in which the records are used. These software functionalities are likely to change over time. The capabilities of any given individual within these systems are further determined by the permissions and views accorded to those individuals in different relations to the records, and these also change over time and with each user. Finally, the ways in which records are "filed" depends on the assignment (or lack of assignment) of data values or on structural links defined in software architectures. Because the way that the records are organized on any storage device will not give evidence of their use or the business processes that employed them, we must rely for such evidence on metadata (information about information systems and business processes) created contemporaneously with the record and its interaction over time with software functionality and user profiles. Thus the principle regarding original order is revealed to be that we must document the context of creation and use, including the logical associations of records in recordkeeping systems, in order to understand records as evidence, not that we need to literally "preserve" the original order of the physical records.

The concepts of series, record groups or fonds themselves are not truly physical but instead are logical associations. Because it is often the case that physical series or physical systems correspond to logical series and logical systems, archivists and records managers find it convenient to hold on to their physical notions. But in an electronic environment, where the boundaries of physical systems need not, and often do not, correspond to the boundaries of logical systems or organizations, the series and the fonds must be recognized as classification schemes intended to convey information about the transactions by which the records were created and by which they were maintained. Not only can the electronic record not be expected to have the physicality associated with either record series or fonds in paper records, it is unnecessary to impose it. All the actions we would have taken "collectively" based on physical proximity of records in traditional recordkeeping systems, we can take logically, without such physical aggregation, if appropriate logical relations are documented at the item level.

Beyond the pragmatic limits imposed by arrangement of paper records, physical aggregation has reflected the administrative boundaries of custody because physical control dictated who could see records and use them, which offices had access, and when records were retained and destroyed. The fonds reflected the ultimate legal and administrative responsibility for records and their recordkeeping systems. The procedures of this administrative entity were crucial for estimating the trustworthiness of the records inherited at a later date. The archival principle of respect des fonds is, therefore, a pragmatic response to the fact that physical records lived for many years within the custody and under the control of an active administration whose practices determined the value and legitimacy of the records. Knowing that entity and its processes enabled administrators to hold the proper persons responsible for recordkeeping and historians to judge the value of the remaining record.

These indirect methods are not necessary in the electronic environment where much more direct responsibility, process controls, and respect des fonds can be imposed by control over the record metadata and the way systems can interact with records based on that metadata. These methods of control are discussed in detail in the final sections of this paper. Before we address those implementation issues, however, it is necessary to examine the dysfunctions introduced into recordkeeping by the assumptions of paper-based records practices, particularly their focus on the "life-cycle" of records, and to introduce a new set of requirements for recordkeeping grounded in the social and legal concept of evidence.

I. The "Records Life-Cycle" Metaphor and its Consequences

The framework within which archives have operated in North America has been defined by stages in the custody of physical records and their collective management. Simply, and a bit simplistically but still reasonably accurately, we can say that during their active life, records in North America are in the custody and under the control of the records creator. During their inactive life, aggregates of records (often entire series but frequently accessions within them) are transferred into the custody and control of the records manager. And during their archival life, these record series, and the larger collectivities they comprise, come into the custody and control of archivists.

In the United States especially, there has been a firm demarcation between records managers and archivists, between active records and inactive and archival records, and between records in agency custody and records in archival custody. The history of these distinctions is grounded in methods developed by the newly established National Archives in the 1930's-1950's as it struggled to deal with the rapid growth in the size of the Federal record and the simultaneous collapse of agency record keeping regimes. The concept of the life-cycle became firmly ingrained in archival thinking in the 1960's, and the separation of the records management and archival professions became increasingly pronounced until ultimately the National Archives gave up records management when it became an independent agency in the late 1980's. The consequences were predictable because even in the paper environment, only good recordkeeping from the moment of records creation can ensure good evidence and good archival control. In the electronic recordkeeping environment, separation between stages in the management of records from the moment of their creation onward, will not be viable.

Traditionally, archivists and records managers could not keep track of each individual record in a paper system, nor provide access to specific records based on their contents or on the specific transaction in which they took part unless the recordkeeping system of the office of origin used that characteristic as the basis for arrangement or a secondary index for retrieval. Hence archivists and records managers schedule, appraise, accession or destroy, describe and retrieve collectivities of records, generally at the series level. Because this practice does not best satisfy many users, the recordkeeping professions have developed theoretical defenses for it, but it is preferable to accept the obvious - we manage paper records collectively because it is too expensive to manage them individually. In the case of the electronic record, the reverse is true. It will be both more efficient and less expensive to control and describe records at the item level from the moment of their creation than it is to try to carry over into the electronic environment the methods of the paper world.

These two general assertions about the need to integrate records management and archiving, and the desirability of managing electronic records at the item level, can be understood better by looking in more detail at the traditional life-cycle stages and how they are impacted by electronic records.

1. Appraisal & Scheduling

Our practice has been to schedule aggregates of records, based on examination of those records. When those aggregates were essentially homogeneous, that is, when the record series in which the records were kept consisted of records of a single type of business transaction, this worked reasonably well. When the record series were in fact defined as a matter of convenience to cover a large variety of types of business transactions, it worked less well. Often, as with a so-called "correspondence" series, in which the form or genre of the communication, rather than the specific type of business transaction, defines the series, our practice failed altogether to dispose of records at the earliest practical time or to ensure keeping only records required for longer periods or archival purposes.

In the paper world we have blamed these problems on records managers. If the records managers set up record keeping systems to distinguish between records of different transactions and keep them in separate series, they could be appraised more accurately. But the records manager could justly reply that the physical filing series which are established need to support the on-going functions of the office of origin. Simply establishing new physical series for each type of transaction would often unnecessarily complicate that business process and frustrate its retrieval and use of records.

In a universe of logical records we can easily have the best of both worlds. We can develop information systems which support a variety of views of those records to support different users in the organization since "arrangement" is a matter of logical views rather than a single ordering of physical objects. At the same time, recordkeepers can dictate that each record must retain information about the specific transaction by which it was created, thereby satisfying our evidential requirements. Instead of appraising records, we can appraise the requirement for documentation of the activity or transaction that generated the record. Appraisal takes place in the aggregate still, now focused on the actions not on the documents, but control is at the item level where the effect of the appraisal is recorded in each transaction record.

Electronic records will not satisfy the requirements of evidence unless they are irrevocably associated with the information about the context of their creation and receipt and information about their structure which is adequate for their subsequent reconstruction. This information (called metadata, or information about the data) is captured at the item level. Based on knowledge of the specific types of transactions which generated records, we can appraise their long-term value. When a transaction of that type takes place, its record can be made to carry scheduling metadata. In this way, scheduled retention can be associated with individual electronic records, based on the nature of the transaction and business requirements for its maintenance, from the moment of their creation.
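
To make this concrete, the following is a minimal sketch (in Python, with invented transaction types, field names and retention periods, not drawn from any actual schedule) of how scheduling metadata might be stamped onto each record at the moment of its creation, based solely on the type of business transaction:

```python
from datetime import date, timedelta

# Illustrative retention rules keyed by business transaction type; both the
# transaction types and the periods are invented for this sketch.
RETENTION_RULES = {
    "purchase_order":   {"retain_years": 7,    "final_disposition": "destroy"},
    "policy_directive": {"retain_years": None, "final_disposition": "archive"},
}

def stamp_scheduling_metadata(record: dict, transaction_type: str) -> dict:
    """Attach scheduling metadata to a record at the moment of its creation."""
    rule = RETENTION_RULES[transaction_type]
    meta = record.setdefault("metadata", {})
    meta["transaction_type"] = transaction_type
    meta["final_disposition"] = rule["final_disposition"]
    if rule["retain_years"] is None:
        meta["dispose_after"] = None                     # permanent retention
    else:
        meta["dispose_after"] = (
            date.today() + timedelta(days=365 * rule["retain_years"])
        ).isoformat()
    return record
```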

2. Disposition

At some point, the archivist or records manager needs to act on the scheduling to effect the correct disposition (accessioning or destruction) of the records. In the paper records environment, collective level actions at this stage are often frustrated by item level exceptions. For example, a record series may be scheduled for destruction but individual records within the series may have been the subject of lawsuits that prohibit their destruction. In paper, the records manager must decide whether to separate a physical record from the series, thereby violating its integrity, or to leave it in place and create a problem in managing disposition later. Occasionally, a record series may be scheduled for archival retention but individual records within the series may need to be returned to their originators as proprietary commercial secrets or destroyed for reasons of privacy protections. Such differences between the needs of the series as a whole and individual records within it currently cause significant administrative headaches and are not easily accommodated, but they are also not rare events.

If scheduling metadata is kept with each individual electronic record, any given record can be retained or destroyed when scheduled without acting on others. Computing systems make it simple to "collect" all the items which are supposed to be disposed in a particular way and act on them even though they are not physically stored together. If individual records that normally would be destroyed based on the rules associated with them at the time of their creation are covered by a court order, for example, their identification in response to that order would prevent their destruction at the time the rest of the series was destroyed. Similarly, the rules associated with disposal could ensure that records could be easily separated from a series, as when contractual obligations require items to be returned to a commercial owner. The nature of the metadata link to the record is such that this separation could occur without necessarily eradicating all traces of the fact that the records which were returned once existed as part of the series. While the contents of records would have been removed, metadata reflecting the existence of a transaction could (and I believe normally would) be retained by the archival system.
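
A hedged illustration of how item-level disposition could then be computed: a routine that collects records whose scheduled disposal date has passed while automatically excluding any item flagged with a legal hold. The field names here are assumptions continued from the sketch above, not part of any standard.

```python
from datetime import date

def records_due_for_disposal(records: list[dict], today: date) -> list[dict]:
    """Select records whose scheduled disposal date has passed and which are
    not frozen by a legal hold recorded in their item-level metadata."""
    due = []
    for rec in records:
        meta = rec["metadata"]
        dispose_after = meta.get("dispose_after")    # None means keep permanently
        if dispose_after is None:
            continue
        if meta.get("legal_hold", False):            # e.g. subject of a lawsuit
            continue
        if date.fromisoformat(dispose_after) <= today:
            due.append(rec)
    return due
```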

3. Description

The practical response to providing intellectual control over large volumes of records accessioned from paper recordkeeping systems was to employ top-down, collective description of records aggregates. Archivists typically described records after they were accessioned, which meant that certain things about the records were known only at the collective level - such as their provenance, the order of their arrangement within the recordkeeping system (and from that, implicitly, what could be inferred about their use), and their physical location. Because these attributes were known about the entire series, collection, or fonds, they were recorded in finding aids at that level of description. The individual items (record and filing units) inherited attributes associated with the larger collectivity, which in turn was presumed to have the properties associated with any larger aggregates of which it formed a part, such as the collection or fonds.

Item-level methods of managing paper based descriptions would have involved massive redundancy and substantially greater administrative effort, since all information associated with the fonds, series and file would have had to be written on each of the thousands of separate cards and file folders on which details about specific items were recorded. These detailed cards or folders would then be filed according to their item level content descriptions and the knowledge of their context would be carried by the redundantly recorded data. To some extent these practices were incorporated into registry office functions in an age of paper recordkeeping which passed in the United States more than fifty years ago, although they can still be found in some parts of the world.[3]

It goes almost without saying that automatic context and structure description within the metadata of electronic records at the item level would serve user needs better than collective description. Item level information is fundamentally more valuable because it can generate more valid collective level data in addition to serving the needs of item documentation. For example, archivists have described the date span of a series of records from the earliest to the latest, as in 1953-1987. In fact, of course, the record series may have individual items dating from December 11, 1953 to February 7, 1987, or only parts of the first and last years respectively. In addition, it may contain no records at all which are dated in the twenty months between March 5, 1976 and November 14, 1977. Automatic analysis of item level electronic data might reveal that eighty percent of the records were created in one nine-month period. Many such important facts about the collection are completely disguised by the collective date range but would be easily revealed by data recorded at an item level within an automated control system. A researcher could even see a pictogram of the dates of the records when asking about the series as a whole.
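
As an illustration of the kind of collective description that can be generated automatically from item-level data, the following sketch (function and field names are invented) derives the true date span, any long gaps, and a per-year count suitable for a simple pictogram:

```python
from collections import Counter
from datetime import date, timedelta

def describe_date_coverage(item_dates: list[date], gap_days: int = 365) -> dict:
    """Derive collective description from item-level dates: the true span,
    any gaps longer than gap_days, and a per-year count that could drive a
    simple histogram or pictogram of the series."""
    if not item_dates:
        return {"span": None, "gaps": [], "records_per_year": {}}
    ordered = sorted(item_dates)
    gaps = [(a, b) for a, b in zip(ordered, ordered[1:])
            if (b - a) > timedelta(days=gap_days)]
    per_year = Counter(d.year for d in ordered)
    return {"span": (ordered[0], ordered[-1]), "gaps": gaps,
            "records_per_year": dict(per_year)}
```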

Such item level description, even of such a simple element of information as the date of specific records, has not been a regular part of archival practice because of the expense of acquiring such data in a paper environment, not because archivists did not realize that researchers would find such metadata valuable. Experience with researchers in some eighteenth and nineteenth century manuscript collections and registry systems which provided item or file level documentation has proved the value of such detailed description.

4. Retrieval and Access Control

But even where item level descriptions have been available, as in some heavily indexed early manuscript collections, paper based description systems allow researchers to retrieve records only along the lines of the filing arrangement: if records were filed by correspondent, for example, finding the letters of a particular person is easy, but assembling all the correspondence written in a single week is a burdensome task. Again, little needs to be said about how these difficulties are obviated in an electronic environment, nor about the benefits this would give to the researcher who, as always, is pressed for time and needs to use it efficiently. What might be worth noting is how much more efficiently this researcher could work. For example, delivering all the materials needed to a researcher's desk could be achieved more efficiently (without the airfare and hotel bill, and with the opportunity for simultaneous use by others).

Item level metadata also enables us to provide or limit access to materials which, for reasons of security, confidentiality or privacy, can only be viewed by some people, at some times and with some content masked. In paper there is no easy way to manage such records with the rest of the series or to administer access control without item level review. For records maintained in electronic form, appropriate metadata provided at the time of records creation can establish conditions governing access to the whole or parts of a record and pertaining to different users in different ways, and automatically ensure that records "show" themselves differently, and appropriately, to each class of different users. Confidence that records can be managed in this way can be critical to deciding that they can be retained at all, because if records with proprietary or personal information cannot be assured of safety, they are likely to be destroyed while still of value for other purposes due to risks associated with keeping them over time in less easily administered access regimes.
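
A sketch of how such access metadata might be enforced at retrieval time, with the record "showing" itself differently to each class of user; the metadata elements used here are illustrative assumptions rather than a defined standard:

```python
def view_of_record(record: dict, user_class: str) -> dict | None:
    """Resolve the view of a record appropriate to a class of user, based on
    access metadata captured at creation. Element names are illustrative."""
    access = record["metadata"].get("access", {})
    if user_class not in access.get("allowed_classes", []):
        return None                            # record does not "show" itself at all
    view = {"id": record["metadata"]["record_id"],
            "content": dict(record["content"])}
    for field in access.get("masked_fields", {}).get(user_class, []):
        view["content"][field] = "[REDACTED]"  # mask only what this class may not see
    return view
```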

II. An Item Oriented Approach to Managing Electronic Records

1. Requirements

Over the past several years, archivists have been seeking an intellectual framework that can dictate an effective strategy for electronic recordkeeping. During the past three years, I believe such a framework has been defined by me and my colleagues at the University of Pittsburgh School of Library and Information Science.[4]

We began by asking what archives are for, and answered that they preserve evidence of human transactions for reasons of organizational accountability and personal identity. We conducted a focus group of experts in archives and records management to define the attributes of evidential records. Then we sought "warrant" in the literature of law, regulation and professional best practices that could help us define the necessary and sufficient attributes of evidence and by analysis of such statements of "literary warrant" derived a specification of the attributes of "recordness" or evidentiality.[5] The two sources were found to be in basic agreement, thereby supporting a consistent statement of the functional requirements for evidence in recordkeeping.

The specification of these functional requirements defines twenty properties which are identified in law, regulation and best practices throughout the society as the fundamental properties of records. The literature of specifications recognizes the danger of natural language because it is often ambiguous, imprecise and subject to a high degree of interpretation. In order to ensure that systems would be able to rigorously enforce the assignment of these characteristics, we expressed the functional requirements in formal English as "production rules" or logical statements of simple observable attributes.[6] Because this forced us to avoid the ambiguity usually associated with requirements expressed as prose, the production rule formalism informed the process of articulating the prose requirements in addition to being used as a representation mechanism for those requirements. It also allowed the specifications to be logically refined, in a top-down decomposition, such that the most atomic component statements of the specifications were, in principle[7], observable states or properties.

The necessary and sufficient characteristics of data purporting to be records amount to a concrete set of metadata which, when present, satisfies the specification. By the requirements of evidence, if this metadata is inextricably linked to, and retained with, the data associated with each business transaction, it will guarantee that the data object will be usable over time, be accessible only under the terms and conditions established by its creator, and have the properties required to be fully trustworthy as evidence and for purposes of executing business.

The functional requirements for evidence in recordkeeping[8] dictate the creation of records that are comprehensive, identifiable (bounded), complete (containing content, structure and context), and authorized. These four properties are defined by the requirements in sufficient detail to permit us to specify what metadata items would need to describe them in order to audit these properties. This descriptive metadata cannot be separated from them or changed after the record has been created. Several additional requirements define how the data must be maintained and ultimately how it and other metadata can be used when the record is accessed in the future. The metadata created with the record must allow the record to be preserved over time and ensure that it will continue to be usable long after the individuals, computer systems and even information standards under which it was created have ceased to be. The metadata required to ensure that functional requirements are satisfied must be captured by the overall system through which business is conducted, which includes personnel, policy, hardware and software.
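
The following fragment is not a reproduction of the Pittsburgh production rules themselves, but it suggests how requirements such as completeness, identifiability and authorization can be reduced to testable predicates over observable item-level metadata (all field names are assumptions made for illustration):

```python
def is_complete(record: dict) -> bool:
    """Illustrative predicate: a record is treated as complete only if content,
    structural metadata and contextual metadata are all present."""
    meta = record.get("metadata", {})
    return bool(record.get("content")) and bool(meta.get("structure")) \
        and bool(meta.get("context"))

def is_identifiable(record: dict) -> bool:
    """A record is bounded/identifiable if it carries a unique identifier."""
    return bool(record.get("metadata", {}).get("record_id"))

def is_authorized(record: dict) -> bool:
    """A record is authorized if it names the agent empowered to create it."""
    return bool(record.get("metadata", {}).get("context", {}).get("authorized_by"))
```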

This specification for evidence serves to identify the data required for such purposes as were proposed in last year's draft NIST standard for a "Record Description Record" or the recent report of the Research Libraries Group/Commission on Preservation and Access Task Force on Archiving of Digital Information.[9] The approach has been embodied in independent proposals for electronic recordkeeping made by Astra to the Swedish National Archives[10] and in proposals influenced by our work from an influential group of Australian archives.[11] In addition, we have seen direct implementations of our work in projects in Indiana, Vermont, and Philadelphia, funded by the NHPRC as follow-ons to our study.[12]

It is important to understand that the requirements for evidence in recordkeeping are not the same as the requirements for an electronic records management system. An application system will be implemented in a concrete place and time, and operated by real people in the course of specific assigned duties. As a consequence, requirements for system security (as contrasted with records security, or integrity), systems compatibility, interfaces and standards (as contrasted with records inter-operability and migratability), and support for concrete business processes of the records management function would need to be addressed in a comprehensive statement of the requirements for acquiring a records management system, along with the requirements of evidence in recordkeeping.[13]

We begin with a single, simple conceptual framework encompassing what constitutes a business transaction, evidence and an acceptable record:

Transactions

Transactions (trans-actions) by definition are actions communicated from one person to another, from a person to a store of information (such as a filing cabinet or computer database) and thereby available to another person at a later time, or communications from a store of information to a person or another computer.[14] Because such trans-actions must leave the mind, computer memory, or software process in which they are created (or must be used, "over-the-shoulder" as it were, by a person with access to the same computer memory), they must be conveyed across a software layer, and typically across a number of hardware devices.

Evidence

Not all data that has been communicated or created by information systems in contemporary organizations is captured as evidence. Information systems are generally designed to hold timely, non-redundant and manipulable information, while recordkeeping systems (information systems designed to capture and maintain evidence) store time bound, inviolable and redundant records. Therefore, application environments that support the ongoing work of the organization frequently, or even usually, do not satisfy the requirements for creating evidence. Recordkeepers need to provide in-house information managers with a rigorous definition of the distinct requirements for recordkeeping. Without explicit and testable specifications, computing application and electronic communications systems will continue to fail to satisfy the requirements for recordkeeping and will be a growing liability to companies even while they are contributing directly to day-to-day corporate effectiveness.

Records

Records are at one and the same time the carriers, products, and evidence of business transactions. Any organization that wants to use electronic documentation as evidence needs to create records. Records oriented professionals within organizations, such as senior management, legal counsel, auditors, and Freedom of Information and Privacy officials, require records, not just information, to support their work. Business transactions must create records which logically are metadata encapsulated objects, although in our implementation model records need not be physically stored in this manner. In these records, the contents of the transaction would be preceded by information identifying the record, the terms for access, the way to open and read it, and the business meaning of the communication. Metadata encapsulated objects may contain other metadata encapsulated objects, because records frequently consist of other records brought together under a new "cover", as when correspondence, reports and results of database projections are forwarded to a management committee for decision.
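
A rough sketch of the logical shape of such a metadata encapsulated object, including the possibility of nesting; the field names are illustrative assumptions, not a prescribed layout:

```python
from dataclasses import dataclass, field

@dataclass
class MetadataEncapsulatedObject:
    """Illustrative shape of a metadata encapsulated object (MEO): the content
    of the transaction preceded by identification, terms of access, rendering
    instructions and business context. MEOs may nest, as when several records
    are forwarded under a new 'cover'."""
    record_id: str
    terms_and_conditions: dict
    structure: dict            # how to open and read the content
    context: dict              # the business meaning of the communication
    content: bytes
    enclosed: list["MetadataEncapsulatedObject"] = field(default_factory=list)
```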

Our concept of evidence makes it important to know when records were used and how, in what ways they were filed, classified and restricted in the past, and, if they have been destroyed under proper disposition authority, when and by whom that act took place. It is also important to know what redacted versions of records were released over time. Transactional data reflecting the history of a record's use (events in its life subsequent to creation) provides the documentation traditionally associated with archival description, but instead of such data residing only at aggregate levels, it is possible to define electronic records metadata structures that enable us to search for specific records based on information about the instance or concrete business transaction which generated them.

In addition to ensuring that the data we capture is a record, and can serve as evidence, metadata should be defined so that it makes data objects communicated across software and hardware layers (and therefore any communications over a network):

These properties, while important for simplifying the management of records (especially in an inter-networked environment in which hundreds of millions of records may be created daily), can be made direct consequences of keeping records if attention is paid to the structuring of the metadata that makes records evidence. Appropriate metadata can ensure a degree of software independence.[15] In addition, an ideal model would ensure that all record objects we create would be interoperable between recordkeeping systems environments to give them independence of specific custodial settings. Furthermore, a system for metadata management which has appropriate modularity and content standardization can support formally auditing the business system which generated the information object. It can enable the auditor to locate the transactions and the software, hardware, procedures and policies surrounding a transaction, to determine where they contribute, or fail to contribute, to the creation, maintenance and use of evidence. While no system of management can be self-auditing, a communications system built to ensure that appropriate metadata is captured for evidence can support a level of management accountability that it was not possible to implement or enforce in paper-based environments.

2. A Reference Model for Business Acceptable Communications

A rigorous technical standard is required if we are going to implement the functional requirements for evidence in recordkeeping within all electronic communications environments in the future. The goal of such a standard would be to make communications received over networks trustworthy for the purposes of conducting business. It would be designed to ensure accountability and protect organizations against the risks of loss of proof of their past behavior. As a consequence, it would greatly simplify:

  the management of huge volumes of communications from heterogeneous hosts,
  the proper retention and disposition of records,
  auditing the use of records for business, and
  the appropriate management of private, secure, proprietary or confidential data.

A side effect of such a recordkeeping standard is that it would enhance the business value of the data that it preserves. These business benefits would include:

For archivists, the most important consequences of adoption this kind of standard would be that electronic communications carried on in the regular course of business would always be captured in a way that was:

In December 1994, I proposed such a standard in a draft "Reference Model for Business Acceptable Communications" (abbreviated as BAC).[16] The Reference Model recognized that while the metadata requirements for evidentiality or "recordness" are necessary components of business acceptable communication, and must be accommodated by any reference model, they are not the only source of functionality of such electronic records. Requirements of object standardization efforts designed to provide support for a system of access and use rights management, networked information discovery and retrieval and registration of intellectual property have led to elaboration beyond the properties identified as necessary for assurance of evidence. Specifically, over the past eighteen months I have modified the model to reflect requirements being addressed by other efforts to develop widely applicable models for network metadata management, such as those which support:

As a consequence the model now supports an environment of "registration" services for unique domain identification, "resolver" services for dealing with terms and conditions of access or use, and information discovery and retrieval services. These extra requirements are seen as positive and supportive of the overall tactics of the BAC model because they validate its assumptions and represent commitments by other actors to participate in the scheme.

The emergence of a class of electronic commerce applications based on encapsulated and encrypted objects and token interchange between resolvers and object users (such as the IBM infoMarket launched in May 1996) further validated the assumptions I made in 1994/95. Not only are these classes of applications promising in themselves, I believe they are quite likely to be implemented in widespread network-based toolsets by Microsoft, Netscape, and others as well as to find their way into API tools and communications structures.

The proposed Reference Model for Business Acceptable Communications attempts to specifically address these additional requirements as part of a dialog that must take place between advocates of mechanisms to support these different fundamental purposes through an overall structure for metadata encapsulated objects.[20] It does so by clustering the metadata categories and elements so as to achieve functional modularity[21] and then arranging them into six layers to support the technical processing and interchange requirements of a widely distributed networked environment. The six layers and their clusters are:

Layers and Data Clusters in a Proposed Reference Model for Business Acceptable Communications

Handle Layer
  Registration Metadata/Properties
  Record Identifier
  Information Discovery and Retrieval

Terms & Conditions Layer
  Rights Status Metadata
  Access Metadata
  Use Metadata
  Retention Metadata

Structural Layer
  File Identification <Repeatable for each file>
  File Encoding Metadata <Repeatable for each file>
  File Rendering Metadata <Repeatable for each file>
  Record Rendering Metadata <for whole record>
  Content Structure Metadata
  Source Metadata

Contextual Layer
  Transaction Context
  Responsibility
  Business Function

Content Layer
  Content-Description

Use History Layer <Repeatable Transaction Table>
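
For readers who think in terms of data structures, the layer-and-cluster outline above can be expressed as a simple mapping; this is only a restatement of the outline, with element-level detail omitted:

```python
# A sketch of the six BAC layers and their clusters as a simple mapping,
# following the outline above; element-level detail is omitted.
BAC_LAYERS = {
    "handle": ["registration", "record_identifier",
               "information_discovery_and_retrieval"],
    "terms_and_conditions": ["rights_status", "access", "use", "retention"],
    "structural": ["file_identification", "file_encoding", "file_rendering",
                   "record_rendering", "content_structure", "source"],
    "contextual": ["transaction_context", "responsibility", "business_function"],
    "content": ["content_description"],
    "use_history": [],   # repeatable transaction table, one entry per use
}
```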

III. Implementing Item Level Control in Electronic Recordkeeping

However attractive item level control might be in principle, archivists and records managers need to understand how it will work in considerable detail before they are likely to commit themselves to trying to bring it about. The following discussion of implementation approaches and options therefore addresses how:

  1. business transactions, retention requirements, and structural requirements are documented,
  2. metadata and records are stored,
  3. records are uniquely identified,
  4. disposal and migration are managed,
  5. a record is captured from the business information system to the recordkeeping system,
  6. access control and redaction are imposed,
  7. the records are retrieved and delivered to users, and
  8. a history of use is kept for uses which require it according to business rules.

1. Capture

A trans-action is communicated from one physical or logical place to another, whether it is from one person to another, one hardware/software machine to another, or both. As such it crosses a logical switch, and when it does so, it can be captured. What a business considers a transaction, we have called a "business transaction" and the Swedes have more recently dubbed a "causa".[22]

Electronic recordkeeping requires us to distinguish between computer transactions and "business transactions". Most existing information systems are designed to update computer records for software transactions which have no business meaning, such as the background saving of a file on which someone is working or the intermediate saves made while spell-checking a long document, but will typically not create a record of common business transactions which do not change data in the system. Yet some such transactions, such as querying a decision support database, probably do require evidence under most definitions of what constitutes a business transaction. Implementations will need to impose the concept of business transactions, rather than that of systems transactions, on their environment.

Every time a business transaction crosses such a 'switch' implementers will want to create a record of the transaction. This record will consist of the content of the transaction encapsulated with metadata, while allowing the data and systems instructions created by the application to be communicated within the information system, where they will do the work of the application and be available for further manipulation. In other words, the data in the information system continues to act in the way the application designer intended (updating databases, being available for users to store as information copies, etc.), but from the perspective of the recordkeepers, all data resident in the application system becomes a convenience copy, rather than a record, and can be modified under the rules of those systems because the record exists elsewhere, as a separate object, which is not subject to modification.

When users generate a "Business Acceptable Communication", consisting of content encapsulated by all the metadata necessary to ensure its integrity and longevity, the record should be split off from the application systems environment and sent to a separate recordkeeping system or API layer recordkeeping service where it will be kept intact. This means that systems implementers need to construct 'traps' in which they can capture the business trans-action along with the metadata required for evidence. Most of this data, such as the time of the transaction, the identity of the sender and recipient, and the structural dependencies of the data, can be readily adduced from information available to the application and operating environment. The issue is how to generate, and capture, the metadata which identifies the business transaction-type or task of which the record is evidence.
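
One way such a 'trap' might look in code, assuming a hypothetical recordkeeping service with a store() method sitting behind the communications layer; everything here is a sketch of the tactic, not a specification:

```python
from datetime import datetime, timezone

def capture_transaction(content: bytes, sender: str, recipient: str,
                        business_task: str, recordkeeping_service) -> None:
    """Sketch of a capture 'trap' at the communications layer. The
    recordkeeping_service object and its store() method are hypothetical.
    The application keeps working on its own convenience copy; the evidential
    copy is encapsulated with metadata and sent elsewhere."""
    record = {
        "metadata": {
            "record_id": None,                 # assigned at registration (see Identification)
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "sender": sender,
            "recipient": recipient,
            "business_task": business_task,    # identifies the business transaction type
        },
        "content": content,
    }
    recordkeeping_service.store(record)        # the record now lives outside the application
```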

In structured applications, each application system task can be identified and appropriate metadata defined for any transaction resulting from the task. Capture can easily take place using escape code sequences attached to each application system task. In unstructured business uses of application "utilities" such as word processors, email systems, scheduling facilities or spreadsheets, identification of the business transaction in which users are engaged is more difficult. Implementers will require some cooperation of users, although they can enforce this cooperation and make it minimally intrusive if they are clever. The basic strategy is to capture the requisite metadata items by assigning them to forms, style sheets, distribution lists and other objects created by the application software and used for specific business purposes. Users are then encouraged, or required, to employ the relevant "style sheets" and other settings in the conduct of what was previously unstructured activity.

At its most permissive, implementers provide users with value-added functionality launched by business process icons located in the user interface along with the familiar software application icons. The choice of business-process methods would launch capture routines for the resulting transaction and users would be enticed to use the value-added services by their benefits. Fortunately, 1996 seems to be the year in which workflow management tools with object oriented metadata assignment finally come into their own, which, for archivists, means a variety of off-the-shelf applications that can be set into user interfaces between users and the software environment in which they work. These interface managers can make the off-the-shelf application software appear to be a series of business applications and label the records communicated by conducting those business transactions according to the retention, indexing and access requirements of the underlying business requirement. Prototypes of such implementations, which can even be placed over legacy systems, have been done for John McDonald at the National Archives of Canada, and by many others.[23]

Sophisticated approaches to "automatic" metadata capture would provide icons representing the business tasks in which a user may engage, based on process data models and business rules of the organization, rather than icons representing software applications; user selection of tasks would assign metadata to the objects created by the application. For example, a manager drafting a "directive" would open an icon for "Directives" rather than for "word processing". The "Directives" icon would run a configuration "client" designed to open appropriate software applications. The client executes a kind of "macro" which configures the application software in a way that utilizes its style sheets, self-documenting features, views, device drivers, etc. for the particular business function in which the user is engaged. Thus, in the case of our "Directives" client, it would call up the "Directives" style sheet. When the "Directive" is sent, it would pull up the correct distribution list from the company databases and send the directive by email, fax, internal mail, etc. based on settings in that database. It would automatically schedule retention, file the directive in the organizational directives series (perhaps indicating the obsolescence of the prior version) and otherwise execute the business process needs and rules that should be imposed based on the "Directives issuing" process.

Such "clients" also provide the metadata necessary to identify the business transaction when a record of it is created. In a more rigid implementation, we might allow access to application software functionality only through software clients launched by icons in the user interface. In either case the transactional locus metadata, and metadata dependent on that information, such as retention period, access and use restrictions, filing rules and structural metadata are embedded in the selection of the business transaction icon/settings without explicit definition by the user.[24]

A variant of this option is being employed in "Intranet" implementations in which corporate users employ functions provided by the action office to make requests for services. Because the action office defines the types of requests, it can embed metadata into the resulting records. Likewise, in a distributed filing environment, records filed in certain places and under particular headings would be given metadata attributes upon arrival at the filing server application. Records deemed to be lacking appropriate metadata to leave an organizations' boundaries, or even to pass outside the LAN serving one work group, could be assigned those attributes or be returned to sender to provide the necessary descriptors.

2. Documentation

Implementing electronic recordkeeping means ensuring that metadata associated with records provides adequate evidence of electronic transactions at the level of the specific transaction within a defined business process. As such, recordkeeping systems need to interface with business process models to capture business transaction identification data. Some of this data, as discussed earlier, can be obtained from users when they sign on to the system, while other information such as the identity of the business task, authorizations, and the terms and conditions associated with the record of the transaction, must be brought into the system by actions taken after signing on, as discussed briefly above.

It would be possible to design application software that recognizes, or could be set to accept, definitions of business transaction boundaries, but the differences between organizations would likely make implementing such software complex and maintaining its knowledge of local business processes costly. Already, organizations have certain parameterized features of application systems that can be employed to ensure the satisfaction of some of the functional requirements for recordkeeping. For example, word processing systems can support corporate record creating requirements if the users of such systems exclusively employ style sheets defined in such a way as to distinguish between transactions based on their process location and business purposes. Geographic information systems often have reporting features that allow the user to create output files of all the relevant layers of data incorporated into a query response. But these are rarely implemented due to the human costs. Instead what is needed are automatic means of labeling records with contextual and structural metadata.

However we go about it, we will want to document four aspects of the record at the time of capture:

First, we want to assign as much as possible of the contextual metadata to a transaction record based on knowledge given to the system by the user during the routine process of getting into a position to execute the transaction. This means capturing who the user is (and hence the full organizational context) through user sign-in. It means capturing what the user is doing (and hence the full procedural or business context) through a combination of the system functions chosen and the user's explicit selection of a business process in order to facilitate its execution.

Second, we want to assign as much as possible of the structural metadata to a record based on the choices we have made about the format in which to capture records. The relevant structural definition is not the definition of the dependencies of the application environment itself but rather the dependencies of the recordkeeping structure. Thus we could save records from proprietary application systems in a widely accepted standard, such as SGML, and capture all relevant metadata simply by recording that, and any publicly registered DTD that was being followed.[25] If the applications software we are using does not offer the option of saving in SGML, at least we could make a record in RTF or PDF (or the most robust standards to which our current application could natively write). In all cases we would document the dependency on the standard in the record metadata.

Third, we want to capture as much of the content metadata as possible through automatic means and make sure it is as independent of applications as possible. By using features of the application environment, for instance "declare" type functions by which one application informs another of its content structuring rules (data definitions and variable values), we can get the system to incorporate content metadata in the content layer.

Finally, we need to generate the business rules for keeping, providing access to, and managing these records over time and make sure that they are executable. Some of this information, such as the retention rules which govern keeping and destroying the record, can be made to come along with the identification of the correct transaction if we build business process models with appropriate retention rules linked to their records. Other information, such as the presence of proprietary, private or secret information, cannot be linked simply by the kind of transaction (although some transactions cannot have some of these classes of data in them). Here we need rules by which to recognize, and flag, portions of records that require special treatment, or authors would need to do this as part of the initial business transaction.
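
Pulling these four aspects together, a capture routine might assemble something like the following metadata block; the parameter and field names are assumptions made for the sake of illustration:

```python
def build_capture_metadata(user_id: str, org_unit: str, business_task: str,
                           file_format: str, format_standard: str,
                           content_declaration: dict, retention_rule: dict) -> dict:
    """Assemble the four aspects documented at capture: contextual, structural,
    content and business-rule metadata. Parameter names are illustrative."""
    return {
        "context": {"user": user_id, "organization": org_unit,
                    "business_task": business_task},
        "structure": {"format": file_format,      # e.g. "SGML"
                      "standard": format_standard},  # e.g. a registered DTD
        "content": content_declaration,           # data definitions declared by the application
        "business_rules": retention_rule,         # retention, access, flagged special handling
    }
```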

These steps in capturing metadata, together with metadata inherent in the communication process such as the date/time and identity of the recipients, make up all that is required to ensure the evidential character of the records at the time of their creation. It should be noted that not all the metadata required for recordness needs to be recorded in detail in each record. For example, audit findings pertaining to compliant organizations and accountable systems are themselves business transactions of the organization. Transactions created between such audits need only cite the audit (e.g. meta-meta-data) to document that the system which created them was compliant; they need not carry all the metadata of such an audit. Similarly, much detailed structural information is contained in volumes of technical compliance tests for standards. Records conforming to standard structures may therefore be documented by reference to such external standards and need not carry all the metadata items required to specify the standard itself.

3. Storage

To be evidence, records must be inextricably linked with their metadata and inviolable in their content for as long as they are kept. Where they are, physically, is irrelevant as long as they are properly protected and controlled.

No specific computing model need be employed in the maintenance of recordkeeping systems, although it may seem that the discussion of communicated transactions to this point has used the terminology of object orientation. The content of the record need not be physically stored in the same place, or same computer record, as the metadata, but storing all of the metadata with the record content in one encapsulated object so that metadata is always stored and transported with the record simplifies the long-term management. The encapsulation approach also has the advantage that a record, when retrieved, is physically self explanatory. A perceived disadvantage of encapsulation is that a considerable amount of redundant metadata is stored with each record, adding to the overhead associated with every action taken. However, models of likely metadata content which I have developed strongly suggest that this overhead will, in normal business environments, be trivial with respect to the size of the data content. I believe the advantages of not having to worry about the integrity of pointers in separate structures will make encapsulation a better option than keeping the redundant data in separate, relationally linked, files.

Nevertheless, records can also be stored in standard relational, hierarchical or even flat database management systems. While this approach avoids the overhead associated with communicating records, it requires more sophisticated management over time since assurance must be provided that neither the record, nor its associated metadata, could have changed. If we store the metadata elsewhere than in an encapsulated object, it will need to contain a pointer to the content and a signature (hash) of the contents, and the content will need to contain a pointer to and hash of the metadata. Even with such cross pointing and cross hashing, database integrity and security will need to be ensured by the system over time as part of the control environment. Some implementations to date seem to be choosing a tactic between these two extremes.[26]
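
A minimal sketch of the cross-pointing and cross-hashing tactic for separately stored content and metadata (using a modern hash function purely for illustration):

```python
import hashlib
import json

def cross_hash(metadata: dict, content: bytes) -> tuple[dict, dict]:
    """Sketch of cross-pointing and cross-hashing when content and metadata are
    held in separate database records: each side carries a hash of the other,
    so neither can change without the alteration becoming apparent.
    Assumes metadata values are simple, JSON-serializable types."""
    content_hash = hashlib.sha256(content).hexdigest()
    metadata_row = {**metadata, "content_hash": content_hash}
    metadata_hash = hashlib.sha256(
        json.dumps(metadata_row, sort_keys=True).encode()
    ).hexdigest()
    content_row = {"content": content, "metadata_hash": metadata_hash}
    return metadata_row, content_row
```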

The model itself allows for either centralized or decentralized custody. The declining cost of decentralized storage and the rising cost of central backup and hot site management suggest, however, that the solution of centralizing custody in an electronic environment lacks the major benefit associated with centralization of storage in paper-based environments. My findings in contemporary organizations are that unused storage on desktops far exceeds the capacity (already nearly full) of central storage services. It is possible to develop corporate recordkeeping strategies that use such completely distributed custody by taking control over unused desktop disk space for corporate storage. In such a model, records must be protected from destruction by ensuring that they always exist in multiple copies, preferably in topologically independent LANs, so that the disappearance of one of these copies from the network (as when a local machine has been brought down for servicing) automatically results in the creation of another copy somewhere else. In most topologies, three copies kept in systems that are independent of each other and are not susceptible to the same natural disasters or human interventions, with an object status monitoring directory in a fourth location, would ensure protection equivalent to that of central storage with off-site backups and a hot site, at a fraction of the cost.

Of course, records also must be protected from change. Regardless of how they are stored, only copies of records should be given out to other systems, and as soon as they are opened they need to lose the validation bits which certify their recordness. These validation bits could be the leader of the encapsulated object which is managed by its "read" applet and/or a "certification" of authenticity given out by an external agent. The same function can be performed by cross-hashing records stored in linked relational tables; each part of the logical record which is stored in a discrete physical record, needs to carry the hash of the content, and the content needs to carry the hash of the metadata. In this way neither can be altered without the fact of alteration being apparent.

4. Identification

One of the potential advantages of item level control and especially of metadata encapsulated objects all of which conform to the Reference Model for Business Acceptable Communications, is that records from everywhere, including records from more than one records creating organization, can be stored together in a simple and uniform recordkeeping environment. Therefore, it is a major concern how the record identifier uniquely assigned by one domain is guaranteed to be unique when the object is incorporated into a universe in which identifiers assigned by other domains are present. We know that uniqueness can be ensured by combining a unique identifier within a domain with a unique identifier for the domain. Practically speaking, however, how can we ensure that domain identifiers will be truly unique to a person or organization? This issue is being addressed by the Internet Engineering Task Force and others who are assessing schemes to "register" domain identifiers, or issue them without serious overhead. Because billions of unique business transactions will flow through worldwide communications systems within and between organizations and between individuals and/or computers daily, it must be possible to uniquely identify them all.

Below the level of domains, there are two different strategies for identifying transactions. We can give transactions an identifier based on an analysis of the business context in which they were created, or we can assign them an arbitrary identifier consisting of a sequence number. Arguments can be made in favor of either approach, although I believe that the analysis of business function, process, activity and task, which is necessary in any case to appropriately document the transaction, should be encoded here so that retrieval of objects by functional provenance, which will be quite important during the active life of the record, is facilitated. In either case, a time stamp will ensure complete uniqueness and assist in subsequent retrieval.
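A sketch of one way such an identifier might be composed, combining a domain identifier, a functional classification, a sequence number and a time stamp; the scheme and all of its elements are hypothetical, offered only to show that uniqueness is cheap to obtain.

```python
import itertools
from datetime import datetime, timezone

# Hypothetical identifier: registered domain identifier + functional
# classification from business process analysis + sequence number + time stamp.
_sequence = itertools.count(1)

def make_record_identifier(domain_id: str, function_path: str) -> str:
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S.%fZ")
    return f"{domain_id}:{function_path}:{next(_sequence):08d}:{timestamp}"

# Format only (actual values will differ):
# "EXAMPLE-DOMAIN:finance/procurement/purchase-order:00000001:19960702T091500.000000Z"
print(make_record_identifier("EXAMPLE-DOMAIN", "finance/procurement/purchase-order"))
```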

5. Management

Because we have found that the most practical approach to managing electronic records is to capture contextual metadata at the item level, terms and conditions for access and use can be determined either for all instantiations of a particular type of transaction (task) or for a single item, and will be enforced at the item level in either case. For example, retention rules will be the same for all transactions of a particular type, but disposition can be automatically determined based on the specific date or recipient. Because every transaction creates a record, we can obviate the need for such costly processes as item-by-item review of case files in order to remove records used in a subsequent transaction, such as a court case. Instead we destroy all the original records as intended by their retention schedule, secure in the knowledge that a record copy is incorporated within the transaction that reviewed the files under the discovery process or in the management meeting which considered them. If the use of appropriate style-sheets or forms is enforced in business transactions, we could not only further segregate out of a series those records involving attorney-client privilege, or containing confidential medical or proprietary information, but also provide for automatic redaction of records based on the profile of the potential future user.
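The determination of disposition from transaction type and item metadata might look something like the following sketch; the transaction types, retention periods, and field names are invented for illustration.

```python
from datetime import date, timedelta

# Hypothetical retention rules keyed by transaction type (task). The rule is
# the same for every instance of the type; the concrete disposition date is
# resolved from the metadata of the individual item.
RETENTION_RULES = {
    "routine-enquiry":  {"months": 6,   "trigger": "date_created"},
    "invoice-payment":  {"months": 84,  "trigger": "date_created"},
    "board-resolution": {"months": 900, "trigger": "date_created"},
}

def disposition_date(item_metadata: dict) -> date:
    """Determine disposition for one item from its transaction type and trigger date."""
    rule = RETENTION_RULES[item_metadata["transaction_type"]]
    trigger = date.fromisoformat(item_metadata[rule["trigger"]])
    return trigger + timedelta(days=30 * rule["months"])   # months approximated

print(disposition_date({"transaction_type": "routine-enquiry",
                        "date_created": "1996-07-02"}))
```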

Archivists and records managers with whom this model has been discussed extensively during a recent series of workshops in Australia tended to agree on a set of business rules for recordkeeping that would be likely to be put into effect as methods governing metadata encapsulated objects (MEOs). These included:

When records are opened in the course of a business transaction, the method makes a convenience copy lacking "recordness" bits and retains a full "evidential" copy with "recordness" bits for incorporation into the resulting transaction (this was called "big fish contains little fish").

When records are "deleted" under records retention schedules, the contents of the record and the structural metadata and terms and conditions of access and use are destroyed, but the handle, context and use history are not, and a final transaction is added to the use history to document the rules under which the disposal took place.

When records are incorporated into other records, the terms and conditions for disposal of the parent record govern the incorporated records, but the terms and conditions for access and use in the original records still apply to their use within the subsequent transaction.

When records are released under restricted terms and conditions, either because access to them is limited to a specific class of people or because view or use restraints are placed on released copies, the use and user are recorded in the use history of the record being released, in addition to the actual content released being incorporated into a new transaction record.
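As one illustration of how the "deletion" rule above might be realized at the item level, consider the following sketch; the layer names (handle, context, use history, and so on) follow the terminology of this paper, but the data structure itself is my own assumption rather than a prescription of the Reference Model.

```python
from datetime import datetime, timezone

def dispose_record(meo: dict, schedule_id: str) -> dict:
    """Apply the disposal rule: content, structural metadata and terms and
    conditions are destroyed; handle, context and use history are retained;
    a final transaction documents the rules under which disposal took place."""
    retained = {
        "handle": meo["handle"],
        "context": meo["context"],
        "use_history": list(meo["use_history"]),
        # "content", "structure" and "terms_and_conditions" are deliberately dropped
    }
    retained["use_history"].append({
        "transaction": "disposal",
        "authority": schedule_id,   # e.g. the approved retention schedule identifier
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    return retained
```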

As a practical matter, we must also develop means to monitor metadata values in order to make the necessary software migrations at appropriate times in the life of records. If the records are kept as encapsulated objects, a secondary index may be desirable. Of course, the most important management issue faced in migration is not just migrating records to new structures before the old ones are no longer supported; we also need to make good decisions about logical mappings in order not to introduce too much noise with every migration and ultimately lose the message in digital copying as surely as we did with multi-generational copying of analog messages. Needless to say, some people also worry that these software migrations, if they continue to need to be done as often as once a decade or more, will become too costly to support and that as a consequence some records of value will be abandoned. Within the environment in which recordkeeping takes place, stringent approaches to configuration management will be essential to ensure that record documentation retains critical usable metadata.
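Monitoring metadata values for migration might amount to little more than a periodic scan of structural dependency metadata against a registry of supported formats, as in the following sketch; the format registry and its dates are illustrative, not factual.

```python
# Illustrative registry of formats and the year after which support is not
# expected; records whose formats fall inside the planning horizon are flagged.
SUPPORTED_UNTIL = {
    "WordPerfect 5.1": 1999,
    "Lotus 1-2-3 WK1": 1998,
    "SGML (ISO 8879)": 2010,
}

def records_needing_migration(meos: list, horizon_year: int) -> list:
    flagged = []
    for meo in meos:
        fmt = meo["structure"]["format"]
        # unknown formats default to the horizon so that they are flagged conservatively
        if SUPPORTED_UNTIL.get(fmt, horizon_year) <= horizon_year:
            flagged.append(meo["handle"]["record_id"])
    return flagged

print(records_needing_migration(
    [{"handle": {"record_id": "A1"}, "structure": {"format": "WordPerfect 5.1"}},
     {"handle": {"record_id": "A2"}, "structure": {"format": "SGML (ISO 8879)"}}],
    horizon_year=2000))
```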

It is noteworthy that the proposed approach to archiving and maintaining business acceptable communications does not require us to include information about physical formats and media within the record metadata. Rather the environment in which records are kept will need to be one in which managers move data from one medium to another as required to assure backup and preservation of the data. Because it is presumed that media that are currently supported will always be used and that data transfer to current media will take place in the normal course of operations, we assume that operational data management systems will be employed at all times to keep track of physical locations of data. Failure to track media, or refresh data to new media, will of course lead to a loss of the ability to read it at all.

6. Access control

The definition of a standard for Business Acceptable Communications (BAC) presumes the existence of software and services that can use the metadata which is associated with a BAC object. "Resolver" services, for example, are envisioned to translate Terms and Conditions metadata into concrete prices, permissions, and data views. The presumption is that Terms and Conditions metadata will be expressed in abstract categorical terms, rather than in concrete terms. Thus rather than stating in metadata that the retention period of a record is a specific date or a number of years from creation, the sophisticated user will place into the metadata the rule under which disposition should take place, so as to accommodate future change. Similar resolvers can negotiate other access and use metadata to accommodate such changes as inflation in prices, dilution of restrictions based on elapse of time since the transaction, differences in access rules based on specific characteristics of the requester, or re-assessment of risks associated with secrecy or confidentiality. The "resolver" must be put in place by the owner, creator, or manager of a record, and it must be maintained so that it correctly enforces rules. It is presumed that resolver applications will be maintained by those interested in restricting rights. As a practical matter, their on-going operation can be ensured by establishing a socio-legal mechanism that allows users access to the records if, after seeking out the resolver where the record indicates it should be found, they find that no restrictive permissions manager is operating.
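A resolver of this kind might, in its simplest form, look like the following sketch; the rule names, requester attributes, and closure periods are assumptions chosen only to show how abstract terms and conditions become concrete answers at the time of a request.

```python
from datetime import date

def resolve_access(terms: dict, requester: dict, today: date) -> bool:
    """Translate an abstract access rule in terms-and-conditions metadata into a
    concrete yes/no answer at the time of the request."""
    rule = terms["access_rule"]
    if rule == "open":
        return True
    if rule == "staff-only":
        return requester.get("role") == "staff"
    if rule == "closed-for-years-after-transaction":
        opens = terms["transaction_date"].replace(
            year=terms["transaction_date"].year + terms["closure_years"])
        return today >= opens
    return False   # unknown rule: deny; the socio-legal fallback is a separate mechanism

print(resolve_access(
    {"access_rule": "closed-for-years-after-transaction",
     "transaction_date": date(1996, 7, 2), "closure_years": 30},
    {"role": "researcher"},
    date(2026, 7, 3)))
```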

Strong pressures are operating in contemporary organizations and society to encourage the encryption of communications. It may be that organizations will continue to feel that communications need to be encrypted, even when they are encapsulated. While resolvers could, in principle, hand users back decryption keys, it would, in my opinion, be extremely dangerous to permit objects being archived for evidence to remain encrypted when written to recordkeeping systems because the encryption method becomes a dependency and is very likely to prevent records from being accessed over time.

7. Retrieval and delivery

Archivists and records managers will not be able to design good retrieval systems, or define the most important retrieval metadata or how to represent it, unless they study the questions which potential users ask.[27] For the time being, archivists will doubtless continue to employ both content-based and provenance-based methods of discovery because they don't adequately understand which, if either, better satisfies users' needs.[28]

The metadata required for evidence specifically requires (provides warrant for) documentation of the business or functional context of records creation. If intellectual control is provided based on functional analysis of business transactions and the relationship between such functional "competences" and the structural units of organizations, the critical issues for recordkeeping will be how to best represent that knowledge. As is clear from the various models (RAD, ISAD) adopted as standards in recent years, items inherit many of the properties of the provenance that created them. Keeping that data in the item solves numerous problems that have in the past been associated with changes in structures and functions over time. Appropriate ways of representing such knowledge so that structure and function are independent of each other have recently been proposed by Chris Hurley and were explored a decade ago in articles by Richard Szary and myself on multiple independent authority files.[29] It is evident that archivists and records managers will need to maintain such "finding aid systems" to facilitate searches independently of the records themselves, even if the metadata about context is present in the records.

It may also be necessary in the future to search for records that satisfy criteria based on their content, even though this is not essentially a recordkeeping requirement. The Reference Model for BAC is designed to hold metadata that can satisfy such requirements but it is not currently populated by metadata designed to support Networked Information Discovery and Retrieval (NIDR), such as that recently proposed by the Coalition for Networked Information and by the U.S. library community.[30] Since the volume of records created has always defied item-by-item cataloging, and since the content description of records (which are created as a consequence of business transactions rather than to be about their content) tends in any case to be either misleading or inadequate, we may find that these metadata fields are rarely populated.

Ironically, archivists will be able to focus more of their attention on content-based access because the provenance-based issues can be resolved by implementation of the business process analysis methods required at the time electronic records are created and by the assignment of contextual metadata at the item level. Not only will archivists be able to provide full-text searching of the contents of textually based records, they should be able to augment content-based access by using the metadata needed for structural and contextual evidence. For example, we know that the meanings of words in a document depend to a great extent on where they appear; each genre of document has different structural elements, and the occurrence of a word in one element may indicate that this was the author, while in another location within the document it references the recipient, the subject, or the object of the transaction. In addition, words take on meanings within particular business contexts: a statement that indicates a job was done poorly means something quite different in a reminder from a colleague, a performance report by a superior, or an audit report by an outside agency. These differences in meanings reflect the fact that words fill out a "frame" (to use artificial intelligence terminology) of discourse in a given domain. Retrieval systems could bring domain knowledge and document genre knowledge into play in content searches of metadata documented objects because the references to business context and document genre are present. This should make the otherwise haphazard aspects of full-text searching much more rigorous. First, domain knowledge can bring the correct thesauri into use in term lookup. Second, domain knowledge and genre knowledge together can suppress meaningless terms and weight meaningful ones in full text. Third, domain knowledge can provide interpretive frames in which the words used in a particular document can be properly related to each other. Finally, genre knowledge can locate the parts of the record content that should be the source of retrieval terminology.
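The following sketch suggests how genre and domain metadata carried with an item might discipline full-text indexing along the lines just described; the genre weights, thesaurus, and element names are invented for illustration.

```python
# Illustrative genre weights, domain thesaurus and element names; genre
# knowledge weights or suppresses structural elements, domain knowledge
# expands terms through the appropriate thesaurus.
GENRE_WEIGHTS = {
    "audit-report":       {"finding": 3.0, "recommendation": 2.0, "distribution-list": 0.0},
    "performance-report": {"assessment": 3.0, "signature-block": 0.0},
}
DOMAIN_THESAURI = {
    "personnel": {"poor": ["unsatisfactory", "below-expectations"]},
}

def weighted_terms(meo: dict) -> dict:
    """Score index terms for one metadata encapsulated object."""
    weights = GENRE_WEIGHTS.get(meo["structure"]["genre"], {})
    thesaurus = DOMAIN_THESAURI.get(meo["context"]["business_domain"], {})
    scored = {}
    for element, text in meo["content"].items():
        weight = weights.get(element, 1.0)
        if weight == 0.0:
            continue   # suppress elements the genre marks as meaningless for retrieval
        for term in text.lower().split():
            for candidate in [term] + thesaurus.get(term, []):
                scored[candidate] = scored.get(candidate, 0.0) + weight
    return scored

print(weighted_terms({
    "structure": {"genre": "audit-report"},
    "context": {"business_domain": "personnel"},
    "content": {"finding": "work was poor", "distribution-list": "all staff"},
}))
```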

8. Use history

Recordkeeping systems will store and provide access to metadata-bound evidence.

Sometimes the purposes of such access will be to make use of the data content of records in subsequent business transactions which create their own records. When needed for these purposes, records from recordkeeping systems may be copied to information systems which require their content, but the record itself will never be deleted from, or changed within, the recordkeeping system except with specific records disposition authority. In addition, when copied to an information system, the record will lose its "recordness" and become just information, available for incorporation into a new transaction. These transactions will take place through application systems which, like most information systems, are not designed to make or keep records. They will keep data of use to the ongoing work of the organization. Incorporating a record as a record into a transaction (as when records are attached or forwarded to a further authority) is a different business transaction than the original. Engaging in a transaction that incorporates previous records will therefore result in the creation of a new record which will have different business rules associated with its scheduling metadata. Because the creation of records should always result in attaching metadata to them describing the time, place and circumstances of their creation and receipt, and their contents, a subsequent transaction can affect scheduling without any change in the basic system logic and without need for post-hoc, sub-aggregate level re-appraisal by archivists when, for example, a relatively routine record that would have been destroyed in six months becomes the object of an audit or legal proceeding.

Sometimes the purpose of access is simply to view the records outside of the business purposes of the creating organization. Traditionally such reference uses of archives have not created new records, although logically they are the record of the use of the archives which is itself a function of the organization. In an evidential environment, viewing a record in conjunction with a business transaction creates a new record for the recordkeeping system and leaves a transaction trail in the original record. This is far more documentary than we have been in the paper age, but the rationale for it is consistent with evidence as well as reflecting realities of the electronic environment. For example, organizations which are succeeding in implementing executive or management decision support systems used to have mid-level managers who wrote reports on issues of corporate strategy. These reports would then go to the executive who would make decisions based on them. Now the same executive can query corporate data bases, display results in graphic form, and make immediate decisions based on these queries without any record being retained of the basis for the decision.

Implementers will recognize that when a user requests a record, a copy of that record is passed to the information retrieval sub-system; but if the user opens the record contents under the control of another application, the contents are incorporated within the application in which he or she is working, and while the contents will become part of the contents of a new transaction, the record of the prior transaction will not be. If the user intends to append or forward a record, they will do so in a step that does not involve opening the record. They will need to invoke an applet that incorporates an encapsulated version of the record within the current transaction.
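A sketch of the forwarding path just described: an encapsulated copy of the original is incorporated into the new transaction's record, and the act of forwarding is written to the original record's use history. The field names are assumptions consistent with the metadata layers discussed earlier.

```python
import copy
from datetime import datetime, timezone

def forward_record(original_meo: dict, new_transaction_meo: dict, user: str) -> dict:
    """Forward a record without opening it: an encapsulated copy is incorporated
    into the new transaction's record, and the forwarding itself is written to
    the original record's use history."""
    original_meo["use_history"].append({
        "transaction": "forwarded",
        "by": user,
        "into": new_transaction_meo["handle"]["record_id"],
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    incorporated = new_transaction_meo["content"].setdefault("incorporated_records", [])
    incorporated.append(copy.deepcopy(original_meo))   # the encapsulated version travels whole
    return new_transaction_meo
```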

One of the most exciting aspects of item level control using metadata encapsulated objects is the opportunity it provides for allowing copying of archival records while still monitoring, and if desired, limiting, uses. This permits archivists and records managers to maintain control over records, from the moment of their creation, while permitting distributed custody and enabling distributed access. Users can obtain copies of records, and once cleared to use them by the terms and conditions resident in the object metadata interacting with the resolver, will be able to open the record, and exploit its informational value, without risk of confusion between the manipulated information and the real record. Multiple convenience copies of this sort can be kept under little or no control without threatening the integrity of the actual record. From the point of view of the active records manager, the fact that all such information is a convenience copy means that it can be destroyed at any time by individuals who don't need it any longer. Implementations could be devised to allow such informational copies to be kept past the period that they would normally be destroyed under retention requirements simply by devising a method that jettisons the record metadata at the point of expiration of the record on which the information is based. This deprives the information of its warrant but ensures that no copies of records, however non-evidential they have become, could survive past the scheduled retention.

In discussions with recordkeepers in Australia in June/July 1996, it was clear that different business rules operate in different organizations to govern whether, and when, use history transactions are written to the originating records. There was agreement that redacted releases, transactions involving filing and indexing, and dispositions under approved schedules all required use history documentation. These actions could be described as recordkeeping transactions of little importance in themselves but of considerable impact on the future meaning or uses of the original record.

A second, quite different class of transactions was identified as possibly requiring use history. These transactions were highly consequential in themselves and were considered to be worthy of recording in the original record use history because they reflected on such a significant use. Examples of such transactions were files that were sent to the highest level of the organization (such as the board of directors) for approval or review and/or records involved in lawsuits, commissions of inquiry, and other externally driven discovery processes. Doubtless each organization will devise its own rules about what types of uses to document. Perhaps the profession will eventually reach consensus on best practices in this arena.

IV. Challenges in Item-Level Electronic Recordkeeping

1. Technology Challenges

Can organizations implement architectures that remove records from active information management systems at the point of transactions and satisfy the functional requirements for evidence in recordkeeping by capturing and retaining metadata required to reconstruct the context, structure and content of all transactions?

I believe that they not only can do this, and do it for legacy systems as well as new systems, but that the adoption of the Reference Model for BAC within a single organization, together with management of resolvers within that organization, is as practical as universal adoption and does not depend on it. Widespread implementation could come about either by each organization writing and implementing its own methods for capturing metadata and encapsulating objects, or by archivists and the business community insisting that software developers and networks implement standards such as the Reference Model for BAC. Since metadata content must either follow an external standard or contain its own declarations (e.g., meta-metadata), it would be far more efficient for society at large if, instead of requiring individual organizations to implement systems in ways that supported the requirements for evidence, a standard for communications could be adopted that placed the burden for creating metadata encapsulated objects on the application software and network software developers.

However, even if only records conforming to the BAC model were permitted to travel between networks, and even if API and network software developers built all the needed tools for implementing metadata encapsulated object stores, the major tasks facing an organization intent on implementing the model would be the same:

First, records have to be captured with metadata documenting the nature of the transaction and the business consequences of having a record of this type. This requires up front analysis of business processes and the implementation of information systems with appropriate transaction-stamping both for routine and non-routine applications.

Second, resolvers need to be maintained to enforce business rules for captured records.

Third, the authority will need to build finding aid systems containing metadata and content analysis to support queries by both the staff needing access to current records and others desiring access to inactive records at a later date.

These three requirements will have significant impacts on the profession, because they transform what recordkeepers actually do.

2. Professional Challenges

Will recordkeepers be willing to fundamentally transform their business processes and radically change their day to day jobs? Will archivists be able to see their future as managers of virtual records?

Item level control from creation obviates the need to accession, arrange, process, and describe records after "transfer", or even to take custody. These are the processes which have, historically, been the largest part of the archival and records management role and consumed the greatest resources. On the other hand, item level control forces archivists to invest in the front-end tasks involving not just scheduling and appraisal but also definition of requirements for description and access. These responsibilities have traditionally been associated with records management, and records managers have by and large not been able to do them successfully. If archivists take on the new tasks, they will find that their role is more one of steering than of doing. And they will come to recognize that they need new skills in systems analysis, business process analysis, and knowledge representation in order to be of assistance to the line managers who will be responsible, and accountable, for good recordkeeping.

Archivists will also need new allies and will need to stop thinking of themselves as "information professionals". Archivists are recordkeeping professionals, and their organizational allies are senior management, lawyers, auditors, Freedom of Information Act and Privacy officers, and citizens who need records to substantiate claims, rather than information professionals. The shift away from the information professions will need to be pronounced in order for people in the organization to understand the rediscovery of records, but it will not make their lives as users of, and dependents on, electronic information systems easier. As recordkeeping professionals, archivists will need to forge peer working relationships with information technology professionals to successfully identify electronic record dependencies, make records migrations work, and maintain the front-end software that supports the capture of records, but they must take care not to confuse their goals with those of IT professionals.

In the long run, archivists and records managers will find that they can give the rear-end tasks of providing access to librarians and information service centers, whose mission it really is, and focus their attention on generating the metadata on which successful retrieval will be based. This metadata should support access either by provenance (contextual metadata) or by content (structural metadata and full-text analysis), and should be assigned to items based on front-end practices. Given distributed metadata encapsulated objects and resolvers with built-in access and use rules, others can easily provide good information retrieval services to clients without the assistance of recordkeepers.

A framework for this new professional role and philosophy is being articulated by faculty of Monash University in Melbourne, Australia, where Frank Upward, Sue McKemmish, and their colleagues Chris Hurley and Barbara Reed are proposing to substitute a "records continuum" approach for that of the traditional "records life-cycle".[31]

Records Continuum model as adapted from Frank Upward [32]

The Records Continuum Model is constructed around the assertion that management of the record is a continuous process from the moment of creation. Management issues arise from four dimensions which are related not to the age of the records but to the point of view of the observer. My presentation of their position here reflects my own use and elaboration of their insights as they were developed during the Monash University sponsored workshops in Melbourne and Canberra in June and July 1996, and is not the orthodox position as represented in their own work. It shares with them, however, the view that traditional life-cycle frameworks focus professional energy on tasks that archivists and records managers engage in over the life of the record but that do not add value to the record, and that such frameworks make a misleading, fundamental distinction between the pre-archival and archival "life" of a record. In the continuum model, the record comes into existence at the moment of the transaction and requires continuous care from that time forward until its disposal. It does not pass through phases, although issues in its on-going management can be understood as reflections of its life in four dimensions:

The first dimension, to which I have given the name the Event dimension, consists of the act, the trace, the instrument and the data. In this dimension, the transaction has yet to take place.

The second dimension, to which I give the name the Documentation dimension, is likewise characterized by four attributes along the same axes: the act becomes the business transaction or causa; the trace becomes the evidence; the instrument becomes the competence; the data becomes the record. In this dimension the act is witnessed by the system and the transaction becomes evidence.

The third dimension, to which I give the name Risk, is characterized by function, corporate memory, organization, and recordkeeping system. In this dimension, the record is appraised by the organization and either kept or destroyed.

The fourth, or societal, dimension has the attributes of purpose, collective memory, domain, and archives. In this dimension the society gives meaning and institutional form to its record.

The attributes are themselves related along spokes or axes which are called the evidential axis (trace, evidence, corporate memory, collective memory), the transactionality axis (act, causa, function, purpose), the responsibility axis (instrument, competence, organization, domain), and the recordkeeping axis (data, record, recordkeeping system, archives). It may be easiest to understand this in a graphical representation, below.

The records continuum is a pedagogical and conceptual framework that can help reunify recordkeeping around its proper focus, the documented event. As such it supports the control of records at the item level as described in this paper and places the issue of control of the record from the moment of its creation within the context of the event that gave rise to the record and the organization or person whose activity it documents. It places the archival and records management tasks conducted by any given organization into the context of the society as a whole and the evidence of an act. Along its various dimensions it informs a novel view of the nature of recordkeeping activity and its social purposes. It seems to me a further example of how the intersection of traditional archival methods and the requirements for electronic recordkeeping has yielded useful and interesting new perspectives on the nature of archives and their management. One of the most important of these, I believe, is the return to item level control.

FOOTNOTES

1. The author wishes to thank participants in the Monash University, `Managing the Records Continuum' workshops, held in Melbourne and Canberra in June and July 1996 for their help in forcing the clarification of ideas in this paper, and in particular to Sue McKemmish, Frank Upward, Barbara Reed, Chris Hurley, David Roberts and Adrian Cunningham for reviewing earlier drafts and for their cogent criticism. My thanks also to Lisa Weber for a critical review of the penultimate draft which I believe helped to greatly improve it. A version of this paper was presented at the annual conference of the Society of American Archivists, in San Diego, August 29, 1996.

2. Sue McKemmish in "Are Records Ever Actual?", The Records Continuum: Ian Maclean and Australian Archives First Fifty Years, ed. Sue McKemmish and Michael Piggott, Ancora Press, Melbourne, 1994, pp. 187-203 has recognized that paper records are also, and should be understood as, virtual things, but this sophistication is rare indeed.

3. I was delighted to see a standard Australian registry file folder, pre-printed to support the recording of file and item-level "metadata", at our workshop in Canberra this year. It both concretely illustrated the practical problems associated with managing paper records at the item level, and at the same time served to identify the specific metadata items thought necessary for evidence by a traditional registry office.

4. NHPRC grant (#93-030) "Variables in the Satisfaction of Requirements for Electronic Records Management" see http://www.lis.pitt.edu/~nhprc for a specification for the full requirements of evidence in recordkeeping.

5. David Bearman, Electronic Evidence: Strategies for Managing Records in Contemporary Organizations (Pittsburgh, Archives & Museum Informatics, 1994)

6. David Bearman and Ken Sochats, "Formalizing Functional Requirements for Recordkeeping" unpublished draft paper included in University of Pittsburgh Recordkeeping Functional Requirements Project: Reports and Working Papers (LIS055/LS94001) September 1994

7. I say in principle here because this is not a finished and fully validated standard and has not been subjected to the kind of testing that would give complete confidence in the specific low-level observables. Since the discussion here is more about strategy, however, it should be stressed that as a strategy the method seems fully proved.

8. op.cit., fn.4

9. www-rlg.stanford.edu/ArchTF

10. Ulf Andersson, "SESAM. Philosophy and Rules concerning Electronic Archives and Authenticity" (ASTRA AB, 28 Feb.1996) 86p.

11. Australian Council of Archives, "Corporate Memory in the Electronic Age: Statement of a Common Position on Electronic Recordkeeping", May 1996.

12. "Functional Requirements for Evidence in Recordkeeping: Invitational Meeting. University of Pittsburgh, February 1-2 1996", Archives and Museum Informatics, vol,.9#4, p.433-437

13. As an exercise in gaining a better understanding of the difference between requirements of the property of "recordness" and requirements of the application of records management, many of these further requirements were identified in the Monash University workshops in Melbourne and Canberra in June and July 1996. I believe this confusion between records management systems requirements - that is the requirements of an application system designed to support the functions assigned to records management offices and archives - and the requirements of recordkeeping, has confused the debate over many years. It was one of the difficulties faced by the SAA CART committee which was charged with both electronic records and applications technology, and it currently confuses the work of the Records Management Task Force of the U.S. Department of Defense, which is designing a records management application system but slips into trying to impose those requirements as the requirements for recordkeeping.

14. David Bearman, "Electronic Records Management Guidelines: A Manual for Development and Implementation" in United Nations, Administrative Coordinating Committee for Information Systems, Management of Electronic Records: Issues and Guidelines (New York, UN, 1990) reprinted in Electronic Evidence, op.cit.

15. The actual degree of software independence that can be achieved depends on how long any given "standard" can be expected to remain a standard. In archival terms, this is often not very long. When the independence provided by standards expires, the fact that the data was recorded in a standard will usually provide a route to low cost migration, often directly into a successor standard. Many data objects we create today will not be standard and the metadata with which we label them must flag the dependencies of the data (including their dependency on standards) so that a future review of record headers can locate sources of brittleness and segregate records requiring migration to new software formats before they become unreadable.

16. David Bearman, "Functional Requirements for Recordkeeping: Metadata Specification" (Unpublished draft, 12/21/94)

17. The Internet Engineering Task Force work on Persistent URL's, and work on handles as part of the Library of Congress Electronic Copyright Registration project as reported by Bill Arms of CNRI at CNI Spring 1995

18. For example, the Networked Information Discovery and Retrieval study directed by Clifford Lynch for the Coalition for Networked Information. See also the reports of the Library of Congress Electronic cataloging meeting in October 1994 and the results of the Dublin, Warwick and September 1996 Dublin-image metadata workshops.

19. Such as IBM's infoMarket Cryptolopes, which has recently announced plans to use Xerox's "Digital Property Rights Language" (DPRL). Other commercializations, such as the EPR, have been announced but not yet launched. For discussion of these and other encapsulated-object based intellectual property mechanisms see their web sites, or those of ELSI. In addition, see articles by John Erickson (Cornell), also easily accessed on the www.

20. see, David Bearman and Ken Sochats, "Metadata Requirements for Evidence" at www.lis.pitt.edu/~nhprc/

21. This concept refers to the fact that each group of metadata elements performs a specified task, and that these tasks are logically required to be performed in the order in which the metadata clusters appear, hence the identification of an object, the establishment of its relevance, the determination that the user has rights to access and use it, the decoding of its structure, and reporting on its context, all take place prior to the presentation of its contents to the user.

In certain areas, particularly regarding structural dependencies of data objects representing non-textual content, we have specified a potentially extensible set of modality specific data elements by naming a metadata category but not identifying specific metadata fields/elements within that category. This reflects the recognition that we can never completely specify the data that will be required to document the structural dependencies of future data types.

22. Ulf Andersson, op.cit., fn.10

23. See, John McDonald, SAA 1996, report on the prototype developed for the National Archives of Canada by The Workflow Automation Company Inc., Toronto which follows essentially the same principles I used in the designs for the RLG AMIS Project in the early 1990's.

24. Other, more complicated or less precise, methods of identifying transactions which are the source of records have been proposed. Some thinkers have argued for an artificial intelligence approach in which an 'evidence service' in the Application Platform Interface would capture transactions based on a knowledge-base of organizational communications content, form and addressing. Some variant on this would be required if all transactions are not considered records, which is a major reason I've argued against this concept as articulated by NARA. In principle, such an intelligent service could analyze transactions and assign them metadata attributes required to ensure their authenticity and survival, but in practice a service would need significant knowledge of the rules of communication within a particular business so as to identify transactions of specific types and adhere to the appropriate retention periods, access and use rights, and filing rules, and studies to date show little likelihood that this level of knowledge is available or could be implemented in a rule base. At one time I thought information systems staff could identify components in the systems architecture, from storage devices serving as corporate file rooms to telecommunication switches linking to other LANs, WANs or systems, and have metadata attributes assigned to records based on where they originated, to whom they were communicated, and the technical characteristics of the transmission. While this might be possible, it also requires a significant analysis of transactional traffic and is susceptible to collapse as the characteristics of the content change (which they will). Finally, in conjunction with corporate policy and procedure, individuals could be required to complete document profiles as part of routing and filing transactions, but such intrusive approaches tend to be resented, and often subverted, by users and should be avoided.

25. The reference here to DTD's (Document Type Definitions), which could be any kind of registered data set whether EDI, a MARC record format, or otherwise, reflects my belief that when the metadata needed by a specialized domain has an essentially application-related purpose, but is not required for recordness, it is preferable to satisfy this application purpose by definition of an interchange format or inter-operability model. The interchange standard can be cited in the metadata for Business Acceptable Communications and the data content can then be opened by knowledge of the requirements and structures of the standard without further elaboration. This has the dual advantage of efficiency of definition and ease of migratability, as all records corresponding to a specified protocol can be re-presented in a new standard if the old format is superseded.

26. This is explicit in Philadelphia, see Archives and Museum Informatics, v.9#4 p.435

27. An early example of this is reported in David Bearman, "User Presentation Language in Archives", Archives and Museum Informatics, vol.3#4, p.3-7; see also the report of the NIST study in the summer of 1995 of Nebraskans' interests in Federal records.

28. Richard Lytle coined the terms and raised the question in his seminal dissertation. The fact that it still has no answer is clear from on-going discussion, such as Chris Hurley, "Ambient Functions - Abandoned Children to Zoos", Archivaria 40, Fall 1995, p.21-39.

29. David Bearman & Richard Szary, "Beyond Authority Control: Authorities as Reference Files in a Multi-Disciplinary Setting" in Karen Markey ed., Authority Control Symposium (Tucson AZ, ARLIS/NA, 1986) p.69-78; Chris Hurley, "The Australian ('Series') System: An Exposition", The Records Continuum op.cit. p.150-172; Chris Hurley, "Problems with Provenance", Archives and Manuscripts, Vol. 23, No. 2, Nov. 1995, p.234-259.

30. See, David Bearman, "Developments in Metadata Frameworks", Archives & Museum Informatics, vol.10#2 p.185-188, and "The Research Process, Metadata and the Image as a Document" in this issue

31. See, especially, Frank Upward, "Postcustodial Structural Properties", Archives and Manuscripts, Vol. 24, No. 2, Nov. 1996 (forthcoming) and "The Continuum: Principles, Structures and Dualities", Archives and Manuscripts, Vol. 25, No. 1, May 1997 (forthcoming); Chris Hurley, "Standards, Standardisation and Documentation", paper presented at the Australian Society of Archivists Conference, Alice Springs, May 1996; Chris Hurley, "Ambient Functions - Abandoned Children to Zoos", op.cit.

32. My graphical representation differs from that used by Frank Upward and Sue McKemmish, but owes its origins to theirs and to the Monash University workshop in Canberra where it was developed on July 2, 1996.