Unexpected help with your web-based collections: Encouraging data quality feedback from your online visitors

Paul F. Marty and Michael B. Twidale, Graduate School of Library and Information Science University of Illinois, USA

Abstract

This paper examines how monitoring the access and use of web-based museum collections can improve the overall quality of artifact data. All databases have quality problems that arise both from input errors and from errors that emerge over time; museum databases are no exception. Indeed, the complexity and variety of the information recorded in museum databases add to the probability of such errors occurring. Given the expense of checking each record for errors, we present alternative approaches that can contribute to improving the data quality of web-accessible museum database systems.

Introduction

All organizations collect data and then use, analyze, manipulate, add to and modify it. Increasing automation offers both potential cost savings and opportunities for new forms of data manipulation hitherto impossible or infeasibly expensive. However, as many failed attempts to automate white-collar work have shown, it is all too easy for an impressively efficient system to be unable to cope with exceptions, and particularly with erroneous or unexpected data. Although some problems were clearly present in older paper-based systems, others have appeared as a product of automation itself. For example, when a paper form passes through many hands, there is at least the possibility of errors being detected and remedied by alert members of staff. However, where much of the same work is partially automated, there will be fewer eyes looking at the data; moreover, those spotting a problem may not be able to report or correct it easily.

This paper explores ways of reintroducing such aspects of informal error checking back into advanced systems, while maintaining the cost and functionality advantages of those systems. We explore the possibilities for improving the quality of a system's data by taking advantage of the usage of that same data by people both inside and outside the organization. We use the results of a study of data management in a particular museum to provide exemplars of good practice and raise wider issues for consideration.

Our approach has been to examine and understand existing practice and then consider how activities may be extended by suitable changes involving the use of new systems functionalities. In particular, we consider how the insights and techniques from Computer Supported Cooperative Work (CSCW) may be applied in this area to support the goal of improving data quality. By such an analysis, we are able to formulate a number of research questions, and, we hope, encourage a debate on this method of analyzing the problem. Our aim is to uncover issues applicable not merely to other museums, but also to related organizations such as libraries and archives, and ultimately to databases in general.

We believe that databases should make it easy for their users to indicate the existence of an error so it can be corrected by a suitably authorized person. This process relies on the altruistic behavior of the users and so raises questions of why and whether people will bother, as well as issues of how to minimize the effort required. We provide preliminary evidence for the plausibility of such activity occurring in certain circumstances. We also explore a complementary approach where metadata (including usage information) is employed to predict data elements that have a greater probability of being erroneous and so prioritize the data checking activity.

Why Museums?

Museum studies may be regarded by many as unrelated to the fields of advanced computing, interface design and CSCW. Nevertheless, we believe that there is the potential for productive transfer of ideas between these disciplines.

There is a growing interest in the field of museum informatics, where studies in information science are informing (and, we hope, being informed by) the problems and challenges of the museum context. These challenges include managing widespread non-standard data, the evolving knowledge of the meaning of that data, and various attempts to interlink data sources between institutions. Moreover, the use of computers for educational and cultural purposes by visitors (both real and virtual) to museums, and the integration of advanced information technology into the daily tasks of museum staff members, raise the potential to contribute powerful insights into how technology and the iterative design process can have an important impact across the information society.

Moreover, from the point of view of this study, museums provide accessible examples of general problems that pervade databases of all types. A museum database serves as a useful source of examples of all types of database errors: inaccurate descriptors are typed into fields, data becomes obsolete over time, accurate data is difficult to obtain, and quality of data is often compromised by constraints of time and money. The complexities of cataloguing museum artifacts yield a rich variety of errors. In addition, the public nature of museum research means that this information is generally more accessible for study than the conventional commercial database. Thus, a study of data quality issues in museum databases provides us with a wide variety of examples that can easily generalize to other kinds of database systems.

The Context of the Case Study

The Spurlock Museum at the University of Illinois is a cultural heritage museum with a collection of over 45,000 ethnographic artifacts from around the world. These artifacts represent a broad spectrum of history and culture ranging from ancient Sumeria to modern day Ecuador, from Paleolithic chipped-stone tools to tiles from the Space Shuttle.

The museum is currently in the middle of a five-year process begun in 1996 to complete a 100% re-inventory of the museum's collections, as part of a move to a new building. As part of this process, each artifact is being retrieved from storage, analyzed and evaluated, weighed, measured, and photographed, identified and catalogued; the resulting data is stored in specially developed relational databases. Each record in the primary Artifacts database features over one hundred fields tracking such detailed specifications as nomenclature classifications, physical dimensions, material analyses, geographical, cultural, and temporal designations, accession records, artifact histories, exhibit information, scholarly remarks, condition and conservation records, research notes, etc. (Marty, 1999).

This effort represents a project of enormous scope, requiring the cooperation of museum staff members, various external experts, and dozens of part-time undergraduate student employees. Since the people involved in this project have varying levels of expertise, there is the potential for many different errors of various types. The challenge (as for many other databases) is to maintain high data quality when it is known beforehand that the work of some participants will necessarily be of relatively low quality. The museum registrar has overall responsibility for monitoring quality and organizing and modifying work practices to maximize this quality under the constraints of budget and time.

Data Quality in the Spurlock Museum

During the process of data gathering, entry, and analysis, an unexpected phenomenon was observed: in accessing and using the data, individuals have consistently detected errors, reported them, and even volunteered corrections. However, these individuals were not directly involved in the task of searching for errors; they were instead working on other tasks.

For example, a curator working remotely over the Internet to plan a new exhibit for the museum retrieved several artifact records using his web browser. In one of these records, he found an erroneous entry and sent an email to the museum's registrar, noting the error and suggesting a correction.

Because of examples such as these, we decided to undertake a pilot study of error detection and correction in museum databases at the Spurlock Museum. This study has revealed a number of important issues which have led to the development of research questions and design implications. These issues are discussed below.

Error Detection

The detection of errors may occur in various ways:

Example. While browsing a particular data record, the museum registrar notices that an artifact has no weight entered in the database. This problem must be noted, the artifact retrieved and weighed, and the value entered into the database.

Example. A museum staff member notices that the length of a Merovingian brooch is entered in the database as 3000 cm. Clearly this is an error; brooches are never 30m long. The artifact must be remeasured and the correct value entered into the database.

Example. An undergraduate senior in anthropology working at the museum reads an artifact record for an Egyptian earthenware bowl. She notes that the time period designation is entered as "predynastic." However, the manufacturing process field notes that the artifact was wheel-thrown. Because of her knowledge of Egyptology, she is aware that this represents an inconsistency in the data: predynastic Egyptian pots were not wheel-thrown. However, not being an expert, she doesn't know which field is incorrect, nor (because of her junior status) is she allowed to change them. She reports the problem, but a specialist in Egyptian archaeology must examine the pot to resolve the contradiction.

From examples such as these, several different types of errors can be identified.

Input Errors

The most obvious kind of error is a data input error, where the correct value was known (or knowable) at the time of input, but that value is not what was entered. Such errors can be slips or misconceptions. A slip is when the person entering the data knows the correct value but accidentally enters another one. This is frequently due to typing errors, but can also be caused by interruptions or distractions. A misconception occurs when the person entering the data believes that she knows the correct value and actually enters that value, but it is not correct. The likelihood of a misconception varies with the expertise of the person making the decision. Even an expert may have misconceptions, but a novice is likely to have more.

In general terms, appropriate ways of dealing with input errors include the use of controlled vocabularies (Lancaster, 1986), which can detect some, but not all, typing errors. Selection from menus eliminates invalid entries, but not incorrect ones: an accidental tremor with the mouse can lead to the selection of the wrong value. Data dictionaries can be used to forbid, or flag as suspicious, values outside expected norms. Although careful checking will help to reduce the problem, it will not eliminate it.
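To illustrate the kinds of automated checks just described, the sketch below (in Python, with a hypothetical vocabulary and hypothetical dimension ranges, not the Spurlock's actual rules) shows how a controlled vocabulary can reject invalid terms while a data-dictionary style range check flags suspicious values, such as the 3000 cm brooch, for re-measurement rather than silently accepting them.

```python
# Minimal sketch of automated input checks; the vocabulary and ranges
# below are illustrative assumptions, not the museum's actual rules.

MATERIAL_VOCABULARY = {"earthenware", "bronze", "chipped stone", "textile"}
DIMENSION_RANGES_CM = {"length": (0.1, 500.0), "width": (0.1, 500.0)}


def check_material(value):
    """Controlled vocabulary check: reject terms outside the approved list."""
    if value.lower() not in MATERIAL_VOCABULARY:
        return [f"'{value}' is not in the controlled vocabulary"]
    return []


def check_dimension(field_name, value_cm):
    """Data-dictionary style check: flag, but do not forbid, unusual values."""
    low, high = DIMENSION_RANGES_CM[field_name]
    if not (low <= value_cm <= high):
        return [f"{field_name} = {value_cm} cm is outside {low}-{high} cm; please verify"]
    return []


# The 3000 cm Merovingian brooch would be flagged here for re-measurement.
print(check_dimension("length", 3000.0))
```

Note that such checks catch invalid and implausible values, but not plausible-looking errors such as the transposed dimensions discussed below.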

Jacsó (1993a; 1993b) notes the contribution that effective use of controlled vocabularies can make to the problem. Nevertheless, problems remain with existing commercial databases, including typographic and spelling errors and the use of some fields as a dumping ground for values that don't fit into the database's current field structure. Further problems are caused by legitimate variant spellings, especially in cases where names change over time. In such cases, cross-references are a powerful solution, but problems occur when they are not used universally in the database.

One possible solution is to have the data entered independently by two people and to check for discrepancies. If errors are rare and randomly distributed, this is effective. It is also, however, expensive, which is why it is not done for every entry at the Spurlock and, we presume, at many other locations.
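As a rough sketch of how such a double-entry check might work (assuming each pass produces a simple mapping of field names to values; this is not the museum's actual procedure), the fields on which the two independent entries disagree can be computed automatically and queued for manual review:

```python
# Compare two independently entered versions of the same record and
# return the fields on which they disagree, for manual resolution.

def find_discrepancies(entry_a, entry_b):
    """Return {field: (value_a, value_b)} wherever the two entries differ."""
    fields = set(entry_a) | set(entry_b)
    return {
        f: (entry_a.get(f), entry_b.get(f))
        for f in fields
        if entry_a.get(f) != entry_b.get(f)
    }


# Hypothetical example: a slip in one pass is caught by the comparison.
print(find_discrepancies(
    {"length_cm": 3.0, "material": "bronze"},
    {"length_cm": 30.0, "material": "bronze"},
))  # -> {'length_cm': (3.0, 30.0)}
```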

Another possibility is to split tasks into relative layers of expertise. For example, a novice can be assigned the task of entering already extant data from paper forms into the database system. More advanced students trained in artifact handling procedures can measure the museum's artifacts and record this data on paper forms for later data entry. Finally, the museum's most experienced students are set to such advanced tasks as assigning nomenclature classifications, identifying the provenance, or analyzing the material composition of a given artifact.

However, even apparently relatively simple activities can lead to confusion. For example, at the Spurlock, the length, width, height, and other dimensions of each artifact are entered into the database. While measuring these values is not difficult, deciding which value goes with which field is often confusing because this presupposes knowledge of the artifact's correct physical orientation. An error here could lead to confusion for exhibit designers about the intended orientation of the piece, resulting in incorrectly shaped exhibit cases or display mounts.

It is important to note that this sort of error could not be solved by the implementation of controlled vocabularies. Moreover, although ideally all such data should be entered by an expert, this is not always feasible due to financial constraints. In the following sections we shall consider how collaborative approaches can make the best use of the limited resource of expertise.

Emergent Errors

Even if a value is known to be correct and entered correctly into the database, it can become incorrect later. Over time, if nothing is done to maintain accuracy, a database will degrade in quality. In short, data exhibits entropy. Where that is acknowledged, for example, with cash balance accounts in commercial databases, policies can be put in place to address it. Problems arise, however, when the change is so slow that it is considered imperceptible or not worth the bother of tracking. Furthermore, complicating the problem and leading to what we are calling emergent errors is the fact that scholarship and knowledge evolve. What was once considered a correct attribution may become incorrect in the future.

Example. The museum owns many artifacts from the former Soviet Union. The "country" designation for these artifacts was originally entered as "USSR." As the museum staff re-inventories the collection, these artifacts are being re-classified as coming from "Russia," "Georgia," etc.

Example. The Spurlock has a sizeable collection of Paleolithic chipped stone tools. In the museum's older ledgers, these are all classified as weapons. Subsequent scholarship has revealed that many of these so-called weapons were actually multi-purpose tools, also used for cooking, scraping, cleaning hides, etc. Therefore, many of the artifacts originally classified as weapons may now need to be re-classified.

Thus the data in the database has to describe the artifact not just within the walls of the museum but also in its larger context, whether in the world at large or in the scholarly community. As external events change, whether they are geopolitical in nature or represent an evolving state of knowledge, the database needs to be updated.

Collaborative Approaches to Data Quality Improvement

Given the problems outlined above, we believe that automated and semi-automated solutions, such as the use of controlled vocabularies, although important, and to be strongly advocated, will not solve all the problems. Careful manual checking of every item is infeasibly expensive, not least because data entropy means that it requires an ongoing commitment. We need to consider ways to improve quality that are not prohibitive and which are capable of fitting with existing work practices. The proposed solutions must also acknowledge the wide variation in expertise of the people involved in the process. Our approach has been informed by consideration of several observed cases such as the following illustrative example.

Example. A student entering cultural data on a particular Mesopotamian ceramic vase did not know whether this artifact was Sumerian or Akkadian in origin. Rather than guessing at random or failing to enter anything, she entered the data as Sumerian but appended a note to the record (using a specially designed problems field) indicating her uncertainty in this matter, the nature of the problem, and her recommendation that a specialist in Mesopotamian history double-check the record. This example illustrates that even relatively low-status employees, with presumably less institutional commitment, may be willing or can be encouraged to flag problems, provided that we give them suitable mechanisms to do so easily (see below).

Error checking and creative volunteering of information have been seen in various ways throughout the museum's system: it is almost commonplace to see phrases such as "I don't know," "Needs further research," or even simple question marks scattered throughout the records. Thus, even though this process of recording errors has not been formalized as a required duty of the student employees, these examples serve as evidence that people are not only willing to indicate quality problems but will find their own ways to do so.

The relative frequency of this kind of activity raises a question: can web-accessible museum collections incorporate mechanisms that will allow people browsing the data to report errors they detect and, should they wish, volunteer corrections? Although clearly this will not happen for every database record, we need to understand more about the process of error recognition and correction in order to facilitate it when it does occur. Consequently, it is our hope that if we encourage this activity, both managerially and through appropriate systems re-design, we can produce usage-based data quality improvement.

In general terms, this leads to the following questions:

  • Is it possible to take advantage of the use of the data and the expertise of users to identify and remediate errors?
  • What kind of computer support would facilitate this?
  • What kind of organizational and managerial arrangements would facilitate this?
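One way to picture the kind of computer support raised by the first two questions is a low-effort reporting mechanism attached to each web-accessible record. The sketch below is a speculative illustration, not a description of the Spurlock's system; the field names and the accession number are assumptions. A visitor submits a short report (optionally with a suggested correction and contact details), and the report is queued for review by an authorized staff member such as the registrar rather than applied directly to the data.

```python
# Speculative sketch of a visitor error-report mechanism; all names and
# fields here are illustrative assumptions.

from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class ErrorReport:
    record_id: str                          # which artifact record is concerned
    field_name: str                         # the field believed to be wrong
    description: str                        # what the reporter thinks is wrong
    suggested_value: Optional[str] = None   # optional volunteered correction
    contact: Optional[str] = None           # optional, so the reporter can be thanked
    submitted: datetime = field(default_factory=datetime.utcnow)


REVIEW_QUEUE = []                           # reviewed by staff, never auto-applied


def submit_report(report):
    """Accept a report; corrections are applied only after expert review."""
    REVIEW_QUEUE.append(report)


submit_report(ErrorReport(
    record_id="1900.00.0000",               # hypothetical accession number
    field_name="photograph",
    description="The digital photograph accompanying this record is upside-down.",
))
```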

This approach has precedent elsewhere. Gasser (1986) noted the various workarounds that occurred as workers devised mechanisms to cope with the inflexibility of a computer system, including dealing with erroneous information. Hughes et al. (1992), in a study of air traffic control work, observed that the less experienced assistants picking up flight strips from a computer and moving them to a display sometimes spotted errors in the printouts and reported them. This was an important part of the informal error checking that was integral to the safety culture of air traffic control. The study emphasized the social mechanisms employed to create a total system that was far more reliable than its very fallible components: exactly the situation needed for data quality.

Mintz (1990) proposed the FIXIT feature to enable quality control by online searchers of a fee-based database. This attempted to address the barriers to giving feedback to database producers, and to provide a credit for the accessing of erroneous data. FIXIT would switch the user to a free database (avoiding further online charges) and then ask the user for details about the faulty record that had been detected. She further proposed that the search services should publish details about how many corrections were sent to each provider and how many linked records were added to correct the FIXITs sent. Rather than just giving a credit for each erroneous record the user had retrieved, Mintz suggested that the database providers pay double for error reports in order to encourage them.

Encouraging Data Quality Feedback

In particular, one very important question is whether people will actually bother to volunteer this information. It seems reasonable to assume that the more difficult it is to volunteer the information, the less likely it is that a person would bother. Thus a design challenge is to develop mechanisms that are as low-cost as possible while also exploring the potential benefits of use. What are the incentives for people to volunteer their help? These can include a commitment to scholarship, irritation with an observed error, and internal altruistic motives. For some, the intrinsic reward is to see that their actions have an impact. Thus it is important that reports be rapidly acted on, since the volunteers of information may check on the results of their suggestions. If they choose to provide contact details they can be thanked and told the consequences of their report.

This question deserves a much more detailed study. However, we can provide some evidence in the form of observed examples of it occurring. Examples of this form of altruistic error correction behavior date back to the initial stages of the museum's re-inventory process. In late 1997, when digital records for the museum's collections were first made available for review over the Internet, a specialist in African art located in Phoenix, Arizona, examined over 200 records and sent detailed comments on each to the museum's registrar via email. These comments were evaluated by the museum staff for accuracy and subsequently entered into the museum's database systems.

This type of feedback has continued throughout the re-inventory process. For example, a University of Illinois professor recently examined from his office a number of online records of cuneiform tablets and emailed the registrar with his suggestions and corrections for the records. On one occasion, he noted that although the textual data within the record was accurate, the digital photograph which accompanied the record was upside-down!

In a related example of end user feedback, Davis (1989) describes work by OCLC (the Online Computer Library Center) to elicit responses from the users of their Online Union Catalog about its quality. The process for the users (librarians at affiliated libraries) was laborious, involving the postal mailing of Change Request forms along with photocopied supporting documentation. Accordingly, user feedback generated fewer than half of the 125,000 records replaced per year; 31% of respondents said they never reported errors, and 42% reported only a few errors. However, what to us is remarkable is that anyone bothered to report errors. We use this as evidence for the feedback approach being feasible, provided that ease of use issues are addressed: 70% of the librarians surveyed said that they would increase their reporting of errors if a more accessible online error reporting system were available.

Orr (1998) also advocates user feedback as a way to maintain data quality (this is the closest in approach to the mechanisms outlined in this paper). He notes the impossibility of perfect data quality and rather focuses concern on quality that is good enough. Where the database is considered part of a feedback control system, there is the possibility of usage statistics leading to the detection and correction of errors.

Raymond (1998) explores the development of Open Source Software (such as Linux) as a kind of gift economy to account for the apparently altruistic behavior of the participants. Although seemingly very remote from the world of museums and even museum informatics, we believe that it can shed light on the mechanisms that are necessary to encourage collaborative error detection and recovery, as well as serving as an existence proof for the plausibility of the idea. The gift economy analysis emphasizes that the altruistic behavior needs to be acknowledged, probably publicly. Furthermore, the community of testers and developers needs to be acknowledged, along with continuing feedback that overall improvement is resulting from the collective efforts of that community. We would expect that museum curators and scholars would form the most likely members of an error-correcting community, but in addition to studying how to support them, we believe that it would be worthwhile to at least investigate the possibility of contributions from a wider set of visitors.

These examples show how data quality may be improved through usage: as people look at the data for their own purposes, they may spot errors. This is a valuable resource, but one that is difficult to exploit. Information can come from different kinds of people and involve different degrees of knowledge and detail. It may involve users both within and outside the organization. It may involve experts who can identify the problem and its correct solution, or people who just have a hunch that there may be a problem. The complicating factor is that error correction and remediation may itself involve errors: people who believe there is an error when the data is in fact correct, and people whose suggested corrections to real errors are also erroneous.

We believe that research into the following two areas can help solve these problems: error metadata and error probability.

Error Metadata

Once we move from a desire solely to get data right the first time towards a consideration of continual change in data quality, it becomes important to consider the issues of error metadata. Much can be learned from the use of an earlier data recording technology, the index card. There are many instances where index cards reveal not just the information that was first recorded on them, but how that information has been modified, annotated and added to over time. Often this information includes details of the date and person who made the change or addition, and even why. The resultant information, although often created for reasons of expediency, can in certain cases be more useful than that conventionally obtained by just entering the final values on the index card into a database and discarding the older, incorrect ones. All this error metadata is of great practical interest in managing error checking and correction. It helps to know who has considered the data, when, and why, and what they did. This metadata can help in deciding how much credit to give to the resultant data and how much checking time to devote to it (Baker, 1994).

We can contrast this with a conventional (computer) database. Certain people have permission to make changes to the data, but changes are often made by overwriting, obliterating the old information with no record of who made the change, when, or why. It would seem that such a database ironically suffers from amnesia about itself. Each field in the database should itself have associated with it data about its creation, checking, and modification. This would include date and time information, who was involved, and a continuing record of the previous values that have subsequently been updated. Rothenberg (1996) outlines a range of metadata fields that can be used in improving data quality. Our proposals, focusing on aspects of usage and evolution, form a subpart of his wider analysis.

Based on some of our preliminary suggestions, the Spurlock has implemented two systems for tracking error metadata. First, a "modification history" is maintained for each record in the computer system that tracks the digital evolution of the data contained within the record. Every time a modification is made to any field in any record, this fact is recorded in the modification history for that record. Each entry in the modification history logs the nature of the change, who made it, when, and why. By tracking the contents of this history throughout the lifespan of a record, the museum's Registrar, among others, is able to gain an understanding of the interaction over time between museum staff members and the computer system. Second, a "problems field" in each record exists for the recording of any inconsistencies, difficulties, or uncertainties that a user of the system might encounter. When a museum employee finds a problem of any type with a given record, he or she records this in the problems field and sets a flag that alerts the museum's Registrar to the existence of a new problem with this record.
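A minimal sketch of these two mechanisms, using assumed Python structures rather than the museum's actual database schema, might look like the following: each change to a field is appended to the record's modification history (old value, new value, who, when, why), and reported problems set a flag for the Registrar's attention.

```python
# Sketch of a record with a modification history and a problems field;
# the structures are assumptions, not the Spurlock's actual schema.

from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Modification:
    field_name: str
    old_value: object
    new_value: object
    changed_by: str
    changed_at: datetime
    reason: str


@dataclass
class ArtifactRecord:
    record_id: str
    fields: dict = field(default_factory=dict)
    modification_history: list = field(default_factory=list)
    problems: list = field(default_factory=list)
    problem_flag: bool = False

    def update_field(self, name, value, user, reason=""):
        """Overwrite a field, but keep the previous value in the history."""
        self.modification_history.append(Modification(
            name, self.fields.get(name), value, user, datetime.utcnow(), reason))
        self.fields[name] = value

    def report_problem(self, note, user):
        """Record an uncertainty or inconsistency and alert the Registrar."""
        self.problems.append((datetime.utcnow(), user, note))
        self.problem_flag = True
```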

Looking to future developments, we would advocate extending the index card idea into the online world; we could add information about the usage of the database, such as incrementing a counter every time a user (or a certain category of user) looked at a record. This is much harder to do in the physical world, although its inspiration is the different physical condition of documents: papers and index cards that are used a lot become dirtier and dog-eared. This idea, termed by Hill & Hollan (1992) "edit wear and read wear", can contribute to improving data quality. As a working hypothesis we propose that heavily used data is in general of higher quality, because the chances of error detection, reporting and correction are higher than for rarely used data. If this is so, then usage information, although unlikely to be definitive in identifying errors, can contribute to a probabilistic approach to data quality management (as outlined later). We must acknowledge, though, that the hypothesis is only concerned with an overall trend; a single heavily used data item may still contain an error and indeed, by virtue of its heavy use, that erroneous value may be widely believed to be especially reliable.

There are many different ways in which such metadata might be recorded and considerable variation in the quantity that might be retained. For example, for each field one might provide an ever-growing list of its earlier values, when they were entered and who entered them, along with an optional text field for annotations about the updating process. Similarly there are many different kinds of usage data that might be collected. If, for example, the use of a database requires logging in, it would be possible to record a user identity along with each usage incident, although this raises questions of privacy.
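As a sketch of what such usage metadata might look like in practice (again an assumption-laden illustration rather than an implemented system), each display of a record could increment a counter and, where a login is available and privacy safeguards permit, log who viewed it and when:

```python
# "Read wear" sketch: count and optionally log record views.

from collections import defaultdict
from datetime import datetime

VIEW_COUNTS = defaultdict(int)     # record_id -> number of times displayed
VIEW_LOG = []                      # (record_id, user_or_None, timestamp)


def record_view(record_id, user=None):
    """Call whenever a record is displayed; user may be None for anonymity."""
    VIEW_COUNTS[record_id] += 1
    VIEW_LOG.append((record_id, user, datetime.utcnow()))
```

Rarely viewed records can then be treated as more likely to harbor undetected errors, an idea taken up in the next section.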

All these ideas need further research to explore their computational and social consequences as well as the degree to which they support cost effective data quality improvement. Of all the costs, that of storing what appears to be substantially more information than is conventionally recorded for a database is perhaps of the least concern. The costs of computer storage continue to fall rapidly and compared to image data (such as one or more high quality photographs of an artifact), this proposed additional textual information will likely be only a marginal increase. The ongoing costs of collection should also be relatively low, since most of the data described can be collected automatically by a suitably designed system. The main additional costs are in adapting the systems in place to include the collection of this information, and the development and use of the systems to make use of the data, including ensuring agreed privacy safeguards.

Error Probability

Data mining is a well established technique for discovering useful unexpected features in existing data. We can consider the search for errors in our data as an example of data mining. If we extend the mining metaphor, the spontaneous altruistic error reporting described above can be considered as analogous to the happenstance of a natural outcropping. This can then be used as an indicator of a cluster of linked errors. Drilling for oil is an expensive endeavor; one attempts to improve one's probabilities of a strike by accumulating multiple pieces of contributing evidence. In the same way, once an error is brought to one's notice it can be used to adjust the probability field of likely error clusters.

For example, if one Mesopotamian pot is mis-classified as Sumerian instead of Akkadian, one begins to suspect other Mesopotamian pots. In particular, one suspects those classified at about the same time and by the same person. One is also especially suspicious of data items that have been rarely looked at. The linkages between the data that adjust the probabilities can be both aspects of the data itself (pot, Sumerian) and about its metadata (who entered it, when, if it has ever been checked for this or any other error). The probabilistic nature of this approach is important to consider. One could in theory check everything, but usually resources do not permit this. It becomes useful to maximize the potential of identifying errors even when this means that not all errors are detected. Multiple factors can contribute to this information.

For example, in recording details about the geographical provenance of a given artifact, students are asked to enter the artifact's continent, country, region, city, etc. When undertaking a random check of artifacts from Africa, it was noticed that one artifact had, in the continent, country, and region fields, the values Africa, Kenya, and East Africa, respectively. This is incorrect. The region field should be a sub-part of a country (e.g. South Kenya), not a sub-part of a continent. Subsequent analysis led to the discovery that the same individual was responsible for this misinterpretation of the region field in other instances. This fact led the museum's registrar to suspect all geographical entries by this individual and allowed her to search the system for all such entries and correct them. It is easy to see the cause of the misconception, and consequently worth checking (if time permits) whether the same error occurs elsewhere.

Another example can be found in the analysis of records pertaining to maps. It was discovered that some students cataloguing these items incorrectly recorded in the same geographical provenance fields the country depicted on the map rather than the place where the map was created (i.e. a map of West Africa made in Spain would have a value of Africa in the continent field rather than Europe). As in the previous example, once one such error was detected, suspicions were raised about other records. Those map records entered by the same student seemed most likely to be in error, followed (to a lesser extent) by all records about maps and other artifacts that depict countries, which could cause the same kind of confusion. The example of the misclassified Paleolithic chipped stone tools discussed above shows the same issue. Once it was realized that one record of this type was misclassified as a weapon, it was necessary to reassess all similar artifacts.

These examples serve to show how the data can be regarded as a probability field, with each data item having a certain chance of being incorrect. It is not the case that each item has the same error probability. We can draw on information about prior experience to ascribe higher probabilities based on a range of factors, such as how difficult it is to obtain the correct value for a certain category of artifact, and how expert the person making those decisions is. As new information is obtained (such as discovering the map misconception), the probability field shifts. We can use the information in the probability field to help decide where to allocate scarce expert error-checking resources, that is, where to drill for errors.
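One way to operationalize this probability field is a simple heuristic scoring of records against a newly discovered error; the factors and weights below are illustrative assumptions, not a calibrated model. Records entered by the same person, belonging to the same artifact category (other maps, other Mesopotamian pots), or having little "read wear" receive higher suspicion scores and are checked first.

```python
# Heuristic sketch of error-probability scoring; weights are assumptions.

def error_suspicion(record, known_error, view_counts):
    """Return a score; higher means check this record sooner."""
    score = 0.0
    if record["entered_by"] == known_error["entered_by"]:
        score += 0.5               # same cataloguer as the known error
    if record["category"] == known_error["category"]:
        score += 0.3               # e.g. other maps, other Mesopotamian pots
    if view_counts.get(record["record_id"], 0) < 5:
        score += 0.2               # little "read wear", so fewer chances to spot errors
    return score


def prioritize(records, known_error, view_counts):
    """Rank records so scarce expert checking time goes where errors are likeliest."""
    return sorted(records,
                  key=lambda r: error_suspicion(r, known_error, view_counts),
                  reverse=True)
```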

Conclusions

Based on an ongoing study of the re-inventorying practices of a particular museum, we have attempted to identify existing collaborative activities that can serve as the inspiration for future technical innovations. By linking those findings with current CSCW research, we have outlined a new collaborative approach to data quality management.

This analysis of the Spurlock Museum's data quality has highlighted the following two approaches:

  • Spontaneous volunteering of data quality problems, sometimes accompanied by proposed solutions; and
  • The use of usage information and data quality metadata to inform quality control activities by addressing an evolving error probability field.

We believe that these approaches offer great potential benefits not only for museum and bibliographic databases, but also for commercial ones. We intend to continue this work by a combination of observational studies and the development and testing of prototype systems embodying the features outlined in this paper.

References

Baker, N. (1994). Discards. The New Yorker, 70(7), 64-86.

Davis, C. C. (1989). Results of a survey on record quality in the OCLC database. Technical Services Quarterly, 7(2), 43-53.

Gasser, L. (1986). The integration of computing and routine work. ACM Transactions on Information Systems, 4(3), 205-225.

Hill, W. C., & Hollan, J. D. (1992). Edit wear and read wear. Paper presented at CHI '92: Proceedings of the Conference on Human Factors in Computing Systems, Monterey, CA.

Hughes, J. A., Randall, D., & Shapiro, D. (1992). Faltering from ethnography to design. Paper presented at CSCW'92: Proceedings of the Conference on Computer-Supported Cooperative Work, Toronto.

Jacsó, P. (1993a). Searching for skeletons in the database cupboard Part I: errors of omission. Database, 16(1), 38-49.

Jacsó, P. (1993b). Searching for skeletons in the database cupboard Part II: errors of commission. Database, 16(2), 30-36.

Lancaster, F. W. (1986). Vocabulary control for information retrieval (Second ed.). Arlington, VA: Information Resources Press.

Marty, P. (1999). Museum informatics and collaborative technologies. Journal of the American Society for Information Science, 50 (12), 1083-1091.

Mintz, A. P. (1990). Quality control and the zen of database production. Online, 14(6), 15-23.

Orr, K. (1998). Data quality and systems theory. Communications of the ACM, 41(2), 66-71.

Raymond, E. S. (1998). Homesteading the Noosphere. First Monday, 3(10). http://www.firstmonday.dk/issues/issue3_10/raymond/

Rothenberg, J. (1996). Metadata to support data quality and longevity. Paper presented at the 1st IEEE Metadata Conference, Silver Spring, MD.