April 11-14, 2007
San Francisco, California

OpenCollection Web-Based Collection Cataloguing and Access Software

Carl Goodman, Museum of the Moving Image, USA; Megan Forbes, Museum of the Moving Image; and Seth Kaufman, Whirl-i-Gig, USA

Abstract

OpenCollection is an open source, Web-based collections management and access application created by the Museum of the Moving Image and software developer Seth Kaufman for use by museums, libraries, and archives. The application supports the cataloguing of physical objects, media, and native digital content, and is designed to meet the needs of large heterogeneous collections that

  1. have complex cataloging requirements, and
  2. require support for a range of metadata and media formats.

OpenCollection is a true Web application. All cataloguing, search, and administrative functions are accessible over the Internet using standard Web browsers, which makes cataloguing and on-line access to collections information easy, efficient, and inexpensive. OpenCollection is, to the Museum's knowledge, the first software of its kind. It represents an alternative both to the expensive proprietary collections management software used by some of the country's largest museums and to the ad-hoc collection databases that other institutions construct in lieu of appropriate software.

OpenCollection was developed for the Museum's own use between 2004 and 2006 with support from the Institute of Museum and Library Services, and with the intention that the software would ultimately be made available to museums, libraries, and archives. The software's programmer, independent developer Seth Kaufman, is committed to distributing it in this way.

In early 2007, the Museum will make OpenCollection and its source code freely available under the GNU General Public License (the image viewer is already available under the same license). This is being done with the full participation of Seth Kaufman, the author of much of the software's underlying code.

Keywords: Open Source, Web applications, digital collections, collections documentation, software

Introduction

The Museum of the Moving Image and software developer Whirl-i-Gig recently developed and implemented Web-based museum collections management and presentation software called OpenCollection. The development of OpenCollection was paid for with funds and resources from the Museum's operating budget and with a 2004 grant from the Institute of Museum and Library Services. In addition to the expected benefits of implementing and using this software, there have been several unexpected benefits, as well as some unforeseen challenges.

Though made for the Museum's use, OpenCollection has always been envisioned as software that could be shared with other institutions through an open source model. In early 2007, the source code is being made available on-line. However, for the software to be truly useful to the museum community, and not just to institutions with large technical staffs and an itch for do-it-yourself software, more needs to be done. Moving Image and Whirl-i-Gig are now pursuing activities, and exploring various models, to ensure the future sustainability, integrity, and utility of OpenCollection.

Additionally, as a result of this project, we believe that thin-client, Web-based collections software can forge dynamic, multi-directional links between museum collection functions, normally focused internally (back office), and Website functions, normally focused externally (front office). This has implications for how the Web is utilized within the Museum enterprise. It also suggests that changes to museums' organizational structures are necessary in order to take full advantage of an expected emerging class of Web-based museum software applications.

About the Museum & Whirl-i-Gig

The mission of the Museum of the Moving Image is to advance public understanding and appreciation of the art, history, technique and technology of film, television, and digital media. It does so by collecting, preserving, and providing access to moving-image related artifacts; screening significant films and other moving-image works; presenting exhibitions of artifacts, artworks, and interactive experiences; and offering educational and interpretive programs to students, teachers, and the general public.

The Museum is located in Astoria, Queens, on the site of the Astoria Studios, one of the largest motion picture and television production facilities in the United States. Currently, the Museum maintains a collection of over 125,000 moving image artifacts, one of the largest and most important such collections in the world. The collections include artifacts from every stage of producing, promoting, and exhibiting motion pictures, television, and digital media, including exceptional holdings in television sets, licensed merchandise, rare photographs, and video games. The Museum is characterized as mid-sized, with approximately 40 full-time employees. During the development of OpenCollection, the Museum had one part-time IT/MIS consultant, Remy Ching, and one Assistant Curator of Digital Media, Tim Schwartz, who is also the Museum’s Webmaster.

Founded in New York in 1995, Whirl-i-Gig is a two-person software development firm specializing in software applications for museums, archives and biological research and conservation. Since its inception, Whirl-i-Gig has developed a succession of Web-based collections management and dissemination applications, of which OpenCollection is the latest and most advanced.

How OpenCollection Happened

A 2002 grant from the New York State Council on the Arts (NYSCA) was used to develop a plan to create a collection database and make the information available online. Moving Image originally planned to use a standard, off-the-shelf collections management software package. Toward that end, the Museum conducted a comprehensive review of ten software packages to determine which was the best option. The review found that no available product met the Museum’s functional requirements. Further, the applications reviewed were expensive and encumbered with restrictive licensing. Their closed-source nature made all but the most superficial customization impossible, and many packages were designed in ways that made migrating or sharing collections data costly and difficult. The Museum decided to use Microsoft Access as an interim collection database.

At the same time, the Museum hired Whirl-i-Gig to study the feasibility of a custom, Web-based application for presenting collection information online. Alongside off-the-shelf museum collection software, we had seen, and been intrigued by, the possibilities of Whirl-i-Gig's earlier Web-based database work for the American Museum of Natural History and the Cooper Union for the Advancement of Science and Art. Existing digital asset management programs, rather than museum-specific collections management packages, were an additional source of inspiration. Whirl-i-Gig recommended building a single Web-based system designed specifically for museum collection management and cataloging, on a foundation of code that had been used successfully in previous projects.

In 2004, the Museum of the Moving Image received a two-year grant from the Institute of Museum and Library Services (IMLS) to develop OpenCollection, using a test bed of 700 items from its collection. At that time, the bulk of the Museum's collection documentation was captured primarily in paper accession files, and in the Access database developed in 2002. The complex nature of moving image-related materials meant that not only did the new collections management system need to support the description of a vast range of artifacts, but it also needed to capture the relationship of these objects to various people, corporations, and productions.

Development – Cataloging Interface

A basic version of the design for the Moving Image cataloging interface of OpenCollection was included in the 2004 grant proposal to the IMLS. This design included a field-by-field description of the cataloging module of the database, along with specifications for separate databases to hold authority records for people, corporate bodies, and productions. The database design was based on version 3.0 of the Visual Resources Association (VRA) Core metadata element set and required, among other things, separate records for objects and their digital surrogates.
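The record structure described above (objects with their own digital surrogate records, linked to separate authority records for people, corporate bodies, and productions) can be sketched as a few Python data classes. This is a minimal illustration under stated assumptions, not OpenCollection's actual schema, which runs on MySQL; the class names, field names, and role strings are hypothetical and are not VRA Core element names.

```python
from dataclasses import dataclass, field

@dataclass
class Authority:
    """Hypothetical authority record: a person, corporate body, or production."""
    id: int
    kind: str   # "person" | "corporate_body" | "production"
    name: str

@dataclass
class Surrogate:
    """A digital surrogate (e.g. a scan) kept as its own record, separate from the object."""
    id: int
    filename: str

@dataclass
class ObjectRecord:
    """A catalogued physical object (hypothetical fields)."""
    id: int
    accession_number: str
    title: str
    surrogates: list[Surrogate] = field(default_factory=list)
    # (authority id, role) pairs, e.g. (12, "manufacturer")
    links: list[tuple[int, str]] = field(default_factory=list)

def holdings_for(authority_id: int, objects: list[ObjectRecord]) -> list[tuple[str, str]]:
    """Return (accession number, role) for every object linked to one authority record."""
    return [(o.accession_number, role)
            for o in objects
            for aid, role in o.links
            if aid == authority_id]
```

Because objects point to authority records instead of repeating names as free text, a single lookup such as holdings_for() can list everything the collection holds for one person, company, or production.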

During the grant period, OpenCollection moved from a list of specifications on paper and smaller modules already created by Whirl-i-Gig to a working collections management model. Early versions of the software were created and tested by the collections staff at the Museum, with updates to the software made on a weekly basis.

At the beginning of the project, much of the data that the curatorial and collections staff was working with was housed in the Museum's temporary Access database. Once enough of the cataloging interface was completed, the Museum was able to migrate the 20,000+ records in its Access database to OpenCollection, where they could be updated to conform to the Museum's new standards of description.

The first large-scale test of the software came in the summer of 2005, when the Museum hired four full-time, graduate-level interns as catalogers. The interns had backgrounds in both library science and the humanities, and they assisted not only with artifact cataloging but also with the continued refinement of the software. This refinement included streamlining the entry of records with tombstone-level information and adding the ability to duplicate catalog records for similar objects.

Since the summer of 2005, OpenCollection has been the sole collections management system in use at the Museum of the Moving Image. Not only the collections department, but also curatorial, exhibitions, development, and public relations staff use it. A rudimentary public front end was made available in August 2006, and has since been upgraded to the version now online at http://collection.movingimage.us.

About OpenCollection

OpenCollection is a general-purpose system designed to be applicable to a wide variety of collection types. Key features include:

A completely Web-based user interface

All access to OpenCollection is through a Web browser-based user interface; no other software is required. Any operating system that can run a modern Web browser is therefore supported, including Mac OS X, Windows 2000/2003/XP, Linux, the various BSD operating systems, and Solaris. Because there are no specialized software or hardware requirements (virtually any Internet-capable computer will do), remote access for both data entry and search is simple.

Configurable type-specific metadata system

In addition to the standard set of OpenCollection fields, which represent concepts applicable to anything that can be catalogued (an "accession number," for example), sets of custom fields may be defined. These sets can map to established metadata standards such as Dublin Core, Darwin Core, VRA Core 3.0, and CDWA Lite. Custom fields may be type-specific: they can be defined so that they are available only for specific types of catalogued items (e.g., photographs, video tapes, films). They may also be repeating, and controls can be imposed on input formats.
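As a rough illustration of how type-specific, repeating, and format-controlled custom fields might fit together, here is a Python sketch. The field names, item types, and the regular expression used as an input-format control are all hypothetical; this paper does not show how OpenCollection itself stores its field definitions.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class FieldDef:
    name: str
    applies_to: Optional[set[str]] = None   # None means "available for every item type"
    repeating: bool = False
    pattern: Optional[str] = None           # optional input-format control (regex)

# A hypothetical custom field set
CUSTOM_FIELDS = [
    FieldDef("measurements", repeating=True),
    FieldDef("film_gauge", applies_to={"film"}, pattern=r"(8|16|35|70)mm"),
    FieldDef("tape_format", applies_to={"video tape"}),
]

def fields_for(item_type: str) -> list[FieldDef]:
    """Custom fields offered on the cataloging form for one item type."""
    return [f for f in CUSTOM_FIELDS if f.applies_to is None or item_type in f.applies_to]

def validate(item_type: str, values: dict[str, list[str]]) -> list[str]:
    """Return error messages for proposed values (each field maps to a list of entries)."""
    allowed = {f.name: f for f in fields_for(item_type)}
    errors = []
    for name, vals in values.items():
        f = allowed.get(name)
        if f is None:
            errors.append(f"{name}: not available for {item_type}")
            continue
        if len(vals) > 1 and not f.repeating:
            errors.append(f"{name}: does not repeat")
        if f.pattern:
            errors += [f"{name}: bad format {v!r}" for v in vals if not re.fullmatch(f.pattern, v)]
    return errors
```

Keeping the type restriction, the repeat flag, and the format rule on the field definition itself means new item types or standards can be added as data rather than as code changes.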

Support for a wide variety of digital media types

OpenCollection understands and can process, convert, and display digital media files in many formats, including:

OpenCollection can automatically convert formats that are not Web-viewable, such as TIFF, into Web-friendly formats (such as JPEG) at various sizes. The original file can be retained and made available for download. For small files, conversion and resizing can be done in near-real time. Larger files, which can take considerable time to process, can be queued for later processing on a designated media processing server. Whatever the size of the uploaded file, catalogers never have to stop working while media files are processed. Support for individual media types is implemented through a modular architecture, which makes it possible to add new media formats without modifying the core OpenCollection system.
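The combination of a modular, per-format architecture with size-based queuing can be sketched as a small handler registry. This is a hypothetical Python illustration, not OpenCollection's code: the MediaPipeline class, the 5 MB inline threshold, and the derivative file names are assumptions, and the TIFF handler only names its JPEG outputs instead of calling an image library.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class MediaPipeline:
    """Hypothetical plug-in registry: one handler per media type; core logic never changes."""
    inline_limit: int = 5_000_000   # bytes; assumed cut-off between "now" and "queue"
    handlers: dict[str, Callable[[str], list[str]]] = field(default_factory=dict)
    queue: list[tuple[str, str]] = field(default_factory=list)   # (mime type, path)

    def register(self, mime_type: str):
        """Decorator that plugs in a handler for one media type."""
        def decorator(fn):
            self.handlers[mime_type] = fn
            return fn
        return decorator

    def ingest(self, mime_type: str, path: str, size: int) -> str:
        if mime_type not in self.handlers:
            return "stored (no handler; original kept for download)"
        if size <= self.inline_limit:
            derivatives = self.handlers[mime_type](path)
            return f"processed now: {', '.join(derivatives)}"
        # Large files wait for a separate media server, so the cataloger keeps working
        self.queue.append((mime_type, path))
        return "queued"

    def run_queue(self) -> list[list[str]]:
        """What a media processing server would run later."""
        done = [self.handlers[m](p) for m, p in self.queue]
        self.queue.clear()
        return done

pipeline = MediaPipeline()

@pipeline.register("image/tiff")
def tiff_to_jpeg(path: str) -> list[str]:
    # A real handler would call an image library; this one only names the derivatives.
    stem = path.rsplit(".", 1)[0]
    return [f"{stem}_{size}.jpg" for size in ("thumb", "medium", "large")]
```

Adding a new format means registering one more handler; ingest() and the queue are untouched, which is the benefit of the modular design described above.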

Automatic extraction of metadata from uploaded media files

Metadata embedded in uploaded media files in EXIF, IPTC, IRB and XMP formats can be extracted for search or display.
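Of the four formats, XMP is the easiest to illustrate with the standard library alone, because it is stored as an XML (RDF) packet inside the file. The sketch below assumes the packet is stored uncompressed (as it is in a JPEG APP1 segment) and ignores EXIF, IPTC, and IRB entirely; it scans the raw bytes for the packet and returns its simple properties keyed by namespaced name. The extract_xmp() function is hypothetical and is not how OpenCollection performs extraction.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def extract_xmp(data: bytes) -> dict[str, str]:
    """Find an XMP packet in raw file bytes and flatten its properties to strings."""
    start = data.find(b"<x:xmpmeta")
    end = data.find(b"</x:xmpmeta>")
    if start == -1 or end == -1:
        return {}
    root = ET.fromstring(data[start:end + len(b"</x:xmpmeta>")])
    fields = {}
    for desc in root.iter(f"{{{RDF}}}Description"):
        # Simple properties may be serialized as attributes (skip rdf:about etc.)...
        for name, value in desc.attrib.items():
            if not name.startswith(f"{{{RDF}}}"):
                fields[name] = value
        # ...or as child elements; for arrays (rdf:Seq/Bag/Alt) join the items
        for child in desc:
            items = [li.text.strip() for li in child.iter(f"{{{RDF}}}li") if li.text]
            if items:
                fields[child.tag] = "; ".join(items)
            elif child.text and child.text.strip():
                fields[child.tag] = child.text.strip()
    return fields
```

The keys keep the full namespace URI (for example the Dublin Core namespace for dc:title), so properties from different schemas cannot collide when they are stored for search or display.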

Batch upload of media files

The "File Space" is a holding area for media files to be added to catalog records. Using a browser-based user interface, media may be uploaded to the File Space in large batches (as ZIP, Tar-GZip or GZip encoded archives) for later cataloging. In most cases this is considerably faster than uploading media file-by-file.
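Unpacking an uploaded batch into a holding area can be sketched with Python's zipfile, tarfile, and gzip modules. The unpack_batch() function and the directory layout are hypothetical. Archive paths are flattened to bare file names as a simple guard against path traversal, so two files with the same name in different folders would overwrite each other; a production version would need to handle that.

```python
import gzip
import shutil
import tarfile
import zipfile
from pathlib import Path

def unpack_batch(archive: Path, file_space: Path) -> list[Path]:
    """Extract a ZIP, tar (optionally gzipped), or single .gz upload into file_space."""
    file_space.mkdir(parents=True, exist_ok=True)
    extracted = []
    if zipfile.is_zipfile(archive):
        with zipfile.ZipFile(archive) as zf:
            for info in zf.infolist():
                if info.is_dir():
                    continue
                target = file_space / Path(info.filename).name   # flatten folders
                with zf.open(info) as src, open(target, "wb") as dst:
                    shutil.copyfileobj(src, dst)
                extracted.append(target)
    elif tarfile.is_tarfile(archive):
        with tarfile.open(archive, "r:*") as tf:                # detects gzip compression itself
            for member in tf.getmembers():
                if not member.isfile():
                    continue
                target = file_space / Path(member.name).name
                with tf.extractfile(member) as src, open(target, "wb") as dst:
                    shutil.copyfileobj(src, dst)
                extracted.append(target)
    elif archive.suffix == ".gz":                                # a single gzip-compressed file
        target = file_space / archive.stem
        with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        extracted.append(target)
    else:
        raise ValueError(f"unsupported batch format: {archive.name}")
    return extracted
```

Each extracted file then waits in the File Space until a cataloger attaches it to a record.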

Flexible search engine

The built-in search engine supports full-text searching over all fields in the database, field-limited searches, wildcards, stemming, phonetic matching, spelling correction, Boolean combinations, exclusion (the Boolean "NOT" operator), phrase searches, synonymy, and more. Both a simple, Google-like interface and an advanced search interface are offered.
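A few of the listed operators (full-text terms, field-limited terms, trailing wildcards, implicit AND, and exclusion) can be illustrated with a toy inverted index. This Python sketch is a stand-in built on assumptions, not OpenCollection's search engine: the query syntax (field:term, term*, -term) is invented for the example, and stemming, phonetic matching, spelling correction, phrases, and synonymy are not implemented.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

class SearchIndex:
    def __init__(self):
        self.postings = defaultdict(set)   # term or "field:term" -> set of record ids
        self.all_ids = set()

    def add(self, record_id, fields: dict[str, str]):
        self.all_ids.add(record_id)
        for field, value in fields.items():
            for term in tokenize(value):
                self.postings[term].add(record_id)                       # full text, any field
                self.postings[f"{field.lower()}:{term}"].add(record_id)  # field-limited

    def lookup(self, token: str) -> set:
        token = token.lower()
        if token.endswith("*"):   # trailing wildcard; field-limited only if the prefix is
            prefix = token[:-1]
            return set().union(*(ids for term, ids in self.postings.items()
                                 if term.startswith(prefix) and (":" in term) == (":" in prefix)))
        return set(self.postings.get(token, ()))

    def search(self, query: str) -> set:
        """Terms are ANDed together; a leading '-' excludes matches (Boolean NOT)."""
        result = set(self.all_ids)
        for token in query.split():
            if token.startswith("-"):
                result -= self.lookup(token[1:])
            else:
                result &= self.lookup(token)
        return result
```

Indexing each word twice, once bare and once prefixed with its field name, is what lets the same index answer both "search everything" and field-limited queries.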

Built-in Web-based high resolution "pan-and-zoom" image viewer

Images may be viewed at any resolution, and with continuous zooming and panning, using OpenCollection's built-in Tilepic viewer. Tilepic is an open multi-resolution image format designed by the University of California Berkeley Digital Library Project (http://elib.cs.berkeley.edu/tilepic/). Uploaded images can be automatically converted to Tilepic from any supported image format.
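Pan-and-zoom viewers of this kind rely on a multi-resolution pyramid: the image is stored at successively halved resolutions, each cut into fixed-size tiles, so the viewer only fetches the tiles covering the current view at the current zoom level. The sketch below computes such a pyramid and the tiles a viewport needs. The 256-pixel tile size and the halving scheme are generic assumptions and do not describe the actual layout of the Tilepic file format.

```python
import math

def pyramid_levels(width: int, height: int, tile: int = 256) -> list[tuple[int, int, int, int]]:
    """Return (width, height, columns, rows) for each level, smallest level first."""
    levels = []
    w, h = width, height
    while True:
        cols, rows = math.ceil(w / tile), math.ceil(h / tile)
        levels.append((w, h, cols, rows))
        if cols == 1 and rows == 1:
            break
        w, h = math.ceil(w / 2), math.ceil(h / 2)   # each level halves the resolution
    return levels[::-1]

def visible_tiles(level, left: int, top: int, right: int, bottom: int, tile: int = 256):
    """(column, row) of every tile overlapping a viewport given in that level's pixels."""
    _, _, cols, rows = level
    c0, c1 = max(0, left // tile), min(cols - 1, (right - 1) // tile)
    r0, r1 = max(0, top // tile), min(rows - 1, (bottom - 1) // tile)
    return [(c, r) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]
```

For a 1000 x 600 pixel image this gives three levels and 17 tiles in total; a viewer panning across the full-resolution level requests only the few tiles in view rather than the whole file.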

Extensive support for authority lists

OpenCollection has a full set of tools for managing and cataloging with the following types of authority lists:

Controlled vocabularies

An unlimited number of hierarchical controlled vocabularies may be loaded into the system and used side-by-side for cataloging. Management tools allow selected users to edit existing vocabularies or create new ones from scratch. Tools have been implemented to import Getty Art and Architecture Thesaurus (AAT) data files into OpenCollection. It should be possible to load other thesauri into OpenCollection without modification to the core system.
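A hierarchical vocabulary can be modelled as terms with a single broader-term link. The hypothetical Vocabulary class below supports two operations catalogers commonly need: showing a term's full broader-term path, and collecting a term together with all of its narrower terms (for example, to widen a search). The term IDs and labels are illustrative, not actual AAT records, and polyhierarchy (a term with several broader terms, which the AAT permits) is not handled.

```python
from collections import defaultdict

class Vocabulary:
    """Hypothetical hierarchical controlled vocabulary: term id -> label and broader term."""

    def __init__(self, name: str):
        self.name = name
        self.labels = {}
        self.parent = {}
        self.children = defaultdict(list)

    def add(self, term_id, label, parent_id=None):
        if parent_id is not None and parent_id not in self.labels:
            raise KeyError(f"unknown broader term {parent_id}")
        self.labels[term_id] = label
        self.parent[term_id] = parent_id
        if parent_id is not None:
            self.children[parent_id].append(term_id)

    def path(self, term_id) -> list[str]:
        """Labels from the top of the hierarchy down to term_id."""
        chain = []
        while term_id is not None:
            chain.append(self.labels[term_id])
            term_id = self.parent[term_id]
        return list(reversed(chain))

    def narrower(self, term_id) -> list:
        """term_id plus every term beneath it, in depth-first order."""
        out, stack = [], [term_id]
        while stack:
            t = stack.pop()
            out.append(t)
            stack.extend(reversed(self.children[t]))
        return out
```

Because each vocabulary is a separate object, several can be loaded and used side by side, and an importer for a thesaurus file would only need to call add() for each term.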

Reporting

The search engine's support for Boolean combination, exclusion, wildcards and field-level limiting makes it possible to pose very specific queries suitable for reporting. The result of any search in OpenCollection may be downloaded as a tab-delimited file suitable for import into Microsoft Excel or similar applications for reporting purposes. The list of report fields and their output order may be customized.
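Producing a tab-delimited report with a chosen list and order of fields takes only the standard csv module. In this sketch the records are plain dictionaries standing in for search results, and the export_report() name and field names are hypothetical. The csv writer quotes any value that itself contains a tab, quote, or line break, so each record stays on one logical row.

```python
import csv
import io

def export_report(records: list[dict[str, str]], fields: list[str]) -> str:
    """Tab-delimited text with a header row; columns follow the order given in fields."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(fields)
    for rec in records:
        writer.writerow([rec.get(f, "") for f in fields])   # missing fields become blank cells
    return buf.getvalue()
```

Reordering or trimming the fields list is all it takes to customize a report.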

Benefits & Challenges for the Museum

In developing and implementing OpenCollection, the Museum has realized the benefits one would expect from full-fledged collections management software. The collections department has greatly increased its control over the more than 125,000 artifacts in the Museum's collection. With the migration of data from the old inventory database, staff can now quickly and accurately identify which artifacts the Museum holds, where they are, and what condition they are in.

Researchers have benefited from the software in several ways. First, collections staff are able to do searches for outside researchers quickly and efficiently. In many cases, simply checking the records attached to a particular authority file can give a researcher a quick snapshot of the Museum's holdings related to any given person, corporation, or production. With the launch of the collection catalog online, researchers may also perform independent collections-based research. Full-text catalog records that allow for keyword searching, zoomable high-resolution images, and linked subjects combine to amplify the value of the artifacts themselves.

Internally, OpenCollection has allowed the physical collection to grow beyond the walls of the storage room. Exhibition staff, curators, development, and public relations can all access the system to learn about donations, download images, and plan exhibits. OpenCollection's simple interface enables staff to use the system, only rarely needing the intervention of the collections department. A system of permissions and access control enables collections staff to determine who can see records, and who has the ability to change them.
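A permission check of the kind described here can be sketched as a table of roles and the record access levels each role may view or edit. The roles, access levels, and the fallback of unknown users to anonymous, public-only access are assumptions for illustration; the paper does not describe how OpenCollection's permission system is organized.

```python
from dataclasses import dataclass

# Hypothetical role table: record access levels each role may view or edit
ROLE_RIGHTS = {
    "collections": {"view": {"public", "staff", "restricted"}, "edit": {"public", "staff", "restricted"}},
    "curatorial":  {"view": {"public", "staff", "restricted"}, "edit": set()},
    "development": {"view": {"public", "staff"},               "edit": set()},
    "anonymous":   {"view": {"public"},                        "edit": set()},
}

@dataclass
class Record:
    accession_number: str
    access_level: str   # "public" | "staff" | "restricted"

def can(role: str, action: str, record: Record) -> bool:
    """True if the role may perform the action ("view" or "edit") on the record."""
    rights = ROLE_RIGHTS.get(role, ROLE_RIGHTS["anonymous"])   # unknown roles get public access only
    return record.access_level in rights.get(action, set())
```

Keeping edit rights with the collections department while granting broad view rights matches the division of labor described above: other departments consult records without needing the collections staff to intervene.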

The Web-based nature of OpenCollection has brought the Museum many unexpected benefits. The Museum underestimated the value of being able to catalog off-site. So far, the Museum has used this ability to let interns continue working on the digitization and access initiative after they have returned to school or moved on to jobs elsewhere. In the future, this capability will allow subject-area experts to access detailed collection information and work directly with catalog records.

Implementing and supporting OpenCollection presented some challenges for the Museum. The software’s “LAMP” (Linux, Apache, MySQL, and PHP) orientation placed the burden of implementation and technical support squarely between two different staff functions. The person responsible for supporting back-office applications specialized in Windows-based client-server applications and was not familiar with the Linux environment. Conversely, our Webmaster, while knowledgeable about the LAMP environment and excited by the project's do-it-yourself ethos, was oriented not toward supporting staff use of internal software applications but outward, toward external users of our Website. Adopting and implementing OpenCollection challenged the Museum to alter its organizational structure to take advantage of the software's potential benefits.

Sharing OpenCollection

The nature of the Museum of the Moving Image's collection required that OpenCollection be flexible enough to handle a wide array of physical, analog, and digital item types; enable the description of structured relationships beyond those of a work and its maker; and be able to work with multiple metadata schemas and controlled vocabularies. This flexibility makes the software suitable for use by a wide variety of collection-holding cultural heritage organizations.

OpenCollection has now reached a turning point. The Museum is using the software for its own needs, and will continue to do so as part of its ten-year digitization project. By March 1, 2007, the OpenCollection code will be made available online through www.opencollection.org. While some institutions with a do-it-yourself ethos and internal development capabilities will be able to download the software and modify it to fit their needs, other, smaller institutions will require a greater level of support. It is precisely these smaller institutions that may have the most to gain from free, Web-based collections management software.

Next steps for Moving Image and Whirl-i-Gig include developing the infrastructure needed to administer and nurture OpenCollection as a general-purpose software application, independent of its role in managing the Museum’s own collection. Although the software has been in use at Moving Image and shared with several other institutions, there is still no formal documentation for installation, administration, or front-end use. Until now, Whirl-i-Gig's staff has been able to handle both development and support for OpenCollection's small user community, but additional developers will need to become involved in maintenance and support to serve a wider community.

If adopted and supported, OpenCollection’s ease of use and the fact that it is free will enable organizations to direct their financial resources toward the research, cataloging, and digitization of their collections rather than toward software licenses and service contracts. In addition, because OpenCollection uses open and widely known technical standards, museum users can draw on a potentially deep pool of technical talent for support, extensions, and modifications, and on a user community that can provide advice and expertise. We especially hope that OpenCollection (and other software of its kind) will encourage, or at least not discourage, collaborative and multi-site/distributed research and cataloging, collection sharing, federated search, and the incorporation of user- and researcher-generated information, all of which are difficult to achieve with non-Web-based software.

Museums and the Web 2007 is the first public presentation of OpenCollection. The landscape for museum collections management software has changed in the five years since the Museum and Whirl-i-Gig set out to develop OpenCollection, and approaches that were not available to the Museum of the Moving Image in 2003 are now open to museums, including customization of open-source content management systems and other ‘roll-your-own’ strategies. Nonetheless, we are receiving a steady stream of requests for the software from the www.opencollection.org Web page alone, and through word of mouth from a few informal demonstrations. As of this writing, several organizations, ranging from the Harry Ransom Center at the University of Texas at Austin to the Coney Island History Project, are testing versions of the software for their own collections.

The Museum is currently seeking assistance and support to establish a formal effort to fuel the continued evolution of OpenCollection. Planned activities include forming a working group, conducting a formal test-bed, and possibly establishing a private foundation. The Museum and Whirl-i-Gig hope to remain at the center of these developments, but also recognize that open source projects, to be successful, must take on a life of their own.

Cite as:

Goodman, C., et al., OpenCollection Web-Based Collection Cataloguing and Access Software, in J. Trant and D. Bearman (eds.). Museums and the Web 2007: Proceedings, Toronto: Archives & Museum Informatics, published March 1, 2007. Consulted http://www.archimuse.com/mw2007/papers/goodman/goodman.html
