The Little Search Engine That Could: How an On-line Database is Paving the Way for Enhanced Access to Research Collections
Pauline Rennick, Victor Gatnicki, Jim Whittome, Janine Andrews, and Frannie Blondheim, Museums and Collections Services, University of Alberta, Canada
Scholarly researchers bring a unique set of expectations to bear upon the building of databases in a university museum environment. Their investigations on species distributions, biodiversity and climate change, to name a few, require analysis of large sets of data. The creation of on-line interfaces to collections databases, complete with sophisticated Web-based tools, will empower researchers to convert raw data into new scientific knowledge. The University of Alberta Museums has developed an innovative Web interface to one of its museum databases. The interface gives researchers tools that deliver not only the entomological specimen data held in the database, but also dynamically generated seasonal histograms, maps on which search results are plotted, and knowledge summaries of entomological species. This paper discusses how the University of Alberta Museums, in conjunction with other departments on campus, develops this type of enhanced Web interface to collections databases. This initiative will be explained within the context of the University of Alberta Museums' unique decentralized administrative model.
Keywords: university, database, collection, research, teaching, evaluation, search
Like many museums, the University of Alberta Museums, in Edmonton, Canada, has developed Web interfaces to databases in order to broaden access to collections. However, because it is a university with a decentralized museum system, the University of Alberta Museums faces unique challenges in terms of its diverse collecting community and the specific teaching and research needs of its faculty. This paper will describe the administrative model of the University of Alberta Museums and the role of a central department, Museums and Collections Services, in developing collection databases and Web interfaces as part of a comprehensive Virtual Museum. It will also provide detailed technical information on the Web interface to the E.H. Strickland Entomological Museum as an example of an interface with enhanced features for research purposes. Future directions for digitization at the University of Alberta Museums will also be outlined to indicate the potential of this 'little search engine'.
The University of Alberta Museums Administrative Model
The University of Alberta is located in Edmonton, Alberta, Canada. Edmonton is the province's capital city, with a population of one million people in the greater metropolitan area. One of the largest universities in Canada, the University of Alberta has an enrolment of 36,000 students and a staff of 7,000. Established in 1908, the University of Alberta has collected artifacts and specimens for nearly 90 years and now houses one of Canada's largest collections of approximately 18 million artifacts and specimens in both human history and natural history. Human history collections include art, classics, archaeology, ethnography, anthropology, and clothing and textiles. Natural history collections include paleontology, both invertebrate and vertebrate, mineralogy, meteorites, petrology, entomology, zoology and herbaria.
The collections of the University of Alberta Museums are distributed throughout the University campus. There is no one central building housing museums. Instead, it is a distributed system of 35 museums and collections spread over 16 departments and four faculties. Each collection is housed within the academic unit that uses the collection for teaching and research. While each collection has a curator designated by its respective department, curators are faculty members with teaching and research responsibilities. Few collections have collections management or technical staff.
The University of Alberta Museums system is a collaborative group orchestrated through the efforts of:
Development of Foundation Databases
As part of its mandate, Museums and Collections Services provides a coordinated approach to the computerization of collection artifacts and specimens through the development of standardized procedures for organizing collections and the implementation of centralized collections management software. In the area of digitization, Museums and Collections Services, in conjunction with curators and collections staff, develops and maintains foundation databases of collections data and develops Web interfaces to those collections.
Before the Web interfaces to collections can be examined, it is important to review how the collections management system, in which the on-line collections are housed, was developed. The decentralized nature of the collections impacted how the collections management system was implemented and, later, the nature of the Web interfaces.
In the early 1990s, Museums and Collections Services led the implementation of a coordinated centralized collections management software system. Multi MIMSY was selected as the collections management system of choice, following consultation with curators, collections staff and the University of Alberta's Computing and Network Services.
By 1994, two collections, the University of Alberta Art and Artifact Collection and the University of Alberta Zoology Museum, began using Multi MIMSY. Given the diverse nature of these collections, a decision was made to create two separate instances of the database. As other collections adopted the collections management system, this precedent was followed, and each collection was allocated a new database instance. A total of twenty-three collections, on sixteen database instances, now use Multi MIMSY as their collections management system. The remaining collections on campus either have their own collections software or have not yet computerized their holdings.
This approach to collections database development has been advantageous in many ways. With only one collection to take into account, it has been a relatively easy task to develop the generic system interface to meet the specific needs of the collection. In terms of data integrity and security, collections staff can only access their own collections data, thereby eliminating the risk to other collections data. Furthermore, as the relative number of records on each database is low, the systems can perform at a consistently high level.
There are also some disadvantages of maintaining multiple systems. For example, each time an upgrade is required, the upgrade must be run on each database, resulting in additional staff time and resources needed for database management and maintenance. The development of multiple versions of authorities, such as place names, is also a draw on the limited time and resources of collections staff. To address these issues, in conjunction with curators, the possibility of amalgamating some databases with similar needs is being explored. It is anticipated that some collection databases will be amalgamated, thus reducing the number of database instances. Collections staff are also looking at ways to eliminate the duplication of authorities through the sharing of resources.
While Museums and Collections Services has successfully created numerous database instances and collections data entry is proceeding on a regular basis, there is still a significant quantity of collections information awaiting on-line cataloguing. Future goals are to greatly increase the number of records in the database. However, as is the case with many museums, this goal is dependent on the availability of financial resources.
While some collections at the University of Alberta Museums have sufficient staff dedicated to on-line cataloguing, others do not. Many collections hire students or temporary staff to perform collections work. It is often not cost-effective to train these individuals to enter data directly into the collections management system. In order to circumvent this issue, Museums and Collections Services has developed load utilities. Temporary staff and volunteers can now easily create load files in a spreadsheet application. The load files are then run through the load utility in order to populate the database. This method is working well, especially for loading specimen data into natural history collections. For example, as a result of using the load utility, content in the E.H. Strickland Entomological Museum more than doubled in 2003. Plans are underway to adapt the load utility, where appropriate, to other instances of the database.
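The load workflow described above might be sketched as follows. This is a minimal Python illustration under stated assumptions, not the utility Museums and Collections Services built: an in-memory SQLite table stands in for the collections database, the load file is a spreadsheet exported as CSV, and the table name, column names and sample rows are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical column layout for a spreadsheet exported as CSV.
REQUIRED_COLUMNS = ("catalogue_number", "species", "collection_date", "locality")


def load_specimens(csv_text, connection):
    """Validate rows from a CSV load file and insert them in one transaction.

    Returns (rows_loaded, rejected_line_numbers) so rejected rows can be
    corrected in the spreadsheet and loaded again.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"load file is missing columns: {missing}")

    accepted, rejected = [], []
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        values = [(row[c] or "").strip() for c in REQUIRED_COLUMNS]
        if all(values):
            accepted.append(values)
        else:
            rejected.append(line_number)  # at least one required cell is empty

    with connection:  # commits on success, rolls back if an insert fails
        connection.executemany(
            "INSERT INTO specimen "
            "(catalogue_number, species, collection_date, locality) "
            "VALUES (?, ?, ?, ?)",
            accepted,
        )
    return len(accepted), rejected


# Stand-in for the collections database (assumption: the real system is not SQLite).
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE specimen (catalogue_number TEXT PRIMARY KEY, species TEXT, "
    "collection_date TEXT, locality TEXT)"
)

# Placeholder load file; the second data row has an empty species cell.
sample = (
    "catalogue_number,species,collection_date,locality\n"
    "X-0001,Example species A,1998-06-14,Hypothetical Lake\n"
    "X-0002,,1998-06-15,Hypothetical Lake\n"
)
loaded, rejected = load_specimens(sample, connection)
print(loaded, rejected)
```

Validating before inserting, and inserting inside a single transaction, keeps a partly broken load file from leaving the database half populated.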
Developing Web Interfaces
When computerization of collections at the University of Alberta began, it was considered sufficient to provide access to the information using the collections management system application. It quickly became apparent, however, that this method of retrieval was inadequate. Curators, for the most part, did not have the time to learn the querying skills required to gain intellectual access to their collections. Furthermore, providing access to other scholars, researchers, and students was proving difficult. Sets of data had to be assembled, vetted, and shipped in order to provide access to the data held in the collections database. A means to provide simpler and broader access to collections data was required.
To fulfill this need, and as a result of the development of the internet, Web interfaces to collections data were considered. In collaboration with selected collections, Museums and Collections Services began developing Web interfaces to the databases. As a result, the onus was no longer on collections staff to provide access via the collections management system. The implementation of Web interfaces not only saved time spent assembling data sets for researchers, but also made collections data available to a much wider audience. By broadening access to collections data, Web interfaces also facilitated the University's goal to share resources with the community and engage community users. Broadened access also served to demonstrate the richness of collections housed at the University of Alberta Museums.
Due to the multi-disciplinary nature of the collections and the fact that separate database instances were created for each collection, it quickly became apparent that a Web interface would have to be developed for each database instance. These interfaces needed to be tailored to the foundation databases from which the information was drawn and also to the requirements of individual curators and the diverse audiences of each Web interface.
An example of a unique collection need was evident in the first Web interface developed by Museums and Collections Services. The interface was to the Pathology Gross Teaching Collection of the Faculty of Medicine. The Department of Laboratory Medicine and Pathology had a large pathology collection, and professors in the department expressed interest in integrating the collection with teaching. Due to the sensitivity of the collection, it was necessary to password-protect the site in order to limit access to medical students only. The collection has since been incorporated into WebCT (Web Course Tools) and is used extensively for teaching medical students.
In 2001, Museums and Collections Services launched the Virtual Museum Web site (http://www.museums.ualberta.ca). The Web site has many purposes, but one of its major objectives is to provide access to the research collections of the University of Alberta. Since that time, Museums and Collections Services has developed seven Web interfaces to diverse collections, including the University of Alberta Art and Artifact Collection, the W.G. Hardy Collection of Ancient Near Eastern and Classical Antiquities, and the Clothing and Textiles Collection. Natural history collections with Web interfaces include the E.H. Strickland Entomological Museum, the Laboratory for Vertebrate Paleontology, and the Mineralogy Museum. A further Web interface to the Ukrainian Folklore Archives is ready for launch.
Throughout the early projects, the intention was to model the Web interface on the functionality provided by the database. But over time, it became obvious that the interface had become the primary tool for researching collections data. Computational Web-based tools, such as data plotting, demonstrated the powerful additional functionality that could be provided through the Web interface. This shift in thinking has impacted the way Multi MIMSY has been adapted to meet collection needs. For example, the Ukrainian Folklore Archives database was originally developed in a complex way to allow searching on the anglicized version of a Ukrainian name. Later, when a Web interface was being developed for the collection, it became apparent that the search functionality at the database level was unnecessary and created data entry difficulties. Although the data was still drawn from the database, the search interface on the Web was able to provide this functionality in a much simpler fashion.
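One way a Web search layer could match anglicized queries against Ukrainian names, without storing a second form of each name in the database, is to reduce both sides to a common Latin search key. The sketch below is an illustration only and is not the approach documented for the Ukrainian Folklore Archives interface: the transliteration table is a simplified one that ignores word-initial special cases, and the function names and sample names are hypothetical.

```python
# Simplified Ukrainian-to-Latin table (assumption: word-initial variants such
# as "ye"/"yu"/"ya" are ignored, and the soft sign is dropped).
UKRAINIAN_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "h", "ґ": "g", "д": "d", "е": "e",
    "є": "ie", "ж": "zh", "з": "z", "и": "y", "і": "i", "ї": "i", "й": "i",
    "к": "k", "л": "l", "м": "m", "н": "n", "о": "o", "п": "p", "р": "r",
    "с": "s", "т": "t", "у": "u", "ф": "f", "х": "kh", "ц": "ts", "ч": "ch",
    "ш": "sh", "щ": "shch", "ь": "", "ю": "iu", "я": "ia",
}


def search_key(text):
    """Lower-case the text and replace Cyrillic letters with Latin equivalents."""
    return "".join(UKRAINIAN_TO_LATIN.get(ch, ch) for ch in text.lower())


def find_names(query, names):
    """Return the stored names whose search key contains the query's search key."""
    key = search_key(query)
    return [name for name in names if key in search_key(name)]


# Sample names used purely as test strings.
stored_names = ["Тарас Шевченко", "Леся Українка"]
print(find_names("shevchenko", stored_names))
```

Because the key is computed at search time, data entry staff record each name once, in its original form.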
As Web interfaces have been developed, collections staff increasingly see the potential of adding enhanced features. In one notable case, the E.H. Strickland Entomological Museum envisioned providing current species data as well as specimen data to those using the site. In order to do this, a creative solution had to be found within the database to allow for this functionality. This led to an adaptation of the Subject Authority, a module where content and themes represented in the collection are recorded, into a Species Authority. Specimen records of that particular species were then linked to the Species Authority. Using the same relationships established in the database, users are now able to search on both species and specimen data.
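The relationship between a Species Authority and its specimen records can be pictured as a one-to-many link. The following Python sketch uses SQLite for illustration; the table names, column names and placeholder rows are assumptions and do not reflect the actual Multi MIMSY schema.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    -- One row per species, holding the species-level summary.
    CREATE TABLE species_authority (
        species_id INTEGER PRIMARY KEY,
        scientific_name TEXT NOT NULL UNIQUE,
        summary TEXT
    );
    -- One row per specimen, linked to its species.
    CREATE TABLE specimen (
        specimen_id INTEGER PRIMARY KEY,
        species_id INTEGER NOT NULL REFERENCES species_authority (species_id),
        locality TEXT,
        collection_date TEXT
    );
    INSERT INTO species_authority VALUES (1, 'Genus alpha', 'Placeholder species summary.');
    INSERT INTO species_authority VALUES (2, 'Genus beta', 'Another placeholder summary.');
    INSERT INTO specimen VALUES (10, 1, 'Site A', '1975-07-02');
    INSERT INTO specimen VALUES (11, 1, 'Site B', '1988-08-19');
    INSERT INTO specimen VALUES (12, 2, 'Site A', '1990-06-30');
    """
)


def species_with_specimens(connection, name_fragment):
    """Return species-level data together with each linked specimen record."""
    return connection.execute(
        """
        SELECT a.scientific_name, a.summary,
               s.specimen_id, s.locality, s.collection_date
        FROM species_authority AS a
        LEFT JOIN specimen AS s ON s.species_id = a.species_id
        WHERE a.scientific_name LIKE ?
        ORDER BY a.scientific_name, s.specimen_id
        """,
        (f"%{name_fragment}%",),
    ).fetchall()


for row in species_with_specimens(db, "alpha"):
    print(row)
```

A single query returns the species summary alongside its specimens, which is the kind of combined search the linked authority makes possible.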
As the interfaces are used in teaching, further demands have been made on the collections management system. The Paleobotany Collection, for example, uses the teaching collection in laboratory work. The data for this collection has been entered into the system complete with images. Paleobotany students needed to have a convenient way to review the content of their labs. In order to meet this need, a field was added to the database indicating the lab to which the specimen belongs. As a result, students are able to obtain this information directly from the Web interface when completing their lab assignments.
In a similar vein, the University of Alberta Art and Artifact Collection is currently reviewing subject access models for works of art in order to provide researchers and students a thematically based tool for discovery. It is anticipated that art history students will be able to drill down through a hierarchically structured subject index to find works of art based on their research themes.
Museums and Collections Services has always worked with collections to develop metadata, value and content standards. From the beginning, Museums and Collections Services has employed the Canadian Heritage Information Network (CHIN) metadata structure in the development of databases. As distributed networks become more prominent, the data structure for some of the natural history collections has been revisited in order to comply with Darwin Core, a set of standards used for search and retrieval of natural history collections. (http://tsadev.speciesanalyst.net/documentation/ow.asp?DarwinCoreV1)
Furthermore, to ensure commonality between interfaces, efforts have been made to ensure that data is treated in a similar manner and that authorities are not developed completely independently of each other. Plans are in place to further harmonize and share the authorities, as appropriate, between collections.
In order to ensure interoperability with partner research networks, Museums and Collections Services has recently converted a set of data into the DiGIR (Distributed Generic Information Retrieval) protocol. The University of Alberta Museums currently participates in two distributed research networks. It is a partner with the Canadian Biodiversity Information Facility (CBIF) and the Global Biodiversity Information Facility (GBIF). As a result, data on the University of Alberta Museums Freshwater Invertebrate Collection is being made available through the Species Analyst database. The University of Alberta Museums also participates in the HerpNet project based at the University of Kansas.
The technical infrastructure used to develop Web interfaces has also evolved over the past few years. The first Web interface was built using frames, with ASP (Active Server Pages) as the scripting language, and was hosted on an IIS (Internet Information Server) Web server. The drawbacks of using frame pages were soon realized: creating links directly to case pages was very difficult, and navigating through the site was complex. As a result, subsequent Web interfaces do not use frame pages.
ASP as the scripting language was eventually replaced by PHP (Hypertext Preprocessor). PHP was chosen due to its speed. In PHP modules, all activity runs in the PHP memory space and, as there is no need to communicate with objects in different processes, PHP code runs faster. PHP supports all major platforms (UNIX, Windows and mainframes), and features native support for most popular databases. In addition, PHP was selected due to its short learning curve, quick development time and high performance (www.php.net).
The Web interface of the E.H. Strickland Entomological Museum was developed using PHP. PHP is embedded into HTML pages and is executed by a Web server for every user request. The interface to the E.H. Strickland Entomological Museum uses the Apache Web server. The Apache Web server is platform independent, highly configurable, extensible and provides a stable Web application environment without the constant worry of security flaws and patches (www.apache.org). Digital collections information is stored within an Oracle 8i database and OCI (Oracle Call Interface) is used within PHP to communicate information from application to the database and vice versa.
The E.H. Strickland Entomological Museum interface is able to generate a histogram of the seasonal distribution of adult specimens within the Entomological database. The histogram image is dynamically generated by a Java application that retrieves specimen data and groups the data by collection date. The histogram uses a blue bar to represent all specimens collected within a specified date range and a red bar that excludes specimens collected at the same date and location.
The histogram helps researchers determine the seasonal occurrence of a particular species. A histogram can be created for a specific time range to display the occurrence of specimen and observation data over time. Factors such as plant growth or predator population can influence the specimen occurrences in a given date range.
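The grouping behind the two bars can be sketched in a few lines. This Python example is an interpretation, not the Java application itself: it assumes monthly bins, and it reads the red bar as counting each combination of collection date and locality only once. The sample records are placeholders.

```python
from collections import Counter
from datetime import date


def seasonal_histogram(records):
    """Count specimens per month in two ways.

    records is an iterable of (collection_date, locality) pairs.
    Returns a list of (month, all_specimens, distinct_events) for months 1-12,
    where distinct_events counts each date-and-locality combination once.
    """
    all_counts = Counter()
    event_counts = Counter()
    seen_events = set()
    for collected_on, locality in records:
        all_counts[collected_on.month] += 1  # "blue bar": every specimen
        event = (collected_on, locality)
        if event not in seen_events:  # "red bar": skip repeats of an event
            seen_events.add(event)
            event_counts[collected_on.month] += 1
    return [(m, all_counts[m], event_counts[m]) for m in range(1, 13)]


# Placeholder data: two specimens share a date and locality in June.
records = [
    (date(1998, 6, 14), "Site A"),
    (date(1998, 6, 14), "Site A"),
    (date(2001, 6, 20), "Site B"),
    (date(1999, 7, 3), "Site A"),
]
for month, total, events in seasonal_histogram(records):
    if total:
        print(month, total, events)
```

Comparing the two counts shows whether a tall bar reflects many collecting occasions or one large catch on a single day.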
Another enhanced feature of the interface is the capability of plotting geo-referenced collections data onto a two-dimensional map. The client side mapping application was developed using Java servlets that communicate with ESRI's ArcGIS 4.01 mapping server in order to generate two-dimensional maps. The ArcGIS 4.01 server is a Java application that requires a Java-capable Web server; Tomcat 4 is therefore used as the Java-enabled Web server. The map server is set to generate a 10TM projection of Alberta, so the geo-referenced collections data is converted into 10TM coordinates before it is plotted onto the map. UTM (Universal Transverse Mercator) divides the world into 60 numbered zones, each split into northern and southern halves at the equator. Alberta spans two zones (UTM Zone 11 and UTM Zone 12). A decision was therefore made to use the 10TM projection, which runs from the 120th to the 110th degree of west longitude and completely covers the Province of Alberta.
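The conversion of latitude and longitude into 10TM coordinates can be illustrated with the standard series formulas for the Transverse Mercator projection. The sketch below is not the code used by the interface. The central meridian of 115 degrees west follows from the zone limits given above, but the scale factor (0.9992), the false easting (500,000 m), the false northing (0 m) and the GRS80 ellipsoid are assumptions, and the input is assumed to be NAD83 latitude and longitude in decimal degrees.

```python
import math

# GRS80 ellipsoid (assumed; it is the reference ellipsoid of NAD83).
SEMI_MAJOR_AXIS = 6378137.0
FLATTENING = 1 / 298.257222101

# Assumed 10TM parameters; only the central meridian follows from the text.
CENTRAL_MERIDIAN_DEG = -115.0
SCALE_FACTOR = 0.9992
FALSE_EASTING = 500000.0
FALSE_NORTHING = 0.0


def meridian_arc(lat, e2):
    """Distance along the meridian from the equator to latitude lat (radians)."""
    return SEMI_MAJOR_AXIS * (
        (1 - e2 / 4 - 3 * e2**2 / 64 - 5 * e2**3 / 256) * lat
        - (3 * e2 / 8 + 3 * e2**2 / 32 + 45 * e2**3 / 1024) * math.sin(2 * lat)
        + (15 * e2**2 / 256 + 45 * e2**3 / 1024) * math.sin(4 * lat)
        - (35 * e2**3 / 3072) * math.sin(6 * lat)
    )


def to_10tm(latitude_deg, longitude_deg):
    """Project geographic coordinates to (easting, northing) in metres."""
    lat = math.radians(latitude_deg)
    dlon = math.radians(longitude_deg - CENTRAL_MERIDIAN_DEG)
    e2 = FLATTENING * (2 - FLATTENING)  # first eccentricity squared
    ep2 = e2 / (1 - e2)                 # second eccentricity squared
    n = SEMI_MAJOR_AXIS / math.sqrt(1 - e2 * math.sin(lat) ** 2)
    t = math.tan(lat) ** 2
    c = ep2 * math.cos(lat) ** 2
    a = dlon * math.cos(lat)
    easting = FALSE_EASTING + SCALE_FACTOR * n * (
        a
        + (1 - t + c) * a**3 / 6
        + (5 - 18 * t + t**2 + 72 * c - 58 * ep2) * a**5 / 120
    )
    northing = FALSE_NORTHING + SCALE_FACTOR * (
        meridian_arc(lat, e2)
        + n * math.tan(lat) * (
            a**2 / 2
            + (5 - t + 9 * c + 4 * c**2) * a**4 / 24
            + (61 - 58 * t + t**2 + 600 * c - 330 * ep2) * a**6 / 720
        )
    )
    return easting, northing


# A point near Edmonton, east of the central meridian.
easting, northing = to_10tm(53.5, -113.5)
print(round(easting), round(northing))
```

With a single central meridian, every point in the province receives coordinates in one continuous grid, which avoids the seam between UTM Zones 11 and 12.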
Once specimen results are displayed, a user has the option of plotting the specimens onto a map of Alberta. This gives the researcher a new way to explore spatial patterns in the biodiversity data right from the start of the analysis process. As the maps are on-screen, they can be manipulated to zoom in or out and pan around. Shading schemes and classification methods can be changed, and data added or removed at will.
A spatially-referenced specimen database provides researchers with answers to questions such as: What species are present at this location? What geographic features are common among the habitat of a certain species? Plotting specimens on a map is useful, but the full potential of GIS lies in its ability to integrate data from a variety of layers. At a basic level, this merely involves combining layers on-screen to compare patterns, but it also permits researchers to integrate data from a variety of sources. For example, a researcher may want to compare human population census data with geo-referenced specimen data to determine the impact of human populations on a particular species. Researchers could also add other types of data, e.g. data on rivers, to provide information about changes in water sources. As a result, rather than being a product of finished research, the geo-referenced collections data becomes an integral part of the research process (Gregory, 2002).
Another important feature of the E.H. Strickland Entomological Museum interface is the ability to integrate a time search with specimen locality information. This form of temporal GIS can answer a variety of legitimate research questions. Are there changes to species locality information, i.e., has the species migrated within the last 50 years? Has the spatial distribution changed due to the effects of human population or development over time? What effects has the expansion of agriculture had on species distribution?
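A combined time and locality search can be reduced to two filters applied together. The Python sketch below is illustrative only: the record structure, the bounding-box form of the locality filter and the sample data are assumptions rather than a description of the interface's actual query.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Specimen:
    species: str
    collected_on: date
    latitude: float
    longitude: float


def search(specimens, species, start, end, bounding_box):
    """Return specimens of one species collected in a date range and area.

    bounding_box is (south, west, north, east) in decimal degrees.
    """
    south, west, north, east = bounding_box
    return [
        s
        for s in specimens
        if s.species == species
        and start <= s.collected_on <= end
        and south <= s.latitude <= north
        and west <= s.longitude <= east
    ]


# Placeholder records for a hypothetical species.
specimens = [
    Specimen("Genus alpha", date(1952, 7, 1), 53.5, -113.5),
    Specimen("Genus alpha", date(1998, 7, 9), 55.2, -118.8),
    Specimen("Genus alpha", date(2001, 6, 30), 53.6, -113.4),
    Specimen("Genus beta", date(2001, 6, 30), 53.6, -113.4),
]
area = (53.0, -114.0, 54.0, -113.0)  # (south, west, north, east)

early = search(specimens, "Genus alpha", date(1950, 1, 1), date(1979, 12, 31), area)
recent = search(specimens, "Genus alpha", date(1980, 1, 1), date(2004, 12, 31), area)
print(len(early), len(recent))
```

Running the same search over two periods, as in the last lines, is a simple way to compare where a species was recorded then and now.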
A word of caution regarding the use of GIS data: based on Museums and Collections Services experience, accuracy of the data is only as good as the original source. Specimen data is often taken from historical maps and, as the data on historical maps may not always be accurate, the representation of features from these maps in the GIS may also contain errors.
Recently, the Freshwater Invertebrate Collection was geo-referenced. During this process, it was discovered that different datums had been used depending on the time period in which the specimens were collected. All specimens collected before 1983 were geo-referenced using the NAD27 (North American Datum 1927) geographic coordinate system, whereas specimens collected after 1983 were geo-referenced using the NAD83 (North American Datum 1983) geographic coordinate system. As there are differences between the two datums, ranging from 200-300 feet in western North America to several tens of feet in central and eastern North America, the specimens collected prior to 1983 had to be converted to NAD83. The conversion required extra staff time to complete the project. This problem should be taken into account when planning geo-referencing projects.
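A first planning step for such a project is to separate the records that need conversion from those that do not. The Python sketch below only tags records with the datum implied by their collection year; it does not perform the datum shift itself, which requires a dedicated transformation tool. Treating the year 1983 itself as NAD83 is an assumption, since the text above does not say how that year was handled, and the sample records are placeholders.

```python
def assumed_datum(collection_year, cutoff_year=1983):
    """Return the datum implied by the collection year.

    Assumption: records from the cutoff year itself are treated as NAD83.
    """
    return "NAD27" if collection_year < cutoff_year else "NAD83"


def split_by_datum(records):
    """Partition (catalogue_number, collection_year) pairs by assumed datum."""
    groups = {"NAD27": [], "NAD83": []}
    for catalogue_number, collection_year in records:
        groups[assumed_datum(collection_year)].append(catalogue_number)
    return groups


# Placeholder records.
records = [("F-001", 1961), ("F-002", 1982), ("F-003", 1983), ("F-004", 1995)]
groups = split_by_datum(records)
print(len(groups["NAD27"]), "records need conversion to NAD83")
```

Counting the NAD27 group before work begins gives a rough measure of the extra staff time the conversion will need.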
Evaluation of Web Interfaces
For Web interfaces launched since 2001, the process of developing a new Web interface begins with Museums and Collections Services staff working with curators of collections to evaluate the needs of a particular digital collection and the audience deemed to be the primary users of the interface. The objectives of the interface are considered, with accessibility to the digital collections being the prime motivator. Initially, the team discusses search criteria, search fields and whether enhanced search features are required.
Throughout the Web development stages of the interface, accessibility testing is essential to determine how well the interface complies with accessibility guidelines and how well it can be accessed by users with varying browsers, operating systems and/or disability software. A text-based editor is used for Web development of the interface. In the development process, the W3C Quick Tips to Make Accessible Web Sites (http://www.w3.org/WAI/References/QuickTips/) are applied to make certain the interface is accessible.
In the post-development stage, a group evaluation process has been used to evaluate Web interfaces. The process begins by selecting potential users for a group interview. This interview identifies targeted visitors, the site's objectives, key tasks, and measures of success. Using the data from the interview, the group of users is then provided with tasks they must accomplish using the searchable collection interface. Each individual's performance is recorded, and the results are then compared. This evaluation process identifies potential navigation and design flaws and uncovers areas of user confusion. After this evaluation, the searchable interface is adjusted to address areas of concern.
In our experience, Web server log files can be used as an evaluation tool. Parsing the log file can determine which search features are used most heavily, how users navigate through the site, and what search terms are used. For example, the collections team parsed the log files for the University of Alberta's Clothing and Textiles Collection and discovered that the advanced search page was rarely being used. It was determined that the advanced search was too complicated and was causing confusion. As a result, the search form was simplified. A few months later, the Web server log files were parsed again and there was a significant increase in the use of the simplified advanced search form.
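A log analysis of this kind can be sketched briefly. The Python example below assumes log lines in the common Apache access-log layout; the page paths, the query parameter name `q` and the sample lines are hypothetical and are not taken from the Clothing and Textiles Collection site.

```python
import re
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Matches the request and status fields of a common-format access log line.
REQUEST_PATTERN = re.compile(
    r'"(?:GET|POST) (?P<url>\S+) HTTP/[\d.]+" (?P<status>\d{3})'
)


def summarize(log_lines, term_parameter="q"):
    """Count successful page requests and the search terms submitted."""
    page_hits = Counter()
    search_terms = Counter()
    for line in log_lines:
        match = REQUEST_PATTERN.search(line)
        if not match or match.group("status") != "200":
            continue  # skip malformed lines and unsuccessful requests
        url = urlsplit(match.group("url"))
        page_hits[url.path] += 1
        for term in parse_qs(url.query).get(term_parameter, []):
            search_terms[term.strip().lower()] += 1
    return page_hits, search_terms


# Hypothetical log lines using documentation-reserved IP addresses.
sample_log = [
    '192.0.2.1 - - [10/Mar/2004:10:00:00 -0700] "GET /search.php?q=dress HTTP/1.1" 200 5120',
    '192.0.2.2 - - [10/Mar/2004:10:01:00 -0700] "GET /search.php?q=Dress HTTP/1.1" 200 5100',
    '192.0.2.3 - - [10/Mar/2004:10:02:00 -0700] "GET /advanced_search.php?q=wool+coat HTTP/1.0" 200 7300',
    '192.0.2.4 - - [10/Mar/2004:10:03:00 -0700] "GET /missing.php HTTP/1.1" 404 210',
]
page_hits, search_terms = summarize(sample_log)
print(page_hits.most_common())
print(search_terms.most_common())
```

Running the same summary before and after a redesign gives a like-for-like comparison of how often each search page is used.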
Evaluation is an ongoing process. As three years have passed since the launch of the first Web interface, steps are underway to embark on a thorough revision of the Virtual Museum Website. It is recognized that our evaluation processes are not rigorous enough and that further evaluation of the Web interfaces needs to be incorporated into any future plans for the Virtual Museum.
Future Directions for Web Interfaces
Providing access to the research collections of the University of Alberta Museums is a high priority for Museums and Collections Services. Future goals include greatly increasing the number of records in the databases and building additional Web interfaces to collections databases.
To enhance the content in the foundation databases, Museums and Collections Services, in collaboration with curators, plans to increase the number of images and multimedia available through the Web interfaces. According to Kravchyna and Hastings in their article Informational Value of Museum Web Sites (2002) (http://www.firstmonday.dk/issues/issue7_2/kravchyna/), visitors to virtual museums are looking for information about objects in museum collections. Kravchyna and Hastings' research found that 63% of virtual visitors want to search museum collections and, when they search, they expect to see digital images of collections (Kravchyna and Hastings, 2002).
Goals for the future, following a needs assessment process, could also include the development of Web-based applications that will help professors enhance their courses by incorporating collections data drawn directly from the foundation databases. This would also serve to increase the visibility of collections data on campus. As one means to achieve this goal, Museums and Collections Services has begun investigating the use of Learning Objects: packets of information, whether in the form of textual data or multimedia files, that can be used in the development of course material. The feasibility of developing a Web-based repository of Learning Objects could be considered. This repository would be generated from the collections data and associated multimedia files of all thirty-five University of Alberta Museum collections. Although the idea is very much in its infancy, it is envisioned that the collections databases would play an integral role in the production of quality on-line learning resources which professors and other instructors could use to enhance their courses. The University of Alberta currently uses WebCT (Web Course Tools), a set of tools for developing and delivering interactive courses or course components over the Web. Museums and Collections Services hopes to work with a selected collection-based discipline in the near future to more fully align database resources with teaching initiatives using WebCT and available new technologies.
In future, the University of Alberta Museums will continue to partner with research institutes regionally, nationally and internationally. The University of Alberta Museums is also exploring, along with other partner institutions in Alberta, the possibility of building a regional interoperable network of natural science databases and repositories. This network would unite all collections data among regional participating institutes and would improve collections based research by allowing researchers the ability to access a huge pool of data. If this project proceeds, Museums and Collections Services will, in conjunction with curators, build the technical infrastructure to facilitate the network.
The University of Alberta is one of the partners of WestGrid (http://www.WestGrid.ca), a collaboration to provide high performance computing resources for researchers. Researchers at the University of Alberta will be able to use WestGrid to analyse collections data. For example, grid computing would provide researchers with the ability to model and simulate the effects of local or regional climate change on particular species of plants and animals. This would be an innovative way for collections data to enhance research.
The decentralized model of the University of Alberta Museums provides challenges to Museums and Collections Services, but it also provides benefits. Curators come together to work strategically on issues that affect the whole University of Alberta Museums community. Our museum model enables and facilitates communication across departments and faculties and is an ideal medium for the University of Alberta to connect to and engage the broader community.
Digitization of collections at the University of Alberta Museums and the building of Web interfaces to collections data further enable connections to the community by making collections more accessible to communities both on and off campus. The progress of digitization has moved forward significantly in the past few years. An increased number of records have been added to the collections management system. Seven Web interfaces to collections databases have been developed, and the goal is to create eight more over the next two years. These initiatives have paved the way to increased and enhanced access to collections. In future, Museums and Collections Services will continue to work towards its goal of bringing the wonders of research collections at the University of Alberta Museums to the world.
References
Andrews, Janine and Frannie Blondheim (2003). Seize the Day! Museums in the Changing Culture of Universities: A Canadian Perspective. A UMAC 2003 Conference Paper. http://www.lib.mq.edu.au/mcm/world/icom2001/2003conf/andrews.html
Giorgini, Fabrizio and Fabrizio Cardinali (2003). From Cultural Learning Objects to Virtual Learning Environments for Cultural Heritage Education: The Importance of Using Standards, in Learning Objects from Cultural and Scientific Heritage Resources, DigiCULT Thematic Issue 4, October 2003.
Gregory, Ian (2002). A Place in History: A Guide to Using GIS in Historical Research. In AHDS Guides to Good Practice (ISSN 1463-5194) http://hds.essex.ac.uk/g2gp/gis/index.asp
Jackson, Chris and Adam Cooper (2003). Learning Object Structure – A Critical Assessment. In Learning Objects from Cultural and Scientific Heritage Resources, DigiCULT Thematic Issue 4, October 2003.
Kravchyna, V. and S.K. Hastings (2002). Informational Value of Museum Web Sites. First Monday, February 2002 Volume 7 Number 2.
Paterno, F. and C. Mancini (2000). Effective Levels of Adaptation to Different Types of Users in Interactive Museum Systems. Journal of the American Society for Information Science, January 2000 Volume 51 Number 1, pp. 5-13.
Royan, Bruce (2003). Learning Objects for the Cultural and Scientific Heritage Sector: A Position Paper. In Learning Objects from Cultural and Scientific Heritage Resources, DigiCULT Thematic Issue 4, October 2003.