MW98: PAPERS

Archives & Museum Informatics

info @ archimuse.com

www.archimuse.com

published April 1998
updated Nov. 2010

New Web-Based Interfaces to Old Databases

Stephen R. Toney and Karen Donoghue

[Note: The interface being described was designed for a browser maximized on a 800x600-resolution monitor (or higher) displaying more than 256 colors. The figures in this paper will be hard to appreciate at a lower resolution and color-depth. The background of the "Monument Search" menu button in Figure 2 should be light purple, not gray. Since these figures cannot do the interface justice, we invite you to try the demo version at http://www.systemsplanning.com/census-dem/]

Introduction

Like the rest of the world, our work has increasingly been involved with the World-Wide Web since 1996. Although a great deal of what we know about databases and how people use them is relevant to putting databases on the Web, there is still a lot to learn that is new.

My first Web project was the development of a new interface to an old database. In this case the database was almost 50 years old and had been implemented in two earlier automated systems.

Figure 1. The Janus logon screen (click on images for larger version)

The new system, called Janus, was developed under contract to the J. Paul Getty Trust to provide access to the Census of Antique Art & Architecture Known to the Renaissance.

Doing such a massive project as my first Web application was truly being thrown into the deep end, although my co-author, who was the interface designer on this project, has been doing this work for several years. The Census consists of records for about 15,000 antique monuments and works of art, and records for 25,000 Renaissance writings about and drawings of these works; there are also 150,000 other records, such as preservation histories and authority records; plus 25,000 images. As you can imagine, the information landscape that the Census represents is one that has many dimensions and relationships. Our job was to make this complex data explicable, navigable, and easy to use.

Our overall goal was to create a user experience that enabled effective browsing and searching of the content, while affording the best use of the technology and the available bandwidth. For example, we enabled single-click access to supporting vocabularies, so it would be easy for the user to create queries. We also chunked search results into digestible bits of information, using image thumbnails and icons to represent information about the search results. In this way, the user would not be overwhelmed by detail about the results of a search, but could choose which results to pursue. By using image thumbnails, we made judicious use of precious bandwidth, allowing the user to decide when to commit resources to downloading a full-size image. Often the thumbnail had enough information in it so that the user knew whether it was the correct one to pursue.

Since then, Karen and I have collaborated on a number of other Websites, both for museums and for other organizations. We have discovered that the solutions developed for Janus are applicable to an amazing variety of problems, giving us additional confidence in our approaches. Although the visuals are taken from Janus, this talk is also based on knowledge gained from deploying the MWeb website tool in a variety of situations.

For more info on Janus, the Census, or our other work, see the URLs listed on the title-page image above. As you can see on the logon screen, there is a freely available demo version of Janus which you can reach from my website. You will also find there links to implementations of MWeb.

Databases on the Web

The World-Wide Web provides a wonderful opportunity to make old databases widely available at low cost. While some sacrifices of functionality are inevitable, the potential to reach a world-wide audience through a standard interface far outweighs the drawbacks.

The potential is truly amazing. Only four persons in the world could use the first Census system that required a Unix server, a PC, two TVs, and two videodisc players. The second Census system requires a dedicated high-end computer with 8 gigabytes of storage - for a DOS application. Janus, the third Census system, can be used by anyone with a browser.

All Web applications inherently gain from the numerous benefits that come with the platform:

they are usable by PCs, Macs, and Unix client computers, regardless of location and modem speed;
they have graphical interfaces;
they have hyperlinks for navigation, a wonderfully intuitive method, especially for the general user;
there is no need to distribute software;
and most important for museums, they have world-wide availability.

I mentioned sacrifices. The most important one of these is speed. The Web is just plain slow in transmitting data, and this can be compounded by distance and by local networks that must also process the transmissions. Although we generally keep the size of downloaded data small, we wanted to make an exception here to permit the user to see the entire set of values for a given field, without having to ask for it in sections; this is one of the best ways to empower the user to create good search strategies. However, there is no getting around the fact that the complete list of values for a large database can generate a large download. We had trouble with this during the development of Janus, which implements this with the buttons called "See List' in Figure 2.

Figure 2. Seeing a list of all values for a field

A large list may have tens of thousands of entries (limited to 35 characters each, however). These lists take no more than 10 seconds to display in my office, only a mile from my Internet Service Provider. At the Getty, 300 miles away, and complicated by a local network, these lists were unusable; the browser simply assumes the data is unavailable. (We did not abandon this method because Janus was always intended be installed at the Getty using their fiber-optic network.)

A newer solution used by MWeb is to put on the "See List" buttons the number of values. Since MWeb is installed at museums for public access, speed is even more important since modems will be the primary means of access. Thus in MWeb the user has buttons that say "See all 589 values". This involves another trade-off, since the display of the buttons is delayed a fraction to perform the counts, and they cannot be done ahead of time if the database is live; but this is preferable to frustrating the user with a question like "Are you sure you want to see all 589 values?" every time a button is clicked.

The lesson here is that any large downloads of data must be at the user's option. In a theme that will be repeated throughout, the user should never be surprised except pleasantly.

The other sacrifice is the poverty of HTML and Javascript for building a sophisticated interface. Karen's original design called for tabbed dialog boxes to group the various Census fields, as in this figure:

Figure 3. Original design for Monument Search

However, once an HTML page is displayed, it cannot be modified without a call to the server, which causes delays (although there are now some new ways to make a page seem to change). Since at the time we were designing Janus, Java and ActiveX were not serious options, we had to find a Javascript solution. In the final Monument Search shown in Figure 2, Javascript had to be written for the tabs to simulate scrolling a list of fields.

In other words, while HTML is an amazing in what it can do with so little, its interface features are impoverished compared to Windows, the Mac, XWindows, or similar platforms. Dynamic HTML, style sheets, and other things in development will improve this, but for public access, you should not use features that are less than two years old or you will reduce the size of your audience.

Overcoming Limitations of the Web

How do we overcome these two important limitations of the Web?

First, we pay a lot of attention to the user's experience, and what is needed to enable a "richer" interface. We design custom interface components that married the "look and feel" of a traditional Microsoft Windows application interface with "Web-interface" designs, such as a persistent navigation bar at the top of the screen.

Second, we use techniques that address the issues of speed and database access.

Karen and I hold one principle very dear: that the user of the system is paramount. Simplicity, speed, and meeting user expectations are the goals of all our development work. These are not easy to achieve in querying a large, complex database, which can put heavy processing burdens on the computer. This was one of the major challenges in developing Janus - the Census is so complex that the Unix system often had "response" times (if that is the appropriate word) of 5-10 minutes for some queries. I consider this unacceptable. Period.

How can the speed problems be overcome? We use three techniques, each discussed below:

Preprocessing of the database to simplify searches and reduce the amount of data sent
Javascript to avoid calls to the server
Client-side image maps, also to avoid server calls

Maintaining Flexibility

After user considerations, my second concern is always flexibility of deployment. This is something I would encourage you to strive for, as it will be of value to you when the next platform after the Web comes along. Our work implements this in many ways, but two are noteworthy if rather esoteric. First, the query syntax for each type of search is stored in a database, accessed when the user performs a search. Thus the syntax can be changed and new types of queries added without changing the programs.

Second, a data-access library called ODBC (Open Database Connectivity) is used for all database access. This allows the underlying database management system to be almost anything without changing the programs. The downside - which affects only the developer, not the user - is that all access of the database must be in the standard SQL language, which makes some things harder to program than using direct database commands.

Later work has shown both these techniques to be equally useful to perform database updates as they are for searching.

Speeding Response Time

Database Preprocessing

By preprocessing, I mean transforming the data from its original form to achieve better speed. This may involve denormalizing tables, adding redundant data, making hyperlinks in advance, and other techniques.

Although normalized data and non-redundant data are usually taken to be sacred concepts, they are really of most value in conceptual data models. Performance requires minimizing the number of tables accessed. For installations that do not update the database, we are free to derive a new database from the working one to suit the needs of Web access. The only downside is extra disk space, which is almost free now.

To speed up searches, I write programs to rip apart the database and reassemble it for performance. Indexing is also given a great deal of attention - more sophisticated indexing is used, even for keywords, than just indexing every word.

A simple example of preprocessing can be seen in this display of Janus search results, showing which records have images.

Figure 4. Brief-record display of search results

Here it would be possible for the CGI program to count the image links for each record and create the label on the fly. But since the number of links cannot change in a read-only database, I saved the user's time by counting the images during preprocessing.

Another example of preprocessing is the use of a separate table just for image queries, which puts all the data together that is displayed with images (Figure 5). This saves having to get the few fields needed from the main database which can have hundreds of fields per record. The result is that image queries are actually faster than data queries, even though images are inherently slower to transmit and display.

Figure 5. Results of image search (bottom cut off intentionally)

Here, although the same number of records were found, and each image retrieved from a separate file, the response time is quicker because of the separate file. Redundant and denormalized - but efficient!

A final example of preprocessing: Renaissance notebooks and sketchbooks tend to have been unbound and their pages dispersed throughout the world. Thus it is impossible for a scholar to view the book as its author created it. Janus implements a feature that partially overcomes this problem - it has a page-view for Renaissance documents that shows their images as well as the transcribed text (Figure 6).

Figure 6. Viewing document pages virtually

All the labels you see represent multiple levels of links and are built during preprocessing; during output, only the text needs to be manipulated.

Although the examples focus on manipulating data to reduce the delay for the user, huge speed gains can also result from having searches result in what the user expects, thus saving trial and error. During the successive releases of Janus we paid attention to what data elements were returned from searches, and fine-tuned these in order to minimize follow-up searches. Thus speed must be a priority for both the architecture and the interface design.

To summarize, speed is an old consideration that is of new importance on the Web. With a traditional application, you have a variety of solutions if performance is unacceptable, the simplest being to use a faster computer. With the Web, however, the transmission delays are unpredictable and out of the developer's control. Thus the emphasis in our work on minimizing the number of calls to the server; and secondarily minimizing the amount of data transmitted, although this is much less important.

Javascript

Javascript is another feature that is good for the user but hard on the developer. It is actually harder than writing functions into the CGI program. But using the CGI program means a delay while the user's request is transmitted, processed, and the results received. Instead, Janus and MWeb user Javascript run by the user's browser, so the results are instantaneous. Of course, when the database is involved, the request must go to the server; but query building, checking user input, processing buttons, and other client-side processes are all done by the browser. Because the displays (which contain the hidden Javascript) are generated by the CGI programs, it meant writing Javascript code in C++. In fact, this work provided a rather traumatic opportunity to program in 5 languages at once: Javascript, C++, HTML, SQL, and the native language of the database management system, only the first two of which are at all similar!

Here is an example of Javascript to insert the user's selection into the search form:

Figure 7. Inserting data values with Javascript

If the user clicks the "Go" button, this search is sent to the server. However, if the user clicks the "Edit" button ...

Figure 8. Creating the search string using Javascript

... the query, created from the fields above using Javascript, is displayed as it will be sent to the server. The user who knows boolean notation can edit the search before sending it (remember, this is an interface for scholars, not the public). If the user changes the default AND/OR operators using the radio buttons in Figure 8, the search string is instantaneously updated using Javascript - no waiting for the server to parse and reformulate the query and send it back! (Field numbers are used instead of names because of the limited space in the textarea, and also to save numerous server calls to translate them, a compromise I'm not too happy with.)

It might be added here that Java and ActiveX were avoided completely in Janus, partly because of the outstanding security questions, but more because they require the user to wait while large programs are downloaded. As I said before, I don't think the user should ever have to wait.

Javascript has one severe limitation - it is not implemented the same way across platforms. We had to stop supporting Microsoft Internet Explorer version 3 for Janus because it did not implement Javascript correctly. This was not a problem since Janus was developed for use internally at the Getty, which has standardized on Netscape Navigator; but in later work, I have had to write programming code to have these functions processed by the server for users of IE3. The downside is that using the server instead of Javascript causes delays for processes that the user can expect will not take time (such as jumping to a specific previous display). In fact, functions like the Monument Search you have just seen would not be worth doing without Javascript.

Client-Side Image Maps

This is no longer a new technique, but worth mentioning. Image maps are images that perform different actions depending on where the user clicks. Originally, the coordinates of the user's click were sent to the server for action. Now you can embed the logic in your HTML pages so some work is done by the browser. That is how the tabs you see here are implemented.

Planning the User Interface

I hope you'll agree that Janus is a very attractive system. I think the fact that both of us have a passion for clean and attractive interfaces is one of the reasons so much complexity is not overwhelming.

However, a lot of planning and prototyping is also required. A lot of credit must go to the testing team at the Getty that gave us feedback. By using rapid prototyping techniques, the interface went through many iterations without wasting precious development time. Mockups were done as hand-drawn sketches, then rendered as image maps to create a storyboard of the user interface.

Here are some thoughts on the user interface for database applications on the Web.

Simple Searches

One way to avoid overwhelming the user is by having multiple levels of searching. This is an old idea, but we took it as far as we could. Besides the advanced searching described earlier, we also added a simple way for users to search by keywords, and a simple way to "search by thumbnail" since users indicated early on that images were a crucial part of their browsing and searching process. These latter two were combined into the same screen; we made this look simple, but it has enough power so that the advanced searching is rarely needed:

Figure 9. Janus Keyword Search

This is the initial screen after logging on. The essential objects on this screen are only 1 instruction, 1 place to type, and three buttons. On the left are more controls that I'm not too happy with because they distract from the total simplicity of the rest of the screen. I put them in colored boxes to make them look like "sidebars"; that is, not essential to the main purpose of the screen. Even though the sidebars detract from simplicity, for a user familiar with the conventions of graphical interfaces, I believe the functionality is self-evident. But if the user ignores them, the expected occurs - which is what we should strive for.

In fact the most common search performed consists simply of typing a single word and pressing the enter key. The resulting display (shown in Figure 4) is also fairly simple, given the complexity of the possible results. And the use of links means not having to tell the user how to get to the full record.

Meeting user expectations also enables the avoidance of instructions and help files. For example, one would hope that typing "apollo belvedere" and pressing Enter would bring up that statue, and in fact it does. This is because the keywords are ANDed by default and are not case-sensitive.

A user can truncate one or more keywords by simply adding the most common character for this: the asterisk. Have we needed the Help file so far?

The first two radio buttons control whether the search will show data or thumbnails. The drop-down list enables the user to search only specific record types; for example, the person authority file can be searched for "Caesar" to find the nine or ten persons with that name, instead of wading through the hundreds of other records with the word in them.

Finally, just as the user can see the complete list of values for any field in the advanced search, the last two radio buttons permit searching the complete keyword index and the Art & Architecture Thesaurus.

For the record, we could not avoid a Help file - although the users do!

Single-Record Displays

I haven't yet discussed the single-record display. Users can see the full data for any record, no matter how arcane (such as notes entered by the data-entry person or the barcode number on photographs scanned for the Census).

The next two Figures show a full record for an only moderately complex Renaissance document:

Figure 10. Full record for a Renaissance document (1 of 2)

Figure 11. Full record for a Renaissance document (2 of 2)

Here is another use of links to make it simple for the user. All records related to other records have links in both directions. Every field controlled by authority files has a reciprocal link to the record for that authority (the Creator field in the second slide). Records in hierarchies can be clicked on to see their ancestors and descendents (these include not only placenames but monuments and documents that have multiple parts). Every image is a link to its larger sizes. Images and data are linked so that the user can go from one to the other simply. Here again intensive processing of the data was done to take advantage of the potential of hyperlink navigation.

Saved Searches

Complex databases can be hard to use. When you do develop a search that gives results you want, it is helpful to permit the user to save it to be used again later. We implemented this by permitting the user to save search strategies. In contrast to the obvious approach, which is to save them on the server, coded by the user's ID, we took the simpler option of storing them on the user's computer. Thus we don't need to maintain a list of users, keep track of expiration dates, etc.

Since Web browsers have only one way to store data, namely the cookie, this was the only approach available. Cookies have some limitations, not least of which is that some users don't trust them, but I believe they are acceptable if used only for optional functions.

Imaging Considerations

Although using multiple image sizes is not a new concept, a great deal of effort was spent finding the optimum sizes for all monitor resolutions. The goal was to be able to display two images side-by-side on standard 800x600 monitors without having to scroll the images. In addition, a large image can be seen in its entirety at this resolution. On high-resolution monitors, of course, more or larger images can be seen. As usual, the user's needs were the only consideration.

Figure 12. Two full-size images

This shows two images of the same monument, scanned from different photographs.

Figure 12. Two zoomed images

By clicking on the images to zoom them we can easily read the inscription in the left image, and confirm the poor photography and/or scanning of the right image. The point is that no single image size will serve for both the overview and studying the details. Using large images for both slows down the user's experience unacceptably, but large ones are needed when the user wants detail.

Here is another interface design question. During the design of Janus we considered various schemes for predetermining where and at what size images should open. Finally we realized that for art scholars, no predetermination could be made. Thus any number of images can be opened and the windows can be moved and resized at will.

A final word on images. These are absolutely marvelous considering what we started from - low-resolution scans of what often were tourist snapshots. Get professional help here; it's worth the money.

Systems Development Techniques

Two additional topics relate to how the system was built.

Progressive Releases

It is important with large projects to keep the user population interested throughout a long development process. Although the Census is of great importance to a small group of scholars, it is obviously not a major priority to most people. Thus we had to keep our group of testers enthusiastic. One obvious way was to provide a free lunch at each review meeting.

The other way was to implement what I call "progressive releases", a development process in which the system is developed and released in parts. With Janus, although the development took a year, the first release was out only 21 days after we began. This gave testers their first look at keyword searching, a feature they had been wanting for 10 years. These progressive releases are more than prototypes or betas. They are intended to be complete, bug-free releases of part of the system, in order to provide useful features as soon as possible. And in fact Janus was being used by scholars long before it was finished. This approach is more costly, in that finishing work is done on features that may be changed later; but the benefits are higher also.

Here again I must credit the testing team at the Getty for their prompt and valuable contributions to twelve releases of Janus in eight months.

There is a danger to this approach besides higher costs. As the simpler features were released first, the demand grew to elaborate these to solve problems better addressed by later features. It was necessary to resist this pressure to ensure that the simple features did not lose their simplicity.

Security

A final important consideration for databases over the Web is security.

Passwords are the obvious solution. But we have to ensure that the password process cannot be avoided. The primary means of doing this is to avoid using static HTML pages, since if these can be displayed in a browser they can be used to submit valid database requests. Instead, HTML pages are generated by the server after each request.

An alternative way is to use cookies to transmit the user's logon information with every request. This was rejected because of inherent security problems with cookies, and even more because of the perception of security problems.

Conclusion

In designing a new "face" for the Census, we have learned a great deal about the challenges faced by developers in rearchitecting databases to live in a new and dynamic environment such as the Web. These challenges come from attempting to create a compelling and elegant solution in the context of a novel and unsophisticated design/technology environment. By placing the user's needs squarely at the center of our design goals, and by exploiting the available technology with an eye towards the future, we have crafted a solution that answers many of these challenges, and serves as a strong basis as new tools and environments emerge.

Last modified: March 18, 1998. This file can be found below http://www.archimuse.com/mw98/
Send questions and comments to info@archimuse.com