Databases on the Web
The World-Wide Web provides a wonderful opportunity to make
old databases widely available at low cost. While some sacrifices
of functionality are inevitable, the potential to reach a world-wide
audience through a standard interface far outweighs the drawbacks.
The potential is truly amazing. Only four persons in the world could
use the first Census system that required a Unix server, a PC, two
TVs, and two videodisc players. The second Census system requires
a dedicated high-end computer with 8 gigabytes of storage - for a
DOS application. Janus, the third Census system, can be used by anyone
with a browser.
All Web applications inherently gain from the numerous benefits that
come with the platform:
- they are usable by PCs, Macs, and Unix client computers, regardless
of location and modem speed;
- they have graphical interfaces;
- they have hyperlinks for navigation, a wonderfully intuitive method,
especially for the general user;
- there is no need to distribute software;
- and most important for museums, they have world-wide availability.
I mentioned sacrifices. The most important one of these is speed.
The Web is just plain slow in transmitting data, and this can be compounded
by distance and by local networks that must also process the transmissions.
Although we generally keep the size of downloaded data small, we wanted
to make an exception here to permit the user to see the entire set
of values for a given field, without having to ask for it in sections;
this is one of the best ways to empower the user to create good search
strategies. However, there is no getting around the fact that the
complete list of values for a large database can generate a large
download. We had trouble with this during the development of Janus,
which implements this with the buttons called "See List' in Figure
2.
Figure 2. Seeing a list of all values for a field
A large list may have tens of thousands of entries (limited to 35
characters each, however). These lists take no more than 10 seconds
to display in my office, only a mile from my Internet Service Provider.
At the Getty, 300 miles away, and complicated by a local network,
these lists were unusable; the browser simply assumes the data is
unavailable. (We did not abandon this method because Janus was always
intended be installed at the Getty using their fiber-optic network.)
A newer solution used by MWeb is to put on the "See List" buttons
the number of values. Since MWeb is installed at museums for public
access, speed is even more important since modems will be the primary
means of access. Thus in MWeb the user has buttons that say "See all
589 values". This involves another trade-off, since the display of
the buttons is delayed a fraction to perform the counts, and they
cannot be done ahead of time if the database is live; but this is
preferable to frustrating the user with a question like "Are you sure
you want to see all 589 values?" every time a button is clicked.
The lesson here is that any large downloads of data must be at the
user's option. In a theme that will be repeated throughout, the user
should never be surprised except pleasantly.
The other sacrifice is the poverty of HTML and Javascript for building
a sophisticated interface. Karen's original design called for tabbed
dialog boxes to group the various Census fields, as in this figure:
Figure 3. Original design for Monument Search
However, once an HTML page is displayed, it cannot be modified without
a call to the server, which causes delays (although there are now
some new ways to make a page seem to change). Since at the time we
were designing Janus, Java and ActiveX were not serious options, we
had to find a Javascript solution. In the final Monument Search shown
in Figure 2, Javascript had to be written for the tabs to simulate
scrolling a list of fields.
In other words, while HTML is an amazing in what it can do with so
little, its interface features are impoverished compared to Windows,
the Mac, XWindows, or similar platforms. Dynamic HTML, style sheets,
and other things in development will improve this, but for public
access, you should not use features that are less than two years old
or you will reduce the size of your audience.
Overcoming Limitations of the Web
How do we overcome these two important limitations of the Web?
First, we pay a lot of attention to the user's experience, and what
is needed to enable a "richer" interface. We design custom interface
components that married the "look and feel" of a traditional Microsoft
Windows application interface with "Web-interface" designs, such as
a persistent navigation bar at the top of the screen.
Second, we use techniques that address the issues of speed and database
access.
Karen and I hold one principle very dear: that the user of the system
is paramount. Simplicity, speed, and meeting user expectations are
the goals of all our development work. These are not easy to achieve
in querying a large, complex database, which can put heavy processing
burdens on the computer. This was one of the major challenges in developing
Janus - the Census is so complex that the Unix system often had "response"
times (if that is the appropriate word) of 5-10 minutes for
some queries. I consider this unacceptable. Period.
How can the speed problems be overcome? We use three techniques,
each discussed below:
- Preprocessing of the database to simplify searches and reduce
the amount of data sent
- Javascript to avoid calls to the server
- Client-side image maps, also to avoid server calls
Maintaining Flexibility
After user considerations, my second concern is always flexibility
of deployment. This is something I would encourage you to strive for,
as it will be of value to you when the next platform after the Web
comes along. Our work implements this in many ways, but two are noteworthy
if rather esoteric. First, the query syntax for each type of search
is stored in a database, accessed when the user performs a search.
Thus the syntax can be changed and new types of queries added without
changing the programs.
Second, a data-access library called ODBC (Open Database Connectivity)
is used for all database access. This allows the underlying database
management system to be almost anything without changing the programs.
The downside - which affects only the developer, not the user - is
that all access of the database must be in the standard SQL language,
which makes some things harder to program than using direct database
commands.
Later work has shown both these techniques to be equally useful to
perform database updates as they are for searching.
Speeding Response Time
Database Preprocessing
By preprocessing, I mean transforming the data from its original
form to achieve better speed. This may involve denormalizing tables,
adding redundant data, making hyperlinks in advance, and other techniques.
Although normalized data and non-redundant data are usually taken
to be sacred concepts, they are really of most value in conceptual
data models. Performance requires minimizing the number of tables
accessed. For installations that do not update the database, we are
free to derive a new database from the working one to suit the needs
of Web access. The only downside is extra disk space, which is almost
free now.
To speed up searches, I write programs to rip apart the database
and reassemble it for performance. Indexing is also given a great
deal of attention - more sophisticated indexing is used, even for
keywords, than just indexing every word.
A simple example of preprocessing can be seen in this display of
Janus search results, showing which records have images.
Figure 4. Brief-record display of search results
Here it would be possible for the CGI program to count the image
links for each record and create the label on the fly. But since the
number of links cannot change in a read-only database, I saved the
user's time by counting the images during preprocessing.
Another example of preprocessing is the use of a separate table just
for image queries, which puts all the data together that is displayed
with images (Figure 5). This saves having to get the few fields needed
from the main database which can have hundreds of fields per record.
The result is that image queries are actually faster than data queries,
even though images are inherently slower to transmit and display.
Figure 5. Results of image search (bottom cut off intentionally)
Here, although the same number of records were found, and each image
retrieved from a separate file, the response time is quicker because
of the separate file. Redundant and denormalized - but efficient!
A final example of preprocessing: Renaissance notebooks and sketchbooks
tend to have been unbound and their pages dispersed throughout the
world. Thus it is impossible for a scholar to view the book as its
author created it. Janus implements a feature that partially overcomes
this problem - it has a page-view for Renaissance documents that shows
their images as well as the transcribed text (Figure 6).
Figure 6. Viewing document pages virtually
All the labels you see represent multiple levels of links and are
built during preprocessing; during output, only the text needs to
be manipulated.
Although the examples focus on manipulating data to reduce the delay
for the user, huge speed gains can also result from having searches
result in what the user expects, thus saving trial and error. During
the successive releases of Janus we paid attention to what data elements
were returned from searches, and fine-tuned these in order to minimize
follow-up searches. Thus speed must be a priority for both the architecture
and the interface design.
To summarize, speed is an old consideration that is of new importance
on the Web. With a traditional application, you have a variety of
solutions if performance is unacceptable, the simplest being to use
a faster computer. With the Web, however, the transmission delays
are unpredictable and out of the developer's control. Thus the emphasis
in our work on minimizing the number of calls to the server; and secondarily
minimizing the amount of data transmitted, although this is much less
important.
Javascript
Javascript is another feature that is good for the user but hard
on the developer. It is actually harder than writing functions into
the CGI program. But using the CGI program means a delay while the
user's request is transmitted, processed, and the results received.
Instead, Janus and MWeb user Javascript run by the user's browser,
so the results are instantaneous. Of course, when the database is
involved, the request must go to the server; but query building, checking
user input, processing buttons, and other client-side processes are
all done by the browser. Because the displays (which contain the hidden
Javascript) are generated by the CGI programs, it meant writing Javascript
code in C++. In fact, this work provided a rather traumatic opportunity
to program in 5 languages at once: Javascript, C++, HTML, SQL, and
the native language of the database management system, only the first
two of which are at all similar!
Here is an example of Javascript to insert the user's selection into
the search form:
Figure 7. Inserting data values with Javascript
If the user clicks the "Go" button, this search is sent to the server.
However, if the user clicks the "Edit" button ...
Figure 8. Creating the search string using Javascript
... the query, created from the fields above using Javascript, is
displayed as it will be sent to the server. The user who knows boolean
notation can edit the search before sending it (remember, this is
an interface for scholars, not the public). If the user changes the
default AND/OR operators using the radio buttons in Figure 8, the
search string is instantaneously updated using Javascript - no waiting
for the server to parse and reformulate the query and send it back!
(Field numbers are used instead of names because of the limited space
in the textarea, and also to save numerous server calls to translate
them, a compromise I'm not too happy with.)
It might be added here that Java and ActiveX were avoided completely
in Janus, partly because of the outstanding security questions, but
more because they require the user to wait while large programs are
downloaded. As I said before, I don't think the user should ever have
to wait.
Javascript has one severe limitation - it is not implemented the
same way across platforms. We had to stop supporting Microsoft Internet
Explorer version 3 for Janus because it did not implement Javascript
correctly. This was not a problem since Janus was developed for use
internally at the Getty, which has standardized on Netscape Navigator;
but in later work, I have had to write programming code to have these
functions processed by the server for users of IE3. The downside is
that using the server instead of Javascript causes delays for processes
that the user can expect will not take time (such as jumping to a
specific previous display). In fact, functions like the Monument Search
you have just seen would not be worth doing without Javascript.
Client-Side Image Maps
This is no longer a new technique, but worth mentioning. Image maps
are images that perform different actions depending on where the user
clicks. Originally, the coordinates of the user's click were sent
to the server for action. Now you can embed the logic in your HTML
pages so some work is done by the browser. That is how the tabs you
see here are implemented.
Planning the User Interface
I hope you'll agree that Janus is a very attractive system. I think
the fact that both of us have a passion for clean and attractive interfaces
is one of the reasons so much complexity is not overwhelming.
However, a lot of planning and prototyping is also required. A lot
of credit must go to the testing team at the Getty that gave us feedback.
By using rapid prototyping techniques, the interface went through
many iterations without wasting precious development time. Mockups
were done as hand-drawn sketches, then rendered as image maps to create
a storyboard of the user interface.
Here are some thoughts on the user interface for database applications
on the Web.
Simple Searches
One way to avoid overwhelming the user is by having multiple levels
of searching. This is an old idea, but we took it as far as we could.
Besides the advanced searching described earlier, we also added a
simple way for users to search by keywords, and a simple way to "search
by thumbnail" since users indicated early on that images were a crucial
part of their browsing and searching process. These latter two were
combined into the same screen; we made this look simple, but it has
enough power so that the advanced searching is rarely needed:
Figure 9. Janus Keyword Search
This is the initial screen after logging on. The essential objects
on this screen are only 1 instruction, 1 place to type, and three
buttons. On the left are more controls that I'm not too happy with
because they distract from the total simplicity of the rest of the
screen. I put them in colored boxes to make them look like "sidebars";
that is, not essential to the main purpose of the screen. Even though
the sidebars detract from simplicity, for a user familiar with the
conventions of graphical interfaces, I believe the functionality is
self-evident. But if the user ignores them, the expected occurs -
which is what we should strive for.
In fact the most common search performed consists simply of typing
a single word and pressing the enter key. The resulting display (shown
in Figure 4) is also fairly simple, given the complexity of the possible
results. And the use of links means not having to tell the user how
to get to the full record.
Meeting user expectations also enables the avoidance of instructions
and help files. For example, one would hope that typing "apollo belvedere"
and pressing Enter would bring up that statue, and in fact it does.
This is because the keywords are ANDed by default and are not case-sensitive.
A user can truncate one or more keywords by simply adding the most
common character for this: the asterisk. Have we needed the Help file
so far?
The first two radio buttons control whether the search will show
data or thumbnails. The drop-down list enables the user to search
only specific record types; for example, the person authority file
can be searched for "Caesar" to find the nine or ten persons with
that name, instead of wading through the hundreds of other records
with the word in them.
Finally, just as the user can see the complete list of values for
any field in the advanced search, the last two radio buttons permit
searching the complete keyword index and the Art & Architecture
Thesaurus.
For the record, we could not avoid a Help file - although the users
do!
Single-Record Displays
I haven't yet discussed the single-record display. Users can see
the full data for any record, no matter how arcane (such as notes
entered by the data-entry person or the barcode number on photographs
scanned for the Census).
The next two Figures show a full record for an only moderately complex
Renaissance document:
Figure 10. Full record for a Renaissance document (1 of 2)
Figure 11. Full record for a Renaissance document (2 of 2)
Here is another use of links to make it simple for the user. All
records related to other records have links in both directions. Every
field controlled by authority files has a reciprocal link to the record
for that authority (the Creator field in the second slide). Records
in hierarchies can be clicked on to see their ancestors and descendents
(these include not only placenames but monuments and documents that
have multiple parts). Every image is a link to its larger sizes. Images
and data are linked so that the user can go from one to the other
simply. Here again intensive processing of the data was done to take
advantage of the potential of hyperlink navigation.
Saved Searches
Complex databases can be hard to use. When you do develop a search
that gives results you want, it is helpful to permit the user to save
it to be used again later. We implemented this by permitting the user
to save search strategies. In contrast to the obvious approach, which
is to save them on the server, coded by the user's ID, we took the
simpler option of storing them on the user's computer. Thus we don't
need to maintain a list of users, keep track of expiration dates,
etc.
Since Web browsers have only one way to store data, namely the cookie,
this was the only approach available. Cookies have some limitations,
not least of which is that some users don't trust them, but I believe
they are acceptable if used only for optional functions.
Imaging Considerations
Although using multiple image sizes is not a new concept, a great
deal of effort was spent finding the optimum sizes for all monitor
resolutions. The goal was to be able to display two images side-by-side
on standard 800x600 monitors without having to scroll the images.
In addition, a large image can be seen in its entirety at this resolution.
On high-resolution monitors, of course, more or larger images can
be seen. As usual, the user's needs were the only consideration.
Figure 12. Two full-size images
This shows two images of the same monument, scanned from different
photographs.
Figure 12. Two zoomed images
By clicking on the images to zoom them we can easily read the inscription
in the left image, and confirm the poor photography and/or scanning
of the right image. The point is that no single image size will serve
for both the overview and studying the details. Using large images
for both slows down the user's experience unacceptably, but large
ones are needed when the user wants detail.
Here is another interface design question. During the design of Janus
we considered various schemes for predetermining where and at what
size images should open. Finally we realized that for art scholars,
no predetermination could be made. Thus any number of images can be
opened and the windows can be moved and resized at will.
A final word on images. These are absolutely marvelous considering
what we started from - low-resolution scans of what often were tourist
snapshots. Get professional help here; it's worth the money.
Systems Development Techniques
Two additional topics relate to how the system was built.
Progressive Releases
It is important with large projects to keep the user population interested
throughout a long development process. Although the Census is of great
importance to a small group of scholars, it is obviously not a major
priority to most people. Thus we had to keep our group of testers
enthusiastic. One obvious way was to provide a free lunch at each
review meeting.
The other way was to implement what I call "progressive releases",
a development process in which the system is developed and released
in parts. With Janus, although the development took a year, the first
release was out only 21 days after we began. This gave testers their
first look at keyword searching, a feature they had been wanting for
10 years. These progressive releases are more than prototypes or betas.
They are intended to be complete, bug-free releases of part of the
system, in order to provide useful features as soon as possible. And
in fact Janus was being used by scholars long before it was finished.
This approach is more costly, in that finishing work is done on features
that may be changed later; but the benefits are higher also.
Here again I must credit the testing team at the Getty for their
prompt and valuable contributions to twelve releases of Janus in eight
months.
There is a danger to this approach besides higher costs. As the simpler
features were released first, the demand grew to elaborate these to
solve problems better addressed by later features. It was necessary
to resist this pressure to ensure that the simple features did not
lose their simplicity.
Security
A final important consideration for databases over the Web is security.
Passwords are the obvious solution. But we have to ensure that the
password process cannot be avoided. The primary means of doing this
is to avoid using static HTML pages, since if these can be displayed
in a browser they can be used to submit valid database requests. Instead,
HTML pages are generated by the server after each request.
An alternative way is to use cookies to transmit the user's logon
information with every request. This was rejected because of inherent
security problems with cookies, and even more because of the perception
of security problems.
Conclusion
In designing a new "face" for the Census, we have learned a great
deal about the challenges faced by developers in rearchitecting databases
to live in a new and dynamic environment such as the Web. These challenges
come from attempting to create a compelling and elegant solution in
the context of a novel and unsophisticated design/technology environment.
By placing the user's needs squarely at the center of our design goals,
and by exploiting the available technology with an eye towards the
future, we have crafted a solution that answers many of these challenges,
and serves as a strong basis as new tools and environments emerge.
![](../dot_clear.gif)
![](../../mw98-line.gif)
Last modified: March 18, 1998. This file can be found below http://www.archimuse.com/mw98/
Send questions and comments to info@archimuse.com
|