Archives & Museum Informatics: MW99

Published: March 1999.

Developing Distributed Applications on the Web

Nicholas Crofts, Direction des Systèmes d'Information (DSI), Switzerland

Introduction

Many museums today have assimilated the potential of the Web as a medium for distributing or publishing information. Presentations about the institution and its collections, sometimes with virtual tours, are now commonplace. However, the Web also has enormous potential, at present almost completely unexploited by the museum community, as a front end for interactive applications which are used to input information. Rather than just pumping information out, the Web can be used for information gathering, building distributed applications which are accessible to a far wider range of users than would be the case with traditional client-server application architectures.

The technology is still relatively new and different approaches exist. Based on the experience of Geneva's Musinfo project, the aim of the present paper is to outline the technical options, highlight some of the pitfalls and explore the potential impact of web enabled applications.

Musinfo example

The Musinfo application is an example of what can be achieved using readily available web technology at relatively moderate cost. Based on the object-oriented ICOM/CIDOC Conceptual Reference Model, the project was developed in-house for the city's museums by the Direction des Systèmes d'Information, the city's IT division. The application is currently used by three major institutions which, together, represent close to 100 users. Other organisations will be joining in the near future. More than 350'000 records are already available online, many with associated digital images. The application is a fully fledged collections management system and provides all the information management services needed for inventory, cataloguing and research: data entry and updating, queries, reports, thesaurus and authority lists, etc. Currently it is used by curators and administrators, and by some external researchers; public access is also planned for this year. The application covers all the disciplines represented by the participating institutions: natural science, ethnography, fine arts, applied arts, archaeology, etc.

Fig. 1: Musinfo application screen

The entire Musinfo application can be accessed using a web browser such as Netscape or Internet Explorer. The web browser effectively functions as the client application. This allows for an unprecedented degree of flexibility and user mobility - the application and the entire database are potentially available on any workstation connected to the Internet, anywhere in the world.

Advantages

Apart from the obvious major advantage of uncomplicated global access via the Internet, there are a number of other advantages to this approach which effectively leverage the technology and ease of use of the web browser:

Simplified client installation and maintenance. With little to install on client stations, there's less to go wrong and support staff spend less time dealing with configuration problems. There are fewer problems too with software conflicts. Upgrades to the application require no distribution and are instantly available to end users.
Multi-platform compatibility. Since the client application is entirely based on Internet protocols, HTML and JavaScript, it runs with no modification on PCs, Macs and Unix clients - anything that supports a web browser.
Multimedia integration. All multimedia formats supported by the web browser (images, video, sound, etc.) can be readily integrated with other data.
Integration with external data. URLs for resources available on the Web (images, tools such as TGN and AAT, library catalogues, etc.) can be incorporated in the database to reference extramural information.

How it works

A traditional client-server application uses a two tier architecture. Application software, installed on each client machine, communicates directly with the database engine via a protocol such as SQL*NET. The client software has to be installed on each machine which uses the database and a different version of the client application has to be compiled for each platform.

By contrast, the web enabled architecture is multi-tiered: application software executes at an intermediate level on an application server. No application specific software is installed on the client machine, only a web browser such as Netscape or IE. The client communicates with the application server using standard Internet protocols. The application server takes over the responsibility of communicating with the database. It transforms client requests into native database queries, and processes database output into an appropriate format for the client.

Client Browser

--- HTTP ---

Web Server

--- SQL*NET ---

Database server

Fig. 2: Three tier Web architecture

Information sent to clients is displayed as standard HTML forms. These are generated dynamically by stored procedures on the application server. The procedures can be written in almost any programming language; Musinfo uses PL/SQL, Oracle's procedural extension to SQL, which makes database operations very simple. Client side data processing is achieved using JavaScript (not to be confused with Java).
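
The request cycle described above can be sketched in a few lines. This is only an illustration of the middle tier, not the Musinfo code: it uses Python and an in-memory SQLite database in place of PL/SQL and Oracle, and the table, column and function names (objects, handle_request, etc.) are assumptions made for the example.

```python
import sqlite3
from html import escape

def make_database():
    """Stand-in for the database tier (hypothetical table and data)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE objects (id INTEGER PRIMARY KEY, title TEXT, discipline TEXT)"
    )
    db.execute("INSERT INTO objects VALUES (1, 'Bronze fibula', 'archaeology')")
    return db

def handle_request(db, params):
    """Application tier: turn a client request into a query, then into an HTML form."""
    row = db.execute(
        "SELECT id, title, discipline FROM objects WHERE id = ?",
        (int(params["id"]),),
    ).fetchone()
    if row is None:
        return "<p>No such record.</p>"
    obj_id, title, discipline = row
    # escape() protects the form against quotes and angle brackets in the data.
    return (
        '<form method="post" action="/update">\n'
        f'<input type="hidden" name="id" value="{obj_id}">\n'
        f'<input type="text" name="title" value="{escape(title)}">\n'
        f'<input type="text" name="discipline" value="{escape(discipline)}">\n'
        '<input type="submit" value="Save">\n'
        "</form>"
    )

db = make_database()
page = handle_request(db, {"id": "1"})
```

The browser only ever sees the generated form; the query language and the database connection stay on the server side of the HTTP boundary.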

The Musinfo system uses an Oracle database running on a DEC Alpha UNIX server. A powerful server is necessary to handle the number of users accessing the database. The application server can be installed either on the same machine as the database or on a separate platform. In Geneva's system it runs on a twin-processor Pentium machine running NT. In order to enhance security, the Musinfo system places a firewall between the application server and the database server. The application server software is Oracle's Web Application Server. Most client machines within the museums are connected via a fibre optic FDDI network; however, external access is also possible.

Fig. 3: Musinfo network architecture

Drawbacks

Naturally, there are some drawbacks to the Web enabled approach which have to be taken into account.

Due to the relative novelty of web enabled application development, software tools are still rather primitive and limited in their functionality. Development is also rendered more complex by the evolving nature of HTML, which is still 'missing a few widgets'.
As a consequence of these handicaps, the cost of developing web enabled applications is higher than that of developing a traditional client-server application. Direct costs for the Musinfo project, including hardware, licences and programming, were 500'000 CHF (around 360'000 USD), spread over four years.
Web applications are heavily dependent on network speed and availability. A slow network or poor connections can have a disastrous effect on response times. Careful design and programming are needed to make the best use of the available bandwidth so as not to penalise users with unnecessary calls to the server.
By its nature, HTTP communication over the Internet is discontinuous. Once a request has been handled, the connection established between client and server is dropped. Subsequent requests have to reestablish the connection. Some thought and programming effort is needed to preserve the impression of continuity in these conditions: state information, implicit in a normal client-server connection, has to be handled explicitly. This inevitably adds a complexity overhead to application development.

Tips and tricks

Handling state information

State information is meta level data about the user's session. Such information is implicit in a standard database connection. After successfully logging in to an application, for example, it can be assumed that the user is authorised to access the data: the result of a successful log in is implicit as long as the connection is maintained. Typically, when the connection is interrupted, for whatever reason, this information is lost and the user must reenter a password to reestablish the connection. The same logic applies to information about the user's context - the fact that the user has succeeded in navigating through a series of menu options and is now on screen X3 of module B14 in data entry mode, for example.

Web connections are normally discontinuous, so all this implicit meta data about the session state needs to be made explicit in order to maintain continuity. Each transaction between client and server effectively has to reestablish the user's authorisation and context, preferably without the user being aware of the fact.

Maintaining and managing state information is perhaps the most conceptually complex obstacle to designing Web enabled applications, although it is not in fact very difficult to overcome. Two different approaches exist, each with specific advantages:

Server image. The server can maintain an 'image' of the client session. Typically this image will contain the user's identification, current context and details of the current transaction. A 'session key' is passed between client and server at each interaction and used to look up the session information. A major advantage of this approach is the negligible communications overhead needed to maintain the session information. However, it does imply a higher degree of server processing overhead, which is incurred at each interaction. Another difficulty is the risk of losing 'synchronisation' between the client and the server image. This would typically occur if the user clicks on the browser's 'back' button. Special measures need to be implemented, such as disabling the standard navigation buttons, in order to minimise these risks.
Client image. Similarly, the client may maintain an 'image' of the current session state. This can be achieved using browser 'cookies' or with HTML 'hidden fields'. This approach inevitably incurs a higher degree of communications overhead since state information has to be transferred back and forth between client and server, along with actual data, at each interaction. However, the processing overhead on the server is much reduced and there is far less risk of losing synchronisation.
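
As a rough illustration of the server image approach, the sketch below keeps each session in a dictionary indexed by a random session key. It is written in Python for brevity and is not the Musinfo implementation; the class and field names are hypothetical. The step counter used to detect loss of synchronisation is an assumption of this example, offered as one possible alternative to disabling the navigation buttons.

```python
import secrets

class SessionStore:
    """Server-side 'image' of each client session, looked up by session key."""

    def __init__(self):
        self._sessions = {}

    def login(self, user):
        key = secrets.token_hex(16)  # unguessable key handed to the client
        self._sessions[key] = {"user": user, "screen": "menu", "step": 0}
        return key

    def handle(self, key, step, new_screen):
        """Process one request; reject it if client and server image disagree."""
        session = self._sessions.get(key)
        if session is None:
            return "login required"
        if step != session["step"]:
            # Typically caused by the browser's 'back' button: the page the
            # user submitted is older than the state held on the server.
            return "out of sync"
        session["screen"] = new_screen
        session["step"] += 1
        return f"{session['user']} is now on {new_screen}"

store = SessionStore()
key = store.login("curator")
first = store.handle(key, 0, "data entry")
stale = store.handle(key, 0, "query")  # resubmission of an old page
```

Only the key and a small counter travel over the network; everything else is held, and has to be looked up, on the server at each interaction.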

The choice between these two approaches depends on three factors: the available server processing capacity, the available network bandwidth, and the typical client configuration. For example, low-end client stations connected to a powerful server via a moderately rapid network would benefit from using server-side state information. The reverse would be true for powerful workstations connected to a modest server via a fast network.

Currently the Musinfo system uses the server side approach, primarily to allow for some extremely low-end client stations which are still in use. However, client side state information is being adopted for future developments as the relative balance of power between client work stations and the central server improves.
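
For comparison, the client image approach can be sketched as follows: the whole session state is serialised into an HTML hidden field and sent back with every form. This is an illustrative Python sketch only. The HMAC signature, which stops a user from editing the state in the page source, is an addition of this example and is not described in the paper; SECRET and the function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret-change-me"  # hypothetical server-side signing key

def pack_state(state):
    """Serialise session state and sign it so the client cannot alter it."""
    raw = json.dumps(state, sort_keys=True).encode()
    payload = base64.urlsafe_b64encode(raw).decode()
    signature = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{signature}"

def unpack_state(token):
    """Return the state if the signature is valid, otherwise None."""
    payload, _, signature = token.rpartition(".")
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))

def hidden_field(state):
    """Hidden form field carrying the state back to the server on submit."""
    return f'<input type="hidden" name="state" value="{pack_state(state)}">'

state = {"user": "curator", "module": "B14", "screen": "X3", "mode": "data entry"}
token = pack_state(state)
restored = unpack_state(token)
tampered = unpack_state("x" + token)
```

The token grows with the amount of state it carries, which is the communications overhead mentioned above; in exchange, the server needs no per-session storage.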

If dealing with state information sounds like a terrible headache which you would rather avoid, it is worth noting one very agreeable side effect which can make all the effort seem worthwhile: the highly robust nature of Web application sessions. Interrupted telecom links can be reestablished, even via a different provider, without any loss of session continuity. In most cases, sessions can even be reestablished after a server crash. So long as the state information is preserved, the application really doesn't notice any difference.

Record locking

One of the minor inconveniences of discontinuous web connections is that record locking becomes slightly more complex. Record locking is the mechanism which prevents two users from modifying the same record simultaneously. In a standard environment, any outstanding record locks are dropped as soon as a connection is lost. This is difficult to arrange in a web environment since the server receives no signal from the client to indicate that the user has, for example, forgotten to log out before turning off the computer. A time out period could be used to remove outstanding locks, but this would mean that records might remain locked for relatively long periods at a time, blocking other transactions unnecessarily.

To avoid these problems, Musinfo uses a simple form of 'optimistic' locking, which assumes that update conflicts are relatively rare. Records are time stamped at the moment they are created or modified. When they are retrieved, this timestamp is included along with the data. When a user attempts to update a record, the timestamp sent back by the client is compared with the one currently stored, to ensure that the record has not been modified in the meantime. If it has, the user receives an error message. This approach is easy to implement and ensures database integrity, but has the potential drawback that users may be unable to save a record they have been working on. This policy is acceptable in an environment where conflicts are rare; a more restrictive locking scheme would be necessary if conflicts were frequent.
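
The timestamp comparison can be folded into the UPDATE statement itself, so that the check and the write happen in a single step. The sketch below is a minimal illustration using Python and SQLite, not the Musinfo PL/SQL code. The table layout is hypothetical, and a simple counter stands in for the timestamp so that the example is deterministic.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE objects (id INTEGER PRIMARY KEY, title TEXT, modified INTEGER)"
)
db.execute("INSERT INTO objects VALUES (1, 'Bronze fibula', 1)")

def fetch(record_id):
    """Return the record together with the stamp the client must send back."""
    return db.execute(
        "SELECT title, modified FROM objects WHERE id = ?", (record_id,)
    ).fetchone()

def update(record_id, new_title, seen_modified):
    """Write only if the record is unchanged since the client read it."""
    cursor = db.execute(
        "UPDATE objects SET title = ?, modified = modified + 1 "
        "WHERE id = ? AND modified = ?",
        (new_title, record_id, seen_modified),
    )
    db.commit()
    return cursor.rowcount == 1  # False means someone else got there first

# Two users retrieve the same record, then both try to save.
_, stamp_a = fetch(1)
_, stamp_b = fetch(1)
ok_a = update(1, "Bronze fibula (restored)", stamp_a)  # first writer succeeds
ok_b = update(1, "Fibula", stamp_b)                    # second writer is refused
```

No lock is ever held between requests, so nothing needs to be cleaned up if a user simply switches off the computer.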

Accented characters

Although not of much concern to anglophones, accented and non-Latin characters do present certain problems for other languages. Fortunately, HTML provision for common accented characters is quite good. HTML 'entities' are provided as equivalents, e.g. &eacute; is used for é. Accented and non-Latin characters present in data should be converted into HTML entities to ensure correct display. Non-converted characters are generally displayed correctly on PCs, but Macs are likely to pose problems, particularly when using JavaScript variables, because the Mac uses non-standard code pages. Data sent back from the client is converted using a different protocol and needs no special processing.
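
The conversion can be sketched as a small function applied to data before it is placed in a page. This Python example is an illustration only; to_entities is a hypothetical name. It uses the standard library's table of named entities and falls back to numeric character references for characters that have no name in that table. It deliberately leaves ASCII untouched, so escaping of characters such as < and & is a separate step.

```python
from html.entities import codepoint2name

def to_entities(text):
    """Replace every non-ASCII character with an HTML entity."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)  # plain ASCII passes through unchanged
        elif code in codepoint2name:
            out.append(f"&{codepoint2name[code]};")  # named entity
        else:
            out.append(f"&#{code};")  # numeric character reference
    return "".join(out)

converted = to_entities("Musée d'ethnographie de Genève")
```

The result contains only ASCII characters, so it no longer depends on the character set assumed by the client platform.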

Improving performance

Most standard client server development tools allow for a degree of load balancing between the client and server in order to distribute processing and thereby reduce network traffic. Simple data validation and screen updates, for example, can be performed by the client, which provides rapid response to the user and avoids a round trip to the server.

Load balancing is more difficult in a web environment because HTML provides few possibilities for dynamic screen control. The easy way out is to make a call to the server every time the display needs to be modified, but this can be extremely inefficient and time consuming since data, presentation information and processing results are all constantly sloshing back and forth across the network; only the data need really be transferred.

Fortunately, JavaScript provides a number of ways to circumvent these difficulties and optimise performance.

Simple data validation (obligatory fields, date formatting, etc.) is performed on the client. Complex validation, involving look up tables and referential integrity, may still need to be performed on the server, but many common errors can be eliminated at source.
JavaScript procedures are also capable of generating HTML dynamically. This effectively means that the screen display can be updated locally, without needing to call the server. The Musinfo application uses this possibility to generate most screen displays, reformatting data received from the server. This allows for a considerably enhanced graphical user interface, keeps network traffic to a minimum and reduces the workload on the server.
Though complex to implement, data buffering can offer considerably increased performance. By storing data for several records in a hidden frame, the number of round trips to the server can be considerably reduced. After the initial query, browsing through a set of records can be done entirely on the client machine. Server access is needed only when records are updated. The performance benefits can be extremely impressive - particularly when running over a relatively slow Internet connection. It even becomes possible to continue consulting the result set off line.
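
Two of the techniques listed above, client side validation and data buffering, can be sketched as follows. In Musinfo this logic runs as JavaScript in the browser, with the buffer held in a hidden frame; Python is used here only to show the logic. The field names, the date format and the batch size are assumptions made for the example, and the server is simulated by a list.

```python
import re

DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")  # assumed date format

def validate(record, required=("title", "acquired")):
    """Cheap checks that can run on the client before anything is sent."""
    errors = [f"{name} is required" for name in required if not record.get(name)]
    if record.get("acquired") and not DATE_PATTERN.match(record["acquired"]):
        errors.append("acquired must look like YYYY-MM-DD")
    return errors

class RecordBuffer:
    """Client-side buffer: fetch records in batches, browse them locally."""

    def __init__(self, fetch_batch, batch_size=10):
        self._fetch_batch = fetch_batch  # callable(offset, count) -> list
        self._batch_size = batch_size
        self._cache = {}
        self.round_trips = 0

    def get(self, index):
        if index not in self._cache:
            # Cache miss: fetch the whole batch containing this record.
            start = index - index % self._batch_size
            self.round_trips += 1
            batch = self._fetch_batch(start, self._batch_size)
            for i, rec in enumerate(batch):
                self._cache[start + i] = rec
        return self._cache[index]

# Stand-in for the server: 25 records returned in slices.
SERVER = [{"title": f"Object {n}", "acquired": "1998-05-01"} for n in range(25)]
buf = RecordBuffer(lambda offset, count: SERVER[offset:offset + count])
titles = [buf.get(n)["title"] for n in range(25)]
```

Browsing all 25 records costs three round trips instead of 25, and records already in the buffer remain available if the connection is lost.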

Potential impact

So far we have considered only the technical aspects of developing web enabled applications. I would like to turn now to consider briefly the potential impact of this technology in the museum context.

Traditionally, museums have acquired or developed their own information systems and have therefore had to assume all the associated technological burden that implies - running machines, making backups, dealing with engineers, developers and technicians, software upgrades, etc. Generally speaking, the investment in money, time and energy needed to run the information system is subtracted from the museum's core mission - dealing with collections, research, education, and exhibitions. Interactive Web applications provide a means of reducing this burden since they have the potential to be shared by a number of institutions. The potential for access to collections management applications on a remote server means that museums could envisage outsourcing virtually all the technical aspects of their information system. This approach could be particularly advantageous to small institutions, who could benefit from sophisticated software and powerful servers for a minimal capital investment. The web architecture has the potential to transform the information system into a service rather than a product - more like renting a car or flying with an airline than the current trauma of having a technico-religious sect and a chemical processing plant installed in your apartment.
The possibility of entering and updating data remotely also adds a great deal of flexibility to the task of creating documentation. Traditionally an exclusively 'in house' activity, the web makes it possible to envisage outsourcing certain data entry tasks. External experts can add comments to the documentation, curators in other institutions can make links to similar or related objects in their collections, even students and members of the public could contribute where appropriate. Web enabled applications have the potential to completely transform the business of creating and maintaining information about cultural heritage by rendering institutional barriers far more transparent.
By regrouping institutions in a common database, web applications may bring us a step closer to the dream of a global cultural information resource. Cross linking of references and interdisciplinary research become far easier and the value of the information as a commodity is enhanced. Geneva is currently working on a test bed site licensing project to make information in the Musinfo system available to educational users and channel receipts back to museums. The more information the system contains, and the higher its quality, the greater its value. Licensing helps to reverse the cost spiral of traditional information systems - museums are more like shareholders. The more information they invest, the higher the potential dividends.

Conclusion

The use of the Web for the gathering of information has yet to be exploited by the museum community. The technical difficulties involved in creating web enabled distributed applications do not present major obstacles. Given a little imagination and careful design, it is already possible to build attractive, fully functional applications which take advantage of the technology and familiarity of the web browser - today's most universally available software.

The commercial and scientific impact of having software applications available over the Web could be enormous. Museums will have the option of sharing or outsourcing the burden of running an information system, and thanks to the global availability of the Internet, a far broader and more flexible range of experts and sources becomes available for the creation and maintenance of cultural heritage resources.

Finally, by sharing application software and a common database, institutions will in effect be contributing to an information repository which breaks out of the confines of individual institutions, disciplines and departments: a step towards the creation of a global resource for cultural heritage.