David Ellis, Gavin Foster, Ray Shah and Petar Bojkov, Think Design Inc, USA

Abstract

Before describing our vision for a distributed, service-oriented museum Web, we assess the projects Nexhibition (an on-line exhibition tool), steve.museum (an artwork tagging tool), Pachyderm (an open source museum presentation manager) and other open source products of interest to museums. At first sight, one might see these as distinct, stand-alone products that each require some investment to utilise effectively. We argue the case for breaking these tools down into their constituent parts and exposing their functionality through distributed APIs built on protocols such as SOAP.

The paper explores how previously monolithic systems can instead be seen as re-usable components. Such services may include calendaring and event programming/publishing; collections management systems; the tagging and analysis tools of steve; and the on-line exhibition generating tools of Nexhibition and Pachyderm. This model gives Web developers the services they need to construct a rich set of functionality without having to implement those services in-house.

We envisage a service market in which vendors offer services such as those discussed. Museums will be able to pick and choose among these offerings and may even choose to host services themselves. This will consolidate running costs, increase the range of available services, and ensure the effective utilisation of museums' I.T. infrastructure.

Keywords: software design, service oriented architecture, Web 2.0, museum service

Introduction

For a long time, software engineering has been moving toward the production of re-usable software components. Object Oriented Programming has helped programmers produce higher quality software that is easier to maintain, and the emergence of best practices allows the production of software that reuses both components and common design patterns (Gamma et al. 1995).

With the advent of the Web in the 1990s, people were able to start publishing their own content. Later, scripting languages such as Perl, PHP and ASP enabled Web developers to build dynamic Web sites. At first, “Web scripting languages” such as PHP encouraged the production of “one-time” use code, as many less experienced developers found it easy to produce dynamic Web pages. Now, many programming languages for the Web allow developers to work within frameworks that have proven effective (e.g. Jakarta Struts, JavaServer Faces, Ruby on Rails, CakePHP).

More recently, a number of technologies that enable the production of distributed service components have emerged, most notably the SOAP standard (Box et al. 2000). This shift in the function of the Web gives developers access to an array of useful tools that are hosted and maintained externally.

Now, with Web 2.0 (O'Reilly 2005) firmly in place, the number of Web services and their uses are growing daily. Mashups (Mashup) are typical examples of how Web services can be used to enrich a Web site with external resources. An excellent example is HousingMaps (http://www.housingmaps.com/), which plots Craigslist housing listings on Google Maps.

We propose that the museum community think carefully about its Web sites and its I.T. infrastructure. What might look like a one-time development task could be a reusable service used by a number of applications. Both internal and external entities could use such services, reducing future development and maintenance costs. Before implementing a service, a check should be made to see whether similar usable implementations already exist; if either the component or its design can be reused, it should be. With the ever-growing adoption of open source software, this is a serious consideration for any museum: if many museums use the same software or services, maintenance costs are reduced through economies of scale.

Section 2 discusses some well-known principles of good software design, and best practices for producing a quality service component, whether distributed or otherwise. Section 3 discusses how the museum community might host and make use of services, both in-house and externally. Section 4 discusses the disadvantages of distributing components over a LAN and the Internet, and gives some suggestions on how to cope with them.

Principles

The first principle of good programming is what is commonly referred to as “Loose Coupling” (Gamma et al. 1995). Loosely coupled components depend on each other as little as possible. For example, suppose a Web site takes credit card payments via PayPal (http://www.paypal.com). One way to design such a system is to include code that processes PayPal payments directly. We subsequently discover that we need to use another payment provider, e.g. WorldPay (http://www.worldpay.com). But the existing system is now tightly coupled to PayPal, and the code that achieves this must be duplicated and adapted before the site can accept WorldPay payments. The principle of loose coupling tells us that the system should have been designed to work with the abstract concept of a payment provider, with the implementation details that differ between providers hidden behind a common interface, or API, used within the system. Applying the loose coupling principle reduces the amount of work required to modify the system.
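To make the principle concrete, here is a minimal Python sketch of a payment-provider abstraction. The class names (`PaymentProvider`, `PayPalProvider`, `WorldPayProvider`, `TicketShop`) are hypothetical, and the provider classes return placeholder transaction ids instead of calling the real PayPal or WorldPay APIs, which are not described here.

```python
from abc import ABC, abstractmethod


class PaymentProvider(ABC):
    """The abstract concept of a payment provider; site code depends only on this."""

    @abstractmethod
    def charge(self, amount_cents: int, card_token: str) -> str:
        """Take a payment and return a provider-specific transaction id."""


class PayPalProvider(PaymentProvider):
    # Placeholder: a real version would call PayPal's own API here.
    def charge(self, amount_cents: int, card_token: str) -> str:
        return f"paypal:{card_token}:{amount_cents}"


class WorldPayProvider(PaymentProvider):
    # Placeholder: a real version would call WorldPay's own API here.
    def charge(self, amount_cents: int, card_token: str) -> str:
        return f"worldpay:{card_token}:{amount_cents}"


class TicketShop:
    """Site code: written once against PaymentProvider, never against a vendor."""

    def __init__(self, provider: PaymentProvider):
        self.provider = provider

    def buy_ticket(self, price_cents: int, card_token: str) -> str:
        if price_cents <= 0:
            raise ValueError("price must be positive")
        return self.provider.charge(price_cents, card_token)


# Switching provider is a one-line change where the shop is constructed.
shop = TicketShop(WorldPayProvider())
```

Adding a third provider means writing one new subclass; `TicketShop` itself does not change.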

This brings us to the next point: the API (Application Programming Interface). An API is the abstraction layer that enables programmers to build on existing systems and “stand on the shoulders of giants”; without it, we would not have the computer systems we have today. The most commonly used APIs are those provided by operating systems (e.g. Microsoft Windows, Unix). These APIs allow programmers to write files to disk without detailed knowledge of how the disk is managed internally. When we design software, we may want to enable other people to reuse its base functionality in other products. When several products implement the same API, a component that uses the API can be moved between implementations. In the Java world, for example, many vendors provide application containers; if software is written against the standard API, we are free to move between vendors and our software will still work (in theory).
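Python's database API specification (PEP 249, "DB-API 2.0") is a familiar example of a standard API with many implementations. The sketch below uses only DB-API calls and happens to run on the `sqlite3` module from Python's standard library; the `objects` table and its rows are invented for illustration. It also shows why "in theory" matters: the SQL placeholder style (`paramstyle`) differs between drivers, so moving to another driver may still need small changes.

```python
import sqlite3


def count_by_maker(conn, maker: str) -> int:
    """Uses only DB-API 2.0 (PEP 249) calls: cursor(), execute(), fetchone(), close()."""
    cur = conn.cursor()
    # The placeholder style is driver-specific; sqlite3 uses "?" ("qmark").
    cur.execute("SELECT COUNT(*) FROM objects WHERE maker = ?", (maker,))
    (count,) = cur.fetchone()
    cur.close()
    return count


# Invented sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (title TEXT, maker TEXT)")
conn.executemany("INSERT INTO objects VALUES (?, ?)",
                 [("Teapot", "Smith"), ("Portrait", "Jones"), ("Bowl", "Smith")])
print(count_by_maker(conn, "Smith"))  # 2
```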

We have covered the design practices that make “software as a service” possible, but not yet how the service-consumer communicates with the service-provider. This is simple if the components are all written in the same programming language and all run on the same machine, but there are many reasons why this is often not the case:

Newer programming languages usually offer benefits over older ones, so the more recently a system was developed, the more likely it is to be written in a different language from older (legacy) systems still in use.
The common programming language of a development team may not be optimal for implementing a new feature, so a different language is used.
The service is offered over the Internet, so the consumer and provider run on different machines.

If the service-consumer and service-provider are to talk to each other without depending on each other's programming language, or when they run on different machines, an agreed messaging format is needed. This idea is not new: Microsoft's COM family (MS COM+) and CORBA (CORBA) date back to the early 1990s. Newer approaches such as SOAP and REST are of interest because their messages are text based (XML in the case of SOAP) and can therefore be read easily by humans.
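To show what a text-based message looks like, the sketch below uses Python's standard `xml.etree.ElementTree` to build and read a SOAP 1.1 envelope. Only the envelope namespace comes from the SOAP 1.1 specification; the `SearchCollection` operation, its parameters and the `urn:example:museum` namespace are hypothetical, and a real service would normally describe them in a WSDL document. Transport (usually an HTTP POST) is omitted.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
MUSEUM_NS = "urn:example:museum"  # hypothetical namespace for a museum service
NS = {"soap": SOAP_ENV, "m": MUSEUM_NS}

# Use readable prefixes when serialising.
ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("m", MUSEUM_NS)


def build_search_request(keywords: str, max_results: int) -> str:
    """Consumer side: wrap a hypothetical SearchCollection call in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{MUSEUM_NS}}}SearchCollection")
    ET.SubElement(call, f"{{{MUSEUM_NS}}}keywords").text = keywords
    ET.SubElement(call, f"{{{MUSEUM_NS}}}maxResults").text = str(max_results)
    return ET.tostring(envelope, encoding="unicode")


def parse_search_request(xml_text: str) -> tuple:
    """Provider side: any language with an XML parser could do the same."""
    call = ET.fromstring(xml_text).find("soap:Body/m:SearchCollection", NS)
    keywords = call.findtext("m:keywords", namespaces=NS)
    max_results = int(call.findtext("m:maxResults", namespaces=NS))
    return keywords, max_results


print(build_search_request("silver teapot", 10))
```

Because the message is plain XML, the provider could equally be written in Java, PHP or any other language with an XML parser.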

The ideas we have mentioned so far have been around for quite some time in the software development world. They have recently become particularly relevant to Web development with the advent of Web applications and software-as-a-service developed in scripting languages such as PHP and ASP. Because these languages were historically used to solve simpler problems, Web developers may have been slow to recognize and adopt existing good design practices: some initially failed to see how such practices applied to the new paradigm, and others were simply unaware of them. Web programming frameworks based on sound design principles, such as Java Struts, JavaServer Faces and CGI::Application for Perl, along with general awareness raising within the Web development sphere, have helped to improve this situation.

A newer twist on the idea of loose coupling is aspect oriented programming (AOP Alliance) and its supporting libraries. The idea is to separate cross-cutting concerns from the rest of the code, a cross-cutting concern being secondary functionality that affects many parts of a program. These concerns are then re-introduced declaratively: a separate declaration states where the secondary functionality should be applied.

Examples of cross-cutting concerns are transaction management, logging and security interception.

For example, one might decide that all methods of a service that retrieve information from the database should run in read-only transactions, and that all others be wrapped in a read-write transaction. One might also declare that all methods that save data to the database are “secure” and therefore require the relevant privileges. We can test our code without transactions and without security, and apply these “concerns” separately, without tainting the core function of the service. Previously we had to think about all these concerns at once; now we can slice them up and let an aspect oriented engine “weave” them back together. Mature AOP libraries generate efficient code, so the performance cost is usually small. AspectJ (http://aspectj.com) for Java and PHPAspect (http://www.phpaspect.com) for PHP are examples of this paradigm.

Fig 1: AOP DIAGRAM

Figure 1 shows how one might apply AOP to a software project. The method's core task, however complicated, is wrapped by a security aspect and placed within a transaction. The aspects form layers around the program, separating out concerns that are not directly related to the operational task.

With AOP, programmers can write cleaner code that concentrates only on the functionality they are writing, without being forced to deal with other concerns. This also improves testability, because the core task and each concern can be tested in isolation, making bugs quicker to locate.
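Python has no built-in aspect weaver, but decorators give a rough, explicit approximation of the idea, and that is the technique used in this sketch (it is not AspectJ or PHPAspect, and there are no pointcut expressions). Transaction and security aspects are applied declaratively to a hypothetical `save_exhibition` function; `audit_log` stands in for a real transaction manager, and the user is assumed to be a dictionary with a `roles` list.

```python
import functools
import inspect

audit_log = []  # records what the aspects did, for illustration only


class PermissionDenied(Exception):
    pass


def transactional(func):
    """Transaction aspect: begin, then commit on success or roll back on error."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        audit_log.append("begin")
        try:
            result = func(*args, **kwargs)
        except Exception:
            audit_log.append("rollback")
            raise
        audit_log.append("commit")
        return result
    return wrapper


def secured(role):
    """Security aspect; assumes the first argument is the calling user."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(user, *args, **kwargs):
            if role not in user.get("roles", ()):
                raise PermissionDenied(f"{role} role required")
            return func(user, *args, **kwargs)
        return wrapper
    return decorator


@secured("editor")
@transactional
def save_exhibition(user, title):
    # Core task only: no security or transaction code in here.
    if not title:
        raise ValueError("title required")
    return f"saved {title}"


# The undecorated core task, for testing in isolation.
core_save = inspect.unwrap(save_exhibition)
```

Because `functools.wraps` records the wrapped function, `inspect.unwrap` recovers the bare core task, so it can be tested without either concern.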

The final recent advancement worth noting was born in the Java world, although it applies to any programming language. It is called Dependency Injection, a form of Inversion of Control (Fowler 2004). Examples of Inversion of Control (IoC) containers include Spring for Java and the PHP Garden project. The idea is to inject objects or services into the components that need them, so that those components do not have to know how to acquire their dependencies. This, again, encourages good design, because it is very easy to switch between components that support the same interface. When it comes to testing, we can replace “real” objects with “fake” or “mock” objects that emulate the real service. For instance, testing against a real remote payment system may incur a charge, so it is useful to swap it for a fake local version during the test period.
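A minimal sketch of constructor-based dependency injection in Python, without an IoC container such as Spring: the objects are wired together by hand. `PaymentGateway`, `FakeGateway` and `MembershipService` are hypothetical names; the fake plays the role of the local, charge-free stand-in used during testing.

```python
from typing import List, Protocol


class PaymentGateway(Protocol):
    def charge(self, amount_cents: int) -> bool: ...


class FakeGateway:
    """Local stand-in for testing: records charges and costs nothing."""

    def __init__(self, succeed: bool = True):
        self.succeed = succeed
        self.charges: List[int] = []

    def charge(self, amount_cents: int) -> bool:
        self.charges.append(amount_cents)
        return self.succeed


class MembershipService:
    # The gateway is injected; the service never constructs or looks one up itself.
    def __init__(self, gateway: PaymentGateway):
        self.gateway = gateway

    def renew(self, member: str, fee_cents: int) -> str:
        if self.gateway.charge(fee_cents):
            return f"{member} renewed"
        return f"{member} payment failed"


# Hand wiring: in production, a real gateway object would be passed in here instead.
fake = FakeGateway()
service = MembershipService(fake)
```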

Museum Web Services

When we think about museums and the services they could offer, we immediately see a huge opportunity for reuse of software and service offerings. Many museum Web sites already offer services that are not formally exposed through a standard mechanism. Examples include the Portcullis service of the Parliamentary Archives (http://www.portcullis.parliament.uk) and the American Revolution site that provides access to its collection (http://amrevonline.org), among others. These Web sites are tightly coupled to their search and retrieval logic, but one could imagine a well-defined separation enabling many different uses of the same retrieval engine.

We already have one example of a service that museums can offer: search and retrieval of collection data. This leads us to the discussion of standards. If one museum, such as the New York Historical Society, implemented such a service, another museum might implement its own, different version. Users of these services would then need to write two different pieces of code that do the same thing. We propose the introduction of a well-documented standard for such services. Concerns that are not directly related to the task at hand, such as security and access control, also need to be addressed.
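The sketch below illustrates what such a standard might buy us. `CollectionSearch` is a hypothetical interface (no such standard exists yet), the `Record` fields borrow three Dublin Core element names, and both museum implementations and their records are invented. The point is that `federated_search` is written once and works against both.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    # Field names borrowed from Dublin Core elements.
    title: str
    creator: str
    date: str


class CollectionSearch(ABC):
    """Hypothetical standard interface that each museum's service would implement."""

    @abstractmethod
    def search(self, keyword: str) -> List[Record]:
        ...


class MuseumA(CollectionSearch):
    # Invented data, stored as tuples in this implementation.
    def __init__(self):
        self._rows = [("Silver teapot", "A. Smith", "1760"), ("Oak chair", "B. Jones", "1790")]

    def search(self, keyword):
        return [Record(*row) for row in self._rows if keyword.lower() in row[0].lower()]


class MuseumB(CollectionSearch):
    # A second, independent implementation with a different internal layout.
    def __init__(self):
        self._items = {"b-1": {"name": "Silver spoon", "maker": "Unknown", "year": "1800"}}

    def search(self, keyword):
        return [Record(i["name"], i["maker"], i["year"])
                for i in self._items.values() if keyword.lower() in i["name"].lower()]


def federated_search(services: List[CollectionSearch], keyword: str) -> List[str]:
    """Client code written once, against the standard, for any number of museums."""
    return [record.title for service in services for record in service.search(keyword)]
```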

In addition, there are many more services that can be offered to or utilized by museums. We see the functionality of our own open source on-line exhibition system, Nexhibition (http://www.nexhibition.com), as one such service. It allows a number of images from a collection to be collated into a database, from which an on-line exhibition can be created; users can also write narrative around the images. Our original design for this system was monolithic, as there seemed no need to do it any other way. Revisiting the design, we can see the benefits of a distinct separation of its constituent components. We have built a service for searching and retrieving images, a service for assembling images from our retrieval engine and other sources into a structured exhibition, and some workflow to aid the publication of exhibitions. Although these services exist, they are closely interwoven and cannot be separated. In our ideal Nexhibition, these modules could be composed and reused many times in different software products.

The steve.museum project (http://steve.museum) was built from the ground up with Web services in mind. Stakeholders in the project realized the potential, the need and the architectural significance of providing a Tagging Service. This allows the developers to respond more easily to changes in the way the software is used, and allows future development to fit more easily into the architecture. It has been assumed from the outset that the same steve engine will drive many tools, from tagging interfaces to search engines and analysis tools.
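As an illustration of one engine driving several tools, here is a hypothetical in-memory tagging service with a search tool and an analysis tool built on top of it. This is not the steve.museum API; the method names and the normalization rule (trim and lower-case each tag) are assumptions made for the sketch.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


class TaggingService:
    """Hypothetical tagging engine (not the steve.museum API)."""

    def __init__(self):
        self._tags: Dict[str, List[str]] = defaultdict(list)  # object id -> tags

    def add_tag(self, object_id: str, tag: str) -> None:
        self._tags[object_id].append(tag.strip().lower())  # assumed normalization

    def tags_for(self, object_id: str) -> List[str]:
        return list(self._tags.get(object_id, []))

    def objects_tagged(self, tag: str) -> List[str]:
        tag = tag.strip().lower()
        return sorted(oid for oid, tags in self._tags.items() if tag in tags)


# Two different tools driven by the same engine.
def search_tool(engine: TaggingService, term: str) -> List[str]:
    return engine.objects_tagged(term)


def analysis_tool(engine: TaggingService, object_id: str, n: int = 3) -> List[Tuple[str, int]]:
    """Most frequent tags for one object, e.g. to compare with catalogue terms."""
    return Counter(engine.tags_for(object_id)).most_common(n)
```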

The Pachyderm project (http://pachyderm.org) is an open source project for generating Flash-based exhibitions. Although the project does not expose any of its own code as services, it certainly makes use of a distributed, service-based architecture; in particular, its asset repositories are accessed through a distributed service interface. The compilation engine could be run on a separate machine, enabling a variety of tools to be used to author the content. It might also be possible to share data access and business logic among a number of presentation engines (for example HTML or Flash).

Other suggested services:

A shopping service – Museums could offer rights to sell digital images of items and provide search and shopping services for them.
Event Information – Events going on in museums could be queried and displayed on third party Web sites.
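A minimal sketch of the event information idea, assuming a JSON response format of our own invention (an `events` list with `title` and `date` fields) and invented sample events. The provider answers a date-range query; the consumer, such as a third-party site, turns the response into display lines. HTTP transport is left out.

```python
import json
from datetime import date
from typing import List

# Hypothetical event records held by the museum.
EVENTS = [
    {"title": "Gallery talk", "date": "2007-04-12"},
    {"title": "Family day", "date": "2007-05-05"},
    {"title": "Curator's tour", "date": "2007-06-20"},
]


def events_between(start: date, end: date) -> str:
    """Provider side: answer a date-range query with a JSON document."""
    matches = [e for e in EVENTS if start <= date.fromisoformat(e["date"]) <= end]
    return json.dumps({"events": matches})


def render_listing(payload: str) -> List[str]:
    """Consumer side (e.g. a third-party site): turn the JSON into display lines."""
    return [f'{e["date"]}: {e["title"]}' for e in json.loads(payload)["events"]]
```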

There are many more opportunities to expose functionality as a distinct service; we simply need to recognize them as they present themselves.

The Design of a Service Oriented, Loosely Coupled Online Exhibition Environment (Nexhibition Next Generation)

We have many ideas for the next generation of our open source product, Nexhibition. We want to reduce dependencies on specific software products and design an application that is flexible and re-usable. Whenever we see an opportunity to factor out a common service that is, or could be, used elsewhere, we will take it.

Fig 2: Services required to achieve Nexhibition's aims

Nexhibition is a system that allows different types of users to assemble assets from a number of collections in order to produce on-line exhibitions. We need a collection of objects and their associated images; we need a way to manage workflow as exhibitions are passed between users for approval and editing; we might also need tagging of the images we display. Finally, we need a service that assembles assets into exhibitions that can be displayed on-line. The five APIs required to support these services are depicted in Figure 2. We have not specified which collections management system (CMS) we are going to use, nor which workflow, tagging or exhibition manager implementation; we have simply stated that we need services supporting those APIs. Accessing CMS data through a common API means our new software is not dependent on a particular brand of CMS.
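The sketch below shows how an exhibition manager might be composed from some of these APIs (collection, workflow and tagging) using Python's `typing.Protocol`. Every interface, method name, workflow state and URL here is hypothetical; the point is that `ExhibitionManager` sees only the interfaces, so a different CMS adapter or workflow engine could be swapped in without changing it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


class CollectionAPI(Protocol):
    def image_url(self, object_id: str) -> str: ...


class WorkflowAPI(Protocol):
    def advance(self, state: str) -> str: ...


class TaggingAPI(Protocol):
    def tags_for(self, object_id: str) -> List[str]: ...


@dataclass
class Exhibition:
    title: str
    state: str = "draft"
    panels: List[Dict[str, object]] = field(default_factory=list)


class ExhibitionManager:
    """Composes the services; knows nothing about which CMS or workflow engine is behind them."""

    def __init__(self, collection: CollectionAPI, workflow: WorkflowAPI, tagging: TaggingAPI):
        self.collection, self.workflow, self.tagging = collection, workflow, tagging

    def add_object(self, exhibition: Exhibition, object_id: str, narrative: str) -> None:
        exhibition.panels.append({
            "image": self.collection.image_url(object_id),
            "tags": self.tagging.tags_for(object_id),
            "narrative": narrative,
        })

    def submit(self, exhibition: Exhibition) -> None:
        exhibition.state = self.workflow.advance(exhibition.state)


# Hypothetical in-memory implementations; any adapter with the same methods would do.
class DemoCMS:
    def image_url(self, object_id):
        return f"http://images.example.org/{object_id}.jpg"


class SimpleWorkflow:
    ORDER = ["draft", "review", "approved", "published"]

    def advance(self, state):
        i = self.ORDER.index(state)
        return self.ORDER[min(i + 1, len(self.ORDER) - 1)]


class NoTags:
    def tags_for(self, object_id):
        return []
```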

In the same way that data specifications are being standardized, as with Dublin Core (Dublin Core) and CDWA Lite (CDWA), we propose that a number of external service APIs also be standardized. Vendors, whether open source or proprietary, would then produce software that complies with the API specifications. This would allow us to add CMS resources and upgrade exhibition managers easily. This approach requires support from both the community and the vendors themselves. In the Request For Comments (Ellis et al. 2007), we examine options for establishing standards bodies to house and vet these specification documents.

Disadvantages and Coping Strategies

We have looked at the evolution of good software design and discussed how these ideas can be extended to a distributed service architecture. Whilst loose coupling makes software easier to maintain, a number of inherent problems arise when we move to a distributed model. When programming in a distributed environment, we must assume a level of uncertainty when contacting services over a network, and this uncertainty is greater on larger networks such as the Internet. We are now relying on a number of assumptions: that the Internet connection between us and the server is currently operating at a reasonable level of service; that the server itself is under a reasonable load; and that the server is able to serve our request in a timely manner. Our code needs to take account of these conditions and respond accordingly.
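One common way to cope with transient network failures is to retry with exponential backoff, as sketched below. The set of errors treated as transient, the attempt count and the delays are assumptions to be tuned per service; a timeout on each individual call would be set in the HTTP client (for example the `timeout` argument of `urllib.request.urlopen`). `FlakyService` is a hypothetical fake that fails a fixed number of times, which also makes failure conditions reproducible in tests.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Errors treated as transient network trouble (an assumption; tune per service).
TRANSIENT = (ConnectionError, TimeoutError)


def call_with_retries(call: Callable[[], T], attempts: int = 3,
                      base_delay: float = 0.5, sleep=time.sleep) -> T:
    """Retry a remote call on transient errors, doubling the wait each time."""
    for attempt in range(attempts):
        try:
            return call()
        except TRANSIENT:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller decide how to degrade
            sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be at least 1")


class FlakyService:
    """Fake remote service that fails a fixed number of times, so failures are reproducible."""

    def __init__(self, failures: int):
        self.failures = failures
        self.calls = 0

    def search(self) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise TimeoutError("no response within the time limit")
        return "results"
```

Passing `sleep` as a parameter lets tests record the delays instead of actually waiting.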

The service is now in the hands of a third party, and its clients depend on that third party's state. If the service is taken down for maintenance or suffers a security breach, its clients are directly affected. When designing software that uses external services, all of these concerns need to be considered and appropriate measures put in place. Some systems may not be able to function if an external dependency is down; others may be able simply to shut off that part of the system, or temporarily replace it with a similar service. When operating a service, good communication of events such as planned maintenance goes a long way towards its success.
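The fallback strategy described above can be sketched as trying equivalent providers in turn and, if all of them are down, hiding only the affected part of the page. The provider functions, the logger name and the HTML comment returned when nothing is available are hypothetical placeholders.

```python
import logging
from typing import Callable, List, Optional, Sequence

log = logging.getLogger("museum.services")  # hypothetical logger name

Provider = Callable[[str], List[str]]


def first_available(providers: Sequence[Provider], query: str) -> Optional[List[str]]:
    """Try equivalent providers in order; None means every one was unavailable."""
    for provider in providers:
        try:
            return provider(query)
        except (ConnectionError, TimeoutError) as exc:
            log.warning("%s unavailable: %s", provider.__name__, exc)
    return None


def related_images_panel(providers: Sequence[Provider], query: str) -> str:
    """Degrade gracefully: hide just this panel instead of failing the whole page."""
    results = first_available(providers, query)
    if results is None:
        return "<!-- related images temporarily unavailable -->"
    return ", ".join(results)


# Hypothetical providers for illustration.
def primary_images(query: str) -> List[str]:
    raise ConnectionError("down for maintenance")


def backup_images(query: str) -> List[str]:
    return [f"{query}-1.jpg", f"{query}-2.jpg"]
```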

Even if the service is always available and as efficient as possible, it might not be enough: depending on the service being offered, a Web site's latency requirements may be unachievable. It may simply be impossible to maintain a tolerable service level using a distributed, Internet-based service. If so, it may be possible to improve performance by introducing some caching on the client system.
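A minimal client-side cache with a time-to-live, assuming a hypothetical `fetch` function that calls the remote service. The clock is injectable so expiry can be tested without waiting. The trade-off is freshness: for up to `ttl` seconds, clients may see slightly stale data.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Client-side cache: reuse a remote answer for `ttl` seconds before asking again."""

    def __init__(self, fetch: Callable[[str], str], ttl: float,
                 clock: Callable[[], float] = time.monotonic):
        self.fetch = fetch
        self.ttl = ttl
        self.clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}  # key -> (time fetched, value)

    def get(self, key: str) -> str:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # fresh: no network round trip
        value = self.fetch(key)  # missing or stale: call the remote service
        self._store[key] = (now, value)
        return value
```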

Closing Remarks

We began by discussing some of the principles behind good software design. Loose coupling of software components, popularized by Gamma et al. (1995), has been a driving force. Extending loosely coupled components into a distributed (often Internet-based) environment led to technologies such as CORBA and DCOM, and later REST and SOAP.

Secondly, we addressed the museum Web sites and software we are aware of and have worked with, and began to think about them in a service-oriented way. We hope others will start to produce reusable software components that offer services to their peers. Indeed, we encourage readers to produce and run Web services as well as make use of them.

It isn't all plain sailing, and we have talked about some of the inherent problems faced when building any kind of distributed system, especially with Web services. Our advice is to keep it simple and assume that it will break at some point: indeed, failure conditions should be tested and reproducible.

If the community begins to move this way, some interesting economic possibilities open up, with brokers offering services and consumers paying to use them. If we, as a community, think about the mechanics of this now, vendors and other participants will be encouraged to commit more fully.

Acknowledgments

Thanks to Joshua Archer from the Pachyderm team for his insight into the project. Thanks to David Bearman for his suggestions and guidance with the paper.

References

AOP Alliance. http://aopalliance.sourceforge.net/

Box, D., D. Ehnebuske, G. Kakivaya, A. Layman, N. Mendelsohn, H. Nielsen, S. Thatte, and D. Winer (2000). "Simple Object Access Protocol (SOAP) 1.1". W3C Note, May 2000. http://www.w3.org/TR/2000/NOTE-SOAP-20000508

Cerami, Ethan (2002). Web Services Essentials: Distributed Applications with XML-RPC, SOAP, UDDI and WSDL. O'Reilly.

CDWA: http://www.getty.edu/research/conducting_research/standards/cdwa/index.html

Dublin Core: http://dublincore.org/

Ellis, David et al. (2007). The Museum Software Foundation Request for Comments. Circulated electronically.

Fowler, Martin (2004). Dependency Injection. http://www.martinfowler.com/articles/injection.html

Gamma, E., R. Helm, R. Johnson, and J. Vlissides (1995). Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, xv, 395 pp.

New York Historical Society – American Revolution: http://amrevonline.org

Nexhibition: The Nexhibition Project, http://www.nexhibition.com/

Pachyderm: The Pachyderm Project, http://www.pachyderm.org/

Open Collection: http://www.opencollection.org

O'Reilly, Tim (2005). What Is Web 2.0? Design Patterns and Business Models for the Next Generation of Software. http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html

Parliamentary Archives Search: http://www.portcullis.parliament.uk

REST: Representational State Transfer. http://en.wikipedia.org/wiki/REST

Mashup. http://en.wikipedia.org/wiki/Mashups

SOA: Service-oriented architecture. http://en.wikipedia.org/wiki/Service-oriented_architecture

Steve: Steve.museum. http://www.steve.museum

Cite as:

Ellis, D., et al., The Service Oriented Museum Web, in J. Trant and D. Bearman (eds.). Museums and the Web 2007: Proceedings, Toronto: Archives & Museum Informatics, published March 1, 2007 Consulted http://www.archimuse.com/mw2007/papers/ellis-d/ellis-d.html

Editorial Note