MW-photo
April 15-18, 2009
Indianapolis, Indiana, USA

Software as a Service and Open APIs

Paul Walk, UKOLN, University of Bath, UK

Abstract

What are the potential benefits which Open APIs and Software as a Service (SaaS) seek to provide? What about the associated risks in moving from an environment in which software is installed and managed either locally or by a hosting agency with formal contractual agreements, to an environment in which there may be no formal agreements, and in which the services may be hosted in different countries and governed by different legal frameworks? And at a time of global economic uncertainties, is it sensible to be seeking to make use of Open APIs and SaaS?

This paper attempts to outline some of the important characteristics of and issues surrounding what are becoming known as ‘cloud services’. A related workshop at Museums and the Web 2009 will explore strategies for exploiting the benefits of and managing the risks associated with these services.

Keywords: open APIs, SaaS, service infrastructure, cloud services

Introduction

While SaaS and Open APIs are not necessarily intrinsically connected, they have much in common and are two important aspects of what we might call cloud services. All of these newly fashionable concepts are currently subject to a degree of marketing hype, with service providers and pundits alike jockeying to introduce variations on these themes. Some examples include platform as a service (PaaS), data as a service (DaaS), infrastructure as a service (IaaS), cloud computing, cloud storage and so on. The extent of this proliferation can be seen in Peter Laird’s Cloudmap.  It is worth visiting his accompanying blog post (Laird 2008) for more information and links to all of the services mentioned on the map.

Fig 1: Peter Laird’s ‘Cloud Map’ http://saaslink.googlepages.com/Laird_CloudMap_Sept2008.png

Inevitably, this does involve a certain amount of re-selling of old ideas. Nonetheless, the increasing viability of the Web to offer a platform for service development and delivery cannot be ignored.

An important difference between the two is that the functionality offered by many SaaS services can be subject to frequent change. Although not true of all SaaS, the perpetual beta approach (which I examine in a later section) is increasingly used to deliver a steady stream of improvements and new features. This can be contrasted with APIs which, by their very nature, are not changed frequently or, indeed, lightly. In fact, as we shall see, there are systems which already depend on the availability and predictability of APIs on the Web.

Software As A Service

Software as a Service (SaaS) is not a new idea. The notion of the Application Programming Interface (API) is, arguably, even older. Yet these two terms have sprung into, if not the mainstream, then at least increasingly common and widespread usage. A third term, The Cloud, is frequently used in the same contexts and, as a concept, may serve to unite them.

In its simplest sense, SaaS describes a software delivery model. Rather than delivering the software in a packaged form for the user to install and run locally, the vendor deploys the software on a controlled platform, and the user accesses its functionality remotely via a network connection.  In the typical case the vendor deploys and maintains a single instance of the software which is accessed by many users, an architecture which is known as multitenancy.

In the currently fashionable sense of SaaS, some generalities are made more specific. For example, the network is the Internet and, in the great majority of cases, the Web in particular. The contemporary usage of SaaS frequently, but certainly not always, implies that the user is armed with nothing more than a standard desktop Web browser, although in the last year in particular this assumption has been challenged by growing interest in the rich Internet application (RIA) and by the new generation of rich mobile software found on devices such as Apple’s iPhone and phones running Google’s Android operating system.

Some notable examples of contemporary SaaS providers include SalesForce.com (http://www.salesforce.com), which offers a comprehensive suite of ‘customer-relationship-management’ tools, and Google, which offers a set of collaborative productivity tools called Google Docs (http://docs.google.com/).

Application Programming Interfaces And The ‘Open API’

The concept of the API has its origins in the development of modular programming environments and languages. Modularity of programming environments has allowed developers to exploit the concept of encapsulation, where a computer program is divided into self-contained modules each handling a discrete function, or set of functions. Such modules are often described with the black box metaphor, indicating that they can operate while being opaque to the system which uses them. The essential characteristic of such modules is that they explicitly define their allowed inputs and their possible outputs. If you understand what kind of data (if any) you can feed into the module, and what kind of data (if any) you expect to get back, then you should be able to use the module with confidence, without needing to understand how it works internally. This opaque boundary between the internal workings of the module and the system or agent which seeks to use it, together with any mechanisms or documentation which aid the understanding and exploitation of the module, is what constitutes the application programming interface or API. As this approach to software development has matured, the use of encapsulation and well-described, predictable interfaces has been extended to the development of components distributed across a network, and hence to the concept of the Service Oriented Architecture (SOA), which describes how services can be invoked and exploited via documented interfaces.
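
A minimal sketch of this principle in Python (the module and function names are invented for illustration): the only contract between the module and its caller is the declared inputs and outputs, while the implementation behind them remains a black box.

# date_tools.py -- a hypothetical 'black box' module.
# The API is the function signature and its documented inputs and
# outputs; the implementation behind it can change without affecting
# any caller.

import re

def normalise_year(text: str) -> int:
    """Extract a four-digit year from a free-text date.

    Input:  a free-text date string, e.g. 'circa 1887' or '12 May 1901'.
    Output: the year as an integer; raises ValueError if none is found.
    """
    # Internal details (a simple regular expression here) are hidden from
    # the caller and could be replaced by a far more sophisticated parser
    # without changing the interface.
    match = re.search(r"\b(\d{4})\b", text)
    if not match:
        raise ValueError(f"no year found in {text!r}")
    return int(match.group(1))

# A caller uses the module purely through its declared interface:
if __name__ == "__main__":
    print(normalise_year("circa 1887"))  # 1887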

In recent years the Web has come to be recognised more as a platform upon which applications can be developed, driving much of the development behind the ‘Web 2.0’ phenomenon. From a traditional programming point of view the Web, when compared to a local operating system, is a relatively limited platform for developing software, yet its ubiquity, scale, proven resilience and openness are very attractive features. The notion of the API has been adopted as an integral part of the Web 2.0 paradigm, underpinning especially the fashion for a type of point-to-point service integration known as the mashup. Because the Web essentially constrains much of the interaction between servers, clients and users through its reliance on open standards and protocols, a large part of what might be considered part of the API is pre-determined. To give the obvious example, the great majority of services which expose an API on the Web will expect it to be accessed via the HTTP protocol. The maturity and ubiquity of HTTP allows the average Web API to be primarily concerned with the naming of resources or functions, and the format and structure of data. In the world of Web 2.0 mashups, the great majority of APIs give access to data-centric services. It would appear, on the face of it, that there is a resonance with museums, which typically have structured data, or metadata, which is of interest to others.
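
As a sketch of this pattern (the endpoint, parameters and response fields are hypothetical, not a real museum service), a typical Web API interaction amounts to little more than an HTTP GET against a named resource, returning structured data such as JSON:

import json
import urllib.parse
import urllib.request

# Hypothetical collections API: the resource is named by the URL path,
# and query parameters select and filter the data.
BASE_URL = "https://museum.example.org/api/objects"

params = urllib.parse.urlencode({"q": "coin", "format": "json"})
with urllib.request.urlopen(f"{BASE_URL}?{params}") as response:
    data = json.load(response)  # structured data, ready for re-use elsewhere

for record in data.get("results", []):
    print(record["id"], record["title"])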

The term open API, while gaining some currency, might be considered a fairly poor one. APIs are, essentially, open by definition – we might better call them ‘Web APIs’. However, the use of ‘open’ in the Web context emphasises the point that the Web service in question offers some functionality which can be directly exploited by remote, third-party software. In most cases it implies that the data entities underlying the Web site can be accessed as structured data suitable for use in another system.

Some good examples of Open APIs of this sort on the Web include the API to Flickr (http://www.flickr.com/services/api/) and that of Twitter (http://apiwiki.twitter.com). Both of these are popular Web 2.0 services, each with a well documented API which is freely available for non-commercial use.

The Case For Using SaaS

Core and chore services

Harrison (2008) used the terms core and chore to differentiate between types of IT services managed within an organisation – a university in his case. Core activities or services are those which are part of the business focus of the organisation. Core services are where the organisation puts most of its creative energy and are often related to those areas in which the organisation competes with others. In a museum context, IT services related to the curation of collections, marketing, and event management might be examples of core services. Chore services, on the other hand, are those which the organisation is simply obliged to provide.  Many examples of this kind of IT service are common to most organisations; examples include payroll and finance systems, Web hosting, and the provision of ‘office’ tools to staff. Generally, the organisation’s priority in the provision of chore services is to seek to improve efficiency and reduce cost. In terms of budget and resource allocation, the organisation typically wants to invest more in core services, and correspondingly less in chore services.

Outsourcing IT, and paying for what you use

The SaaS model of software deployment and service delivery introduces a new opportunity for the organisation to outsource chore services. SaaS and chore services might fit naturally together because the service provider gains from delivering services to the widest possible user-base, while the service consumer is happier to outsource those services which are not its business focus.

The removal of the need for users to deploy, secure, upgrade and maintain complex systems on their own infrastructure can be a significant benefit. Beyond this, the SaaS model may put powerful and ‘expensive’ software within reach of cash-strapped organisations which might otherwise not be able to afford such software in its more traditional packaged form. A significant difference between purchasing ‘off-the-shelf’ software and SaaS is that the latter introduces the possibility of a ‘pay-as-you-go’ model of payment: essentially, software can be leased instead of purchased outright. In terms of budget planning, the reduced need for up-front investment in infrastructure (servers and more powerful PCs), coupled with the absence of a single large payment for software, means that the cost of some IT services is no longer classed as capital expenditure but becomes part of the organisation’s operational costs. This can simplify financial planning for the organisation.

Perpetual beta

The concept of the ‘perpetual beta’ level of service has become almost synonymous with Web 2.0 services, and is an aspect of much of contemporary SaaS. With this usage, the meaning of ‘beta’ has shifted slightly. When applied to traditional packaged software it means that the software is unfinished and still being developed – that it still contains bugs but that it is stable enough to use to some extent. Such software would tend to be released, at no cost, to a limited group of users with the expectation that they will find and report problems with it to the supplier. The use of beta in the context of Web-based SaaS implies that the software is subject to continuous change and improvement. A certain degree of stability is expected and implied, but the service provider might reserve the right to add or remove features.

The ability of SaaS to operate in a perpetual beta mode has ushered in a new model of software upgrade. Rather than the periodic deployment of patches and new versions of software – occasioning an often disruptive process of upgrading local systems – the SaaS model allows for the software to be updated with little or no disruption to the user. This also means that updates can be applied more frequently. If a problem is uncovered in the software, it can be fixed in situ – and all users gain the benefit of this fix.

Infrastructure as a Service

Many of what we might characterise as chore services in IT provision relate to infrastructure, including such services as data storage and Web service hosting. In the last 2-3 years a new form of SaaS has emerged offering infrastructural, rather than user-facing, services, which are becoming known as IaaS. Probably the most prominent vendor of IaaS and one of the first to offer such services is Amazon. Amazon offers a number of infrastructure services - prominent among these are its remote data storage service, S3, and its remote server deployment service, EC2. The S3 service allows the user to store and make available data on Amazon’s infrastructure. For example, a museum can elect to store digital photographs of its collections on this service. These are then made available in a standard way through the Web. The museum pays for this service in a very simple variable cost model, paying only for the storage space and the bandwidth it uses. If it needs more space for a new collection, this can be made available immediately. Amazon provides a highly resilient service, and additionally offers options for backing up data.
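
As a sketch of this arrangement (the bucket and file names are invented, and the boto3 Python SDK is assumed here rather than taken from Amazon's own documentation), storing a collection photograph on S3 and making it publicly readable takes only a few lines:

import boto3  # AWS SDK for Python, assumed to be installed and configured

s3 = boto3.client("s3")

# Hypothetical bucket and object names.
BUCKET = "museum-collection-images"
s3.upload_file(
    Filename="photos/roman_coin_obverse.jpg",
    Bucket=BUCKET,
    Key="roman/roman_coin_obverse.jpg",
    ExtraArgs={"ACL": "public-read"},  # serve the image directly over the Web
)

# The photograph is then addressable at a standard URL, and the museum
# pays only for the storage space and bandwidth actually used.
print(f"https://{BUCKET}.s3.amazonaws.com/roman/roman_coin_obverse.jpg")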

While the procurement of storage technology is not especially expensive at the outset, its maintenance can be costly, and it will need frequent replacement as well as, inevitably, expansion. With the use of a service such as S3, this upgrading becomes significantly easier to deal with as much of the management burden is adopted by the service provider. S3 has proven its reliability as an infrastructure service and now underpins a number of very popular Web 2.0 services.

With the subsequent introduction of its EC2 service, Amazon rounded out its offering by providing the platform to deploy software as well as data. EC2 allows the user to forsake local hardware (especially servers), and to deploy software on one or more virtual servers running on Amazon’s infrastructure. There are a number of immediate potential benefits to this arrangement. The payment model for this service is very simple and linear according to use, so what can be a considerable capital expenditure (buying, configuring, maintaining and upgrading servers) can become a simple, recurring, operational cost. When more capacity is needed, the service can be scaled out (adding more virtual servers) easily and rapidly. Similarly, capacity can be reduced, perhaps when a service is retired, resulting in an immediate cost-reduction. EC2 is built upon S3, so the two services complement each other well.
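
Scaling out in this way is an API call rather than a procurement exercise. A sketch, again assuming the boto3 SDK (the machine image ID and instance type are illustrative, not drawn from the paper):

import boto3  # AWS SDK for Python, assumed available

ec2 = boto3.client("ec2")

# Launch two additional virtual servers from an existing machine image
# (a hypothetical image of the museum's Web application).
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",
    InstanceType="t3.small",
    MinCount=2,
    MaxCount=2,
)

for instance in response["Instances"]:
    print("started", instance["InstanceId"])

# When demand falls, the same instances can be terminated and billing
# stops with them, e.g. ec2.terminate_instances(InstanceIds=[...]).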

In the UK, museums are often obliged to rely on the provision of IT services and resources by local government authorities. This level of infrastructure is rarely large enough to exploit the sorts of economies of scale available to Amazon, and it lacks the levels of control and flexibility which come from dedicated, local deployment. The potential benefits to be realised by utilising infrastructure ‘in the cloud’ will undoubtedly become increasingly interesting and attractive to museum management.

The Case for Providing Open APIs

“The coolest thing to do with your data will be thought of by someone else.”

This statement, attributed to Rufus Pollock (Walk 2008/07/23), is a provocative challenge to the established approach to data management. Until recently the effort involved in making data available to interested third-parties could be considerable: unless the raw data was packaged up in some way and delivered, it would have to be processed and delivered through an application. In this sense, even a simple Web site is an application constituting some simple functionality (navigation, search etc.) built upon data. Without the expectation of some sort of return on this investment, the cost can be prohibitive. Applications, like Web sites, must be designed according to specific requirements or their value is considerably reduced; this means that those requirements need to be understood in advance. Different users may have different requirements and so a trade-off between general usefulness and specific utility is inevitable.

However, recent interest in making data available on the Web in a structured and open way introduces the possibility of a new way of thinking. Rather than trying to predict every possible use of your data, and building, or more likely failing to build, applications and services to facilitate this, an alternative approach is to accept that it is entirely possible some other party might find an interesting use for your data – even one which has never occurred to you. In terms of technical difficulty, the barrier to exposing data on the Web in such a way that it can be used by others is being lowered steadily. The widespread adoption of a few standard technologies, such as XML and JSON, for encoding and exchanging data, together with the ubiquitous HTTP, has played an important role in accelerating the development of software to process data described in this fashion. The mashup phenomenon is largely founded on these three technologies.
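
A minimal mashup sketch along these lines (both feeds and their fields are hypothetical): two JSON sources fetched over HTTP and joined into a single view, which is essentially all that many Web 2.0 mashups do.

import json
import urllib.request

def fetch_json(url: str):
    """Fetch a URL and parse the response body as JSON."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)

# Hypothetical feeds: one museum's object records, and a second service
# providing photographs tagged with the same object identifiers.
objects = fetch_json("https://museum.example.org/api/objects?format=json")
photos = fetch_json("https://photos.example.com/api/tagged?tag=museum-objects")

photos_by_object = {p["object_id"]: p["url"] for p in photos["results"]}

# The 'mashup': each collection record enriched with an image held elsewhere.
for obj in objects["results"]:
    print(obj["title"], photos_by_object.get(obj["id"], "no image found"))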

A recent attempt to accelerate the development of this approach in the museums sector can be seen in hoard.it (http://feeds.boxuk.com/museums/), a Web service which extracts collection data from museums’ Web sites using a process known as ‘screen-scraping’. Screen-scraping is an approach which attempts to extract semantics from (often loosely) structured Web pages. Crucially, the use of such a technique implies no commitment on the part of those managing the source Web page to maintain it in such a way that the screen-scraping approach will continue to work in the future. While screen-scraping can be effective, it is also time-consuming to develop and brittle, in the sense that it can be broken by a simple change to the source Web page.
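
To make that brittleness concrete, here is a small screen-scraping sketch (the page, its markup and the CSS class names are invented): it works only for as long as the markup happens to keep the shape the scraper expects.

import urllib.request
from bs4 import BeautifulSoup  # third-party HTML parser, assumed available

# Fetch a (hypothetical) collections page and pull object titles out of the
# markup. Nothing here is a published contract: if the museum renames the
# CSS class or restructures the table, the scraper silently breaks.
html = urllib.request.urlopen("https://museum.example.org/collections").read()
soup = BeautifulSoup(html, "html.parser")

titles = [cell.get_text(strip=True) for cell in soup.select("td.object-title")]
for title in titles:
    print(title)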

There is, therefore, a case to be made for designing Web services around APIs, with a view to exposing those APIs so that others may build more services. Notwithstanding the local value of designing a system using the principles of encapsulation afforded by APIs, there is also the potential benefit arising from a third-party’s use of these APIs. Another, more business-oriented way of looking at it is to view this approach as a ‘loss leader’ – an investment which creates the conditions for possible future return on that investment. For a museum, that ‘return’ might be a partnership with another cultural-heritage organisation, or an exciting combination of its collections data with some other data-source, creating value for all. It might simply be more exposure for its data, increasing awareness of the value of the museum’s collections. The creators of hoard.it express this idea thus:

The aim is that by syndicating your content out in a re-usable manner, whilst still retaining information about its source, an increasing number of third-party applications can be built on this data, each addressing specific user needs. As these applications become widely used, they drive traffic to your site that you otherwise wouldn't have received: "Not everyone who should be looking at collections data knows that they should be looking at collections data". (Ellis and Zambonini 2008)

Risks and Issues

Neither a move to reliance on SaaS solutions nor the opening up of data through Web APIs is without risks. In both cases, most issues arise from a real, or perceived, loss of immediate control.

Increased dependence on the software supplier

When purchasing software of any kind, it makes sense to make some appraisal of the supplier, as software needs to be supported. Such dependence on a supplier carries a risk. For many years, one approach to mitigating this risk has been to use open-source software (OSS). Especially when the software is backed by a healthy development community, the risk of sudden loss of support for OSS is considerably less than it is with closed, proprietary software. However, a move to the SaaS model generally means a step back to using closed, proprietary software. It is interesting to note that, as reported by Johnson (2008), GNU creator and Free Software Foundation founder Richard Stallman considers ‘cloud computing’ to be “simply a trap aimed at forcing more people to buy into locked, proprietary systems that would cost them more and more over time”.

With SaaS, there is a real risk of the software being suddenly withdrawn. This can happen for two reasons: the supplier decides to stop providing the software, or the supplier itself stops trading. While the size of the supplier can be a factor in assessing its stability as a business, it appears to be no indicator of its likelihood to maintain particular services. Google is clearly a very large company, yet it has recently announced the end of a handful of its services, including Jaiku, Google Video and Google Notebook (Needleman, 2009), as it rationalises its portfolio. Customers of these services will be given a period of notice in which to consider their options and, if possible, extract their data so that they can import it into an alternative system. Such a migration of data, assuming it is even possible, can be costly and time-consuming.

Service level agreements

The service level agreement (SLA) is an arrangement which gives reassurance about the level of service being offered, and a route for redress should this level not be met. SLAs are frequently encountered in the case of remote Web-hosting, with hosting companies offering quantified assurances of ‘99.9%’ availability of your data. Such SLAs can be reasonably effective in providing a mechanism for risk management when using remote services. However, in the case of data hosting, and especially archiving, such quantified guarantees are less effective. After all, who would be reassured to be told that “we guarantee not to lose 99.9% of your data – we might lose the odd byte or two but we’ll compensate you for that if it happens”?
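
To put those percentages in perspective, a quick calculation (a sketch, not drawn from any particular SLA) shows how much annual downtime each commonly quoted availability figure actually permits:

HOURS_PER_YEAR = 365 * 24  # 8,760 hours

for availability in (0.99, 0.999, 0.9999):
    downtime_hours = HOURS_PER_YEAR * (1 - availability)
    print(f"{availability:.2%} availability permits about "
          f"{downtime_hours:.1f} hours of downtime per year")

# 99.00% -> ~87.6 hours; 99.90% -> ~8.8 hours; 99.99% -> ~0.9 hours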

Data privacy

In cases where data is not ‘public’ or open access, remote hosting introduces risks associated with entrusting sensitive data to third parties. Aside from the question of the degree to which a consumer can trust an SaaS provider, the variation in international law around data privacy is an important factor. As a specific example, in the United States the Patriot Act gives the US Government the right, under certain conditions, to access data held by private businesses. If an organisation based in the UK uses a remote data storage service provided in the US, then its data is subject to the Patriot Act. Recognising that there are genuine concerns about this, Amazon, a US company, has opened data-centres in other countries, including some in the European Union. Customers of the S3 data-storage service can now choose where, in a geographical sense, they want their data to be stored – effectively selecting the jurisdiction which will apply to it.
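
Choosing that jurisdiction is, in practice, just a parameter supplied when a bucket is created. A sketch, again assuming the boto3 SDK (the bucket name and region are illustrative):

import boto3  # AWS SDK for Python, assumed available

s3 = boto3.client("s3", region_name="eu-west-1")

# Creating the bucket in an EU region keeps the stored objects under that
# region's legal jurisdiction rather than that of the US.
s3.create_bucket(
    Bucket="museum-collection-images-eu",
    CreateBucketConfiguration={"LocationConstraint": "eu-west-1"},
)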

Distributed services – chaining responsibilities

Lamport (1987) said, “A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.”

Bearing in mind that this statement was made several years before the Web was even established, Steve (2006), in a post about Google’s sudden removal of its SOAP API to its search engine, proposed updating it to, “You know you have a distributed system, when a company you didn't know you had a relationship [with] changes their business plan and your application stops working”.

As we have identified, there are risks associated with even the relatively simple SaaS relationship of consumer and service-supplier. However, as the viability of using remote services and data is increasingly recognised, we can expect to see such relationships become more complex and, crucially, to involve more parties. If it is not a trivial task to establish a relationship of trust between supplier and consumer, how difficult will it become if the supplier is also a consumer of a second supplier, and so on? Those percentage-based SLAs start to become very difficult to calculate once there is a chain of dependency. There are signs of this issue already starting to affect services on the Web. On the 15th February, 2008, Amazon’s various infrastructure services suffered major problems resulting in them being off-line for a period. One service which uses S3 is Twitter, a Web-based communications tool growing in popularity.  Twitter was therefore adversely affected by the problems with S3. However, there is already an ecosystem of tools and services which add value to the Twitter service. Stone (2007) claims that the Twitter API carries an order of magnitude more traffic than Twitter’s own Web site. So a very significant number of people, using a multitude of mainly unrelated tools and services, were affected by the problems with S3.

It is an article of faith that all software systems will fail at some point. No one system should be singled out just because it has not maintained perfect reliability. But an increasingly distributed system like this clearly requires new models of risk assessment and management.

Being open – managing usage

An obvious risk with providing open APIs to data services is the possibility of increased demand for those services. Predicting future demand, with a view to planning capital investment in new hardware, for example, is difficult if a strategy of ‘open up and see what happens’ is adopted. The supplier of data through open APIs also needs to know something about who is using the data and for what purpose: in the case of a museum, it may well need to justify the expense of maintaining such data services. A popular approach to managing this issue with Web 2.0 services is to issue an API ‘key’ to prospective users. This has the effect of establishing a more explicit relationship between supplier and consumer, and allows the supplier to manage usage and load on the system. In the event of some sort of abuse of the service, an API key can be rescinded. Strategies for throttling the bandwidth usage of APIs are also common.
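
A minimal sketch of the idea (in-memory and deliberately simplified; a real service would persist its keys and counters): the API key both identifies the consumer and gives the supplier a lever for throttling or revocation.

import time
from collections import defaultdict, deque

# Keys issued to known consumers; revoking a key is simply removing it.
ISSUED_KEYS = {"k-museum-mashup-01": "Example mashup project"}

REQUESTS_PER_MINUTE = 60
_recent_calls = defaultdict(deque)  # api_key -> timestamps of recent requests

def authorise(api_key: str) -> bool:
    """Allow a request only for a known key that is within its rate limit."""
    if api_key not in ISSUED_KEYS:
        return False  # unknown or revoked key
    now = time.time()
    calls = _recent_calls[api_key]
    while calls and now - calls[0] > 60:
        calls.popleft()  # discard calls older than the one-minute window
    if len(calls) >= REQUESTS_PER_MINUTE:
        return False  # throttled: too many requests this minute
    calls.append(now)
    return True

# The supplier can also see who is using the data and how heavily.
print(authorise("k-museum-mashup-01"))  # True
print(authorise("k-unknown"))           # False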

Recession

While we have already briefly covered the risk of reliance on a supplier (and the suppliers it depends on too) for services, we should note that the risk of suppliers disappearing suddenly is increased in a time of global economic recession. In August 2008, just as the current recession was starting to look like a certainty, I suggested that:

...the vast majority of Web 2.0 companies are a fraction of the size of Google. As it is, many Web 2.0 services appear to exist with no visible means of support, other than venture capital. I imagine that venture capital can become harder to find in a period of economic down-turn. Much Web 2.0 service delivery is supported through an advertising model, relying on a revenue stream coming from a small percentage of advertisements ‘clicked’ on. Again, perhaps people are less likely to respond to advertisements in a recession….? (Walk 2008/08/17)

Recession will affect museums like any other organisation or business. It may be that budgetary pressures force the issue – that such risks are worth taking for the potential costs saved. This in turn may help to preserve certain SaaS suppliers from business failure. Massie (2009) suggests that “SaaS vendors cannot be considered ‘recession-proof’.  Perhaps ‘recession-resistant’ would be more accurate”, pointing out that SaaS suppliers are vulnerable to a sudden loss of customers in a way that packaged software vendors are not.

Why This Is Only the Beginning

Clearly, over the last several years there has been an explosion of Web-based functionality, loosely characterised as Web 2.0. Initially operating as a veneer of dedicated user interfaces, Web 2.0 has increasingly come to depend on open APIs. These APIs are allowing a convergence between the Web, its users and the many different devices they might use. A quick check on Apple’s iTunes application store reveals more than thirty iPhone applications which make some use of the Twitter API.

The new Web-based SaaS model needs time to mature, and it will be fascinating to observe how SaaS and Web APIs develop over the next few years. If these models survive the expected economic climate, it may well prove to be the crucible from which they emerge thoroughly tested and stronger than before.

We have been in a similar situation before:

Once we got past the recession at the end of the dot-com bubble in the first years of this century, the notion of an open-source operating system had reached a level of sufficient maturity for it to enter the mainstream. Web 2.0 services and SaaS as a viable, mainstream approach will likely reach similar levels of maturity in time. (Walk 2008/08/17)

References

Ellis, M. and D. Zambonini (2008). FAQ for Hoardit. Consulted 2009/01/30.  Available http://hoardit.pbwiki.com/Frequently+Asked+Questions+(FAQs)

Johnson, B. (2008). Cloud computing is a trap, warns GNU founder Richard Stallman. Consulted 2009/01/30.  Available http://www.guardian.co.uk/technology/2008/sep/29/cloud.computing.richard.stallman

Laird, P. (2008). Visual Map of the Cloud Computing/SaaS/PaaS Markets: September 2008 Update. Consulted 2009/01/30.  Available http://peterlaird.blogspot.com/2008/09/visual-map-of-cloud-computingsaaspaas.html

Lamport, L. (1987). Email. Consulted 2008/01/30.  Available http://research.microsoft.com/en-us/um/people/lamport/pubs/distributed-system.txt

Massie, P. (2009). Is SaaS Recession-Proof? Consulted 2009/01/30.  Available http://www.glgroup.com/News/Is-SaaS-Recession-Proof--31558.html

Miller, P. (2008). XTech Day 3 – Rufus Pollock and Jo Walsh talk about ‘Atomisation and Open Data’. Nodalities blog. Consulted 2009/01/30.  Available http://blogs.talis.com/nodalities/2007/05/xtech_day_3_rufus_pollock_and_.php

Needleman, R. (2009). Google killing Jaiku, Dodgeball, Notebook, other projects. Consulted 2009/01/20.  Available http://news.cnet.com/8301-17939_109-10143245-2.html?tag=mncol;txt

Steve (2006). No model, just view. Consulted 2009/01/20.  Available http://www.1060.org/blogxter/entry?publicid=303B91C59A56BB10798BB9739CE80131

Stone, B. (2007). Interview with Biz Stone, Co-Founder Twitter. Consulted 2009/01/20.  Available http://readwritetalk.com/2007/09/05/biz-stone-co-founder-twitter/

Walk, P. (2008/07/23). “The coolest thing to do with your data will be thought of by someone else”. Paul Walk’s Blog. Consulted 2008/01/30.  Available http://blog.paulwalk.net/2007/07/23/“the-coolest-thing-to-do-with-your-data-will-be-thought-of-by-someone-else”/

Walk, P. (2008/08/17). “Did Google just make me look like an idiot?”. Paul Walk’s Blog. Consulted 2009/01/30.  Available http://blog.paulwalk.net/2008/08/17/did-google-just-make-me-look-like-an-idiot/

Cite as:

Walk, P., Software as a Service and Open APIs. In J. Trant and D. Bearman (eds). Museums and the Web 2009: Proceedings. Toronto: Archives & Museum Informatics. Published March 31, 2009. Consulted http://www.archimuse.com/mw2009/papers/walk/walk.html