April 15-18, 2009
Indianapolis, Indiana, USA

Museums and Cloud Computing: Ready for Primetime, or Just Vaporware?

Charles Moad, Edward Bachta, and Rob Stein, Indianapolis Museum of Art, USA

Abstract

The promise of distributed computing has long been touted by computer scientists working on Grids or in Clouds, but is the realization of these goals finally at hand? Recently, several commercial offerings have generated a lot of attention for their potential to revolutionize the way business computing is conducted. The field has already seen several interesting and innovative business models spring up with their core computing process firmly anchored in the cloud (Horrigan, 2008). Does the same hold true for museums? What are the benefits and risks associated with moving our institutional computing to the cloud? Is cloud computing a viable option for hosting the rich media and content common to many museums today, or is it still vaporware in need of more time? This paper discusses these questions and offers suggestions on how museums can begin to utilize cloud computing services.

Keywords: cloud computing, Web services, distributed computing, streaming video, ArtBabble

Introduction

Cloud computing is a natural progression in the utility of computing. Early computers required many users to share a single console. The advent of personal computing brought that convenience into our homes. More recently, the Internet has changed the ways we connect to information and to each other. Museums provide access to their collections and programs by hosting Web sites and applications for the public to use. Many museums manage these servers themselves or through vendors, which can prove expensive and time-consuming. Cloud computing now allows us to offload this burden, which in turn can save museums time, energy, and money. Though many on-line users may not realize it, over half of them are already using cloud resources in some shape or form (Horrigan, 2008).

Academics have been touting the promise of distributed computing models for many years. The complexity and unreliability of these grid computing models prevented content providers from hosting production-quality content on them. In recent years, commercial vendors have appeared with computing services that offer service level agreements and guaranteed uptime. In addition to the reliability needed for production systems, cloud computing services now also offer simple interfaces for managing and administering these systems. These commercial solutions also give museums far cheaper alternatives for hosting their software systems (MacManus, 2009). One might wonder why this is the case. Perhaps we can think of Cloud Computing like other utilities we already pay for, such as natural gas or electricity: it is far cheaper to pay only for what you use and offload the maintenance and reliability issues to the gas or electric company. Likewise, many smaller museums do not employ the expertise required to run a server room, nor do they have the budget to maintain it. Cloud computing offers a chance for even the smallest of museums to take advantage of world-class computing hardware and IT management infrastructures, while paying for only the resources they actually use.

Cloud Computing Survey

Cloud computing offerings can be loosely assigned to two categories: Software as a Service (SaaS) and Platform as a Service (PaaS). SaaS systems require users to deploy software according to a particular interface in order to host applications. PaaS systems typically allow access to virtual hardware devices that do not restrict the software toolsets that can be used. Both approaches come with distinct advantages and disadvantages. By providing API-only access, SaaS implementations can provide features for the developer to build upon, scalability being among the most important. Allowing the Cloud to control where your data is stored and where your applications are hosted via a SaaS system removes a great amount of workload and complexity for the developer. These benefits do not come without a price, however. SaaS systems can affect the portability of your application, making it difficult to migrate between system offerings. Having your application tied to a specific API means the cost of moving it to another service becomes quite large. PaaS systems do not suffer from this portability issue because they offer hardware infrastructure as opposed to APIs. PaaS systems give developers and system administrators complete control over the devices which house their applications. Moving your application from one service provider to another might be as simple as copying your virtual machine image between services.

Google App Engine (http://code.google.com/appengine) is a primary example of a SaaS system. It offers developers API access to data storage, user authentication, caching, and more. Applications are written using Python, the preferred programming language of the system, while the cloud takes care of scaling and load balancing issues for you. This easy scalability comes at the price of flexibility. Google’s App Engine restricts the developer to the execution of pure Python applications. Developers cannot access local files or sockets, and user software can only run in response to a Web request for a few seconds. These restrictions prohibit any form of scheduled processing or large batch operations. Given its current offering, Google claims that developers can serve roughly 5 million page views per month for free. While the price is right, portability becomes very difficult. If Google decides App Engine is no longer viable for its business model, moving to a different service would require a complete rewrite of all application software.

Azure Services Platform (http://www.microsoft.com/azure/) is Microsoft’s offering in the cloud computing space. It is a collection of services that aim to ease the transition to the cloud for developers by using toolsets they are already familiar with. Using developer tools such as the .NET library and Visual Studio, users can utilize on-demand computing and storage services. This hybrid approach allows developers to host applications in house, while off-loading some resources as cloud services. In addition to these resource offerings, Microsoft adds value to these services by providing hosted solutions for Exchange, Office, and SharePoint.

Amazon Web Services (AWS – http://aws.amazon.com) is the premier cloud computing platform. It was launched in 2002 and offers the use of the same global data centers that Amazon uses to host its own Web site. AWS offers everything from traditional infrastructure services such as computing and storage to less traditional services such as credit card transaction services, a content delivery network, and human intelligence processing. All of these services are accessed through the Web using SOAP or REST protocols.

The infrastructure of Amazon’s Web Services consists primarily of the Simple Storage Service (S3), an unlimited data storage facility; CloudFront, a content delivery network for data stored on S3; and the Elastic Compute Cloud (EC2), a scalable virtual machine hosting platform. In addition to these, there is a database service called SimpleDB, and a primitive queue messaging service called the Simple Queue Service (SQS).
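As a rough illustration of the REST style of access mentioned above, a publicly readable S3 object can be retrieved with an ordinary HTTP GET. The Python sketch below only builds the object URL, using the form in which the bucket name is a subdomain of s3.amazonaws.com. The bucket and key names are hypothetical placeholders, and private objects would additionally require signed requests, which are not shown.

```python
from urllib.parse import quote

S3_ENDPOINT = "s3.amazonaws.com"


def public_object_url(bucket: str, key: str) -> str:
    """Build the URL of a publicly readable S3 object."""
    # Percent-encode the key but keep "/" so folder-like prefixes survive.
    return f"https://{bucket}.{S3_ENDPOINT}/{quote(key, safe='/')}"


# Hypothetical bucket and key, for illustration only.
url = public_object_url("example-museum-media", "videos/gallery tour.mp4")
print(url)
```

Fetching the object is then a matter of handing the URL to any HTTP client, for example `urllib.request.urlopen(url)`.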

Real-world companies have been using Amazon’s Web Services for quite some time to reduce costs and routine maintenance. The Web software development firm 37signals uses S3 to store well over 1 terabyte of data (Amazon, 2009) for their popular project management system called Basecamp. The Indianapolis Motor Speedway uses S3 as well to house a century’s worth of digital images (Amazon, 2009). In addition, they use EC2 to host their Web sites, stream live events, and drive live scoring applications. Paying for only what you use is vital to the Indianapolis Motor Speedway since they see extremely large traffic spikes right before and during an event, but otherwise have relatively moderate Web traffic.

Cloud computing offers advantages and disadvantages for museums. The Cloud’s pay-as-you-go model and the embedded costs for energy and maintenance of these systems make price a clear advantage. Scalability and ease of deployment are perhaps the most enticing benefits of using Cloud Computing services. If a physical server were to run out of disk space at your museum, it could take days to get new drives shipped. In the cloud, virtually infinite amounts of disk space are available at a moment’s notice. Server machines local to your museum frequently feature hot-swappable or redundant disk arrays (RAID), but these systems are expensive and require staff to maintain them. In the rare case that a physical machine went down on Amazon’s EC2, you could start a new instance of your machine within minutes using your virtual machine image. S3 becomes your redundant backup for not only your data, but also the actual configuration and operating system of the machine itself.

In addition to the operational advantages of cloud computing, the environmental impact of using services in the cloud is another compelling reason museums might wish to make the switch. The ability to leverage the efforts of data center managers to improve the energy efficiency of their systems means that museums can depend on others to help reduce their own energy footprints. Google, for instance, has been proactive in designing energy efficient data centers (Google, 2009). They monitor their systems rigorously and have demonstrated that, by using chilling towers, they are already running at efficiencies that the EPA deemed optimistic for 2011 (Google, 2009). Meanwhile, joint research by Pennsylvania State University and Microsoft has explored the impact of application consolidation on energy efficiency (Srikantaiah, 2008). That paper describes a simple energy-optimizing algorithm for allocating resources in a cloud, but its authors note a number of complications that must be addressed before the optimization is practical in real data centers. Because reducing energy use saves money, it is in the economic and social interests of service providers to continue researching and implementing new methods to improve efficiency.

Museums must also understand some of the disadvantages and limitations of cloud computing before making the decision to move some of their facilities to the cloud. Data security is always an issue when storing your data on machines you don’t control. There are many types of museum data which are intended for public consumption. For these types of data, security is less of a concern than it might be for most businesses. When storing data in the cloud, physical security is limited to that of the provider’s server rooms. Data will more than likely be stored on shared machines and may be intermixed across physical disks with information from other customers. Data transferred in and out of the cloud will pass over the Internet and will therefore be only as secure as the transfer method used. For museum data meant for public consumption, many of these risks are moot; however, museums should tread carefully when considering moving more sensitive data into the cloud until more appropriate tools for data security are developed. The existence of appropriate backup solutions is more of a problem for museums wishing to make the jump. It is not enough to trust that the service provider will safeguard your data. Museums will want to ensure that cloud-based data storage is also integrated with a larger institutional strategy for data backup and retention. Museums will also want to consider the intended use of the Web service with regard to bandwidth. Since bandwidth costs are variable with most cloud computing services, applications which require frequent data transfer between the cloud and the museum might incur significant bandwidth charges. These charges need to be weighed against other cost savings to determine whether a locally hosted service would be more cost efficient.

Case Study: ArtBabble.org

During late 2008, the Indianapolis Museum of Art worked hard to create an on-line video Web site dedicated to art-related content. ArtBabble (http://www.artbabble.org) runs entirely in the cloud and allows streaming of high definition video content in a scalable and cost-effective manner. Using services provided by Amazon’s Web Services, costs are minimized by only paying for bandwidth and storage used monthly. All videos are stored on S3 and most are publicly available for download. As displayed in Figure 1, the Web site is hosted on an EC2 instance running a virtual machine image of a typical LAMP (Linux-Apache-MySQL-PHP) stack using Drupal (http://www.drupal.org) as a content management system. Video streaming is provided by running EC2 instances of images provided by Wowza Media Systems (http://www.wowzamedia.com). Wowza’s servers can stream content directly from S3 to end-users. Scaling is as easy as starting and terminating instances according to need. Streaming video offers advantages over progressive downloads, since a user can jump anywhere in the video immediately, and network usage is optimized. This in turn minimizes the bandwidth costs incurred by ArtBabble.

Fig 1: Network Layout for ArtBabble

Amazon’s EC2 offers a variety of instance types that vary the amount of memory, local storage, and network and disk I/O performance. In order to determine how to get the most performance for the least amount of money, we performed a benchmark test for running Drupal on the different instance types. The results of this benchmark can be seen in Table 1. Each instance was launched with a stock Amazon Machine Image running Fedora 8. Only Apache, MySQL, and Drupal were installed. The Drupal module, Devel, was used to generate 500 nodes with 10 comments per node. Using the Apache HTTP server benchmarking tool (http://httpd.apache.org/docs/2.0/programs/ab.html) from a separate EC2 instance, we made 1000 requests, 3 at a time, to the default Drupal front page. The results show that the c1.medium instance type gives both the greatest throughput and the lowest price per response. In addition, the m1.small instance type was found to have intermittent performance since it is given lower priority for network resources. Sites with moderate traffic should avoid hosting their Web site on this instance type.
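The benchmark procedure described above can be sketched in Python as follows. The `-n` (total requests) and `-c` (concurrent requests) options are standard `ab` flags. The target URL is a hypothetical placeholder, and the helper names are illustrative rather than taken from the original test setup. The parser assumes the "Requests per second:" line that `ab` prints in its summary.

```python
import re
import subprocess


def ab_command(url: str, requests: int = 1000, concurrency: int = 3) -> list[str]:
    """Build the ab invocation: 1000 requests, 3 at a time, by default."""
    return ["ab", "-n", str(requests), "-c", str(concurrency), url]


def parse_requests_per_second(ab_output: str) -> float:
    """Extract the mean requests-per-second figure from ab's summary."""
    match = re.search(r"Requests per second:\s+([\d.]+)", ab_output)
    if match is None:
        raise ValueError("no 'Requests per second' line found in ab output")
    return float(match.group(1))


def run_benchmark(url: str) -> float:
    """Run ab against url (requires ab to be installed) and return responses/second."""
    result = subprocess.run(ab_command(url), capture_output=True, text=True, check=True)
    return parse_requests_per_second(result.stdout)


# Demonstration on a canned summary line, so no server is needed.
sample_output = "Requests per second:    44.50 [#/sec] (mean)\n"
command = ab_command("http://drupal.example.org/")  # hypothetical host
print(" ".join(command))
print(parse_requests_per_second(sample_output))
```

Calling `run_benchmark` from a separate machine, as the paper did, keeps the load generator from competing with the Web server for CPU.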

Instance Type    Responses / Second  Architecture  EC2 Compute Units†  Price / Instance Hour  Price / Million Responses‡
m1.small         9.02                32-bit        1                   $0.10                  $3.08
m1.large         24.48               64-bit        4                   $0.40                  $4.54
m1.xlarge        40.77               64-bit        8                   $0.80                  $5.45
c1.medium        44.50               32-bit        5                   $0.20                  $1.25
c1.xlarge        27.85               64-bit        20                  $0.80                  $7.98
m1.small (2x)*   10.94               32-bit        2x1                 $0.20                  $5.07
c1.medium (2x)*  24.92               32-bit        2x5                 $0.40                  $4.46

* 2 EC2 instances were run with one hosting the database and the other hosting the web server. This is a common server layout for improving performance.

† “One EC2 Compute Unit (ECU) provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.” – AWS

‡ Assuming a complete saturation of requests and excluding bandwidth costs.

Table 1: EC2 Instance Type Performance
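The last column of Table 1 follows from the throughput and hourly price columns: at full saturation, one million responses take 1,000,000 / (responses per second) seconds of instance time. A minimal Python sketch of that arithmetic, using figures copied from the table, is shown below; the function and variable names are illustrative. The results should agree with the published column to within about a cent.

```python
# (responses per second, price per instance hour in USD), copied from Table 1.
# For the "(2x)" rows the hourly price is the combined price of both instances.
TABLE_1 = {
    "m1.small": (9.02, 0.10),
    "m1.large": (24.48, 0.40),
    "m1.xlarge": (40.77, 0.80),
    "c1.medium": (44.50, 0.20),
    "c1.xlarge": (27.85, 0.80),
    "m1.small (2x)": (10.94, 0.20),
    "c1.medium (2x)": (24.92, 0.40),
}


def price_per_million(responses_per_second: float, price_per_hour: float) -> float:
    """Cost of serving one million responses at full saturation, excluding bandwidth."""
    hours_needed = 1_000_000 / responses_per_second / 3600
    return hours_needed * price_per_hour


for name, (rps, hourly) in TABLE_1.items():
    print(f"{name:15s} ${price_per_million(rps, hourly):.2f}")

# The instance type with the lowest cost per response.
cheapest = min(TABLE_1, key=lambda name: price_per_million(*TABLE_1[name]))
print("cheapest:", cheapest)
```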

The majority of the monthly charges for ArtBabble come from the cost of running EC2 instances. In fact, at this time, less than $20 of the total monthly bill comes from data storage and transfer. Table 2 offers a detailed breakdown of all monthly charges incurred. It should be noted that ArtBabble has not been widely launched and it is assumed that bandwidth usage will increase dramatically. Using these figures, we can calculate that serving 3 minutes of video at standard quality to one million users would cost roughly $2,000 in transfer fees.
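The transfer estimate above can be reproduced approximately as follows. The paper does not state the bitrate behind "standard quality", so the 500 kbit/s used here is purely an assumption chosen for illustration. The $0.17 per GB rate is the Internet data transfer out price listed in Table 2, and the function name is hypothetical.

```python
def transfer_cost(viewers: int, seconds: float, bitrate_bps: float, usd_per_gb: float) -> float:
    """Estimated outbound transfer cost for streaming one clip to many viewers."""
    bytes_per_viewer = bitrate_bps * seconds / 8
    gigabytes_total = bytes_per_viewer * viewers / 1e9  # decimal gigabytes
    return gigabytes_total * usd_per_gb


ASSUMED_BITRATE_BPS = 500_000  # assumption: the paper gives no bitrate
RATE_USD_PER_GB = 0.17         # Internet data transfer out, from Table 2

cost = transfer_cost(
    viewers=1_000_000,
    seconds=3 * 60,
    bitrate_bps=ASSUMED_BITRATE_BPS,
    usd_per_gb=RATE_USD_PER_GB,
)
print(f"${cost:,.2f}")
```

Under these assumptions the result comes to a little over $1,900, in line with the rough figure quoted. A higher bitrate, or the $0.20 per GB rate listed for the Wowza instance, would raise it proportionally.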

Application Name                                      Usage            Cost

ArtBabble Web Server (c1.medium)
  $0.20 per High-CPU Medium Instance                  744 hours        $148.80
  $0.100 per GB Internet Data Transfer In             0.500 GB         $0.05
  $0.170 per GB Internet Data Transfer Out            2.000 GB         $0.34
  $0.10 per GB-month of provisioned storage (EBS)     7.000 GB         $0.70
  $0.10 per 1 million I/O requests                    700000 IOs       $0.07
  $0.15 per GB-Month of snapshot data stored          1.500 GB-Mo      $0.23

Wowza Web Server (1 – m1.small)
  Recurring monthly charge (charged by Wowza)                          $5.00
  $0.14 per Small instance-hour                       744 hours        $104.16
  $0.12 per GB of data transfer in                    1.000 GB         $0.12
  $0.20 per GB of data transfer out                   60.000 GB        $12.00

S3 Storage
  $0.100 per GB in                                    20.000 GB        $2.00
  $0.170 per GB out                                   12.000 GB        $2.04
  $0.01 per 1,000 PUT, COPY, POST, or LIST requests   10000 Requests   $0.10
  $0.01 per 10,000 GET and all other requests         20000 Requests   $0.02
  $0.150 per GB storage                               10.000 GB-Mo     $1.50

Total                                                                  $277.13

Table 2: Sample Monthly Costs for ArtBabble
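As a cross-check of Table 2, the line items can be tallied by kind. In the Python sketch below the costs are copied from the table, while the grouping into "compute", "license", and "data" (storage, requests, and transfer) is an illustrative classification, not one the bill itself uses.

```python
# (application, kind, cost in USD), copied from Table 2.
# The "kind" labels are an assumption made for this tally.
CHARGES = [
    ("ArtBabble Web Server", "compute", 148.80),
    ("ArtBabble Web Server", "data", 0.05),
    ("ArtBabble Web Server", "data", 0.34),
    ("ArtBabble Web Server", "data", 0.70),
    ("ArtBabble Web Server", "data", 0.07),
    ("ArtBabble Web Server", "data", 0.23),
    ("Wowza Web Server", "license", 5.00),
    ("Wowza Web Server", "compute", 104.16),
    ("Wowza Web Server", "data", 0.12),
    ("Wowza Web Server", "data", 12.00),
    ("S3 Storage", "data", 2.00),
    ("S3 Storage", "data", 2.04),
    ("S3 Storage", "data", 0.10),
    ("S3 Storage", "data", 0.02),
    ("S3 Storage", "data", 1.50),
]


def subtotal(field: int, value: str) -> float:
    """Sum the costs of all charges whose given field equals value."""
    return round(sum(c[2] for c in CHARGES if c[field] == value), 2)


total = round(sum(cost for _, _, cost in CHARGES), 2)
data_total = subtotal(1, "data")
web_server_total = subtotal(0, "ArtBabble Web Server")

print(f"total:            ${total:.2f}")
print(f"storage/transfer: ${data_total:.2f}")
print(f"web server only:  ${web_server_total:.2f}")
```

The tally reproduces the $277.13 total and shows storage, request, and transfer charges together staying under $20, with instance hours accounting for nearly all of the rest.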

Museums can reduce their costs dramatically by hosting their Web sites and applications in the cloud. Most would only require a LAMP stack and not have the video storage or streaming needs of ArtBabble. Removing the video requirements would bring costs down to roughly $150 per month, the cost of the Web server alone in Table 2. Amazon Web Services has become mature and stable enough to host mission-critical applications, and it also provides the portability needed in case a better solution comes along.

Conclusion

As discussed in this paper, Cloud Computing technologies have reached a level of maturity which allows museums to take advantage of some significant improvements in many areas as compared to existing practice. The utility model of computing and storage provided by a pay-as-you-go business model allows museums to recoup costs associated with lightly utilized server resources. The environmental impact of moving computing resources into more efficient server environments and leveraging the economic incentives for service providers to decrease their energy costs allows museums to see positive changes concerning their energy footprint by adopting these technologies where possible. Advantages in scalability and ease of deployment of server resources dramatically simplify the administrative overhead for services which feature variable demand, or have very high uptime requirements. The middleware software tools provided by some service vendors provide time-saving and robust shortcuts for developing enterprise level tool suites for museums.

Museums will still want to think carefully about migrating data and services which require a higher level of physical and information security. A suitable long-term data preservation plan will need to be well understood, and bandwidth considerations for applications which require frequent or intensive data transfer between the museum and the cloud will need to be justified.

That being said, there are many classes of applications which are good candidates for migration to the cloud. Certainly most museum Web sites could benefit from the flexibility that cloud computing provides. Considering that the majority of that information is meant for public access, issues of information security are not as much of a concern. Cloud-based storage solutions coupled with offsite storage strategies could provide a robust and cost effective on-line data replication facility for the vast numbers of images and multimedia produced in many museums. Applications such as those for booking tours, events, classes, and some e-commerce applications (with the exception of credit card processing) are also probably good candidates for migration. The flexibility of deployment using cloud computing services enables us to begin to think about hours within which these systems might be used. It would be possible, for instance, to think about a space scheduling application which only runs during normal business hours, thereby saving the museum up to 66% of the cost of running a server for this purpose. Virtualization technologies in the machine room will offer some of these same benefits, but still require that the hardware and virtualization software be purchased, maintained, licensed, and managed by the museum.
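The "up to 66%" saving mentioned above is consistent with running an instance for eight hours a day instead of around the clock. The Python sketch below works through that arithmetic. The eight-hour day is an assumption (the paper does not define business hours), and the $0.20 hourly price is the c1.medium rate from Table 1, used only as an example.

```python
HOURS_PER_DAY = 24


def savings_fraction(hours_running_per_day: float) -> float:
    """Fraction of the always-on cost saved by running only part of each day."""
    return 1 - hours_running_per_day / HOURS_PER_DAY


def monthly_cost(price_per_hour: float, hours_per_day: float, days: int = 31) -> float:
    """Instance cost for a month of the given length."""
    return price_per_hour * hours_per_day * days


BUSINESS_HOURS = 8     # assumption: an eight-hour working day, every day
PRICE_PER_HOUR = 0.20  # c1.medium, from Table 1

always_on = monthly_cost(PRICE_PER_HOUR, HOURS_PER_DAY)
business_only = monthly_cost(PRICE_PER_HOUR, BUSINESS_HOURS)
saved = savings_fraction(BUSINESS_HOURS)

print(f"always on:      ${always_on:.2f}")
print(f"business hours: ${business_only:.2f}")
print(f"saved:          {saved:.1%}")
```

Shutting the instance down on weekends as well would push the saving somewhat higher, at the cost of the application being unavailable on those days.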

In short, there are already many ways in which museums can take advantage of these tools for improving our practice. The business cases presented in the commercial market will continue to push this technology to solve some of the inherent problems related to security and backup. As these tools continue to mature, museums will certainly be there to take advantage of them.

References

Amazon.com (2009). Case Study: 37Signals. Retrieved Jan. 19, 2009, from http://aws.amazon.com/solutions/case-studies/37signals/.

Amazon.com (2009). Case Study: Indianapolis 500. Retrieved Jan. 19, 2009, from http://aws.amazon.com/solutions/case-studies/indianapolis-500/.

Google.com (2009). Efficient Data Centers. Retrieved Jan. 19, 2009, from http://www.google.com/corporate/green/datacenters/step2.html.

Google.com (2009). Data Center Efficiency Measurements. Retrieved Jan. 19, 2009, from http://www.google.com/corporate/green/datacenters/measuring.html.

Horrigan, J. B. (2008). Cloud Computing Gains in Currency. Retrieved Sep. 24, 2008, from http://pewresearch.org/pubs/948/cloud-computing-gains-in-currency.

MacManus, R. (2009). Report: Cloud Based Email Cheapest Option for Most Companies. Retrieved Jan. 6, 2009, from http://www.readwriteweb.com/archives/cloud-based_email_cheaper.php.

Srikantaiah, S. et al. (2008). Energy Aware Consolidation for Cloud Computing. USENIX HotPower'08. http://research.microsoft.com/pubs/75408/srikantaiah_hotpower08.pdf

Cite as:

Moad, C., et al., Museums and Cloud Computing: Ready for Primetime, or Just Vaporware?. In J. Trant and D. Bearman (eds). Museums and the Web 2009: Proceedings. Toronto: Archives & Museum Informatics. Published March 31, 2009. Consulted http://www.archimuse.com/mw2009/papers/moad/moad.html