MUSEUMS AND THE WEB 1998


Archives & Museum Informatics

info@archimuse.com

www.archimuse.com

published April 1998
updated Nov. 2010

Papers

Play it Again, SAMI -- Finding a Tune Museums Might Learn to Hum

Jim Blackaby, Senior Systems Developer, United States Holocaust Memorial Museum

Synchronized Accessible Media Interchange Format

SAMI is an interchange format that supports closed captioning and audio description, providing extended accessibility to digital media.

In its broadest sense, closed captioning provides an alternate and synchronous presentation of some form of media that has time as one of its components. Most often, this is a transcription of words to accompany a video stream, but it might be extended to include images to accompany a musical stream or translations of spoken audio. Most often, closed captioning appears as a text overlay at the bottom of a screen, but it might be extended to include multiple elements appearing anywhere within a presentation format.

Audio description is apparently less familiar than closed captioning. It involves some kind of narrative overlay to describe visual materials. Currently, this is interpreted as creating text descriptions of video materials to make the visual materials available to blind audiences. In the past, however, the techniques of audio description were commonly experienced by anyone listening to a sports or live news event on the radio.

SAMI is an open interchange format, but in the examples you can view from here, it is realized for the web with the Microsoft ActiveMovie control and Internet Explorer 4.0. At its core, it is very simple, and it is intended to be something that can be replicated in many environments.

At its core, SAMI depends on some kind of timer or timed object, such as a digital sound or video file, that can be used to trigger events. When those events are triggered, something happens. That's about it, really. The "something" may be the display of closed caption text, but it can be other things as well.
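To make the idea concrete, here is a minimal illustration -- not SAMI itself, just the bare mechanism, and it leans on pieces explained in the sections that follow. A timer fires two seconds after the page loads, and the "something" that happens is a bit of hidden text becoming visible (the name showCaption is made up for this example):

<span id="caption1" style="visibility:hidden">This appears two seconds in.</span>

<script language="JavaScript">
function showCaption() {
  // when the timer fires, something happens -- here, the hidden text appears
  document.all.caption1.style.visibility = "visible";
}
// trigger the event 2000 milliseconds after the page loads
window.setTimeout("showCaption()", 2000);
</script>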

What Makes SAMI run?

SAMI depends on the convergence of several acronymic developments on the WWW:
  • DOM (Document Object Model)
  • CSS (Cascading Style Sheets)
  • VBS or JS (VBScript or JavaScript)
  • Tabular Data Control (Data Binding, vague enough not to need an acronym)
  • DHTML (Dynamic HTML)

DOM (Document Object Model)

As early as Netscape 2.0, browsers were enabled not only to display the materials that passed through them but to create an index of each of the "objects" that they were displaying. These objects were defined as anything that was surrounded by a pair of opening and closing HTML tags, such as <h2>Everything in this heading plus its tags are an object</h2>. For the browser's own purposes or peace of mind, it kept track of the fact that there was an object that was surrounded by <h2> . . . </h2> tags, that it had some sequence in the document being displayed, that it had some particular position - absolute or relative - on the screen, and so on. In addition, and this is the important part, each object was allowed to have a specific identifying name and to be perceived as part of a class of similar things, though both are optional.

So, the heading mentioned above might have a unique identifier of "subtitle" and it might be in the class of things that was to be displayed in blue Garamond old style text. Its presentation would then be:

<h2 id="subtitle" class="bluestuff">Everything in this heading plus its tags are an object</h2>.

Rather than having to refer to this object as "the fifth item in the DOM list" which you and I might have had trouble knowing anyway, it can be called by its unique id, "subtitle."

This allows the characteristics of a specific object to be manipulated in the DOM's view of the world. The DOM might have thought that this object was to be displayed as the fifth object on the screen, all the way to the left. But for browsers that allow it (and not all do, all the time or for everything), the DOM's view can be modified to present this object as, let's say, the second thing presented, all the way to the right of the screen. For reasons probably having to do with performance and "yeah? So what??", the possibility of manipulating objects on the web page was not widely adopted. Internet Explorer 4.x is the only browser that exposes all objects to manipulation in this way.
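In Internet Explorer 4.x, for instance, a script can reach into the DOM by an object's id and change where and how it appears. This is only a sketch of the idea (the pixel values are arbitrary), but something like the following would push the "subtitle" heading over to the right of the screen:

<h2 id="subtitle" class="bluestuff" style="position:absolute">Everything in this heading plus its tags are an object</h2>

<script language="JavaScript">
// find the object by its unique id and change where the DOM thinks it belongs
document.all.subtitle.style.left = "400px";   // push it toward the right
document.all.subtitle.style.top = "20px";     // and up near the top
</script>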

CSS (Cascading Style Sheets)

The underlying principle of cascading style sheets is to allow certain attributes that affect appearance to be attached either to whole documents or to individual HTML objects. This can be done in a global declaration of style attributes at the beginning of a document. These style sheets give greater freedom in managing the appearance of objects on an HTML page, but they also make managing style much easier, because one only has to change the style sheet rather than individual items.

One way of expressing things in the style sheet is to assign stylistic values to tags. To make all ordered lists in an otherwise uninspired document appear in red, arial type, the ordered list tag would be given the following definition in the <style> portion of a document:

ol {color:red; font-family:arial}

Another way of expressing things in the style sheet is to assign stylistic values to classes. Then, objects that are defined as being in those classes will take on the class stylistic attributes rather than whatever they might normally have. This would be expressed in the <style> portion of a document as:

.bluestuff {color:blue; font-family:garamond}

for instance. The heading mentioned in the previous section, <h2 id="subtitle" class="bluestuff">, would follow the rules of the bluestuff style characteristics. Ordered lists that were not assigned to any class would be in red arial, but if one ordered list were specified as being <ol class="bluestuff"> it would follow the rule for the class rather than its more general object definition.
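Put together in one small page, the style declarations sit in the head of the document and the objects simply refer to them -- something along these lines:

<html>
<head>
<style>
  ol {color:red; font-family:arial}
  .bluestuff {color:blue; font-family:garamond}
</style>
</head>
<body>
  <h2 id="subtitle" class="bluestuff">Everything in this heading plus its tags are an object</h2>
  <ol>
    <li>This list is red arial by default.</li>
  </ol>
  <ol class="bluestuff">
    <li>This one follows the bluestuff rule instead.</li>
  </ol>
</body>
</html>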

VBS and JS (the VBScript and JavaScript scripting languages)

These scripting languages allow programmatic things to happen, when certain events occur, to pages that are otherwise pretty static. The things that can happen are limited to simple things that can be managed by sets of instructions put in scripts, but that does not make this a trivial approach. Among the things a script can do is make things happen to objects that it can identify. This is generally done by referring to the ids that the DOM makes available. One thing a script might do, for instance, would be to find the object named "subtitle" and change its style characteristics (it is blue, you'll recall) so that it displays as green. Another thing a script might do is modify the content of that object, changing the text between the tags that define it as an object to be "Hey, How Did that Happen?" instead of "Everything in this heading plus its tags are an object."
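For instance, a few lines of script attached to a button could make both of those changes to the "subtitle" object. This is JavaScript (VBScript would do the same job), and it leans on the Internet Explorer 4.x document.all collection and innerText property:

<script language="JavaScript">
function surprise() {
  // find the object named "subtitle" and change its style from blue to green
  document.all.subtitle.style.color = "green";
  // then replace the text between its tags
  document.all.subtitle.innerText = "Hey, How Did that Happen?";
}
</script>

<input type="button" value="Try it" onclick="surprise()">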

Tabular Data Control (Data Binding, vague enough not to need an acronym)

I think that SAMI uses this. If it does not use this, it uses something very like it.

Tabular Data Control allows taking an outside data source that is in a specified format and attaching it to a particular object on a web page. If the data table has, for instance, a list of painting titles, it could be attached to the object with the id of "subtitle" we have been discussing, and then a script could be written that would have the content written between the two tags change every time there was some event like a mouse click.
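As a sketch of how that might look (the data file name and the column name "title" are invented for the example, and the long classid string is the identifier Internet Explorer uses for the Tabular Data Control), the control is embedded as an object pointed at a delimited text file, and another element is bound to one of its columns:

<!-- the Tabular Data Control, pointed at a comma-delimited file of painting titles -->
<object id="paintings" classid="clsid:333C7BC4-460F-11D0-BC04-0080C7055A83">
  <param name="DataURL" value="titles.txt">
  <param name="UseHeader" value="true">
</object>

<!-- "subtitle" is bound to the "title" column of that data source -->
<span id="subtitle" datasrc="#paintings" datafld="title"></span>

<script language="JavaScript">
// each click moves the bound recordset ahead one row, so the text in "subtitle" changes
function nextTitle() {
  if (!paintings.recordset.EOF) {
    paintings.recordset.MoveNext();
  }
}
</script>
<input type="button" value="Next painting" onclick="nextTitle()">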

DHTML (Dynamic HTML)

Dynamic HTML glues all of the above together in a structure that enables all of these tricks. It is also always reflecting on itself, so that if items in the DOM listing have been modified - if their class has been changed, say, or the text between the tags in an object has been replaced - those changes are reflected on the screen.

One more necessary thing that has no acronym:

A clock or something that works like one.

When SAMI is running what happens?

Lots of things are possible, but the simplest to imagine is closed captioning. As a piece of digital media runs in a web page, words appear beneath the picture that reflect what is being said in the media.

To make this happen you need to have a piece of media that has time codes that can be identified, a transcript of what is said on the media with an idea of how the transcript relates to the time codes, and a little thoughtful patience.

The media is embedded in a web page as an object, and the transcript is marked with tags to identify each passage that is to appear as closed captioning. The text is then made invisible, and the tags are bound to an external SAMI file. Each tag is given an id that corresponds to its time in the media stream. So, you might have something that looked like this:

<span id="1" class="cc-eng">This is the beginning of my talk</span>
<span id="1000" class="cc-eng">which is being measured in milliseconds</span>
<span id="2000" class="cc-eng">so that it seems like lots is happening.</span>
<span id="3200" class="cc-eng">Actually, it is not long enough</span>
<span id="4100" class="cc-eng">to fall asleep to.</span>
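Making the text invisible to begin with is just a matter of a style rule for the class. Something like this in the page's style section would do it (a script, sketched further on, then switches individual captions back on):

<style>
  .cc-eng {display:none}   /* each caption starts out hidden until its time code arrives */
</style>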

The bound file has some SAMI administrative stuff, including parameters that point to the piece of media and that indicate its length and the time units. In addition, it defines a CSS class called ENUSCC (for English-U.S. Closed Captions) and has as its body:

<SYNC Start=1>
<P Class=ENUSCC><SPAN ID="1"></SPAN>
<SYNC Start=1000>
<P Class=ENUSCC><SPAN ID="1000"></SPAN>
<SYNC Start=2000>
<P Class=ENUSCC><SPAN ID="2000"></SPAN>
<SYNC Start=3200>
<P Class=ENUSCC><SPAN ID="3200"></SPAN>
<SYNC Start=4100>
<P Class=ENUSCC><SPAN ID="4100"></SPAN>
<SYNC Start=4900>
<P Class=ENUSCC>END

What seems to be going on is that as the media plays, it sends time codes to this SAMI file. When a time code is passed, the SAMI file sends along the name of the object that ought to be on view (via something like Tabular Data Control), and when that name is received, a script makes the object on the page that is connected with it visible.
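The plumbing that hands the time codes across belongs to the ActiveMovie control, and I won't pretend to reproduce it exactly, but the receiving end can be imagined as a little routine like this, called with the id of the caption that ought to be on view (the function name is mine, not part of SAMI):

<script language="JavaScript">
// called whenever the media passes a time code; id names the caption that is due on screen
function showCaption(id) {
  var spans = document.all.tags("SPAN");
  for (var i = 0; i < spans.length; i++) {
    if (spans[i].className == "cc-eng") {
      // show the caption whose id matches the time code and hide all the others
      spans[i].style.display = (spans[i].id == id) ? "inline" : "none";
    }
  }
}
// so at two seconds into the media, something would call showCaption("2000")
</script>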

It is pretty simple, really.

So what?

There are several amazing things that this technology enables:
  1. You can quickly and easily provide synchronous closed captioning to media events -- any kind -- songs, video, video games that have timers, visits into museum gallery space if there is some kind of timing mechanism or triggering mechanism that fires when something happens (like a long look at an image). This is great for everyone.
  2. You can make anything visible. Text is the easiest, but there is no reason why you can't show pictures, change the text on the screen, or do any of a number of other simple multi-media like things.
  3. You can attach SAMI files in multiple languages so that this mechanism can become one for providing multi-lingual access.
  4. You can use the technology "backwards." Just as you can play a media file and have it display text by virtue of having the text marked as DOM objects, you can write a script that will identify a part of the text that has been selected and have that launch the media (there is a sketch of this just after this list).
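As a rough sketch of that last trick, suppose the media is embedded as an ActiveMovie control with the id "movie", the transcript spans are left visible rather than hidden, and the caption ids are milliseconds while the control counts in seconds. The CurrentPosition property and Run method are my reading of the control's documentation, so check them before relying on this; clicking a passage might then do something like:

<script language="JavaScript">
// jump the media to the moment named by a caption's id and start it playing
function playFrom(captionSpan) {
  var milliseconds = parseInt(captionSpan.id);
  movie.CurrentPosition = milliseconds / 1000;   // the control measures time in seconds
  movie.Run();
}
</script>

<span id="2000" class="cc-eng" onclick="playFrom(this)">so that it seems like lots is happening.</span>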
These are all wonderful tricks, but beyond that, they go a long way toward providing useful techniques to help us speak to our audiences. Those who have worked with issues of access and special needs will not be surprised to see that the techniques of reaching a wider audience serve the interests of all.

Where to Get It

SAMI has been developed jointly by WGBH and Microsoft among others as an open interchange format. You can read more about it and the other access programs of Microsoft at: http://microsoft.com/enable/. SAMI is more of an idea than an "it" that you can get. What is important is to think about how you can use this approach, and after you've played with it a while, you'll find that you've got it!

David Bolnick (davebo@microsoft.com) is the head of the accessibility program, and he is an enthusiastic supporter of the possibilities of these technologies. At the present time, the SAMI format is being integrated into Microsoft's active controls. But it is important to recognize that this is really a very straightforward format. At the USHMM, we expect to be writing our own versions of SAMI as part of making use of video cards that use their own media players. Using the MS DirectShow control is easy because it is available, but this approach to linking text and media is by no means limited to Microsoft products. Indeed, Microsoft should be given due credit for putting the resources behind providing these kinds of tools for all audiences.

What has been most useful to me is taking apart the samples that Dave Bolnick has made to see how they work. There are many ways of doing this. I am happy to share my own experiences. My address is jblackaby@ushmm.org, and Bob Twitty, who works with me (rtwitty@ushmm.org), can provide information about some of the issues of incorporating these techniques into other APIs and other media players.



This file can be found below http://www.archimuse.com/mw98/
Send questions and comments to info@archimuse.com