An In-Depth Analysis of HTML5 Multimedia and Accessibility

In this tutorial, you’ll learn about the several ways HTML5 lets you present your media content to users. Using them makes your media available to users with different needs and requirements, and therefore more accessible.

This tutorial comes courtesy of the recently released HTML5 Multimedia book.

Media and Potential Accessibility Issues

When thinking about the users who will be attempting to view your media content, you might make a number of assumptions:

  • Users will view your content on a desktop, laptop, tablet, or phone.
  • Users will have some way of listening to the audio of your content, be it via
    headphones or speakers.
  • Users will be able to understand the language in which you deliver the media.
  • Users will be able to successfully download and play your media.

All are fairly reasonable assumptions to make and most likely cover the vast majority of users who will want to access your content. You may be happy with your content being accessible to these users only; after all, majority rules, doesn’t it?

Well, I’d strongly encourage you to think about making your content accessible to users who do not fall into the category of the assumptions just listed. Who are these viewers? They include:

  • Users who have a sensory impairment that prevents them from listening to your content’s audio or viewing video.
  • Users who don’t understand the language the media is delivered in.
  • Users who use devices such as screen readers and/or use keyboards to access
    media content on the web.
  • Users who can’t successfully hear or view your content due to the environment they are in or because of device limitations.

Because most media content will usually include some audio, not being able to hear or understand the audio it contains is quite a showstopper in comprehending the content’s message and information.

Equally, being able to access the content through a device such as a screen reader but then not being able to actually use it due to the media controls not being properly set up (e.g., for keyboard access) would annoy any user.

You’ll explore the accessibility of media controls later in this tutorial. You’ll also take a look at what HTML5 brings to the table in an attempt to address the issue of users being unable to see, hear, or understand your media content. But first, let’s take a quick look at what led to HTML5’s attempt to confront this accessibility problem — SRT.

A Brief Look at SRT

SRT is an existing file format for containing video subtitles and their timings. An SRT file is often produced automatically using a Windows program called SubRip, which uses optical character recognition (OCR) to obtain the subtitles from the specified video source.

The SubRip file format is a basic text file with the .srt file extension that follows a basic format:

  • Subtitle number
  • hh:mm:ss,msmsms --> hh:mm:ss,msmsms
  • Subtitle Text (one or more lines)

Each subtitle set begins with a unique subtitle number, followed by the start and end timestamps of the timing the subtitle represents on a separate line, which is then followed by one or more lines of subtitle text. Each subsequent subtitle set is separated by a blank line. The timestamp format hh:mm:ss,msmsms specifies the hours, minutes, seconds, and milliseconds of the time in question. Note that the millisecond separator is a comma.

An example of such a file follows:
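As a minimal sketch of that structure, the JavaScript below holds a two-entry SRT document in a string and splits it into numbered, timed subtitles. The subtitle text and the names sampleSrt, srtTimeToMs, and parseSrt are assumptions made for illustration; they are not part of SubRip or of any library.

```javascript
// Hypothetical two-entry SRT document; the subtitle text is invented.
const sampleSrt = [
  "1",
  "00:00:10,500 --> 00:00:13,000",
  "Elephant's Dream",
  "",
  "2",
  "00:00:15,000 --> 00:00:18,000",
  "At the left we can see...",
].join("\n");

// Convert an SRT timestamp (hh:mm:ss,mmm) to milliseconds.
function srtTimeToMs(stamp) {
  const match = /^(\d{2}):(\d{2}):(\d{2}),(\d{3})$/.exec(stamp.trim());
  if (!match) {
    throw new Error("Invalid SRT timestamp: " + stamp);
  }
  const hours = Number(match[1]);
  const minutes = Number(match[2]);
  const seconds = Number(match[3]);
  const millis = Number(match[4]);
  return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis;
}

// Split an SRT document into { number, startMs, endMs, text } records.
function parseSrt(source) {
  return source
    .replace(/\r\n?/g, "\n") // normalise line endings
    .trim()
    .split(/\n{2,}/) // subtitle sets are separated by a blank line
    .map((block) => {
      const [numberLine, timingLine, ...textLines] = block.split("\n");
      const [start, end] = timingLine.split("-->");
      return {
        number: Number(numberLine),
        startMs: srtTimeToMs(start),
        endMs: srtTimeToMs(end),
        text: textLines.join("\n"),
      };
    });
}
```

Calling parseSrt(sampleSrt) returns one record per subtitle set, with the comma-separated milliseconds folded into startMs and endMs.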

The SRT file format is quite popular and is often the format in which video subtitles are released. HTML5’s attempt to tackle accessibility started out with SRT, but the format has since been extended and given a new name: WebVTT.

Introducing WebVTT

WebVTT (Web Video Text Tracks) is a file format that is intended for marking up external text tracks. It was initially part of the WHATWG and the W3C HTML5 specifications, and was an extension of SRT called WebSRT (Web Subtitle Resource Tracks). But the W3C was concerned that HTML5 should be independent of any chosen captioning format, and therefore, it was removed from that specification.

Note: Even though the SRT in WebSRT stands for Subtitle Resource Tracks, the original acronym didn’t stand for anything and merely reflected the file extension used. WebSRT is a “backronym”; Subtitle Resource Tracks was shoehorned into the three letters to actually mean something.

The presence of WebVTT is currently one major difference between the WHATWG HTML5 specification and the W3C specification.

Although no browser currently supports WebVTT, major browser vendors have indicated that they will implement support for WebVTT in the future. This indication has led to the creation of a WebVTT Working Group Charter at the W3C, whose mission is to:

“create a W3C specification starting from the WHATWG WebVTT (Web Video Text Tracks) language and solidify it through the creation of a WebVTT test suite and through the creation of semantic mappings of other subtitle formats to or from WebVTT in order to facilitate browser implementation and market adoption.”

This promise of vendor support will hopefully lead, in turn, to a formal standardization of the WebVTT specification at the W3C. With browser support and that of the W3C, you can be sure that WebVTT is here to stay and is destined to become the de facto method of marking up text tracks within audio and video content on the web.

So what is the WebVTT file format and how can it help you make your content accessible? Read on.

What Can WebVTT Do?

You use the WebVTT file format to define WebVTT files. One of the main uses of these files is to provide subtitles to video content, although the format of the file doesn’t indicate what its contents are used for.

The WebVTT format also allows you to provide a textual description of the video content, which can then be used by various accessibility devices (which might read the descriptions out loud) to describe the content of the video to those who cannot see it. You inform the browser of the WebVTT file and of its purpose using HTML markup; you’ll find out how this is done later in this tutorial, when you read about the track element.

Let’s take a look at the WebVTT file format in more detail.

WebVTT File Format

A WebVTT file is a simple text file with the .vtt extension that needs to follow a specified format, which you will look at shortly. The file must be encoded as UTF-8 and labeled with the MIME type text/vtt. The line terminators within the file can only be \r (a carriage return), \n (a new line), or \r\n (a carriage return followed by a new line). It must also contain a WebVTT file body, which consists of the following:

The WEBVTT string at the top identifies the contents as a WebVTT file and must then be followed by at least one blank line, which is then followed by any number of cues, each of which is separated by a blank line.

A cue is defined as:

idstring is a unique identifier within the file that identifies the cue. It can consist of one or more characters that do not contain the substring “-->” or any of the line terminators mentioned earlier.

[hh:]mm:ss.msmsms --> [hh:]mm:ss.msmsms indicates the timestamp range within the video file that the cue is specified for. [hh:]mm:ss.msmsms is a simple timestamp; the hour portion is optional (depending, of course, on the length of the video in question).

Note: The millisecond separators are full stops, not commas as in SRT.

cue settings allow you to specify the positioning of the text; you’ll read more about them in a moment.

TextLineN is the actual text in the video file that the timestamp range in the cue represents. The content can be all in one line or presented in any number of separate lines. Any lines will be contained within the cue until a blank line is encountered, which indicates the end of that particular cue.
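To make the timestamp rules concrete, here is a small JavaScript sketch that reads a WebVTT timestamp with or without the hour portion and converts an SRT timestamp into its WebVTT form. The function names are hypothetical, and the pattern assumes exactly three millisecond digits.

```javascript
// Parse a WebVTT timestamp ([hh:]mm:ss.mmm) into seconds.
function vttTimeToSeconds(stamp) {
  const match = /^(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})$/.exec(stamp.trim());
  if (!match) {
    throw new Error("Invalid WebVTT timestamp: " + stamp);
  }
  const hours = match[1] ? Number(match[1]) : 0; // the hour portion is optional
  const minutes = Number(match[2]);
  const seconds = Number(match[3]);
  const millis = Number(match[4]);
  return hours * 3600 + minutes * 60 + seconds + millis / 1000;
}

// Convert an SRT timestamp to WebVTT by swapping the comma for a full stop.
function srtToVttTimestamp(stamp) {
  return stamp.trim().replace(",", ".");
}
```

A timestamp that uses a comma before the milliseconds, as SRT does, is rejected by vttTimeToSeconds until it has been passed through srtToVttTimestamp.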

Let’s take a quick look at a sample WebVTT file containing two timestamp ranges:

This example defines two cues: The first one starts 10 seconds and 500 milliseconds into the video and ends at 13 seconds in, and the second one starts 15 seconds into the video and ends 3 seconds later. The subtitle text for each cue is given below its timestamp.
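One way to picture the file layout is to split a file into its cues in JavaScript. The sketch below uses a hypothetical file with the two timestamp ranges just described (the cue text is invented) and a parseVtt helper that is an illustration only, not a full implementation of the WebVTT parsing rules.

```javascript
// Hypothetical WebVTT file with two cues; the cue text is invented.
const sampleVtt = [
  "WEBVTT",
  "",
  "1",
  "00:00:10.500 --> 00:00:13.000",
  "Elephant's Dream",
  "",
  "2",
  "00:00:15.000 --> 00:00:18.000",
  "At the left we can see...",
].join("\n");

// Split a WebVTT document into { id, start, end, settings, text } records.
function parseVtt(source) {
  const blocks = source
    .replace(/^\uFEFF/, "") // drop a byte order mark if present
    .replace(/\r\n?/g, "\n") // accept \r, \n, or \r\n line terminators
    .trim()
    .split(/\n{2,}/); // cues are separated by blank lines

  const header = blocks.shift();
  if (!/^WEBVTT(?:[ \t].*)?$/.test(header.split("\n")[0])) {
    throw new Error("Missing WEBVTT header");
  }

  return blocks.map((block) => {
    const lines = block.split("\n");
    // An identifier never contains "-->", so that substring marks the timing line.
    const id = lines[0].includes("-->") ? null : lines.shift();
    const [start, rest] = lines.shift().split(/\s*-->\s*/);
    const [end, ...settings] = rest.trim().split(/\s+/);
    return { id, start: start.trim(), end, settings, text: lines.join("\n") };
  });
}
```

For sampleVtt, parseVtt returns two records whose settings arrays are empty, because neither timing line carries anything after the end timestamp.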

How a subtitle cue might appear on a video with no cue settings specified.

Using cues is relatively straightforward, and you can see how the file can be built up with a number of cues to cover the length of an entire video. You can also specify some settings on a per-cue basis. These affect the positioning of the cue on the related video. You can have a number of setting values, and a cue setting can contain one or more values, each one separated by a space. The various settings are listed below:

If no cue settings are specified, the text will align to the middle, at the bottom of the video frame.

Let’s add some of these settings to the example used earlier:

The text in the first cue will be aligned to the left of the video (much the same way as the CSS rule text-align:left works).

How a cue subtitle might appear on a video with a cue setting of A:start.

The second cue has two settings applied to it: The text will be aligned to the end of the line (similar to text-align:right in CSS) and will be placed on the line 10 percent down from the top of the video.

How a cue subtitle might appear on a video with a cue setting of A:end L:10%.
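Because each setting is a short name, a colon, and a value, and settings are separated by spaces, a settings string is easy to take apart. The following JavaScript is a sketch under that assumption; parseCueSettings is a hypothetical helper, and it does not check that a name is one of the settings WebVTT actually defines.

```javascript
// Split a cue-settings string such as "A:end L:10%" into name/value pairs.
function parseCueSettings(settings) {
  const result = {};
  for (const token of settings.trim().split(/\s+/)) {
    if (token === "") {
      continue; // an empty settings string yields one empty token
    }
    const separator = token.indexOf(":");
    if (separator <= 0 || separator === token.length - 1) {
      throw new Error("Malformed cue setting: " + token);
    }
    result[token.slice(0, separator)] = token.slice(separator + 1);
  }
  return result;
}
```

For the second cue in the example, parseCueSettings("A:end L:10%") gives an object with A set to "end" and L set to "10%".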

In addition to specifying cue settings for controlling the positioning and alignment of cue text, there are also a number of inline styles that you can apply to the text. These look and act the same as HTML elements. They contain a start and an end tag, and the formatting is applied to the text in between.

Let’s extend the example further and use some of the text tags to format the cue text:

With the first cue, a class name of “dream” has been added, a style for which you can define within your HTML file in the same way as you’d create any CSS style rules.

Video-cue text with a style defined using CSS and the WebVTT c text tag.

Note: Any CSS class names that you use within your WebVTT subtitle definitions can be defined in the containing HTML file or in an external CSS file, in the same way as you’d specify any other CSS classes.

The second cue now has tags that will display the word “left” in italics and “see” in bold type.

Video-cue text that uses the i and b text tags.

An extra cue is added to this example to show how the timestamp is used to display the text “karaoke style.” When the cue starts, the words “At the right” will appear first. Then the text “we can see…” will be displayed at the appropriate timestamp (Figure 8.6).
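To see how a staged cue breaks down, the JavaScript sketch below splits cue text at inline timestamps. It assumes the timestamp is written between angle brackets inside the cue text, for example <00:00:18.500>; the helper name splitKaraokeCue and the times used are hypothetical.

```javascript
// Split cue text into stages at inline timestamps such as <00:00:18.500>.
// Each stage records when its text should first appear.
function splitKaraokeCue(cueStart, text) {
  // The capture group keeps each timestamp in the array returned by split().
  const parts = text.split(/<((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})>/);
  const stages = [{ at: cueStart, text: parts[0] }];
  for (let i = 1; i < parts.length; i += 2) {
    stages.push({ at: parts[i], text: parts[i + 1] });
  }
  return stages;
}

const stages = splitKaraokeCue(
  "00:00:18.000",
  "At the right <00:00:18.500>we can see..."
);
```

Here the first stage appears at the start of the cue and the second at the inline timestamp; formatting tags such as b and i are left untouched because they do not match the timestamp pattern.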

Note: If you want the characters &, <, and > to appear in the text of a video cue, you need to escape them with &amp;, &lt;, and &gt; respectively.
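If you generate cue text from other data, a small helper can apply this escaping for you. The JavaScript below is a sketch; escapeCueText is a hypothetical name, and it should be applied to raw text before any formatting tags are added, because it escapes their angle brackets too.

```javascript
// Escape the characters that have special meaning inside WebVTT cue text.
function escapeCueText(text) {
  return text
    .replace(/&/g, "&amp;") // must run first so later entities are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}
```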

This video-cue text shows the text in stages.

WebVTT Future Developments

It’s worth noting that because the WebVTT file format is relatively new to the specification, and given the recent creation of the WebVTT Working Group Charter, additions to the specification are likely.

If you want to keep abreast of any changes to this specification, keep an eye on the Working Group Charter’s site and the blog of Silvia Pfeiffer, who is currently editor of the Working Group Charter. Silvia also blogs regularly about HTML5-related accessibility topics.

You can see how the complete narrative in a video could be added to a WebVTT text file with formatting and styling.

But how do you connect a WebVTT file with a particular video? This is where the new HTML5 track element steps in.

The Track Element

The track element is one of the new HTML5 elements. Its purpose is to allow external text tracks to be specified for media elements, such as audio and video. The track element does not represent anything on its own and must be used in conjunction with, and as a child of, a media element.
The track element takes a number of attributes, which are listed below:

The following example shows how a track element might be used in connection with a video to provide subtitles:

The track element in the example specifies that the en.vtt file contains English subtitles (as the label says) in the English language (srclang is set to en) of kind: subtitles for the surrounding video element. From this example, you can see just how easy it would be to add a second subtitles file that might be in a different language:

Here, another track definition has been added, pointing to a de.vtt file that contains German subtitles; srclang is set to de.

Notice that the default attribute has been added to the English subtitles definition, marking it as the default subtitle set to be used if the user doesn’t specifically select one.
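As a sketch of how these attributes fit together, the JavaScript below builds the markup for a track element from a plain object. The helper names and the label “Deutsch” are assumptions; en.vtt, de.vtt, and the attribute names come from the description above.

```javascript
// Escape a value for use inside a double-quoted HTML attribute.
function escapeAttribute(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;");
}

// Build the markup for one track element. isDefault is a hypothetical option
// name that maps to the boolean default attribute.
function trackMarkup({ src, kind, srclang, label, isDefault = false }) {
  const attributes = [
    `src="${escapeAttribute(src)}"`,
    `kind="${escapeAttribute(kind)}"`,
    `srclang="${escapeAttribute(srclang)}"`,
    `label="${escapeAttribute(label)}"`,
  ];
  if (isDefault) {
    attributes.push("default");
  }
  return `<track ${attributes.join(" ")}>`;
}

const subtitleTracks = [
  { src: "en.vtt", kind: "subtitles", srclang: "en", label: "English", isDefault: true },
  { src: "de.vtt", kind: "subtitles", srclang: "de", label: "Deutsch" },
].map(trackMarkup);
```

Each string in subtitleTracks is meant to be placed as a child of the video element it belongs to; changing kind to chapters would describe a chapter listing in the same way.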

If you wanted to extend the example further and add a chapter listing in each language (English and German), you would do the following:

Once the various WebVTT files have been created with the content you want, it’s a fairly simple process to add them to the appropriate video.

Everything you’ve just read about WebVTT sounds quite promising; however, even though some browsers support the track element to some degree, currently no browser supports the WebVTT file format.

Note: At the time of this writing, the WebKit nightly build (WebKit is the engine on which Chrome and Safari are based) has some support for WebVTT.

All is not lost, though, because several JavaScript libraries are available that enable you to start using WebVTT today.

Using WebVTT and the Track Element Now

A small number of browsers support the track element to some degree. The latest WebKit browsers (e.g., Chrome 12 and Safari 5.0.5) recognize the element but don’t do anything with it. The current version of Firefox (5) parses the element but also does nothing with it. Although these browsers are taking steps in the right direction, they don’t really help you implement WebVTT now.

Update (March 19th, 2012): Internet Explorer 10 Preview 4 and upwards parses the track element and renders WebVTT. It's also now available in Google Chrome (needs to be enabled via chrome://flags - "Enable <track> element").

Fortunately, four JavaScript libraries let you use the track element with WebVTT files in your web document today:

  • Playr – Supports: subtitles, chapters, some cue settings
    Browsers: Opera, Chrome, Safari, Firefox
  • LeanBack Player – Supports: subtitles
    Browsers: All major browsers with fallback to Flash if required
  • Captionator – Supports: subtitles, all cue settings
    Browsers: Opera, Chrome, Safari, Firefox, IE9
  • MediaElementJS – Supports: subtitles (timing format uses SRT format)
    Browsers: All major browsers with fallback to Flash if required

SRT Support

Although only a handful of JavaScript players support WebVTT, a number of them support SRT subtitle files. Those players that support WebVTT (Playr, LeanBack, Captionator, and MediaElementJS) also support SRT, in addition to the following that provide support for SRT only:

None of these libraries offers support for all the different values for the kind attribute of the track element: They only support the subtitles value (Playr also supports the chapters value). Because subtitles are one of the most important values, it’s a good start. This support also allows you to begin adding subtitles to your videos now and seeing them in action.

Let’s look at how you might use the Playr JavaScript library to add subtitles and chapters to a video.

Playr Example

To use Playr, you must first download it from the Playr download website. Once downloaded, you need to include the Playr CSS file and JavaScript in your web document:

When defining your video, you simply add the CSS class “playr_video” to your video element, and Playr will automatically be used for that video.

A sample of Playr with a short animated film called Elephant’s Dream (© copyright 2006, Blender Foundation, Netherlands Media Art Institute, www.elephants) is available here.

The code used for this video is as follows:

The Playr video player with Elephant’s Dream.

Also, three track elements are used to point to English and German subtitles, and English chapters.

Note: Playr currently doesn’t support multiple chapter files or the default attribute but will do so in a future release.

Playr’s menu allows viewers to choose English and German subtitles plus chapters.
Subtitles have been placed 6 percent from the top (using L:6%) and bolded with <b>.
The same video with German subtitles chosen.
A sample of how chapter selection looks in Playr.

Playr is a handy video player, and its ability to display subtitles and chapters is very useful. Support for the other kinds of track element contents is planned, so like other available video players, it will keep on improving.

Another important part of making media content accessible is the controls. Next, you’ll learn how accessible the default players are and what you can do to make your own custom controls more accessible.

Media Controls and Accessibility

As mentioned earlier, it’s quite important for accessibility that the media controls can be accessed from the keyboard. Browsers have their own set of controls for media elements, but how accessible are they from the keyboard? Unfortunately, at the moment, the answer is not very. Opera seems to be the only browser whose default control set is immediately accessible from the keyboard. You can easily tab from one control to the other, use the Return key to toggle the Play/Pause button, and use the arrow keys to control the seek bar and volume control.

So, if you want to make your media content fully accessible across all modern browsers, you need to implement your own custom controls.

Improving the Accessibility of Custom Controls

You’ve already used the HTML button element to implement nearly all of the controls. Using button elements immediately increases the accessibility of the controls because the button element is automatically accessible from the keyboard. That fact alone makes the custom controls keyboard accessible. Because the controls are also listed in the same order as they appear on the player, their tab order is also pretty much in the required logical order. However, you might want to change the tab order of the progress bar and the Play/Pause button. Most likely, users would want to play the video first, so that button should be the first control that they can access.

You can specify the tab order of HTML controls using the tabindex attribute. The order specified by this attribute is the one the browser follows when the user tabs through the controls with the keyboard. So you apply a tabindex of 1 to the Play/Pause button and 2 to the progress bar, and then apply subsequent tabindexes in the order in which they appear in the source:

In this code listing, the onclick() events have been omitted for brevity.
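The same ordering can also be applied from script. The JavaScript below is a sketch that assumes each control has an id; the ids shown and the helper name assignTabOrder are hypothetical, and plain objects stand in for DOM elements so that the logic is easy to follow.

```javascript
// Give the listed ids the first tabindex values, then number the remaining
// controls in the order in which they appear in the source.
function assignTabOrder(controls, firstIds) {
  const prioritized = firstIds
    .map((id) => controls.find((control) => control.id === id))
    .filter((control) => control !== undefined);
  const remaining = controls.filter((control) => !prioritized.includes(control));
  [...prioritized, ...remaining].forEach((control, index) => {
    control.tabIndex = index + 1; // tabIndex mirrors the tabindex attribute on elements
  });
  return controls;
}

// In a page you would pass real elements instead, for example
// Array.from(document.querySelectorAll("button, #progressBar")).
const controls = [
  { id: "progressBar" },
  { id: "playPause" },
  { id: "stop" },
  { id: "mute" },
];
assignTabOrder(controls, ["playPause", "progressBar"]);
console.log(controls.map((control) => `${control.id}=${control.tabIndex}`).join(" "));
// progressBar=2 playPause=1 stop=3 mute=4
```

Setting a tabindex on the progress bar matters for a second reason: a div cannot receive keyboard focus unless it has one.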

The Range Element

The range element would be ideal for use as a progress bar if support were better because it too automatically provides keyboard accessibility, and the seek bar would work via the keyboard (the up and down keys would toggle the seek when the element has focus) without any further requirements.

Because the buttons provide keyboard accessibility automatically, all you need to tackle now is the progress bar, which uses a div and a span.
You need to add an event listener for the keypress event, which fires when a key is pressed, and then act on it. You’re interested in just a key press on the progress bar, so the event is added to the progress bar only:

The function that is called when a key press is detected is called checkKey() with a parameter indicating the numeric code of the key that was pressed:

The checkKey() function simply checks the key code to see if it’s the up arrow key (code 38) or the down arrow key (code 40). Depending on which key it is, the video’s currentTime attribute is increased or decreased by 0.05 seconds (an arbitrary value, but it seems a good step by which to move the video forward or backward).
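A minimal sketch of such a handler follows. It is not the original listing: this version takes the video as an explicit first argument, clamps the result to the length of the video, and the element ids in the commented wiring are assumptions. The wiring listens for keydown because some browsers do not fire keypress for the arrow keys.

```javascript
const KEY_UP = 38; // up arrow key code
const KEY_DOWN = 40; // down arrow key code
const SEEK_STEP = 0.05; // seconds to move per key press

// Move the video forward or backward when an arrow key is pressed.
// `video` is anything with currentTime and duration properties.
function checkKey(video, keyCode) {
  if (keyCode !== KEY_UP && keyCode !== KEY_DOWN) {
    return; // ignore every other key
  }
  const delta = keyCode === KEY_UP ? SEEK_STEP : -SEEK_STEP;
  const upperBound = Number.isFinite(video.duration) ? video.duration : Infinity;
  video.currentTime = Math.min(Math.max(video.currentTime + delta, 0), upperBound);
}

// Browser wiring, assuming elements with the ids "video" and "progressBar":
// const video = document.getElementById("video");
// document.getElementById("progressBar").addEventListener("keydown", (event) => {
//   checkKey(video, event.keyCode);
// });
```

Keeping the handler independent of the page makes it easy to try out with a plain object such as { currentTime: 1, duration: 10 } before attaching it to a real video.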

And that’s it. The progress bar’s seek capabilities can now be accessed via the keyboard with the up and down arrow keys when it’s in focus. The end result renders your media custom controls a lot more accessible than they would have been.

Wrapping Up

With regard to accessibility, HTML5 has advanced and expanded from its initial definition of the WebSRT file format to WebVTT. With browser vendors planning to support this format, a new W3C Working Group was formed with the intention of formalizing the WebVTT specification for browsers to start supporting. So hopefully, browser support is only a matter of time.

Although native support is currently patchy, you can use existing JavaScript libraries to add subtitles to your videos now. These libraries will undoubtedly increase their functionality and capabilities in the future.

Overall, accessibility is a goal you should be thinking about when serving multimedia content to your users. The more users who can access your content the better, right?

Be sure to visit the HTML5 Multimedia website, or buy the book to learn more!