Sound sample description version 1 ('stsd')

An extended sound sample description that supports compressed audio, and includes the ability to add atoms to the sound description.

Overview

The version field in the sample description is set to 1 for this version of the sound description structure. In version 1 of the sound description, introduced in QuickTime 3, the sound description record is extended by 4 fields, each 4 bytes long, and includes the ability to add atoms to the sound description.

These added fields are used to support out-of-band configuration settings for decompression and to allow some parsing of compressed QuickTime sound tracks without requiring the services of a decompressor.

These fields introduce the idea of a packet. For uncompressed audio, a packet is a sample from a single channel. For compressed audio, this field has no real meaning; by convention, it is treated as 1/number-of-channels.

These fields also introduce the idea of a frame. For uncompressed audio, a frame is one sample from each channel. For compressed audio, a frame is a compressed group of samples whose format is dependent on the compressor.

The four new fields are:

When capturing or compressing audio using the QuickTime API, the value of these fields can be obtained by calling the Apple Sound Manager’s GetCompression function. Historically, the value returned for the bytes per frame field was not always reliable, however, so this field was set by multiplying bytes per packet by the number of channels.

To facilitate playback on devices that support only one or two channels of audio in 'raw ' or 'twos' format (such as most early Macintosh and Windows computers), all other uncompressed audio formats are treated as compressed formats, allowing a simple “decompressor” component to perform the necessary format conversion during playback. The audio samples are treated as opaque compressed frames for these data types, and the fields for sample size and bytes per sample are not meaningful.

The new fields correspond to the CompressionInfo structure used by the Macintosh Sound Manager (which uses 16-bit values) to describe the compression ratio of fixed ratio audio compression algorithms. If these fields are not used, they are set to 0. File readers only need to check to see if samplesPerPacket is 0.

Redefined sample tables

If the compression ID in the sample description is set to –2, the sound track uses redefined sample tables optimized for compressed audio.

Unlike video media, the data structures for QuickTime sound media were originally designed for uncompressed samples. The extended version 1 sound description structure provides a great deal of support for compressed audio, but it does not deal directly with the sample table atoms that point to the media data.

The ordinary sample tables do not point to compressed frames, which are the fundamental units of compressed audio data. Instead, they appear to point to individual uncompressed audio samples, each one byte in size, within the compressed frames. When used with the QuickTime API, QuickTime compensates for this fiction in a largely transparent manner, but attempting to parse the sound samples using the original sample tables alone can be quite complicated.

With the introduction of support for the playback of variable bit-rate (VBR) audio in QuickTime 4.1, the contents of a number of these fields were redefined, so that a frame of compressed audio is treated as a single media sample. The sample-to-chunk and chunk offset atoms point to compressed frames, and the sample size table documents the size of the frames. The size is constant for CBR audio, but can vary for VBR.

The time-to-sample table documents the duration of the frames. If the time scale is set to the sampling rate, which is typical, the duration equals the number of uncompressed samples in each frame, which is usually constant even for VBR (it is common to use a fixed frame duration). If a different media timescale is used, it is necessary to convert from timescale units to sampling rate units to calculate the number of samples.

This change in the meaning of the sample tables allows you to use the tables accurately to find compressed frames.

To indicate that this new meaning is used, a version 1 sound description is used and the compression ID field is set to –2. The samplesPerPacket field and the bytesPerSample field are not necessarily meaningful for variable bit rate audio, but these fields should be set correctly in cases where the values are constant; the other two new fields (bytesPerPacket and bytesPerFrame) are reserved and should be set to 0.

If the compression ID field is set to zero, the sample tables describe uncompressed audio samples and cannot be used directly to find and manipulate compressed audio frames. QuickTime has built-in support that allows programmers to act as if these sample tables pointed to uncompressed 1-byte audio samples.

Sound sample description version 1 ('stsd')

Overview

Redefined sample tables

Topics

Data fields

See Also

Using sample descriptions