Sound sample description version 2 ('stsd')

An updated sound sample description that adds support for high-resolution audio.

Overview

QuickTime 7 introduced version 2 of the sound sample description, which expands the sound sample description structure again to support high-resolution audio. In QuickTime 7, the sound and audio facilities are based on the Core Audio framework, and the Sound Manager is deprecated. In this version of the sound sample description, the format field is set to ‘lpcm’ for uncompressed data. For compressed data, the format field is set to the compression type code (normally ‘mp4a’), and the compression specifics and other QuickTime 7 features are supplied by extensions.
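
The format field is a four-character code. As a minimal sketch, assuming the code is stored with its first character in the most significant byte, a reader could test for uncompressed data as follows. The helper names fourcc and is_uncompressed_lpcm are hypothetical and are not part of any Apple interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack four ASCII characters into a 32-bit code,
   with the first character in the most significant byte (an assumption
   about how the caller has read the format field). */
static uint32_t fourcc(const char *code)
{
    assert(code != NULL);
    return ((uint32_t)(unsigned char)code[0] << 24) |
           ((uint32_t)(unsigned char)code[1] << 16) |
           ((uint32_t)(unsigned char)code[2] << 8)  |
            (uint32_t)(unsigned char)code[3];
}

/* Hypothetical helper: nonzero when the format field marks uncompressed data. */
static int is_uncompressed_lpcm(uint32_t format)
{
    return format == fourcc("lpcm");
}
```

Any other value, such as the code for ‘mp4a’, would indicate a compressed format whose details come from extensions.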

The version field is set to 2 for this version of the sound sample description structure.

The sound sample description v2 structure appends new fields to the v1 structure and renames the four fields added in v1, which helps ensure backward compatibility with older applications.

Some definitions for sound sample description version 2:

  • LPCM Frame: one uncompressed sample in each of the channels (for instance, 44100 Hz audio has 44100 LPCM frames per second, whether it is mono, stereo, 5.1, or another channel layout). In other words, the number of LPCM frames divided by the audioSampleRate value gives the duration in seconds.

  • Audio Packet: For compressed audio, an audio packet is the natural compressed access unit of that format. For uncompressed audio, an audio packet is simply one LPCM frame.

  • Fields prefixed by const: Three sound sample description v2 fields have names that start with const. Each of these fields is nonzero only if its value is constant; a zero means that the value is variable. For example, AAC audio has a zero in constBytesPerAudioPacket because AAC audio packets vary in size. Codecs whose audio packets vary in duration set constLPCMFramesPerAudioPacket to zero.
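
The definitions above can be illustrated with a short sketch. The function names are hypothetical, and returning 0 to signal a variable packet duration is a convention chosen for this example only, not something the format defines.

```c
#include <assert.h>
#include <stdint.h>

/* Duration in seconds: LPCM frames divided by the audioSampleRate value. */
static double lpcm_duration_seconds(uint64_t lpcm_frames, double audio_sample_rate)
{
    assert(audio_sample_rate > 0.0);
    return (double)lpcm_frames / audio_sample_rate;
}

/* A const-prefixed field is meaningful only when it is nonzero. */
static int const_field_is_constant(uint32_t const_field)
{
    return const_field != 0;
}

/* Hypothetical helper: total LPCM frames covered by a run of audio packets.
   Returns 0 when constLPCMFramesPerAudioPacket is zero, because packet
   duration is then variable and must be found by other means. */
static uint64_t total_lpcm_frames(uint64_t packet_count,
                                  uint32_t const_lpcm_frames_per_audio_packet)
{
    if (!const_field_is_constant(const_lpcm_frames_per_audio_packet))
        return 0;
    return packet_count * (uint64_t)const_lpcm_frames_per_audio_packet;
}
```

For uncompressed audio, where an audio packet is one LPCM frame, the packet count and the LPCM frame count are the same number.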

LPCM flag values

The formatSpecificFlags field carries flags that describe the layout and formatting of audio streams, as defined by the Core Audio underpinnings of sound sample description v2. These flags are enumerated in the Apple QuickTime/CoreAudioFormat.h interface file and are interpreted more fully in the context of the AudioStreamBasicDescription data type. See the “Core Audio Framework Reference” in the OS X Developer Library.

enum
{
    kAudioFormatFlagIsFloat                  = (1 << 0),  // 0x1
    kAudioFormatFlagIsBigEndian              = (1 << 1),  // 0x2
    kAudioFormatFlagIsSignedInteger          = (1 << 2),  // 0x4
    kAudioFormatFlagIsPacked                 = (1 << 3),  // 0x8
    kAudioFormatFlagIsAlignedHigh            = (1 << 4),  // 0x10
    kAudioFormatFlagIsNonInterleaved         = (1 << 5),  // 0x20
    kAudioFormatFlagIsNonMixable             = (1 << 6),  // 0x40
    kAudioFormatFlagsAreAllClear             = (1U << 31), // 0x80000000; unsigned shift avoids signed overflow
    
    kLinearPCMFormatFlagIsFloat              = kAudioFormatFlagIsFloat,
    kLinearPCMFormatFlagIsBigEndian          = kAudioFormatFlagIsBigEndian,
    kLinearPCMFormatFlagIsSignedInteger      = kAudioFormatFlagIsSignedInteger,
    kLinearPCMFormatFlagIsPacked             = kAudioFormatFlagIsPacked,
    kLinearPCMFormatFlagIsAlignedHigh        = kAudioFormatFlagIsAlignedHigh,
    kLinearPCMFormatFlagIsNonInterleaved     = kAudioFormatFlagIsNonInterleaved,
    kLinearPCMFormatFlagIsNonMixable         = kAudioFormatFlagIsNonMixable,
    kLinearPCMFormatFlagsSampleFractionShift = 7,
    kLinearPCMFormatFlagsSampleFractionMask  = (0x3F << kLinearPCMFormatFlagsSampleFractionShift),
    kLinearPCMFormatFlagsAreAllClear         = kAudioFormatFlagsAreAllClear,
    
    kAppleLosslessFormatFlag_16BitSourceData = 1,
    kAppleLosslessFormatFlag_20BitSourceData = 2,
    kAppleLosslessFormatFlag_24BitSourceData = 3,
    kAppleLosslessFormatFlag_32BitSourceData = 4
};
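
As a rough sketch of how these flags combine, the following code builds a formatSpecificFlags value for packed, big-endian, signed-integer samples and reads or writes the sample fraction bits. The constant values are copied from the listing above; the helper functions are hypothetical, and treating the sample fraction as the number of fractional bits in a fixed-point sample is an assumption drawn from Core Audio usage rather than from this section.

```c
#include <assert.h>
#include <stdint.h>

/* Subset of the flag values from the listing above. */
enum {
    kLinearPCMFormatFlagIsBigEndian          = (1 << 1),
    kLinearPCMFormatFlagIsSignedInteger      = (1 << 2),
    kLinearPCMFormatFlagIsPacked             = (1 << 3),
    kLinearPCMFormatFlagsSampleFractionShift = 7,
    kLinearPCMFormatFlagsSampleFractionMask  =
        (0x3F << kLinearPCMFormatFlagsSampleFractionShift)
};

/* Hypothetical helper: flags for packed, big-endian, signed-integer samples. */
static uint32_t packed_big_endian_int_flags(void)
{
    return kLinearPCMFormatFlagIsBigEndian |
           kLinearPCMFormatFlagIsSignedInteger |
           kLinearPCMFormatFlagIsPacked;
}

/* Hypothetical helper: replace the 6-bit sample fraction field in flags. */
static uint32_t set_sample_fraction(uint32_t flags, uint32_t fraction_bits)
{
    assert(fraction_bits <= 0x3F);
    flags &= ~(uint32_t)kLinearPCMFormatFlagsSampleFractionMask;
    return flags | (fraction_bits << kLinearPCMFormatFlagsSampleFractionShift);
}

/* Hypothetical helper: extract the 6-bit sample fraction field from flags. */
static uint32_t get_sample_fraction(uint32_t flags)
{
    return (flags & (uint32_t)kLinearPCMFormatFlagsSampleFractionMask)
           >> kLinearPCMFormatFlagsSampleFractionShift;
}
```

Masking before shifting keeps the fraction field separate from the single-bit flags in bits 0 through 6, so the two can be set independently.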

Topics

Data fields

See Also

Using sample descriptions