Sound sample description version 2 ('stsd')
An updated sound sample description that adds high resolution audio.
Overview
QuickTime 7 introduced version 2 of the sound sample description, which expands the sound sample description structure again to support high-resolution audio. In QuickTime 7, the sound and audio facilities are based on the Core Audio framework, and the Sound Manager has been deprecated. In this version of the sound sample description, the format field is set to ‘lpcm’ for uncompressed data. For compressed data formats, the format field is set to the compression type code (normally ‘mp4a’), and the compression specifics and other features of QuickTime 7 are supplied by extensions.
The version field is set to 2 for this version of the sound sample description structure.
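As a rough illustration of the two checks described above, the following C sketch packs a four-character code into a 32-bit value and tests whether a description is version 2 with the ‘lpcm’ format. The `FOURCC` macro and the function `is_v2_uncompressed` are hypothetical helpers, not part of any QuickTime header, and the sketch assumes the version and format fields have already been read from the file and converted to host byte order.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helper: pack four characters into one 32-bit code,
   first character in the most significant byte. */
#define FOURCC(a, b, c, d) \
    (((uint32_t)(uint8_t)(a) << 24) | ((uint32_t)(uint8_t)(b) << 16) | \
     ((uint32_t)(uint8_t)(c) << 8)  |  (uint32_t)(uint8_t)(d))

/* True when the already-parsed version and format fields describe
   uncompressed audio in a version 2 sound sample description. */
static bool is_v2_uncompressed(uint16_t version, uint32_t format)
{
    return version == 2 && format == FOURCC('l', 'p', 'c', 'm');
}
```

A compressed track would typically fail this test because its format field carries the compression type code (for example ‘mp4a’) instead of ‘lpcm’.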
The sound sample description v2 structure appends new fields to the v1 structure and renames the four fields added in v1 to help ensure backward compatibility with older applications.
Some definitions for sound sample description version 2:
LPCM Frame: one uncompressed sample in each of the channels (for instance, 44100 Hz audio has 44100 LPCM frames per second, whether it is mono, stereo, 5.1, or other possible values). In other words, LPCM frames divided by the audioSampleRate value is duration in seconds.
Audio Packet: For compressed audio, an audio packet is the natural compressed access unit of that format. For uncompressed audio, an audio packet is simply one LPCM frame.
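The relationship between LPCM frames and duration can be sketched in C as follows. The function name `lpcm_frames_to_seconds` is hypothetical, and returning 0.0 for a non-positive sample rate is an assumption made here to avoid dividing by zero; the definitions above do not cover that case.

```c
#include <stdint.h>

/* Duration in seconds = number of LPCM frames / audioSampleRate.
   Assumption: a non-positive rate yields 0.0 rather than a division by zero. */
static double lpcm_frames_to_seconds(uint64_t lpcmFrames, double audioSampleRate)
{
    if (audioSampleRate <= 0.0) {
        return 0.0;
    }
    return (double)lpcmFrames / audioSampleRate;
}
```

For example, 88200 LPCM frames at an audioSampleRate of 44100 come to 2 seconds, whether the track is mono, stereo, or 5.1, because the channel count does not enter the calculation.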
Fields prefixed by const: Note the three sound sample description v2 fields whose names start with const. These fields are only nonzero if the value is a constant. A zero in each field implies that the value is variable. For example: AAC audio would have a zero in constBytesPerAudioPacket because AAC has variable-sized audio packets. Codecs with variable-duration audio packets set a zero in constLPCMFramesPerAudioPacket.
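The zero-means-variable convention could be tested along these lines. This is a minimal sketch: the `ConstFieldsSketch` struct is a hypothetical in-memory holder for two of the const-prefixed fields, not the on-disk layout of the sample description, and the field widths are assumed for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical holder for two const-prefixed fields (not the file layout). */
typedef struct {
    uint32_t constBytesPerAudioPacket;      /* 0 means packet size varies */
    uint32_t constLPCMFramesPerAudioPacket; /* 0 means packet duration varies */
} ConstFieldsSketch;

/* A nonzero value is a constant that applies to every audio packet. */
static bool has_constant_packet_size(const ConstFieldsSketch *d)
{
    return d->constBytesPerAudioPacket != 0;
}

static bool has_constant_packet_duration(const ConstFieldsSketch *d)
{
    return d->constLPCMFramesPerAudioPacket != 0;
}
```

A reader that sees a zero in either field would need to obtain the per-packet value from elsewhere in the media data rather than from the sample description.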
LPCM flag values
The formatSpecificFlags field carries flags that describe the layout and formatting of audio streams, as defined in the Core Audio underpinnings of sound sample description v2. These are enumerated in the Apple QuickTime/CoreAudioFormat.h interface file and are more fully interpreted in the context of the AudioStreamBasicDescription data type. See the Core Audio Framework Reference in the OS X Developer Library.
enum
{
kAudioFormatFlagIsFloat = (1 << 0), // 0x1
kAudioFormatFlagIsBigEndian = (1 << 1), // 0x2
kAudioFormatFlagIsSignedInteger = (1 << 2), // 0x4
kAudioFormatFlagIsPacked = (1 << 3), // 0x8
kAudioFormatFlagIsAlignedHigh = (1 << 4), // 0x10
kAudioFormatFlagIsNonInterleaved = (1 << 5), // 0x20
kAudioFormatFlagIsNonMixable = (1 << 6), // 0x40
kAudioFormatFlagsAreAllClear = (1 << 31), // 0x80000000
kLinearPCMFormatFlagIsFloat = kAudioFormatFlagIsFloat,
kLinearPCMFormatFlagIsBigEndian = kAudioFormatFlagIsBigEndian,
kLinearPCMFormatFlagIsSignedInteger = kAudioFormatFlagIsSignedInteger,
kLinearPCMFormatFlagIsPacked = kAudioFormatFlagIsPacked,
kLinearPCMFormatFlagIsAlignedHigh = kAudioFormatFlagIsAlignedHigh,
kLinearPCMFormatFlagIsNonInterleaved = kAudioFormatFlagIsNonInterleaved,
kLinearPCMFormatFlagIsNonMixable = kAudioFormatFlagIsNonMixable,
kLinearPCMFormatFlagsSampleFractionShift = 7,
kLinearPCMFormatFlagsSampleFractionMask = (0x3F << kLinearPCMFormatFlagsSampleFractionShift),
kLinearPCMFormatFlagsAreAllClear = kAudioFormatFlagsAreAllClear,
kAppleLosslessFormatFlag_16BitSourceData = 1,
kAppleLosslessFormatFlag_20BitSourceData = 2,
kAppleLosslessFormatFlag_24BitSourceData = 3,
kAppleLosslessFormatFlag_32BitSourceData = 4
};
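The following C sketch shows one way a formatSpecificFlags value might be examined with these constants. To keep the example compilable on its own, it redefines a few of the values under hypothetical `kSketch...` names instead of including the Core Audio header. The helper functions are likewise hypothetical and are not part of any Apple API.

```c
#include <stdint.h>
#include <stdbool.h>

/* Values copied from the enumeration above under hypothetical names. */
enum {
    kSketchFlagIsFloat         = (1 << 0),
    kSketchFlagIsBigEndian     = (1 << 1),
    kSketchFlagIsSignedInteger = (1 << 2),
    kSketchFractionShift       = 7,
    kSketchFractionMask        = (0x3F << kSketchFractionShift)
};

/* True when every bit of `flag` is set in formatSpecificFlags. */
static bool flag_is_set(uint32_t formatSpecificFlags, uint32_t flag)
{
    return (formatSpecificFlags & flag) == flag;
}

/* Extract the 6-bit sample fraction value stored at bits 7 through 12. */
static uint32_t sample_fraction(uint32_t formatSpecificFlags)
{
    return (formatSpecificFlags & (uint32_t)kSketchFractionMask)
           >> kSketchFractionShift;
}
```

With this sketch, a flags value of `kSketchFlagIsBigEndian | kSketchFlagIsSignedInteger` would report big-endian signed integer samples with a sample fraction of zero.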