Providing metadata for xHE-AAC video soundtracks
Ensure volume normalization by including metadata for loudness and dynamic range control.
Overview
Soundtracks that use xHE-AAC (Extended High-Efficiency Advanced Audio Codec) encoding include MPEG-D DRC metadata for loudness and dynamic range control (DRC). When you create video soundtracks with xHE-AAC, provide at least the following metadata to ensure consistent results across different services. For playback, set up the MPEG-D DRC tool at the decoder by following the guidelines below.
Configure metadata for content generation
The loudness and DRC metadata that you include in video content must fulfill the MPEG-D DRC requirements for the Basic DRC Metadata Profile, and must always include the following values:
Loudness Metadata
loudness info fields | value |
|---|---|
Include a | Measure anchor loudness using speech-gating or estimate when speech activity is low. |
| True peak according to ITU-R BS.1770 or sample peak level. |
Measure anchor loudness of the dialog stem using the ITU-R BS.1770 standard because methodValue must reflect the actual anchor loudness of the content. Apply speech-gating to the full mix to obtain the anchor loudness value when only the full mix is available for measurement.
Anchor loudness can be inaccurate when the speech detector finds little speech in the full mix. Monitor this situation by computing the speech activity, which is the duration of detected speech divided by the duration of the content. When speech activity is low, ignore the speech-gated measurement. Instead, estimate the anchor loudness value from the program loudness value and other applicable measurements, using a model built from statistics of a variety of content. See Adjusting anchor loudness for additional information.
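The speech activity check described above can be sketched as follows. This is a minimal illustration, not part of any encoder API: the function names and the 0.1 threshold are assumptions, because the text doesn't state what counts as low speech activity.

```python
# Hypothetical threshold: the guidance doesn't define "low" speech activity.
LOW_SPEECH_ACTIVITY = 0.1


def speech_activity(speech_seconds: float, content_seconds: float) -> float:
    """Return the duration of detected speech divided by the content duration."""
    if content_seconds <= 0:
        raise ValueError("content duration must be positive")
    return speech_seconds / content_seconds


def speech_gated_value_is_usable(
    speech_seconds: float,
    content_seconds: float,
    threshold: float = LOW_SPEECH_ACTIVITY,
) -> bool:
    """Return True when enough speech was detected to trust the
    speech-gated anchor loudness measurement."""
    return speech_activity(speech_seconds, content_seconds) >= threshold
```

When the check returns False, the speech-gated value is set aside and the anchor loudness is estimated from program loudness and other measurements instead. That estimation model isn't shown here because the text doesn't specify it.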
DRC Metadata
| Required minimum level that supports playback with minimal peak limiter engagement (LKFS) | Position of bit in |
|---|---|---|
| -24 | 1 |
| -16 | 2 |
| -16 | 3 |
| -24 | 6 |
Match the output anchor loudness of the DRC-processed versions as closely as possible to the anchor loudness of the unprocessed output.
The DRC for General Compression can have several instances to accommodate various target loudness values, each providing just enough compression to reach its target without engaging a limiter. If no compression is necessary or desired for specific loudness targets, include a corresponding DRC for General Compression that doesn’t have a compression effect.
Configure metadata for playback
Configure the MPEG-D DRC decoder for playback according to the specifications below. The configuration occurs completely or partially at the system level and those settings don’t appear at the API level.
Loudness Metadata
Set up the MPEG-D DRC decoder to assign the highest priority to the following loudness metadata:
Metadata field | Value (highest priority) |
|---|---|
|
|
|
|
If you don’t specify Anchor Loudness, the methodDefinition field defaults to Program Loudness, if present. This configuration deviates from the default configuration specified in ISO/IEC 23003-4, which selects Program Loudness and ITU-R BS.1770 with highest priority. However, the standard specifies an interface to customize the configuration, including the loudness metadata priority.
Some previously deployed implementations may use the default ISO/IEC 23003-4 configuration and may not support the interface for customization. These systems may select loudness metadata with a methodDefinition value of Program Loudness, if present, in addition to other loudness metadata. This can result in a deviation of the output loudness of the same content from systems that select Anchor Loudness, if present, in addition to Program Loudness.
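The effect of the two priority orders can be illustrated with a simplified sketch. It models only the methodDefinition priority and ignores other selection criteria such as the measurement system. The function names and the loudness values are made up for illustration and don't correspond to a decoder API.

```python
from typing import Dict, Optional, Tuple

ANCHOR = "Anchor Loudness"
PROGRAM = "Program Loudness"


def select_loudness(
    entries: Dict[str, float], priority: Tuple[str, ...]
) -> Optional[float]:
    """Return the loudness value (LKFS) of the first method in the
    priority order that the content provides, or None if none is present."""
    for method in priority:
        if method in entries:
            return entries[method]
    return None


def normalization_gain_db(target_lkfs: float, content_lkfs: float) -> float:
    """Gain that moves the selected content loudness to the target loudness."""
    return target_lkfs - content_lkfs


# Made-up content that carries both loudness values.
content = {ANCHOR: -27.0, PROGRAM: -23.0}

recommended = select_loudness(content, (ANCHOR, PROGRAM))  # Anchor first
legacy = select_loudness(content, (PROGRAM, ANCHOR))       # Program first
```

With these example values and a target of -16 LKFS, the first priority order applies 11 dB of gain and the second applies 7 dB, so the same content plays back 4 dB apart on the two systems.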
The following table (ANSI/CTA-2075) provides recommended target loudness value settings of the DRC tool to control the integrated loudness at the output:
| Transducer SPL range | Maximum SPL (dBA) | Target loudness (LKFS) |
|---|---|---|
| small | below 75 | -16 |
| medium | between 70 and 90 | -24 |
| large | above 85 | -31 |
| | NA | -24 |
To achieve sufficient output SPL, set the target loudness value according to the SPL range of the active transducer, which falls into one of three categories (small, medium, large). Choose the SPL range category by measuring the maximum SPL of the transducer at the anticipated listener location using pink noise at -24 LKFS. Assign the category according to the middle column, as ANSI/CTA-2075 Annex G describes. For example, micro loudspeakers in portable devices typically fall into the small SPL range category.
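A lookup for the recommended target loudness might look like the following sketch. It assumes the first three table rows correspond, in order, to the small, medium, and large categories, and that the NA row applies when no category is available. The function and constant names are illustrative only.

```python
from typing import Optional

# Assumption: table rows map, in order, to small, medium, and large.
TARGET_LOUDNESS_LKFS = {
    "small": -16,
    "medium": -24,
    "large": -31,
}

# Assumption: the NA row is used when the SPL range category isn't known.
FALLBACK_TARGET_LKFS = -24


def target_loudness(spl_range: Optional[str]) -> int:
    """Return the recommended target loudness (LKFS) for an SPL range category."""
    return TARGET_LOUDNESS_LKFS.get(spl_range, FALLBACK_TARGET_LKFS)
```

For example, `target_loudness("small")` gives the louder -16 LKFS target, which suits transducers with limited output such as micro loudspeakers in portable devices.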
DRC Metadata
The following table specifies the appropriate DRC requests for different listening environments and transducer SPL ranges (ANSI/CTA-2075):
Environment | Transducer SPL range | DRC request |
|---|---|---|
| small |
|
|
|
|
| all |
|
Request general for DRC when you want loudness normalization unless a different DRC request is applicable for the playback scenario. This applies appropriate compression to reach the target loudness, such as when applying gain during normalization.
User preferences can override DRC settings. The following table provides examples of two preferences and the conditions, transducer SPL range, and environment under which to apply these preferences:
User preference | Environment | Transducer SPL range | DRC request |
|---|---|---|---|
| all | all |
|
|
|
|
|
See Also
Specifications and other documents
HTTP Live Streaming (HLS) authoring specification for Apple devices
Using content protection systems with HLS
About the Common Media Application Format with HTTP Live Streaming (HLS)
Enabling Low-Latency HTTP Live Streaming (HLS)
Links to additional specifications and videos
Videos about HLS
Adjusting anchor loudness
Providing JavaScript Object Notation (JSON) chapters