Language code values
Indicate the language associated with a particular object using language codes.
Overview
Some elements of a QuickTime file may be associated with a particular spoken language. To indicate the language associated with a particular object, the QuickTime file format uses either language codes from the Macintosh Script Manager or ISO language codes (as specified in ISO 639-2/T).
QuickTime stores language codes as unsigned 16-bit fields. All Macintosh language codes have a value less than 0x400, except for the single value 0x7FFF, which indicates an unspecified language. ISO language codes are three-character codes stored in the 16-bit language code field as packed arrays, as described in ISO language codes. Treated as an unsigned 16-bit integer, a packed ISO language code always has a value of 0x400 or greater; the value 0x7FFF is reserved for the unspecified Macintosh language code.
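The value ranges described above are enough to tell the two kinds of code apart. The following is a minimal sketch of that check, not part of any QuickTime API; the function name `classify_language_code` and the constant names are hypothetical.

```python
# Hypothetical names; the thresholds come from the ranges described in the text.
UNSPECIFIED = 0x7FFF    # single value meaning "unspecified language"
ISO_THRESHOLD = 0x400   # Macintosh codes are below this, packed ISO codes at or above it


def classify_language_code(code: int) -> str:
    """Return 'unspecified', 'macintosh', or 'iso' for a 16-bit language field.

    This sketch only looks at the numeric range. It does not check that an
    'iso' value actually unpacks to three lowercase letters.
    """
    if not 0 <= code <= 0xFFFF:
        raise ValueError("language code must fit in an unsigned 16-bit field")
    if code == UNSPECIFIED:
        return "unspecified"
    if code < ISO_THRESHOLD:
        return "macintosh"
    return "iso"
```

The 0x7FFF test comes first because that value is numerically greater than 0x400 and would otherwise be classified as a packed ISO code.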
If the language is specified using a Macintosh language code, any associated text uses Macintosh text encoding.
If the language is specified using an ISO language code, any associated text uses Unicode text encoding. The text is UTF-8 unless it starts with a byte-order mark (BOM, 0xFEFF), in which case it is UTF-16. Both the BOM and the UTF-16 text should be big-endian.
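The Unicode rule above can be sketched as a small decoder. This is an illustrative helper under the stated rule only, and the name `decode_unicode_text` is hypothetical. It covers text associated with an ISO language code; text associated with a Macintosh language code needs a Macintosh text encoding and is not handled here.

```python
import codecs


def decode_unicode_text(data: bytes) -> str:
    """Decode text associated with an ISO language code (hypothetical helper).

    A leading big-endian BOM (bytes 0xFE 0xFF) marks UTF-16;
    anything else is treated as UTF-8.
    """
    if data.startswith(codecs.BOM_UTF16_BE):
        # Drop the 2-byte BOM and decode the rest as big-endian UTF-16.
        return data[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    return data.decode("utf-8")
```

The BOM test is unambiguous because the bytes 0xFE and 0xFF never occur in valid UTF-8.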
Macintosh language codes
The following table lists QuickTime language code values.
| Language | Value |
|---|---|
| English | |
| French | |
| German | |
| Italian | |
| Dutch | |
| Swedish | |
| Spanish | |
| Danish | |
| Portuguese | |
| Norwegian | |
| Hebrew | |
| Japanese | |
| Arabic | |
| Finnish | |
| Greek | |
| Icelandic | |
| Maltese | |
| Turkish | |
| Croatian | |
| Traditional Chinese | |
| Urdu | |
| Hindi | |
| Thai | |
| Korean | |
| Lithuanian | |
| Polish | |
| Hungarian | |
| Estonian | |
| Lettish | |
| Latvian | |
| Saami | |
| Sami | |
| Faroese | |
| Farsi | |
| Russian | |
| Simplified Chinese | |
| Flemish | |
| Irish | |
| Albanian | |
| Romanian | |
| Czech | |
| Slovak | |
| Slovenian | |
| Yiddish | |
| Serbian | |
| Macedonian | |
| Bulgarian | |
| Ukrainian | |
| Belarusian | |
| Uzbek | |
| Kazakh | |
| Azerbaijani | |
| AzerbaijanAr | |
| Armenian | |
| Georgian | |
| Moldavian | |
| Kirghiz | |
| Tajiki | |
| Turkmen | |
| Mongolian | |
| MongolianCyr | |
| Pashto | |
| Kurdish | |
| Kashmiri | |
| Sindhi | |
| Tibetan | |
| Nepali | |
| Sanskrit | |
| Marathi | |
| Bengali | |
| Assamese | |
| Gujarati | |
| Punjabi | |
| Oriya | |
| Malayalam | |
| Kannada | |
| Tamil | |
| Telugu | |
| Sinhala | |
| Burmese | |
| Khmer | |
| Lao | |
| Vietnamese | |
| Indonesian | |
| Tagalog | |
| MalayRoman | |
| MalayArabic | |
| Amharic | |
| Galla | |
| Oromo | |
| Somali | |
| Swahili | |
| Kinyarwanda | |
| Rundi | |
| Nyanja | |
| Malagasy | |
| Esperanto | |
| Welsh | |
| Basque | |
| Catalan | |
| Latin | |
| Quechua | |
| Guarani | |
| Aymara | |
| Tatar | |
| Uighur | |
| Dzongkha | |
| JavaneseRom | |
| Unspecified | 0x7FFF |
ISO language codes
Because the language codes specified by ISO 639-2/T are three characters long, they must be packed to fit into a 16-bit field. The packing algorithm must map each of the three characters, which are always lowercase, into a 5-bit integer and then concatenate these integers into the least significant 15 bits of a 16-bit integer, leaving the 16-bit integer’s most significant bit set to zero.
One algorithm for performing this packing is to treat each ISO character as a 16-bit integer. Subtract 0x60 from the first character and multiply by 2^10 (0x400), subtract 0x60 from the second character and multiply by 2^5 (0x20), subtract 0x60 from the third character, and add the three 16-bit values. This will result in a single 16-bit value with the three codes correctly packed into the 15 least significant bits and the most significant bit set to zero.
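The packing algorithm described above can be sketched as follows. The function name `pack_iso_language` is hypothetical, and the input validation is an added assumption: the text only states that the characters are always lowercase.

```python
def pack_iso_language(code: str) -> int:
    """Pack a three-letter lowercase ISO 639-2/T code into a 16-bit value."""
    # Assumption: reject anything other than three ASCII lowercase letters.
    if len(code) != 3 or not all("a" <= c <= "z" for c in code):
        raise ValueError("expected three lowercase ASCII letters")
    # Subtract 0x60 so that 'a'..'z' map to the 5-bit values 1..26.
    first, second, third = (ord(c) - 0x60 for c in code)
    # Shifting left by 10 and 5 bits is the same as multiplying by 0x400 and 0x20.
    return (first << 10) | (second << 5) | third
```

For example, `pack_iso_language("jpn")` returns `0x2A0E`. Because each 5-bit value is at most 26, the result never exceeds 15 bits, so the most significant bit of the 16-bit field stays zero.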
Example: The ISO language code 'jpn' consists of the three hexadecimal values 0x6A, 0x70, 0x6E. Subtracting 0x60 from each value yields the values 0xA, 0x10, 0xE, as shown in the following table.
| Character | UTF-8 code | 5-bit value | Shifted value |
|---|---|---|---|
| j | 0x6A | 0xA | 0x2800 |
| p | 0x70 | 0x10 | 0x200 |
| n | 0x6E | 0xE | 0xE |
The first value is shifted 10 bits to the left (multiplied by 0x400) and the second value is shifted 5 bits to the left (multiplied by 0x20). This yields the values 0x2800, 0x200, 0xE. When added, this results in the 16-bit packed language code value of 0x2A0E.
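Reading a packed value back is the same arithmetic in reverse. The text does not describe unpacking, so the following is only a sketch of the inverse operation; the function name `unpack_iso_language` and its range checks are assumptions.

```python
def unpack_iso_language(value: int) -> str:
    """Recover the three-letter ISO code from a packed 16-bit value."""
    # Assumption: packed ISO codes lie in 0x400..0x7FFE; 0x7FFF means "unspecified".
    if not 0x400 <= value < 0x7FFF:
        raise ValueError("not a packed ISO language code")
    letters = []
    for shift in (10, 5, 0):
        five_bit = (value >> shift) & 0x1F   # isolate one 5-bit group
        if not 1 <= five_bit <= 26:          # only 'a'..'z' are valid
            raise ValueError("packed value does not map to lowercase letters")
        letters.append(chr(five_bit + 0x60))  # add back the 0x60 offset
    return "".join(letters)
```

Applied to the worked example, `unpack_iso_language(0x2A0E)` returns `'jpn'`.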