---
title: WWDC2003 Session 405
framework: wwdc
role: article
path: wwdc/wwdc2003-405
---

# WWDC2003 Session 405

## Transcript

Hello. I thought that as an introduction to James's session, where he's going to talk about the details of several of our APIs, including the Audio File APIs and the Audio Converter APIs, I'd just show a quick demo program I wrote, which shows you how this looks from the outside. This doesn't use QuickTime; it's using our native audio APIs to read and convert audio files.

Here I've got a few files I brought with me. This is a simple AIFF stereo 16-bit file, and I can listen to it. That's just using the Audio File API to recognize the file, read the samples out of it, and pump them through an audio output unit. There's very little actual manipulation of audio data in this program; it's all done with our APIs.

Another thing I can do here is simple sample rate conversion. These are the same APIs that QuickTime is built on top of in its upcoming version. This program, by the way, is going to be available as sample code. We can also see some of the things the Audio File API provides in Panther: here we can see all the file formats that are supported. I'm just going to keep this as AIFF. If I choose AIFF here, I could make it a 24-bit AIFF, but that doesn't make sense because the source is 16 bits, so I'll just say "same as the source." I'll downsample it to 22 kHz and hit Convert. This is going through our sample rate converter, which is built into the Audio Converter API. It's very heavily AltiVec-optimized code, as are all of our integer-to-float conversions in Panther. Those are good reasons for using our APIs for your integer-to-float conversions: they're heavily optimized for both G4 and G3, and soon for G5 as well. And here's my converted file.

Another thing you can do with the audio converter is use it to encode and decode formats such as AAC. I've got an example of a six-channel AIFF file, which one of my co-workers authored in Logic a
year ago. Here it is in AIFF format, and I'll leave it playing while I convert it to AAC. I'll turn the volume down so I can keep talking. So I can say I want to convert it to AAC and leave it at the same sample rate. There are some more parameters you can set on the conversion here, such as the bit rate and the quality, and you can do channel remapping. With multichannel AAC, for example, depending on the channel layout, it can save bits by putting a filter on the sub or LFE channel. This example program doesn't show that right now, but it is an option on the converter.

While it keeps playing, I'll turn the volume back up. This is a pretty substantial file: it's forty megabytes as an AIFF, and we're encoding it to a six-channel AAC file in a little less than real time. It's almost done. If we go to the Finder, we'll see that it's now a 3.1-megabyte file, and it's still in surround. This is kind of cool: scrubbing around inside the AAC file, it's decoding multiple packets, and it's really nice and efficient. By the way, in our tests our AAC decoder is about three times as efficient as MP3 decoding, so if you're considering putting sounds in games, you might take a closer look at using AAC encoding instead of MP3.

OK, so that's the simple program, and I'll turn it back over to James to explain the APIs that are underneath it. Thank you.

OK, back to slides, please. There we go. What I'm going to talk about is how to handle audio formats with Core Audio. First I'll review the basics of how formats are represented in Core Audio. Then I'll talk about the Audio Converter API and the Audio Format API, which is new in Panther. I'll talk about some new features in audio units for supporting multichannel and surround, and I'll talk about the new AudioFileGetGlobalInfo API and the new matrix mixer audio unit.

One thing that's been a source of some confusion in Core Audio is what is
the definition of frames, packets, bytes, and samples, and when you're dealing with compressed formats it gets pretty important to take them all into account. This graphic shows what a five-channel interleaved 24-bit stream looks like. You can see that one sample is three bytes, and five channels are interleaved into one frame. For linear PCM, one frame equals one packet in the way we count things in the Core Audio APIs.

The way you describe a format is by specifying an AudioStreamBasicDescription, which is a structure that's used throughout the Core Audio APIs. It has the sample rate for the stream and the format ID, which tells you whether it's a PCM stream or some kind of compressed format; the format ID is a four-character code for the format. Then there are flags that are specific to that format, some fields that tell you the relationships between the bytes, the packets, and the frames in that format, the number of channels in that format, and how many bits per channel each sample is in that format.

Here's an example of how to fill one of these out for the five-channel 24-bit interleaved stream from the first graphic. The format ID is linear PCM. The flags are set to big-endian, signed integer, packed. It's one frame per packet; all linear PCM is one frame per packet. For this interleaved stream there are 15 bytes per packet, 15 bytes per frame, 5 channels per frame, and 24 bits per channel.

This shows an example of a non-interleaved stream. There are two buffers, and each of them holds floating-point samples, four bytes each. On the left you see names like mBuffers[0].mData; those are fields in the AudioBufferList structure, which I'll show in a minute. Each buffer holds the samples from one channel of the data, so this non-interleaved stream has two channels, and so there are two buffers. And here's the AudioStreamBasicDescription for that. One difference
you'll see here is that the non-interleaved format flag is set. Another difference is that the bytes per packet and bytes per frame describe one of those buffers. So even though there are two channels, it's not two times four bytes; it's just four bytes per packet, because the channels are split into two buffers.

Now, when you get into compressed formats, the simplest kind is a constant bit rate format: there's a constant number of bytes per packet, and the number of frames per packet depends on the compressed format you're dealing with. The other kind is variable bit rate data, where the number of bytes per packet can vary, and in the AudioStreamBasicDescription, mBytesPerPacket is just set to zero because it's not a constant value.

When you're dealing with formats like AAC that don't have information in the bitstream about where the packet boundaries are, you need an external piece of data to tell you where those boundaries are. We use the AudioStreamPacketDescription for that: it tells you the starting byte offset of a packet and the length in bytes of that packet. When you're passing around AAC data, you have to pass around an array of these packet descriptions to tell you where the packet boundaries are.

Audio data is stored in an AudioBufferList throughout the Core Audio APIs: in audio units, in the audio converter, and in the HAL. The AudioBufferList tells you the number of buffers, and then there's an array of AudioBuffer structures. Each buffer tells you how many interleaved channels there are in that buffer and the size of the buffer, and then there's a pointer to the buffer's data.

All the structures I'm talking about, and some I'll talk about later, are defined in CoreAudioTypes.h. They're used everywhere in Core Audio, to maintain consistency throughout Core Audio, and there are
public utility classes shipped with the SDK that provide common operations on these structures, make it easier to fill them out, and do the things you commonly need to do with them.

Now I'm going to talk about how frames and packets are used by the various Core Audio APIs. Audio units express their buffer sizes in frames. By default the format is 32-bit float, non-interleaved. People have sort of gotten the impression that that's the only format you can use with audio units, but since there's a stream description that you can set on the inputs and outputs of an audio unit, you can actually put any kind of audio data through audio units.

The HAL counts time in frames, so when you get the IOProc callback, the sample time is provided there in frames. In PCM mode, the HAL expresses its buffer size in frames, the buffer size is set in frames, and unless you tell it otherwise, the format will be 32-bit float. In non-linear-PCM mode, you're restricted to dealing with the data one packet at a time, and the buffer frame size range is restricted to the number of frames per packet.

For the Audio File API there are two pairs of calls: AudioFileReadBytes and AudioFileWriteBytes deal in bytes, and AudioFileReadPackets and AudioFileWritePackets deal in packets. Since frames equal packets for PCM, you can use AudioFileReadPackets for PCM, and it will return PCM frames. If you're dealing with compressed data, AudioFileReadPackets will return some number of packets of that compressed data.

So now that you can describe different formats, how do you convert from one format to another? For that we have the audio converter. It can do floating-point to integer conversion, various bit depths, sample rate conversion, interleaving, deinterleaving, and channel reordering, and new in Panther, it can convert between PCM and compressed formats for the codecs that are installed in the system. In order to create an audio converter, use
AudioConverterNew. You give it an input and an output format, and it returns an AudioConverterRef, which is your audio converter object. To call it, you fill out two AudioStreamBasicDescriptions; you can use CAStreamBasicDescription, the SDK class, to help you do that. In this example I'm just showing how to create a decoder: I call AudioConverterNew, and I get my decoder instance out of that.

After you've created your audio converter, you want to convert audio with it, so you call AudioConverterFillComplexBuffer. That call takes an audio converter instance; a pointer to an input procedure, which is the data source for getting data into the audio converter; and a user data field for storing your instance data. Then there's ioOutputDataPacketSize, which is the number of packets you want to get out of the audio converter; on return, it's the number of packets you actually got out. That number could be less than what you requested if you're at the end of the stream or there was an error. Then you pass in the AudioBufferList that you want the audio converter to write your converted data into, and then a pointer to an array of packet descriptions. If you're converting to AAC and you're asking for some number of packets of AAC, you need to pass in an array of packet descriptions so you'll know where the packet boundaries are in the data you get back.

To call AudioConverterFillComplexBuffer, you need to prepare an output buffer list. If it's interleaved data, there'll be one buffer in the buffer list's array, and if it's non-interleaved, you'll need multiple mono buffers. The mData pointer points to the buffer that will have the audio data written into it, and mDataByteSize tells the size of that buffer. If you ask for more packets than will fit into that size, it will
get truncated down to the space that you provided.

So this shows the call to AudioConverterFillComplexBuffer, basically just passing the arguments: the decoder, the input procedure pointer, the user data, the number of packets I'm requesting, which is 8192 in this case, and a buffer list that I've filled out. I'm passing NULL here for the packet descriptions because I'm decoding, so I'm getting PCM out and don't need packet descriptions in this case.

When you use an audio converter, you need to write an input procedure, and the input procedure implements the source end of the demand-driven model. When you ask the audio converter for converted data, it calls your input proc to get the input-side data. It's demand-driven because if you're doing sample rate conversion, it may be pulling data in at a different rate than you're getting it out, and there's also some internal buffering that can happen for compression or sample rate conversion. So the pulling of input needs to be decoupled from the pulling of output, and that's what the input proc implements.

Inside your input proc, your job is to provide data to the audio converter. You don't have to copy data into the audio converter's buffers; you just give the audio converter pointers to your data. It passes you a buffer list, and you fill out that buffer list with pointers to the data that it should convert. The audio converter in general goes out of its way to eliminate copying: if it can, it will convert directly from your input buffer into your output buffer without buffering internally, although in certain cases it can't do that. So in the input proc you provide pointers to your data; you don't copy it. And in your input proc, when
you pass data to the audio converter, you have to keep that data valid until the next time your input proc is called, and that might be across calls to AudioConverterFillComplexBuffer. So you may call AudioConverterFillComplexBuffer, it calls your input proc, which returns data to the audio converter, and then AudioConverterFillComplexBuffer exits and returns you data. You still have to keep that input data live until the next time you call AudioConverterFillComplexBuffer and it calls your input proc, because it's still looking at that data.

Your input proc gets passed the number of packets that the audio converter wants from you, but you're allowed to return either more or less data than it asked for. If you return less, it will call you again (the word "again" has been removed from the slide here), and if you return more, it will just ask you less frequently; it will ask you the next time it needs data.

So in the input proc, you get an audio converter instance, and there's ioNumberDataPackets: on input it's the number of data packets that have been requested, and on output you set it to the number of data packets that you're actually returning. It passes you a buffer list, which you need to fill out for the data. Then there's the AudioStreamPacketDescription: if you're returning AAC data, you need to set this pointer to an array of packet descriptions describing the packet boundaries of the data. And then there's the user data that's passed in, which is your own instance data, for use however you want in your input proc.

There are a couple of special conditions that you have to deal with in your input proc. One is when you reach the end of the stream and you're out of data. What you need to do is set the number of data packets you're returning to zero and return no error. Your input proc may be called several more times, and you just keep returning zero; that will signal to the
audio converter that you're indeed out of data, and it will flush its buffers.

Another situation you could be in: if you're doing real-time streaming, you may not be at the end of the stream, but you don't have any data available right now, so the audio converter needs to just return whatever it has converted. What you do in that case is return zero packets and return an error. That error gets propagated back to the caller, and any data that had been converted up to that point will be returned to the caller, but the audio converter will keep any unconverted data that it has internally until the next time AudioConverterFillComplexBuffer is called.

Here's an example of what an input proc might look like. In the first line of code, I'm just getting a pointer to my user data, which is my own data; I have a buffer list stored in it that points to my data. Then I have a loop that copies the pointers from my input buffer list into the audio converter's buffer list, so I'm just copying pointers, not data. Then I return the number of packets that were in my buffer. In this example I'm completely ignoring what the audio converter actually asked me for. That's generally not the best thing to do, but it will work; you would just get extra copying in that situation.

The next thing I'm going to talk about is the Audio Format API. It's a new API in Panther for asking questions about audio formats and what you can do with them. It provides operations for handling AudioStreamBasicDescriptions and audio channel layouts. Audio channel layouts, which I'll talk about, are a new structure in Panther that describes the channels present in an audio stream and what order they're in. You can also ask it what compressed formats are installed on the system. So for audio stream
basic description operations with the Audio Format API, you can get a format's name: you pass an AudioStreamBasicDescription to the Audio Format API, and it will generate a name, either a name for a compressed format or, if you pass it a linear PCM format, a CFString that tells you what that format is. You can also pass it a partially filled-out AudioStreamBasicDescription; that's mostly useful for constant bit rate formats, to find out what the bytes per packet is for IMA4, for example. You can ask whether a format is variable bit rate and whether it's externally framed, and you can ask what encoders and what decoders are installed on the system.

This is what the Audio Format API looks like. There are two calls. One is AudioFormatGetPropertyInfo, which you use to find out the size of a property. You pass in a property ID, which tells the Audio Format API what property you're asking about, and a specifier, which is an argument to the property, and it returns the size of the property you're asking for. Then there's the AudioFormatGetProperty call, which you use to actually get the value of the property.

Here's an example of using it. Here I'm finding out what encoders are installed in the system. I make an AudioFormatGetPropertyInfo call to find out the size of the array of encoders; it's going to return a list of OSTypes for these encode formats. Then I call AudioFormatGetProperty to actually get the array of format IDs. Then I enter a loop where I call the Audio Format API to give me a name for each format: I create a small AudioStreamBasicDescription, get a name for the format, and print it out. On my system, this is what it printed out.

A new structure in Panther is the audio channel layout. It describes the channel ordering of a stream. Now, the
AudioStreamBasicDescription tells you the number of channels in the stream, but it doesn't really tell you what they are. If you have one channel or two channels, you can pretty much guess that's mono or stereo, but if you have five channels, it could be one of several orderings of 5.0, and if you have six channels, it could be 6.0, or 5.1 in several different orderings. So you need a way to find out what that is.

The audio channel layout has several ways of specifying the ordering. There's an integer tag for a bunch of predefined layouts; in a lot of cases you can just pass around these integer tags to say what channel ordering you have. There's a bitmap for USB- and WAVE-style layouts: in WAVE files or USB, a bitmap tells you which channels are present in the stream, and they have to be present in a certain order. And then there's an array of channel descriptions, which you can use to describe arbitrary layouts. So the structure looks like this: there's a channel layout tag, which is one of these predefined layouts; there's a bitmap; and then there's an array of channel descriptions.

Lots of formats are predefined; we defined just about everything we could think of or find references to. One thing you can do with these integer tags is mask off the low 16 bits, and that will tell you the number of channels in the format; the whole tag is more specific and tells you what kind of ordering there is. There are two special tags. One is UseChannelDescriptions, which means that you can't know anything from looking at the tag; you have to look at the array of channel descriptions to find out what channels are present. The other is UseChannelBitmap, for when you're dealing with a USB- or WAVE-style channel layout. You can see here that there are four different layouts for 5.1. This illustrates the kind of problem this is trying to solve: various streams will be in different ones of
these formats, and you have to be able to differentiate them.

The channel description struct, used for the array of channel descriptions, has a channel label that tells you whether a channel is left, right, left surround, and so on, and then there are some optional coordinates: if you want to specify speaker positions using floating-point coordinates, you can do that in rectangular or spherical coordinates. The channel label is an integer that basically tells you which channel you're dealing with. There are basic ones, and then there are lots of more esoteric ones. These are some of the channels defined by the theater industry, along with my favorite channel, the Left Surround Direct channel: the LSD channel.

Among the audio channel layout operations, you can get the number of channels in a layout. You can get a full description from a layout tag or a bitmap: if you have one of these integer tags, like 5.1 A, you can have it give you an array of channel descriptions with the channel labels all filled out, so you know which channel is where in a given layout. You can get a matrix of coefficients to use with the matrix mixer, which I'll go over in a bit, for doing downmixing from, say, 5.1 to stereo. You can get a name for a layout: if you have some layout, you can pass it to the Audio Format API, and it'll give you a CFString that you can print out. You can also get a name for a channel that you can show to the user; these are localized strings, so if you want to see what the left channel is called, you can get that, and it will be localized. And you can also store these audio channel layouts in files.

With audio units, you can get the channel layouts an audio unit supports. This is a new feature for Panther: audio units can support channel layouts, and you can get or set the channel layout for an audio unit's stream. For example, the matrix reverb can have a stereo, quad, or
5.0 channel layout as its output.

You don't have to support channel layouts. If you're writing an audio unit that only does mono or stereo, or that doesn't really care about spatial location, like a filter, then you don't have to support audio channel layouts. But if you're writing a reverb or a panner, or something that can deal with certain numbers of channels (for example, you're dealing with five channels and you want to be able to support multiple channel orderings), then use audio channel layouts.

The audio converter also supports channel layouts. You can use the AAC codec with various channel layouts when encoding to AAC: there's a property for getting the available encode channel layouts, and one for setting the channel layout when you're encoding.

There's also the AudioFileGetGlobalInfo API, which is related to the Audio Format API, but it gets information about audio files in general: what file types can be read and what can be written. You can get names for the file types, and you can find out what stream formats a certain file type can hold. You can find out what file extensions apply to a certain file type, or to all file types. It's basically symmetrical to the Audio Format API: there's a property ID and a specifier; the info call gives you the size of the property, and the get property call gives you the property data.

Here's an example of finding out what readable file types are on the system. It's almost identical to finding out what encoders are on the system: I find out the size of the array I'm going to get back using the info call, then I get the array of file types that can be read, and I go into a loop and use the file type name property to get a CFString that I can print out. This is what I get. In Panther, Audio File now supports MPEG Layer 3, AC-3, and AAC ADTS files.

One new audio unit in Panther is the matrix mixer audio
unit. It's an audio unit that can take any number of inputs to any number of outputs, and they can be bundled in streams of any size. The CPU usage depends only on the number of nonzero crosspoints in the matrix, not the size of the matrix, and you can get metering on inputs, crosspoints, and outputs. The matrix mixer is useful for signal routing, channel reordering, surround downmixing, generalized panning, and generalized mixing.

All the input buses are flattened to an array of mono channels, and likewise all the output buses, so it's a big matrix of mono channels. For gain control, there's a gain on each input, a gain at each crosspoint, a gain on each output, and a master gain for the entire matrix.

So this is a game of Go where black is losing. It shows the input buses coming in, flattened to a set of four channels: there are two stereo buses that are flattened to four channels. There's a gain on each input, and the channels get numbered across all the buses in a linear fashion, so they go 0, 1, 2, 3 for addressing each channel. I'm using a black circle here to represent a crosspoint that has a nonzero gain, and the open circles show zero gain. So here I've got input bus 0 being mixed to output buses 0 and 1, and input bus 1 being mixed just to output bus 0. That's how it's laid out, and you're only paying CPU cost for the black circles, so as you turn up more gains, you'll have more CPU load.

I'm going to demonstrate that now. Here's the matrix mixer, or rather my UI on top of the matrix mixer. I've got two stereo input buses coming in, and I've got five channels going out on one bus. If I hit Play here, you can see on the left the pre-fader input meters. I turn up my master gain here, and I turn these input faders on; you can see this is the post-fader input meter. And I'll turn on these crosspoint gains. So I want to map this
channel to channel 0 of the output, and map this channel 1 to output channel 1, and then I can sort of put those backwards into the surround channels. You can see the metering on the crosspoints here and the metering on the outputs. I'm mixing another sound in here, and I can mix it into the surrounds as well. [Applause] This is the center here. So that shows setting the levels and the metering.

In addition, you can enable and disable buses, and that basically turns off pulling for that input bus, so it's like pausing a section of your input. I also implemented this to demonstrate that you can automate these programmatically to do whatever kind of panning algorithm you want. One of the interesting things you can do with a matrix mixer is multichannel panning in various manners. That's just the rear. I can also disable the output. So that's the matrix mixer. Let's get back to slides now.

To set the gains on the matrix mixer, you can set all the gains from global scope. When you set parameters on audio units, you set them using input scope, output scope, or global scope. With the matrix mixer you can set everything from global scope, although you can also set input gains and output gains from input or output scope. This shows how you would specify a particular crosspoint gain in the matrix: you specify it using the element argument to AudioUnitSetParameter, and you do that by shifting the input channel left by 16 bits and ORing that with the output channel. For setting an input gain, you set the output channel to hex all Fs (0xFFFF) and OR that with the input channel shifted left by sixteen, and for an output gain you do the converse operation. The master gain is all Fs (0xFFFFFFFF). And as you saw, you can
get pre-fader and post-fader metering on inputs, and post-fader metering on outputs and crosspoints. To get metering, you need to set kAudioUnitProperty_MeteringMode to 1. Metering does take some CPU, so you want to have the option to turn it off if you don't need it. The metering parameters are addressed the same way as the gains: by shifting the input channel left 16 bits in the element and ORing it with the output channel.

The bus enable that I showed you is a matrix mixer parameter. If an input bus is disabled, it won't be pulled, so that's one way to manage CPU load when you're using a matrix mixer: if you disable input buses, you're basically turning that part of your input graph off. If you just set the input gain to zero, that input will still be playing; you're just not hearing it. So those are two quite different things.

To set up the matrix mixer before you can use it, you need to set the number of input and output buses using kAudioUnitProperty_BusCount, and you need to set the number of channels in the stream format of each bus. This defines the size of the matrix so that it can allocate itself to the proper size.

Also in Panther there's a new panner unit, as Bill was saying: a panner unit class for going from mono, stereo, or N-channel inputs to N-channel outputs, possibly using audio channel layouts. You can use it to do panning, or you can use it for things like inline faders for channel volumes.

So that's about it. To wrap up, there are a couple of other resources: the audio and QuickTime session tomorrow morning, and the Core Audio documentation. Thank you.
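The two AudioStreamBasicDescription examples from the slides (five-channel 24-bit interleaved integer PCM, and two-channel non-interleaved float PCM) can be sketched in C as follows. This is a minimal sketch, not code from the session: so that it compiles without the macOS SDK, it declares a local stand-in for the struct and the flag constants using the same field layout and values as CoreAudioTypes.h. On a Mac you would include `<CoreAudio/CoreAudioTypes.h>` instead. The helper function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Local stand-in with the same fields as Core Audio's
   AudioStreamBasicDescription; on macOS, include
   <CoreAudio/CoreAudioTypes.h> instead of declaring this. */
typedef struct {
    double   mSampleRate;
    uint32_t mFormatID;
    uint32_t mFormatFlags;
    uint32_t mBytesPerPacket;
    uint32_t mFramesPerPacket;
    uint32_t mBytesPerFrame;
    uint32_t mChannelsPerFrame;
    uint32_t mBitsPerChannel;
    uint32_t mReserved;
} AudioStreamBasicDescription;

/* Same values as the CoreAudioTypes.h constants of these names. */
enum {
    kAudioFormatLinearPCM            = 0x6C70636D, /* 'lpcm' */
    kAudioFormatFlagIsFloat          = 1 << 0,
    kAudioFormatFlagIsBigEndian      = 1 << 1,
    kAudioFormatFlagIsSignedInteger  = 1 << 2,
    kAudioFormatFlagIsPacked         = 1 << 3,
    kAudioFormatFlagIsNonInterleaved = 1 << 5
};

/* Hypothetical helper: interleaved, packed, big-endian signed-integer PCM,
   such as the five-channel 24-bit stream on the slide.
   Assumes bits is a multiple of 8. */
static AudioStreamBasicDescription
MakeInterleavedIntPCM(double sampleRate, uint32_t channels, uint32_t bits)
{
    AudioStreamBasicDescription d;
    memset(&d, 0, sizeof d);
    d.mSampleRate       = sampleRate;
    d.mFormatID         = kAudioFormatLinearPCM;
    d.mFormatFlags      = kAudioFormatFlagIsBigEndian
                        | kAudioFormatFlagIsSignedInteger
                        | kAudioFormatFlagIsPacked;
    d.mFramesPerPacket  = 1;                      /* linear PCM: 1 frame per packet */
    d.mChannelsPerFrame = channels;
    d.mBitsPerChannel   = bits;
    d.mBytesPerFrame    = (bits / 8) * channels;  /* 3 bytes * 5 channels = 15 */
    d.mBytesPerPacket   = d.mBytesPerFrame;       /* because 1 frame per packet */
    return d;
}

/* Hypothetical helper: non-interleaved 32-bit float PCM. The byte counts
   describe ONE buffer (one channel), so they stay at 4 however many
   channels there are. The host-endian flag is omitted for brevity. */
static AudioStreamBasicDescription
MakeNonInterleavedFloatPCM(double sampleRate, uint32_t channels)
{
    AudioStreamBasicDescription d;
    memset(&d, 0, sizeof d);
    d.mSampleRate       = sampleRate;
    d.mFormatID         = kAudioFormatLinearPCM;
    d.mFormatFlags      = kAudioFormatFlagIsFloat
                        | kAudioFormatFlagIsPacked
                        | kAudioFormatFlagIsNonInterleaved;
    d.mFramesPerPacket  = 1;
    d.mChannelsPerFrame = channels;
    d.mBitsPerChannel   = 32;
    d.mBytesPerFrame    = (uint32_t)sizeof(float); /* per buffer, not 2 * 4 */
    d.mBytesPerPacket   = (uint32_t)sizeof(float);
    return d;
}
```

For example, `MakeInterleavedIntPCM(44100.0, 5, 24)` yields 15 bytes per frame and per packet, matching the slide, while `MakeNonInterleavedFloatPCM(44100.0, 2)` keeps both byte counts at 4 even though it describes two channels.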
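For variable bit rate data (mBytesPerPacket set to zero), each packet needs an AudioStreamPacketDescription giving its start offset and byte length. The sketch below builds those descriptions for packets stored back to back in one buffer, which is the kind of array an input proc supplying AAC packets has to provide. It is an illustration under stated assumptions: the struct is a local stand-in with the CoreAudioTypes.h field layout, the helper names are hypothetical, and the packet sizes in the usage note are made up.

```c
#include <stdint.h>

/* Local stand-in with the same fields as Core Audio's
   AudioStreamPacketDescription (CoreAudioTypes.h). */
typedef struct {
    int64_t  mStartOffset;            /* byte offset of the packet in the data */
    uint32_t mVariableFramesInPacket; /* 0 when frames per packet is constant */
    uint32_t mDataByteSize;           /* length of the packet in bytes */
} AudioStreamPacketDescription;

/* Hypothetical helper: given the sizes of packets stored back to back
   (as a parser for an externally framed format would find them), fill in
   one description per packet. Returns the total number of bytes. */
static int64_t DescribePackets(const uint32_t *sizes, uint32_t count,
                               AudioStreamPacketDescription *out)
{
    int64_t offset = 0;
    for (uint32_t i = 0; i < count; ++i) {
        out[i].mStartOffset = offset;
        out[i].mVariableFramesInPacket = 0; /* e.g. AAC: fixed frames per packet */
        out[i].mDataByteSize = sizes[i];
        offset += sizes[i];
    }
    return offset;
}

/* Hypothetical helper: pointer to the first byte of a described packet. */
static const uint8_t *PacketBytes(const uint8_t *data,
                                  const AudioStreamPacketDescription *desc)
{
    return data + desc->mStartOffset;
}
```

With made-up packet sizes of 371, 402, and 389 bytes, the third packet starts at offset 773 and the buffer holds 1162 bytes in total.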
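AudioConverterFillComplexBuffer writes into an AudioBufferList that the caller prepares: one buffer for interleaved data, or one mono buffer per channel for non-interleaved data, each with mDataByteSize large enough for the packets requested (the talk notes that a larger request is truncated to the space provided). A minimal allocation sketch follows. The structs are local stand-ins with the CoreAudioTypes.h field layout; AllocBufferList and FreeBufferList are hypothetical helpers, not SDK functions. The variable-length mBuffers array is allocated with the usual trailing-array technique.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Local stand-ins with the same fields as Core Audio's AudioBuffer and
   AudioBufferList (CoreAudioTypes.h). */
typedef struct {
    uint32_t mNumberChannels;   /* interleaved channels in this buffer */
    uint32_t mDataByteSize;     /* size of mData in bytes */
    void    *mData;
} AudioBuffer;

typedef struct {
    uint32_t    mNumberBuffers;
    AudioBuffer mBuffers[1];    /* really mNumberBuffers entries */
} AudioBufferList;

/* Hypothetical helper: allocate an output buffer list for `frames` frames.
   Interleaved data gets one buffer holding every channel; non-interleaved
   data gets one mono buffer per channel. Returns NULL on failure. */
static AudioBufferList *AllocBufferList(uint32_t channels, uint32_t frames,
                                        uint32_t bytesPerSample, int interleaved)
{
    uint32_t nBuffers  = interleaved ? 1 : channels;
    uint32_t perBuffer = interleaved ? channels : 1;
    size_t size = offsetof(AudioBufferList, mBuffers)
                + (size_t)nBuffers * sizeof(AudioBuffer);
    AudioBufferList *abl = calloc(1, size);
    if (abl == NULL)
        return NULL;
    abl->mNumberBuffers = nBuffers;
    for (uint32_t i = 0; i < nBuffers; ++i) {
        AudioBuffer *b = &abl->mBuffers[i];
        b->mNumberChannels = perBuffer;
        b->mDataByteSize   = frames * perBuffer * bytesPerSample;
        b->mData           = calloc(1, b->mDataByteSize);
        if (b->mData == NULL) {            /* undo partial allocation */
            for (uint32_t j = 0; j < i; ++j)
                free(abl->mBuffers[j].mData);
            free(abl);
            return NULL;
        }
    }
    return abl;
}

/* Hypothetical helper: release a list made by AllocBufferList. */
static void FreeBufferList(AudioBufferList *abl)
{
    if (abl == NULL)
        return;
    for (uint32_t i = 0; i < abl->mNumberBuffers; ++i)
        free(abl->mBuffers[i].mData);
    free(abl);
}
```

For 512 frames of non-interleaved stereo float, this gives two mono buffers of 2048 bytes each. For 100 frames of interleaved five-channel 24-bit audio, it gives a single 1500-byte buffer.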
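The input-procedure rules described in the talk can be sketched as a single callback:

- Hand over pointers rather than copies.
- Keep that memory valid until the next call.
- At end of stream, return zero packets with no error.
- When a live stream is momentarily empty, return zero packets with an error.

The parameters follow the shape of Core Audio's AudioConverterComplexInputDataProc, but the types are local stand-ins so the sketch compiles without the SDK. MySource, its fields, and kMyNoDataYet are hypothetical. Unlike the slide's example, this version honors the packet count the converter asks for, which the talk suggests avoids extra copying.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the Core Audio types the callback uses. */
typedef int32_t OSStatus;
typedef struct OpaqueAudioConverter *AudioConverterRef;
typedef struct { uint32_t mNumberChannels, mDataByteSize; void *mData; } AudioBuffer;
typedef struct { uint32_t mNumberBuffers; AudioBuffer mBuffers[1]; } AudioBufferList;
typedef struct { int64_t mStartOffset; uint32_t mVariableFramesInPacket, mDataByteSize; } AudioStreamPacketDescription;
enum { noErr = 0 };

/* Hypothetical app-defined status meaning "no data yet, not end of stream".
   Any nonzero value you choose works; it propagates back to your caller. */
enum { kMyNoDataYet = 1 };

/* Hypothetical per-converter state, passed as inUserData. */
typedef struct {
    const float *samples;         /* interleaved float frames; must stay valid */
    uint32_t     channels;
    uint32_t     framesAvailable; /* frames the source has delivered so far */
    uint32_t     position;        /* next frame to hand to the converter */
    int          endOfStream;     /* set once the source will never deliver more */
} MySource;

/* Shaped like AudioConverterComplexInputDataProc: hands the converter
   pointers into our own buffer, without copying. */
static OSStatus MyInputProc(AudioConverterRef inAudioConverter,
                            uint32_t *ioNumberDataPackets,
                            AudioBufferList *ioData,
                            AudioStreamPacketDescription **outDataPacketDescription,
                            void *inUserData)
{
    MySource *src = inUserData;
    uint32_t remaining = src->framesAvailable - src->position;
    (void)inAudioConverter;

    if (outDataPacketDescription != NULL)
        *outDataPacketDescription = NULL;   /* PCM input: no packet descriptions */

    if (remaining == 0) {
        *ioNumberDataPackets = 0;
        /* End of stream: zero packets + noErr lets the converter flush.
           Starved live stream: zero packets + an error; the converter keeps
           its unconverted data until the next FillComplexBuffer call. */
        return src->endOfStream ? noErr : kMyNoDataYet;
    }
    if (*ioNumberDataPackets > remaining)
        *ioNumberDataPackets = remaining;   /* returning fewer is allowed */

    /* For PCM, one packet is one frame. Point at the data; do not copy it.
       The converter may read this memory until our next invocation. */
    ioData->mBuffers[0].mNumberChannels = src->channels;
    ioData->mBuffers[0].mData = (void *)(src->samples
                                + (size_t)src->position * src->channels);
    ioData->mBuffers[0].mDataByteSize =
        *ioNumberDataPackets * src->channels * (uint32_t)sizeof(float);
    src->position += *ioNumberDataPackets;
    return noErr;
}
```

On a Mac, a function like this would be passed as the input procedure argument to AudioConverterFillComplexBuffer, with a pointer to the MySource as the user data.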
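To show why pulling input is decoupled from pulling output, here is a toy pull-model converter. It is not Apple's implementation and does not use the Audio Converter API. It only mimics the demand-driven contract described in the talk, using a trivial int16-to-float conversion:

- The caller asks for output frames.
- The converter calls an input callback only when it runs out of input.
- Input beyond what was requested is held for a later call, which is why that memory must stay valid.
- Zero frames from the callback ends the stream.
- An error returns whatever was converted so far.

All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef int32_t OSStatus;

/* Hypothetical input callback for this toy: on return, *outData points at
   interleaved int16 frames (no copy) and *ioFrames holds how many. */
typedef OSStatus (*ToyInputProc)(uint32_t *ioFrames, const int16_t **outData,
                                 void *userData);

typedef struct {
    ToyInputProc   proc;
    void          *userData;
    uint32_t       channels;
    const int16_t *pending;       /* input received but not yet consumed */
    uint32_t       pendingFrames;
} ToyConverter;

/* Fill up to *ioFrames output frames, pulling input only on demand.
   On return *ioFrames is the number of frames actually produced. */
static OSStatus ToyFill(ToyConverter *c, uint32_t *ioFrames, float *out)
{
    uint32_t wanted = *ioFrames, done = 0;
    OSStatus err = 0;
    while (done < wanted) {
        if (c->pendingFrames == 0) {
            uint32_t n = wanted - done;
            const int16_t *p = NULL;
            err = c->proc(&n, &p, c->userData);
            if (err != 0 || n == 0)
                break;            /* error: return what we have; 0: end of stream */
            c->pending = p;
            c->pendingFrames = n; /* may exceed the request; kept for later */
        }
        uint32_t take = c->pendingFrames < wanted - done
                      ? c->pendingFrames : wanted - done;
        for (uint32_t i = 0; i < take * c->channels; ++i)
            out[done * c->channels + i] = c->pending[i] / 32768.0f;
        c->pending += take * c->channels;
        c->pendingFrames -= take;
        done += take;
    }
    *ioFrames = done;
    return err;
}

/* Hypothetical source that always hands out up to `chunk` frames,
   ignoring the requested count (like the slide's input proc). */
typedef struct {
    const int16_t *data;
    uint32_t frames, pos, chunk, channels;
} ChunkSource;

static OSStatus ChunkProc(uint32_t *ioFrames, const int16_t **outData, void *userData)
{
    ChunkSource *s = userData;
    uint32_t n = s->frames - s->pos;
    if (n > s->chunk)
        n = s->chunk;
    *outData = s->data + (size_t)s->pos * s->channels;
    *ioFrames = n;
    s->pos += n;
    return 0;
}
```

Because the source hands out three frames at a time, a four-frame request triggers two pulls. The one frame left over from the second pull is delivered by the next ToyFill call before the source reports end of stream.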
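The tag arithmetic described in the talk is easy to show in C: mask off the low 16 bits to get the channel count, and use the whole tag to identify the ordering. The sketch copies a handful of tag values, and the channel orders of the four MPEG 5.1 layouts, from CoreAudioTypes.h as local constants so it compiles anywhere. The helper functions are hypothetical. On a Mac you would use the header and its AudioChannelLayoutTag_GetNumberOfChannels macro instead of ChannelsInTag.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t AudioChannelLayoutTag;

/* Tag values as defined in CoreAudioTypes.h:
   (layout index << 16) | number of channels. */
enum {
    kAudioChannelLayoutTag_UseChannelDescriptions = (0 << 16) | 0,
    kAudioChannelLayoutTag_UseChannelBitmap       = (1 << 16) | 0,
    kAudioChannelLayoutTag_Mono                   = (100 << 16) | 1,
    kAudioChannelLayoutTag_Stereo                 = (101 << 16) | 2,
    kAudioChannelLayoutTag_MPEG_5_1_A             = (121 << 16) | 6,
    kAudioChannelLayoutTag_MPEG_5_1_B             = (122 << 16) | 6,
    kAudioChannelLayoutTag_MPEG_5_1_C             = (123 << 16) | 6,
    kAudioChannelLayoutTag_MPEG_5_1_D             = (124 << 16) | 6
};

/* Same computation as AudioChannelLayoutTag_GetNumberOfChannels:
   the low 16 bits hold the channel count. */
static uint32_t ChannelsInTag(AudioChannelLayoutTag tag)
{
    return tag & 0xFFFF;
}

/* True when the tag alone does not describe the channels, so you must
   read the bitmap or the channel-description array instead. */
static int TagNeedsMoreInfo(AudioChannelLayoutTag tag)
{
    return tag == kAudioChannelLayoutTag_UseChannelDescriptions
        || tag == kAudioChannelLayoutTag_UseChannelBitmap;
}

/* Channel order of the four MPEG 5.1 layouts, per the CoreAudioTypes.h
   comments (C = center, Ls/Rs = left/right surround). */
static const char *Order51(AudioChannelLayoutTag tag)
{
    switch (tag) {
    case kAudioChannelLayoutTag_MPEG_5_1_A: return "L R C LFE Ls Rs";
    case kAudioChannelLayoutTag_MPEG_5_1_B: return "L R Ls Rs C LFE";
    case kAudioChannelLayoutTag_MPEG_5_1_C: return "L C R Ls Rs LFE";
    case kAudioChannelLayoutTag_MPEG_5_1_D: return "C L R Ls Rs LFE";
    default:                                return NULL;
    }
}
```

All four 5.1 tags report six channels, yet they name four different orderings. That is the ambiguity the talk says an AudioStreamBasicDescription alone cannot resolve.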
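The matrix mixer's element encoding described in the talk works as follows:

- Crosspoint: input channel shifted left 16 bits, ORed with the output channel.
- Input gain: 0xFFFF in the output half.
- Output gain: the converse, 0xFFFF in the input half.
- Master gain: 0xFFFFFFFF.

Channels are numbered linearly across all buses. The sketch only computes the element values. On macOS, each value would be passed as the element argument of a call such as `AudioUnitSetParameter(mixer, kMatrixMixerParam_Volume, kAudioUnitScope_Global, element, gain, 0)`, which this standalone code does not make. The helper and macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t AudioUnitElement;

/* Hypothetical names for the all-Fs values from the talk. */
#define kElementAllFs            0xFFFFu      /* one 16-bit half */
#define kMatrixMasterGainElement 0xFFFFFFFFu  /* master gain */

/* Crosspoint gain (and crosspoint meter): input channel in the high
   16 bits, output channel in the low 16 bits. */
static AudioUnitElement CrosspointElement(uint32_t inCh, uint32_t outCh)
{
    assert(inCh < kElementAllFs && outCh < kElementAllFs);
    return (inCh << 16) | outCh;
}

/* Input gain: output half set to all Fs. */
static AudioUnitElement InputGainElement(uint32_t inCh)
{
    assert(inCh < kElementAllFs);
    return (inCh << 16) | kElementAllFs;
}

/* Output gain: the converse, input half set to all Fs. */
static AudioUnitElement OutputGainElement(uint32_t outCh)
{
    assert(outCh < kElementAllFs);
    return (kElementAllFs << 16) | outCh;
}

/* Channels are numbered linearly across buses: the first channel of bus
   `bus` follows the last channel of bus `bus - 1`. */
static uint32_t FlatChannel(const uint32_t *channelsPerBus, uint32_t bus, uint32_t ch)
{
    uint32_t index = 0;
    for (uint32_t b = 0; b < bus; ++b)
        index += channelsPerBus[b];
    assert(ch < channelsPerBus[bus]);
    return index + ch;
}
```

With the demo's two stereo input buses, the right channel of input bus 1 is flattened channel 3. Routing it to output channel 4 (one of the surrounds in a five-channel output) uses element 0x00030004.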
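The talk says that CPU cost depends on the number of nonzero crosspoints rather than on the size of the matrix. That follows naturally if only the nonzero crosspoints are stored and visited. The toy mixer below shows the idea for a single frame, applying the four gain stages the talk lists: input, crosspoint, output, and master. It illustrates the concept only; it is not the matrix mixer's actual implementation, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One nonzero crosspoint, addressed by flattened channel numbers. */
typedef struct {
    uint32_t in;    /* flattened input channel */
    uint32_t out;   /* flattened output channel */
    float    gain;  /* crosspoint gain (the "black circle") */
} Crosspoint;

/* Mix one frame: `in` holds one sample per flattened input channel and
   `out` receives one sample per flattened output channel. */
static void MixFrame(const float *in, const float *inGain, uint32_t nIn,
                     const Crosspoint *xp, uint32_t nCross,
                     const float *outGain, float *out, uint32_t nOut,
                     float master)
{
    for (uint32_t o = 0; o < nOut; ++o)
        out[o] = 0.0f;
    /* The inner work grows with nCross, not with nIn * nOut. */
    for (uint32_t k = 0; k < nCross; ++k) {
        assert(xp[k].in < nIn && xp[k].out < nOut);
        out[xp[k].out] += in[xp[k].in] * inGain[xp[k].in] * xp[k].gain;
    }
    for (uint32_t o = 0; o < nOut; ++o)
        out[o] *= outGain[o] * master;
}
```

With inputs {0.5, 0.25}, unit input and output gains, a master gain of 1, and crosspoints 0→0 (1.0), 0→1 (0.5), and 1→0 (1.0), the outputs are 0.75 and 0.25. Only three crosspoints are visited, however large the matrix is.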