---
title: WWDC2004 Session 200
framework: wwdc
role: article
path: wwdc/wwdc2004-200
---

# WWDC2004 Session 200

## Transcript

Good morning. Welcome to Session 200, Core Audio in Depth. Please welcome your digital audio plumber, James McCartney.

OK. I'm going to talk about new features for Tiger in the Audio Toolbox, starting with the audio converter. First, I guess I should introduce what's going to happen: I'm going to speak first, then Doug Wyatt will speak about some other new features for Tiger, and then Bill Stewart will speak.

All right, the audio converter. What is it? It's an API in Core Audio that allows you to convert audio from one format to another. It does floating-point and integer conversion, interleaving and deinterleaving of channels, sample rate conversion, channel reordering, and encoding and decoding of compressed formats. Any codec that's installed on the system can be used by the audio converter to convert data.

The API looks like this. You create a new audio converter using AudioConverterNew, and you set properties on it, like the bit rate for encoding. Then you use AudioConverterFillComplexBuffer to fill your buffers with the converted data, and you pass an input data proc for getting the input data to the converter; it's a pull model. If you want to stop a stream and restart the converter, you use AudioConverterReset to flush out all the buffers and start over again. AudioConverterDispose disposes of the converter.

New in Tiger is a call named AudioConverterNewSpecific, which allows you to choose a specific manufacturer's codec. By default, when you create a new audio converter, you get the default codec for a given format; this call lets you specifically choose a certain manufacturer's codec. The difference between AudioConverterNewSpecific and AudioConverterNew is that you pass in an array of AudioClassDescriptions, and it will choose the
earliest matching codec in the list, so you would put the codec manufacturers in the order in which you prefer them to be matched.

The AudioClassDescription looks like this. There are three fields, each a four-character code. The first is the type, which is either 'adec' or 'aenc', for decoder or encoder. The second, the subtype, should match the format ID code for the format: 'aac ' (with a trailing space) for AAC, '.mp3' for MP3, 'ima4', or whatever the format type is. The last four-char code is the manufacturer code. So for Apple's AAC decoder it would look as shown here: the type is 'adec' for decoder, the subtype is 'aac ' for AAC, and the manufacturer is 'appl' for Apple.

OK, another new feature for the audio converter is the audio converter property settings. This is a CFArray that you get back from the audio converter that contains the parameter settings for the converter. You can use it for building a user interface for the audio converter, or for configuring another audio converter the same way as the current one without having to run through and do all the parameter settings again.

The format of the CF object is a CFArray of CFDictionaries, one for each converter in the audio converter chain. An audio converter, when it converts audio, may have several components inside it, like float-to-integer converters, sample rate converters, interleavers and deinterleavers, or encoders and decoders. Each sub-converter in the chain that has property settings will have a CFDictionary that contains all the property settings for that converter.

Each converter dictionary has several keys. The first key is "converter", and the value for that key is the nonlocalized name of the converter. This is a key you would use for matching: if you want to find the sample rate converter in
the audio converter chain, you would go through and look for "Sample Rate Converter", and that will always be the same. For the key "name", the value is the localized name of the converter, so that would change depending on the local language. And "parameters" is a key whose value is an array of dictionaries, one for each parameter, so each parameter of the audio converter has a dictionary that describes it.

The parameter dictionaries look like this. The key "key" is the nonlocalized name of the parameter, which again is for matching, and "name" is the localized name of the parameter, for displaying to the user. "available values" is all the possible values for that parameter: if it's sample rates, it would be a list of the sample rates that are supported; if it's bit rates, it would be all the bit rates supported by the codec. "limited values" is the values that are enabled at the current time, for the current settings of the converter or codec. So if you have the codec set up, in AAC's case, for a large number of channels, then you can't choose lower bit rates, and those values would not appear in the limited values list. "current value" contains the current value; it's an index into the list of available values. "hint" is a Boolean that tells you whether it's a normal or an expert parameter, and "summary" is a little help string for the parameter, so if someone moused over it in the user interface you could show that to the user to describe the parameter.

OK, so this is what a dictionary looks like. This would be for an audio converter that has just a sample rate converter in it. At the top level it's a CFArray. There's one entry in it, a CFDictionary that contains the keys for the sample rate converter. So the first
key is "converter", which is "Sample Rate Converter"; that's what you would use for matching to find it. Then "name" is "Sample Rate Converter", which will change depending on the local language. Then you have a parameters array. There are two parameters for the sample rate converter: the first is called Quality and the next is called Priming Method. You have the localized name for each of those parameters. The available values for Quality are Minimum, Low, Medium, High, and Maximum, and for Priming Method they are Pre, Normal, or None. The limited values are basically the same in this case; that doesn't change. The current value is the index into the available values array, the summary is the help string you show to the user, and the hint field tells you whether it's normal or expert. Priming Method is an expert parameter.

OK, that's the audio converter: just the new features, basically, and a short review of what it is.

AudioFile is Core Audio's interface for parsing audio files, and the newest feature for Tiger is AudioFile components. If you have an audio file format and you say, "Gee, I wish Core Audio could read this file format," well, you can write your own format parsing component and install it in the system, and then everybody who uses the AudioFile API will be able to read the format you wrote your component for. So there are component plug-ins for the AudioFile API, and we're going to provide SDK code so that you can write your own component; there will be a sample component implemented there. If your file container contains a compressed format, then you'll want to write a codec as well, so that people will be able to decode and encode that data format and then put it in your file format or some other file format.

So this is the AudioFile component API. It's basically a mirror of the AudioFile API; the AudioFile API just calls down to these.
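As a rough sketch of the property settings layout described earlier (an array with one dictionary per sub-converter, each holding an array of parameter dictionaries), the sample rate converter example can be modeled with plain Python lists and dictionaries. The key strings follow the spoken description and are not checked against the shipping headers. The current-value indexes, the hint value for Quality, and the summary strings are illustrative assumptions, and `find_converter`, `current_setting`, and `expert_parameters` are hypothetical helpers, not Core Audio calls.

```python
# Plain-Python stand-in for the CFArray of CFDictionaries described in the talk.
# Key strings follow the spoken description; they are not verified against
# the real Core Audio headers.
property_settings = [
    {
        "converter": "Sample Rate Converter",  # nonlocalized, used for matching
        "name": "Sample Rate Converter",       # localized display name
        "parameters": [
            {
                "key": "Quality",
                "name": "Quality",
                "available values": ["Minimum", "Low", "Medium", "High", "Maximum"],
                "limited values": ["Minimum", "Low", "Medium", "High", "Maximum"],
                "current value": 2,            # assumed index, for illustration
                "hint": False,                 # assumed: False means a normal parameter
                "summary": "Quality of the sample rate conversion.",  # assumed text
            },
            {
                "key": "Priming Method",
                "name": "Priming Method",
                "available values": ["Pre", "Normal", "None"],
                "limited values": ["Pre", "Normal", "None"],
                "current value": 1,            # assumed index, for illustration
                "hint": True,                  # expert parameter, per the talk
                "summary": "How the converter is primed.",  # assumed text
            },
        ],
    }
]


def find_converter(settings, converter_key):
    """Return the dictionary whose nonlocalized "converter" value matches."""
    for converter in settings:
        if converter["converter"] == converter_key:
            return converter
    return None


def current_setting(converter, parameter_key):
    """Resolve "current value" (an index) to the matching available value."""
    for parameter in converter["parameters"]:
        if parameter["key"] == parameter_key:
            return parameter["available values"][parameter["current value"]]
    raise KeyError(parameter_key)


def expert_parameters(converter):
    """List the nonlocalized keys of parameters flagged as expert."""
    return [p["key"] for p in converter["parameters"] if p["hint"]]


src = find_converter(property_settings, "Sample Rate Converter")
print(current_setting(src, "Quality"))
print(expert_parameters(src))
```

Matching on the nonlocalized "converter" and "key" strings rather than on "name" is the point of the design: the display names change with the user's language, while the matching keys stay the same.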
And the most important thing about the AudioFile component API is that you don't call this API. It's called from the Audio Toolbox framework. Your AudioFile component implements the component selectors that this API calls into, so this API is basically glue code. You can access it through the Audio Toolbox if you want to test your component directly, but for normal use it's meant to be called from the AudioFile API, calling your component on behalf of the client. The SDK has classes that implement those calls, so you only have to subclass one of the classes we provide in the SDK to create your audio file parsing component.

As an aside, the AudioCodec API is similar: it's a plug-in API for the audio converter, and you don't call the AudioCodec API either. It's meant to be called from the toolbox by the audio converter. So if you write an audio codec, or if you want to convert data, you shouldn't use the audio codec directly; you should use the audio converter. The audio converter instantiates the codecs when someone requests a certain conversion. There's a sample codec implementation in the developer examples as well.

OK, so that brings us to a sort of philosophical split between a container for data and the data itself. The AudioFile API is meant to deal with the container for the data. When you open an audio file and you say, "I want PCM samples out of this," AudioFile provides you access to the raw data as it is in the file. So if you want to convert that from whatever format it's in to the one you're interested in, maybe PCM, then you need to use the audio converter to do that.

So for extending the AudioFile API, you have AudioFile components to support new file types, and to extend which data formats you can handle, you have the AudioCodec API for writing components that convert compressed formats. And then to find out what
capabilities are available in the system: for AudioFile, you use AudioFileGetGlobalInfo, which allows you to find out what file types are supported on the system, and for the audio converter, you use the AudioFormat API, which will tell you which audio formats have codecs installed in the system, and you can get information about those formats. Well, I just blew through those.

OK, so here's an example of using the AudioFormat API to find out what encoders are installed in the system. Basically, you call AudioFormatGetPropertyInfo to find out the size of the list you need to allocate to get the format IDs back, and then you call AudioFormatGetProperty, and that gives you a list of all the four-character codes that would go in the subtype field of the AudioClassDescription or the format ID field of an AudioStreamBasicDescription. So you have an array there of the format IDs, and then I have a loop here that goes through and asks the AudioFormat API to get the format name as a CFString, and then I print it out. That's how you could print a list of all the formats available for encoding on the system.

Similarly, here's a code example for printing out all the file types that are writable on the system. It's the same setup, basically: you call AudioFileGetGlobalInfoSize to find out how big a list you need to allocate, and AudioFileGetGlobalInfo to get the list of file types. Those are also four-character codes, and then you can get the CFString name for each file type; that's printed out in the loop there as well.

OK, another new feature in Tiger is being able to get and set chunks in chunky file formats like AIFF and WAV files. The data that you get back or set should be in the raw native format of the file; there's no parsing done on the contents of the chunk. So there's an AudioFile property for chunk
IDs, which allows you to get the list of four-character codes for the chunks that are in the file. Then there's AudioFileCountUserData: if you have a certain chunk ID that you're interested in and you want to find out how many of that chunk are in the file, you call this and get the number of occurrences of that chunk in the file. AudioFileGetUserDataSize tells you the size of a specific chunk: you pass it the file, the chunk ID, and the index of that chunk ID in the file. So if there are four and you want the second one, you pass one for the index; it's zero based. That tells you the size of that chunk, so then you can allocate a buffer and use AudioFileGetUserData to get the data for that chunk out of the file. And if you're writing an AudioFile component to support your own file format, and it's a chunky file format, these call down into selectors in the AudioFile component API as well, so you can pass back your own chunks from your file format.

All right, so AudioFileGetUserData actually gets the data out of the file. There's a flags field, which is basically reserved for future use. Then you pass it the size and the buffer for the user data. With AudioFileSetUserData, you pass it the size and the data, and it adds that chunk to the file.

OK, another new feature in Tiger is audio file markers. This allows you to get or set audio file markers for file formats that support them. The file marker is a structure, and it contains the name, position, and other information about the marker. OK, and with that, it's time for Doug Wyatt.

Thanks, James. I'd like to introduce a few more new features in the Audio Toolbox for Tiger. We have new features for high-level access to audio files; this builds on top of the AudioFile API. We have some new audio units to support scheduled playback of audio files and
scheduled playback of arbitrary audio buffers. We have a new UI component, which is in Cocoa, that you can use in your applications to present the user with lists of audio formats and file formats, as James was just describing. And we also have an audio unit that supports rendering on a separate thread.

So first, the Extended Audio File API. It provides an even higher level of access to an audio file compared to the AudioFile API. As James was mentioning, we have good reasons for separating the audio file and converter APIs at the lower levels, but it becomes really useful at a higher level to just be able to sequentially access an audio file in an arbitrary PCM format. Those of you who have our SDK have seen the CAAudioFile C++ class in there, and this basically replaces it.

Diagrammatically, here's how this fits into the picture. Extended Audio File contains an audio converter and an audio file. It uses the AudioFile API to communicate with a file on disk in its native file format, but through the embedded audio converter it presents your application with the ability to just do I/O in the PCM format that you prefer.

This is a very simple API by comparison to the lower-level ones. Pretty much all you have to do is open or create a file, configure the converter if you're doing encoding, and then you can sequentially read from the file, or write to it if you're encoding or just writing PCM. You can seek around within the file down to sample accuracy, even within an encoded file, and you can also tell, which is the logical thing to have if you can seek. This API is described in this header file here.

The second new Tiger feature I'd like to mention is the audio format selection view, which is a Cocoa UI that's a subclass of NSView, so you can do things like just embed it in a save panel as the accessory. It's subclassable, meaning that it's got hooks so you can decide things like: I know I always want to write this particular file format, so
don't show that menu. And, as it says there, it will let you select an audio file type, a data format that's appropriate to that file type, any encoding parameters, the sample rate, and the channel layout if it's more than stereo. I'll just give you a quick peek at that. The demo, please.

So here I've got it embedded in the save panel of an application. Right now I've got AIFF selected, so my available audio data formats are just integer formats. But if I select AIFC, I've got access to the encoded formats that can be embedded in an AIFC file. And if I were to select an encoded format, I'd have a button here where I could configure the encoder. That's not hooked up yet, so this isn't in your seed, but it will be in future Tiger seeds. OK, back to the slides, please.

OK, so the way it works: you just give the view object the source data format that you're encoding from, and its channel layout if it's got more than two channels. What you get back is an audio file type ID, like AIFF or WAVE; you get back the output data format, like float or 24-bit int or 16-bit int, big or little endian, whatever you're encoding to or just writing to; and you get audio converter properties back, again if you're encoding. OK, I just did that.

Next, we have several new audio units in Tiger, and a couple of them are of a new class called generators, meaning that they don't take audio or MIDI in. You either interact with them programmatically by setting properties on them, or by bringing up their user interface and letting the user do something that tells them to make noise of some sort; for example, a test tone generator. This one, the AUAudioFilePlayer, falls into the programmatic category for the most part. It's designed to be a nice building block for playing back little slices of audio files, or large slices, with sample-accurate precision. It's built on top of the Extended Audio File API, which gives it the ability to play back encoded
formats. So it's particularly appropriate for situations where you want to stream a long file off of disk. It's sample accurate, so you could use it as a building block even in a digital audio workstation kind of application, and it's also usable in simpler situations where you just want to play a long file asynchronously. The one thing it's not good for is playing short sounds. If all your application is going to do is play a few short sounds here and there, it's not worth the overhead, because this audio unit will create a separate thread for reading the audio off of disk. If you've got a 200K audio file, an alert sound or something, you don't need that thread overhead; you might as well just load the file into memory and play it. You can use NSSound to do that, and you can also use the System Sound APIs, which are really the best way to be playing alert sounds on the system now.

Conceptually, the way the audio file player works: there's this hardware timeline. If you've ever looked at the time stamps going by on an audio unit, you'll see that they're typically these giant numbers; it's the number of samples since the hardware was started. So there's that timeline going along. Then, to play the audio files, in this example I've got two events, two chunks of audio files, that I want to play at some point along this timeline. Instead of having to know where on the timeline I am, I just construct my sequence of events to be relative to time zero, which is when I'm going to start playing the sequence of events. So with the one on the left, at time zero I'm playing this file, A.aiff, and I'm playing the first hundred thousand samples out of it, and I've got another chunk, out of audio file B, that I want to play a little bit later. That's how the events look that I schedule, when I want them to play relative to time zero. After that I just need to set a start time, and I can do this in one of two ways. If
I'm in a situation where I need to synchronize very accurately and I know exactly where on that audio device's timeline I want to start playing those events, I can specify the start time in sample numbers. The alternative, in a simpler situation, is that I can just say "start now" by passing a start time of minus one.

Just to give you a little more detail about how this audio unit is used: you open the audio files in your application, externally to the audio unit. Your application owns those references to the audio files, not the unit, so your application keeps them open. You pass an array of audio file IDs to the unit so that when it comes time for it to read off disk, it has the files open. You schedule the events that you initially want to have play. Then you prime the unit, which is just a set-property call on it, and what that does is fill up the disk buffers enough so that when you do say start, it has the data off the disk already. Then you set the start time, and as the events play you get callbacks saying this one's been done, this one's been done. In response to those callbacks, or other events in your application, you can always schedule additional events onto the file player unit, and at any time you can just call AudioUnitReset and it will stop playing. There are about eight paragraphs of documentation in the Tiger seed, in AudioUnitProperties.h, describing how to use this unit programmatically, and that is in your seed.

Just to zoom in one more level, here's the structure you use for scheduling regions on the audio file player. It's got an AudioTimeStamp; you can actually schedule using sample numbers or host times, which can be useful sometimes, and the AudioTimeStamp structure, as you may know, contains both the sample number and a host time. You have an optional completion callback proc that gets called when the region has been either successfully loaded from disk... well, actually, it's always called when it's successfully
loaded from disk, and it's also used to tell you if it didn't get it off the disk and play it in time. The structure also contains a reference to an audio file object, and you just say where in the file, in terms of sample frames, and the number of frames that you want to play. This can even be in an encoded format. For example, if it's an MP3 file and you say you want sample 512, well, that's a bit of the way into the first packet; it will actually seek into the middle of the packet and begin playback there.

I've got a little demo application that illustrates the new audio file player. What this does is let me generate a random playlist out of a bunch of audio files on my hard disk. I can say how long I want the playlist to be, what the minimum slice size is, and what the longest one is. I click that, and this is a little slow because it's actually parsing through all of the frames of the MP3s. So it's generated a playlist. Oh, they're all the same length; that's no fun. Oh, it's OK. [Music]

So it's my little musique concrète player. That's dynamically reading and decoding the MP3s on the I/O thread, or rather on the disk read thread, and preparing them to be played back. So it wasn't cheating and generating the playlist ahead of time into an audio buffer; it was actually playing those back as we went.

OK, so one related audio unit, in fact this one is a subclass, or actually the base class, I'm sorry, of AUAudioFilePlayer, is the AUScheduledSoundPlayer. It shares a lot of the same semantics, in particular how you set the start time on it. The difference is that instead of scheduling chunks of audio files on it, you give it audio buffers in memory to play. One internal client for this is the speech synthesizer, and if your application dynamically generates audio into buffers, you may find this a useful way to pump it out into a chain of audio units. This unit is also documented in AudioUnitProperties.h and is in the seed.

The last of the three new audio units I'd like to talk about
is the AUDeferredRenderer. This one is a converter audio unit, and the idea here is to have it pull its input on a separate thread from the one on which it's being pulled for output. This lets you put a processor-intensive task on a separate thread instead of on the real-time audio thread, so you can process your audio in larger chunks at a slightly lower priority, and there are opportunities for taking advantage of multiprocessing. The API is documented in detail in AudioUnitProperties.h also, and I'll just give you an illustration of when you might want to use it.

So here's an example. It could be anything that I'm using as my source here, but it's an AUAudioFilePlayer, and I've got a CPU-intensive audio unit going to an audio output unit. I'm running the hardware at 128 frames, which is around 3 milliseconds at 44.1 kilohertz. The fullness of those green I/O cycle rectangles is how much of each render cycle is being occupied by this CPU-intensive audio unit. So we're kind of running on the edge here; we're probably consuming around 85% of the CPU just rendering that audio.

What you can do, if latency isn't so important and you want to actually get some more CPU power back, is use the deferred renderer. So now, on the upper line here, we have the source and the CPU-intensive audio unit; those are running on a separate thread owned by the deferred renderer, which is on the lower line here, and it's still being driven by the audio output unit. What this does is periodically wake up the lower-priority thread to do work, and pull at a larger grain, which you can specify. So in this example we still have our 128-frame I/O cycles, but we're only doing our rendering in 512-frame cycles. Depending on the algorithm, of course, your mileage may vary, but in many algorithms you can get a good performance gain by processing more samples at a time.

So just to review: I mentioned the Extended Audio File API, which combines an audio
file with an audio converter, and all of these are in the Tiger seed, by the way. There's the audio format selection view, excuse me, which will be in the new CoreAudioKit framework. We have three new programmable audio units: the AUAudioFilePlayer, the AUScheduledSoundPlayer, and the AUDeferredRenderer. And with that, I'll bring up Bill Stewart, who's going to talk about Mac OS X and OpenAL. Thank you.

OK, so OpenAL and Mac OS X. We had some conversations about this last year in the games developer session, and I wanted to give you an update on where we are. Just to give you some background, if people don't know what it is: OpenAL is an engine that's designed for games. It's designed to be a complement to OpenGL, so it follows similar conventions, coordinate systems, and so forth. It was originally developed at Creative Labs, and there is now a specific website devoted to it. It supports multiple platforms and has for a number of years; I think it's been available for about four years now. In the PC space, there are implementations for the original Mac OS, Mac OS X, Windows, and Linux, and you'll also see some support that was recently announced at the Game Developers Conference for game consoles: PlayStation, GameCube, and Xbox. So it's a pretty broad API, and its broad reach was one of the things that attracted us to support it.

To give you some examples of the games that are supporting it: in brackets there's a list of the platforms they're doing it on. This is quite a good collection of games. The OpenAL website actually has a full list of all of the different games and companies that are supporting it, so if you're interested in seeing the sort of coverage, that's a good website to look at. At the Game Developers Conference of 2004, we announced that we were going to support this API and provide an implementation of it. The implementation that we did was based on preexisting Core Audio services, and I'll give
you a bit of an overview of how we've done this. We also made the source for the Core Audio implementation of it available on the website. OpenAL has both hardware and software implementations: you've got implementations in sound cards, like Creative's Sound Blaster and so forth, and there are also implementations that run on CPUs. Ours, of course, runs on the CPU.

One of the things we're announcing at this conference is that we're going to be preinstalling the framework in Tiger. It's actually on your Tiger seed disks at the moment: OpenAL.framework. That's essentially what's available on the OpenAL website. As we continue to develop and enhance the OpenAL-to-Core Audio bridge, we'll be putting the source back into the OpenAL organization, so it's also a good reference if you want to see how to use Core Audio in different ways. And we fully support the open-source nature of OpenAL, so it's good news, I think.

One of the things that's really unique about OpenAL on Mac OS X is that it uses system sound services and will automatically configure itself to the system that the user currently has installed. If you have a surround system and the user has gone and set up the surround in Audio MIDI Setup, then it will be automatically configured and will just play through the surround system. If you've just got stereo, then the OpenAL logic that sits in the implementation will find that it's a stereo device, and the mixing that it does will be for stereo. You can have a look at how to configure systems, which you would need to do if you're doing any kind of multichannel surround playback, here, which is in Applications/Utilities. There are some abilities there to go in and say where your speakers are, on which channels, and so forth, and having done that, you're just ready to go. That's a system-wide service, so other applications that are doing surround will just work in the same way that this does.

Excuse me. This is a diagram of the implementation that
we do for it. It uses two audio units. It uses the output unit, AUHAL, which is an audio unit that interfaces to the device. We recommend very strongly, unless you're doing very particular things with devices, that you use the output unit rather than the audio device API directly; it provides a lot of management of device state that's very handy for you. The output unit is just going to get data and output it to the device, and it's getting data from the 3D mixer audio unit. This audio unit can take any number of inputs. It has different parameters that you can set on the inputs, like a pitch parameter and distance parameters, that can be particular to each input. You can have mono or stereo inputs, and the 3D mixer will take a coordinate for each input and place it in a sound field.

You can choose between different panning algorithms in the 3D mixer. One is called vector panning. Vector panning will just look at the location of a sound, let's say it's there, and it will look for the closest two speakers and just place the sound in those two speakers. The one we use in OpenAL, and I think it's the more pleasing algorithm for this sort of usage, is a sound field approach. What that does is create, in each speaker, a representation of the sound field at that location. So if you have a sound over there, that sound will be in all of the speakers, but it will be very quiet in this one, and as you get closer to the location of the sound, it's louder. Those are the kinds of things that the 3D mixer does, and they're also the sorts of things that they do on the Creative cards or in other hardware implementations.

Then the inputs to the 3D mixer are AL sources. A source object is something in the OpenAL API that you can manipulate, that you can get properties on, and so forth. One of the main things an AL source has is a list of buffers that you can play, and you can share buffers between different sources. The buffers can have different format
characteristics: mono or stereo, different sample rates. The game engine is basically queuing buffers up to be played. Now, in the line between the buffers and the 3D mixer, we're using an audio converter, as James went through, and that will do the appropriate conversion there. We've spent some time doing optimizations to the blitters that are most of that path. [unintelligible] What we wanted to do was make sure it was going to be efficient for you to play audio. [Music]

So we spent some time optimizing just the rendering: seeing where the problems were and where we could optimize performance. One of the things we were looking at is distance. One of the things that distance will do is filter frequencies, so the further away a sound is, the fewer high frequencies you'll hear coming through, and there's also a volume attenuation. In OpenAL you can specify a very nice volume curve, and we've had a similar thing in the 3D mixer for quite some time, where you can say, "Well, within this distance the sound does not get louder or quieter." So you could imagine that up to a certain distance the level stays the same, beyond that, as the sound moves further away, it should get quieter, and then you can't hear the sound at all. You can specify those conditions when you're designing a game, so that you control the characteristics of how your sounds are going to be attenuated. If you've got an engine sitting in the corner over here, then in this room you're never going to get far from this engine, so at maximum distance the attenuation may be just 40 dB down or something, or, in the case of something that's over there as well, it could be like 20 dB. Those are the sorts of things we went through, and we were looking at doing this because it enhances the game experience. As we went through all this optimization work, we managed to gain about a 50% improvement in the 3D mixer audio unit as a part of
this work. [unintelligible] So we had a look at some other pieces of our system, and overall I would say there's about a 50% improvement in rendering from what we had in Panther.

To give you some idea of the difference between the implementation we provide now and the implementation that was previously available for Mac OS X: on a 1 GHz PowerBook G4, we're talking about 5% to mix 64 sources in stereo. That's not doing any fancy stuff; you'll see a bit more load if you're doing other kinds of things. But if you think of a 1 GHz PowerBook G4, that's not a high-end game machine by today's standards. We're not talking about 5% on a 2.5 GHz dual G5; this is a fairly representative system. As for the kinds of improvements we saw, the next best implementation that was previously available was about 20% for that same scenario. So overall, with the improvements to the 3D mixer plus more optimal blitters in the audio converter and so forth, we're down at about 25% of what was previously involved. That should be really good news for you when you're looking at your game.

I think you've heard enough of all of us talking, so we're going to [unintelligible] Unreal Tournament and maybe walk around. We've got [unintelligible] speakers here, so we've got the front left... Are you guys hearing the sound all right? It's hard to tell from up here, as it's all sort of moving around. Probably the people around here are getting the best listening experience; I'm not. You can see how the sound is moving around. [Music] You can hear [unintelligible] popping in and out. You can hear some of the distance [unintelligible]. [Music] [unintelligible]
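The distance behavior described for the 3D mixer (no level change inside a near distance, quieter as the source moves away, and a cap such as 40 dB or 20 dB down at the maximum distance) can be sketched as a clamped curve. This is only an illustration under stated assumptions: the talk does not give the mixer's actual curve, so the logarithmic shape between the two distances is an assumption, and `attenuation_db`, `db_to_linear`, and their parameter names are hypothetical rather than part of OpenAL or Core Audio.

```python
import math


def attenuation_db(distance, reference_distance, max_distance, max_attenuation_db):
    """Return the level change in dB (0 or negative) for a source at `distance`.

    Inside `reference_distance` the level is unchanged; at or beyond
    `max_distance` it is held at -max_attenuation_db. In between, the drop
    follows an inverse-distance (logarithmic) shape, rescaled so that it
    reaches exactly -max_attenuation_db at `max_distance`. That shape is an
    assumption made for this sketch.
    """
    if not 0 < reference_distance < max_distance:
        raise ValueError("need 0 < reference_distance < max_distance")
    if distance <= reference_distance:
        return 0.0
    if distance >= max_distance:
        return -float(max_attenuation_db)
    drop = 20.0 * math.log10(distance / reference_distance)
    drop_at_max = 20.0 * math.log10(max_distance / reference_distance)
    return -max_attenuation_db * drop / drop_at_max


def db_to_linear(db):
    """Convert a level in dB to a linear amplitude gain."""
    return 10.0 ** (db / 20.0)


# An "engine in the corner" style source: unchanged within 1 unit,
# at most 40 dB down from 100 units outward.
for d in (0.5, 1.0, 10.0, 100.0, 500.0):
    level = attenuation_db(d, 1.0, 100.0, 40.0)
    print(d, round(level, 2), round(db_to_linear(level), 4))
```

Capping the attenuation keeps a source audible however far the listener moves, which is the case described for a sound that should never fully disappear within a room; a smaller cap such as 20 dB keeps it more prominent at a distance.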
