WWDC2003 Session 405
Transcript
Kind: captions
Language: en
Hello. So I thought, as an introduction to James's session here (he's going to talk about the details of several of our APIs, including the audio file API and the audio converter API), I'd show a quick demo program I wrote, which shows you how this looks from the outside. This doesn't use QuickTime; it's using all of our native audio APIs to read and convert audio files. So here I've got a few files I brought with me. This is a simple AIFF stereo 16-bit file, and I can listen to it. That's just using the audio file API to recognize the file, read the samples out of it, and pump it through an audio output unit. There's very little actual manipulation of audio data in this program; it's all using our APIs.
So another thing I can do here: I can do simple sample rate conversion. These are the same APIs that QuickTime is based on top of in their upcoming version. This program, by the way, is also going to be available as sample code. We can see some of the things the audio file API provides. In Panther, here we can see all the file formats that are supported. I'm just going to keep this as AIFF, so I could say "same as the source". Actually, if I were to choose AIFF here, I could see that I can make it a 24-bit AIFF, which doesn't make sense because the source is 16-bit, so I'll just say "same as the source", and I'll downsample it to 22 kHz and hit Convert. This is going through our sample rate converter, which is built into our audio converter API. This is very heavily AltiVec-optimized code, as are all of our integer-to-float conversions in Panther. These are good reasons for using our APIs for your int-to-float conversions: they're heavily optimized for both G4 and G3, and soon on G5 as well. And here's my converted file. Okay.
So another thing you can do with the audio converter is use it to encode and decode formats such as AAC. Here I've got an example of a six-channel AIFF file which one of my co-workers authored in Logic a year ago. Here it is in AIFF format, and I'll leave it playing while I convert it to AAC. I'll turn the volume down while it's converting and keep talking. Okay, so I can say I want to convert it to AAC and leave it at the same sample rate. There are some more parameters you can put on the converter here, such as the bit rate and the quality. You can do channel remapping, and multi-channel AAC, for example, depending on the channel layout, can save bits by putting a filter on the LFE channel. This example program doesn't show that right now, but that is an option that's there on the converter. So while it keeps playing, I'll turn the volume back up... and down. This is a pretty substantial file: it's forty megabytes as an AIFF, and we're encoding that to a six-channel AAC file in a little less than real time here; it's almost real time. So here, if we go to the Finder, we'll see it's now a 3.1 megabyte file, and it's also in surround. This is kind of cool: scrubbing around inside the AAC file, it's decoding multiple packets, and it's really nice and efficient. By the way, our AAC decoder, in our tests, is about three times as efficient as MP3 decoding, so if you're considering putting sounds in games, you might take a closer look at using AAC encoding instead of MP3. Okay, so that's the simple program, and I'll turn it back over to James to explain the APIs that are underneath this program. Thank you.
Okay, back to slides please. There we go. Okay, so what I'm going to talk about is how to handle audio formats with Core Audio. First I'll sort of review the basics of how formats are represented in Core Audio, then I'll talk about the audio converter API and the Audio Format API, which is new in Panther. I'll talk about some new features for audio units for supporting multi-channel and surround, and I'll talk about the new audio file get global info API and the new matrix mixer audio unit.
Okay, so one thing that's been a source of some confusion in Core Audio is the definition of frames, packets, bytes, and samples, and when you're dealing with compressed formats it gets pretty important to take them all into account. This graphic shows what a 5-channel interleaved 24-bit stream looks like. You can see that one sample is three bytes, and five channels are interleaved into one frame, and for linear PCM one frame equals one packet, in the way we count things in the Core Audio APIs. The way you describe a format is by specifying an AudioStreamBasicDescription, which is a structure that's used throughout the Core Audio APIs. It has the sample rate for the stream and the format ID, which tells you whether it's a PCM stream or some kind of compressed format; the format ID is a four-character code for the format. Then there are flags that are specific to that format, some fields that tell you the relationships between the bytes, the packets, and the frames in that format, the number of channels in that format, and how many bits per channel each sample is in that format. Okay, so here's an example of how to fill one of these out for the five-channel 24-bit interleaved stream that was in the first graphic. The format ID is linear PCM, and the flags are set to big-endian, signed integer, packed. It's one frame per packet; all linear PCM is one frame per packet. And then for this interleaved stream there are 15 bytes per packet and 15 bytes per frame, 5 channels per frame, 24 bits per channel.
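Filling one of these out can be sketched in code. This is an illustration rather than the Apple header: the struct is a stand-in with the same field names as the real AudioStreamBasicDescription in CoreAudioTypes.h, and the flag constants here are made-up placeholders for the real format flag bits.

```c
#include <assert.h>
#include <stdint.h>

/* Build a four-character code like 'lpcm' by hand. */
#define FOURCC(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8)  |  (uint32_t)(d))

enum {  /* illustrative stand-ins for the real format flag bits */
    kFlagBigEndian = 1u << 1,
    kFlagSignedInt = 1u << 2,
    kFlagPacked    = 1u << 3
};

/* Stand-in for AudioStreamBasicDescription; same field names. */
typedef struct {
    double   mSampleRate;
    uint32_t mFormatID;
    uint32_t mFormatFlags;
    uint32_t mBytesPerPacket;
    uint32_t mFramesPerPacket;
    uint32_t mBytesPerFrame;
    uint32_t mChannelsPerFrame;
    uint32_t mBitsPerChannel;
} ASBD;

/* The 5-channel, 24-bit, interleaved PCM stream from the graphic:
   3 bytes per sample * 5 channels = 15 bytes per frame, and for
   linear PCM one frame always equals one packet. */
static ASBD make_5ch_24bit_interleaved(double sampleRate) {
    ASBD d = {0};
    d.mSampleRate       = sampleRate;
    d.mFormatID         = FOURCC('l', 'p', 'c', 'm');
    d.mFormatFlags      = kFlagBigEndian | kFlagSignedInt | kFlagPacked;
    d.mFramesPerPacket  = 1;                 /* always 1 for PCM */
    d.mChannelsPerFrame = 5;
    d.mBitsPerChannel   = 24;
    d.mBytesPerFrame    = 5 * (24 / 8);      /* 15 bytes */
    d.mBytesPerPacket   = d.mBytesPerFrame;  /* 15 bytes */
    return d;
}
```

Note that for the non-interleaved case discussed next, the bytes-per-frame and bytes-per-packet fields would describe a single mono buffer, not the sum across channels.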
This shows an example of a non-interleaved stream. There are two buffers, and each of them holds floating-point samples, four bytes each. On the left you see these names, mBuffers[0].mData; those are fields in the AudioBufferList structure, which I'll show in a minute. In each buffer there are samples from one channel of the data, so this non-interleaved stream has two channels, and so there are two buffers. Okay, and then here's the AudioStreamBasicDescription for that. For one thing, the format flag for non-interleaved is set. Another difference you'll see here is that bytes per packet and bytes per frame describe one of those buffers, so even though there are two channels, it's not two times four bytes; it's just four bytes per packet, because they're split into two buffers.
Okay, so now, when you get into compressed formats, the simplest kind of compressed format is a constant bit rate format: there's a constant number of bytes per packet, and the number of frames per packet depends on the compressed format that you're dealing with. The other kind is variable bit rate data, where the number of bytes per packet can vary, and in the AudioStreamBasicDescription the bytes-per-packet field is just set to zero, because it's not a constant value. Okay, now, when you're dealing with some formats, like AAC, that don't have information in the bitstream about where the packet boundaries are, you need an external piece of data to tell you where those packet boundaries are, and we use the AudioStreamPacketDescription in our APIs to tell you what the starting byte offset of a packet is and the length in bytes of that packet. When you're passing around AAC data, you have to pass around an array of these packet descriptions to tell you where the packet boundaries are.
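The bookkeeping those descriptions do is just a running byte offset. Here's a small sketch using a struct with the same field names as the real AudioStreamPacketDescription; the helper function is invented for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Stand-in for AudioStreamPacketDescription: one entry per packet,
   giving its byte offset and byte length within the buffer.
   mVariableFramesInPacket is 0 for formats like AAC, where every
   packet holds a fixed number of frames. */
typedef struct {
    int64_t  mStartOffset;
    uint32_t mVariableFramesInPacket;
    uint32_t mDataByteSize;
} PacketDesc;

/* Given the byte size of each packet in a contiguous VBR buffer,
   fill in the packet descriptions with running offsets and return
   the total byte count. */
static int64_t fill_packet_descs(const uint32_t *sizes, size_t count,
                                 PacketDesc *descs) {
    int64_t offset = 0;
    for (size_t i = 0; i < count; i++) {
        descs[i].mStartOffset = offset;
        descs[i].mVariableFramesInPacket = 0;
        descs[i].mDataByteSize = sizes[i];
        offset += sizes[i];
    }
    return offset;
}
```

An array like this travels alongside the VBR buffer itself, so a consumer can find any packet without scanning the bitstream.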
Okay, now, audio data is stored in an AudioBufferList throughout the Core Audio APIs; that's how it goes into an audio unit and the audio converter. The AudioBufferList tells you the number of buffers, and then there's an array of buffer structures; each buffer tells you how many interleaved channels there are in that buffer and the size of the buffer, and then there's a pointer to the buffer. Okay, all these structures I'm talking about, and some I'll talk about later, are defined in CoreAudioTypes.h. They're used everywhere in Core Audio, so they maintain consistency throughout Core Audio, and there are public utility classes shipped with the SDK that provide common operations on these structures and make it easier to fill them out and do the things that you commonly need to do with them.
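As a sketch, here is how such a buffer list might be filled out for the non-interleaved stereo case from earlier. The types are stand-ins with the same field names as the real AudioBuffer and AudioBufferList (whose buffer array is really variable-length); the helper function is invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for AudioBuffer / AudioBufferList; a fixed array of two
   buffers is enough for this stereo example. */
typedef struct {
    uint32_t mNumberChannels; /* interleaved channels in this buffer */
    uint32_t mDataByteSize;
    void    *mData;
} Buffer;

typedef struct {
    uint32_t mNumberBuffers;
    Buffer   mBuffers[2];
} BufferList2;

/* Point a buffer list at two mono float buffers, one per channel,
   as for the non-interleaved stereo stream described above. */
static BufferList2 make_stereo_noninterleaved(float *left, float *right,
                                              uint32_t frames) {
    BufferList2 abl;
    abl.mNumberBuffers = 2;                  /* one buffer per channel */
    for (int i = 0; i < 2; i++) {
        abl.mBuffers[i].mNumberChannels = 1; /* mono buffers */
        abl.mBuffers[i].mDataByteSize = frames * (uint32_t)sizeof(float);
    }
    abl.mBuffers[0].mData = left;
    abl.mBuffers[1].mData = right;
    return abl;
}
```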
Okay, so I'm going to talk about how frames and packets are used by the various Core Audio APIs. Audio units express their buffer sizes in frames. By default the format is 32-bit float, non-interleaved; people have sort of gotten the impression that that's the only format you can use with audio units, but since there's a stream description that you can set on the inputs and outputs of the audio units, you can actually put any kind of audio data into those audio units. Okay, the HAL counts time in frames, so when you get the IOProc callback, the sample time is provided there in frames. In PCM mode, the HAL tells us buffer sizes in frames; the buffer sizes are set in frames, and unless you tell it otherwise the format will be 32-bit float. In non-linear-PCM mode you're restricted to dealing with the data one packet at a time, and the buffer frame size is restricted to the number of frames per packet. Okay, so for the audio file API there are two pairs of calls: audio file read bytes and write bytes deal in bytes, and audio file read packets and write packets deal in packets. Since frames equal packets for PCM, you can use audio file read packets for PCM and that will return you PCM frames; if you're dealing with compressed data, then read packets will return you some number of packets of that compressed data.
Okay, so now you can describe different formats; how do you convert from one format to another? For that we have the audio converter. It can do floating point to integer conversion at various depths, sample rate conversion, interleaving and deinterleaving, and channel reordering, and, new in Panther, it can convert between PCM and compressed formats for codecs that are installed on the system. In order to create an audio converter you use AudioConverterNew: you give it an input and an output format, and it returns you an AudioConverterRef, which is your audio converter object. So in order to call it, you would fill out two audio stream descriptions; you can use CAStreamBasicDescription, the SDK class, to help you do that. In this example I'm just showing creating a decoder: I call AudioConverterNew, and I get my decoder instance out from that.
So after you've created your audio converter, you want to convert audio with it, so you call AudioConverterFillComplexBuffer. That is a call that takes an audio converter instance; it takes a pointer to an input procedure, which is the data source for getting data into the audio converter; a user data field for storing your instance data for the audio converter; and then there's ioOutputDataPacketSize, which on entry is the number of packets you want to get out of the audio converter, and on return it will be the number of packets you actually got out. That number could be less than what you requested if you're at the end of the stream or there was an error. Then you pass in the audio buffer list which you want the audio converter to write your converted data into, and then there's a pointer to an array of packet descriptions. If you're converting AAC, and you're asking for some number of packets of AAC, you need to pass in an array of packet descriptions so you'll know where the packet boundaries of the data you get back are.
Okay, so in order to call AudioConverterFillComplexBuffer, you need to prepare an output buffer list. If it's interleaved data, there will be one buffer in the buffer list's array, and if it's non-interleaved you'll need multiple mono buffers. The mData pointer contains the pointer to the buffer that will have the converted audio data written into it, and the data byte size tells the size of the buffer; if you asked for more packets than will fit into that size, it will get truncated down to whatever space you provided. So to call AudioConverterFillComplexBuffer, this shows just passing the arguments: the decoder, the input procedure pointer, the user data, the packet count I'm requesting, which is 8192 in this case, and a buffer list which I've filled out. And I'm passing null here for the packet descriptions, because I'm decoding, so I'm getting PCM out, so I don't need packet descriptions in this case.
So now, when you use an audio converter, you need to write an input procedure, and the input procedure implements the source of the demand-driven model: when you ask the audio converter for converted data, it will call your input proc to get the input-side data. It's demand-driven, so if you're doing a sample rate conversion it may be pulling data at a different rate than you're getting it out. There's also some internal buffering that can happen when doing compression or sample rate conversion, so the pulling of input needs to be decoupled from the pulling of output, and that's what the input proc implements.
So you need to set up your buffer list. Okay, inside your input proc, your job is to provide data to the audio converter. You don't have to copy data into the audio converter's buffers; you just give the audio converter pointers to your data. It passes you a buffer list, and you fill out that buffer list with pointers to the data that it wants to convert. The audio converter in general goes out of its way to eliminate copying, so if it can, it will convert straight from your input buffer into your output buffer without buffering internally. In certain cases it cannot do that, but in the input proc you provide pointers to your data; you don't copy it. And when you pass data to the audio converter in your input proc, you have to keep that data valid until the next time your input proc is called, and that might be across calls to AudioConverterFillComplexBuffer. So you may call AudioConverterFillComplexBuffer, which calls your input proc, which returns data to the audio converter; then AudioConverterFillComplexBuffer exits and returns you data, and you still have to keep that input data live until the next time you call AudioConverterFillComplexBuffer and it's called your input proc, because it's still looking at that data.
Now, your input proc gets passed the number of packets that the audio converter wants from you, but you're allowed to return either more or less data than it asked for. If you return less, then it will be called again, and if you return more, it will just ask you less frequently; it will just ask you the next time it needs data. Okay, so in the input proc you have an audio converter instance, and then there's the number of data packets: on input it's the number of data packets that you're being asked for, and on output you return the number of data packets that you're actually returning. It passes you a buffer list, which you need to fill out with the data, and then there's the audio stream packet description: if you're returning AAC data, you need to set this pointer to the array of packet descriptions to describe the packet boundaries of the data. And then there's the user data that's passed in, which is your own instance data, for use however you want to use it in your input proc.
There are a couple of special conditions that you have to deal with in your input proc. One is when you reach the end of the stream and you're out of data. What you need to do is set the number of data packets you're returning to zero and return noErr. Your input proc may be called several more times; you just keep returning zero, and that will signal to the audio converter that you're indeed out of data, and it will flush its buffers. Another situation you could be in, if you're doing real-time streaming, is that you're not at the end of the stream, but you don't have any data available right now, so the audio converter needs to just return whatever it's got converted. What you do in that case is return no packets available and return an error. This error gets propagated back to the caller, and any data that had been converted up to that point will be returned to the caller, but the audio converter will keep any unconverted data that it has internally until the next time AudioConverterFillComplexBuffer is called.
Okay, so here's an example of what an input proc might look like. In the first line of code here I'm just getting a pointer to my user data struct, which is just my own data, in which I have a buffer list stored that points to my data. Then I have a loop which just copies the pointers from my input data buffer list into the audio converter's buffer list, so I'm just copying pointers, not data, here. And then I'm returning the number of packets that were in my buffer. So in this example I'm just completely ignoring what the audio converter actually asked me for. That's generally not the best thing to do, but it will work; you'd just get extra copying in that situation.
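The demand-driven contract is easier to see with the real API stripped away. In this toy sketch nothing is Core Audio and every name is invented: a fill function keeps pulling from an input proc until the requested output count is satisfied, and a return of zero from the proc plays the role of the end-of-stream signal described above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy input proc: receives a requested count, delivers what it can,
   and returns the count actually delivered; 0 means end of stream. */
typedef size_t (*InputProc)(void *userData, float *dst, size_t maxSamples);

/* Toy "converter": pulls input on demand until the requested output
   count is met or the input proc runs dry. Returns the number of
   samples actually produced, which may be short of the request. */
static size_t toy_fill(InputProc proc, void *userData,
                       float *out, size_t wanted) {
    size_t produced = 0;
    while (produced < wanted) {
        size_t got = proc(userData, out + produced, wanted - produced);
        if (got == 0)      /* input proc signals end of stream */
            break;
        produced += got;
    }
    return produced;
}

/* An input source that hands out data in small chunks, so the
   converter has to call it repeatedly (the demand-driven part). */
typedef struct { size_t remaining; } Source;

static size_t chunky_input(void *userData, float *dst, size_t maxSamples) {
    Source *s = (Source *)userData;
    size_t give = s->remaining < 5 ? s->remaining : 5; /* at most 5 */
    if (give > maxSamples) give = maxSamples;
    for (size_t i = 0; i < give; i++) dst[i] = 1.0f;
    s->remaining -= give;
    return give;
}
```

Just as with the real converter, the caller may get fewer samples than requested when the source runs dry, and the source may be asked for data at a different rate than the output is drained.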
Okay, so the next thing I'm going to talk about is the Audio Format API. It's a new API in Panther for asking questions about audio formats. What can it do? It provides operations for handling AudioStreamBasicDescriptions and audio channel layouts, which I'll talk about; audio channel layouts are a new structure in Panther which describes the channels present in an audio stream and what order they're in. And you can also ask it about what compressed formats are installed on the system. So, for AudioStreamBasicDescription operations with the Audio Format API: you can get a format's name. You pass an AudioStreamBasicDescription to the Audio Format API and it will generate a name, either a name for a compressed format, or, if you pass it a certain linear PCM format, it will generate a CFString and tell you what that is. You can also pass it a partially filled-out AudioStreamBasicDescription; that's mostly useful for constant bit rate data formats, to find out what the bytes per packet is for IMA4, for example. You can ask: is a format variable bit rate, is it externally framed? And you can ask what encoders are installed on the system and what decoders are installed on the system.
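For a constant bit rate format, once you know bytes per packet and frames per packet, buffer sizing is plain arithmetic, which is why that partially-filled-out query is useful. A sketch, rounding up for partial packets; the IMA4 figures in the example (64 frames per packet, 34 bytes per packet per channel) are the commonly cited values and should be treated as illustrative here.

```c
#include <assert.h>
#include <stdint.h>

/* How many whole packets are needed to hold a given frame count?
   Round up: a partial packet still occupies a whole packet. */
static uint32_t packets_for_frames(uint32_t frames, uint32_t framesPerPacket) {
    return (frames + framesPerPacket - 1) / framesPerPacket;
}

/* Buffer size in bytes for that many frames of a CBR format. */
static uint32_t bytes_for_frames(uint32_t frames, uint32_t framesPerPacket,
                                 uint32_t bytesPerPacket) {
    return packets_for_frames(frames, framesPerPacket) * bytesPerPacket;
}
```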
Okay, so this is what the Audio Format API looks like. There are two calls. One is get property info, which you use to find out the size of a property: you pass in a property ID, which tells the Audio Format API what property you're asking about, and a specifier, which is some argument to the property, and it returns you the size of the property you're asking for. And then there's the get property call, which you use to actually get the value of the property. All right, so here's an example of using it. Here I'm finding out what encoders are installed on the system. I make a get property info call to find out the size of the array of encoders; it's going to return me a list of OSTypes for the encode formats. Then I call AudioFormatGetProperty to actually get the array of format IDs, and then I enter the loop here, where I call the format API to give me a name for each format: I create a small AudioStreamBasicDescription, get a name for the format, and print them out. And on my system this is what I got when I ran it.
Okay, so a new structure in Panther is the AudioChannelLayout. This describes the channel ordering of a stream. Now, the AudioStreamBasicDescription tells you the number of channels in the stream, but it doesn't really tell you what they are. If you have one channel or two channels, you can pretty much guess that it's mono or stereo, but if you have five channels it could be one of several orderings of 5.0, and if you have six channels it could be 6.0 or 5.1 in several different orderings, so you need a way to find out what that is. The AudioChannelLayout has several ways of specifying ordering. There's an integer tag for a bunch of predefined layouts; in a lot of cases you can just pass around these integer tags to tell you what channel ordering you have. There's a bitmap for USB and WAVE style layouts: on WAVE files or USB, there's a bitmap to tell you which channels are present in the stream, and then they have to be present in a certain order. And then there's an array of channel descriptions, which you can use to describe arbitrary layouts. So the structure looks like this: there's a channel layout tag, which is one of these predefined layouts, there's a bitmap, and then there's an array of channel descriptions. Lots of formats are predefined; we defined just about everything we could think of or find references to. One thing you can do with these integer tags is mask off the low 16 bits, and that will tell you the number of channels in the format; the whole tag will be more specific and tell you what kind of ordering there is.
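That masking rule can be shown in a couple of lines. The tag constants below are built with made-up ordering IDs purely for illustration; the low-16-bits rule is the point.

```c
#include <assert.h>
#include <stdint.h>

/* A layout tag packs an ordering ID in the high 16 bits and the
   channel count in the low 16 bits. The ordering IDs used in the
   test are invented for illustration. */
#define MAKE_LAYOUT_TAG(orderingID, channels) \
    (((uint32_t)(orderingID) << 16) | (uint32_t)(channels))

/* Mask off the low 16 bits to get the channel count. */
static uint32_t channels_from_tag(uint32_t tag) {
    return tag & 0xFFFFu;
}

/* The high 16 bits identify the specific ordering. */
static uint32_t ordering_from_tag(uint32_t tag) {
    return tag >> 16;
}
```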
There are two special tags. One is "use channel descriptions", which means that you can't know anything from looking at the tag; you have to look at the array of channel descriptions to find out what channels are present. And then there's "use channel bitmap", for if you're dealing with a USB or WAVE style channel layout. So you can see here that there are four different kinds of layouts for 5.1. This sort of illustrates the kind of problem this is trying to solve: various streams will be in different ones of these formats, and you have to be able to differentiate them. Okay, the channel description struct, for the array of channel descriptions, has the channel label, which tells you whether it's left, right, left surround, and so on, and then there are some optional coordinates: if you want to specify speaker positions using floating-point coordinates, you can do that in rectangular or spherical coordinates. Okay, the channel label is an integer that basically tells you which channel you're dealing with. There are the basic ones, and then there are lots of more esoteric ones. These are some of the channels defined by the theater industry, along with my favorite channel, the left surround direct channel, the LSD channel.
For the audio channel layout operations: you can get the number of channels in a layout. You can get a full description from a layout tag or a bitmap, so if you have one of these integer tags, like 5.1 A, you can have it give you an array of the channel descriptions with the channel labels all filled out, so you know which channel is where in a certain layout. You can get a matrix of coefficients for use with the matrix mixer, which I'll go over in a bit, for doing downmixing from, say, 5.1 to stereo. You can get a name for a layout: if you have some layout, you can pass it to the Audio Format API and it'll give you a CFString which you can print out. You can get a name for a channel, so if you have a certain channel and want a name you can print for the user, you can get that; these are localized strings, so if you want to see what the left channel looks like, it will be localized. And then you can also put these audio channel layouts into files.
All right, so with audio units, you can get the channel layouts an audio unit supports. This is a new feature for Panther: all units can support channel layouts, and you can get or set the channel layout for an audio unit's stream. For example, the matrix reverb can have a stereo, quad, or 5.0 channel layout as output. Okay, you don't have to support channel layouts: if you're doing an audio unit that's only doing mono or stereo, or that doesn't really care about spatial location, like a filter, then you don't have to support audio channel layouts. But if you're doing a reverb or a panner or something that can deal with certain numbers of channels, like if you're dealing with five channels and you want to be able to support multiple channel orderings, then you want to use audio channel layouts. The audio converter also supports channel layouts. You can use the AAC codec to support various channel layouts when encoding to AAC: there's a property for getting the available encode channel layouts, and for setting the channel layout when you're encoding.
There's also the audio file global info API, which is related to the Audio Format API, but it gets information about audio files. You can ask it what file types can be read and what can be written, you can get names for the file types, you can find out what stream formats a certain file type can have put into it, and you can find out what file extensions apply for a certain file type, or for all file types. It's basically symmetrical to the Audio Format API: there's a property ID and a specifier, and for the info call you get the size of the property, and for the get property call you get the property data. So here's an example of finding out what readable file types are on the system. It's almost identical to finding out what encoders are on the system: I find out the size of the array I'm going to get back using the info call, then I get the array of file types that can be read, and I go into a loop and use the file type name property to get a CFString that I can print out, and this is what I get. So in Panther, audio file now supports MPEG Layer 3 files, AC-3, AAC, and DTS files.
Okay, one new audio unit in Panther is the matrix mixer audio unit. It's an audio unit that can take any number of inputs to any number of outputs, and the inputs and outputs can be grouped into buses of any size. The CPU usage depends only on the number of nonzero crosspoints in the matrix, not the size of the matrix, and you can get metering on inputs, crosspoints, and outputs. The matrix mixer is useful for signal routing, channel reordering, surround downmixing, generalized panning, and generalized mixing. All the input buses are flattened to an array of mono channels, for all the input and output buses, so it's a big matrix of mono channels. And for gain control there's a gain on each input, a gain at each crosspoint, a gain on each output, and a master gain for the entire matrix. So, this is a game of Go where black is losing... this shows the input buses coming in. They're flattened to a set of four channels: there are stereo buses that are flattened to four channels. There's a gain on each input, and the channels get numbered across all the buses in a linear fashion, so they go 0, 1, 2, 3, addressing each channel. I'm using a black circle here to represent a crosspoint that has a nonzero gain, and the open circles show zero gain. So here I've got input bus 0 being mixed to output buses 0 and 1, and input bus 1 being mixed just to output bus 0. That's how it's laid out, and you're only paying CPU cost for the black circles, so as you turn on more gains, you'll have more CPU load. Okay, so I'm going to demonstrate that now.
Okay, so here's the matrix mixer, or rather my UI on top of the matrix mixer. I've got two stereo input buses coming in, and I've got five channels going out in one bus. So if I hit play here, you can see on the left these are the pre-fader input meters. I'll turn up my master gain here, and as I turn these input faders on, you can see this is the post-fader input metering. Now I'll turn on these crosspoint gains: I want to map this channel to channel 0 of the output, and map this channel 1 into channel 1 of the output, and then I can sort of put those backwards into the surround channels. Okay, so you can see the metering on the crosspoints here and the metering on the output. I'm mixing another sound in here, and I can mix it also into the surrounds here.
[Applause]
This is the center here. Okay, so that shows basically setting the levels and the metering. In addition, you can enable and disable buses, and that basically turns off pulling for that branch, on that bus of input, so it's like pausing a section of your input. All right, I implemented this so I can demonstrate that you can automate these programmatically to do any kind of panning algorithm you want to do; one of the kind of interesting things you can do with a matrix mixer is multi-channel panning in various manners. So that's just the rear. Okay. And I can also disable the outputs, not just the inputs. Okay, so that's the matrix mixer; let's get back to slides now.
All right, so in order to set the gains on the matrix mixer: you can set all the gains from global scope. With audio units, when you set parameters, you set them using either input scope, output scope, or global scope, and with the matrix mixer you can set everything from global scope, although you can also set input gains and output gains from input or output scope. This shows how you would specify a certain gain at the crosspoints of the matrix. You need to specify these using the element argument to AudioUnitSetParameter, and you do that by shifting the input channel left by 16 bits and then ORing that with the output channel. For setting an input gain, you set the output channel to hex FFFF and OR that with the input channel shifted left 16; for the output gain you do the converse operation; and the master gain is all Fs. Okay, so as you saw, you can get metering: pre-fader and post-fader metering on inputs, and post-fader metering on outputs and crosspoints. In order to get metering you need to set the audio unit metering mode property to 1. Metering does take some CPU, so you want to have the option to turn it off if you don't need it. And the parameters for metering are accessed the same way as the gains, by shifting the input left 16 in the element and ORing with the output.
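The element packing just described can be sketched as a few helper functions. These are illustrative; the real code would pass the result as the element argument to AudioUnitSetParameter.

```c
#include <assert.h>
#include <stdint.h>

/* Matrix mixer element IDs pack the input channel in the high
   16 bits and the output channel in the low 16 bits; 0xFFFF in
   one half marks an input-only or output-only gain, and all Fs
   is the master gain. */

static uint32_t crosspoint_element(uint32_t inChannel, uint32_t outChannel) {
    return (inChannel << 16) | outChannel;
}

static uint32_t input_gain_element(uint32_t inChannel) {
    return (inChannel << 16) | 0xFFFFu;   /* output half is all Fs */
}

static uint32_t output_gain_element(uint32_t outChannel) {
    return (0xFFFFu << 16) | outChannel;  /* input half is all Fs */
}

static uint32_t master_gain_element(void) {
    return 0xFFFFFFFFu;                   /* all Fs */
}
```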
Okay, so the bus enable that I showed you: there's a matrix mixer parameter for enabling or disabling a bus. If an input bus is disabled, then it won't be pulled, so that's one way you can manage CPU load when you're using a matrix mixer: if you disable input buses, you're basically turning that part of your input graph off. If you just set the input gain to 0, that input will still be playing; you're just not hearing it, so it's less efficient to do it that way. Okay, so in order to set up the matrix mixer before you can use it, you need to set the number of input and output buses, using the audio unit bus count property, and you need to set the number of channels in the stream formats of each bus. This defines the size of the matrix, so that it can allocate itself to the proper size.
Okay, also in Panther there's a new panner unit, as Bill was saying: a panner unit class for doing mono, stereo, or N-channel inputs to N-channel outputs, possibly using audio channel layouts. You can use it to do panning, or you can use it to do things like inline faders for channel volumes. All right, so that's about it. To wrap up: there are a couple of other audio and QuickTime sessions tomorrow morning, and there's the documentation. Thank you.