WWDC2000 Session 176
Transcript
Kind: captions
Language: en
So, we'd like to get this session started. I'd like to thank you all for coming; it looks like we have a full house, which is great to see. This is the second of three sessions on audio, focusing specifically on Mac OS X. The first session, on Wednesday, was about the new Audio Unit and music services being provided on OS X, as well as the MIDI services. The MIDI services obviously sit at a low level to deal with MIDI devices, and the music and audio units sit above those as application services that programs like QuickTime and other applications would use.

In this session we're going to be covering the Sound Manager as it exists on OS X, along with a review of what's been going on with the Sound Manager over the last few months. Then we'll get into the low-level interface that connects your application, or the higher levels of the audio engine, to the particular audio devices, from an application point of view. In the session following this we'll be talking about what goes on underneath that API, in the kernel and in the I/O Kit world. So without any further ado, I'll get Jeff Moore to come up; he's been doing most of the work in this area. Make him welcome.

Thank you. Hi, everybody. Wow, it is full. It's good to see this many people who are as interested in audio as I am. So let's get started.
Like Bill said, today we're going to talk a little bit about the Sound Manager: first, what the Sound Manager can do for you, and specifically how the Sound Manager is implemented. We're also going to talk about what's changing. The Sound Manager has had a bunch of different versions over the past year, and there's been a little bit of confusion about that. From there we're going to go into what's new with OS X and the core of the Core Audio services we've been working on. By the end of this, hopefully you'll understand where we're going with all this stuff, and you'll be able to move your own audio data to and from devices and all over the system.
First, the Sound Manager. The Sound Manager is available now on three platforms: Mac OS X, Mac OS 9, and Win32. The Mac OS X and Win32 versions are very similar in terms of feature set; Mac OS 9 has a few more features than the other ones. Then I'll talk a little bit about things we've changed: most notably, we added variable bitrate decoding, and we've fixed a bunch of synchronization bugs recently. And then I'll talk about what we did to bring it up on Carbon, and what's there and what isn't.

To start off with the Sound Manager itself: we're all pretty familiar with sound channels and the general procedural API for playing sound and for bringing sound into your application. I thought I'd quickly review how all of that is implemented under the hood, because it's not exactly clear; a lot of this stuff is not exposed through the procedural API. Under the hood, all sound channels are implemented via a network of sound components. They do all the actual processing work: moving the data and massaging it into a format that you can play, or encoding it, decoding it, whatever. There are a bunch of different areas that the Sound Manager focuses on; rate conversion is arguably the most time-expensive thing the Sound Manager will do for you. All these little components are linked together in pretty much linear chains; that is, you won't have two inputs feeding one sound component. They just have one in and one out. All the semantics of the procedural API are really handled at the low level by the system mixer component. It's a pretty monolithic architecture that does the job, but it's beginning to show its age a little bit.
This diagram represents a runtime view of a system using the Sound Manager and the connections of the various components. The first two chains show what is typically the full chain you get. You start with a sound source component, whose job is to be the traffic cop for the buffer of data that's being pulled through the chain. Then you have a decoder component, which will take a stream in one format and convert it into a format that the rest of the system can use; typically that's a linear PCM format. Then you have an equalizer component. The equalizer component actually has two jobs: its first job is to provide the implementation of the bass and treble controls that you see in QuickTime; its second job is to implement the spectrum analyzer that you see in QuickTime. To do that, it siphons off the stream, puts things into a buffer, and then performs the FFT at event time. This thing will never perform its FFT at interrupt time, so it's not too much of a performance strain if you're not using the equalizer for actual bass and treble adjustments. Then finally you have a rate converter component, and the rate converter component has two roles: first, it will take the raw data and convert it to the rate that the hardware wants, but it also does the work of handling the rate multiplier command. So if you say "play this sound twice as fast," it upsamples and then downsamples to get everything back into the format that the hardware wants. Finally, everything is joined at the system mixer component. On Mac OS 9, you only ever have one mixer component talking to one output device component. On Windows and Mac OS X, you have a mixer and an output device per process; everything is handled in user space for the Sound Manager.
Next, what we've done with the Sound Manager over the past year. Over the past year we've had about seven, I think eight, releases of the Sound Manager: 3.5.1, 3.6, 3.6.3, and a whole bunch of dot releases fixing little bugs here and there. The important releases were 3.5.1, which was primarily a release to support new hardware, specifically the iBook and the G4; it also rolled in support for some of the Mac OS 9 features like multi-user preferences and that sort of thing. Then 3.6 is where we made some pretty big under-the-hood changes to the Sound Manager, when we added support for variable bitrate decoding; 3.6 shipped with QuickTime 4.1. The current version is Sound Manager 3.6.5. Between 3.6 and 3.6.5 there were a couple of bug fixes that had mostly to do with synchronization. There was also one fix for a very long-standing issue that caused a hole to be drilled in the resource map because of a bad DisposeHandle. It's amazing what you find; the 24-to-32-bit transition is still biting you almost seven years later. The specific thing was that the master pointer flags were moved to the data block, and if that data block gets purged in the handle, you can no longer tell it's a resource. So if you call DisposeHandle on it, you're in trouble, and the Sound Manager was doing that. Notably, it caused Remote Access, of all things, to have trouble. Go figure. You'd see a whole bunch of unstable connections, specifically if you were using something like IP tunneling for encryption; it would have a lot of trouble trying to get through.
Variable bitrate decoding was a very interesting problem and a very interesting feature to add to the Sound Manager. As most of you know, variable bitrate encoding is a technique that varies the number of bits used to encode a sample, or a block of samples, over time. Typically this yields better-quality encoding for a lower overall bitrate. We added it specifically to support QuickTime's MP3 decoder in QuickTime 4.1. To do it, we had to grapple with a fundamental issue: in variable bitrate situations, quite often you just don't know the relationship between a buffer size in bytes and the number of samples you're going to get out of it when you decode it. The Sound Manager was pretty reliant on the fact that you would know ahead of time the number of samples in a buffer, so we had to mess things up a little bit so that we could express the notion of a buffer size in terms of the number of bytes it holds, rather than the number of samples.

To do this, we ended up extending three data structures. The scheduled sound header needed to be extended so that you could schedule a block of variable bitrate data. The SoundComponentData structure had to be extended so that the component chains could talk about variable bitrate data with each other. And the sound parameter block had to be changed so that the mixer could keep track of it as well. In all cases, what we did to extend these data structures was to add a new flag to their flags field that says "I'm extended"; then you can cast to one of the extended structures, and you have access to the extended fields. Those include another new flags field, which is used to indicate whether the structure is counting by sample frames or by bytes, and a field to actually hold the byte count.
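Sketched in code, the pattern looks roughly like this, assuming the SoundComponentData and ExtSoundComponentData declarations and the kExtendedSoundData / kExtendedSoundBufferSizeValid flag names from Carbon's Sound.h as I recall them; verify them against your own headers.

    #include <Carbon/Carbon.h>
    #include <stdio.h>

    /* Report how a sound component buffer is being counted: by bytes
       (the VBR-era extended accounting) or by sample frames (classic). */
    static void DescribeBuffer(const SoundComponentData *scd)
    {
        if (scd->flags & kExtendedSoundData) {
            /* The flag says "I'm extended", so it is safe to cast to the
               extended structure and reach the new fields. */
            const ExtSoundComponentData *ext =
                (const ExtSoundComponentData *)scd;
            if (ext->extendedFlags & kExtendedSoundBufferSizeValid) {
                printf("buffer holds %ld bytes\n", (long)ext->bufferSize);
                return;
            }
        }
        printf("buffer holds %ld sample frames\n", (long)scd->sampleCount);
    }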
In addition to that, we also revved the Sound Converter API to better support variable bitrate situations, and in fact to be a better and easier-to-use system in general for any sort of conversion. We added a new routine called SoundConverterFillBuffer. It's a direct replacement for the functionality you got from SoundConverterConvertBuffer. The difference is that the mechanism for moving data through SoundConverterFillBuffer is a callback: you specify a routine that the Sound Converter can call to get more data to decode or encode. This gives you complete control over the buffering in the system; you specify the output buffering when you call SoundConverterFillBuffer, and you have complete control over the input buffering, because you supply the function that feeds data into the system. Finally, this obviated the need for calling SoundConverterGetBufferSizes for really any reason if you're using SoundConverterFillBuffer; since you are already in control of all the information both on input and output, there's no need to ask the Sound Converter about it anymore.
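As a sketch, the pull model looks something like the following. The SoundConverterFillBuffer and fill-proc prototypes here are from Carbon's Sound.h as best I remember them, so check the QuickTime 4.1 developer update note for the authoritative signatures; MySource and MyFillProc are hypothetical names.

    #include <Carbon/Carbon.h>

    typedef struct {
        SoundComponentData chunk;  /* describes the next slab of input */
        /* ... your file/stream state goes here ... */
    } MySource;

    /* Called by the converter whenever it needs more input; return false
       at end of stream, true after pointing *data at fresh input. */
    static Boolean MyFillProc(SoundComponentDataPtr *data, void *refCon)
    {
        MySource *src = (MySource *)refCon;
        /* Refill src->chunk.buffer from disk, the network, etc. here. */
        if (0 /* no more input */) return false;
        *data = &src->chunk;
        return true;
    }

    static OSErr Decode(SoundConverter sc, MySource *src,
                        void *out, unsigned long outBytes)
    {
        unsigned long bytesWritten, framesWritten, flags;
        SoundConverterFillBufferDataUPP upp =
            NewSoundConverterFillBufferDataUPP(MyFillProc);
        /* You control the output buffering here; the converter pulls the
           input through MyFillProc as it needs it. */
        OSErr err = SoundConverterFillBuffer(sc, upp, src, out, outBytes,
                                             &bytesWritten, &framesWritten,
                                             &flags);
        DisposeSoundConverterFillBufferDataUPP(upp);
        return err;
    }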
The best place to find out more about this stuff is the recent QuickTime 4.1 developer update note, and that URL is where you can get the PDF version.
The other major things that we've done to the Sound Manager recently were in the nature of synchronization fixes. As most of you have read, and probably seen firsthand, we've had a lot of problems with things like DVD audio and video sync. We've been fighting a running battle with these bugs for well over a year now, and we've nailed most of them pretty much to the wall at this point. There are probably still a few running around, and who knows what we'll see with new hardware; I'm sure we'll be revisiting this problem as we go forward. Specifically, we ended up changing the sound clock, which was deriving most of its timing by watching the number of samples go by and relating that to the progression of microseconds over time. We've been investigating changing the math to be a little more accurate and to be better suited for certain pieces of hardware, like the iBook and the G4, that have different clock trees under the hood.
The other big thing we did was the Carbon Sound Manager; we brought that up for you in, I think it was DP2. Yes, it first showed up in Developer Preview 2 of OS X. The Sound Manager for Carbon is pretty much full-featured, with a few exceptions. In most cases, the features we chose not to port were left out primarily because the functionality was either duplicated by other services in the Sound Manager, or indeed by other services in the OS, but also because the services they provided were obsolete in a lot of cases. Among other things, these included the wavetable synthesizer and related commands; this is things like freqCmd, and there are a bunch of other ones that escape me at the moment. The other big one we chose not to port was SndPlayDoubleBuffer. SndPlayDoubleBuffer is probably the most inefficient mechanism for feeding sound into the Sound Manager at the moment. You are much better off using buffer commands and callback commands; or better yet, you should be using scheduled sound. Scheduled sound is far and away the best way to get sound in and out of the Sound Manager at a specific time.
We also don't have any support for recording to or playing from disk; again, we feel those features are best accomplished through QuickTime. And then there are a bunch of other sound commands whose services we either just don't need anymore or that didn't work right. Some examples: the ampCmd, which does pretty much exactly what the volumeCmd does; the rateCmd, which does sort of what the rateMultiplierCmd does, only it has an interesting problem in that it treats all rates as if you were scaling 22 kHz sound, which can give you unpredictable results if you're not sure ahead of time what you're doing; and then there are commands like the loadCmd, which were about querying registers on the old Apple Sound Chip. I don't think anybody was using those; at least I hope not.
Ultimately, where this leaves us right now is that we have a system that's pretty well optimized for handling 16-bit words and stereo channel formats at a 44.1 kHz sampling rate. We support constant and variable bitrate formats, usually the simpler varieties of those. We're seeing native processing coming along, and we're using a lot of it ourselves. We've also got a fairly loose synchronization model, all things said and done, but it's doing the job pretty much for what we have to do today.
Going forward, these are the sorts of things we see coming down the road. In hardware, we see 24-bit integer formats coming straight at us. In the software realm, you see a very heavy reliance on 32-bit floating point to support all the bandwidth you need. In channel formats, surround sound is just beginning to take off: you're seeing it more and more at the consumer level and more and more at the authoring level, and a lot of games are being authored in 5.1. In the pro market and in authoring, you're seeing much higher sampling rates than we've been used to in the past; 96 kHz certainly looks like it's going to be the next bump up in the standard. We're also going to see even more complex encoding schemes than the variable bitrate stuff you see in MP3: codecs that use different techniques for data resiliency when you transport over a network, and new kinds of perceptual encoding techniques that result in better, smaller, faster, or whatever. They're coming, and we need to be ready for them. We also see that native processing is just going to explode. We're talking multiprocessors; with the G4s you've got plenty of bandwidth to burn for signal processing. In addition, you'll see hardware acceleration starting to take off; in fact, on the PC platform it's already taken off like crazy for games doing 3D rendering. And then we see an increasing requirement for tight synchronization with other media, both internal and external to the box. Increasingly we need to run machines in sync over a network, or in sync with a SMPTE deck. We feel pretty strongly that all those features need to be encompassed in any audio architecture at the operating system level.
So what is Core Audio? What are we going to do about all that stuff? Well, first, we're going to provide a new low-level audio device API. It's specifically geared to let you read and write data from a given device, and to do that in a way that can be shared across many processes. We're also going to do some inter-application communication work, so that you can have audio being generated in one application and sent to be processed in another. On Wednesday, Chris Rogers went in depth on the new component model that we're going to be supporting on OS X, and in other places as well: the Audio Unit architecture. I hope you saw his talk, because you're going to hear a lot more about that stuff as the year progresses. And then probably the best news about all this is that we're going to open-source all the low-level services. We feel pretty strongly that, while we think we're pretty good and we know what we're doing, you have more knowledge about these specific areas and more knowledge about your specific hardware, so you can tell us "hey, you're not doing that right," or "you're not going to be able to support my hardware." We really want to encourage you to participate in what we're trying to do, and to make it better.
This diagram lays out the way Core Audio is spread about the system. Down in the kernel you have I/O Kit; that's where all the drivers live, and that's where all the hardware lives. The big problem with a protected-memory system like this is: how do you get the communication out to the application so you can move the data fast enough, and enough of it, to do something useful with it? We're going to tackle that with the audio device API. It specifically manages moving data across the kernel/user boundary, and it lives entirely in user space; it doesn't have a single piece that lives in the kernel. It also lives individually within each client process that's using it; it's just another shared library that you link against. On top of that, while clients can directly access the audio device API, we really only expect to see that done by clients that have a high degree of need for low-level control and management. Mostly, we hope to see you using Audio Units to deal with the hardware, as we'll be providing Audio Units that completely wrap up the use of the audio device API, as well as the audio IPC mechanisms that will let you move data across processes, completely in user space again.
So what are the goals of the audio device API? Like I said, it's designed to be multi-client. The Macintosh, since its inception, has been able to play sound from two applications simultaneously; if we weren't able to do that on OS X, that'd be a huge step back, so it's very important that the multi-client nature be carried forward. Further, we're supporting multi-channel; obviously we think surround sound is important, and we've got to be able to talk to devices that have more than two channels, and to do that in a way that has low latency, so that when you say "play this buffer," it gets out there on the wire as fast as we can get it there. The audio device API is designed to have very low overhead, and for the latency we depend very heavily on the core OS scheduler to make sure that the threads that have our code in them run when they're supposed to run. The latency will obviously also depend on the transport layer you're using to send the audio to the hardware: PCI has one kind of latency, USB has a totally different kind of latency. Another primary goal is synchronization. With the audio device API, it's very important that a device be able to be synchronized with both internal hardware and external hardware: SMPTE signals, digital clocks, word clock, studio sync; we're bringing all of that into the system. And then finally, we hope it sucks a little bit less.
So, the data formats for the audio device API: it's pretty much format agnostic. We do treat PCM data a little bit better than we treat other kinds of data, but by and large, whatever your hardware wants to take, the audio device API is prepared to ask for from the client code. All the data is passed around as void pointers, and we don't make any requirements about the data. If you do choose to use PCM, the PCM format we use internally is 32-bit floats, and we support both interleaved and non-interleaved streams for PCM data. Further, we'll do all the mixing for you internally if you're using PCM data; otherwise you pretty much have to rely on the driver supporting mixing in some fashion for other formats, because we don't know how to do that, and we're willing to let the driver figure it out. And with the 32-bit floats, to actually convert into the hardware format, we also rely on the driver providing us a routine to do that.
Conceptually, a device in this model encapsulates an I/O cycle to read and write the data to the device, and a clock to keep track of that I/O. The clock generates timestamps that map out the relationship between the host clock, which in our case is the CPU time register, as specified by UpTime() and the other system services on OS X for retrieving that clock value, and the sample clock of the hardware, that is, the counter counting the samples going by. It's really important to know, to as high a degree of accuracy as you can, the relationship between when a sample is played and the host clock time it was played at. You'll see why in a few minutes.
Devices also have a set of properties, and properties are used to describe the state and configuration of a device. They have getters and setters, and they are specified as selector/value pairs: the selector is just an integer ID, and the value is whatever format that property wishes to express; it's passed as a void star in the API. Another big feature of properties is that you can schedule the driver to make a change to a property for you ahead of time. If the driver supports it, that gives you a way to do sample-accurate scheduling of hardware changes, and this will become much more important as FireWire devices start coming online. And finally, clients can register for notifications of changes to properties. You will get a notification only if the value actually changes, and you'll get it if it changes anywhere on the system: if process A makes a change, and process B is looking for that change, process B will get a notification that process A changed that value.
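As a sketch, reading a property and signing up for change notifications might look like this. The calls are the AudioDeviceGetProperty / AudioDeviceAddPropertyListener prototypes from the shipping AudioHardware.h (the DP4 versions may differ slightly), and the volume selector is just an illustrative choice.

    #include <CoreAudio/AudioHardware.h>

    /* Runs whenever the property's value actually changes, no matter
       which process on the system changed it. */
    static OSStatus MyListener(AudioDeviceID device, UInt32 channel,
                               Boolean isInput, AudioDevicePropertyID prop,
                               void *clientData)
    {
        return kAudioHardwareNoError;
    }

    static Float32 WatchOutputVolume(AudioDeviceID device)
    {
        Float32 volume = 0;
        UInt32 size = sizeof(volume);
        /* Selector/value pair: an integer ID selects the property, and
           the value is whatever type that property defines. */
        AudioDeviceGetProperty(device, 1 /* channel */, false /* output */,
                               kAudioDevicePropertyVolumeScalar,
                               &size, &volume);
        AudioDeviceAddPropertyListener(device, 1, false,
                                       kAudioDevicePropertyVolumeScalar,
                                       MyListener, NULL);
        return volume;
    }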
On the inside, what we have is a single ring buffer that gets mapped by I/O Kit into each client process. I/O Kit reads and writes this buffer asynchronously from whatever the clients are up to; in fact it's usually done via a DMA program. We're trying to avoid having any kernel threads actually executing to do audio processing if we can at all help it. The interrupts we get in this system are only generated when the device driver wraps back around the ring buffer, and the size of the ring buffer is typically fairly large: in the current implementation it's about three-quarters of a second. So the interrupt rate for this system is extremely low. Now, I can tell you all have a lot of questions. Every time the buffer wraps around, we get a new timestamp for when that wrap-around happened, a timestamp that comes from the sample clock and from the host clock. You can also get a timestamp whenever you want, so you can generate more as you need them. Sometimes those timestamps are going to be interpolated, because we may not have specific data for the time you're asking about; we have code that keeps track of the history of time and can predict one value given the other.
In this diagram you can see what's going on from the device's point of view. There's an input ring buffer and an output ring buffer, and the client is spinning around, reading and writing those buffers, while the DMA heads chase it around and clean up, or write new data, after it. As you can see, when the heads get back to the interrupt point, an interrupt is raised, and at that point we'll generate some new timestamps and do a few other housecleaning chores. But other than that, we don't actually call out to applications to get data; that's a big difference from the way systems have worked in the past.
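As a sketch of the timestamp side of this, here is how a client can ask the device for a current timestamp pairing the two clocks; the call and the AudioTimeStamp fields are from the shipping AudioHardware.h, assumed close to what's described here.

    #include <CoreAudio/AudioHardware.h>
    #include <stdio.h>

    static void ShowClockRelationship(AudioDeviceID device)
    {
        AudioTimeStamp now;
        /* Ask for both representations: the device's sample clock and
           the host clock (the CPU time register, per UpTime()). */
        now.mFlags = kAudioTimeStampSampleTimeValid |
                     kAudioTimeStampHostTimeValid;
        if (AudioDeviceGetCurrentTime(device, &now) == kAudioHardwareNoError) {
            printf("sample %.0f <-> host time %llu\n",
                   now.mSampleTime, (unsigned long long)now.mHostTime);
        }
    }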
So how do we get data to the system? From the client's point of view, implicitly, you're going to be doing a lot of multithreaded programming with audio. Just be clear on that, because it's pretty complicated, and it's a lot different from the Mac OS 9 implementation. There are a lot of new issues you're going to have to face: atomic operations on data, making sure you're not waiting on a semaphore that's never going to get signaled, all the usual things you have to deal with in a multithreaded environment. That now comes to bear right on you when you're dealing with audio. The client has one high-priority, run-to-completion thread per device in the process. The client can give us the thread and configure it however you want, with your own priorities, or we will do it on your behalf and set things up so that we give you the appropriate priority for the type of latency you're looking for in your I/O.
The code that's running in this thread needs to be careful, because it tends to be a very high-priority thread; in fact, audio tends to be one of the highest-priority threads in the system. There's a high chance that you will lock the system up if you take too much time in the I/O thread. There are a number of things you can do to alleviate that. You can just not take so long; that's a fine idea: optimize your code. But perhaps the better strategy, if your situation allows it, is to have a secondary thread available that runs at a lower priority than the audio thread, which you can use to generate or render your audio, read it off the disk, get it from the network, or do whatever you need to do to get your data, process it, and get it into a form that's ready to be handed to the hardware (there's a minimal sketch of that split below). If you can do that, you will increase your throughput, and the general throughput of the system, a great deal, because we won't be spending a whole lot of time locked up in these high-priority threads that don't yield to the rest of the system. In fact, if you do take up too much time in your I/O thread, first we're going to tell you about it: we will send you a notification that says your thread is taking too much time. You will also hear glitches in your stream, obviously, because you're not generating the audio at the right time for the stream to be continuous. The other sort of issue you're going to see is that you can lock the system up without panicking it, so your machine will just freeze, and the only thing you can do is a cold reboot. Given that, the implementation of the audio device API goes to great lengths to keep that from happening, to the point where it won't reschedule your thread if it sees that you're taking too much time. If you eat too much processor time, we're going to scale you back a little bit, so that you're not starving the rest of the system.
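Here is that secondary-thread split as a minimal sketch: a single-producer/single-consumer ring of my own design; it is not part of the API, just one way to keep the time-constrained I/O thread from ever blocking or allocating.

    #include <stdint.h>

    #define RING_FRAMES 16384  /* power of two so the counters wrap cleanly */

    typedef struct {
        float             samples[RING_FRAMES];
        volatile uint32_t writePos;  /* advanced only by the worker thread */
        volatile uint32_t readPos;   /* advanced only by the I/O thread    */
    } Ring;

    /* I/O thread side: drain whatever the lower-priority worker thread
       has rendered so far; a short read means an underrun. Never blocks. */
    static uint32_t RingRead(Ring *r, float *dst, uint32_t frames)
    {
        uint32_t avail = r->writePos - r->readPos;
        if (frames > avail) frames = avail;
        for (uint32_t i = 0; i < frames; i++)
            dst[i] = r->samples[(r->readPos + i) % RING_FRAMES];
        r->readPos += frames;
        return frames;
    }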
So what are you doing in this thread? That's kind of an important question. The I/O thread is basically scheduled to wake up periodically, and when it wakes up, the idea is that you read your input and you write your output. The way we schedule the thread to wake up simulates a double-buffering situation: by default, your thread will be scheduled to wake up about one buffer ahead of the buffer you're supposed to render for output. This gives you roughly a hundred percent of the CPU to do whatever it is you need to do and still be able to deliver the audio to the hardware on time, so that you have glitch-free audio. Like I said, the input and output are presented to you synchronously: when your I/O routine gets called, you get both the input data and the output data. Further, you get timestamps that say when that input data was acquired and when that output data is going to be inserted into the output stream, and you get a third timestamp that tells you what the current time is, so you don't have to ask and can potentially save a little time in the I/O thread.
Now, the buffer size that you're using is completely configurable by the client. We can do this because we're just writing into one shared ring buffer; we just know that we need to write the data a certain space ahead of the DMA read head so that we keep the stream continuous. So the buffer size you use is entirely up to you; you can make it as big or as small as you want, and obviously you have trade-offs with overhead in those terms. The smaller the buffer size, the more frequently we need your thread to run, and also the more the jitter in the thread wakeup is going to affect you. For instance, if you want to render with, say, 64-sample buffers, that's roughly a millisecond and a half of data at 44.1 kHz. If the thread you're using to render that data has a jitter of some number of microseconds, obviously the smaller your buffer is, the more that jitter number is going to matter to you.
Another thing that's configurable about this whole process is the wakeup time: you can say how much time in advance you want the thread to wake up. You can make it as close to the actual delivery time as you want, provided you have a good idea of how much time you take to deliver your data. There are a lot of interesting applications for that. One reason you might want to go as close to the delivery point as possible is to be as responsive as possible to interactive events, like MIDI keys or user-interface events and devices. Given those parameters, how big your buffer is and how far in advance you want us to wake you up, the actual wakeup time we schedule the thread for is calculated using the previous timestamps and the relationship between the number of samples played and how much host time has passed, in order to generate the appropriate time to wake the thread up at.
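As a sketch, configuring a small I/O buffer could look like the following; kAudioDevicePropertyBufferSize (a byte count) and AudioDeviceSetProperty are from the shipping AudioHardware.h, and the wakeup-offset knob described above is a separate property whose selector I won't guess at here.

    #include <CoreAudio/AudioHardware.h>

    static OSStatus UseSmallBuffers(AudioDeviceID device, UInt32 bytesPerFrame)
    {
        /* 64 sample frames is roughly a millisecond and a half at
           44.1 kHz, so wakeup jitter starts to matter at this size. */
        UInt32 bufferBytes = 64 * bytesPerFrame;
        /* The second argument is the timestamp for scheduled property
           changes; NULL means "apply it now". */
        return AudioDeviceSetProperty(device, NULL, 0 /* master channel */,
                                      false /* output side */,
                                      kAudioDevicePropertyBufferSize,
                                      sizeof(bufferBytes), &bufferBytes);
    }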
In this diagram you can see a conceptual idea of what's going on when your IOProc is called. The individual blocks are the buffer size, the read head is processing one of those blocks, and time is going from left to right. You use the wakeup offset to control how far into the buffer you want to be scheduled, and of course you can go out a few buffers if you know that you're working ahead of time; many applications have the luxury of being able to work a couple of milliseconds into the future. You can also control how big those buffers are. When your thread wakes up, you're woken up one buffer ahead of where you're going to deliver data for the output, and the data you're going to get on the input is the buffer previous to that, which we just finished reading. So that gives you the relationship between where the data is that you're going to read and where the data is going that you're going to write.
So I'd like to finish this up by showing you exactly how easy it is to write a real, live client with this stuff. This code was adapted from the 'sdev' component that's used in the Sound Manager to talk to this API. The first thing you've got to do is find a device to talk to. To do that, you use a property of the entire system, which is a little different from a property of a specific device; you can tell system routines from device routines by the name of the routine: system routines start with AudioHardware, device routines start with AudioDevice. The first property you're interested in is the default output device, so you get the system property for the default output device. Pretty simple. Then you need to figure out what kind of data you need to send to this device; to do that, you get the device's stream format property.
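In code, finding the device is one call; this sketch uses the AudioHardwareGetProperty prototype as it shipped in Mac OS X's AudioHardware.h.

    #include <CoreAudio/AudioHardware.h>

    static AudioDeviceID GetDefaultOutput(void)
    {
        AudioDeviceID device = kAudioDeviceUnknown;
        UInt32 size = sizeof(device);
        /* System-level routines start with AudioHardware...;
           device-level routines start with AudioDevice.... */
        AudioHardwareGetProperty(kAudioHardwarePropertyDefaultOutputDevice,
                                 &size, &device);
        return device;
    }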
Now, the stream format in this API is encompassed by the AudioStreamBasicDescription struct. It contains enough information to describe any constant bitrate format where all the channels are the same width. This applies to most of the compression techniques you see in the Sound Manager today: IMA and linear PCM fall into that category, mu-law, a-law, all that stuff. The struct will supply you with the sample rate, the number of bytes in a frame, the number of bytes in a channel, and the number of bytes in a packet, if the format has another grouping above the sample frame and channel structures. More complicated formats obviously have more information to convey than just how big their individual fields are; for variable bitrate data, for example, you need to know where the frames actually start in the stream. So more complicated formats also provide an extended description, which is format-specific and is defined by that format; that's available via another property on the device. Then, in order to do I/O, you also need to know how big your I/O buffer is. In this case we don't really care, because this is just a simple client, so it takes whatever the default buffer size is for this device; again, you just call AudioDeviceGetProperty to do that. One thing I should mention is that sizes in this API are almost always passed around in terms of bytes; you can calculate the number of frames for a format by getting the stream format and doing the appropriate math using the description in the stream format descriptor.
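A sketch of those two property calls and the bytes-to-frames math, assuming the shipping AudioStreamBasicDescription layout and property selectors; the straight division only works for constant bitrate formats, as described above.

    #include <CoreAudio/AudioHardware.h>

    static UInt32 DefaultBufferSizeInFrames(AudioDeviceID device)
    {
        AudioStreamBasicDescription format;
        UInt32 size = sizeof(format);
        AudioDeviceGetProperty(device, 0, false,
                               kAudioDevicePropertyStreamFormat,
                               &size, &format);

        UInt32 bufferBytes = 0;
        size = sizeof(bufferBytes);
        AudioDeviceGetProperty(device, 0, false,
                               kAudioDevicePropertyBufferSize,
                               &size, &bufferBytes);

        /* Sizes in this API are in bytes; for CBR formats the
           bytes-per-frame field makes the conversion trivial. */
        return bufferBytes / format.mBytesPerFrame;
    }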
To start playback, you need to tell the device about your I/O routine, so you install it by calling AudioDeviceAddIOProc. You're also given a place to pass in a pointer to whatever kind of data you want passed back to your I/O routine; that's really useful for keeping track of context. Well, you all know how to use those things; they've been around forever. Then you just start the device; there's a routine to start it. Stopping a device is pretty much the same, but in reverse: you call AudioDeviceStop, and then, if you're done with I/O, you can remove the I/O proc as well. One thing I should point out: you'll notice that with start and stop, you also need to pass in the I/O routine again. The reason is that you can install multiple I/O routines on a given device; I'm sure there are a lot of reasons to do that, and it's useful for a number of things.
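Put together, the install/start/stop/remove sequence is short; these are the AudioDevice... calls as they shipped in AudioHardware.h, with MyIOProc standing in for your routine.

    #include <CoreAudio/AudioHardware.h>

    extern OSStatus MyIOProc(AudioDeviceID, const AudioTimeStamp *,
                             const AudioBufferList *, const AudioTimeStamp *,
                             AudioBufferList *, const AudioTimeStamp *,
                             void *);

    static void PlaySomething(AudioDeviceID device, void *myContext)
    {
        /* myContext comes back as the last argument of every callback. */
        AudioDeviceAddIOProc(device, MyIOProc, myContext);
        AudioDeviceStart(device, MyIOProc);   /* I/O thread starts running */
        /* ... sound plays until we decide to stop ... */
        AudioDeviceStop(device, MyIOProc);    /* note: the proc again */
        AudioDeviceRemoveIOProc(device, MyIOProc);
    }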
So what is the I/O routine? Here's the prototype. When it's called, you get the ID of the device that the I/O is happening on; you get a timestamp that represents now (and like I said, timestamps represent the mapping between the sample time and the host clock); you get a pointer to the input data and a timestamp for when the first frame of that input data was acquired; you get a pointer to the output buffer and a timestamp for when the first frame of the output buffer is going to be inserted into the output stream; and then you get your client data pointer back as well. And here's the entire implementation of the I/O routine. In my case, I'm passing my nifty file object in my client data field so I can get some data to play; I cast that back, use it, and put that data right into the output buffer. Then, if I find out that I'm done, I can just turn off that I/O routine. The semantics there are that when you turn off an I/O routine from within an I/O cycle, the current I/O will complete, and then no more I/O for that routine will happen.
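A sketch of such an I/O routine, using the AudioDeviceIOProc signature and AudioBufferList layout from the shipping AudioHardware.h; MyFileObject and MyFileRead are hypothetical stand-ins for whatever context you handed to AudioDeviceAddIOProc.

    #include <CoreAudio/AudioHardware.h>
    #include <string.h>

    typedef struct MyFileObject MyFileObject;
    extern UInt32 MyFileRead(MyFileObject *f, void *dst, UInt32 maxBytes);

    static OSStatus MyIOProc(AudioDeviceID device,
                             const AudioTimeStamp *inNow,
                             const AudioBufferList *inInputData,
                             const AudioTimeStamp *inInputTime,
                             AudioBufferList *outOutputData,
                             const AudioTimeStamp *inOutputTime,
                             void *inClientData)
    {
        MyFileObject *file = (MyFileObject *)inClientData;
        AudioBuffer *out = &outOutputData->mBuffers[0];

        /* Copy the next slab of audio straight into the output buffer. */
        UInt32 got = MyFileRead(file, out->mData, out->mDataByteSize);
        if (got < out->mDataByteSize) {
            /* Out of data: silence the tail and shut ourselves off. The
               current cycle completes, then no more I/O for this proc. */
            memset((char *)out->mData + got, 0, out->mDataByteSize - got);
            AudioDeviceStop(device, MyIOProc);
        }
        return kAudioHardwareNoError;
    }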
So when are we going to give this to you? Like I said, the Sound Manager is in DP4 now; it's been there since DP2. The audio device API is also in DP4. The IPC mechanism we're going to start seeding prior to the public beta, and we'll hopefully have it in for the public beta. And with the Audio Units architecture, we're looking at this fall for releasing; we don't really have anything more specific there. So next, I'd like to show you that it all actually works and it's really alive. First, I'll point out that the music you heard when you came in was coming live off my PowerBook, using QuickTime Player on OS X, on top of the audio device API.
There, I think you can hear me. First up, I want to say that in DP4, by default, the Sound Manager is not set up to use the audio device API. You can add a magic cookie to the framework, and it's documented in DP4 how to do that, to make it actually happen. That's the reason why all the demos you've seen previous to this haven't been running through the audio device API; but these will.
So first up, I'd like to show just a reasonably high frame rate QuickTime movie, and the thing to watch for in this one is synchronization. The synchronization is still pretty solid; it's not a hundred percent perfect yet, but it's pretty good already.

Oh, got it, all right. And how about that: the sound stopped right when the movie did. Thanks, Marty. Right in sync. Let's try that again. Did you love it? Hey, and you don't have to reboot, how about that. Oh, another interesting thing about the audio device API: you don't have to reboot to reinstall a new version of it, either. As long as you're not playing sound, you can just put in a new version and go. That's kind of neat; it cuts down on the development time. So, once more from the top.
[Music]
Yeah. So, synchronization works, and it doesn't look like a bad movie, either. Another thing I wanted to show you is some of the benefits of variable bitrate encoding. You see some examples of MP3 files I've encoded using different data rates and whatnot. First I'd like to play the original, to give you a feel for what this sounds like before I play you all the compressed versions. In particular, note the cymbal sounds and what they sound like; remember that, and try to figure out which of these sound the best. Let's start with the high data-rate versions, since those are the easiest: obviously they're going to sound relatively good, and the relative difference between variable bitrate and constant bitrate starts to go away when you use larger bit rates, but it's still there; if nothing else, you get smaller files. So here's the variable bitrate, and here's the 128k constant bitrate; this is typically the format you find on the Internet.
[Music]
[Applause]
[Music]
[Applause]
It sounds pretty close to the original. And here's the variable bitrate, the high-quality version for this encoder. As you can see, the file size is a bit smaller than the 128k size. Let's go back to the same place.
[Music]
[Applause]
[Music]
[Applause]
And again, you can hear it's still pretty close to the original mix, and in this case the savings are obvious: variable bitrate wins because of the smaller file size. Now, just as an interesting aside, here's the normal-quality encoded version of the variable bitrate, at least for this encoder.
[Music]
[Applause]
The normal quality sounds pretty good, and hey, look, the file size is even smaller yet. Now, here's where variable bitrate really starts to shine: the smaller, low bit rate cases. So here's a 64k constant bitrate version; it almost doesn't sound like a hi-hat.
[Music]
[Applause]
[Music]
Yeah, and you hear all the usual gripes you get about MP3 at that point. So let's take a look at the variable bitrate version at the lowest quality setting, the one that squeezes the most bits out of it. In this case, again, you see the file size is roughly the same as the 64k version.
[Music]
So you see, the low quality version sounds much, much better than the 64k version, and you get the same file size; so, you know, use VBR, I guess.

To finish things up: you need to contact Dan Brown, particularly if you want to participate in the seeding program. We're going to be seeding the audio services a lot quicker than we have in the past. We're hoping to keep things moving, get things out into the open, get you guys working with this stuff and working with us to make it better, so that we can actually meet your needs for a change. So contact Dan, and he can get that working for you. Next, to finish things up, we have a little Q&A, but Bill has some things first.
[Applause]