WWDC2010 Session 412

Transcript

>> Good morning.
My name is William Stewart and I manage the Core Audio
group and we are doing three sessions this morning
in this room on audio, primarily on iPhone.
We're also covering some general audio topics that
are relevant for the desktop as well as iPhone.
The set of frameworks and services that we provide are
quite extensive and cover a number of different areas
and we'll be going through some of these in the talk.
The Media Player API is just a general kind of remote access
for the iPod application and the media library on the device.
And the iPod itself uses the same sets of APIs and
frameworks that we're discussing with you today.
So it's all, you know, what Apple uses itself
to implement its features is, of course,
the same things that you get to use as developers.
OpenAL is an industry standard, I
guess you'd call it, for doing games
and we support that on the platform for game audio.
AV Foundation made its debut last year with some
very simple AV audio player and recorder objects
and this year it's become quite an extensive
framework with a collection of video.
And there's a whole series of sessions on the new classes in
AV Foundation but there is some specific audio functionality
in this framework as well and Allan
will be going through that in a moment.
And all of these technologies are really built
on a collection of services that are rendered
through Audio Toolbox and that's really the primary
services that my team delivers to the platform.
That includes services for reading and writing
audio files, for converting data formats,
for using audio units that provide
processing, mixing, all kinds of things --
basically the collection of tools
that you need in order to do audio.
So that's a very general introduction.
There will be some more detailed overviews
through the sessions this morning.
The sessions that we have start with this one, and one of the things
that will be covered here is how your
application integrates with the rest of iPhone OS.
So it's primarily audio session and managing the
resources of the platform and how you can best use them.
What we thought we'd do in the second
session is take a step back from talking --
rather than sort of focusing specifically on APIs, we
thought we'd rather take it from a different angle.
And so what we're doing in that session is
looking at: what are the fundamentals of audio?
If you're talking about digital audio, not just on
our platform but on any platform, what does that mean?
What is Linear PCM?
What is AAC?
How are these things different?
How do we express them in our APIs?
But really more like: what are the
fundamental features of these things?
And our APIs are fundamentally shaped by what audio
looks like as a data format and by some of the constraints --
in terms of time and resolution and everything else --
that we deal with in handling this media format.
And then the last session, Audio Development for iPhone OS,
is really taking a more detailed
look at how to use audio units.
So what do audio units look like for your application?
How do you interact with them?
And we're also taking a little bit of a forward-looking
stance at that and looking at some of the general ways
that you can deal with more complicated
processing demands with AU graphs and so forth.
So that's enough of me talking.
I'll get Allan to come up and he'll
begin his discussion on AV Foundation.
Thank you.
[ Applause ]
>> Allan Schaffer: Great.
So thank you, Bill and good morning everyone.
The AV Foundation has a number of high-level classes
that you can use for audio playback and recording.
And this is where we're going to spend
most of the time in this session.
So I'm going to be talking about the audio player, which
lets you play back audio from a file or from data in memory.
I'll talk about the recorder, which lets you record audio,
capture from the microphone and record that to a file.
And then I'll talk about the audio session, which,
as Bill said is going to let you manage the
audio behavior of your application on the device.
There's a fourth class that actually
I'm not going to be covering
in this session but it's worth taking a look at as well.
We covered it yesterday in the AV Foundation
sessions for video and that is the new stream player.
Now, a lot of the new functionality in AV
Foundation has been geared toward
very expressive video features.
And so that class is part of all of
that but it can also be used for audio,
to either play audio from a local
file or to stream it over a network.
So I'm just going to jump straight in.
Let's talk about the AVAudioPlayer.
And this is really a very simple
class for you to use to play a sound.
It supports a variety of different file formats -- the
ones that are supported by the Audio File Services API.
So that's things like CAF files, M4As, MP3s, and so on.
And the class provides a number of
just basic playback operations --
to play a sound, stop, pause, move
the playhead around, and so on.
And with this object, if you want to play multiple
sounds simultaneously, you can do that as well --
what you do is just create multiple instances of the
object and have each one controlling the different sounds.
There's also a number of properties that I'll go
through in just a moment but things like volume control,
you can enable metering, you can have the sound be looping
as it's played back, and a few new features in iOS 4 --
the object now supports stereo panning from left to right.
It also synchronizes playback if you have
multiple instances playing simultaneously.
So right away, I'll just jump into some API here.
To get started with this class, to instantiate an object,
you just call initWithContentsOfURL:.
The URL needs to be a local file that's in the sandbox
for your application, or you can create the player from an NSData.
And one quick side note before I go on: All of the code
snippets that are on the slides in this talk are available
on the attendee website so you don't
need to worry about writing them down.
You can just go and download them right after
the talk or go ahead and do it right now.
Now, there's a number of properties on the
AVAudioPlayer that let you control playback.
And you can either set these up before you begin playing or
with a number of them, actually, you can just change them
as the player is playing a sound,
so you can change the volume --
here I'm setting it to 100% of the current output volume.
You can change the panning -- here I
have it set all the way to the left.
If I set it to 1.0, then it would
be all the way to the right.
The number of loops here is something also you can control.
So 0 means no loops, -1 means loop indefinitely,
or you can have a specific number of times
that the audio will loop back after
it's played through once.
You can have direct control over the playhead
as well with the current time property.
And so if you want to implement something where
you are scrubbing around in an audio file,
you would just be changing the value of this property.
Or if you want to reset the playhead
to the beginning, you set it to zero.
And there's a delegate as well
that we use for notifications.
And then other properties, some that I mentioned --
so to be able to enable metering of the playback,
you can find out the duration of the audio
file that you're playing, the number of channels,
and the state of the player as it's playing your sounds.
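As a quick sketch of those properties -- assuming `player` is an existing AVAudioPlayer instance, with illustrative values only:

```objc
// Playback properties on an existing AVAudioPlayer instance (illustrative values).
player.volume        = 1.0;   // 100% of the current output volume
player.pan           = -1.0;  // all the way left; 1.0 is all the way right (iOS 4)
player.numberOfLoops = -1;    // loop indefinitely; 0 means play through once
player.currentTime   = 0.0;   // move the playhead back to the beginning
player.delegate      = self;  // receive notifications

// Read-only state you can inspect:
player.meteringEnabled = YES;                     // enable metering of the playback
NSTimeInterval length  = player.duration;         // length of the sound in seconds
NSUInteger channels    = player.numberOfChannels; // e.g. 1 for mono, 2 for stereo
BOOL isPlaying         = player.playing;          // current state of the player
```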
Now, the playback controls here are really
simple, there's just these four controls
that you'll be using all the time with this object.
PrepareToPlay is actually probably
one of the more important ones.
This is going to get the player ready to play
a sound for you with absolutely minimal lag.
So what this will do is allocate
the buffers that the player is going
to use internally and prime those buffers with your data.
And that way, when you go to invoke the play
method, it can happen nearly instantaneously.
So the play method just starts playing your sound now.
And so if later on you pause or stop the sound, the
play method will resume from the point where it left off.
And that's maybe an important note to make is that if
you had expected, after you stopped playing a sound,
for the playhead to go back to the
beginning, that actually isn't the behavior.
The playhead stays where it was, just like a tape deck.
And so if you want to go back to the beginning,
you would reset the current time to zero.
Pause is going to pause the playback but with
Pause, the player stays ready to resume again;
the queues and the buffers are still
going to be allocated and ready to go.
And that's the difference between Pause and Stop.
With Stop, the queues are disposed
of and the buffers are disposed of.
So if you want to restart playing after you have
stopped, after you've invoked the stop method,
you would probably call prepareToPlay
and then later on call play.
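Putting those controls together, the pause/stop distinction looks roughly like this sketch, where `player` is assumed to be an AVAudioPlayer you've already created:

```objc
[player prepareToPlay];    // allocate and prime the internal buffers
[player play];             // starts nearly instantaneously once prepared

[player pause];            // queues and buffers stay allocated, ready to resume
[player play];             // resumes from where the playhead left off

[player stop];             // queues and buffers are disposed of
player.currentTime = 0.0;  // stop does not rewind; reset the playhead yourself
[player prepareToPlay];    // re-prime before playing again
[player play];
```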
Another element of the player class
are some delegate methods.
So these will be invoked when certain events happen.
And probably the one that's very important for
you to implement is when the player is finished
and that's called audioPlayerDidFinishPlaying:successfully:.
And in that method you might clean up, you might change the
state of your interface, and take care of other things just
to indicate to the user, "Okay,
the sound is no longer playing."
Then there's a number of other
delegates that you can implement.
If there's a decoding error in the file that you
played back or if interruptions began or ended --
interruptions are things like a
phone call came in for example,
and I'll talk about interruptions a
lot more towards the end of the talk.
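In a delegate class, those methods might be sketched like this; the bodies are just placeholders for whatever your interface needs:

```objc
// AVAudioPlayerDelegate methods (sketch).
- (void)audioPlayerDidFinishPlaying:(AVAudioPlayer *)player
                       successfully:(BOOL)flag
{
    // Update the interface to indicate the sound is no longer playing.
}

- (void)audioPlayerDecodeErrorDidOccur:(AVAudioPlayer *)player
                                 error:(NSError *)error
{
    NSLog(@"Decode error: %@", error);
}

- (void)audioPlayerBeginInterruption:(AVAudioPlayer *)player
{
    // Playback was interrupted by, say, an incoming phone call.
}

- (void)audioPlayerEndInterruption:(AVAudioPlayer *)player
{
    [player play];  // resume when the interruption ends, if appropriate
}
```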
So let's just put this together and look at playing a sound.
So here in this method, I'm being passed the
URL of a local file; I create my player object,
passing in a pointer to that URL; I set up
a delegate for any of the notifications that I may want
to have fired later; prepare the
player for playback; and hit play.
So all of this is really -- this is just to show that
this is a very simple class but a great way for you
to get started with audio playback in your app.
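Reconstructing that flow as a method -- a minimal sketch with basic error handling, where `self.player` is a hypothetical property used to keep the player alive while it plays:

```objc
- (void)playSoundAtURL:(NSURL *)url
{
    NSError *error = nil;

    // Create the player from a local file URL in the app's sandbox.
    AVAudioPlayer *newPlayer =
        [[AVAudioPlayer alloc] initWithContentsOfURL:url error:&error];
    if (!newPlayer) {
        NSLog(@"Could not create player: %@", error);
        return;
    }

    newPlayer.delegate = self;   // for notifications fired later
    [newPlayer prepareToPlay];   // prime buffers for minimal lag
    [newPlayer play];

    self.player = newPlayer;     // retain while playing
    [newPlayer release];
}
```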
So that's the player.
Let me now go just about as quickly through the recorder.
Another simple class -- this lets you record audio to a
file and its behavior is that it will either record
until you stop it by calling its stop method or you
can set it up to record for a specific duration.
And it supports a variety of different encoding formats,
I've listed a bunch here -- AAC,
ALAC, Linear PCM, and so on.
AAC is interesting, though, because we have hardware
support for the AAC encoder on certain platforms --
the second- and third-generation iPod touch,
the iPad, the iPhone 3GS, and of course the iPhone 4.
Now with the AVAudioRecorder, the API here is really
just the mirror image of what you saw with the player,
so I won't go through it in quite as much detail.
You initialize the recorder with a URL to a local file.
One difference, though, is this next
parameter: the settings dictionary.
I'll cover that on the next slide
so I'll come right back to that.
There's a number of recording controls that
you can manage, so prepareToRecord, Record,
or Record for a particular duration, then Pause and Stop,
and sort of your predictable properties that you can get
about the state of the object and
the state of the recording.
Now, about that settings dictionary.
So when you're recording, you need to specify exactly what
format you want to record into, what sample rate to use,
the number of channels, and then perhaps
format specific settings as well,
like for Linear PCM you'd specify
the bit depth, the endianness.
For certain encoded formats, you might
specify the quality or the bit rate and so on.
So let's take a look at that.
Now this looks like a lot of code but
really, it's actually very simple.
All I'm doing here is setting up a dictionary containing
key value pairs for each of those encoding settings.
So on the left here, I'm setting my format to AAC,
the rate to 44100, the number of channels to two,
the bit rate to 128K, and the audio quality to the maximum.
So all of that is just being packed into arrays of keys
and values, and I create a dictionary from those.
And then that dictionary is what I pass
when I initialize the audio recorder.
And so now it's all ready to go.
I can call its methods to prepareToRecord and start
recording or record for a particular amount of time.
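That settings-dictionary setup might look like this sketch, where `fileURL` is an assumed local file URL and the keys are the AV Foundation encoder settings constants:

```objc
// Encoder settings: AAC, 44.1 kHz, stereo, 128 kbps, maximum quality.
NSDictionary *settings = [NSDictionary dictionaryWithObjectsAndKeys:
    [NSNumber numberWithInt:kAudioFormatMPEG4AAC], AVFormatIDKey,
    [NSNumber numberWithFloat:44100.0],            AVSampleRateKey,
    [NSNumber numberWithInt:2],                    AVNumberOfChannelsKey,
    [NSNumber numberWithInt:128000],               AVEncoderBitRateKey,
    [NSNumber numberWithInt:AVAudioQualityMax],    AVEncoderAudioQualityKey,
    nil];

NSError *error = nil;
AVAudioRecorder *recorder =
    [[AVAudioRecorder alloc] initWithURL:fileURL
                                settings:settings
                                   error:&error];
recorder.delegate = self;
[recorder prepareToRecord];
[recorder record];   // or [recorder recordForDuration:10.0];
```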
Okay, so I know that was pretty quick through
those objects but they're really just very simple.
The thing is, though, that they're very feature-rich so
they do suit a lot of the basic needs of audio developers
who just want to play and record some sounds.
If you saw the Quest demo on Monday
or yesterday, for example,
we're using this API to play all
of the game background soundtrack.
And so, you know, it's just a very,
very capable API for doing that.
And really, it's recommended as the
starting point for you in most cases.
It's a good starting point unless your
requirements go into more complicated uses for audio.
So if you need access to audio samples, for example for
processing, then you might use the audio unit's API,
which Murray will be covering in the third session.
If you need to do spatial 3D positioning of audio
sources in a game, then OpenAL is perfect for that.
And so you would probably choose that API in that instance.
And if you need to do network streaming, then
you might use the new AV Player or you might go
into the audio file streaming services API.
Okay, where I want to go next, though,
is into audio session management.
And this is really an important topic for developers
to all get absolutely right in their applications
and that's why we put so much focus on it.
And really, I'm going to dedicate
the rest of this talk to this topic.
So the idea here is that this is how you can manage
the behavior of the sounds in your application
and make them behave according to both the expectations of
the user for the kind of application that you're writing
and to be consistent with either built-in applications
or other applications of the same type of app as yours.
What you're going to do with this API is to categorize
your application into one of six possible categories
and then your app's audio is going to follow the
behaviors that are defined for that category.
Then this is also the API that will let you
manage some of the shared resources on the device
and do things like mix with background audio.
It's how you'll interact with interruptions if
they occur, say again, if a phone call comes in
and it's how you can handle changes in the routing if
the user were to, say, plug in or unplug a headset.
Now there's actually two APIs here
that are relevant to what we're going
to be talking about: a high-level API and a low-level API.
So the high-level API is the AVAudioSession class.
It's an Objective-C class, part
of the AV Foundation framework.
And really, it wraps up all of the
most commonly used functionality
that you need to manage with the audio session.
Then there's also a lower-level API called Audio
Session Services and that's part of the Audio Toolbox.
And that's really the underlying implementation
that we expose to you:
C-based, lower level, and with a bit more functionality.
But what's interesting to note is that it
is possible and quite okay for you to mix
and match between the high-level and low-level APIs.
In fact, what's quite typical is you might set up your audio
session using the high-level API and then maybe just drop
into the low-level API to set some overrides or
other things that aren't exposed at the high level.
So there's five basic tasks that we're going to go
through for the remainder of the session here to talk
about with AVAudioSession: We're going to set up the
session and configure its delegate; we will --
very carefully -- choose and set an audio session
category; we'll go active, and we'll talk about that
and the things that are new there in relation to iOS 4;
then I'll talk about how we handle
interruptions and handle route changes.
So first setting up the session -- very easy.
The audio session instance is just a
singleton object for your application.
So you just retrieve a handle to that.
You'll set up a delegate for any notifications
that might occur, like an interruption.
And this is the place where you might
request certain preferred hardware settings.
Just for example, in this case I'm
requesting a sample rate of 44100.
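A sketch of that setup:

```objc
// The audio session is a per-application singleton.
AVAudioSession *session = [AVAudioSession sharedInstance];

// Delegate for notifications such as interruptions.
session.delegate = self;

// Request a preferred hardware sample rate (a request, not a guarantee).
NSError *error = nil;
[session setPreferredHardwareSampleRate:44100.0 error:&error];
```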
But now the second part is really -- this is
the most important part of using this API.
And it's not a line of code, it's a choice
that you have to make for your application.
You will instead choose a category based
on the role that audio plays in your app
and the kind of application that you're writing.
And there's six possible categories: playback;
record; play and record; audio processing;
and then two more -- ambient and solo ambient.
So let me show you how these differ.
So I have the categories listed here on the left.
Their intended usage will be a column that we
build out as I talk about this and then a number
of the behaviors are listed in the table as well.
And it's things like whether each
category obeys the ringer switch, meaning,
is your audio silenced if your user
flips the ringer switch to silent?
Is your audio silenced if the user locks their screen?
Does this category allow your sounds
to be mixed with others?
Does it use input or does it use output?
And is it allowed to play in the background if your
application is transitioned into the background?
So let's have a look.
Okay, first with playback.
This is probably the most straightforward of all of the
categories because it's intended for such an exact purpose.
This is the category that you should choose if your
application is an audio player or a video player.
So if that's the primary purpose of your application is to
output media, then you would choose the playback category.
And you can see that the behaviors here, in a way,
are very similar to what you see with
the iPod application on the device.
Applications that are using this category
are not affected by the state of the ringer switch;
they can continue to play if the user locks the screen.
Now, MixWithOthers is not enabled by default --
by saying "optional" here, I mean
it is possible for you to optionally turn it on.
This is an output category.
And the last one.
With this category, it will allow your application to
continue to play audio through a background transition
if you have set the audio key for
UIBackgroundModes in your Info.plist.
Now Record is very similar.
But of course, its intended usage is for audio recorders,
applications that are doing voice
capture, that sort of thing.
The behaviors, though, are mostly
the same right across the board.
Obviously this is an input category
that uses input instead of output.
But it will also survive through a transition
in the background if you have set that key.
Play and record.
Well, this one is essentially combining those first two.
So this is intended for applications that are
doing voice over IP or voice chat types of apps.
You can optionally mix with others,
just like with the playback category.
But with this one, this is using
input and output simultaneously
or enabling you to do that, so both of those are shown.
And if you choose this category also, your
application can go into the background.
One important difference with this category,
though, compared to what we've seen, say,
with the playback category is the default audio route.
So with the playback category, by default
your output will go through the speaker.
With the play and record category, your
output will go on a phone to the receiver,
which is the speaker you hold up to
your ear when you're on the phone.
Then the audio processing category.
All right, so this one is used for offline conversion
of audio file formats, offline processing of audio data,
and you can see that actually many
of the behaviors are similar
but it's not using either input or output for any sounds.
All it's doing is processing in memory.
Now, one thing about -- a special note about this,
about how I say that it's allowed in the background.
So yes, if you set up your application to use this
category, the processing can continue in the background.
But unlike the previous three, just setting this category
alone does not enable your application to transition
into the background and keep running; you would have to
use one of the other ways of going into the background.
So for example, maybe you would just be
asking for extra time to do processing
and that's how you would transition into the background.
Now, these next two are very similar to each other,
so I'm just going to put them up simultaneously
so you can see the difference: ambient and solo ambient.
But the purpose of these two categories is very
different from the previous four that we talked about.
The top four are really intended for applications
whose main purpose is very audio-centric, right?
A playback app.
You know, an audio player, a voice recorder, a VoIP app or
something doing audio conversion, so very audio-centric kind
of purpose of the way that application uses audio.
These other two are really intended
for much broader purpose kinds of apps.
So games and productivity apps or utility apps
would probably choose these latter two categories
and the reason is because of the behaviors that
they enforce are consistent with the behaviors
that users expect for that kind of application.
So what you can see is, well, these
both obey the ringer switch.
That means if the user is playing your
game or using your to-do list app --
or whatever your app happens to
be that's using this category --
and they hit the ringer switch, well,
the audio will be silenced, which
is exactly what they expect.
And again, it's because the purpose -- the usage of audio
-- is not critical to the purpose of the application.
It's perfectly expected that you can play a game
with the sound turned off, or many games at least.
It's expected that you can use your
productivity apps like Mail or Safari and so on
and have the sound turned off in those as well.
They both obey the screen lock as well; that means
that audio playback will stop if the user locks the screen.
I'm going to jump over MixWithOthers for a moment.
They both are output categories and neither
of these enable your application to transition
and continue to play audio in the background.
But the difference between them
actually is this MixWithOthers parameter.
And so I'm going to be talking about that in a
little more detail, really, just coming up next.
But it has to do with whether your application
needs access to a hardware codec or not.
With ambient, MixWithOthers is on -- that means you would
use that category for applications that don't require access
to a hardware codec, that don't need to use it.
Excuse me.
With solo ambient, that's the category
you would choose if your, you know,
game or productivity app does require
access to a hardware codec for decoding.
Okay, so here is where we are setting the category.
So we've gone through the table now; we've made
our choice for the specific kind of application
that you're writing.
And in this case, we're choosing the ambient category.
We go on to the next part here
and set our session to be active.
So to do that, all we do is call
setActive:, telling it yes, we are now active.
And once we're active, now we can play sounds or
record sounds if we have chosen the record category
and go on to set up our audio APIs,
handle interruptions, and so on.
Like we have now asserted that we want to make
use of the audio functionality on the device.
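Those two steps together, sketched here with the ambient category:

```objc
NSError *error = nil;
AVAudioSession *session = [AVAudioSession sharedInstance];

// Categorize the application's audio behavior.
[session setCategory:AVAudioSessionCategoryAmbient error:&error];

// Assert that we now want to use the audio system.
if (![session setActive:YES error:&error]) {
    NSLog(@"Could not activate audio session: %@", error);
}
```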
But let me go back now and talk about going active.
So it would be typical for most applications
to just go active when they start up
and stay that way for the remainder of the app.
But there's a few classes of applications that should not
do that and it's because of the interaction that they have
with other audio that might be playing in the background.
So with a voice recorder application, for
example, or a VoIP app or a Turn-by-Turn app --
well, all of those should have different behaviors related
to music that might be playing on the iPod when they start
up or there may be an email notification
sound and so on that might happen.
So with those kinds of applications, you want to
be a little more clever about when you go active.
And the story is that you should
only go active in those kinds
of applications while you're actually doing those things.
So in a recorder app, you would only
go active once you actually start
to record, not just when the application starts up.
And you would go inactive as soon as you're done recording.
On a VoIP app, you would only go active while you're
on the call and then go inactive when you're done.
And in a Turn-by-Turn app, well, there's some
specific behavior that we would want there,
which is that let's say the user had been playing music
in the background when they ran your app and now it's time
for you to announce the next turn, "turn left."
What we want to happen there is for the iPod music
to be ducked -- for it to lower its volume --
you make your announcement and then you go
inactive to bring the iPod music back up again.
The same would be true if it was audio playing from
a third-party application in the background as well.
So let me just show you specifically
how you might go about that.
I want to focus on the VoIP app and the Turn-by-Turn app.
So again I said, with the VoIP app, you would
go active when it's time to start the call --
and this is going interrupt sounds that
might be playing in the background --
and then go inactive when the call is over.
And what we can do, there's actually a new
method in iOS 4.0 called setActive:withFlags:.
And what that can do is, if you set that flag to
notify others on deactivation, then the other app --
the background process that was playing
audio -- can be notified when the call is over.
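A sketch of that call flow, assuming the session was already set up with the play and record category:

```objc
NSError *error = nil;
AVAudioSession *session = [AVAudioSession sharedInstance];

// When the call starts: go active, interrupting background audio.
[session setActive:YES error:&error];

// ...the call runs...

// When the call ends: go inactive, and use the iOS 4 flag so the
// interrupted background app is told it may resume its audio.
[session setActive:NO
         withFlags:AVAudioSessionSetActiveFlags_NotifyOthersOnDeactivation
             error:&error];
```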
And it will be told, "Ah, okay, the call is done.
You can go and resume your audio now."
Now, with Turn-by-Turn navigation types of apps,
as I said, there's a couple of things you want to do here
so that the other audio gets ducked while you
make your turn announcement and then comes back.
And this is in three steps here.
So this is the first step, the setup.
The first thing that we're going to do, of course, is just
choose the right category to put our application into.
So the main purpose of a Turn-by-Turn app
would be to announce these instructions.
So we would need that to happen regardless of the state
of the ringer switch, the screen locking and so on.
So it will choose the playback category.
Now next, though, we're going to
set an override on that category --
and this is the part that I said was optional before.
We're going to set this override -- override our
category -- to MixWithOthers.
So "others" being, say, the background music
coming off of another app or maybe from the iPod.
And then third in this step, we're going
to say that the other mixable audio
should duck when we go active.
And so that's what's going to lower its volume when we go
active and then when we go inactive, it will come back.
So here are those two parts.
So when it's time now for us to make the
Turn-by-Turn announcement, we'll go active,
we have some audio player ready to go with the
sound of that announcement, and we play that.
And then whenever that is done, when
it's finished making the announcement --
for example, if we were using the AVAudioPlayer,
we could do that through the delegate
-- that's when we'll go inactive again.
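The three steps just described might be sketched like this, mixing the high-level class with the low-level Audio Session Services overrides; `announcementPlayer` is a hypothetical AVAudioPlayer holding the prompt sound:

```objc
// Step 1 -- setup: playback category, overridden to mix with and duck others.
NSError *error = nil;
[[AVAudioSession sharedInstance]
    setCategory:AVAudioSessionCategoryPlayback error:&error];

UInt32 on = 1;
AudioSessionSetProperty(kAudioSessionProperty_OverrideCategoryMixWithOthers,
                        sizeof(on), &on);
AudioSessionSetProperty(kAudioSessionProperty_OtherMixableAudioShouldDuck,
                        sizeof(on), &on);

// Step 2 -- announce: going active ducks the other audio, then we play.
[[AVAudioSession sharedInstance] setActive:YES error:&error];
[announcementPlayer play];

// Step 3 -- in the player delegate, go inactive so the other
// audio comes back up to full volume.
- (void)audioPlayerDidFinishPlaying:(AVAudioPlayer *)player
                       successfully:(BOOL)flag
{
    [[AVAudioSession sharedInstance] setActive:NO error:NULL];
}
```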
But now I've been talking about mixing with background
audio so let me go into a little more detail about that.
And there's really a convergence
of a few different topics here.
So your application might be playing a variety of
sounds, it might be taking advantage of a hardware codec,
it may be using a software codec, or just
playing straight through with the mixer.
Another app might be running in the
background and also playing sounds.
So it might be the iPod application is running
or it could be a third-party application
now that's running in the background.
And so we have to sort of arbitrate:
Well, what's the user going to hear?
And it's going to depend on things that
are going on in both of the applications.
It will depend on what category
both of the applications have set.
And related to that, it will depend on whether either
one of those have enabled MixingWithOthers if they happen
to choose the playback or play and record category.
So all of this ends up defining something that I call
"mixable" or "nonmixable" as a state of your application.
Now, by default, the only mixable kind of application
is one that chooses the ambient category,
but if you choose playback or play and record and
override it to MixWithOthers, then those become mixable, too.
Otherwise, everything but ambient would be nonmixable.
So let me show you in a
picture what's going to happen here.
So let's say that this is your app, the
foreground app, and you're playing an AAC file --
or some of you might be playing an MP3 file,
but just for the sake of discussion, I'm going to use AAC;
just bear in mind that the same
things would apply in that case.
Now, if you had put your application into a category that
is nonmixable, then you will be able to take advantage
of the hardware codec to play back that compressed track.
It will go into the mixer and out to the playback hardware.
Now, if you had chosen a mixable
category, then what will happen is
that actually your sound is going to be decoded in software.
So we have mp3 and AAC and a number of other
software decoders and those will just run on the CPU
to code your audio, have that go in through
the mixer and out to the playback hardware.
But okay, so this is the basics and you guys
are probably already familiar with this part
but now what happens if there's a background application?
The thing is, it's the exact same story.
So a background application -- let's say that it's
playing a music soundtrack of its own, it has an mp3 file.
If it chooses a nonmixable category, then it will
be able to take advantage of the hardware codec
and its sounds will play through the mixer,
they will be mixed with your music
soundtrack, and out to the playback hardware.
The same thing again.
If they choose a mixable category -- if this is the case --
and you're mixable, too, then both of these will be decoded
in software and play out through the playback hardware.
But there's one case that you need to be thinking
about: What if you've chosen a nonmixable category
for your application, meaning you're asking to use the
hardware codec, and there's something else
running in the background -- maybe
when your application started,
there was already something there?
What's going to happen with it?
Well, the result will depend on the
category that the background app chose.
So if they chose a mixable category,
then they'll get a software codec
and both of these are going to be mixed together.
So even though you have chosen a nonmixable
category for your app, since they have decided
to choose a mixable category, essentially
they're playing nice.
They're saying, "Okay, I can pretty much mix with anything."
And so both of these sounds will be heard; they'll be
mixed in the CPU and sent out to the playback hardware.
But if they've chosen a nonmixable category also,
then the sounds from the background app are going
to be silenced when your application goes active.
Those are the different cases now
for mixing with background audio.
Now, what's interesting, though, is that there's actually
a way for you to detect in advance what might happen
and therefore to decide on a different category,
depending on whether something is already there.
And you may have seen this in some apps that are
doing something like this: they'll say, "Hey,
do you want to play the game sounds, like, do you want
the game music soundtrack to play in the background
or do you want the iPod or a third-party
app soundtrack to play in the background?"
And it depends, the user either says yes or no.
If they say yes, well, then a couple of things happen.
If they say yes, this is a game, so we would usually
either choose the ambient or solo ambient categories.
So in this case, they say yes, it means,
"Okay, then that means they want my game music
and my game music is an mp3 file or an AAC
file, so it's best to use the hardware codec.
So I want the hardware codec."
So I'll use solo ambient and I'll play my game soundtrack.
And if the user says no, well, then I'm
not going to use the hardware codec.
Maybe all I'm going to do is play sort of
the incidental sounds in my game, you know,
the bullet sounds or just momentary sounds.
But maybe those are something that would be okay to
decode in software, or they may be Linear PCM, WAV,
or AIFF files that can just be mixed directly.
So in that case, I'll choose ambient,
saying, "Hey, I'm fine to mix with others
and there's something else playing in the background.
So I won't play my soundtrack, I'll
just play my incidental sounds."
So all of this is fine.
This is a perfectly good way -- logic -- to use for defining
your app, but one part that just might not be necessary is
to leave this choice as something that the user has to
figure out when they first start the application up.
Instead, you can just detect this programmatically.
So there's an audio session property
called OtherAudioIsPlaying.
And it will come back you know, 1 or 0.
And you can use this to decide whether or not
you play your game music soundtrack, and really,
you'll use this to decide what category to use --
whether or not to enable MixWithOthers.
And so I show you here just how
you can get at this information.
So AudioSessionGetProperty, I pass that
token above and I get back the result.
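A rough sketch of that check, using the iOS 4-era C API from Audio Toolbox (error handling omitted; the category choice just mirrors the game logic described above):

```objc
// Is another app's audio already playing?
UInt32 otherAudioIsPlaying = 0;
UInt32 size = sizeof(otherAudioIsPlaying);
AudioSessionGetProperty(kAudioSessionProperty_OtherAudioIsPlaying,
                        &size, &otherAudioIsPlaying);

UInt32 category;
if (otherAudioIsPlaying) {
    // Mix with the background audio: skip our own soundtrack.
    category = kAudioSessionCategory_AmbientSound;
} else {
    // Nothing else is playing: take the hardware codec for our mp3/AAC track.
    category = kAudioSessionCategory_SoloAmbientSound;
}
AudioSessionSetProperty(kAudioSessionProperty_AudioCategory,
                        sizeof(category), &category);
```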
Now one change that's important to note in iOS 4 is
the behavior where your application may be suspended.
So prior to iOS 4, it would be very typical to
just put this in the beginning of your application.
Maybe in applicationDidFinishLaunching
you would check this,
you'd have a value that was valid for
the entire run of your application.
But now something that might happen with, say,
a game is that the user might come into your game,
start it up, and then realize, "Oh, you know what?
I want to listen to my own playlist instead
of this first-person shooter's soundtrack."
And they suspend your app, they go over to the iPod app, they
start up their playlist, and then they come back into your game.
So you don't want to have already
made your decision about this,
you want to recheck it every time your
application comes back from being suspended.
So check again in applicationDidBecomeActive.
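As a sketch, that recheck might live in your app delegate like this (the category-choosing helper is hypothetical):

```objc
// Re-evaluate the category every time the app becomes active,
// since the user may have started iPod playback while we were suspended.
- (void)applicationDidBecomeActive:(UIApplication *)application {
    UInt32 otherAudioIsPlaying = 0;
    UInt32 size = sizeof(otherAudioIsPlaying);
    AudioSessionGetProperty(kAudioSessionProperty_OtherAudioIsPlaying,
                            &size, &otherAudioIsPlaying);
    // Hypothetical helper that picks ambient vs. solo ambient:
    [self chooseCategoryForOtherAudio:otherAudioIsPlaying];
}
```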
Okay. One more thing, we've been talking
a lot about the behavior about mixing
and how to detect that when your session goes active.
We talked about the behavior with, like, recorders and
VoIP apps and Turn-by-Turn apps, when they go active.
There's one more thing about going
active -- a tip, or rather a change that occurred --
that I want to just bring to your attention.
It's a behavior change with the MPMoviePlayerController.
So some of you guys might be using this to
playback video; very simple class for that.
But its behavior has changed in
relation to your audio session category.
So prior to iPhone OS 3.2, the movie player had
its own session -- playback was its category.
And so that would interrupt your session
potentially, it might silence other audio
because that playback by default is a nonmixable category.
And so this could have a number of effects.
Well, now in iPhone OS 3.2 and above -- so iPad
and then all the devices that support iOS 4 --
the movie player controller now uses your audio
session; it just inherits whatever setting you made.
And so this is actually nice.
It's something now where the movie player's
behavior is now made consistent with your app.
But of course, you have to be just aware of this
change in case you were relying on the movie player
to silence other sounds or do something that was
kind of a side effect just of you playing the video.
Now, if you want to go with the default
behavior -- sorry the new behavior --
then you do nothing, it's just the behavior's changed.
But if you want to revert back to the
way it was prior to iPhone OS 3.2,
there's a property on the movie player
object, useApplicationAudioSession.
If you set that to false, then the movie
player will go back to choosing its own category.
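As a sketch, reverting to the old behavior is just one property on the player (movieURL is assumed to be defined elsewhere):

```objc
// Give the movie player its own (nonmixable playback) session again,
// instead of having it inherit your application's session.
MPMoviePlayerController *player =
    [[MPMoviePlayerController alloc] initWithContentURL:movieURL];
player.useApplicationAudioSession = NO;
```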
But all right.
So folks, we have gone through
three of the five basic tasks.
We've done setting up the session and its delegate, we talked
about making the right choices as far as your category,
and then going active and all of the
effects that going active can have.
So two more things to talk about and
that's interruptions and route changes.
So let's go into interruptions.
The thing to understand here is that your
application's audio might be interrupted at any time.
So it could be interrupted by a phone call, a
clock alarm, if you're running in the background,
it could be interrupted by a foreground application.
So what will happen if you are interrupted is that your
session is just made inactive and whatever you were doing
with audio is stopped -- if you were playing, it's no
longer playing; if you were recording, then that is stopped.
And it just -- this just happens to you -- it's not a
request that this is about to happen, it's just done.
So what you can do -- there's certain steps,
though, that you can take in reaction to that.
Of course, you might update your user
interface to reflect that this has happened.
But more than anything else, you're interested
in what happens after the interruption has ended.
So if the user has declined the phone
call and comes back into your app,
you're interested in getting your audio restarted again.
And there's a number of other cases as well
where, you know, you just need to be --
all that you're really wanting to do is get
back up and running after the interruption.
So there's a few different things that you need to do.
So the first is let's say that we've set up our audio
session and we've implemented these two delegate methods.
Begin interruption and then this one is new
in iOS 4, it's endInterruptionWithFlags.
The old delegate is still available
as well, just endInterruption.
But this lets you get a little bit more fine-grain control.
I'll get to it in a second.
So when the interruption begins, that
means the phone call has come in.
You know, as I say, the playback has
stopped, you are already inactive
and so what you really should just do is change
the state of your user interface to reflect that.
If you're a game, you would probably
go to your pause screen.
If you're an audio player, well, you would
change your playback icon from whatever it was
to something to say, "Okay, restart again."
Let's say that the user declines the phone
call and is now back in your application.
Well, now the interruption has ended and there's this
flag that can be passed into you that will tell you,
based on various characteristics of the interruption
whether or not your session should resume, okay?
Whether you should start playback,
restart playback, or restart recording.
And assuming that that's the case -- meaning if you're passed
the flag AVAudioSessionInterruptionFlags_ShouldResume --
then now you can just fire everything back up.
So you will set your session as active again, you
can update your user interface, and resume playback.
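A minimal sketch of those two delegate methods, using the iOS 4-era AVAudioSessionDelegate protocol (the UI and playback helpers are hypothetical):

```objc
- (void)beginInterruption {
    // Playback has already stopped and the session is inactive;
    // just reflect that in the UI (e.g., show the pause screen).
    [self showPausedUI]; // hypothetical
}

- (void)endInterruptionWithFlags:(NSUInteger)flags {
    if (flags & AVAudioSessionInterruptionFlags_ShouldResume) {
        [[AVAudioSession sharedInstance] setActive:YES error:nil];
        [self showPlayingUI];  // hypothetical
        [self resumePlayback]; // hypothetical
    }
}
```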
So that actually, though, is just the general case.
There's actually a few instances depending on if
you might be using one of the lower-level audio APIs
from the Audio Toolbox or using
OpenAL that you need to take care of.
So let's start with OpenAL.
Now OpenAL has this concept of a context that is analogous
to the position of the listener in the OpenAL world.
And the context does not survive through an interruption,
does not stay current through an interruption.
So what you need to actually do is when the
interruption begins, if you are using OpenAL,
then at that point you will want to invalidate the context.
So you do that by calling alcMakeContextCurrent and
just passing it nil -- that invalidates the context.
Then when interruption is over, you can end that
interruption and pass it -- well, excuse me --
you implement the endInterruptionWithFlags delegate method,
you check to see if the flag says that you should resume,
and if so, then you can set yourself as active and now
is when you make your OpenAL context current again.
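For OpenAL, then, the same delegate methods might look roughly like this (myALContext is assumed to be your app's ALCcontext):

```objc
- (void)beginInterruption {
    // Invalidate the OpenAL context; it won't stay current
    // through the interruption anyway.
    alcMakeContextCurrent(NULL);
}

- (void)endInterruptionWithFlags:(NSUInteger)flags {
    if (flags & AVAudioSessionInterruptionFlags_ShouldResume) {
        [[AVAudioSession sharedInstance] setActive:YES error:nil];
        alcMakeContextCurrent(myALContext); // make it current again
    }
}
```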
Another case where you have to take a few extra steps is if
you're using the AudioQueue API and an interruption occurs.
So if you're using this API, well, then first of
all, when the interruption happens you probably want
to save your playback or recording position for later,
just in case your application actually
gets quit or suspended.
But now the decision about how you restart will depend
on whether your application is using a hardware codec
or a software codec for whatever it's
doing -- its playback or recording.
If you're using a hardware codec, then that
attachment cannot survive through the interruption.
So you will dispose of the currently playing
AudioQueue when the interruption begins
and then when the interruption is
over, you will create it back again
and start it again with a new queue whenever you're ready.
Now if you're using a software codec, then it's simpler,
there's no need to dispose and restart -- excuse me --
dispose and recreate the queue, all you have to do
is restart it, for example here with AudioQueueStart.
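A sketch of the AudioQueue case (the bookkeeping variables and the recreate helper are hypothetical; AudioQueueDispose and AudioQueueStart are the real Audio Toolbox calls):

```objc
- (void)beginInterruption {
    // Remember where we were, in case the app is quit or suspended.
    savedPacketIndex = currentPacketIndex; // hypothetical bookkeeping
    if (usingHardwareCodec) {
        // The hardware codec attachment won't survive the interruption.
        AudioQueueDispose(queue, true);
        queue = NULL;
    }
}

- (void)endInterruptionWithFlags:(NSUInteger)flags {
    if (flags & AVAudioSessionInterruptionFlags_ShouldResume) {
        [[AVAudioSession sharedInstance] setActive:YES error:nil];
        if (usingHardwareCodec) {
            // Build a fresh queue and pick up where we left off.
            [self recreateQueueAtPacket:savedPacketIndex]; // hypothetical
        } else {
            // Software codec: the queue survived, just restart it.
            AudioQueueStart(queue, NULL);
        }
    }
}
```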
This actually, I just -- one side note that I put up here.
This can actually get a little intricate though.
So we wrote a technical Q&A that has some snippets of
code that you can take a look at and it's up there,
number 1558, up on the developer website.
And the last topic here is routing.
So the behavior that users expect in
terms of the audio system routing is
that whatever gesture they have made most
recently is taken as an expression of their intent
for where the audio should be routed or saying it
way more simply, "whatever happened last, wins."
So if the user plugs in the headset, then we
take that as an expression of their intent --
that they want the audio to now
be routed out through the headset.
Or if they were using the microphone, they
plugged in a headset with a microphone,
then they want the audio input to be
taken from the headset microphone.
And now usually along with that, there's some behavior
as far as whether the audio should continue or not.
When they plug in a headset, we want the
audio to continue to go -- if it was output --
to just keep playing without pausing at all.
But when they unplug the headset, okay,
that's also taken as an intentional gesture
from the user to change the routing on the device.
We'll route back to wherever it was previously, probably the
speaker, and in that case generally we want audio playback
to pause, so that when they unplug their headset, it
doesn't just start blaring out through the speaker
at them before they have a chance to take care of it.
But okay, so these are the behaviors that users expect
and so there are ways for you to respond to route changes
that will let you implement this in your app.
So really there's just three topics I'm going to mention
here: so it's possible for you to query the current route;
it's possible for you to listen for changes and then in
response to those changes, you might either keep playing
or stop playing as I just said; and it's
also possible in limited cases for you
to redirect where the output is currently going.
So first of all, just getting the current route.
So this is just another case where
you can call AudioSessionGetProperty and
pass it this token that I have
at the top, the audio route property.
So this is going to give you back a CFStringRef,
just with the name of the current route.
And so you can see here that I'm outputting that to the log.
But it will just tell you, "Okay, the current route is
the speaker, the headphone, the receiver," and so on.
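As a rough sketch, again with the iOS 4-era C API (I'm assuming the usual Core Foundation convention of releasing the returned string):

```objc
// Query the name of the current audio route.
CFStringRef route = NULL;
UInt32 size = sizeof(route);
AudioSessionGetProperty(kAudioSessionProperty_AudioRoute, &size, &route);
NSLog(@"Current route: %@", (NSString *)route); // e.g., "Speaker", "Headphone"
if (route) CFRelease(route);
```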
Now, more importantly, though, is that you probably
actually would rather be listening for changes to the route
than to just know where it is now and output
that to the log because you want to react
to those changes accordingly in your app.
And what you do here is -- now this is down in
that lower level API, as I had mentioned earlier --
you're going to set up a C-based callback
using AudioSessionAddPropertyListener.
And basically you will be registering
for notifications of a route change.
And you pass it this token,
kAudioSessionProperty_AudioRouteChange.
And then your callback is going to be told the reason
why the route changed, like the user unplugged something
or plugged something in and you'll also be
informed what the route had been previously.
And of course, through what I just showed you,
you can also find out what the route is now.
So here's the code, just to get this started.
We're setting up a property listener here.
AudioSessionAddPropertyListener with the
audio route change, we set up our C function
and you can optionally pass it some data as well.
Now what that C function is going to look
like, this is what you would have written.
The final parameter to this function
is this void* with some data.
And that data can be cast to a CFDictionaryRef.
And inside that dictionary is where
you're going to find the information
about what the old route was and the reason why it changed.
So that's all I'm doing here, getting those values out.
And then you can act accordingly in your app.
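Putting those pieces together, a sketch might look like this (iOS 4-era C API; I'm only pulling the change reason out of the dictionary here -- the old route is available under another key in the same dictionary):

```objc
// C callback invoked whenever the audio route changes.
static void routeChangeListener(void *inUserData,
                                AudioSessionPropertyID inPropertyID,
                                UInt32 inDataSize,
                                const void *inData) {
    if (inPropertyID != kAudioSessionProperty_AudioRouteChange) return;
    CFDictionaryRef dict = (CFDictionaryRef)inData;

    // Why did the route change? (device unplugged, new device, ...)
    CFNumberRef reasonRef = CFDictionaryGetValue(dict,
        CFSTR(kAudioSession_AudioRouteChangeKey_Reason));
    SInt32 reason = 0;
    CFNumberGetValue(reasonRef, kCFNumberSInt32Type, &reason);

    if (reason == kAudioSessionRouteChangeReason_OldDeviceUnavailable) {
        // Headset was unplugged: pause rather than blare out the speaker.
    }
}

// During setup, register for route-change notifications:
AudioSessionAddPropertyListener(kAudioSessionProperty_AudioRouteChange,
                                routeChangeListener, NULL);
```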
Now, the third one that I mentioned
is that in certain limited cases,
it is also possible for you to redirect the output.
Now we actually limit this in most
cases because for most of the instances
where the output should be rerouted,
we want to leave that up to the user.
If the user has plugged in or unplugged the headset,
then we take that as an intentional gesture
from the user to change the route.
And we really don't allow for third-party
apps to interfere with that.
But there's this one case where it may be necessary for
an application to change the route itself and that's
if you're using the category play and record.
Now you might remember from earlier in the talk how I
mentioned that play and record by default will output
to the receiver, that speaker you hold
up to your ear when you're on the phone.
Well, let's say that you're writing a VoIP app and so you're
going to be implementing something where normally that's
where the output would go but you
also want the option of rerouting
out to the main speaker for like a speakerphone mode.
So that's really what this allows you to do: You can set
a property to override the audio route and in this case,
if you were in the play and record category
and you were already outputting to the receiver,
you could redirect that out to the main speaker.
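As a sketch of that last case (this assumes you're already in the PlayAndRecord category):

```objc
// Speakerphone mode: push output from the receiver to the main speaker.
UInt32 override = kAudioSessionOverrideAudioRoute_Speaker;
AudioSessionSetProperty(kAudioSessionProperty_OverrideAudioRoute,
                        sizeof(override), &override);

// And back to the default route (the receiver) again:
// UInt32 none = kAudioSessionOverrideAudioRoute_None;
// AudioSessionSetProperty(kAudioSessionProperty_OverrideAudioRoute,
//                         sizeof(none), &none);
```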
But okay, folks, so actually that takes us
through our five topics on the agenda here.
As I said, we set up the session, we chose
a category -- and do choose that very wisely
for the role that audio plays in your application.
We made the session active and saw all the
effects that that might have with relation
to background audio and mixing and so on.
Then I talked a little bit about handling
interruptions and route changes here in the end.
I just wanted to mention a couple of things -- as Bill had
said, there are some sessions coming up. So the next session
is going to be about the fundamentals of digital audio;
it'll be right here in this room.
And the third session in this room as well is
going to go deep into the use of audio units,
which are awesome for writing applications
that need to do more intense audio processing.
Here is actually my contact information: I'm Allan Schaffer,
so here's my email address, or you can contact Eryk Vershen,
who is our media technologies evangelist, if you
want to get in touch with us after the show.
And a couple more notes: The audio session programming
guide has a lot of great information that goes
into a little more detail than what I just
spoke about, so be sure to check that out
and really make the right category choice for your app.
And then finally, we have the Apple Developer Forum.
So if you have questions about audio and want to talk
about it just among yourselves, among other developers
and along with us, check out the dev forums.
So thank you very much.
[ Applause ]