WWDC2014 Session 502

Transcript

>> Good morning, everyone.
My name is Kapil Krishnamurthy
and I work in Core Audio.
I'm here today to talk to you
about a new API called
AVAudioEngine
that we are introducing for
Mac OS X Yosemite and iOS 8.
As part of today's talk we'll
first look at an overview
of Core Audio and then we'll
dive into AVAudioEngine,
look at some of the
goals behind the project,
features of the new API,
the different building
blocks you'll be using
and finally we'll do a
section on gaming and 3D audio.
So let's get started.
For those of you who aren't
familiar with Core Audio,
Core Audio provides a
number of C APIs as part
of different frameworks
on both iOS and Mac OS X.
And you can use these
different APIs
to implement audio features
in your applications.
So using these APIs you will be able to play and record sounds with low latency, convert between different file and data formats, read and write audio files, work with MIDI data and also play sounds that get spatialized.
Several years ago we added
some simple Objective-C classes
to AVFoundation and
they're called AVAudioPlayer
and AVAudioRecorder.
And using these classes you
can play sounds from files
or record directly to a file.
Now while these classes
worked really well
for simple use cases,
a more advanced user might
find themselves a bit limited.
So this year we're adding
a whole new set of API
to AVFoundation called
AVAudioEngine
and my colleague Doug
also spoke about a number
of AVAudio utility
classes in session 501.
So using this new
API you will be able
to write powerful features with
just a fraction of the amount
of code that you may have
had to previously write.
So let's get started.
What were the goals
behind this project?
One of the biggest goals
was to provide a powerful
and feature-rich API set.
And we're able to do that
because we're building on top
of our existing Core Audio APIs.
Using this API we want
developers to be able
to achieve simple as
well as complex tasks.
And a simple task could be
something like playing a sound
and running it through
an effect.
A complex task could
be something as big
as writing an entire
audio engine for a game.
We also wanted to
simplify real-time audio.
For those of you
who are not familiar
with real-time audio it
can be quite challenging.
You have a number of audio
callbacks every second
and for each callback you have
to provide data in
a timely fashion.
You can't do things like
take locks on the I/O thread
or call functions that
could block indefinitely.
So we make all of this
easier for you to work
with by giving you a
real-time audio system but one
that you interact with in
a non real-time context.
Features of the new API: this
is a full-featured Objective-C
API set.
You get a real-time audio
system to work with meaning
that any changes
that you make on any
of the blocks take
effect immediately.
Using this API you will be able
to read and write audio files,
play and record audio, connect
different audio processing
blocks together and then
while the engine is running
and audio is flowing through
this system you can tap the
output of each of these
processing blocks.
You'll also be able to
implement 3D audio for games.
Now before we actually jump
into the engine's building
blocks I thought I would give
you two sample use cases
to give you a little flavor
of what you'll be able
to do using this API.
So the first sample use case
is a karaoke application.
You have a backing
track that's playing
and the user is singing
along with it in real-time.
The output of the microphone
is passed through a delay,
which is just a musical
effect and both
of these audio chains are mixed
and sent to the output hardware.
This could be a speaker
or headphones.
Let's also say that you tap
the output of the microphone
and analyze that raw data
to see if the user is on pitch and doing a great job. And if he is, play
some sound effects,
so this stream also
gets mixed in and played
out to the output hardware.
Here's another use case.
You have a streaming
application and you receive data
from the remote location.
You can now stuff this
data into different buffers
and schedule them on a player.
You can run the output
of the player
through an EQ whose UI
you present to the user
so that they can tweak the
EQ based on their preference.
The output of the EQ then
goes to the output hardware.
So these are just
two sample use cases.
You'll be able to do a
whole lot more once we talk
about AVAudioEngine.
So let's get started.
The two main objects
that we're going to start
with are the engine
object and the node object.
And there are three specific
types of nodes: the output node,
mixer node and the player node.
We have other nodes as
well that we will get to
but these are the initial
building block nodes.
So the engine is an object
that maintains a
graph of audio nodes.
You create nodes and you
attach them to the engine
and then you use the
engine to make connections
between these different
audio nodes.
The engine will analyze these
connections and determine
which ones add up
to an active chain.
When you then start the engine,
audio flows through all
of the active chains.
A powerful feature that the
engine has is that it allows you
to dynamically reconfigure
these nodes.
This means that while the engine
is rendering you can add new
nodes and then wire them up.
And so essentially you're adding
or removing chains dynamically.
So the typical workflow
of the engine is
that you create an instance of
the engine, create instances
of all the nodes you
want to work with,
attach them to the engine so
the engine is now aware of them
and then connect them
together, start the engine.
This will create an
active render thread
and audio will flow through
all of the active chains.
So let's now talk about a node.
A node is a basic audio
block and we have three types
of nodes: there are source
nodes, which are nodes
that generate audio.
And examples of this are the
player or the input node.
You have nodes that
process audio.
So they take some audio and do
something to it and push it up.
And examples are a
mixer or an effect.
You also have destination
nodes that receive audio
and do something with it.
Every one of these nodes
has a certain number
of input and output buses.
And typically you see that
most nodes have a single input
and output bus.
But an exception to
this is a mixer node
that has multiple input busses
and a single output bus.
Every bus now has an audio
data format associated with it.
So let's talk about connections.
If you have a connection
between a source node
and a destination node,
that forms an active chain.
You can insert any
number of processing nodes
between the source node
and the destination node.
But as long as you
wire every bit
of this chain up,
it's an active chain.
As soon as you break one of the
connections, all of the nodes
that are upstream of the
point of disconnection go
into an inactive state.
In this case, I've
broken the connection
between the processing node
and the destination node,
so my processing node
and my source node are
now in an inactive state.
The same holds true
in this example.
So let's now look at
the specific node types.
The first node that
we're going to talk
about is the output node.
The engine has an
implicit destination node
and it's called the output node.
And the role of the output
node is to take the data
that it receives and hand
it to the output hardware,
so this could be the speaker.
You cannot create a standalone
instance of the output node.
You have to get it
from the instance
of the engine that
you've created.
Let's move on to the mixer node.
Mixer nodes are processing
nodes and they receive data
on different input
busses which they then mix
to a single output, which
goes out on the output bus.
When you use a mixer,
you get control
of the volume of each input bus.
And if you had an application that was playing a number of sounds and you put each of these sounds on a separate input bus,
using this volume control
you can essentially blend
in the amount of each sound
that you want to hear.
So you create a mix.
You now have control
over the output volume
as well using a mixer.
So you are controlling
the volume
of the mix that you've created.
If your application has
several categories of sound,
you can make use of a
concept called submixing
to create submixers.
So let's say that you
have some UI sounds
and you have some music.
And you put all of the UI
sounds through one mixer,
all of the music
through another mixer.
Using the output volumes
of each of these mixers,
you can essentially
control the volume
of each of these submixers.
Let's take that concept a
step further and put all
of the submixers
through a master mixer.
The output volume of the master
mixer will essentially control
the volume of the entire
mix in your application.
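
As a hedged sketch of that submixing idea, continuing the earlier snippet (uiMixer and musicMixer are illustrative names, not API):

    AVAudioMixerNode *uiMixer = [[AVAudioMixerNode alloc] init];
    AVAudioMixerNode *musicMixer = [[AVAudioMixerNode alloc] init];
    [engine attachNode:uiMixer];
    [engine attachNode:musicMixer];

    // Route both submixers into the engine's main mixer, which plays
    // the role of the master mixer here.
    [engine connect:uiMixer to:engine.mainMixerNode format:nil];
    [engine connect:musicMixer to:engine.mainMixerNode format:nil];

    // Per-category volume on each submixer, overall volume on the master.
    uiMixer.outputVolume = 0.8;
    musicMixer.outputVolume = 0.5;
    engine.mainMixerNode.outputVolume = 1.0;
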
Now the engine has an
implicit mixer node.
And when you ask the
engine for its mixer node,
it creates an instance
of a mixer.
It creates an instance
of the output node
and connects the two together
for you by default.
The difference here
between the mixer node
and the output node is that you
can create additional instances
and then attach them
to the engine
and use them how you please.
Mixers can also have
different audio data formats
for each input bus.
And the mixer will do the work
of efficiently converting
the input data formats
to the output data format.
So now that we have looked
at these initial nodes,
let's talk about how this works
in the context of the engine.
So let's say that I have an app
that creates an instance
of the engine.
We can now ask the engine
for its main mixer node
so it's going to create an instance of a mixer, create an instance of the output node and connect the two together. I can now create a player node
and attach it to the engine
and connect it to the mixer.
So at this point I have a
connection chain going all the
way from a source
to a destination,
so I have an active chain.
When I then start up the engine,
an active render
thread is created
and data is pulled
by the destination.
So I have an active
flow of data here.
The app can now interact
with each one of these blocks
and any change that
it makes on any
of the nodes will take
effect immediately.
So now that we've talked
about an active render thread
how do you push your audio data
on the render thread.
You use a player to do that.
Let's look at player nodes.
Player nodes are nodes
that can play data
from files and from buffers.
And the way that it happens
or the way that it's done is
by scheduling events,
which simply means play
data at a specified time.
That time could be now or sometime in the future.
When you're scheduling buffers
you can schedule multiple buffers and as each
buffer is consumed
by the player you get
an individual callback,
which you can then use as a cue to
go ahead and schedule more data.
You can also schedule
a single buffer
that plays in a looped fashion.
And this is useful in the case
when you may have a musical loop
or a sound effect that you want
to play over and over again.
So, you load the data and
then you play the buffer
and it'll continue to play
until you stop the player
or you interrupt it
with another buffer.
We'll get into that.
You can also schedule
a file or a portion
of a file called a segment.
So going back to our previous
diagram, we had an engine
that was in a running state.
So now I can create an instance
of a buffer and load my data
into it, shown by the red arrow.
Once I do that, I can schedule
this buffer on the player.
And when the player is playing,
the player will consume the data
in the buffer and push
it on the render thread.
In a similar manner, I can
work with multiple buffers.
So over here I have
multiple buffers.
I load data into each one of
them and I schedule each one
of them to play in
sequence on the player.
As each buffer is
consumed by the player,
I get individual
callbacks letting me know
that that buffer is done.
I can use that as a cue
and schedule more data.
In a similar manner you
can work with a file.
And the difference here is you
don't have to actually deal
with the audio data yourself.
All you need is a URL
to a physical audio file
with which you can create
an AVAudioFile object
and then schedule that
directly on the player.
The player will do the work of
reading the data from the file
and pushing it on
the render thread.
So, let's now look
at a good example
of how we can achieve this.
I first create an instance of
the engine, create an instance
of a player and attach
the player to the engine,
so the engine is now
aware of the player.
I'm now going to
split my example up
and show how you can
first work with a file.
So, given a URL to
an audio file,
I can create the
AudioFile object.
The next thing that I do is ask
the engine for its mainMixer.
So the engine will create
an instance of a mixer node,
create an output node and
connect the two together.
I can now go ahead and
connect the player to the mixer
with the file's processing
format.
So I have a connection
chain going all the way
from a player that's a source
to the output node
that's a destination.
Now I can schedule my
file to play atTime:nil,
which is as soon as possible.
And in this case I pass a nil
for the completion handler.
If I had some work that
I needed to be done
after the file is
consumed by the player,
I can pass in a block over here.
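
Put together, that file example might look roughly like this. The file URL is a placeholder and error handling is minimal:

    AVAudioEngine *engine = [[AVAudioEngine alloc] init];
    AVAudioPlayerNode *player = [[AVAudioPlayerNode alloc] init];
    [engine attachNode:player];

    // Create the file object from a URL to a physical audio file.
    NSError *error = nil;
    NSURL *fileURL = [[NSBundle mainBundle] URLForResource:@"song" withExtension:@"m4a"];
    AVAudioFile *file = [[AVAudioFile alloc] initForReading:fileURL error:&error];

    // Asking for mainMixerNode also creates and connects the output node.
    AVAudioMixerNode *mainMixer = engine.mainMixerNode;
    [engine connect:player to:mainMixer format:file.processingFormat];

    // Schedule the file to play as soon as possible, with no completion handler.
    [player scheduleFile:file atTime:nil completionHandler:nil];
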
So in a similar manner I can
work with a buffer as well.
Let's say that I create
an AVAudioPCMBuffer object
and load some data into it.
The specifics of that part
are covered in session 501.
So if you missed that,
please refer to that session.
Once I have my buffer object I
can go ahead and ask the engine
for its mixer and make the
connection between the player
to the mixer with
the buffer's format.
Now I can go ahead and
schedule this buffer atTime:nil,
which is as soon as possible.
But note that we have
an additional argument
when we are working with
buffers, the options argument.
We're going to talk about
that right after this.
But for now I'm going
to pass nil,
and nil for the completion
handler as well.
So now that I've
scheduled my data
on the player I can go
ahead and start the engine.
This creates an active
render thread
and then call play
on the player.
And the player will do the work of reading the data from the file or the buffer and pushing it on the render thread.
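
And here's a sketch of the buffer variant, continuing from the snippet above; filling the buffer with data is covered in session 501, so it's omitted here:

    // Create a buffer (filling it with audio data is not shown here).
    AVAudioFormat *format = [[AVAudioFormat alloc] initStandardFormatWithSampleRate:44100.0
                                                                            channels:2];
    AVAudioPCMBuffer *buffer = [[AVAudioPCMBuffer alloc] initWithPCMFormat:format
                                                             frameCapacity:44100];

    // Connect the player to the mixer with the buffer's format.
    [engine connect:player to:engine.mainMixerNode format:buffer.format];

    // Schedule the buffer to play as soon as possible, with no options.
    [player scheduleBuffer:buffer atTime:nil options:0 completionHandler:nil];

    // Start the engine (this creates the render thread), then start the player.
    NSError *startError = nil;
    [engine startAndReturnError:&startError];
    [player play];
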
So let's now talk about
the different buffer
scheduling options.
In all of the examples that
I'm going to talk about now,
I'm going to specify nil
for the atTime argument
and that just means that
in all of these examples,
I'm going to schedule something
to play as soon as possible.
So let's talk about the first
option and that's when you want
to schedule a buffer to
play as soon as possible.
In that case all you need
to do is schedule a buffer
with the option set to nil.
You call play on the player
and that buffer gets played.
If you have a buffer that's
playing now and you want
to append a new buffer,
it's the exact same call.
You schedule the new buffer
with the option set to nil
and so the new buffer
gets appended to the queue
of currently playing buffers.
On the other hand if I want
to interrupt my currently
playing buffer
with a new buffer I can
schedule the new buffer
with the AVAudioPlayerNodeBufferInterrupts option.
So that will interrupt the
currently playing buffer
and start playing my
new buffer right away.
Let's now look at the different
variants with a looping buffer.
So like I said earlier,
if I have a buffer that's
to be played in a looped
fashion, like a sound effect,
for instance, I can load the
data in that buffer and schedule
that buffer with the
AVAudioPlayerNodeBufferLoops
option.
When I call play on the
player, that buffer starts
to play in a looped fashion.
If I want to interrupt a looping
buffer it's the same option
as what we've seen before.
I have to schedule a new buffer
with the AVAudioPlayerNodeBufferInterrupts option.
So essentially it's the same
option for when you want
to interrupt a regular
buffer or a looping buffer.
The last case is
an interesting one.
So if you have a looping
buffer but you want
to let the current loop finish
before you start playing your
new data you can
schedule your new buffer
with the AVAudioPlayerNodeBufferInterruptsAtLoop option.
So this will let the current
loop finish and as soon
as that loop is done the
new buffer starts playing.
Now that was a whole bunch
of options, so let's look
at one practical example of
how we can use these options.
So let's say that I have
a sound that's broken
up into three parts.
And the example that I'm
going to use here is a siren.
So you have the initial buildup
of the sound which is the
"attack" portion of the siren.
You have the droning
portion of the siren,
which can be modeled using
just a looping buffer.
And this is the "sustain"
portion of the sound.
And then you have the
dying down of the siren,
which is the "release"
portion of the sound.
So let's say that I load
up each of these sounds
into different buffers.
The way that I can
implement this in code is
to first schedule the attack
buffer with my options set
to nil and then schedule
the sustain buffer
with the AVAudioPlayerNodeBufferLoops option.
So when I call play
on the player,
what this will do is play the
attack portion of the sound
and then immediately start
playing the sustain portion
of the sound and continue
to loop that sustain buffer
and that goes on until
I'm ready to interrupt it.
So after some time has gone
by when I'm ready to interrupt
that I can schedule
my release buffer
with the AVAudioPlayerNodeBufferInterruptsAtLoop option.
So this will let the last loop
of the sustained buffer finish
up and then play the
release portion of my sound.
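
A sketch of that siren in code, assuming attackBuffer, sustainBuffer and releaseBuffer have already been loaded with the three parts of the sound:

    // Attack plays once, then the sustain buffer loops until it is interrupted.
    [player scheduleBuffer:attackBuffer atTime:nil options:0 completionHandler:nil];
    [player scheduleBuffer:sustainBuffer atTime:nil
                   options:AVAudioPlayerNodeBufferLoops
         completionHandler:nil];
    [player play];

    // Later, when the siren should wind down: let the current pass of the
    // loop finish, then play the release portion.
    [player scheduleBuffer:releaseBuffer atTime:nil
                   options:AVAudioPlayerNodeBufferInterruptsAtLoop
         completionHandler:nil];
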
Now remember that I said
in the beginning that all
of my examples involve
scheduling events
that play as soon as possible.
I can also schedule events
to play in the future.
So here's an example of that.
In this case I'm just
going to schedule a buffer
to play 10 seconds
in the future.
So I create an AVAudioTime
object
that has a relative sample
time 10 seconds in the future,
and I use the buffer sampleRate
as my reference point.
I can now schedule the buffer
with this AVAudioTime object
and call play on the player
and my buffer gets played
10 seconds in the future.
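
In code, that future scheduling might look like this sketch, using the buffer's sample rate as the reference:

    // Schedule a buffer to play 10 seconds in the future.
    double sampleRate = buffer.format.sampleRate;
    AVAudioTime *tenSecondsFromNow =
        [AVAudioTime timeWithSampleTime:(AVAudioFramePosition)(10.0 * sampleRate)
                                 atRate:sampleRate];
    [player scheduleBuffer:buffer atTime:tenSecondsFromNow options:0 completionHandler:nil];
    [player play];
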
So we've talked about
player nodes
and how you can use a player
to push your audio data
on the render thread.
Well if you wanted to pull data
from the render thread
how do you do that?
You use a node tap.
And here's some reasons for
why you may want to do that.
Let's say you want to capture
the output of the microphone
and save that data to disk, or
if you have a music application
and you want to record
a live performance
or if you have a
game and you want
to capture the output
mix of the game.
You can do all of
that using a node tap.
And what that is, is essentially
a tap that you install
on the output bus of a node.
So the data that's captured
by the tap is returned back
to your application
via a callback block.
So going back to a
familiar diagram,
I have two players connected to the engine's main mixer.
And I want to tap the output of
the mixer so I can install a tap
on the mixer and the tap
will start pulling data
from the render thread.
The tap will then go ahead and create a buffer object,
stuff that data into the buffer
and return that back
to the application
via a callback block.
In code it's just
one function call.
You install a tap on the mixer's
output bus 0 with a buffer size
of 4096 frames and the mixer's
output format for that bus.
Within the block I have an
AVAudioPCMBuffer that contains
that amount of data.
And I can do whatever I
need to do with that data.
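
As a sketch, that tap installation looks something like this (what you do inside the block is up to you):

    // Install a tap on the mixer's output bus 0, asking for buffers of
    // 4096 frames in that bus's output format.
    AVAudioMixerNode *mainMixer = engine.mainMixerNode;
    [mainMixer installTapOnBus:0
                    bufferSize:4096
                        format:[mainMixer outputFormatForBus:0]
                         block:^(AVAudioPCMBuffer *buffer, AVAudioTime *when) {
                             // 'buffer' holds the captured audio; analyze it
                             // or write it out to a file here.
                         }];

    // When you're done capturing:
    [mainMixer removeTapOnBus:0];
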
Alright, so to quickly
summarize,
you have an active
render thread.
You use player nodes
to push your audio data
on the render thread and use
node taps to pull audio data
from the render thread.
Let's now switch gears and talk
about a new node
called the input node.
The input node receives
data from the input hardware
and it's parallel
to the output node.
With the input node you cannot
create a standalone instance.
You have to get the
instance from the engine.
When you've connected the
input node in an active chain
and the engine is running, data
is pulled from the input node.
So let's go back to
a familiar diagram.
I've connected the input
node to the mixer nodes
and that's connected
to the output node.
So when I start the engine,
this is an active chain
and data is pulled
from the input node.
So, if I'm receiving
data from the input node
and the engine is running and
I want to stop receiving data
at a certain point,
how do I do that?
It's very simple.
All you have to do-oh I'm sorry.
I raced ahead.
Let's look at a code example
of how you can connect
the input node.
So I get the input
node from the engine.
Just make a connection
to any other node
with the input node's
hardware format
and then start the engine.
This creates an active
render thread
and the input node is pulled for data.
So like I was saying earlier,
if you have an input node that's
being pulled and you don't want
to receive data anymore from
the input node, what do you do?
Just disconnect the input node.
So the input node will no
longer be in an active chain
and it won't be pulled for data.
In order to do that, it's
just one line of code.
Using the engine, you
disconnect the node output
of the input node.
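
Here's a sketch of connecting and later disconnecting the input node, assuming the engine instance from earlier:

    // Get the engine's input node and wire it up using the hardware
    // input format on its bus 0.
    AVAudioInputNode *input = engine.inputNode;
    [engine connect:input to:engine.mainMixerNode
             format:[input inputFormatForBus:0]];

    NSError *error = nil;
    [engine startAndReturnError:&error];   // the input node is now pulled for data

    // Later, to stop receiving input, just break the connection.
    [engine disconnectNodeOutput:input];
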
Now if you want to capture
data from the input node,
you can install a node tap
and we've talked about that.
But what's interesting about
this particular example is,
if I wanted to work with
just the input node,
say just capture data from the
microphone and maybe examine it,
analyze it in real time or
maybe write it out to file,
I can directly install
a tap on the input node.
And the tap will do the work of pulling data from the input node,
stuffing it in buffers
and then returning
that back to the application.
Once you have that data you
can do whatever you need
to do with it.
And let's now talk about
the last type of nodes
in this section, effect nodes.
Effect nodes are nodes
that process data.
So, depending on the type of
effect, they take some amount
of data in, they process
it and push that data out.
We have two main
categories of effects.
You have AVAudioUnitEffects
and AVAudioUnitTimeEffects.
So what's the difference
between the two?
AVAudioUnitEffects require the
same amount of data on input
as the amount of data they're
being asked to provide.
So let's take the example
of a distortion effect.
If a distortion node has
to provide 24ms of output,
all it needs is 24ms of input
that it then processes
and pushes out.
As opposed to that, TimeEffects
don't have that constraint.
So let's say that you have a
TimeEffect that's doing some
amount of time stretching.
If it is being asked to
provide 24ms of output,
it may require 48ms of input.
So that brings me
to my second point.
It is for that reason why you
cannot connect a TimeEffect
directly with the input node.
Because, when you have
the input node running
in real-time it cannot provide
data that it doesn't have.
As opposed to that,
with AVAudioUnitEffects you
can connect them anywhere
in the chain.
So you can use them with players
or you can use them
with the input node.
This is the list of effects
that we currently
have available.
So on the effects side
we have the Delay,
Distortion, EQ and Reverb.
If you're a musician you're
probably already familiar
with these effects so you
can use them in real time
or use them in the player.
And on the TimeEffect side,
we have the Varispeed
and the TimePitch.
And these effects are useful
in cases where you want
to manipulate the amount of time
stretching or maybe the pitch
of the source content.
So let's say that you have a
speech file that you're playing
and you want to pitch the voice
up to sound like a chipmunk.
Well you can do that using
one of the TimeEffects.
So let's now look at an example
of how you can use
one of these effects.
In this example I'm going
to make use of the EQ.
But note that over here I've
connected the EQ directly
to the output node.
In all of my prior examples
I was connecting nodes
to the mixers, to the
engine's mixer node.
But I don't always
have to do that.
If I just have one chain of data
in my application then I
can just directly connect it
to the output node, which
is what I've done here.
So this is a multiband EQ
and I specify the number
of bands I'm going to use when
I create an instance of the EQ.
So over here, I'm
going to use two bands
so I create an EQ
with two bands.
I can then go ahead and get
access to each of the bands
and set up the different
filter parameters.
Connecting the EQ
is no different
than what we've already seen.
I can connect the
player to the EQ
with the file's processing
format and connect the EQ
to the engine's output node with
the same format and that's it.
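
A sketch of that EQ setup; the two bands and their filter settings here are just illustrative values:

    AVAudioUnitEQ *eq = [[AVAudioUnitEQ alloc] initWithNumberOfBands:2];
    [engine attachNode:eq];

    // Each band is an AVAudioUnitEQFilterParameters object.
    AVAudioUnitEQFilterParameters *lowShelf = eq.bands[0];
    lowShelf.filterType = AVAudioUnitEQFilterTypeLowShelf;
    lowShelf.frequency = 120.0;   // Hz
    lowShelf.gain = 3.0;          // dB
    lowShelf.bypass = NO;

    AVAudioUnitEQFilterParameters *highShelf = eq.bands[1];
    highShelf.filterType = AVAudioUnitEQFilterTypeHighShelf;
    highShelf.frequency = 8000.0;
    highShelf.gain = -2.0;
    highShelf.bypass = NO;

    // Player -> EQ -> output node, all with the file's processing format.
    [engine connect:player to:eq format:file.processingFormat];
    [engine connect:eq to:engine.outputNode format:file.processingFormat];
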
So with all of this information,
let's look at a demo
that makes use of some of the
nodes that we've talked about.
Okay, so I'm going to
explain what I have here.
Over here I have two
player nodes and each
of these players is going
to be fed by a separate,
by separate looping buffers.
Each of the players are
connected to separate effects
and each of these effects are
connected to separate inputs
of the engine's main mixer.
I have control over the output
volume of the main mixer
and down here I have
a transport control,
which essentially
controls a node tap
that I've installed
on the main mixer.
So when I hit Record, a tap
gets installed and I capture
that data and save it to a file.
And then when I hit
Play I'm just going
to play that file back.
So let's listen to what
this sounds like [music].
So here I'm playing the drums.
I can change the volume
and the pan of each player,
so you can hear that effect.
I'm now going to go ahead and
bring in the reverb a little bit.
It sounds a little too wet, so
I'm going to keep it about here.
Let me start the other player.
[ Music ]
Okay, so what I thought I'd
do now is use the node tap
to maybe capture a little live
performance, and any changes
that I make to any of the
nodes here should be captured
in that performance.
So when we go back and listen
to that we should hear that.
So let me do that.
[ Music ]
Okay, so I'm going to stop my
recording, stop my delay player.
And let's go back and
listen to the recording.
[ Music ]
[ Silence ]
And that's a preview of
AVAudioEngine in action.
Let's go back to slides.
Alright, so two of the
settings that I was changing
with the players were the volume
and the pan for each player.
But these are actually
settings of the input mixer bus
that the player is connected to.
So the way we've exposed
mixer input bus settings
in the audio engine is
through a protocol called the
AVAudioMixing protocol.
Source nodes conform to this
protocol so the player node
and the input node do that.
And settings like volume or pan you can change by just doing player.volume = 0.5 or player.pan = -1.
When a source node is in an
active connection with the mixer
and you make changes to the protocol's different properties, they take effect immediately.
However, if a source node
is not connected to a mixer
and you make changes to
the protocol's properties,
those changes are cached in the
source node and then applied
when you make a physical
connection to a mixer.
So these are the
mixing properties
that we have available.
Under the common mixing
properties we just have volume
right now.
Under the stereo mixing
properties, we have pan
and we have a number
of 3D mixing properties
that we're going to look
at in the next section.
So in the form of a diagram,
let's say that I have Player
1 connected to Mixer 1
and I go ahead and set Player 1's pan to -1, hard panning it to the left, and Player 1's volume to 0.5.
So these mixing settings are
now associated with Player 1.
And because Player
1 is connected
to Mixer 1 they also get
applied on the mixer.
If I were to disconnect Player
1 and connect it to Mixer 2,
these mixing settings
travel along with Player 1
and get applied to Mixer 2.
So in this sense, we've
been able to carry settings
that belong to the input
bus of a mixer along
with the source node itself.
Alright, so let's now
move onto the next section
on gaming and 3D audio.
So in games, typically
you have several types
of sounds that you play.
You have short sounds, and we've
seen AudioServices, which is one
of our C APIs, get used for that.
For playing music we see
AVAudioPlayer getting used
a lot.
And for sounds that
need to be spatialized,
OpenAL is the API of choice.
Now while each of these
APIs work really well
for what they were designed
for, if your application has
to make use of all of them, then
one of the biggest tradeoffs is
that you have to
familiarize yourself
with the nomenclature
associated with each API.
In addition, with AudioServices
you don't have a latency
guarantee of when
your sound will play.
With AVAudioPlayer
you can't play sounds
that you have in buffers.
And with OpenAL, you can't play
sounds directly from a file
or play compressed data.
With our knowledge of AVAudioEngine, if we go back and cover cases one and two, we can easily do so.
For short sounds we
can just load them
into AVAudioBuffer objects
and schedule them on a player.
For music you can just create
an AVAudioFile object
and schedule that
directly on a player.
So how do you play sounds
that need to be spatialized?
We'll look at that now.
I'd like to introduce a new
node called the environment node
and this is essentially
a 3D mixer.
So when you create an instance
of the environment node, you have a 3D space and you
get a listener that's implicit
to that 3D space.
All of the source
nodes that connect
to the environment node act
as sources in this 3D space.
So the environment
has some attributes
that you can set directly
on the environment node.
And then each of these
sources have some attributes
and you can set that using
the AVAudioMixing protocol's
3D properties.
Now in terms of data formats,
I just wanted to point out that
when you're working with
the environment node,
all of the sources need to have
a mono data format in order
for that audio to
be spatialized.
If the sources have
a stereo data format,
then that data is passed through
and currently the environment
node doesn't support a data
format greater than
two channels on input.
So as a diagram, this
is what it looks like.
I've created an instance
of an environment node
which means I now
have a 3D space
and I have an implicit listener.
I now create two player nodes, which are going to act as sources in my 3D space,
and using the AVAudioMixing
protocol I can set all
of the source attributes.
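
As a sketch, setting up that diagram could look like this; player1 and player2 are assumed to be already created and attached, and the positions are illustrative:

    // Create the environment node (the 3D mixer) and connect it to the main mixer.
    AVAudioEnvironmentNode *environment = [[AVAudioEnvironmentNode alloc] init];
    [engine attachNode:environment];
    [engine connect:environment to:engine.mainMixerNode format:nil];

    // Sources need a mono format to be spatialized.
    AVAudioFormat *mono = [[AVAudioFormat alloc] initStandardFormatWithSampleRate:44100.0
                                                                          channels:1];
    [engine connect:player1 to:environment format:mono];
    [engine connect:player2 to:environment format:mono];

    // Source attributes come from the AVAudioMixing protocol's 3D properties.
    player1.position = AVAudioMake3DPoint(-2.0, 0.0, -5.0);
    player2.position = AVAudioMake3DPoint(2.0, 0.0, -5.0);
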
So what makes things
sound 3D or virtual 3D?
Well, we have a number of
attributes and some belong
to the sources, others
belong to the environment.
Let's walk through each of
the source attributes first.
So every source has a
position in this 3D space.
And right now it's specified using the right-handed Cartesian coordinate system, where positive X is to the right, positive Y is up and positive Z is towards the listener.
Now with respect
to the listener,
the listener uses
some spatial cues
to localize the position
of the source.
There's an inter-aural
time difference,
just a slight time difference
for the sound made by the source
to get to each of the listener's two ears.
There's also an inter-aural
level difference.
In addition, your head has the
effect of doing some filtering
and you also have some filtering from the ears themselves.
So we have several rendering
algorithms and each one
of them model these
spatial cues differently.
The thing is that we've exposed
this as a source property.
So you can pick a rendering
algorithm per source
and some algorithms may sound
better depending on the type
of content your source
is playing
and also they differ
in terms of CPU cost.
So you may want to pick a
more expensive algorithm
for an important source
and a cheaper algorithm
for a regular source.
The next two properties,
obstruction and occlusion,
deal with the filtering of
sound if there are obstacles
between the source and listener.
So in this case, I have the
source, that's the monster,
and the listener, that's
the handsome prince,
and there is a column between
the source and the listener.
So the direct path of sound is
muffled whereas the reflected
paths across the walls are clear
and this is modeled
by obstruction.
On the other hand, if the source and the listener are in different spaces, so now there's a wall between the source and the listener, then both the direct path of sound and the reflected paths of sound are muffled, and this is modeled by occlusion.
Let's now move on
to the listener,
the environment attributes.
So every environment
has an implicit listener
and the listener has a
position and an orientation.
The position is specified using
the same coordinate system.
And for the orientation, you
can specify using either two
vectors, a front
and an up vector,
or three angles: yaw, pitch and roll.
You also have distance
attenuation in the environment,
which is just the
attenuation of sound
as a source moves away
from the listener.
So in this graph there are
two points of interest.
There's the reference
distance, which is the distance
above which we start applying
some amount of attenuation.
There's also the maximum
distance, which is the point
above which the amount
of attenuation being
applied is capped.
So all of the exciting
stuff happens
between the reference distance
and the maximum distance.
And in that region we have three
curves that you can pick from.
So, in the form of code,
this is what it looks like.
All you need to do is get the
distance attenuation parameters
object from the environment
and then you can go ahead
and tweak all the settings.
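
For instance, a sketch with illustrative values, assuming the environment node from before:

    AVAudioEnvironmentDistanceAttenuationParameters *attenuation =
        environment.distanceAttenuationParameters;
    attenuation.distanceAttenuationModel =
        AVAudioEnvironmentDistanceAttenuationModelInverse;
    attenuation.referenceDistance = 1.0;    // attenuation starts beyond this distance
    attenuation.maximumDistance = 100.0;    // attenuation is capped beyond this distance
    attenuation.rolloffFactor = 1.5;        // how steeply the chosen curve falls off
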
Now every environment
also has reverberation
which is just a simulation
of the sound reflections
within that space.
The environment node has a
built-in reverb and you can pick
from a selection
of factory presets.
Now once you pick the type
of reverb you want to use,
you can set a blend
amount for each source
and that just affects
the amount of each source
that you'll hear in the reverb mix.
So for some sources,
you may want them
to sound completely dry, so you
set the blend amount to zero.
And other sources you may
want to sound more ambient
so you can turn up
the blend amount.
We also have a single
filter that applies
to the output of the reverb.
So let's say that you pick
one of the factory presets
and you want it to sound
maybe a little brighter.
You can do that using
the filter.
In code, this is
what it looks like.
I get the ReverbParameters
object from the environment.
In this case, I'm enabling it
and then I load a factory preset, the LargeHall preset.
And using the AVAudioMixing
protocol,
I set the source's
reverbBlend to 0.2.
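
In sketch form, again assuming the environment node and a player1 source from earlier:

    // Enable the environment's built-in reverb and load a factory preset.
    AVAudioEnvironmentReverbParameters *reverb = environment.reverbParameters;
    reverb.enable = YES;
    [reverb loadFactoryReverbPreset:AVAudioUnitReverbPresetLargeHall];

    // Mostly dry, with a little ambience for this particular source.
    player1.reverbBlend = 0.2;
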
So now we've talked about
two types of mixers.
You have the 2D mixer and
you have the 3D mixer.
And source nodes, that is
the player or the input node,
talk to these mixers using
the AVAudioMixing protocol.
So I just wanted
to point out that
when a source node is
connected to a 2D mixer,
then all of the common and
the 2D mixing properties
take effect.
When a source node is
connected to a 3D mixer,
then all of the common and
the 3D mixing properties
take effect.
Let's look at what
that looks like here.
So let's say that
I have Player 1
who is connected
to the 2D mixer.
I set the pan to be -1
and volume to be .5.
Note that pan is a
2D mixing property
but volume is a common
mixing property.
But in this case both of
them will take effect,
because the mixer
node implements both
of these properties.
If I disconnect Player 1
from the mixer and connect it
to the environment node, the
pan property will now be cached.
It doesn't take effect because
it's a 2D mixing property.
It doesn't apply to
the environment node.
Volume, on the other hand,
will continue to take effect,
because it's a common mixing
property and it's implemented
by the environment node.
So with all of that
information let's look
at a sample gaming setup.
This is just one of many
ways that you can do this
and this is just a suggestion.
It really all depends
on your application.
But in this case, I
have two 3D sources.
So, I'm going to use a
player to play some sounds
that will be spatialized
and also live input.
So let's say that the user
is chatting and then you want
to spatialize that
in a 3D environment.
I can connect the player
node and the input node
to the environment node.
And that's connected out
to the engine's main mixer.
I can now have a second
player that I'm going
to dedicate to playing music.
So this player is going to play
music and I'm going to run it
through an EQ and connect
that to the main mixer.
Let's say that I present
some UI for the user
so that they can tweak
the EQ settings,
maybe to make the
music sound better.
I have a third player
now that I'm going
to dedicate only to
UI sound effects.
So maybe the sounds that are
made as I navigate through menus
or if my game avatar has
picked up a bonus item,
etc. So the UI player
is connected directly
to the engine's main mixer.
This is what the overall
picture looks like.
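
One hedged way to wire that picture up, with all the node names being illustrative and every node assumed to be already attached to the engine:

    // 3D sources: a player and the live input, spatialized by the environment node.
    [engine connect:spatialPlayer to:environment format:mono];
    [engine connect:engine.inputNode to:environment
             format:[engine.inputNode inputFormatForBus:0]];
    [engine connect:environment to:engine.mainMixerNode format:nil];

    // The music player runs through an EQ that the user can tweak.
    [engine connect:musicPlayer to:eq format:musicFile.processingFormat];
    [engine connect:eq to:engine.mainMixerNode format:musicFile.processingFormat];

    // UI sound effects go straight to the engine's main mixer.
    [engine connect:uiPlayer to:engine.mainMixerNode format:nil];
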
So given all of this information
let's now look at a demo
of the environment node.
[ Balls popping ]
So I want to explain
what's happening over here.
In this demo, I am using
SceneKit for the graphics
and SceneKit also comes
with a physics engine.
So this works nicely
with AVAudioEngine.
So I basically have two types of sounds that I'm playing. The first is the "fffuh" sound that plays before any ball is launched. To do that I use a player node: I have the launch sound effect in a buffer and I schedule that buffer on the player node.
But I make use of the
completion handler to know
when the player has
consumed the buffer.
So, when the player lets me know
that it's done with the buffer,
I go ahead and create a SceneKit node, which is a ball, and I also create an AVAudioPlayerNode,
attach it to the
engine and connect
that to the environment node.
So I'm tying a player, a
dedicated player to each ball.
Now the ball is launched into
the world and as it goes about
and collides with other
surfaces, for every collision
that happens SceneKit's
physics engine lets me know
that a collision has happened
with some other surface.
And I get the point of
collision and also the impulse.
So using that, I can go and dig
up the player node that's
tied to the SceneKit node.
I can set the position
on the player based
on where the collision
happened, calculate a volume
for the collision sound
based on the impulse
and then just play the sound.
But you can see now
how, in this setup,
for every ball that's
born into this world,
a new player node
is also created.
So the number of
players is growing
and I'm dynamically attaching it
to the engine and connecting it
to the environment node.
So this setup is very flexible.
[ Balls popping ]
Alright, so let's
get back to slides.
That brings us to
the end of our talk.
So let's quickly summarize all
the things we've seen today.
We started off with
talking about an engine
and how you can create different
nodes, attach them to the engine
and then use the engine
to make connections
between each of these nodes.
We then looked at the
different types of nodes:
the destination node,
which is the output node.
And we talked about
two source nodes,
the player node and
the input node.
The player is the node you use
to push audio data
on the render thread.
We looked at two types of
mixer nodes, the 2D Mixer
and the 3D Mixer and
how source nodes talk
to these mixers using the
AVAudioMixing protocol.
We then looked at effect nodes
and two types of effect nodes:
the AVAudioUnitEffects and AVAudioUnitTimeEffects.
Finally we talked
about node taps
and that's how you pull
data from the render thread.
So I just wanted to point
out that node taps are also
a useful debugging tool.
Let's say that you have
a number of connections
in your application and things
don't sound the way you expect
them to sound.
What you can do is install
node taps at different points
in your chain on different nodes
and just examine the output
of each of these nodes.
And using that you can drill down and find where the problem is.
So in that sense node taps
are a useful debugging tool.
So that brings us to
the end of our session.
I just want to say that
this is the first version
of AVAudioEngine and we
are very excited about it.
So, we'd love to
hear what you think.
Please try it out and
give us your feedback.
If you have any further
questions at a later point,
you can contact Filip,
who's our Graphics
and Game Technologies
Evangelist.