WWDC2017 Session 501

Transcript

>> Thank you.
Good afternoon, everyone.
Welcome to the session "What's
New in Audio?"
I'm Akshatha Nagesh, from the
Audio Team, and today, I would
like to share with you all the
new, exciting features we have
in audio in this year's OS
releases.
I'll begin with a quick overview
of the audio stack.
Audio frameworks offer a wide
variety of APIs, and our main
goal is to help you deliver an
exceptional audio experience to
the end user, through your apps.
At the top, we have the
AVFoundation framework, with
APIs like AVAudioSession,
AVAudioEngine, AVAudioPlayer,
AVAudioRecorder, etcetera.
And these APIs cater to the
needs of most of the apps.
But if you wanted to further
customize the experience, you
could use our other frameworks
and APIs like AUAudioUnits and
Audio Codecs in the Audio
Toolbox framework, the Core MIDI
framework, the Audio HAL
framework, etcetera.
In our last year's talk here at
WWDC, we did a walkthrough of
all these APIs and more,
throughout the stack.
And I highly encourage you to
check that out.
Now, let's see what's on the
agenda for today.
We will see the new features
we've added in some of these
APIs, starting with the ones in
AVFoundation framework.
And that includes,
AVAudioEngine, AVAudioSession,
and the enhancements we have in
AVFoundation on watchOS 4.
Later on, we'll move over to the
Audio Toolbox world, and see the
enhancements in AUAudioUnits and
Audio Formats.
And finally, we'll wrap up
today's session with an update
on Inter-Device Audio Mode.
We also have a few demos along
the way, to show you many of
these new features in action.
So, let's begin with
AVAudioEngine.
And here's a quick recap of the
API.
AVAudioEngine is a powerful
Objective-C and Swift based API
set.
And the main goal of this API
is to simplify dealing with
real-time audio, and to make it
really easy for you to write
code to perform various audio
tasks, ranging from simple
playback, to recording, to even
complex tasks like audio
processing, mixing, and even 3D
audio spatialization.
And again, in our previous
years' talks here at WWDC, we
have covered this API in detail.
So, please check those out if
you're not familiar with this
API.
The Engine manages a graph of
nodes, and a node is the basic
building block of the Engine.
So, here's a sample Engine setup
and this is a classic karaoke
example.
As you can see, there are
various nodes connected
together, to form the processing
graph.
We have the InputNode that is
implicitly connected to the
input hardware and is capturing
the user's voice.
This is being processed through
an EffectNode which could be for
example, an EQ.
We also have a tap installed on
the InputNode, through which we
could be analyzing the user's
voice to see how they're
performing, and based
on that, we could be playing out
some cues to the user through a
PlayerNode.
And we have another PlayerNode
that is playing the backing
track as the user is singing.
All of these signals are mixed
together, in a MixerNode and
finally, given to the OutputNode
which plays it out through the
output hardware.
This is a simple example of the
engine setup, but with all the
nodes and the features the
Engine actually offers, you
could build a lot more complex
processing graph, based on your
app's needs.
So, that was a recap of the
Engine.
Now, let's see what's new in the
Engine this year.
We have a couple of new modes,
namely the Manual Rendering Mode
and Auto Shutdown Mode, and
also, we have some enhancements
in AVAudioPlayerNode, related to
the file and buffer completion
callbacks.
We'll see each of these, one by
one, starting with the Manual
Rendering Mode.
So, this is the karaoke example
that we just saw.
And as you can see, the Input
and the OutputNodes here, are
connected to the audio hardware,
and hence, the Engine
automatically renders in real
time.
The IO here is driven by the
hardware.
But what if you wanted the
Engine to render, not to the
device, but to the app?
And say, at a rate faster than
real time?
So, here is Manual Rendering
Mode which enables you to do
that.
And as you can see, under this
mode, the Input and the
OutputNodes, will not be
connected to any audio device,
and the app will be responsible
for pulling the Engine for
output, and optionally for
providing input to the Engine,
through the InputNode,
PlayerNodes, etcetera.
So, the app drives the IO in
Manual Rendering Mode.
We have two variants under
Manual Rendering.
That is the Offline and Real
Time Manual Rendering Modes.
And again, we'll see each of
these in detail and also, later
in this section, I'll show you a
demo of the Offline Manual
Rendering Mode.
Under the Offline Manual
Rendering Mode, the Engine and
all the nodes in your processing
graph, operate under no
deadlines or real-time
constraints.
And because of this flexibility,
a node may choose to say use a
more expensive signal processing
algorithm when it's offline, or
a node for example, a player
node, may choose to block on the
render thread, until all the
data that it needs as input,
becomes ready.
But these things will not happen
when the nodes are actually
rendering in real time,
as we'll see soon.
So, let's consider a simple
example where we could use the
offline mode.
So, here's an example where an
app wants to process the audio
data in a source file.
It will apply some effects onto
that data, and dump the processed
output to a destination file.
As you can see, there is no
rendering to the device involved
here.
And hence, the app can now use
the Engine in the offline mode.
So, it could set up a very
simple graph in the Engine, like
this.
It could use the PlayerNode to
read the data from the source
file, process it through an
EffectNode, which could be for
example a reverb, and then
pull the data out of the
OutputNode and write the
processed data into a
destination file.
And we will soon see a demo of
this exact setup in a couple of
slides.
There are many more applications
where you can use the offline
mode.
And some of these are listed
here.
Apart from post-processing of
audio files that I just
mentioned, you could also use
offline mode to say mix audio
files.
You could use it for offline
processing using a very CPU
intensive or a higher quality
algorithm, which may not be
feasible to use in real time.
Or simply, you could use the
offline mode, to test, debug, or
tune your live Engine setup.
So, that concludes the offline
mode and as promised, I'll show
you a demo of this in action.
Alright so, what I have here is
an Xcode Playground.
And this is the example where we
will post-process the audio data
in a source file, apply a
reverb effect on the data,
and dump the output into a
destination file.
I have some code snippets here
that I'll run one by one.
So, the first thing I do here,
is set up the Engine to render
in a live mode to the device,
just to see how the source file
sounds without having added any
effect to it.
So, I'm first opening up the
source file, which I want to
read.
And then, I'm creating and
configuring my Engine.
So, I have an Engine and a
PlayerNode.
And I'm going to connect the
player to the main mixer node of
the Engine, which is implicitly
connected to the OutputNode of
the Engine.
Then I'm scheduling the source
file that I have on the player
so that it can read the data
from the source file.
And then I'm starting the Engine
and starting the player.
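
In code, the live-mode setup
described here might look like
this minimal Swift sketch; the
source file path is a
placeholder assumption.

```swift
import AVFoundation

// Minimal sketch of the live-mode setup; the file path is a placeholder.
let sourceFile = try AVAudioFile(forReading: URL(fileURLWithPath: "/path/to/source.caf"))
let format = sourceFile.processingFormat

let engine = AVAudioEngine()
let player = AVAudioPlayerNode()
engine.attach(player)

// The main mixer node is implicitly connected to the engine's output node.
engine.connect(player, to: engine.mainMixerNode, format: format)

// Schedule the source file on the player, then start the engine and the player.
player.scheduleFile(sourceFile, at: nil, completionHandler: nil)
try engine.start()
player.play()
```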
So, as I mentioned, the Engine
is now in a live mode, and this
will render to the device.
So, let's see how the source
file sounds without any effects.
[ Music ]
Okay, so that's how the source
file sounds.
So, now what I'll do is, add a
reverb effect to process the
data.
So, I'll remove the player to
main mixer connection, and I'll
insert the reverb.
So, here I've created a reverb
and I'm setting the parameters
of the reverb.
And in this example, I'm using a
factory preset and wetDryMix of
70%.
And then I'm inserting the
reverb in the playback path,
in between the player and the
main mixer.
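
A sketch of that reverb
insertion, with the preset and
mix values taken from the demo:

```swift
// Break the player -> main mixer connection and insert the reverb in between.
let reverb = AVAudioUnitReverb()
reverb.loadFactoryPreset(.mediumHall)  // an illustrative factory preset
reverb.wetDryMix = 70                  // 70% wet/dry mix, as in the demo

engine.attach(reverb)
engine.disconnectNodeOutput(player)
engine.connect(player, to: reverb, format: format)
engine.connect(reverb, to: engine.mainMixerNode, format: format)
```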
So, now if I run the example, we
can hear how the processed
output sounds.
[ Music ]
Okay, so now at this point, if I
want, I could go ahead and tune
my reverb parameter so that it
sounds exactly as I want.
So, suppose I'm happy with all
the parameters and then now I
want to completely export my
source file into a destination
file.
And this is where the offline
mode comes into picture.
So, what I'll first do is switch
the Engine from the live mode to
the offline mode.
So, what I've done here is I'm
calling an Enable Manual
Rendering Mode API, and I'm
saying, "It needs to be the
offline variant of it."
I'm specifying a format of the
output which I want the Engine
to give me.
And this is, in this example,
same as the format of the input.
And then I'm specifying a
certain maximum number of
frames, which is the maximum
number of frames that you will
ever ask the Engine to render in
a single render call.
And in this example, the value's
4096.
But you can configure this as
you wish.
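
A sketch of switching the Engine
into the offline variant,
assuming the same engine,
player, and sourceFile as above:

```swift
// The engine must be stopped before changing the rendering mode.
engine.stop()
try engine.enableManualRenderingMode(.offline,
                                     format: sourceFile.processingFormat,
                                     maximumFrameCount: 4096)

// Re-schedule the file, since stopping the engine cleared the player's schedule.
player.scheduleFile(sourceFile, at: nil, completionHandler: nil)
try engine.start()
player.play()
```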
So, now if I go ahead and run
this example, nothing will
happen because the Engine is now
in the offline mode, and it's
ready to render.
But of course, it's waiting for
the app to pull the Engine for
output.
So, what we'll do next is to
actually pull the Engine for
output.
So, here I'm creating an output
file to which I want to dump the
processed data.
And I'm creating an output
buffer into which I'll ask the
Engine to render sequentially in
every render call.
And the format of this buffer is
the same format that I specified
when enabling the offline mode.
And then comes the render loop,
where I'll repeatedly pull the
Engine for output.
Now, in this example, I have a
source file which is about three
minutes long.
So, I really don't want to
allocate a huge output buffer
and ask the Engine to render the
entire three minutes of data in
a single render call.
And that's why what I'm doing is
allocating an output buffer of a
very reasonable size, but
repeatedly pulling the Engine
for output into the same buffer,
and then dumping the output to
the destination file.
So, in every iteration, I'll
decide the number of frames to
render in this particular
render call.
And I call the renderOffline
method on the Engine,
asking it to render that many
frames, and giving it
the output buffer that we just
allocated.
And depending on the status: if
the status is success, the data
was rendered successfully and I
can go ahead and write the data
into my output file, and in case
the status is error, then
something went wrong, so you can
check the error code for more
information.
So, finally, when the
rendering's done, I'll stop the
player and I'll stop the Engine.
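
Putting that together, a hedged
Swift sketch of the whole export
step might look like this; the
output path is a placeholder:

```swift
// Offline render loop: pull the engine into a reusable buffer, write to disk.
let outputFile = try AVAudioFile(forWriting: URL(fileURLWithPath: "/path/to/output.caf"),
                                 settings: sourceFile.fileFormat.settings)
let buffer = AVAudioPCMBuffer(pcmFormat: engine.manualRenderingFormat,
                              frameCapacity: engine.manualRenderingMaximumFrameCount)!

while engine.manualRenderingSampleTime < sourceFile.length {
    let framesLeft = sourceFile.length - engine.manualRenderingSampleTime
    let framesToRender = min(AVAudioFrameCount(framesLeft), buffer.frameCapacity)

    switch try engine.renderOffline(framesToRender, to: buffer) {
    case .success:
        try outputFile.write(from: buffer)      // the data rendered successfully
    case .error:
        fatalError("manual rendering failed")   // check the error for details
    default:
        break  // other statuses don't apply in offline mode
    }
}

player.stop()
engine.stop()
```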
So, now if I go ahead and run
this example, the entire source
file will get exported and the
data will be dumped into the
destination file.
So, let's do that.
Okay, so as you may have
observed, the three-minute-long
source file got rendered into an
output file way faster than
real time.
And that is one of the main
applications of the offline
rendering mode.
So, what we'll do next is again,
listen to the source file, and
the destination file, and make
sure that the data was indeed
processed.
So, that is my source file.
And this is my destination file.
So, first we'll listen to the
source file.
[ Music ]
So, as you saw, it is pretty
dry.
And now, the processed file.
[ Music ]
Okay, so as expected, the
processed data has the reverb
effect added to it.
So, that concludes the offline
rendering demo.
And I'll switch back to the
slides.
[ Applause ]
So, as I mentioned, there are
many more applications of the
offline rendering mode.
And I'm also happy to announce
that the sample code for this
example is already available on
our Sessions Homepage, and we'll
show you a link to that homepage
at the end of the presentation.
Now, moving on to the second
variant of the Manual Rendering
Mode: the real-time Manual
Rendering Mode.
As the name itself suggests,
under this mode, the Engine and
all the nodes in your processing
graph, assume that they are
rendering under a real-time
context.
And hence, they honor the
real-time constraints.
That is, they will not make any
kind of blocking calls on the
render thread.
For example, they will not call
into libdispatch, they will not
allocate memory, and they will
not block on a mutex.
And because of this constraint,
suppose the input data for a
node is not ready in time.
The node has no other choice but
to drop the data for that
particular render cycle, or
assume zeros and proceed.
Now, let's see where you would
use the Engine in the real-time
Manual Rendering Mode.
Suppose you have a custom
AUAudioUnit that is in the live
playback path, and within the
internal render block of your
audio unit,
you would like to process the
data that is going through,
using some other audio unit or
audio units.
In that case, you can set up the
Engine to use those other audio
units and process the data in
the real-time Manual Rendering
Mode.
The second example would be,
suppose you wanted to process
the audio data that belongs to a
movie or video, as it is
streaming or playing back.
Because this happens in
real time, you could use the
Engine in real-time Manual
Rendering Mode, to do that audio
processing.
And now, let's consider the
second use case and see how to
set up and use the Engine both
as an example an in code.
So, here's an app that's
receiving an input movie stream
and playing it back in
real time, say to a TV.
But what it wants to do is
process the audio data as it
comes in, before it goes to the
output.
So, now it can use the Engine in
the real-time Manual Rendering
Mode.
So, it could set up a processing
graph like this.
It can provide the input through
the input node, process it
through an effect node, and then
pull the data from the output
node and then play it back to
the device.
Now, let's see a code example on
how to set up and use the Engine
in this mode.
So, here's the code.
And note that setting up the
Engine itself happens from a
non-real-time context, and it's
only the rendering part that
actually happens from a
real-time context.
So, here's the setup code, where
you first create the Engine,
and by default, on creation, the
Engine will be ready to render
to the device until you switch
it over to the Manual Rendering
Mode.
So, you create the Engine, make
your required connections, and
then switch it over to the
Manual Rendering Mode.
So, this is the same API that we
saw in the demo, except that we
are now asking the Engine to
operate under the real-time
Manual Rendering Mode, and
specifying the format for the
output and the maximum number of
frames.
The next thing you do is fetch
and cache something called a
render block.
Now, because the rendering of
the Engine happens from a
real-time context, you will not
be able to use the renderOffline
Objective-C or Swift method that
we saw in the demo.
And that is because it is not
safe to use the Objective-C or
Swift runtime from a real-time
context.
So, instead, the Engine itself
provides you a render block that
you can fetch and cache, and
then later use this render block
to render the Engine from the
real-time context.
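
A minimal sketch of that setup
step, assuming outputFormat is
an AVAudioFormat you've already
created:

```swift
import AVFoundation

// Enable real-time manual rendering (from a non-real-time context)
// and cache the render block for later use. outputFormat is assumed.
let engine = AVAudioEngine()
// ... attach nodes and make the required connections here ...

try engine.enableManualRenderingMode(.realtime,
                                     format: outputFormat,
                                     maximumFrameCount: 4096)

// Cache the block now; the renderOffline method can't be used in real-time mode.
let renderBlock = engine.manualRenderingBlock
```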
The next thing to do is to set
up your input node so
that you can provide your input
data to the Engine.
And here, you specify the format
of the input that you will
provide, and this can be a
different format than the
output.
And you also provide a block
which the Engine will call,
whenever it needs the input
data.
And when this block gets called,
the Engine will let you know how
many input frames it
actually needs.
And at that point, if you have
the data, you'll fill up an
input audio buffer list and
return it to the engine.
But if you don't have data, you
can return nil at this point.
Now note that the input node can
be used both in the offline and
real-time Manual Rendering Mode.
But when you're using it in the
real-time Manual Rendering Mode,
this input block also gets
called from a real-time context,
which means that you need to
take care not to make any kind
of blocking calls within this
input block.
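
A sketch of installing that
input block, assuming
inputFormat is an AVAudioFormat
describing your input data:

```swift
// Supply input to the engine on demand; inputFormat may differ from the output format.
let accepted = engine.inputNode.setManualRenderingInputPCMFormat(inputFormat) { frameCount in
    // Called when the engine needs `frameCount` frames of input. In real-time
    // manual rendering this runs in a real-time context: no blocking calls here.
    // Return a filled AudioBufferList if data is available, or nil otherwise.
    return nil  // placeholder: no input available yet
}
assert(accepted, "the input format was rejected")
```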
The next part of the setup is to
create your output buffer, and
the difference here is that you
will create an AVAudioPCMBuffer
and fetch its audio buffer list,
which is what you'll use in the
real-time render logic.
And finally, you'll go ahead and
start the Engine.
So, now the Engine is all set up
and ready to render, and is
waiting for the app to pull for
the output data.
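
A sketch of that output buffer
preparation:

```swift
// Prepare the output buffer and fetch its underlying AudioBufferList,
// which is what the real-time render code will fill.
let outputBuffer = AVAudioPCMBuffer(pcmFormat: engine.manualRenderingFormat,
                                    frameCapacity: engine.manualRenderingMaximumFrameCount)!
let outputBufferList = outputBuffer.mutableAudioBufferList

try engine.start()
```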
Now here comes the actual render
logic.
And note that this part of the
code is written in C++, and
that is because, as I mentioned,
it's not safe to use the
Objective-C or Swift runtime
from a real-time context.
So, what we're doing first is
calling the render block that we
cached earlier, and asking the
Engine to render a certain
number of frames, and giving it
the outputBufferList that we
created.
And finally, depending on the
status, if you get a success, it
means everything went fine and
the data was rendered to the
output buffer.
But you could also get
insufficientDataFromInputNode as
a status, which means that when
your input block was called by
the Engine for input data, you
did not have enough data and you
returned nil from that input
block.
And note that in this case, if
you have other sources in your
processing graph, for example
some player nodes, those nodes
could have still rendered their
data, so you may still have some
output in your output buffer.
So, you can check the sizes of
your output buffer, to determine
whether or not it has any data.
And of course, you handle the
other status which includes the
error, and that is pretty much
the render logic in real-time
Manual Rendering Mode.
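
For illustration only, here is
that render logic sketched in
Swift, reusing the renderBlock
and outputBufferList from the
earlier sketches; framesToRender
is an assumed per-cycle count.
As noted above, the production
version of this step belongs in
C/C++ so the real-time thread
never touches the Objective-C or
Swift runtime.

```swift
// Illustrative only: production code for this step should be C/C++.
let framesToRender: AVAudioFrameCount = 512  // assumed per-render-cycle frame count
var outputError: OSStatus = noErr

switch renderBlock(framesToRender, outputBufferList, &outputError) {
case .success:
    break  // the data was rendered into outputBufferList
case .insufficientDataFromInputNode:
    break  // the input block returned nil; other sources may still have rendered
default:
    break  // handle .cannotDoInCurrentContext and .error (inspect outputError)
}
```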
Now, lastly, a note on the
render calls.
In the offline mode, because
there are no deadlines or
real-time constraints, you can
use either the Objective-C or
Swift renderOffline method,
or you could use the render
block based render call in order
to render the Engine.
But in real-time Manual
Rendering Mode, you must use the
block based render call.
So, that brings us to the end of
Manual Rendering Mode.
Now, let's see the next new
mode we have in the Engine,
which is the Auto Shutdown Mode.
Now, normally it is the
responsibility of the app to
pause or stop the Engine when it
is not in use in order to
conserve power.
For example, say we have a music
app that is using one of the
player nodes for playing back
some file, and say the user
stops the playback.
Now, the app should not only
pause or stop the player node,
but it should also pause or stop
the Engine in order to prevent
it from running idle.
But in the past, we have seen
that not all the apps actually
do this, and especially that's
true on watchOS.
And hence, we are now adding
a safety net in order to
conserve power, with this
Auto Shutdown Mode.
When the Engine is operating
under this mode, it will
continuously monitor, and if it
detects that the Engine has been
running idle for a certain
duration, it will go ahead and
stop the audio hardware.
And later on, if any of the
sources become active again, it
will start the audio hardware
dynamically.
And all of this happens under
the hood.
And this is the enforced
behavior on watchOS, but it can
also be optionally enabled on
other platforms.
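
On platforms where it is not
enforced, opting in appears to
be a single property on the
Engine; a sketch, assuming the
isAutoShutdownEnabled property
from the 2017 SDKs and an
existing AVAudioEngine instance:

```swift
// Opt into auto shutdown where it isn't already enforced (it always is on watchOS).
engine.isAutoShutdownEnabled = true
```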
Now, next onto the enhancements
in AVAudioPlayerNode.
AVAudioPlayerNode is one of
the source nodes in the Engine,
through which you could schedule
a buffer or a file for playback.
And the existing schedule
methods take a completion
handler, and they call the
completion handler when the data
that you have provided has been
consumed by the player.
We are now adding a new
completion handler and new types
of callbacks, in order for you
to know about the various stages
of completion.
The first new callback type is
the data consumed type.
And this is exactly the same as
the existing completion handler.
That is, when the completion
handler gets called, it means
the data has been consumed by
the player.
So, at that point, if you
wanted, you could recycle that
buffer, or if you have more data
to schedule on the player, you
could do that.
The second type of callback is
the data rendered callback.
And that means that the data
that you provided, has been
rendered when the completion
handler gets called.
And this does not account for
any downstream signal processing
latencies in your processing
graph.
The last type is the data played
back type, which is the most
interesting one.
And this means that when your
completion handler gets called,
the buffer or the file that you
scheduled, has actually finished
playing from the listener's
perspective.
And this is applicable only when
the Engine is rendering to the
device.
And this accounts for all the
signal processing latencies,
downstream of the player in your
processing graph, as well as any
latency in the audio playback
device.
So, as a code example, let's see
a scheduled file method through
which you can schedule a file
for playback.
So, here, I'm scheduling a file
for playback and indicating that
I'm interested to know when the
data has played back, and I'm
providing a completion handler.
So, when the completion handler
gets called, it means that my
file has finished playing, and
at this point, I can say,
"Notify my UI thread to update
the UI," or I can notify my main
thread to go ahead and stop the
Engine, if that's applicable.
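
A sketch of that call, assuming
`file` is an AVAudioFile you've
opened and `player` is a running
AVAudioPlayerNode:

```swift
// Schedule a file with the new .dataPlayedBack completion callback type.
player.scheduleFile(file, at: nil, completionCallbackType: .dataPlayedBack) { type in
    // The file has actually finished playing at the output device.
    DispatchQueue.main.async {
        // update the UI, or stop the engine if appropriate
    }
}
```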
So, that brings us to the end of
the enhancements we have in
AVAudioEngine.
At this point, I would also like
to mention that we will soon be
deprecating the AUGraph API in
the Audio Toolbox framework, in
2018, so please move over to
using AVAudioEngine instead of
AUGraph if you've not already
done that.
Now, let's go to the second set
of APIs in the AVFoundation
framework: AVAudioSession.
AirPlay 2 is a brand-new
technology in this year's iOS,
tvOS, and macOS releases.
And this lets you do multi-room
audio with AirPlay 2 capable
devices, for example the
HomePod.
So, there is a separate
dedicated session called
"Introducing AirPlay 2,"
happening this Thursday at 4:10
p.m., to go over all the
features of this technology.
So, you can catch that if you're
interested in knowing more
details.
Associated with AirPlay 2 is
something called long-form
audio.
And this is a category of
content, for example music or
podcast, which is typically more
than a few minutes long, and
whose playback can be shared
with others.
For example, say you have a
party at home, and you are
playing back a music playlist
through an AirPlay device.
Now, that can be categorized as
long-form audio content.
Now with AirPlay 2 and long-form
audio, we now get a separate
shared route for the long-form
audio apps to the AirPlay 2
devices.
And I'll explain that in a
little more detail.
And now, we have new API in
AVAudioSession for an app to
identify itself as being
long-form and take advantage of
this separate shared audio
route.
So, let's consider the example I
just mentioned.
So, say you have a party at
home, and you're playing back
music to an AirPlay device.
We'll contrast the current
behavior and see how the
behavior changes with long-form
audio routing.
So, here is the current
behavior.
So, you -- the music is now
playing back through the AirPlay
device, and suppose you now get
a phone call.
What happens is, at this point,
your music playback gets
interrupted and it stops.
And the phone call gets routed
to the system audio, which could
be the receiver or the built-in
speaker.
And only when the phone call
ends does the music get a
resumable interruption, and it
resumes the playback.
So, as you can see, a phone call
interrupting your party music is
not really an ideal scenario.
So, we'll now see how the
behavior changes with long-form
audio routing.
So, let's see the same example.
So, now we have music playing
back through an AirPlay 2
capable device.
And then, a phone call comes in.
Now because the phone call is
not a long-form audio, it does
not interrupt your music
playback, and it gets routed
independently to the system
audio without any issues.
So, with long-form audio
routing, the two sessions can
coexist without interrupting
each other, and as you can see,
this is definitely an enhanced
user experience.
[ Applause ]
So, to summarize, with long-form
audio routing, all the apps that
identify themselves as being
long-form, for example music,
podcast, or any other music
streaming app, get a separate
shared route to the AirPlay 2
capable device.
Now, note that there is a
session arbiter in between.
And that ensures that only one
of these apps is playing to the
AirPlay device at a time.
So, these apps cannot mix with
each other.
And all the other apps that use
the system route, which are
non-long-form, can either
interrupt each other or mix with
each other, and they get routed
to the system audio without
interrupting your long-form
audio playback.
Now, let's see how an app can
identify itself as being
long-form and take advantage of
this routing.
So, on iOS and tvOS, the code is
really simple.
You get the shared instance of
your AVAudioSession, and you use
this new API to set your
category as playback and your
route sharing policy as
long-form.
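
In code, using the iOS 11 SDK
names (the route sharing policy
case was later renamed), that
looks roughly like:

```swift
// Identify the app as long-form on iOS/tvOS (iOS 11 SDK naming).
let session = AVAudioSession.sharedInstance()
try session.setCategory(AVAudioSessionCategoryPlayback,
                        mode: AVAudioSessionModeDefault,
                        routeSharingPolicy: .longForm,
                        options: [])
```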
Now, moving over to macOS, the
routing is very similar to iOS
and tvOS.
All the long-form audio apps,
for example iTunes and any
other music streaming app, get
routed to the AirPlay 2 capable
device, and of course, there is
an arbiter in between.
And the other system apps, like
GarageBand, Safari, or a game
app, do not interrupt your
long-form audio apps; they
always mix with each other and
get routed to the default
device.
And to enable the support of
long-form audio routing on
macOS, we are now bringing a
very small subset of
AVAudioSession to macOS.
So, as an app, in order to
identify yourself as being
long-form, you again get the
shared instance of your
AVAudioSession, and set the
route sharing policy as being
long-form.
So, that is the end of the
AVAudioSession enhancements, and
let's now see the last section
in the AVFoundation framework,
that is, the enhancements on
watchOS.
So, we made the AVAudioPlayer
API available in the watchOS 3.1
SDK.
And this is the first time we
get to mention it at WWDC.
And the nice thing about using
AVAudioPlayer for playback is
that it comes associated with
its AVAudioSession, so you could
use the session category
options, like duck others or mix
with others, to describe your
app's behavior.
Now, starting with watchOS 4, we
are exposing more APIs in order
to do recording.
That is, we are making
AVAudioRecorder, and
AVAudioInputNode in
AVAudioEngine, available.
And with these comes the
AVAudioSession record
permission, through which an app
can request the user's
permission to record.
Now, prior to this, you could
use the WatchKit framework to do
the recording, using the Apple
UI.
But now, with these APIs, you
could do the recording with your
own UI.
With AVAudioRecorder, you could
record to a file, or if you
wanted to get access to the
microphone data directly, you
could use the AVAudioInputNode,
and also optionally write it to
a file.
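
A hedged sketch of recording
with your own UI on watchOS 4;
the category call, settings, and
file location are illustrative
assumptions:

```swift
// Ask for record permission, then record to a file with AVAudioRecorder.
let session = AVAudioSession.sharedInstance()
try session.setCategory(AVAudioSessionCategoryRecord)

session.requestRecordPermission { granted in
    guard granted else { return }
    let url = FileManager.default.temporaryDirectory.appendingPathComponent("take.m4a")
    let settings: [String: Any] = [AVFormatIDKey: kAudioFormatMPEG4AAC,
                                   AVSampleRateKey: 44_100,
                                   AVNumberOfChannelsKey: 1]
    if let recorder = try? AVAudioRecorder(url: url, settings: settings) {
        recorder.record()
    }
}
```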
And here are the formats that
are supported on watchOS, both
for playback and recording.
A last note on the recording
policies.
The recording can start only
when the app is in the
foreground, but it is allowed to
continue recording in the
background, and the red
microphone icon will be
displayed at the top so that the
user is aware of it.
And recording in the background
is CPU limited, similar to
workout sessions, and you can
refer to this URL for more
details.
Now, let's move over to the
Audio Toolbox world and look at
the enhancements in AUAudioUnit
and audio formats.
We have two main enhancements in
AUAudio Unit.
And at the end of this section,
we will also show you a demo
with those two new features in
action.
Now, Audio Unit host
applications choose various
strategies in order to decide
how to display the UI for an AU.
They can decide to, say, embed
the AU's UI in their own UI, or
they could present a full-screen
separate UI for the AU.
Now, this presents a challenge,
mainly on iOS devices, because
currently the view sizes are not
defined, and the audio unit is
expected to adapt to any UI size
that the host has actually
chosen.
In order to overcome this
limitation, we're now adding a
way in which the host and the AU
can negotiate with each other
and the AU can inform the host
about all the view
configurations that it actually
supports.
Now, let's see how this
negotiation can take place.
The host first compiles a list
of all the available view
configurations for the AU, and
then hands this list over to the
AU.
The AU can then iterate through
all these available
configurations, and then let the
host know about the
configurations that it actually
supports.
And then, the host can choose
one of the supported
configurations and then it will
let the AU know about the final
selected configuration.
Now, let's see a code example on
how this negotiation takes
place.
We'll first look at the audio
unit extension side.
The first thing the AU has to do
is to override the supported
view configurations method from
the base class.
And this is called by the host
with the list of all the
available configurations.
Then, the AU can iterate through
each of these configurations and
decide which ones it actually
supports.
Now, the configuration itself
contains a width and a height,
which represent the view size.
And also, it has a
hostHasController flag.
And that flag indicates whether
or not the host is presenting
its own controller in this
particular view configuration.
So, depending on all these
factors, an AU can choose
whether it supports that
particular configuration.
Note that there is a wildcard
configuration, which is 0x0, and
that represents a full default
size that the AU can support.
And on macOS, this actually
translates to a separate
full-size, resizable window for
the AU's UI.
So, the AU has its own logic to
decide which configurations it
supports, and then finally, it
compiles a list of the indices
corresponding to the ones that
it supports, and returns this
index set back to the host.
The last thing that the AU has
to do is to override the select
method, which is called by the
host with the configuration that
it has finally selected, and
then the AU can let its view
controller know about the final
selected configuration.
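
Putting the AU side together, a
hedged sketch inside an
AUAudioUnit subclass; the size
threshold and the view
controller hookup are
illustrative assumptions:

```swift
import CoreAudioKit

// Inside an AUAudioUnit subclass.
override public func supportedViewConfigurations(
    _ availableViewConfigurations: [AUAudioUnitViewConfiguration]) -> IndexSet {
    var supported = IndexSet()
    for (index, configuration) in availableViewConfigurations.enumerated() {
        // 0x0 is the wildcard (full default size); also accept anything large enough.
        let isWildcard = configuration.width == 0 && configuration.height == 0
        if isWildcard || (configuration.width >= 400 && configuration.height >= 200) {
            supported.insert(index)
        }
    }
    return supported
}

override public func select(_ viewConfiguration: AUAudioUnitViewConfiguration) {
    // Forward the final choice to the AU's view controller
    // (hypothetical helper on a hypothetical view controller reference).
    auViewController?.adapt(to: viewConfiguration)
}
```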
Now, let's go to the host side
and see what the code looks like.
The host has to compile the list
of available configurations, and
in this example, it is saying
that it has a large and a small
configuration available.
And in the large configuration,
the host is saying it's not
presenting its controller, so
the hostHasController flag is
false.
And in the small configuration,
the host does present its
controller, so the flag is true.
The host then calls the
supported view configurations
method on the AU, and provides
this list of configurations.
And depending on the return set
of indices, it goes ahead and
selects one of the
configurations.
And in this particular example,
the host is just toggling
between the large and the small
configuration.
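
And a matching host-side sketch;
the sizes are illustrative and
audioUnit is an assumed
AUAudioUnit instance:

```swift
import CoreAudioKit

// Compile the available configurations, ask the AU, then pick a supported one.
let large = AUAudioUnitViewConfiguration(width: 800, height: 600, hostHasController: false)
let small = AUAudioUnitViewConfiguration(width: 400, height: 120, hostHasController: true)
let available = [large, small]

let supportedIndices = audioUnit.supportedViewConfigurations(available)
if let index = supportedIndices.first {
    audioUnit.select(available[index])  // e.g. toggle large <-> small later
}
```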
So, that is the end of the
preferred view configuration
negotiation.
Now, let's see the second main
new feature we have, which is
the support for MIDI output in
an audio unit extension.
We now have support for an AU to
emit MIDI output synchronized
with its audio output.
And this is mainly useful if the
host wants to record and edit
both the MIDI performance, as
well as the audio output, from
the AU.
So, the host installs a MIDI
output event block on the AU,
and the AU should call this
block every render cycle and
provide the MIDI output for that
particular render cycle.
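
A sketch of installing that
block from the host; the
recording helper is
hypothetical:

```swift
import AudioToolbox

// Install a MIDI output event block on the AU; it is called every render
// cycle with MIDI synchronized to the AU's audio output.
audioUnit.midiOutputEventBlock = { sampleTime, cable, length, midiBytes in
    let bytes = Array(UnsafeBufferPointer(start: midiBytes, count: length))
    recordMIDI(bytes, at: sampleTime, cable: cable)  // hypothetical recording helper
    return noErr
}
```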
We also have a couple of other
enhancements in the Audio
Toolbox framework.
The first one is related to a
privacy enhancement.
So, starting with the iOS 11
SDK, all the audio unit
extension host apps will need
the inter-app-audio entitlement
to be able to communicate with
audio unit extensions.
And we also have a new API for
an AU to publish a meaningful
short name, so that the host,
say, can use this short name if
it has to display the list of AU
names in a space-constrained
list.
So, that brings us to the end of
all the enhancements in Audio
Toolbox framework, and as
promised, we have a demo to show
these new features in action.
And I call upon Bela for that.
[ Applause ]
>> Thank you, Akshatha and good
afternoon everyone.
My name is Bela Balazs and I am
an engineer on the Core Audio
Team.
Today, we would like to show you
an application of our newly
introduced APIs.
For this purpose, we have
developed an example audio unit,
which has the following
capabilities.
It negotiates its preferred view
configuration with the Audio
Unit host application, it
supports multiple view
configurations, and it uses the
newly introduced MIDI output API
in order to pass on MIDI data to
the Audio Unit host application
for recording purposes.
So, here I have an upcoming
version of GarageBand.
And I have loaded my example
audio unit to a track.
Here you can see the custom view
of my audio unit, together with
the GarageBand keyboard.
In this view configuration, I
rely on the GarageBand keyboard
to play my instrument.
I have mapped out three drum
samples on the keyboard.
I have a kick, I have a snare,
and I have a high hat.
In addition to these, on the
view of my audio unit, I also
have a volume slider to control
the volume of these samples.
However, my audio unit also has
a different view configuration,
and I can switch to it using
this newly added button on the
right -- lower, right section of
the screen.
When I activate that button, I
get taken to the large view of
my audio unit and the GarageBand
keyboard disappears.
When I activate it again, I get
taken back to the small view of
my audio unit.
This is made possible by
GarageBand's publishing all the
available view configurations to
my audio unit, and my audio unit
goes through that list and marks
each of them as supported or
unsupported, and at the end of
this process, GarageBand knows
that my audio unit supports two
view configurations and it can
toggle between them.
In case my audio unit only
supported one view
configuration, then this button
could be hidden by GarageBand,
but my audio unit could still
take full advantage of the
negotiation process to negotiate
the preferred view configuration
for that one view.
In this small view, the
hostHasController flag is set to
true, and that is why the
GarageBand keyboard is visible.
In the larger view
configuration, the GarageBand
keyboard is hidden because that
flag is set to false.
In this view configuration, my
audio unit has its own playing
surface, which I can use to play
my instrument.
I have a kick, a snare, and a
high hat.
And in addition to these three
buttons, I also have a new
button on the right-hand side
called Repeat Note.
And this allows me to repeat
each sample at a certain rate.
And I can set those rates
independently from each other
using the sliders.
And I can toggle each sample in
and out of the drum loop.
[ Drums playing ]
This allows me to easily
construct drum loops that
respect the tempo of my track.
So, let's use the MIDI output
API to record the output of this
audio unit extension.
I have the synchronized rates
button here, which sets my rates
to 110 BPM.
And first, I will record a kick,
snare drum loop.
And then when the recording
wraps around, I will add my high
hats.
This is made possible by
GarageBand's merge recording
feature.
So, let's do just that.
[ Drums playing ]
I will just record four bars of
that.
And then add my high hats.
[ Drums playing ]
My high hats have been added to
the recording.
And now we can go to the track
view and take a look at our
recorded MIDI output.
And I can quantize the track.
And then we can play it back.
And we have the full MIDI
editing capabilities of
GarageBand at our disposal to
construct our drum track.
And this concludes my demo.
Thank you very much for your
attention.
And I would like to hand it back
to my colleague, Akshatha.
Thank you.
[ Applause ]
>> Thank you, Bela.
So, now, onto the last set of
enhancements in the Audio
Toolbox framework, related to
the audio formats.
We now have support for two of
the popular formats, namely the
FLAC and the Opus format.
On the FLAC side, we have the
codec, file, and the streaming
support, and for Opus, we have
the codec and the file I/O
support using the Core Audio
Format (CAF) container.
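
For example, writing a FLAC file
through AVAudioFile appears to
need only the new format ID; a
sketch with placeholder settings
and path:

```swift
import AVFoundation

// Write a FLAC file using the new kAudioFormatFLAC format ID.
let flacSettings: [String: Any] = [AVFormatIDKey: kAudioFormatFLAC,
                                   AVSampleRateKey: 44_100,
                                   AVNumberOfChannelsKey: 2]
let flacFile = try AVAudioFile(forWriting: URL(fileURLWithPath: "/tmp/out.flac"),
                               settings: flacSettings)
```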
From audio formats to spatial
audio formats: those of you who
are interested in 3D audio, AR,
and VR applications may be happy
to know that we now support
ambisonics.
And for those of you who may not
be really familiar with
ambisonics, like me: ambisonics
is also a multichannel format,
but the difference is that the
traditional surround formats
that we know of, for example 5.1
or 7.1, have signals that
actually represent the speaker
layout.
Whereas ambisonics provides a
speaker-independent
representation of the sound
field.
So, it is by nature decoupled
from the playback system.
And at the time of rendering is
when it can be decoded to the
listener's speaker setup.
And this provides more
flexibility for the content
producers.
We now support first-order
ambisonics, which is called the
B-format, and higher-order
ambisonics, where the order N
can range from 1 through 254.
And depending on the order, the
ambisonic channel number can go
from 0 to 65,024.
And we support two of the
popular normalization schemes,
namely the SN3D and N3D streams,
and we support decoding
ambisonics to any arbitrary
speaker layout, and conversion
between the B-format and these
normalized streams.
The last enhancement is on the
AU Spatial Mixer side.
So, this is an Apple built-in
spatial mixer, which is used for
3D audio spatialization.
And the AVAudioEnvironmentNode,
which is a node in
AVAudioEngine, also uses the
Spatial Mixer underneath.
And we now have a new rendering
algorithm in this Spatial Mixer,
called HRTFHQ (high quality).
And this differs from the
current existing HRTF algorithm
in the sense that it has a
better frequency response and
better localization of sources
in the 3D space.
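
Opting into it looks like a
one-line change on a source node
feeding an AVAudioEnvironmentNode;
a sketch, assuming `player` is
such an AVAudioPlayerNode:

```swift
// Select the new HRTF HQ rendering algorithm for this 3D-mixed source.
player.renderingAlgorithm = .HRTFHQ
```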
So, that concludes all the
enhancements in the Audio
Toolbox framework and now, I
hand it over to Torrey to take
it away from here, and give you
an update on inter-device audio
mode.
[ Applause ]
>> Thank you, Akshatha.
I am Torrey Holbrook Walker, and
I'm going to take you home today
with inter-device audio mode, or
if you want to be cool, you can
just say IDAM for short.
And you remember IDAM.
You take your iOS device.
You plug it into your Mac.
You open up Audio MIDI Setup,
and then it shows right up there
in the Audio Devices window, and
there's a button next to it that
says Enable.
And if you click it, boom,
you've immediately got the
capability to record audio
digitally over the USB lightning
cable that came with the device,
and it looks just like a USB
audio input to the Mac host.
So, it uses the same low-latency
driver that's used on macOS for
class-compliant audio devices.
And you've been able to do this
since El Capitan and iOS 9.
Well, today we would like to
wave a fond farewell to IDAM.
So, wave goodbye IDAM.
Goodbye IDAM.
And while you're waving, say
hello to IDAM, Inter Device
Audio and MIDI.
So, this year, we are adding
MIDI to the IDAM configuration,
and that will allow you to send
and receive your musical
instrument data to and from your
iOS device using
the device.
It's class-compliant once again,
so on the iOS side, you will see
a MIDI source and destination
representing the Mac.
On the Mac, you will see a
source and destination
representing your iOS device.
Now, this will require iOS 11,
but you can do it as far back as
OS X El Capitan, because it's a
class-compliant implementation.
And you don't have to do
anything special to get MIDI.
You're going to get it
automatically anytime you enter
the IDAM configuration by
clicking Enable.
Do you need to do anything to
your app to support that?
No. It will just work if it
works with MIDI.
So, while you're in the IDAM
configuration, your device will
be able to charge and sync, but
you will temporarily lose the
ability to photo import and
tether.
You can get that back by
clicking the Disable button or
hot plugging the device on your
Mac.
The input, the audio input, side
of this can be aggregated, so if
you've got multiple iOS devices,
like I do, say, your iPhone and
your iPad and your kid's iPad,
you could say enable IDAM
configuration on all three of
these and aggregate them into a
single, six-channel audio input
device that your digital audio
workstation can see.
And because the MIDI
communication is bidirectional,
you could, for example, send
MIDI to a synthesizer
application and record the audio
back from it.
Or you could just design a MIDI
controller application for an
iPad, that magical piece of
glass, and you could use that to
control your digital audio
workstation.
But talk is cheap, and demos pay
the bills.
So, let's see this in action.
So, before I actually bring up
my demo machine here, I want to
show you the application that
I'm going to use here.
And it is called Fugue Machine.
So, I've got Fugue Machine open
here.
And Fugue Machine is a
multi-playhead MIDI sequencer.
So, that means that you can
actually use one MIDI sequence
and use different playheads,
perhaps moving at different
speeds, in different directions,
and use that to create a complex
arpeggio using phasing and a
timing relationship.
So, I'm just going to play this
pattern here.
And there are a lot of
playheads.
I'll just stop some of them.
So, this is just one.
I'll add another.
Add another.
As you see, we can create
arpeggios very easily this way.
So, there are other patterns
that I could use.
For example, this one's called
"Dotted".
This one's "Triplet."
But we'll stick with this one,
and we're going to use it to
control a project that we're
working on in Logic.
So, now, I'll move over to my
demo machine.
I'm going to click Enable here.
And I'll see it come up as a USB
audio input, and if I look at
the MIDI studio window, I'll
also see that it shows up here
as a MIDI source and destination
that I can use in Logic.
So, if I launch a project that
I've been working on here -- now
this is a short, four-bar loop
that I'm working on for a gaming
scoring screen.
So, after this video game level
is completed, the player can
look at their results and they
will be listening to this loop.
And the loop right now, before
I've added anything to it,
sounds like this.
[ Music ]
Now, I want to add the arpeggio
part over this.
So, what I'm going to do is I'm
just going to double-click here
to add another track.
I'm going to choose an arpeggio,
maybe something like a square.
There we go.
I'll do percussive squares here.
And in the channel strip, you
can actually see an arpeggiator.
I'm not going to need that
because I'm going to play this
with Fugue Machine.
So, if I record-enable this, and
I arm my sequence here, I'll be
able to hear Fugue Machine play
the soft synth here in Logic.
So, I'll solo that.
This is all four playheads
moving at the same time.
I could turn them off.
I could just have one playhead
if I wanted to.
Or as many as all four.
So, I'm going to record this
into my track, and we'll see
what that sounds like in
context.
Oops, sorry about that.
I have to record arm here and
play.
[ Music ]
Okay, so I've recorded my
automation here, and I can use
this automation, and I can play
back from the iPad here.
So, if I listen to this in
context, it sounds like this.
So, now I've got a MIDI start
command going to Fugue Machine.
Fugue Machine's playing our soft
synth here.
And I've got some automation
here for the recording.
And that concludes my demo for
MIDI over IDAM configuration.
Let's head back to the slides.
[ Applause ]
Okay, we've talked about a lot
of things today.
We've talked about enhancements
to AVAudio Engine, including
Manual Rendering which you can
now do offline, or you can do
real-time.
There's AirPlay 2 support.
There'll be an entirely other
session on AirPlay 2 down the
road in the conference.
Please make sure to check that
out if you're interested.
On watchOS 4, you can now record.
We've talked about the
capabilities and the limitations
and policies regarding that.
For AUAudioUnits, you can now
negotiate your view
configurations and you can also
synchronize your MIDI output
with your audio output for your
AU.
We've talked about some other
audio enhancements including new
supported formats, ambisonics,
head related transfer functions,
and we wrapped up with talking
about IDAM, which now stands for
Inter Device Audio and MIDI.
The central URL for information
regarding this particular talk
is here.
And if you're interested in
audio, you may also be
interested in these related
sessions later on in the week.
We thank you very much for your
time and attention, and have a
fantastic conference.
[ Applause ]