WWDC2016 Session 507

Transcript

[ Music ]
>> All right.
[ Applause ]
>> Good afternoon, everyone.
So, how many of you
want to build an app
with really cool audio
effects but thought
that that might be hard?
Or how many of you
want to focus more
on your application's overall
user experience but ended
up spending a little
more time on audio?
Well, we've been working
hard to make it easy for you.
My name is Saleem.
I'm a Craftsman on
the Core Audio Team.
I want to welcome you
to today's session
on delivering an
exceptional audio experience.
So let's look at an overview
of what's in our stack today.
We'll start with our
AVFoundation framework.
We have a wide variety
of high-level APIs
that let you simply
play and record audio.
For more advanced use cases, we
have our AudioToolbox framework.
And you may have
heard of AudioUnits.
These are a fundamental
building block.
If you have to work with
MIDI devices or MIDI data,
we have our CoreMIDI framework.
For game development,
there's OpenAL.
And over the last two years,
we've been adding many new
APIs and features as well.
So you can see there
are many ways
that you can use audio
in your application.
So, our goal today
is to help guide you
to choosing the right API
for your application's needs.
But don't worry, we also
have a few new things
to share with you as well.
So, on the agenda
today, we'll first look
at some essential setup steps
for a few of our platforms.
Then, we'll dive straight into
simple and advanced playback
and recording scenarios.
We'll talk a bit about
multichannel audio.
And then later in
the presentation,
we'll look at real-time audio --
how you can build your
own effects, instruments,
and generators -- and then
we'll wrap up with MIDI.
So, let's get started.
iOS, watchOS, and tvOS all
have really rich audio features
and numerous routing
capabilities.
So users can make calls,
play music, play games,
work with various
productivity apps.
And they can do all of this
mixed in or independently.
So the operating system manages
a lot of default audio behaviors
in order to provide a
consistent user experience.
So let's look at a diagram
showing how audio is a
managed service.
So you have your device,
and it has a couple
of inputs and outputs.
And then there's the
operating system.
It may be hosting many apps,
some of which are using audio.
And lastly, there's
your application.
So AVAudioSession is your
interface, as a developer,
for expressing your
application's needs to the system.
Let's go into a bit
more detail about that.
Categories express
the application's
highest-level needs.
We have modes and
category options
which help you further customize
and specialize your application.
If you're into some
more advanced use cases,
such as input selection,
you may want to be able
to choose the front microphone
on your iPhone instead
of the bottom.
If you're working with
multichannel audio
and multichannel content on
tvOS, you may be interested
in things like channel count.
If you had a USB audio device
connected to your iPhone,
you may be interested in
things like sample rate.
So when your application
is ready and configured
to use audio, it informs the
system to apply the session.
So this will configure
the device's hardware
for your application's needs
and may actually result
in interrupting other audio
applications on the system,
mixing with them, and/or
ducking their volume level.
So let's look at some
of the essential steps
when working with
AVAudioSession.
The first step is to sign
up for notifications.
And the three most important
notifications are the
interruption, route change,
and mediaServicesWereReset
notifications.
You can sign up for these
notifications before you
activate your session.
And in a few slides, I'll show
you how you can manage them.
Next, based on your
application's high-level needs,
you'll want to set the
appropriate category mode
and options.
So, let's look at
a few examples.
Let's just say I was
building a productivity app.
And in that application, I
want to play a simple sound
when the user saves
their document.
Here, we can see that audio
enhances the experience
but it's not necessarily
required.
So, in this case, I'd want
to use the AmbientCategory.
This category obeys
the ringer switch.
It does not play audio
in the background,
and it'll always
mix in with others.
If I was building a podcast app,
I'd want to use the
PlaybackCategory,
the SpokenAudio mode.
And here, we can see that this
application will interrupt
other applications
on the system.
Now if you want your
audio to continue playing
in the background,
you'll also have
to specify the background
audio key in your info.plist.
And this is essentially a
session property as well.
It's just expressed
through a different means.
For your navigation app,
let's look at how you can
configure the navigation prompt.
Here, you'd want to use
the PlaybackCategory,
the DefaultMode.
And there are a few
options of interest here.
You'd want to use both
the InterruptSpokenAudio
AndMixWithOthers as
well as the duckOthers.
So, if you're listening to
a podcast while navigating
and that navigation prompt
comes up saying, "Oh,
turn left in 500 feet,"
it'll actually interrupt
the podcast app.
If you're listening to music,
it'll duck the music's volume
level and mix in with it.
For this application,
you'll also want
to use a background
audio key as well.
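As a rough sketch, that navigation-prompt configuration might look like this in code (iOS 10-era constant names; everything else here is illustrative):

```swift
import AVFoundation

// Hypothetical setup for a navigation app's prompt session.
let session = AVAudioSession.sharedInstance()
do {
    try session.setCategory(AVAudioSessionCategoryPlayback,
                            mode: AVAudioSessionModeDefault,
                            options: [.interruptSpokenAudioAndMixWithOthers, .duckOthers])
} catch {
    print("Failed to configure the session: \(error)")
}
// The "audio" background mode in Info.plist is still required
// for prompts to play while the app is in the background.
```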
So, next, let's look at how
we can manage activation
of our session.
So what does it mean
to go active?
Activating your session
informs the system
to configure the hardware
for your application's needs.
So let's say, for example,
I had an application
whose category was set
to PlayAndRecord.
When I activate my session,
it'll configure the hardware
to use input and output.
Now, what happens if I activate
my session while listening
to music from the music app?
Here, we can see that
the current state
of the system is set
for playback only.
So, when I activate my
session, I inform the system
to configure the hardware
for both input and output.
And since I'm in a
non-mixable application,
I've interrupted the music app.
So let's just say my application
makes a quick recording.
Once I'm done, I
deactivate my session.
And if I choose to notify others
that I've deactivated
my session,
we'll see that the music
app would resume playback.
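A minimal sketch of that activate, record, deactivate flow (the function body and comments are illustrative):

```swift
import AVFoundation

func makeQuickRecording() {
    let session = AVAudioSession.sharedInstance()
    do {
        // Activate just before recording; this may interrupt non-mixable apps.
        try session.setActive(true)

        // ... perform the short recording here ...

        // Deactivate when done and let other audio (e.g. the Music app) resume.
        try session.setActive(false, with: .notifyOthersOnDeactivation)
    } catch {
        print("Session activation error: \(error)")
    }
}
```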
Next, let's look at how we can
handle the notifications we
signed up for.
We'll first look at the
interruption notification,
and we'll examine a case
where your application
does not have playback UI.
The first thing I do is I
get the interruptionType.
And if it's the beginning
of an interruption,
your session is already
inactive.
So your players have been
paused, and you'll use this time
to update any internal
state that you have.
When you receive the end
interruption, you go ahead
and activate your session,
start your players,
and update your internal state.
Now, let's see how that differs
for an application
that has playback UI.
So when you receive the
begin interruption --
again, your session
is inactive --
you update the internal state,
as well as your UI this time.
So if you have a Play/Pause
button, you'd want to go ahead
and set that to "play"
at this time.
And now when you receive the end
interruption, you should check
and see if the shouldResume
option was passed in.
If that was passed in,
then you can go ahead
and activate your
session, start playback,
and update your internal
state and UI.
If it wasn't passed
in, you should wait
until the user explicitly
resumes playback.
It's important to
note that you can have
unmatched interruptions.
So, not every begin interruption
is followed by a matching end.
And an example of this are
media-player applications
that interrupt each other.
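Here's a hedged sketch of an interruption handler for an app with playback UI, using the iOS 10-era notification and userInfo key names (the controller class and its player property are illustrative):

```swift
import AVFoundation

final class PlaybackController: NSObject {
    var player: AVAudioPlayer?          // created elsewhere; illustrative

    func observeInterruptions() {
        NotificationCenter.default.addObserver(self,
                                               selector: #selector(handleInterruption(_:)),
                                               name: .AVAudioSessionInterruption,
                                               object: AVAudioSession.sharedInstance())
    }

    @objc func handleInterruption(_ notification: Notification) {
        guard let info = notification.userInfo,
              let typeValue = info[AVAudioSessionInterruptionTypeKey] as? UInt,
              let type = AVAudioSessionInterruptionType(rawValue: typeValue) else { return }

        if type == .began {
            // The session is already inactive and players are paused;
            // update internal state and flip the Play/Pause button to "play".
        } else if type == .ended {
            let optionsValue = info[AVAudioSessionInterruptionOptionKey] as? UInt ?? 0
            let options = AVAudioSessionInterruptionOptions(rawValue: optionsValue)
            if options.contains(.shouldResume) {
                try? AVAudioSession.sharedInstance().setActive(true)
                player?.play()
                // Update internal state and UI to "playing".
            }
            // Otherwise, wait for the user to explicitly resume playback.
        }
    }
}
```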
Now, let's look at how we
can handle route changes.
Route changes happen for
a number of reasons --
the connected devices
may have changed,
a category may have changed,
you may have selected a
different data source or port.
So, the first thing you do is
you get the routeChangeReason.
If you receive a reason that
the old device is unavailable
in your media-playback
app, you should go ahead
and stop playback at this time.
An example of this is if
your user is streaming music
to the headsets and they
unplug the headsets.
They don't expect that
the music resumes playback
through the speakers right away.
For more advanced use cases,
if you receive the
oldDeviceUnavailable
or newDeviceAvailable
routeChangeReason, you may want
to re-evaluate certain
session properties
as it applies to
your application.
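A similar sketch for the route-change handler, again with illustrative names:

```swift
import AVFoundation

final class RouteObserver: NSObject {
    var player: AVAudioPlayer?   // illustrative

    func observe() {
        NotificationCenter.default.addObserver(self,
                                               selector: #selector(handleRouteChange(_:)),
                                               name: .AVAudioSessionRouteChange,
                                               object: AVAudioSession.sharedInstance())
    }

    @objc func handleRouteChange(_ notification: Notification) {
        guard let info = notification.userInfo,
              let reasonValue = info[AVAudioSessionRouteChangeReasonKey] as? UInt,
              let reason = AVAudioSessionRouteChangeReason(rawValue: reasonValue) else { return }

        switch reason {
        case .oldDeviceUnavailable:
            // Headphones were unplugged; stop rather than continue on the speaker.
            player?.pause()
        case .newDeviceAvailable:
            // A new output appeared; re-evaluate session properties
            // (sample rate, channel count, preferred input) as they apply to you.
            break
        default:
            break
        }
    }
}
```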
Lastly, let's look at how we
can handle the
mediaServicesWereReset notification.
This notification is
rare, but it does happen
because daemons aren't
guaranteed to run forever.
The important thing
to note here is
that your AVAudioSession
sharedInstance is still valid.
You will need to reset your
category mode and other options.
You'll also need to destroy and
recreate your player objects,
such as your AVAudioEngine,
remote I/Os,
and other player
objects as well.
And we provide a means for
testing this on devices by going
to Settings, Developer,
Reset Media Services.
OK, so that just recaps
the four steps for working
with AVAudioSession --
the essential steps.
You sign up for notifications.
You set the appropriate
category mode and options.
You manage activation
of your session.
And you handle the
notifications.
So let's look at some
new stuff this year.
New this year, we're adding
two new category options --
allowAirPlay and
allowBluetoothA2DP --
to the PlayAndRecord category.
So, that means that you can now
use a microphone while playing
to a Bluetooth and
AirPlay destination.
So if this is your
application's use case, go ahead
and set the category
and the options,
and then let the
user pick the route
from either an MPVolumeView
or Control Center.
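A quick sketch of opting into those new options (the option names are as introduced here; the rest is illustrative):

```swift
import AVFoundation

// Record from the microphone while playing to Bluetooth A2DP or AirPlay.
let session = AVAudioSession.sharedInstance()
do {
    try session.setCategory(AVAudioSessionCategoryPlayAndRecord,
                            mode: AVAudioSessionModeDefault,
                            options: [.allowBluetoothA2DP, .allowAirPlay])
    try session.setActive(true)
} catch {
    print("Session setup failed: \(error)")
}
// Route selection itself is left to the user, via MPVolumeView or Control Center.
```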
We're also adding a new
property for VoIP apps
on our
AVAudioSessionPortDescription
that'll determine whether
or not the current
route has hardware voice
processing enabled.
So if your user is
connected to a CarPlay system
or a Bluetooth HFP headset that
has hardware voice processing,
you can use this property
to disable your software
voice processing
so you're not double-processing
the audio.
If you're already using Apple's
built-in voice processing IO
unit, you don't have
to worry about this.
And new this year, we
also introduced the
CallKit framework.
So, to see how you can enhance
your VoIP apps with CallKit,
we had a session
earlier this week.
And if you missed that, you can
go ahead and catch it online.
So that's just an
overview of AVAudioSession.
We've covered a lot
of this stuff in-depth
in previous sessions.
So we encourage you
to check those out,
as well as a programming
guide online.
So, moving on.
So you set up AVAudioSession
if it's applicable
to your platform.
Now, let's look at how
you can simply play
and record audio in
your application.
We'll start with the
AVFoundation framework.
There are a number of classes
here that can handle the job.
We have our AVAudioPlayer,
AVAudioRecorder,
and AVPlayer class.
AVAudioPlayer is the simplest
way to play audio from a file.
We support a wide
variety of formats.
We provide all the basic
playback operations.
We also support some
more advanced operations,
such as setting volume level.
You get metering on
a per-channel basis.
You can loop your playback,
adjust the playback rate,
work with stereo panning.
If you're on iOS or
tvOS, you can work
with channel assignments.
If you had multiple files
you wanted to play back,
you can use multiple
AVAudioPlayer objects
and you can synchronize
your playback as well.
And new this year, we're adding
a method that lets you fade
to volume level over
a specified duration.
So let's look at a code example
of how you can use AVAudioPlayer
in your application.
Let's just say I was working
and building a simple
productivity app again
where I want to play an
acknowledgement sound
when the user saves
their document.
In this case, I have an
AVAudioPlayer and a URL
to my asset in my class.
Now in my setup function,
I go ahead
and I create the AVAudioPlayer
object with the contents
of my URL and I prepare
the player for playback.
And then, in my saveDocument
function, I may do some work
to see whether or not the
document was saved successfully.
And if it was, then I
simply play my file.
Really easy.
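Here's a rough sketch of that save-sound flow, plus the new fade method (the class, asset name, and save logic are illustrative):

```swift
import AVFoundation

final class DocumentController {
    // Hypothetical asset URL and player, mirroring the walkthrough above.
    private let acknowledgeURL = Bundle.main.url(forResource: "DocumentSaved",
                                                 withExtension: "caf")!
    private var player: AVAudioPlayer?

    func setup() {
        player = try? AVAudioPlayer(contentsOf: acknowledgeURL)
        player?.prepareToPlay()
    }

    func saveDocument() {
        let savedSuccessfully = true   // stand-in for the real save logic
        if savedSuccessfully {
            player?.play()
        }
    }

    func fadeOut() {
        // New this year: fade to a volume level over a specified duration.
        player?.setVolume(0.0, fadeDuration: 2.0)
    }
}
```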
Now, let's look at
AVAudioRecorder.
This is the simplest way
to record audio to a file.
You can record for a specified
duration, or you can record
until the user explicitly stops.
You get metering on
a per-channel basis,
and we support a wide
variety of encoded formats.
So, to set up a format,
we use the Recorder
Settings Dictionary.
And this is a dictionary
of keys that lets you
set various format parameters
such as sample rate
and number of channels.
If you're working with Linear
PCM data, you can adjust things
like the bit depth
and endian-ness.
If you're working with encoded
formats, you can adjust things
such as quality and bit rate.
So, let's look at a code example
of how you can use
AVAudioRecorder.
So the first thing I do is
I create my format settings.
Here, I'm creating an AAC file
with a really high bit rate.
And then the next thing I do --
I go ahead and create my
AVAudioRecorder object
with a URL to the file location
and the format settings
I've just defined.
And in this example, I have a
simple button that I'm using
to toggle the state
of the recorder.
So when I press the button,
if the recorder is recording,
I go ahead and stop recording.
Otherwise, I start my recording.
And I can use the
recorder's built-in meters
to provide feedback to the UI.
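A sketch of that recorder setup and button toggle might look like this (names and values are illustrative):

```swift
import AVFoundation

final class Recorder {
    private var recorder: AVAudioRecorder?

    func setup(outputURL: URL) throws {
        // AAC at a high bit rate, as in the walkthrough above.
        let settings: [String: Any] = [
            AVFormatIDKey: Int(kAudioFormatMPEG4AAC),
            AVSampleRateKey: 44_100.0,
            AVNumberOfChannelsKey: 2,
            AVEncoderBitRateKey: 256_000
        ]
        recorder = try AVAudioRecorder(url: outputURL, settings: settings)
        recorder?.isMeteringEnabled = true
        recorder?.prepareToRecord()
    }

    // Toggled from a simple record button.
    func recordButtonPressed() {
        guard let recorder = recorder else { return }
        if recorder.isRecording {
            recorder.stop()
        } else {
            recorder.record()
        }
    }

    // The recorder's built-in meters can drive UI feedback.
    func currentLevel() -> Float {
        recorder?.updateMeters()
        return recorder?.averagePower(forChannel: 0) ?? -160
    }
}
```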
Lastly, let's look at AVPlayer.
AVPlayer works not
only with local files
but streaming content as well.
You have all the standard
controls available.
We also provide built-in
user interfaces
that you can use directly,
such as the AVPlayerView
and the AVPlayerViewController.
And AVPlayer also works
with video content as well.
And this year, we added a number
of new features to AVPlayer.
So if you want to find out
what we did, you can check
out the Advances in
AVFoundation Playback.
And if you missed that, you can
go ahead and catch it online.
OK, so what we've seen so far is
just some very simple examples
of playback and recording.
So now let's look at some
more advanced use cases.
Advanced use cases include
playing back not only from files
but working with buffers
of audio data as well.
You may be interested in
doing some audio processing,
applying certain effects
and mixing together
multiple sources.
Or you may be interested
in implementing 3D audio.
So, some examples of this
are you're building a classic
karaoke app, you want
to build a deejay app
with really amazing effects,
or you want to build a game
and really immerse
your user in it.
So, for such advanced use
cases, we have a class
in AVFoundation called
AVAudioEngine.
AVAudioEngine is a powerful,
feature-rich Objective-C
and Swift API.
It's a real-time audio system,
and it simplifies working
with real-time audio
by providing a non-real-time
interface for you.
So this hides a lot of
the complexities of dealing
with real-time audio,
and it makes your code
that much simpler.
The Engine manages
a graph of nodes,
and these nodes let you
play and record audio.
You can connect these
nodes in various ways
to form many different
processing chains
and perform mixing.
You can capture audio
at any point
in the processing chain as well.
And we provide a special node
that lets you spatialize
your audio.
So, let's look
at the fundamental building
block -- the AVAudioNode.
We have three types of nodes.
We have source nodes, which
provide data for rendering.
So these could be
your PlayerNode,
an InputNode, or a sampler unit.
We have processing nodes that
let you process audio data.
So these could be
effects such as delays,
distortions, and mixers.
And we have the destination
node,
which is the termination
node in your graph,
and it's connected directly
to the output hardware.
So let's look at a sample setup.
Let's just say I'm building
a classic karaoke app.
In this case, I have
three source nodes.
I'm using the InputNode to
capture the user's voice.
I'm using a PlayerNode
to play my Backing Track.
I'm using another PlayerNode
to play other sound effects
and feedback cues to the user.
In terms of processing nodes,
I may want to apply a specific
EQ to the user's voice.
And then I'm going
to use the mixer
to mix all three sources
into a single output.
And then the single
output will then be played
through the OutputNode and then
out to the output hardware.
I can also capture the user's
voice and do some analysis
to see how well they're
performing
by installing a TapBlock.
And then, based on that,
I can conditionally
schedule these feedback cues
to be played out.
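As a rough sketch, the karaoke graph just described could be wired up like this (node names, formats, and tap sizes are illustrative):

```swift
import AVFoundation

let engine = AVAudioEngine()
let backingPlayer = AVAudioPlayerNode()
let effectsPlayer = AVAudioPlayerNode()
let voiceEQ = AVAudioUnitEQ(numberOfBands: 4)

engine.attach(backingPlayer)
engine.attach(effectsPlayer)
engine.attach(voiceEQ)

let input = engine.inputNode          // the user's voice
let mixer = engine.mainMixerNode
let voiceFormat = input.outputFormat(forBus: 0)

// Voice -> EQ -> mixer; both players -> mixer; mixer -> output is implicit.
engine.connect(input, to: voiceEQ, format: voiceFormat)
engine.connect(voiceEQ, to: mixer, format: voiceFormat)
engine.connect(backingPlayer, to: mixer, format: nil)
engine.connect(effectsPlayer, to: mixer, format: nil)

// Tap the voice to analyze how well the user is performing.
input.installTap(onBus: 0, bufferSize: 4096, format: voiceFormat) { buffer, time in
    // ... analyze the PCM buffer, then conditionally schedule feedback cues ...
}

do {
    try engine.start()
    backingPlayer.play()
} catch {
    print("Engine start failed: \(error)")
}
```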
So let's now look at
a sample game setup.
The main node of interest
here is the EnvironmentNode,
which simulates a 3D space
and spatializes its
connected sources.
In this example, I'm using
the InputNode as well
as a PlayerNode as my source.
And you can also adjust
various 3D mixing properties
on your sources as well,
such as position, occlusion.
And in terms of the
EnvironmentNode,
you can also adjust
properties there,
such as the listenerPosition as
well as other reverb parameters.
So this 3D Space can then be
mixed in with a Backing Track
and then played through
the output.
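And a sketch of the game setup with an EnvironmentNode might look like this (positions, the reverb preset, and the rendering algorithm are illustrative choices):

```swift
import AVFoundation

let engine = AVAudioEngine()
let environment = AVAudioEnvironmentNode()
let monoPlayer = AVAudioPlayerNode()          // a mono source to be spatialized

engine.attach(environment)
engine.attach(monoPlayer)

// Mono source into the environment; environment into the main mixer.
let monoFormat = AVAudioFormat(standardFormatWithSampleRate: 44_100, channels: 1)
engine.connect(monoPlayer, to: environment, format: monoFormat)
engine.connect(environment, to: engine.mainMixerNode, format: nil)

// 3D mixing properties on the source...
monoPlayer.position = AVAudio3DPoint(x: 2, y: 0, z: -5)
monoPlayer.renderingAlgorithm = .HRTF

// ...and on the environment itself.
environment.listenerPosition = AVAudio3DPoint(x: 0, y: 0, z: 0)
environment.reverbParameters.enable = true
environment.reverbParameters.loadFactoryReverbPreset(.largeHall)
```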
So before we move any
further with AVAudioEngine,
I want to look at some
fundamental core classes
that the Engine uses
extensively.
I'll first start
with AVAudioFormat.
So, AVAudioFormat
describes the data format
in an audio file or stream.
So we have our standard
format, common formats,
as well as compressed formats.
This class also contains
an AVAudioChannelLayout
which you may use when dealing
with multichannel audio.
It's a modern interface
to our
AudioStreamBasicDescription
structure and our
AudioChannelLayout structure.
Now, let's look at
AVAudioBuffer.
This class has two subclasses.
It has the AVAudioPCMBuffer,
which is used to hold PCM data.
And it has the
AVAudioCompressedBuffer,
which is used for holding
compressed audio data.
Both of these classes
provide a modern interface
to our AudioBufferList and our
AudioStreamPacketDescription.
Let's look at AVAudioFile.
This class lets you read and
write from any supported format.
It lets you read data into
PCM buffers and write data
into a file from PCM buffers.
And in doing so,
it transparently handles
any encoding and decoding.
And it supersedes now our
AudioFile and ExtAudioFile APIs.
Lastly, let's look
at AVAudioConverter.
This class handles
audio format conversion.
So, you can convert between one
form of PCM data to another.
You can also convert between
PCM and compressed audio formats
in which it handles the
encoding and decoding for you.
And this class supersedes
our AudioConverter API.
And new this year, we've also
added a minimum phase sample
rate converter algorithm.
So you can see that all these
core classes really work
together when interfacing
with audio data.
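To make that concrete, here's a small sketch that reads a file into a PCM buffer with AVAudioFile and converts it with AVAudioConverter (the mono downmix target is an illustrative choice):

```swift
import AVFoundation

// A minimal sketch; the URL and target format are illustrative.
func loadAndDownmix(url: URL) throws -> AVAudioPCMBuffer {
    // AVAudioFile decodes transparently while reading into a PCM buffer.
    let file = try AVAudioFile(forReading: url)
    let source = AVAudioPCMBuffer(pcmFormat: file.processingFormat,
                                  frameCapacity: AVAudioFrameCount(file.length))!
    try file.read(into: source)

    // AVAudioConverter handles PCM-to-PCM conversion (here: downmix to mono).
    let monoFormat = AVAudioFormat(standardFormatWithSampleRate: file.processingFormat.sampleRate,
                                   channels: 1)!
    let converter = AVAudioConverter(from: file.processingFormat, to: monoFormat)!
    let output = AVAudioPCMBuffer(pcmFormat: monoFormat,
                                  frameCapacity: AVAudioFrameCount(file.length))!
    try converter.convert(to: output, from: source)
    return output
}
```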
Now, let's look at how
these classes then interact
with AVAudioEngine.
So if you look at
AVAudioNode, it has both input
and output AVAudioFormats.
If you look at the PlayerNode,
it can provide data to the Engine
from an AVAudioFile or
an AVAudioPCMBuffer.
When you install a NodeTap, the
block provides audio data to you
in the form of PCM buffers.
You can do analysis with
it, or you can save it
to a file using an AVAudioFile.
If you're working with
a compressed stream,
you can break it down
into compress buffers,
use an AVAudioConverter to
convert it to PCM buffers,
and then provide it to the
Engine through the PlayerNode.
So, new this year,
we're bringing a subset
of AVAudioEngine to the Watch.
Along with that, we're including
a subset of AVAudioSession,
as well as all the core
classes you've just seen.
So I'm sure you'd love
to see a demo of this.
So we have that for you.
We built a simple game
using both SceneKit
and AVAudioEngine directly.
And in this game, what I'm doing
is I'm launching an asteroid
into space.
And at the bottom of the
screen, I have a flame.
And I can control the flame
using the Watch's Digital Crown.
And now if the asteroid
makes contact with the flame,
it plays this really
loud explosion sound.
So, let's see this.
[ Explosions ]
I'm sure this game, like,
defies basic laws of physics
because it's playing
audio in space.
Right? And that's not possible.
[ Applause ]
All right, so let me just go
over quickly the
AVAudioEngine code in this game.
So, in my class, I
have my AVAudioEngine.
And I have two PlayerNodes --
one for playing the
explosion sound,
and one for playing
the launch sound.
I also have URLs
to my audio assets.
And in this example,
I'm using buffers
to provide data to the engine.
So, let's look at how
we set up the engine.
The first thing I
do is I go ahead
and I attach my PlayerNodes.
So I attach the explosionPlayer
and the launchPlayer.
Next, I'm going to
use the core classes.
I'm going to create an AVAudioFile
from the URL of my assets.
And then, I'm going to
create a PCM buffer.
And I'm going to read the data
from the file into
the PCM buffer.
And I can do this because my
audio files are really short.
Next, I'll go ahead and
make the connections
between the source nodes
and the engine's main mixer.
So, when the game is
about to start, I go ahead
and I start my engine
and I start my players.
And when I launch an asteroid, I
simply schedule the launchBuffer
to be played on the
launchPlayer.
And when the asteroid makes
contact with the flame,
I simply schedule the
explosionBuffer to be played
on the explosionPlayer.
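Pulling that walkthrough together, a sketch of the game's audio code might look like this (the class and asset handling are illustrative, not the actual sample code):

```swift
import AVFoundation

final class GameAudio {
    let engine = AVAudioEngine()
    let explosionPlayer = AVAudioPlayerNode()
    let launchPlayer = AVAudioPlayerNode()
    var explosionBuffer: AVAudioPCMBuffer?
    var launchBuffer: AVAudioPCMBuffer?

    func setup(explosionURL: URL, launchURL: URL) throws {
        engine.attach(explosionPlayer)
        engine.attach(launchPlayer)

        // The files are short, so read each one entirely into a PCM buffer.
        explosionBuffer = try GameAudio.buffer(from: explosionURL)
        launchBuffer = try GameAudio.buffer(from: launchURL)

        engine.connect(explosionPlayer, to: engine.mainMixerNode, format: explosionBuffer?.format)
        engine.connect(launchPlayer, to: engine.mainMixerNode, format: launchBuffer?.format)
    }

    func gameWillStart() throws {
        try engine.start()
        explosionPlayer.play()
        launchPlayer.play()
    }

    func asteroidLaunched() {
        if let buffer = launchBuffer {
            launchPlayer.scheduleBuffer(buffer, completionHandler: nil)
        }
    }

    func asteroidHitFlame() {
        if let buffer = explosionBuffer {
            explosionPlayer.scheduleBuffer(buffer, completionHandler: nil)
        }
    }

    private static func buffer(from url: URL) throws -> AVAudioPCMBuffer {
        let file = try AVAudioFile(forReading: url)
        let buffer = AVAudioPCMBuffer(pcmFormat: file.processingFormat,
                                      frameCapacity: AVAudioFrameCount(file.length))!
        try file.read(into: buffer)
        return buffer
    }
}
```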
So, with a few lines of code,
I'm able to build a really
rich audio experience
for my games on watchOS.
And that was a simple
example, so we can't wait
to see what you come up with.
So, before I wrap up with
AVAudioEngine, I want to talk
about multichannel audio
and specifically how
it relates to tvOS.
So, last October, we
introduced tvOS along
with the 4th generation
Apple TV.
And so this is the first time
we can talk about it at WWDC.
And one of the interesting
things about audio
on Apple TV is that many
users are already connected
to multichannel hardware
since many home theater
systems already support 5.1
or 7.1 surround sound systems.
So, today, I just want to go
over how you can render
multichannel audio
using AVAudioEngine.
So, first, let's review the
setup with AVAudioSession.
I first set my category
and other options,
and then I activate my session
to configure the hardware
for my application's needs.
Now, depending on the
rendering format I want to use,
I'll first need to check and see
if the current route
supports it.
And I can do that by
checking if my desired number
of channels are less
than or equal
to the maximum number
of output channels.
And if it is, then
I can go ahead
and set my preferred
number of output channels.
I can then query back the
actual number of channels
from the session and then
use that moving forward.
Optionally, I can
look at the array
of ChannelDescriptions
on the current port.
And each ChannelDescription
gives me a channelLabel
and a channelNumber.
So I can use this information
to figure out the exact format
and how I can map my content
to the connected hardware.
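A sketch of that channel negotiation (the desired channel count is illustrative):

```swift
import AVFoundation

let session = AVAudioSession.sharedInstance()
let desiredChannelCount = 6   // e.g. 5.1 content

do {
    // Only ask for multichannel output if the current route can deliver it.
    if desiredChannelCount <= session.maximumOutputNumberOfChannels {
        try session.setPreferredOutputNumberOfChannels(desiredChannelCount)
    }
} catch {
    print("Could not set preferred channel count: \(error)")
}

// Use the actual channel count the session gives back.
let channelCount = session.outputNumberOfChannels

// Optionally inspect the channel layout of the current output port.
if let channels = session.currentRoute.outputs.first?.channels {
    for channel in channels {
        print(channel.channelName, channel.channelLabel, channel.channelNumber)
    }
}
```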
Now, let's switch gears and
look at the AVAudioEngine setup.
There are two use cases here.
The first use case is
if you already have
multichannel content.
And the second use case is
if you have mono content
and you want to spatialize it.
And this is typically
geared towards games.
So, in the first use case,
I have multichannel content
and multichannel hardware.
I simply get the
hardware format.
I set that as my connection
between my Mixer
and my OutputNode.
And on the source side, I get
the content format and I set
that as my connection between
my SourceNode and the Mixer.
And here, the Mixer handles
the channel mapping for you.
Now, in the second use case, we
have a bunch of mono sources.
And we'll use the
EnvironmentNode
to spatialize them.
So, like before, we get
the hardware format.
But before we set the compatible
format, we have to map it to one
that the EnvironmentNode
supports.
And for a list of
supported formats,
you can check our
documentation online.
So, I set the compatible format.
And now on the source side, like
before, I get the content format
and I set that as my
connection between my player
and the EnvironmentNode.
Lastly, I'll also have
to set the multichannel
rendering algorithm
to SoundField, which is
what the EnvironmentNode
currently supports.
And at this point, I can start
my engine, start playback,
and then adjust all the
various 3D mixing properties
that we support.
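A sketch of those connections, focusing on the spatialization case (the helper that picks a supported layout is hypothetical; check the documentation for the formats the EnvironmentNode actually accepts):

```swift
import AVFoundation

// Hypothetical helper: map the hardware format onto a multichannel layout
// the EnvironmentNode supports.
func environmentFormat(matching hardware: AVAudioFormat) -> AVAudioFormat {
    let layout = AVAudioChannelLayout(layoutTag: kAudioChannelLayoutTag_AudioUnit_5_1)!
    return AVAudioFormat(standardFormatWithSampleRate: hardware.sampleRate, channelLayout: layout)
}

let engine = AVAudioEngine()
let player = AVAudioPlayerNode()            // a mono game source, illustrative
let environment = AVAudioEnvironmentNode()
engine.attach(player)
engine.attach(environment)

// Both use cases start from the hardware format on the output node.
let hardwareFormat = engine.outputNode.outputFormat(forBus: 0)
engine.connect(engine.mainMixerNode, to: engine.outputNode, format: hardwareFormat)

// Spatialize mono sources through the EnvironmentNode.
let monoFormat = AVAudioFormat(standardFormatWithSampleRate: hardwareFormat.sampleRate, channels: 1)
engine.connect(player, to: environment, format: monoFormat)
engine.connect(environment, to: engine.mainMixerNode,
               format: environmentFormat(matching: hardwareFormat))

// Multichannel spatialization currently requires the SoundField algorithm.
player.renderingAlgorithm = .soundField
```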
So, just a recap.
AVAudioEngine is a
powerful, feature-rich API.
It simplifies working
with real-time audio.
It enables you to work with
multichannel audio and 3D audio.
And now, you can build games
with really rich audio
experiences on your Watch.
And it supersedes our
AUGraph and OpenAL APIs.
So we've talked a bit about the
Engine in previous sessions,
so we encourage you to
check those out if you can.
And at this point, I'd like to
hand it over to my colleague,
Doug, to keep it
rolling from here.
Doug?
[ Applause ]
>> Thank you, Saleem.
So, I'd like to continue
our tour
through the audio APIs here.
We talked about real-time audio
in passing with AVAudioEngine.
Saleem emphasized that,
while the audio processing is
happening in real-time context,
we're controlling it from
non-real-time context.
And that's the essence
of its simplicity.
But there are times when
you actually want to do work
in that real-time
process, or context.
So I'd like to go
into that a bit.
So, what is real-time audio?
The use cases where
we need to do things
in real-time are
characterized by low latency.
Possibly the oldest
example I'm familiar
with on our platforms is
with music applications.
For example, you may
be synthesizing a sound
when the user presses a
key on the MIDI keyboard.
And we want to minimize
the time from when
that MIDI note was struck
to when the note plays.
And so we have real-time audio
effects like guitar pedals.
We want to minimize the time it
takes from when the audio input
of the guitar comes
into the computer
through which we process it,
apply delays, distortion,
and then send it back
out to the amplifier.
So we need low latency there
so that the instrument,
again, is responsive.
Telephony is also characterized
by low latency requirements.
We've all been on phone calls
with people in other countries
and had very long delay times.
It's no good in telephony.
We do a lot of signal
processing.
We need to keep the
latency down.
Also, in game engines, we
like to keep the latency down.
The user is doing things --
interacting with
joysticks, whatever.
We want to produce those
sounds as quickly as possible.
Sometimes, we want to
manipulate those sounds
as they're being rendered.
Or maybe we just have
an existing game engine.
In all these cases, we have a
need to write code that runs
in a real-time context.
In this real-time context,
the main characteristic --
our constraint -- is that we're
operating under deadlines.
Right? Every some-number
of milliseconds,
the system is waking us up,
asking us to produce some audio
for that equally-small
slice of time.
And we either accomplish it
and produce audio seamlessly.
Or if we fail, if we take too
long to produce that audio,
we create a gap in the output.
And the user hears
that as a glitch.
And this is a very small
interval that we have
to create our audio in.
Our deadlines are typically
as small as 3 milliseconds.
And 20 milliseconds,
which is default on iOS,
is still a pretty
constrained deadline.
So, in this environment, we have
to be really careful
about what we do.
We can't really block.
We can't allocate memory.
We can't use mutexes.
We can't access the
file system or sockets.
We can't log.
We can't even call
a dispatch "async"
because it allocates
continuations.
And we have to be careful not
to interact with the Objective-C
and Swift runtimes because they
are not entirely real-time safe.
There are cases when they,
too, will take mutexes.
So that's a partial list.
There are other things we can't do.
The primary thing
to ask yourself is,
"Does this thing I'm doing
allocate memory or use mutexes?"
And if the answer is yes,
then it's not real-time safe.
Well, what can we do?
I'll show you an example
of that in a little bit.
But, first, I'd like
to just talk
about how we manage this problem
of packaging real-time
audio components.
And we do this with an API
set called Audio Units.
So this is a way
for us to package --
and for you, for that matter,
as another developer --
to package your signal
processing in modules
that can be reused in
other applications.
And it also provides an API
to manage the transitions
and interactions between
your non-real-time context
and your real-time
rendering context.
So, as an app developer,
you can host Audio Units.
That means you can let
the user choose one,
or you can simply
hardcode references
to system built-in units.
You can also build
your own Audio Units.
You can build them as app
extensions or plug-ins.
And you can also simply
register an Audio Unit privately
to your application.
And this is useful, for example,
if you've got some small piece
of signal processing
that you want to use
in the context of AVAudioEngine.
So, underneath Audio Units,
we have an even more
fundamental API
which we call Audio Components.
So this is a set of APIs in
the AudioToolbox framework.
The framework maintains
a registry of all
of the components on the system.
Every component has a type,
subtype, and manufacturer.
These are 4-character codes.
And those serve as the key
for discovering them
and registering them.
And there are a number
of different kinds
of Audio Components types.
The two main categories
of types are Audio Units
and Audio Codecs.
But amongst the Audio Units,
we have input/output units,
generators, effects,
instruments,
converters, mixers as well.
And amongst codecs, we
have encoders and decoders.
We also have audio file
components on macOS.
Getting into the
implementation of components,
there are a number
of different ways
that components are implemented.
Some of them you'll need to know
about if you're writing them.
And others, it's
just for background.
The most highly-recommended
way to create a component now
if it's an Audio Unit is
to create an Audio Unit
application extension.
We introduced this last year
with our 10.11 and 9.0 releases.
So those are app extensions.
Before that, Audio Units were
packaged in component bundles --
as were audio codecs, et cetera.
That goes back to
Mac OS 10.1 or so.
Interestingly enough,
audio components also include
inter-app audio nodes on iOS.
Node applications
register themselves
with a component subtype
and manufacturer key.
And host applications
discover node applications
through the Audio
Component Manager.
And finally, you can register
-- as I mentioned before --
you can register your own
components for the use
of your own application.
And just for completeness,
there are some Apple
built-in components.
On iOS, they're linked
into the AudioToolbox.
So those are the flavors of
component implementations.
Now I'd like to focus in on just
one kind of component here --
the audio input/output unit.
This is an Audio Unit.
And it's probably the one
component that you'll use
if you don't use any other.
And the reason is that this
is the preferred interface
to the system's basic
audio input/output path.
Now, on macOS, that basic path
is in the Core Audio framework.
We call it the Audio HAL,
and it's a pretty
low-level interface.
It makes its clients deal with
interesting stream topologies
on multichannel devices
for example.
So, it's much easier to deal
with the Audio HAL interface
through an audio
input/output unit.
On iOS, you don't
even have access
to the Core Audio framework.
It's not public there.
You have to use an
audio input/output unit
as your lowest-level way to get
audio in and out of the system.
And our preferred interface now
for audio input/output
units is AUAudioUnit
and the AudioToolbox framework.
If you've been working
with our APIs for a while,
you're familiar with version
2 Audio Units that are part
of the system -- AUHAL on macOS
and AURemoteIO on iOS, as well
as Watch -- actually, I'm not
sure we have it available there.
But in any case, AUAudioUnit
is your new modern interface
to this low-level I/O mechanism.
So I'd like to show
you what it looks
like to use AUAudioUnit
to do AudioIO.
So I've written a simple
program in Swift here
that generates a square wave.
And here's my signal processing.
I mentioned earlier I
would show you what kinds
of things you can do here.
So this wave generator
shows you.
You can basically read memory,
write memory, and do math.
And that's all that's
going on here.
It's making the simplest of all
wave forms -- the square wave --
at least simplest from a
computational point of view.
So that class is called
SquareWaveGenerator.
And let's see how to play
a SquareWaveGenerator
from an AUAudioUnit.
So the first thing we
do is create an audio
component description.
And this tells us which
component to go look for.
The type is output.
The subtype is something I chose
here depending on platform --
either RemoteIO or HalOutput.
We've got the Apple manufacturer
and some unused flags.
Then I can create my AUAudioUnit
using my component description.
So I'll get that
unit that I wanted.
And now it's open and I
can start to configure it.
So the first thing I
want to do here is find
out how many channels of
audio are on the system.
There are ways to do this
with AVAudioSession on iOS.
But most simply and portably,
you can simply query
the outputBusses
of the input/output unit.
And outputBus[0] is the
output-directed stream.
So I'm going to fetch
its format,
and that's my hardware format.
Now this hardware format
may be something exotic.
It may be interleaved, for example.
And I don't know that I
want to deal with that.
So I'm just going to
create a renderFormat.
That is a standard format
with the same sample rate.
And some number of channels.
Just to keep things short
and simple, I'm only going
to render two channels,
regardless of the
hardware channel count.
So that's my renderFormat.
Now, I can tell the I/O unit,
"This is the format I want
to give you on inputBus[0]."
So, having done this, the unit
will now convert my renderFormat
to the hardwareFormat.
And in this case, on my MacBook,
it's going to take this
deinterleaved floating point
and convert it to interleaved
floating point buffers.
OK. So, next, I'm going
to create my square
wave generators.
If you're a music and
math geek like me,
you know what A440 is,
and multiplying it
by 1.5 will give you
a fifth above it.
So I'm going to render
A to my left channel
and E to my right channel.
And here's the code that will
run in the real-time context.
There are a lot of
parameters here,
and I actually only
need a couple of them.
I only need the frameCount
and the rawBufferList.
The rawBufferList is a
difficult, low-level C structure
which I can rewrap in Swift
using an overlay on the SDK.
And this takes the
audio bufferList
and makes it look something
like a vector or array.
So having converted
the rawBufferList
to the nice Swift wrapper,
I can query its count.
And if I got at least
one buffer,
then I can render
the left channel.
If I got at least two buffers,
I can render the right channel.
And that's all the work
I'm doing right here.
Of course, there's more work
inside the wave generators,
but that's all of the
real-time context work.
So, now, I'm all setup.
I'm ready to render.
So I'm going to tell
the I/O unit,
"Do any allocations you need
to do to start rendering."
Then, I can have it
actually start the hardware,
run for 3 seconds, and stop.
And that's the end of
this simple program.
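Here's a condensed sketch of that program as described (the iOS subtype is shown; structure and names are reconstructed from the walkthrough, not the actual sample code):

```swift
import AVFoundation
import AudioToolbox

// Square-wave generator: the render method only reads and writes memory and
// does a little math, which is what makes it safe in a real-time context.
final class SquareWaveGenerator {
    private let sampleRate: Double
    private let frequency: Double
    private var phase = 0.0

    init(sampleRate: Double, frequency: Double) {
        self.sampleRate = sampleRate
        self.frequency = frequency
    }

    func render(into buffer: AudioBuffer) {
        guard let data = buffer.mData else { return }
        let frameCount = Int(buffer.mDataByteSize) / MemoryLayout<Float>.size
        let samples = data.bindMemory(to: Float.self, capacity: frameCount)
        let increment = frequency / sampleRate
        for frame in 0..<frameCount {
            samples[frame] = phase < 0.5 ? 0.25 : -0.25
            phase += increment
            if phase >= 1.0 { phase -= 1.0 }
        }
    }
}

func playSquareWaves() throws {
    // Which component to look for: the platform's I/O unit.
    let description = AudioComponentDescription(
        componentType: kAudioUnitType_Output,
        componentSubType: kAudioUnitSubType_RemoteIO,    // kAudioUnitSubType_HALOutput on macOS
        componentManufacturer: kAudioUnitManufacturer_Apple,
        componentFlags: 0,
        componentFlagsMask: 0)

    let ioUnit = try AUAudioUnit(componentDescription: description)

    // The hardware format may be exotic; render a standard deinterleaved
    // two-channel format at the same sample rate and let the unit convert.
    let hardwareFormat = ioUnit.outputBusses[0].format
    let renderFormat = AVAudioFormat(standardFormatWithSampleRate: hardwareFormat.sampleRate,
                                     channels: 2)!
    try ioUnit.inputBusses[0].setFormat(renderFormat)

    let left = SquareWaveGenerator(sampleRate: renderFormat.sampleRate, frequency: 440.0)        // A
    let right = SquareWaveGenerator(sampleRate: renderFormat.sampleRate, frequency: 440.0 * 1.5) // a fifth up

    // This block runs in the real-time context: no allocation, locks, or logging.
    ioUnit.outputProvider = { _, _, _, _, rawBufferList in
        let buffers = UnsafeMutableAudioBufferListPointer(rawBufferList)
        if buffers.count > 0 { left.render(into: buffers[0]) }
        if buffers.count > 1 { right.render(into: buffers[1]) }
        return noErr
    }

    try ioUnit.allocateRenderResources()
    try ioUnit.startHardware()
    sleep(3)
    ioUnit.stopHardware()
}
```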
[ Monotone ]
So, that's AUAudioUnit.
I'd like to turn next briefly to
some other kinds of Audio Units.
We have effects which take audio
input, produce audio output.
Instruments which take something
resembling MIDI as input
and also produce audio output.
And generators which
produce audio output
without anything going in except
maybe some parametric control.
If I were to repackage my square
wave generator as an Audio Unit,
I would make it a generator.
So to host these
kinds of Audio Units,
you can also use AUAudioUnit.
You can use a separate block
to provide input to it.
It's very similar to the
output provider block
that you saw on the I/O unit.
You can chain together
these render blocks of units
to create your own
custom topologies.
You can control the units
using their parameters.
And also, many units,
especially third-party units,
have nice user interfaces.
As a hosting application,
you can obtain
that audio unit's view,
display it in your application,
and let the user
interact with it.
Now if you'd like to
write your own Audio Unit,
the way I would start
is just building it
within the context of an app.
This lets you debug
without worrying
about inter-process
communication issues.
It's all in one process.
So, you start by
subclassing AUAudioUnit.
You register it as a component
using this class method
of AUAudioUnit.
Then, you can debug it.
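A sketch of that private registration (the subclass, four-character codes, and name are made up):

```swift
import AVFoundation
import AudioToolbox

// An illustrative AUAudioUnit subclass; a real one would override
// internalRenderBlock, inputBusses, outputBusses, and so on.
class MyEffectAudioUnit: AUAudioUnit {
}

// Register it privately, using made-up four-character codes.
let description = AudioComponentDescription(
    componentType: kAudioUnitType_Effect,
    componentSubType: 0x64656d6f,                 // 'demo'
    componentManufacturer: 0x44656d6f,            // 'Demo'
    componentFlags: 0,
    componentFlagsMask: 0)

AUAudioUnit.registerSubclass(MyEffectAudioUnit.self,
                             as: description,
                             name: "Demo: MyEffect",
                             version: 1)

// Now the unit can be instantiated in-process, e.g. for use with AVAudioEngine.
AVAudioUnit.instantiate(with: description, options: []) { avAudioUnit, error in
    // Attach avAudioUnit to an engine, connect it, and debug from there.
}
```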
And once you've done that --
and if you decide you'd
like to distribute it
as an Audio Unit extension --
you can take that same
AUAudioUnit subclass.
You might fine-tune and
polish it some more.
But then you have to do a
small amount of additional work
to package this as an
Audio Unit extension.
So you've got an extension.
You can embed it
in an application.
You can sell that
application on the App Store.
So I'd like to have
my colleague, Torrey,
now show you some of the power
of Audio Unit extensions.
We've had some developers
doing some really cool things
with it in the last year.
>> How is everybody doing?
Happy to be at WWDC?
[ Applause ]
Let's make some noise.
I'm going to start here by
launching -- well, first of all,
I have my instrument here.
This is my iPad Pro.
And I'm going to start by
launching Arturia iSEM --
a very powerful synthesizer
application.
And I have a synth trumpet
sound here that I like.
[ Music ]
So I like this sound
and I want to put it
in a track that I'm working on.
This is going to serve as our
Audio Unit plug-in application.
And now I'm going to launch
GarageBand, which is going
to serve as our Audio
Unit host application.
Now, in GarageBand, I have a
sick beat I've been working
on that I'm calling WWDC Demo.
Let's listen to it.
[ Music ]
We'll move into what I call
"the verse portion" next.
[ Music ]
And next, we're going to
work on this chorus here.
This is supposed to be
the climax of the song.
I want some motion.
I want some tension.
And let's create that by
bringing in an Audio Unit.
I'm going to add
a new track here.
Adding an instrument, I'll see
Audio Units is an option here.
If I select this, then I can
see all of the Audio Units
that are hosted here
on the system.
Right now, I see Arturia iSEM
because I practice this at home.
Selecting iSEM, GarageBand
is now going
to give me an onscreen MIDI
controller that I can use here.
It's complete with the scale
transforms and arpeggiator here
that I'm going to make use of
because I like a lot of motion.
Over here on the left, you
can see a Pitch/Mod Wheel.
You can even modify
the velocity.
And here is the view that the
Audio Unit has provided to me
that I can actually tweak.
For now, I'm going to record
in a little piece here
and see what it sounds
like in context.
So --
[ Music ]
All right, pretty good.
Let's see what it
sounds like in context.
[ Music ]
There we go.
That's the tension that I want.
Now, let me dig in
here a little bit more
and show you what I've done.
I'm going to edit here.
And I'll look into this
loop a little bit more.
There are two observations
that I'd like you to make here.
The first one is that
these are MIDI events.
The difference between
using inter-app audio
and using Audio Units
as a plug-in is you'll
actually get MIDI notes here,
which is much easier
to edit after the fact.
The other observation I'd
like you to make here is
that you see these
individual MIDI notes here
but you saw me play one
big, fat-fingered chord.
So, it's because
I've taken advantage
of the arpeggiator that's
built into GarageBand
that I've got these
individual notes.
And I can play around
with these if I want to
and make them sound
a bit more human.
But I'm happy with this
recording as it is.
The last thing that I'd actually
like to show you here is, first,
I'm going to copy this
into the adjacent cell.
And I told you earlier that the
Audio Unit view that's provided
here is actually interactive.
It's not just a pretty picture.
So if you were adventurous,
you could even try
to give a little
performance for your friends.
[ Music ]
Turn it up a little bit.
[ Music ]
Let's wrap it up.
[ Music ]
That concludes my demo.
[ Applause ]
I want to thank you for
your time, your attention,
and always for making dope apps.
[ Applause ]
>> Thank you, Torrey.
So, just to recap here, you can
see the session we did last year
about Audio Unit extensions.
It goes into a lot more detail
about the mechanics of the API.
We just wanted to show you here
what people have been doing
with it because it's so cool.
So, speaking of MIDI, we saw
how GarageBand recorded Torrey's
performance as MIDI.
We have a number of
APIs in the system
that communicate using MIDI,
and it's not always clear
which ones to use when.
So I'd like to try to help
clear that up just a little bit.
Now, you might just have a
standard MIDI file like --
well, an ugly cellphone
ringtone.
But MIDI files are very
useful in music education.
I can get a MIDI file of
a piece I want to learn.
I can see what all
the notes are.
So if you have a MIDI
file, you can play it back
with AVAudioSequencer.
And that will play it back into
the context of an AVAudioEngine.
If you wish to control
a software synthesizer
as we saw GarageBand doing
with iSEM, the best API to do
that with is AUAudioUnit.
And if you'd like your
AUAudioUnit to play back
into your AVAudioEngine, you
can use AVAudioUnitMIDIInstrument.
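For example, a sketch of playing a standard MIDI file through an AVAudioEngine (the sampler instrument and file path are illustrative):

```swift
import AVFoundation

// Play a standard MIDI file into an AVAudioEngine with AVAudioSequencer.
let engine = AVAudioEngine()
let sampler = AVAudioUnitSampler()           // an AVAudioUnitMIDIInstrument subclass
engine.attach(sampler)
engine.connect(sampler, to: engine.mainMixerNode, format: nil)

let sequencer = AVAudioSequencer(audioEngine: engine)
let midiURL = URL(fileURLWithPath: "/path/to/song.mid")   // illustrative path

do {
    try sequencer.load(from: midiURL, options: [])
    // Depending on the file, point each track at the instrument explicitly.
    for track in sequencer.tracks { track.destinationAudioUnit = sampler }

    try engine.start()
    sequencer.prepareToPlay()
    try sequencer.start()
} catch {
    print("Sequencer setup failed: \(error)")
}
```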
Now there's the core
MIDI framework
which people often
think does some
of these other higher-level
things.
But it's actually a very
low-level API that's basically
for communicating
with MIDI hardware --
for example, an external
USB MIDI interface
or a Bluetooth MIDI keyboard.
We also supply a
MIDI network driver.
You can use that to send raw
MIDI messages between an iPad
and a MacBook for example.
You can also use the core
MIDI framework to send MIDI
between processes in real time.
Now this gets into a
gray area sometimes.
People wonder, "Well, should
I use core MIDI to communicate
between my sequencer and
my app that's listening
to MIDI and synthesizing?"
And I would say that's probably
not the right API for that case.
If you're using MIDI
and audio together,
I would use AUAudioUnit.
It's in the case where
you're doing pure MIDI
in two applications
or two entities
within an application --
maybe one is a static library
from another developer.
In those situations, you can
use core MIDI for inter-process
or inter-entity real-time MIDI.
So that takes us to the end
of our grand tour
of the audio APIs.
We started with applications
-- and at the bottom,
the CoreAudio framework
and drivers.
We looked at AVAudioEngine,
how you use AVAudioSession
to get things setup on all of
our platforms except macOS.
We saw how you can
use AVAudioPlayer
and the AVAudioRecorder
for simple playback
and recording from files.
Or if your files or network
streams involve video,
you can use AVPlayer.
AVAudioEngine is a very
good, high-level interface
for building complex
processing graphs
and will solve a
lot of problems.
You usually won't have to use
any of the lower-level APIs.
But if you do, we saw how in
AudioToolbox there's AUAudioUnit
that lets you communicate
directly with the I/O cycle
and third-party, or
your own instruments,
effects, and generators.
And finally, we took a quick
look at the core MIDI framework.
So that's the end
of my talk here.
You can visit this link
for some more information.
We have a number of
related sessions here.
Thank you very much.
[ Applause ]