Transcript
>> Morning everyone.
Welcome to Session 610.
I'm Brad Ford, I work on the
core media engineering team.
For the next hour I'm
going to talk to you
about the most popular
camera in the world.
Actually that's inaccurate.
If you go by Flickr data,
I'm going to talk to you
about the three most popular
cameras in the world --
iPhone 4s, iPhone
5, and iPhone 4.
And we recognize that you're
a big part of that popularity.
We bring the great
hardware, we bring the camera
that people get excited about,
and we bring the framework level
support, but you bring the apps.
And we wouldn't be as popular or
as successful without your apps
that make our platform
so useful, and so fun.
So thank you for that.
Today we're going to
have a brief appetizer
of greater transparency
for users,
and then the main
course is features --
lots and lots of new features,
and then we'll follow that up
with a sample code
update for our dessert.
We're not going to spend any
time today on core media basics
or AV foundation basics,
because we just don't have time
in an hour to do that.
But lucky for you, we've
talked about them several times
in the past, and all of
these sessions are available
on your WWDC app on
your phone right now.
So you could actually call it
up, and you could be listening
to me two years ago, while
you're listening to me now.
But turn the sound down.
First up, transparency for users.
Last year we introduced some security hardening
in iOS 6 to make it more transparent
to users when the photos and videos
in their photo library were being accessed.
And we did that by popping
up a dialog the first time your
application tries to access --
that is, read from or write to --
the assets library,
so that the user would have an
opportunity to opt in or out.
And we warned you that you
should start paying attention
to errors that you get back
from ALAssetsLibrary.
Well this year we're hardening
things even a little bit more,
and we do this for
a couple of reasons.
You've probably noticed that
on our iOS devices we have no
hardware blinky light that
tells you that recording is
in progress, and AV foundation
as a framework does not force
you to put up a UI saying
"recording in progress".
So therefore it's possible
to do something headlessly.
And users want to trust
your app, they want to know
when things are happening.
So also in some regions
it's now required by law
to present users notice
when the microphone
or the camera is in use.
So new in iOS 7, we are going
to introduce two new dialogs the
first time your app makes use
of the microphone or the
camera to allow users to know
about it, and to opt in or out.
Now the microphone dialog is
everywhere, that is all iPhones,
all iPads, everywhere.
The camera dialog is just in
certain regions where required
by law, such as China.
Here's how it looks in code.
The first time you create
an AV capture device input,
the dialog will be invoked.
You call deviceInputWithDevice:error:,
and pay attention to that error parameter,
because it might return an error now.
The very first time, we need
to succeed, because we need
to return control to you immediately,
but we actually don't know the answer yet --
the dialog is up, but the person
might not have said okay or deny yet.
So what do we do in the interim?
For the microphone
we produce silence
until the user grants access,
and for the camera we spit
out black frames until
they've granted access.
If on subsequent launches we
already know what the answer is,
we can return an
error immediately,
and that's a new error in AV Foundation:
AVErrorApplicationIsNotAuthorizedToUseDevice.
So pay attention to that.
That means the user is
choosing not to allow you
to use the camera or microphone.
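As a minimal sketch of that pattern (the session and device variables here are hypothetical, and error handling is trimmed to the relevant check):

    NSError *error = nil;
    AVCaptureDevice *camera =
        [AVCaptureDevice defaultDeviceWithMediaType:AVMediaTypeVideo];
    AVCaptureDeviceInput *input =
        [AVCaptureDeviceInput deviceInputWithDevice:camera error:&error];
    if (!input) {
        if (error.code == AVErrorApplicationIsNotAuthorizedToUseDevice) {
            // The user previously declined camera access for this app.
            // Show your own UI explaining why you need the camera.
        }
    } else {
        [session addInput:input]; // session is your AVCaptureSession
    }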
Alright, on to features.
This is going to be the
major bulk of our talk today,
and we have a lot of
them to get through.
Five major feature areas, first
60 fps support, video zoom,
machine readable code
detection, or barcode detection,
focus enhancements, and
integration with audio session.
First up 60 fps support.
And we know a lot of you
have been waiting a long time
for this.
You've been waiting patiently,
and I think it's worth the wait.
We didn't want to unleash
this feature on all
of you before we had
really thought it through,
and given you comprehensive
support across the media stack.
So by introducing 60
frame rate movies,
we wanted to make sure you
could also do interesting things
with them, as far as
playback and editing.
So we are introducing
full iOS ecosystem support
for high frame rate content.
What does that mean?
On capture we support
720p video,
up to 60 frames per second,
with video stabilization,
and we write really cool movies.
These have droppable p-frames
in them, a feature of H264,
which allows them to
play back smoothly,
even on lower powered
machines, or older machines.
On the playback front,
we've beefed up our support
for audio processing in
the time pitch domain,
so that if you want to do
effects like slow the movies
down or speed them up, you
can do interesting things
with audio.
On the editing side, we largely
already had the support there,
but we do support fully scaled
edits in mutable compositions.
And lastly, in export we
allow you to do it two ways.
You can either export such
that the high frame rate areas
of the movie are preserved,
or you can do a frame rate
conversion that will sort
of flatten it all down
to 30 frames per second,
or something else.
But enough talk,
let's do a demo.
Alright. So the first demo
app is called Slowpoke.
This is an app that showcases
all four feature areas
of 60 frames per second support.
First one is capture,
as you might expect.
Now it looks just like a regular
capture app, except I don't know
if you can tell out there, but
it's a really fast frame rate,
it's a buttery smooth 60
frames per second preview.
And it's running the
camera at 720p 60.
You can also do all the things
you'd expect, like focus,
and it writes movies that have
the proper H.264 bitrate, profile,
level, etcetera.
Let's go over to the
more interesting part
for today's demo, which is
the playback and editing.
I recorded several
movies here previously,
they're all 60 frames
per second movies.
I'm just going to pick one
of them, and now we'll find
out why this app
got its namesake.
This is a clip of a
guitarist playing the prelude
from Bach's E Major Lute Suite,
let me play a little
bit for you.
[ Music ]
So let's say you're trying
to learn this piece yourself,
and he's going too fast,
you need to slow him down so
that you can hear it better.
I'm going to swipe to
the left to slow it down.
[ Music ]
And I'll go even slower now.
[ Music ]
So now you can really see
his fingers move well.
Alternately, you
could make him sound
like Yngwie Malmsteen,
which is my favorite.
[ Music ]
Notice how good it sounds.
We're preserving pitch here, so
that you could even export this
and pass it off as him
doing the real thing.
Now he's an amazing guitarist.
Alright, let's pick another one.
[ Applause ]
Thank you.
Let's have a little fun
at my dog's expense.
This poor animal protects our
house from dangerous birds
on wires, and this is
what he sounds like.
[ Dog Barking ]
Okay, protecting our house.
Now let's have some fun speeding
him up and slowing him down,
but this time I'm going
to engage chipmunk mode.
You'll notice over in
the corner I'm going
to turn the chipmunk
button on so
that we can make him
sound like a yip-yip dog.
[ Dog Barking ]
Or, like Barry White.
[ Dog Barking ]
Or a dinosaur.
[ Dog Barking ]
Okay, enough of that.
And finally, let's go --
[ Applause ]
Notice we have so many
frames in the movie
that it looks really good
when you slow him down.
Let's take this last one
here, this is an action shot,
kind of a frightening one of
my dog coming up towards you
at a million miles an hour.
Now let's engage
the chipmunk mode,
but let's say this time I don't
just want to mess around with it
in real time, I want to
program a slow motion part right
into the asset.
Okay, so I'll pick a point
right where he's starting
to come up the stairs.
Now I'm going to swipe down
to begin and end it, and edit.
And then I'll go to where he's
right next to me, and I'll swipe
up to end the edit, and
here I get to apply a rate.
So I'll set the rate to .25,
quarter speed, and apply it.
And you notice that the duration
just changed on this movie.
Now I can go back and play it.
[ Silence ]
Oh yeah. He's coming for you.
Okay. So -- and then of
course as you might suspect,
we would want to be able to
save these off for posterity,
so we have the export button
over here on the side,
which lets us export
to the camera roll,
either preserving the
high frame rate sections,
or going down to a constant
frame rate of say 30,
or something like that.
And that is Slowpoke.
[ Applause ]
On the playback side, AV
player does most of this
for you automatically.
If you just use player setRate,
it can select arbitrary rates
and play them back, and
do the really hard job
of keeping audio
and video in sync.
There's a new property
on the player item.
A player is composed
of player items,
because it's a queue model.
You can take the player item
and set its audio
time pitch algorithm,
that's what I was using there to
either adjust the pitch higher
or lower, or keep it constant.
I was using the bottom
two constants there --
Spectral, which preserves the pitch,
and Varispeed, which alters it.
And these are very high
quality algorithms.
They can go continuously from
32x down to 1/32x.
And they sound great.
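A minimal sketch of that playback setup (the movie URL is a placeholder):

    AVPlayerItem *item =
        [AVPlayerItem playerItemWithURL:movieURL]; // your 60 fps movie
    // Preserve pitch while changing speed, like the guitar demo:
    item.audioTimePitchAlgorithm = AVAudioTimePitchAlgorithmSpectral;
    // Or let the pitch shift with the rate ("chipmunk mode"):
    // item.audioTimePitchAlgorithm = AVAudioTimePitchAlgorithmVarispeed;
    AVPlayer *player = [AVPlayer playerWithPlayerItem:item];
    [player setRate:0.25f]; // quarter speed; 2.0f would double the speed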
On the editing side, I was using
AV mutable composition to build
up those temporal edits when I
saved off that scaled section.
I did that by creating an empty
composition, inserting all
of my source asset
into that composition,
and then choosing the section
that I wanted to scale up
or down, just by
using scale time range
to duration, very simple.
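Roughly, that editing step looks like this (the times picked here are illustrative, not from the demo):

    AVMutableComposition *composition = [AVMutableComposition composition];
    [composition insertTimeRange:CMTimeRangeMake(kCMTimeZero, asset.duration)
                         ofAsset:asset
                          atTime:kCMTimeZero
                           error:nil];
    // Scale a 2-second section starting at t=4s to 4x its length (0.25x rate).
    CMTimeRange range = CMTimeRangeMake(CMTimeMake(4, 1), CMTimeMake(2, 1));
    [composition scaleTimeRange:range
                     toDuration:CMTimeMake(8, 1)];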
See the Slowpoke sample code
where you can find out how
to do all this yourselves.
And if you're interested in
the editing aspect of this,
I invite you to come back
tomorrow at 9:00 a.m.
where we're having an
advanced editing session
with AV foundation.
On the export side, I'm using AV
asset export session to flatten
that out into a new movie.
Now as I mentioned, there
are two ways to do this.
You can use the pass-through
export preset if you want
to avoid any re-encoding.
That will just retime the media,
and send it out with one section
at 60 frames per second and another
section slowed down or sped up.
Or you can do a constant
frame rate export.
You might want to do this
if you want maximum
playback compatibility.
To do this you set the video
composition's frame duration,
saying my composition's
frame duration is 1/30 of a second --
30 frames per second --
for instance.
This gives you maximum
playback compatibility.
And you can also choose to
set the time pitch algorithm
for the export as well.
So you can use a low quality,
cheap one during playback,
and then when you export
use a high quality one.
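As a sketch, the constant frame rate variant might look like this (the preset, file type, and output URL are illustrative; for the pass-through case you'd use AVAssetExportPresetPassthrough and skip the video composition):

    AVAssetExportSession *export =
        [[AVAssetExportSession alloc]
            initWithAsset:composition
               presetName:AVAssetExportPresetHighestQuality];
    AVMutableVideoComposition *videoComposition =
        [AVMutableVideoComposition
            videoCompositionWithPropertiesOfAsset:composition];
    videoComposition.frameDuration = CMTimeMake(1, 30); // force 30 fps
    export.videoComposition = videoComposition;
    export.audioTimePitchAlgorithm = AVAudioTimePitchAlgorithmSpectral;
    export.outputFileType = AVFileTypeQuickTimeMovie;
    export.outputURL = outputURL; // hypothetical destination
    [export exportAsynchronouslyWithCompletionHandler:^{
        // Check export.status here.
    }];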
Again, that's all in Slowpoke.
Now on to recording, which
is why we're all here.
AV capture movie file output
just works, as you might expect.
It picks for you automatically
the right H.264 profile, level,
and bit rate, and makes sure that
the movie looks great.
If you want to do stuff
with the frames yourself,
you need to use AV asset writer,
and it requires some
additional setup.
As with all real-time
use of AV asset writer,
you need to set expects media
data in real time to yes,
otherwise it won't be able to
keep up with the frame rate.
And we have a new object
that helps you create
settings dictionaries
for the AV asset writer.
An asset writer doesn't
know what kind
of output you want by default.
You have to tell it what
kind of settings to use.
And this can be complicated
with high frame rate movies,
knowing what H264
keys to use, etcetera.
So you can instantiate an AV
output settings assistant,
tell it what the
source video format is,
tell it what the source video
frame rate is, and then ask it
for a dictionary of
settings, and then apply
that to your asset
writer, and it just works.
It'll pick the best
settings for you.
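Here's a rough sketch of that flow, assuming you already have a format description from your video buffers and an AVAssetWriter called assetWriter:

    AVOutputSettingsAssistant *assistant =
        [AVOutputSettingsAssistant
            outputSettingsAssistantWithPreset:AVOutputSettingsPreset1280x720];
    assistant.sourceVideoFormat = videoFormatDescription; // from your buffers
    assistant.sourceVideoMinFrameDuration = CMTimeMake(1, 60); // 60 fps source
    AVAssetWriterInput *videoInput =
        [AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeVideo
                                           outputSettings:assistant.videoSettings];
    videoInput.expectsMediaDataInRealTime = YES; // required for capture
    [assetWriter addInput:videoInput];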
That was the recording aspect
of 60 fps, now let's talk
about how you just configure
the session in general.
Those who've used AV
foundation's capture classes
know that we have an AV capture
session that's the center
of our universe.
And the way that you
configure it is one call.
You set the session
preset to something.
We have a set of strings
that tell you the quality
of service you're going to
get -- photo, high quality,
medium, low, etcetera.
And that does the hard job
of configuring the inputs
and outputs for you.
Now we had a problem with
60 fps captures on iOS 7,
because we didn't want to try
to make new session presets
for every conceivable frame
rate and resolution combination,
because that would result
in a combinatorial explosion
of presets, and it would be
very difficult to program to.
So in iOS 7 we're
introducing a parallel
configuration mechanism.
The old one is not going away,
but this one is for
a power use case.
And that is we're now going to
allow you to inspect the format
of the AV capture device, and
set the active format directly.
And when you do this, the
session is no longer in control,
it no longer automatically
configures inputs and outputs.
720p 60 capture is supported
on iPhone 5, iPod Touch,
the tall one, and iPad Mini.
Let's review how set
session preset works.
Here's a block diagram we have
of the various pieces
in a capture session.
You have inputs, you have
outputs, you have a preview,
and they're connected
via these white arrows,
which are represented in our
API as AV capture connections.
So you can see the capture
session kind of knows its inputs
and outputs, it knows
its topology.
So when you set a session
preset, let's say photo,
here's a common scenario,
you might also want
to get BGRA frames out of
your video data output instead
of the default, which is 4:2:0.
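In code, that common scenario is just (a sketch; the sample buffer queue and delegate are omitted):

    [session setSessionPreset:AVCaptureSessionPresetPhoto];
    AVCaptureVideoDataOutput *videoDataOutput =
        [[AVCaptureVideoDataOutput alloc] init];
    // Override the default 4:2:0 format with BGRA.
    videoDataOutput.videoSettings = @{
        (id)kCVPixelBufferPixelFormatTypeKey : @(kCVPixelFormatType_32BGRA)
    };
    [session addOutput:videoDataOutput];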
Here's what happens
under the covers.
The session goes and talks
to all of its outputs.
It says: still image output,
for the photo preset,
what do you require?
And it requires full
res JPEG, so it figures
out that it should
give 3264 by 2448,
assuming this is an iPhone 5.
The video data output does
not give full res buffers
for the photo preset, it's
sort of a special case.
Instead it gives screen
sized buffers to make sure
that they're not too
large for your processing.
So it picks a screen
resolution, and chooses BGRA
because you wanted to
override the default.
The video preview layer
just wants screen size,
and it can cope with
the native format.
So knowing all of
these requirements now,
the session goes up, aggregates
all of those requirements,
goes to the AV capture device,
and says pick me
the best format.
And the AV capture device
looks through its formats,
picks the best match, and also
picks the optimal frame rates --
min and max frame rate to
satisfy all those requirements.
That's what's happening
underneath.
Now using the new configuration
mechanism, it's simple.
Just do this.
Let's highlight this one piece at a time.
The AV capture device
now exposes an array
of natively supported formats.
Each one is an AV
capture device format.
So here I'm iterating
through them trying
to find the highest frame rate.
In the next little
section I look
at each format object's
supported frame rate ranges,
find the one that has
the highest max frame rate.
From that I select the
best format match based
on the highest frame rate range.
And once I have a best
format, I lock my device
for configuration,
set the active format,
and then I pin the min and
max to the highest frame rate
that I found, which is exactly
what I did in the Slowpoke app.
Always, always, always unlock
for configuration
when you're done.
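Put together, that search and configuration might look like this (a sketch of the pattern just described):

    AVCaptureDeviceFormat *bestFormat = nil;
    AVFrameRateRange *bestRange = nil;
    for (AVCaptureDeviceFormat *format in device.formats) {
        for (AVFrameRateRange *range in format.videoSupportedFrameRateRanges) {
            if (range.maxFrameRate > bestRange.maxFrameRate) {
                bestFormat = format;
                bestRange = range;
            }
        }
    }
    if (bestFormat && [device lockForConfiguration:nil]) {
        device.activeFormat = bestFormat;
        // Pin min and max to the highest frame rate we found.
        device.activeVideoMinFrameDuration = bestRange.minFrameDuration;
        device.activeVideoMaxFrameDuration = bestRange.minFrameDuration;
        [device unlockForConfiguration];
    }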
A note about frame
rate selection.
Previously we've only
allowed you to do this
at the AV capture
connections, which as you know,
sit lower in the session
hierarchy than the device.
But what they're doing is
actually going and talking
to the device and setting
the active format --
or the active frame rates
on the device, and we would
like you to do that directly now too.
This is the new preferred
mechanism
for setting frame rates.
So talk to the device,
not the connections.
Frame rates can be
set at any time,
whether using the set
session preset API,
or the new set active
format API.
You can change them at any time,
and the graph reconfigures
without tearing down.
Sometimes you just
want to go back
to whatever the default
was supposed to be
for the given preset or active
format that you're using.
But if you don't know
what those defaults are,
you can just set the min
and max frame durations
to CM time invalid, which
will go back to the defaults.
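For example (a sketch):

    if ([device lockForConfiguration:nil]) {
        // Go back to the default frame rates for the current preset/format.
        device.activeVideoMinFrameDuration = kCMTimeInvalid;
        device.activeVideoMaxFrameDuration = kCMTimeInvalid;
        [device unlockForConfiguration];
    }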
As I mentioned, AV capture
connection's frame rate
selection accessors
are now deprecated.
Please switch over to the
new one as soon as possible.
Okay, this slide hopefully
doesn't scare you too much.
This is a list of
supported formats
on the iPhone 5's
back facing camera.
It's a little overwhelming.
There are 10 formats here,
and actually that's
only half of them.
There are 20 formats,
but I left half of them
out because they're really just
two flavors of the same format.
There's a 420v and a 420f,
the v for video range,
the f for full range.
But if we take that
complexity out,
we're left with basically 10
formats, and as you can see,
they're sorted, ascending
by dimensions,
and the more commonly used
ones are listed first.
So let's take a look over
at the right-hand column.
You can see that most
of them are already used
by one session preset
or another.
There are, however, two new ones
that we've never exposed before
on iPhone 5, which is the 720p
60 format and one that's a 4
by 3 format, but not as big
as the absolute full
res 8 megapixel,
and that's a 5 megapixel
4 by 3, and it's not used
by any session preset.
To get at it, you have to use
the active format setters.
Here's what happens
when you use the new way
of configuring AV capture.
Instead of talking to the
session, you talk directly
to the AV capture device.
You say I want your
active format
to be 8 megapixel, let's say.
Now when you do this, the
session is listening for that,
and the session says: okay,
they are now in control,
I'm going to be hands-off.
My session preset is
now inputPriority,
which means I'm not going to
touch the inputs or the outputs,
I'm just going to let
the user be in control.
That means that the AV
capture device will now deliver
to the still image output
the full 8 megapixels.
Video preview layer
is an exception,
it still only gets screen size.
But now, new for
video data output,
you get the full 8 megapixel
buffers, not a scaled
down screen resolution
version of it.
Let's talk briefly about what is
in an AV capture
device format object.
It has a media type,
as you might expect,
so you know if it's
audio or video,
it has a format description
from which you can get
the pixel dimensions,
the pixel format
such as 420v, 420f.
You can also get the
video field of view.
This is a really handy one.
Previously, in order to
know what the field of view
of the sensor is, you would
have to run a video data output,
and then look through the meta
data and find the focal length
in 35-millimeter film,
and that's really hard.
Now you don't need to
run the camera at all.
You can just look through
the supported formats array,
and see what the field of
view is, and it's expressed
in degrees, and this is the
horizontal field of view.
Also it tells you whether a
given sensor format supports
video stabilization, so that
later when you enable it
on the connection, you can
know if it's going to succeed,
if it's going to turn
on stabilization or not.
You can also look through the
supported frame rate ranges,
such as I support 1 through
30 frames per second.
And there's something called
video binned, which you may
or may not have heard of.
This is sort of a
sensor-specific keyword.
And binning means taking
groups of neighboring pixels
and binning them together,
sort of averaging them.
And it's a means of
reducing the resolution,
reducing the throughput,
but without reducing
the field of view.
Let me give you an
example of it.
Going back to the iPhone 5 back
facing camera, we have two 1280
by 720 modes available.
Previously we only let you
use the 1 through 30 one.
But the 1 through 60 one has
an almost identical field
of view, so you're not really
losing any field of view.
Instead, one of them is
binned and one is non-binned.
If you'd like to read
more about what this means
and the different image
characteristics of binned
versus non-binned, I encourage
you to go do a web search
for sensor pixel binning,
and read all about it.
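If you just want to see what your device offers, a quick sketch like this logs each format's dimensions, field of view, binning, stabilization support, and frame rate ranges:

    for (AVCaptureDeviceFormat *format in device.formats) {
        CMVideoDimensions dims =
            CMVideoFormatDescriptionGetDimensions(format.formatDescription);
        NSLog(@"%dx%d, %.1f deg FOV, binned: %d, stabilization: %d",
              dims.width, dims.height,
              format.videoFieldOfView,
              format.isVideoBinned,
              format.isVideoStabilizationSupported);
        for (AVFrameRateRange *range in format.videoSupportedFrameRateRanges) {
            NSLog(@"  %.0f-%.0f fps", range.minFrameRate, range.maxFrameRate);
        }
    }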
So some guidance about when
to use one over the other.
The session presets setting
mechanism is not deprecated,
it's not going away.
It's still a good one to
use, because it knows how
to optimally configure
inputs and outputs,
so it gives you the
best bang for the buck.
It's just one call and it
does everything for you.
But you should use the new
set active format means
of configuration if you
need a specific format,
such as the 60 frames per second
format, or if you're looking
for a specific field of view,
or if you need those full
resolution video data output
buffers, this is the
only way to do that.
Alright, I think we've
talked that to death.
Let's move on to video zoom,
and to do that I'd like to bring
up Ethan Tira-Thompson.
Thank you.
[ Applause ]
>> Hi everyone.
I'm very excited to talk
to you today about zoom.
I hope that perked you up,
because we've got some exciting
stuff here, and I think a lot
of you will want to
take advantage of it.
I'd like to start by
reviewing our current API,
which is the video
scale and crop factor
of the AV capture connection.
Currently this only applies
to still image outputs,
so you could enlarge an image
by setting this property.
And typically then you
would also apply a transform
to the preview layer so that
the user gets some feedback
as to how much zoom
has been applied.
So that would look something
like this screenshot
on the left.
And this is not being
deprecated,
so this is still available.
However, we're introducing a
new property simply called video
zoom factor, which is on
the AV capture device.
So this is at the
root of the session,
and applies to all image
outputs, including the preview.
So by setting this one
factor, you'll get a preview,
which is enlarged,
and also much sharper.
So to look under the hood and
let you know how this works,
let's go back to our
architecture diagram,
and we have our video
scale and crop factor
on the AV capture connection.
And notice this is only applying
to the still image output,
and again this is still there.
However, our new property up on
the capture device is applying
to all the image outputs,
including some not shown here,
such as the meta data output
and movie file output.
And because this is at
the root of the session,
we can do some interesting
things
with the image processing.
Normally when we're getting
an image from the sensor,
it's at the full
photo resolution,
it's the maximum
resolution of the sensor.
However, video resolutions
that we output, like 1080p,
are a lower resolution.
So we must scale down
the image in order
to get to that resolution.
Now if we want to
enlarge the image,
instead of upscaling the
video that we're outputting,
let's just crop the image
and not downscale as much.
This means that you'll be
getting a larger output
while retaining the original
detail that's coming
from the sensor.
Of course we can also
crop tighter than that.
So once we are requesting
a smaller source area
than the output, then we need
to upscale, and that's fine.
We have a property to let
you know when this is going
to happen, and this
is a property
of the new AV capture
device format
that Brad was just
talking about.
Depending on the resolution
of the video that's going
to be returned, each format has
a different threshold before
you hit upscaling.
So you can check the property
for each format and know
when you're going to
encounter this upscaling range.
To illustrate this, we have
a little animation here.
So the purple box would
be the same dimensions
as the video output, and the
red box would be the maximum, say,
that you want to zoom down to.
And as we increase the zoom
factor, we cross that threshold,
so now we're in the upscaling range;
as we zoom back out, we enter the
crop zoom section,
so that's just cropping
on the sensor.
And you kind of just go back
and forth across the transition.
You don't actually need to know
where that threshold
is, but it's there.
So to talk a little bit
about the API behind this,
there's the aforementioned video
zoom factor, which is applying
to all the image outputs.
And it's capped at a maximum value,
which is also a property of the
AV capture device format.
So if you just care
about the current format,
the active sensor format,
there's a property,
the device active format,
and then you can check
the max zoom factor
of the active format,
and that'll let you know
what the current maximum zoom
that you can use is.
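A minimal sketch of applying a zoom directly (the desired zoom value is hypothetical):

    CGFloat desiredZoom = 2.0; // whatever your UI asks for
    CGFloat maxZoom = device.activeFormat.videoMaxZoomFactor;
    if ([device lockForConfiguration:nil]) {
        device.videoZoomFactor = MIN(desiredZoom, maxZoom);
        [device unlockForConfiguration];
    }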
The device coordinates
are a little interesting
because they are going to
stay fixed through the frame.
You can think of this as an
optical zoom at the front
of the pipeline; the rest of
the pipeline sees an image
that has already been cropped.
And so all the device
coordinates are going
to be applying to the
image that is being shown.
So for instance, if we
set a focus interest point
on the soccer ball in the
corner, and then we zoom in,
as the image scales, the soccer
ball will go out of the field
of view, but that focus
point's going to stay fixed
in the corner where you set it.
Those corners are static.
Similarly, if you
have face detection enabled,
the faces will be returned
as they're being displayed,
and as the face goes out
of the field of view,
they will stop being detected.
There's a pre-existing
transformedMetadataObjectForMetadataObject:
method on the preview layer, which
also helps you coordinate these,
because we return the meta
data in device coordinates,
and if you want to convert it to
the preview layer coordinates,
this pre-existing method
will do that for you.
There's another interesting
aspect to this, in
that we are now applying zoom
to video outputs.
And there's a temporal
aspect there.
We don't want you to have
to increment the zoom
factor for each frame
as it's being captured --
you might have some threading
issues, and it might be hard
to synchronize your threads
and updates with the
capture of frames
as they're being received --
so we can do that internally.
And we have this new method,
ramp to video zoom
factor with rate.
So you can specify the target
factor, and the rate at which
to get there, and then we will
internally increment the zoom
factor on each frame as it's
being captured in real time.
There's another method,
cancel video zoom ramp,
so that any time you can
do this interactively.
You can either call ramp
to video zoom factor again
with a new rate or a new target,
or you can just cancel the
current one if the user,
you know, lets go of the button.
And then any changes in the
rate are smoothed further,
so that you don't have any jumpy
transitions within the zooms.
Now rates are a little tricky in
zoom, because the apparent speed
of a zoom is actually determined
by the multiplicative factor
that we're applying, it's
not an additive thing.
So the rate is specified
in powers of 2 per second.
So if you want a consistent
speed of doubling every second,
then you'll set a rate of 1.
When you set a rate of 2,
it'll go twice as fast,
if you set a rate of .5
it'll go half as fast.
What you see here on the right
is the graph showing a rate
of 1.
So essentially every
second we double the zoom.
And so we go from 1 to 2,
2 to 4, 4 to 8, and so on.
In practice you'll probably
want to stay around 1 to 3
for comfortable ranges, but
of course your app is welcome
to do whatever it wants.
To demonstrate this I'm going
to bring up Rob Simutis,
and he's going to
help me demo SoZoomy.
There we go.
Alright, so we're going to start
with a mode called
constant face size.
So there's a cinematic
effect called a dolly zoom,
where the zoom is changed
to keep a target object
a consistent size,
while the object is moving,
which causes the
background to shift.
So if I have Rob start walking
forward, tap on his face,
then if there's anything
in the background,
which it's mostly dark so
you'll have trouble seeing this,
but yeah, let's have
him try that again.
Keep backing up, and you can
kind of see -- there you go.
Let's see, I'll try again.
And we're losing the face.
But anyway, so that's
the constant face size.
There's another aspect
of this demo,
which I hope you'll recognize,
if he turns around and I zoom
in a little bit,
let's get a good size.
And let's go.
[ Music ]
[ Applause ]
So I think people have
a lot of fun with that.
Let's take a look at the
code that's running in this.
First off if you
notice there's a slider,
which I was using
to adjust the zoom.
And I was actually accounting
for that exponential growth
in the zoom, and the formula
for that is there on the slide.
Say your slider's going
from 0 to 1 -- you don't want
to directly send that
right into the zoom factor,
because that'll mean that it's
very sensitive on the wide end
of the zoom, it'll
be less sensitive
on the telephoto
end of the zoom.
By taking our maximum zoom
that you want to achieve,
and taking that to the power
of the current 0 to 1 target,
that'll give you that
exponential growth
over the range, and so
you get a linear feel
to the actual zoom motion.
And of course remember to lock
the configuration and unlock it
when adjusting these values.
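As a sketch, that slider mapping is just this (maxZoom is whatever maximum your UI targets, and slider is your UISlider; both are assumptions here):

    // slider.value runs from 0.0 to 1.0
    CGFloat zoom = pow(maxZoom, slider.value); // exponential mapping
    if ([device lockForConfiguration:nil]) {
        device.videoZoomFactor = zoom;
        [device unlockForConfiguration];
    }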
Now for speed control, such as a
jog dial, or maybe you just want
to have a button you
hold down to zoom in,
you'll typically want to set
the target either to the minimum
or maximum zoom, and
then the user will be interactively
controlling the rate.
So in this case, we look
to see if we're zooming in,
then we go to the maximum zoom,
otherwise we go to
the minimum zoom.
And we pass this to ramp
to video zoom factor,
and then you specify some rate.
And then at any point if
the user cancels the zoom,
then you cancel the video
zoom ramp, and we will ease
out of the ramp so that
it looks very silky.
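A sketch of that button-driven pattern (zoomingIn and the rate of 1.0 are up to your UI):

    // On button press: ramp toward an end point, at a rate in doublings per second.
    CGFloat target = zoomingIn ? device.activeFormat.videoMaxZoomFactor : 1.0;
    if ([device lockForConfiguration:nil]) {
        [device rampToVideoZoomFactor:target withRate:1.0f];
        [device unlockForConfiguration];
    }

    // On button release: ease out of the ramp.
    if ([device lockForConfiguration:nil]) {
        [device cancelVideoZoomRamp];
        [device unlockForConfiguration];
    }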
In summary, to compare it to
the video scale and crop factor,
they both apply to
still image output,
but our new video
zoom factor applies
to all the image outputs.
You can set the zoom
factor directly,
but the new video
zoom factor API
on the AV capture device lets
you set the zoom rate as well.
And this is currently
available on the iPhone 5
and iPod Touch 5th generation.
And with that I'd like
to welcome Rob back
up to present machine
readable codes.
[ Applause ]
>> Thank you, Ethan.
Hi, I'm Rob Simutis,
and I'm also
with core media engineering.
I'm here to talk to you today
about machine readable code
detection, which is a formal way
of talking about
barcode scanning.
We've introduced this in iOS 7
to do real-time machine
readable code detection
for one-dimensional and
two-dimensional barcodes,
up to four at a time, on both
the front and back cameras,
on all supported iOS 7 hardware
that has a camera on it.
You can see this in
action today in the seed
with the Passbook application.
In the upper right-hand corner,
there's now a scan code button.
So when you press that, you
get a view to scan in codes.
These are PDF417 or QR codes,
or Aztec codes that are
in the passbook format,
and they pull directly
into your passbook.
Now beyond just those types,
we actually support a number
of different types of
machine readable codes,
or symbologies -- UPC-E, often
found on products in stores;
EAN-8 and EAN-13, commonly found over
in Europe; and Code 39, Code 93,
and Code 128, some other types
of one-dimensional codes.
In the 2D space we support
three types, PDF417 often found
on airline passes, QR
codes found on buildings
and billboards, and corn
fields in some cases,
and Aztec which you often find
on packages that you ship.
So we'll demo this
in action today,
and I'll invite Ethan
to come back up.
And we have a sample
app we call QRchestra.
So Ethan has a version
of the application,
and his view has a
bunch of QR codes on it.
And each of these contains
a value that is a MIDI note.
And I'm just going to
have the scanner on mine,
if I can hold it steady.
[Several Beeps] So each of
the QR codes is a MIDI note,
and we translate that into
a string, and then we run it
through a synthesizer,
and then out.
And the detection's failing
'cause I'm shaking a little.
[Several Beeps] But
it allows you
to have a QRchestra
right in front of you.
There we go.
[ Beeping ]
Alright, there we go.
Thanks, Ethan.
So a couple of notes.
You'll be able to download
this with sample code along
with our slides and the other
demos that we have today.
But those QR codes were actually
being generated on the fly,
they weren't fixed images.
And those are being done
with a new Core Image filter
that is available in iOS 7, so
you can go and check out how
to make your own QR
codes on the fly.
[ Applause ]
So getting into the
programming model,
in iOS 6 we introduced the AV
capture meta data output class,
and this was originally done
for face detection data.
So we've expanded that, and this
is how we get bar codes out.
Normally you add it to
your capture session,
and it has a connection
to your capture device,
and so this would be
your video device.
And then your application
implements a meta data output
objects delegate.
And as we detect barcodes,
machine readable codes,
we will then send an array
of those AV meta data
machine readable code objects
to your delegate.
We'll take a look
at this in code.
First, alloc/init your meta data
output, add it to your session,
create your meta data delegate,
along with its dispatch queue,
set it on the output, and
then configure the types
of machine readable codes
that you're interested in.
Here we've set it up to
look for Aztec codes.
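That setup, roughly (the delegate object and queue here are yours; this sketch uses self as the delegate):

    AVCaptureMetadataOutput *metadataOutput =
        [[AVCaptureMetadataOutput alloc] init];
    [session addOutput:metadataOutput];
    dispatch_queue_t metadataQueue =
        dispatch_queue_create("metadata queue", DISPATCH_QUEUE_SERIAL);
    [metadataOutput setMetadataObjectsDelegate:self queue:metadataQueue];
    // Only ask for the symbologies you actually need.
    metadataOutput.metadataObjectTypes = @[ AVMetadataObjectTypeAztecCode ];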
Now in your meta
data output delegate,
you implement the
captureOutput:didOutputMetadataObjects:fromConnection:
method.
This will get called periodically,
and you'll receive an array
of AV meta data objects.
So because we're listening for
machine readable code objects,
we'll look and make
sure that they're
of the class AV meta data
machine readable code object.
And once we have one of
those, we can retain it
or use it further at that point.
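And the delegate callback, as a sketch:

    - (void)captureOutput:(AVCaptureOutput *)output
    didOutputMetadataObjects:(NSArray *)metadataObjects
           fromConnection:(AVCaptureConnection *)connection
    {
        for (AVMetadataObject *object in metadataObjects) {
            if ([object isKindOfClass:
                    [AVMetadataMachineReadableCodeObject class]]) {
                AVMetadataMachineReadableCodeObject *code =
                    (AVMetadataMachineReadableCodeObject *)object;
                // stringValue may be nil if the payload couldn't be decoded.
                NSLog(@"%@: %@", code.type, code.stringValue);
            }
        }
    }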
So let's take a look now at
what a machine readable code
object contains.
It contains a bounds property,
the bounding rectangle
that it's sitting within,
an array of corners
which are CG points
represented as dictionaries.
We'll cover the difference
between bounds
and corners here
in a later slide.
It also has a type property, so
it indicates whether it's UPC-E
or QR, EAN-8, etcetera.
And then finally the most
important one is the string
value property.
This is our best effort
attempt to decode the payload
into a string that
you can make use of.
Now I say best effort,
because in certain cases
with some barcodes, maybe it's
damaged beyond recognition,
but we can still tell maybe that
it's a QR code or Aztec code.
This property might return nil,
so your code should be
prepared to handle this.
Now, bounds versus corners.
So let's say you've got
your QR code scanning app,
and your user is holding it
off center, and it's sort
of off axis, and the
bounds property's going
to come back as a CG rect.
It is a rectangle that is
axis-aligned with the image.
But your corners are going
to come back as the corners
of where the barcode was
detected, and so this allows you
to draw a much tighter
fitting overlay
of where the barcode was found,
so you can give a better
representation onscreen.
So, performance considerations.
These are some things you
want to take into account
for your application to get
the best user experience.
To start off, you should
really just enable the codes
that you're interested
in finding.
You generally don't want to
turn all types of barcodes on,
because this takes more
CPU, more processing power,
and it hurts battery life.
So depending on your
application's needs,
just enable the codes
that you're interested in.
You can also make use of a new
AV capture meta data output rect
of interest property.
We've introduced this
in iOS 7, and I'll talk
about this in a little bit.
You also want to pick
the right session preset
for your use case.
Most applications can
start off with the 640
by 480 session preset.
Depending on the density of
the codes, you might want
to go higher or lower,
maybe up to 720p,
or something below 640 by 480.
But you can start there, and
adjust as your testing dictates.
You could also consider using a
new auto focus range restriction
API that can help you get faster
performance for auto focus,
and Brad's going to cover
this a little bit later.
And as Ethan said, you
could also make use
of the new zoom APIs to
get the barcode right,
nice and tight in your image.
So let's talk about
requesting the codes you want.
As before in iOS 6, you make use
of the AV capture meta data
output meta data object
types property.
And this is an array of string
constants, these are defined
in AVMetadataObject.h; you
can check out that header.
Now, with iOS 6 we had behavior
where all meta data types
would be turned on by default.
This was fine when
we just had faces,
but now we've introduced a
new type for each symbology
of machine readable
code that we detect.
So that's really not
the ideal situation.
So in iOS 7 you need
to explicitly opt
in to all desired meta
data object types.
If your app was built and
linked prior to iOS 7,
you get the old behavior
of face data only,
if that device supports it.
So here's an example of what
you probably want to avoid.
You can make use
of the available meta
data object types method
on the meta data output,
which is the array
of all the types
that that device supports,
and then you set it on
the meta data output.
This would enable
everything by default.
Most apps should avoid this.
Instead, do something like
specifying your array of types,
and here we're going to look
for faces, as well as QR codes,
so this is the way we
prefer you to do it.
Specify them as you need.
So here, now I can find
faces within my QR code.
Alright, let's talk about
limiting your search area.
The new property AV capture
meta data output rect
of interest was introduced,
and this is going
to help you narrow
the search window
for where you're scanning
for your meta data.
This works on faces
as well as barcodes.
By default, it's the
entire size of the image,
but you can restrict
that to a smaller portion
as your application needs.
And as Ethan talked about, and
also as Brad discussed last year
in WWDC slides, there's some
conversion that you need to keep
in mind when going between the
different coordinate spaces.
So the meta data output
rect of interest is
in the device's coordinate
space, which is different
from your preview layer,
or your video data output
coordinate space.
So we've provided conversion
methods that help you go
between those different
coordinate spaces,
and make it really easy.
Let's look at this
and visualize it.
So here I've got an app that's
doing a scan of a barcode.
You can see the sort
of highlighted region
up near the top.
That's the rect of
interest that I would
like to have in my application.
As far as the video
preview layer's concerned,
its coordinates are
in pixel coordinates,
so the upper left is 0,0, and
the bottom right is 320, 540.
The meta data output,
however, is different.
It's actually rotated
90 degrees,
and its coordinates
are normalized.
They're in scalar
coordinates from 0 to 1,
so 0,0 is the upper left
and 1,1 in the bottom right.
So if we want our rect of
interest to be at 100 pixels
down and 320 by 540, in the meta
data output's coordinate space
that's .1,0, and the
rectangle size is .4 by 1.0.
So you can see going back
and forth here could be a little
complicated, and it gets tricky
with mirroring and
video gravity,
and things of that nature.
So we've provided the methods
that help you go back and forth.
So when going from the
video preview layer
to the meta data output, you
use the AV capture video preview
layer meta data output rect
of interest for rect method.
To go the opposite way
from the meta data output
to the video preview layer,
use rect for meta data
output rect of interest.
We'll take a look at this
in code very briefly.
So using the previous example
that I showed you visually,
if I have my CGRect
that is the bounds,
I make my rectangle
that's 100 pixels down
and 150 pixels high,
and that's in the preview
layer's coordinates,
so let's go to the device's,
or the meta data
output's, coordinates,
and use the conversion method.
And then finally, we'll
set that rect of interest
on our meta data output.
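In code, that conversion is roughly (using the same 100-point offset and 150-point height from the example; previewLayer and metadataOutput are your objects):

    // 100 points down from the top of the preview layer, 150 points tall.
    CGRect layerRect = CGRectMake(0.0, 100.0,
                                  previewLayer.bounds.size.width, 150.0);
    // Convert from preview layer coordinates to the metadata output's
    // normalized, rotated coordinate space.
    CGRect outputRect =
        [previewLayer metadataOutputRectOfInterestForRect:layerRect];
    metadataOutput.rectOfInterest = outputRect;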
So all of these things
are good to keep in mind
to give your users the
best experience possible
when doing machine
readable code scanning.
And just to drill this
home, this is supported
on every single platform
where iOS 7 is supported.
And with that, I'm going to turn
it back over to Brad to talk
about some additional
focus enhancements.
Thank you.
[ Applause ]
>> Are your brains
exploding yet?
This is a little bit of
information overload,
but I hope that if
you'll just focus with me
for a few more minutes,
all will become clear.
Alright, focus enhancements.
Focus is a hard job.
We take it for granted
because our eyes do
such a good job of it.
The eyes are the motor
that change shape
and can bring different
things into focus,
and the brain is
the pixel processing engine that
can determine what's supposed
to be sharp, and what
you want to focus on.
The iPhone does the same thing.
It has a physical mechanism
that can move the lens
so that it can get
it into focus,
and there's some
algorithms that have to run
to decide what should
be in focus.
But this can be a
really hard job,
because sometimes you have
ambiguous results, such as here
where we have a person looking
at a clock that's very close
to him, and a tree that's far.
And both might be equally sharp,
and so which one do you choose?
Which one should be in focus?
Sometimes we need a little
help to get that right.
And you can make sure
that we get it right
by using the new auto focus
range restriction modifiers
to our auto focus mechanism.
Here's what it looks
like in code.
You tell the AV capture
device to set its auto focus
range restriction to just near,
or just far, or none -- the default --
which means search the whole range.
For machine readable code
detection, we'd recommend
that you use near, unless
again you're going to look
for barcodes that are in fields.
And these auto focus range
restrictions are supported
on all iOS devices that have
cameras, so go to town on it.
The next enhancement
is smooth auto focus.
I'm going to show you
two different videos.
The one on the left is what
I'll term fast focus --
this is what's shipping
today, this is our algorithm
for finding focus -- and
then what I term smooth focus
on the right.
You'll notice that it's going to
pan from one side to the other,
and then pan back, and you'll
see different characteristics
in the focus.
The one that's fast has a
tendency to sometimes pulse,
or throb a little bit,
because it's running
through the whole
range really fast,
whereas the smooth one runs
slower, takes a little more time
to do it, but doesn't
have the visual pulsing.
So here I go, 1, 2, 3.
Take a look at both sides.
You'll see the one on the left
has a tendency to just come in
and out a little
bit more noticeably.
The right one is focusing.
It's just doing it more subtly.
[ Silence ]
You'll still see it refocus --
every once in a while
you can definitely see that the
smooth one is bringing it back
into focus, but the left one
tends to be much more prominent.
Okay, like I said, the
fast focus is the one
that ships today.
We're offering now
the smooth focus
as an alternative behavior
modifier to auto focus,
and you do that by telling
the AV capture device
to set smooth auto
focus enabled to yes.
Smooth auto focus just
slows the focus scan down,
so it's a little less
visually intrusive.
We recommend that you use this
if you're recording a movie.
For instance, you want
to perform a tap to focus
in the middle of a recording.
Well you don't want
to ruin your recording
with a big vwoop in the middle.
So if you use the smooth
focus it will take a little
bit longer to get there,
but it will be less
visually intrusive.
We do recommend that you
stick with the fast focus
for still image taking, because
it's faster, and no one's going
to see the pulse in the
resulting still image
if it got to focus faster.
This is supported on iPhone 5.
And Slowpoke makes use of
this when recording movies,
so you can see how it does it.
Now let's look at how you
program with these modifiers.
It's easy, just do that.
So as with all setters
on the video device,
you have to lock it for
configuration first.
If you're successful, then you
can start checking whether these
features are available.
Don't set them blindly, you'll
throw an exception on platforms
where these features
are not supported.
Auto focus range
restriction happens
to be supported everywhere,
but be safe.
So here we're going to
set it to the far range,
and then in the next block I'm
seeing whether smooth auto focus
is supported, and
I set it to yes.
And then for giggles I
threw in an extra one,
which is to set the
point of interest.
This is how you would do a tap
to focus at a particular point.
None of these actually
start a focus operation,
they're just programming
the next focus operation.
You're telling it I
want you to focus far,
I want you to focus smooth,
I want you to focus
at dead center.
And then the way that you
actually kick off the focus is
to set focus mode to auto
focus, or continuous focus,
and then unlock when
you're done.
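Putting that whole sequence together, a sketch:

    if ([device lockForConfiguration:nil]) {
        if (device.isAutoFocusRangeRestrictionSupported) {
            device.autoFocusRangeRestriction =
                AVCaptureAutoFocusRangeRestrictionFar;
        }
        if (device.isSmoothAutoFocusSupported) {
            device.smoothAutoFocusEnabled = YES;
        }
        if (device.isFocusPointOfInterestSupported) {
            device.focusPointOfInterest = CGPointMake(0.5, 0.5); // dead center
        }
        // Nothing above starts a focus; this is what kicks it off.
        if ([device isFocusModeSupported:AVCaptureFocusModeAutoFocus]) {
            device.focusMode = AVCaptureFocusModeAutoFocus;
        }
        [device unlockForConfiguration];
    }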
The last bit we're going to
talk about today is integration
with application audio session.
Hopefully you've seen some
of the core audio sessions
in years past, and the one
that they just did I think it
was yesterday, where they talked
about some improvements
to AV audio session.
If you're not familiar
with it, you have one.
If you have an app, you
have an AV audio session.
It's a singleton instance
that every app gets,
whether they want it or not.
As soon as you use audio,
you have an AV audio session.
And it does important things for
you like configure the routing,
for instance, whether
both the microphones
and the speakers are
active at the same time,
or just speakers only.
You can customize your
category, for instance
so that you include Bluetooth
or not, lots of goodness there.
And new in iOS 7 they have
some great new features
for microphone selection
that they talked
about in yesterday's session,
where you can select a specific
microphone, top or bottom,
back or front, and you can
even set polar patterns.
For instance, if you want
an omnidirectional pickup
as opposed to cardioid
or sub-cardioid, so
great stuff there.
Why do I bring it up here in
a video recording session?
It's because we have a situation
on our hands called
dueling audio sessions,
and let me describe it to you.
Let's say you have an app,
and it plays back audio,
and it also does some
recording with the camera
and with the microphone.
Well, you're probably going to
be using an AV audio session,
because you're playing
some audio,
and you're definitely going to
be using an AV capture session,
because you have
to if you're going
to use it for camera capture.
Unbeknownst to you, AV
capture session is kind
of lousing things up, because
it has its own little private AV
audio session.
So now what happens when you
play and record? You fight.
So you get a situation
where depending
on which one you started
first, one is going
to interrupt the other.
So if you started playback first
and then you start recording,
the playback stops, or
if you do vice versa,
then you interrupt
your recording.
Not good for anyone.
So in iOS 7, we're
changing that behavior.
There were some good things
about the old behavior.
By having a private
AV audio session,
we ensured that the AV capture
session is always configured
correctly to succeed
for recording.
And your audio session is
not configured automatically
by default to record,
it's just for playback.
So we needed to do that.
But now we're going to help
out the interruption problem
by using your app's
audio session by default.
And we tell you that we're doing
that by accessing the session
property uses application
audio session.
And again, by default it's yes.
If your app is linked before
iOS 7 you get the old behavior.
We use our own little
private audio session,
and nothing changes.
And we still will configure your
audio session now, not ours,
so that it succeeds
for recording.
And that's the default behavior.
You can opt out of
that behavior,
and there's an accessor for that
called automatically configures
application audio session.
(We're going for length here.)
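As a sketch of those two knobs (the category comment reflects the warning below):

    // Default on iOS 7 when your app is linked against the iOS 7 SDK:
    session.usesApplicationAudioSession = YES;
    // Opt out of letting the capture session reconfigure your audio session:
    session.automaticallyConfiguresApplicationAudioSession = NO;
    // If you opt out, make sure your category still allows recording,
    // e.g. AVAudioSessionCategoryPlayAndRecord, or recording will fail.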
After the capture is finished,
we're not going to attempt
to clean up our mess, so we're
not going to try to preserve any
of the state that was in
your audio session before we
configured it to succeed.
If you want to stash off
some state you can do
that before beginning
your recording.
Be careful though.
If you opt out of the automatic
configuration that we provide,
because you are now in control
of your AV audio session,
you can pick a category that
will make recording fail.
So just be on guard there.
I mentioned earlier in the talk
that we have this great new way
of configuring AV
capture devices
by setting the active format.
This however, does not apply
to audio devices on iOS 7.
It exposes no formats --
its formats array is nil --
and that's because we already
have a perfectly good mechanism
on iOS 7 to configure audio,
which is the AV audio session.
If you want to configure
your input,
instantiate your AV audio
session, and then go to town -
setting gain, sample
rate, whatever you want.
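For example, a sketch of configuring input through the audio session (the values are illustrative, not recommendations):

    AVAudioSession *audioSession = [AVAudioSession sharedInstance];
    [audioSession setCategory:AVAudioSessionCategoryPlayAndRecord error:nil];
    [audioSession setPreferredSampleRate:44100.0 error:nil];
    if (audioSession.isInputGainSettable) {
        [audioSession setInputGain:0.5 error:nil];
    }
    [audioSession setActive:YES error:nil];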
Best practices, we do recommend
that you let us use
your app audio session
so that we don't have
the interruption problem,
and we do recommend that you let
AV capture session modify your
AV audio session by default,
because it'll succeed.
The exceptions to the rule would
be if you know that it's going
to do something that
you don't want to do.
For instance, by default it
will always pick the microphone
that's pointed the
same direction
as the camera that you're using.
So if you're using the
front facing camera,
it's going to pick the
microphone that's pointed
at the person's face.
If you for instance want to
use the front facing camera,
but also record from
something in the back,
you'll need to use your own AV
audio session configuration.
Lastly, sample code update.
Here's our dessert.
Last year we talked
about Video Snake,
which was a great demo app
that incorporates a lot
of capture aspects with OpenGL.
We've updated it this year,
incorporated iOS 7 APIs
and best practices,
including use of clock APIs
that I have not talked about
today, but they let you know
which clock we're
using, video or audio,
which one is the master
clock for a session.
It also illustrates best
practices with respect
to integration with OpenGL,
and writing with asset writers.
So please download it, and
model your code after it.
If you've been watching the news
-- the Apple news --
you probably were aware that two
weeks ago we introduced a new
iPod Touch.
It's the 16 gigabyte iPod Touch,
and what's unusual about it is
that it has no back
facing camera,
it only has a front
facing camera.
Well, if you have been
following Apple's sample code,
your app still works
with this new device,
because it would have picked
the right one by default.
So sample code is your friend.
Please use it.
Please model your code
after these samples
that we spend a lot of
time putting together,
because we want to make sure
you're using best practices
in your apps.
In summary, we talked about
user consent, transparency,
then we talked about
a lot of new features,
60 frames per second, video
zoom, barcodes, app audio session
integration, focus enhancements.
And all of these demos that we
showed you today are available,
so go download them and
take a look at them.
Documentation, and of
course, related sessions.
Some of these already happened,
but you can already look at them
in your WWDC app, because
they've already been posted.
They're amazing.
Thank you for coming today,
and enjoy the rest of the show.
[ Applause ]