Transcript
[ Music ]
[ Applause ]
>> Good morning, it's a pleasure
to be here with you.
My name is Emmanuel and I'm an
engineer on the Core Image team.
This morning we'll be looking at
creating photo and video effects
using depth.
Let's get started.
With iOS 11 we started
delivering portrait depth data
alongside your portrait still
images.
During last year's WWDC sessions
we showed how you can leverage
that depth data to achieve
amazing effects, such as forced perspective, simulated depth of
field effects, as well as
various foreground and
background separation effects.
This year we're extremely
excited to announce that we're
coming out with a new feature, a
portrait matte.
So during the first half of this
session we'll be focusing on
portrait still images and how
you can apply really great
effects on them.
During the second half, my colleague Ron from video engineering will be focusing on
using a TrueDepth camera to
achieve real-time video effects.
All right, let's take a look at
the portrait segmentation API.
So I mentioned a portrait matte,
so what is a portrait matte?
A portrait matte is a
segmentation from foreground to
background and what this means
precisely is that you have a
mask which is 1.0 in the
foreground and 0.0 in the
background and you get soft and
continuous values in between.
The portrait matte is of
extremely high quality and is
able to preserve fine details, such as curls and hair on the outline of your subject.
This is amazing.
The matte can be used to achieve a great many effects. Here is just
one of them where we essentially
do a foreground and background
separation by darkening the
background.
But we're really putting this
tool into your hands so that you
can create amazing apps and new
effects and delight your users.
All right, so the portrait
effects matte is coming to you
with iOS 12.
It is available for both the
front and the rear facing
camera.
It is available to you with
portrait still images and at the
moment only when there are
people in the scene.
Note that the portrait matte is
linearly encoded, so it's not
gamma encoded and what you get
is a grayscale buffer.
And also there's no guarantee
that you will be getting a
portrait matte alongside your
portrait still images, you
always do get the depth data but
you need to make sure to test
for its existence.
All right, let's take a look at
the API and how we can actually
load that data in.
So ImageIO provides a low-level
API that allows you to load
portrait effects matte.
So when calling CGImageSourceCopyAuxiliaryDataInfoAtIndex there's a new key you can pass, kCGImageAuxiliaryDataTypePortraitEffectsMatte.
And this call returns a
dictionary containing three main
pieces of information.
The image data itself as a
CFDataRef, metadata pertaining
to the buffer itself as a
CFDictionary, as well as
metadata pertaining to the
capture itself.
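Here's a minimal Swift sketch of that ImageIO call; the file name is a placeholder, and remember to handle the case where no matte is embedded:

```swift
import Foundation
import ImageIO

// A minimal sketch of the ImageIO path; the file URL is a placeholder and the
// nil case matters, since not every portrait image carries a matte.
let url = URL(fileURLWithPath: "portrait.heic")
if let source = CGImageSourceCreateWithURL(url as CFURL, nil),
   let info = CGImageSourceCopyAuxiliaryDataInfoAtIndex(
       source, 0, kCGImageAuxiliaryDataTypePortraitEffectsMatte) as? [AnyHashable: Any] {
    // The dictionary holds the matte pixels, a description of the buffer,
    // and metadata pertaining to the capture.
    let matteData = info[kCGImageAuxiliaryDataInfoData] as? Data
    let bufferDescription = info[kCGImageAuxiliaryDataInfoDataDescription] as? [AnyHashable: Any]
    let captureMetadata = info[kCGImageAuxiliaryDataInfoMetadata]
    _ = (matteData, bufferDescription, captureMetadata)
}
```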
AVFoundation also provides a
higher-level API that sits on
top of ImageIO that you can use.
So taking the output from CGImageSourceCopyAuxiliaryDataInfoAtIndex, you can feed it to the AVPortraitEffectsMatte class.
And what you get out of it is very simple: a CVPixelBuffer along with its pixel format type, so you can use that CVPixelBuffer for your further processing needs. It's really that simple.
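And here's roughly what that looks like in Swift, assuming you already have the dictionary from the ImageIO call above:

```swift
import AVFoundation

// Sketch: wrap the ImageIO auxiliary-data dictionary from the previous call
// in AVFoundation's higher-level class.
func portraitMatteBuffer(from auxiliaryDataInfo: [AnyHashable: Any]) throws -> CVPixelBuffer {
    let matte = try AVPortraitEffectsMatte(fromDictionaryRepresentation: auxiliaryDataInfo)
    // All you get back is a pixel buffer (kCVPixelFormatType_OneComponent8)
    // plus its pixel format type, ready for further processing.
    assert(matte.pixelFormatType == kCVPixelFormatType_OneComponent8)
    return matte.mattingImage
}
```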
AVFoundation also supports
portrait matte delivery at
capture time.
So starting with your typical
AVFoundation setup with your
AVCaptureInput, device, as well
as capture session.
The first thing you want to do
is make sure that your
environment supports the
delivery of the portrait effects
matte.
To do this you'll be checking isDepthDataDeliverySupported, as well as isPortraitEffectsMatteDeliverySupported.
The reason we check both depth and portrait effects matte is that they come together. You can opt in to get only the depth data, but whenever you want the portrait effects matte you also need to activate depth data delivery.
And to activate, or to opt in for, that delivery, make sure to modify your AVCapturePhotoSettings and set isPortraitEffectsMatteDeliveryEnabled, as well as isDepthDataDeliveryEnabled, to true.
Then at capture time your didFinishProcessingPhoto callback will give you the portrait effects matte.
It's really that simple.
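Put together, a hedged sketch of that capture flow might look like this (the settings here are minimal; your real configuration will differ):

```swift
import AVFoundation

// A minimal sketch of opting in to matte delivery; the session and output
// wiring around it are assumed to exist elsewhere in your app.
final class MatteCaptureDelegate: NSObject, AVCapturePhotoCaptureDelegate {
    func requestCapture(from photoOutput: AVCapturePhotoOutput) {
        // Only opt in when the output reports support; the matte always rides
        // along with depth, so both must be enabled on the output first.
        guard photoOutput.isDepthDataDeliverySupported,
              photoOutput.isPortraitEffectsMatteDeliverySupported else { return }
        photoOutput.isDepthDataDeliveryEnabled = true
        photoOutput.isPortraitEffectsMatteDeliveryEnabled = true

        let settings = AVCapturePhotoSettings()
        settings.isDepthDataDeliveryEnabled = true
        settings.isPortraitEffectsMatteDeliveryEnabled = true
        photoOutput.capturePhoto(with: settings, delegate: self)
    }

    func photoOutput(_ output: AVCapturePhotoOutput,
                     didFinishProcessingPhoto photo: AVCapturePhoto,
                     error: Error?) {
        // There is no guarantee a matte was produced, so always test for it.
        if let matte = photo.portraitEffectsMatte {
            let pixelBuffer = matte.mattingImage
            _ = pixelBuffer // feed this into your processing pipeline
        }
    }
}
```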
All right, Core Image also provides you with a way to load and save your portrait effects matte. A new option was introduced, auxiliaryPortraitEffectsMatte, which you pass when creating your image with contents of URL, and you get a CIImage back which contains the portrait effects matte.
Core Image also allows you to
save your portrait effects
mattes directly into your files.
To do this there's a new context option called portraitEffectsMatteImage: you pass in your CIImage containing the portrait effects matte, and then you can write your file to disk using, for example, writeHEIFRepresentationOfImage.
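Here's a small sketch of both the load and the save in Swift; the file paths, format, and color space are just placeholder choices:

```swift
import CoreImage

// Sketch of loading and saving the matte with Core Image.
let inputURL = URL(fileURLWithPath: "portrait.heic")
let outputURL = URL(fileURLWithPath: "edited.heic")

// Loading: ask for the auxiliary portrait effects matte instead of the primary image.
let matte = CIImage(contentsOf: inputURL, options: [.auxiliaryPortraitEffectsMatte: true])

// Saving: attach the matte back when writing the HEIF file out.
if let matte = matte, let primary = CIImage(contentsOf: inputURL) {
    let context = CIContext()
    try? context.writeHEIFRepresentation(
        of: primary,
        to: outputURL,
        format: .RGBA8,
        colorSpace: CGColorSpace(name: CGColorSpace.sRGB)!,
        options: [.portraitEffectsMatteImage: matte])
}
```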
All right so one thing that's
important to note here is that
the three images, so your RGB,
the depth buffer, and the
portrait matte buffer, live at different resolutions.
So for example, the portrait
matte for the rear facing camera
is half-size and the depth data
is even smaller.
So let's look at the images
side-by-side in the case of the
rear facing camera.
What this means is that in your applications you need to make sure to either downsample your RGB image to the size of your portrait depth or portrait matte, or upsample them to the size of your RGB image.
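For example, a simple way to upsample the matte to the RGB size might look like this (a sketch; you could equally downsample the RGB instead):

```swift
import CoreImage

// Sketch: scale the matte (or depth) up to match the full-resolution RGB image.
func upsampled(_ matte: CIImage, toMatch rgb: CIImage) -> CIImage {
    let scaleX = rgb.extent.width / matte.extent.width
    let scaleY = rgb.extent.height / matte.extent.height
    return matte.transformed(by: CGAffineTransform(scaleX: scaleX, y: scaleY))
}
```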
So that's all I wanted to talk
about today for the portrait
segmentation API and we have a
great demo for you to see this
live in action.
So during this demo I'll be
making use of a Jupyter Notebook
which is a browser-based,
real-time interpreter for
Python.
And we'll be making use of
Python bindings for Core Image
which we'll be introducing later
today in a separate session.
So let's start and load an image
that contains portrait depth and
portrait matte in.
So this is the image we're going
to be working with.
The first thing I want to show
you is what the depth data looks
like for that image, so let's
look at the two side-by-side.
So we have the portrait depth on
the left and the portrait matte
on the right-hand side and you
can just see how fine the
details are.
And we'll do a zoom crop in just
a minute so that you can better
appreciate just how high quality
it is.
Then next thing we do is we
resize the images.
As you see the RGB and the depth
data vary greatly in size.
So we resize our images and
let's have a look at them
side-by-side.
So we have our RGB and our depth
data on the right-hand side.
During the first part of this
demo I'll focus on depth data,
then we'll see how things get
much, much simpler when you use
a portrait effects matte.
So the effect I'm going to be
working on today is depth
thresholding and essentially
what I'll be doing is computing
a histogram of the gray level
values in my portrait depth.
And I'll be applying a threshold, or a clipping point, in that histogram so that everything becomes zero or one depending on whether it's sitting below or above that threshold.
Then we'll be closing holes in
the image by using morphological
closing operations and then
blurring the mask so that we get
a nice feathered look.
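If you wanted to build the same pipeline in Swift rather than through the Python bindings, a rough sketch might look like this; the disparity-above-threshold convention and the parameter choices are assumptions:

```swift
import CoreImage

// A rough Swift/Core Image equivalent of the demo pipeline. It assumes a
// disparity-style input where the foreground has the larger values; the
// threshold itself would come from the histogram percentile you pick.
func foregroundMask(fromDisparity disparity: CIImage,
                    threshold: Float,
                    closingRadius: Float,
                    featherRadius: Double) -> CIImage {
    // 1. Hard threshold: 1.0 above the clipping point, 0.0 below it.
    let thresholdKernel = CIColorKernel(source:
        "kernel vec4 thresh(__sample s, float t) {" +
        "  float v = s.r > t ? 1.0 : 0.0;" +
        "  return vec4(v, v, v, 1.0);" +
        "}")!
    var mask = thresholdKernel.apply(extent: disparity.extent,
                                     arguments: [disparity, threshold])!

    // 2. Morphological closing (dilate, then erode) to remove small holes and islands.
    mask = mask.applyingFilter("CIMorphologyMaximum", parameters: ["inputRadius": closingRadius])
    mask = mask.applyingFilter("CIMorphologyMinimum", parameters: ["inputRadius": closingRadius])

    // 3. Gaussian blur for the feathered look, cropped back to the original extent.
    return mask.applyingFilter("CIGaussianBlur", parameters: ["inputRadius": featherRadius])
               .cropped(to: disparity.extent)
}
```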
Let's have a look at this in
action.
So remember all of this is
executed live in the browser
using Core Image as a back end.
All right so the first thing I
want to show you is how changing the percentile point changes my mask here.
So the higher it is the less
aggressive I am on clipping the
foreground.
So let's pick a value that's
reasonable, maybe something like
this here.
And what you can see here is that there are regions, or islands as we call them, that are connected to my foreground, and there's a bit of the subject here I'd like to take out.
So what I'll do is I'll add a
bit of morphological closing.
Look at how this appears
magically.
If I go too far obviously I lose my entire subject; I don't want to do that.
So let's pick something like
this.
What I can do then is change the feathering, by applying the mask on top of it all. Let's take a look at how the RGB is thresholded in the back. This is not the effect we're going for, it's just to give a sense of how the mask is applied, so let's keep going.
So I've chosen a few parameters
for this thresholding here and
this is what I'll be using for
my foreground.
Next, I'm applying an effect
only on my foreground, so in
this particular case I'm using
the Core Image photo effect Noir
which turns everything grayscale
and adds a bit of contrast.
I'm doing an exposure
adjustment, as well as
desaturating my image slightly
and augmenting the contrast even
further.
Let's take a look at the output.
This is going to be the
foreground that I'll be using
and what I want to do here is
leverage the depth data mask
that I have to composite this
foreground onto a background.
Let's just generate a background
which is just a darker version
of the original image.
We can then composite the two
together using the Core Image
filter blendWithMask and we have
the result right there, it's
that simple.
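For reference, the same kind of composite in Swift might look like this; the exposure values are placeholders and the extra saturation and contrast tweaks from the demo are left out:

```swift
import CoreImage

// A simplified sketch of the composite: a Noir foreground over a darkened
// background, combined with the mask we just built.
func composite(original: CIImage, mask: CIImage) -> CIImage {
    let foreground = original
        .applyingFilter("CIPhotoEffectNoir")
        .applyingFilter("CIExposureAdjust", parameters: ["inputEV": 0.5])

    let background = original
        .applyingFilter("CIExposureAdjust", parameters: ["inputEV": -2.5])

    return foreground.applyingFilter("CIBlendWithMask",
                                     parameters: [kCIInputBackgroundImageKey: background,
                                                  kCIInputMaskImageKey: mask])
}
```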
All right.
Thank you.
[ Applause ]
Okay so as you saw that required
a bit of fiddling with the
parameters for clipping, for
smoothing, and then so on and so
forth.
The portrait effects matte enables you to do this without actually doing much processing. This is really exciting, so let's have a look at this.
So here we're starting with
another image that also has
portrait depth and portrait
matte information embedded into
it, so let's look at these.
As I mentioned earlier, this is
an extremely high-quality
foreground mask.
So let's have a look at a crop
on this hair.
Look at how fine the detail is,
this is beautiful.
On the right-hand side here is the depth, which shows you just how coarse the depth data is compared to the effects matte.
So we'll do another foreground separation effect here, similar to what we just did, but we'll use the portrait matte instead of the portrait depth.
So let's look at our foreground
similar as before but in this
case, we'll desaturate the
foreground slightly, add a bit
of contrast, as well as some
vibrance to the image.
Now let's generate a background,
in this case we'll do a disk
blur which is another Core Image
filter, as well as bring down
the exposure so things get
pretty dark.
But we still get a bit of
background remaining, it's quite
faint but it's still there.
And again, I use CIBlendWithMask, which is going to do the compositing for us with the mask, and we have the result, left and right.
Isn't this beautiful?
[ Applause ]
Thank you.
All right let's look at another
great demo which we call Big
Head.
So because the portrait matte is so fine, we can actually do things like change the relative size of our subject with respect to the background using it.
So let's do just that and we'll
do it live.
So here's our input image on the
left here and the portrait matte
on the right-hand side.
And what I'll be doing here is playing with the size of the subject in my frame; notice how the subject is getting smaller and bigger. And you can actually, you know, give more weight to the subject in your frame, but you can also do pretty cool things. Now that I have this, let's say that I pick my favorite size here.
You can do things like a pseudo simulated depth of field here, by giving more contrast to the foreground and using a blur in the background to give more pop to your subject.
All right let's take a look at
just another image to see just
how easy that is because they're
using the exact same pipeline
under the hood which is very
simple just using the portrait
matte and using it to blur the
foreground and background.
Again our input image here, we
can change the size of it, make
it bigger, then apply a bit of
contrast to give more focus to
our subject.
It's that easy, really exciting.
[ Applause ]
All right let's take a look at
another demo which we call
Marching.
I'm not even going to try to
explain what it does, let's just
take a look at the filter in
action.
There you go, fun stuff and just
because we can do it I can
expose how many of these I want
to stitch together.
So going from just a few to really, really pushing it way, way too far.
Really exciting stuff.
All right.
So that's it for this demo, I
hope you enjoyed this.
So if you'd like to know more
about using Python bindings for
Core Image I encourage you to
come to this afternoon's session
on Core Image Performance
Prototyping in Python.
That's all for me today, let me
introduce you to my colleague
Ron from video engineering who
will be talking about real-time
video effects with TrueDepth.
Thank you everybody.
[ Applause ]
>> Thank you Emmanuel.
Great photo effects, but what about video?
My name is Ron Sokolovsky and I
am from video engineering.
In this part we are going to leverage the TrueDepth camera to create similar effects with
real-time video, like for
example this background
replacement app.
In order to create such effects
we are going to deep dive into
the stream coming from the
TrueDepth camera, the
characteristics, best practices,
and challenges.
We are also going to show you
how to work with point clouds, a
completely different way to
process and render rich depth
information.
And that background replacement
app we're calling it Backdrop
and we'll show you how to make
it step-by-step.
But first things first: the stream from the TrueDepth camera is made of frames. Each frame is a depth map, a 2-D
image in which each pixel
contains the depth information
or the distance to the scene in
that direction.
We've chosen a specific coloring scheme: close pixels are colored in red, while far pixels are colored in blue.
In between them there is a
colorful spectrum so you can see
the texture of the depth map.
There are also black pixels; those are holes in the depth map. For those pixels we have no information about the depth.
We are releasing today a new
tool, a sample app for you to
explore this stream and we call
it TrueDepth Streamer.
You can slide between the video
stream and the TrueDepth stream.
Now because the TrueDepth camera
has active illumination even in
complete darkness while the
video is pitch black it is
business as usual for the
TrueDepth camera.
So now you see me and now you
don't.
[ Applause ]
So how do you add the stream
from the TrueDepth camera into
your application?
Well I'm glad you asked.
The first thing you need to do
is to discover the built-in
TrueDepth camera and then you
initialize the device capture
input.
And you add the depth data
output into your session.
At this point you're good to go,
you can start the session and
you will have the TrueDepth
stream with your session.
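A minimal sketch of that setup in Swift could look like this (error handling and the canAdd checks are omitted):

```swift
import AVFoundation

// A minimal sketch of the setup just described; real code should check
// canAddInput/canAddOutput and handle a missing device gracefully.
func makeTrueDepthSession(delegate: AVCaptureDepthDataOutputDelegate,
                          queue: DispatchQueue) throws -> AVCaptureSession {
    let session = AVCaptureSession()

    // 1. Discover the built-in TrueDepth camera (front position).
    guard let device = AVCaptureDevice.default(.builtInTrueDepthCamera,
                                               for: .video,
                                               position: .front) else {
        throw NSError(domain: "TrueDepthSetup", code: -1)
    }

    // 2. Initialize and add the device input.
    let input = try AVCaptureDeviceInput(device: device)
    session.addInput(input)

    // 3. Add the depth data output; frames will arrive on your delegate.
    let depthOutput = AVCaptureDepthDataOutput()
    depthOutput.setDelegate(delegate, callbackQueue: queue)
    session.addOutput(depthOutput)

    // 4. Start the session to begin receiving the TrueDepth stream.
    session.startRunning()
    return session
}
```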
This stream can come in two
forms of data, disparity or
depth.
Now disparity is the inverse of
depth and vice versa, so which
one should you choose?
Well disparity usually yields
better results, especially for
machine learning applications
but the depth data has more
meaning in terms of real-world
measurements.
Note that if you work with depth, the depth error grows with the depth squared. That means that an object at 1 meter would have four times the depth accuracy of an object at 2 meters.
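If you need to switch between the two, a small sketch of converting the incoming data to depth might look like this:

```swift
import AVFoundation

// Sketch: the stream can arrive as disparity or depth, and AVDepthData can
// convert between the two (one is the reciprocal of the other).
func asDepthInMeters(_ depthData: AVDepthData) -> AVDepthData {
    if depthData.depthDataType == kCVPixelFormatType_DepthFloat32 {
        return depthData
    }
    return depthData.converting(toDepthDataType: kCVPixelFormatType_DepthFloat32)
}
```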
We have two streams, video and
depth, and they don't
necessarily share the same
resolution.
The native resolution of the TrueDepth camera is VGA, or 640x480, and that's what you'll get if you choose a video preset with an aspect ratio of 4:3. If however you choose an aspect ratio of 16:9, you'll get a depth map of 640x360.
In both cases the depth map will
cover the entire field of view
of the RGB image.
Now we are talking about video
applications, so we are
crunching a lot of numbers very,
very fast and that could create
system pressure over time.
So you can test your application
and gauge the system pressure
level which goes from nominal to
fair, serious, critical, and
then shutdown.
And the responsibility is in
your hands because the system
will let you go all the way to
shutdown but when it does it's
bye-bye every capture device.
Another thing you can do is to
adopt a degradation scheme, if
the pressure level gets serious
you can reduce the frame rate to
15 frames per second or you can
choose a more elaborate scheme
with gentle degradation going
from 30, 24, 20 and 15 frames
per second anytime the pressure
level increases.
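A hedged sketch of such a degradation scheme, reacting to the system pressure state, might look like this:

```swift
import AVFoundation

// Sketch of a gentle degradation scheme: watch the device's system pressure
// and step the frame rate down 30 -> 24 -> 20 -> 15 before things reach shutdown.
final class PressureResponder {
    private var observation: NSKeyValueObservation?
    private let frameRates: [Int32] = [30, 24, 20, 15]
    private var step = 0

    func monitor(_ device: AVCaptureDevice) {
        observation = device.observe(\.systemPressureState, options: .new) { [weak self] device, _ in
            switch device.systemPressureState.level {
            case .serious, .critical:
                self?.reduceFrameRate(on: device)
            default:
                break
            }
        }
    }

    private func reduceFrameRate(on device: AVCaptureDevice) {
        guard step + 1 < frameRates.count else { return }
        step += 1
        let fps = frameRates[step]
        try? device.lockForConfiguration()
        device.activeVideoMinFrameDuration = CMTime(value: 1, timescale: fps)
        device.activeVideoMaxFrameDuration = CMTime(value: 1, timescale: fps)
        device.unlockForConfiguration()
    }
}
```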
So we have holes in the depth
map what can we do about it?
Well, in fact you could get the
stream already filtered for you.
There is a parameter called isFilteringEnabled and it defaults to true, which means you get a filtered depth map, smoothed spatially and temporally, and the holes are filled from the RGB image.
This is especially useful for
photography and segmentation
applications because you know
every time you query a pixel you
get a depth value.
In TrueDepth Streamer you can
switch to the filter stream and
see that it is smoother and the
holes are filled.
So this is great, but it is not
applicable to 100% of the use
cases.
If you're working with point
clouds or any type of real-world
measurements you're better off
staying with the raw data which
holds the highest fidelity.
If you do you will have holes,
you will have pixels marked as
zero, it does not mean that they
are the distance of zero meters
from the camera it just means we
have no information about them.
Therefore, you should watch out
for operations like averaging
and downsampling because you
don't want to mix those real
values with those zeros.
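For instance, a simple average that skips the holes might look like this (a sketch over a window of raw depth values):

```swift
// Sketch: when averaging (or downsampling) raw depth values, skip the zeros,
// which mark holes rather than a distance of zero meters.
func averageDepth(of window: [Float]) -> Float? {
    let valid = window.filter { $0 > 0 }      // drop the holes
    guard !valid.isEmpty else { return nil }  // a window that is all holes has no answer
    return valid.reduce(0, +) / Float(valid.count)
}
```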
But why do we even get holes?
Well the TrueDepth camera
detects objects up to a distance
of about 5 meters, but not all
materials are made the same.
Some materials have low reflectivity; they absorb most of the light. For example, this extreme scenario is a very low-reflectivity fabric; watch what happens when we switch to the depth map and I walk away from it.
Even though there are objects in
the scene with larger distance
we see holes forming on this
fabric because it's absorbing
most of the light.
If we switch to the filtered
stream and repeat the same
motion those holes are filled.
But it's not only about the amount of light reflected back, it's also about the direction in which it is reflected.
Some materials are specular or
shiny and they are very picky
and choosy in which direction
they send back the light.
An extreme scenario would be
this display, you can watch the
video stream to see the
reflection.
And when we switch to the depth
map holes are forming depending
on the angle between the device
and the screen.
And if we switch to the filtered
stream those holes are filled
but with less fidelity.
Another challenging scenario is
outdoor, typically in an outdoor
scene the background is very far
away so we don't expect to get
any depth on the background.
Also, the sun acts as an
aggressor to the active
illumination.
To demonstrate that I went
outside on a very sunny
afternoon, positioned myself
against the sun.
And when we switch to the depth map you can see there's no depth on the background, and we get some holes around the edges, specifically in this case the hair.
But still most of the depth on
the foreground is intact and
very useful.
One final point I want to cover for getting holes is the fact that, from the perspective of the TrueDepth camera, some of the projected light hits an object on the way back, so we get shadows from the parallax between the projector and the camera.
You can see an example on the
right side of this mug, but
something is different here.
This is not a depth map so why
do we even get holes?
Well in TrueDepth Streamer you
can switch from 2-D mode into
3-D mode, which gives us a point
cloud view.
With point clouds we can
dynamically change the
perspective of the scene
creating even more holes when we
do so.
And now I can ask you is this
mug half empty or half full and
the answer is we have no idea.
By virtually changing the point of view of the camera we don't add new information, and that is because the TrueDepth camera can do many things, but bending light into the mug is not one of them.
Let's see this live.
[ Applause ]
So I'm starting with a video
view and I want you to look in
this corner.
Watch what happens when I touch
the screen, I will touch my
forehead, you can see an
indication of the depth.
And if I move the phone you can see the value changing, and the reason I can do so is because we have the stream from the TrueDepth camera running as well, and you can see that it is overlaid on the video.
So we have this livestream 30
frames per second and we can
switch to the filtered stream
and then all the holes are
filled.
If I switch to the point cloud
view I can dynamically change
the point of view to the scene.
So even though I'm looking
directly to the device it looks
as if somebody's watching me
from up above.
Now the reason we call this a
point cloud is if I zoom in you
can actually see the points in
3-D space.
But being here with you in WWDC
I feel like I have to pinch
myself just to get things back
in perspective.
[ Applause ]
Thank you so that brings us to
point clouds.
How do we create them?
We're starting from a depth map,
a 2-D image in which the depth Z
is a function of the pixel
coordinates U and V.
And we want to transform it to a
new coordinate system in 3-D
space, X, Y, Z.
Now we already have Z right,
that's the depth from the depth
map but we want to get X and Y.
For that we need the help of the intrinsics matrix, which holds information for the focal length and principal point. If for example I want to get X, I need to start with the pixel coordinate U, subtract the principal point, multiply by the depth, and divide by the focal length. And naturally I have to do the same thing for the other dimension as well.
Now this Intrinsics Matrix is
accessible through the camera
calibration data.
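As a sketch in Swift, that unprojection of a single pixel might look like this:

```swift
import AVFoundation
import simd

// Sketch: unproject one depth-map pixel (u, v) with depth z into camera-space
// X, Y, Z using the intrinsic matrix from the camera calibration data. Note the
// intrinsics are given for intrinsicMatrixReferenceDimensions, so scale fx, fy,
// cx, cy first if your depth map is at a different resolution.
func unproject(u: Float, v: Float, z: Float,
               calibration: AVCameraCalibrationData) -> simd_float3 {
    let K = calibration.intrinsicMatrix   // columns: [fx 0 0], [0 fy 0], [cx cy 1]
    let fx = K.columns.0.x, fy = K.columns.1.y
    let cx = K.columns.2.x, cy = K.columns.2.y
    // Subtract the principal point, multiply by the depth, divide by the focal length.
    let x = (u - cx) * z / fx
    let y = (v - cy) * z / fy
    return simd_float3(x, y, z)
}
```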
In fact, this operation is done
in every frame of the TrueDepth
stream.
The reason for that is that the
video stream and the depth
stream are coming from two
separate cameras.
But because the TrueDepth camera
gives us a depth map we can
transform it into a point cloud
and re-project it to the
perspective of the RGB image so
the depth stream is already
registered on the video stream
for you and you get RGBD data.
Now, thank you.
Yeah, it's pretty cool.
Now these types of operations are best done in Metal graphics shaders.
And you can download the code
for TrueDepth Streamer and you
want to focus on two areas.
In the vertex shader we control
the location of the points,
we'll start with the depth map
and transform it to real-world
coordinates or X, Y, Z.
Then we can multiply it with a
view matrix to change the point
of view to the scene.
In the fragment shader we get the output of the vertex shader, but we have to check whether it's a real value or a hole in the depth map.
If it's a hole and it's marked
as zero we don't know its depth
so we cannot transform it to X,
Y, Z and we would need to
discard this point.
If it is a real value we can
sample the RGB texture and add
color to the fragment or point
in this case.
So I understand this part was a
bit technical and a lot of you
come from different backgrounds.
Have no fear, we have just the app for you, an app to replace your background. Let's see it live.
So I can put myself in Yosemite,
I can swipe down put myself in
something more abstract.
I can even go all the way to
Antelope Canyon, Arizona, it
took me 15 hours to get there
last time, I could have just
swiped down, saved a lot of
money on gas.
In fact, this application can
even put you in space where
nobody can hear you stream.
[ Applause ]
So how do we create that?
Anytime we're dealing with a video application there are other things going on on a per-frame basis. In this case we have to detect a face, create a brand-new mask from the depth map, and smooth and upscale it to the RGB resolution. And then we take this foreground mask and upscale it again to the loaded background image. And then we can blend or composite them, but there's something we can do to reduce some of the complexity.
If anytime we load a background
image we resize it to the RGB
resolution just once not
per-frame, then we don't need
that second upscale and the
blending is done at low
resolution which makes a big
difference.
So let's deep dive into those
depths.
The first thing we need is to
find the center of the face.
And in iOS there are actually
quite a few ways you can get
face metadata.
You can use a Core Image detector or the Vision framework, but in this case, since we just need the pixel at the center of the face, we can use an AVMetadataFaceObject.
But it gives us the center in the coordinate system of the RGB image, and we need to map it to the depth map, which might not be at the same resolution.
Once we have the value of the depth of the face, we can use it plus a characteristic margin of 25 centimeters to threshold the depth map and create a binary mask: foreground is one, background is zero.
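A rough sketch of those two steps, mapping the face center and deriving the cutoff, might look like this; the DepthFloat32 format is an assumption:

```swift
import CoreVideo
import CoreGraphics

// Sketch: map the face center (reported in RGB coordinates) into the depth map,
// read the depth there, and derive the cutoff for the binary foreground mask.
// Assumes a DepthFloat32 depth map; 25 cm is the characteristic margin.
func foregroundCutoff(faceCenterInRGB point: CGPoint,
                      rgbSize: CGSize,
                      depthMap: CVPixelBuffer,
                      margin: Float = 0.25) -> Float {
    // The two buffers may differ in resolution, so scale the coordinates.
    let width = CVPixelBufferGetWidth(depthMap)
    let height = CVPixelBufferGetHeight(depthMap)
    let x = Int(point.x / rgbSize.width * CGFloat(width))
    let y = Int(point.y / rgbSize.height * CGFloat(height))

    CVPixelBufferLockBaseAddress(depthMap, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(depthMap, .readOnly) }

    let rowBytes = CVPixelBufferGetBytesPerRow(depthMap)
    let base = CVPixelBufferGetBaseAddress(depthMap)!
    let row = (base + y * rowBytes).assumingMemoryBound(to: Float32.self)
    let faceDepth = row[x]

    // Everything closer than faceDepth + margin is foreground (1), the rest background (0).
    return faceDepth + margin
}
```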
In fact, we can stop here, we
can use this binary mask and
create the effect.
The transition from background
to foreground will be very
sharp, but we'll get some
fidgeting around the edges.
So we want to filter it a bit.
The first stage will be to apply
some smoothing from the
background to the foreground, in
this case Gaussian Blurring.
The radius of Gaussian Blurring
will determine the slope of the
transition and you can play with
the value to get different
effects.
Another processing stage we add
is gamma adjustments, it allows
us to further fine-tune this
transition from background to
foreground.
If we use a gamma value which is
higher than one we'll get a
narrower foreground mask.
On the other hand, if we use a
gamma value that is smaller than
one we'll get a wider foreground
mask and maybe some aura.
So you can create different
effects by combining those two
parameters.
If you use a large blur radius
and a large gamma value you
create this transparent
transition that makes you seem
as if you're a hologram in space
or similarly it could be
underwater and you can play with
the values to create different
effects.
If I keep the radius high and
reduce the gamma value to a very
low number I create this halo
around my head.
So you can play with this to
create your own effects.
How do we implement this?
In Core Image it is very
straightforward, we can
concatenate three filters in a
row.
We start with a Gaussian Blur,
we add the gamma adjustment, and
we upscale to the RGB
resolution.
But there are a couple of small
points I want to emphasize as
best practices.
Anytime you work with a convolution-based operation such as Gaussian blurring, the best practice is to start by clamping to extent.
By repeating the border pixels
outwards we can make sure all
the borders of the image are
handled correctly by the filter.
Moreover, after the filtering
and just before the upscaling
the best practice will be to
crop back to the original extent
because that's the part of the
image we really care about.
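Putting those best practices together, a sketch of the whole refinement chain might look like this; the parameter values are up to you:

```swift
import CoreImage

// Sketch of the refinement chain just described: clamp to extent, Gaussian
// blur, gamma adjust, crop back, then upscale to the RGB resolution.
func refinedMask(_ mask: CIImage, blurRadius: Double, gamma: Double, rgbExtent: CGRect) -> CIImage {
    let filtered = mask
        .clampedToExtent()   // repeat border pixels so the blur handles edges correctly
        .applyingFilter("CIGaussianBlur", parameters: ["inputRadius": blurRadius])
        .applyingFilter("CIGammaAdjust", parameters: ["inputPower": gamma])
        .cropped(to: mask.extent)   // crop back to the part of the image we care about

    // Upscale the low-resolution mask to the RGB resolution.
    let scaleX = rgbExtent.width / mask.extent.width
    let scaleY = rgbExtent.height / mask.extent.height
    return filtered.transformed(by: CGAffineTransform(scaleX: scaleX, y: scaleY))
}
```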
At this point we have an alpha
matte of the foreground and you
can use it to create different
kinds of effects for the
background and the foreground
just like Emmanuel showed in the
first half.
In Backdrop, we blend the RGB
stream with a loaded background
image in a single line of Core
Image code using the alpha matte
we created from the TrueDepth
camera to create this background
replacement effect.
So the TrueDepth camera gives us a depth map at a resolution of 640x480, coming at you at 30 frames a second, already registered to the video stream.
You can use it to create point
clouds and dynamically change
the perspective of the scene or
use the filtered depth to create
different kinds of video
effects.
You can go to the webpage and download the Jupyter Notebook, TrueDepth Streamer, and Backdrop, and we hope they inspire you as a starting point for many new cool video effects to create in your applications.
Come say hi at the AVCapture
lab, thank you so much for your
time.
Have a great day.
[ Applause ]