Transcript
>> All right. Thank you very much, and welcome. My name is David Hayward, and I'm really excited to talk to you today about all the great new features and functionality we've added to Core Image on iOS, macOS, and tvOS.
We have a great agenda today.
I'll be giving a brief overview
of Core Image and a summary of
the highlights of what we've
added this year.
Then after that we'll spend the
rest of the hour going into deep
details on all our new APIs and
new features.
So, with that in mind, let's dive in.
So a brief overview of Core
Image.
In a nutshell, Core Image
provides a simple
high-performance API to apply
filters to images.
These filters can be used to adjust the color, adjust the geometry, or perform complex reductions or convolutions.
These filters can be combined into either simple chains, like this example here, or complex graphs, and Core Image uses several tricks to get the best performance out of them.
One of the tricks Core Image uses to get great performance is automatic tiling. If an image is too big, or rendering a graph would require too much memory, tiling allows us to reduce our memory footprint.
Another benefit of this tiling is that if we're only rendering part of an image, we can be very efficient and load only the portion of the input image that's needed.
So these are great performance
things that you get for free by
using Core Image.
Also when you're writing your
own kernels in Core Image, our
language extensions provide you
the ability to get this
functionality for free with very
little difficulty.
Another thing to keep in mind is that every filter in Core Image is based on one or more kernels, either our built-in kernels or custom kernels.
And another trick Core Image
uses to get great performance is
to concatenate these kernels
into programs.
This allows Core Image to reduce
the number of intermediate
buffers, which allows us to
reduce our memory requirements
and also improves image quality.
So with that introduction
underway, I'm going to give a
brief description of what we've
added to Core Image this year.
It falls into three main
categories.
First, of course, is
performance.
As always, this is something
that's very important to Core
Image, and we've added some
enhancements in this area to
give your application the best
performance.
Second, we've spent a lot of time this year enhancing Core Image to give you, the developer, better information about how Core Image works internally. You can now see how we achieve all of the internal optimizations that I alluded to on the previous slides.
Thirdly, we've added a lot of
functionality, and this is to
allow your applications to get
the best access to all the new
features on our platform.
So a little bit more detail on
these.
In the area of performance, Core Image now allows you to write CIKernels directly in Metal. And we also have a new API for rendering to destinations.
We'll be talking about this in
much more detail later on in the
presentation.
In the area of information, we
have a new API that allows you
to get information about what
Core Image did for a given
render.
And also, we have some great new
Xcode Quick Look support, which
we'll show you.
And in the area of
functionality, we have some
great new stuff as well.
We have a new collection of
filters, new barcode support,
and also support for editing
depth.
I want to call out the session
that occurred earlier today on
image editing with depth.
If you didn't see it, you should
definitely go back and watch it.
It goes into great detail about
how to use Core Image to edit
depth.
So now let me talk in more
detail about the new filters
that we've added this release.
We now have 196 built-in filters, and we've added some great new ones this year.
For example, some of these are
very useful when you're working
with depth data.
For example, we have convenience
filters for converting between
depth and disparity.
We also have morphological operations, which let you erode and dilate an image; this is useful for manipulating depth masks.
We also have convenience filters
that allow you to combine an
image with two different color
cubes based on the depth of an
image.
I also want to call out a great new filter, which we talked about in the earlier editing session, called CIDepthBlurEffect. It allows your application to get access to the great depth blur effect that we have in our Camera and Photos applications.
Again, I highly recommend you
watch the image editing and
depth session that was recorded
earlier today.
We also have several other new
filters based on popular
requests.
We have a filter now that allows
you to generate an image from
text, which is great for
allowing you to add watermarks
to video or other textual
overlays.
We have a filter that allows you
to compare two images in
LabDeltaE space, which is great
for seeing if your results are
what you expect or to give a
user information about how much
an image might have changed.
We also have a new bicubic upsample or downsample filter, which is great for a variety of purposes.
We also have a new way of
generating barcodes, which we'll
talk about in more detail later
in the presentation.
Lastly, in the area of filters, some filters have been improved since our last release. Several of the blend mode filters now behave more as you'd expect.
And we've also greatly improved the quality of the demosaic and noise reduction filters that are part of our RAW pipeline. So, as we release support for new cameras, you'll see those improvements.
So that's new filters.
I'd like to bring Tony up to the
stage who'll be talking in
detail about how to write
kernels directly in Metal, which
is a great new feature.
[ Applause ]
>> All right.
Thank you David.
Good afternoon everyone.
My name is Tony, and I'm really
excited to tell you about this
great new feature we've added to
Core Image.
So let's get right to it.
So first, let's put this in a little bit of context by referring back to the simple filter graph that you saw earlier.
What we're talking about now are
these kernels that you see here
at the bottom, which allow you
to implement your very own
custom code that will describe
exactly how you want the pixel
to be processed on the GPU.
So previously these kernels were written in the CIKernel Language, a shading language based on GLSL that also provided some extensions allowing Core Image to enable automatic tiling and subregion rendering.
For example, we had a function
called destCoord that lets you
access the coordinate of the
destination that you are about
to render to regardless of
whether you're just rendering a
subportion of the output or if
the output image is tiled.
We also have a couple of
functions called
samplerTransform and sample that
let you sample from an input
image regardless of whether the
input image is tiled.
So again, as David mentioned
earlier, this provides a nice
abstraction to tiling so that
you don't have to worry about
that when writing your kernels.
So once these kernels are
written, they are then
translated, concatenated
together as much as possible
with other kernels, and then
compiled at runtime to either
Metal or GLSL.
Now for a rich language like
Metal, the compilation phase can
actually take quite a long time
at runtime.
And, in fact, in the worst case, if you're rendering a very small image or a relatively simple filter graph, most of that time could actually be spent compiling rather than rendering.
So to show you an example of
that, here's a case where on the
very first render before any of
the compilation has been cached,
you can see there's a lot of
time spent compiling versus
rendering.
So if we step through these
stage by stage, the first step
is to translate the CIKernels
shown in blue.
And the second stage is to
concatenate the CIKernels.
And then we go through a phase
to compile the CIKernels to an
intermediate representation,
which is independent of the
device.
And then there's a final stage
to actually compile that IR to
GPU code to be executed on the
GPU.
So the problem here is that concatenating CIKernels has always been done dynamically at runtime. But what if we were to allow that stage to happen after the compilation? That would let us hoist the really expensive compilation up to build time, leaving behind only the work that needs to be done at runtime.
So, as you can see, this is now
a much more efficient use of the
CPU, not to mention lower power
consumption.
So I'm pleased to say that this
is now possible in Core Image
and that's by writing CIKernels
directly in Metal.
And to make this happen required
some really close collaboration
with both the Metal Framework
Team and the Metal Compiler
Team.
And we think this is going to
open up doors to some really
exciting new opportunities.
But first, let me just highlight
some of the key benefits that
you're already getting today.
So, as you saw earlier, now the
CIKernels can be precompiled
offline at build-time.
And along with that, you can get
some really nice error
diagnostics.
So if you have a typo or a mistake in your kernel, you can see that directly in Xcode, without having to wait for the runtime to detect it.
Second is now you have access to
much more modern language
features since Metal is a
relatively new language that was
based on C++.
And I want to stress that when writing CIKernels in Metal, you still get all the benefits, such as concatenation and tiling, that have been the cornerstone of our Core Image framework for many years.
So nothing is compromised by
writing CIKernels in this new
way.
And furthermore, these new
CIKernels in Metal can also be
mixed with traditional
CIKernels.
So [inaudible] can contain
either traditional kernels or
kernels written in Metal.
And that allows this feature to
be maximally compatible with
your existing application.
And, as you would expect, this feature is supported on a wide variety of platforms, namely iOS for A8 or newer devices, as well as macOS and tvOS.
So now let's take a look at how
we go about creating these Metal
CIKernels.
The first step is to write your
CIKernel in a Metal shader file.
Then once you have that CIKernel
implemented, the second step is
to compile and link the Metal
shader file in order to generate
a Metal library that can then be
loaded at runtime.
And then a final step is to just
initialize the CIKernel with any
function from that Metal
library.
So let's take a closer look at
the first step, writing a
CIKernel in Metal.
So to do that, I'd like to introduce you to our new CIKernel Metal library. What that is, basically, is a header file that contains our CIKernel extensions to the Metal shading language.
So namely we have some new data
types, such as destination,
sampler, and sample.
Destination lets you access all
the information that you need
that pertains to the output.
And sampler lets you access all
the information that pertains to
the input image.
And sample is a representation
of a single-color sample from an
input image.
And along with these types we
also have some convenience
functions that are very useful
for image processing.
For example, you can do premultiply and unpremultiply, as well as some color conversions between different color spaces.
So these new extensions are semantically the same as they used to be in the CIKernel language. There are just some slight syntax differences pertaining to the destination and sampler types, so let me show you that in a little bit more detail.
So here's a snippet of what our
CIKernel Metal library looks
like.
It is called CIKernelMetalLib.h,
and all our extensions are
declared inside a namespace
called coreimage to avoid any
conflicts with Metal.
So the first type we have defined is called destination, and it has a method that lets you access the coordinate of the destination.
Previously, if you were writing
CIKernels in the CIKernel
language, you would have done
that via a global function
called destCoord.
But now, if you're writing
kernels in Metal, you need to
declare this type as an argument
to your kernel in order to
access that method.
And then the second type we have defined is the sampler. This has all the same methods that used to exist as global functions, but they are now implemented as member functions on the sampler type.
So to give you a nice summary of
all that, here's a table that
shows you the syntax that used
to exist in CIKernel language
versus the syntax that is now
available in Metal.
And, as you can see, in the
CIKernel language, those are all
implemented as global functions,
but now with Metal, those are
all member functions on their
appropriate types.
So we think the new syntax will let you write code that is more concise and easier to read.
But for the sake of portability, we did include the global sampler functions in our header, which are merely wrappers around the new syntax. That will help minimize the amount of code changes you need to make if you're porting existing kernels to Metal.
So now let's take a look at some
examples of CIKernels in Metal.
The first one we're going to
look at is a warp kernel.
And as with all Metal shaders, the first thing you need to include is metal_stdlib. But for CIKernels you also need to include our Metal kernel library, which can be done by just including the umbrella header CoreImage.h. Then the next step is to implement all your kernels inside an extern "C" enclosure, which allows the kernels to be accessible at runtime by name.
So here we have a simple kernel
called myWarp.
And all it takes is a single argument, a destination type. From that destination, you can access the coordinate you're about to render to, apply whatever geometric transformations you want to it, and then return the result.
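To make that concrete, here is a sketch of what such a warp kernel might look like in a Metal shader file. The kernel name myWarp comes from the talk; the body shown on the slide isn't captured in the transcript, so the warp itself (a simple horizontal flip) is illustrative:

    #include <metal_stdlib>
    using namespace metal;
    #include <CoreImage/CoreImage.h> // includes the CIKernel Metal library

    extern "C" { namespace coreimage {

        // A warp kernel returns, for each destination coordinate,
        // the source coordinate that should be sampled.
        float2 myWarp(destination dest) {
            float2 dc = dest.coord();    // the coordinate being rendered to
            return float2(-dc.x, dc.y);  // an illustrative geometric transform
        }

    }}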
And for sake of comparison,
here's that same warp kernel
that was implemented in the
CIKernel language.
So you can see they're almost
identical minus some minor
syntax differences.
But semantically they are the
same, and at the end of the day
compiled to the exact same GPU
code.
The second example here is a color kernel. For the most part it looks very similar; the only difference is that now we have a kernel called myColor, which takes a single sample as input.
From that sample, you can apply
various color transformations
that you want on it and again
return the result.
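Here is a sketch of that color kernel in Metal; again, the actual body from the slide isn't captured, so the color transform is illustrative:

    // In the same Metal shader file as the warp example above.
    extern "C" { namespace coreimage {

        // A color kernel takes one color sample in and returns one color out.
        float4 myColor(sample_t s) {
            return float4(s.rgb * 0.5, s.a);  // illustrative: darken the pixel
        }

    }}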
Here again is that same color
kernel implemented in the
CIKernel language.
And then the last example I want to show you is a general kernel, which you can use if you can't implement your kernel as either a warp or a color kernel.
And so here we have a kernel
called myKernel.
And it takes a single input,
which is a sampler type, and
from that sampler, you can
sample anywhere in the input
image and take as many samples
as you need.
And again, do something
interesting with it and return
the result.
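And a sketch of such a general kernel, again with an illustrative body:

    // In the same Metal shader file as the examples above.
    extern "C" { namespace coreimage {

        // A general kernel takes a sampler and may take as many samples
        // as it needs, anywhere in the input image.
        float4 myKernel(sampler src) {
            float4 s = src.sample(src.coord());  // sample at the current position
            return float4(1.0 - s.rgb, s.a);     // illustrative: invert the color
        }

    }}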
And one more time, here is that same CIKernel written in the old CIKernel language.
So now that you have a CIKernel
implemented in Metal shader
file, the next step is to
compile and link the Metal
shader.
So, for those who have
experience writing Metal
shaders, this build pipeline
should look very familiar.
It's basically a two-stage process: the first stage compiles the .metal file to a .air file, and the second stage links the .air file and packages it up into a .metallib file.
The only additional thing you
need to do here for CIKernels is
specify some new options.
The first option you need to
specify is for the compiler.
It is called -fcikernel.
And then the second option is
for the linker and it's called
-cikernel.
Note that there's no f on that
option.
And you can do that directly in
Xcode, and let me show you that
with a short little video clip
that illustrates how that can be
done.
So for the compiler option, you can just look up the Metal compiler build options and specify -fcikernel directly in Other Metal Compiler Flags.
And because we don't have a UI for linker options, to specify that you have to add a user-defined setting. Give that setting a key called MTLLINKER_FLAGS, and the value that you specify is -cikernel.
So you just need to set this up
once for your project.
And then all the Metal shaders
that you have in there will be
automatically compiled with
these options.
But if you prefer to do things [inaudible] or in a custom script, you can also invoke those two compiler and linker tools, like so.
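That invocation might look roughly like the following sketch; the file names are placeholders, and you should check the tools' documentation for exact usage:

    # Compile the .metal source to a .air file, passing -fcikernel:
    xcrun metal -c -fcikernel MyKernels.metal -o MyKernels.air

    # Link the .air file into a .metallib, passing -cikernel (no f):
    xcrun metallib -cikernel MyKernels.air -o MyKernels.metallib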
So now the last and probably the
easiest step is to initialize a
CIKernel with a given function
from the Metal library.
And so to do that we have some
new API on our CIKernel class,
and they allow you to initialize
the CIKernel with a given
function by name as well as a
Metal library that you can load
at runtime.
There's also a variant on this
API that lets you specify an
output pixel format for your
kernel.
So if your kernel is just going to output some single-channel data, you can specify a single-channel format for that kernel.
So here's an example of how to
initialize the CIKernel.
All it takes is those three
simple lines.
The first two are for loading the Metal library, which, by default, if it was built in Xcode, will be called default.metallib. And then once you have that data loaded, you can initialize the CIKernel with a given function name from that library.
Similarly, warp and color kernels can be initialized with exactly the same API.
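In Swift, those three lines might look something like this sketch, using the myColor kernel from the earlier example; error handling is omitted for brevity:

    import CoreImage

    // Load the Metal library that Xcode built (default.metallib), then
    // initialize a kernel with a function from that library by name.
    let url = Bundle.main.url(forResource: "default", withExtension: "metallib")!
    let data = try Data(contentsOf: url)
    let kernel = try CIColorKernel(functionName: "myColor",
                                   fromMetalLibraryData: data)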
So once you have that kernel
initialized you can apply that
however you like to produce the
filter graph that you desire.
So that's all there is to writing CIKernels in Metal. We think this is going to be a great new workflow for developers, and we look forward to seeing some amazing things that you can do with this new capability.
All right.
So now the next topic I'd like
to talk about is a new API that
we have for rendering to
destinations.
And this is a new consistent API
across all the different
destination types that we
support.
Namely IOSurfaces (which, by the way, are now public API on iOS), as well as CVPixelBuffers, Metal and OpenGL textures, or even just some raw bitmap data that you have in memory.
And one of the first things you'll notice with this new API is that it will now return immediately if it detects a render failure, and give you back an error indicating why it failed. So now you can actually detect that programmatically in your application and fail gracefully if an error is detected.
With this API, you can also set
some common properties for the
destination object, such as an
alpha mode or a clamping mode
behavior, or even a colorspace
that you want to render the
output to.
Previously, with our existing API, the alpha mode and clamping mode were determined implicitly, based on the format of your destination. But now, you can explicitly override that with the behavior that you want.
In addition to these common
properties, we have some new
advanced properties that you can
set on the destination, such as
dithering and blending.
So, for example, if you have an 8-bit output buffer that you want to render to, you can simply enable dithering to get a greater perceived color depth and reduce banding artifacts that you may see in certain parts of the image.
And a nice thing about these properties is that they effectively reduce the need to create multiple CIContexts. That's because some of these properties used to be tied to the CIContext, so if you had multiple configurations of different destinations, you would have had to create a CIContext for every single one.
So now that these properties are
nicely decoupled, you can, for
the most part, just have one
CIContext that can render to
various different destinations.
But along with all this functionality, there are some really great performance enhancements that can be realized with this new API. For example, our existing CIContext APIs for rendering to IOSurfaces or CVPixelBuffers used to return only after all the work on the GPU was completed. But this new API will return as soon as the CPU has finished issuing all the work for the GPU, without having to wait for the GPU work to finish.
So we think this new flexibility
will now allow you to pipeline
all your CPU and GPU work much
more efficiently.
So let me show you an example of
that use case.
So here we have a simple render
routine that is going to clear a
destination surface and then
render a foreground image over
top of a background image.
So the first thing we do is initialize a CIRenderDestination object given an IOSurface. Then we get a CIContext and start a render task to clear the destination. But before waiting for that task to actually finish, we can now start another task to render the background image to the destination.
And now, before we start the final task, we can set a blend kernel on this destination object, which can be any one of our 37 built-in blend kernels. In this case, we've chosen a sourceOver blend.
But you can even create your own
custom blend kernel by using our
new CIBlendKernel API.
So once we have the blend kernel
that we want, we then call
CIContext to start the final
render task to render the
foreground image over top of
whatever is already in that
destination.
And only then do you need to
call waitUntilCompleted, if you
need to access the contents on
the CPU.
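Put together, that flow might look roughly like this sketch in Swift; the context, the IOSurface, and the foreground and background images are assumed to already exist, and error handling is omitted:

    import CoreImage

    let destination = CIRenderDestination(ioSurface: surface)

    // Start a task to clear the destination; no need to wait on it.
    _ = try context.startTask(toClear: destination)

    // Start rendering the background before the clear has finished.
    _ = try context.startTask(toRender: background, to: destination)

    // Set a blend kernel (one of the built-ins) for the final pass.
    destination.blendKernel = CIBlendKernel.sourceOver

    // Render the foreground over whatever is already in the destination.
    let task = try context.startTask(toRender: foreground, to: destination)

    // Only wait if you need to access the contents on the CPU.
    _ = try task.waitUntilCompleted()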
So this new setup will minimize the latency of getting your results, without having to do any unnecessary synchronization with the GPU.
The next use case I'd like to
illustrate is one that will
highlight a much more subtle
performance benefit, but it can
have a huge impact in your
application.
And that's rendering to Metal
drawable textures.
So you can do that very simply
by getting a currentDrawable
from, let's say Metal
[inaudible] view.
And then from that, you can
initialize a CIRenderDestination
with the texture from that
drawable.
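In code, that simple approach might look like this sketch; the view, command buffer, context, and image names are assumptions (the exact view type is inaudible above, so a MetalKit view is assumed here):

    // Inside a per-frame render method: get the drawable, and its
    // texture, up front.
    guard let drawable = mtkView.currentDrawable else { return }
    let destination = CIRenderDestination(mtlTexture: drawable.texture,
                                          commandBuffer: commandBuffer)
    _ = try? context.startTask(toRender: image, to: destination)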
So this will work just fine, but
if you were to do this in a
per-frame render loop, there's a
potential for a performance
bottleneck here that may not be
so obvious.
So let me try to explain that in a little bit more detail with a timeline view here. And please bear with me, because there are a lot of steps involved.
So here we have a timeline that
has two tracks, the CPU at the
top and the GPU at the bottom.
Technically there's actually a third component in play here, which is the display. But for the sake of simplicity, we'll just treat that as part of the GPU.
So in the very first frame, your
app will try to get a drawable
from the view.
And then from that drawable you
can get a texture and then start
a task to render to that
texture.
So once Core Image gets that call, we will start encoding the commands on the CPU for the work to be done on the GPU.
And in this particular case,
we're illustrating a filter
graph that actually has multiple
render passes, namely two
intermediate passes and a final
destination pass.
Once Core Image has finished encoding all the work, the call to startTask will return. And from that point on, the GPU will happily schedule that work to be done at some appropriate time.
But, if the work on the GPU is
going to take a long time, your
app could get called to render
another frame before the work is
done.
And at that point, if you try to get a drawable, that call will stall until a drawable is ready to be vended back to your application. Only then can you get the texture from it and start another task to render to it, and so on for all subsequent frames.
So, as you can see here, this is not a very efficient use of either the CPU or the GPU, because there's a lot of idle time on both processors.
But, if you look closely here,
the drawable texture that we're
about to render to is actually
not needed until the very last
render pass.
So let's look at how we can
actually improve this scenario.
So, with our new CIRenderDestination API, you can now initialize it not with the texture per se, but rather with all the properties of the texture, such as its width, height, and pixel format. Then you can provide the texture via a callback, which will be called lazily at the latest possible time, when that texture is actually needed.
And so now, with the destination
object initialized immediately,
you can start a task and render
to it much sooner, and this will
effectively defer that
potentially blocking call to
currentDrawable to a much later
point in time.
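A sketch of that lazy variant, with the same assumed names as the earlier snippet:

    // Describe the texture's properties now; provide the actual texture
    // lazily, only when the final render pass needs it.
    let destination = CIRenderDestination(width: Int(mtkView.drawableSize.width),
                                          height: Int(mtkView.drawableSize.height),
                                          pixelFormat: mtkView.colorPixelFormat,
                                          commandBuffer: commandBuffer) {
        // Called at the latest possible time.
        return mtkView.currentDrawable!.texture
    }
    _ = try? context.startTask(toRender: image, to: destination)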
So now, if we look at this
example, the work now on the CPU
and GPU can be pipelined much
more efficiently.
So if you're rendering to Metal drawable textures, we strongly encourage you to use this new API, because it could greatly improve the frame rate of your application.
In fact, we have seen cases
where the frame rate literally
doubled just by simply employing
this technique.
All right.
So now I'd like to hand it back
to David who will tell you about
some really cool stuff that lets
you look under the hood inside
the Core Image framework.
Thank you.
[ Applause ]
>> Thank you so much Tony.
That's great stuff.
As I mentioned in my
introduction, Core Image has a
lot of great tricks it uses to
get the best performance.
And one of our goals this year
was to make it clearer to you,
the developer, how those tricks
are occurring so that you can
get a better understanding of
how to use Core Image
efficiently.
And we've done that in a couple
of interesting ways.
First of all, in our new APIs,
we have some new ways of
returning you information about
the render.
After you've issued a render task, when you wait for that task to complete, it will return a CIRenderInfo object. This object has a few properties on it, including the number of passes that Core Image needed to perform that render, as well as the total amount of time spent executing kernels on the device and the total number of pixels processed.
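In Swift, reading that back might look like this sketch, where task is the CIRenderTask returned when the render was started:

    let info = try task.waitUntilCompleted()  // returns a CIRenderInfo
    print("passes: \(info.passCount)")
    print("kernel execution time: \(info.kernelExecutionTime) seconds")
    print("pixels processed: \(info.pixelsProcessed)")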
So that's a great piece of information that you can get back from every render.
But perhaps what is even cooler is the awesome new additions we've made to Core Image to provide better Quick Looks in Xcode.
Notably, we now have great Quick Look support for CIImages.
In addition to just showing the
pixels, we now show you the
image graph that you constructed
to produce that image.
If you do a Quick Look on a
CIRenderTask, it'll show you the
optimized graph that Core Image
converted your image graph into.
And, if you wait for the render
info to be returned to you, if
you do a Quick Look on that,
it'll show you all the
concatenation, timing, and
caching information that Core
Image did internally.
So to give you an idea, we're
going to show you this in a very
visual way.
Here's some code.
Let's pretend we're stepping
through this in Xcode.
Here's an example of an image graph that we're going to construct.
In this case, we're creating a CIImage from a URL, and we're specifying a new option on this image, kCIImageApplyOrientationProperty, set to true. And what this will do is automatically make the image upright for you, which is a nice convenience.
The next thing we're going to do is add onto that image an additional affine transform, which scales it down by 0.5.
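In Swift, those two steps might look like this sketch (the URL is a placeholder):

    import CoreImage

    let image = CIImage(contentsOf: url,
                        options: [.applyOrientationProperty: true])!
        .transformed(by: CGAffineTransform(scaleX: 0.5, y: 0.5))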
Now imagine we're in Xcode, and we hover over the image object. If you click on the little eye icon, it'll now pop up an image like this.
In addition to showing you what the image looks like (nice and upright), it also shows you, below it, the graph used to create that image.
If we zoom in, we can see all
sorts of interesting
information.
We can see, at the input of the
image graph, we have our
IOSurface.
I can tell by looking at it that it's a YCC image, which means it probably came from a JPEG, and you can see the size of that surface, as well as the fact that it's opaque.
You can then see the next step
above that in the graph is a
color-matching operation.
So we were able to determine automatically what the colorspace of the input image was, and we've inserted into the render graph an operation to convert from the Display P3 colorspace to Core Image's working space.
And lastly, you can see three
affine matrices.
The first one, as we're counting
from the bottom, is the matrix
that converts from the image
coordinate system to the
Cartesian coordinate system that
Core Image uses.
Then we have the affine to make the image upright, and then the affine to scale it down by 0.5.
So now you can really look at an
image and see everything that's
being asked of it.
Now let's do something slightly
different this time.
Now we're going to ask for an
image, but we're going to ask
for the auxiliary disparity
image.
And this is a new option that we have as well; if your image has depth information in it, this will return it as a monochrome image.
After we ask for that image,
we're going to apply a filter on
it.
In this case, we're going to
apply a bicubic scale transform
to adjust its size.
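A sketch of that in Swift; the scale factor is illustrative:

    let disparity = CIImage(contentsOf: url,
                            options: [.auxiliaryDisparity: true])!
    let resized = disparity.applyingFilter("CIBicubicScaleTransform",
                                           parameters: ["inputScale": 2.0])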
Now, if we were to hover over
this object while we're
debugging in Xcode, we will now
be able to get this image.
And now you can actually see the
disparity image where white is
in the foreground and darker is
further in the background, but
we can also see the graph that
was used to generate this image.
Here we see that the leaf, or input, image is an IOSurface whose format is luminance half-float, and you can also see that the dimensions of the image are smaller than the original image.
You can also see at the top of
this graph the cubic upsampling
filter that we've applied.
There's actually some method to the colors we chose here. One thing you'll notice is that all of the inputs to our graphs are purple. Anything that affects the color of an image, i.e., a CIColorKernel, is red.
Anything that affects the
geometry, in other words, a
CIWarpKernel, is green.
And the rest of the kernels are
in a blue color.
So now let's get even more
interesting.
We're going to take the primary
image and we're going to apply
two different color cubes to it.
We're going to take those two
resulting images and then we're
going to combine them with the
CIBlendWithMask filter.
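Roughly, in Swift, that might look like this sketch; the two cube data blobs and the mask image are assumed to already exist:

    let fg = image.applyingFilter("CIColorCube",
        parameters: ["inputCubeDimension": 32, "inputCubeData": cubeData0])
    let bg = image.applyingFilter("CIColorCube",
        parameters: ["inputCubeDimension": 32, "inputCubeData": cubeData1])
    let result = fg.applyingFilter("CIBlendWithMask",
        parameters: [kCIInputBackgroundImageKey: bg,
                     kCIInputMaskImageKey: maskImage])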
And, if we look at this in Quick
Looks, we now see the final
image where it's been
beautifully filtered with two
different effects, based on
foreground and background.
But also, we see detailed
information about the graph that
was used to produce it.
You can see here, on the left-hand side, the portion of the subgraph where we took the input image, got the color cube data, which is a 32 by 24 image, and then applied that color cube to it.
In the middle graph, we're doing the same thing for the background image. All of these, plus the mask image, are then combined with the blendWithMask kernel.
So we're hoping that gives you
great insight on how your
application creates CIImages.
But what happens when it comes
time to render?
And this is where things get
really interesting.
Once you tell a CIContext to start a task, it will return a CIRenderTask object.
This also supports Quick Looks.
And, if we look at this, we see
now an even more elaborate
graph.
Again, the color coding is the
same and we can see some of the
same operations but now, what we
saw before as a color matching
operation has been converted
into the primitives that are
needed to do the color
management.
So we can see that we needed to
apply this gamma function and
this color matrix to convert
from P3 to our workingspace.
Another thing you can see is,
while the original image had
three affine transforms, Core
Image has concatenated those all
into one.
Another thing you can notice is
at the end of the graph, we now
know what the destination
colorspace is.
So those operations have been
applied to the image as well,
both applying the colorspace as
well as the coordinate system
transform for the final
destination.
You can also see, associated with all these objects, the dimensions of all the images that are involved at each stage of the render graph.
Now, if you wait for your task to be completed, you get a new object, and associated with it is detailed information about how Core Image was able to concatenate and what the performance of that render was.
You'll see there are now far fewer objects in the tree because of concatenation.
If we look at one of the lower
ones here, we can see that this
particular program is the result
of concatenating a few steps
into one program, and we can
also see associated with this
the amount of time that was
spent on that program in
milliseconds.
One great feature of Core Image
is, if you then render this
image again later and a portion
of the graph is the same, then
Core Image may be able to obtain
the previous results from a
cache.
If that happens, then you'll see
the time go to zero in those
cases.
So this also provides a way for
your application to know how
efficiently Core Image is able
to render and cache your
results, given memory limits.
So we hope you find this really
informative and instructional on
how Core Image works internally.
The next subject I'd like to
talk a little bit about today is
barcodes, specifically the
CIBarcodeDescriptor API.
We now have great, broad support for barcodes on our platforms: barcodes of various types, from Aztec to Code128 to QRCode and PDF417.
We also have support for
barcodes across a broad variety
of frameworks, and those
frameworks use barcodes in
different ways for good reasons.
For example, AVFoundation is the framework to use if you want to detect barcodes while you're capturing video from the camera.
If you want to detect barcodes in still images or in video post-capture, the Vision framework is a great way to do that.
And lastly, you can use Core
Image to render barcodes into
actual image files.
Given this broad support across
frameworks, we wanted to have a
new data type that would allow
the barcode information to be
transported between these
frameworks in a lossless way.
And that is the reason for the new CIBarcodeDescriptor API.
There's one key property of this
object, and that is the
errorCorrectedPayload.
This is not just the textual message of a barcode. It's actually the raw data, which allows you to use barcodes and get information out of them in ways that go beyond just textual information.
So, with this raw errorCorrectedPayload and an understanding of each barcode's formatting, you can do interesting things and build interesting vertical applications based on barcodes.
Also there are properties that
are unique to each particular
barcode.
For example, in the case of the
Aztec barcode, you can know how
many layers were involved in
that code.
Or, in the case of the QRCode barcode, what maskPattern was used on it.
So let me give you an example of
how this type can be used across
these three frameworks.
So firstly, in the area of AVFoundation, it is possible to register a metadata output objects delegate that will see barcodes as they appear in a video feed.
In your code, all you have to do is set up this object. And in your response for when an object is detected, you can ask for the object as an AVMetadataMachineReadableCodeObject.
From that object, you can get
the descriptor property, and
that will return one of the
CIBarcodeDescriptor objects.
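A sketch of that delegate callback in Swift:

    import AVFoundation

    func metadataOutput(_ output: AVCaptureMetadataOutput,
                        didOutput metadataObjects: [AVMetadataObject],
                        from connection: AVCaptureConnection) {
        for object in metadataObjects {
            if let code = object as? AVMetadataMachineReadableCodeObject,
               let descriptor = code.descriptor {
                // descriptor is a CIBarcodeDescriptor; hand it to Core
                // Image or store it for later use.
                _ = descriptor
            }
        }
    }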
Second, if you want to use the Vision framework to detect barcodes, the code is really simple as well.
Basically, we're going to use Vision to create both a request handler and a request.
Then we're going to issue that
request to the handler to detect
the barcode.
Once we get the request results
back, we can then get the
barcodeDescriptor object from
that result.
Very simple.
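In Swift, that might look like this sketch, where image is a CIImage and error handling is omitted:

    import Vision

    let handler = VNImageRequestHandler(ciImage: image, options: [:])
    let request = VNDetectBarcodesRequest()
    try handler.perform([request])
    if let observation = request.results?.first as? VNBarcodeObservation {
        let descriptor = observation.barcodeDescriptor  // a CIBarcodeDescriptor?
        _ = descriptor
    }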
And lastly, simplest of all, is
using Core Image to generate a
barcode image from a descriptor.
So, in this case, it's very
easy.
We just create an instance of a CIFilter of type CIBarcodeGenerator.
We then give that filter the descriptor object as its inputBarcodeDescriptor. And then we're going to ask for the output image.
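A sketch of those steps in Swift, where descriptor is a CIBarcodeDescriptor obtained earlier:

    let filter = CIFilter(name: "CIBarcodeGenerator",
                          parameters: ["inputBarcodeDescriptor": descriptor])
    let barcodeImage = filter?.outputImage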
And these combined allow us to
do some interesting stuff with
both detection and generation of
barcodes.
And so, as a brief prerecorded demo of this, let me show you a sample app that we wrote. It looks at frames of video, pulls out the barcode, and actually renders the barcode back over the original as a kind of augmented image.
And you can see that we were
able to perfectly reproduce the
detected barcode and re-render
it on top of it.
If we just watch that again real quickly, we can actually even see that it's being rendered, because it's overlapping my thumb in the image.
So --
[ Applause ]
All right.
So for our last section of our
talk today, I'm really excited
to talk about how to use Core
Image and Vision together.
These are two great frameworks.
Core Image is a great, easy-to-use framework for applying image processing to images.
Vision is a great framework for
finding information about
images.
And you can use those together
in great and novel ways.
For example, you can use Core
Image as a way of preparing
images before passing them to
Vision.
For example, you might want to
crop the image to an area of
interest.
Or correctly make the image upright, or convert to grayscale, before passing it to Vision.
Also, once you've called Vision and have information about the image, you might be able to use that to guide how you want to adjust the look of your image.
So, for example, if a feature is
detected in an image, you might
choose different ways to apply
image processing to it.
And, of course, these two can be
combined together.
But let's go into a little bit more detail about this.
We have an interesting demo we'd like to talk about today, where we're going to try to generate a photo from several frames of video, with the unwanted objects removed.
And this demo is going to
involve using three frameworks
and four steps.
The first step is going to be
using AVFoundation to get the
frames out of the video.
And that's very simple to do.
Then we're going to use Vision to determine the homography matrices needed to align each of these frames to a common reference. Inevitably there's some camera shake, so a little bit of correction goes a long way.
So that will allow us to get
these homography matrices
represented here as these
slightly moving arrows for each
frame in the image.
The third step is to use Core
Image to align all these frames
to each other.
And that's very easy to do, as
well.
And lastly, we're going to use a
median technique to create a
single photo from all the frames
in the video in a way that
produces an optimal image.
The technique here is to produce
an output image where, at each
location of the output, we're
going to look at locations in
the input frames.
And we're going to use the
median value at each location.
If you look at this example here, in the first four images that little spot is over the concrete pavement, but in the fifth it's over the legs.
Now, if we take the median of
these five values, we're going
to get a value that looks like
just the concrete.
If we do the same thing at
another portion of the image,
again, we're here underneath the
tree.
Three of the five frames are
good.
The other two are less good.
So we're going to use the median of those, and if you do that for every pixel of an image, you get a great result, where objects that are transitory in the video are magically removed.
Let me talk a little bit about
the code, and I promise we'll
get to the demo at the end.
The first step here is we're
going to use Vision to determine
the homographic registration for
each frame.
Again, this is quite easy.
We're going to use Vision to
create a request and a request
handler to produce that
information.
We're going to tell Vision to
perform that request.
And then once we get the results back, we'll make sure the result is an object of type VNImageHomographicAlignmentObservation. Wow, that's a mouthful. And then we'll return that.
That object basically contains a 3 by 3 matrix.
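A sketch of that in Swift. Which image acts as the reference and which is the targeted frame is an assumption here, and error handling is simplified:

    import Vision
    import simd

    func homography(from frame: CIImage,
                    to referenceFrame: CIImage) -> matrix_float3x3? {
        let request = VNHomographicImageRegistrationRequest(targetedCIImage: frame,
                                                            options: [:])
        let handler = VNImageRequestHandler(ciImage: referenceFrame, options: [:])
        try? handler.perform([request])
        let observation = request.results?.first
            as? VNImageHomographicAlignmentObservation
        return observation?.warpTransform  // the 3 by 3 matrix
    }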
Once we've returned that, we can
then use Core Image to align the
images based on this 3 by 3
matrix.
This is a little tricky, but it's actually very easy to do using a custom warp kernel written in Metal. You can see in this kernel that we now have a parameter which is a float3x3; this is something that's new to Core Image this year.
And what we're going to do is get the destCoord, convert it to a homogeneous destCoord by adding a 1 to the end, and then multiply that vector by the matrix, which gives us our homogeneous srcCoord.
Then we're going to do a
perspective divide to get us the
source coordinate that we're
going to sample from.
And that's all there is to it.
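A sketch of that warp kernel in Metal; the kernel name is hypothetical, but the steps follow the description above:

    #include <metal_stdlib>
    using namespace metal;
    #include <CoreImage/CoreImage.h>

    extern "C" { namespace coreimage {

        float2 homographicWarp(float3x3 h, destination dest) {
            // Make the destination coordinate homogeneous by appending a 1.
            float3 homogeneousDestCoord = float3(dest.coord(), 1.0);
            // Multiply that vector by the matrix.
            float3 homogeneousSrcCoord = h * homogeneousDestCoord;
            // Perspective divide yields the source coordinate to sample.
            return homogeneousSrcCoord.xy / homogeneousSrcCoord.z;
        }

    }}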
The last step is to apply the
median filter in Core Image,
and, in this example here, I'm
illustrating the code that we
use for a 5-image median.
In fact, sometimes you'll have many more, and we'll go into that during the demo. But, in this case, we're going to use a sorting network to determine what the median value of the five pixel samples is.
Again, if we look here, this is a great example of where writing the kernel in Metal was actually very convenient and easy, because now we can pass the values into this swap function by reference rather than by value.
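Here is a sketch of what such a kernel could look like; the 9-comparator sorting network is a standard one for five elements, and the actual demo code may differ:

    #include <metal_stdlib>
    using namespace metal;
    #include <CoreImage/CoreImage.h>

    // Because a and b are passed by reference (thread address space),
    // this helper can reorder the caller's values in place.
    static void swap_if_greater(thread float4 &a, thread float4 &b) {
        float4 lo = min(a, b);  // componentwise, so each channel sorts
        float4 hi = max(a, b);  // independently of the others
        a = lo;
        b = hi;
    }

    extern "C" { namespace coreimage {

        float4 median5(sample_t v0, sample_t v1, sample_t v2,
                       sample_t v3, sample_t v4) {
            // A 9-comparator sorting network for 5 elements; after these
            // exchanges the middle element holds the per-channel median.
            swap_if_greater(v0, v1); swap_if_greater(v3, v4);
            swap_if_greater(v2, v4); swap_if_greater(v2, v3);
            swap_if_greater(v0, v3); swap_if_greater(v0, v2);
            swap_if_greater(v1, v4); swap_if_greater(v1, v3);
            swap_if_greater(v1, v2);
            return v2;
        }

    }}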
So now for the fun part.
I'm going to invite Sky up to
the stage.
He'll be showing you more about
this and showing how this filter
works.
[ Applause ]
>> All right.
Thank you, David.
Hi, my name is Sky and it's a
pleasure for me to bring to you
today this demo.
As we can see here at the top, this is our input video. And, if I scrub around on the slider, we can see that there's no point in time during the whole duration of the video where there's a clear image we can take of the landmark.
So, for example, at the
beginning, we're obstructed by
the shadow here.
And then we have people passing
by.
So there's really just no point
during the entire video where we
can get a clean shot.
And actually, if we zoom in on
one of the corners, we can see
that it's constantly shifting
across the entire duration.
So before we run the reduction
kernel, we need to first align
these frames.
And the way we do that is using
Vision, as David just mentioned.
Now Vision offers two registration APIs, as shown here. We have homographic alignment, which David just mentioned, but Vision also offers translational alignment, which in this case doesn't work extremely well, because our camera movement is not restricted to one plane that is parallel to the image plane.
So the way we're doing the
stabilization is we're
registering every frame during
the video onto the middle frame.
And so you can expect a pretty
dramatic camera movement between
the first frame and middle
frame, which is why the
homographic registration is
going to work better for us in
this case.
So let's just go with that.
So with that turned on, if I
zoom in on this corner here
again and scrub through, you can
see that the point is barely
moving across the entire frame.
And if we go back and scrub
through, the video becomes
extremely stabilized.
And so this gives you an idea: if you're writing an image stabilization application, you could easily do that with Core Image and Vision.
So now let me jump into the
reduction part.
And the first thing I'd like to point out is that doing a median reduction over the 148 frames we have here is not really practical, because we would need to hold all those frames in memory while sorting them.
So what we do here instead is take the median of medians as an approximation. The first thing we do is divide the entire set of 148 frames into several groups.
And for each one of those
groups, we compute a local group
median.
And on the second pass, we run our reduction kernel again on the local medians to compute the final approximate result. That's why we have this control here at the bottom, with these ticks that show you how the frames are grouped.
And so, if I change this group
count here, we can see that the
indicators are changing.
So, if we have a group count of
three, that means we're dividing
the entire video range into
three groups.
And for each one of those groups, I can change the group size, which indicates how many evenly distributed frames we're taking out of the group to use for our group median computation.
So, with that in mind, let's try
something with a group count of
five and group size of seven,
let's say.
And let's just see what that
gets us.
It's going to take a little bit
of time to run because Vision
needs to do all the registration
and we need to warp the images.
And so, as we see here in the output, none of the moving, transient objects appear in our final reduced image, and we have a very clean shot of our landmark, which is exactly what we wanted.
And, if we switch back between the input and the output, we can see that all the textural details are very well preserved, which gives you an idea of how well Vision is doing the alignment.
And so I hope this gives you a sense of the interesting applications you can build with this nice synergy between Core Image and Vision. And with that, I'd like to invite David back onstage to give you a recap.
[ Applause ]
>> All right.
Well, thank you all so much.
Let me give just a summary of what we talked about today.
Again, our primary goals for this release were to give your applications better performance, better information about how Core Image works, and great new functionality.
And we really look forward to
seeing how your applications can
grow this year based on this new
functionality.
So we look forward to you coming to some of our related sessions. If you need more information, please go to the developer.apple.com website.
There are some related sessions
that are definitely worth
watching.
There was a session earlier today called Image Editing with Depth, as well as sessions on the Vision framework and sessions on capturing data with depth.
Thank you so much for coming and
have a great rest of your show.
Thanks.
[ Applause ]