Transcript
[ Music ]
[ Applause ]
>> Good morning.
Welcome. I'm Michael, and this
session is about what's new in
Core ML.
So introduced a year ago, Core
ML is all about making it
unbelievably simple for you to
integrate machine learning
models into your app.
It's been wonderful to see the
adoption over the past year.
We hope it has all of you thinking about what great new experiences you could enable if your app had the ability to do things like understand the content of images, or perhaps analyze some text.
What could you do if your app
could reason about audio or
music, or interpret your users'
actions based on their motion
activity, or even transform or
generate new content for them?
All of this, and much, much
more, is easily within reach.
And that's because this type of
functionality can be encoded in
a Core ML model.
Now if we take a peek inside one
of these, we may find a neural
network, tree ensemble, or some
other model architecture.
They may have millions of
parameters, the values of which
have been learned from large
amounts of data.
But for you, it all comes down to a single file.
You can focus on the
functionality it provides and
the experience it enables rather
than those implementation
details.
Adding a Core ML model to your app is as simple as adding that file to your Xcode project.
Xcode will give you a simple view describing what the model does in terms of the inputs it requires and the outputs it provides.
Xcode will take this one step
further and generate an
interface for you, so that
interacting with this model is
just a few lines of code, one to
load the model, one to make a
prediction, and sometimes one to
pull out the specific output you
are interested in.
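For example, here is a minimal sketch of that generated interface; the class name FlowerClassifier, its image input, and its classLabel output are hypothetical names Xcode would produce for a bundled image-classifier model:

    import CoreML
    import CoreVideo

    // A sketch only: `FlowerClassifier` is a hypothetical Xcode-generated class.
    func classify(_ image: CVPixelBuffer) throws -> String {
        let model = FlowerClassifier()                    // one line to load the model
        let output = try model.prediction(image: image)   // one line to make a prediction
        return output.classLabel                          // one line to pull out the output
    }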
Note that in some cases you don't even have to write that code, because Core ML integrates with some of our higher-level APIs and allows you to customize their behavior if you give them a Core ML model.
So with Vision, this is done through the VNCoreMLRequest object.
And in the new Natural Language framework, you can instantiate an NLModel from a Core ML model.
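As a minimal sketch of both integrations, assuming hypothetical Xcode-generated model classes FlowerClassifier and SentimentClassifier:

    import CoreGraphics
    import CoreML
    import Vision
    import NaturalLanguage

    // Vision: wrap the Core ML model in a VNCoreMLRequest and run it on an image.
    func classifyImage(_ image: CGImage) throws {
        let visionModel = try VNCoreMLModel(for: FlowerClassifier().model)
        let request = VNCoreMLRequest(model: visionModel) { request, _ in
            // For a classifier model, the results are classification observations.
            let top = (request.results as? [VNClassificationObservation])?.first
            print(top?.identifier ?? "no result")
        }
        try VNImageRequestHandler(cgImage: image, options: [:]).perform([request])
    }

    // Natural Language: wrap a text classifier in an NLModel.
    func classifyText(_ text: String) throws -> String? {
        let nlModel = try NLModel(mlModel: SentimentClassifier().model)
        return nlModel.predictedLabel(for: text)
    }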
So that's Core ML in a nutshell.
But we are here to talk about
what's new.
We took all the great feedback
we received from you over the
past year and focused on some
key enhancements to Core ML 2.
And we are going to talk about
these in two sessions.
In the first session, the one
you are all sitting in right
now, we are going to talk about
what's new from the perspective
of your app.
In the second session, which
starts immediately after this at
10 a.m. after a short break, we
are going to talk about tools
and how you can update and
convert models to take advantage
of the new features in Core ML
2.
When it comes to your app, we
are going to focus on three key
areas.
The first is how you can reduce the size and number of models used by your app while still getting the same functionality.
Then we'll look at how you can get more performance out of a single model.
And then we'll conclude with how using Core ML will allow you to keep pace with the state of the art in the rapidly moving field of machine learning.
So to kick it off, let's talk
about model size.
I'm going to hand it off to
Francesco.
[ Applause ]
>> Thank you, Michael.
Hello. Finding ways to reduce the size of your Core ML app is very important.
My name is Francesco, and I am
going to introduce quantization
and flexible shapes, two new
features in Core ML 2 that can
help reduce your app size.
So Core ML runs your machine-learning models on device.
This gives your app four key
advantages compared to running
them in the cloud.
First of all, user privacy is fully respected.
By running machine-learning models on device, we guarantee that the data never leaves the user's device.
Second, it can help you achieve real-time performance: our silicon and devices are super-efficient for machine-learning workloads.
Furthermore, you don't have to
maintain and pay for Internet
servers.
And Core ML inference is available anywhere, at any time, regardless of network connectivity.
All these great benefits come
with the fact that you now need
to store your machine-learning
models on device.
And if the machine-learning
models are big, then you might
be concerned about the size of
your app.
For example, say you have your app, and it's full of cool features, and your users are very happy about it.
And now you want to take
advantage of the new
opportunities offered by machine
learning on device, and you want
to add new amazing capabilities
to your app.
So what you do, you train some
Core ML models and you add them
to your app.
What this means is that your app
has become more awesome and your
users are even happier.
But some of them might notice
that your app has increased in
size a little bit.
It's not uncommon to see apps grow by tens or hundreds of megabytes after adding machine-learning capabilities to them.
And as you keep adding more and more features to your app, your app size might simply get out of control.
So here's the first thing you can do about it.
If these machine-learning models support optional features of your app, you can keep them outside your initial bundle.
Then, as the user uses those features, you can download the models on demand and compile them on device.
In this case, the user is happy in the beginning because the installation size is unchanged.
But once the user downloads and uses all the Core ML functionality in your app, at the end of the day the size of your app is still large.
So wouldn't it be better if, instead, we could tackle this problem by reducing the size of the models themselves?
This would give us a smaller
bundle in case we ship the
models inside the app, faster
and smaller downloads if instead
of shipping the models in the
app we download them.
And in any case, your app will
enjoy a lower memory footprint.
Using less memory is better for
the performance of your app and
great for the system in general.
So let's see how we can
decompose the size of a Core ML
app into factors to better
tackle this problem.
First, there is the number of
models.
This depends on how many
machine-learning functionalities
your app has.
Then there is the number of
weights.
The number of weights depends on
the architecture that you have
chosen to solve your
machine-learning problem.
As Michael was mentioning, the weights are the place in which the machine-learning model stores the information it has learned during training.
If it has been trained to do a complex task, it's not uncommon to see a model requiring tens of millions of weights.
Finally, there is the size of each weight: how are we storing these parameters that we learned during training?
Let's focus on this factor
first.
For neural networks, we have
several options to represent and
store the weights.
In the first release of Core ML in iOS 11, neural networks were stored using 32-bit floating-point weights.
In iOS 11.2, we heard your feedback and introduced half-precision, 16-bit floating-point weights.
This gives your app half the storage required for the same accuracy.
But this year we wanted to take several steps further, and we are introducing quantized weights.
With quantized weights, we are no longer restricted to using either Float 32 or Float 16 values; neural networks can be encoded using 8 bits, 4 bits, or any number of bits all the way down to 1 bit.
So let's now see what quantization is.
Here we are representing a subset of the weights of our neural network.
As we can see, these weights can
take any value in a continuous
range.
This means that in theory, a
single weight can take an
infinite number of possible
values.
In practice, neural networks store weights using Float 32 numbers.
This means that each weight can take billions of values, to better represent their continuous nature.
But it turns out that neural networks also work with lower-precision weights.
Quantization is the process that takes this continuous range of values and constrains the weights to a very small, discrete subset of possible values.
For example, here quantization has turned this continuous spectrum of weights into only 256 possible values.
So before quantization, the
weights would take any possible
values.
After quantization, they only
have 256 options.
Now, since each weight can only take a value from this small set, Core ML needs only 8 bits to store each weight.
But nothing stops us here; we can go further.
For example, we can constrain the network to take, instead of one of 256 different values, just 8.
And since we now have only 8 options, Core ML will need only 3 bits per weight to store your model.
Now, there are some details about how we choose these values to represent the weights.
They can be uniformly distributed in the range, in which case we have linear quantization.
With lookup-table quantization, instead, these values can be scattered across the range in an arbitrary manner.
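To make the linear case concrete, here is a minimal sketch of the idea; it is an illustration only, not Core ML's internal storage format:

    // Linear quantization maps each continuous weight to one of 2^bits evenly
    // spaced values between the minimum and maximum weight.
    func linearQuantize(_ weights: [Float], bits: Int = 8) -> (indices: [UInt8], scale: Float, bias: Float) {
        let levels = Float((1 << bits) - 1)        // 255 steps, i.e. 256 values, for 8 bits
        let lo = weights.min() ?? 0
        let hi = weights.max() ?? 0
        let scale = (hi - lo) / max(levels, 1)
        let indices = weights.map { w -> UInt8 in
            scale > 0 ? UInt8(((w - lo) / scale).rounded()) : 0
        }
        return (indices, scale, lo)
    }

    // Dequantization reconstructs an approximate weight from its small integer index.
    func dequantize(_ index: UInt8, scale: Float, bias: Float) -> Float {
        return Float(index) * scale + bias
    }

With lookup-table quantization, the reconstruction would instead be a table lookup, where the table values can sit anywhere in the range.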
So let's see practically how
quantization can help us reduce
the size of our model.
In this example, we are focusing on ResNet50, which is a common architecture used by many applications for many different tasks.
It includes 25 million trained parameters, and if we use 32-bit floats to represent them, the total model size is more than 100 megabytes.
If we quantize it to 8 bits,
then the architecture hasn't
changed; we still have 25
million parameters.
But we are now using only 1 byte
to store a single weight, and
this means that the model size
is reduced by a factor of 4x.
It now takes only 26 megabytes to store this model.
And we can go further: we can use a quantized representation that uses only 4 bits per weight for this model and end up with a model that is even smaller.
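As a rough back-of-the-envelope check: weight storage is about the number of weights times the bits per weight, so 25 million weights take roughly 100 megabytes at 32 bits, about 25 megabytes at 8 bits, and about 12.5 megabytes at 4 bits, plus a small amount of metadata such as the quantization lookup tables.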
[ Applause ]
And again, Core ML supports all the quantization modes all the way down to 1 bit.
Now, quantization is a powerful technique to take an existing architecture and obtain a smaller version of it.
But how can you obtain a quantized model?
If you have any neural network in Core ML format, you can use Core ML Tools to obtain a quantized representation of it; Core ML Tools will quantize it for you automatically.
Or you can train quantized models: you can either train with a quantization constraint from scratch, or retrain existing models with quantization constraints.
After you have obtained your
quantized model with your
training tools, you can then
convert it to Core ML as usual.
And nothing will change in the way you use the model in your app.
Inside the model, the numbers are going to be stored at a different precision, but the interface for using the model will not change at all.
However, we always have to consider that quantized models are lower-precision approximations of the original reference floating-point models.
And this means that quantized models come with an accuracy-versus-model-size tradeoff.
This tradeoff is model dependent
and use case dependent.
And it's also a very active area
of research.
So it's always recommended to check the accuracy of the quantized model and compare it with the reference floating-point version on relevant test data and performance metrics that are valid for your app and use case.
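For example, here is a minimal sketch of such a spot check, assuming two hypothetical Xcode-generated classifier classes: FlowerClassifier, the Float 32 reference, and FlowerClassifierQuantized, its quantized counterpart:

    import CoreML
    import CoreVideo

    // Counts how often the quantized classifier agrees with the reference model
    // on a set of test images. The class names and the `image` input are hypothetical.
    func agreementRate(on testImages: [CVPixelBuffer]) throws -> Double {
        let reference = FlowerClassifier()
        let quantized = FlowerClassifierQuantized()
        var matches = 0
        for image in testImages {
            let a = try reference.prediction(image: image).classLabel
            let b = try quantized.prediction(image: image).classLabel
            if a == b { matches += 1 }
        }
        return testImages.isEmpty ? 1.0 : Double(matches) / Double(testImages.count)
    }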
Now let's see a demo of how we can adopt quantized models to reduce the size of an app.
[ Applause ]
I would like to show you a style
transfer app.
In style transfer, a neural
network has been trained to
render user images using styles
that have been learned by
watching paintings or other
images.
So let me load my app.
As we can see, I am shipping this app with four styles: City, Glass, Oils, and Waves.
And then I can pick images from the user's photo library and process them, blending them with different styles right on device.
So this is the original image,
and I am going to render the
City style,
Glass,
Oils,
and Waves.
Let's see how this app has been
built in Xcode.
This app uses Core ML and the Vision API to perform the stylization.
And as we can see, we have four
Core ML models here bundled in
Xcode; City, Glass, Oils, and
Waves, the same ones we are
seeing in the app.
And we can inspect these models.
They are not quantized yet, so each one of these models takes 6.7 megabytes of space on disk.
We see that the models take an input image of a certain resolution and produce an image called Stylized of the same resolution.
Now we want to investigate how much storage space and memory we can save by switching to quantized models.
So I have been playing with Core ML Tools and obtained quantized representations of these models.
And for a tutorial on how to obtain these models, stay for Part 2, which is going to cover quantization with Core ML Tools in detail.
So I want to focus first on the Glass style and see how the different quantized versions work for this style.
So all I have to do is drag
these new models inside the
Xcode project, and rerun the
app.
And then we are going to see how
these models behave.
First, we can see that the size has been greatly reduced.
For example, the 8-bit version went down from 6.7 megabytes to just 1.7.
[ Applause ]
With the 4-bit version, we can save even more, and now the model is less than 1 megabyte.
The 3-bit version is even smaller.
And so on.
Now let's go back to the app.
Let's take this same image for reference and apply the Glass style in the original version.
It still looks as before.
Now we can compare it with the
8-bit version.
And you can see nothing has
changed.
This is because 8-bit
quantization methods are very
solid.
We can also venture further and
try the 4-bit version of this
model.
Wow. The results are still
great.
And now let's try the 3-bit
version.
We see the first color shift.
So it's probably good to go and check with the designers whether this effect is still acceptable.
And now let's see the 2-bit version.
This is not really what we were looking for.
Maybe we will save it for a
horror app, but I am not going
to show this to the designer.
[ Applause ]
Let's go back to the 4-bit
version and hide this one.
This was just a reminder that quantized models are approximations of the original models, so it's always recommended to check them and compare them with the original versions.
Now, for every model and quantization technique, there is always a point at which things start to mismatch.
After some discussion with the designer and extensive evaluation of many images, we decided to ship the 4-bit version of this model, which gives the smallest size with the best quality.
So let's remove all the floating-point versions of the models that were taking a lot of space in our app and replace them with the 4-bit versions.
And now let's run the app one
last time.
OK. Let's pick the same image
again and show all the styles.
This was the City, Glass, Oils,
and big Wave.
So in this demo we saw how we started with four huge 32-bit models, for a total app size of 27 megabytes.
Then we evaluated the quality and switched to 4-bit models, and the total size of our app went down to just 3.4 megabytes.
Now --
[ Applause ]
This didn't cost us anything in terms of quality, because these quantized versions look the same and the quality is still amazing.
We showed how quantization can help us reduce the size of an app by reducing the size of the weights at the very microscopic level.
Now let's see how we can reduce
the number of models that your
app needs.
In the most straightforward
case, if your app has three
machine-learning functionalities
then you need three different
machine-learning models.
But in some cases, it is possible to have the same model support two different functionalities.
For example, you can train a multi-task model; a multi-task model has been trained to perform multiple tasks at once.
There is an example of style transfer in the Turi Create session about multi-task models.
Or in some cases, you can use a new feature in Core ML called Flexible Shapes and Sizes.
Let's go back to our Style
Transfer demo.
In Xcode, we saw that the size of the input image and the output image was encoded as part of the definition of the model.
But what if we want to run the same style on a different image resolution?
What if we want to run the same network on different image sizes?
For example, the user might want to see a high-definition style transfer, so they give us a high-definition image.
Now, if our Core ML model only takes a lower resolution as its input, all we can do as developers is resize the image down, process it, and then scale it back up.
This is not really going to amaze the user.
Even in the past, we could reshape this model with Core ML Tools and make it accept another resolution, in particular a higher resolution.
So even in the past we could feed the high-resolution image directly to a Core ML model, producing a high-definition result.
This is because we wanted to introduce finer detail into the stylization, with finer strokes that are amazing when you zoom in, because they add a lot to the final image.
So in the past we could do it, but only by duplicating the model and creating two different versions: one for standard definition and one for high definition.
And this, of course, means that our app is twice the size, despite the fact that the network has been trained to support any resolution.
Not anymore.
We are introducing flexible shapes.
And with flexible shapes, you can have a single model process multiple resolutions, and many more than just two.
So now in Xcode --
[ Applause ]
-- in Xcode you are going to see that the input is still an image, but besides the default resolution, the model also accepts flexible resolutions -- in this simple example, SD and HD.
This means that now you only have to ship a single model, and you don't have any redundant code.
And if you need to switch
between standard definition and
high definition, you can do it
much faster because we don't
need to reload the model from
scratch; we just need to resize
it.
You have two options to specify the flexibility of the model.
You can define a range for each dimension, that is, a minimum width and height and a maximum width and height, and then at inference time pick any value in between.
But there is also another way.
You can enumerate all the shapes that you are going to use -- for example, different aspect ratios and different resolutions -- and this is better for performance.
Core ML knows more about your use case earlier, so it has the opportunity to perform more optimizations.
And it also gives your app a smaller test surface.
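Here is a minimal sketch of how that flexibility shows up at runtime; StyleTransfer is a hypothetical Xcode-generated class, and the input name "image" is an assumption:

    import CoreML

    func describeFlexibility() {
        let model = StyleTransfer().model
        guard let constraint = model.modelDescription
                .inputDescriptionsByName["image"]?.imageConstraint else { return }

        switch constraint.sizeConstraint.type {
        case .enumerated:
            // Only the listed sizes are accepted (e.g. SD and HD).
            for size in constraint.sizeConstraint.enumeratedImageSizes {
                print("accepts \(size.pixelsWide) x \(size.pixelsHigh)")
            }
        case .range:
            // Any size within the ranges is accepted at prediction time.
            print("width:  \(constraint.sizeConstraint.pixelsWideRange)")
            print("height: \(constraint.sizeConstraint.pixelsHighRange)")
        default:
            // Not flexible: a single fixed size.
            print("fixed: \(constraint.pixelsWide) x \(constraint.pixelsHigh)")
        }
    }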
Now, which models are flexible?
Which models can be trained to support multiple resolutions?
Fully convolutional neural networks, commonly used for image-processing tasks such as style transfer, image enhancement, super resolution, and so on, and some other architectures.
Core ML Tools can check for you whether a model has this capability.
So the number of models your app needs can be reduced using flexible sizes, and the size of the weights can be reduced by quantization.
But what about the number of weights?
Core ML, given the fact that it supports many different architectures and many frameworks, has always helped you choose a model of the right size for your machine-learning problem.
So Core ML can help you tackle the size of your app along all three of these factors.
And in any case, inference is going to be super performant.
And to introduce new features in
performance and customization,
let's welcome Bill March.
Thank you.
[ Applause ]
>> Thank you.
One of the fundamental design
principles of Core ML from the
very beginning has been that it
should give your app the best
possible performance.
And in keeping with that goal,
I'd like to highlight a new
feature of Core ML to help
ensure that your app will shine
on any Apple device.
Let's take a look at the style
transfer example that Francesco
showed us.
From the perspective of your app, it takes an image as input and simply returns the stylized image.
And there are two key components
that go into making this happen:
first, the MLModel file, which
stores the particular parameters
needed to apply this style; and
second, the inference engine,
which takes in the MLModel and
the image and performs the
calculations necessary to
produce the result.
So let's peek under the hood of
this inference engine and see
how we leverage Apple's
technology to perform this style
transfer efficiently.
This model is an example of a
neural network, which consists
of a series of mathematical
operations called layers.
Each layer applies some
transformation to the image,
finally resulting in the
stylized output.
The model stores weights for
each layer which determine the
particular transformation and
the style that we are going to
apply.
The Core ML neural network
inference engine has highly
optimized implementations for
each of these layers.
On the GPU, we use Metal shaders.
On the CPU, we use Accelerate for efficient computation.
And we can dispatch different
parts of the computation to
different pieces of hardware
dynamically depending on the
model, the device state, and
other factors.
We can also find opportunities
to fuse layers in the network,
resulting in fewer overall
computations being needed.
We are able to optimize here
because we know what's going on.
We know the details of the
model; they are contained in the
MLModel file that you provided
to us.
And we know the details of the
inference engine and the device
because we designed them.
We can take care of all of these
optimizations for you, and you
can focus on delivering the best
user experience in your app.
But what about your workload?
What about, in particular, if
you need to make multiple
predictions?
If Core ML doesn't know about
it, then Core ML can't optimize
for it.
So in the past, if you had a
workload like this, you needed
to do something like this: a
simple for loop wrapped around a
call to the existing Core ML
prediction API.
So you'd loop over some array of
inputs and produce an array of
outputs.
Let's take a closer look at what happens under the hood when we are doing this.
For each image, we will need to
do some kind of preprocessing
work.
If nothing else, we need to send
the data down to the GPU.
Once we have done that, we can
do the calculation and produce
the output image.
But then there is a
postprocessing step in which we
need to retrieve the data from
the GPU and return it to your
app.
The key to improving this
picture is to eliminate the
bubbles in the GPU pipeline.
This results in greater
performance for two major
reasons.
First, since there is no time
when the GPU is idle the overall
compute time is reduced.
And second, because the GPU is
kept working continuously, it's
able to operate in a higher
performance state and reduce the
time necessary to compute each
particular output.
But so much of the appeal of Core ML is that you don't have to worry about any details like this at all.
In fact, for your app, all you are really concerned with for your users is going from a long time to get results to a short time.
So this year we are introducing
a new batch API that will allow
you to do exactly this.
Where before you needed to loop over your inputs and call separate predictions, the new API is very simple: a one-line prediction call that consumes an array of inputs and produces an array of outputs.
Core ML will take care of the rest.
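A minimal sketch of the two approaches, assuming a hypothetical Xcode-generated class StyleTransfer with input and output types StyleTransferInput and StyleTransferOutput:

    import CoreML

    let model = StyleTransfer()

    // Before: one prediction call per input, so Core ML sees each image in isolation.
    func stylizeOneByOne(_ inputs: [StyleTransferInput]) throws -> [StyleTransferOutput] {
        return try inputs.map { try model.prediction(input: $0) }
    }

    // Now: a single call over the whole batch lets Core ML keep the GPU pipeline full.
    // `predictions(inputs:)` is assumed here as the generated batch convenience method.
    func stylizeAsBatch(_ inputs: [StyleTransferInput]) throws -> [StyleTransferOutput] {
        return try model.predictions(inputs: inputs)
    }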
[ Applause ]
So let's see it in action.
So in keeping with our style
transfer example, let's look at
the case where we wanted to
apply a style to our entire
photo library.
So here I have a simple app
that's going to do just that.
I am going to apply a style to
200 images.
On the left, as in your left,
there is an implementation using
last year's API in a for loop.
And on the right we have the new
batch API.
So let's get started.
We are off.
And we can see the new batch API is already done.
We'll wait a moment for last
year's technology, and there we
go.
[ Applause ]
In this example we see a
noticeable improvement with the
new batch API.
And in general, the improvement
you'll see in your app depends
on the model and the device and
the workload.
But if you have a large number
of predictions to call, use the
new API and give Core ML every
opportunity to accelerate your
computation.
Of course, the most
high-performance app in the
world isn't terribly exciting if
it's not delivering an
experience that you want for
your users.
We want to ensure that no matter
what that experience is, or what
it could be in the future, Core
ML will be just as performant
and simple to use as ever.
But the field of machine
learning is growing rapidly.
How will we keep up?
And just how rapidly?
Well let me tell you a little
bit of a personal story about
that.
Let's take a look at a
deceptively simple question that
we can answer with machine
learning.
Given an image, what I want to know is: are there any horses in it?
So I think I heard a chuckle or
two.
Maybe this seems like kind of a
silly challenge problem.
Small children love this, by the
way.
But -- so way, way back in the
past, when I was first starting
graduate school and I was
thinking about this problem and
first learning about machine
learning, my insights on the topic came down to something like this: I don't know -- seems hard.
I don't really have any good ideas for you.
So a few years pass.
I get older, hopefully a little
bit wiser.
But certainly the field is
moving very, very quickly,
because there started to be a
lot of exciting new results
using deep neural networks.
And so then my view on this
problem changed.
And suddenly, wow, this cutting-edge research can really answer these kinds of questions, and computers can catch up with small children in horse-recognition technology.
What an exciting development.
[ Laughter ]
So a few more years pass.
Now I work at Apple, and my
perspective on this problem has
changed again.
Now, just grab Create ML.
The UI is lovely.
You'll have a horse classifier
in just a few minutes.
So, you know, if you are a
machine learning expert, maybe
you are looking at this and you
are thinking, "Oh, this guy
doesn't know what he is talking
about.
You know, in 2007 I knew how to
solve that problem.
In 2012 I'd solved it a hundred
times."
Not my point.
If you are someone who cares
about long-lasting, high-quality
software, this should make you
nervous, because in 11 years we
have seen the entire picture of
this problem turn over.
So let's take a look at a few
more features in Core ML that
can help set your mind at ease.
To do that, let's open the hood
once again and peek at one of
these new horse finder models,
which is, once again, a neural
network.
As we illustrated before, the neural network consists of a series of layers, and we have highly optimized implementations for each of them in our inference engine.
Our list of supported operations
is large and always growing,
trying to keep up with new
developments in the field.
But what if there is a layer
that just isn't supported in
Core ML?
In the past, you either needed
to wait or you needed a
different model.
But what if this layer is the
key horse-finding layer?
This is the breakthrough that
your horse app was waiting for.
Can you afford to wait?
Given the speed of machine
learning, this could be a
serious obstacle.
So we introduced custom layers
for neural network models.
Now, if a neural network layer is missing, you can provide an implementation that will mesh seamlessly with the rest of the Core ML model.
Inside the model, the custom
layer stores the name of an
implementing class -- the
AAPLCustomHorseLayer in this
case.
The implementation class fills
the role of the missing
implementation in the inference
engine.
Just like a layer built into Core ML, the implementation provided here should be general and applicable to any instance of the new layer; it simply needs to be included in your app at runtime.
Then the parameters for this
particular layer are
encapsulated in the ML model
with the rest of the information
about the model.
Implementing a custom layer is
simple.
We expose an MLCustomLayer
protocol.
You simply provide methods to
initialize the layer based on
the data stored in the ML model.
You'll need to provide a method
that tells us how much space to
allocate for the outputs of the
layer, and then a method that
does the computation.
Plus, you can add this
flexibility without sacrificing
the performance of your model as
a whole.
The protocol includes an optional method which allows you to provide us with a Metal shader implementation of the layer.
If you give us this, then it can
be encoded in the same command
buffer as the rest of the Core
ML computation.
So there is no extra overhead
from additional encodings or
multiple trips to and from the
GPU.
If you don't provide this, then
we'll simply evaluate the layer
on the CPU with no other work on
your part.
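As a minimal sketch of a conforming class -- the name follows the AAPLCustomHorseLayer example above, and the ReLU-style computation is just a placeholder:

    import CoreML

    @objc(AAPLCustomHorseLayer)
    class AAPLCustomHorseLayer: NSObject, MLCustomLayer {

        // Initialized with the parameters stored for this layer in the .mlmodel file.
        required init(parameters: [String : Any]) throws {
            super.init()
        }

        // Receives the layer's weight data from the model (unused in this placeholder).
        func setWeightData(_ weights: [Data]) throws { }

        // Tells Core ML how much space to allocate for the layer's outputs.
        func outputShapes(forInputShapes inputShapes: [[NSNumber]]) throws -> [[NSNumber]] {
            return inputShapes   // placeholder: same shape out as in
        }

        // The CPU implementation of the layer's computation.
        func evaluate(inputs: [MLMultiArray], outputs: [MLMultiArray]) throws {
            for (input, output) in zip(inputs, outputs) {
                for i in 0..<input.count {
                    output[i] = NSNumber(value: max(0, input[i].doubleValue))  // placeholder ReLU
                }
            }
        }

        // Optionally, implement encode(commandBuffer:inputs:outputs:) to provide a
        // Metal shader version that runs in the same command buffer as the rest
        // of the network (omitted in this sketch).
    }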
So no matter how quickly
advancements in neural network
models may happen, you have a
way to keep up with Core ML.
But there are limitations.
Custom layers only work for neural network models, and they only take inputs and outputs which are MLMultiArrays.
This is a natural way to
interact with neural networks.
But the machine learning field
is hardly restricted to only
advancing in this area.
In fact, when I was first
learning about image
recognition, almost no one was
talking about neural networks as
a solution to that problem.
And you can see today it's the
absolute state of the art.
And it's not hard to imagine
machine-learning-enabled app
experiences where custom layers
simply wouldn't fit.
For instance, a machine-learning
app might use a neural network
to embed an image in some
similarity space, then look up
similar images using a
nearest-neighbor method or
locality-sensitive hashing -- or
even some other approach.
A model might combine audio and
motion data to provide a bit of
needed encouragement to someone
who doesn't always close his
rings.
Or even a completely new model
type we haven't even imagined
yet that enables novel
experiences for your users.
In all these cases, it would be
great if we could have the
simplicity and portability of
Core ML without having to
sacrifice the flexibility to
keep up with the field.
So we are introducing custom
models.
A Core ML custom model allows
you to encapsulate the
implementation of a part of a
computation that's missing
inside Core ML.
Just like for custom layers, the
model stores the name of an
implementation class.
The class fills the role of the
general inference engine for
this type of model.
Then the parameters are stored
in the ML Model just like
before.
This allows the model to be
updated as an asset in your app
without having to touch code.
And implementing a custom model
is simple as well.
We expose a protocol,
MLCustomModel.
You provide methods to
initialize based on the data
stored in the ML Model.
And you provide a method to
compute the prediction on an
input.
There is an optional method to provide a batch implementation, if this particular model type has opportunities for optimization there.
And if not, we'll call the single prediction in a loop.
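As a minimal sketch of a conforming class; the class name and the trivial prediction body are placeholders:

    import CoreML

    @objc(AAPLCustomHorseModel)
    class AAPLCustomHorseModel: NSObject, MLCustomModel {

        // Initialized with the description and parameters stored in the .mlmodel file.
        required init(modelDescription: MLModelDescription,
                      parameterDictionary: [String : Any]) throws {
            super.init()
        }

        // Compute the prediction for a single input.
        func prediction(from input: MLFeatureProvider,
                        options: MLPredictionOptions) throws -> MLFeatureProvider {
            // Placeholder: a real model would run its own inference here.
            return try MLDictionaryFeatureProvider(dictionary: [:])
        }

        // The protocol also declares an optional batch prediction method; if you
        // don't implement it, Core ML calls the single prediction in a loop.
    }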
And using a custom model in your app is largely the same workflow as any other Core ML model.
In Xcode, a model with custom components will have a dependency section listing the names of the implementations needed, along with a short description.
Just include these in your app,
and you are ready to go.
The prediction API is unchanged,
whether for single predictions
or batch.
So custom layers and custom
models allow you to use the
power and simplicity of Core ML
without sacrificing the
flexibility needed to keep up
with the fast-paced area of
machine learning.
For new neural network layers,
custom layers allow you to make
use of the many optimizations
already present in the neural
network inference engine in Core
ML.
Custom models are more flexible
for types and functionality, but
they do require more
implementation work on your
part.
Both forms of customization
allow you to encapsulate model
parameters in an ML model,
making the model portable and
your code simpler.
And we've only been able to
touch on a few of the great new
features in Core ML 2.
Please download the beta, try
them out for yourself.
Core ML has many great new
features to reduce your app
size, improve performance, and
ensure flexibility and
compatibility with the latest
developments in machine
learning.
We showed you how quantization
can reduce model size, how the
new batch API can enable more
efficient processing, and how
custom layers and custom models
can help you bring cutting-edge
machine learning to your app.
Combined with our great new tool
for training models in Create
ML, there are more ways than
ever to add ML-enabled features
to your app and support great
new experiences for your users.
After just a short break, we'll
be back right here to take a
deeper look at some of these
features.
In particular, we'll show you
how to use our Core ML Tools
software to start reducing model
sizes and customizing your user
experiences with Core ML today.
Thank you.
[ Applause ]