WWDC2018 Session 709

Transcript

[ Music ]
[ Applause ]
>> Hello!
[ Applause ]
So welcome to the second session
of Core ML.
My name is Aseem, and I'm an
engineer in the Core ML team.
As you all know, Core ML is
Apple's machine learning
framework for on-device
inference.
And the one thing I really like
about Core ML is that it's
optimized on all Apple hardware.
Over the last year, we have seen
lots of amazing apps across all
Apple platforms.
So that's really exciting.
And we are even more excited
with the new features that we
have this year.
Now you can reduce the size of
your app by a lot.
You can make your app much
faster by using the new
batch-predict API.
And you can really easily
include cutting-edge research
right in your app using
customization.
So that was a recap of the first
session.
And in case you missed it, I
would highly encourage you to go
back and check the slides.
In this session, we are going to
see how to actually make use of
these features.
More specifically, we'll walk through a few examples and show you how, in a few simple steps using Core ML Tools, you can reduce the size of your model and include a custom layer in your model.
Here's the agenda of the
session.
We'll start with a really quick update on the Core ML Tools ecosystem.
And then we'll dive into a demo
of our quantization and custom
conversion.
So let me start with the
ecosystem.
So how do you get an ML model?
Well, the best case is when you can find one online and just download it, right?
A very good place to download ML models is the Apple Machine Learning landing page.
We have a few models there.
Now let's say you want to train
a model on your data set.
In that case, you can use Create
ML.
This is a new framework that we
have just launched this year,
and you do not have to be a
machine learning expert to use
it.
It's really easy to use.
It's right there in Xcode.
So go and give it a try.
Now some of you are already
familiar with the amazing
machine learning tools that we
have outside in the community.
And for that, last year we had
released Core ML Tools, a Python
package.
And along with that, we had
released a few converters.
Now there has been a lot of
activity in this area over the
last year.
And this is how the picture
looks now.
So as you can see, there are
many more converters out there.
And you really do have a lot of choice in training frameworks now.
And all of these converters are
built on top of Core ML Tools.
Now, I do want to highlight a
couple of different converters
here.
Last year, we collaborated with
Google and released the
TensorFlow converter.
So that was exciting.
As you know, TensorFlow is quite popular with researchers who try out new layers, so we recently added support for custom layers to the converter.
And TensorFlow recently released support for quantization during training; since Core ML 2 supports quantization, this feature will be added to the converter soon.
Another exciting partnership we
had was with Facebook and
Prisma.
And this resulted in the ONNX
converter.
The nice thing about ONNX is
that now you have access to a
bunch of different training
libraries that can all be
converted to Core ML using the
new ONNX converter.
So that was a quick wrap-up of
Core ML Tools ecosystem.
Now to talk about quantization,
I would like to invite my friend
Sohaib on stage.
[ Applause ]
>> Good morning, everyone.
My name is Sohaib.
I'm an engineer in the Core ML
team.
And today we're going to be
taking a look at new
quantization utilities in Core
ML Tools 2.0.
Core ML Tools 2.0 has support
for the latest Core ML model
format specification.
It also has utilities which make it really easy for you to add flexible shapes and quantization to your own neural network machine learning models.
Using these great new features in Core ML, you can not only reduce the size of your models but also reduce the number of models in your app, shrinking your app's footprint.
Now let's start off by taking a
look at quantization.
Core ML Tools supports
post-training quantization.
We start off with a Core ML
neural network model which has
32-bit float weight parameters.
And we use Core ML Tools to
quantize the weights for this
model.
The resulting model is smaller
in size.
Now size reduction of the model
is directly dependent on the
number of bits we quantize our
model to.
Now, many of us may be wondering
what exactly is quantization?
And how can it reduce the size
of my models?
Let's step back and take a peek
under the hood.
Neural networks are composed of
layers.
And these layers can be thought
of as mathematical functions.
And these mathematical functions
have parameters called weights.
And these weights are usually
stored as 32-bit floats.
Now in our previous session, we took a look at ResNet50, a popular machine learning model used for image classification, among other things.
Now this particular model has
over 25 million weight
parameters.
So you can imagine, if we could somehow represent these parameters using fewer bits, we could drastically reduce the size of this model.
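To put rough numbers on that: 25 million weights stored as 32-bit floats (4 bytes each) come to about 100 megabytes, while the same weights stored at 8 bits would be about 25 megabytes, a 4x reduction.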
In fact, this process is called
quantization.
In quantization, we take the weights of our layers, which span some range from a minimum to a maximum value, and we map them to unsigned integers.
For 8-bit quantization, we map these values to a range of 0 to 255.
For 7-bit quantization, we map them to 0 to 127, and so on all the way down to 1-bit quantization, where we map the weights to either zeros or ones.
Since we're using fewer bits to
represent the same information,
we reduce the size of our model.
Great. Now many of you may have
noticed that we're mapping
floats to integers.
And you may have come to the
conclusion that maybe there's
some accuracy loss in this
mapping.
That's true.
The rule of thumb is: the lower the number of bits you quantize your model to, the more of a hit your model takes in terms of accuracy.
And we'll get back to that in a
bit.
So that's an overview of
quantization.
But the question remains.
How do we obtain this mapping?
Well, there are many popular
algorithms and techniques out
there which help you to do this.
And Core ML supports two of the
most popular ones: linear
quantization and lookup table
quantization.
Let's have a brief overview.
Linear quantization is an algorithm in which we map the float parameters at equally spaced intervals.
The quantization is parametrized by a scale and a bias, and these values are calculated based on the parameters of the layers we're quantizing.
Now, a really intuitive way to see how this mapping works is to take a step back and see how we would go from our quantized weights, at the bottom, back to our original float weights.
In linear quantization, we would
simply multiply our quantized
weights with the scale parameter
and add the bias.
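Here's a tiny sketch of that idea in Python, with made-up weights (this is just the concept, assuming NumPy, not the Core ML Tools internals):

```python
import numpy as np

# Toy 8-bit linear quantization round trip with made-up weights.
weights = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)

scale = (weights.max() - weights.min()) / 255.0  # 256 levels for 8 bits
bias = weights.min()

# Map floats to unsigned integers in [0, 255].
quantized = np.round((weights - bias) / scale).astype(np.uint8)

# Going back: multiply the quantized weights by the scale and add the bias.
recovered = quantized.astype(np.float32) * scale + bias
print(quantized)  # [  0  64 128 191 255]
print(recovered)  # approximately the original weights
```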
The second quantization
technique that Core ML supports
is lookup table quantization.
And this technique is exactly
what it sounds like.
We construct a lookup table.
Now again, it's helpful to imagine how we would go from our quantized weights back to our original weights.
And in this case, the quantized
weights are simply indices back
into our lookup table.
Now, if you notice, unlike
linear quantization, we have the
ability to move our quantized
weights around.
They don't have to be spaced out
in a linear fashion.
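Sketched in Python, the mapping back is just an index lookup (again an illustration, assuming NumPy and a toy 2-bit table):

```python
import numpy as np

# A toy 2-bit lookup table: quantized weights are simply indices into
# a table of up to 2**nbits float values.
lut = np.array([-0.9, -0.1, 0.2, 0.8], dtype=np.float32)  # entries need not be evenly spaced
indices = np.array([0, 3, 1, 1, 2], dtype=np.uint8)       # stored quantized weights

# Going back: index into the lookup table.
recovered = lut[indices]
print(recovered)  # [-0.9  0.8 -0.1 -0.1  0.2]
```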
So to recap, Core ML Tools
supports linear quantization and
lookup table quantization where
we start off with a full
precision neural network model.
And quantize the weights for
that model using the utilities.
Now you may be wondering well
great, I can reduce the size of
my model.
But how do I figure out the
parameters for my quantization?
If I'm doing linear
quantization, how do I figure
out my scale and bias?
If I'm doing lookup table
quantization, how do I construct
my lookup table?
I'm here to tell you that you
don't have to worry about any of
that.
All you do is decide on the
number of bits you want to
quantize your model to.
And decide on the algorithm you
want to use, and let Core ML
Tools do the rest.
In fact --
[ Applause ]
In fact, it's so simple to take a Core ML neural network model and quantize it that we can do it in a few lines of Python code.
But why stand here and talk
about it when we can show you a
demo?
So for the purposes of this
demo, I'm going to need a neural
network in the Core ML model
format.
Now, as my colleague Aseem mentioned, a great place to find these models is the Apple Machine Learning page.
And I've gone ahead and
downloaded one of the models
from that page.
So this model's called
SqueezeNet.
And let's go ahead and open it
up.
As we can see, this model is 5 megabytes in size.
It has an input, which is a 227-by-227-pixel image.
And it has two outputs.
One of the outputs is the class label, which is a string; this is the most likely label for the input image.
And the second output is a mapping of strings to probabilities: given an input image, it gives the probability for each label that the image may be.
Now let's start quantizing this
model.
So the first thing I want to do
is I want to get into a Python
environment.
Now a Jupyter Notebook is one
such environment that I'm
comfortable with.
So I'm going to go ahead and
open that up.
Let's open up a new notebook and
zoom in on that.
Alright. So let's start off by
importing Core ML Tools.
Let's run that.
Now the second thing I want to
do is I want to import all the
new quantization utilities that
we have in Core ML Tools.
And we do that by running this.
And now we need to load up the
model which we want to quantize.
We just saw the SqueezeNet model a minute ago.
We're going to go ahead and get an instance of that model; it's saved on my desktop.
Great. Now to quantize this
model, we just need to make one
simple API call.
Let's try quantizing this model using linear quantization.
The API is simply called quantize weights.
And the first parameter we pass in is the original model, which we just loaded up.
The number of bits we want to
quantize our model to.
In this case, it's 8 bits.
And the quantization algorithm
we want to use.
Let's try linear quantization.
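The notebook cell looks roughly like this (a sketch against the coremltools 2.0 API as described here; the file path is hypothetical):

```python
import coremltools
from coremltools.models.neural_network import quantization_utils

# Load the full-precision SqueezeNet model (hypothetical path).
model = coremltools.models.MLModel("/Users/me/Desktop/SqueezeNet.mlmodel")

# Quantize every layer's weights to 8 bits using linear quantization.
quantized_model = quantization_utils.quantize_weights(model, 8, "linear")
```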
Now what's happening is that the utility is iterating over all of the layers of the neural network and quantizing all the weights in those layers.
And we're finished.
Now, if you recall a few moments
ago I mentioned that quantizing
our model had an associated loss
in accuracy.
So we want to know how our quantized model stacks up against the original model.
And the easiest way of doing this is taking some data, running inference on that data using our original model, running the same inference on the same data using our quantized model, and then comparing the predictions from the two models to see how well they agree.
Core ML Tools has utilities
which help you to do that.
And we can do that by making
this call which is called
compare models.
We pass in our full precision
model, and we pass in our model
which we had just quantized.
And because this model is a simple image classifier, which has only one image input, we have a convenience utility: we can just pass in a folder containing sample data images.
Now on my desktop here, I have a
folder with a set of images
which are relevant for my
application.
So I'm going to go ahead and pass the path to this folder as my sample data parameter.
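That call, sketched in code (the folder path is hypothetical):

```python
# Compare the full-precision and quantized models on a folder of
# sample images and report how often their predictions agree.
quantization_utils.compare_models(
    model, quantized_model, "/Users/me/Desktop/sample_images")
```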
Great. So now we see we're analyzing all the images in that folder.
We're running inference using our full precision model, we're running inference using our quantized model, and we're comparing the two sets of predictions.
So we seem to have finished
that.
And you can see our Top 1
Agreement is 94.8%.
Not bad. Now what does this Top 1 Agreement mean?
It means that when I passed my original model an image of a dog, for example, and it predicted that the image was a dog, my quantized model did the same.
And that happened for 94.8% of the data set.
So I can go ahead and use this
model in my app.
But I want to see if other
quantization techniques work
better on this model.
As I mentioned, Core ML supports
two types of quantization
techniques.
Linear quantization and lookup
table quantization.
So let's go ahead and try and
quantize this model using lookup
table quantization.
Again, we pass in our original model, the number of bits we want to quantize the model to, and our quantization technique.
Oops, made a typo there.
Let's go ahead and run this.
Now, k-means is a simple
clustering algorithm which
approximates the distribution of
our weights.
And using this distribution, we
can construct the lookup table
for our weights.
And what we're doing here is iterating over all the layers in the neural network, quantizing each one, and figuring out the lookup table for that particular layer.
Now, if you're an expert, you know your model architecture, and you know that k-means is not the right algorithm for you, you have the flexibility of passing in your own custom function instead, and the utility will use your custom function to construct the lookup table.
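In code, the lookup-table call differs only in the mode argument (the exact mode strings and the custom-function keyword are assumptions that may vary across coremltools versions):

```python
# Lookup-table quantization, with the table built by k-means clustering.
lut_model = quantization_utils.quantize_weights(model, 8, "kmeans")

# Experts can supply their own table-building function instead; the
# "custom_lut" mode and lut_function keyword are assumptions here.
# lut_model = quantization_utils.quantize_weights(
#     model, 8, "custom_lut", lut_function=my_lut_function)
```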
So we finished quantizing this
model again using the lookup
table approach.
And now let's see how well this
model compares with our original
model.
So once again we call our compare models API.
We pass in our original model
and we pass in our lookup table
model.
And again we pass in our sample
data folder.
Again, we run inference over all
the images using both the
original model and the quantized
model.
And we see this time we're getting a little bit better Top 1 Agreement.
Now for this model, we see that
lookup table was the right way
to go.
But again, this is model-dependent, and for other models, linear may be the way to go.
So now that we're happy with this, and we see that it's good enough at least for my application, let's go ahead and save this model out.
We do that by calling save.
I'm going to give it the
creative name of Quantized
SqueezeNet.
And there we go.
We have a quantized model.
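In code, that save is a one-liner (the path is hypothetical):

```python
# Write the quantized model out to disk.
lut_model.save("/Users/me/Desktop/QuantizedSqueezeNet.mlmodel")
```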
So this was our original model, and we saw that it was 5 megabytes in size.
Let's open up our quantized
model.
And the first thing we notice
right off the bat is that this
model is only 1.3 megabytes in
size.
[ Applause ]
So if you notice, all the details about our quantized model are the same as the original model.
It still takes in an image
input, and it still has two
outputs.
Now, if I had an app using this model, what I could do, as we saw in the previous demo, is just drag this quantized model into the app and start using it instead.
And just like that, we reduce
the size of our app.
So that was quantization using
Core ML Tools.
To recap, we saw how easy it was
to use Core ML Tools to quantize
our model.
Using a simple API, we provided
our original model, the number
of bits we wanted to quantize
our model to, and the
quantization algorithm we wanted
to use.
We also saw that Core ML Tools
has utilities which help us to
compare our quantized model to
see how it performs against our
original model.
Now as we saw in the demo, there
is a loss of accuracy associated
with quantizing our model.
And this loss of accuracy is
highly model and data dependent.
Some models perform better than others after quantization.
As a general rule of thumb again, the lower the number of bits we quantize our model to, the more of a precision hit we take.
Now in the demo we saw that we
were able to use Core ML Tools
to compare our quantized model
and the original model using our
Top 1 Agreement metric.
But you have to figure out what the relevant metric is for your model and your use case, and validate that your quantized model is acceptable.
Now in a previous session, we
took a look at a style transfer
demo.
And this network took in an
input image, and the output for
this network was a stylized
image.
Let's take a look at how this
model performs at different
levels of quantization.
So on the top left here, we see that the original model is 32 bits and 6.7 megabytes in size.
And our 8-bit linearly quantized model is only 1.7 megabytes in size.
And by visual inspection, we see that the performance is good enough for my style transfer demo.
Now we can see that even down to
4 bits, we don't lose out much
in the way of performance.
I would even argue that, for my app at least, the 3-bit model would work fine as well.
But at 2 bits, we start to see a lot of artifacts, and this may not be the right model for us.
And that was quantization using
Core ML Tools.
Now I'm going to hand it back to
Aseem who's going to talk about
custom conversion.
Thank you.
[ Applause ]
>> Thank you, Sohaib.
So I want to talk about a
feature that is essential to
keep pace with the machine
learning research that's
happening around us.
As you all know, the field of
machine learning is expanding
very rapidly.
So it's very critical for us at
Core ML to provide you with the
necessary software tools to help
with that.
Now let's take an example.
Let's say you are experimenting with a new model that is not supported in Core ML.
Or let's say you have a neural network that runs on Core ML, but maybe there's a layer or two that Core ML does not have yet.
In that case, you should still
be able to use the power of Core
ML, right?
And the answer to that question
is yes.
And the feature of customization
will help you there.
In the next few minutes, I want
to really focus on the specific
use case of having a new neural
network layer.
And show you how you would
convert it to Core ML and then
how you would implement it in
your app.
So let's take a look at model
conversion.
So if you have used one of our
converters, or even if you have
not, it's a really simple API.
It's just a call to one
function.
This is how it looks for the
Keras converter.
And it's very similar for say
the ONNX converter or the
TensorFlow converter.
Now when you call this function,
mostly everything goes right.
But sometimes you might get an
error message like this.
It might say, "Hey, unsupported
operation of such-and-such
kind."
Now if that happens to you, you
only need to do a little bit
more to get past this error.
More specifically, such an error
message is an indication that
you should be using a custom
layer.
And before I show you the little bit of extra effort you need to make to convert it, let's look at a few examples where you would need to use a custom layer.
So let's say you have an image
classifier.
This is how it looks in Xcode.
So this is a high-level description of the model.
If you look inside, it's very
likely that it's a neural
network.
And it's very likely that it's a
convolutional neural network.
So it has a lot of layers,
convolution, activation.
Now it might happen that there's
a new activation layer that
comes up that Core ML does not
support.
And at every machine learning conference, researchers are coming up with new layers all the time.
So this is a very common
scenario.
Now if this happens, you only need to provide a custom implementation of this new layer, and then you are good to go.
So this is how the model will look.
The only difference is this dependency section at the bottom, which says that this model contains a description of this custom layer.
Let's take a look at another
example.
Let's say we have a very simple
digit classifier.
Now I came across this research
paper recently.
It's called Spatial Transformer
Network.
And what it does is this.
So it inserts a neural network that tries to localize the digit, and then feeds it through a grid sampler layer, which renders the digit again, but this time focused in on the digit.
And then you pass that through your old classifier.
Now we don't need to worry about
the details here.
But the point to note is that
the portion in green is what
Core ML supports.
And the portion in red, this new grid sampler layer, is an experimental layer that Core ML does not support.
So I want to take an example of
this particular model and show
you how you would convert it
using Core ML Tools.
So let's go to the demo.
I hope it works on the first try.
There we go.
Okay. So let me close off these windows and clear this.
Okay, so I'm also going to use
Jupyter Notebook to show the
demo.
So I just navigate to the folder
where I have my pre-trained
network.
So what you see here is that I have this spatial transformer .h5 file.
This is a pre-trained Keras
model.
And if you are wondering if I did something special to get this model: basically, I found an open source implementation of the spatial transformer online, executed that script in Keras, and got this model.
And along with this model, I
also got this grid sampler layer
Python script.
Now this grid sampler layer that I'm talking about is also not supported natively in Keras, so the implementation that I found online used a Keras custom layer to implement it.
So as you can see, the concept
of customization is not unique
to Core ML.
In fact, it's very common in
most machine learning
frameworks.
This is how people experiment with new layers.
Okay, so far I just have a Keras model, and now I want to focus on how I can get a Core ML model.
So let me launch a new Python notebook.
So I'll start by importing this
Keras model into my Python
environment.
Okay? So I import Keras, and I import the custom layer that we have in Keras.
And now I will load the model in
Keras.
Okay? So this is how you load Keras models: you give the path to the model, and if there's a custom layer, you pass that in as well.
Okay. So we have the model now.
Now let's convert this to Core
ML.
So I'm going to import Core ML
Tools.
Execute that.
And now, as I showed you before, this is just a call to one function to convert it.
That's my call.
And I get an error as expected.
Python likes to throw these huge
error messages.
But really what we're focused on
is this last line.
Let me --
So as we can see in this last line, it says, hey, the grid sampler layer is not supported.
So now let's see what we need to
do to get rid of that.
Let me clear all this so that you can see.
Okay. So now I change my converter call just a little bit, so that I get my Core ML model.
And now I'm going to pass one additional argument called custom conversion functions.
This will be a dictionary from the name of the layer to a function that I will define in a minute, which I'm calling convert grid sampler.
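The call, sketched in code (the layer and function names follow this demo; treat the exact converter keywords as assumptions about the coremltools Keras converter):

```python
import coremltools
from keras.models import load_model

# Load the pre-trained Keras model along with its custom layer class.
keras_model = load_model("spatial_transformer.h5",
                         custom_objects={"GridSampler": GridSampler})

# Convert, telling the converter how to handle the custom layer.
# convert_grid_sampler is the function defined in a moment.
coreml_model = coremltools.converters.keras.convert(
    keras_model,
    add_custom_layers=True,
    custom_conversion_functions={"GridSampler": convert_grid_sampler})
```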
So let me take a step back and
explain what is happening here.
So as we know, the way the converter works is that it goes through each and every Keras layer: it looks at the first layer and translates its parameters to Core ML, then goes to the second layer and translates its parameters, and so on.
Now when it hits this custom
layer, it doesn't know what to
do.
So this function that I'm passing here, convert grid sampler, is going to help the converter do that.
And let me show you what this
function looks like.
So this is the function.
It's a few lines of code, but all it's doing is three things.
First, it gives the name of a class.
As you might have noticed, the implementation of the layer is not here; the implementation will come later, in the app, and it will be encapsulated in a class.
And this is the name of the class that we'll implement later.
So during conversion, we just need to specify this class name.
That's it.
And then there's the description, which you should provide so that if somebody is looking at your model, they know what it contains.
And the third thing is translating any parameters that the Keras layer had to Core ML.
This particular layer has two parameters, the output height and the output width, and I'm just translating them to Core ML.
If your custom layer does not have any parameters, then you do not need to do this.
If your layer has lots of
parameters, they can all go
here, and they will all be
encapsulated inside the Core ML
model.
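The function, sketched (the proto field names follow the Core ML spec's CustomLayerParams; the Keras attribute output_size is an assumption about this particular layer):

```python
import coremltools

def convert_grid_sampler(keras_layer):
    params = coremltools.proto.NeuralNetwork_pb2.CustomLayerParams()

    # 1. The name of the class that the app will implement later.
    params.className = "GridSampler"

    # 2. A description for anyone inspecting the model.
    params.description = "Grid sampler layer from a Spatial Transformer Network"

    # 3. Translate the Keras layer's parameters into the Core ML model.
    params.parameters["output_height"].intValue = keras_layer.output_size[0]
    params.parameters["output_width"].intValue = keras_layer.output_size[1]
    return params
```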
So as you might have noticed, all I did here was very similar to how you would define a class, right?
You give a class name.
Maybe a description, maybe some
parameters.
So now let me execute this.
And now we see that the conversion went fine.
So, this is behaving very weirdly for some reason.
If you don't mind, I'm just going to delete all of this.
So let me visualize this model; you can do that very simply using a function in Core ML Tools called visualize spec.
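That is (assuming the converted model from above):

```python
# Render an interactive visualization of the model's layers.
coreml_model.visualize_spec()
```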
And here you can see a
visualization of the model.
So as we can see, we have the input and some layers there.
And this is our custom layer.
And if I click on this, I see
the parameters that it has.
So this is the name of the class that I gave, and these are the parameters that I set.
It's always a good idea to
visualize your Core ML model
before you drag and drop just to
see if everything looks fine.
Okay. This is the wrong
notebook.
Okay. And now I'll save out this
model.
And now let's take a look at
this model.
So let me close this.
Okay.
Actually, let me navigate to the directory that I have.
And here's my model.
So I'll click on it and open it in Xcode, just to see how it looks.
We can see that it has the
custom description here.
Okay. Let me go back to slides.
[ Applause ]
So what we just saw was that, with a few simple lines, we could write a conversion function and convert the model to Core ML.
And the process is pretty much
the same if you are using the
TensorFlow converter or the ONNX
converter.
So we have our model here on the
left-hand side.
The custom layer model with the
parameters.
Now when you drag and drop this model into Xcode, you will need to provide the implementation of the class in a file, say a Swift file, for example.
And this is how it would look.
So you have your class, and you'll have the initializer function, which just initializes any parameters that we had in the model.
And then the main function in this class is evaluate.
This is where the actual implementation of whatever mathematical function the layer is supposed to perform will go.
And then there's one more function called output shapes for input shapes, which just specifies the size of the output array that the layer produces.
This helps Core ML in allocating
the buffer size at load time so
that your app is more efficient
at runtime.
So we just saw how you would
tackle a new layer in a neural
network.
There's a very similar concept to a custom layer called a custom model.
It has the same idea, but it's more generic.
So with a custom model, you can deal with any sort of model; it need not be a neural network.
It basically just gives you more flexibility overall.
So let me summarize the session.
We saw how much richer the ecosystem around Core ML Tools is now, and that's great for you, because now you have a lot of choices for where to get Core ML models.
We saw how easy it was to quantize a Core ML model.
And we saw that with a few lines
of code, we could easily
integrate a new custom layer in
the model.
You can find more information at
our documentation page.
And come to the labs and talk to
us.
Okay, thank you.
[ Applause ]