WWDC2017 Session 703

Transcript

[ Crowd Sounds ]
[ Applause ]
>> Wow. It's very exciting to be
here.
I'm Gaurav.
And today we are going to
discuss machine learning.
Machine learning is a very
powerful technology.
And together with the power of
our devices, you can create some
really amazing experiences.
For example, you can do things
like real-time image recognition
and content creation.
In the next 40 minutes, we are
going to tell you all about
those experiences and how Apple
is making it easy for you to
integrate machine learning in
your app.
In particular, we will be
focusing on Core ML, our machine
learning framework.
At Apple, we are using machine
learning extensively.
In our photos app, we use it for people recognition and scene recognition.
In our keyboard app, we use it for next-word prediction and smart responses.
We even use it on the watch to do smart responses and handwriting recognition.
And I'm sure you guys also want
to create similar experiences in
your app.
For example, some of you might
like to do real-time image
recognition.
In this way, you can provide your users more contextual information about what they are seeing, for example, whether they are eating a hotdog or not.
[ Laughter ]
We want to enable all of these great capabilities in your app.
And, hence, we are introducing machine learning frameworks just for you to enable all these awesome things.
You can clap.
[ Applause ]
But before we clap, we should understand why we even need machine learning in the first place, and when you need it.
So let's just say I'm a florist.
And all I want to do is show all the images of roses in the user's photo library.
This seems like a simple task,
so I can do some programming.
Perhaps, I'll start with the
color.
I say if the dominant color in
the picture is red maybe it's a
rose.
The problem is there are many roses that are white or yellow in color.
So I will actually go forward
and start describing the shape.
And soon I will realize that it's very difficult to write even such a simple program by hand.
Hence, we turn to machine learning for help.
Rather than describing what a rose looks like programmatically, we will describe a rose empirically.
We will learn the representation of a rose empirically.
Machine learning actually has
two steps.
One, training.
So what you will do in this case is, offline, collect images of roses, lilies, and sunflowers and label them.
You will run them through a learning algorithm.
And you will get what we call a model.
This model is an empirical representation of what a rose looks like.
Almost all of this stuff happens offline.
And there is a big ecosystem on the Internet around training.
Training is a complex subject.
Now once you have the model, you want to use it.
So what you will do is embed this model in your app, take a picture of a rose, run it through your model, and you will get the label and the confidence.
This step is known as inference.
Training is very complex, but
inference is also getting very,
very challenging nowadays.
For example, if you want to describe a modern deep neural network programmatically in your code, you have to write thousands of lines of code just to describe the network.
That can be very tedious, and perhaps your biggest challenge will be proving the correctness of such a program.
Once you make sure that the
program is correct, you have to
make sure that you're getting
the best performance possible.
Finally, you also have to make sure that it is energy efficient.
These tasks can be very, very
challenging and can bog you down
completely.
At Apple, we have been using on-device machine learning for several years now.
We have faced these challenges and solved them.
And we really don't want you to have to solve them again.
We will solve them for you.
We want you to focus only on the
experience.
We will give you the
implementation, the best
implementation possible.
And, hence, we are introducing machine learning frameworks in which we take care of all the challenges, and you take care of the experience in your app.
In doing so, we are following a layered approach.
At the top is your app.
We are introducing new domain-specific frameworks, in particular, Vision.
The Vision framework will allow you to do all the tasks related to computer vision.
And the best part about the Vision framework is that you don't have to be a computer vision expert to use it.
We are also enhancing our NLP
APIs to include more languages
and more functionality.
We are introducing a brand-new
framework called Core ML.
This framework has exhaustive support for machine learning algorithms, both deep learning as well as standard ones.
And finally, all of these
frameworks are built on top of
Accelerate and Metal Performance
Shaders.
To give you a sneak peek: first, the Vision framework.
Vision is our one-stop shop to
do all things related to
computer vision and images.
You can use it to do things like
object tracking or more modern
deep learning-based face
detection.
There is a lot more to Vision
framework.
We are going to discuss Vision
in detail tomorrow afternoon.
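To give you a flavor of the API, here is a minimal sketch of deep learning-based face detection, assuming you already have a CGImage in hand; error handling is elided:

```swift
import Vision

// A minimal sketch: run deep learning-based face detection on a CGImage.
func detectFaces(in image: CGImage) {
    let request = VNDetectFaceRectanglesRequest { request, error in
        guard let faces = request.results as? [VNFaceObservation] else { return }
        for face in faces {
            // Bounding boxes are in normalized coordinates (0...1).
            print("Found a face at \(face.boundingBox)")
        }
    }
    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try? handler.perform([request])
}
```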
Next, NLP. This is our one-stop shop for text processing.
You can use it to do things like
language identification.
And we are also releasing Named
Entity Recognition APIs.
These APIs can tell you whether a word in your text is the name of a location, a person, or an organization.
We believe this will be very powerful for many app developers, so please check out tomorrow morning's session on NLP.
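To make the named entity APIs concrete, here is a minimal sketch using NSLinguisticTagger's name-type scheme; the sample sentence is our own:

```swift
import Foundation

// A minimal sketch: tag person, place, and organization names in text.
let text = "Apple is hosting WWDC in San Jose this week."
let tagger = NSLinguisticTagger(tagSchemes: [.nameType], options: 0)
tagger.string = text

let options: NSLinguisticTagger.Options = [.omitWhitespace, .omitPunctuation, .joinNames]
let range = NSRange(location: 0, length: text.utf16.count)
tagger.enumerateTags(in: range, unit: .word, scheme: .nameType, options: options) { tag, tokenRange, _ in
    if let tag = tag, [.personalName, .placeName, .organizationName].contains(tag) {
        let name = (text as NSString).substring(with: tokenRange)
        print("\(name): \(tag.rawValue)")
    }
}
```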
Core ML. Until now, we were discussing domain-specific frameworks.
Core ML is our machine learning framework and is domain agnostic.
It supports multiple kinds of inputs such as images, text, dictionaries, and raw numbers.
So you can do things like music
tagging.
Another interesting part of Core ML is its ability to deal with mixed input/output types.
So you can take in an image and get text out.
The best part about all these
frameworks is that they can work
together.
So you can take text, pass it through NLP, take the output, pass it through Core ML, and do things like sentiment analysis.
We are going to discuss exactly how you do that in our in-depth Core ML session on Thursday.
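As a purely hypothetical sketch of that pipeline's last step: SentimentClassifier, its prediction(text:) method, and the sentiment property are illustrative stand-ins for whatever interface Xcode would generate from a converted sentiment model:

```swift
// Hypothetical: SentimentClassifier stands in for the Xcode-generated
// class of a converted sentiment model; all names are illustrative.
let sentimentModel = SentimentClassifier()
if let result = try? sentimentModel.prediction(text: "What a wonderful conference!") {
    print(result.sentiment)  // e.g. "positive"
}
```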
All of these frameworks are
being powered by Accelerate and
MPS.
These are our engines.
Now you can use Accelerate and
MPS whenever you need some kind
of math functionality.
It doesn't necessarily have to be related to machine learning.
You can also use Accelerate and MPS when you are building custom ML models.
So if you are implementing your own machine learning model, you might like to use Accelerate and MPS.
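For example, here is a tiny Accelerate (vDSP) sketch computing a dot product, the kind of primitive you would combine when building a custom model:

```swift
import Accelerate

// A tiny sketch: dot product of two vectors with vDSP.
let a: [Float] = [1, 2, 3, 4]
let b: [Float] = [4, 3, 2, 1]
var dot: Float = 0
vDSP_dotpr(a, 1, b, 1, &dot, vDSP_Length(a.count))
print(dot)  // 20.0
```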
All these APIs run on the user's device, and they are highly performant.
We work very hard to make sure you get the best performance possible.
Now because they run on the user's device, there are several advantages.
First, user privacy.
Your users will be happy to know
that you're not sending their
personal messages, texts, and
images to the server.
Your users don't have to pay extra data costs just to send this data to a server to get a prediction.
And you don't have to pay huge amounts of money to server companies just to set up servers and get predictions.
Our devices are very, very
powerful.
You can do a lot here.
And finally, your app is always
available.
Let's just say your user goes to
Yosemite in the wilderness where
there is no network
connectivity.
They can still use your app.
And this is very powerful.
However, perhaps one of the best
things you can do is real-time
machine learning.
Our devices are very powerful.
They can run modern deep neural networks such as ResNet.
You can use our devices to do
real-time image recognition.
In these cases, you really don't have to go to a server to actually get a prediction.
[ Applause ]
So for any latency-sensitive task, like real-time image recognition, you might like to run inference on device.
So just to recap here, we are
releasing a series of machine
learning frameworks.
If your app is a vision-based
app, please use Vision
framework.
If your app is a text-based app,
please use NLP.
Now if you think that NLP and the Vision framework are not providing the APIs you need, drop down to Core ML.
Core ML provides exhaustive
support of standard machine
learning and deep learning
algorithms.
And let's just say you are implementing your own machine learning algorithms; then you should use Accelerate and MPS.
We are going to discuss all these frameworks in detail over the next couple of days, but today let's just focus on Core ML.
And to do that, I'll invite my
friend and colleague, Michael.
>> Thanks Gaurav.
Hi everyone.
I'm so excited to talk about
Core ML and its role in helping
you make a great app.
We're going to start out with a high-level overview of Core ML's main goals.
We'll then talk a little bit
about models and how they're
represented.
And then we'll walk you through
the typical development process
using Core ML.
So the Core ML framework is
available on macOS, iOS,
watchOS, and tvOS.
But Core ML is more than just a
framework and a set of APIs.
It's really a set of development
tools all designed around making
it as easy as possible to take a
machine learned model and
integrate it into your app.
This will let you focus on the
user experience you're trying to
enable rather than the
implementation details.
Core ML is simple to use and will give you the performance you need, while being compatible with a wide variety of machine learning tools out there.
Its simplicity comes from having
a Unified Inference API across
all model types.
Xcode integration will let you
interact with machine learned
models using the same software
development practices you're all
already experts in.
It uses the inference engines Apple is already shipping to millions of customers in its apps and system services, now being made available to you via Core ML.
And as mentioned, these are
built on top of Metal and
Accelerate to make the best use
of hardware that your apps are
being deployed on.
Core ML is also designed to work
in this rapidly evolving machine
learning ecosystem.
It defines a new public format
for describing models as well as
a set of tools that will allow
you to convert the output of
popular training libraries into
this format.
So that's Core ML in a nutshell.
It's all about making it super
easy to get a learned model
integrated into your app.
So let's talk a little bit more
about these models.
To Core ML, a model is simply a function.
Now the logic for this function
happens to be learned from data
but just like any other
function, it takes in a set of
inputs and produces a set of
outputs.
In this case, shown on the
slide, we have a single input of
an image and an output, perhaps
telling you what type of flower
is present in it.
Now many of the use cases in which you may want to apply a machine learning model have some key function at their core.
A common type of function is
performing some sort of
classification.
It's taking a set of inputs and
assigning some categorical
label.
Take, for example, sentiment
analysis.
Here, you may take in an English
sentence and output whether the
sentiment was positive or
negative, here represented by an
emoji.
Now, in order to encode
functions of this type, Core ML
supports a wide variety of
models.
It has extensive support for neural networks, both feedforward and recurrent, with over 30 different layer types.
It also supports tree ensembles,
support vector machines, and
generalized linear models.
Now each of these model types is actually a large family of models.
And we could spend the rest of
this session and the rest of
this conference just talking
about any one of them, but don't
let that intimidate you.
You do not need to be a machine
learning expert to make use of
these models.
You can continue to focus on the use case you're trying to enable and let Core ML handle those low-level details.
Core ML represents a model in a
single document.
That is, sharing a model is just
like sharing a file.
This document has the high-level information you, as a developer, need to program against.
That is, the functional description: its inputs and outputs and their types, as well as those details that Core ML needs to actually execute the function.
For a neural network, this may be the structure of the network along with its trained parameters.
We encourage you to come to our
Thursday session to learn more
about this format and the set of
models it can encode.
But maybe now in the back of
your head you're starting to
think well, where do I get these
models?
Where do models come from?
Well, a great place to start is
to visit our machine learning
landing page on
developer.apple.com.
There you'll find a small
collection of task specific,
ready to use models already in
the Core ML format.
We'll be expanding the set over time.
And we encourage you to explore.
Give it a try.
This is a great way to get
introduced to machine learning
in general and Core ML and how
it can be used in your app.
But we may not have all the
models you need.
And for your application, you may need to make some customizations to an existing model or build a whole new model from scratch.
And for that, we encourage you
and your colleagues to tap into
the machine learning community.
There is a thriving community
out there.
There are tons of libraries and
models already existing for you
to start playing with.
In addition, there are wonderful
online courses and new tutorials
are popping up every week.
You can do it.
There's great resources to
learn.
And I'm confident that all of
you can get started.
It's a wonderful time.
But you'll see that most of these libraries are focused on training.
That is a very important step, and they spend a lot of effort making sure that you can train a great model.
What we want is for Core ML to take you the last mile, from that trained model to actually deploying it in your app.
So we're doing this by
introducing the Core ML Tools
Python package.
This is a Python package that is
centered completely around
taking the output of these
machine learning libraries and
getting them into the Core ML
format.
We'll be expanding these tools
over time, but they're also
being made Open Source.
[ Applause ]
So what this means is that if you don't find the converter you need there, we're super confident you can write your own.
All of the primitives are there
ready for you to use.
Now I encourage you, again, to
visit us on Thursday to learn
more about this Python package
and how you can easily convert
models.
OK. Now we have this model.
Let's talk about how we use it
in our apps.
So you're going to start with
the machine learned model in
this Core ML format.
You're going to simply add it to
your Xcode project.
And Xcode will automatically
recognize this model as a
machine learned model, and it
will generate an interface for
you.
That will give you a
programmatic interface to this
model using data types and
structures you're already
familiar programming against.
You'll program using this
interface and compile it into
your app.
But in addition, the model
itself will also get compiled
and bundled into your app.
That is, we'll take this format
that's been optimized for
openness and compatibility and
turn it into one that is
optimized for run time on the
device.
To show you this in action, I'm
going to invite my friend and
colleague, Lizi, to the stage.
[ Applause ]
>> Hi everyone, I'm Lizi, and I'm an engineer on the Core ML team.
Today, let's take a look at how
you can use Core ML to integrate
machine learning into your
applications.
Now first some context.
As much as I love machine
learning, I also love to garden.
And if you're anything like me,
you may get really excited
whenever you come across a new
kind of flower and instantly
want to identify it.
So we set out to build an app
that would take images of
different types of flowers and
classify them by type.
But first, in order to do that,
we needed a model.
So we went ahead and we
collected many different images
of different types of flowers,
and we labeled each one.
I then trained a neural network
classifier using an Open Source
machine learning tool and
converted it into our Core ML
format.
So with this trained model and
with this app shell of this
flower classifier, let's take a
look at how we can put the two
together.
So over in Xcode, you can see I'm currently running the flower classifier app and mirroring the display of my device.
And what I can do is I can go
into the photos library and
choose an image, but right now it doesn't predict anything for the output.
It just displays this string.
So if I move over to the
ViewController, what we see is
it has this caption method that
currently just returns that
blank string.
It doesn't actually do anything.
Over here, I have this flower
classifier in the ML model
format.
And all I need to do is take it
and drag it over into the
project.
And we can see, as soon as I add
it, Xcode automatically
recognizes this file format.
And it populates the name of the
model as well as what type it is
underneath.
We can see that the size is 41
megabytes and also other
information like the author,
license, or description
associated with it.
At the bottom, we can see the inputs and outputs of this model.
The first is flowerImage, an image with red, green, and blue channels and a given width and height.
We can also see that it outputs a flower type as a string.
So if I give it an image of a rose, I would expect flower type to be rose.
There's also a dictionary full
of probabilities for different
types of flowers.
Now the first thing I want to do is make sure that I add this to my app target.
And you'll see that afterwards this middle pane shows that the generated Swift source has arrived in the application.
What that means is we can
actually use this generated code
to load the model and predict
against it.
To show you, let's go to the
ViewController.
The first thing I want to do is
define this flower model.
And to instantiate it, all I need to do is use the name of the model class itself.
We'll see that it's highlighted
because it exists in the
project.
Now next in the caption method,
let's define a prediction.
And we'll actually use the
flower model and look inside.
We'll see using auto complete
that this prediction method is
available that takes a
CVPixelBuffer as input.
We'll use that and pass the
image directly to it.
Now, instead of returning this preset string, we can use the prediction object and return the flower type.
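That is the whole integration; roughly, the code now looks like this (a sketch; the generated class and argument names depend on the model file):

```swift
import CoreML
import CoreVideo

// A sketch of the demo code; FlowerClassifier is the class Xcode
// generates from the .mlmodel file (names depend on the model).
let flowerModel = FlowerClassifier()

func caption(for image: CVPixelBuffer) -> String {
    guard let prediction = try? flowerModel.prediction(flowerImage: image) else {
        return ""
    }
    return prediction.flowerType
}
```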
Now if I save, let's build and
run the app.
And what's happening while I do
this is the model itself is
actually getting compiled and
bundled with the application.
So if we go back to the device,
if I go to my photos library, I
can choose a rose on the second
row.
And we can see it actually is
able to display rose.
[ Applause ]
We give it another one, maybe the sunflower on the third row.
It's also able to get this one.
Or even something a little bit more difficult, like the passion flower at the bottom, it can still classify.
So to recap, we were just able
to drag a trained ML model into
Xcode and easily add it into our
application with three lines of
code.
And now we have a full neural
network classifier running on
the device.
But I have some other models
that we might want to try so
let's go through this again.
You may have noticed there's also this FlowerSqueeze model.
And the first thing I'll do is drag it in as well.
Then I'll also add this one to the flower classifier's Hello Flowers target so that it starts generating the Swift source in the background.
And we'll see, again if we zoom
in, that the name of this model
is different.
It's called FlowerSqueeze but in
the bottom the inputs and
outputs are exactly the same.
What this means is that I can go back into the ViewController and, instead of flower -- actually, let's first delete the flower classifier.
We don't want that model anymore.
And in the ViewController, instead of instantiating that one, we can use FlowerSqueeze.
And the really neat thing about
this model is its size.
So if we go back, this one is
only 5.4 megabytes, which is a
huge difference.
Back in the device, we can go to
the photos library and choose
another one like love in the
mist and see that this app is
also able to classify this as
well.
Try another passion flower or
even a more difficult photo of a
rose on the side and it's still
able to predict correctly.
So to recap, we just saw that
we're able to run this neural
network classifier on device
using Core ML, all with only a
couple lines of code.
[ Applause ]
Now let's go back and recap what
we just saw.
We saw that in Xcode, as soon as you drag a trained ML model into your application, you get a populated view that gives you information such as the name of the model, the type it is underneath, the size, and any other information the author put in.
You can also see at the bottom the inputs and outputs of the model, which is very helpful if you're actually trying to use it.
And once you add it to your app
target, you get generated code
that you can use to load and use
the model to predict.
We also saw in our application
how simple it was to actually
use it.
Again, even though the model was a neural network classifier, all we needed to do to instantiate it was use the name of the file.
This means the model type is completely abstracted away, so whether it's a support vector machine, a linear model, or a neural network, you load them all the exact same way.
We also noticed that the
prediction method took a
CVPixelBuffer.
It was strongly typed, so you'll be able to catch input errors at compile time rather than at runtime.
But now let's take a deeper look
at the generated source that's
being created by Xcode.
Here, we see it defines three classes, the first of which is the input, which holds the flower image as a CVPixelBuffer.
We then see the output, with the same two types that were listed in the generated interface.
And next, in the model class itself, we see the convenience initializer we used to create the model, as well as the prediction method.
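For a feel of that shape, here is a simplified sketch of just the input class; the output and model classes follow the same pattern, and all names are illustrative rather than taken from real generated code:

```swift
import CoreML
import CoreVideo

// Simplified sketch of a generated input class; real generated source
// is similar but produced automatically by Xcode.
class FlowerClassifierInput: MLFeatureProvider {
    let flowerImage: CVPixelBuffer

    init(flowerImage: CVPixelBuffer) {
        self.flowerImage = flowerImage
    }

    // MLFeatureProvider: expose each input as a named feature value.
    var featureNames: Set<String> { return ["flowerImage"] }

    func featureValue(for featureName: String) -> MLFeatureValue? {
        guard featureName == "flowerImage" else { return nil }
        return MLFeatureValue(pixelBuffer: flowerImage)
    }
}
```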
But you also have access to the underlying model inside this generated source.
So for more advanced use cases, you can interact with it programmatically as well.
If we take a look inside the MLModel class, we can see that there's a model description, which gives you access to everything you saw in the metadata in Xcode.
There is also an additional prediction method that takes a protocol-conforming input and returns a protocol-conforming output, giving you more flexibility in how the input is provided.
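As a sketch of that more advanced path, with illustrative model and feature names:

```swift
import CoreML
import CoreVideo
import Foundation

// Sketch: drive the underlying MLModel directly instead of the
// generated wrapper. Model and feature names are illustrative.
func classify(_ pixelBuffer: CVPixelBuffer) throws -> String? {
    let url = Bundle.main.url(forResource: "FlowerClassifier",
                              withExtension: "mlmodelc")!
    let model = try MLModel(contentsOf: url)

    // The same metadata Xcode displays, available programmatically.
    print(model.modelDescription.inputDescriptionsByName)

    // Any MLFeatureProvider works as input; a dictionary is simplest.
    let input = try MLDictionaryFeatureProvider(
        dictionary: ["flowerImage": MLFeatureValue(pixelBuffer: pixelBuffer)])
    let output = try model.prediction(from: input)
    return output.featureValue(for: "flowerType")?.stringValue
}
```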
So we walked through the top half of this development flow and saw what happens when you drag a trained model into Xcode: it generates Swift source that you can then use to access the model in your application.
The model itself, as we saw, is
also compiled and bundled in
your app.
To elaborate on this step a bit more: we first take the model from its open interchange form, which is optimized for portability and ease of conversion, and compile it so that it loads quickly on your device when you initialize it.
We also make sure that the model is matched to the underlying inference engines that will ultimately evaluate it on your device, giving you optimized predictions when you need them.
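Xcode does that compilation for you at build time. If you ever need to do it at runtime, for example for a model fetched after install, the same step is exposed on MLModel; a minimal sketch:

```swift
import CoreML
import Foundation

// Sketch: compile an .mlmodel file on device into the runtime-optimized
// .mlmodelc form, then load it. The source URL is whatever model file
// you have on hand.
func loadCompiledModel(from modelURL: URL) throws -> MLModel {
    let compiledURL = try MLModel.compileModel(at: modelURL)
    return try MLModel(contentsOf: compiledURL)
}
```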
And the last thing we saw in the demo was swapping out models.
There, it was mostly to reduce the size, but there are other reasons you might want to play around with different ones, for example, accuracy, or just to test out different types to see how they perform.
In summary, today we saw an
overview of some of the machine
learning frameworks that we have
available.
We also introduced Core ML, a
new framework for on-device
inference.
And also we showed you the
development flow in action.
For more information, make sure
you check us out at
developer.apple.com with our
session number 703.
And also, come visit some of the
related sessions such as Vision
and NLP and Core ML in depth on
Thursday if you want to see some
more use cases of how you can
use Core ML or how to actually
convert models into the Core ML
format.
Thank you and enjoy the rest of
the conference.
[ Applause ]