Transcript
[ Music ]
>> [Applause] Hey, everyone.
[Applause] Thank you.
Good afternoon, and welcome to
our session on Turi Create.
This session builds upon
yesterday's session on Create
ML.
That session covered many of the
foundations of machine learning
in Swift.
So if you didn't catch that
session I do recommend you add
it to your watch list.
The goal of Turi Create is to
help you add intelligent user
experiences to your apps.
For example, you might want to
be able to take a picture of
your breakfast.
And tap on the different food
items to see how many calories
you're consuming.
Or maybe you want to control a
lightbulb using your iPhone and
simple gestures.
Maybe you want to track an
object in real time like a dog
or a picture of a dog if you
don't have dogs in the office.
[ Laughter ]
Maybe you have custom avatars in
your game, and you want to
provide personalized
recommendations like hairstyles
based on the beard that the user
has selected.
Or maybe you want to let your
users apply artistic styles or
filters to their own photos.
These are very different user
experiences.
But they have several things in
common.
First and foremost, they use
machine learning.
These experiences all require
very little data to create.
All these models were made with
Turi Create and deployed with
Core ML.
They all follow the same
five-step recipe that we'll go
over today.
And all these demo apps will
also be available in our labs.
So please join us today or
Friday for the ML labs to get
hands-on experience.
Turi Create is a Python package
that helps you create Core ML
models.
It's easy to use, so you don't
need to be an ML expert or even
have a background in machine
learning to create these
exciting user experiences.
We make it easy to use by
focusing on tasks first and
foremost.
What we do is we abstract away
the complicated machine-learning
algorithms so you can just focus
on the user experience that
you're trying to create.
Turi Create is cross platform so
you can use it both on Mac and
Linux.
And it's also open source.
We have a repo on GitHub and we
hope you'll visit.
There's a lot of great resources
to get started.
And we look forward to working
with you and the rest of the
developer community to make Turi
Create even better over time.
Today we're excited to announce
that the beta release of Turi
Create 5.0 is now available.
This has some powerful new
features.
Like GPU acceleration.
And we'll go into these features
in detail later on today.
The main focus today is going to
be the five-step recipe for
creating Core ML models.
I'm going to start by going over
these steps at a high level.
And then we'll dive into some
demos and code.
So the first step is to
understand the task you're
trying to accomplish.
And how we refer to that task in
machine learning terms.
Second, you need to understand
the type of data you need for
this task.
And how much of it.
Third, you need to create your
model.
Fourth, you need to evaluate
that model.
That means understanding the
quality of the model and
whether it's ready for
production.
And finally, when your model's
ready to deploy, it's really
easy using Core ML.
Let's dig a bit deeper into each
of these steps.
Turi Create lets you accomplish
a wide variety of common machine
learning tasks.
And you can work with many
different types of data.
For example, if you have images,
you might be interested in image
classification.
Or object detection.
You might want to provide
personalized recommendations for
your users.
You might want to automatically
detect activities like walking
or jumping jacks.
You might want to understand
user sentiment given a block of
text.
Or you might be interested in
more traditional machine
learning algorithms like
classification and regression.
Now we know this can get really
confusing to those of you who
are new to machine learning.
So we've taken a stab at making
this really easy for you in our
documentation.
We begin by helping you
understand the types of tasks
that are possible and then how
we reference them in machine
learning terms.
So what we can do now is revisit
those intelligent experiences
that we walked through at the
beginning of this presentation
and assign them to these machine
learning tasks.
For example, if you want to
recognize different types of
flowers in photos we call that
image classification.
If you want to take pictures of
your breakfast and understand
the different food objects
within them, we call that
object detection.
If you want to apply artistic
styles to your own photos, we
call that style transfer.
And if you want to recognize
gestures or motions or different
activities using sensors of
different devices, we call that
activity classification.
Finally, if you want to make
personalized recommendations to
your users, we call this task
recommender systems.
Now the great thing is that the
same five-step recipe we just
walked through also applies to
your code.
We begin by importing Turi
Create.
We proceed to load our data into
a data structure called an
SFrame.
And we'll go into a bit more
detail on the SFrame data
structure shortly.
We'll proceed to create our
model with a simple function,
.create.
This function abstracts away
the complicated machine
learning behind the scenes.
We proceed to evaluate our model
with the simple function
.evaluate.
And finally we can export the
resulting model to Core ML's
.mlmodel format to be easily
dragged and dropped into Xcode.
Now I mentioned that this same
five-step template applies to
all the different tasks within
Turi Create.
So whether you're working on
object detection, image
classification or activity
classification, the same code
template applies.
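In Python, that template looks
roughly like this. This is a
sketch, shown with the image
classifier task; the file names
and the 'label' column are
placeholders.

    import turicreate as tc

    # Load your data into an SFrame.
    data = tc.SFrame('data.sframe')

    # Create the model; object_detector, activity_classifier, and the
    # other toolkits follow the same shape.
    model = tc.image_classifier.create(data, target='label')

    # Evaluate against held-out test data the model has never seen.
    metrics = model.evaluate(tc.SFrame('test.sframe'))

    # Export to Core ML's .mlmodel format for drag-and-drop into Xcode.
    model.export_coreml('MyModel.mlmodel')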
For our first demo today, we're
going to walk through a
calorie-counting app that uses
an object detection model.
So we'll want to recognize
different foods within an image.
And we'll need to know where in
the image those foods are so
that we can tap on them to see
the different calorie counts.
So let's take a look at the type
of data that we'll need to
create this machine learning
model.
Of course we need images.
And if we were just building a
simple image classifier model,
we would just need our set of
images and labels that describe
the images overall.
But because we're performing
object detection, we need a bit
more information.
We need to understand not just
what's in the image.
But where those objects are.
Now if we zoom in a bit closer
to one example, we see a red box
around a cup of coffee.
And a green box around a
croissant.
We call those boxes bounding
boxes and we represent those in
JSON format.
With a label, and then with
coordinates x, y, width and
height.
Where x and y refer to the
center of that bounding box.
It's also worth noting that
with object detection, you can
detect multiple objects within
each image.
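As a concrete sketch, the
annotations for an image like
the one above might look like
this in Python (the coordinate
values here are made up for
illustration):

    # One list of annotations per image; 'x' and 'y' are the box center.
    annotations = [
        {'label': 'coffee',
         'coordinates': {'x': 100, 'y': 120, 'width': 80, 'height': 90}},
        {'label': 'croissant',
         'coordinates': {'x': 260, 'y': 140, 'width': 120, 'height': 70}},
    ]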
So I mentioned that we'd be
loading our data into this
tabular data structure called an
SFrame.
And in this example, we will end
up with two columns.
The first column will contain
your images.
The second column will contain
your annotations in JSON format.
By now you're probably
wondering: what is an SFrame?
So let's take a step back and
learn more about it.
SFrame is an out-of-core
tabular data structure.
And what this means is you can
create machine learning models
on your laptop, even if you
have enormous amounts of data.
SFrame allows you to perform
common data manipulation tasks.
Like joining two SFrames or
filtering to specific rows or
columns of data.
SFrames let you work with all
different data types.
And once you have your data
loaded into an SFrame, it's easy
to visually explore and inspect
your data.
Let's zoom in a bit more on
what's possible with SFrame,
using our object detector
example.
So after we import Turi Create,
in the case of our object
detector, we're actually going
to load two different SFrames.
The first containing our
annotations, and the second
containing our images.
We have a simple function
.explore that will allow you to
visually inspect the data you've
imported.
We can do things like access
specific rows.
Or columns of our data.
And of course we can do common
operations like joining our two
SFrames into one.
And saving the resulting SFrame
for later use or to share with a
colleague.
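Put together, those SFrame
operations look roughly like
this (folder and file names are
placeholders):

    import turicreate as tc

    annotations = tc.SFrame('annotations.csv')  # tabular data from CSV
    images = tc.load_images('images/')          # 'path' and 'image' columns

    images.explore()        # open the interactive visualization

    row = images[0]         # access a specific row
    paths = images['path']  # access a specific column

    data = images.join(annotations)  # joins on the shared 'path' column
    data.save('data.sframe')         # save for later use or sharing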
Next we create our model.
So I mentioned we have this
simple function .create that
does all the heavy lifting for
the creation of the actual
model.
And what we do behind the scenes
is we ensure that the model we
create for you is customized to
the task and that it's state of
the art.
Meaning it's as high quality and
high accuracy as we can get.
And we're able to do this
whether you have large amounts
of data or small amounts of
data.
It's very important to us that
all of our tasks work even when
you have a small amount of
data, as small as about 40
images per class of object that
you're trying to detect, in the
case of object detection.
Let's move on to evaluation.
So I mentioned we have a simple
function .evaluate that will
give you an idea of the quality
of your model.
In the case of object
detectors, we have two factors
to consider.
First, we want to know did we
get the label right?
But we also have to know if we
got that bounding box right
around the object.
So we can establish a simple
metric with these two factors
and go through a test data set,
scoring predictions against
what we call ground-truth
data.
And so we want to make sure that
we have correct labels.
And then a standard metric is
at least 50% overlap in the
predicted bounding box when
compared to the ground-truth
bounding box.
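That overlap is conventionally
measured as intersection over
union. Here's a minimal sketch
of the idea, written against
the center-based box format
shown earlier; it illustrates
the metric rather than Turi
Create's internal code:

    def iou(a, b):
        # Convert center-format boxes to corner coordinates.
        def corners(box):
            return (box['x'] - box['width'] / 2, box['y'] - box['height'] / 2,
                    box['x'] + box['width'] / 2, box['y'] + box['height'] / 2)
        ax0, ay0, ax1, ay1 = corners(a)
        bx0, by0, bx1, by1 = corners(b)
        # Intersection rectangle; zero area if the boxes don't overlap.
        iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))
        ih = max(0.0, min(ay1, by1) - max(ay0, by0))
        inter = iw * ih
        union = a['width'] * a['height'] + b['width'] * b['height'] - inter
        return inter / union if union > 0 else 0.0

    def correct(pred, truth):
        # A prediction counts when the label matches and overlap >= 50%.
        return (pred['label'] == truth['label'] and
                iou(pred['coordinates'], truth['coordinates']) >= 0.5)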
Let's look at a few examples.
In this prediction we see that
the model got the label right
with a cup of coffee.
But that bounding box is not
really covering the whole cup of
coffee.
It's only about ten percent
overlapping the ground truth.
So we're going to consider that
a bad prediction.
Here we see a highly accurate
bounding box, but we got the
label wrong.
That's not a banana.
So let's not consider that a
successful prediction either.
Now this middle example's what
we want to see.
We have 70% overlap of our
bounding box, and the correct
label, coffee.
So what we can do is
systematically go through all of
our predictions with a test data
set.
And get an overall accuracy
score for a new model.
And finally, we move to
deployment.
We have an export to Core ML
function that saves your model
to Core ML's ML model format.
So you can then drag and drop
that model in to Xcode.
This week we actually have some
exciting new features related
to object detection
specifically.
So I encourage you to attend
tomorrow's Vision with Core ML
session to learn more.
In that session, the speaker
will actually take the object
detection model that we're
building today and go into more
detail about deployment options.
And there you have it!
The five-step recipe for Turi
Create.
[ Applause ]
Thank you.
So with that, I'm going to hand
off to my colleague Zach Nation,
for a demo.
[ Applause ]
>> Thanks, Aaron.
I think let's just jump straight
into code.
Who wants to write some code
live today?
We're going to go ahead and
build an object detector model
right now.
So I'm going to start out in
Finder.
Here I've got a folder of images
that I want to use to train a
model.
We can see this folder's named
Data, and it's full of images of
breakfast foods.
I've got a croissant, some eggs,
and so on.
This is a good data set for
breakfast food.
I think let's go ahead and write
some code with it.
I'm going to switch over to an
environment called Jupyter
notebook.
This is an interactive Python
environment where you can run
snippets of Python code and
immediately see the output.
So this is a great way to
interactively work with a model.
And it's very similar in concept
to Xcode Playgrounds.
The first thing we're going to
do is import Turi Create as TC.
And that way we can refer to it
as TC throughout the rest of
the script.
Now, the first task we want to
do is load data.
And we're going to load it up
into SFrame format.
So first we can say images
equals tc.load_images.
And we're going to give it that
folder, Data, that I just
showed in Finder.
And Turi Create provides
utilities to interactively
explore and visualize our data.
So let's make sure those images
loaded correctly and we got the
resulting SFrame that we
expected.
I'm just going to call .explore,
and this is going to open up a
visualization window where we
can see that we have two columns
in our SFrame.
The first is called path, and
it's the relative path to that
image on disk.
And the second column is called
image.
And that's actually the contents
of the image itself.
And we can see our breakfast
foods right here.
It looks like these loaded
correctly, so I'm going to
proceed.
Back in Jupyter notebook.
Now I'm also going to load up a
second SFrame called
annotations.
For this I'm just going to call
the SFrame constructor and
provide the file name
annotations.csv.
This is a CSV file containing
the annotations that correspond
to those images.
And let's take a look at that.
Right in Jupyter notebook, we
can see that this SFrame
contains a path column, again
pointing to that relative path
on disk of the image.
And an annotation column
containing a JSON object
describing the bounding box and
labels associated with that
image.
But now we have two different
data sources and we need to
provide one data source to train
our model.
Let's join them together.
In Turi Create, this is as easy
as calling the join method.
I'm going to say data equals
images.join(annotations), and
now we can see we have a single
SFrame with three columns.
It joined on that path column.
So for each image with a path,
it combined the annotations for
that path.
And so now for each image, we
have annotations available.
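Spelled out, the code typed so
far is roughly:

    import turicreate as tc

    images = tc.load_images('data/')            # the Data folder of photos
    images.explore()                            # visually check the load

    annotations = tc.SFrame('annotations.csv')  # boxes and labels per path

    data = images.join(annotations)             # path, image, annotations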
Now we're ready to train a
model.
So I'm going to create a new
section here called train a
model.
And that's just one line of code
here.
I'm going to say model equals
tc.object_detector.create.
And this is our simple
task-focused API for object
detection that expects data in
this format.
I'm going to pass in that data
SFrame that I just created.
And for the purposes of today's
demo, I'm going to pass another
parameter called max_iterations.
Normally you wouldn't need to
pass this parameter, because
Turi Create will pick the
correct number of iterations
for you based on the data that
you provide.
In this case, I'm going to say
max iterations equals one just
to give an example of what
training would look like.
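As typed, the training call is
roughly:

    # max_iterations is pinned here only to keep the demo fast; left
    # unset, Turi Create picks an iteration count based on your data.
    model = tc.object_detector.create(data, max_iterations=1)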
And the reason this is going to
take a minute is that it
actually goes through and
resizes all of those images in
order to get them ready to run
through the neural network that
is under the hood of this
object detector.
And then it will perform just
one iteration on this Mac GPU.
But this is probably not the
best model we could get because
I just wanted to train it in a
couple of seconds.
So I'm going to go ahead and
switch over to like cooking show
mode.
And I'm going to take one out of
the oven that we've had in there
for an hour [laughter].
[Applause] So I'm going to say
tc.load_model and it's called
breakfastmodel.model.
And this is one that I've had an
opportunity to train for a bit
longer.
So let's inspect that right here
in the notebook.
And we can see that it's an
object detector model.
It's been trained on six
classes, and we trained it for
55 minutes.
This means you can train a
useful object-detector model in
under an hour on your Mac.
Next, let's test the predictions
of this model and see if it's
any good.
So I'm going to make a new
section here called inspect
predictions.
And we're going to go ahead and
load up a test data set.
And here I've already prepared
one in SFrame format.
So I'm just going to load it,
and I called it
testbreakfastdata.sframe.
There are two important
properties of this test SFrame.
One is that it contains the same
types of images that the model
would have trained on.
But the second important
property is the model has never
seen these images before.
So this is a good test for
whether that model can
generalize to users' real data.
I'm going to make predictions
from that whole test set by
calling model.predict and
providing that test SFrame.
And we'll get a batch prediction
for the whole SFrame.
And that'll just take a few
seconds.
And then we're going to inspect.
I'm just going to pick a random
prediction here.
Let's say index two.
Here we can see the JSON object
that was predicted in just the
same format that the training
data is provided in.
So here we have coordinates,
height, width, x and y.
And a label, banana.
And we get a confidence score
from the model.
In this case about .87.
This is a little bit hard for me
as a human to interpret though.
I can't really tell if this
image is really supposed to be a
banana or whether these
coordinates are where the banana
would appear in that image.
Turi Create provides a function
to take the predicted bounding
boxes or the ground-truth
bounding boxes and draw them
right onto the images.
So let's go and do that.
I'm going to create a new column
in my test SFrame called
predicted image.
And I'm going to assign it the
output of the object detector
utility called draw bounding
boxes.
And I'm going to pass into draw
bounding boxes that test image
column.
So that's the image itself and
then I'm also going to pass the
predictions that I just got from
the model.
That's going to draw those
predicted bounding boxes onto
each image.
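In code, those two steps look
roughly like this:

    predictions = model.predict(test)  # batch predictions for the test set

    # Render the predicted boxes onto the images for spot-checking.
    test['predicted_image'] = tc.object_detector.util.draw_bounding_boxes(
        test['image'], predictions)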
Now let's take a look at that
number two prediction again.
This time in image form.
So I can say
test['predicted_image'][2].show().
And it will render right here in
the notebook.
[ Applause ]
And this is great as a spot
check because at least for one
picture, we know that the
model's working.
But this doesn't tell us if
it'll work for say the next
50,000 images that we pass in.
So for that, we're going to
evaluate the model
quantitatively.
And I'm going to start a new
section here in the notebook
called evaluate the model.
And what we're going to do is
call model.evaluate and once
again, I'm going to pass in just
that whole test data set.
Here the evaluation function is
going to run the metric that
Aaron described, testing whether
the bounding boxes are
overlapping at least 50% and
have a correct label.
And it's going to give us that
result across each of the six
classes that we trained on.
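The evaluation itself is a
single call; printing the
result shows the per-class
scores:

    metrics = model.evaluate(test)
    print(metrics)  # per-class bounding-box scores, as described above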
So here we can see that our
overlapping bounding boxes with
the correct label are happening
about 80% of the time for bagel.
About 67% of the time for
banana.
And so on.
That's pretty good, so I think
let's see if this model's
actually going to work in a
real app.
I'm going to go ahead and call
export_coreml to create a Core
ML model from the model we just
trained.
And I'm going to call it
breakfastmodel.mlmodel.
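That export is one line:

    # Save the trained detector in Core ML's .mlmodel format, ready to
    # drag and drop into Xcode.
    model.export_coreml('breakfastmodel.mlmodel')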
And then as soon as that's done
exporting, I'm going to go
ahead and open it in Finder.
So here in Finder, I've got my
breakfastmodel.mlmodel.
And when I open it in Xcode, I
can see that it looks just like
any Core ML model.
It takes an input image, and as
output we get confidence and
coordinates.
And that's going to tell us the
predicted bounding box and label
for the image that we have.
Now let's switch over to the
iPhone app where we're going to
consume this model.
So here on my iPhone, I've got
an app called Food Predictor.
And this is going to use the
model that we just trained.
Here I'm going to choose from
photos.
And I've got a picture of this
morning's breakfast.
This is a pretty typical
breakfast for me: coffee and a
banana.
Well, often I skip the banana.
But suppose I ate a banana this
morning.
We can just tap right on the
image.
And because we know the bounding
box, we can identify the object
within that bounding box.
And here we see the model tells
us this is a banana, and this is
a cup of coffee.
[ Applause ]
So let's recap what we just saw.
First, we loaded images and
annotations into SFrame format
and joined them together with a
simple function call.
We interactively explored that
data using the explore method.
We created a model just with a
simple high-level API, passing
in that data object containing
both the images and the
bounding boxes and labels.
We then evaluated that model
both qualitatively,
spot-checking the output as a
human would.
And quantitatively, asking for a
specific metric that applies to
the task that we're doing.
Then we exported that model to
Core ML format for use in an
app.
Next, I'd like to switch gears
and talk about some exciting new
features in Turi Create 5.0.
Turi Create 5.0 has a new task
called style transfer.
We have major performance
improvements from native GPU
acceleration on your Mac.
And we have new deployment
options including recommender
models for personalization, and
Vision FeaturePrint-powered
models that let you reduce the
size of your app by taking
advantage of models that are
already in the operating
system.
Let's talk a little bit more
about that style transfer task.
Imagine we've got some style
images and these are really cool
looking recognizable stylistic
images.
Here we've got sort of a light
honeycomb pattern and a very
colorful flower pattern.
And we want to apply those as
filters to our own images that
we take with a camera.
We've got a dog photo here and
what it would look like to apply
those styles to that dog is
something like that.
And with a style transfer model,
we can take the same styles and
apply them to more photos.
Let's say a cat and another dog.
And that's the sort of effect we
would get.
Here's an example of an app that
uses style transfer for filters
on photos that a user would
take.
The code to create the style
transfer model follows the same
five-step recipe as any other
high-level task in Turi Create.
So you can start by importing
Turi Create, loading data into
the SFrame format, creating the
model with a simple high-level
API.
Then we make predictions, in
this case with a function called
stylize to take an image and
apply that style filter.
Finally, we can export it for
deployment into Core ML format,
just like any other model in
Turi Create.
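A sketch of that recipe for
style transfer, with
placeholder folder names and an
arbitrary style index:

    import turicreate as tc

    styles = tc.load_images('styles/')    # the images to turn into filters
    content = tc.load_images('content/')  # representative photographs

    model = tc.style_transfer.create(styles, content)

    # stylize is this task's prediction function: apply one of the
    # trained styles to a new image.
    test = tc.load_images('test/')
    stylized = model.stylize(test['image'][0], style=3)

    model.export_coreml('StyleTransfer.mlmodel')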
So let's take a look at another
demo.
This time, we're going to build
a style-transfer model.
So switching back over to that
Jupyter notebook environment.
I'm going to start once again by
importing Turi Create as TC.
Then, I'm going to load up two
SFrames, each containing images.
One is the style images: I'm
going to call tc.load_images,
and I'm going to give it a
directory name, styles.
And then the other is content
images.
And the way that this works is
the style images are the styles
that you want to turn into
filters you can apply.
And the content images can be
any images that are
representative of the types of
photograph that you would want
to apply those filters to.
So in this case, it's just a
variety of photographs.
We're going to load a folder
called content into an SFrame
for that one.
And then we're ready to go ahead
and train a model.
So I'm going to say model equals
tc.style_transfer.create.
And I'm going to pass in style
and content and that's all we
need.
But that's going to take a bit
too long to train for today's
demo.
So once again, I'm going to do
it like a cooking show, and
we're going to load up one that
we've had in the oven already.
I'm going to say model equals
tc.load_model, and I'm going to
load up my already-trained
style transfer model.
Let's take a look at some of
these style images to see what
we should expect this model to
produce.
We have an image column in our
style SFrame.
And let's take a look at just
style number three and see what
that looks like.
It looks sort of like a pile of
firewood.
And this is pretty stylistic.
I think this would be
recognizable if we were to apply
it as a filter to another image.
Now let's take a look at some
content images.
I'm going to load up a test data
set.
And once again this is a data
set that is representative of
the types of images that users
will have at runtime in your
app.
And the important thing is that
the model never saw these at
training time.
So by evaluating with the test
images, we'll know whether the
model can generalize to users'
data.
I'm going to load up a test data
set now.
With the tc.load_images
function once again.
And we're going to call that
folder test.
And I'm going to pull out one
image from the test data set,
called sample image.
And I'm just going to take the
first image there.
And I'm going to call .show so
that we can see what that image
looks like without any filters
applied.
That's my cat, Seven of Nine.
She always looks like that.
And we're going to go ahead and
stylize that image using the
model that we just trained.
So I'm going to say stylized
image equals model.stylize.
And in this case, the function
is called stylize because the
model is specific to the task of
style transfer.
And we're going to pass in that
sample image.
And I'm going to say style
equals three because that's the
style that we picked earlier
that looks like firewood here.
So let's see what that stylized
image looks like.
I can call .show on that, and
here is my cat looking like a
pile of firewood.
[ Applause ]
Let's make sure this works on
other styles, too.
I'm going to go ahead and make a
stylized image out of that
sample image.
And I'm going to specify a style
equals seven.
And then let's see what that
looks like.
That looks pretty good.
I wonder what the style was that
we just applied to that.
Let's take a look at style
images.
Style image seven.
And once again we can just call
.show to see what that style
image looks like and yeah, that
looks like the filter that we
just applied to my cat.
Now that we've got a good style
transfer model, we can just call
model.export_coreml exactly the
same as any other model, and
save it into Core ML format.
Now, let's switch over to the
iPhone where we have a style
transfer app ready to apply the
filters in this model.
So here I've got my iPhone once
again.
And I have an app called style
transfer.
I'm going to choose a photo from
my photo library to apply these
styles to.
These are my dogs.
This is Ryker and we're going to
see what styles we have
available in this app.
We can scroll through all of the
styles here, and what's
important to note is that a
single style transfer model was
trained on all of these styles.
And one model can include any
number of styles.
So to have multiple filters, you
don't need to greatly increase
the size of your app.
Let's see what those styles look
like applied to Ryker.
Pretty cool.
So--
[ Applause ]
So to recap what we just saw, we
loaded images into SFrame
format.
This time style and content
images into two SFrames.
We created a model using a
high-level API for style
transfer that operates directly
on a set of style images and a
set of content images.
We then stylized images to
check whether the model was
performing well.
We visualized those predictions
in Turi Create.
And finally we exported the
model in Core ML format for use
in our app.
Switching gears a bit.
I want to talk about some other
features in Turi Create 5.0.
We now have Mac GPU acceleration
offering up to a 12x performance
increase in image
classification.
And 9x in object detection, and
that's on an iMac Pro.
[ Applause ]
We have a new task available for
export into Core ML format:
personalization.
The task here is to recommend
items for users based on users'
historical preferences.
This type of model is deployed
using Core ML's new custom model
support that's available on
macOS Mojave and on iOS 12.
This has been a top community
feature request since we open
sourced Turi Create.
So I'm really excited to bring
it to you today.
[ Applause ]
The recommender model in Core ML
looks just like any other Core
ML model.
But what's worth noting is
there's a section at the bottom
here called Dependencies.
And in this section, you can see
that this model uses a custom
model and that model is called
TC recommender.
And this is just Turi Create
providing support for
recommenders in Core ML through
that custom model API.
Using that model in Core ML
looks very similar to using any
other Core ML model as well.
You can just instantiate the
model, create your input.
So in this case, we've got that
avatar creation app.
And a user might have picked a
brown beard and a brown
handlebar moustache and brown
long hair for their avatar.
And we can make predictions from
the model by providing those
interactions as input.
And where we say k equals ten,
that means we'll get the top
ten predictions given those
inputs.
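On the Turi Create side,
creating and exporting a
recommender follows the same
pattern. A minimal sketch, with
made-up interaction data and
column names:

    import turicreate as tc

    # Hypothetical log of which avatar items each user has picked.
    interactions = tc.SFrame({
        'user_id': ['u1', 'u1', 'u2'],
        'item_id': ['brown_beard', 'handlebar_moustache', 'long_brown_hair']
    })

    model = tc.recommender.create(interactions,
                                  user_id='user_id', item_id='item_id')

    model.recommend(k=10)  # top-ten recommendations per user

    # In Turi Create 5.0, recommenders export via Core ML's custom
    # model support.
    model.export_coreml('Recommender.mlmodel')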
So to recap what we've learned
today.
Turi Create allows you to create
Core ML models to power
intelligent features in your
apps.
It uses a simple five-step
recipe starting with identifying
the task that you're doing.
And mapping it to a machine
learning task.
Gathering and annotating data
for use in training that model.
Training the model itself using
a simple, high-level API
specific to the task that you're
doing.
Evaluating that model in Turi
Create both qualitatively and
quantitatively.
And finally, deploying in Core
ML format.
That five-step recipe maps to
code starting with import Turi
Create.
You can load data into the
SFrame format.
Create a model using that
task-specific API.
Evaluate the model with an
evaluate function that's once
again specific to the task that
you're doing.
And export for deployment,
calling the export Core ML
function.
Turi Create supports a broad
variety of machine learning
tasks.
Ranging from high-level tasks
like image classification and
text classification, all the way
to low-level machine learning
essentials.
Like regression and
classification on any type of
data.
And using the resulting models,
you can power intelligent
features in your apps like
object detection or style
transfer for use as a filter.
For more information, please see
the Developer.Apple.com session
URL.
And please come to our labs.
We've got a lab this afternoon
and Friday afternoon.
And we welcome your feedback.
We're happy to answer any
questions you have.
And we'll have all of the demos
we showed today available to
explore.
Thank you.
[ Applause ]