WWDC2017 Session 208

Transcript

[ Applause ]
>> Hello, and good morning,
everyone.
How's everyone doing, this
morning?
Great. Welcome to our talk on
Natural Language Processing.
We are delighted to share our
natural language processing APIs
and tell you how you can
incorporate these APIs into your
very own apps.
I'm Vivek, and I'll be jointly
presenting this session with my
colleague Doug.
In case you're wondering,
there's a third person.
No. I just have a really long
name.
Okay. So, let's start with the
goal of this session.
What we'd like to do, is we'd
like to put you at the center of
this session.
Here is an app.
This could be an app that you've
already published in the App
Store.
Or maybe, it's something that
you're working on, currently.
Or perhaps, it's just an idea
that you've conceived.
If your app leads with natural
language text in any form at the
input, and this could be typed
text on your keyboard.
Could be recognized handwriting
or transcribed speech.
For instance, you may be just
ingesting feeds or social media
feeds into your app.
Or if your app leads with
natural language text at the
output.
What do I mean by that?
If the user is generating
content within your app, perhaps
the user is writing reviews in
your app.
Or it's a productivity app where
the user is writing new
documents or editing documents.
So, if you're dealing with
natural language text, either at
the input or output, we'd like
to harness the power of NLP to
significantly improve the user
experience in your app.
So, when I say harness the power
of NLP, what does it really
mean?
So, I'm talking about natural
language APIs.
Now, these natural language
processing APIs are the same
APIs that drive several first
party apps across the entire
Apple ecosystem, across all our
platforms.
It drives everything from
keyboards to Spotlight to
Messages to Safari.
You've even seen an instance of
the APIs in action at the
Keynote.
You've seen how it can
significantly improve typing
experience for users.
So, let me set this up.
So, I've been communicating with
my friend and we have been
planning a trip to Iceland.
And I'm going to type, ''From
Reykjavik let's go to
Vatnajokull, which is the
largest glacier in Iceland and
all of Europe.'' Unfortunately,
the keyboard does not know what
I'm typing.
It actually thinks I'm typing
Batman.
No offense.
I love Batman.
But I seriously doubt he lives
anywhere close to Reykjavik.
But on the other hand, if you've
been browsing articles, you've
been looking for content based
on Iceland.
Presumably, you have been
planning your itinerary.
So, in this instance, I've been,
you know, reading a bunch of
stuff about Iceland, off the
east coast of Iceland.
NLP, through the power of
machine learning, can
automatically extract names like
Vatnajokull, Egilsstadir, from
this article.
And then, feed all those things
back to the keyboard.
And consequently, if you type
things what do you see?
The stuff that you just read.
And all of that is completely
due to NLP APIs.
So, those are a couple of
instances through how NLP APIs
can influence first party apps.
But as I said, we'd like to
focus on you.
And for the rest of the talk,
we'd like to talk about your
ideas and your apps.
When I look at the audience
here, I see a very diverse
group.
Some of you may be using NLP on
a day to day basis, are experts
at NLP.
While others are really curious
about this.
I mean, NLP and machine learning
is such a hot buzzword that
everybody wants to learn and
leverage these in their own
apps.
So, I think it's constructive to
just spend a little bit of time
talking about what does NLP
constitute, before we go into
the APIs themselves.
So, at a very high level, as I
mentioned, you have natural
language text, which could be
generated through any modality.
And then, you have to do some
processing.
Duh. It's NLP.
You didn't come here to learn
you have to do processing.
But what does processing entail?
It means that we have to convert
raw text into some sort of a
meaningful information.
And this meaningful information
is typically used to improve an
interaction between a user and a
device or between two devices.
Now, let's try to break this
down a little bit further.
This kind of still looks very
abstract.
So, I'd like to break this
nebulous cloud of processing
into fundamental building blocks
of NLP, into something that all
of us can wrap our heads around
and possibly transform into an
API.
So, let's look at each of the
fundamental building blocks of
NLP.
The first is language
identification.
What is the task of language
identification?
Before you start doing any
processing of text, you ought to
know what the language of the
text is.
And that's what language
identification does.
So, it leverages machine
learning techniques to identify
the language or script of the
piece of text.
So, I have a bunch of examples,
here.
If you feed in this string, the
first string, it's going to say
it's English.
Or you have simplified Chinese,
or you have Spanish, or you have
Hindi, or German.
So, once you identify the text,
you can start analyzing the
text.
But when you have text which is
a very large chunk, you could
have an entire document.
I mean, logically, you want to
break down this text into
meaningful chunks.
Sometimes, you want to analyze
an entire document.
But a document is typically
comprised of paragraphs.
So, perhaps you want to analyze
every paragraph in a document.
And you can break it down even
further.
Paragraphs are made up of
sentences.
So, you could analyze sentences
within your paragraphs.
And finally, a more fine grain
analysis would be every single
word in a sentence.
And that is called as
Tokenization.
So, just to give you an example
of sentence level tokenization,
in this particular sentence,
Mister Tim Cook presided over
the earnings report of Apple
Inc.
If I told your machine one rule
based way to chunk this, is
every time you see a period,
chunk this into a sentence.
Is that right?
Wrong. Right?
You're going to incorrectly
hypothesize that there are three
sentences, here.
So, sentence tokenization really
offers you the right sort of
approach to chunk sentences.
Now, this becomes even more
complex in a language like
Chinese, which doesn't have
whitespace.
So, you only have a sequence of
characters.
And in order to do anything
meaningful from a machine
perspective, you ought to break
this down into words.
And that is word tokenization.
Now, let's say that we have the
first two fundamental building
blocks in our kitty.
So, we know how to do language
identification.
We know how to do tokenization.
Let's talk about doing more
complex analysis on the text.
So, the next piece of technology
is part of speech tagging.
So, what do I mean by part of
speech tagging?
It's pretty simple.
So, given a sequence of words,
the task of part of speech
tagging is to confer a part of
speech tag to every word in this
text.
So, if you look at this example,
here, Tim Cook is a person
name, presided is a verb, over
is a preposition, and earnings
is a noun.
And so on.
Now, as an app developer, you
might think, how is this useful
to me?
So, perhaps you are building a
dictionary app.
Right. A dictionary definition
service.
So, maybe kids are reading
books.
And they want to look up the
definition of a particular word.
Let's pick a word like bear,
B-E-A-R.
Bear can either be a noun, or it
could be a verb.
So, when you click on a
particular word, if you know
that is an actual verb, you can
show the right definition of
that word.
Right. So, I think those are
ways in which part of speech
tagging can really help you at
the user level.
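The dictionary idea above can be sketched in Swift with the NSLinguisticTagger class that is introduced later in this session. This is only an illustrative sketch: the helper name partsOfSpeech(in:) and the two sample sentences are assumptions, not code from the talk, and the tags returned depend on the language models available on the device.

```swift
import Foundation

// Hypothetical helper: pairs each word with its lexical class tag.
func partsOfSpeech(in text: String) -> [(word: String, tag: NSLinguisticTag)] {
    let tagger = NSLinguisticTagger(tagSchemes: [.lexicalClass], options: 0)
    tagger.string = text
    let range = NSRange(location: 0, length: text.utf16.count)
    let options: NSLinguisticTagger.Options = [.omitPunctuation, .omitWhitespace]
    var result: [(word: String, tag: NSLinguisticTag)] = []
    tagger.enumerateTags(in: range, unit: .word, scheme: .lexicalClass, options: options) { tag, tokenRange, _ in
        guard let tag = tag else { return }
        let word = (text as NSString).substring(with: tokenRange)
        result.append((word: word, tag: tag))
    }
    return result
}

// Sample sentences (assumptions) in which "bear" plays different roles.
for sentence in ["The bear slept in the cave.", "I cannot bear the noise."] {
    for (word, tag) in partsOfSpeech(in: sentence) where word.lowercased() == "bear" {
        print(sentence, "->", tag.rawValue)
    }
}
```

A dictionary app could use the tag for the tapped word to decide whether to show the noun or the verb definition.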
So, the next building block is
called as lemmatization.
Sounds very NLPish.
But I think we can break this
down.
Let's try to understand what a
lemma is.
Words can appear in
several inflected forms.
So, you can have the present
tense of a word, you can have
the past tense of a word, or you
can have a future tense of a
word.
But what's common to all these
forms?
The root. The root is common to
all these inflected forms.
And that is also called as a
lemma.
So, let's look at an example,
here.
If you were to look at the word,
presided, it's a verb.
And the root form of that word
is preside.
And similarly, if you look at
the word, hours, which is a
noun, the root form of that word
is hour.
Why is this important?
I mean, this looks rather
innocuous for a language like
English.
But trust me, for those of you
who've dealt with
morphologically complex
languages like Russian or
Turkish, lemmatization is
extremely important.
In a language where you have
almost unbounded vocabulary.
I mean, you may have like a
million or 2 million words in a
language, it's critical to break
down those words into lemmas and
suffixes.
And then, do operation for
smaller building blocks.
The last piece of this puzzle is
named entity recognition.
Again, it sounds very complex.
I mean, a lot of NLP lingo.
What is an entity?
What is recognition?
And so on.
But I mean, let's look at this
through an example, again.
So, named entity recognition is
nothing but detecting names
automatically from text.
And these names can fall into
different categories.
For example, it can be person
names.
It can be organization names or
location names.
In this example, Mister Tim Cook
is a person name, Apple Inc.
is an organization name.
And the task of named entity
recognition is to use machine
learning and linguistics
information to automatically tag
ranges of text with these tags.
Okay. So, wait.
I think we've kind of
established the basic
fundamental building blocks of
NLP.
How do we achieve all these
tasks?
That's where we come in.
So, we use a combination or a
blend of linguistics and machine
learning to drive all these
fundamental building blocks up.
And these in entirety constitute
our NLP APIs.
Right. So, some of you may be
going, ''That's a lot of
information.
I knew all that.'' Others are
like, ''Thank you, for the
information.'' While others are
like, ''Enough.
Just tell me how to use it.''
So, let's look at how to use
these NLP APIs.
So, all these NLP APIs are
available across all Apple
platforms through
NSLinguisticTagger.
So, some of you may be familiar
with NSLinguisticTagger.
Perhaps, you've already
incorporated as a part of your
apps, you're calling
NSLinguisticTagger for several
things.
For those who are not familiar
with NSLinguisticTagger, what is
it?
It's a class in Foundation.
It's used to segment and tag
text.
So, every tag, every task that
we described from language
identification to tokenization
to part of speech tagging,
lemmatization, named entity
recognition are all tag schemes
in NSLinguisticTagger.
So, you specify a particular tag
scheme.
You send text to it.
It performs analysis and gives
you an output.
That's what NSLinguisticTagger
does.
So, for more information about
NSLinguisticTagger, I encourage
you to go look at the developer
docs.
But let's try to focus on what's
new in NSLinguisticTagger.
We made significant improvement
to NSLinguisticTagger for this
release.
So, first one is tagging units.
So, the previous version of
NSLinguisticTagger would operate
only on words.
So, if you were to do any sort
of more complex analysis, as I
said, text can be broken down
into a document, into
paragraphs, into sentences, and
then words.
Just doing analysis at a word
level may not be optimal or may
not be sufficient for several
tasks.
So, the new version of
NSLinguisticTagger has different
units.
So, we can operate at the unit
that we'd like.
We can operate either at the
word level, sentence level,
paragraph level, or document
level.
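As a rough sketch of working at the sentence unit, the Swift below walks a string one sentence at a time using tokenRange(at:unit:). The helper name sentences(in:) and the sample text are assumptions for illustration, not code shown in the session.

```swift
import Foundation

// Hypothetical helper: splits text into sentences using the sentence unit.
func sentences(in text: String) -> [String] {
    let tagger = NSLinguisticTagger(tagSchemes: [.tokenType], options: 0)
    tagger.string = text
    let nsText = text as NSString
    var result: [String] = []
    var index = 0
    while index < nsText.length {
        // Range of the sentence that contains the character at `index`.
        let sentenceRange = tagger.tokenRange(at: index, unit: .sentence)
        guard sentenceRange.location != NSNotFound, sentenceRange.length > 0 else { break }
        result.append(nsText.substring(with: sentenceRange))
        index = NSMaxRange(sentenceRange)
    }
    return result
}

// Sample text (an assumption) containing periods that do not end a sentence.
let sample = "Mr. Tim Cook presided over the earnings report of Apple Inc. It went well."
for sentence in sentences(in: sample) {
    print(sentence)
}
```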
Now, not all tag schemes are
available for all units.
Right. So, if I ask you the part
of speech tag for sentence, it
doesn't make sense.
You have to do part of speech
tagging on words.
So, in order to find out what
units and schemes are
compatible, we also have a new
nifty convenience API, called
availableTagSchemes.
All that you do here, is pass
the unit that you're interested
in, the language, and for that
combination of unit and language
you will get all the available
tag schemes.
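A minimal sketch of that query in Swift, assuming English ("en") as the language of interest; the list of units to check and the printed format are illustrative choices, not from the talk.

```swift
import Foundation

// Units to query; the labels are only used for printing.
let units: [(name: String, unit: NSLinguisticTaggerUnit)] = [
    ("word", .word), ("sentence", .sentence),
    ("paragraph", .paragraph), ("document", .document)
]

for (name, unit) in units {
    // Ask which tag schemes are supported for this unit and language.
    let schemes = NSLinguisticTagger.availableTagSchemes(for: unit, language: "en")
    print(name, schemes.map { $0.rawValue })
}
```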
So, in addition to these two
improvements, we've also
introduced a new API called
dominantLanguage.
For those of you who have kind
of found it difficult to perform
language identification using
NSLinguisticTagger, because it
operates only on a word level.
So, if I give you a piece of
text it is going to hypothesize
a language for every word.
So, if you only want the
language of the sentence, you
had to do some sort of an ugly
majority vote thing and stuff in
your code.
So, all of that you can get rid
of.
You can have much cleaner code
base.
You just call this method
dominantLanguage, pass a string
to it, and it's going to give
you the language of the text.
In addition, specifically for
Swift 4, we've moved from
generic strings to named types
for tags, as well as tagSchemes.
And finally, we've made
significant improvements to the
underlying implementation of
NSLinguisticTagger.
The API interface is still the
same, but the entire
implementation underneath has
been [inaudible] from scratch.
Just to make it scalable.
So, the consequences of that is
you get improved performance, a
higher accuracy, as well as
support for a lot more
languages.
Right. So, that's good.
I think we've kind of
established what the fundamental
building blocks of NLP are.
We've talked about
NSLinguisticTagger.
But let's try to really delve
into the APIs.
But I don't want to just show
code.
I want to [inaudible] of this,
through two hypothetical apps
called Winnow and Whisk, W and W
if you get the pun.
So, the first app is Winnow.
It's a macOS app.
And the second app is Whisk,
which is an iOS app.
And both of these are
hypothetical apps.
So, let's talk about Winnow.
So, Winnow is a hypothetical app
that tags photos with
descriptions.
So, I take a lot of pictures of
family, friends, kids.
And every time I have a memory
associated with that picture.
And I'd like to leave an imprint
of that memory as a description
on the photo.
So, this imprint can be either a
speech recording I leave on the
photo.
Or it can be a text message I
write through my keyboard.
Or it could be a handwritten
note.
So, what Winnow does is given a
library of images, when I take
the image it gives you the
facility to add a description.
So, I have all these
descriptions.
So, it doesn't matter how these
descriptions got created.
Let's assume that these
descriptions are part of your
Winnow application.
And they can be in different
languages.
It can be multilingual.
And the objective is, as an app
developer, I've pushed this to
the App Store.
I'm getting a lot of traction.
I'm really happy with it.
But I want to add more features
to it.
So, the first thing I want to do
to my Winnow app is add a
functionality for searching.
Right. People want to search
things.
I mean, you've written a lot of
descriptions, and what are you
going to do with it?
You know, when you say something
like, kid's birthday, you want
to see all the pictures related
to that.
So, I start off doing a first
pass implementation for
searching.
A query, such as hike, goes into
my Winnow app.
And unfortunately, I get no
results.
Because there's no mention of the
word hike in all my
descriptions.
So, what we'd like to do is
improve the search experience
and solve this problem using the
power of NLP.
So, now if I were to type a
query such as hike, what I'd
like to see is all images that
contain all mentions, perhaps,
all the inflected forms of hike.
So, I see hiked, hikes, hiking.
So, these are all things that
are related to hike, but they
are just different inflected
forms.
And those of you, you know,
understood the first part of the
talk, what does this do?
This is basically lemmatization.
Because different inflected
forms have one root form.
So, let's try to see how to
implement this using NLP APIs.
So, we have a bunch of images,
the descriptions.
It goes into our Winnow app.
Now, we want to use NLP in the
middle.
So, what do we do, first?
We ought to do language
identification.
Because descriptions can be in
different languages.
Perhaps, a friend of yours sent
you a picture with a description
in French from France.
Right. Now, it's part of your
library.
And you want to search for
pictures.
So, once you do language
identification of all the
descriptions, we have to
tokenize the text.
And the tokenization can be
either word, sentence, and
paragraph.
Right. Because some of your
descriptions may be really long,
may span multiple sentences.
Then, we do part of speech
tagging.
And finally, lemmatization.
So, if we have all these
building blocks within the
Winnow app, we can get improved
search experience.
And that is our UI.
We'll see a demo of this in
action, soon.
But let me just go through some
sample code to tell you how easy
it is to use each of these
blocks in your apps.
So, language identification is
pretty much just three lines of
code.
You start off with importing
Foundation.
You create an instance of the
NSLinguisticTagger object, and
you specify a tag scheme.
For language identification, the
tag scheme is just language.
You set a string that you want
to analyze.
In this case, the string is
German.
And then, we call the method
dominantLanguage, that I just
described a few slides back.
On this object, voilà, you get
the language of the text.
It's as simple as that.
And under the hood, there's
complex machine learning, there
are all kinds of models.
For you, you just get the
result.
And you can move on to improving
your app experience.
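The slide code is not part of this transcript. A sketch of the steps just described might look like the following; the helper name identifyLanguage(of:) and the German sample sentence are assumptions.

```swift
import Foundation

// Hypothetical helper: returns the tagger's best guess at the language code.
func identifyLanguage(of text: String) -> String? {
    // Create a tagger with the language tag scheme.
    let tagger = NSLinguisticTagger(tagSchemes: [.language], options: 0)
    // Set the string to analyze.
    tagger.string = text
    // Ask for the dominant language of the whole string.
    return tagger.dominantLanguage
}

let language = identifyLanguage(of: "Die Kleinen haben friedlich zusammen gespielt.")
print(language ?? "undetermined")
```

Foundation also provides a class method, NSLinguisticTagger.dominantLanguage(for:), which takes the string directly when no tagger instance is otherwise needed.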
Let's look at tokenization.
Again, we start off with
creating an instance of the
NSLinguisticTagger object.
But now, instead of language as
a tag scheme, we specify
tokenType as a tag scheme.
We specify some text.
And we set the range of the
text.
So, NSLinguisticTagger is still
dealing with NSRanges and we
hope to move to ranges as part
of the next release.
But for now, let's set the
entire range to be the length of
the string that we would like to
analyze.
Then, we subsequently set some
options.
In this particular case, I want
to omit punctuation, and I also
want to omit whitespace.
And then, finally, we enumerate
for every word.
So, for each word as we
enumerate, we can find the token
as a substring of the original
string.
So, once you have a token, you
can do whatever you want to do
with that particular token.
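A sketch of those tokenization steps in Swift, wrapped in a hypothetical helper named words(in:); the helper name and the sample string are assumptions, while the tagger calls follow the steps described above.

```swift
import Foundation

// Hypothetical helper: returns the word tokens of a string.
func words(in text: String) -> [String] {
    let tagger = NSLinguisticTagger(tagSchemes: [.tokenType], options: 0)
    tagger.string = text
    // NSLinguisticTagger works with NSRange, so measure the string in UTF-16 units.
    let range = NSRange(location: 0, length: text.utf16.count)
    let options: NSLinguisticTagger.Options = [.omitPunctuation, .omitWhitespace]
    var tokens: [String] = []
    tagger.enumerateTags(in: range, unit: .word, scheme: .tokenType, options: options) { _, tokenRange, _ in
        // Each token is a substring of the original string.
        tokens.append((text as NSString).substring(with: tokenRange))
    }
    return tokens
}

print(words(in: "Hello, world!"))
```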
So, let's look at lemmatization.
Now, as you see the code
samples, you see sort of a
pattern.
It's very similar.
You again, create an instance of
the NSLinguisticTagger object.
You specify a particular tag
scheme.
In this case, it's lemma.
If you look at all the
fundamental building blocks that
we talked about, it's exactly
that.
Those building blocks are now
translated into tag schemes.
We specify some text.
We set the range of the text
that we'd like to analyze.
And again, we set some options.
Again, we want to omit
punctuation, as well as
whitespace.
And finally, we enumerate over
every word.
And as we enumerate over every
word, we'd like to find out what
the lemma for that particular
word is.
And once we have the lemma, we
can index this in a different
app.
We can use it in many different
ways.
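The same pattern with the lemma scheme might be sketched as follows. The helper name lemmas(in:) and the sample sentence are assumptions; whether a lemma is returned for a given word depends on the language models available on the device.

```swift
import Foundation

// Hypothetical helper: returns the lemma of each word for which one is available.
func lemmas(in text: String) -> [String] {
    let tagger = NSLinguisticTagger(tagSchemes: [.lemma], options: 0)
    tagger.string = text
    let range = NSRange(location: 0, length: text.utf16.count)
    let options: NSLinguisticTagger.Options = [.omitPunctuation, .omitWhitespace]
    var result: [String] = []
    tagger.enumerateTags(in: range, unit: .word, scheme: .lemma, options: options) { tag, _, _ in
        // With the lemma scheme, the tag's raw value is the root form of the word.
        if let lemma = tag?.rawValue {
            result.append(lemma)
        }
    }
    return result
}

print(lemmas(in: "We hiked for hours."))
```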
So, let me now, turn it over to
Doug, to show a live
demonstration of Winnow in
action, with the power of NLP.
Over to you, Doug.
[ Applause ]
>> Okay. Thanks, Vivek.
So, what I have here, is the
first version of the Winnow app
that we wrote.
It's a very simple app.
It shows a gallery of photos.
Each photo has a description.
And we have search fields so we
can search for descriptions for
photos by the words in their
descriptions.
But this version of Winnow has a
problem.
It doesn't have any NLP in it.
So, if I were to type something
like hike, and search for that,
I'd get no results.
Even though there are photos in
here that are related to hiking.
I could type hikes, and I get
the photos whose descriptions
have hikes in them.
Or hiking, I get photos whose
descriptions have hiking in
them.
Or hiked. But because there's no
NLP in it, the app has no idea
that these words are all
related.
So, what can we do about that?
Well, let's take a look at the
code.
So, here is our function at the
heart of the application that is
responsible for what it needs to
do, as far as indexing for
search.
So, this function takes a
string, and maybe it's a
description or search string,
and it converts it into a set of
words that are used for the
searching.
And the reason why it has this
behavior is this function is
very naive.
It just uses a standard string
method for enumerating substrings.
In this case, by words.
So, it gets all the words.
And the only thing it does with
those is to lowercase them, so,
it's case insensitive.
But no NLP.
Well, let's fix that.
I'm going to replace this with
something that should look very
familiar from the slides.
I'm going to create a linguistic
tagger.
In this case, I'm going to use
the lemma and language schemes.
And set the string on it.
A little twist here.
This method here, has two
options.
One, the language, if it's
known, can be passed in.
In which case, we tell the
tagger what the language is.
And this is how you do that.
But if it's not known and not
passed in, then we just ask it,
using dominantLanguage, to
identify the language for us.
And then, we're going to
enumerate through, using the
lemma scheme.
And that will also, along the
way, give us tokenizations.
We get the token.
We're going to use that as one
of our words.
And we also, if we have a lemma,
we'll take that and use that as
one of our words.
By the way, you don't have to
memorize this.
This should be available to you
as sample code.
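The sample code itself is not reproduced in this transcript. A sketch of a function along the lines described, assuming a hypothetical name searchTerms(for:language:) and a return type chosen for illustration, could look like this:

```swift
import Foundation

// Hypothetical indexing function: returns lowercased tokens plus their lemmas,
// along with the language that was passed in or identified.
func searchTerms(for text: String, language: String? = nil) -> (terms: Set<String>, language: String?) {
    let tagger = NSLinguisticTagger(tagSchemes: [.lemma, .language], options: 0)
    tagger.string = text
    let range = NSRange(location: 0, length: text.utf16.count)

    var detectedLanguage = language
    if let language = language {
        // The language is known, so tell the tagger what it is.
        let orthography = NSOrthography.defaultOrthography(forLanguage: language)
        tagger.setOrthography(orthography, range: range)
    } else {
        // The language is not known, so ask the tagger to identify it.
        detectedLanguage = tagger.dominantLanguage
    }

    var terms = Set<String>()
    let options: NSLinguisticTagger.Options = [.omitPunctuation, .omitWhitespace]
    tagger.enumerateTags(in: range, unit: .word, scheme: .lemma, options: options) { tag, tokenRange, _ in
        // Use the token itself as one search term.
        let token = (text as NSString).substring(with: tokenRange).lowercased()
        terms.insert(token)
        // If a lemma is available, use it as a search term as well.
        if let lemma = tag?.rawValue {
            terms.insert(lemma.lowercased())
        }
    }
    return (terms, detectedLanguage)
}

print(searchTerms(for: "Hiking with the kids").terms)
```

Running both descriptions and queries through the same function is what lets a query such as hike match a description containing hiking, provided a lemma is available for that language.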
So, let's try that out.
Okay. So, now in this version of
Winnow, if I type hike, I should
get images whose descriptions
contain hiking or hiked or
hikes.
Let me try another word.
Maybe, party.
This one has parties in it.
Partied, party.
And of course, this is
multilingual.
Let me try a French verb,
marcher, to walk.
Now, I am sure all of you
remember how to conjugate your
French verbs.
But I have a little trouble with
it.
So, I'm glad that I have NLP
APIs to remember that forms of
this verb include, in this case,
marchons, marchais, marchent.
They're all recognized as forms
of the same verb.
Or maybe in German.
The verb spielen, meaning to
play.
I can find images whose captions
contain spielen.
But also, past tense, gespielt.
And the NLP API knows that these
are forms of the same verbs,
they have the same lemma, so
they get found.
And that is Winnow.
[ Applause ]
Back to you, Vivek.
>> Thank you, Doug, for the
great demo.
So, I mentioned that we are
going to talk about two
hypothetical apps.
So, we've covered the first W.
Let's go on to the next W.
Which is Whisk.
So, Whisk is again, a
hypothetical app to collate
social media feeds.
So, I find it really hard,
because I have multiple social
media accounts.
And it's kind of painful to log
into each account, and then, go
look at the feeds, look at the
activity, comment on a bunch of
things.
So, imagine an app that can just
bring all of that in one place.
So, that's the objective of
Whisk.
So, I take social media feeds
from different places, from
different social media accounts.
And then, whisks it all together
into one interface.
So again, you're going to get
all these feeds in one place.
Now, the problem with Whisk is
it's doing very well at the App
Store.
But unfortunately, it's kind of
all over the place.
I mean, the content is not very
easy to browse, because I see
some feeds from Pinterest.
I see some feeds from Facebook.
From Twitter.
It's all over the place.
I really want to organize Whisk
app even more, like increase the
user engagement.
And how can we do that?
So, what we'd like to do is kind
of organize the feeds in Whisk
app based on people,
organization, and locations that
you've been interested in, and
what you have been subscribed to
in your feeds.
How can we do that?
So, let's assume that we are
following a bunch of articles in
Twitter.
I'm following Tim.
I'm also following Apple Music.
And based on all this content,
the text in all of these feeds,
we can use our NLP APIs, such as
named entity extraction and
automatically tag and extract
entities.
So, in the first feed, you can
see that Tim Cook, Apple, Stevie
Wonder, these are all entities
that are automatically extracted
using the NLP APIs.
Completely through machine
learning.
In the second instance, you're
seeing Pharrell.
And Pharrell is visiting NYU.
And those are two entities that
we extracted.
So, imagine if you had all these
entities, we can organize the
content and make the user
experience within Whisk,
significantly better.
And that's the objective, here.
So again, how will we accomplish
this using our NLP APIs?
So, we start off with some
feeds.
So, we assume that there's some
feed API that's sending us all
this information from different
social media accounts.
So, once you have this feed API,
and then you ingest it into
Whisk, we'd like to bring NLP to
the fore.
And what do you think is going
to be the first block, here?
Right. It's going to be language
identification, because feeds
can be in different languages.
You can have a feed that's
coming in that is German or
Russian or French.
In order to do the right sort of
analysis, you have to do
language identification.
So, once you do language
identification, you have to do
tokenization of the text.
Presumably, some of the feeds
are sentences, paragraphs or
documents, right.
So, you have to tokenize the
text.
And then, finally, you can call
the named entity extraction API,
in order to get the entities
from the text.
And that is how our app would
look.
So, let's look at a code sample,
again, to see how easy this is
to implement.
Again, the same pattern.
We start off with creating an
instance of NSLinguisticTagger
object.
Now, we specify a tag scheme to
be nameType.
So, we've gone through token
type.
We've gone through lemma,
language, and now, name type.
We set a string that we'd like
to analyze.
We set the range of the string,
which is the entire string.
And we set some options.
So, if you carefully observe, in
addition to omit punctuation and
omit whitespace, which we've
seen before.
We also have this option called
as joinNames.
What is the reason for that?
Names can span multiple tokens.
So, in this example, Tim Cook is
a name that spans two tokens.
So, when we iterate through our
output we want to actually get
that as a person name.
So, we want to iterate over that
joined token.
So, that's what join means to
us.
And then, we specify the tags
that we're interested in.
We are interested in person
name, place name, and
organization name.
And finally, you know how to
enumerate over the words.
And as you enumerate over the
words, you're going to get the
token types.
And if the token type has a
particular tag which is of
interest to us, we can get the
span of the text that belongs to
that category.
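A sketch of the named entity steps just described. The NamedEntity struct, the function name extractEntities(from:), and the sample sentence are assumptions for illustration; the entities actually found depend on the language models available on the device.

```swift
import Foundation

// Hypothetical value type holding one extracted entity.
struct NamedEntity {
    let name: String
    let tag: NSLinguisticTag
    let range: NSRange
}

func extractEntities(from text: String) -> [NamedEntity] {
    let tagger = NSLinguisticTagger(tagSchemes: [.nameType], options: 0)
    tagger.string = text
    let range = NSRange(location: 0, length: text.utf16.count)
    // joinNames makes a multi-word name such as "Tim Cook" come back as one token.
    let options: NSLinguisticTagger.Options = [.omitPunctuation, .omitWhitespace, .joinNames]
    // The tags of interest: person, place, and organization names.
    let wanted: [NSLinguisticTag] = [.personalName, .placeName, .organizationName]

    var entities: [NamedEntity] = []
    tagger.enumerateTags(in: range, unit: .word, scheme: .nameType, options: options) { tag, tokenRange, _ in
        guard let tag = tag, wanted.contains(tag) else { return }
        let name = (text as NSString).substring(with: tokenRange)
        entities.append(NamedEntity(name: name, tag: tag, range: tokenRange))
    }
    return entities
}

for entity in extractEntities(from: "Tim Cook visited Apple Park in California.") {
    print(entity.name, entity.tag.rawValue)
}
```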
So now, let me turn it over to
Doug, again, to see a live
demonstration of Whisk in action
in Xcode.
Doug, over to you.
>> Okay. So, here we are.
And we have Whisk running, at
least in the Simulator.
And it shows all our feeds.
We could go in and look at each
one.
But that's kind of boring.
What we really want is to
organize these things by named
entities.
So, let's hit the button.
Boom! It's gone through and
extracted all the names it can
find and listed them by an order
of frequency.
And indexed everything.
So, I go and pick Tim Cook.
Let's see all the mentions of
Tim Cook.
I can go and find one, here.
It's nicely highlighted for me.
Maybe. Here's another one.
Tim Cook. Or I can go back, pick
the next entity.
California, location name.
Here are all the mentions of
California.
So, I look and find California.
It's been found and highlighted
for me.
And so, this is Whisk.
Now, how does it work?
Well, let's take a look at the
code.
So, here is the important method
in Whisk.
This is the extractEntities
Method.
Takes piece of text and finds
all of the named entities that
we want in it.
Should start to look very
familiar, now.
We create a tagger.
We are interested in the
nameType scheme.
And we set the string on it.
We set some options.
We don't want whitespace or
punctuation.
We want names to be joined
together.
And then, we enumerate through
it.
In each case, if there is a
name tag and it's one of the
kinds we're interested in, that
is, person, place, organization
name.
Then we find the text in that
token.
And we create an instance of our
private namedEntity class, here,
using that token and tag and
range.
And so, very simple.
That's all there is to it.
That's all that's needed to go
through and extract the named
entities from this text.
Go back to you, Vivek.
[ Applause ]
>> Great. So, now you've seen
NLP APIs in action through two
hypothetical apps.
I want to delve deeper into what
are the benefits of these APIs?
I mean, you've seen it's easy to
use.
It's kind of very systematic to
use.
It has very similar patterns.
But beyond that, what are the
benefits?
The first is homogenous text
processing.
What do I mean by that?
Now these NLP APIs are
available, as I mentioned,
across all Apple platforms.
And as a user of these API, you
are going to get consistent text
processing across all platforms
and a consistent user
experience.
Furthermore, these are the APIs,
as I mentioned.
These are the same ones that we
used in our first party apps.
So, a user of your app is going
to get the same sort of
experience of any other Apple
app.
Let's talk about the second
benefit.
It's privacy.
All of the machine learning in
NLP that we've talked about,
happens completely on device.
As a user of this, everything is
on device and the user data does
not have to leave the device.
And that's great for you.
Right. You don't have to have a
cloud API.
You don't have to do anything.
Everything happens on device.
In addition to privacy, the
underlying implementation of
NSLinguisticTagger was also
completely revamped for this
release.
As a result of that, we've seen
significant improvements in
performance.
So, the code base is highly
optimized on device for all
platforms.
It's multithreaded, now.
And existing clients of
NSLinguisticTagger can see
significant speed up.
For instance, Chinese
tokenization is 30% faster on
iOS, and this was measured on
iPhone 7.
Named entity recognition is 80%
faster with the new code base.
And for those of you who have
not used NSLinguisticTagger in
the past, these relative
improvements don't really mean
much.
I mean, what is 30%, what is
80%?
It's all relative.
So, let's look at some raw
numbers.
So, I'm going to use a yellow
line to specify a thread.
So, if you look at part of
speech tagging on an iOS device
with a single thread, we can
process 50,000 tokens in a
second.
Everything on device, using
machine learning on device.
Named entity recognition, on the
other hand, we can process about
40,000 tokens per second.
Now, imagine for a minute, what
is the average length of an
article that you'd analyze or
read?
It's about 400, 500 words.
So, you can process hundreds of
articles, extract named
entities, in a second on an iOS
device.
And that's terrific.
And we are so excited about
this.
[ Applause ]
So, in addition to privacy and
performance, NSLinguisticTagger
also offers support across a
wide variety of languages.
For those of you who localize
your apps, this could be very
useful.
Language identification is
supported for 29 scripts and 52
languages.
Tokenization is supported for
all iOS and macOS system
languages.
Lemmatization, part of speech
tagging, and named entity
recognition is supported for
eight languages.
And everything other than
English was added for this
release.
And our English models have also
been significantly revamped.
And talking about accuracy.
So, those of you who are really,
you know, familiar with
machine learning, you've seen
all of the benefits of the API.
You know it works well.
It's easy to use.
The big question is, ''How
accurate are these
technologies?'' So, let's look
at accuracy.
It's a perfect segue.
So, these models, I'm showing
you only results for English and
Spanish, for brevity, here, work
remarkably well.
Our part-of-speech tagging model
for both English and Spanish
achieves accuracy over 90%.
And this is on a tag set of
about 15 tags.
The exact tags that are being
supported for
NSLinguisticTagger, you can find
in the Apple Developer docs.
For named entity recognition,
our accuracies are in the mid
80s.
And that is state of the art.
Again, all of this on device,
using complex machine learning
techniques.
So, before we kind of wrap up
the talk, I'd like to spend a
few minutes talking about, you
know, giving you some debugging
hints.
Now, that you're kind of
familiar with how to use the
APIs, I'm sure you'll run into a
few, you know, issues.
So, one heads-up that I'd like
to give is, in case you run
these APIs and the part-of-speech
tagging output or named entity
recognition output is all
OtherWord, which means that it
doesn't assign a person name or
a place name tag, it might just
be a consequence of the models
not being downloaded onto your
device.
What do I mean by that?
So, all of the part-of-speech
tagging and the named entity
recognition models are
downloaded over the air across
the iOS platforms.
And why is that?
So, as you've heard, multiple
times, machine learning is all
about improving models with more
data.
So, what we would like to do is
revamp and train our models,
from time to time, so that
accuracy of our models is always
state of the art.
And when we do that, we want to
push that model to you as an
over-the-air update, as quickly
as possible.
So, all these models are not
installed completely on disk.
So, they're all delivered OTA.
So, for iOS, how do you get the
models?
So, as soon as you install a
particular keyboard.
Let's say you install the French
keyboard, all the French assets
will get downloaded to your
device.
And similarly, for other
languages.
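One way to check for this situation in code is to ask which tag schemes are available for a language before tagging. The sketch below uses `NSLinguisticTagger.availableTagSchemes(for:language:)`. It is an assumption, not something stated in the talk, that a missing part-of-speech or named-entity scheme in that list corresponds to a model that has not been downloaded yet. The helper name `hasTaggingModels` is made up for illustration.

```swift
import Foundation

// Returns true when both the part-of-speech (.lexicalClass) and the
// named-entity (.nameType) schemes are reported as available for
// word-level tagging in the given language.
// Assumption: a missing scheme indicates the model is not on the device.
func hasTaggingModels(forLanguage language: String) -> Bool {
    let schemes = NSLinguisticTagger.availableTagSchemes(for: .word, language: language)
    return schemes.contains(.lexicalClass) && schemes.contains(.nameType)
}

// Print the availability for a couple of languages.
for language in ["en", "fr"] {
    print(language, hasTaggingModels(forLanguage: language))
}
```

If this reports false for a language you expect to be supported, the all-OtherWord output is more likely a missing model than a bug in your own code.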
Now, the second hint I'd like to
give is, if you know the
language you're dealing with,
you can explicitly set the
language.
What do I mean by that?
So, let's take a string like
Hello.
And if you pass a string like
Hello, to the language
identification API, what do you
expect the language to be?
Hello, is a word that's used in
so many different languages.
Right. So, you can be smart
about how you leverage these
APIs.
And that's really the art of
NLP and machine learning.
The APIs automatically give you
a lot of information.
But you're also extremely smart
in how you can utilize it in
your own app.
So, if you know the language
explicitly for certain cases, it
might behoove you to set that
language.
If a string is fairly long, then
use the APIs to get the
language.
So, it's a tradeoff.
Depending on the application,
you can choose how you'd like
to use it.
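The tradeoff described above can be sketched roughly as follows. The helper names `resolveLanguage` and `makeTagger`, the `hint` parameter, and the 20-character threshold are all made up for illustration and do not come from the talk. The calls to `dominantLanguage(for:)`, `setOrthography(_:range:)`, and `NSOrthography.defaultOrthography(forLanguage:)` are existing Foundation APIs.

```swift
import Foundation

// Hypothetical helper: decide which language to tag with.
// `hint` is a language code the app already knows, if any.
// `minimumLength` is an arbitrary illustrative threshold.
func resolveLanguage(for text: String, hint: String?, minimumLength: Int = 20) -> String? {
    // Short strings such as "Hello" are ambiguous, so trust the hint.
    if let hint = hint, text.count < minimumLength {
        return hint
    }
    // Otherwise let the tagger identify the language, falling back to the hint.
    return NSLinguisticTagger.dominantLanguage(for: text) ?? hint
}

// Builds a part-of-speech tagger and pins the language when one is resolved.
func makeTagger(for text: String, hint: String?) -> NSLinguisticTagger {
    let tagger = NSLinguisticTagger(tagSchemes: [.lexicalClass], options: 0)
    tagger.string = text
    if let language = resolveLanguage(for: text, hint: hint) {
        let range = NSRange(location: 0, length: text.utf16.count)
        let orthography = NSOrthography.defaultOrthography(forLanguage: language)
        tagger.setOrthography(orthography, range: range)
    }
    return tagger
}

// A short, ambiguous string with a known language uses the hint directly.
print(resolveLanguage(for: "Hello", hint: "en") ?? "unknown") // en
```

Where to put the threshold, or whether to use one at all, depends on the app. The point is only that a known language can be set explicitly instead of being inferred from very little text.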
So, in summary, we've talked
about our natural language
processing APIs.
And these APIs are available
through NSLinguisticTagger.
We've talked about support for
new units in this release.
So, in addition to just word
level enumeration, you can also
get sentence, paragraph, and
document level.
So, these sort of units are
going to become more and more
pertinent as we add more
functionalities to the APIs.
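As a reminder of what unit-level enumeration looks like, here is a minimal sketch that walks a string sentence by sentence with `enumerateTags(in:unit:scheme:options:using:)`. The helper name `sentences(in:)` is made up for illustration. Using the `.tokenType` scheme just to obtain the ranges is one possible choice, not necessarily the one shown in the session.

```swift
import Foundation

// Splits text into sentences using the tagger's sentence unit.
func sentences(in text: String) -> [String] {
    let tagger = NSLinguisticTagger(tagSchemes: [.tokenType], options: 0)
    tagger.string = text
    let range = NSRange(location: 0, length: text.utf16.count)
    var result: [String] = []
    // The closure is called once per sentence-level unit in the range.
    tagger.enumerateTags(in: range, unit: .sentence, scheme: .tokenType, options: []) { _, sentenceRange, _ in
        result.append((text as NSString).substring(with: sentenceRange))
    }
    return result
}

// Print each sentence on its own line.
for sentence in sentences(in: "Hello there. How are you today?") {
    print(sentence)
}
```

Swapping `.sentence` for `.word`, `.paragraph`, or `.document` changes the granularity without changing the rest of the code.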
Because of significant revamping
of the code base itself, the
tagger is significantly faster.
It gives us higher accuracy.
And it also supports a lot more
languages.
So, for more information about
this talk, you can go to
Session 208.
You can look at the transcripts.
You also have the sample
projects that Doug described,
both the Winnow and the Whisk
project.
But we do have a lot more in
store for you, at WWDC.
First, there are some related
sessions.
So, for those of you who
attended the Introduction to
Core ML session, yesterday,
there's a subsequent session for
Core ML in Depth.
That's tomorrow.
And we also have a session on
Accelerate and Sparse Solvers,
which covers things like matrix
multiplication and low-level
stuff, that you can attend on
Thursday.
NLP is a super interesting area
for us.
We're investing a lot of time
and effort into this area.
We really want to understand
what sort of APIs would make a
difference for you, and a
difference for our users.
Right. So, just solving
something for the sake of
solving it is not our objective.
We really want to hear from you.
We would like to hear your
feedback about what are the
problems that you face, with
respect to text classes?
And what are the sort of APIs
that would really make your life
better?
So, if we can find a good middle
ground for the features that we
develop and also those that can
be exposed as APIs to you,
I think it'd be great.
So, please get the conversation
started.
Come talk to us, as part of the
lab.
Tell us your problems and tell
us your interests.
And what you'd like to hear and
what you'd like us to do.
And we're all ears for it.
Thank you.
[ Applause ]