Transcript
[ Music ]
[ Applause ]
>> Privacy is about people.
My name is Joey Tyson.
I'm a privacy engineer at Apple,
and I know you've heard a lot of
great information this week on
exciting new features and you're
ready to get out there and build
some new apps, but I know you
also care deeply about your
users' privacy.
And this is the first of three
big ideas I want to explore with
you about privacy today before
we get into this year's updates.
Because we're going to talk a
lot about data privacy, but we
can never forget that data is
about people.
It belongs to them.
So when I say that privacy is
about people, I mean it's about
building a relationship of trust
with your users.
This lays a foundation for
better engagement, which leads
to better apps.
Think about other relationships.
It's the people that you trust
that you're more likely to work
with and spend time with.
And as your users understand why
you're collecting data, how it's
being used, as you handle that
data respectfully and
thoughtfully, you're going to
get better data, because they're
going to be more comfortable
using your apps and sharing
information, and this builds
loyalty over time.
And I start here because I want
you to apply this context to
your development process.
None of us do engineering in
isolation, so whether you're
working with health records or
just building a simple puzzle
game, the information you gather
and the ways that you use it
could have a very real impact on
people's lives.
So it's critical for each of us
to think carefully about the
technologies that we're
building.
In a recent commencement address
at Duke University, Tim Cook
talked about Apple's approach to
privacy, and he said that "In
every way, at every turn, the
question we ask ourselves is not
what can we do, but what should
we do."
And that leads me to the second
big idea, ask the "should"
questions.
No matter what your role,
whether you're a solo app
developer or part of a large
organization, you can be the one
to stand up for your users.
Remember your responsibility
towards them, and ask questions
about the data flows in your
app.
For example, why do we actually
need this data?
This isn't meant as an accusation; it's a prompt to think about whether this is necessary for our use case.
Should we collect it?
Would this surprise our users?
If people understood this and it
scared them, why should we be
doing it at all?
Could we use less granular, less precise data?
Are there other approaches we
should consider?
And should we delete or
aggregate this data after a
period of time?
Part of the reason for asking
these kinds of questions is that
we can all fall prey to
assumptions about our data.
For example, just thinking, "We should log this for everyone."
Maybe that's the way we've
always done it in the past or
the way others do it, but again,
is it really necessary for what
we're trying to accomplish?
Or you might think that data
couldn't possibly be sensitive
in the context you're working,
but maybe that data is very
sensitive in a different context
or for users in a vulnerable
population.
If you're taking data that was
gathered for one purpose and
applying it in a new way, would
users understand or expect that?
You also hear people talk about
personally identifiable
information, or PII, but even
data that falls outside that
definition can still have a
privacy impact.
Similarly, if you're protecting data with encryption and good security, that's wonderful, but privacy is so much more than that, because security alone doesn't get back to the "should" question of whether we should even have this data at all.
Now as you're asking questions
about the data flows in your
app, if you want to even go one
step further, you can create
privacy guarantees.
These are high-level statements
about the privacy expectations
in your app that you want to be
able to make.
And establishing these early on in the development process provides a framework to guide you as you're building your features and something to test against once you're done.
There are some examples on the screen here of these kinds of statements, similar to statements Apple has made about some of our features, and there are many options for implementing each of these, which brings me to the third big idea.
Align your data practices with
your use cases.
To illustrate this, let's think
of data collection, the amount
and types of data that you
gather.
We mentioned those assumptions earlier.
You may sometimes think, well, shouldn't I just gather as much data as possible?
In the past, you may have heard people call data the fuel of the new economy.
Like actual fuel, data should be handled with caution.
Because data is very powerful,
and that unlocks a lot of great
use cases, but because it's so
powerful, it can also be
dangerous if not handled
carefully.
Gathering data creates overhead
for you as an engineer.
You're going to need to spend
time and resources managing that
data, keeping up with it,
filtering it.
It's time you could be spending
working on new features for your
users.
It also creates liabilities.
We've all heard about companies
suffering data breaches, which
is a bad situation.
But if the data that gets leaked
includes information that's not
relevant to the use case, that's
an even worse situation.
Unexpected data collection
creates all sorts of risks.
And it destroys that foundation
of trust with your users.
The next time you think about
gathering as much data as
possible, I want you to picture
these tanks of chemicals and
remember your responsibility to
your users to handle their data
carefully and thoughtfully.
Instead, you want to practice
what we call proportional data
collection.
This is the idea of collecting
only what's necessary to achieve
your goal, and again sometimes
you might start off thinking
that you need a lot of
information when a different
dataset may suffice.
You can even start with the
assumption of no data and figure
out what's actually necessary
for what you're trying to solve.
This gets back to user
expectations.
People should understand why
you're collecting data and how
you're using it.
It should be in line with what
they expect.
You should always be able to
provide a clear rationale for
the use cases that you're
building.
Of course, that covers data collection, but when we talk about aligning data practices with use cases, that extends to the entire data life cycle and being good stewards of the information that's been entrusted to you.
So even beyond just proportional
data collection, you want to
develop and use privacy
techniques throughout your app's
workflow.
You could develop a whole
toolbox or repertoire of
techniques that will help you
build privacy into your app.
Things like aggregation,
providing transparency to users,
using a scoped identifier
instead of a real identity,
automatically rotating those over time.
Even more advanced techniques
like differential privacy.
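To make one of these toolbox techniques concrete, here's a minimal sketch of a scoped, resettable identifier, assuming UserDefaults storage; the key name is hypothetical, not a system API.

    import Foundation

    // A scoped, resettable identifier used for analytics instead of a
    // real identity. Hypothetical key name, not a system API.
    struct AnalyticsIdentifier {
        private static let key = "com.example.analyticsIdentifier"

        // Returns the current scoped identifier, creating one if needed.
        static var current: String {
            if let existing = UserDefaults.standard.string(forKey: key) {
                return existing
            }
            let fresh = UUID().uuidString
            UserDefaults.standard.set(fresh, forKey: key)
            return fresh
        }

        // Resetting (or rotating on a schedule) severs the link to
        // previously collected data.
        static func reset() {
            UserDefaults.standard.removeObject(forKey: key)
        }
    }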
I don't have time today to go
into the entire list of
techniques available, but what I
want to focus on now is the idea
of adjusting these to match your
use case.
You can think of a mixing board
for music.
If you have one track that's
particularly loud or soft, you
may need to adjust others to
balance things out and achieve a
good mix.
And again, there will be times
when you do need to collect a
lot of data for a particular
feature, but in those cases, you
want to make sure you adjust
those privacy techniques to
create a great experience for
your users.
Ideally, these apply across all
systems where the data lives so
those privacy guarantees stay
consistent and are built as
technical enforcement rather
than just policy statements
about what you plan to do.
But I know this can all be a
little abstract, so to
illustrate further, I want to
share some features that Apple
has built where we've applied this kind of thinking.
First is activity sharing where
you can share fitness data with
your friends.
Now for me as a privacy
engineer, I like to turn all of
these sliders up to 100 percent
as much as possible, but that's
not always feasible for a given
feature.
In this case, you're sharing
data with friends, so they know
your name.
They know whose data it is.
So you can't just make this data
de-identified.
It's already very identifiable
as part of the use case.
Consequently, we turn up other
privacy techniques like only
showing a summary of the data,
not minute-by-minute statistics
or the exact location of your
run.
We also provide a lot of control
over who you share with and
when.
Now, in the Apple News app, we
collect analytics data using a
scoped identifier that's not
connected to your Apple ID.
That gives us more flexibility
around the precision of data we
collect, but since it's still
sensitive information, we still
provide control through things
like being able to reset that
identifier at any time.
Finally, there's Memories in Photos.
You may have seen these on your device.
They use facial recognition to identify people in pictures and precise location information to connect similar photos together.
Now that's very sensitive data.
So consequently, there's another
privacy technique we turn way
up.
All the processing to build
these memories happens on your
device.
And by the way, that's a great tool for your toolbox: doing processing locally.
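As a sketch of that local-processing technique, on-device face detection with the Vision framework might look something like this; the analysis happens entirely on the device, so the photo itself never has to be uploaded.

    import CoreGraphics
    import Vision

    // On-device face detection: the image is analyzed locally and
    // only the results are used by the app.
    func detectFaces(in image: CGImage) {
        let request = VNDetectFaceRectanglesRequest { request, error in
            guard let faces = request.results as? [VNFaceObservation] else { return }
            // Use `faces` locally, for example to group related photos.
            print("Found \(faces.count) face(s)")
        }
        let handler = VNImageRequestHandler(cgImage: image, options: [:])
        try? handler.perform([request])
    }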
So to recap, three big ideas.
Privacy is about people, ask the
"should" questions, and align
your data practices with your
use cases.
In the time we have remaining, I
want to talk through some
features and tools that are
available to you as developers
to help you build privacy into your app.
And these fall into two general
categories of accessing user
data and more broadly, data
stewardship.
So for data access, let's start
by talking about iOS, and much
of this guidance will apply to
tvOS and watchOS as well.
Let's imagine you're building a
game for iOS where players can
compete against each other, and
you want them to be able to
upload a photo to identify
themselves.
Now we've all seen those
permission prompts for access to
photos, but wouldn't it be great
if we could just have the user
click a button, select a
picture, and have it appear
immediately in their app?
Well, you can already do this
today.
Because we have a feature
available called out-of-process
pickers for contacts, camera,
and photos data, where the
picker that appears runs outside
your app's process, so the only
information that's shared back
with the app is what the user
selects.
We talked about those privacy
techniques.
This is a case where because
this doesn't involve ongoing
access to the entire library,
the user has control by picking
what they share, so we don't
need to show a permission prompt
and ask them to make a decision
about future access.
This should be your default method for accessing contacts, camera, and photos data.
There are going to be times
where you may need access to the
broader library, but in most
situations, you'll find that
this works great for a whole lot
of apps.
This is a case where it does
just work and only requires a
few lines of code.
You can see some snippets here
for how to call these pickers in
your app.
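The exact slide snippets aren't reproduced here, but a minimal sketch in Swift, assuming a hypothetical avatar-selection flow, might look like this.

    import UIKit
    import Contacts
    import ContactsUI

    // The picker UI runs outside your app's process; only the user's
    // selection is shared back, so no permission prompt is required.
    class AvatarViewController: UIViewController, UIImagePickerControllerDelegate,
                                UINavigationControllerDelegate, CNContactPickerDelegate {

        // Present the photo picker when the player taps "Choose Photo".
        func choosePhoto() {
            let picker = UIImagePickerController()
            picker.sourceType = .photoLibrary
            picker.delegate = self
            present(picker, animated: true)
        }

        func imagePickerController(_ picker: UIImagePickerController,
                                   didFinishPickingMediaWithInfo info: [UIImagePickerController.InfoKey: Any]) {
            // Only the single image the user picked reaches the app.
            let avatar = info[.originalImage] as? UIImage
            picker.dismiss(animated: true)
            _ = avatar // use as the player's profile picture
        }

        // The contact picker works the same way, out of process.
        func chooseContact() {
            let picker = CNContactPickerViewController()
            picker.delegate = self
            present(picker, animated: true)
        }

        func contactPicker(_ picker: CNContactPickerViewController, didSelect contact: CNContact) {
            // Only the selected contact is returned; the rest of the
            // user's contacts database stays private.
        }
    }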
Now as I said, there are going
to be some times where you do
need access to the broader data,
and as you know, there's a
variety of protected resources
available on the device, but if
you're using one of these APIs,
you need to keep in mind three
things before you start
requesting access.
You should only request access
to the data that's necessary for
your use case in your app.
If you don't actually need the
entire library of information,
rather than requesting it, you
should look for an alternative
solution, like those
out-of-process pickers.
You should only make these requests when they're needed.
You want it to be in the moment when a user makes a decision, not when they first open the app and are bombarded with questions they don't even understand yet.
You want the prompt to be in context.
But also you want to rely only
on the API for status.
Remember, a user can revoke
their decision at any time.
You just want to make sure your app still functions regardless of what the user has decided.
Now when you request access, as
you know, you need to include a
purpose string or a usage
description.
And when I say you need to,
these are required.
You'll find your app rejected in
app review without these, and in
fact, you'll find your apps
start crashing if you try to
access this data without a
purpose string.
Now this is one way of providing
transparency to users.
It's certainly not the only way.
By that, I don't mean showing a fake prompt before the real one to get them to click.
I mean this should be part of
the overall program of informing
your users about data flows in
your app.
The goal here is to explain the
reason for a request so that a
user can make an informed,
effective decision based on
their priorities.
When I say explain the reason,
this is not what I'm talking
about.
And we've seen purpose strings
like this in the past, but
again, this is going to lead to
rejections in app review.
We're increasingly enforcing
quality purpose strings both
through automated validation and
manual review.
Placeholders or blank strings
are not going to be sufficient.
Just saying "advertising" doesn't tell a user much.
"Requires location" doesn't explain the why.
Even this last one about more relevant content, that's nice, but it's pretty vague.
And when you look at our own
maps app, when it requests
location, this is what you see.
It will be displayed on the map
and used for directions, nearby
search results, and travel
times.
So this is explaining the reason, the use case, for this request.
It's specific and includes examples of how the data will be used.
The TV app is another one Apple wrote: using your location to determine what's available to you and show you live games, events, and news from your area.
Remember, if users understand
why they're being asked for this
data, they're going to be more
likely to allow it.
If you were building a transit app for a subway system, you might write something like this: "This app uses your location to show nearby stops and stations and allows you to plan trips from your current location."
Again, explain the use case, be
specific, provide an example.
Some additional guidelines to
remember when you're working
with protected resources on a
device.
Access should not be required
for your app to function.
Again, this can lead to
rejections in the app review
process.
Your app should have graceful
fallback mechanisms so that even
if a user declines access, it
still functions.
For example, with that transit
app, if the user denies location
access, you can have a field
where they can enter location
manually and use that instead.
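To sketch how that flow might look in code, assuming hypothetical names like TripPlanner: request access in context, rely on the API for the current status, and fall back gracefully when denied.

    import CoreLocation

    class TripPlanner: NSObject, CLLocationManagerDelegate {
        private let manager = CLLocationManager()

        override init() {
            super.init()
            manager.delegate = self
        }

        // Called when the user taps "Plan trip from my location" --
        // the prompt appears in the moment, not at first launch.
        func planTripFromCurrentLocation() {
            switch CLLocationManager.authorizationStatus() {
            case .notDetermined:
                manager.requestWhenInUseAuthorization()
            case .authorizedWhenInUse, .authorizedAlways:
                manager.requestLocation()
            default:
                // Denied or restricted: the app still works.
                showManualLocationEntry()
            }
        }

        func locationManager(_ manager: CLLocationManager, didUpdateLocations locations: [CLLocation]) {
            // Plan the trip from locations.last ...
        }

        func locationManager(_ manager: CLLocationManager, didFailWithError error: Error) {
            showManualLocationEntry()
        }

        private func showManualLocationEntry() {
            // Present a text field so the user can type a starting point.
        }
    }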
Again, you want to verify the authorization status whenever your app needs this data, to make sure it still has access, and stay aware of third-party SDKs.
Again, requesting access should
only be for the use cases of
your app, and if you're
including libraries that change
that or tell you to set purpose
strings, you should probably
look for a different solution or
update your code.
Finally, you want to provide
ongoing transparency.
Again, this should not be the
only time that a user gets to
understand how their data is
being used.
You can do this in a privacy
policy, which is required for
apps in the App Store, or
throughout your app you can
provide links and documentation
to help users understand how
their data is being used.
Now this is all broader guidance
around these kinds of APIs, but
there's a few specific ones I
want to highlight with changes
this year.
The first is for WiFi network
information.
If your app uses CNCopyCurrentNetworkInfo, you're going to need to add an entitlement, Access WiFi Information, in Xcode.
This is a capability that you
can enable for your app.
For example, if your app is
communicating with a hardware
accessory and needs to verify if
they're on the same network, you
would need this.
If you're not doing that use
case, you don't need to worry
about this.
This is only if it's necessary
for the functionality of your
app.
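For that accessory use case, a minimal sketch of reading the current SSID looks like this; it assumes the Access WiFi Information entitlement has been enabled for the app.

    import SystemConfiguration.CaptiveNetwork

    // Reads the SSID of the current WiFi network, if available.
    func currentSSID() -> String? {
        guard let interfaces = CNCopySupportedInterfaces() as? [String] else { return nil }
        for interface in interfaces {
            if let info = CNCopyCurrentNetworkInfo(interface as CFString) as? [String: Any],
               let ssid = info[kCNNetworkInfoKeySSID as String] as? String {
                return ssid
            }
        }
        return nil
    }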
You may have also heard about
our new Health Records API.
Because we know that developers
have a lot of great ideas around
building apps using health data,
but we also recognize that's
very sensitive information.
So again, adjusting those privacy techniques, rather than just a simple permission prompt, we provide greater transparency and control for the user.
You can see there's a purpose string and a link to a privacy policy, which should be on a website or, if it's in your app, available while signed out.
There are also controls over what categories of data to share, whether it's current or current and future data, and there's a session from Tuesday you can refer to that provides a lot more information about how to use this API in your app.
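As a minimal sketch, requesting read access to one clinical record type might look like this, assuming the HealthKit capability and purpose strings are already configured in your project.

    import HealthKit

    let healthStore = HKHealthStore()

    func requestAllergyRecords() {
        guard HKHealthStore.isHealthDataAvailable(),
              let allergies = HKObjectType.clinicalType(forIdentifier: .allergyRecord) else { return }
        healthStore.requestAuthorization(toShare: nil, read: [allergies]) { success, error in
            // The user chooses which categories to share and whether to
            // include future data; always handle a denial gracefully.
        }
    }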
Finally, for iOS, I wanted to
share a quick update on how
Apple's been using the technique
of differential privacy to
improve our apps and devices.
The first new use case this year
is for commonly misspelled
words.
If you're typing and correct
yourself, devices that are opted
in to device analytics will now
donate data on those words to
help us improve our keyboard
algorithms even further while
protecting user privacy.
Also, for Safari, since last
year, we've added the ability
for those devices to donate data
around websites that typically
cause crashes to improve the
stability of the browser.
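To give a flavor of the kind of randomization involved, here's a toy sketch of randomized response, one classic building block behind differential privacy; this is illustrative only, not Apple's actual algorithm.

    import Foundation

    // Each device sometimes reports a coin flip instead of the truth,
    // giving any individual report deniability, while population-level
    // rates can still be estimated across many devices.
    func privatizedReport(didMistypeWord: Bool) -> Bool {
        if Bool.random() {
            return didMistypeWord   // report truthfully half the time
        } else {
            return Bool.random()    // otherwise report random noise
        }
    }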
So that's iOS.
Let's talk for a moment about
macOS as well.
Because as you may have heard,
we've made some changes to how
protected resources are handled
on macOS.
There's an expanded list of categories of data with protections in place, and accessing these can now trigger a permission prompt or, for some of them, an opt-in through System Preferences.
I just want to highlight this so that if you're developing for the Mac, you're aware of some of these changes, because you need to know how they're going to affect your app if you're accessing these resources.
Again, since these can trigger a
permission prompt, you want to
know when that's going to happen
so your users aren't surprised,
and please note, this applies to
all third-party app processes
including those outside of the App Store.
Just like with iOS, you'll need
to set a purpose string for
these permission prompts as
well, and again there's a
session from Tuesday that goes
into a lot more detail on how
this works for your app.
Now to talk about accessing data
on the web, I'm going to turn it
over to my fellow privacy
engineer, Brandon Van Ryswyk.
[ Applause ]
>> Thanks, Joey.
The web is one of the largest
venues for data access today.
If your business depends on
providing content on third-party
websites, this section is for
you.
This year we introduced the
Storage Access API.
The Storage Access API allows
users to engage with logged-in
content from embedded third
parties across the web including
from domains that have been
classified as a tracker by
intelligent tracking prevention.
Now the Storage Access API does
this only with the user's
explicit consent.
Let's go through an example.
Here the user is browsing a news
site, news.example, and the news
site has an embedded video
player from video.example.
Now the user has a paid account
on the video site and would like
to grant the embedded video
access to its cookies so that
they can enjoy the benefits of
their subscription while reading
the news.
To accomplish this,
video.example needs to implement
the Storage Access API.
Video.example should add a call
to the Storage Access API when
the user clicks the play button
in their app.
Now this is an asynchronous API
that will return a promise, so
you should be prepared to handle
successes as well as failures.
So when a user clicks the play
button, this will kick off a
request, which will result in a
prompt asking the user if they
would like to grant
video.example access to its
cookies while embedded in
news.example.
If the user clicks allow, this
choice will be sticky, and the
user won't be prompted again on
this combination of domains.
But if the user clicks deny, the
site can always reprompt.
Let's assume the user clicked
allow.
The request will go through, and
cookies will be returned to the
embedded site.
Now, this could create a tracking risk, as video.example now has the user's logged-in identity associated with their presence on this news site.
Now this is especially important
given changes to intelligent
tracking prevention this year.
Now outside of the user consent
provided via the Storage Access
API, cookies from domains that
are classified as trackers will
be partitioned immediately and
can never be used in a
third-party context.
Additionally, after 30 days
without user involvement, these
cookies will be purged entirely.
Now importantly here, access via
the Storage Access API will
count towards this 30-day
interaction timer.
That means that users who
interact frequently with your
site in a third-party context
will stay logged in.
So in this sequence, the user
will visit video.example both in
a first-party context, logged
into the home page, and in a
third-party context, where it's
embedded across the web.
So first, the user visits the
site in a first-party context.
Now notice that the days since
interaction timer will read zero
as the user is currently
interacting with the site.
But as the user interacts with
the embedded content throughout
their web browsing, the timer
will update.
This means that when the user
returns in a first-party
context, the days since
interaction timer will read 5
days despite there being 45 days
since the user was last on
video.example in a first-party
context.
Adopting the Storage Access API
will allow your users to stay
logged in and prevent unwanted
tracking.
Now privacy does not end with
gaining access to a user's data.
Privacy is a continued
obligation to your users to
maintain their trust throughout
the data's lifetime.
This is where data stewardship
begins.
Now I want to give you examples
from four areas of data
stewardship for you to think
about when developing your apps.
First is deletion.
Part of being a good data
steward is respecting your
user's intent to delete
something from your app.
So you should recognize that
there are data flows that go
outside your app, and you should
ensure consistency between these
systems when the user deletes
something in your app.
Now the operating system doesn't
know what's happening inside
your app, so if you've donated
information to Siri Shortcuts or
posted a notification, you
should make sure that you delete
that content when the user
removes it from your app.
For example, if a user deletes someone in your app's contact list, Siri Suggestions should not suggest messaging them using your app.
Or if a user deletes a thread in your messaging application, you should delete the notifications for that content as well, since it would be unexpected for a user to still see notifications on their device from a thread they thought they'd completely deleted.
And finally, if you're building a password manager and you've donated passwords to the new system passwords API, you should make sure to delete this information if the user removes the site from your password manager.
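A minimal sketch of that kind of cleanup, assuming a hypothetical per-thread identifier used both when donating interactions and when posting notifications:

    import Intents
    import UserNotifications

    func deleteThread(withIdentifier threadID: String) {
        // Remove Siri Shortcuts donations grouped under this thread so
        // Siri stops suggesting it.
        INInteraction.delete(with: threadID) { error in
            // Handle or log the error as appropriate.
        }
        // Remove any delivered notifications from the deleted thread.
        UNUserNotificationCenter.current()
            .removeDeliveredNotifications(withIdentifiers: [threadID])
    }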
Now data stewardship continues
through to device tracking,
which means something very
specific in the context I'll
talk about today.
You might have questions about
the devices that use your apps.
For example, did this device
already consume a free trial, or
was this device previously used by an abusive user or for fraudulent activities?
We offer an API called
DeviceCheck that allows you to
answer these questions.
DeviceCheck lets you set two
bits of data per device, which
are stored by Apple and can be
returned to you with a
signature.
These bits persist across a device reset or an erase and install.
These bits provide
high-integrity answers to your
questions about a device's
history without exposing unique
device identifiers.
Now, you should adopt
DeviceCheck and not rely on
unsupported device tracking
mechanisms like fingerprinting.
As Craig said in the keynote, we
continue to remove entropy from
our platform and to remove
functionality that is being
abused to uniquely identify
users.
So you should adopt DeviceCheck
to answer your questions while
being a good data steward.
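On the client, adopting DeviceCheck is only a few lines; here's a minimal sketch, where sendTokenToServer is a hypothetical helper on your side.

    import DeviceCheck

    // Your server forwards the token to Apple to query or set the two
    // per-device bits; the token itself exposes no stable device
    // identifier to you.
    func checkDeviceHistory() {
        guard DCDevice.current.isSupported else { return }
        DCDevice.current.generateToken { token, error in
            guard let token = token else { return }
            sendTokenToServer(token.base64EncodedString())
        }
    }

    func sendTokenToServer(_ token: String) {
        // POST the token to your server over HTTPS ...
    }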
Now in addition to being a good
data steward in your own apps,
you should consider your
third-party partners as well.
Now, you as developers are
responsible for all of the code
that ships in your app.
This includes code that you've
written but also code that
you've imported in the form of a
library.
So you should understand how
these libraries that you import
access or transfer your user's
data off the device.
This way you can be complete
when giving transparency.
Don't just talk about the code
that you wrote.
You should describe the full
impact on a user's privacy.
And as Joey mentioned earlier,
you should avoid unnecessary
requests for resources.
So, for example, if a library
that you want to use requires an
entitlement or access to some
other sensitive resource that
isn't required for the
functionality you're trying to
get out of this library, you
should either find a different
library or reach out to the
developer of that library and
ask them to remove that
sensitive resource request.
Now thinking about third-parties
extends to your server side as
well.
You should understand how data
flows to all third parties your
servers touch.
Now this includes the full
breadth of systems that support
your app, not just analytics or
advertising networks but the
network security systems, the
email provider that sends
customers' password reset emails, or third-party customer support
integrations.
Being a good data steward means
taking responsibility for the
full picture and considering privacy when choosing partners to work with.
Now for a topic you may have
heard a little bit about,
machine learning.
So across the industry, much of
the talk about machine learning
centers around the performance
characteristics of a new
algorithm or the power of a
cloud-based solution.
And while these are important
technical developments, as Tim
said in his commencement
address, the question that we
ask is not what can we do but
what should we do.
Now this applies particularly to machine learning, and we've been working on privacy-friendly machine learning for years.
Face ID was built with
privacy-friendly machine
learning at its core.
And we've made it easy for you
to add Face ID authentication to
your apps using the
LocalAuthentication API.
You can take advantage of the
work Apple has done to build
strong biometric authentication
using privacy-friendly machine
learning techniques.
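A minimal sketch of adding that authentication with the LocalAuthentication API, where the reason string is just an example:

    import LocalAuthentication

    // Your app only learns success or failure; the biometric data
    // itself stays on the device.
    func authenticateUser() {
        let context = LAContext()
        var error: NSError?
        guard context.canEvaluatePolicy(.deviceOwnerAuthenticationWithBiometrics,
                                        error: &error) else { return }
        context.evaluatePolicy(.deviceOwnerAuthenticationWithBiometrics,
                               localizedReason: "Unlock your saved games") { success, evaluationError in
            // Update the UI on the main queue based on `success`.
        }
    }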
Similarly, ARKit uses machine
learning to model the
environment around a user's
device.
And new in ARKit 2, you can
create, persist, and store this
map of the environment in your
own apps, but you should collect
this map only if it's needed for
your feature as this data might
be quite sensitive.
It comprises a representation of what's around a user, so if you send it off the device, that should be something the user expects, like when you're playing a game collaboratively with a shared object.
And if you use Game Center, you
can take advantage of the
MultipeerConnectivity API, which
supports end-to-end encryption
to transfer these models between
devices.
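Here's a hedged sketch of sending a world map to peers, assuming an MCSession that was created with encryptionPreference set to .required so the transfer stays end-to-end encrypted:

    import ARKit
    import MultipeerConnectivity

    // Capture the ARKit 2 world map only when the feature needs it,
    // for example to start a shared experience.
    func shareWorldMap(from arSession: ARSession, over mcSession: MCSession) {
        arSession.getCurrentWorldMap { worldMap, error in
            guard let map = worldMap,
                  let data = try? NSKeyedArchiver.archivedData(withRootObject: map,
                                                               requiringSecureCoding: true)
            else { return }
            // Send only to the connected peers in this shared session.
            try? mcSession.send(data, toPeers: mcSession.connectedPeers, with: .reliable)
        }
    }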
Now Face ID and ARKit are good
examples of features that Apple
has developed that depend on
privacy-friendly machine
learning that you can use in
your apps.
But many of you want more
flexibility.
Create ML and Core ML allow you
to build your own features on
top of machine learning.
With Create ML and Core ML, it
is easier than ever to add
on-device machine learning to
your apps.
Create ML allows you to train a
machine learning model directly
on your Mac, and Core ML lets
you then take this model and
evaluate it directly on a user's
device.
This avoids collecting sensitive user data just to evaluate the model.
Protecting sensitive data on your servers requires a lot of engineering work, so evaluating a model on a user's device can lower your servers' security requirements and will lower your breach risk, as you don't hold this sensitive information in the first place.
Now these two APIs make adoption
of on-device machine learning
easy.
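As a minimal sketch, training on your Mac with Create ML can be just a few lines; the paths here are hypothetical, and the saved model is then dropped into Xcode so Core ML can evaluate it on the user's device.

    import CreateML   // macOS, e.g. in a Swift playground
    import Foundation

    // Train an image classifier locally from labeled folders of images.
    let trainingData = MLImageClassifier.DataSource.labeledDirectories(
        at: URL(fileURLWithPath: "/path/to/TrainingImages"))
    let classifier = try MLImageClassifier(trainingData: trainingData)

    // Save the model; predictions then run on device via Core ML, so
    // no user data is sent to a server for evaluation.
    try classifier.write(to: URL(fileURLWithPath: "/path/to/AvatarClassifier.mlmodel"))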
You should already be asking privacy questions when developing features, like the ones Joey went over earlier.
These questions are great, and Joey and I use them every day doing feature reviews at Apple.
However, machine learning
requires you to ask a new set of
questions to address the same
underlying privacy goals.
For example, you should ask does
my model reveal anything about
the data it was trained on?
It's actually possible to invert
a machine-learned model and
recover much of the data it was
trained on.
This could result in unexpected
disclosure if you ship a model
with your app and it's inverted
to expose information about
people it was trained on.
Now this is an area of active
academic research that you can
learn more about in the paper
that I've put on this slide.
Similarly, you should ask, could
I infer more about my users than
they expected?
So users might expect that you'd
classify activity type via
sensor data.
But you should ask, did I
accidentally encode the fact
that this specific user uses a
wheelchair?
It could be great to offer a feature for wheelchair users, but this should be made clear and sold as a feature.
As with general data collection,
you should obtain new consent if
you have a new use case enabled
by machine learning.
Now it turns out that two small
modifications can help mitigate
both of these issues.
The first is to ensure that you
train on the right data.
This means training on a
sufficient quantity of diverse
inputs that were collected with
the proper consent.
The second is to keep your model
complexity proportional to the
goal that you are trying to
solve.
Both of these techniques can prevent model overfitting, which is what makes model inversion or unexpected inference more likely.
Now at Apple, we believe that
considering questions like these is an important part of
building products that users can
trust.
Because fundamentally privacy is
about people.
It's about building trust with
your users and respecting your
users in handling their data.
By applying the techniques that
we've gone over in this
presentation, you too can build
products with great features and
great privacy.
Now in summary, I hope you take
away three big ideas about
privacy.
That privacy is about people,
that you have to ask the should
questions, and that you should
align your data practices with
the use cases you're trying to
solve.
For more information, please
check out these sessions, and I
look forward to seeing you at
our lab after this session so
that we can help you build
better apps through better
privacy.
Thank you.
[ Applause ]