WWDC2019 Session 202

Transcript

[ Music ]
[ Applause ]
>> Good morning.
[ Applause ]
My name's Nick Gillett.
I'm an engineer here at Apple on
the Core Data team, and it is my
pleasure to welcome you to Using
Core Data with CloudKit.
Today, we're here to talk about
a belief that I have that all of
my data should be available to
me on whatever device I have,
wherever I am in the world.
And to make this a reality, we
have to make it a lot easier for
you to add this functionality to
your applications.
I'm sure many of you came in
today with an iPhone, and you
may also have a Mac at home.
You may even be toting around a
MacBook or a MacBook Pro in your
backpack while you're here at
the conference.
And the data that we create on
all of these devices is
naturally trapped.
Right?
There's no easy way for us to
move it from one device to
another without some type of
user interaction.
Now, to solve this, we typically
want to turn to cloud storage.
Because it offers us the promise
of moving data that's on one
device seamlessly and
transparently to all of the
other devices we own.
And cloud storage has benefits
even if we only happen to have a
single device.
Right?
As our applications create data
on this device, it can be backed
up in the cloud and stored.
So that when we get a new
device, whether by choice or by
accident -- ask me how I know --
on the way out of the store,
that device can be restored to
the one that we always knew and
loved.
And it may surprise you to learn
that there are some existing
technologies on our platforms
that can help us with this
problem.
For example, Core Data provides
a robust set of APIs for
managing data locally in your
applications and on disc.
And the CloudKit framework
provides access to one of the
world's largest distributed
databases.
Both of these frameworks are
available on all of our Apple
platforms.
And because of that, enable us
to build a wide variety of
applications.
In fact, these frameworks are
actually quite similar.
They even express themselves
using a common set of patterns
and paradigms.
Modeling their APIs in terms of
objects, models, and stores.
Now, in Core Data, we call these
objects instances of
NSManagedObject, and they're
what gives our application
access to values that we store
on disc.
CloudKit similarly exposes a
CKRecord, which is a key value
like store for accessing data
that you've stored in the cloud.
These objects are described by
what we like to call a model.
And in Core Data we call that an
NSManagedObjectModel, which you
can create in code or using the
model editor in Xcode.
Quite similarly, CloudKit uses a
Schema.
And the CloudKit schema can be
defined either by CloudKit
dynamically as you use CKRecords
in the development environment
or using the CloudKit dashboard.
Finally, objects are persisted
to use the Core Data vernacular,
in what we like to call a store.
In Core Data those are instances
of NSPersistentStore.
But in CloudKit, CKRecord's are
stored in a CKRecordZone or in a
CKDatabase.
And so, as you can see and
indeed as many of you have
pointed out to me over the
years, it would be great if we
could make it easier to combine
these two conceptually similar
frameworks.
And so to show you just how much
easier we've made that this
year, I'd like to take you
through what it's like to create
a brand new application in
Xcode.
Here you can see, I have Xcode
open.
And I'm going to create a new
iOS project as a Master Detail
application.
I like Master Detail
applications because they give
us a great UI to build on top of
when exploring features of Core
Data.
So, I'll select it and click
Next.
And then give my application a
name.
In this case, just WWDC Demo.
And, because we're here to hear
about Core Data, we'll check the
Core Data checkbox.
New in Xcode 11 is this checkbox
called Use CloudKit.
And this tells Xcode that we
want to generate an application
that's designed to use both Core
Data and CloudKit.
So, let's check that, and then
we'll click Next and find our
application somewhere to live on
the file system.
After we tap Create, Xcode
generates the application for
us.
And if you've ever built a
CloudKit application before,
you'll know that there are a
couple additional things we need
to add to this before it's ready
to build and run.
These come in the form of
Capabilities, which we add using
the Signing & Capabilities tab.
We need to add two.
The first is the iCloud
capability.
So, I'll add that by clicking
this plus (+) next to
Capability, type iCloud, and hit
Enter.
When I do that, I can hit the
CloudKit checkbox and you'll see
that that adds push
notifications for me as well.
Xcode has also automatically
created an iCloud container
identifier for my application to
use.
Next, we want to add a
background mode capability.
And to do that, I'll hit the
plus (+) again, and type
Background, and hit Enter.
And the reason we do this is to
enable remote notifications,
which allow our application to
receive push notifications when
it's not running.
So, let's run this application
and see what Xcode has created
for us.
Here you can see that we have a
very simple application with two
view controllers.
A table view on the left and a
detail view controller on the
right.
And I can add some data to this
application using the plus (+)
button in the upper right-hand
corner.
By default, Xcode generates a
very simple Core Data data model
for us that's just a simple
timestamp.
But, we're here to see what it's
like to add sync functionality.
And to do that, we need another
device.
So, let's run the application on
our iPhone.
And you can see that we have the
master view controller and all
of the data that we added on our
iPad.
Now let's add some data using
our phone.
And of course, watch that sync
back to the iPad.
Now, because I have a magic push
notification in this controller,
I can make this happen whenever
I want.
But let's see something that's a
little bit more realistic.
I'm going to delete all the data
from this application that was
added on my iPhone from the
iPad.
These top four rows.
And then, I'll do the same thing
from the iPhone, deleting all
the data that was added from the
iPad.
Now, previously I used a magic
push notification.
But I'm not going to touch the
controller anymore.
I'll just let this video play so
that you can see what it was
like to actually watch these two
devices converge when I filmed
this.
Right?
Now, as contrived as that might
be, it's pretty awesome that in
a few simple clicks we've built
an application that syncs end to
end using both Core Data and
CloudKit.
And I'd be doing you a
disservice if somewhere in the
application delegate I'd hidden
15,000 lines of code.
So, let's see what that looks
like.
Now, this is a pretty standard
application delegate.
In fact, if you've ever built a
Core Data application before
using Xcode, it'll look very
familiar to you, including this
area where we set up the Core
Data stack.
The only thing that's different
about this application is some
new API in Core Data this year
called
NSPersistentCloudKitContainer,
which is designed to help you
manage Core Data stores that are
backed by a CloudKit database.
Now, if you've ever built a Core
Data application before using
Xcode you will have seen
NSPersistentContainer here
instead, which is
NSPersistentCloudKitContainer's
superclass.
Because of that, you can add
CloudKit functionality to your
existing Core Data applications
by changing as little as one
line of code.
What exactly is
NSPersistentCloudKitContainer?
Well, it's an encapsulation of a
set of really common patterns
that we saw everyone have to
build when they wanted to
implement end-to-end sync using
CloudKit.
And it's designed to save you
thousands of lines of code.
It's also a foundation that we
hope to be able to build on with
you iteratively in the years to
come.
And of course, to do that -- get
ready, here it comes -- we need
your help.
We're going to need feedback
about how
NSPersistentCloudKitContainer
works for you and what features
it's missing, or whether or not
its existing features are enough
to meet the needs of your
applications.
So, let's look at a little bit
-- a few of those features in
detail.
NSPersistentCloudKitContainer
provides your application with a
local replica.
A complete mirror if you will of
the Core Data -- or the CloudKit
database that backs it.
And, it also implements a robust
scheduling and error recovery
event loop so that your
application doesn't have to
worry about any operations.
Finally, it handles the
transformation between instances
of NSManagedObject, that I
mentioned earlier, and CKRecord.
Now, a local replica is
important for a number of
reasons.
But it means that as your
application works with objects,
it will be writing them to a
local store file managed by Core
Data.
And, it will be reading them
from that file as well.
And this is because of the
difference between the
performance profile of the local
database versus cloud storage.
Right?
When we think in terms of
latency of accessing a local
file, we can read from a file on
disc in milliseconds at the
worst case.
Whereas, over the network, it
might take us seconds or minutes
to fetch the data that our
application needs.
Likewise, the local store file
can provide your application
with much higher bandwidth,
measured even on our iPhone
devices in gigabytes per second,
whereas the cloud is sometimes
limited to megabytes or
kilobytes per second of
available bandwidth.
Now, this local replica
necessarily adds a significant
amount of complexity, and that's
why
NSPersistentCloudKitContainer
implements a robust scheduling
and error recovery event loop
for you.
So that as your application
writes data to the local store,
NSPersistentCloudKitContainer
automatically moves those
objects up to the cloud.
And, when something changes in
CloudKit,
NSPersistentCloudKitContainer
will schedule work on the system
to bring those objects down and
import them into your local
database, making them available
to your application.
Of course, during this process,
your objects are necessarily
transformed from instances of
NSManagedObject into instances
of CKRecord by
NSPersistentCloudKitContainer.
And likewise, when something
changes in the cloud, those
CKRecord's will be made
available to you as instances of
NSManagedObject in your local
store file.
And that's what
NSPersistentCloudKitContainer
brings to your application.
It's a complete local replica of
everything in the private
database in CloudKit.
You should know, though, that we
do manage a specific custom zone
for Core Data syncing.
We implement automatic
scheduling so that you don't
have to worry about optimizing
any operations or scheduling
them in the system.
And, I think more importantly,
you don't have to worry about
implementing any error recovery
logic in your application.
Finally, we implement automatic
serialization from
NSManagedObject to CKRecord.
And we use your NSManagedObject
model to figure out how to do
this.
But it'd be obtuse of me to
stand up here and tell you that
once you've adopted
NSPersistentCloudKitContainer,
you're done, and you have a
fully functioning application.
So, I'd like to spend the rest
of the talk thinking about what
it means for you to build on top
of
NSPersistentCloudKitContainer.
And I think that starts with
building great applications that
are powered by Core Data.
Later, we'll also look at how
you can extend the foundation
that we've built into
NSPersistentCloudKitContainer to
customize it to your needs.
Now, for me, building great
applications with Core Data
starts by absorbing a lot of
knowledge.
And, to that end, we've written
a ton of documentation this year
about how
NSPersistentCloudKitContainer
works and how you can integrate
it into your applications.
There are some features of Core
Data that I feel go really well
with
NSPersistentCloudKitContainer.
Things like the
FetchResultsController, which
helps you build scalable user
interfaces backed by a
significant amount of data.
And query generations which help
you stabilize those user
interfaces against changes that
might be happening in the
background.
Like say from
NSPersistentCloudKitContainer.
And finally, there's history
tracking, which we introduced a
few years ago to help you
understand what's changed in the
database.
And with
NSPersistentCloudKitContainer,
you can use it to decide whether
or not any of those background
updates are relevant to what
your user's currently doing.
We'll be covering these and a
whole lot more in our session on
Thursday at 3 pm.
And, to go along with that,
we're also introducing a new
sample application this year
that's designed to give you
something to hold in your hands,
to feel how
NSPersistentCloudKitContainer
works along with all these other
features of Core Data.
It's designed to manage a set of
posts.
And posts are a theme that we've
been building on over the last
few years in Core Data.
Right?
They're a great object for
helping us understand how
different pieces of your object
graph will be affected by
CloudKit.
Here you can see that our data
model is pretty simple.
Right?
We have a title and some
content.
And then, we have a set of tags
that we can associate to each
post.
The application will even let
you see what it's like to manage
photos with CloudKit.
Allowing you to attach files
from the device's Photo Library
to a post.
Now, let's talk about what it's
like to build on top of
NSPersistentCloudKitContainer.
And, as you might expect, this
is a fairly dense section of the
talk.
But you should know that our
documentation covers in more
detail a lot of things that
we're going to talk about today.
So, don't worry if some of this
goes right out the window.
There are a number of ways that
we see clients internally extend
NSPersistentCloudKitContainer,
beginning with working with
multiple stores.
And, we also see that clients
like to customize the schema
that they've been using with
CloudKit, and indeed, the one
that we use with
NSPersistentCloudKitContainer.
This of course, as you may know,
is because CloudKit is available
on a number of different
platforms, not only at Apple but
also using Web Services or
JavaScript.
And so, you should be able to
work with
NSPersistentCloudKitContainer's
objects even if you're not doing
so on one of our platforms.
Finally, we'll take a look at
Data Modeling for collaboration.
Now, there are a number of great
reasons why we might want to use
multiple stores in our
application.
Especially when working with
network backed stores.
Multiple stores can help us
segregate our data between
distinct use cases in our
application.
And, they can help us provide
different types of constraints.
Right?
So, if we wanted to have one
store file that manages a very
specific set of validation
constraints, maybe to validate
user input, we can use a
separate store for that.
Multiple stores are also a great
way to throttle or coalesce data
that's written very frequently
on device.
And this can happen when you're
reading something from the
device that's generated maybe by
the device itself or by an
algorithm that you have which
creates data.
If this algorithm generates data
at a very high rate, it can be
very expensive to constantly
sync all of that data up to
CloudKit.
And so, we see clients inject
another store file into the mix
and use that to coalesce this
data until its ready to be
analyzed and then uploaded to
CloudKit later.
To show you how this works and
how Core Data can make this easy
for you, we're going to leverage
a feature of
NSManagedObjectModel called
Configurations.
And here you can see we have our
sample apps,
NSManagedObjectModel in the
model editor in Xcode.
Now, let's say that I want to
add locations to the post.
Right?
I want to record the location
where a post was created
whenever that happens.
But locations can be generated
at a high rate by the system and
I don't need them unless a post
is actually being created at
that location.
So, let's separate them from the
rest of the data model.
I'll start off by adding a new
entity by clicking this plus (+)
in the lower left-hand corner to
store my location information.
Now, my location's going to be
pretty simple.
It's just going to include a
latitude and longitude that are
both doubles that are exposed to
me by the Core Location
framework.
Of course, yours may include
other things like altitude or
accuracy.
Now we want to segregate these
locations from the rest of our
data.
And to do that I'll create a new
configuration, again, by hitting
this plus (+) button, but this
time holding down my click to
expose a menu that allows me to
add a configuration.
I'll call this configuration
Cloud and add all four of the
entities that I actually want to
sync to it.
Here you can see that now the
Cloud configuration has just the
entities that I want to be
synced with CloudKit.
Let's create a new one called
Local that will store our
location objects.
And, in just a few lines of
code, we can put this to work
for us, empowering Core Data to
tell our stores automatically
what types of objects they
stored.
At the top you can see we create
an instance of
NSPersistentCloudKitContainer,
and then, we leverage something
called
NSPersistentStoreDescription,
which tells
NSPersistentCloudKitContainer
about the types of stores that
it's managing.
We create an instance of
NSPersistentStoreDescription
that points at the file called
local.sqlite to hold our
location information.
And, we assign it the
configuration of local that we
just created.
Then we set up our cloud store.
Similarly, we create an instance
of NSPersistentStoreDescription
and we point it to a different
file; cloud.sqlite.
Then we give it the
configuration Cloud, which tells
NSPersistentCloudKitContainer
that only the entities like
post, and tags, and attachments,
and image data should be stored
in the store.
Finally, we assign it an
instance of NSPersistent
CloudKitContainerOptions, which
tells
NSPersistentCloudKitContainer
what iCloud container identifier
this store should be synced
with.
Last but not least, we assign
these two store descriptions to
the PersistentStoreDescription's
property of
NSPersistentCloudKitContainer.
And with
NSPersistentCloudKitContainer we
can take this even further.
We already have local store and
a cloud store.
But what if we want to share
some data in CloudKit across
multiple applications we happen
to work on?
Well,
NSPersistentCloudKitContainer
supports that as well.
In fact, because you're using
Core Data, your application will
easily be able to handle all the
data from these stores at the
same time.
And, Core Data will
automatically help you write
data that you insert into the
correct store file
automatically.
We can do this by just adding
three lines of code.
Right?
We create a new StoreDescription
that points to our shared store
file and give it a new
configuration that we might
create called shared.
Finally, we assign it a new
instance of NSPersistent
CloudKitContainer Options that
identifies the container we want
our shared data to be stored in;
in this case,
iCloud.com.wwdc.shared.
And of course, last but not
least, we assign it to
PersistentStoreDescriptions.
Now let's talk about the schema.
And, I want to cover a few
points about the schema that I
think are sort of the
highlights.
The most important things that
you need to know when you're
looking at the records that we
create in CloudKit.
I'll start off by talking about
how we manage record types and
how we manage entities, which
are what you create in your
NSManagedObjectModel.
Then, we'll look at how we
implement a feature called Asset
Externalization, which allows
you to seamlessly store
arbitrarily large values in
CloudKit using
NSPersistentCloudKitContainer.
Finally, we'll talk about how we
manage relationships and how
that might be different from the
CloudKit experience you're used
to.
To do that, we're going to use
the ManageObjectModel for our
sample application.
And I'm going to start off by
focusing on the post entity.
You can see that it has two
attributes; a title string and a
content string.
And it also has two
relationships to the attachment
and tag entities.
Core Data generates an actual
class for you to use in code as
a subclass of NSManagedObject
that looks like this.
And you can see that all of the
attributes and relationships are
represented on that class.
Now, this is the record that we
will generate in CloudKit that
goes along with this post.
And these are some example
values that I've used to
populate this record and make it
what we call fully materialized.
Now, there are some things that
I want to highlight to you
first.
The record ID.
Core Data owns the record ID for
all of the objects that it
creates in CloudKit.
And, for each one, we will
generate a simple UUID to use as
its record name.
When the Record Name is combined
with a zone identifier you get a
CKRecord ID.
At the bottom, you'll see how
Core Data manages type
information.
And, there are two interesting
things here.
The first is, what the heck are
all these CD under bars?
This is Core Data's way of
segregating the things that it
manages to be separate from ones
that either CloudKit implements
for you -- you wouldn't believe
how many people add modify date
to their CKRecord.
Or, ones that you might add
yourself.
And so, we prefix everything;
the record type and all of our
field names with CD under bar.
But, in the CD entityName field,
we keep the real entity name of
the object that this record goes
with.
And we do this so that we can
implement a feature called
entityInheritance, wherein you
could have subclasses of a post,
let's say.
Perhaps an image post, or a
video post.
The actual entities will always
be identified by the CD under
bar entityName field.
And we do this so that you can
implement CK queries that get at
all of the entity hierarchies
you're interested in by querying
a single record type.
Now let's look at how we
implement these two strings.
Because these are variable
length fields and our
translation of them to CloudKit
has some interesting quirks.
You'll see that we've inflated
them to four total fields.
And the reason for this, is that
this is how we implement asset
externalization.
Here you can see that we have
both a CD under bar content
field and a CD under bar content
under bar CKAsset.
This allows us to store strings
that are arbitrarily large.
Anywhere from simple kilobytes
all the way up to hundreds of
megabytes or even gigabytes in
length.
And when I said that this record
is what we call fully
materialized, what I mean is
that you won't ever actually see
all four of these fields at the
same time.
If the strings are all very
short, then you'll just see CD
under bar content and CD under
bar title in the record.
But, if one of them grows to be
very large, approximately larger
than 750 kilobytes, or if the
total size of the record exceeds
CloudKit's maximum 1 megabyte
limit, you'll begin to see asset
fields, or in this case, CD
under bar content under bar
CKAsset in the record instead.
If you're consuming our records
on your own, then you'll need to
check both places to see whether
or not a value has been set for
a specific attribute.
Now let's take a look at the
relationships on a post.
You can see here that they're
both instances of NSSet in the
object that Core Data generates
for you to use in your code.
And this is because a post has
to-- what we call too many
relationships.
That means that a post can be
assigned too many attachments or
it can have many tags associated
with it.
The attachments relationship is
what we like to call a
Many-To-One, because an
attachment can only be assigned
to one post.
When we generate code for this,
you'll see that there's an NSSet
on the post, but there's only a
single post object declared on
the attachment ManageObject.
And this is the record they will
generate for an attachment.
You can see that it has a UUID,
an entity name, and a record
type just like a post does, but
you'll also see an additional
field called CD under bar post.
This is how we store the
relationship for a Two-One
relationship.
The UUID of the related record
in CloudKit will always be
stored on the object it's linked
to.
And, you might be wondering why
we don't do this with
CKReference instead.
And that's because CKReference
has some limitations that we
don't think work well for Core
Data clients.
Namely that it's limited to 750
total objects.
But by storing a relationship
this way, you can hold as many
will fit in your CloudKit
container.
Now let's look at a Many-To-Many
relationship.
In this case, the one between a
post and its tags.
You can see that there are two
NSSet's on this object and this
is because both objects can be
related to many of the other
type.
But when we generate the records
for these objects, there aren't
any fields that we materialized
to hold this relationship.
Instead, Core Data materializes
a custom join record.
Now, if you're familiar with
relational databases, you'll
recognize this concept.
It's basically an extrapolation
of a row in a join table.
And it's called a CDMR or Core
Data Mirrored Relationship.
A CDMR contains a set of twoples
that are designed to describe
the two objects that are linked,
beginning with the entity names
of the two objects that are
linked, and their record names.
Now, as I said before, this is
not the record ID.
This is the record name that you
have to combine with a zone
identifier to get the identity
of the records that have been
linked by the CDMR.
Finally, we also encapsulate the
exact relationships that were
used to make this link.
So why am I spending so much
time talking about
relationships?
Well, they have a huge impact on
how we model our data for
collaboration.
And I need to be very clear
here, collaboration is not
conflict resolution.
Conflict resolution is
implemented automatically by
NSPersistentCloudKitContainer
using a last writer wins merge
policy.
And the reason we do this, is
that the job of conflict
resolution is to keep the object
graph and the data in CloudKit
consistent with how you've
modeled your data.
But, over the years we've heard
that this can be a little
frustrating.
So, let's take a look at how you
can implement better merge
behavior and align that merge
behavior with your specific
customer use cases using
relationships.
To do that, we're going to use
our post application again.
And I'm going to create a post,
but I'm not going to assign it
any content.
I'll let it sync over to another
device let's say, and then I'll
make some edits on each device
at the same time.
Now, this is what we would
traditionally term to be a
conflict.
But, actually, this is a great
example of two devices
attempting to collaborate on a
value.
Now,
NSPersistentCloudKitContainer
seeing that something has
changed in CloudKit while it was
away, will resolve this to
preserve one of the two values.
Right?
Using a last writer wins merge
policy.
Which means that we'll either
get collaboration is great or
everyone should do it.
And it's true.
Of course, you might be thinking
that -- well, this is kind of
dumb.
Why don't you just concatenate
the two strings together, Core
Data?
And we could.
Except that if you took our
advice over the years and
implemented incremental saving
you might end up with something
like this, which is not what
anyone expects.
And it's not even English.
So, how can we do better?
Right?
Clearly this is a problem that's
very important to us and it's
very dear to our customers.
Well, let's look at our post
entity and see if there's some
changes we can make that would
make this better.
The first thing to do is to stop
creating collisions on flat
values.
Content is simply a flat string
and there's nothing that we can
infer about the merge behavior
that you want for that string,
simply from looking at it.
But, if we break it up as a
relationship, multiple devices
can contribute contributions to
the content field all by
themselves.
And because of the way we store
Two-One relationships, many
devices can contribute these
objects at the same time without
creating a collision on the post
object itself.
So, here I've broken it up as a
very simple post content entity
with a simple string.
Now, because of how we store
Two-One relationships, these
will be combined by devices on
their own.
Right?
We need to traverse them in
order to assemble the final
value.
And we call this eventual
consistency.
As both devices contribute post
content objects, they'll
download the post content
objects from the other device
and concatenate them or merge
them, or whatever topology you
want to apply to this problem to
assemble the final value.
But we want that to be the same
across all devices and a
difference in download order
could create a different final
value.
So, we can order them say with
something as trivial as a date.
In this way, the devices can
order the post content objects
before they do a concatenation
using a simple sort descriptor
in Core Data, giving them a
consistent ordering and merge
behavior across all devices.
But, I'm a bit of a distributed
systems nerd and time is evil.
In fact, a devices notion of
time might not even line up if
they're in your house together.
So, we can take this a step
further and implement a full
causal tree by leveraging a
relationship to a parent
contribution and giving the
entity some information about
the device that's making the
actual contribution.
In this way, we've implemented a
rough sketch of something called
the conflict-free replicated
data type.
And this is an awesome emerging
field of computer science that
allows us to deploy algorithms
to create consistent merge
behavior across different
user-user scenarios.
And that has been Using Core
Data with CloudKit.
It's been my pleasure to show
you
NSPersistentCloudKitContainer
and how you can easily implement
sync functionality in your
applications with it.
We've also got some great new
sample code and documentation
for you this year that I hope
gives you a great experience
with all of these new APIs.
And finally, I can't wait to see
what you build with
NSPersistentCloudKitContainer
and how you extend it.
We'll be in a ton of labs this
year.
Every day this week.
And as you know, we have a talk
on 3 o'clock that covers a lot
more features in Core Data in
Thursday.
Thanks a lot, and I hope you
have an awesome WWDC.
[ Applause ]