Transcript
[ Silence ]
>> Hi. I'm Tim Isted.
Welcome back after lunch.
Let's talk Core Data
performance,
so you've taken the
time to learn Core Data
and you've built
your app around it.
Maybe you've found
that occasionally you get
a slightly jerky scroll
in a table view.
Maybe the information takes a
while to come into your app.
Maybe users are reporting
a few occasional issues.
We're going to look at how we
can identify those issues today
and how we can fix them, so when
we're talking about optimization
and particularly with Core Data,
we're trying to strike a balance
between memory usage on the one
hand and speed on the other.
The more information we keep in
memory, the faster things are,
but we're using more memory.
The less information we have in
memory, the slower things get
because we have to pull it in
and push it back out again.
In iOS, obviously, you're
even more limited with memory
than you are on the desktop,
so you can't sacrifice
memory for speed.
You have to find a way to
optimize where possible
without using too much
memory, so some of the things
that cause people problems are
loading too much information.
Maybe you're pulling in
more than you really need.
Maybe you're firing
too many faults.
You're trying to
use something
that you haven't previously
indicated you wanted
and Core Data has to go out
and fetch it and bring it in.
Maybe your queries
are too expensive.
Maybe your text searches are
using the wrong type of query
or using something more
expensive than you really need,
and finally, a more
advanced thing,
maybe you're incurring
too many locks
when you're accessing data
through your contexts.
So today we're going to be
talking a lot about Instruments
and not only how we use
Instruments to find problems,
but how we interpret the
information that we see
and figure out what it means
and what we need to do about it.
We're going to be talking
about debug logging
so that you can see exactly
what's happening underneath the
hood and what Core Data is
doing behind the scenes,
and we'll be talking about some
of the optimizations we can make
to our model, to fetch
requests and predicates.
We're looking at concurrency
and optimizing text searches.
So let's jump right in and talk
about measuring performance
and primarily our tool for this
is Instruments, but obviously
with Xcode 5 you've now got
debug gauges that are there
without even running
Instruments,
so you can get read outs
on memory usage, CPU usage,
et cetera, but your
first port of call
for more advanced information
is the Core Data instrument,
but that's not all.
Also consider running a time
profiler, and Allocations.
See how much memory
you're using.
Look at file activity.
Look at disk IO.
How many times are you hitting
the disk to pull information in?
So when you're
using Instruments,
you need to know two things.
First of all, what
are you looking for?
What are the patterns
you're trying to find out?
What's wrong?
How do you interpret
that information?
How do you use it
to make a change?
And you also need to know
how long something should take.
At app launch it should be fast.
Whatever you're doing to load
initial information should
happen quickly.
A table view should
pop up straight away.
If you're doing something
in the background,
obviously that can take longer,
but how long is too long?
How short is the right amount?
Let's jump straight to a
quick demo, so here's an app
and it has a problem that I'm
sure many of you have seen:
when I run it
in the simulator, it takes
a rather long time
for anything to happen.
Eventually my UI appears, and
if I look in the debug gauge,
my memory usage is
way up at 640 Meg.
That's a lot.
On a device, that's
probably going to get killed,
and if it took that long
in the simulator to launch,
it's going to take much,
much longer on a device
and probably won't
even launch at all,
so we can say it's taking
a long time to launch,
but that's rather subjective.
How do we get an
objective measure?
Let's choose Profile,
and we're going
to choose the Core Data template
and let's see what comes in.
Let's look at the Core
Data Fetches Instrument,
and I'm going to look at the UI.
Just before that UI eventually
appears I get some blocks
up here in the readout,
and this timeline
shows a wide block.
Those blue and those pink
blocks are big, and this is bad,
particularly at app launch.
And to see exactly what's bad,
let's look in the
bottom right hand corner,
so we've got a duration here
of just over three seconds,
so the initial fetch request is
taking three seconds to come in,
and we're fetching 505 objects
so there are two things
we might look at here.
One is the fetch is
taking way too long.
Three seconds is like a
millennium in computer cycles,
and we're fetching 505 objects.
Do we really need to do that?
Let's look at optimizing fetch
requests, and the first thing:
Don't fetch more than you need.
This is a really, really
quick fix that you can make.
You've got a table view.
It's showing maybe ten,
twelve, thirteen rows,
depending on your configuration
of the table view on the device,
so don't fetch all 505 objects
from the store just
for the first ten.
And there are various
ways we can do this,
but the very easiest is
to add one line of code.
Use a fetch batch size,
and this limits the number
of results that come in.
We can use a fetch
batch size of twenty.
That accounts for what's in
the table view at the time
and a little bit of scrolling,
so to do that we just
set a fetch batch size
on the request itself.
And what this means is that
the results that we get back
from the batched fetch are
a special kind of array.
It behaves just like
any other array.
You can query the count,
and you get the right number
of objects back as to what would
be there without the batch,
but only the first fetched
batch size worth of objects are
in the array, in this
case the first twenty,
so as you iterate through,
everything works as normal,
but when you reach the 20th
object and then you try to pull
in the 21st, that's a
promise from Core Data
that the results will be
there when you need them.
So Core Data will go out,
and it will execute another
fetch request to bring
in those objects
as you want them,
and this all happens
automatically,
so you don't need
to worry about it.
Just set this.
Iterate through the results
as you would normally,
and the information flows
in as it did before,
but you're minimizing the amount
of objects you have in memory
because you're limiting the
initial fetch to twenty.
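Here's a minimal Swift sketch of that one-line change; the Contact entity, the sort key and the context are placeholder assumptions rather than code from the demo:

import CoreData

func fetchFirstScreenOfContacts(in context: NSManagedObjectContext) throws -> [NSManagedObject] {
    let request = NSFetchRequest<NSManagedObject>(entityName: "Contact")
    request.sortDescriptors = [NSSortDescriptor(key: "lastName", ascending: true)]
    request.fetchBatchSize = 20   // the default is 0, which means no batching
    // The returned array still reports the full count, but only one batch of
    // objects is populated at a time; Core Data fetches the later batches
    // automatically as you iterate or index into the array.
    return try context.fetch(request)
}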
Let's take a look in the
app and see what happens,
so here I've got a fetch
batch size of zero.
That's the default.
It means an unlimited batch.
Everything will come in.
Let's change it to 20, and
then we'll re-profile the app,
so Product, Profile,
Core Data instrument.
Sorry, I'm going to run it first,
apparently, and here we have --
the memory has gone right
down, so we're using a tenth
of what we had before.
Instead of the 600 Meg
before, we've now got 62,
so that's a big improvement.
It's not ideal.
That's still quite
a lot of memory,
but it's better than
it was before.
Let's profile it and see
what's happened, so stop it
and re-profile, and take
a look at the read out.
So this time the app
launches very quickly.
If we look in the simulator --
if we look in the timeline,
we've got a very thin line.
The thin line is good.
Thin line means a
very fast fetch.
That's what you want to see.
You don't want to
see wide lines.
You want to see very thin ones.
If we look in the simulator,
and we scroll the table view,
then you'll see more
entries appear
in the fetches instrument, and
this is Core Data going out
and bringing in more
objects as it's needed.
This is, you know, to
fulfill different rows
in the table view, so you've got
the thin line at the beginning
for the initial fetch and then
you've got subsequent lines
appearing, thin lines again,
to bring in more objects.
If we look in the bottom right
hand side this time,
we see something interesting,
so we've got one fetch count
of 505, and that's Core Data
going out and doing a count
for fetch request to see how
many objects should eventually
be in this array, 505,
and the duration is short.
That's three milliseconds,
less than three milliseconds.
That's okay.
That's a reasonable amount.
The first fetch batch that
comes in is for 20 objects,
and that's 176 milliseconds,
177, so a huge improvement
over the three seconds we had
before just by changing one line
of code and setting a fetch
batch size, but is it enough?
One hundred and seventy six
milliseconds is still quite a
long time, especially in
the simulator; on a device
that will be more, and at
app launch you want the UI
to appear immediately,
so what else can we do?
Let's talk about optimizing the
data model, so if you've been
to the WWDC presentations
on Core Data
and performance before, you'll
know that we've talked a lot
about designing your
model for your app's usage
and in particular saying
don't over-normalize.
And normalization in a
traditional sense means
that you're trying to
minimize the amount
of information that's
duplicated across a database,
so if you've got something
in one place you don't
really want it anywhere else
because if you need to
change it you've got
to update it everywhere.
With Core Data, you
might be able
to get a performance benefit
by actually caching information
or storing it in multiple
places where you really need it,
so you don't have to keep
firing relationships to go off
and get related information.
Duplication is not
necessarily a bad thing.
We've also said that when
you're using large blobs,
in this case our
photos that we're seeing
in the contact list, you should
use external storage attribute.
It's a simple checkbox
in the data model.
You just enable it, and we will
store blobs, as you set them
on a property, out to
disk -- but not necessarily.
Sometimes SQLite is actually
better for small file sizes.
It's much more responsive
giving back data blobs
than using a separate file,
so if the file is below
that threshold we will store
it in SQLite.
If it's above the threshold,
we'll write it to a file
and store a reference.
That all happens automatically.
You don't need to
worry about it.
You simply set a data property,
and then you get it
back when you want it.
Now this is my current entity.
I have a first name, a
last name and a photo,
and that's what's
displayed in my UI.
Do we really need
that photo in there?
It might be that we could
change our UI to get rid
of the little thumbnail and just
show first name and last name,
but we still want the photo.
Maybe we're going to
push a view controller
on screen that's going to give
us maybe the picture of the user
and then all their
information overlaid on top,
so in previous talks
we've recommended
that you split your data
out into a separate entity.
Anything that's a big
blob of data should go
into a separate entity, in
this case, a photo entity
with a photo data attribute,
and there's a one-to-one
relationship
between them, but there's more.
Let's see what happens
when we try this.
I've put the data
into my photo entity,
and I've got my external
storage checkbox checked.
Let's profile this again.
This time the app
launches quickly again,
but there are some
marks appearing.
Have a look in this instrument
as I scroll the table view:
I've got little black marks appearing
in the Core Data Cache Misses
instrument, and this is because
we have actually still used
that photo to display a
little tiny thumbnail.
We've got the contacts
being loaded,
and that's what we're
fetching right now,
but each time we access that
photo relationship to display
in the table view, Core Data's
having to go out and fetch
that related photo
blob and bring it
in so we can create
the thumbnail.
What we should really consider
doing is maybe we can pre-fetch
the objects that we need.
If you're seeing that kind
of activity in an instrument,
it generally means that you
haven't set enough information
about what you're
trying to bring in and use.
You've requested your contacts,
but you haven't told it
you also want the photo,
so you do that by calling
set relationship key paths
for pre-fetching and
providing an array
of key paths, in
this case photo.
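A minimal Swift sketch of that prefetching call, assuming the relationship really is named photo and the entity is called Contact:

import CoreData

func contactsRequestWithPhotos() -> NSFetchRequest<NSManagedObject> {
    let request = NSFetchRequest<NSManagedObject>(entityName: "Contact")
    request.sortDescriptors = [NSSortDescriptor(key: "lastName", ascending: true)]
    // Bring the related photo objects in with the same fetch, instead of
    // firing a separate fault for every row as the table view scrolls.
    request.relationshipKeyPathsForPrefetching = ["photo"]
    return request
}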
That's the name of
the relationship,
but that might not be the
right thing to do in this case
because our photo is 10 Meg.
We're actually displaying
a little tiny thumbnail.
We don't really need an
entire image just to show
that thumbnail, nor do we really
want to scale it at runtime,
so what we should really do is
cache the thumbnail separately
in the contact entity.
If we're loading the first name
and last name and the thumbnail
in one go, and we're
displaying it in the UI,
let's have it all
in the same entity.
Less data takes less
time to fetch.
We're not skipping
a relationship.
We're not having
to scale in memory,
so let's move to
this arrangement.
I've got my first name,
last name and a thumbnail
in the contact entity and
photo data in the photo entity.
What happens when
I profile this?
Again, product profile.
Look at the Core Data
template, and let's look
in the Fetches Instrument.
As I scroll my table view
this time I see no activity
in cache misses, but the
information is still brought
in by the batches, and it seems
to be very responsive.
The scrolling's great.
Let's have a look
in the bottom right.
This time my fetch durations
have dropped dramatically,
so the longest thing right
now is actually the count,
505 objects.
It's taking about
three milliseconds.
All the 20 individual object
fetches, about two milliseconds,
even less than that
as it goes on.
That's much more realistic.
That's what it should
be at app launch.
We want the UI to
appear immediately.
We want the information
pulled in quickly.
We don't want to keep
the user waiting.
Good user experience is
what we're aiming for,
so those are basic problems
with simple applications.
Let's talk a little bit now
about performing
background tasks,
so this might involve importing
information from somewhere.
It might involve doing some
complicated calculations
on a lot of data that
you've already saved.
First of all, let's look at
dealing with a simple app.
This is an earthquakes app.
It shows the location of recent
earthquakes, which it gets
from a [inaudible] web service.
It pulls in some JSON
data, translates it
and updates the local store.
What I have right
now has a problem,
so let's profile it once more.
Let's look at the Fetches
Instrument, and I'm going
to hit my little
refresh button in my UI,
which will trigger an update.
So what you're seeing
is a large amount
of activity in that instrument.
It's kind of like red
meadow grass, lots and lots
of thin lines, one
after the other.
This is also a bad thing.
I'm not going to make you
sit through all of this,
so let's skip on a little bit
and hey, it's still going on.
This is now nearly a 50
second import operation.
That seems a little excessive
even for a background operation.
If you look in the bottom
right, we've got a whole bunch
of fetches that just
have a count of one.
That seems a bit odd,
and what's more, if we look
in the Saves Instrument, you see
that we've got a single
save happening at the end
and it's taking 175
milliseconds.
Again, quite a long time even
on the -- on my simulator,
going to be much longer
than that on the device.
When we talk about locking
later, this is going to be key.
We want to keep the
time it takes to save
to a minimum wherever
possible, and the reason
that that's bad is that we're
performing a fetch request
for every object that
we're trying to update.
Every item in our JSON
dictionary we're going out
and we're executing fetch
requests to see if it exists.
If it does we update it.
If it doesn't, we insert it.
Let's talk about a
better algorithm,
so the first thing you do is
sort your input objects
by some kind of ID.
That's how you tie the incoming
data to the thing in the store,
and you execute one sorted
fetch against your store to pull
in all the objects that have
a matching ID from the ones
that you're trying to
import, and then you iterate
through them concurrently.
If the next object in each
enumerator is the same,
you update.
If it's different, you insert,
so a picture's worth
a thousand words.
Let's take a look
at this in practice.
Say we've got three
objects, and we're trying
to update our local store, and
we've got a couple of objects
that actually exist in
this particular data.
First thing we do, we sort them,
so the fetch request
must be sorted.
We can sort our JSON
dictionary as well by ID,
and we enumerate concurrently,
so the first object in each
of these collections has ID 101.
It's a match.
That's an update, and we go
to the next object
in each enumerator.
In this case, they don't match.
That's saying it's an insert,
so we can insert the
object into the store.
And at this point we advance
only the enumerator for the
information to update,
because we haven't yet
dealt with number 104 in our
existing objects.
This time they match.
It's an update, so
we can update.
That's a much more efficient
algorithm for inserting
or updating because you're only
ever executing one fetch request
to the store.
You're only pulling
in one lot of data.
You're not going
out individual times
to get each object separately,
and you won't see these
little crazy lines going
across the timeline
in the profile in Instruments.
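Here's a minimal Swift sketch of that sorted walk, assuming a Quake entity with a quakeID attribute and incoming JSON dictionaries keyed the same way; the names are placeholders, not the session's own code:

import CoreData

func updateOrInsert(_ incoming: [[String: Any]], into context: NSManagedObjectContext) throws {
    // Sort the incoming dictionaries, then pull the matching existing objects
    // with a single, identically sorted fetch request.
    let sorted = incoming.sorted { ($0["quakeID"] as? Int ?? 0) < ($1["quakeID"] as? Int ?? 0) }
    let ids = sorted.compactMap { $0["quakeID"] as? Int }

    let request = NSFetchRequest<NSManagedObject>(entityName: "Quake")
    request.predicate = NSPredicate(format: "quakeID IN %@", ids)
    request.sortDescriptors = [NSSortDescriptor(key: "quakeID", ascending: true)]
    var existing = try context.fetch(request).makeIterator()
    var candidate = existing.next()

    // Walk both sorted collections together.
    for json in sorted {
        guard let id = json["quakeID"] as? Int else { continue }
        if let quake = candidate, quake.value(forKey: "quakeID") as? Int == id {
            // Match: update the existing object and advance both enumerators.
            quake.setValue(json["magnitude"], forKey: "magnitude")
            candidate = existing.next()
        } else {
            // No match: insert, and advance only the incoming enumerator.
            let quake = NSEntityDescription.insertNewObject(forEntityName: "Quake", into: context)
            quake.setValue(id, forKey: "quakeID")
            quake.setValue(json["magnitude"], forKey: "magnitude")
        }
    }
}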
Let's see what difference
this makes.
I've got two enumerators
here, JSON quake enumerator
and matching quake enumerator.
When I run -- I profile it.
See how much faster this is.
[ Pause ]
Boom. That's pretty good.
I've got one line in my
Fetch Request Instrument now.
It's not thin.
It's a little bit wide,
a few pixels wide,
but it's certainly
better than before.
Instead of like a
minute's worth of import,
we've got maybe a second.
I kept saying that thin
was better than wide.
Why is it wide?
Well our fetch count is actually
23,000 objects, so I had a batch
of JSON dictionaries to
update, and I've gone out
and I've fetched
all my existing ones
and I got 23,000 objects
coming back into memory.
That's probably not a good
thing, and it took just
under 300 milliseconds, so again
that will be slow
on an actual device.
This isn't good.
You want to minimize the number
of objects you're dealing
with at any time and keep
those fetch durations low,
again for locking purposes
as we'll talk later.
What we really want to
do is work in batches,
and this applies both to
implementing update-or-insert
and to any kind of background
activity that you're doing
where you're pulling in
lots of objects at a time.
You should be working in batches
rather than the whole lot
at once, and it's
worth experimenting
to find the optimal batch size.
Maybe it's 500.
Maybe it's 1000.
Maybe it's 550.
Just that little bit
of difference means
that you get things much faster,
but without using too much
memory, so experiment and test
on every device that
you support.
Just because it works
on an iPhone 5 very well doesn't
necessarily mean it'll be
so good on a 4 or even a 3GS.
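As a rough sketch of that batching, reusing the hypothetical updateOrInsert helper from the earlier example:

import CoreData

func importInBatches(_ all: [[String: Any]],
                     into context: NSManagedObjectContext,
                     batchSize: Int = 500) throws {
    var start = 0
    while start < all.count {
        let end = min(start + batchSize, all.count)
        // Process one slice at a time so only a batch's worth of managed
        // objects is ever alive in the context at once.
        try updateOrInsert(Array(all[start..<end]), into: context)
        try context.save()   // keep each save small and quick
        start = end
    }
}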
So, let's talk about minimizing
memory usage while we're batching.
You've got a number of ways to
deal with getting rid of some
of the information that
you've just pulled in,
and the first way is to turn
a single managed object back
into a fault, so this
can be really helpful
if you've got a single managed
object and it has a bunch
of related objects hanging
off it, and you fetch them all
in to do something and you've
been operating on them,
and then you've finished
with it.
So under ARC, you've probably
got a strong reference
to your managed object and then
this will have references all
the way across to all the
related objects, so they'll stay
in memory and use memory
that you don't really
need at this point.
So by calling refreshObject:mergeChanges:
on this object,
you'll turn it into a fault.
All its other data
will disappear,
and it will be re-fetched
if you need it,
but that's a single object.
What about, you know,
doing a batch?
The most efficient way
is to reset the context
by calling reset,
and this will clear
out all the existing
managed objects
so they'll be re-fetched the
next time you need to use them.
It does come with a caveat.
If you have any existing
references to any of the objects
in that context, they will be
invalid after you've reset.
If you try and access
one, you'll get an error
in the console that
says something like,
"Could not fulfill a fault."
So let's look at the batch
size and see what's happened
when I've re-profiled the app.
Got a batch size of 500 here.
That seemed to work quite well
when I profile with
the instrument.
Check the fetch requests, and
then click refresh in the app.
I get a little flurry
of activity,
but this time it's
only a few entries
in the time line, which is good.
They're few and thin, and the
times are much reduced,
so if I look down in the bottom
right you can see we've only got
500 objects coming in at a time
and we're dealing with a matter
of maybe 13, 14 milliseconds, so
each of these fetches is going
to be very, very short and
we're not using too much memory
because we're using a
batch, so this is the ideal,
most optimized background import
that we could possibly
use for this process.
So that's importing data.
What about operations
that we need to do?
Well, it's always worth saying
only fetch what you need.
We talked earlier about
the fetch batch size,
so that's a way to limit
the number of results
that you get initially rather
than fetching everything,
but there are other
things you can consider.
If all you're doing is
performing a calculation
on your data, you don't need
read write access to that data.
Potentially you don't even
need an entire managed object
with all your business
logic hanging off it.
What you might be able to do is
to use a dictionary
result type, so when
you execute this,
instead of getting
managed objects back,
you get plain NSDictionary
instances,
and you can even limit
the number of properties
that you actually get so rather
than getting your
entire managed object,
you can just specify one value.
In this case, I want
to get the magnitudes
of all the earthquakes
in my app.
Perhaps I'm trying to use
this to make a calculation.
Maybe I want to get a minimum
or a maximum or an average,
so I'm loading all
the information in
and then performing
some calculation
in the background on that.
And that certainly will work,
but it's not the best way
to do this because it requires
us to bring in everything
and build objects around
the data that we have.
What's far better is to
use aggregate operations,
and what we can do is we could
actually have Core Data query
the SQL store --
SQLite store and say,
"Perform these operations
on the data,
and give me a single
result back,"
so in this case I've got
an expression description
called minimum.
That would be the key in the
dictionary that I get back,
and I'm calling expression
for function Min.
As you might expect, you're going
to get a single minimum value back
for the minimum magnitude
in the app, so rather
than having
to fetch all the information in,
I've got a single
result coming in.
Now you can take this further
because not only are
you specifying that,
but you can also specify a
predicate, so if you want
to say, for example, "Give
me the minimum values
across the last seven days,"
you could still set a predicate
on the fetch request and
set the properties to fetch.
I just want this expression
description to come back,
and you will get a dictionary
back with a single result,
minimum and the minimum value.
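A minimal Swift sketch of that aggregate fetch, with placeholder Quake, magnitude and time names:

import CoreData

func minimumMagnitude(since date: Date, in context: NSManagedObjectContext) throws -> [NSDictionary] {
    let request = NSFetchRequest<NSDictionary>(entityName: "Quake")
    request.resultType = .dictionaryResultType
    request.predicate = NSPredicate(format: "time >= %@", date as NSDate)

    let minimum = NSExpressionDescription()
    minimum.name = "minimum"   // the key in the dictionary that comes back
    minimum.expression = NSExpression(forFunction: "min:",
                                      arguments: [NSExpression(forKeyPath: "magnitude")])
    minimum.expressionResultType = .doubleAttributeType

    request.propertiesToFetch = [minimum]
    return try context.fetch(request)   // e.g. [["minimum": -1.0]]
}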
This is another potentially
complicated scenario.
I want to summarize my
data, so I want to look
at all the different
magnitudes and get the number
of earthquakes that happen
with that magnitude, and yes,
there really is a
magnitude of minus one.
It's a logarithmic scale; for
a small enough earthquake it
actually goes to minus one.
How do I do this?
Well I might go and I might
go and get the unique values
from the store for magnitude
and then count how many
earthquakes actually have
that magnitude.
That's not the most
efficient way though.
We can actually do this
with a single fetch request.
We can execute expression for
function count, so that's going
to give us a count of something,
in this case the magnitudes,
and we're going to supply
the properties to fetch,
both magnitude and
the count, and we're going
to group the results using
group by, specify magnitude.
And this will actually
give us back a dictionary
with all these results in a
single dictionary with magnitude
and count, and that's it
so we can just display
that straight in a table view.
We've not had to load anything
into memory other
than this dictionary.
We're not performing any complex
calculations, so let's see
how complicated this really is
when we implement it,
and it's also interesting to
see what's really going on.
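Here's a sketch of the grouped version, again with placeholder names:

import CoreData

func magnitudeSummary(in context: NSManagedObjectContext) throws -> [NSDictionary] {
    let request = NSFetchRequest<NSDictionary>(entityName: "Quake")
    request.resultType = .dictionaryResultType

    let count = NSExpressionDescription()
    count.name = "count"
    count.expression = NSExpression(forFunction: "count:",
                                    arguments: [NSExpression(forKeyPath: "magnitude")])
    count.expressionResultType = .integer64AttributeType

    // Roughly: SELECT magnitude, COUNT(magnitude) ... GROUP BY magnitude ORDER BY magnitude
    request.propertiesToFetch = ["magnitude", count]
    request.propertiesToGroupBy = ["magnitude"]
    request.sortDescriptors = [NSSortDescriptor(key: "magnitude", ascending: true)]

    return try context.fetch(request)   // e.g. [["magnitude": -1, "count": 12], ...]
}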
So you've made these
optimizations.
You've tried these
aggregate queries,
but what you really want to
know is actually what Core
Data's doing.
In the labs quite frequently
we hear that, you know,
you find Core Data a
bit of a black box.
You're not sure what's
really happening,
so you can actually
use SQL logging
to tell you exactly what Core
Data's doing, by passing an argument
on launch or even in
your user defaults called
com.apple.CoreData.SQLDebug,
and you supply a value
of one, two or three.
Don't forget the dash
at the beginning.
This will give you
the raw SQL queries
that Core Data's
generating, and the more --
the higher the value
you provide,
the more information
you get back.
You also get exact timings
for how long a fetch took
or how long an insert took
and writing the values back
and forth. It does give
you a lot of information,
so don't use it unwisely.
Just because you can see
how we've used a schema,
don't trust that that's always
going to be the same way.
You should never, ever try and
access data directly in SQLite.
Always, always go
through Core Data.
We have changed it in the past.
Don't risk it.
So, let's take a look at this.
I built my app.
I've changed the UI slightly so
that I now don't have my map.
I actually get some
summary data.
When I run the app, I get my
minimum average and maximum,
and I also get my nice list
with the different magnitudes
and the number of
quakes that happened.
My fetch request is maybe
a bit more complicated now
in using expression
descriptions,
so let's use logging to find
out exactly what's going on.
I'm sorry.
I'm going to go back a bit.
Right, so if I stop the app, I'm
going to go to Product, Scheme,
Edit Scheme, and I'm going
to add a launch argument.
Again,
-com.apple.CoreData.SQLDebug.
Let's specify a value of one.
When we re-launch the app, we're
going to get some information
in the console, and here we
can actually see the raw SQL.
Select Min.
That's the SQL minimum value
of magnitude from quake
where time is equal to this.
I've actually got a predicate
on there giving me a rough idea
of the last seven days.
Let's take a look when I go and
look at the grouped information.
Well, we get more
information now.
We get the very complicated
SQL statement.
Select magnitude.
Count it from quake grouped by
magnitude, ordered by magnitude,
and this gives us the
dictionary back that we want,
so at this point you might
be tempted to use NSLog
or something to see exactly
what you're getting back
so that you can work out how to
use it, but you don't have to.
Let's try increasing the value
for the log level to three,
so again, Product,
Scheme, Edit Scheme.
Change one to three, and this
time when we re-run the app,
we get a lot more information,
so we've got our
average showing there.
When I hit list, I actually get
the entire dictionary of results.
I get to see
all the bound variables
that Core Data has used to
give us the information back
from the queries it's executed,
so this can be really helpful
to debug what's going on.
If you make performance
optimizations,
you can see whether what
you've done has actually made a
benefit, whether it
looks great in the SQL
and it gives you a little peek
into what we're doing behind the
scenes, so that's debug logging.
We talked a little bit
about background tasks.
Let's talk about concurrency
models that we can use
when we're working
with background tasks.
In previous years we've
talked in great detail
about the different
confinement types,
so you should be using private
queue types for background work,
main queue types for your UI
work, and then you interact
with those with the
perform block API.
Don't use the old style
thread confinement,
and you might perhaps have a
set up that looks like this
with a main queue context and
a private queue context talking
to the same persistent
store coordinator.
Maybe you've got something
a little more complicated.
Maybe this time you're
using a nested context.
Maybe a main queue talks
to the private queue,
which talks to the
persistent store coordinator.
This will give you asynchronous
saves, because a save
only goes up one
level, so when you save
in the main queue context,
it saves the information just
to the private queue context.
When you call save on that, it
will then save to the store.
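A minimal Swift sketch of that nested arrangement; the coordinator is assumed to exist already:

import CoreData

func makeNestedContexts(for coordinator: NSPersistentStoreCoordinator)
    -> (main: NSManagedObjectContext, background: NSManagedObjectContext) {
    let background = NSManagedObjectContext(concurrencyType: .privateQueueConcurrencyType)
    background.persistentStoreCoordinator = coordinator

    let main = NSManagedObjectContext(concurrencyType: .mainQueueConcurrencyType)
    main.parent = background   // saves on `main` push changes up one level only

    // Saving `main` is quick; saving `background` (on its own queue, inside
    // perform) is what actually writes to the store.
    return (main, background)
}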
Maybe you've got something a
little more complicated still,
or maybe even more than
this, and this is all fine.
If it's working for you, that's
great, but maybe you're running
into a non-deterministic issue.
Maybe occasionally
you're seeing something
that you can't reproduce, maybe
a little jitter in a table view.
Maybe something's taking
a little bit longer
to save sometimes, but
not the other times.
It may be that you're
running into a locking issue,
so when you're working in this
context, and you call save,
in order to make sure nothing
goes wrong during the save we
lock the level above
because that's
where the save information
goes to.
That doesn't mean
that, you know,
you're completely
locked out from anything.
It just means that if you need
to execute a fetch or a save
in the main queue context up on
the top right, that will have
to wait until the lock is
removed, but you can continue
to use objects just
like you did before.
If you then call save on
this context in the background --
so maybe you've got that 173
millisecond save at the end
of a big batch operation --
then we lock the
persistent store coordinator.
The save goes up one level,
so again, you can continue
to operate with the
objects you already have
in the other context, but if
you need to execute a fetch
or a save, it won't complete
until the lock is removed.
Otherwise the data
might be corrupted.
When we're fetching, potentially
we're having to pull information
out of the store, so the
locks go all the way down.
Again, you can continue
to use your other context
as you were before, but
if a fetch or a save needs
to happen it will be blocked
until the lock is released,
so if you're not seeing
problems with this,
this is absolutely the model
that you should be using.
If you are seeing issues that
you can't do anything about --
you've done all the other
optimization tricks --
you might want to consider using
a different concurrency style,
and this time you have two
persistent store coordinators,
two almost completely
separate Core Data stacks,
so you've got one stack
for your background work,
one stack for your main
queue work and they both talk
to the same persistent store
file, and the benefit of this is
that when we need
to perform a lock
for something, the only
really relevant bit to the
other context is the
store file, so when you work
in the background, for example,
the lock is on the store file.
And file locks will come
and go much more quickly,
so if you're performing
background work,
like a massive import
operation,
then this will be a good way
to avoid locks particularly
with a change that we've just
made, but if you're saying,
"How do I use this in the
way that I am right now?
What happens?
How do I merge changes across?"
Well, just like you do now.
When you get a context-did-save
notification, you can call
merge changes from
context-did-save notification
and pass the notification across and
poof, it will be imported just
as you wish, but you should ask,
"Is this really what
you want to do?"
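For reference, a minimal Swift sketch of that merge, with placeholder context names, and keeping in mind the caveat that follows:

import CoreData

final class StoreMerger {
    private var token: NSObjectProtocol?

    init(mainContext: NSManagedObjectContext, backgroundContext: NSManagedObjectContext) {
        // Whenever the background stack's context saves, merge its changes
        // into the UI stack's context on that context's own queue.
        token = NotificationCenter.default.addObserver(
            forName: .NSManagedObjectContextDidSave,
            object: backgroundContext,
            queue: nil) { note in
            mainContext.perform {
                mainContext.mergeChanges(fromContextDidSave: note)
            }
        }
    }
}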
If you're using a fetch
results controller, for example,
in iOS, and you've
just imported maybe 1000 objects
in the background, and your
fetched results controller is tied
to a table view.
You've got all the
delegate methods set up.
Then we have to make 1000
changes in that table view
on a row by row basis, so
rows are being inserted,
moved, pushed up and down.
That's probably not
what you really want.
That's going to take a while.
Instead what you probably want
to do is listen out for the
context-did-save notification
and then re-fetch the data,
reload it in the table view.
It will be much
faster to call
performFetch on the
fetched results controller
and reload the table view
than making 1000
changes individually.
If you were there this morning
-- or, if not, see the video --
we talked about how we've now
made write-ahead logging the
default journaling mode with
Core Data with SQLite in iOS 7
and OS X 10.9 Mavericks.
And what this means is that we
actually now support multiple
concurrent reads and one
concurrent write on the file
at any one time, so if
you've got multiple stacks
and you're doing a
background import over here
and it's saving data into
the store, there are no locks
on the store at this point
because there's only
one write happening.
Multiple reads can be happening,
potentially from multiple stacks
and getting that
data out, no locks.
This is also available in iOS
4 and 10.7 Lion and above,
just by setting this
options dictionary
when you add the persistent
store -- you call the
add-persistent-store-with-options
method -- and that's the SQLite
pragmas option,
journal_mode equals WAL.
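A sketch of setting that option in Swift; the coordinator and store URL are assumed to exist:

import CoreData

func addStore(to coordinator: NSPersistentStoreCoordinator, at storeURL: URL) throws {
    // Opt into write-ahead logging explicitly (on iOS 7 / OS X 10.9 and later
    // it's already the default) via the SQLite pragmas option.
    let options = [NSSQLitePragmasOption: ["journal_mode": "WAL"]]
    _ = try coordinator.addPersistentStore(ofType: NSSQLiteStoreType,
                                           configurationName: nil,
                                           at: storeURL,
                                           options: options)
}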
Okay, let's move onto
something a little different,
text queries, so
there's some very,
very simple things you can
do immediately to speed
up when you're working
with predicates
that involve some kind of text
query, so a predicate is the way
that you limit the
results that you get back.
In this case, I'm querying
against my contact entity.
Perhaps the contacts app that
you saw at the beginning,
and I'm asking for all the
objects that have a first name
with John, and they're
aged over 40.
In this case I'm asking for
a text query to happen first,
so I'm saying I want to query
all the objects with the name
of John, so I'm going through
every single item in that table,
pulling out the ones
that are called John,
and then I'm executing the
age query on them to pull
out the ones that are over 40.
Text comparison is
quite expensive
and much more expensive
than the numeric comparison.
Computers are great
with numbers.
A greater than is actually
a very cheap operation,
so what's much better to
do is to swap these round
and put the numeric
comparison first.
This means that we're
looking through our data,
checking all the ones
that are over 40, and then
we use that reduced
set to check for the ones
that are called John, so always,
always put your numeric
comparison,
your most limiting
numeric comparison first
and that will speed up
your predicate immediately.
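A minimal sketch of the reordered predicate, with placeholder Contact, age and firstName names:

import CoreData

func over40JohnsRequest() -> NSFetchRequest<NSManagedObject> {
    let request = NSFetchRequest<NSManagedObject>(entityName: "Contact")
    // Cheap numeric comparison first, expensive text comparison second,
    // so the string test only runs against the already-reduced set.
    request.predicate = NSPredicate(format: "age > 40 AND firstName == %@", "John")
    return request
}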
You should also look at which
type of query you're using,
so in increasing cost at the top
we've got begins with and ends
with and that's by
far the cheapest query
that you can execute.
What we're doing at that point
is we're checking that the first
characters match.
As soon as we come across
one that doesn't match,
we can move on to the next one.
Ends with is the same thing,
but coming from the other end.
Equality, well obviously we're
checking all the characters
at this point.
Contains is more expensive
because we have to work along
and see whether it contains, and
then we'll keep going through
and Matches is the most expensive.
We fire up a regular
expression engine for this.
This is going to take
up a lot more time than,
for example, a begins
with query.
If you add case and
diacritic insensitivity --
that's the square bracket c, d --
that increases the
cost even more,
so it's always worth
trying to work
out whether this is really
what you want to be doing.
If you're writing a
dictionary app, for example,
and you allow the user
to search in that,
which is what they probably
want to do in a dictionary,
what they're probably
looking for is a word
that matches the first few
characters that they've typed.
You probably don't need
to do a matches query.
What you really want
is a begins with query.
That will be much faster, give
the results much more quickly,
so that's simple
predicate optimization,
but what else can we do?
Well I talked to you about case
and diacritic insensitivity
being more expensive.
How can we deal with that?
And again, let's come
back to this idea
of not duplicating information,
and how sometimes actually duplicating
information is a good thing.
We can use a canonicalized
search, so in addition
to a text property that
maybe supports case
and diacritic insensitivity,
so maybe you've got
different cases.
You've got lots of
different marks.
Maybe, hopefully, you've
localized your apps
for multiple languages and
the user's typing all sorts
of stuff, but when they search
they want to just be able
to type a few characters easily.
We can save a second
text property at the time
that you update the
localized text that has all
that stripped out,
so you normalize it.
You remove all the case
and the diacritic marks
and you store it in, you know,
a canonicalized property
in the same entity.
You can then use a normalized
query, a square bracket n,
and pass in a normalized
version of the query term
that the user's typed.
So for example, if
they typed t-h-e,
it should match maybe
capital T, h, and then e acute
or something, and that will
happen much more quickly
than having to fire up whatever
we need to do to do the case
and diacritic insensitive
search,
so that's another way
to speed things up.
If that's not enough, then you
might consider using tokens,
so rather than maintaining
a separate attribute,
we actually maintain an
entirely separate entity
and in this case, this is a
journal app, so I allow the user
to have a journal
entry with a date
and then they can type
whatever they want in there,
but to make it as fast as
possible to search I'm going
to create a second entity that
has a many-to-many relationship
between these two, that contains
all the tokens whenever they set
a string for the text
in my journal entry
and that will be a normalized
token of each word in the entry,
and we can get those tokens
out by calling something
like components separated
by characters in a set.
And maybe you want to separate
by whitespace, by symbol,
punctuation, maybe a
combination of all of those.
Maybe in, you know, an
engineering app you should think
about whether symbols
are important to you,
but essentially you're
separating out tokens.
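A minimal Swift sketch of producing those normalized tokens; which separator sets you combine is an app-specific choice:

import Foundation

func normalizedTokens(from text: String) -> [String] {
    // Strip case and diacritics once, at save time, then split into tokens
    // that can be stored on the separate token entity.
    let normalized = text.folding(options: [.caseInsensitive, .diacriticInsensitive],
                                  locale: .current)
    let separators = CharacterSet.whitespacesAndNewlines
        .union(.punctuationCharacters)
        .union(.symbols)
    return normalized.components(separatedBy: separators).filter { !$0.isEmpty }
}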
And this means that you've
got tokens stored separately
from the text and
you can now query
against those using
a begins with query
because if the user's
typing something,
it's probably the beginning of
a word that they're looking for,
and you can go straight
out to the token entity.
Find the matching ones, and
then immediately get the related
journal entries that
are tied to that,
so that's even faster again.
If it still doesn't
quite give you enough,
then consider using a separate
stack, and I'm not just talking
about a different
managed object context
and persistent store
coordinator.
I'm talking about a separate
store file all together,
so you've got a completely
separate system.
You've got your search
tokens stored on one side
and you can query against that
without having any
effect whatsoever
with the primary context on
the other, and if you do that,
obviously you cannot have a
relationship between one object
in one store and an
object in the other store,
so use the URI representation --
you can call that on your
managed objects -- and store the
result in the secondary
persistent store stack.
When you need to pull the
results out, you can then get
that and then query
for objects that match.
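A minimal sketch of that bridging, with the primary stack's coordinator and context as assumed parameters:

import CoreData

func searchURI(for entry: NSManagedObject) -> URL {
    // Store this URL alongside the token in the separate search store.
    return entry.objectID.uriRepresentation()
}

func resolveEntry(from uri: URL,
                  coordinator: NSPersistentStoreCoordinator,
                  context: NSManagedObjectContext) -> NSManagedObject? {
    // Back in the primary stack, turn the URI into an object ID and then
    // into the matching journal entry.
    guard let objectID = coordinator.managedObjectID(forURIRepresentation: uri) else {
        return nil
    }
    return context.object(with: objectID)
}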
If that still doesn't give
enough, then you might need
to consider having a hash
table in memory, and for this,
you would create a
table for the first three letters
of every token that you have.
That's not that many items,
and you can keep that in memory
and then as the user
starts typing something --
well the first three letters are
going to limit it quite heavily,
so then you can go out and
get the related tokens,
the related journal entries
and that will speed
things up considerably.
So this is a debugging
and performance talk.
Let's have a quick
chat about iCloud.
Most of the content
here was discussed
in great detail this morning,
so if you missed a session,
catch up with a video.
We've added a lot of information
to help you debug problems
that you might find in your app.
In particular we've added the
debug gauges, so we saw earlier
that you can use
these to find
out how much memory an
app is using very quickly.
The iCloud gauge will give
you the usage that you've got,
how much space is
left so you can check
that everything is
connected properly.
You'll get a little graph
of transfer activity,
so as information goes up to
the Cloud, you get a green bar.
As information comes
down, you get a blue bar,
so that will help you check
that information is being pushed
to the Cloud and
pulled in correctly.
It will also give you a summary
of your documents directory
with all the information that's
in there, so you can see,
you know, when files are
changing and when
they're being created.
We've also added a logging
option, so in addition
to being able to see what Core
Data's doing underneath the
hood, you can also see what
the ubiquity system is doing.
Very similar to before,
com.apple.CoreData.
Ubiquity.LogLevel, and
again it takes a value
of one, two or three.
Specify this in the
same way as before,
in your launch arguments
or in your user defaults,
and you'll get a whole
bunch of logging.
If you do run into
issues, enable the logging.
File a bug, and that
helps us look
into whether it's a
problem, so that's Core Data.
Let's recap what
we've talked about.
Above all, don't work too hard.
Measure everything first.
Use Instruments to tell
you what's going on.
Get an objective measure of
problems that you're seeing.
If something's taking a long
time, see how long it's taking.
Wherever possible don't
bring more information
into memory than
you really need.
Use SQLite to do
your work for you.
If you can perform some
kind of aggregate operation,
maybe a minimum value
in the database,
have it do that for you.
Don't bring all the objects in,
and then calculate something.
Measure again.
See every change that you make.
Make sure that you're
making those bars thinner
and things are taking less time.
Balance memory against speed.
Don't use too much memory,
but don't slow things
down by completely minimizing
the memory all together
and then you're doing
a fetch request
of one object multiple times.
Optimize your predicates.
Make sure you're doing the
right thing with text queries.
Optimize your fetches.
Use batch size.
Optimize your saves so that
you're not saving too much.
Use batches.
Measure again.
Profit. And for more information, see
the usual suspects: Dave DeLong.
He's our frameworks evangelist.
We have some documentation,
and the dev forums
are there to help you.
Thank you very much.
[ Applause ]
[ Silence ]