WWDC2018 Session 407

Transcript

[ Music ]
[ Applause ]
>> Good afternoon everyone.
My name is John Hess.
Today I'm going to be joined by
Matthew Lucas, and we are going
to be talking to all of you
about practical approaches to
great app performance.
Now, I'm an engineer on the
Xcode team, and I've had the
luxury of spending the last
several years focused on
performance work.
First, with Project Find, and
Open Quickly, two areas of Xcode
that treat performance as the
primary feature.
Most recently, I've had the
opportunity to do a survey of
Xcode UI responsiveness, and I
want to share with you the
approaches that I take to
performance work, both in code
that I'm intimately familiar
with, and in code that I'm just
experiencing for the first time.
Now, if I could get everyone in
today's presentation to just
take one lesson away, it is that
all of your performance work
should be based on measurement.
Before you start solving a
performance problem, you should
measure, to establish a baseline
so you know where you stand.
As you iterate on solving a
performance problem, you should
measure it each step of the way
to ensure that your performance
changes are having the impact
that you expect.
When you're done solving a
performance problem, you should
measure again, so that you can
compare to your original
baseline, and make a quantified
claim about just how much you've
improved the performance of your
application.
You want to share this with your
boss, your colleagues, and your
users.
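
The measure, change, measure again loop described above can be sketched
in a few lines of Swift. This is only a minimal illustration: the helper
names `medianSeconds` and `speedup` and the two workloads are
hypothetical, not anything from Xcode or Instruments.

```swift
import Foundation

/// Runs `work` several times and returns the median wall-clock time in seconds.
/// The median is less sensitive to one noisy run than the mean.
func medianSeconds(iterations: Int = 5, _ work: () -> Void) -> Double {
    precondition(iterations > 0)
    var samples: [Double] = []
    for _ in 0..<iterations {
        let start = DispatchTime.now().uptimeNanoseconds
        work()
        let end = DispatchTime.now().uptimeNanoseconds
        samples.append(Double(end - start) / 1_000_000_000)
    }
    samples.sort()
    return samples[samples.count / 2]
}

/// Turns two measurements into a quantified claim such as "2.0x faster".
func speedup(baseline: Double, improved: Double) -> Double {
    return baseline / improved
}

// Hypothetical workloads: summing after a redundant sort versus summing directly.
let values = (0..<50_000).map { ($0 * 7919) % 10_007 }
let baseline = medianSeconds { _ = values.sorted().reduce(0, +) }
let improved = medianSeconds { _ = values.reduce(0, +) }
print("baseline: \(baseline)s, improved: \(improved)s")
print("speedup: \(speedup(baseline: baseline, improved: improved))x")
```

The numbers printed will vary from machine to machine and run to run,
which is exactly why the baseline has to be recorded before any change
is made.
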
Now, when you think about
improving performance for your
users, you need to think about
what I like to call the total
performance impact.
If you improve the functionality
and performance of one area of
your application, by 50%, but
it's something that just 1% of
your users encounter, that does
not have nearly the breadth of
impact as improving some other
feature by just 10% that all of
your users use all the time.
So make sure you're not
optimizing edge cases, and make
sure that your changes are
impacting all of your users.
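
As a rough worked example of that comparison, one could multiply the
share of users who hit a feature by the size of the improvement. The
function `totalImpact` is a hypothetical simplification; it ignores how
often each user runs the feature.

```swift
/// A rough "total performance impact" score: the share of users who hit a
/// feature multiplied by the fractional improvement to that feature.
func totalImpact(userShare: Double, improvement: Double) -> Double {
    return userShare * improvement
}

// 50% faster, but only 1% of users ever see it.
let edgeCase = totalImpact(userShare: 0.01, improvement: 0.50)
// Only 10% faster, but every user sees it.
let commonCase = totalImpact(userShare: 1.00, improvement: 0.10)

print("edge case: \(edgeCase), common case: \(commonCase)")
```

Under this simple model the 10% improvement for everyone is worth
twenty times the 50% improvement for the 1%.
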
Now how do we fix performance
bugs?
Well, how do we fix regular
bugs?
Normally it starts with some
sort of defect report from
users, and we take this report
of the application not behaving
the way that people expect, and
we find some way to synthesize
steps to reproduce so that we
can cause the failure at will.
Once we've done this, we attach
a debugger to our program, so
that we can see just what our
program is doing while it is
misbehaving.
We combine that with our
knowledge of how the code is
supposed to work, to modify it
as necessary and eliminate the
undesired behavior.
We verify that we haven't
introduced any unwanted side
effects, and we repeat as
necessary until we've completely
solved the bug.
I fix performance bugs in
just the same way.
Except instead of using a
debugger, I use a profiler, and
a profiler is just a fancy tool
for measuring.
I find some set of steps to
reproduce the program being
slow.
And I run those steps with a
profiler attached, so that I can
get an insight into what my code
is doing while it's running
slowly.
I combine that knowledge with
what my program has to do to
accomplish the task at hand, and
I find steps that are happening
and remove them, because the
primary way you make your code
faster is you remove redundant
steps from whatever it is that
it's calculating.
Now, I make the modifications to
the source code, and I repeat
and measure as necessary until
I'm happy with the total result.
When I'm doing this type of
performance work, I often find
myself in one of a handful of
scenarios.
And these different scenarios
change the way that I go about
testing the code in question to
reproduce the bugs.
Sometimes I'm up against a big
performance regression, right?
Everything was moving along
smoothly, then someone checked
something in on our team, maybe
it was me, and performance has
fallen through the floor, and
now we have to go back and find
out what caused this regression.
If this regression is very
pronounced, or it's in an area
that I don't think it's likely
to regress again in the
immediate future, I may just
test it with my hands, manually,
with the profiler attached.
However, your performance
victories are going to be
hard-won battles, and they can
easily be lost through a slow
stream of regressions.
I would encourage all of you to
write automated performance
tests to capture your app's
performance, so that you can
ensure that it's not regressing
over time.
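
In an Xcode project, XCTest's `measure` block is the usual home for a
test like this. The core idea, comparing a fresh measurement against a
recorded baseline with some tolerance for noise, can be sketched
without any framework. The type `PerformanceCheck`, the function
`check`, and the 10% default tolerance are all assumptions made for
illustration.

```swift
/// Result of comparing a new measurement against a recorded baseline.
enum PerformanceCheck: Equatable {
    case ok
    case regressed(byFraction: Double)
}

/// Flags a regression when `measured` exceeds `baseline` by more than
/// `tolerance` (0.10 accepts up to 10% slower, to absorb run-to-run noise).
func check(measured: Double, baseline: Double, tolerance: Double = 0.10) -> PerformanceCheck {
    let slowdown = (measured - baseline) / baseline
    return slowdown > tolerance ? .regressed(byFraction: slowdown) : .ok
}

// Hypothetical timings in seconds.
print(check(measured: 0.105, baseline: 0.100))   // within tolerance
print(check(measured: 0.200, baseline: 0.100))   // twice as slow
```
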
Another scenario I often find
myself in is that our application
is performing the same as it has
been for a long time.
Maybe it is running at 45 frames
a second in some drawing test,
but we expect it to run at 60.
It needs to be improved
marginally, and we have reason
to believe through our previous
performance work that we can get
there through spot fixes and
incremental changes.
Now, in this type of scenario, I
probably also have automated
tests already in play, because I
understand my performance over
time.
And a third scenario, our
application is just suffering
from a poor design and
performance is orders of
magnitude worse than it should
be.
We know that we can't improve it
with simple spot fixes, because
we've tried them in the past,
and we are still stuck here with
a very sub-par performance.
In a situation like this, you'd
want to do a total performance
overhaul, where you are
redesigning some core part of
the feature, or the algorithms
in question, so that performance
is a primary constraint.
And definitely in these cases,
you would have performance tests
to measure that you're actually
hitting your performance
targets.
Now, it is important that you
know just what to test.
I want to caution you that I
don't ever immediately jump to
these sort of performance
overhauls as a way of fixing a
performance problem.
I love to do that.
It's sort of greenfield
engineering, where you get to
design things from the ground
up, but it's very risky.
You're going to end up with a
better product at the end, but
it's going to be a turbulent
path getting there as you rework
an entire feature.
When you're doing this style of
work, it is imperative you
understand not only the
functional constraints of the
code in question, but also the
performance constraints, and the
typical use patterns that your
users are most frequently
applying to this feature, and
you only get that by having done
performance work in the area in
the past.
I'd like to share an anecdote
about our work on a situation
like this, within Xcode.
In Xcode 9, we reworked Project
Find, with performance as a
primary goal.
It was our goal to deliver
search results in just tens of
milliseconds.
When we were going to discuss
this feature with our
colleagues, we were often
challenged to perform searches
across large projects for things
like string, or even the letter
E.
Things that produce millions of
results, right?
And certainly if our application
could produce millions of
results quickly, it would be
fast on anything.
But if you consider what typical
patterns are, we search for APIs
we use, the names of our own
classes, the names of, you know,
images that we're referencing.
Things like that.
They produce dozens, maybe
hundreds of results.
Certainly, it is essential that
the application works decently
when you get a million results,
but the normal use case is
hundreds of results.
Now, some of your work in doing
a task like search is going to
be proportional to things like
generating the raw results, and
other work is going to be based
on how efficiently you can index
the text in the project, and
avoid work in the first place.
In these two scenarios, you're
likely to have completely
different targets for what you
would optimize to make one of
these searches faster than the
other, right?
So it's essential that you
understand how your users are
going to use the product, so
that you can optimize for the
right cases.
Now, in all of these cases, I
need to do some form of testing,
whether it's manual, or
automated.
I want to share with you two
types of performance tests that
I will typically write to
measure the performance of
Xcode.
We will either do unit tests, or
integration tests.
Let's compare and contrast them.
In a performance unit test, it's
your goal to isolate some
feature of your application and
measure it all by itself.
You might mock out its
dependencies, and you might
launch it in a context where it
has been isolated.
If I were to write performance
unit tests for Xcode's code
completion, I might write a
series of three small tests.
One of these tests would measure
talking to the compiler and
getting the raw results, the raw
set of code completion
candidates back.
Another performance test would
measure correlating, ranking and
scoring those results, so we
knew which ones to display to
the user.
A third test might take those
already prepared results, and
measure putting them into UI
elements for final display.
And in covering all three of
these areas, I would have pretty
good coverage over the major
components of code completion in
the IDE.
Now, there are some great
aspects to these performance
unit tests.
They're going to be highly
focused, which means if they
regress in the future, I'm going
to have a very good idea on
where the regression is, because
the code that is running has
been scoped so well.
They are also going to produce
much more repeatable results
from run to run.
They're not going to have a big
variance in the times that they
produce.
Again, because the code is so
focused.
Now, let's contrast that to an
integration test.
In an integration test, your job
is to measure the performance of
your application as your users
experience it.
Holistically.
So, if I was writing code
completion unit tests for Xcode,
I'm sorry, integration tests, I
would launch the full Xcode app.
I would open a source file.
I would navigate to the source
file, and I would type, and I
would bring up code completion
over and over again.
When I profile this, to see what
Xcode is doing, and how much
time it is taking, I am going to
find that this test is anything
but focused and quiet.
Xcode is going to be doing
drawing and layout as I type.
It is going to be doing syntax
coloring as I type.
In the background, it might be
indexing, fetching Git status,
deciding to show new files in
the Assistant Editor,
and all of these things are
going to be competing for CPU
resources, along with code
completion.
Maybe when I look in the
Profiler, I'll see that we spend
80% of our time syntax coloring,
and 20% of our time in code
completion.
And with this data, I would know
that the best way to improve
code completion performance
would be to defer syntax
coloring.
I will never gain that type of
knowledge with a highly focused
unit test.
So if I can get everyone here to
take two things away from this
presentation, the second one
should be that your performance
investigations should absolutely
start with these wide
integration tests that measure
how the users experience your
application.
So I'm talking about testing,
measuring and profiling.
And right now, I'd like to
introduce you to profiling in
Xcode with Instruments.
Let's head over to the demo
machine.
Today we are going to be looking
at a performance problem that we
fixed between Xcode 9 and Xcode
10.
I want to show it to you.
I'm going to launch Xcode 9, and
open our solar system
application.
Now the problem that we are
going to be looking at is
creating tabs.
I'm going to just press
Command-T quickly a couple of
times, and as you can see, the
whole screen flashes black, and
it takes several seconds to
create those tabs.
That definitely doesn't meet my
expectations as far as
performance goes, and we need to
fix this.
So let's take a look at how you
would do that.
First, I'm going to launch
Instruments.
That is our profiling tool.
You can do that from the Xcode
menu, under Open Developer Tool,
Instruments.
Now, I'm currently in Xcode 9,
so if I choose this, it's going
to launch the Instruments from
Xcode 9, and of course, I want
the Instruments from Xcode 10,
which I've put here in my Dock.
So I'm going to hide Xcode, and
bring up Instruments.
Now, when Instruments launches,
we're presented with a list of
profiling tools that we could
use to measure our application.
There's all kinds of tools here.
They can measure graphics
utilization, memory consumption,
IO, and time in general.
It can be intimidating to know
which one of these profilers to
start with.
I would encourage all of you, if
you just learn one of these
tools, it should be the Time
Profiler.
I use it for 95% or more of my
performance work.
When your users complain about
your app being slow, they're
complaining about it taking too
long, and long is time.
If it turns out that you're slow
because you're doing too much
IO, that is going to correlate
with time, and you will be able
to see this with the Time
Profiler.
So if you learn just one
instrument, it should be the
Time Profiler.
Let's take a look at how that
works.
I'm going to launch the Time
Profiler by just double clicking
on it here, and make Instruments
take the full desktop.
Now, we'd like to record Xcode.
In the upper left-hand corner of
the Instruments window, you can
control which process you're
going to attach to and record.
By default, hitting this record
button would record all
processes on my Mac.
I just want to focus on Xcode.
I'll switch this popover to
Xcode and hit record.
Now, I like to keep an eye on
this area of the window, the
track view, while I'm recording.
So I'm going to resize the Xcode
window to be a little shorter,
so I can still see that, and
then I'm going to do the thing
that was slow.
I'm going to create a couple
more tabs.
And you can see the graph
changed here.
Now, I'm going to go ahead and
quit, and return to Instruments.
So what just happened?
While the Profiler was running,
it was attached to our process
like a debugger.
And it stopped it, thousands of
times per second, and as it was
stopping it, it gathered back
traces.
Now, just a reminder, a back
trace is a description of how
your program got to where it
currently is.
So if you're on line 6 of
function C and you got there
because main called A, called B,
called C, then your back trace
is Main, A, B, C.
When Instruments captures one of
these back traces, it notes,
hey, we just spent one
millisecond in function C.
It says one millisecond, because
that is our sampling interval:
we are recording once every
millisecond.
Now, on the main thread, all
these back traces are going to
start with the Main function,
and they're probably going to
call Application Main, and
they're going to branch out, all
through your source code after
that.
We can collapse these back
traces together, and overlay
them into a prefix tree, so they
start at Main and work their way
out.
And we can bubble up those
millisecond counters that we
captured at the top, so that we
can hierarchically see how much
time was spent in all the
different areas of our source
code.
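
The prefix tree just described can be sketched directly. This is a toy
model, not how Instruments is implemented: `CallNode`, `buildCallTree`,
`printTree`, and the sample backtraces are all hypothetical, and each
sample is assumed to stand for one millisecond.

```swift
/// One node in a call tree: a function name plus the number of samples
/// whose backtrace passed through it.
final class CallNode {
    let name: String
    var sampleCount = 0
    var children: [String: CallNode] = [:]
    init(name: String) { self.name = name }
}

/// Overlays backtraces (outermost frame first) into a prefix tree.
/// Each sample adds one to every node along its path, so a parent's count
/// is the total for everything beneath it.
func buildCallTree(from backtraces: [[String]]) -> CallNode {
    let root = CallNode(name: "<root>")
    for trace in backtraces {
        root.sampleCount += 1
        var node = root
        for frame in trace {
            let child = node.children[frame] ?? CallNode(name: frame)
            node.children[frame] = child
            child.sampleCount += 1
            node = child
        }
    }
    return root
}

/// Prints the tree, heaviest children first, at 1 ms per sample.
func printTree(_ node: CallNode, indent: Int = 0) {
    print(String(repeating: "  ", count: indent) + "\(node.sampleCount) ms  \(node.name)")
    for child in node.children.values.sorted(by: { $0.sampleCount > $1.sampleCount }) {
        printTree(child, indent: indent + 1)
    }
}

// Hypothetical samples, one captured per millisecond.
let samples = [
    ["main", "A", "B", "C"],
    ["main", "A", "B", "C"],
    ["main", "A", "B"],
    ["main", "D"],
]
let tree = buildCallTree(from: samples)
printTree(tree)
```
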
And we are going to look at this
data to try and find redundant
and unnecessary operations that
we can make faster, and that is
our primary method that we are
going to use to improve the
performance of our application.
Now, as you can imagine, we're
capturing thousands of back
traces per second.
There is an overwhelming amount
of data for you to wade through
in Instruments.
My primary advice to you is that
you want to filter this data as
much as possible so that you can
see the coarse-grained performance
leads, and not focus on minutia.
All right?
So I want to show you how to
apply a bunch of powerful
filters in Instruments.
So as I did the recording, you
remember, I had the track view
visible.
I did that because I wanted to
see how the CPU utilization
changed and where it was
changing, while I was creating
new tabs, and I noted to myself
that it was right here.
I simply dragged and selected
over that area of the trace, and
I've caused Instruments to only
focus its back trace data on
just that time interval.
Everything over here, this is
before I was creating tabs.
Everything over here, this is
after I was creating tabs, when
I was quitting the application.
That's not what I'm trying to
optimize right now, so I don't
need to see that data.
Now, in the bottom area of the
Instruments window, Instruments
is showing me all the traces it
collected.
By default, there is one row per
thread that was running.
And in this example it looks
like there were only four threads
running.
Sometimes you'll have much more.
Depends on how concurrent your
application is.
I often like to collapse these
in the name of focusing, and I
also like to collapse them so
they're based on the top level
functions executing in each of
the threads, rather than the
thread IDs, because that
corresponds better with how I
use Grand Central Dispatch.
Down in the bottom of the
Instruments window, I'm going to
click on this button that says
Call Tree, and I'm going to zoom
in on it, so you can see what
I'm about to do.
There are several filters
available here.
One of them is separate by
thread.
It is on by default.
I am going to go ahead and
disable that, and instead, all
of the threads are going to be
grouped by their top level entry
point, rather than their thread
ID.
Now, looking at this trace, I
can see that of all these
threads running, which by the
way, below the main trace, which
is the aggregate CPU usage, the
CPU usage is broken down per
thread, I can see that almost
all the other threads were
largely inactive during this
trace.
I can focus on just the main
thread by selecting it here, and
now I'm only looking at traces
from the main thread during this
time period.
I'm ready to start digging into
this call hierarchy, so I can
see what my application was
doing.
Often, I'll walk this with the
keyboard, by just pressing right
arrow and down, over and over
again.
But I'd like to show you the
heaviest back trace inspector
that Instruments offers.
If your Inspector is not
visible, you can toggle it with
this button, and the heaviest
back trace will be available
here, in this tab, Extended
Detail.
Now, the heaviest back trace is
just the trace that occurred
most frequently.
It's the back trace that
happened most frequently while
we were recording under the
current selection.
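
Taking that description literally, the heaviest backtrace is the one
that appears most often among the selected samples. A minimal sketch,
assuming backtraces are plain arrays of function names; the function
`heaviestBacktrace` is hypothetical and is not an Instruments API.

```swift
/// Returns the backtrace that appears most often, together with its count.
/// Ties go to the lexicographically smallest path so the result is deterministic.
func heaviestBacktrace(in backtraces: [[String]]) -> (frames: [String], count: Int)? {
    var counts: [[String]: Int] = [:]
    for trace in backtraces {
        counts[trace, default: 0] += 1
    }
    let best = counts.max(by: { lhs, rhs in
        if lhs.value != rhs.value { return lhs.value < rhs.value }
        return lhs.key.joined(separator: ">") > rhs.key.joined(separator: ">")
    })
    return best.map { (frames: $0.key, count: $0.value) }
}

// Hypothetical samples.
let recorded = [
    ["main", "restoreState", "expand"],
    ["main", "draw"],
    ["main", "restoreState", "expand"],
]
if let heaviest = heaviestBacktrace(in: recorded) {
    print(heaviest.frames.joined(separator: " > "), heaviest.count)
}
```
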
And you can use this to quickly
navigate many frames deep at a
time.
I typically look through here,
looking for my own APIs, and
things that would surprise me
for taking up this amount of
time, or for areas where we make
a significant branching point in
the number of samples.
Now, looking through here, I see
this call, which is to IDE
Navigator, replacement view, did
install view controller.
Now, I'm familiar with this API,
because it's an internal API of
Xcode.
And in the trace, I can see over
here on the left-hand side of
the window that it is
responsible for 1.19 seconds of
the total time we're recording,
or 45% of the time.
That is far and away above my
expectations for how much this
method should cost.
However, it's hard to focus on
what is happening here.
Right? I'm, there is all this
other stuff at the bottom of the
trace, and it looks like I'm,
you know, 30 or 40 stack frames
deep.
That can be intimidating.
I want to show you how to focus.
The first technique is back here
in that call tree popover again.
I'm going to use this popover to
choose Flatten Recursion.
Let's go ahead and do that.
And now you can see that, that
repeated set of method calls
that was right here, oops, has
been collapsed.
I'm sorry, let me scroll down.
That has been collapsed.
In fact, I'm confident that I
want to continue my performance
investigation inside of this IDE
Navigator area, API call, and I
can refocus the entire call tree
by context-clicking here, and
choosing Focus on Subtree.
And Instruments is going to take
that symbol up to the top of the
call graph, it's going to remove
everything else, and it is going
to reset the percentages at 100%
so I can focus on just this.
Now, I can continue to walk this
sample with the arrow keys to
see what we're doing.
And I'm familiar with these
APIs.
And it looks like we're doing
state restoration.
And as I continue to expand
this, I can see that we are sort
of deep inside the table view,
and in addition to there being
this sort of hot call path, you
know, that is taking large
number of the total percentage,
there's all these other
incidental samples as well.
It's easy to get distracted by
these.
One of them here is
objc_msgSend.
This can occur all over your
traces if you're writing
Objective-C.
Even if you're writing Swift
code, as you work your way into
the system libraries, you'll see
this.
You'll often see its counterpart
functions, objc load strong,
load weak, etc., retain. You can
remove all that content from the
call tree by context-clicking on
it, and choosing Charge libobjc to
Callers.
That's going to tell Instruments
to take all the samples that
came from libobjc and remove
them from the call data, but
keep the time as attributed to
the parent frames that called
them.
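
The effect of charging a library to its callers can be modeled as
dropping that library's frames from each backtrace while keeping the
sample itself. This is a sketch of the idea only; the `Frame` type, the
`chargeToCallers` function, and the symbol names are hypothetical.

```swift
/// A single stack frame: which library it lives in and its symbol name.
struct Frame: Equatable {
    let library: String
    let symbol: String
}

/// Removes frames belonging to `library` from every backtrace.
/// No sample is discarded, so its 1 ms is still counted against
/// whichever caller frame remains deepest in the trace.
func chargeToCallers(_ backtraces: [[Frame]], library: String) -> [[Frame]] {
    return backtraces.map { trace in
        trace.filter { $0.library != library }
    }
}

// Hypothetical sample: app code calling into the Objective-C runtime.
let before = [[
    Frame(library: "MyApp", symbol: "main"),
    Frame(library: "MyApp", symbol: "restoreState"),
    Frame(library: "libobjc", symbol: "objc_msgSend"),
]]
let after = chargeToCallers(before, library: "libobjc")
print(after[0].map { $0.symbol })
```
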
I tend to treat those Objective-C
runtime functions as just the
cost of doing business when
writing Objective-C code.
It's rarely the case that I'm
going to attempt to optimize
them out, so I just prefer to
remove them from the data, so I
can focus on the things that I'm
likely to take action on.
Another very powerful filter
that you can apply, and one that
I'm going to use to remove all
these small samples that
occurred during this set of
frames, is here in the call tree
constraint section.
Let me show you.
I'm going to tell Instruments
that I would only like to see
areas of the trace that
accounted for let's say 20 or
more samples.
I'm picking 20 because I know
that I've selected about a two
second interval and 20
milliseconds is going to
represent about 1% of the total
work, and that is about the
granularity that I like to work
at by default.
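
The arithmetic behind picking 20 can be written out. With one sample
per millisecond, a two second selection holds about 2,000 samples, and
1% of that is 20. The helper `minimumSamples` is hypothetical.

```swift
/// Number of samples that corresponds to `fraction` of a selected interval,
/// given the sampling interval (1 ms in this recording).
func minimumSamples(selectionSeconds: Double,
                    fraction: Double,
                    samplingIntervalSeconds: Double = 0.001) -> Int {
    let totalSamples = selectionSeconds / samplingIntervalSeconds
    return Int((totalSamples * fraction).rounded())
}

let threshold = minimumSamples(selectionSeconds: 2.0, fraction: 0.01)
print(threshold)
```
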
So with call tree constraints
set to a minimum of 20, I now
focus this down much more
significantly.
Now, I mentioned here that we
were expanding out my view
items.
I see that in the fact that
we're calling NSOutlineView
expandItem:expandChildren:.
Now, a lot of people would stop
with the call graph at this
point.
They'd see I'm calling into a
system framework, and I'm
spending a lot of time there.
This isn't my fault, right?
What can I do about this?
I can't optimize NSOutlineView
expandItem.
You absolutely have the power to
influence these situations.
For example, the system
framework could be spending all
of this time because it's
operating on data that you
provided it.
It could be taking a lot of time
because you are calling this
method thousands or millions of
times.
It could be taking a lot of time
because it's calling back into
your code through delegation.
And most importantly, you can
get an insight into what the
system framework is doing by
expanding down through the
Instruments tree, and looking at
the names of functions that are
being called.
In fact, that's exactly how I
learned to fix this bug.
As I expand the trace into the
outline view, I can see that it
is calling these two methods
here.
Batch Expand Items with item
entries, expand children, and do
work after end updates.
Now, those are big clues to me
that there is probably some
opportunity for efficiency
through batching.
As you could imagine, the
outline view starts with a small
set of items, and then we are
trying to restore expansion
state in this area of our code,
and so we are telling it to
open, for example, the top item.
And when we tell it to open the
top item, internally you might
imagine that it moves all the
other items down.
Then you ask me to expand the
second item.
It moves all the items down
again.
And the third item, and so on.
And by the time you're done,
you've moved those bottom items
down thousands of times.
That is all redundant work, and
that is exactly the sort of
thing I'm looking to eliminate
when I'm trying to improve
performance.
Now the fact that these method
names talk about batching leads
me to believe that there is
probably some API where I can
ask the outline view to do the
work in bulk so it computes all
the positions just once, instead
of over and over again as I make
the calls.
I also see a call that says to
do the work after end updates.
Now, sometimes an API will offer
a sort of bulk method that
operates on an array, and other
times, it will offer a sort of
transactional API that says I'm
going to begin making changes,
then you make a bunch of
changes, and then you say you're
done, and it computes something
that happened for the whole
range of your changes, more
efficiently than if it had done
them all individually.
So at this point, I would head
over to the NSOutlineView, or
NSTableView API, and I would
look for some such method.
And there is exactly one there.
In NSTableView, there are
methods for beginning and ending
updating, that allow the table
view to coalesce, and make all
this work significantly more
efficient.
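
In AppKit the methods in question are `beginUpdates()` and
`endUpdates()` on NSTableView. Why a transactional API helps can be
shown with a toy model that needs no UI framework. `ToyOutline` below
is a hypothetical stand-in and is not how NSTableView works
internally; it only counts how many layout passes each calling pattern
costs.

```swift
/// A toy stand-in for a view that must recompute row positions after changes.
/// This is NOT NSTableView; it only models the begin/end-updates idea.
final class ToyOutline {
    private(set) var expandedItems: Set<Int> = []
    private(set) var layoutPasses = 0
    private var updateDepth = 0
    private var needsLayout = false

    func beginUpdates() { updateDepth += 1 }

    func endUpdates() {
        precondition(updateDepth > 0, "endUpdates without matching beginUpdates")
        updateDepth -= 1
        // Only the outermost endUpdates pays for layout, and only once.
        if updateDepth == 0 && needsLayout { layout() }
    }

    func expand(_ item: Int) {
        expandedItems.insert(item)
        needsLayout = true
        // Outside a transaction, pay for layout immediately.
        if updateDepth == 0 { layout() }
    }

    private func layout() {
        layoutPasses += 1   // stands in for repositioning every row below
        needsLayout = false
    }
}

let oneAtATime = ToyOutline()
for item in 0..<1_000 { oneAtATime.expand(item) }

let batched = ToyOutline()
batched.beginUpdates()
for item in 0..<1_000 { batched.expand(item) }
batched.endUpdates()

print(oneAtATime.layoutPasses, batched.layoutPasses)
```

Both objects end up with the same expanded items; the batched one just
gets there with a single layout pass instead of a thousand.
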
Of course, we adopted that in
Xcode 10.
Let me show you.
I'm going to launch Xcode 10.
I'm going to open the solar system
application, and I'm going to
create a couple of tabs.
And you can see, there is no
awful flashing, and the tabs
open much more quickly.
Now, I'd like the tabs to open
even quicker than that, right?
So what am I going to do next?
I got lucky here.
It's not every day that you're
going to go into the trace, and
find something so obvious and
easy to fix, that is responsible
for 50% of the sample.
Right? In fact, there is not
going to be any other huge lead
sitting there waiting for me.
Instead, what I'm going to need
to do is go through that whole
sample, with those coarse
filters applied, so I'm only
looking at operations that take
about 1% of the time or more,
and I'm going to look for every
single thing that I see that I
think I can come up with some
mechanism for making a little
bit faster.
I'm going to note them all down
on a piece of paper or in a text
document or something, and then
I'm going to start solving them.
Now, I need to pick an order to
solve them in, right?
Because sometimes the fifth
thing on the list, fixing it
will obsolete whatever fix
you would do for the second
thing on the list, and it feels
bad to do them in the wrong
order, such that you did
redundant work, because that's
the whole thing we're trying to
remove in the first place, is
redundant work.
But it's very hard to predict
how these things are all going
to play out.
And you often can't know until
you've already done the work.
So do not let this stop you from
getting started, because you're
going to get your second 30%
improvement by stacking 10 3%
improvements.
Okay?
Now, I want to go back to the
slides, and show you some of the
techniques we typically use to
make those continued
improvements.
Far and away, the thing that
comes up the most frequently is
using those same techniques the
outline view was using.
Batching and deferring, right?
You have an API, and when the
API is called, it has some side
effect.
And then you have some code
calling your API in a loop.
So on every call you're doing the
primary piece of work that is
being requested, and having a
side effect.
Well, if no one was reading the
result of the side effect, then
you're doing that work
redundantly, over and over
again.
You can often get a much more
efficient interface by using a
batch interface, where a client
gives you an array or some sort
of collection of all the work to
be done, so that you can compute
that side effect just once.
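
A small sketch of the difference between a single-item API and a batch
API. The type `SortedIndex` is hypothetical, and re-sorting stands in
for whatever expensive side effect the real API has.

```swift
/// A toy index whose side effect (re-sorting) is expensive.
struct SortedIndex {
    private(set) var items: [String] = []
    private(set) var sortCount = 0

    /// Single-item API: the sort runs on every call.
    mutating func add(_ item: String) {
        items.append(item)
        resort()
    }

    /// Batch API: the client hands over all the work, so the sort runs once.
    mutating func add(contentsOf newItems: [String]) {
        items.append(contentsOf: newItems)
        resort()
    }

    private mutating func resort() {
        items.sort()
        sortCount += 1
    }
}

let names = ["Venus", "Mars", "Earth"]

var looped = SortedIndex()
for name in names { looped.add(name) }

var batchedIndex = SortedIndex()
batchedIndex.add(contentsOf: names)

print(looped.sortCount, batchedIndex.sortCount)
```
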
Now, sometimes you have many
clients, right?
And they can't batch across each
other, but you can still get
that same
style of performance through
deferring the work and doing it
lazily.
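
One common way to defer work is a dirty flag: each change only records
that work is owed, and the work happens when somebody reads the result.
`LazyTotals` is a hypothetical example, with summing standing in for
the expensive side effect.

```swift
struct LazyTotals {
    private var values: [Int] = []
    private var cachedTotal = 0
    private var isDirty = false
    private(set) var recomputeCount = 0

    /// Any number of independent clients can call this; none of them
    /// triggers the expensive recomputation.
    mutating func append(_ value: Int) {
        values.append(value)
        isDirty = true   // note that work is owed, but do not do it yet
    }

    /// The side effect is computed only when somebody actually reads it.
    mutating func total() -> Int {
        if isDirty {
            cachedTotal = values.reduce(0, +)
            recomputeCount += 1
            isDirty = false
        }
        return cachedTotal
    }
}

var totals = LazyTotals()
for value in 1...100 { totals.append(value) }   // client one
for value in 1...100 { totals.append(value) }   // client two
let sum = totals.total()
print(sum, totals.recomputeCount)
```
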
A third easy way to improve
performance, as you look through
that Instruments trace, is to
find areas where you see the
same thing being computed over
and over again.
For example, you have a method
and it's computing the size of
some text, then you see the same
thing happening several frames
later, for the same text, and
again, and again.
Now, in this situation, of
course, you want to try to just
compute that value one time.
Compute it at the top, and pass
it down or maybe cache it.
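
A minimal caching sketch for the text-size example. `TextSizeCache` is
hypothetical, and the measuring closure passed in below is a made-up
stand-in for real text layout. A real cache would also need to be
invalidated when the font or other inputs change.

```swift
final class TextSizeCache {
    private var cache: [String: Int] = [:]
    private(set) var measurements = 0
    private let measure: (String) -> Int

    init(measure: @escaping (String) -> Int) {
        self.measure = measure
    }

    /// Returns the cached width if this text was measured before;
    /// otherwise measures once and remembers the answer.
    func width(of text: String) -> Int {
        if let cached = cache[text] { return cached }
        measurements += 1
        let result = measure(text)
        cache[text] = result
        return result
    }
}

// Stand-in measurement: pretend every character is 7 points wide.
let sizes = TextSizeCache(measure: { $0.count * 7 })
let first = sizes.width(of: "Solar System")
let second = sizes.width(of: "Solar System")
let third = sizes.width(of: "Solar System")
print(first, sizes.measurements)
```
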
Another technique you have
available in your UI
applications is considering how
many views you are using to
render your UI.
It can be very great for your
source code organization to use
very small views, with small
sets of functionality, and to
compose them together into
larger pieces.
But the more views you use, the
harder you tax the rendering and
layout systems.
Now, this is a two-way street,
because smaller views often let
you have more fine-grained
caching, which can be good for
performance as well.
But generally, you can tweak the
number of views that you have in
order to have a significant
impact on performance.
It is not always best to have
fewer views, otherwise all of
our applications would just have
one giant view for the whole
thing.
Another technique that comes up
pretty frequently is using
direct observation.
We often have two areas of our
source code that are loosely
coupled.
Maybe one area knows about the
other, and they're communicating
with each other through some
indirect mechanism.
Maybe they're using
NSNotificationCenter, some
block-based callbacks,
delegation, or key value
observing.
Now something that I see very
frequently is we'll have some
model code, and it's going in a
loop, being changed, and every
time it goes through that loop,
it is firing lots of KVO
notifications.
You can't actually see that in
the model code, of course, but
over in some other controller,
it's madly responding and trying
to keep up with whatever is
changing in the model, and
you're burning lots of CPU time
doing this, that ends up being
redundant when you consider the
whole scope of changes.
Now, if this was direct callouts
from the model code, either
through notifications,
delegation or manual block-based
callbacks, it would be much
more obvious that this was
happening as you edited that
model code.
And you might decide that it is
totally appropriate to pull some
of those notifications out from
inside the loop to outside the
loop, to have a big impact on
performance.
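
A sketch of what pulling the callout outside the loop looks like when
the notification is a direct callback. The `Model` class and its
method names are hypothetical; a plain closure stands in for whatever
direct mechanism the real code uses.

```swift
final class Model {
    private(set) var values: [Int] = []
    var onChange: (() -> Void)?

    /// Notifies once per element: observers do redundant work during bulk edits.
    func appendNotifyingEach(_ newValues: [Int]) {
        for value in newValues {
            values.append(value)
            onChange?()
        }
    }

    /// Notifies once after the loop. The callout is visible right here
    /// in the model code, so the cost is easy to see and to move.
    func appendNotifyingOnce(_ newValues: [Int]) {
        for value in newValues {
            values.append(value)
        }
        onChange?()
    }
}

var refreshes = 0
let model = Model()
model.onChange = { refreshes += 1 }   // stands in for a controller refreshing its UI

model.appendNotifyingEach(Array(1...100))
let refreshesInsideLoop = refreshes

refreshes = 0
model.appendNotifyingOnce(Array(1...100))
let refreshesOutsideLoop = refreshes

print(refreshesInsideLoop, refreshesOutsideLoop)
```
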
Now, alternatively, on the
controller side, you could use
one of these deferring and
batching techniques to avoid the
redundant work and just not
respond synchronously.
Last, this is an easy one.
Once your code is already on the
[inaudible] happy path, you
know, it's already linear, and
it's not going to get any better
than linear.
That's sort of the minimum
performance that you're going to
get.
You're after all the constant
time improvements that you can.
Now, an easy one is that if
you're using dictionaries like
they were objects, then you
probably know you're doing this,
if you have a bunch of string
constants for all the keys, then
you can get a big improvement to
code clarity, to code
completion, to refactoring, and
to validating your
source code, by using specific
types.
It couldn't be easier with
structs in Swift, with their
implicit initializers and
conformance to Equatable and Hashable.
And this can just be hands-down
an improvement to your source
code, and you'd be surprised at
how much time you're spending in
string hashing and string
equation if you were doing this
millions of times on lots of
small objects.
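
For comparison, here is the same small record written both ways. The
`Planet` type and its fields are hypothetical. The struct gets its
memberwise initializer for free, and the compiler synthesizes the
Equatable and Hashable conformances.

```swift
// Dictionary used like an object: string keys, untyped values,
// and a cast on every read.
let planetAsDictionary: [String: Any] = ["name": "Mars", "moons": 2]
let moonsFromDictionary = planetAsDictionary["moons"] as? Int

// The same data as a specific type.
struct Planet: Equatable, Hashable {
    let name: String
    let moons: Int
}

let mars = Planet(name: "Mars", moons: 2)
let moonsFromStruct = mars.moons   // checked by the compiler, no string hashing

// Synthesized Hashable lets the struct go straight into a Set.
var visited = Set<Planet>()
visited.insert(mars)
visited.insert(Planet(name: "Mars", moons: 2))   // equal value, not added again
print(visited.count)
```
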
So with that, I'd like to turn
it over to Matthew to talk to
you about how we've applied
these techniques inside of
Photos.
[ Applause ]
>> Thanks, John.
Hi everyone.
I'm Matthew Lucas, an engineer
on the Photos team, and today I
want to give you some practical
examples on performance
directly from Photos.
So first, let's talk about
Photos for a second.
We are all familiar with this
app.
It lets you store, browse, and
experience your favorite
moments.
So you can browse your favorite
moments from the Moments view,
that you can see here.
It is the default view.
But you can also get another
view from Collections, or
Years.
And I'll talk more about this
view later.
Now, libraries today can go from
1,000 to 100,000 assets,
depending on your love for
photography.
And we all love capturing those
fun and precious moments we live
every day.
So we are patient enough to
capture them, but we are less
patient when something like this
appears.
How would you feel if
something like this were
displayed in Photos the first
time you launch the app?
Now, you may also experience
something like this, where we
are showing a lot of
placeholders, and that's really
not great.
Maybe you start scrolling,
you'll be lost in this gray
area, the [inaudible] would
start to load, but then you'll
keep scrolling and then you'll
experience some frame drops
because the views are being
updated.
Well, our goal is to not show
views like this.
We think this is not providing a
great user experience, but we
understand that sometimes it's
unavoidable.
But when it's too frequent, this
isn't really great.
Now, when you work on an app,
you want to make sure that it's
responsive, and usable at once.
You also want to make sure that
the animations are smooth.
And these two attributes are
really crucial to providing a
great user experience.
If the users don't find your app
reliable or performant, they
might stop using it.
Now, to illustrate these two
points, I would like to give you
two examples.
And the first one is going to be
how we optimized launching to
the Moments view.
The second one is how we built
the collections and years views
for good scrolling performance.
First, let's do launching
[inaudible].
So what is launch?
There are three kinds of
launches.
The first and most expensive one
is the one referred to as cold,
and it happens the first time
you launch your
app after a reboot.
So basically, nothing has been
cached yet, and it might require
some background processes or some
libraries to load.
Now, it also happens when the
system goes under memory
pressure and starts reclaiming
some memory.
Now, if you kill an app, it
might not trigger a cold launch,
because the system decides when
the resources should be paged
out.
And when you kill an app, and
you relaunch it a few seconds
later, it's almost guaranteed
that you'll hit a warm launch.
And we call it warm, because the
resources or the dependencies are
still in the cache, so it's
faster to launch.
Now, the last type is-- we call
it hot, and it's basically a
resume, because it's when your
app is already running and is
being brought back to the
foreground.
So when you start measuring
launch, you should start by
measuring the warm launch.
And the time it takes to launch
during a warm launch is less
variable than the cold launch,
and the test iteration is much
faster as you don't need to
reboot your device.
Now, the way we measure launch
is by evaluating the time it
takes from the moment you hit
the application icon until
you can start interacting with
the app.
And what I mean by interacting
is that it's really using and
not interacting with a spinner.
A common pattern is to dispatch
some work and display a spinner
in the meantime, well that
doesn't make the app usable
sooner, so we are trying to
avoid that here.
Now there are three goals that
we are shooting for at Photos,
and the first one is that we
want to be instant, we don't want
to display any spinner, and we
don't want to display any
placeholder or [inaudible].
And I have to be honest with
you, you might see some
placeholders the first time you
synchronize with iCloud, but when
the data is local, we really try
our best to not display any.
Now, what do we mean by instant?
Well, the time it takes to
launch should be the same time
as the zoom animation from the
home screen.
That is usually between 500 and
600 milliseconds, and that way,
the transition from the home
screen to the application is
seamless for the user, and the
user can start interacting with
it, as soon as the animation is
done.
And by the way, this is the
OS recommendation, not
something just for Photos, so
it's valid for any app.
Now, let's look at how Photos
launches today.
If we look more closely at what
is happening exactly, you can
see that Photos is all set up
and ready before the animation
is done.
And if we dive into the launch
anatomy, you will see there are
mainly two parts.
The first part is being spent in
dyld, this is the loader that is
going to load and link all of
your dependent libraries, but
it's also going to run your
static initializers.
And your control over that part
is limited, but it's not
impossible.
I would encourage you to watch
the dyld session from last year
in order to get more details on
that part.
Now dyld is also calling main in
your executable, which leads
us to the second part here,
which you have lots of control
over, and this part, you need to
make sure that it stays under
500 milliseconds.
Now, the first [inaudible] pass
that is being scheduled right
after didFinishLaunching
will mark the end of your
launch, and this is basically
when your app should be usable.
There are a few principles that
we will be referring to during
this session, and these are
really the common pillars of the
performance work that we
achieved.
The first one is that we want to
be lazy and defer the work that
we don't need.
The second one is that we want
to be proactive, and that's valid
for two things.
We want to be proactive
in order to anticipate the work
that we will need later, and we
also want to be proactive and
catch regressions quickly, so
you should make sure that you
have continuous integration
testing in place.
And the last point is we want to
be constant, regardless of the
total amount of data that we
need to load.
Now, if we were taking a naïve
approach, and we were loading
everything we needed during
launch, this is how long it
would take roughly for a 30,000
item library.
First you need to initialize the
database, then you need to
prepare some view controllers.
You need to configure the data
sources, load some library
images, and fetch the cloud
status.
And keep in mind that this might
vary as the data grow, and in
fact, the data will grow forever
as people take pictures every
day.
So at Photos, we really keep in
mind that we are dealing with
unbounded data sets.
Now, let's see how we optimize
each of these steps for Photos,
and let's start with
initializing the database.
So first, usually, the database
is initialized and loaded when
the first query is being fired.
One optimization that we have
found was to do it as early as
possible on a background
thread, so that it doesn't have
to do the initialization when
the first query is fired.
And this is an issue, especially
if the first query is being done
from the main thread.
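
A minimal sketch of that idea, assuming a hypothetical Store type
standing in for the real database (none of these names come from
Photos or from a system framework): initialization is kicked off
early on a background queue, and the first query only waits if
that work has not finished yet.

```swift
import Foundation

// Hypothetical stand-in for a database handle; not a real Photos API.
final class Store {
    private(set) var isOpen = false
    // Imagine expensive file I/O and schema loading here.
    func open() { isOpen = true }
}

final class StoreProvider {
    private let store = Store()
    private let queue = DispatchQueue(label: "store.init", qos: .userInitiated)
    private let ready = DispatchGroup()

    // Call as early as possible during launch, off the main thread's path.
    func warmUp() {
        queue.async(group: ready) { self.store.open() }
    }

    // The first query blocks only if initialization is still in flight.
    func withStore<T>(_ body: (Store) -> T) -> T {
        ready.wait()
        return body(store)
    }
}
```

The trade-off in this sketch is that a query issued before warmUp
has completed still blocks its caller, so the earlier warmUp is
called, the less likely the main thread pays for initialization.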
Now, we spend a lot of time and
we are still spending a lot of
time reviewing all the queries
that we're doing during launch,
and we want to make sure that
the work that we are doing is
only the necessary one, and we
are not doing more.
Now, lastly, we want to ensure
that all the queries that we are
doing are as efficient as possible,
and we want to avoid complex
queries as much as possible as
well.
And sometimes we understand
that we need this, and for these
cases, we are setting up some
indexes, so that we can speed
them up.
Now we are aiming for, at most,
30 milliseconds spent in that
initialization.
So next, let's look at how we
are preparing our view
controllers.
So we have four tabs
representing the main features
of the app.
And so the first thing that we
need to be careful of is we want
to minimize the work that is
being done in the initialization
of these three non-visible ones,
and the rule that we are trying
to follow here is to do as
little work as possible in the
initializers.
We really want to do the bare
minimum, and load all the data
in viewDidLoad.
This also allows us to
initialize our controllers in
constant time.
Now, lastly, we also want to
ensure that only the visible
views are loaded.
It's easy, and we often regress
on that part, so you should
really be careful about that.
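
One way to picture this rule without any UIKit code is a plain
Swift sketch in which each tab stores only a closure at
initialization and loads its data on first access. The Tab type
and its members are hypothetical names used for illustration:

```swift
// Hypothetical tab model; a real app would use view controllers.
final class Tab {
    let title: String
    private let loadItems: () -> [String]
    private(set) var didLoad = false

    // Constant-time initializer: it stores the closure and loads nothing.
    init(title: String, loadItems: @escaping () -> [String]) {
        self.title = title
        self.loadItems = loadItems
    }

    // Data is loaded on first access, when the tab becomes visible.
    lazy var items: [String] = {
        self.didLoad = true
        return self.loadItems()
    }()
}

let tabs = ["A", "B", "C", "D"].map { name in
    Tab(title: name) { ["\(name) item"] }
}
// Only the visible tab pays the loading cost.
let visibleItems = tabs[0].items
```

Checking a flag like didLoad in a test is one cheap way to catch
the regression mentioned above, where a non-visible view gets
loaded by accident.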
So preparing the view
controllers, we are now aiming
for 120 milliseconds.
But preparing view controllers
implies configuring the data
sources, and let's look at that
chunk next.
So the Moments view is a
representation of these things,
events in your life, and the UI
represents that by having this
group of photos, and these
headers.
In this library, for example, we
might have 500 moments, and in
order to build a view, we need
to load all the moments up
front.
But the only thing we need
really for these moments is only
the meta data so we can build
the view.
We don't need their content.
So the first thing we do is we
fire that query, which is super
fast.
And then we are only loading the
content that we need here.
In that case here, we are only
going to load the visible
content, which in our case is
going to be between 7 to 10
Moments.
Since our data set is limited,
and finite, we can allow
ourselves to do it synchronously
on the main thread.
Now, we also want to anticipate
and schedule the work so that we
can start loading the remaining
data asynchronously.
And we do that on a background
thread, with the right quality
of service to make sure that it
doesn't preempt the main thread
from running.
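
The pattern described here can be sketched as follows. This is an
illustration under assumptions, not the Photos implementation:
MomentLoader, its methods, and the placeholder content strings are
all hypothetical. The bounded visible range is loaded synchronously
on the calling thread, and everything else is scheduled on a queue
with a lower quality of service.

```swift
import Foundation

// Hypothetical loader; names are illustrative, not Photos internals.
final class MomentLoader {
    private let lock = NSLock()
    private var loaded: [Int: String] = [:]
    private let background = DispatchQueue(label: "moments.prefetch", qos: .utility)

    // Placeholder for the real, expensive content fetch.
    private func loadContent(for index: Int) -> String { "content-\(index)" }

    func content(at index: Int) -> String? {
        lock.lock(); defer { lock.unlock() }
        return loaded[index]
    }

    private func store(_ index: Int) {
        let value = loadContent(for: index)
        lock.lock(); loaded[index] = value; lock.unlock()
    }

    // Loads the small visible range immediately, then schedules the
    // remaining items at utility QoS so they do not compete with the
    // main thread.
    func load(total: Int, visible: Range<Int>, completion: @escaping () -> Void) {
        for index in visible { store(index) }
        background.async {
            for index in 0..<total where !visible.contains(index) {
                self.store(index)
            }
            completion()
        }
    }
}
```

The synchronous part stays cheap only because the visible range is
bounded (7 to 10 moments in the talk), regardless of how large
total becomes.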
Now we are aiming at 100
milliseconds here.
So lastly, our data sources are
also providing some images and
let's see how we optimize that
part.
So this was by far the biggest
chunk here that we were
attacking, and when we realized
that we were spending multiple
seconds loading these images
during launch, we realized that
we were doing too much work.
So the first thing that we did
is that we evaluated the number
of images that we needed during
launch, and we are only loading
that during that first
transaction.
In that case, that can be up to
60 including some padding above
and below.
And next, in order to load those
images quickly, we need to make
sure that we are
only loading low-resolution ones.
That way we are loading fewer
pixels in memory, and it is much
more efficient.
That chunk is now representing
200 milliseconds.
And this is, by far, the biggest
gain that we had.
It is also now constant
time, and that's really great.
Now, sometimes you have to ask
yourself the question, is this
really needed during launch?
And one of our examples here is
this footer view.
That pulls information via the
network or the database, and
our first design decision was
to not show it during launch,
to prioritize all the images
that we are seeing here.
We wanted to show as many images
as possible.
So that made it simpler.
We are now only scheduling that
work post-launch, and we cache
the processed information for
reuse later.
Now, if we had had the
requirement of displaying this
information, one approach could
have been to leverage the
background app refresh
API from UIKit, which will
proactively wake your app so
that you can start preparing
some content before the user
launches your app.
So now, that part is gone from
launch, and that saves us 400
milliseconds of CPU time.
If we look at the updated
breakdown here, we can see that
we now have only 450
milliseconds worth of work.
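
The arithmetic behind that total can be written down as a small
budget check. The four targets are the ones quoted in this talk;
the phase labels, variable names, and the idea of asserting on the
total are illustrative additions:

```swift
// Per-phase launch targets quoted in the talk, in milliseconds.
let launchBudget: [(phase: String, milliseconds: Int)] = [
    ("Initialize database", 30),
    ("Prepare view controllers", 120),
    ("Configure data sources", 100),
    ("Load images", 200),
]

// Sum of the targets: 30 + 120 + 100 + 200 = 450.
let totalMilliseconds = launchBudget.reduce(0) { $0 + $1.milliseconds }

// The window the talk asks the post-main part of launch to fit in.
let limitMilliseconds = 500
let fitsInWindow = totalMilliseconds <= limitMilliseconds
```

Keeping a table like this next to measured values makes it easy to
see which phase regressed when the total drifts past the limit.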
We are now fitting into that 500
millisecond time window, and
regardless of how things can be
represented concurrently here,
the most important part of that
is to really make sure that you
think about the cost of
preparing your content.
And what I mean by think is
really measure it.
Now, you should strive for doing
work in constant time,
regardless of the total amount
of data you are loading.
In our case, we really have
unbounded data sets, and we
need to stay constant.
Now that we have launched the
app, we need to start using it.
And let's see how we did
collections and [inaudible] for
good [inaudible] performance.
So as I mentioned earlier, our
users can seamlessly transition
with animation from the Moments,
through the collections, to the
years view.
And this is a complex hierarchy.
We have thousands of pictures to
display.
We need to support live updates,
we need to also support
animation between these layers,
and we also have some gestures.
Now, we also have some goals
here.
For the experience we want to
provide to our users.
The first one is the same as
before, we don't want to have
any spinner.
We don't want to have
placeholders, but we also want
to have smooth animations.
And by smooth animations, I mean
60 or 120 frames per second,
depending on the screen you're
running on.
Now, remember the principles
that we've seen before.
Well, they are all applicable
here.
We want to be lazy and defer the
work we don't need up front.
We want to be proactive, and
catch regressions quickly, but
we also want to be constant in
our layout passes,
regardless of the amount of data that
we are loading.
Now, this time, we also want to
be timely, and we want to
remember the rendering loop
cycle.
And what I mean by that is that
I want you to remember that we
only have 8 or 16 milliseconds
to render that frame, so we need
to make sure that we are not
going over that time, otherwise
we would start dropping frames.
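
The 8 and 16 millisecond figures follow directly from the refresh
rate: one second divided by the number of frames. A small sketch,
with hypothetical function names, makes the budget explicit:

```swift
// Time available to produce one frame at a given refresh rate.
func frameBudgetMilliseconds(refreshRate: Double) -> Double {
    return 1000.0 / refreshRate
}

// True when a given amount of per-frame work cannot fit in the budget.
func dropsFrame(workMilliseconds: Double, refreshRate: Double) -> Bool {
    return workMilliseconds > frameBudgetMilliseconds(refreshRate: refreshRate)
}

let budgetAt60 = frameBudgetMilliseconds(refreshRate: 60)    // about 16.67 ms
let budgetAt120 = frameBudgetMilliseconds(refreshRate: 120)  // about 8.33 ms
```

So 10 milliseconds of layout work per frame fits at 60 frames per
second but drops frames at 120.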
Now, let's take a step back, and
look at what we are trying to
achieve here.
We wanted to have this scrollable
view, with sections and many
cells in it.
And that is basically what
UICollectionView is providing,
right?
Except that in this extreme
case, we were reaching the
limit of what we could achieve
with a basic approach.
And that resulted in too many
views, too many layers.
But also in an increased layout
complexity, and that also had an
increased memory cost.
So we needed to innovate here,
and we did that by restricting
the number of views drastically
while still using a collection
view.
We used a technique more
commonly used in video games,
that is called atlasing.
And it basically consists of
combining a set of images into a
single one.
We do that efficiently by using
only very small thumbnails
first, then we stamp all the raw
image data on the canvas we are
using as a strip.
Now, we use raw image data so
that we can avoid decoding each
thumbnail as we stamp it.
So basically we are displaying a
rendered strip of images.
Now, we generate and cache them
on the fly so that we can be
more flexible.
And as we render multiple images
into a single one, we are
reducing the number of cells,
layers, objects drastically,
which simplifies the layout and
the time spent building it.
Now, this technique works well,
but it has trade-offs to
consider as well, and this is
one of them.
So if someone tries to long
press or force touch an item
here, we will need to figure out its
position so that we can show
the preview correctly.
And as we display a single
image, we need to maintain the
mapping between each individual
image and its rendered strip.
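
A minimal sketch of such a mapping, assuming square thumbnails laid
out row by row in a fixed number of columns (the StripLayout type
and that grid arrangement are assumptions for illustration, not the
layout Photos actually uses):

```swift
// Hypothetical description of one rendered strip.
struct StripLayout {
    let thumbnailSide: Int   // side of each square thumbnail, in pixels
    let columns: Int         // thumbnails per row
    let count: Int           // number of thumbnails stamped into the strip

    // Where a given item was stamped inside the strip.
    func frame(ofItem index: Int) -> (x: Int, y: Int, width: Int, height: Int)? {
        guard index >= 0, index < count else { return nil }
        let column = index % columns
        let row = index / columns
        return (column * thumbnailSide, row * thumbnailSide,
                thumbnailSide, thumbnailSide)
    }

    // Which item sits under a touch location, if any.
    func itemIndex(atX x: Int, y: Int) -> Int? {
        guard x >= 0, y >= 0, x < columns * thumbnailSide else { return nil }
        let index = (y / thumbnailSide) * columns + (x / thumbnailSide)
        return index < count ? index : nil
    }
}
```

Because the strip is a single image, this bookkeeping replaces the
hit testing that individual cells would otherwise provide for free.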
Now, you might be thinking, why
are we generating them on the
fly?
Well, we need to support live
updates, that's the reason.
We need also to support
different view sizes.
For example, we have landscape
here.
But we also have portrait.
And also we can do that because
we can [inaudible] because our
users' libraries typically grow
organically over a long period
of time, and the cases where we
might need to generate thousands
of them are pretty rare.
Now, you may be wondering also
why are we not generating the
whole section then?
Well the answer is that our
design requirement is to do this cool
animation, where you can see
that the collections are
expanding into their own
sections or collapsing into
group ones, and the other way
around.
So if there is one thing that
you should also remember from
that second part is you should
really think about the layout
cost of your hierarchy and
measure it.
Lastly, you should always think
about performance.
At Photos, we care deeply about
it, and this is really part of
our daily job.
For more information, you can
come and see us in these three
labs that are mentioned here,
and I hope that you have a great
conference.
Thank you.
[ Applause ]