WWDC2010 Session 131

Transcript

>> Erik Neuenschwander: Hi everyone!
Welcome to Performance Optimization on iPhone OS.
My name is Erik Neuenschwander; I manage one of
the software performance teams on iOS, and heading
across the stage is Ben Weintraub who
is with me today to do a bunch of demos.
Ben's a Performance Engineer for us.
So, thank you all for coming.
We're really excited to be here and
have a great hour ahead for you.
I hope you are here already because you think performance
is important or maybe you wandered into the wrong talk
in which case take a seat and we're going to spend the next
hour trying to convince you that performance is important.
And one of those reasons is that performance
is a key aspect of App Store reviews.
If you think about when an application is slow or just
not fun to use that's going to drive down your reviews,
that drives down sales and we want you to
make money and that's why we're here today
to tell you how to get superb performance in your app.
Luckily you already have the tools in the form
of Instruments and Xcode to get good performance
and you already have the skills just with
your software development background.
And so, what Ben and I are going to do today is focus
on some common cases where performance can be an issue
and give you some clear strategies
to get good performance in those.
So we'll start by kind of the most important thing which
is just talking about how to test and how to measure
in performance scenarios then we'll spend a lot
of time with three key scenarios, namely launches,
scrolling, and keeping your memory footprint low.
Lastly, because we know that you have things to do other
than performance in your development we're going to talk
about how to prioritize performance issues.
So let me start with measuring performance.
Probably the most important thing I can say is when you're
dealing with performance issues you need to measure first.
Measuring will give you an idea of
where you can most efficiently put your time
in to improve your app's performance.
And measuring doesn't have to be hard.
You can do it just through manual
testing of your application.
Use it. Find scenarios you are unhappy
with and start working on them.
But new in iOS 4 we also offer automated testing.
And automated testing can give you ways
to get repeatable, more repeatable results
and kind of more efficiently execute your test cases.
But whether you are collecting your data manually
or automatically you have to measure numbers
and it may seem kind of daunting because
everything affects performance, the CPU, GPU, disk,
network latency, it can seem overwhelming.
But you can take a step back and
just recognize that trying to guess
where your performance issues lie is overrated.
You really just need to focus on a scenario
that you find to be bad and then measure each
of those components in turn looking for a bottleneck.
When you think you've found it, you make a change and then
of course you retest to see if you've improved the scenario.
In the end, all you're trying to do is get
something which is going to feel right.
So to gather data you have a lot of options.
One of them is just logging.
NSLog is probably a method you are already familiar with.
You can just take a time stamp at the beginning of some
activity and then at the end write out how long it took.
You can write that out to syslog, which you can view through
the Organizer in Xcode, or to a file that you collect
in some other way, whatever is going to work for you.
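The timestamp logging just described might look like this minimal sketch; `LogTimed` is a hypothetical helper name, not an SDK API:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: log the wall-clock time some activity takes.
static void LogTimed(NSString *label, void (^work)(void)) {
    NSDate *start = [NSDate date];   // timestamp at the beginning
    work();                          // the activity being measured
    NSTimeInterval elapsed = [[NSDate date] timeIntervalSinceDate:start];
    NSLog(@"%@ took %.3f seconds", label, elapsed);
}
```

You might call it as `LogTimed(@"Thumbnail generation", ^{ ... });` around the scenario you care about, and remember that such logging should be stripped before submitting to the App Store, as noted below.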
But you also have more sophisticated tools like
instruments which we'll spend a lot of time on today
and also the simulator, which I'm
sure you're familiar with.
But the simulator is maybe not always the most
appropriate choice when it comes to performance.
You've no doubt used it for prototyping your
interfaces and features, but you should consider
that the simulator uses the Mac hardware, that's
the Mac CPU, GPU, disk, the Fast Ethernet,
that's going to give you an unrealistic idea of how
your application is going to perform on the device.
Now one exception to that is when
you're dealing with memory issues.
And there actually the simulator can be good.
It's great for finding memory leaks and looking
at your footprint and in fact the desktop offers
in some cases more features, like Zombie Detection,
compared to what you get on the device.
But what you need to remember is that the
device is the final arbiter of performance.
Your customers will be running your application on a device
and so that's where you need to be doing your testing.
In fact you should do all of your
speed-related testing on the device.
And when you fix memory issues that you find
in the simulator make sure those are
playing out as you expect on the device.
That's really what's important.
So, lastly, I want to say that you should measure
early and measure often with the idea of trying
to collect numbers when you have a scenario that you like.
When you get a good result you want to record that as a
baseline and just have those numbers in your back pocket
so that later on when you discover that some scenario
has regressed you can go back and look at those numbers
and get an idea of where the problem might be.
I talked a minute ago about logging and logging is great
but of course doing anything has some performance hit
and logging is something that doesn't
benefit your customers at all.
So you want to turn off or otherwise remove the logging
from the apps that you submit to put up on the App Store.
I talked about testing on the device.
It's the final arbiter so you should really
test on every device you are going to support.
And that kind of sounds like I'm telling you to
buy one of everything we make which would be great
but for a lot of you that's probably unrealistic.
And so at a minimum you want to test on the
oldest device that you are planning to support.
That's likely to be the slowest and today
for most of you that would be the iPhone 3G.
So now let's hit those key scenarios
starting with speedy launches.
Launch is a very important performance scenario.
If you think about it, first of all, when a user buys
your application, the icon appears on the screen.
They tap that icon to launch.
This is the hello.
This is the out of the box experience for your application.
So you want the user to have a
good experience from the get go.
Or if you think about when somebody who has your app
says, "Hey, hey look at this", and they reach out,
they show their phone to their friend,
they're going to launch your application.
And again that's the first thing a
potential customer is going to see.
So even aside from that, launch is a very common scenario.
For non-multitasking devices every time a user switches away
to do something else it's going to quit your application.
So, when they tap on that icon again
it's going to be a launch scenario.
On devices which support multitasking,
instead of launch it's more often a resume,
a transition out of the background state, but
you'll find that in your code there's a lot
of shared work between launch and resume scenarios.
So everything Ben and I talk about today
will still apply to resume as well.
Lastly, there is a stick.
If your application is too slow then the
operating system will actually terminate it.
And that's to keep the system responsiveness up.
If you think about it we don't want
a device where the user is reaching
out tapping, nothing is happening, it seems hung.
And so if an application is behaving too
slowly the OS will actually terminate it.
And we do that with the system service we call Watchdog.
So Watchdog is constantly looking
at an application and measuring the
"wall clock" time to reach certain checkpoints or dates.
And "wall clock" time if you are not familiar with the
term is just seconds ticking by on the clock on the wall.
It's not CPU time or anything fancy.
It's literally just the time that your
user is waiting for something to happen.
So these values that you see on screen,
they're subject to change, right.
Ideally you want to keep these things as short as
possible because all it is, is the user waiting.
But when we're talking about launch your
application has up to 20 seconds to be able
to return from applicationDidFinishLaunching.
It's quite a long time.
You really want to be far below that.
But that is the upper limit.
For Resume and also for Suspend there's less work
to do so that time out goes down to 10 seconds
and Quit is actually the shortest time out because
well you should already be saving out your state
on a regular basis anyway so there should be very
little to do when you actually quit the application.
Also new in iOS 4, there's the complete-operation
multitasking scenario, and for that, say uploading photos
to a social networking site or
something, you get 10 minutes.
And if you watched the multitasking talks you can also
find out more about how to handle and avoid that time out,
suspending your background operation gracefully.
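That complete-an-operation pattern might be sketched like this with the iOS 4 background task API; `finishUploadInBackground` and `uploadPhotos` are hypothetical names standing in for your own code, not anything from the session's demo:

```objc
#import <UIKit/UIKit.h>
#include <dispatch/dispatch.h>

@implementation MyUploader // hypothetical class for illustration

// Ask the system for extra time to finish a long operation (such as
// a photo upload) after the app moves to the background.
- (void)finishUploadInBackground {
    UIApplication *app = [UIApplication sharedApplication];
    __block UIBackgroundTaskIdentifier task =
        [app beginBackgroundTaskWithExpirationHandler:^{
            // The time allowance is nearly up: suspend gracefully.
            [app endBackgroundTask:task];
            task = UIBackgroundTaskInvalid;
        }];
    dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
        [self uploadPhotos];           // the long-running work (placeholder)
        [app endBackgroundTask:task];  // always balance the begin call
        task = UIBackgroundTaskInvalid;
    });
}

@end
```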
So, when you want to collect the data to
figure out how close you are to those numbers,
figure out how to get it down, you want to make
sure that you are testing with a realistic data set.
Your application may launch or resume
really, really quick with no data.
So, instead you want to think about a user who's been using
your application for 6 months and has a lot of bookmarks
or photos or clips or whatever it is
that's your application's data set
and create a stable realistic data
set that you can use for that.
To collect the data you will use
the Time Profiler instrument.
Time Profiler works with iOS 4 devices and it
collects back traces at regular intervals showing
at that instant in time what your application is doing.
You can then look at them in aggregate and get a pretty
complete picture of where the execution time of your app is.
And you are really looking for two things primarily.
First, if we're talking about launch, you want to
look for work that you just don't need to do,
work that you can take out of launch and defer until
just slightly later, or maybe even on demand,
waiting until the user does something that's going to
request the work that you're doing currently during launch.
So move that out of the launch path.
But the other work that you'll see is work
that-- you look at that function and you say,
I got to call that function as part of launch.
That's necessary work.
And then you want to sort by running time and
look for the thing which is taking the most time.
Because that's where there's the most upside
for your effort to make your app fast.
So, to show you using the Time Profiler instrument to get a
demo application running faster we'll turn you over to Ben.
>> Ben Weintraub: Alright, thanks Erik.
So in order to show you guys what we're
going to be doing with instruments,
we've created a sample demo application here.
So let me just show you the app quickly
first so you get a feel for what it does.
So, basically you select photos from
your photo roll or from the camera.
And then for each of those photos
you can create a composition.
So if I select one of these I get this nice
Andy Warhol style composition and for each one
of these tiles I can adjust the threshold
here and then change these colors.
And that's about all the app does.
It's pretty simple.
OK, so now let's take a look at
how it launches on the devices.
I have an iPhone 3G here.
I'm just going to go ahead and launch the app.
Again I'm using a realistic data set with a number
of compositions in there to simulate what it would be
like if the user had been using your app for a while.
So, you can see that that takes quite a
while to launch even on an iPhone 3GS.
So, in order to figure out where all that time is
going we're going to switch back to instruments here
and we're going to use the Time Profiler
instrument that Erik was talking about.
So, if you've used Shark or the CPU Sampler instrument
in the past, Time Profiler is similar in concept,
but it has lower overhead and it's now the
preferred way of looking at CPU-bound operations.
The first thing I'm going to do is select the
app that I want to have Instruments launch
for me. Because we're looking at launch times,
I'm going to have Instruments launch
the application on my behalf,
and then it'll start collecting data as soon as it launches.
Let's go ahead and watch.
Alright, so you can see now as Instruments is working there
is data being populated into this Call Tree View down here
and then it looks like we're finished launching now.
So I'm going to stop the trace.
OK so let me expand this Timeline view out a little
bit so you can see what's going on a little better.
Alright, so the first thing we notice is in this Timeline
view the purple bars are showing us an approximate amount
of CPU utilization over time.
And if I go to the very end here where I stopped running,
the CPU time is about 3.7 seconds after the launch.
So I can use the Call Tree View down here to try
and figure out where that time is actually going.
So, the first thing I'm going to do is
if you take a look at these check boxes
over here there's one check right
now that's called Invert Call Tree.
So that-- see that a little better?
So I actually want to see the methods
in the order that they were called.
So, I'm going to uncheck that box and now you see I
have-- Now I have a more reasonable back trace here.
But most of these symbols are not things that I
immediately recognize, these are system libraries that end
up being called through to get to my code.
So the next thing I'm going to do is check
this box that says hide system libraries.
OK, so what that will do is filter the Call Tree view
down to only the stack frames that
are from my application itself.
So it looks like I'm spending about 2.8 seconds of CPU
time under my root view controller's viewDidLoad method
and specifically almost all that time is under
this generateCompositionThumbnails call.
So let's switch over to Xcode and take a
look at what that code is doing.
Alright, so here's my viewDidLoad; at the end
I'm just calling generateCompositionThumbnails, and all
that's doing is iterating over each of the compositions
from my data set and generating a thumbnail for each one.
And this is necessary in order to show those
thumbnails in the table view that we saw before.
So, even though I do want to eventually
show those, this isn't something that I want
to block the entire launch of the application on.
So, in order to have my app launch a little bit more
quickly and be responsive immediately, I'm actually going
to put this work onto a background thread
using a technology that's available now
in iOS 4 which is Grand Central Dispatch.
So, the first thing I want to do is get
something up on the screen right away
as soon as the user launches the application.
So in order to accomplish that I'm going
to create a set of placeholder images here
and those placeholder images are going to stand in
for my thumbnails while I'm generating the thumbnails.
Alright, so that's great.
The next thing I want to do is actually
get this call into a background thread.
So, I'm going to do that by wrapping
in the call to dispatch_async
and so dispatch_async is an API from Grand Central Dispatch.
And I'm going to put it on a low priority background
thread because I really don't want it to interfere
with the responsiveness of my application.
So I'm just passing in a block here that
calls generateCompositionThumbnails.
Now, in order to make this method work from a background
thread there's one other thing I need to change.
So this thumbnails array is now accessed from two
different threads and so I need to serialize those accesses
in some way and I can do that using a lock
or again I could use Grand Central Dispatch.
So that's actually what I'm going to do in this case.
So I'm just going to replace the body of this for loop.
Alright, so now you see that I'm still doing the
heavy lifting here which is generating the thumbnail
on the background thread but when it comes time to do my
updates to the UI thread I called dispatch_get_main_queue
in order to send this work inside of
this block back over to the main thread.
So that should make sure that there's no
synchronization issues or anything like that.
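Put together, the pattern Ben describes might look roughly like this sketch; `compositions`, `thumbnails`, and `thumbnailForComposition:` are hypothetical names standing in for the demo app's own, and the loop variable is typed `id` because the demo's model class isn't shown:

```objc
#import <UIKit/UIKit.h>
#include <dispatch/dispatch.h>

@implementation RootViewController // stand-in for the demo's controller

// Do the heavy thumbnail generation on a low-priority global queue,
// then hop back to the main queue for UI and shared-state updates.
- (void)generateCompositionThumbnails {
    dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_LOW, 0), ^{
        for (id composition in self.compositions) {
            // Heavy lifting happens off the main thread.
            UIImage *thumb = [self thumbnailForComposition:composition];
            dispatch_async(dispatch_get_main_queue(), ^{
                // Accesses to the thumbnails array are serialized by
                // always touching it from the main thread.
                [self.thumbnails addObject:thumb];
                [self.tableView reloadData];
            });
        }
    });
}

@end
```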
Alright, so now let's switch back over to the
device and take a look at the effects of my changes.
So, I have a version of the app
with these changes ready to go.
So let me just launch it.
Alright, so that was much faster.
So the final thing we should do though is make sure that
we can actually quantify that change in Instruments.
So, I'm going to select the modified version of my app.
And again have Instruments launch it for me.
[ Pause ]
Alright, so you can see that we're still spinning the CPU
for quite a bit of time here and that's expected actually
because we still need to do that work of generating
the thumbnails but the difference is that now it's
on a background thread and we can
see that easily in instruments
if we check this Separate by Thread check box here.
So, I'm going to go ahead and do that.
OK, so now if we look at our main
thread and expand that out.
We can see that viewDidLoad is only
taking about 185 milliseconds of CPU time
and all of the real heavy work is
happening on this background thread
that we created using Grand Central Dispatch.
Alright, so that's an example of how you can use Time
Profiler to help speed up the launch of your application.
So, back to you Erik.
[ Applause ]
>> Erik Neuenschwander: Thanks Ben.
So, to get your launches going fast you first have
to remember that the system Watchdog is out there
but you really want to be well below those
levels I was talking about a minute ago.
What you can do is collect a trace using the
Time Profiler instrument like you saw Ben do
and in his case he did less work by deferring
the work out of startup and that's one way
that you'll commonly be able to solve that problem.
But sometimes there's work that you have to do, and
that operation may be slow, and you want to make sure
that however those operations perform, you
never block on them.
In particular, never do networking on your main thread.
If you think about it you don't control how quickly that
server is going to respond so this is a prime candidate
for moving it off to some non blocking thread.
But if you're doing a lot of work you need to optimize
those time consuming activities and make sure--
I talked about a realistic data set but
you should think about what realistic is.
If your data set is going to keep growing
endlessly over time, eventually it will get slow.
So think about the data set that you are looking
at on launch and think of some way to make sure
that that size always remains constrained.
And then like you saw Ben do, collect a new trace
after you've made a change and quantify your results.
So that's speedy launches.
Let's talk about scrolling next.
And scrolling is another really important scenario.
I mean how many of you have used UITableView, right?
It's a very popular class.
And this is because you want to show large amounts
of data, and on the iPhone, iPad, and iPod touch
it's a great way to do it.
But because we have this direct manipulation UI, when a
user reaches out and wants to scroll through that, you want
it to seem like they're actually manipulating those cells.
And if it stutters that will break that kind of
seamlessness that you want in your application
and that's going to create a bad scenario for the user.
So the way that we measure if scrolling
is going well or not is frames per second.
Which is abbreviated as FPS and we pronounce that fips.
And if you're looking for a good
FPS number the magic number is 60.
60 FPS is completely smooth.
You can use the Core Animation
Instrument to collect that FPS data.
It does measurement of FPS in real time.
And if you have any animation which goes on
for longer than a second it's very, very easy.
You just look at the number that is
presented to you and that's your FPS count.
The Core Animation Instrument can also work for subsecond
animations but then you have to do a little bit of math.
If you think about a 0.3 second animation which only
draws 18 frames well then Core Animation is going
to report 18 FPS.
That's all that happened during that second.
But if you think to yourself well,
alright 18 times 10 divided by 3 aha!
that's 60 FPS.
So, you can kind of go through that but if you
have an opportunity to make your animations longer
at least for testing it will save you that math.
In addition to doing FPS measurement Core
Animation also has a set of check boxes
that can show you visual cues about
how rendering is happening.
And in particular one of those is Color Blended Layers.
And we'll show that today.
But there are many others and there have been some great
talks on instrument specifically and I'd refer you to those
to learn more in depth about the Core Animation instrument.
But to show you collecting some
FPS data with that same demo app,
I'll send it back over to Ben.
>> Ben Weintraub: So, here's our
application and so if we scroll
through these we can see that the
scrolling is not too great.
It's pretty chunky and it can certainly be improved upon.
Alright, so going back to instruments now, we're going
to use the Core Animation Instrument as Erik mentioned
to sample the frames per second that
we're getting out of this application.
So I'm going to just select that application as
our target, and actually with the Core Animation Instrument,
because we already have the app running, we don't need to
have Instruments launch it for us, so we can use this
attach-to-process feature, and I'll just start recording.
Alright, and now all I'm going to do is just scroll
down through that list that you saw previously.
Scroll to the bottom and scroll back up to the top.
Maybe do a little bit more so we get some more data here.
OK, great.
So let's stop the trace now.
Alright, so if you look over at this frames per second
column here, you can see that the numbers we're getting
for FPS are not that great when we're scrolling here.
We're certainly nowhere near the 60 frames per second
that Erik was talking about, which should be your goal.
So in order to diagnose why that's happening you
have a number of different tools available to you
but one of the most common problems with scrolling
performance is if you're creating a lot of objects
and then throwing those objects away immediately.
So, we can use the Allocations Instrument
in order to diagnose a problem like this.
So, I'm going to start a new trace with the
Allocations Instrument and again select my app
and then I'm just going to start recording.
So, the Allocations Instrument
is going to collect back traces
of every allocation that happens inside of my application.
So that's any time that you alloc/init a new
object, or when you call malloc directly,
or calloc, or any of those other functions.
And then it's going to aggregate all that information
together in this nice statistics view for us.
So, what I'm going to do now is scroll through a bunch
of the table view cells that I've got here, just scroll
to the bottom of my table view. And you'll notice
when you do this on your device, if you try it
out, that the performance of your application will be
somewhat degraded while you do this, and that's OK.
It's actually to be expected because of the amount of
data that the Allocations Instrument is collecting.
Like I said it's getting back traces from every allocation
inside of your app so that's quite a bit of data.
Alright, so I'm going to just stop the trace now.
OK. So the first thing that I want to do is restrict
the portion of the timeline that I'm looking
at to only show the area where I was scrolling.
So, if you recall we waited until
this big spike here sort of went away.
So, I'm going to move the cursor to about the point
where I think we started scrolling and now I'm going
to use the inspection range tool to have
Instruments only look at this portion
of the timeline that's highlighted in blue.
Alright, so now if I look in the statistics view down here
each one of these lines represents one category, or one type,
of object that I may have created in my application,
and Instruments is giving me some statistics
about each type of object that I created.
So in this case what I'm looking for are objects
that I create and then throw away immediately.
So those we call transitory objects,
because they have a short lifetime.
So, I'm going to sort by the number
of transitory objects here.
And if I look up at the top I see
a bunch of malloc allocations.
I see some pretty generic-looking
things: CALayer, CFBasicHash.
None of these really means a whole lot to me to begin with.
So what I'm actually going to do is
use the search functionality in Instruments
to search for a class from my application.
So because my application is called Compositions, I have
a class in here called CompositionTableViewCell.
And if I take a look at this particular class I see
that I have 93 transitory instances of that class.
So, what that probably means is that every
time I need to bring a new table view cell
onto the screen I'm creating a
new one just for that purpose.
So if I actually get rid of this search
and then look at the objects adjacent
to that composition table view cell class I see that
I have 93 of a bunch of different objects that look
like they might be associated like UIButton,
UIView, UITableViewLabel for instance.
So these things are probably being
created along with my UITableView cells.
So, Erik's going to talk about an API that we have
that will actually help you avoid this problem.
>> Erik Neuenschwander: So I'm going to show you actually
visually what Ben will show you in code in just a minute
which is making use of what we call cell reuse.
So, the way the application is behaving now and you see
that with all the transitory cells getting created is
that we create them, they scroll onto the
screen, they scroll off the screen exactly
as you'd expect and then they get deleted.
And then they have to get recreated, they come back on the
screen and so we want to avoid doing all those allocations
because that's contending with the scrolling
and giving the poor FPS that you're seeing.
So if you use cell reuse, which is
an API that Ben will show shortly,
these cells still have to get created.
There's no avoiding that.
They scroll on, they scroll off, but then they get recycled,
because it's just more of the same
that's coming onto the screen.
So after you've created that initial set
of cells you actually have a steady state
as they come on to and off of the screen.
So that's a little graphical showing of that.
I'm going to send you back to Ben to
show you both how to do cell reuse
and then some other tricks to get us some good FPS numbers.
>> Ben Weintraub: Alright, thanks Erik.
So, if we take a look at our tableView:cellForRowAtIndexPath:
method here, we can see that right now, every time we call
it, we're creating a new autoreleased CompositionTableViewCell,
a subclass of UITableViewCell that's custom to our app.
And so instead of doing that we're going to
go ahead and use the API that Erik mentioned.
So, let me show you how that looks.
Alright, so there's a couple of things here.
The first thing to note is this reuse identifier.
So this is just an arbitrary string that you
can give whatever value you want but the idea is
if you have multiple types of table view cells
in your table view with different layouts
for instance then you can uniquely identify
those different types using reuse identifier.
So the next step is before we create
a new table view cell we're going
to call this dequeueReusableCellWithIdentifier
method on UITableView and that's going
to ask the table view whether it has any reusable cells
for us to use rather than having to create a new one.
And then if that fails we'll go ahead and create a new
table view cell which is OK because we do need to have
as many cells as are visible on
screen at any given point in time.
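The reuse pattern just described might be sketched like this, using a plain UITableViewCell in place of the demo's custom subclass; the identifier string is arbitrary, as noted, and the autorelease reflects manual reference counting:

```objc
#import <UIKit/UIKit.h>

@implementation RootViewController // stand-in for the demo's controller

- (UITableViewCell *)tableView:(UITableView *)tableView
         cellForRowAtIndexPath:(NSIndexPath *)indexPath {
    static NSString *kCellID = @"CompositionCell"; // arbitrary, but stable

    // First ask the table view for a cell that has scrolled off screen.
    UITableViewCell *cell =
        [tableView dequeueReusableCellWithIdentifier:kCellID];
    if (cell == nil) {
        // Only allocate when no recycled cell is available; this happens
        // roughly once per cell visible on screen.
        cell = [[[UITableViewCell alloc]
                     initWithStyle:UITableViewCellStyleDefault
                   reuseIdentifier:kCellID] autorelease];
    }
    // ... configure the cell's content for this row here ...
    return cell;
}

@end
```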
OK, so let's take a look at how
that looks on the device now.
I have a version of my app with this change.
OK, so as you can see the scrolling is a little bit
better but still not as good as it could be probably.
So, in order to figure out why that is, we're going to use
one of the other features of the Core Animation Instrument
and that's this check box over here
that's labeled as Color Blended Layers.
So it's probably easier if I just show
you what this looks like on the device.
I'm going to check this check box and then immediately
you'll see what the results look like on the device.
OK, so now you can see we have a number of views here
that are colored red and some that are colored green.
So, the green views are good in
this case; the red views are bad.
The green views are opaque,
and the red ones are not opaque.
And what those non-opaque views or layers mean is that
the graphics hardware actually has to do more work
in order to blend them with the views that are behind them.
So, this Color Blended Layers check box
can be really useful in identifying that.
Alright, so let's switch back over to Xcode
and see if we can fix those two instances
of non-opaque views that we had in our table view cells.
So the first one of them was this thumbnail, so the
thumbnails I know are generated inside of this class
and specifically in my thumbnail with size method here.
So, in this case in order to generate those thumbnails
with the cropping I'm calling UIGraphicsBeginImageContext
in order to start a new image context, and
by default this will give me back an image context
that has an alpha channel,
which means it will be non-opaque.
And so, I'm going to go ahead and call
a different variant of this method.
So, UIGraphicsBeginImageContextWithOptions allows
me to pass in a flag here that's this yes parameter
that will specify that I want an opaque image context.
And that means that the UIImage that I
returned from here will also be opaque.
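That change might look roughly like this; `OpaqueThumbnail` is a hypothetical helper, not the demo's actual method:

```objc
#import <UIKit/UIKit.h>

// Render a thumbnail into an opaque context: passing YES for the
// opaque flag means the resulting image has no alpha channel, so the
// graphics hardware can skip blending it with the views behind it.
// A scale of 0.0 means "use the device's screen scale".
static UIImage *OpaqueThumbnail(UIImage *source, CGSize size) {
    UIGraphicsBeginImageContextWithOptions(size, YES /* opaque */, 0.0);
    [source drawInRect:CGRectMake(0.0f, 0.0f, size.width, size.height)];
    UIImage *thumb = UIGraphicsGetImageFromCurrentImageContext();
    UIGraphicsEndImageContext();
    return thumb;
}
```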
OK, so that should fix the thumbnails.
And let's take a look at those date labels
you probably noticed were also not opaque
and so that's in my composition table view cell class.
Alright, so here in the initializer I'm
just manually setting them to be non opaque
and setting the background colors to nil.
So this may seem a little bit silly but this can
actually happen and does happen relatively frequently
when you are playing around with different
layouts, and maybe you wanted them to be non-opaque
for one particular layout and then
you forgot to switch them back.
Another way it sometimes happens is people think that they
need to set the UILabel instances to be non opaque in order
to get that nice blue highlight color to show
through when you select the table view cell.
And that's not actually the case.
UIKit will handle that for you automatically
so you don't need to worry about that.
So I'm going to get rid of these two lines because
there's no reason for those labels to be non-opaque.
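In other words, labels that sit over a solid background can simply stay opaque. A minimal sketch of that setup; `ConfigureOpaqueLabel` is a hypothetical helper, and white is an assumed background color:

```objc
#import <UIKit/UIKit.h>

// Keep a label opaque with a solid background color so the
// compositor doesn't have to blend it with the views behind it.
static void ConfigureOpaqueLabel(UILabel *label) {
    label.opaque = YES;
    label.backgroundColor = [UIColor whiteColor];
    // No need to make the label non-opaque for the selection
    // highlight to show through; UIKit handles that automatically.
}
```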
So let's go back over to Instruments now.
And again, I'm going to run a version of
my app with these changes and we're going
to check the Color Blended Layers
check box and see how it looks.
So, let's switch over to the device.
Alright. So, here's a version of the app after
we made those transparency-related changes.
And I'll just check Color Blended
Layers, and now you can see
that those thumbnails and the date labels are both green.
So that's great.
So, now that we've made all of these changes, we want
to try and quantify what the impact of that was.
So, if we go back to instruments-- we can use the Core
Animation Instrument again to measure the frames per second
that we're getting with our modified app.
So in order to do that I need to select the running copy
of the app and just hit Record and then all I'm going
to do is scroll down to the bottom of this Table View.
And then back up to top again.
[ Pause ]
Alright. So, you notice that we still have a little
bit of room for improvement but we're getting
into the 50s now in our frames per second.
So that's certainly a great improvement over where we were previously.
Alright. So, back to you, Erik.
>> Erik Neuenschwander: Thanks, Ben.
That's actually the best behaved that application has ever been, FPS-wise. That was actually a live demo there, and Ben practiced very well to get a scroll that gives us some good numbers.
So, you see we didn't quite reach 60 there but
hopefully you could see the visual improvement
when Ben went back to the application.
And also we can see quantitatively that we went from the 20s and 30s up to more in the 50s, and so that's a big improvement.
So, when you're thinking about scrolling, you need to test scrolling scenarios, and you can do that with manual testing or with automated testing using flick gestures to scroll through a data set; that's going to give you a good scenario.
When you're scrolling, you want to launch the Core Animation instrument and use it to measure FPS. And remember, 60 FPS is kind of the gold standard; that's what you're shooting for.
Ben made a couple of changes there. First of all, we have that API to reuse cells. When you create a cell with initWithStyle:reuseIdentifier:, you just want to pass in a non-nil reuse identifier, something that identifies the kind of cell that it is. And then, instead of just unconditionally allocing a cell, you want to use the UITableView method dequeueReusableCellWithIdentifier:, passing that same identifier, and if you've already created and stopped using a cell in the past, you'll get back that instance and be able to avoid the allocation.
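Put together, the reuse pattern described above looks roughly like this (the identifier string and cell style are arbitrary choices for the sketch):

```objc
// Sketch of the table view cell-reuse pattern.
- (UITableViewCell *)tableView:(UITableView *)tableView
         cellForRowAtIndexPath:(NSIndexPath *)indexPath
{
    static NSString *kCellID = @"CompositionCell";

    // Ask the table for a previously created cell that scrolled offscreen.
    UITableViewCell *cell =
        [tableView dequeueReusableCellWithIdentifier:kCellID];
    if (cell == nil) {
        // Only allocate when no reusable cell is available.
        cell = [[[UITableViewCell alloc]
                    initWithStyle:UITableViewCellStyleSubtitle
                  reuseIdentifier:kCellID] autorelease];
    }
    // ...configure the cell for this row...
    return cell;
}
```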
So that's one key thing that you should be doing pretty much constantly whenever you have the same kind of cell going by in the list.
But the other thing is to use that Color Blended Layers option and, if I can make you sit through a rather bad rhyme, you want the screen to be green. Right? You're trying to get as much green as you can on the device, because that's when the device is doing as little work as possible.
And that's true even for your UILabels; like Ben said, just through testing you can sometimes turn off opacity. Opaque is YES by default, and you should leave it that way wherever you can. And in fact, even for your UILabels the system performs some magic on your behalf: even if you have opaque labels, when you select a cell, that blue background will still kind of bleed through the label without you setting it to be transparent.
So, there's no need to do that and I hope you
keep the screen green and keep your FPS up.
So that's Smooth Scrolling.
So, let me move on to talk about memory footprint.
And keeping your memory footprint low is
also important because iOS has no swap.
So you may be familiar from the desktop that when enough memory is needed to exhaust physical memory, it will go out to the disk, and that slows things down somewhat, but there is at least that escape valve.
And on these devices, we don't have that.
So that means that we can have memory
pressure meaning that we're just running
out of any free memory available on the system.
And so again, to preserve system stability the OS
will step in and it will terminate applications
when the device gets under high memory pressure.
The service that does that termination is called Jetsam.
Jetsam is constantly watching memory pressure
and it provides instant lightweight termination
of applications when memory pressure gets too high.
This becomes even more important in multitasking scenarios.
If you think about multitasking, we do have more applications present in memory, so that in general is going to cause more memory pressure on the device.
So there are some capabilities to preserve applications with smaller footprints longer, to keep more of them running. So take a little bit of care to keep your footprint low, because especially on multitasking devices that will help your apps stay around longer.
The general reason to keep your memory footprint low is
that it is a shared resource and so you really want to use
as little as possible because that will give
the over all best experience for the user.
If you want to think of it just really tersely, it's that you can stay safe from Jetsam if you stay low in terms of your memory usage.
There are three areas that we'd like to suggest
you look at as ways to keep your memory usage low.
And the first one to talk about is avoidable spikes, and then secondly we'll go into leaks. Leaks is probably the one you're most familiar with, and the third one we'll talk about in some detail because it might be new to you. And that's abandoned memory.
But let me start off talking about those avoidable spikes.
And this is just a bunch of individual, maybe small, but very brief allocations which are all present simultaneously.
So, you get a spike and if that spike
causes memory pressure then even though
in the future a millisecond later you might have
gotten rid of all of it, the OS can't know that.
And so if memory pressure gets
too high you'll be terminated.
So, you want to avoid that, and there are two cases where it's likely to come up for you. The first is if you're processing large quantities of data.
One example that might come to mind is video playback, but there you get to use the API in the OS, which manages to play the whole video without actually causing a lot of memory use.
But if you're ever processing large quantities of data in some other way, say downloading a big XML document or something like that, you want to try to approach it as small individual batches that you can work on in pieces to keep your overall memory footprint low.
Another case where memory pressure can come up for you is when you are using a lot of autoreleased objects; that causes object lifetimes to grow somewhat, and so the key there is to find a way to reduce object lifetimes.
Let me go into autorelease in a little bit more detail.
And so, some of you may think of autorelease as just a way to avoid retain/release.
It's great.
You call Autorelease magic happens and
you don't have to think about it anymore.
But I'd like to pitch it to you in a little bit of a different way, which is to think about autorelease as a way to return objects without retaining them. That way you leave it up to your caller to retain the object only if necessary.
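The canonical shape of this pattern is a convenience constructor; a minimal sketch, with a hypothetical class name:

```objc
// Sketch: a convenience constructor returns an autoreleased object, leaving
// the caller free to retain it only if it needs to keep it around.
+ (Composition *)compositionWithTitle:(NSString *)title
{
    return [[[Composition alloc] initWithTitle:title] autorelease];
}

// A caller that just uses the object needs no retain; a caller that stores
// it in an instance variable retains it explicitly.
```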
So, when you actually call autorelease, what happens is that instance gets added to the NSAutoreleasePool, and that pool is then going to keep a whole list of these objects and maintain references to them so it can call release, but that release call happens at the next turn of the run loop.
Of course, when it calls release, as you probably know, if the retain count drops to zero, the object will be deallocated then. But in that interval between when you call autorelease and the turn of the run loop, you can have a bunch of objects whose retain count is 1. They will be deallocated when the autorelease pool gets around to it, but in the meantime memory can actually spike as you have all these objects that are soon to be deallocated.
And so, autorelease is a common cause of memory spikes.
So, to show you an instrument that can help you identify this, and ways to get around it, I'll turn it over to Ben and the Allocations instrument.
>> Ben Weintraub: Alright.
Thanks, Erik.
So, you may have noticed previously when we launch
our application under the Allocations Instrument
that we had this big spike in memory
right when we started up.
So, we're going to try and investigate what's
going on there and see if we can fix it.
So again, we are using the Allocations instrument; we're going to launch the right version of the app, and again, what Instruments is doing now is just collecting backtraces for every allocation that happens inside of this app.
And this timeline view is going to show us a graphical representation of the amount of memory that was used over time.
OK. So, we have this big spike here and then it drops
back down as you just saw it do, so let me stop this trace
and make this a little bigger so it's easier to see.
OK. So, you can see that over the first couple of seconds of
our application's lifetime our memory usage is just growing
and growing and then we have this big drop off here.
So, if we want to figure out where all these allocations are
coming from again I'm going to use the inspection range tool
in order to only look at a portion of the timeline.
So, I've moved my cursor to right near the end of the spike.
Oops, I want the other one actually.
And now I'm selecting just the portion of
the timeline that involves that memory spike.
And then, in this case I actually
want to look at this Call Trees View.
So, the Call Trees View will show me the call
trees under which all these allocations took place.
So, by default, as you'll see, the Separate by Category box is checked here, which means that these call trees are sorted, or bucketed, by what type of allocation they were.
I'm just going to turn that off because I
want to see them all aggregated together.
And again I'm going to check hide system
libraries so I just see my application's code.
Alright. So, now if I expand the heaviest path here, it looks like almost all of these allocations are again coming from my thumbnailOfSize: method.
So, let's switch over to Xcode and take a look at
that method and see if we can improve upon it all.
OK. So, looks like we lost our changes from the previous
time with dispatch and everything but that's OK.
We can still show what we need to do in order
to get around that memory usage problem.
So, for each iteration of this for loop, when I generate these thumbnails, that's causing a bunch of autoreleased UIImages to be created, and those UIImages, as Erik said, are all present simultaneously, and they don't get released until the autorelease pool for this thread is popped.
So, what I'm going to do in order to fix
this is go ahead and wrap each iteration
of this for loop in its own Autorelease pool.
So at the top of the for loop, I'm just
going to alloc init a new Autorelease pool
and then down at the bottom, I'm
just going to call drain on it.
So, one thing that's important to note
about this API is that when you call drain,
that also releases the NSAutoreleasePool.
It's a common misconception that you
need to call drain and then call release.
You don't actually need to do that.
You just call drain and then it'll release for you.
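The per-iteration pool Ben describes looks roughly like this (the collection, method, and variable names are illustrative stand-ins for the demo app's code):

```objc
// Sketch: wrap each loop iteration in its own autorelease pool so the
// autoreleased UIImages are freed per-iteration instead of piling up.
for (Composition *composition in compositions) {
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];

    UIImage *thumbnail = [composition thumbnailOfSize:thumbnailSize];
    [self cacheThumbnail:thumbnail forComposition:composition];

    // -drain releases the pool's objects AND the pool object itself; no
    // separate -release call is needed.
    [pool drain];
}
```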
Alright, so now, we have a modified
version of this app on the device.
So, let's go back over to instruments and take a
look at how it compares with the original here.
Alright. So, I'll just press Record
and start a new trace here.
So, you can see that our memory usage is still growing somewhat when we launch, but that's to be expected. I mean, we do need some memory to do our work, and it looks like we don't have the same kind of long-term growth that you saw previously.
So, let me stop this trace and one of the nice things
about instruments is that you can take multiple runs
of the same operation and then compare them side by side.
So, by expanding this disclosure triangle over here, I can
now see my previous run in comparison to the current run
and I can see that I've reduced my
peak memory usage substantially.
So previously, we were up at about 2.8 megabytes, and now it looks like our peak memory usage doesn't get above 1.4 megabytes or so.
So, that's an example of how you can use
the allocations instrument to diagnose
and help fix memory-related spikes in your application.
So back to you, Erik.
>> Erik Neuenschwander: Thanks, Ben.
You can really see when you compare between those
two runs a big difference between the memory sections
so that's a great way, probably one of the clearest
ways that you can see your memory usage change.
So, to kind of wrap up autorelease: you should use autorelease. It's a feature, but it is a little bit more expensive than your typical retain/release, and so you only want to use autorelease when it's appropriate. There are really two cases you need to consider for that: your code and API usage.
In your code, you'd like to use autorelease at framework boundaries within your project.
Basically, use it if you're ever handing back an object to some other object that's going to maintain the lifetime of your return value on its own. By contrast, if you have, say, a member of a class whose lifetime you're going to maintain yourself (you allocate it in your init and release it in your dealloc), this is something where you control the entire lifetime of that object, and so you can just use retain and release to manage it that way and get a little bit more efficiency and some very good control over how long that object will be around.
But the other case is when you're using API, like in Ben's example there. The API will return you an autoreleased object, and there's really nothing you can do about that.
They did it because they wanted you to be able to retain
it if necessary but in that case, as Ben showed you,
you can use nested Autorelease pools to get some
control over that and really shorten down the lifetime
so that we were still seeing those individual
spikes as the thumbnails loaded but overall,
there wasn't that same sort of
triangular memory growth, right?
And that'll help keep your overall maximum memory lower and
that'll keep you avoiding jetsam and keep your app around.
So, let's talk about that second
area to keep your memory usage low
and leaks is probably something you're familiar with.
In fact, if you attended the advanced Instruments talk that focused on memory performance, they gave a really great demo of the Leaks instrument.
So, we're not going to do that here but I'll just
kind of give you the quick summary which is that well,
leaks are just memory that you can't get at anymore.
It's gone.
You have no references to it, and so we have an instrument for that: the Leaks instrument. If you launch your application with it, it's able to give you all the points at which that memory was allocated.
So, the point where the memory is allocated
is never really going to be your bug.
After all, you probably allocated that object for
some reason but it does give you context to understand
where your problem is likely to happen and you can dig in
and actually look at the individual retains and releases
and the code where that happens to try
to understand where things went awry.
There are two common ways in which
people end up with memory leaks.
The first of which is just an unbalanced retain-release.
Typically, well more retains than releases.
But there's actually a subset of that which you can hit in the new Objective-C 2.0 runtime when you're using properties, and that's if you forget to release the value that the property had before retaining the new value that came in.
There's also a little trick there: you need to make sure that somebody isn't setting the property to the same value. And again, that other talk, which focuses on the Leaks instrument, can show you very clearly in code how to do that.
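A hand-written retain-style setter that avoids both pitfalls (the leak and the same-value case) can be sketched like this; the property and ivar names are illustrative:

```objc
// Sketch: retain the new value BEFORE releasing the old one, so assigning
// the same object twice is safe and the previous value is never leaked.
- (void)setTitle:(NSString *)newTitle
{
    [newTitle retain];   // retain first: safe even if newTitle == _title
    [_title release];    // then release the previous value; forgetting
                         // this line is the classic property leak
    _title = newTitle;
}
```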
But using the Leaks instrument, you can quickly get on top of memory leaks in your app and use that, again, to keep your memory usage low.
So let me talk about that third one which is
abandoned memory and this may be a new term to you
but abandoned memory is almost like a leak but not quite.
It's memory which you still have an active reference to.
So you can still access it but
at this point, it's left over.
It's memory that you are actually never going
to choose to access again, and so therefore,
you might as well release it, free it up and
get it out of your application's memory space.
So the Allocations instrument is the right tool for the job here, and it offers an additional feature called Heapshot. Using Heapshot, you can take a snapshot of your heap, then run through a set of operations, take a second snapshot, and then compare the heap between those two points.
And what you're looking for are
differences that you don't expect.
You don't really expect that that object was still hanging
around in that second Heapshot and you can then figure
out how to free it and keep your memory usage low.
So I'm going to send it back to Ben one more time.
We have one more little problem in our
demo application and he'll show you how
to use Heapshot in the allocations instrument.
>> Ben Weintraub: Alright, great.
Thanks Erik.
So, we're going to use the allocations
instrument again as Erik mentioned and I'm going
to select my application and just go ahead and launch it.
So, the best way to go about finding abandoned memory
problems is to choose some common user scenario
in your application and then to run that scenario once.
Mark the heap using this Mark Heap button here.
And then run the scenario again
and mark the heap a second time.
And then, instruments will allow you to look
at the deltas between those two operations.
So, let's switch over to the device
for a second and I'm just going
to show you the operation that I've chosen to do here.
So it's a pretty common one in my app.
I'm just going to select one of
these compositions, select the tile,
cycle through a few colors here and
then go back to the main table view.
So as a user, I wouldn't expect there to be any increased
memory usage from just doing that operation once.
So after I've done that once, let's switch back
over to instruments and I'm going to mark the heap.
And then I'm just going to do that one more time quickly.
Alright, so I'm just selecting a tile, cycling through
these colors, and now I'm back to my table view.
Alright, so now, I'm going to mark the heap a second time.
Alright, so let's stop this trace
and make it a little bigger again.
So, you can see these red flags in the timeline
here and those represent the points in the timeline
when I took those two Heapshots and the first
one is labeled as baseline by instruments.
So if I expand that out then each of these
lines corresponds again to some type of object
in my application's memory space and I could see how many
of them were live and how much memory they're taking up.
So that I'm not particularly interested in
at the moment because what I really want
to see is the delta between these two points in time.
So in order to see that, I want to look at
this Heapshot 1 snapshot here so I'm going
to click the arrow next to it in order to focus on it.
So now, all of the objects that are listed here are actually objects that were allocated at some point between these two red flags on the timeline and are still alive at the point when I took the second snapshot.
So, it looks like the biggest category of allocations by far that I have here is under the Non-object category.
So I'm going to expand that out and just scan down this heap growth column until I see something that's big, and there's a 40-kilobyte allocation, out of a total of about 255 kilobytes of heap growth between those two flags, so that certainly stands out to my eye.
So I'm going to select that allocation and then click on the extended detail view here. And what that will do is bring in the backtrace under which this object was allocated.
So if you take a look at this backtrace, you see that we're calling malloc from this tile backing initWithImage: method.
So tile backing objects in my application are actually
grayscale bitmaps that are used to generate those threshold
and colorized tiles that you see in the app.
And I know that I'm doing some caching of those objects in my imageForTileWithSize: method.
So let's head over to Xcode and take a
look at the cache strategy that I have.
OK, so here's my imageForTileWithSize: method, and this is actually a reasonable place to cache. The reason for that is that when I adjust the threshold, as you saw at the very beginning of the demo, I don't want to have to regenerate the grayscale version of these tiles every time.
So there's some reason to have this cache.
And so what I'm doing is just generating these strings that are cache identifiers, and I'm using those as the keys in my cache.
I have just a mutable dictionary here.
And then the values are the tile backing objects themselves
which can be quite large because they're again bitmaps.
And so in this case, the way I'm generating those cache keys is by taking the three color components, the R, G, and B components of the tile color that I'm generating, along with the width and height of the tile that I'm generating.
But as I mentioned before, these
are actually grayscale bitmaps.
So the tile backing object for a red tile is going to be
exactly the same as the tile backing object for a blue tile
and there's no real reason for me
to have to cache them separately.
What I end up doing in this cache is just caching
the same thing multiple times under different names
and this is actually a common problem with caches.
So I'm going to go ahead and get rid of these portions of the cache identifier and change my format strings so I no longer need those.
Alright, so that should help somewhat.
The other thing to keep in mind though is that I'm caching
in one of my model objects and these model objects stay
around for the lifetime of my application in my case.
And so there's actually no reason to be caching
these tile backings once the user has gone back
to that main table view screen in my app.
They can just be regenerated the next time
that the user selects a different composition.
So, the next thing I want to do is actually add a method
that will allow me to flush out this cache entirely.
So, that's pretty simple: I just added a flushCaches method. All it does is release my cache and set it to nil, and then I need to add that to the header file as well.
Alright, and then the final thing I need to do is
actually call this method from my view controller.
So in my compose view controller, which is the view controller that handles the screen with all the tiles on it,
I have a view will disappear method and at the end of that
method, I'm just going to go ahead and call flush caches
on the current composition and that will make sure that
those tile backing objects don't hang around for too long.
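The cache-flushing change described above can be sketched roughly like this (the method, property, and ivar names are illustrative, not necessarily the demo project's exact ones):

```objc
// Sketch: flush the large grayscale bitmaps when leaving the tile screen
// so they don't linger as abandoned memory.
- (void)flushCaches
{
    [tileBackingCache release];
    tileBackingCache = nil;   // lazily recreated on next use
}

// In the compose view controller:
- (void)viewWillDisappear:(BOOL)animated
{
    [super viewWillDisappear:animated];
    [self.composition flushCaches];
}
```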
Alright, so we're running a little bit short on time.
So I have a saved version of this trace that I'm just going to go ahead and show you.
Alright. So this is-- I performed the same operation twice, and again, we can see this nice side-by-side view in Instruments.
Oops, alright.
So here's the-- below, you can see the original version, and you can see the memory is growing constantly over time, and above, you can see the version with the changes that we just made, and so you can just see visually that we're not growing our memory usage over time as we were in the previous version.
And then the other way to quantify this: if we look at our original trace, we see that the heap growth for Heapshot 1 is listed as about 430k.
If we look at our second trace, we can see
that the heap growth is listed at about 13k.
So that's a pretty significant improvement.
So that's an example of how you can use the Heapshot feature, which is new in iOS 4, to help diagnose abandoned memory problems.
So back to you Erik.
>> Erik Neuenschwander: Alright, thanks, Ben.
We had a lot of demos there, so I appreciate you guys not applauding, but that was Ben's last run, so let's give him a hand.
[ Applause ]
Alright, we'll just finish up with a couple
of comments on that and prioritizing.
So, when we're talking about keeping your memory usage
low, I want to remind you about jetsam being out there.
It will kill your application if
memory pressure gets too high.
So you can focus on three areas:
leaks, abandonments, and spikes.
We talked about all three of those and showed you
instruments that you can use to get on top of that.
The Leaks instrument is great for leaks, and the Allocations instrument is all-around useful, including that Heapshot feature to show you, as Ben showed you, overactive caches or other ways in which you're abandoning memory which you can reclaim.
A way that you can often end up with spikes is through autorelease. So you want to target and limit your use of autorelease, but if you're using an API which is a heavy user of autoreleased objects, then you can use nested autorelease pools, and that works out great.
So, we know that you have things to do other than
just work on performance in your applications
so let's talk a little bit about how to prioritize
performance issues relative to the features
and bugs and everything else that you have to do.
And so, at least on the iOS team, we believe more or less that there can be show-stopping performance issues, ones where performance is so bad we won't even ship the release.
And part of how we do that is by establishing goals
early on in the release and getting consensus around that
so that everybody knows what we're trying to head for.
And so that's certainly something you can do
within your development team to get agreement
about what performance issues you've really got to fix.
But to prioritize them, you really
want to look at it in two dimensions.
First is just the frequency at which
the performance problem comes up.
Maybe it's a scenario which is very common, for instance launch, or scrolling through your main list of compositions, say, in our demo application.
And so if it's a common scenario that means
it's going to hit your user pretty frequently.
You also want to think about not just the
scenario but how often it performs poorly.
Maybe it's just slow every so often and that's going
to be less severe than if it's slow every time.
And the second dimension, you want
to consider is the severity.
If the application is unresponsive for several seconds, well, that's going to be pretty bad, or maybe it's single-digit FPS in an animation, and we're here to tell you that that looks awful.
Or it could be that one of those things we talked about, the watchdog or Jetsam, is happening to your application, and that's bad because, to a user, those terminations look exactly like crashes.
Your user can't tell.
And so it may be that if you're getting feedback from
your customers that your application crashes a lot
and you can't figure out what's going on, it
may be that your top crash isn't a crash at all.
It's either one of these watchdogs or these jetsams.
We have one more tool that will work out
well for you there and that's iTunes Connect.
You already know about iTunes Connect, it's how you
get your applications on the store in the first place.
But I hope you've also noticed that we offer third party
crash reports and that crash report gives you, of course,
traditional crashes but down at the bottom,
there's also this bar that shows you the
frequency of different kinds of diagnostic events.
Now, this is picked from just one internal Apple app
on one of the QA servers and you can see in this case,
crashes are the dominant feature of this application.
93% of the time it's an actual crash, but we have 7% where a watchdog is happening, and then, at least in this case, we have very few or no Jetsam events.
Your bar is probably going to look different
and if that blue bar is small then you
want to dig in to the other two areas.
So for watchdogs, also on that same report page, there's a breakdown of the different kinds of watchdog events that are generated, say at launch or quit, and there's that button to the right that says Download Report.
If you click that, you will get reports
from your users out in the field
and it will have the backtrace of your application.
You'll be able to see what your application was so busy doing that the operating system had to come in and watchdog it. So that can be very helpful for understanding how your application is getting watchdogged and what's going on.
That other class is the jetsam events and there
we offer two pieces of data for you to look at.
One of which is the average memory usage of
your application at the time it was jettisoned.
So if you see that number and it looks a lot higher than
the numbers that you're seeing in your internal testing,
that means that maybe you're not using a realistic dataset
for your users or that you have some database that's growing
without bound and taking up a lot
of memory in your application.
So you want to kind of think about
the test scenarios that you're doing.
That second number, the largest, gives you an idea of whether there's some overall memory leak that you just haven't caught in your application.
So you can use this if you see a lot of green in
that bar to decide to go investigate the memory usage
of your application and as far as your users are
concerned, this is the same thing as fixing a crash.
So it's really worth doing.
Let's wrap up and I hope that you either
came in believing or you'll leave a believer
that performance is critical for your application.
We showed you a lot of different instruments and
ways to use them that you can measure and get data
to improve performance in your app and I want to remind
you one last time that performance testing really needs
to be done on the device and ideally, you should
do it on the oldest device you're going to support.
Three key areas we talked about: launch, very important; scrolling, which happens constantly; and memory usage.
Like I said, if your memory usage gets too high, your application will get terminated.
So, if you develop clear performance goals then you won't
fight at the end of your development cycle about what to fix
and please visit iTunes Connect
for performance-related reports.
Let me point out some related sessions: we have one later today and also one tomorrow morning that are advanced sessions going into much greater depth about performance issues. Also, if you're a heavy Core Data user, you can optimize it, and you can find out about that at that talk. And there are the Instruments talks, which have sadly already happened, so those are on video.
There's more information with these evangelists, and, I hope you know, documentation on the website, a link you can click.
Let me--