WWDC2015 Session 412

Transcript

X-TIMESTAMP-MAP=MPEGTS:181083,LOCAL:00:00:00.000
[Applause]
>> CHAD WOOLF: Thank you.
Thank you.
I'm Chad Woolf.
>> KRIS MARKEL: I'm Kris Markel.
>> CHAD WOOLF:
We are performance tools
engineers for Apple.
This is session 412.
We'll talking about
profiling in depth.
This will talk about the time
profiler in Instruments and how
to use it to optimize
your applications.
Time profiler is place to go
when you're trying to figure
out where your application is
spending the bulk of its time.
An example.
It is useful when trying
X-TIMESTAMP-MAP=MPEGTS:181083,LOCAL:00:00:00.000
It is useful when trying
to discover what the
application is doing at runtime.
You want to see who
is calling what.
Our session breaks
down a bit like this.
We'll talk a bit about the
motivation on why we wanted
to create a session devoted
to the time profiler only.
But the session will
revolve around demonstrations
and showing you details
on how it works
and how your applications
work below the source level.
Then we'll give you
final tips on how
to use the time profiler
on your own.
Quickly about our motivation,
it comes from the creation
of Instruments 7 itself.
With Instrument 7 we wanted
to go with a new look and feel
which meant new artwork
but it also meant
that for the new feel, we
wanted it to be more responsive
and we wanted to feel smoother.
We'll do performance
optimizations
for our UI in general.
We also wanted to try
new graphing styles.
These graphing styles were
things that we wanted to do
in the past but didn't
have the performance
to in our existing graphic code.
X-TIMESTAMP-MAP=MPEGTS:181083,LOCAL:00:00:00.000
We knew that we would have
to focus on the rendering
which is a pretty difficult
piece of application.
Instruments have to deal
with hundreds of thousands,
sometimes millions of data
points and has to try to crunch
that down into a
presentation that's both simple
and easy to understand.
There is definitely algorithm
complexities in there.
What that meant for us, we had
to rewrite a significant portion
of our application which is
the track view which lives
up at the top of the app.
Chris and I over winter took our
design for the new track view
and we started building it up
out of a series of prototypes.
We didn't do the prototyping
in Instruments itself,
we broke it out in a separate
application to keep it simple.
This is what one of the
last prototypes looked like.
Now while we were
prototyping, a thing we did,
we set a performance budget.
As we layered on feature
after feature we were constantly
evaluating our performance
of our code against the budget.
X-TIMESTAMP-MAP=MPEGTS:181083,LOCAL:00:00:00.000
of our code against the budget.
When we found we were exceeding
the budget we would use the time
profiler to figure out where
in our application
things were going wrong.
Sometimes it was an easy
fix but sometimes it wasn't.
Since we were prototyping,
even some of the bigger
structural changes we had
to make to get the
performance back
on track was still fairly easy.
When we integrated it back
into Instruments we use
the time profiler again
to find the hot spots in
our integration points,
and after a few iterations, we
had a version of Instruments 7
which was meeting our
performance goals.
We were thrilled with how
the time profiler got us
through this winter,
that when WWDC 2015 came,
we wanted to create a session
that talks about time profiler
and the problems that it's
really good at solving.
We wanted to share
our experience
when writing the track view.
What we did this year,
we created a demonstration
application on iOS
that looks similar to the
first track view prototypes
X-TIMESTAMP-MAP=MPEGTS:181083,LOCAL:00:00:00.000
that looks similar to the
first track view prototypes
and we set performance
targets for ourselves,
100,000 data points is
what we wanted to graph,
we wanted a perfect 60 frame per
second for panning and zooming.
We wanted to make it work on
a iPad mini 1st generation.
We picked the iPad
mini 1st gen --
you know what I'm
talking about --
we knew it would
work well on all
of the other platforms
especially the later platforms.
Chris will show you
the application,
he'll time profile it, and he's
going to show you the things
that we found when putting
this together for you.
>> KRIS MARKEL: Thank you, Chad.
Here I have the prototype
application of an Xcode,
and I want to call
out a few things.
Our initial implementation we
discovered can't handle even
close to 100,000 data points.
Initially we're working with
10,000 data points to get going.
Another important point, you
should do your time profiling
on release builds, you
want to take advantage
X-TIMESTAMP-MAP=MPEGTS:181083,LOCAL:00:00:00.000
on release builds, you
want to take advantage
of the compiler optimizations
while you're profiling.
I'll go ahead, start profiling
the application and to do
that I'll, from the product
menu, choose profile.
This will build the application,
install it on the iPad and bring
up the Instrument template user.
Here it is, time profiler
is selected for me.
I'll click choose.
Here you will see Instruments
in our new track view
and we'll get back
to that in a minute.
Right now I'll go ahead
and start recording,
click the record button,
we'll show you the app.
I want to emphasize what you see
here, this is not the simulator,
this is QuickTime mirroring that
which is already on the app.
Here's our graph, I'll go ahead
and scroll, scrolling isn't bad,
it is not bad, I'll zoom out
by pinching, it is okay at fist
and then starts to
stutter, stutter,
stutter, that's pretty bad.
Now, finally, I'm going to
just scroll back and forth
X-TIMESTAMP-MAP=MPEGTS:181083,LOCAL:00:00:00.000
Now, finally, I'm going to
just scroll back and forth
with my finger, I'm
actually moving my finger
but the display is not updating,
it is very, very laggy.
That's very poor performance.
Let's look at what's going on.
Let's go back to
Xcode -- Instruments.
We'll stop the profiler
and quickly show you
the new track view.
Here we have the CPU usage.
What this is, this is
the average CPU usage
for specific unit of time.
That time depends on
your current zoom level.
You see the different
parts as I use my app
where it was spending time.
This is scrolling, the zooming
out, this is the scrolling back
and forth while zoomed out.
A nice thing about the new track
view is that I can go ahead
and use this pinch gesture
to zoom in on a piece of data
that I'm interested in.
If you're not using a Trackpad,
you can hold down the option key
and scroll up and down
to zoom in and out.
I want to focus on this
particular piece of data,
the activity right here.
X-TIMESTAMP-MAP=MPEGTS:181083,LOCAL:00:00:00.000
I was scrolling here.
To do that, I'll apply a filter,
that's just a click and drag
and it selecting just those
specific samples so I can focus
on that particular data.
I'll create more
space down here.
Down here this is
our detailed view.
What this is showing
us is how many --
the percentage of time
profiler samples we have inside
of a particular function
or method
and then we have
the symbol name.
Here is our percentage
and here is our symbols.
Now the first thing
you usually do
when you time profile is you
start to expand these out
and looking for kind of a --
comparing the numbers over here
with specific methods
over here, function.
Looking for things that kind
of stand out as, you know,
jump out at you as something
that's worthy of investigating.
Another option, if we go over
to the inspector pane and click
on the extended detail, we
see the heaviest stack trace.
This is for the main thread.
X-TIMESTAMP-MAP=MPEGTS:181083,LOCAL:00:00:00.000
This is for the main thread.
This is where focusing here is
where I'll get the most bang
for my buck when doing
a time profiling trying
to make performance
improvements.
Let's see what's going on.
The main is calling
application main, run loop,
there is core animation
work happening, some --
there is nothing
standing out as something
that seems out of the ordinary.
In fact, this is a really common
occurrence when profiling.
You look at what the
application is doing the most,
well it doesn't look like it
is doing anything special.
There is nothing -- no call
to compute the 40th
Fibonacci number here
in here or something.
However, you know,
looking at this call stack,
this stack trace, there is
something I know what my
application does.
It is a simple prototype
app, what it does,
it builds a path
and it draws a path.
I can actually see in here
that there is a call,
a CG context path.
It is not called by my code
according to the stack trace.
X-TIMESTAMP-MAP=MPEGTS:181083,LOCAL:00:00:00.000
It is not called by my code
according to the stack trace.
It is here, it is taking
up a fair amount of time.
I'll go ahead, click on
that, take a look at that.
If we look back at our call
tree I do see something
interesting here.
What we have, we have
the draw path according
to this call tree is called
by this draw layer
method on UI view.
That's also calling our
drawRect for the graph view.
That's taking a fair
amount of time.
That's one of the
things that the app does.
If I look at the time over here,
context draw path is taking up,
you know, 55% of the samples but
the drawRect, it in very few.
This is interesting,
if I double-click
on the drawRect method it will
take me to the source code.
I see, if you look
to the bottom,
I actually do call draw path
from the drawRect method
but it is not showing
up in any sample.
Everything in here
is an add path.
This is unusual especially
X-TIMESTAMP-MAP=MPEGTS:181083,LOCAL:00:00:00.000
This is unusual especially
since my expectation is my own
drawRect method would take a
while to run.
It is basically half
of what my app does.
Looking at this,
I actually notice
that while drawRect
returns a void
and context draw path is
the last method called
and this is probably a case
of what's called Tail
Call Elimination.
To tell us more about Tail
Call Elimination and how
to verify that, back to Chad.
>> CHAD WOOLF: Okay.
So to explain the situation
that Kris is seeing we have
to understand how the time
profiler sees what's calling
what inside of your application.
This is going to get technical,
I'll walk through
it step by step.
On the left, we have the code
for drawRect and on the right,
we have what you would imagine
the stack to look like for
that thread, right before the
UIKit calls into the drawRect.
When that call is made
to the drawRect it will do what
most functions and methods do,
X-TIMESTAMP-MAP=MPEGTS:181083,LOCAL:00:00:00.000
to the drawRect it will do what
most functions and methods do,
to establish their
own call frame.
That starts off by first
pushing the return address
from the link register
and the previous value
of the frame Pointer
on the stack.
Now drawRect knows how
to return to its caller
and restore the frame Pointer.
Now the next thing that happens,
we take the frame Pointer
set up to the new base.
Then drawRect will make
room for its local variables
and the compiler scratch
space and that's --
now we have a frame
for drawRect.
Now the code starts to run
and we come down to draw path
and draw path does
the same thing.
It pushes the own
frame on to the stack.
The way time profiler works,
it uses a service in the kernel
that will sample what
the CPUs are doing
at about 1000x per second.
In this case, if
we take a sample,
we see that we're running
inside of context draw path.
Then the kernal looks at the
frame Pointer register to see
where the base of that
function's frame is
X-TIMESTAMP-MAP=MPEGTS:181083,LOCAL:00:00:00.000
where the base of that
function's frame is
and find the return
address of who called it.
Now we can see that drawRect
called into draw path.
If we want to see who called
into drawRect we can use the
frame Pointers that were pushed
on the stack to find
the base of drawRect
and then continuously go
back through the stack
until we hit the bottom.
This is a backtrace.
If we take enough of
these and put them
in the call tree view you
can get a pretty good picture
of what's going on inside
of your application.
I want to point out,
the frame Pointers
on the stack are
absolutely required.
If you're compiling code
with fomit-frame-pointer turn
that off to try to do the
time type of profiling
that we're doing here.
Let's look at the
optimize cases.
This is where drawRect
was compiled
with compiler optimizations
enabled.
Same situation, we
have a drawRect frame.
We're about to call
into draw path.
You notice when draw path
returns drawRect is finished.
Doesn't need to do anything.
It is going to return.
X-TIMESTAMP-MAP=MPEGTS:181083,LOCAL:00:00:00.000
It is going to return.
The way it returns, it is
it pops the stack frame,
restores the previous
value of the frame Pointer
and jumps back to the caller.
The compiler looks at
this and says well,
why does draw path need anything
from drawRect's stack frame.
It doesn't.
Furthermore, why come
in to drawRect --
back to drawRect at
all when it is going
to return to its caller?
What it does, it rearranges
the code like this.
It is going to pop the stack
frame, restore the frame Pointer
and make a direct call
back into draw path meaning
that we don't need to
jump back to the caller.
That's harder to explain
than it is to see.
Let's imagine what it would look
like when running this code.
We'll pop the stack frame off to
get rid of the local variables.
We'll restore the frame
Pointer to the original value
and the value of
the link register.
Then we'll jump to the beginning
of the code for draw path.
Draw path is going to
push its own frame back
on to the stack using
the value that it found
X-TIMESTAMP-MAP=MPEGTS:181083,LOCAL:00:00:00.000
on to the stack using
the value that it found
in the link register
and the frame Pointer.
From draw path's perspective
it was called directly
from draw layer in
context from UIKit.
If we take a time
sample at this point,
we'll see the exact same story.
Even though that's not
the actual call sequence
that happened, this is
what the time profiler saw.
That's what we ended up
seeing in our call trees.
This is called Tail Call
Elimination, it is common
in highly optimized code
and has some benefits.
It saves stack memory.
In the process of saving stack
memory, it keeps the caches hot,
that reuses the memory,
the caches and data.
It has a profound
effect on recursive code,
especially tail call recursive
code, where a function
or method calls itself as the
last thing and then returns.
Without pushing those frames
a Tail Call Elimination inside
of a recursive function can
make the performance as good
as its iterative version, so
you don't get the stack growth
X-TIMESTAMP-MAP=MPEGTS:181083,LOCAL:00:00:00.000
as its iterative version, so
you don't get the stack growth
and you get the high
performance as well.
This optimization leave on
with highly recursive code.
If you want to turn it off
for the sake of profiling
to show a cleaner stack trace
you can turn it off by going
in the build settings
of the project
and setting the compiler flag,
the CFLAGS to
FNO-optimize-sibling-calls,
turning off the optimization,
and unfortunately the
performance gains with it,
but you'll get a better
result in the time profiler.
If you choose to live with
it, and you want to identify
if a tail call is happening,
then what you can do,
you can look at the disassembly
and call sight of the last call.
If it is a normal call, it
is going to use a branch
and link family of instructions,
that's the first example there.
What that does, it jumps
to the new function
and saves the return value
in the link register.
If it is a tail call case and
we're going to jump directly
into the new function, it will
be a direct branch instruction.
Without the BL.
X-TIMESTAMP-MAP=MPEGTS:181083,LOCAL:00:00:00.000
Without the BL.
That could be a call instruction
for a branch and link
and the branch would
be a jump instruction,
if you see that,
it looks similar.
Now it is up to Kris if he wants
to disable the optimization
and recompile, or he can
carry on, it is your choice.
>> KRIS MARKEL: I'll
look at the disassembly.
In Instruments, upper right-hand
corner of the detailed view,
there's a button, view
disassembly, if I click that,
I see the disassembly
for the method
and we can confirm the call to
context add path is a branch
and link, the call
to context draw path,
it is just a simple branch.
I'm confident that this is a
case of Tail Call Elimination
that 55% that I saw on the call
tree that was not attributed
to my drawRect should be
attributed to the drawRect.
That is good news.
I know now my drawRect is
on my heaviest deck frame,
my heaviest stack trace
and it is consuming
55 to 60% of my time.
X-TIMESTAMP-MAP=MPEGTS:181083,LOCAL:00:00:00.000
and it is consuming
55 to 60% of my time.
This is great.
I know where to optimize.
I optimize drawRect,
I'm good to go.
Let's look at this drawRect.
Looking that the drawRect, if
I had a table I would flip it.
There is not much
to optimize here.
It is hard to think
of a simpler drawRect
that actually is functional.
We have 4 function calls,
context, you know, CG calls,
this drawRect really
does not do much.
It turns out that this is
actually a really common
occurrence when doing profiling.
You will take a look at
your hot spots and code,
there is not much you can
do in your code directly
to improve your performance.
You know, this junction,
what do you do?
You know, other than flipping
tables, crying into your pillow
at night, what we did was we
went through, started to look
at the core graphics
documentation and other drawing,
X-TIMESTAMP-MAP=MPEGTS:181083,LOCAL:00:00:00.000
at the core graphics
documentation and other drawing,
you know, Cocoa drawing
documentation.
We came across this
particular property here.
This drawsAsynchrously.
Lo and behold, there was a
make my code faster button
that was created by
an Apple engineer.
This is excellent.
Above that, you see I copied
out of the documentation,
pasted it in there.
It says a couple of things
that are interesting.
It says, first of all, it may
improve performance, it may not.
You should always measure.
You know, okay, dad.
Let's measure.
Let's see if this
improves in performance.
This time to start the
Instruments, I'm just going
to do command-I for Instruments.
It will do the same thing.
It will build the application
and install on the device,
bring up the template chooser.
It takes a moment.
Two moments.
Three moments.
Here we go.
X-TIMESTAMP-MAP=MPEGTS:181083,LOCAL:00:00:00.000
Here we go.
Another shortcut I like
to use, if you take a look
at the choose button down here.
If I hold down the option
button it changes to profile.
What it means, when
I click this,
the application is going
to start recording.
It saves me a step or two.
I'll do that now.
Now the time profiler
will come up.
This is measuring the app.
I'll do some quick
scrolling back
and forth, capturing some data.
I think that's enough.
Let's go ahead, stop
the recording.
I'm going to filter to
specifically the scrolling data.
If we go ahead down here
looking at the detailed view.
This is promising.
I'm actually -- you can see,
there is multiple threads here.
The threads are doing work.
That's really good.
If we go ahead, if
I hold down option,
click the disclosure triangle
I can see what the thread is
calling, there is
some dispatch calls,
some CG calls, that's good.
That's the drawing code.
We'll go ahead, check
the other one to see.
Hold down the option key.
Dispatch, CG calls.
X-TIMESTAMP-MAP=MPEGTS:181083,LOCAL:00:00:00.000
Dispatch, CG calls.
This is good.
This is looking promising.
I'm multithreaded, in
theory my app is faster.
However, multithreading does
not necessarily mean faster.
We should confirm this is
actually doing anything for us.
One way to do that, I happen to
know this device is two CPUs,
if the CPUs operate in parallel
at max capacity I should
see a 200% CPU usage
in my graph up here.
I'm not seeing anything
over 100%, that's a bit
of a warning sign, it
doesn't necessarily mean
that they're not both doing
work at the same time.
It means that I need
to check further.
How do we check further?
Instruments has what we call
strategies, it is different ways
of partitioning the
data to look at them.
There is three of them.
The first one is the Instrument
strategies, the default,
we're looking at it here.
The second one is
the CPU strategy,
it shows the data per
CPU or CPU relevant data
and the final one is
the thread strategy.
X-TIMESTAMP-MAP=MPEGTS:181083,LOCAL:00:00:00.000
and the final one is
the thread strategy.
It shows you details on
what each thread is doing.
Let's look at the CPU strategy.
We can see we have
each of the CPUs,
we can see how much
work it is doing.
At the bottom, we see
the combined usage.
A nice thing to do here, when I
zoom in far enough, the details,
what the graph shows me,
it will actually change.
Rather than show making the
average usage, it will show
if the CPU is active or not,
it will go from average
usage, to either on or off.
Now each CPU shows an
on state or off state,
whether or not it is working.
What you notice here, the
CPUs are never working
at the same time, there
is no parallelism here.
You know, this is not good.
If we want to feel worse, we
look at the thread strategy.
What this is showing
us, each icon,
it represents a sample
the time profiler took,
X-TIMESTAMP-MAP=MPEGTS:181083,LOCAL:00:00:00.000
it represents a sample
the time profiler took,
you can click on it.
See the call stack.
Here this is on a
background thread
and you see the core graphics
calls, here is the main thread,
you see the main -- the work
we're doing on the main thread.
As you see, if I zoom
in to the right level,
it is probably here,
I scroll around,
you see there is
not really anywhere
where two threads are
working at the same time.
It is jumping from
one thread to another.
So the drawsAsynchronously, this
has not really done anything
for us, in theory, it
may have slowed us down.
Now not only we doing the same
drawing work but also managing,
you know, core graphics system
managed the threads it is
working on, that sort of thing.
That didn't really help.
I'll turn it off.
I'll flip another table I guess.
It is not clear, the
magic button didn't help.
X-TIMESTAMP-MAP=MPEGTS:181083,LOCAL:00:00:00.000
What do we do now?
This is a really
common occurrence again
in time profiling.
You try lots of things,
most don't work.
We step back.
What does the app do?
It builds a path
and draws a path.
We have seen the draw path code.
Let's think about
the build path code.
That's right in here.
What we wanted to do, we wanted
to investigate the actual
path we're building.
What this code does, it
loops the data elements,
creating a path and
adds a line to that path
for each data element.
We want to know how many lines
we're adding to the path.
This is something that the
time profiler can't tell us.
It can't tell you how long,
or how many times
a particular method
or function has been called.
It doesn't know the difference
between a slow function
that's called a few times
or a fast function
that's called a lot.
In this case we resorted to
NSLog and we just have a thing,
every time we add a path
we increment our counter
X-TIMESTAMP-MAP=MPEGTS:181083,LOCAL:00:00:00.000
every time we add a path
we increment our counter
and then we log it when
we're done with the loop.
Something important
to point out, NSLog,
it is not a very fast function.
You don't want it in
high performance code.
You probably don't want to
use it for anything other
than gathering diagnostic
information or debugging.
When you're done,
delete it from the code.
In this case we just comment
it out so you can see it.
What we have found, we're
adding 10,000 lines to the point
in cases where we did
not need to do that.
In fact, there is no way
to display 10,000
lines on this device.
Especially when you're
zoomed out far enough
that all the data fits
within 100 screen points.
There is no reason to draw
10,000 lines in there.
We need to draw 100 lines.
It is a lot less work.
We went ahead, we created an
implementation that did that.
If multiple data
elements, data points are
within a single screen
point it just finds the max
X-TIMESTAMP-MAP=MPEGTS:181083,LOCAL:00:00:00.000
within a single screen
point it just finds the max
and draws a single line.
If we're using 100 screen points
we're creating 100 screen lines.
We'll go ahead and switch
to that implementation.
. . We're feeling
good about that.
We'll change the element
count up to the goal
of 100,000 rather than 10,000.
We'll see if that
helped us at all.
I'll use command-I to
start up the Instruments.
Since Instruments is already
open, it is just going
to bring it to the foreground
and start recording immediately.
Here we go.
A new recording.
We'll go ahead and scroll,
the scrolling seems fine.
I'll zoom out.
Zooming performance
is much, much better.
It takes longer because I
have more data to zoom out.
It is actually looking
pretty good.
I'm going to do the
swiping back and forth.
It is tracking my
finger really well now.
It is actually keeping up with
it, doing a fantastic job.
X-TIMESTAMP-MAP=MPEGTS:181083,LOCAL:00:00:00.000
It is actually keeping up with
it, doing a fantastic job.
Hooray! All done!
Except not quite.
If we look at the actual amount
of CPU we're using while
scrolling back and forth,
we can see, you know,
sometimes it is down to 60%,
it is usually in the 70s or 80s.
Technically we're meeting
our performance goals.
What are we doing --
what's the next thing to do
with this app or prototype?
We'll add additional features.
We know that we need more
headroom than what we have here.
How do we make it faster,
how do we make the app better
and meet performance goals?
We'll focus on this.
We'll filter to that data.
I'll give myself a little room.
In this case, I'm going to
hold down the option key
and click main and
expand this out.
I can actually go down here
and I can see this method here.
X-TIMESTAMP-MAP=MPEGTS:181083,LOCAL:00:00:00.000
I can actually go down here
and I can see this method here.
You know, now that the drawing
of the paths is fast enough,
it is the building of the path
that becomes the bottleneck.
I want to focus on this method.
I will click the focus button.
That just moves aside
everything outside of the method
and normalizes our percentages
to within this method.
This method is spending 55% of
time in the get next element
and 10, 11% of time
in objc msgSend.
Something that I
know, objc msgSend,
it is a super fast method.
It is super optimized.
But, if I can get that
10% back, I want it.
If we look inside of our code
here we can see it is actually
very clear.
Most of our time is spent
on getting the next element.
X-TIMESTAMP-MAP=MPEGTS:181083,LOCAL:00:00:00.000
Most of our time is spent
on getting the next element.
This percentage here, it is a
bit higher than in the tree view
because it includes
the objc msgSend time.
If I get rid of that and
make this iterator faster,
I hopefully can get the
performance boost that I want.
To give us ideas on how much
to do that, it is back to Chad.
>> CHAD WOOLF: Let's talk
about objc msgSend a bit.
It implicitly gets inserted
by the compiler whenever you
use the square bracket notation
or whenever you use
the dot notation
to access a property
on an object.
Its purpose is to look up
the method implementation
for the selector and
invoke that method.
That's a long way of saying
that's how we do dynamic
dispatch in Objective-C.
Objc msgSend is extremely fast
and does not push a stack frame.
When you look at
your time profiles,
X-TIMESTAMP-MAP=MPEGTS:181083,LOCAL:00:00:00.000
When you look at
your time profiles,
you typically won't
see its effect.
The times that you do see it
would be a perfect example
like we have where we
see it in our iterator.
What we're doing, we're
iterating over 100,000 points
and calling it the get next
method with a small method body.
Just increments a couple
values and returns a structure.
What's happening, all
of that overhead time
from the Objective-C
message send is accumulating
into something that's
measurable.
Is there a way to get
around that overhead?
Not exactly.
Objective-C by design
is a dynamic language,
you have to make the
objc msgSend call
when accessing methods
for objects and classes.
It does this every time
because you can switch
out method implementations
at runtime.
There is no compile time in
Objective-C saying I want
to call in this particular
method body.
The only exception here, if you
do what's called method caching,
X-TIMESTAMP-MAP=MPEGTS:181083,LOCAL:00:00:00.000
The only exception here, if you
do what's called method caching,
where you look up the method
implementation yourself
and you call it through
its function Pointer.
In general I don't
recommend that you do that.
It is fragile as
you can imagine.
In general, in my experience
it hasn't given me the kind
of performance wins I'm
expecting because you got
to think about why we're
here in the first place.
The reason we're here, the
get next element method has
that small method body.
Even if you invoke it through
the function Pointer you still
have to have to marshal the
arguments and push the frames
on to the stack and pop
them when you're done.
That's exactly what you saw
in the previous set of slides,
that can be a lot of overhead
with an increment
and then return.
I want to make sure I point it
out that method caching is not
as fast as inlining, what we
really want in this case is
that small method
body to be inlined.
How do you that in Objective-C?
Well, you have alternatives.
The first one, you
could have used C
X-TIMESTAMP-MAP=MPEGTS:181083,LOCAL:00:00:00.000
The first one, you
could have used C
and you could have
used structures instead
of an iterator, you
could pass a C ray
into the method for example.
If you want that OO
flavor you can use C++.
The way you use C++
in Objective-C,
is you rename the file
from a .m to a .mm
and then you can use C++ syntax.
Because Arc is usually
on by default,
then you take Objective-C
objects and put them
in STL containers, and put
them in instance variables
on your classes and structures.
This is handy and you get the
performance benefits of C++
and I have used it extensively
in Instruments to get
as much speed as I could
out of the track view
and other critical
areas of Instruments.
I know from firsthand experience
also there is a major downside
to this.
That is that you have to know
ahead of time which parts
of your code are going
to benefit from C++
and which codes will
benefit from Objective-C?
X-TIMESTAMP-MAP=MPEGTS:181083,LOCAL:00:00:00.000
Sometimes, until
you're doing profiling,
you can make a mistake there
as we did in the demo case.
We wrote our iterator
in Objective-C not
realizing it would show
up in our time profiles.
Is there a better alternative
than what I just mentioned here?
There is. You knew
it was coming.
Swift is ideal, because unlike
Objective-C it is only dynamic
when it notes to be.
If you make sure that the
performance critical classes are
internal and you use
whole module optimization,
the compiler or the whole
tool chain can determine
when there's only one
method implementation
and inlines it right into
the call site giving you some
significant wins especially
on the iterator case.
Because we're prototyping,
rewriting the iterator
in the View Controller
in Swift is easy.
X-TIMESTAMP-MAP=MPEGTS:181083,LOCAL:00:00:00.000
in the View Controller
in Swift is easy.
Kris has done that.
>> KRIS MARKEL: I have a Swift
implementation ready to go,
this is an easy port of the
Objective-C implementation
with a couple of the
suggestions they made
in this morning's session
about improving Swift code,
specifically turning on
whole module optimization.
Let's profile this.
Command-I.
It will build.
Install to the device.
It should just start profiling.
Okay. I'm going to bring
the application forward
so you can see.
Here is the scrolling.
Looking good.
Zooming out.
There you go.
Zooming out.
Staying nice and fast.
A lot of data to zoom out of.
Now, if I go ahead, move back
and forth, it moves really fast.
X-TIMESTAMP-MAP=MPEGTS:181083,LOCAL:00:00:00.000
Now, if I go ahead, move back
and forth, it moves really fast.
[Applause]
>> KRIS MARKEL: It is very nice.
Thanks. Actually we can
go ahead in here and look
at what the CPU usage is.
You know, we've got,
actually further improvement
than what we expected, we're
expecting 5, 6% improvement
from removing the objc msgSend,
this is lower and I can actually
if I turn down this disclosure
triangle you see the two runs
next to each other and you see
the previous run in the lower --
the current run, it
is clearly much lower.
Actually if I go
ahead, I go in here,
I look for my build path method
I have to search for it now, oh,
that's not how you do a search.
If I hit command-F, it will
bring up this dialogue here.
I can type in build path and
it shows me my method here.
If we look at this, you
see here is my Swift code.
X-TIMESTAMP-MAP=MPEGTS:181083,LOCAL:00:00:00.000
If we look at this, you
see here is my Swift code.
My get next call
which is right here,
it isn't showing
up in any samples.
You know, no samples
landed on this.
Why? It is because Swift
was able to inline it,
whip means that there's
no function overhead,
no method call overhead
whatsoever.
Because the code for the
iterator is inline with the rest
of the code it makes further
improvements explaining the
better performance
than what we hoped
for by skipping the
dynamic dispatch.
Chad, anything else to
tell these nice people?
>> CHAD WOOLF: We
have 5 minutes!
Of course I do.
Some tips for you to explore
Instruments on your own.
The first one to point out,
under the recording settings,
it's called record
waiting threads.
I mentioned that the service
and kernel we use
samples the active CPUs
X-TIMESTAMP-MAP=MPEGTS:181083,LOCAL:00:00:00.000
and kernel we use
samples the active CPUs
but if you have threads sitting
around, blocked on a lock
or waiting for I/O, you
check this checkbox,
and the service will sample
the idle threads as well.
If you have code
that's contending
over a lock you see
the hot pods show
up when you enable
record waiting threads.
Another thing that
I find interesting,
in the display settings,
in the call tree section,
it is invert call tree.
Figuratively what it does, it
flips the call tree upside down.
Instead of seeing the leafs at
the bottom nodes of the tree,
that's functions that
don't call into anything,
they appear at the top.
If a utility function is
being called from 5, 6 places,
you invert that call tree to
see who is actually calling
into that particular function.
It gives you a different
perspective
on the data in the call tree.
When you right click a
node in the call tree,
you get a context menu and
there is interesting stuff
under there as well.
One thing that I use from time
to time is the charge to caller,
X-TIMESTAMP-MAP=MPEGTS:181083,LOCAL:00:00:00.000
One thing that I use from time
to time is the charge to caller,
so what you can do, you
can charge a function
on method to the caller.
You can charge an entire Library
of Framework to the caller.
There's an option in there
as well to prune a subtree.
You don't want to work on a
specific problem at the time,
you can prune that out
of the data and focus
on what you want to focus on.
What did we learn?
Through all of this,
the first things I want
to remind you about,
is to incorporate
performance targets early.
If you're doing a big
performance rewrite
like we were, start with a
budget and keep monitoring it
because once you start layering
a lot of code on top of that,
it is harder to change.
Secondly, always measure.
Through all
of our demonstrations we
were taking time profiles
with the time profiler, we used
that data to find the hot spots
and retuning it,
by the end we had a
well-performing application.
If you go in blind, depends,
you may be lucky, hit it.
X-TIMESTAMP-MAP=MPEGTS:181083,LOCAL:00:00:00.000
If you go in blind, depends,
you may be lucky, hit it.
I would start with a measurement
and go after that first.
In the third one, this
is most important to me,
I want to encourage
you to keep digging.
Some of these performance
problems you look
at at first may have
seemed impassable,
you say that's happening
in someone else's code
or a side effect of the runtime.
The reason we gave you the
details on the time profiler,
the runtimer, what the
disassembly view looks like,
is we wanted to show you
there is a whole world
out there of more details.
By using that, you can solve
the performance problems
that we were solving today.
I encourage you to keep
digging, and with creativity,
going out there into
sessions like this,
I know you guys will
be able to fix
and hit the performance
targets you want.
That just about does
it for today.
Stefan Lesser is our
dev tools evangelist,
contact him if you
have any questions.
We related sessions, Energy
debugging issues, turns out that
X-TIMESTAMP-MAP=MPEGTS:181083,LOCAL:00:00:00.000
We related sessions, Energy
debugging issues, turns out that
if you have efficient code on
the CPU it uses less energy.
There was that session
on Wednesday.
Also tomorrow, there is
performance on iOS and Watch OS,
good news, the time profiler
can target apps on the watch.
That's a big boon.
Tomorrow we'll be in
labs from 9:00 to 2:00.
Enjoy the rest of
your conference.
[Applause]