WWDC2003 Session 305

Transcript

Kind: captions
Language: en
thank you so let's get the clicker
working it's just cool just works when I
was last up here okay let's talk about
what we're going to cover in this
session first I'm going to talk a bit
about some performance analysis concepts
so just you know as you're going through
the process of thinking about making
your applications fast what are some of
the things you want to consider in the
process of that then we'll take a look
at some specific examples of uses of two
different classes of Apple's performance
applications on specific test cases so
we've got some high-level tools Sampler
MallocDebug things like that and then
the CHUD tools that have been talked
about some earlier in this conference
we'll take a look at both of those in
this session and where they're
applicable and also a little bit about
the integration with Xcode with these
now one thing that this session is not
really going to cover in detail is oh
you should not use this API and you
should use that API and you'll use this
one here and there's a lot of other
sessions like the carbon performance
session which I think was Friday morning
if I recall where you can get some
details on that and there's a lot of
great performance documentation that can
cover a lot of those details as well now
this is going to focus on the tools here
but first let's talk about motivation
why worry about performance you know
it's a selling point we see it across
the board it's a selling point for apple
with the hardware it's a selling point
with your application if you've got
competitors that do similar things to
your product and they're a lot faster
than you you know that's that's a big
competitive advantage for them and vice
versa performance problems may go
unnoticed I've seen a couple examples of
this in recent days where you know you
look at the system it's just sitting
there idle with a window up and maybe an
inspector up and then you look at the
CPU usage and notice that ninety-five
percent of CPU is being used but it's
sitting there idle what is going on
sometimes you actually have to look for
these things and detect them or there's
issues of scalability that you don't see
when you're working with it with your
unit tests but when you send it into the
field and people are really throwing
lots of data at it then problems occur
remember you know unlike with Mac OS 9
where you know you had control of that
at a certain point you are not the
only app on the system there's
potentially a lot of other things going
on system daemons in the background etc
you know it's not nice to nice your
process up to try to fool the UNIX
scheduling and things like that yeah you
really want to make sure that you play
well with the other applications on the
system and finally you know you want to
start thinking about performance from
the get go with your application it can
be really hard to finish up your whole
development cycle and then say it's
too slow and then try to come back in
and graft performance on at the end you
know you might want to wait and do
tuning later on I mean don't
obfuscate your code from the beginning
with no purpose but think about
performance issues as you go along so
let's talk about a process for how you
might go about doing this one is within
your application take a look at what are
your users going to be doing the majority
of the time what are the common use
cases that you really need to have
really fast so the simple one for
example is launch if you have to take a
long time to open your application the
first time around the users might notice
that and say hmm this looks like a slow
application but there's you know a lot
of other examples as well of the common
cases and then what you want to do is
really define some goals for what your
performance goals for specific cases are
let's say you're starting up some
operation and it might take a bit of
time maybe the responsiveness is a big
issue that how fast can the user get
control again you may not be done with the
operation but at least it lets the user get
back in and proceed with other things in
the
for specific jobs and processing what's
your throughput how fast can you finish
certain things and then again
scalability okay it works this well with
a thousand objects what happens when
we take 10,000 or a hundred thousand
objects or transactions through this
system I don't have any N squared or
worse algorithms in here do I now then
you want to establish some specific
benchmarks here and what I mean by
specific is you know define precisely
what hardware you're looking at you know
what's the common hardware that your
users are going to be using is it going
to be an 800 megahertz PowerBook or a
dual 1.4 G4 or maybe both or a 2
gigahertz G5 then define specifically
what you're measuring to give an example
we've been looking a lot at compile
time lately with the compiler and we
look at single file turnaround time well
we have to kind of define pretty
precisely what file are we looking at
because different sized files compile in
different periods of time and if we
measure a different one each week we'll
get random results essentially now once
you've defined your benchmarks then you
want to add some instrumentation in
maybe specific API calls at the start and the
end so that every time around it's real
easy to collect the statistics of how
long did these operations take you want
to be able to measure this on a precise
basis time after time and that's the key
to throughout your development process
tracking results as you go along people
ask you know how did Safari get so fast
how did they do this they tracked
performance throughout their development
it was a key issue from the beginning
and they never allowed performance
regressions to get into their code I'm
sorry that makes it slower I mean each
engineer was required to run the
performance analysis test suite in
Safari before they could check their
code in and if they'd made it slower they
weren't allowed to check it in it's a
great feature but I'm sorry performance
is the number one feature so it doesn't
get in unless you make it
faster and finally once you've gone
through all this effort then out pop
the hot spots and then you can go in and
start to tune these where it really
makes a difference because you know we've
probably all observed that we can tend
to be notoriously bad at guessing in
advance where our performance problems
are going to be gee it was fun to spend a
week optimizing that particular routine
but it made no difference so when I say
Safari it was really a part of the
process they actually embedded into
their application and into the
development versions internally
instrumentation in the form of a panel
here that their engineers and their QA
staff and managers could pop up and run
tests at any point through the
development process it really made it an
integral part of what they were doing
because it was so important to them they
could do things like check for memory
leaks and sample directly from here so
you might want to consider adding that
kind of thing to your application so
when we talk about benchmarking what
kinds of things might you want to look
at there's a wide variety of factors
that play a major role in performance on
Mac OS X you've probably heard us talk
time and time again about memory use so
you have a limited amount of RAM space
on the system and once we eat through
that then we're starting to page out to
the disk and that's a lot slower so if
you're using a lot of static memory or
leaking memory that can be a big problem
so measure that maybe you're not
actually using that much more static
memory over time but your dynamic memory
how much you used during this
particular operation really spiked up
and that can cause problems so that's an
area for measurement CPU use now I
mentioned the launch time there's other
things that are you know fairly obvious
gee this is one of the major operations
of my system how long does that take if
it's a fast operation maybe you want to
scale it up and run it 10,000 times and
measure how long does it take to do
10,000 runs
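A minimal sketch of the kind of instrumentation described here, written in Python rather than against any Apple API. The names `time_repeated`, `is_regression`, and `sample_operation` are hypothetical, and the 10 percent tolerance is an assumption, not a figure from the session. It times a fast operation over many runs and compares the result with a stored baseline so a slowdown can be flagged.

```python
import time


def time_repeated(operation, runs=10_000):
    """Return total wall-clock seconds spent calling operation() `runs` times."""
    start = time.perf_counter()
    for _ in range(runs):
        operation()
    return time.perf_counter() - start


def is_regression(measured, baseline, tolerance=0.10):
    """True if the measured time exceeds the baseline by more than `tolerance`."""
    return measured > baseline * (1.0 + tolerance)


def sample_operation():
    # Hypothetical stand-in for one of the major operations of an application.
    return sorted(range(100, 0, -1))


if __name__ == "__main__":
    elapsed = time_repeated(sample_operation)
    print(f"10,000 runs took {elapsed:.4f} s")
```

Measuring the same operation on the same hardware each time is what makes the baseline comparison meaningful.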
again idle time you're not the only app
on the system if you're not doing
anything in your app it shouldn't be
taking any time and then the spinning
watch cursor the spinning rainbow
cursor that shouldn't ever come up on
our system and well it does occasionally
in my apps and maybe in yours but let's
go fix that we'll show you some ways to
tackle that drawing it might not be
obvious but sometimes you're drawing too
many times to the screen we've got some
great tools to take a look at that now
that we're doing live resizing of
windows or live resizing of split views
are you getting smooth resizing during
that you know there's a variety of
things to consider for benchmarking
so once you've identified your
benchmarks then you need some tools to
take a look at the issues so we've got a
variety of tools on the system for both
monitoring what's going on and then
for getting in and saying okay I see I've got
problems with CPU usage where is the
time actually being spent and so we can
look at memory use CPU behavior and
resource usage like file systems and
system calls and drawing so we'll
cover a lot of these tools as we go
through one thing to bear in mind as we
think about performance is that there's
actually a lot of different levels of
performance in the system that can make
a huge impact on your overall
application so let's think about layers
of design abstraction your application
architecture if you're a multi-threaded
application do you get deadlocks between
your threads we have tools like Thread
Viewer to take a look at that maybe
you're multi-process and you're getting
network hangs if you've got a complex
object oriented architecture are you
sending too many messages between the
various objects or maybe one object is
acting as the bottleneck for everything
a god object that everything has to go
through these are sort of architecture
level issues that you might want to
consider from the beginning then within
a specific
module such as a class you can think
about things like your data structures
or algorithms are you allocating too
much memory here in this process or is
the algorithm itself a poor algorithm
for scaling up what's the interaction
with the OS again the documentation
covers a lot of things like this call
in Carbon to enumerate the directory
structure is slow you might want to
consider using this instead then
bottleneck routines once you've isolated
it down okay we seem to be spending a
lot of time in this routine so on the
right of this diagram here we show that
we've got a number of high-level tools
that you can look at some of the higher
levels of the design abstraction once
you get down to things like the
interaction with the OS and bottleneck
routines then the Sampler profiling tool
and the Shark tool from the CHUD package
that we'll talk about later start to
kind of overlap in their capabilities
they both let you do profiling and look
at things in somewhat different ways so
both can be helpful when you really get
down to trying to optimize the use of
your processor and memory Shark is a great
tool for that plus other CHUD tools and
then Activity Monitor lets you take a
look through everything as well so we'll
be taking a look at a number of new
features on the system on the user CD
there's a new Activity Monitor
application that replaces the CPU
Monitor and Process Viewer and things
like this really nice application that
the Core OS team did Spin Control a new
application to see what's going on when
the watch cursor's spinning I'll
take a look at the integration of the
tools with Xcode there's a number of new
features in Sampler that we'll take a
look at and then with CHUD where you can
really get in and see what's going on
now with the G5 in addition to the G4
things like that so with that let me go
ahead and turn it over to Robert Bowdidge
performance engineer for looking at
some of the specific tools
okay what's the first thing we need in
order to actually demonstrate the
performance tools answer we need a victim
and the victim we've chosen this year is
the Sketch application this is a small
Cocoa application that's available on
the developer tools CD so for those of
you who've seen us using a Carbon app
through all the Xcode demos today this
allows you to realize that the tools
actually do work on Cocoa as well now if
you actually go and look at sketch you
won't see any performance problems this
is a program that's intended to do
simple line drawings you know draw a few
rectangles put some text in maybe do an
org chart but if you look at it you
don't necessarily see any serious
performance problems the guys who wrote
it did a pretty good job of making it a
typical cocoa app with no performance
problems so we need to add some
performance problems and actually
the way that we did it this time was
rather than salting
some bugs in there we decided to try to
increase the scope so rather than trying
to do small drawings we said well let's
imagine our boss comes into our office
and says hey you know that that Sketchup
that's really good I think we could do
architectural software with that and so
suddenly instead of drawing tens of
rectangles five rectangles we're drawing
thousands of rectangles and the question
is what's going to happen are we going
to find any performance problems are we
going to find that our memory use is a
heck of a lot more than we ever expected
are we going to find CPU problems where
we're running too much code and
hopefully this is a situation that many
of you run into in your own code as you
look at applications and find out that
on certain data sets it doesn't quite
behave as you expected so let's take a
look at that so I'd like to bring Christy
Warren up who is the performance engineer
for the text team to actually do a
demonstration for us to actually start
actually let's go on the slide for a sec
thank you so one question is how you
actually find the performance problem
Dave gave us an idea of some of the
processes that you might go through
whether that's looking for regressions
or following a certain pattern of
measuring certain things every time but
sometimes you don't have that sometimes
you start with a new application and
you're not quite sure where to start
looking so the way I like to start and
the way our vice president likes to
start is to use either the command line
tool top which hopefully you've seen in
previous years or thanks to Eric Peyton
and some of the folks on the Core OS
team we now have a new tool called
Activity Monitor which gives us a way to
look at this if we could switch to the
demo machine now thank you okay so we
have Activity Monitor over here on the
side and the way Activity Monitor's
divided up is the information at the
bottom of the screen represents the
system-wide information about your
computer so in this window we're looking
at system memory and one interesting
number here is the page ins and page
outs down at the bottom which represents
the amount of swapping your virtual
memory system is doing how many pages
are being written off to disk the other
things the wedge the numbers here
represent how physical memory is
divided up on your system how much of it's
used for user stuff how much of it's used
for the kernel how much of the memory
is wired down because they're structures
the kernel doesn't dare page out like
the virtual memory system the other tabs
for example CPU gives you an idea about
how how much work the CPU is doing in
general kind of like the CPU Monitor
application does and the other tabs for
disk activity disk usage and so on also
give you summary data the information at
the top gives you details about specific
processes and so we can see Activity
Monitor Sketch and so on and we get
information not only about what's
running but how much CPU usage they're
doing and we can sort this list
according to what's the most cpu
intensive or we can look in terms of
process name or hierarchy in the process
groups so Christy already has Sketch
running and we can double click on that
entry to get a little more detail on
Sketch and the important numbers here are
the
percent CPU as usual and the private
memory size down on the bottom now
private memory size is an
interesting number it represents the
amount of resident private memory
that's being used by this application
that is the memory that's resident in
physical memory and the memory that's
only needed by your app and so this ends
up being a nice number because it
represents sort of the footprint of your
application because that memory is first
of all only based on what your
application's doing and secondly it's
memory you can control it's the memory
being used for the heap or it's the
memory that you're allocating via
vm_allocate and so it gives you a good idea
of what's your fault and how much you
can reduce as opposed to the others
which tend to include a lot of
memory that you can't actually reduce so
we can see here that just having Sketch
up took up about 1.61 megabytes it's
not great not bad that'll do so if
Christy can now load one of our
architectural drawings we have a factory
here okay we're going to build factories
and when we went to our clients the
architect goes to
the customer and says here's your
factory the customer says oh I want six
floors not three okay we can do that we
can select the entire building we can
copy it and we can paste it now we have
six floors I don't know that's not
enough let's make it twice as wide so
let's do it again we'll select all we'll
copy and copy's taking a little while
that's not good and we can paste
and so we're drawing a couple thousand
rectangles here to draw that building
but we're already noticing a couple
issues one was that copy was getting
a little slow and we're going to find
out it actually gets a lot slower as we go
along but the other thing is if we go
over and look at Activity Monitor we
find out that we're actually using 7.6
megabytes of memory okay so 7.6
megabytes minus 1 megabyte or 1.5
megabytes we used about 6 megabytes of
memory to do those two copies and pastes
okay so we've got a performance problem
here we have a problem in what we're
doing in terms of the copy so in terms
of CPU
and we have a memory problem could
we switch back to the slides please oh
another interesting thing about activity
monitor is because it's looking at the
entire system that means that you can
see what's going on in other processes
and one of the things to remember is on
Mac OS X your
application's work on the system is
not just a matter of what your
application is responsible for there are
other processes whether they're little
daemons that are on the side or more
importantly things like the window
server where if you're doing a lot of
drawing now your application may only be
taking up sixty percent of the CPU but
the window server could be taking up
the other forty percent so
when you're looking at Activity Monitor
you also need to look at the whole
system to understand what else your
application may be doing so that you can
find other ways you might be able
to optimize okay so let's attack the
first problem what do we do if memory
use seems a little high well why do we
care why don't we just like use as much
memory as we can this will at least make
the people who sell DIMMs happy well
there's a couple reasons for that
and generally using too much
memory is not a good thing one of the
reasons is your application's slow
because suddenly all the data that you
want the CPU to be processing as fast as
possible especially on one of these g5s
that can really race can't fit in cache
or you'll chase it out and so suddenly
you're having to rely on the speed of
the main memory instead of the cache and
so you want to keep your your
application as memory lean as possible
so that you can have as much as possible
in the cache if you're not using the
memory well then it's sort of wasting
space because it's sitting in
physical memory and maybe you're not
touching it and if I come along and I
start playing iTunes or I start running
iPhoto or I start using Mail or I start
doing Safari which every one of your
customers is also doing when they're
running their app that means that when
Safari needs more memory to put in some
big page some of your pages may have to
get forced out of physical memory and
written off to disk and so the computer
is going to have to do a lot more work
just because you want to keep that
memory around
so you want to keep your memory
footprint low for that reason and
if you've forgotten about the memory
you've allocated and you've forgotten
to get rid of it it's even worse because
you know you can't free it at that point
and it's just going to get copied around
on the disk and because of the virtual
memory system you can actually run into
some rather interesting problems where
you might not have expected things to go
as badly as they did so here's an
example let's imagine we've got some
really large file I don't know 10 megabytes or
100 megabytes and reading it in when we
need it seems a little slow okay well I
know what i'll do i'll just read it in
before i need it so that it's available
i'll read it into memory in that way
when i need that file is right there and
the problem with that is that what
happens if I go off and I run iPhoto
and I run iTunes and I run Mail and
everything else okay those start to need
memory and so some of your pages that
you've brought in get chased out to disk
and then when you actually need that
file or that representation
of the file suddenly it has to be
brought in off of disk again and so in
order to save that disk read that you
did you've now read it into memory written
it out to disk and read it back in
which is bad and really inefficient so
you don't want to do that you want to
try to keep your memory footprint as low
as possible and you want to do that in
terms of both the memory you use and the
memory that you've forgotten about
that you're leaking now there's two
tools you can use to do this one is
called ObjectAlloc and it looks at your
memory use in terms of how many
objects you have and the second one is
called MallocDebug and it
refers to allocations in terms
of where they are so you can see
particular places in your code that tend
to allocate a lot of memory and let's
take a look at the first of those
ObjectAlloc actually let's switch to the demo
screen
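As a rough analogy for looking at allocations by where they happen, here is a small Python sketch using the standard `tracemalloc` module. This is not MallocDebug and does not reproduce its interface; `build_rectangles` and `top_allocation_sites` are hypothetical names, and the 1000-byte buffer per rectangle is an assumption chosen only to make one allocation site stand out.

```python
import tracemalloc


def build_rectangles(count):
    # Hypothetical stand-in for a drawing model: one small buffer per rectangle.
    return [bytearray(1000) for _ in range(count)]


def top_allocation_sites(limit=3):
    """Return the `limit` source lines currently holding the most live memory."""
    snapshot = tracemalloc.take_snapshot()
    # Statistics grouped by file and line number, largest total size first.
    return snapshot.statistics("lineno")[:limit]


if __name__ == "__main__":
    tracemalloc.start()
    rectangles = build_rectangles(2000)
    for stat in top_allocation_sites():
        print(stat)
    tracemalloc.stop()
```

The line inside `build_rectangles` should dominate the report, which is the kind of signal that points at a particular place in the code allocating a lot of memory.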
okay so here I am I'm
running Xcode because
Xcode is really cool and I want to go
and do some performance analysis okay
how do you do that well the first step i
usually do or at least the first step i
always hear from everybody is go hunting
around on just trying to figure out
where the performance tools are actually
who knows where the performance tools
are okay the developer tools are in
developer applications that's nice the
performance tools are there too the
problem is you've got to go hunting
around for them you've got to use the Finder
which was computer centric and not human
centric and that kind of thing and that
wasn't very good I was gonna use another
word but I won't say that and so
we've improved that so now what you
can do is you're going along and you say
I want to look at performance and you
can go up to the Debug menu there's
now an entry called launch using
performance tool and it will
list the
performance tools had I known this
had we known this would make
people happy and in fact if you actually
did install the CHUD tools which sadly
I did not because I wasn't a good person
you'd actually have Shark there too and
I suggest you install Shark so you can
actually see it on that list and so we
can launch ObjectAlloc here and here's
the ObjectAlloc window and let us launch
Sketch in it
and what ObjectAlloc does is it
instruments your code it runs it you answer
a few questions but it keeps track of
how many objects have been created and
it updates that constantly and it shows
not only the current number of objects of
that type that exist but the peak number
that have ever existed during the
lifetime of your program and the total
number you allocated during the
entire program and so we can go to our
little example we can open our factory
and we can see that we're creating huge
numbers of CF strings and all sorts of
other things as part of doing this work
and here's our factory so let's again
do our select all and copy and our
paste and ObjectAlloc is doing its
good work and you can see that things
are updating and let's do that again if
we could now you can notice that to the
far side of the numbers there's a
histogram there's some bars there
indicating how many objects you have
graphically and that's very nice because
that gives you a way to directly
perceive how fast things are changing so
you can see oh my god I'm creating a lot
of these objects really quickly and the
colors actually have meaning because if
it's colored yellow then that implies
that the current number of objects of
that type is only about twenty percent
of what the peak is or less which
implies that you created a whole bunch
of them and backed off which may imply that
maybe you're not autoreleasing things
quickly enough or maybe you're just
creating a huge number but it's
hopefully going to make you look at that
to try to figure out why you had so many
and the red indicates that you have
only ten percent of the peak value
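The current, peak, and total counts and the bar colors described here can be sketched as a small Python class. This is only an illustration of the bookkeeping, not ObjectAlloc's implementation; `AllocationCounter`, its method names, and the "normal" label for bars above the yellow threshold are all assumptions. The 20 percent and 10 percent thresholds are the ones mentioned in the talk.

```python
from collections import defaultdict


class AllocationCounter:
    """Tracks current, peak, and total instance counts per type name."""

    def __init__(self):
        self.current = defaultdict(int)
        self.peak = defaultdict(int)
        self.total = defaultdict(int)

    def allocated(self, type_name):
        self.current[type_name] += 1
        self.total[type_name] += 1
        self.peak[type_name] = max(self.peak[type_name], self.current[type_name])

    def freed(self, type_name):
        self.current[type_name] -= 1

    def bar_color(self, type_name):
        """Red at 10% of peak or less, yellow at 20% or less, else 'normal'."""
        peak = self.peak[type_name]
        if peak == 0:
            return "normal"
        ratio = self.current[type_name] / peak
        if ratio <= 0.10:
            return "red"
        if ratio <= 0.20:
            return "yellow"
        return "normal"
```

A type that was created in bulk and then mostly released shows a low current count against a high peak, which is exactly what the colored bars are meant to draw attention to.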
we've now done our copy and we can go
over and now what we do is
what we should do in all the
performance tools what we're doing is
we're looking at these basically
scrubbing our nose against the data
looking for something that looks
suspicious because you know the
performance tool can't really say oh you
know here this is the problem you know
if you fix this piece of code you'll be
happy you know in general it tends to be
much more of a you look at it and you
say oh gee I didn't expect that why is
that happening and then you go and track
down the bug and what we can see on this
immediately is that the second most
common object after GeneralBlock-14
that is mallocs of size 14
because ObjectAlloc will look at
malloc and CF objects and Objective-C
objects is we can see that we have 4000
NSInvocation objects and
NSInvocation that's not in my code and in
fact you know not only do we have 4000
of those things but if we check counts
are bytes we find out that out of 2.9
megabytes of memory that are used for
all the objects 800k of it is used for
NSInvocations so about twenty-five
percent of my memory is because of these
that's odd well ObjectAlloc gives us a
way to track that down so what we can do
is go over to the instance browser and
we can select NSInvocation we get a
list of all the objects of that type and
if when we launched the application we
had happened to check the little box that
said keep track of retains and releases
for that object we would see all of the
times that we did a retain in
Objective-C and did a release on that
object so that we could find over
retains or we could click on allocation
event as Christy's done here and we can
take a look at the back-trace indicating
exactly where that object was allocated
and we see here that it's allocated in
our select graphic object now this also
shows another feature that's new in the
performance tools that in the past you'd
find some symbol and you wouldn't be
able to track down where it came from
now we take a look at the STABS
information the debugging information in
your binary and if we can find the
location of that function we actually
will highlight in the performance tool
either with a little file icon or by
underlining it and so you can double
click on that and Project Builder and
Xcode will actually show the code for
you
and what we find out is that
the NSInvocation objects are being used
for undo so every time that we
select a rectangle that we copy it that
we paste it for each of those thousands
we end up creating an
NSInvocation object to handle undoing that
at the end ok so we're creating
thousands of these things ok so you know
this is an interesting problem we've got
some solutions here you know we could
just decide that the the undo support in
Cocoa is just such a big win
productivity wise we just don't care
that's a fine answer if we really cared
because this is we're going to be doing
lots of architecture stuff then maybe we
actually want to change this and we want
to create our own undo mechanism or a
third option is we could say why are we
allowing undo for select because the HI
guidelines don't require us to and so
you could actually get rid of that so
this is one of the ways that you can
step through finding something
suspicious and tracking down why
that's happening and tracking it down in
your code to understand what the problem
is and that's how you can use
performance tools so can we go back to
the slide please
there we go okay second question what do we
do when the CPU use seems too high first of
all why do we care again well the answer is if
you're doing something that's taking too
long it's not only making your
application look bad but you're going to
make my iTunes skip and I don't like that
so you need to worry both about your
own application and how it performs and
how you're affecting the rest of the
system because you're not alone there's
lots of other things running on all our
computers so there's a few tools you can
use to track down CPU use one of
those is Sampler our profiler Sampler
can also be used to look at what's
called dynamic memory footprint
as Christy puts it which is a way of
understanding where your calls to malloc
are and using those as suggestions of
where you might be doing too much work
and there's also a tool called Spin
Control that's new on the release that
gives you a way to automatically sample
when the spinning cursor
comes up and I'm going to show all three
of these and for MallocDebug look at it on
your own so with Sampler Sampler is a
statistical profiler technically and
what that means is that every 10
milliseconds 5 milliseconds 20
milliseconds Sampler stops your program
and says hey what's going on and it goes
and it gets a back trace from every
thread that's running and looks to see
what work is going on so it gathers a
back trace and then it lets the
application run for a little while
longer and keeps doing that and then at
the end of the sampling it gathers
all those back traces together and
smashes them together into a tree so
that you understand the range of ways
your application's behaving and then it
presents it in a graphical way
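The step of smashing the back traces together into a tree can be sketched in a few lines of Python. This is an illustration of the idea only, not Sampler's code; `CallNode`, `build_call_tree`, and `format_tree` are hypothetical names, and multiplying sample counts by a fixed 10 millisecond interval is an assumed approximation of time.

```python
class CallNode:
    """One function in the merged call tree, with the number of samples seen."""

    def __init__(self, name):
        self.name = name
        self.samples = 0
        self.children = {}


def build_call_tree(backtraces):
    """Merge backtraces (outermost frame first) into one tree with sample counts."""
    root = CallNode("<root>")
    for trace in backtraces:
        root.samples += 1
        node = root
        for frame in trace:
            # Reuse the child for this frame if it exists, else create it.
            node = node.children.setdefault(frame, CallNode(frame))
            node.samples += 1
    return root


def format_tree(node, interval_ms=10, depth=0):
    """Return indented lines, heaviest children first, as approximate time."""
    lines = []
    for child in sorted(node.children.values(), key=lambda c: -c.samples):
        lines.append(f"{'  ' * depth}{child.samples * interval_ms} ms  {child.name}")
        lines.extend(format_tree(child, interval_ms, depth + 1))
    return lines
```

Each function's count is the number of samples in which it appeared on the stack, so a function shows up in proportion to the time spent in it or in anything it called.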
there's a couple things you need to
remember about Sampler because it's
statistical because it's only stopping
the program at times it doesn't know
what happened in between and so that
means that it may not catch all the
functions though any function should
appear in in the samples in proportion
to the amount of time that it's actually
spending running now if sampling every
10 milliseconds isn't good enough for
you if you need finer resolution then
you should try using the performance
tool Shark which you'll see after this
and if you need to know about every
single call then you might want to
consider actually using gprof which is
a standard UNIX profiler which requires
you to actually recompile your code so
let's take a look at Sampler could we
switch
to the demo machine again okay so we
have Sampler up up in the upper left
hand corner now on the new version this
is a new UI for this release and up in
the upper left hand corner you get the
type of sampling you're doing you can
sample either based on time or you can
get a back trace every time malloc's
called or you can look for specific
function calls we're just going to do
time samples here and what we'll do is
we'll launch Sketch in this and we're
going to look at that copy because
copy seems like it was going a little
slow and I don't like that so we open up
the factory again and let's do what we
did before and so we'll do a copy so
first of all we need to start sampling
and if you remember how sampler used to
be you actually had to switch over to
the sampler window and press the button
and go back to your application and that
was annoying because you often would get
lots of garbage because of having to
raise the window data that you didn't
care about sampler now has a hotkey so
Christi can actually hit command option
control our to start and stop sampling
thank you thank you we appreciate it and
she can do the copy and paste and again
and crispy can stop now we can go over
to sampler and we can try to take a look
at what's going on now this is the way
that you used to look at samplers well
with a browser and the browser has some
good points and bad points however there
were a lot of people at Apple who
actually would write their own tools to
sort of parse this data because they
like to display it in an outline view
and so crispy actually was very nice
enough to actually put in an outline
view so make sure to thank her and so
you can actually look at the outline
outline do and actually turn down
triangle to see your call tree so for
example we can see here that in every
one of the samples we were in Maine
Maine always called NFC application
Maine and so on and we can sort of step
down in there snooping around and we can
find where we're calling into the menu
code which is right about here now
actually and one of the things you saw
was that the counts were originally in
terms of samples how many times the
program had been stopped and Christy
actually just switched this so it was
actually showing it in terms of time
which tends to be a better way to
actually understand it even though
remember that it's statistical and so you
can't say that took you know that took
0.01 seconds and what we find is in that
time that we were doing the sampling we
spent about two point four four seconds
in copy okay that seems a little odd
and we can actually turn down the
triangle and see where the time was
spent in copy and it turns out that we
called four routines there that took all
the time gee that's weird well luckily
we still have that way of linking to the
source code because I can't understand
it from this point of view and so
Christy can double click on copy there
and we get our source code and we find
out that what's happening is that when
we do that copy we create a
PDF clipping we create a TIFF
clipping and we create the sketch
internal version of the clipping okay this
is because Cocoa has this nice feature
where you can say hey I can give you a
clipping in any of these formats okay
that's very good because then the
application that you're pasting it into
can say I only work in PDF or I need
TIFF or whatever and it works but
when we're doing thousands of
objects doing all three tends to be a
little wasteful and so a better way of
doing this would be to use another
feature of
AppKit which is to basically say here are
the things I can produce but
I'm not going to produce them until you
ask for them and so we could change this
code so that we only said these are the
types of clippings we create and then only
when somebody did the paste would we
actually create the clipping for that
and so that would get rid of this
performance problem we'd make copy a
lot faster at the expense of making
paste a little slower okay could we go
back to the slides please
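The "declare the types now, produce the data only when asked" idea can be sketched as follows. In AppKit this is done with NSPasteboard's `declareTypes:owner:` and the owner's `pasteboard:provideDataForType:`. The Python below is only an analogy under stated assumptions: `LazyClipboard`, `make_renderer`, and the type names are hypothetical and not part of any real API.

```python
# Hypothetical sketch of a lazily filled clipboard; not the AppKit API.

class LazyClipboard:
    def __init__(self):
        self._providers = {}
        self._cache = {}

    def declare(self, providers):
        """Copy time: record which types exist, produce nothing yet."""
        self._providers = dict(providers)
        self._cache = {}

    def available_types(self):
        return list(self._providers)

    def data_for(self, type_name):
        """Paste time: produce the requested type once, then reuse it."""
        if type_name not in self._cache:
            self._cache[type_name] = self._providers[type_name]()
        return self._cache[type_name]


rendered = []  # records which formats were actually produced


def make_renderer(type_name, shapes):
    def render():
        rendered.append(type_name)
        return f"{type_name} clipping of {len(shapes)} shapes"
    return render


shapes = ["rect"] * 4000
clipboard = LazyClipboard()
clipboard.declare(
    {t: make_renderer(t, shapes) for t in ("pdf", "tiff", "sketch")})
after_copy = list(rendered)        # nothing rendered at copy time
pasted = clipboard.data_for("pdf")  # only the PDF is rendered, on demand
print(after_copy, rendered, pasted)
```

One design point to keep in mind with this approach: the data is produced later than the copy, so the provider has to capture whatever it needs at copy time, otherwise a paste could reflect edits made after the copy.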
performance tools as I said are very
good for exploring your data they're
good for looking around and trying to
figure out what's going on but you know
that's not always the best way to work
because when you've found a particular
performance problem such as the Safari
people found that they really cared
about page time about the page load
times nothing else having to run sampler
every time to gather the amount of time
would be wasteful and so if you know
what you're going to be measuring try
instrumenting your code putting in print
statements to learn how much time was
spent or automatically logging that time
and this is really good because it means
that you can automatically gather
statistics so that you can check for
regressions and it means you're always
watching exactly what you want to be
watching and there's many ways you can
do this there's a number of APIs in Mac
OS X for looking at time some of the
ones that are interesting are UpTime
which tends to have nanosecond
resolution or you can use gettimeofday
if you prefer BSD or the NSDate
class if you're in Objective-C
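As a rough illustration of instrumenting code rather than running a profiler each time, here is a minimal Python sketch. The talk names UpTime, gettimeofday, and NSDate as the clocks available on Mac OS X; this sketch swaps in Python's `time.perf_counter_ns` instead, and `timed`, `timings_ns`, and `copy_objects` are hypothetical names invented for the example.

```python
import time
from contextlib import contextmanager

# label -> list of elapsed times in nanoseconds, one entry per run
timings_ns = {}


@contextmanager
def timed(label):
    """Record how long the enclosed block took, even if it raises."""
    start = time.perf_counter_ns()
    try:
        yield
    finally:
        elapsed = time.perf_counter_ns() - start
        timings_ns.setdefault(label, []).append(elapsed)


def copy_objects(objects):
    """Hypothetical stand-in for the operation being measured."""
    return list(objects)


# Measure the same operation at several sizes so that a sudden
# jump at larger sizes is visible in the log.
for size in (10, 100, 1000):
    with timed(f"copy {size}"):
        copy_objects(range(size))

for label, runs in timings_ns.items():
    print(f"{label}: {runs[-1] / 1e6:.3f} ms")
```

Logging the numbers on every run is what makes regression checks cheap: the measurement is always there, and always of the thing you decided to care about.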
when I actually did this on
copy it was actually strange because I
didn't expect to find anything like that
well let's do this I actually
started graphing out the amount
of time spent for each of
the clippings and for this task at the
beginning which was called ordering the
list which was sorting the things that
you clipped from back to front okay
so the first one you know the PDF was long
the second one sketch was the longest and
then suddenly when I got to about 4,000
objects suddenly the sort would take
forever and unless I actually measured
this and unless I tried it on a bunch of
different sizes I never would have seen
that and so this is one of the
advantages of
instrumenting it makes it very easy
for you to check and
see when things go wrong and why they
do and if you actually look at the code
what you find is that
sketch was basically made for dealing
with tens of objects and the way that
it would do the ordered list is it would
use the NSArray sort method for those
of you who are Objective-C fans and so
it basically said hey go sort this
and you had to provide a
comparison routine to be able to say
this is how you compare two of these
rectangles and the way the rectangles
would be compared is it would say hey
what's the index of each of these in the
big array that lists everything that's
being drawn okay is this the first one
the fifth one the tenth one that's
pretty efficient except that that code
would take your NSArray and it would
make a copy over here in a nice static
array so that it can do the search real
easily so we'd have to malloc huge amounts
of memory and then we'd have to do a
linear search so that meant that the
comparison was an order N operation
which meant the sort ended up being like
an order N squared log N or something
like that and so you end up with
this funky thing where sort looks really
fine until you got to about four thousand
elements and then suddenly it was huge so
this is why you want to instrument
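The cost of a comparison routine that does a linear search can be sketched like this. It is not Sketch's code: the names are hypothetical, it does not reproduce the array copy the talk describes, and the second function (building an index table once) is one possible fix that the talk does not itself propose.

```python
import random
from functools import cmp_to_key


def sort_with_linear_lookup(selected, drawing_order):
    """Comparator finds each item's index by linear search: O(n) per compare."""
    steps = 0

    def index_of(item):
        nonlocal steps
        for position, candidate in enumerate(drawing_order):
            steps += 1  # count every element inspected
            if candidate == item:
                return position
        raise ValueError(item)

    def compare(a, b):
        return index_of(a) - index_of(b)

    return sorted(selected, key=cmp_to_key(compare)), steps


def sort_with_index_table(selected, drawing_order):
    """Build the item -> index table once (one pass), then sort by it."""
    position = {item: i for i, item in enumerate(drawing_order)}
    return sorted(selected, key=position.__getitem__), len(drawing_order)


drawing_order = [f"rect{i}" for i in range(500)]
selected = list(drawing_order)
random.Random(0).shuffle(selected)  # fixed seed so runs are repeatable

slow_result, slow_steps = sort_with_linear_lookup(selected, drawing_order)
fast_result, fast_steps = sort_with_index_table(selected, drawing_order)
print("elements inspected:", slow_steps, "vs", fast_steps)
```

With roughly n log n comparisons each costing order n, the first version grows like n squared log n, which matches the pattern in the talk where the sort looks fine at small sizes and then blows up.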
we've now looked at a
couple of ways that you can go looking for
things that are suspicious now
one of the interesting
things about big object
oriented systems is that you tend to
have a lot of layers because you've got
this thing called
information hiding which is great and so
you don't really know how people in the
layers below are thinking but you sort of
hope they're doing the right thing but the
problem is that often they make
assumptions you don't or you're using the
API in ways that they don't expect and
so some calls you might be making into
some layer might go all the way
down to the bottom of the system and
back up and take huge amounts of effort
or you may have something that you think
is inexpensive like that sort
that ends up being a very expensive
operation so Christy actually came to
Apple and suggested that one of the ways
that we should be looking at systems is
to be looking for this kind of
repetition because object oriented
systems tend to suffer from this and one
of the ways that you can do this is you
can look for mallocs because mallocs
tend to be time consuming and memory
intensive operations and everybody uses
them all over the place and so if we can
poke around and see where mallocs are
being called
we might be able to see where we're
doing repetitive work we really don't
intend to be doing so let's switch over
to the demo machine okay so Christy
will now switch to watching memory
allocation and to using what's called
the trace view which is a way to
actually look at these mallocs in a very
interesting way and we can go back over
to sketch and we're going to do a very
small example because when you're doing
sampling by time you want to have lots
of stuff so that you hopefully find all
the functions you're looking at here we're
looking at every single call so you want
to make your example relatively small
and we're going to look at what
mallocs are
being done when we do our copy and so
Christy has the two rectangles there
she'll hit the hot key to start
recording do the copy stop recording and
we find out to do that we required 6,000
mallocs you know and this probably isn't
that unusual but you know it's a big
system and there's a lot of things going
on and in fact if we poke around the
idea is that this graph actually shows
you the height of the call stack going
to each malloc so how many functions you
had to get through before you got to
malloc from main and if we zoom in on one
of those we'll actually find that you start
seeing these repetitive patterns see how
it's kind of like an EKG and so
you get these very regular patterns
which implies that there's actually some
very regular operation going on there if
we're seeing that signature over and
over again and in fact if we go look we
find out that we're down in some code
that's parsing an XML file and it turns
out that when we do a clipping and it's
a PDF file the PDF file has to get
information on the printer because the
printer is used for the size of the page
and the printer ends up going through
the CUPS daemon and the CUPS daemon ends
up giving us back XML we have to parse
the XML and so we do lots of mallocs and
we never would have known this and it
might not show up in sampler but this is
a way to understand what the costs are
and some of these are cases where you
might be able to say oh gee I shouldn't
do that and a lot of those are cases
where Apple needs to say oh gee we ought
to fix that and we can actually fix it
for you so you never run into it
okay can we switch back to the
slides please
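The trace view's idea, one call-stack depth per allocation with repeated signatures standing out, can be approximated in a few lines. This is a sketch under assumptions, not how Sampler records or draws its data: the backtraces are invented, and `summarize_allocations` is a hypothetical helper.

```python
from collections import Counter


def summarize_allocations(events):
    """events: one backtrace (tuple of names, main first) per allocation.

    Returns the stack depth at each allocation, which is the series the
    trace view plots, plus the backtraces ordered by how often they repeat.
    """
    depths = [len(trace) for trace in events]
    repeats = Counter(events).most_common()
    return depths, repeats


# Invented trace: one shallow allocation, then the same two deeper
# backtraces repeating three times, like a parser allocating per node.
pattern = [
    ("main", "copy", "make_pdf", "parse_xml", "new_node"),
    ("main", "copy", "make_pdf", "parse_xml", "new_node", "copy_string"),
]
events = [("main", "copy")] + pattern * 3

depths, repeats = summarize_allocations(events)
print(depths)
for trace, count in repeats:
    print(count, " > ".join(trace))
```

A depth series that keeps repeating the same short shape is the "EKG" signature described above, and the most frequent backtraces point at the code doing the repeated work.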
okay the final demo I'll do today is
spin control which is a new application
so the problem here is that in general
when your application
takes too long to do something
when messages coming from the
window server don't get responded to
within about five seconds usually the
window server puts up the spinning
cursor so usually this implies that your
application's behaving badly it's not
responding quickly enough for the
window server and so these tend to be
bugs you know you're doing too much work
the problem is you can't sample them
because first of all they're
sort of difficult to catch because they
tend to sort of appear and disappear and
even if you could get to sampler usually
your machine's doing other things because
there's a spinning cursor up and so
there's not really a chance to actually
go and attach to it and so the idea is
that spin control automatically samples
your application for you so let's switch
back to the demo machine so Christy's
launched spin control which is in
Developer Applications and you have to
go find that yourself sadly and
it basically keeps a list of every time
that it detects a spin and you can set
it for only one application or all
applications and we can do that copy that
we were doing that was causing us all
that grief so we can select all again we
can copy and we can paste we can do that
again and sometimes you actually need to
click on the window so that there's a
window event that it needs to
notice that's usually when the spinning
cursor comes up and we can see here the
spinning cursor just came up because we
copied one of those things that takes
800 seconds hopefully not I think I need
to get off the stage soon and it
automatically sampled it and now we
could copy that and paste it into email
to send to a developer to say
something's wrong or we can double click
on it and we get a sampler like view
where you can actually look at the code
and in fact we can go and see that we're
calling oh boy that's nice we're in copy
which turns out to be in NSArray which
ends up calling CFArrayGetValueAtIndex
just like I was explaining so I
wasn't lying so spin control gives you a
way to see the invisible it lets you
actually see
the kinds of things that you otherwise
can't sample so this is a cool tool try
running it on your system leaving it up
and seeing what you catch can we go back
to the slides please thank you very much
there's a number of other tools
that you need to check out yourself we
don't have time for everything sadly
hopefully you've seen these in previous
years if you've been here if you haven't
you know take a look at some of these
tools take a look at the performance
book to find out how to use them but
they're all
valuable in interesting ways they
might be able to help you on certain
types of problems and you need to
explore how to use them and which sorts
of problems are best found using any of
these and make sure to watch your
application and with that I'd like to
bring up Nathan Slingerland to talk
about the CHUD tools which allow you to
look at code one level deeper than
what we've been looking at now good luck
yep relax there we go ok as Robert
said I'm Nathan Slingerland and I'm going
to talk to you today about the CHUD tools
or Computer Hardware Understanding
Development tools and these are tools
written by the Apple architecture and
performance group they're a
suite of performance tools that give you
low level access to the performance
monitors so these
are counters that are built into our
hardware in the processors memory
controller operating system like that
and using these counters you can find
problems in your code and improve your
code and of course the CHUD tools are
freely available with the developer tools
CD you can bring up shark in Xcode
as you saw and they're freely
available on the web too so you can
check there for updates if you were here
last year we introduced CHUD tools 2.0
we're happy to have 3.0 this year with a
lot of great improvements shark
is an instruction level profiler so if
you've ever used Shikari from the older
CHUD tools shark is the successor to
Shikari monster is a
spreadsheet for performance events so you
can look at these counter results in
either spreadsheet or chart form and
Saturn is a new tool for visualizing
function call behavior and of course we
have a set of other lower level tools
that you can use for tuning things like
AltiVec code or very CPU intensive code
that you want to simulate using simg4
or simg5 that will let you see
exactly what's happening at the lowest
levels on the processor and of course we
provide the CHUD framework API so you
can write your own tools or control the
CHUD tools so the performance counters
as I said are in our processors and memory
controller and operating system and what
they do is they count interesting low
level performance events so things
like cache misses on the processor
execution stall cycles page faults in
the operating system and CHUD lets you
control these and view the
results so the first tool that we're
going to talk about that uses these
counters is shark shark is a
system-wide profiling tool and using
shark you can profile a process a
particular thread or the entire system
and in the most general usage of shark
you can create a time profile so this
lets you visualize performance hotspots
either you know in your code or not you
can see if your hotspot your
bottleneck is actually in your code
using this you can also use it to get an
event profile so you can relate
performance events things like cache
misses to your code to find out where
cache misses are coming from it
captures everything drivers kernel
applications what this means is if you're
a driver writer or kernel extension writer
you can use shark to see the call stacks
and find out where the time is being
spent in your driver and we're very low
overhead because we are handling
everything in the kernel in addition
once you have your sampling session
taken we provide automated analysis we
attempt to annotate your source code and
the disassembly of that source code
to point out common problems and other
things that you can do to optimize your
code there's a static analysis feature
to find suboptimal code so if you were
in the earlier CHUD session you know
that there are some instructions
on the G5 that we need to look
for and watch out for and this will help
you find them and we also provide
optimization tips there's also a
scriptable command line version so you can
telnet in and sample things and of
course you can save and review sessions
and pass those around so without further
ado the best way to see
how to use the CHUD tools and shark is to
have a demo so for that we're going to
use the Noble Ape Simulation this is an
open source program written by Tom
Barbalet and to help me demo I'm going
to bring up Sanjay Patel also of the
architecture and performance group okay
so the first thing we'll do is we'll
bring up Noble Ape okay so here we are
we're simulating thinking apes on an
island a tropical island and this map
window is showing us an overview of the
island and the little red dots each
red dot is an ape and we
can focus in on one ape at a time
that's the ape with the red square
around him there and the brain window
to the right here shows
how the changes are occurring
in his brain when he's walking around
the island and thinking about things so
you know every good performance
study of course requires a metric in our
case that's ape thoughts per second
it turns out the brain
functionality is performance critical
in this application so this is our
metric this is running on a
Power Mac G5 two gigahertz dual processor
machine and we're seeing about 1200 ape
thoughts per second great so the first
thing we'll do we'll use shark to see
what's happening in the system
while we run Noble Ape so this is the
main shark window by default we go to
the time profile there are other built-in
profiles of course to take advantage of
performance counters but for now we'll
just use the time profile and we also have
global hotkeys so shark doesn't have to
be in the foreground to use it
either so let's
sample five or so seconds and see what's
happening
okay so here's the profile listing
the important functions from most sampled
to least sampled and in the
lower left here we have the process pop
up and this lists all the things that we
sampled during this time period right so
at the top is Noble Ape and we kind of
expect that we know that our simulation
is CPU bound but it's only fifty percent
of the time and you kind of wonder well
okay why is that well if we go to the
thread pop up we can see that in fact
this application is single threaded and
because it's single threaded
we're not using half of our dual
processor machine so our first step in
optimization was hey let's thread this
thing so we used the Carbon MP API
and threaded Noble Ape let's see
what the performance improvement was
like you remember we had 1200 thoughts
per second before and we're getting almost
double that so that's pretty good but
let's profile again and see what we can
do with this code
great so now we can see that we're
taking up a much greater portion of the
time on the machine and that's
reassuring we want to do that for our
simulation and we can see that we've
spawned these threads now
we've got the main thread at eight
percent and then two other threads
that are processing the apes in parallel
at forty percent apiece so the next step
we can do is we can double click on any
entry in this profile view and it'll
show us our source code colored with
where the samples were taken so what
this tells you is what lines of source
code the most time was spent on right so
if we look here the scrollbar also
gives us a way to jump quickly to the
hot spots for this the hot spot is
literally just this
piece of the function this for loop inside
of the cycle troop brain scalar function
so it turns out that this is about
ninety-four percent of the time if we
highlight this right so if we look
shark gives us a hint on how to
fix our code or how to make it better we
click on this little ! and it says okay
this loop contains 8-bit integer
it's taking a lot of time you're
spending a lot of time in this loop
maybe it would be worth the effort to
vectorize this loop so that was our next
step we went and we vectorized so let's
go back
okay learn that so remember 2400 turn on
vector alright so 10,000 that's nice but
we're still not done yet let's let's
look again with shark and see what else
we could do alright so we see the vector
function showing up there we'll
double-click and we're in the vector
code that's good if you're a Shikari
user you probably know that you
had this disassembly view that was
similar to this and you can still get
this back this disassembly view is
actually set right now to show G5
dispatch groups and there's more detail
on that in the full CHUD session
we'll go back to the source code for now
and if we look closely at
the scroll bar we can see that actually
even though we're spending a lot of time in
the vector code that we optimized now
we're spending a relatively
bigger portion of the time inside of the
scalar code that we didn't optimize
right in the first step we didn't
vectorize before so our next step is hey
maybe we should vectorize the rest of
this and you know all these loops are
fairly similar and that's what shark says
to do so let's go back to Noble Ape
so about ten thousand nine and a half
thousand turn on vector optimized and
we're almost 15,000 so this is around 14
or so times the original performance and
what we're able to do is take advantage
of this massive bandwidth we have
available on the Power Mac G5 by using
AltiVec okay so could we have the
slides again please thank you
okay we did that oh wait yeah so just to
summarize we compared this against the
power mac g4 so this is the scalar code
running on the current or the current
power mac g4 top of the line against the
power mac g5 and you can see that
actually they're not all that far
apart in the scalar code
we have a longer
pipeline on the G5 and so we're not
entirely scaling with this higher
frequency or we weren't entirely bound in
the CPU for this so when we added the
threading we can see that we get a
bigger jump than what the g4 got right
going from scalar to scalar threaded
then vector even bigger jump and vector
optimized an even bigger difference
right and the reason is that as we
improve this code we're more and more
constrained by the memory bandwidth
available in the system well in the g5
we're simply not as constrained right we
have a lot more memory bandwidth to play
with here so by vectorizing your
code you can you know if we had just
thrown this on the g5 we would see a
very marginal improvement but by putting
the effort in to vectorize we're able to
take advantage of a lot more of the
system a lot more of what it has to
offer so in addition to the shark we
have some other tools monster allows you
to directly configure the performance
monitor counters and collect data based
on these timed intervals or event counts
or hotkey and then look at this in
spreadsheet or chart form it also has
the ability to compute metrics so things
like bandwidth or cycles per instruction
and actually that's how we got our
bandwidth numbers for this when we were
looking at it the command line version
of monster and you can also save and
review sessions for that Saturn is the
last tool we're going to talk about
Saturn is similar in some ways to
gprof it gives you an exact profile and
allows you to visualize the call tree of
an application it uses GCC to instrument
each function in your application at
entry and exit records the function call
history to
a trace file and then for each function
can give you the call count it can also
use the performance monitors to tell you
the counts for each function as well as
the execution times using a low level
timer so okay at this point I'd like to
bring up Dave Payne again for the session
wrap up so we've seen a lot of what we'd
been doing with the performance tools
and I'd like to talk a little bit about
some of the ways we'd like to go with
them we've rolled out a lot of
exciting work with Xcode here at the
conference and you've seen some basic
integration of the performance tools
with Xcode we think that there's a lot
more we can do along these fronts to
really during your development process
bring performance data forward to you so
you can imagine for example that every
time you run your application at
the end of it it would pop up and say oh
hey by the way did you know you leaked
this much memory when you just ran that
and put that in a smart group or
something like that if you look through
developer applications we're starting to
have a lot of different performance
tools out there and perhaps there are
some opportunities to unify some of the
ones that have somewhat similar
functionalities and you know one of the
things I find as I'm walking through the
tools is that I'm kind of overwhelmed by
the amount of data that's there with
them sometimes you know here's a whole
bunch of data go figure it out so I've
been playing with things like
actually just human readable thread
names so by the way this is the
heartbeat thread so maybe you don't need
to go look at that now we think there's a
lot of exciting ways we can take these
tools to hopefully make them even more
useful in the future for helping you
find your performance problems but as
you've seen there's a lot of stuff
out there now that you can use to tune
your applications to make the best
impression of your application and of
our system as a whole so you know we've
got a lot there now we've added some in
this release and we've got a lot more to
come so if you want to learn more these
tools are part of the
Xcode tools package for most of the
graphical tools there's a lot of help
information buried within the tools
there's a lot of command line
versions of these things so the sample
command and heap and leaks so go read
the man pages on those there's a lot of
information that's been newly rewritten
in Developer Documentation Performance
on the system the system overview manual
hopefully you've all read and
memorized that by now since you've
been working with Mac OS X for a while
but there's a lot of good information
there and then the the web pages have
been redesigned for the performance and
debugging tools so there's a URL there
for getting a lot of good stuff from
that we have two different feedback
addresses one for the Xcode and related
performance tools like sampler or
MallocDebug etc so Xcode dash feedback at
group apple com for the CHUD tools CHUD
tools feedback and with that let's see
roadmap to some future sessions so let's
see tomorrow morning interesting session
and how you can yourselves bring your
carbon applications over to xcode so
from codewarrior to xcode see the
session come visit us in the labs we
have a developer tools feedback forum
tomorrow afternoon and the debugger
session tomorrow afternoon and then I
mentioned the tuning carbon applications
session friday morning at 9am as well as
a testing tool session so with that
Godfrey and the panelists if you'd like
to come on up and do some Q&A