WWDC2004 Session 311

Transcript

Kind: captions
Language: en
Good morning, everyone. Welcome to Using Performance Analysis Tools on Mac OS X. Let's dive in. I'm Dave Payne, and I work on the performance tools. This morning we're going to cover some of the concepts of performance analysis, just general concepts for any system; then look at the performance tools we have available for Mac OS X; then dive into a case study of real-world use of the tools; and then show some of the cool new work we've been doing to integrate features of the Sampler profiling tool into Shark, which is another profiling tool, combining them and adding a bunch of powerful new features to help you find performance problems in your applications faster.

So why are we interested in this? After all, those machines are getting faster and faster, right? Just buy dual-processor hardware, and that's really cool. Well, you know, there are a lot of jobs where, no matter how fast the hardware is, people always need more power. You need it in your development tools, for example: faster compilations and so on. In general, I don't want to have to go out and buy another gigabyte of RAM to run this application, or these three in combination. I'd like my laptop battery to last all the way back to Dallas on my flight, and a lot of that is how hard I'm beating on the hardware, the CPU and the memory. When the fan kicks in, well, whatever is going on there takes power as well, on a laptop and in general. And we want applications that play well with others. It really is important to reduce the overall amount of memory your application is using, because I'm running more and more applications on my system every day (you guys are creating cool apps, thanks), but I don't want to page tremendously when I'm switching between those applications.

So rather than just diving in and saying "well, I think this routine is going to be slow, I want to go rewrite it because it'll be fun and cool": humans are notoriously bad at guessing where the performance problems might be, so we need a systematic way to go about looking at this.
You're the expert in what your application is used for. So, an approach to performance analysis. First, define what the major operations are that you're interested in having fast for the user. Then set your goals for making those nice and fast. For example, responsiveness: if you have slow operations, you can either speed those operations up, which is the best case, or, if it's unavoidable that an operation is going to be slow, you want the user to be back in control of the application quickly, so maybe move the slow operation to a separate thread so that the UI can be responsive again. Or throughput: if you're doing a game, you want lots of frames per second; if you're doing a network application, you want a lot of data throughput; for servers, you want lots of transactions per second. What's the goal for your arena?

Then establish precise benchmarks for that. Define what the target hardware for your customer segment is, what operating system version you're testing with, the specific data you're going to be passing in, and specifically what operations you want to test. Then add in time-measurement code, instrumentation, so you can measure that same operation time and time again and see how you're doing, and track that throughout your development. This is what the Safari team was doing to make sure they had the fastest web browser on Mac OS X: they defined a precise set of benchmarks, ran them time and time again, and when somebody was going to check in a major piece of new functionality, it couldn't go in if it caused the system to slow down noticeably. So: don't allow regressions. And finally, if you do identify some performance problems, focus your tuning efforts on those hot spots.

Easily said, but how do you actually do that? How do you find what the hot spots are? Have we got tools to help with that? Yes, absolutely. We've got tools; you've got tools. These have been included with your Mac OS X performance tools for several releases at this point. Take a look in /Developer/Applications/Performance Tools. As with
all of our developer tools, these tools are free. We really want you to use these things to create great applications. They provide full support for everything you need to do with Mac OS X, including a lot of support that has now been added for Java profiling. In addition, our GUI-based tools are integrated with Xcode for a full round trip of the development cycle: you can launch your binary under a performance tool from Xcode, and then get back to the source code directly from the performance tool, back into Xcode.

So we have a variety of performance tools: for monitoring performance problems, and then, once you've found that you do have a performance problem, for analyzing what the problem is, where it is, and why it's happening, in a variety of areas: memory use, execution time, and other types of resource use. I'm going to dive into a bunch of these tools as we go through.

So, in the area of high-level
monitoring, there are a number of things you can look at to help answer, just in general, what is happening. One of the primary ones is the command-line tool top, which is always there for you. You can use it to check your own system right now, to look at a headless server, to remotely log into a system, or when you've got a full-screen game going. But we've got a bunch of other tools as well that are a little bit more friendly, with graphical user interfaces.

For example, a nicer user interface than top is the Activity Monitor application. It can help you analyze why your system is slow: is there some particular process that's taking time? Is its memory growing? This one ships on the user CD, so if a user calls up your support line and says "my system is slow", well, fire up Activity Monitor and tell me what it looks like.

Another one you might not be familiar with (this is one of the CHUD tools that we've now elevated into the mainstream set of performance tools) is called BigTop. It takes the top information and graphs it over time, so you can really see how things are changing. I find it very useful to watch and actually see, graphically, whether the private memory use of my application is growing, or the virtual memory use; those are two of the best metrics. You don't want to look at shared or resident size, because that's things like your frameworks, which are shared with other apps; but the private space could be growing, and that's important to note.

Oftentimes, I'm sure you've seen it, the spinning cursor comes up, and you wonder, and wish you could capture that. Maybe it's a two-second spin, and by the time you're off typing sample on the command line, the spin is over; but it still makes your app feel unresponsive. Spin Control is a great way to capture that: just have it running in the background all the time. It automatically detects when applications aren't responding to user-interface events and lets you see what's going on there.

Now, sometimes, as I mentioned, the fan will kick in. I've noticed a number of times that my machine is sitting there idle and yet suddenly the fan fires up. Whoa, what's going on there? So I fire up top or Activity Monitor and take a look, and some process is using fifty percent of the CPU. Maybe it's drawing too much. Quartz Debug is a great way to look at this, and I've actually seen it a number of times in real applications: something is periodically drawing, and Quartz Debug flashes the graphics yellow on the screen every time there's a draw. You can also set it up to show you duplicate drawing in just normal operations. Very
useful. So once you've used the high-level performance analysis tools to see what's going on overall, now we want to dive into why it is happening. We have a variety of profiling tools. We're going to talk a lot about Shark today, because we've put a lot of effort into the Shark application. Spin Control I mentioned. Thread Viewer lets you see the thread activity of your application: what all the different threads are, and what the execution state of each of them is. And for those of you doing OpenGL graphics programming, OpenGL Profiler is a fantastic application to help figure out where the time is going there. On the command line, sample is excellent for a quick basic profile: sample, process name or process ID, number of seconds. Very useful. And then we do have the venerable Unix gprof tool, but that's the only one that really requires recompiling for profiling; the others don't need any recompilation of your application.

So, I mentioned Shark. We
have a new version that was shown yesterday in the development tools keynote: Shark 4.0. This tool helps you figure out why your time is being spent in some certain place, and where it is going. You can look at specific threads, specific processes, or the overall system, and now you can actually do sampling over the network as well, again useful for things like full-screen games. It captures all the information about the system, both user space and kernel space. There's a full session on Shark this Friday afternoon; I'd encourage you to go to that. We'll see more of it today, but we aren't going to dive into all the full-blown features of it.

One of the things I really love about Shark is that I don't even have to attach to a process. It's always there in the background, with an Option-Escape hotkey that will sample the entire system. So if I notice something seeming sluggish, I can just hit Option-Escape right away; Shark fires up and says okay, let's sample. Then I can hit Option-Escape again to stop it, see what's going on in the system, and dive into that specific process. But there are a lot of ways to get more specific information here (different sampling methods, memory tracing, function tracing), and we'll look at a lot of these.

There are three primary views in Shark. A profile view lets you see a top-down call tree of your function calls, or bottom-up for real profiling information: who's calling the leaf functions. We've done a lot of work in this view to really let you hone in and do filtering and data mining to simplify the complex picture. There's a chart view that helps you really see the patterns of execution of your code; this is excellent both for performance analysis and for just understanding what's going on in your application. It's really cool. And finally, within Shark itself, there's a code browser where you can see source code or assembly, get hints about what the assembly code is doing, and get directly back to the offending lines of code in Xcode. These are all annotated with where the time is going, down to the specific lines of code.

So, I haven't
talked a lot about Sampler. Many of you may be familiar with using Sampler, so what's going on with that? We've integrated all of the features of Sampler into Shark, and we plan to remove Sampler from the system. You can see here that Shark has a number of additional features; we'll touch on some of these, and more on Friday. But please try Shark. The team's been very busy: Shark is there on your Tiger CDs, but there's actually a newer public beta with a number of additional features that the team banged in over the last couple of weeks. So please download the new version of Shark (it runs on both Panther and Tiger) and send us your feedback; I'll give an address later.

So, what about memory use? I
mentioned that it's really important to try to minimize the overall footprint of your application, and we have a number of tools to help analyze what's going on with memory use. A very nice one is ObjectAlloc. This is great for looking at dynamic memory use: both how much memory am I using right now, and what was the peak I hit during some particular operation. You can use this with Cocoa applications, and it's great for seeing your allocations broken down by allocation type, by what type of objects: Cocoa objects, Core Foundation objects (which you would also have with Carbon applications), and just general malloc allocations. It tells you what size they are, because a lot of times you'll allocate specific sizes at specific points in your code. So you can see all that, and you can look at information about specific instances.

It's not quite as good for pinning down precise memory leaks; MallocDebug is still the best application for that. It shows a full call tree of all the allocated memory, not broken down by type, and it's not as good for dynamic allocation analysis, but it is still the best tool for leaks. There's a command-line equivalent of this called leaks, which can also show you the backtraces of where allocations occurred if you set the MallocStackLogging environment variable.
Another major function of MallocDebug was to help find corrupted-memory operations, but we actually have a better solution for that at this point. The purpose there was to help crash your application if you did something bad with memory; but really, you want to be operating within a debugging environment when that happens. So we have a new malloc debugging library called Guard Malloc. It operates within the context of the Xcode debugger; there's a nice switch on the Debug menu now that says Enable Guard Malloc. What it does when you turn it on is put every allocation you make onto a separate virtual memory page, with the end of the buffer lined up with the end of that memory page and the next page left unallocated. So if you overrun the buffer, you crash immediately, and you're in the Xcode debugger, so you can see immediately where the buffer overrun is in your code. If you free the block, we free the virtual memory page, so if you go and read or write from that page again after freeing it, then again you crash immediately. This is a great way to find really nasty memory problems. You can learn more about this in the Xcode debugging session on Thursday morning, and in the libgmalloc man page. So it's not so much a performance tool, but it's a great solution.
So again, we're putting a lot of effort into Shark, and we're trying to add a number of these memory-analysis features into Shark as well. Shark can now do allocation sampling and show you the sizes of your allocations, and call trees there. There are still different strengths and weaknesses among our memory-analysis tools: again, ObjectAlloc is great for dynamic analysis and for looking at specific object types, MallocDebug is good for leaks (and we want to add leak detection to Shark), but Shark has new capabilities too. So that's
it for a broad-brush overview of the performance tools. Now let's dive into a specific case study. I'm not actually going to do the requisite planetary-motion simulator that seems to be so popular; I'm going to be looking at an application called Disk Inventory X. This is an open-source application, kind of cool and actually kind of useful, that uses a concept from Ben Shneiderman at the University of Maryland for representing hierarchical information in a compact two-dimensional space. It's open source under the GPL; I'll be sending changes back to the author, and he's pretty excited about that. As we go through, we'll be looking at a number of areas of what might be slow here: time, memory, and other resource use.

So in your application, as you look at something like this, what might you want to look at? Of course, the major operations: how long does it take to open a large document? If the application is idle, again, it should be taking zero percent of the CPU. And again, watch for UI spins and deal with those. On the memory side, I've talked about the importance of looking at dynamic memory use; we'll see that, and leaks. One thing that may not be obvious is autoreleased objects in Cocoa applications. If you create a separate thread, or if you have a Foundation-based tool, it's really easy with a lot of the Cocoa APIs to end up creating autoreleased objects but maybe not getting back to drain the autorelease pool very frequently. Maybe it's a long-running thread, and those objects just build up and get paged out. I've actually seen applications crash due to this problem: the system gets low on memory, and you crash. So also look at disk and network activity; we'll be specifically looking at some of this with our sample application here. So what
I want to do is switch over to demo one here. So this is the Disk Inventory application. (We can't see the menu bar up there; if we can get the menu bar, that'd be great.) What we've done is take a look at our Applications directory, which has 1.9 gigabytes of space in it, and I'm interested in where that space is going. What this application does is graphically show me the sizes of the files: the larger the rectangle, the bigger the file, and the color represents what kind of file it is. So we can see the blue is a disk image. Wow, I have at least one big file here. Okay, that looks like the Adobe Photoshop 7 disk image; I probably don't need that. Down here, a couple of other disk images, application packages. So this is kind of cool: I can click on a directory and see how much space that directory is taking, and I can move around with the mouse and see things there. It's actually kind of useful.

So let's go ahead and quit out of this and bring up the Performance Tools folder. I'm going to launch the BigTop tool that I referred to, and I'm also going to launch Spin Control; I'll just put Spin Control down here in the background. Now, with BigTop I can look at things like the CPU usage as I move a window around, and we can see that the CPU use goes up and down as expected. Let's go ahead and launch the Disk Inventory application again, and I'm going to look specifically at the Disk Inventory process and watch its memory size as I go through. So what I'm doing is going to Open Recent and actually reopening the Applications window there, and it's scanning and analyzing that. You can see the memory use is climbing here. I've added a little instrumentation window, and it took a little bit of time to analyze that 1.9 gigabytes: about nine seconds to scan the folder, and a little less than a second here, so about ten seconds to look at this. And I hadn't tried this operation on this machine before.

We can also Show Package Contents, and note that we actually caught a little spin here as well with Spin Control at this point. It took about four seconds to show the package contents, and with the spin, I can come down here, select it, and show a text report to see what was happening in there: we were making a bunch of recursive calls to determine the file kinds inside of the package that I'm looking at. So that's interesting. And we saw the memory use climb. That's not totally surprising, because we were building up data structures to represent all this, but we should look at that and see if we're as efficient as we could be.

Let's look at one other thing. If I resize this window here, notice the slight pause before redrawing. And that was interesting on the memory-use side: a little spike there. Actually, that looks like it was probably over about a megabyte of dynamic memory creation while I was resizing that window. So maybe that dynamic memory creation and deletion has something to do with why it's not as fast as it could be. So let's go back to the slides.
Okay, so when we tested this: we don't have such beefy hardware in the labs, you know, we have mere-mortal dual G5s, so my results testing this same directory in the lab were a little slower than that. It was actually almost 20 seconds for scanning the folder and getting the file sizes, about 10 seconds to classify the file kinds, and showing the package contents was again pretty consistent at about four seconds, for a total of almost 33 seconds to scan not quite two gigabytes of space. There are a lot of 80-gigabyte disks out there on your personal computer systems, so what, 20 minutes to scan my disk? What if I have a terabyte disk farm and I want to use this technique? That could be nasty. Maybe we can speed this up.

Now, I've often heard the question of what the best timing APIs for instrumentation are on the system. mach_absolute_time is a Mach API, and it's the fundamental call: it goes down and reads the time-base register out of the CPU. There are a number of other APIs you can use, depending on what's convenient for you; gettimeofday, for example, is a nice portable API in the Unix environment. These all end up calling down into mach_absolute_time. This is the way the actual code I used in this application works: I simply call mach_absolute_time, which gives me a 64-bit value back. Once I have two of these, I subtract them and apply a conversion to get a double value in seconds, which makes it easy to print.
So with that, I've identified that we have some issues. I'd like to bring on one of our experts to analyze those issues, and also to show the tools we've been creating to help with this process. It's my pleasure to introduce Christy Warren.
So, what makes software slow? Algorithms: those of you who have taken computer science courses have run into things like quicksort versus bubble sort, and if you use bubble sort on a large data set, it'll make your program run extremely slowly. Then there are expensive operations: file opens, network calls, IPC. Even things like malloc and locking primitives, though they're relatively fast, can be expensive if you do them a million or a billion times. A more subtle thing, and I'm sure you've encountered this, is doing something more than once. Suppose Dave and I are writing different functions in a large program, and we both call quicksort on an array; suppose it's even the same array. Because we're not in intimate communication all the time, we can end up doing this sort in two different places in our program. That would be bad, but it would just show up in a profiler as a call to quicksort. So that's an example of doing something more than once.

But the real problem here is what I call complexity. Now, this is a graphical depiction of a program running; in this case it's Finder's Get Info. I just did a trace of the memory allocations (this is a development version of the code, not the one you're getting), and there are over 100,000 events. Each one of these vertical bars is a sample, and the vertical axis is call-stack depth, so this is a picture of each of the call stacks as you go along. As you can see, there's a lot of redundant visual information here, and that's really interesting: it's a result of the complexity of your modern, many-layered software.

So what is complexity? Well, complexity, as I just said, has layers and many modules. Good software engineering technique says: hide your implementation. Don't let your clients know the details of what you're doing; give them a black box. But there's a problem here, which is that you also hide the performance cost of what you're doing. Say I have a function foo that takes a boolean value. You set it to true, and it could set a value in a register; it takes a few milliseconds or microseconds. Or you set it to true and it could go out to a database, do some authentication, launch a rocket; it could take minutes or even hours. Same call, two totally different results. So what do we do here? Innocuous calls can lead to surprisingly complex costs. I'm sure you must have run into things like this in your own development.

So we're back to this picture, and we're going to zoom in on this graph. It's not just complexity at the high level: as you zoom into the finest detail, you see repetition on many different levels. It's like a fractal, like zooming into the Mandelbrot set: you see coarse, grand detail, and then finer structure as you zoom in. It's amazing what we run into with software and processors today. It's just incredible.
So, to analyze performance, I've come up with a simple formula: the impact of any operation you do is equal to the cost of that operation times the number of places it's used in your code. Like in the Dave-and-me quicksort example: there are two uses of it that are redundant, so that makes it twice as expensive as it needs to be. Now, traditional profilers make it really easy to understand cost: you sample a program and you see all the leaf functions you spend time in. But it's hard to see use, because these are complex patterns of usage that often go through not just my library but, you know, ten of your libraries, scattered throughout the system.

So to help us analyze use, we have two techniques available to us. One is called call-stack data mining. This is new functionality we're introducing; we're not aware of it being available elsewhere in other programs. The idea is that you can strip away the stuff you don't want to see and focus on what you really care about; I'll describe that more in a second. The other approach is graphical analysis: the idea here is that you visualize your execution trace, as I've shown, and there's a technique called software fingerprinting. When you see similar patterns in the picture, it means you're going through the same code path. Repeated patterns that are the same, like heartbeats on an EKG, mean you're going through the same code path a hundred or a thousand or a million times, and it's at least worth looking to make sure you aren't just doing it on the same data over and over again. Even if you are doing it on different data, can you hoist things out? Like in a quicksort: each time you do a compare, the comparator may go through a whole bunch of layers of objects, a whole bunch of overhead that has nothing to do with just comparing two values. So remove that stuff, pull it outside of the iterative structure, and your program will run a lot faster.
With these tools, you'll be able to see these things; they'll jump out at you, where otherwise you'd have to go digging through code and spend countless hours trying to find the problem.

So, on data mining: I talked about stripping away what you don't want. Now I have a question for you. How many of you have profiled your program and seen not your code, but countless system frameworks and that kind of stuff? Annoying, right? Some of you; a lot of you, yeah. When I first used a profiler, this was the first problem I ran into when I was new at this, and it was like, well, what good is this? I can't do anything about the C library, I can't do anything about AppKit; I can only change my own code. So wouldn't it be great to have a button, where you push it and all that stuff goes away, and you see your own functions as the leaves in the chart, with the cost of the system libraries all ascribed to the various functions in your code? Wouldn't that be a lot better? I find that really useful.

And that's a very coarse operation. If you want to do finer-grained work, you can strip away one library at a time. Say you're working with the AppKit team or the Foundation team, and you want to see details in those libraries but get rid of the lower levels: I'm going to get rid of Core Foundation, I'm going to get rid of libSystem. Exclude-by-library lets you strip out a particular library in the trace, and it's all non-destructive; it just gives you a different view, attributing the cost of those libraries to their callers. So by transforming the data, you can focus in on the hot spots you care about.

There's also the idea of flatten-library: instead of completely eliminating a library, you can flatten it to its entry points, so you can see where your code is calling into these libraries. You don't see the details of how, say, CFDictionary is implemented; you don't care about that. You just care that you're calling CFDictionaryGetValue. And to help you see just what you want, there's a thing called focus-symbol: you choose a particular call tree that you want to look at, and when you do that, you strip away everything that's above it or to the sides of it.

So I just told you a lot, and it's kind of thick, so I'm going to give you some pictures to help illustrate
this. So here we have a main, and it calls an init function, your do-example function which does your work, and some cleanup. Then you have this bar function that's called four times by your work function (in a real program it's probably more like a hundred or a thousand times, but I made it simple here), and this in turn calls Core Foundation; it's using CFDictionaryGetValue. Now, if I just profile this, I'll see functions mostly from the ones in yellow. These are the leaf functions, and they're far removed from what you care about. If we do exclude-library, those go away, and now bar becomes the leaf function. So by doing that operation, instead of seeing things far removed from what we care about, we're right in what we care about. Flatten-library is similar (let's go through this quickly), but it replaces the library with its entry point. Focus on do-example, strip those away, and boom. So by doing these transformations, you can manipulate your call trees. This is also really cool because you can make really good performance arguments. When you strip these things away, you're no longer trying to point out something here, something there, something there, and hoping it makes sense; you actually see the counts from the places that matter, and you can make really good arguments to other people that we need to work on this stuff, we need to fix it.

So with that, let's go do a little demo.
We're going to launch our application, and we're going to launch Shark. Now, how many of you have used some version of Shark before? That's about most of you. So this is Shark 4, and you're going to see a lot more of it in the Shark session later this week, Friday at three thirty, but you get a little preview today. The UI is a little bit different. The original Shark had a time-based sampler that lets you sample the entire system; that's really cool, and it's good to have around in the background and whatnot, but in this case we want to focus on particular processes. There are a number of different things we can do: we can trace memory allocations, do function traces, and trace various Java things if we're doing Java. For this application, we're going to use what's called a time profile. We choose this because the program uses file-system operations, which involve a lot of waiting on the kernel, and this trace does the best job of attributing those costs to the user's calls.

Let's go over to our application and do Open Recent. Shark gives you this Option-Escape hotkey, which is really handy: I'm in the middle of UI manipulation, and I can hit Option-Escape and hit Start. So we start our scan, and Shark stops every thread and records a sample, whether it's doing something or waiting on the kernel. Now that that's done, we stop, and we have a heavy view: a view of all the leaf functions we sampled and the relative percentage of the counts they're in. So we click on this syscall_thread_switch, and on the right here (this is one of the nice new features) you can see a backtrace of that particular symbol. We see that we're in a heartbeat thread. Well, that's not it; it's just sleeping until a date. So in this case, let's get rid of that thread: we go to the thread popup up here and choose the thread that's interesting. The topmost function here is getattrlist, and this gives you a very large call stack to
look at now
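a heavy view is essentially a tally of the bottom-most frame of every sampled call stack. here's a minimal sketch of the idea — a toy model, not Shark's actual implementation, and the function and stack names are invented:

```python
from collections import Counter

def heavy_view(samples):
    """Tally the leaf (bottom-most) frame of each sampled call stack and
    return each function's share of the samples, in percent."""
    leaves = Counter(stack[-1] for stack in samples)
    total = sum(leaves.values())
    return {fn: 100.0 * n / total for fn, n in leaves.items()}

# four hypothetical samples; each stack runs caller -> callee
samples = [
    ["main", "scanFolder", "getattrlist"],
    ["main", "scanFolder", "getattrlist"],
    ["main", "heartbeat", "syscall_thread_switch"],
    ["main", "scanFolder", "loadProperties"],
]
profile = heavy_view(samples)
```

with these samples, getattrlist accounts for half the leaf frames, which is the kind of ranking the heavy view shows.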
to help us navigate things a little bit, there's a nice little thing over here called color by library — and when you click that (some of you may remember this from Sampler, that feature was in there) we see that we've colored things by library: libsystem is lavender, carbon core is red, Disk Inventory is brown. this helps us make a little more sense of it visually without spending a lot of time. let me make this a little bigger. so we click on getattrlist and we see that yes, we're in user code
— you know, the FSItem load child was our own thing. but let's use the exclude library operation: we're going to exclude libSystem.B.dylib, and when that happens, that goes away and we see that a fair amount of time is spent in carbon core. so we're going to do this again — we're going to exclude carbon core — and we'll do this just a few more times so another piece of user code comes up: let's do core foundation and launch services. and you see that FSItem load properties and this load child are all floating up as pretty major players in this profile. before we go in and look
at those in a little more detail, let's go over to the heavy and tree view. this is another new feature in Shark: it lets you see both the heavy view and the tree view — the top-down view — simultaneously. so in the top-down view, we start at the start of our program, kind of like that diagram I showed you, and you walk down through your program until you get to our code.
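the top-down tree can be sketched by folding each sampled stack into a trie of frames, where every node carries an inclusive count for its subtree — again a toy model with invented names, not Shark's real data structure:

```python
def tree_view(samples):
    """Fold sampled call stacks (caller first) into a top-down tree;
    each node keeps an inclusive sample count for its whole subtree."""
    root = {"count": 0, "children": {}}
    for stack in samples:
        root["count"] += 1
        node = root
        for frame in stack:
            node = node["children"].setdefault(
                frame, {"count": 0, "children": {}})
            node["count"] += 1
    return root

samples = [
    ["main", "scanFolder", "loadChild"],
    ["main", "scanFolder", "loadChild"],
    ["main", "drawStatus"],
]
tree = tree_view(samples)
```

reading down from the root gives exactly the walk described above: start of the program at the top, your own code somewhere below.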
but there's still a problem. you've probably seen this before too: there are always AppKit calls, there's all these system calls, and it makes it kind of hard to look around — if you expand one of these trees, the outline will be awfully big and hard to keep track of. so we're in luck: there's a button over here called flatten system libraries, which does that flatten operation on all the system libraries, and when we do that, this simplifies and it's a lot more manageable — there's only a few layers of these calls. this gives us context. you could also exclude them if you want to, but in this case it was useful to help me keep track of where I was but
then, as we go on, we'll notice another problem, which is that we just expanded this recursive call to load child. this is a file system application, so it's very natural to write it in a recursive style, but if any of you have tried to analyze performance on a recursive function, it's rather difficult, because at each level of the recursion you may call out to a little branch function, and each of them individually will show up as a relatively small contributor, but there's no way to gather them together and focus on them. so for example, this FSItem path function shows up as 0.1 percent here and one percent there, at different levels, but it's kind of hard to determine if that means anything or not. but luckily
there's another option here called flatten recursion. we click on that, and look what happens: load child becomes a single thing — and look at that, init name with parent suddenly pops up to forty-three percent of the overall time. so by using this data mining, we're getting past the obstacles and getting to the parts that are interesting. and by the way, with the Shark 4 download that Dave mentioned, there's actually a nice tutorial you can go through that will walk you through these things, so you don't have to remember everything I'm going through today. so let's double-click on init name with parent — and pray to the demo gods... there we go. so
you know, this shows you source now, annotated with the percentage of the time. those of you who've used Shark have seen this before, but there's a couple new things that are really cool. you notice that various symbols are underlined — that means you can follow that link, just like in a web browser. so we double-click on self load properties, which is our heaviest line, we go to another source file, and you can navigate forward and back, and this way you can move around and explore your performance in a way that's much more concrete — at least it is to me; it's better to deal with source than to deal with these trees of symbols. so I found this a really nice feature. so now we can look at our
problem. this class is an FSItem — as even its name suggests, it's something you create for every file system item that you encounter as you iterate through these directories. and if you look at the details here: it does an FSPathMakeRef, it gets the file attributes for the path, it does an FSGetCatalogInfo — there's a bunch of these, about five operations, and if I looked elsewhere there'd be a sixth operation — that we're doing for every file in the directory. now, Dave, there are bulk file system operations that we support; they're really cool, and you can reduce this from doing it for every file to just doing it for every directory, and this should give you a really nice speedup. so please consider using that in optimizing your program. so while
know in optimizing your program so while
Dave's off working on that I'm going to
show you some function tracing a date
you know I just showed you data mining
and how to analyze your program using
data mining now i'm going to show you
graphical analysis using this feature
called function tracing you can specify
a list of functions that will let you do
an exact trace of the functions that are
called so i can choose function trace
and there's some presets down here you
can also enter your own if you you have
a set that you particularly like so i'm
going to file i am and this gives you a
list of finally I think there's a little
bit hard to read but you just
unix final calls and i already made a
preset here called file i/o we're going
to choose that and go back to our
program open recent start recording and
this time oh I get we'll just have to do
this again that's nice thing about char
cuz it's pretty forgiving so we've and
when you're doing exact trace you want
to do it for a relatively short time or
you might wind up with you know hundreds
of thousands or even millions of sampled
even in that short time, we got sixty thousand samples. and this is kind of a cool view: you get a distribution of the different file system calls and the percentage of the time that you've used them, so it gives you a hint of what your program is doing.
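the difference between this exact trace and the earlier sampling can be sketched with a wrapper that records every single call and then tallies the distribution — an illustration only, with invented function names:

```python
import functools
from collections import Counter

trace = []  # every call lands here, in order: an exact trace, not a sample

def traced(fn):
    """Wrap a function so each call is recorded -- the idea behind an
    exact function trace, as opposed to statistical sampling."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        trace.append(fn.__name__)
        return fn(*args, **kwargs)
    return wrapper

@traced
def open_item(name):
    return name

@traced
def stat_item(name):
    return len(name)

for name in ["a", "b", "c"]:
    open_item(name)
    stat_item(name)

distribution = Counter(trace)  # how often each traced call ran
```

because every call is recorded, even a short run produces a lot of data — which is exactly why the demo keeps the trace window brief.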
but there's an even better thing we can look at here: let's go to the chart. in the chart here — let me do one thing so the selection is out of the way — here's a chart with this kind of wavy pattern, and let's just zoom into it a little bit, like here. this is a new feature, a really nice zoom control: as you drag along, you can zoom in and out, just like we did in that movie — that movie wasn't fake, it was just filmed from the actual live program. so we go in here, and we see this thing that looks like we're iterating over files. it's kind of like, if you're looking at a Finder outline view, you'll see a similar kind of pattern, and you'll see that load child shows up in the stack here. so
let's just do flatten recursion, and look what it does: it completely flattens out our trace, and we come down here and we find a fingerprint — this little shape here is very repetitive, it occurs over and over again, even in this little window — and that happens to be our load child, and init name with parent, and then load properties. so we found our culprit with graphical analysis very quickly. so use both techniques: if you already have an idea of which functions are expensive, you can do a function trace; if you need to figure out which areas are expensive, do time profiling and use call stack data mining. okay, back to Dave. okay — praying to the audio gods... excellent, excellent, good job. okay, moving on. so I did
my homework while Christie was speaking, and here's what I learned studying the app: in each directory, we're making a directoryContentsAtPath call to enumerate all the files and folders in that directory; then, for each one of those items, we go through and do a number of things to gather the information the program wants to display.
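this enumerate-then-query-each-item shape can be sketched with POSIX analogues — os.listdir plus one lstat per entry — since the actual Carbon calls are Mac-specific; names and fields here are illustrative:

```python
import os
import stat

def gather_naive(directory):
    """Enumerate a directory, then hit the file system again for every
    item -- the per-file pattern the time profile exposed."""
    items = []
    for name in os.listdir(directory):        # one readdir-style call
        path = os.path.join(directory, name)
        st = os.lstat(path)                   # one extra syscall per item
        items.append({
            "name": name,
            "is_dir": stat.S_ISDIR(st.st_mode),
            "is_link": stat.S_ISLNK(st.st_mode),  # don't follow symlinks
            "size": st.st_size,               # data-fork size analogue
        })
    return items
```

with N files in a directory, that's at least N extra metadata calls on top of the enumeration — the cost the profile attributed to the per-item operations below.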
so again: I'm getting an FSRef for each item so that I can make additional calls with it. I want to know whether it's a folder or a file or a symlink, because I don't want to follow the symlink and duplicate the representation of the space taken by the file, so I make a get-attributes call on that path. I want to get the file sizes — the data fork and the resource fork — and also the parent's ID, to see if I'm on the same volume; I don't want to walk off onto multiple volumes here, so I'm making an FSGetCatalogInfo call for that. and finally, when it's classifying files, it's saying I want to get the kind string as represented in the Finder — so if it's a .nib file, we want to show that as an Interface Builder document. so for each FSRef, we end up calling down to Launch Services saying, get me the kind string for this FSRef, this file. so
having done my homework, I did learn about the FSGetCatalogInfoBulk call. what this does is it's optimized: for X number of files, I can specify how many I want in a given directory.
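a rough Python analogue of this bulk pattern is os.scandir, which yields each directory entry with its name and type information gathered during the directory sweep itself, instead of one separate metadata call per item — a sketch of the shape, not of the Carbon API:

```python
import os

def gather_bulk(directory):
    """One sweep with os.scandir: entries arrive with cached type info,
    avoiding a separate stat call per item where the platform provides
    it -- the same shape as a bulk catalog-info call."""
    items = []
    with os.scandir(directory) as entries:
        for entry in entries:
            info = entry.stat(follow_symlinks=False)  # cached when possible
            items.append({
                "name": entry.name,
                "is_dir": entry.is_dir(follow_symlinks=False),
                "size": info.st_size,
            })
    return items
```

the win is the same as in the talk: the per-item metadata cost collapses into the per-directory enumeration.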
i'm looking for a set of information here: I want to get the bit that says is it a directory or is it a file; I'm gonna get the parent directory ID; I want to get the resource and data fork sizes; I'm going to get the type and creator information — and we'll see what I'll do with that on the next slide — and then I want to get the full array of FSRefs for all the individual items, and the full array of entry names. so I get arrays of all this from one call, where before I was making lots of file system calls. so in classifying
files — again, what we were doing was hitting the file system once for each file to get the kind name for it, and then, the way the code was written, it was actually storing that kind string for each individual file. but if we step back and think about it, I just don't have that many different kinds of files, and I really don't need to query the file system about the specific file. what I care about is the kind, and the information that specifies the kind is the type, the creator, and the extension. so I can build a dictionary to map this triplet of type, creator, extension to the file kind string: I actually put all of those into a string and use that as a key into an NSDictionary to do a lookup. now, if I don't find it in the
cache there, I can make a different Launch Services call to say: given this triplet of information, look up the kind string for that. and that's not even hitting the file system, right? so I'm going down from an order-n operation here — once for each file — to zero file system accesses
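the caching idea sketches out very simply — the lookup function here stands in for the Launch Services call, and the names are invented:

```python
_kind_cache = {}  # (type, creator, extension) -> kind string

def kind_string(ftype, creator, ext, slow_lookup):
    """Return the display kind for a file, asking slow_lookup (standing
    in for the Launch Services call) only once per distinct triplet."""
    key = (ftype, creator, ext)
    if key not in _kind_cache:
        _kind_cache[key] = slow_lookup(ftype, creator, ext)
    return _kind_cache[key]

calls = []
def fake_lookup(ftype, creator, ext):
    calls.append(1)                  # count how often we really look up
    return f"{ext or ftype} document"

# three files, but only two distinct kinds -> only two slow lookups
for f in [("TEXT", "ttxt", "txt"), ("TEXT", "ttxt", "txt"), ("", "", "nib")]:
    kind_string(*f, fake_lookup)
```

the cache also gives the memory win described next: one stored string per distinct kind, not one per file.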
and I'm also only storing the kind string once for each different kind, not once per file, so I'm also significantly reducing my memory use. so before I show the results, let's see if there's anything else we can determine from the application here — let's do a memory analysis demo.
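the kind of per-class bookkeeping a tool like ObjectAlloc displays — current, peak, and total allocation counts per class — can be modeled in a few lines; this is a toy model of the columns in the demo, not the tool's actual implementation:

```python
class AllocStats:
    """Toy per-class allocation bookkeeping, mirroring ObjectAlloc's
    columns: current, peak, and total counts."""
    def __init__(self):
        self.per_class = {}

    def alloc(self, cls):
        s = self.per_class.setdefault(
            cls, {"current": 0, "peak": 0, "total": 0})
        s["current"] += 1
        s["total"] += 1
        s["peak"] = max(s["peak"], s["current"])

    def free(self, cls):
        self.per_class[cls]["current"] -= 1

stats = AllocStats()
for _ in range(50):
    stats.alloc("CFString")
for _ in range(48):
    stats.free("CFString")

s = stats.per_class["CFString"]
# fewer than 10% of the total still live: a red bar, in the demo's terms
red_bar = s["current"] / s["total"] < 0.10
```

a high total with a tiny current count — the red-bar case — is the signal that you're churning through allocations you don't actually need.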
so let's go ahead and quit out of the app and bring up our performance tools again. let me also point out, Shark is now up here in the performance tools folder — it used to be down in the CHUD folder; now it's really going mainstream. so let's look at ObjectAlloc. we double-click on that, and what we do here is launch our target application from within ObjectAlloc, because it needs to set up the environment for it. so with this, I simply hit go, and what I want is to keep the back traces; I could keep reference counts on objects, but I don't need that in this situation. so it goes ahead and launches the app
— it hasn't done too much yet. I'm going to change the scale here, because I might have a lot of objects, and we'll see that this application is doing live updates. let's go ahead and walk that folder hierarchy again. as we go through — let's do auto sort — we see that we're building up a lot of CFStrings (these are the currently allocated items), and we're building up a lot of FSItems. all of this makes sense: for FSItem, I'm getting one for each file system item, and the CFString is actually the name for that particular file system item, so that's being stored. so that's kind of useful: I can see the peak amount — what the peak of any particular type has been — and I can see the total amount, and you saw again live updating and auto sorting. so if I go to total, this is
really interesting. we see the different colors of the bars here: what the red bars indicate, as opposed to blue, is the percentage of objects you have left remaining — the current number of objects versus the total you've allocated. red means you have less than ten percent of them remaining, so maybe you've got a dynamic memory issue — you're creating more of them than you actually need; yellow means, I believe, twenty-five percent or a third. the bright color indicates the number currently allocated. so we can see that we have a lot of CFStrings still — we had a little bit more at peak, and we had more total that we got rid of — but what's this NSPathStore? we can see that we've got 24 of them left, but we allocated 180,000 of them going through this. what's up with that? I can
actually double-click on this and get an allocation chart and see what the dynamic allocation pattern looked like. it kind of looks almost like we might have been walking the file system hierarchy, and we're doing something here — it looks similar to some of the patterns that Christie showed earlier. I can go in and look at specific instances of these objects, where they're allocated and what the contents are — we can see this one is a library path, and there's another path there — and I can look at the call stacks and go down through them. another thing I can do is set a mark and say I'm only interested in seeing the number of objects since the mark: if I do the show package contents operation, we see that I can just watch how many objects are created during that operation. so you can get a lot of information about your application here through this. what
I want to do here is go back in and look at — now, you'll notice that it actually took a lot longer to run under ObjectAlloc, because of the overhead it adds, so don't do time analysis while you're doing this. but let's go back in and do some memory analysis with Shark. I'm going to switch to the malloc trace operation, start up Disk Inventory again, and select the Disk Inventory process. so going back to Disk Inventory, let's now look at apps once again, and like Christie did, I'll start the sampling and stop it
— and I'm actually going to jump directly... ah, that's right, first off I want to show that with the value here: if I switch the value, you can actually see in the call tree the amount of memory that was allocated by the various calls. and I could exclude everything that I don't have source code for, and get down to just seeing the stuff that I do have and where that allocation is going, so that's fairly interesting. let's go directly over to the chart view, remove the exclude-no-source option, and we can see from the chart here that, again, it looks like we have some interesting patterns. so let's just click on one of these that might be potentially interesting, and let's zoom in a little bit, see what we might see — zoom, zoom, zoom — interesting
little sawtooth patterns here. so if I click on this, we see a number of different allocations of paths, and I can just use the cursor keys to move through them. so it looks like, in fact, it's from the code: as I'm walking down through the file system hierarchy, the way the code was written was that when I got to each FSItem, it was making several calls to say, I need to know the path right here. now, I'm being a good citizen for memory and I'm not storing the full path with each object — that would be overkill, too much memory use — so I'll dynamically ask for it: I'll get my path by asking what my parent folder is and then appending my name to it. but my parent says, well, what's my path? let me ask my parent, and then append my name to that — and on up we go. so that's
dynamically creating lots of autoreleased NSPathStore and string objects with Cocoa, and then the next thing we see happening is we spend a bunch of time actually autoreleasing those — we can see the autorelease time there. so you can see the impact of too much memory use. since we're recursively descending down through the
file system, I should be able to say at each level, well, this is the path that I'm currently at, and when I go down a directory level deeper, just append the path part to that and pass it down through — I don't have to recursively go back up for every file system item. so that significantly reduced the amount of memory we're using. so now let's go back
through and say, okay, that's all good — did we have any results? well, I was busily coding away and slapped a new binary up there, so let me dynamically enable some optimizations and try it again. so off we go — boom. okay, so on one of my test results here, the folder scanning, which took over ten seconds — in this case about nine seconds — before, is now a little less than two seconds; remember, this was from vastly reducing the number of file system operations. the classifying files step — that, again, was asking for the file kind string for each file — is now virtually instantaneous, because you'll remember I'm doing no file system calls there now. and if I do this show package contents operation — boom — again, 0.16 seconds, whereas before it was about four seconds. so we can see that we've significantly reduced the amount of time the program is taking. let's switch back to slides, please. so, to summarize: what
the tools helped me do is figure out that I should use bulk file system calls — and there's documentation about this; I actually copied much of the code from the performance documentation. I used caching of the file kind strings, so I can just do rapid lookups and not query the file system, and that helps me reduce my storage for the file kind strings. I talked about reducing the dynamic creation of the path strings as we go through. and then, as you go through — you know, optimization is an iterative process, right? you've got a hot spot, so you go in, you tune that, you make that faster,
and now you've got a different hotspot. so it was actually interesting to discover that once we made the file system access a lot faster, the way it was updating the UI for feedback about what it was doing — display this path, display this path, display this path — was actually starting to take a fair amount of time. so I just display fewer paths, because all you want to know is where you are, and that made things faster also, because that's not an important part of my process here.
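the path fix Dave described — carrying the current path down the recursion instead of rebuilding it from each item's parent chain — can be sketched side by side; the data structures and names here are illustrative, not the app's actual code:

```python
def paths_by_walking_up(name_of, parent_of):
    """The slow pattern: each item rebuilds its full path by walking its
    parent chain -- O(depth) string work (and allocations) per item."""
    out = {}
    for item in name_of:
        parts, cur = [], item
        while cur is not None:
            parts.append(name_of[cur])
            cur = parent_of[cur]
        out[item] = "/".join(reversed(parts))
    return out

def paths_by_passing_down(tree, prefix=""):
    """The fix: carry the current path down the recursion and append a
    single component per level -- O(1) extra work per item."""
    paths = []
    for name, children in tree.items():
        path = f"{prefix}/{name}" if prefix else name
        paths.append(path)
        paths.extend(paths_by_passing_down(children, path))
    return paths

# the same tiny hierarchy, expressed both ways (illustrative data)
names = {1: "root", 2: "a", 3: "x.txt"}
parents = {1: None, 2: 1, 3: 2}
tree = {"root": {"a": {"x.txt": {}}, "b": {}}}
```

both produce the same paths, but the second builds each one with a single append per level, which is what cut the transient string allocations in the demo.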
so we made significant improvements here. these are the measurements I got in the lab, again on somewhat slower hardware: we ended up making the file system traversal seven times faster; classifying file kinds — it depends on the size of your file system, but that's, like, infinitely faster — much faster; and showing file contents, for a total of, call it, ten times faster. so now this starts to be a more useful application for me. so we've covered a lot of things here today — a lot of tools, a lot of techniques. we have a lot of documentation about this on the system for both of these, plus the tools have documentation in them and with them, as do the man pages for the command line tools. and so, in conclusion, you know, we've just seen that we have
some powerful tools that help you both monitor, to see if you've got performance problems, and then analyze what the problems are. we put a lot of work into Shark, working with Nathan and Sanjay in a very collaborative effort here, to try to improve both the power — add more new features — but also make it easier to approach and understand at the same time. so we need to know how we're doing: does this work for you? if we remove Sampler from the system, is that going to cause you a problem? so download the beta, and please send us your feedback. so I'm going to bring up on stage our Mac OS X technology evangelist for this — and this is a feedback list that you can send information and feedback about this to.