WWDC2001 Session 705

Transcript

Good afternoon. It's the last session of the day, and I hope we all still have a lot of energy, because we have a tremendous amount of information to show you in a great presentation. Without further ado, I think we'll just roll right in: Robert Bowditch, our performance tools guru.

Hi, my name is Robert Bowditch. I'm a member of Apple's developer tools group, where I'm responsible for the performance tools. What I'd like to do today is tell you something about the tools and give you a quick introduction to them.

Now, why are we giving this talk now? I'm sure all of you have great apps that you're ready to ship at Macworld New York in just about two months, and hopefully they're all pretty much feature complete, right? Thank you. So that means you've got about two months to go, and right about now you should probably be a little concerned about whether your applications actually perform as well as you want them to, and, more importantly, whether they perform well enough that your customers also think they're good. Hopefully, therefore, you're going to start using the performance tools to track down performance problems and make your apps as good as they possibly can be. What I'll do is show you those tools.

So what you should hopefully learn today is, first of all, that we actually do have some cool tools available. Second, you're going to see me go through a sample program and actually find some performance bugs in it, so we're going to do some real-world examples. Hopefully you can take that home and get so excited about these tools that you'll actually dive into them. I'm not going to do a tutorial; I'm not going to go through in detail "you pull down this menu, you press that button, and this is what it shows." I'm really going to show you what you can gain from these tools, in hopes of exciting you enough that you'll go off and play with them yourself. Hopefully some of you will be playing with them as I speak.

If you've got questions about the frameworks, about what you should be doing in your Carbon programs to make them more efficient, I am not going to answer those sorts of questions. I'm not the right person to answer them, and I would probably be telling you the wrong stuff. In those cases you want to go to the framework talks or look at them on DVD. For example, the Carbon talk that was at one o'clock, or two o'clock, was absolutely wonderful; go take a look at that if you write Carbon apps. The Java talk was also really good about Java performance.

So, just a quick summary: what are the possible causes of poor performance on a Mac OS X system? One of the most obvious is excessive use of memory, that is, your application requires a working set of memory that is larger than it really needs. This can be because you have a lot of code, including dead code or stuff that isn't really necessary; it could be because you're allocating too much memory via malloc; it could be because you're mapping in shared files, that sort of thing. There are many ways you can increase your memory footprint. A second way you could be hurting performance is if your application is using the CPU too much, executing too much code. A third is that you might not actually be doing anything: you may be waiting for something to happen, such as going out on the network or going out to disk. So we have both the case of doing too much and the case of doing too little. And finally, in a graphical application you may be doing too much drawing, which means you're doing too much computation, you're using memory, you're talking with the window manager, and everything slows down.

Almost all of these problems eventually turn into memory problems on Mac OS X, because all of them have some connection with how much memory is in the system and what memory is being touched. As soon as the memory footprint of your application and of the rest of the system becomes larger than the physical memory you actually have, we have this wonderful thing called virtual memory, which sort of gives you extra memory. To get around the shortage, it takes some pages out of memory and writes them out to disk, and then takes other pages off disk and puts them back into memory. The problem is that as soon as this happens, your application is suddenly judged by the speed of the disk and not by the speed of the processor. So anything you can do to cut memory use, to keep your working set as small as possible, is a great thing.
Okay, so here's a quick summary of the tools that are available, in three categories that may or may not be the most meaningful, but they seem to work. The first is execution behavior: tools that help you understand what code is executing. These include Sampler and sample, which do CPU sampling, checking periodically to find out what code is currently running. The second set of tools is for understanding heap use, that is, how much memory you explicitly ask for while your program is running.
These include MallocDebug; ObjectAlloc, which is a tool for understanding Objective-C and Core Foundation object use; and the command-line tools heap, leaks, and malloc_history.
malloc_history is particularly cool, because you can actually tell it, "In this process I've got a buffer that starts at this address; where did I allocate that?", and it will give you a call stack saying where that buffer came from.
Finally, there's a set that I call system state, because I don't have a better way to describe them. These include tools for helping you with drawing, such as Quartz Debug; tools for understanding how you use system calls, such as sc_usage, or file system calls, which is fs_usage; and programs like top, which gives you an overall picture of the state of the machine, or vmmap, which tells you how virtual memory is laid out for a specific process.

Okay, so how are we actually going to look at the performance tools? Well, we're going to do public embarrassment: we're going to rip apart one of my own programs, a little test app I've been working on called Thread Viewer, and see if we can find any performance problems that either I put in or that came in with code I borrowed. Okay, Scott, do you want to come up here and show Thread Viewer?

The important thing to know about Thread Viewer is, first of all, that it's a small Carbon app, or excuse me, a small Cocoa application intended to help us do performance analysis. (Actually, we need to go to demo 2. Oh, we are on demo 2.) What we can do in Thread Viewer, as Scott just did, is say, "I want to look at a specific process," in this case the Dock, and it shows us a running timeline of what's happening with the individual threads in that task. For example, we can see here that the Dock uses two threads: the one on the bottom is the main thread, and the one above it is a secondary thread that's helping do computations. The little blocks represent 50-millisecond intervals in the life of that process. Green means the Dock was running when it was examined. Yellow indicates that it had been running in the last 50 milliseconds but wasn't running at that moment. Gray indicates that it's blocked, waiting for something to happen. Light green and light red are also waits: light green means it's waiting in the run loop for something interesting to happen, and light red means it's waiting on a lock. Exactly what Thread Viewer does doesn't really matter, beyond understanding the pretty colors.
The important thing to see is that this is a somewhat realistic program, and that you can use it to examine other programs. Was there anything else I wanted to say on that? Yes: so what's wrong with this? Any comments? Does anyone see anything that looks obviously wrong with this program? Hmm, "there's a lot of green." I was actually hoping for comments on Thread Viewer, but that's an interesting point: the Dock is doing a fair amount, though that's mostly because Scott's playing with it. With Thread Viewer, however, hopefully there's nothing you see here that makes it look bad.

And that's the first lesson I want you to take home tonight: usually, if you just go and look at a program, you're not going to be able to tell whether it has any performance problems. In order to understand performance, you need to do measurement. You need software metrics: you need to actually look at the program and measure things like how much memory it's using and how long it takes to do things. You need this for two reasons. The first is so that you can look at the numbers and decide whether they actually match your expectations: did I really expect it to take two seconds to load that web page? The second is that as the system changes, you want to be able to notice regressions, and unless you wrote down how it behaved two weeks ago, you won't notice that you've been losing 5% performance every single time. So you need to write stuff down. In fact, for the projects I work on that are relatively time critical, I have a checklist of things that I measure; I write the numbers down and just stick them in a folder. It's low-tech, but it works.

So we don't know how the system performs, and we'd like to find out. We can start by looking at overall system behavior, and we'll start by running a text utility called top. In top, the very top of the screen shows information about the whole system; below that, each line refers to a single process. Some of the things I find very important or interesting, and that you should probably look at, are, first of all, the page-in and page-out rates at the very top. The line showing pageins and pageouts tells us something about the virtual memory system: specifically, how many pages it is writing out to disk and how many it is bringing back in. The first number is the number of pages moved since the system was rebooted; the number in parentheses is the number of pages paged in or paged out in the last second. For me, the important number is the one in parentheses. What I tend to say is that you should look at top every now and then while your system is running. If you see those numbers stay above zero for any length of time, if they sit at 20 or at 50, that usually means you're paging a lot: the amount of physical memory you have is not enough for all the things you're running, which may indicate that your app is taking up too much memory, or that something else is. If you're hitting 50 or 100, your system is basically thrashing: it's spending all of its time writing pages out to disk and bringing them back in, and somewhere around 50 or 100 you hit the limits of the disk, where it can't page any faster. So if you ever see anything at that level, your system is really hosed. (That's a technical term.)

The other interesting thing is in the bottom part: in the list of processes there's a column labeled %CPU, which shows which tasks are actually spending time running, and we can see that Thread Viewer is responsible for three or four percent of the CPU. So it's not taking up a lot of CPU time. There's also a column far to the right labeled RPRVT; that's the other one I think you should look at. RPRVT stands for resident private memory: it tells you how much of the application's memory is in physical memory and is private to that application alone. This number tends to be a good measure of how much memory your application needs in order to run, so look at it when you want a rough footprint. Note that if you have a lot of things in memory, much of it could be paged out to disk, and in that case it would not be resident; so on a heavily loaded system, that number may not make as much sense.
Now, one thing we should do is show another mode of top. top has a huge number of modes, and in fact this is probably as good a time as any to try to sell books: there's a book called Inside Mac OS X: Performance. It's available, I believe, on the CD you got in your bags; it's available on the web; and it's available from Fatbrain, nicely bound like this. It describes a lot of the performance tools, and along with the manual page for top it will tell you what some of the flags are. What Scott's going to show you is an option to top, top -d, extremely clear and extremely user friendly, which gives us a few more pieces of information. We can see the number of page faults, but the two I'm going to point at are CSW, way over on the right, which is the number of context switches, and messages sent and messages received, which is how many Mach messages are being sent around in the system and how many have been received.
Now, there's a reason I'm telling you this. If Scott plays around with Thread Viewer (well, I guess he kind of is), what we see is that the number of context switches, that is, the number of times a process got switched out so that something else could run, is pretty high for both the window manager and Thread Viewer, and the number of messages sent and received is relatively high too. Does anyone know why that's happening? Yes, I heard it: Thread Viewer is drawing, and the drawing isn't done by Thread Viewer alone. It's done by Thread Viewer and the window manager together, so whenever Thread Viewer draws, it has to communicate with the window manager.

So this is the second take-home lesson today: sometimes the program you're running can't be measured just by looking at that program in isolation. You also have to look at daemons and servers that might be doing computation on its behalf. In the case of drawing, you want to look at the window manager as well as your application. There have been times when I've seen an application that looked like it was performing great, taking only 30% of the CPU while doing a lot of drawing, but the window manager was responsible for another 60% doing that drawing. So to understand the overall problem, you want to look at both.

Scott, I actually saw something kind of interesting in that last top display; can you switch back to it? Okay, does anyone see anything interesting in this display for Thread Viewer? Well, we don't know yet that it's leaking. Maybe there's a reason we're doing it, maybe it's intentional, but we're seeing the amount of resident private memory grow. At least I think it's growing, and luckily top has this little plus sign next to the number, which says the memory is going up. So it seems to be going up, and it's using 37 megabytes of memory. Hmm, this looks suspicious; let's see if we can track down that problem.

One thing I can try is to prove that it's actually growing. Yes, I know that seems kind of silly, but let's try it anyway: I want to show that it's increasing, to identify a trend over time. Now, one of the problems is that I don't really have a tool for this. Okay, I'm not doing my job. However, I can show you how to make a tool like this. Because top is a command-line tool, a UNIX, non-Mac-like tool, you can use the various features of UNIX to basically roll your own. What Scott has done here is type in a little shell script: a while loop that calls top with the -l option, which is basically a one-shot "give me the data," pipes it through grep to find the line that starts with Thread and print it, then sleeps for a second and loops again. If Scott runs that, we start getting the top data, keeping only the line for Thread Viewer, and we can see that the amount of resident private memory, which is the last visible column where the Thread Viewer window is obscuring things, is in fact growing. So, jeez, I've got a memory leak.

What you should take away from this is that command-line tools, regardless of how Mac-like or Mac-unlike they may be, still have uses. You can use them to roll your own tools; you can run them via telnet from another machine if you want to inspect somebody else's machine; and you can run them even when the window manager isn't responding, or when you've got a full-screen app because you're debugging a game. So don't ignore the command-line tools; they do have a place.
Okay, so what are the possible causes of the memory leak? One possibility: we're not throwing away old data. Any other suggestions? History, actually, that's a good one. Well, I'm not quite sure, so let's see if we can track it down. One thing I'm going to do is try another command-line tool, called heap. Scott can try running it, but the slide has the output. heap is intended to give you a dump of how much memory you're using via malloc, and most of its output just says, "Here are all the buffers you're allocating."
The important part is the very top, though, because one possibility, although a relatively unlikely one, is that our heap is fragmented. We could allocate a million bytes, free it, then allocate ten bytes that split up that free space, and keep going, so that the amount of heap space keeps growing even though we're not actually using much memory. With heap we can see that there are actually two heaps, which we call zones: one for Core Graphics and the default malloc zone, together representing about 2.3 megabytes of memory; we can see that in the overall allocation section. We also see that the total of the nodes malloced, the amount of space we're actually using, is about two megabytes. So 2.3 minus 2 megabytes is about 300,000 bytes of unallocated memory.
This indicates that heap fragmentation is not our problem; this is memory we're actually allocating. So there's another thing we can do. We have two megabytes of memory allocated via malloc, but our resident private size was 38 megabytes. That leaves 36 megabytes whose source we don't know. So we can run another tool, called vmmap, which tells us how our virtual memory is laid out. Most of the time this isn't something you really care about, but every now and then it can be interesting. If Scott runs it... oh yes, you've got to run it. Basically it says: okay, there's a page at address 0, it's 4 kilobytes, and it's unreadable; there's a region at 4096 that holds the executable, and it runs for 14K or so. That's the dump. Not very friendly; however, it has some interesting information, and if you've ever wondered how an application is laid out in memory, this is how you'd find out, so that when you're in the debugger and find yourself at address 1016, you can figure out why you might be there.
However, the slide shows you a little more detail about what's actually there.
In this case we see that there's a malloc region at address 0x139a000 of about 16,000 bytes, so that's space for the heap, plus a couple of others, and below this the 512,000 bytes allocated for the stack of one of the threads. Then we see address 0x14b6000, 4,000 bytes; 0x146d000, 4,000 bytes; 0x146f000, 4,000 bytes. This may be the source of our leak: see where it says 4K on the slide? Sorry, this is a good example of why I shouldn't have done this. Over on the slide you can see the portion with 164,000 bytes, read/write. So the interesting thing is that we keep allocating these 4,000-byte buffers. That's kind of odd, and they would have been allocated via vm_allocate, which is how you'd actually create a virtual memory page.

When I found this problem, I went into the debugger and looked at what was there. What I found (can you bring up Thread Viewer?) was that the page was basically all zeros, except for a couple of numbers that looked oddly like the thread IDs on the left-hand side, the 5a03 and 5903. That made me think this might have something to do with a system buffer or something. It turns out that what's actually happening is that the Mach APIs sometimes return you a buffer, a vm_allocated page, that you have to free yourself. The code, which is in one of the libraries Thread Viewer depends on, does something like "tell me all the threads associated with this task," and you get back a buffer containing a list of those threads, which you need to make sure to free. That yellow line wasn't there, and by adding it we got rid of the memory leak.

Now, this is not a case you will ever run into: hopefully none of you will ever, ever, ever use the Mach calls to find out what threads exist; I'm doing it because this is a performance tool. However, what you should take away from this is, first of all, that I had to use multiple tools to track it down, and I had to compare their output to figure out what the problem was. That's one of the problems with performance work: usually the tools aren't going to drop an answer in your hand. Plan on inspecting things and thinking about them to find out what the causes are.

Okay, let's do another example. We saw that we were using about 2.3 megabytes of memory in the heap, that is, memory we allocated via malloc, and that seems a little suspicious too. So what are some of the reasons we might allocate that memory? One is that we could just have really big data structures: we could be allocating two megabytes of data structures because we want them. A second is that we could be caching things we don't need, saving them just in case we need them again. A third is that we could be creating things and simply forgetting to destroy them. There are plenty of other reasons, but the problem with all of them is that the more memory you allocate via malloc, the more the system will just keep giving you, and eventually the system grows to the point where it won't fit in physical memory anymore. You start thrashing, swapping stuff out to disk, and as a result your system slows down.

So there are two tools I'm going to show you that help you track down why memory is being allocated. The first is called MallocDebug, and it tries to show you, via a call graph, where memory is allocated, so that you can track down the big allocations. The second, called ObjectAlloc, tries to report things according to how many objects exist. We'll start with MallocDebug, and Scott can bring that up. In MallocDebug we select an application and launch it; Scott can connect and say, "Okay, I want to look at the Dock," and we've got our little Thread Viewer. Now Scott can go back to MallocDebug and hit the Update button, and we find out that to get to that point in the code we needed 2.6 megabytes of memory. Thread Viewer isn't that complex, so this seems a little suspicious to me.
Now, like I said, MallocDebug works off a call graph: it shows the various functions that were called. For example, Scott can start by clicking on start, which is a secret function in the runtime that you don't need to know about, and then on main. main is the top of our program, and the number next to it, 1.3 megabytes, is the amount of memory allocated by main and everything below it, all the functions it calls. The rest of the memory is allocated on a different thread that isn't started from main; that's why you don't see it here. To the right of main we see that 1.3 megabytes was allocated in NSApplicationMain and 10,000 bytes in calls to "allocate unneeded buffer," both called directly by main. If you went to the source code, you'd see that you don't need to worry about NSApplicationMain: it's the root of the Cocoa framework, how you get it started, so of course a lot of memory is allocated there. The call to "allocate unneeded buffer" calls malloc and allocates a single 10,000-byte buffer. Scott can look at the listing below, where it shows the buffers, double-click on one, and get a hex dump, so we can look at the buffer and maybe work out the cause: it contains a string that says "this buffer isn't really needed." Okay, so that's obviously a buffer we can get rid of. But this shows how you can walk through the call graph, find where allocations happen, say "hey, this looks suspicious," and try to fix it. If you understand how your program is structured top-down, this makes a lot of sense, and you can walk through the tree looking for interesting things, because this is basically a scrub-your-nose-on-the-screen-until-you-see-something-interesting kind of problem.

The other thing we can do: there are times when we don't really care how we got there from main, but we're curious who called malloc directly, so we can assign blame. So Scott can switch the display mode from standard to inverted, which means we're looking at the call tree from the bottom up. For example, we can see that 740,000 bytes were allocated in calls to malloc; 489,000 bytes of that was when NSZoneMalloc called malloc (that's the Objective-C version of malloc); and 290,000 of that was when we called NSReadImage, which called NSZoneMalloc, which called malloc. Now, Thread Viewer is a pretty simple program: it doesn't have any icons or pictures, so the idea that NSReadImage is responsible for 290,000 bytes seems a little suspicious, and we can start walking up the call graph to see who called what to get to NSReadImage. If we go up past all the AppKit stuff (Scott, can you point to where the library and application names are?), we can see that we end up in a function called "image for process," which is in the Thread Viewer library; that's what the name in parentheses means. So this is my code, and it's what's causing those NSReadImage calls.
Now, unfortunately, I know this code, and I know what that function is: it handles the attach dialog box for Thread Viewer, which Scott can bring up. Even though that dialog box had been closed, those icons were still around, and the reason was that Thread Viewer was trying to be clever. It decided it would be really smart to cache those icons, keeping them in memory after the dialog box closed, just on the off chance it would need them again.
Now, that's nice: I don't have to read them in off disk; they're sitting in memory. The problem is virtual memory. As soon as the application got a little too large, the system would take those pages that hadn't been touched in a while and write them out to disk. So when we brought up the attach dialog box again, to avoid reading the disk we'd go to memory, which would require reading the disk all over again. Saving memory matters, because it's probably going to be cheaper just to get the stupid thing off disk, so the idea of caching those icons was really stupid. What would have been much better was to get rid of the icons when the dialog box went away and recreate them on the fly the next time we opened it, since that was a relatively infrequent occurrence.

Oh, one thing I haven't mentioned that I should: everything you saw in MallocDebug refers to currently allocated blocks, so if something was allocated and then freed, you won't see it in MallocDebug. Keep that in mind as you use it. MallocDebug also has a number of other useful features. For example, it can help you identify leaks in your program: places where you allocate memory and then forget to deallocate it.
These are really important to track down, because leaked memory, stuff you forget to free and never free, just sits in memory. It occupies space and keeps the stuff you're actually using from being close together. If you have a long-running app that goes for days, like a server app, your memory just keeps growing and growing until, 30 days in, the app suddenly crashes and you have absolutely no idea why. And even if you have a small, short-running app, a leak is still going to affect performance. This is a particularly sore point for me, because there's at least one game I've been trying to play lately that has a memory leak: after about four hours of playing, it eventually sort of runs out of memory. So please, if you write games, don't do this.

MallocDebug can help us detect this. We can switch from "show me all buffers," or "show me the buffers that have recently changed," which is New, to the Definitely Leaked mode: show me only the buffers that aren't referenced anywhere in memory. The way it does that is basically garbage collection: it scans memory looking for pointers to each buffer, and any malloced buffer it can't find a pointer to, it assumes is unreferenced, because if we don't have a pointer to it, there's absolutely no way we can free it. Here we see about 10,000 bytes leaked, and Scott can click on malloc and see that there's a case in "image for process" where we allocate a couple of buffers and have lost the pointers to them, so we can't free them. I looked at this code. When I go and read the icons, I have to grab the process's command line to find the icon (you don't want to know), and I keep a buffer around for that list of arguments. In most cases I free it at the very end of the routine. However, in a few extraordinary situations I knew I wasn't going to find an icon, so I would just exit the routine early, and I forgot to release the buffer. No one else does this, right? So yes, this really is just one of those things, and the tool caught it for me. This is another case where MallocDebug helped us, and in fact it illustrates another lesson of software engineering: if you keep finding bugs in the same code, maybe you need to rewrite that routine.

Now, there are a couple of little things to remember. One is that you may say, "Oh, you're only leaking 10,000 bytes. Who really cares? What's 10,000 bytes?"
However, remember how MallocDebug's leak detection algorithm works: if it can't find a pointer to a buffer anywhere, that buffer must be leaked. First of all, that means it doesn't do well with circularly linked lists. Every element has a pointer to it, so a leaked list will never be reported as leaked. Likewise, if you leak a tree data structure, only the root has no pointers to it; everything below it is still pointed to, so only the root shows up as leaked. So a small leak in MallocDebug may be hiding a much bigger one, and every leak you see is worth tracking down.

Finally, one more thing about MallocDebug. This doesn't necessarily affect performance, but it does affect correctness: MallocDebug can also help you track down pointer problems. There are a number of bugs that are really subtle and intermittent. One example is freeing a buffer but continuing to read and write it even though somebody else now owns that memory. These are miserable: suddenly data is changing and you have no idea why. The second really nasty kind of bug is a buffer overrun, where you have, say, a 40-byte string but write 45 bytes into it, and suddenly you trash whatever comes after it. MallocDebug can help you track down both of these.
The reason I say "help" is that MallocDebug tries to encourage your program to crash when it does stupid things. Scott has brought up the memory dump here. When you free memory, MallocDebug carefully erases the contents of that buffer and replaces them with hex 55, so you see runs of 55 55 55 55 (there we are). If you read data from a buffer you've already freed, you'll get garbage, and hopefully you'll crash or misbehave visibly. If you treat those bytes as a pointer, it's even better, because 0x55555555 is almost always unallocated memory, and as soon as you touch it your app crashes. So if your application ever crashes under MallocDebug, it's trying to tell you something.
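The scribble-on-free behavior described above can be sketched in C. This is a minimal illustration under stated assumptions, not MallocDebug's implementation: the names `debug_malloc` and `debug_free`, the size header in front of each block, and the small quarantine that delays handing freed blocks back to the system are all hypothetical. The quarantine is there so that a stale pointer keeps reading 0x55 bytes for a while instead of somebody else's new data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SCRIBBLE_BYTE 0x55
#define QUARANTINE_SLOTS 64   /* hypothetical: how many freed blocks we park */

/* Header stored in front of each block so debug_free() knows its size.
   The union with max_align_t keeps the caller's bytes suitably aligned. */
typedef union {
    size_t size;
    max_align_t align;
} block_header;

/* Freed blocks are parked here for a while instead of going straight back
   to the system allocator, so stale reads keep seeing 0x55. */
static block_header *quarantine[QUARANTINE_SLOTS];
static size_t next_slot;

void *debug_malloc(size_t size) {
    block_header *h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->size = size;
    return h + 1;                       /* caller's data follows the header */
}

void debug_free(void *p) {
    if (p == NULL)
        return;
    block_header *h = (block_header *)p - 1;
    memset(p, SCRIBBLE_BYTE, h->size);  /* erase the old contents */
    free(quarantine[next_slot]);        /* release the oldest parked block; free(NULL) is a no-op */
    quarantine[next_slot] = h;
    next_slot = (next_slot + 1) % QUARANTINE_SLOTS;
}
```

Because the block is only parked, code that keeps using the pointer after `debug_free()` reads 0x55 bytes; a pointer loaded from such a buffer is made of 0x55 bytes, which is almost never mapped memory.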
Hook up a debugger, or follow the instructions in MallocDebug on how to track down these kinds of bugs, and find the pointer problem, because it'll save you a lot of grief later.

The other thing MallocDebug does is put guard words at each end of a buffer. At the beginning of the buffer it puts the value DEADBEEF (the first part that's highlighted, DE AD BE EF), then comes your buffer, and at the end it puts BEEFDEAD. I did not make this up; it just came this way. If you overrun the buffer, MallocDebug occasionally checks on frees whether those bytes have changed, and if it ever finds that they have, it prints out a warning. Unfortunately, the warning goes out to the console, so please keep the console open while you're running MallocDebug. Yes, this is lame; we're working on it and we'll try to do something about it.
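The guard-word idea can be sketched in C as well. This is a hypothetical illustration, not MallocDebug's code: `guarded_malloc`, `guards_intact`, `guarded_free`, and the 16-byte header are assumptions. The guard values follow the talk (DEADBEEF before the buffer, BEEFDEAD after it), and the warning goes to stderr, much as MallocDebug's goes to the console.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_FRONT 0xDEADBEEFu
#define GUARD_BACK  0xBEEFDEADu
/* Assumed header size: room for the size field plus the front guard, and a
   multiple of 16 so the caller's bytes stay aligned on common platforms. */
#define HEADER_BYTES 16

/* Layout: [size][pad][front guard][caller's bytes][back guard].
   memcpy is used for the guard words because the back guard may be unaligned. */
void *guarded_malloc(size_t size) {
    unsigned char *raw = malloc(HEADER_BYTES + size + sizeof(uint32_t));
    if (raw == NULL)
        return NULL;
    uint32_t front = GUARD_FRONT, back = GUARD_BACK;
    memcpy(raw, &size, sizeof size);
    memcpy(raw + HEADER_BYTES - sizeof front, &front, sizeof front);
    memcpy(raw + HEADER_BYTES + size, &back, sizeof back);
    return raw + HEADER_BYTES;
}

/* True if neither guard word has been overwritten. */
bool guards_intact(const void *p) {
    const unsigned char *raw = (const unsigned char *)p - HEADER_BYTES;
    size_t size;
    uint32_t front, back;
    memcpy(&size, raw, sizeof size);
    memcpy(&front, raw + HEADER_BYTES - sizeof front, sizeof front);
    memcpy(&back, raw + HEADER_BYTES + size, sizeof back);
    return front == GUARD_FRONT && back == GUARD_BACK;
}

/* Check the guards when the block is freed and complain if they changed. */
void guarded_free(void *p) {
    if (p == NULL)
        return;
    if (!guards_intact(p))
        fprintf(stderr, "guarded_free: block %p was overrun\n", p);
    free((unsigned char *)p - HEADER_BYTES);
}
```

This sketch checks on every free; the talk describes MallocDebug as checking only occasionally, which is cheaper but may report an overrun later than it happened.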
Okay, thanks for that, Scott. The second tool I'll show you is called ObjectAlloc, and it's intended to help you understand how many objects you have rather than where they were allocated. It's mostly useful if you're doing Objective-C or Core Foundation work. Again we select an application (note that ObjectAlloc doesn't understand bundles) and start the program running, and it gives us a set of little histograms: a bunch of numbers grouped by the type of object, showing how many objects we've created. The first number and the darkest bar represent the current number of objects of that type. The second bar represents the peak number of objects of that type, that is, the most that ever existed at one time during the run. The final, lightest bar represents the total number of objects of that type ever created.
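To make the three bars concrete, here is a minimal C sketch of the bookkeeping they imply. The `alloc_counts` type and its functions are hypothetical; they only illustrate how current, peak, and total relate, since ObjectAlloc's real implementation isn't described here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-type counters mirroring ObjectAlloc's three bars. */
typedef struct {
    size_t current;  /* objects alive right now (darkest bar) */
    size_t peak;     /* most objects alive at any one time */
    size_t total;    /* every object ever created (lightest bar) */
} alloc_counts;

void count_alloc(alloc_counts *c) {
    c->current++;
    c->total++;
    if (c->current > c->peak)
        c->peak = c->current;
}

void count_free(alloc_counts *c) {
    assert(c->current > 0);  /* freeing more than we allocated is a bug */
    c->current--;
}
```

Three allocations, two frees, and one more allocation leave current at 2, peak at 3, and total at 4: total only ever grows, while a steady current count is what a healthy program looks like.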
This way of organizing things is really nice for certain kinds of tasks. One is identifying trends: we can see motion here, we can watch the numbers grow, so it's very good for noticing that memory use is expanding. A second is that it can help us prove various statements about our program. For example, the Thread Viewer display keeps sampling the application, finding out how it's currently running, and puts that information on the right-hand side of the display; information that scrolls off the left disappears. That's done with a data structure called a thread data. New thread data objects get put on the right side of a big array, and when the information becomes out of date because it has scrolled off, they get thrown off the left-hand side, and hopefully the objects are destroyed.
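A ring buffer along those lines might look like the C sketch below. It is a hypothetical illustration, not Thread Viewer's code: `thread_data`, `history_push`, the 50-slot capacity, and the `live_objects` counter (standing in for ObjectAlloc's current count) are assumptions. The important line is the `free()` of the evicted entry; without it the live count would grow without bound.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define HISTORY 50   /* assumed number of samples kept on screen */

/* Stand-in for the per-sample record; its real contents aren't shown. */
typedef struct {
    int sample_id;
} thread_data;

typedef struct {
    thread_data *slots[HISTORY];
    size_t head;    /* index of the oldest entry (the left edge) */
    size_t count;   /* number of live entries */
} history;

static size_t live_objects;   /* what ObjectAlloc's "current" bar would show */

/* Add a new sample on the right; when full, evict and destroy the oldest. */
int history_push(history *h, int sample_id) {
    assert(h->count <= HISTORY);
    thread_data *d = malloc(sizeof *d);
    if (d == NULL)
        return -1;
    d->sample_id = sample_id;
    live_objects++;
    if (h->count == HISTORY) {
        free(h->slots[h->head]);          /* skipping this free() is the leak */
        live_objects--;
        h->slots[h->head] = d;            /* new entry reuses the oldest slot */
        h->head = (h->head + 1) % HISTORY;
    } else {
        h->slots[(h->head + h->count) % HISTORY] = d;
        h->count++;
    }
    return 0;
}
```

After 1,400 pushes, 1,400 objects have been created in total but only 50 are alive, which is the pattern ObjectAlloc shows for a correct ring buffer.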
One bug I could easily imagine making is forgetting to delete them correctly, so that the number of thread data objects grows without bound and eventually my system's performance degrades. Scott has actually done this: we can find thread data in ObjectAlloc's display and see how many objects we have. The current number of thread data objects varies between about 45 and 55, and the peak was 62, but the total is about 1,400 and growing. That tells me I actually did this correctly, so this isn't a bug: we're keeping a steady number of objects in what is effectively a ring buffer, adding new items on the right-hand side and pulling old ones off the left. So ObjectAlloc has helped us prove that.

One other thing. How many of you are Objective-C programmers, or interested in becoming Objective-C programmers? Okay. As you write test code, make sure you use ObjectAlloc, because one of the things it's really great at (you may have seen this when we brought up the attach panel) is that you can tell it to keep track of every single time you retain or release an object. Objective-C uses reference counting: it only deletes an object once nobody is retaining it anymore. So you can figure out whether your program is actually destroying objects correctly, and if it's not, you can find out where you retained an object one too many times and kept it around in memory. So use ObjectAlloc, especially on example code. Thank you, Scott.

Okay, so that's memory. The third kind of performance problem we might try to track down is what code is executing. There are a number of ways we could be using too much CPU. We could be executing code we don't need, either dead code or code that isn't really calculating a value we care about. We could have an algorithm that's much more expensive than we ever expected, say a quadratic, N-squared algorithm rather than a linear one.
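A classic way to end up with an N-squared algorithm without noticing is appending to a C string in a loop. In the hypothetical sketch below, `join_quadratic` calls `strcat`, which rescans the destination from its beginning on every call to find the end, so the total work grows with the square of the number of words; `join_linear` keeps a pointer to the end and copies each word once. Both assume the caller has sized `dst` to hold the result.

```c
#include <assert.h>
#include <string.h>

/* Quadratic: each strcat() walks the whole string built so far. */
void join_quadratic(char *dst, const char *const *words, size_t n) {
    assert(dst != NULL);
    dst[0] = '\0';
    for (size_t i = 0; i < n; i++)
        strcat(dst, words[i]);
}

/* Linear: remember where the string ends so each byte is copied once. */
void join_linear(char *dst, const char *const *words, size_t n) {
    assert(dst != NULL);
    char *end = dst;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(words[i]);
        memcpy(end, words[i], len);
        end += len;
    }
    *end = '\0';
}
```

Both produce the same string; the difference only shows up in a profiler once the number of words gets large.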
We could have operations that are much more expensive than we thought. One example pointed out in the Carbon performance session: now that home directories can live on an NFS server, when you go and get your preferences you may be going across the network, so what used to be a really quick grab-it-from-the-disk operation may suddenly take seconds to return. You may not have expected certain operations to be as time-consuming as they are. And, as you may have seen in Avie's keynote, there's also the problem of checking for events by polling, constantly checking where the mouse is, for example, rather than waiting for something to happen and having the system tell you, "Hey, by the way, something changed."

In general, the normal way to solve this kind of problem is to track down the biggest problem, the most expensive routine, because if you can cut the CPU cost of that biggest routine, if you can make it faster, you're going to improve the performance of your system as a whole.
Don't start with the little things; do the big things first. So if we want to improve performance, what we want to do is find the expensive calls and improve them. One tool for doing this is called Sampler, which Scott has just brought up. With Sampler we can select an application or connect to one that's already running, and then start sampling it. Sampling means that we stop the program occasionally and ask what's going on: every 20 milliseconds we stop the program, ask where it's executing, and get back a stack backtrace. Then we let the program continue, stop it again, get another backtrace, and keep going. The advantage is that, with relatively little impact on the running system, we can find out what code is most likely to be running. This is statistical: we don't know everything that ran in between samples, but we get a good idea of where the time is going.

What Scott has done, I believe, is sample Thread Viewer while it's drawing, so let's see what Thread Viewer is doing when it runs. Personally, I'm very concerned about how much time it takes to grab the samples it needs to show what's running, and I want to find out how much time Thread Viewer spends drawing.
I know a little about this program. I know that thread 3 is where the data gathering goes on in Thread Viewer, and we see that thread 3 was found executing 486 times; that's the number of samples taken. All of those samples were in the function _pthread_body, which was calling my sample-threads routine. Every time we stopped in that routine we found one of two things. In 470 of the samples we were in usleep, which basically stops for a fixed number of microseconds. In 16 of the 486 samples, though, we were in the Thread Viewer controller's logging method, which is the code that actually gathers the information Thread Viewer displays. If we look at that, we find the sample stack on the far right, and it shows that most of that time was spent in a routine called "sample once all threads", which gets the stack backtraces that Thread Viewer displays. So only about 3 percent of the time we looked at thread 3 was it actually doing something, actually gathering data; the rest of the time it was idle. That seems pretty good: it means data gathering is relatively cheap, which is really good for a performance tool. Since it doesn't look like there's a performance problem there, Scott can just ignore all that.
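The arithmetic behind reading a Sampler call tree can be sketched in C. This is a hypothetical model, not Sampler's implementation: each sample is stored as a backtrace of function names, a function's count is the number of samples in which it appears anywhere on the stack, and its percentage is that count over the total. With the thread 3 numbers from the demo, 16 of 486 samples works out to about 3.3 percent.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 8   /* assumed maximum backtrace depth for this sketch */

/* One sample: the backtrace captured when the thread was stopped,
   outermost frame first. */
typedef struct {
    const char *frames[MAX_DEPTH];
    size_t depth;
} stack_sample;

/* Number of samples in which `fn` appears anywhere on the stack
   (its inclusive count in the call tree). */
size_t samples_containing(const stack_sample *s, size_t n, const char *fn) {
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t d = 0; d < s[i].depth; d++) {
            if (strcmp(s[i].frames[d], fn) == 0) {
                hits++;
                break;   /* count each sample once, even under recursion */
            }
        }
    }
    return hits;
}

double percent(size_t part, size_t whole) {
    return whole ? 100.0 * (double)part / (double)whole : 0.0;
}
```

For example, with 470 samples ending in `usleep` and 16 ending in a logging function under the same parent, the parent appears in all 486 samples and the logging function in 16, or about 3.3 percent. The frame names used in such a test are placeholders.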
We'll prune that out of the tree so we don't have to look at it, and now we can look at thread 0, the main thread, where the drawing goes on. We sampled the main thread about 720 times, and most of the time it was in main, which is not surprising. From there it goes down into DPSNextEvent, and the block-until-next-event code is basically sitting in the run loop, not really doing very much: about 570 of the 720 samples were sitting in CFRunLoopRun. Now this is kind of interesting: of those 572 samples in CFRunLoopRun, 552 were actually in a function called mach_msg. Geez, why are we spending so much time in mach_msg? I'll give you a hint: it happens to be a kernel routine. So obviously there's something wrong with the kernel, because we're just sitting there in mach_msg all the time. Actually, does anybody know what's going on there? Thank you very much: we are waiting for a Mach message.
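Waiting in mach_msg is the cheap, event-driven alternative to the polling mentioned earlier: the thread is blocked and uses no CPU until a message arrives. Mac OS X's run loop does this with Mach messages; the sketch below shows the same idea with portable POSIX threads instead, as an analogy rather than a description of how CFRunLoop works. The `mailbox` type and its functions are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* A one-slot mailbox: the consumer sleeps in pthread_cond_wait() until a
   producer posts an event, instead of spinning in a polling loop. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  changed;
    bool has_event;
    int  event;
} mailbox;

void mailbox_init(mailbox *m) {
    pthread_mutex_init(&m->lock, NULL);
    pthread_cond_init(&m->changed, NULL);
    m->has_event = false;
    m->event = 0;
}

/* Producer side: store the event and wake one waiter. */
void mailbox_post(mailbox *m, int event) {
    pthread_mutex_lock(&m->lock);
    m->event = event;
    m->has_event = true;
    pthread_cond_signal(&m->changed);
    pthread_mutex_unlock(&m->lock);
}

/* Consumer side: blocks without using CPU until an event is available. */
int mailbox_wait(mailbox *m) {
    pthread_mutex_lock(&m->lock);
    while (!m->has_event)              /* the loop also handles spurious wakeups */
        pthread_cond_wait(&m->changed, &m->lock);
    int event = m->event;
    m->has_event = false;
    pthread_mutex_unlock(&m->lock);
    return event;
}
```

In a real program a second thread calls `mailbox_post()`, and a profiler sampling the waiting thread would find it parked inside the wait call most of the time, just as Sampler found the main thread in mach_msg.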
So the mach_msg call is basically sending off a message, probably to the window server or whoever is giving us events, saying, "Hey, let me know when something I actually care about happens," like the mouse moving or needing to redraw. So most of the time we're going to find ourselves in mach_msg_overwrite_trap. Please do not open a bug against the kernel saying, "Hey, you guys, I keep finding my code running in here." That's why you see mach_msg_overwrite_trap. The rest of the time, however, we find ourselves in CFRunLoopRun, and if we go up the sample stack and look for where the counts change, where the tree diverges, so to speak, we eventually find ourselves in the thread view's drawRect:, but in only 17 of those roughly 720 samples. So once again, only about 2 percent of the time were we in drawRect:, which is the code that actually does the drawing. In 10 of the samples in drawRect: we were doing some NSString drawing, and the rest of the time we were drawing rectangles. This implies that the drawing code is pretty efficient too; we weren't spending very much time in it. It also suggests that if I wanted to increase the redraw rate, so that Thread Viewer wasn't just flashing the screen once a second as it redraws the display, I could probably make that animation much smoother. That's a good thing to know.
We didn't find a bug, but we learned something about how we could improve this application.

One other thing: as I said, this is sampling, and there are a few other tools to help you out. There's a tool called sample, which gives you similar data. It's a command-line tool that's really good for finding out why the machine is hanging, for example, or why an application is stuck in a loop: you run sample and get stack backtraces. The other tool you may want to know about is gprof, the standard UNIX profiler, which we have on our system. It generates a text report showing which code is running. If you want somewhat more accurate data, gprof is the better way to go, but it requires you to recompile your program. Sampler, MallocDebug, and the others don't require recompiling, so they're much easier to use.

Another way we could have performance problems is by using the disk badly, for example by reading the disk at the wrong time. In the Carbon session they went through how important it is not to do disk accesses while you're drawing, because you may block and slow down your drawing, and also to minimize the reads and writes you do during application launch so the application launches as fast as possible. So one thing we want is to understand how the application uses the disk and what files it accesses. Luckily, there's a really cool tool that helps us with this. It's also a command-line tool, and it's called fs_usage.
What fs_usage does is dump out a text report for a given process that tells us every single file system call it makes: every open and close, every read and write, every directory lookup, and the like. It has to be run as root, because otherwise it would be a security hole: in theory you could find out what other people on the machine are doing. That's the fun of multi-user operating systems. (And Scott did not expose his password to us, unlike me in a previous demo.) So we run fs_usage on Thread Viewer, and we can see the reads and writes. We can see the name of each file, and in the far right column, the amount of time each call took.

Now, this is a relatively boring example, but imagine running it on SimpleText. Actually, that's a bit of homework for all of you: go home, try running fs_usage on SimpleText as it starts up, and watch what file system accesses it makes to get the list of fonts, get its resources, and so on. You'll be surprised.

Getting back to my problem, though: we find that every second, Thread Viewer is doing an open, an fstat, a write, and two closes. All of these take less than a millisecond, about two ten-thousandths of a second each, according to the numbers on the far right, but this seems a little wrong to me. We can see that the file is a Thread Viewer log in /tmp, and we could go and look at that file if we needed to understand what was going on. So fs_usage has shown us that Thread Viewer is doing something really brain-dead. The question is where that brain-deadness is in my code, and fs_usage doesn't tell us that.
Luckily, Sampler has a mode that will help us with this problem. Besides CPU sampling, Sampler can do several other things. It has a mode for tracking down mallocs, which is very similar to MallocDebug, and a mode called "watch for file actions", which, instead of stopping the program on a timer to get a stack backtrace, gets one every time you call one of the system file routines. Or it crashes. Let's try that again. Did we kill the Dock? No? Good. One of the problems with Thread Viewer is that, because it's a performance tool, it tends to stop applications while it's looking at them so that it can snarf their memory and the like. And one of the problems with demoing it (I'm not sure why I was silly enough to do that) is that if Thread Viewer manages to crash while it has the program stopped, it has this nasty habit of leaving the Dock hung, which makes for really good demos, because suddenly you're trying desperately to get the system back. Luckily, that didn't happen.
Scott has this up and can run the program under Sampler for a while, then hit Update and get a list of all the places where file system calls occurred. Here we get the normal call tree, starting at the root, at main. It's a little more interesting to use the Invert Call Tree option, and here we can see that there were 380 places where we did a read, 372 where we did an lseek, 128 opens, and so on. I think the problem was opens and writes, so let's click on open. Of those 128 opens, we see a number in Core Foundation file-reading code and the like, but the one that's probably interesting is the fopen call. Is that it? Yes, and it happens to be in the thread view controller's get-sample-array method. What's happening here is that in my code for gathering the thread information, for some really silly reason, I put in a little loop that basically says: open this file, write the samples out to the disk, close the file. Because I'm doing this every second, it's relatively inefficient. It didn't really affect Thread Viewer, but if you had this in your code, you'd want to know that you were opening and closing the same file over and over. It would be more efficient to open the file once and keep writing to it every time I had new data, or I could just yank this code out, because it's actually pointless in this program.
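The open-once fix is easy to sketch in C with standard I/O. The `sample_log` type and functions below are hypothetical, not Thread Viewer's code: the file is opened once, each timer tick just appends a line, and the file is closed once at the end, replacing an open, write, and close on every tick.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical logger that keeps its FILE* open for the program's lifetime. */
typedef struct {
    FILE *file;
} sample_log;

int sample_log_open(sample_log *log, const char *path) {
    assert(log != NULL);
    log->file = fopen(path, "w");
    return log->file ? 0 : -1;
}

/* Called once per tick: one buffered write, no open or close system calls. */
int sample_log_write(sample_log *log, const char *line) {
    return fputs(line, log->file) == EOF ? -1 : 0;
}

void sample_log_close(sample_log *log) {
    if (log->file) {
        fclose(log->file);
        log->file = NULL;
    }
}
```

Because stdio buffers writes, call `fflush()` after each tick if the log must reach the disk right away, for example so it survives a crash; that still avoids the repeated open and close calls that fs_usage exposed.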
So you've seen two ways to use Sampler: stopping the program periodically to find out what code is executing, and another of its cool features, watching file system accesses.

The final type of problem I'm going to tell you about is drawing. All of our applications are graphics-based (well, most of the good ones), so drawing is very important: it's how we communicate with the user. The whole value of the Macintosh is presenting things to users graphically, so that they can understand them, do the creative work, and let the computer do all the boring stuff. So drawing is a key issue, and you want it to be as efficient as possible so your application runs well. The problem is that if you do too much drawing, you use CPU time; you use memory, because you've got buffers; and you use Mach messages, as we saw, to communicate with the window manager, which means you'll be blocking. Too much drawing means a lot of blocking and a lot of extra work, so we want to minimize it. For this, there's a really cool tool from the Core Graphics team called Quartz Debug, and Scott will first bring up Thread Viewer again, our sacrificial victim.
Quartz Debug has a number of features; I'll show you two. First, it has an option called Flash Screen Updates. With it turned on, any time any part of the screen has to be redrawn, Quartz Debug tells the window server to color that area yellow, which makes the amount of drawing explicit. You can actually see what goes on in the Dock; yeah, we actually do a lot of work there. So that's really cool: if you had a case where, say, you were redrawing the same thing twice during the same drawing cycle, this would be a way to tell. Another thing it points out is how I handled drawing in Thread Viewer: I just erase the entire portion I'm animating and redraw all of it. Maybe it would be more efficient to redraw just the parts that changed every time I got new samples.
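Redrawing only what changed usually means accumulating a dirty region and repainting just that. In Cocoa the natural hook is `-[NSView setNeedsDisplayInRect:]`; the C sketch below shows only the underlying bookkeeping, merging changed rectangles into one bounding rectangle. The `rect` and `view_state` types are hypothetical, and a single bounding box is a simplification: a real view might track several separate rectangles.

```c
#include <assert.h>

typedef struct {
    int x, y, w, h;
} rect;

static int is_empty(rect r) { return r.w <= 0 || r.h <= 0; }

/* Smallest rectangle covering both a and b; empty rectangles are ignored. */
rect rect_union(rect a, rect b) {
    if (is_empty(a)) return b;
    if (is_empty(b)) return a;
    int x0 = a.x < b.x ? a.x : b.x;
    int y0 = a.y < b.y ? a.y : b.y;
    int x1 = a.x + a.w > b.x + b.w ? a.x + a.w : b.x + b.w;
    int y1 = a.y + a.h > b.y + b.h ? a.y + a.h : b.y + b.h;
    return (rect){x0, y0, x1 - x0, y1 - y0};
}

/* Hypothetical per-view state: the area that needs repainting. */
typedef struct {
    rect dirty;
} view_state;

/* Record that r changed, e.g. the column where a new sample was drawn. */
void mark_dirty(view_state *v, rect r) { v->dirty = rect_union(v->dirty, r); }

/* Hand back the area to repaint and reset it; an empty rect means skip drawing. */
rect take_dirty(view_state *v) {
    rect r = v->dirty;
    v->dirty = (rect){0, 0, 0, 0};
    return r;
}
```

With Flash Screen Updates turned on, the effect of a change like this would be visible directly: only the marked area would flash yellow instead of the whole animated strip.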
There's a really cool feature here that I forgot to show you. If the program shows something interesting in Thread Viewer, you don't want it to scroll off the left, and there's no history because I forgot to add that. So what you can do is press the pause button, which stops the application I'm looking at (in this case the Dock) and also stops the sampling, because the program isn't running and there's no data to gather. Oh. However, someone wasn't very bright: when he implemented the pause button, he paused the thread data gathering but didn't bother to stop the display. The display is on a timer that triggers a redraw every second, so it keeps getting redrawn every single second even while paused. This is a bug that would be extremely hard to track down any other way. It would be very hard to notice that this happens when you hit the pause button, and even with a tool like Sampler, you might not realize that the reason you were calling drawRect: was that you'd forgotten that bit of logic. The nice thing about Quartz Debug is that it makes drawing problems easy to perceive; it makes them direct, so you can immediately see what the problems are.
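The fix for the pause bug amounts to making the redraw timer honor the same flag as the sampler. The sketch below is hypothetical (the `viewer` struct and function names are not from Thread Viewer's source); it simply shows a once-per-second tick that skips both sampling and drawing while paused.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool paused;
    unsigned samples;   /* how many times we gathered data */
    unsigned redraws;   /* how many times we repainted the display */
} viewer;

static void gather_samples(viewer *v) { v->samples++; }
static void redraw(viewer *v) { v->redraws++; }

/* Called by the once-per-second timer. The buggy version skipped only
   gather_samples() when paused, so the display kept repainting. */
void on_timer_tick(viewer *v) {
    if (v->paused)
        return;          /* nothing new to show, so don't redraw either */
    gather_samples(v);
    redraw(v);
}
```

Two ticks while running followed by ten while paused leave both counters at 2; under Quartz Debug, a paused window would then stop flashing.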
This alone is a wonderful feature. Another feature in Quartz Debug is called Show Window List, which tells you all the windows the window manager knows about. As we can see, about six windows belong to Thread Viewer, even though only one is on the screen. Hmm. So Thread Viewer actually has six windows open: some of them are off-screen, and only one is actually visible. The problem is that every single window we create, whether it's an off-screen window or, as in this case, a window we created via Interface Builder that just doesn't appear until we bring it up, needs space in the window manager. So those windows occupy memory, contribute to our memory footprint, and increase the chance that we'll be swapping. Once again, this is something you may not know: you might not realize how many windows you actually create, and Quartz Debug gives us a way to find out exactly how many we've created. Now I could go in, figure out which window each of those is, and, in the case of dialog boxes, make sure I create them only when I actually need them, rather than keeping them around all the time.

Okay, I didn't go through all the tools today; there are a couple you may want to examine on your own. The first one is called sc_usage. It's somewhat like fs_usage, only it looks at system calls such as gettimeofday or mach_msg, and it tells you how many calls you're making to each. You may find some behavior in your system that you didn't expect. Second, there are a number of heap-related tools I didn't mention. We saw a little of heap, which gives you a text listing describing all the buffers you've allocated, and that may be useful to you. There's also a command-line tool called leaks, which is like the leak detection in MallocDebug, but its algorithm is actually a bit better than MallocDebug's: it finds any buffers that aren't reachable from well-known starting points, so to speak, which means it can find whole groups of data structures that have leaked. So it's actually more useful.
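The difference between the two leak-detection strategies can be shown on a toy heap model. This is a hypothetical C sketch, not the code of either tool: blocks are array entries that record which other blocks they point to, and the roots stand in for globals, stacks, and registers. The first function follows the MallocDebug rule described earlier (a block is leaked only if nothing points to it); the second follows the reachability rule described for leaks (anything not reachable from a root is leaked), so it also catches leaked cycles and whole leaked subtrees.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_BLOCKS 16
#define MAX_PTRS   4

/* Toy heap model: each block lists the indices of blocks it points to. */
typedef struct {
    int ptrs[MAX_PTRS];
    size_t nptrs;
} block;

/* "No pointer anywhere" rule: a block is leaked only if neither a root nor
   any other block points to it. */
void naive_leaks(const block *heap, size_t n,
                 const int *roots, size_t nroots, bool *leaked) {
    for (size_t i = 0; i < n; i++)
        leaked[i] = true;
    for (size_t r = 0; r < nroots; r++)
        leaked[roots[r]] = false;
    for (size_t i = 0; i < n; i++)
        for (size_t p = 0; p < heap[i].nptrs; p++)
            leaked[heap[i].ptrs[p]] = false;
}

/* Depth-first mark of everything reachable from block i. */
static void mark(const block *heap, int i, bool *reached) {
    if (reached[i])
        return;              /* already visited; also stops cycles */
    reached[i] = true;
    for (size_t p = 0; p < heap[i].nptrs; p++)
        mark(heap, heap[i].ptrs[p], reached);
}

/* Reachability rule: whatever the roots cannot reach is leaked. */
void reachability_leaks(const block *heap, size_t n,
                        const int *roots, size_t nroots, bool *leaked) {
    assert(n <= MAX_BLOCKS);
    bool reached[MAX_BLOCKS] = {false};
    for (size_t r = 0; r < nroots; r++)
        mark(heap, roots[r], reached);
    for (size_t i = 0; i < n; i++)
        leaked[i] = !reached[i];
}
```

On a heap where a root reaches blocks 0 and 1, blocks 2 and 3 point only at each other, and block 4 is an unreferenced tree root pointing to block 5, the first rule reports only block 4, while the second reports blocks 2 through 5.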
There's also a tool called malloc_history, which I mentioned earlier, that helps you identify, for a given allocation at a given address, who is responsible for it: who actually allocated that block. So if you're in the debugger and you find something interesting, you can look it up.

There are also a couple of tools for understanding how your application is running. We saw Sampler, and sample is its command-line equivalent, and we had a quick introduction to vmmap, which is useful for understanding how virtual memory is laid out in your application. If you're coming over from the Mac OS 9 side, everything is completely different here, and it may be interesting to look at that and see how memory is laid out.

As I said before, I was really just trying to tease you with the cool tools. There are a number of hints I should stress, again or for the first time. The first is that all these tools have a very nice property: you don't need to recompile your code or instrument it. You just run the tools and they work. This makes them much more accessible: it's very easy to go in and look at your own app, or at other people's apps. If you're curious about how some of Apple's own applications access the disk, you can use some of these tools to find out. That's a big advantage; take advantage of it.

Second, if you're working in CodeWarrior and producing CFM binaries so that your code runs on both Mac OS 9 and Mac OS X, you need to do a little work so the performance tools can find symbolic information in your program. First, make sure you compile your code with the inline traceback table option turned on. This is part of the code generation settings, and it basically says to put the name of each function immediately after its code in the binary.
Second, CodeWarrior gives you the option of using its own version of malloc instead of the system's. The CodeWarrior version basically asks the system malloc for one huge buffer, then subdivides it and hands out pieces. If you do that, tools like MallocDebug, heap, leaks, and the like won't be able to help you with memory analysis, so make sure you turn on the option to use the system malloc. Unfortunately, I'm not quite sure exactly where that option is. Third, ObjectAlloc, although a nice tool, doesn't understand CFM apps. That's probably not a big deal, because it really only understands Core Foundation and Objective-C objects. However, if you want to run it anyway and at least see how many malloc blocks of size 20 you have, which it will tell you, you need to select the LaunchCFMApp program hidden in the System folder, and then name the application you're running as a command-line argument, just as if you were running the application from the command line.

Okay, so that's my presentation for today.
As I said, in two months you're hopefully all about to ship your apps at Macworld New York, and you want to make sure you give your customers the best impression, so start tuning your code. The primary thing you want to do is cut memory use in every way you can; that's going to be the best way to make your programs efficient. So take a look at how you're using the heap, take a look at how you're using memory, take a look at your private memory use. Also remember that just looking at the program in isolation isn't going to be very useful. Write down some metrics: measure how much memory you're using, measure how long common operations take, decide whether those numbers are appropriate, and then compare them across multiple builds so you can catch regressions. Also remember that your application is not just your binary; it's also some of the servers you connect to, so in the case of drawing, make sure to look at both the window manager and your application. Now go out there and please create some great apps. I'll be looking forward to seeing them at Macworld. Thank you very much, and thank you, Scott.
Oh, damn, I forgot to advance a slide. If you want more information about these tools: first of all, they're all available on the Developer Tools CD. You've all got a copy, so go off and play with them. If you want documentation, all the graphical applications have documentation built in, and all the command-line tools have man pages online, as standard UNIX tools should. There is also documentation in the release notes section. And as I mentioned before, there are books to help you. Inside Mac OS X: Performance is a really cool book about how to tune your application. It covers everything from how the system works to documentation of the tools, and all of us engineers tried to contribute to it. There's also the Mac OS X System Overview book, which is really good for understanding the overall ideas behind Mac OS X; we suggest that people look at it so that they understand some of the terminology. And with that, I'll turn it over to Godfrey.

Thank you very much, Robert and Scott. So, this is the last session of the day. The information resources are the same ones we've put up in all of our other tools sessions. On the roadmap, we wanted to point you to sessions 121 and 122, even though they've already occurred, so that when you watch the DVDs you'll receive after the show, you'll see some other areas where we talk about performance tuning. Tomorrow, the last sessions of the tools track are the debugging session for Mac OS X in Hall 2 at 9:00 a.m., and the feedback forum for Apple developer tools at 3:30 p.m. in J1. Please attend; we're very interested to hear your feedback. At the end of the day, if you have questions about the tools, you can contact me: I'm the technology manager for development tools, and my contact information is up above, along with the developer tools feedback address.