WWDC2000 Session 195

Transcript

Kind: captions
Language: en
hi my name is Robert Bowdidge
and today I'm going to talk about
Apple's performance tools for Mac
OS X and hopefully we're actually going
to have a demo so the first question or
the way I'd like to start this is why
are you all here why should you actually
care about performance tools especially
today well it turns out that with the
changes in Mac OS X this is a perfect
time to be very concerned about the
performance of your applications we're
all working on a new operating system
where the libraries that we're used to
are now working on different system
routines and may not behave the way that
we're used to them behaving as a result
we need to actually take a look at our
apps and decide whether the system calls
and the library calls that we used to be
doing actually have the same
performance that they used to whether
there's any changes in their semantics
and what operations they do whether
there's changes in how they behave in
addition some of the algorithms we may
have chosen in the past may no longer
work as well in Mac OS X and here's
three examples that actually come out of
some of my experience at Apple the first
one is the difference in how the
heap is done for example in Mac OS X we
no longer have fixed sized heaps instead
the heap will expand as far as it needs
to as long as you keep allocating memory
as a result the idea of allocating
memory and then setting the purgeable bit
doesn't make sense anymore because the
operating system is never going to
bother to purge this memory there was
one case in the finder actually where
they were loading in the background
image they would load the compressed
image into a buffer then they would
uncompress it into another buffer and
then they had a third copy as a
working copy that was marked as purgeable
and the idea was that if the memory was
ever needed that copy would get blown
away and it could be recreated easily on
Mac OS X at least two of these buffers
weren't really necessary the idea of the
purgeable case didn't really make sense
because it was never going away and the
copy of the version of the
file on disk didn't need to be copied
into memory because we have memory
mapped files and so with cases like that
you need to worry about exactly what
your app's doing with memory similarly
the case of polling is much more
expensive on a multitasking operating
system than when you're only expecting
one application to really be taking
control of the CPU at a time
if you're sitting around looking out on
the network looking on the filesystem
for a file to appear waiting for the mouse
to move those are cycles that are being
used by the CPU that can't be used for
other applications and so you don't want
to poll on Mac OS X because you're going
to drag down the performance of all the
other things that might be running in
the background and finally because we're
no longer operating in a single address
space the idea of inter-process
communication becomes a bit more
difficult
we can't just sort of pass a pointer and
provide another app a sneaky way to look
into our memory instead we need to
explicitly use one of the real IPC
mechanisms such as Mach messaging or
TCP/IP or we need to use shared memory
or we actually need to map memory into
both processes using the underlying Mach
virtual memory mechanisms in addition
many of the tools that we're used to
using may no longer work or may not make
sense anymore
a good example of this is Even Better
Bus Error this is a quick and dirty tool
that will basically make sure that your
app is not writing or reading from
address 0 by putting a bogus value there
on Mac OS X this is not necessary
anymore because the operating system by
default makes sure that for every task
the first page of memory ends up being
non readable non writable if your
application tries to read or write to it
boom it crashes you get immediate
feedback that you're doing something
bad isn't that nice
in other cases there's new tasks
that may be necessary there's other
cases such as understanding about
purgeable and non purgeable that no longer
matter and so you need to understand
different sets of tools and so hopefully
what you'll learn today are some ideas
about what tools are out there and
perhaps what tools are necessary and some
ideas about third-party things that can
be filled in so as an overview I'm going
to start out by talking about two
classes of tools the first set of tools
are a set of unix-like command line
tools that give you information about
the low-level state of the system the
second set of tools are some graphical
and exploratory tools that actually give
you a higher level understanding about
how your application is running some of
these may be familiar to you such as
MallocDebug or Sampler for each tool
I'm going to try to give you a little
bit of background about how it's used
what its purpose is and also hopefully
give you enough excitement to make you
want to go off and try these on your own
and explore them for each of them I'll
also try to give some of the details
about how you interpret its data and how
to use it to actually analyze your
system however this is going to be a
survey there's just not enough time to
really go into depth about what's going
on and so hopefully this will at least
encourage you to explore and ask questions
finally there's two other themes I'm
going to try to keep going through as I
talk the first one is that I want to
tell you a little about how you might
try to approach performance problems
these won't be at a very high level but
hopefully these will be some tricks the
second thing I'm going to try to do is
give you some little hints about
performance problems that I've seen such
as what I talked about on the last slide
once again I'm not going to be able to
go into detail on these if you're
looking for specific details about how
to make your calls to let's say
Core Foundation more efficient or
to Carbon talking to the people who are
responsible for those libraries or going to
those sessions such as the Core Graphics
session yesterday or some of the Carbon
sessions or the Core Foundation sessions
will give you more ideas about some of
the obvious things you should be
doing to make your app more efficient so
Scott how are we doing ah okay so let's
start off with command-line performance
tools how many of you have
actually used Unix good
most of you actually will have a leg up
how many of you think that command-line
tools are the work of the devil okay
thank you Scott well actually there's
some very good reasons to have these
first of all the tools that we have here
are basically meant to be quick and
dirty tools to give you information about
the state of your machine and there's
three really good reasons why you want
to use them the first one is that
they're minimally invasive that is when
you actually use these to analyze your
system you're going to get more of an
idea about how your system or how your
application is behaving on your computer
as opposed to how the tool is actually
affecting how your app runs on the
computer the second thing is that
because they're all
command-line tools that means you
can actually run them remotely if you
don't want to upset the screen if the
machine is hung you can log in via telnet
and you can run these commands and find
out what's going on and finally because
all of the command-line tools are
basically just text-based applications
you can use any of the UNIX filter
commands to convert the data into a
format you like if you want to see let's
say every 10 seconds how much memory
your application is using you can easily
write a little script that goes around
and every 10 seconds polls one of the
tools to actually find out how much
memory is being used and so in this way
you can sort of roll your own without
having to do anything too deep the first
tool that I list here is actually ps
which is a standard UNIX tool that
stands for process status it gives you
information about what processes are
running on the machine it tells you
about how much memory is used and so on
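As a rough sketch of the "little script" idea mentioned a moment ago, something like the following could sample a process's resident memory at a fixed interval by running ps. It assumes a ps that accepts the POSIX-style `-o rss= -p PID` options; the function names, the 10 second default, and the choice of resident size as the measurement are illustrative assumptions, not something shown in the session.

```python
import subprocess
import time


def rss_via_ps(pid):
    """Return the resident set size ps reports for pid.

    Assumes a POSIX-style ps that accepts `-o rss= -p PID`
    (the value is in kilobytes on most systems).
    """
    out = subprocess.run(
        ["ps", "-o", "rss=", "-p", str(pid)],
        capture_output=True, text=True, check=True,
    ).stdout
    return int(out.strip())


def sample_memory(read_rss, samples, interval=10.0, sleep=time.sleep):
    """Call read_rss() `samples` times, pausing `interval` seconds between calls."""
    readings = []
    for i in range(samples):
        readings.append(read_rss())
        if i + 1 < samples:
            sleep(interval)  # no pause is needed after the final reading
    return readings
```

A call such as `sample_memory(lambda: rss_via_ps(1234), samples=6)` would take six readings ten seconds apart for a hypothetical process ID 1234. Passing the reader in as a function keeps the loop independent of which tool is being polled.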
I'm not actually going to talk about
that because there's
some other things that might be more
useful ok so let's take a look first at
top top is something that you can use
instead of ps to find out about the
state of your system there are actually
implementations of top on other
unix-like operating systems this one was
specifically written by us and what it
does as you can see is it gives you a
list of the processes ranked in
basically newest to oldest order at the
top it gives you information about the
status of the system it starts out
saying what the load average is what the
average number of runnable tasks happens
to be it tells you about how many
processes there are how much memory the
line starting with memory shows you how
much memory is wired that is dedicated
to uses of the kernel only the second
line shows you how much memory is active
inactive blah blah blah below that you
can see how much virtual memory there is
there's currently 688 megabytes of
memory allocated to virtual memory not
all of that may actually have memory in it
but that's how much the virtual memory
system thinks it has in addition it
shows how many pages have been put out
to disk and brought back in with the
page ins and page outs and the number in
parentheses there is important because
that's actually a delta that shows you
how many pages have changed in the last
second
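To make the "delta in parentheses" idea concrete, here is a minimal sketch of how a cumulative counter such as pageins could be paired with its change since the previous sample. The sample values and the `total(delta)` rendering are assumptions for illustration and only approximate what top prints.

```python
def with_deltas(samples):
    """Pair each cumulative counter reading with its change since the previous one."""
    result = []
    previous = None
    for value in samples:
        # The first reading has nothing to compare against, so its delta is 0.
        delta = 0 if previous is None else value - previous
        result.append((value, delta))
        previous = value
    return result


def format_counter(value, delta):
    """Render a reading in a 'total(delta)' style."""
    return f"{value}({delta})"
```

A delta that stays at zero while the application runs suggests it is not paging, whatever the cumulative total says.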
why don't we run QuickTime Player so we
actually get something interesting here
and what we'll do is we'll simply run
QuickTime Player and let's let's think
of a hypothetical problem let's assume
that we're working on the player and
we're finding that the framerate doesn't
seem high enough and we're not sure
whether we're correctly throttling it
down for some reason or if we're not
getting enough CPU this is actually not
a problem as far as I know but it's a
good story
so what we can see here is on the second
line you see LaunchCFMApp here's a bit
of trivia the QuickTime Player is
actually a PEF executable it's in the
same format that you would have seen on
Mac OS 9 and as a result whenever you
try to execute one of those on Mac OS X
LaunchCFMApp serves as a wrapper
to actually load that into memory and so
that's why you don't actually see
QuickTime Player in the list of
processes and what we can see is that
the QuickTime Player is using about 25
to 30 percent of the CPU we get the
elapsed time the number of threads the
number of Mach ports which is an
abstraction for communicating between
the kernel and the system and the
application other interesting things
include RPRVT which is the
amount of private memory memory that is
only for this particular running version
of the application that's different from
all the memory that's needed that can be
shared between multiple copies of
QuickTime Player if we had multiple ones
running or other applications using the
same libraries so RPRVT is a
good measure of how much memory your
application is using right now and
RSHRD shows how much memory is
being used for the application itself
which can be shared and all the
libraries which can be shared in memory
mapped files and all those things that
aren't only dedicated to one application
now we can look at this and we can say
gee we're only using about a third of
the CPU what's going on here are we
spending too much time on disk are we
throttling well one thing we can do is
we can look down the list and we can
understand whether our application is
doing anything bizarre that depends on the
rest of the system and in this case
it certainly is we see at the bottom
actually a lot of
people probably well can everyone see
the line starting with 50 Window Manager
never mind I'll just read it out down
towards the bottom there's a line that
says 50 Window Manager it's using about
20% of the CPU it's run for about 51
seconds and so on what's happening here
is that the window manager is actually
responsible for doing the drawing to the
hardware and so all the applications end
up talking to the window manager and so
it's not too surprising to see execution
divided between the two because the
QuickTime Player is spending some
of its time getting all the images ready
it's shipping them off to the window
manager and then the window manager
blasts them up on the screen so we're
seeing that we're spending about 50 to 60 percent
of the CPU actually doing meaningful
computation and filling up the CPU
what's happening with the rest of the
time well there's some other tools that
we could use let's go on the hypothesis
that maybe there's something going on
with the disk there's another tool and
before I go on one of the nice things
about top is that it has a huge number
of extra modes and features hidden
please check the man page there's
probably some view that's perfect for a
performance problem you're trying to
track down but I'm not going to show
them all if we're trying to go to the
file system though and understand how
we're using that there's another command
that might be useful and that's called
fs_usage with fs_usage we either
name an application or we name a process
and we hit return
actually just do QuickTime Player or
actually do everything okay well what
we're going to do is we're going to get
a huge amount of information and if
Scott hits the spacebar we'll start
seeing it and what you're seeing here
are all the accesses to the disk that
are going on so you're actually seeing
the file system system calls being
performed and we can see
what was being done like read or write
or page ins page outs doing the status
of a disk that sort of thing we find out
how much time elapsed and whether it
actually had to give up the CPU to
another task to let that transaction
finish and the application responsible
if Scott actually widens that window
we'll get some more information
we'll know exactly which file handle was
accessing that and how many yeah there
we go
it'll actually say what file handle it
was in the process and how many bytes
and what we can see here is that the
QuickTime Player is getting chunks of 16
thousand bytes or 32 thousand bytes and
we don't see any cases where it's having
to wait too long so that probably means
we're not having to do anything too
weird with the disk and we're not
waiting for stuff to come off the disk
maybe another thing that we could check
out is we could ask about how the memory
is laid out are we using a lot of malloc
space and the like or just if we were
curious about how applications are laid
out in memory in Mac OS X we might want
to have some sort of a tool for
visualizing that and there's another
command-line tool called vmmap and what
we can do is name vmmap we can specify
the process ID or the name of the task
and vmmap will give us a listing of all
the regions of memory where they start
how much space they take it will actually
start off telling us only the read-only
regions the non writable ones and then
it will tell us the writable ones at the
end and what we can see here of interest
is on the first line we see a symbolic
name page 0 we see a starting address
which is 0 it's 4 kilobytes then we see
the permissions which is in the UNIX
style octal and everybody knows how to
read octal of course it's saying that
page 0 is actually 0 slash 0 which means
that it's non readable non writable
that's the thing that's saving us from
doing page 0 accesses if you try to
dereference you know a pointer which
actually has the value 12 you'll know
about it you'll be able to catch those
immediately you're not going to have to
worry about strange memory corruptions
and the like below that we can see the
application starting at address 1000
there's a couple places that are marked
guard which are guard pages which
are again non readable non writable
pages at the end of stacks for the
various threads so that if you go over
the end of the stack it's
not going to crash the system or it's not
going to trash memory it's simply going
to crash when it hits that
and you can see all the libraries
starting at address 4130 and going
down and you can see the names of the
files that are being loaded as libraries
along the right hand side if this is too
small
don't worry try it at home hopefully it
will make perfect sense if we go down a
little further we'll actually see the
writable regions and here we can start
seeing things like the malloc allocated
regions and so we can find out which
pages malloc was placed at and
most of the malloc buffers were actually
placed right below the application
another tool that might be useful is we
might be asking ourselves well is the
application running slow because we're
doing some obnoxious system call that's
just hanging forever and there's another
tool called sc_usage and what sc_usage
will do is it's going to look for all
the Mach system calls going down into
the kernel it will tell us how fast or
which ones we were calling often and how
much time we were spending what you can
see here is some information about how
often the app got preempted how often
the CPU gave execution time to
somebody else we can find the number of
context switches blah blah blah all that
interesting stuff
the second section shows us how much
time we spent idle and how much time we
spent busy and what we're seeing there
is that you know we're spending a lot of
our time in user mode running the
application and a fair amount of time
waiting in the app probably because
we're doing a lot of disk accesses below
that we find the most popular system
calls being done and we find we're
actually spending a lot of time on
semaphore wait and mach_msg_overwrite_trap
okay you might say gee that's weird
maybe we're locked on a semaphore that's
a very good guess unfortunately it's not
completely true because on a lot of
applications on Mac OS X there will be
usually one or two threads that are
basically waiting for something really
bad to happen they send a message off to
the system saying let me know when
something bad happens and they just sit
there on mach_msg_overwrite_trap which is
send off a message overwrite the buffer
when it comes back wait until we get a
message back
and nothing ever comes back and so
they're constantly waiting so
understanding that those are having huge
wait times doesn't necessarily buy us
anything however in some cases
understanding we're spending lots of
time doing semaphore signals may tell us
something about how our app's running
that we're spending too much time
actually waiting on critical sections or
something let's see what else do we want
to show I guess that's about it okay so
those are the command-line tools
everyone who has covered their heads
because they were afraid of them can now
come back up because we're actually
going to look at things that look nice
and that don't use any nasty
technologies so the next thing I'm going
to show you are some graphical tools that
tend to give you a little higher level
information they don't give you quite
the immediacy but hopefully will help
you understand what's going on the first
of these is called MallocDebug and the
point of MallocDebug is to help you
understand how your application is using
heap memory so what it does is for every
allocation that your app is doing it
will keep track of how much memory was
created where that memory was created
and will give you a way of seeing what's
currently allocated in the system it's
really good for answering questions like
how much heap memory is my application
using am I using 500k am i using 10
megabytes are there any places where I'm
using large chunks of memory am i
allocating 3 megabyte chunks for some
array that I don't realize are there
places where I'm overrunning or
underrunning buffers or trying to
trash somebody else's memory which is a
great way to make subtle memory bugs are
there cases where I might be leaking
memory where I'm allocating things but
forgetting to free them in all cases
what MallocDebug is going to try to do is
give you information about how you're
creating memory using malloc as the core
idea and unlike some other tools you
might be using what it does is it tries
to give you a snapshot of how much
memory you're using right now as opposed
to showing you memory that you'd
allocated before that's been freed for
example so it's only a snapshot the way
MallocDebug does this is kind of cool
what it does is it has its own version
of malloc that's been instrumented
and it slides that version of malloc
under your application when you launch
it and as a result it makes it very easy
to use you don't have to worry about
recompiling your code to make
sure that this new library is used you
don't have to change any source you
don't have to do anything it just works
and that's one of the advantages of
these tools in addition because we have
our own version of malloc what we can do
is when malloc is called we can
actually keep track of the call stack
and find out how you actually got there
and because every other allocator in
the system whether that's in Core
Foundation whether that's in Carbon
whether that's in Objective-C all of
those eventually go through malloc so
this is a single point to actually find
out how you're allocating memory so
let's do a demo here so here's the
MallocDebug window so Scott can either
select the application by pressing the
Browse button and going through a
browser or can choose it off a drop-down
list he can then press launch to
actually start it up and why don't we
update it to see what the current status
is and we find that in this case we're
launching SimpleText we find that we're
actually allocating about 700 K to get
to the point where we've actually
started the application and what we see
in the window below is basically a call
tree it shows us all the ways that we
got down or that we ended up
calling malloc so for
example from start we called _start
and eventually we got down to main
after going through some system stuff
main called malloc through about 4
functions either calling InitCursor or
DoInitialize or DoEventLoop or some
strange hexadecimal value there actually
let's go through that hexadecimal thing
so 4110 is actually another little
secret as you might know 411 is
information and this actually is the
place where you go to load
dynamic libraries that's where you get
information about how to call other
functions cute huh
so what happens is that when your
application launches it tries to load
all these other libraries
and as a result it has to call the
initialization routines for each of
these libraries and down inside that
call to main that implicit call that
you didn't actually have to make in your
code the initialize high-level toolbox
initialize QuickDraw initialize
CarbonCore all happened automatically
and we noticed that they actually
allocated about 400 K so a good deal of
the memory that was allocated during
launch was actually in these
initialization routines now going from
the top down is sometimes interesting
especially when you know your code
but sometimes it's interesting to see
why or how you got down to malloc and
what was happening down at the other end
we can not only show the tree from this
side but we can also invert it and we
can change the style of the tree and so
now what we're doing is rather than
looking at how we got from from main and
called down through the program down to
malloc we're going to look at malloc and
we're going to look at the ways that
malloc was called so
for example if we select malloc these
are the ways that malloc was called
allocate memory called malloc add usage
called malloc global cache allocate
called malloc and for each of these we
can get some idea about how malloc was
called and what the reasons are let's go
through one little example here actually
let's do the valloc one so and it's too
bad we didn't actually have the better
example but this will do
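The top-down and inverted views described above can both be thought of as merging recorded call stacks into a tree, just starting from opposite ends. The following is a minimal sketch of that idea, not MallocDebug's actual implementation; the stacks, byte counts, and the dictionary layout are hypothetical, with function names loosely borrowed from the demo.

```python
def build_tree(stacks, inverted=False):
    """Merge call stacks into a nested dict keyed by function name.

    Each stack is (frames, size): frames run from the outermost caller
    down to malloc, and size is the number of bytes that call allocated.
    With inverted=True the frames are walked from malloc back toward main.
    """
    root = {"bytes": 0, "children": {}}
    for frames, size in stacks:
        path = list(reversed(frames)) if inverted else list(frames)
        node = root
        node["bytes"] += size
        for name in path:
            # Stacks that share a prefix share the same nodes.
            node = node["children"].setdefault(name, {"bytes": 0, "children": {}})
            node["bytes"] += size
    return root


def show(node, depth=0):
    """Return the tree as indented 'name bytes' lines."""
    lines = []
    for name, child in node["children"].items():
        lines.append("  " * depth + f"{name} {child['bytes']}")
        lines.extend(show(child, depth + 1))
    return lines


# Hypothetical stacks and sizes, for illustration only.
stacks = [
    (["main", "InitCursor", "malloc"], 100),
    (["main", "DoInitialize", "malloc"], 300),
    (["main", "DoInitialize", "NewHandle", "malloc"], 64),
]
```

Calling `show(build_tree(stacks))` lists everything under main, while `show(build_tree(stacks, inverted=True))` starts at malloc and fans out to its callers.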
we can select valloc and we see that we
actually have a 65,000 byte chunk that
was allocated through one of the calls
down that way and valloc was called by
allocate memory which was called by
allocate zeroed memory which was called
by new handle I would prefer to have a
better example than this but this one
will do here's another little bit of
trivia what's happening here is that in
Mac OS X when you create a new handle
and create the memory attached to that
handles are actually sub allocated
there's actually a big block of space
that's been subdivided into handle sized
spaces and somebody's got to create that
memory so what happens is the first time
that you call new handle it actually
goes and creates the sub allocated field
so that 64k chunk is the place for
handles to live
may not make sense it's a system-level
idea but the idea is that we can
actually track down from this call graph
what the point of that memory was
especially if you're looking at your own
code not looking at the innards of the
memory manager what we can also do is we
can actually go to something a little
simpler like malloc and why don't you
select one of the buffers down below no
actually um yeah select a buffer so you
also get a list of all the allocations
so not only do you find how you got
there but you find a list of the buffers
that were allocated by calling down that
way it'll tell you the address that it
was allocated at the size and so on and
if we double-click on it we get a memory
dump so we can actually look at memory
this is really useful if you
find you're allocating six thousand
bytes somewhere and you're curious why
now you can double click on it and take
a look and try to understand why that
memory was allocated now one other thing
that you can do actually press the back
button is as I said actually quit it um
can we run the leak example so one
of the other things that you can do with
MallocDebug as I mentioned is that you
can actually do a bit of analysis to
find cases where you overran or
underran buffers and these are really nasty
bugs because they tend to be
intermittent they tend to be really
subtle they tend to only occur after the
program's been running for a while and
then suddenly it crashes and so you'd
like to track these down
what MallocDebug does you can do update
and then let's do an inverted or
actually go to trashed is you can change
the mode from showing all the currently
allocated regions to only showing what are
called the trashed ones and if Scott
actually selects start we see that
there's two buffers that are trashed
that is where we know we
overran or underran it and the way we know
that is we actually have some guard
words on either side and when those get
overwritten we know that we did
something bad
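The guard-word technique can be sketched in a few lines. This is only a model of the idea, assuming a 4-byte big-endian marker on each side of the user's bytes; the marker values follow the ones named in the demo, while the function names and layout are assumptions made for illustration.

```python
import struct

# Marker values follow the ones named in the demo; the 4-byte big-endian
# layout is an assumption made for this sketch.
FRONT_GUARD = struct.pack(">I", 0xDEADBEEF)
BACK_GUARD = struct.pack(">I", 0xBEEFDEAD)


def guarded_alloc(size):
    """Return a zero-filled buffer of `size` bytes with a guard word on each side."""
    return bytearray(FRONT_GUARD + bytes(size) + BACK_GUARD)


def check_guards(block):
    """Report which guard words, if any, have been overwritten."""
    problems = []
    if bytes(block[:4]) != FRONT_GUARD:
        problems.append("underrun")
    if bytes(block[-4:]) != BACK_GUARD:
        problems.append("overrun")
    return problems


block = guarded_alloc(10)
block[4:14] = b"0123456789"   # stays inside the 10 user bytes
intact = check_guards(block)
block[14] = 0x41              # one byte past the end lands on the back guard
trashed = check_guards(block)
```

A check like this only notices damage when it next looks at the guard words, which is why a trashed buffer may be reported some time after the bad write happened.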
if Scott actually double clicks on one
of those you can see the ten bytes the
ten zeros what MallocDebug does
is when it allocates space it puts two
special
strings at either side it puts the hex
value beefdead at the back end of the
buffer and then if Scott presses back
you'll see that it puts deadbeef at the
beginning and the last word down at the
bottom and so when those words change
MallocDebug knows you've done something
bad it actually also in an extremely
user friendly fashion ends up putting a
message out to the console yes we need
to fix this but it will actually give
you some indications of when it actually
notices that something gets trashed
so keep the console window open if you
can when you run MallocDebug so you can
see this in addition let's try leak
analysis next there's also the idea of
leak analysis okay now for those of
you who've used ZoneRanger ZoneRanger
has an idea about leaks and what it
considers a leak to be is any memory
that you allocate but then don't
deallocate in some operation that should
have actually cleaned up after itself so
for example opening a document and closing
the document
if you have more memory
allocated than you started out with
you've probably got a leak MallocDebug
goes off the definition that's more like
Purify's that any memory that
cannot be reached by a pointer
probably can't be referenced and
therefore it is leaked so we go with
that definition and the way you can do
leak detection is that you can start it
up and you can change the selection mode
to leaks and what it does is MallocDebug
will now scan through memory
looking for anything that looks like a
pointer and if it's a pointer it goes
and it sees whether that's a pointer to
the beginning of a malloc region or to a
handle which points to a memory region
or a couple other options and if it does
find a pointer like that it marks the
block as reachable if it doesn't find it
then it says it's not reachable and it's
probably a leak and after a little while
it comes back and it shows us only the
allocations that would have been leaked
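The scan just described can be modeled as a conservative reachability pass. The sketch below assumes 32-bit big-endian words, treats a value as a pointer only when it equals the start address of an allocated block, and follows pointers found inside reachable blocks; that last step and the data layout are assumptions, since the talk does not spell out how far the real scan goes, and the handle case is left out.

```python
import struct

WORD = 4  # assume 32-bit, big-endian pointers for this sketch


def words(data):
    """Yield every aligned 4-byte value in `data` as an unsigned integer."""
    for offset in range(0, len(data) - WORD + 1, WORD):
        yield struct.unpack_from(">I", data, offset)[0]


def find_leaks(blocks, roots):
    """Return start addresses of blocks that nothing appears to point at.

    blocks maps a block's start address to its contents (bytes);
    roots is a list of byte strings standing in for globals and stacks.
    A value counts as a pointer only if it equals a block's start address.
    """
    reachable = set()
    pending = list(roots)
    while pending:
        for value in words(pending.pop()):
            if value in blocks and value not in reachable:
                reachable.add(value)
                pending.append(blocks[value])  # follow pointers stored inside it
    return sorted(set(blocks) - reachable)


# Hypothetical heap: 0x1000 is referenced from a global and points at 0x2000;
# nothing refers to 0x3000.
blocks = {
    0x1000: struct.pack(">II", 0x2000, 0),
    0x2000: bytes(8),
    0x3000: bytes(16),
}
roots = [struct.pack(">II", 7, 0x1000)]
leaks = find_leaks(blocks, roots)
```

Because the scan only recognizes pointers to the start of a block, code that keeps a pointer into the middle of a buffer or encodes pointers in some other way would show up as a false positive under this model.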
and what we can see here if Scott goes
down a little further or actually go
from the inverted side is we find about
182 thousand bytes this is not
completely true unfortunately because
there's a few cases of false positives
in system routines and let me just step
through a couple of them the calls to
malloc from the global cache allocate
which are from ATS alloc are cases in the
font code and this is a case where
they're doing some interesting things
with pointers that this doesn't detect
and we can basically ignore those
yes this is ugly there's an internal
version I hope to roll out at Apple real
soon now like in the next week to get
around this problem it didn't make it on
the CD hopefully we can put it on the
website but for now hopefully these will
give you some hints on how you can
actually look at this stuff and then
laugh at me every time so we can select
global cache allocate and we can say okay
this is immaterial let's
select the prune menu the path item and
we can pull that out so we don't have to
look at it so we're only looking at the
things we think to be leaks
similarly the allocation with new block
turns out to be a case in icon services
prune that out the case in valloc is the
handles we can prune that out it will be
better in the future yes and eventually
get down to the point where the only
things left are things that are probably
leaks there's some documentation on this
in the release notes like I said we'll
have a better version that will
do this a little better but this is a
way to start looking for memory that
might not be reachable and if you find
memory that you allocated in your
application that's leaked this is
probably a good indication that you
might have a problem ok let's move on
hmm well that's probably a good idea how
are we doing on time ok so one of the
things that's really nice about
ZoneRanger's idea is this idea of you
allocate memory or you create an object
you destroy it and hopefully the amount
of memory doesn't change that idea is
really nice for understanding the effect
of certain operations and we can do
something similar to that what you can
do is you can say well let's say we're
having a slowdown when we start typing
in MallocDebug or um in SimpleText
and we're curious why that's happening
so we can try to see if we're doing a
lot of allocations or a lot of work what
we can do is we can select go back to
all or actually that's fine
what we can do is mark a point in time
by pressing mark we can then go and type
into the buffer to do the event that
we're trying to watch and then what we
can do is go over to MallocDebug and we
can change to show only the new
nodes show only the newly allocated
memory and we'll find after a moment did
we actually press mark oh there we go we
find that we allocated 400,000 bytes
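The mark-then-compare workflow can be modeled as remembering which blocks were live at the mark and later reporting only the live blocks that are not in that set. The class below is a toy record, not how MallocDebug stores its data; the addresses and sizes are made up.

```python
class AllocationLog:
    """Toy record of live allocations with a 'mark' to isolate newer ones."""

    def __init__(self):
        self.live = {}        # address -> size in bytes
        self.marked = set()   # addresses that were live when mark() was called

    def malloc(self, address, size):
        self.live[address] = size

    def free(self, address):
        self.live.pop(address, None)

    def mark(self):
        self.marked = set(self.live)

    def new_since_mark(self):
        """Return {address: size} for live blocks allocated after the mark."""
        return {a: s for a, s in self.live.items() if a not in self.marked}


log = AllocationLog()
log.malloc(0x1000, 700)       # allocated during launch
log.mark()
log.malloc(0x2000, 400_000)   # allocated while typing
log.malloc(0x3000, 32)
log.free(0x3000)              # freed again, so it is not in the snapshot
new_bytes = sum(log.new_since_mark().values())
```

Since only live blocks are compared, memory that was allocated and freed between the mark and the update does not appear, which matches the snapshot behavior described earlier. A block freed and reallocated at the same address after the mark would be missed by this simple model.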
oh my gosh what are we doing actually
there's a good reason for this but it's
a great example if Scott actually
actually we should go to standard for
this one because it's an easier way to
see this is why it's exploratory you
have to sort of dash around and explore
and it makes it an interesting thing to
demo what we end up finding is that most
of that memory if we descend down the
biggest path is inside called voices
thread and what's actually happening
here is that simple text has been voice
enabled so that it actually can do
text-to-speech and so what happens is
that to speed up the load time which is
something good that you should care
about for performance of course that was
one of those performance minutes in
order to speed that up what you want to
do is you want to make sure that you do
as little as possible when you're
launching the app and maybe do the rest
later and this is the case where they're
doing that but they don't bother to
actually load the voices until a few
seconds after the windows actually
appeared and one of the things that they
have to do is load in the voices and do
all the data structures to make sure
that text-to-speech actually works so
that's okay that it's delaying things
and that's a cute trick to actually
improve the performance of your apps
thank you very much okay let's go
through a few little moments actually
one thing I'll point out is is the idea
of the call trees may be a little weird
as I said the idea is that every time
you do a malloc allocation we get sort
of this call stack of all the ways you
got down to malloc that can be thought
of as those vertical lines on top or
horizontal lines on top when you look at
the normal tree what it does is it
collapses together all the things at the
the main end of the tree to overlap the
similarity so that you can see how it
starts to
diverge and notice this is a tree so we
don't pull it back together again at the
other end similarly when you do the
inverted tree we do the opposite we
start collapsing things together at the
malloc end to find the ways that we
called malloc that were similar so we
can start seeing where things diverge
from that end also as
another one of the little issues we
should probably cover just to explain
malloc debug there's also the question
of leak detection as I said the hope the
way that the leak detection works is to
go scanning through memory looking for
pointers to memory or to buffers that
are allocated by malloc there are cases
where the leaks won't be noticed this is
just part of the problem with with doing
leak detection in some cases there may
be a value in memory that looks like a
pointer you may have you know five F
zero zero zero zero zero zero because
you've got a null terminated string in
those cases a random
value and a pointer to something that's
actually a malloc buffer might not be
distinguishable you can't tell why that
stuff was put into memory and
as a result you might get cases where
things that are actually leaks may not
be reported as leaks similarly there's some cases
where there may be leaks that don't get
detected this garbage detection
algorithm this garbage detection
algorithm is relatively simple anyone
who's played with them should
immediately see some holes one of those
is that if you have a list of circularly
linked structures so you've got a big
loop of things every object points to
something else and therefore all of them
are referenced and so they'll never be
detected as a leak similarly a tree of
data structures will always appear will
only have the root of it unreferenced
and therefore you may only see let's say
a 20-byte leak when you're actually
leaking a huge data structure so always
pay attention to even small leaks just
in case now I mentioned that there were
a number of problems with various system
routines that we're doing clever things
with pointers in general what was
happening is that our definition of leak
is that there's a pointer to the
beginning of a buffer and if there's a
pointer to the beginning of the buffer
it's reachable however in some cases in
your own code in others people will have
pointers into the middle of a buffer for
various reasons and no pointers to the
beginning usually because they're trying
to hide secret information at the
beginning in those cases malloc
debug is not going to be able to do leak
detection correctly the next version may
help another issue to keep in mind is is
my favorite question or comment people
constantly come to me and say this tool
is horrible you know I use it and all my
application ever does is crash this is
actually the same problem that people
had with even better bus error you know
gee every time I use this my machine
crashes why don't you write better
software I love hearing that story from
the guy who wrote that but what's
happening is that malloc debug is
trying to tell you something it's trying
to tell you something extremely loudly
you're doing bad things with pointers
okay there are a number of cases of
operations that can cause subtle and
intermittent memory bugs examples of
those include over running or under
running buffer so you trash somebody
else's buffer or freeing memory and then
continuing to use it and modify the
values even though somebody else has now
got that memory and you're trashing their
values malloc debug tries to solve both
those problems the first thing it does
is that every time you free memory it
overwrites that memory with 7f to make
sure that there's absolute garbage in
there and that hopefully if your app
tries to read that you'll notice the
second thing is that you saw that
overruns are guarded with deadbeef
and underruns with beefdead and so if
you end up trying to access beyond
you're going to get a bogus value also
as a result you may see your program
behaving strangely you may see odd
values in variables that shouldn't be
there or you may find your application
crashing when trying to access address
7f7f7f7f when you get crashes on
your app in malloc debug that don't
happen normally the first thing to do is
that there is a preferences panel that
has the clear freed memory option turn
that off and try it again if your app
runs then you're doing bad things with
freed memory what you can then do is run
the program inside gdb using malloc
debug's special library there's
documentation on this in the release
notes and the debugger will will drop
you off exactly where you should pay
attention and the final bit of
information about malloc debug is
questions about taking the taking its
advice
once again malloc debug is primarily a
tool for exploring your data it's a
really good tool for the writer to
actually look because the writer
understands their own code and and may
be able to say gee that's odd
there are still uses for this in testing
if you have cases where you're leaking
memory if you've got block under runs or
overruns or you're referencing freed
memory that's a red flag
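The leak scan described earlier, together with its two blind spots, can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not malloc debug's actual code: the heap is a hypothetical dictionary from block address to the pointer-like values stored in that block, and `roots` stands in for values found in stacks and globals. A block is reported only when no value anywhere equals its start address.

```python
def find_unreferenced(blocks, roots):
    """Return start addresses of blocks that nothing in memory points to.

    blocks: dict mapping a block's start address to the list of
            pointer-like values stored inside that block (hypothetical model).
    roots:  pointer-like values found outside the heap (stacks, globals).
    """
    seen = set(roots)
    # Every block is scanned, including blocks that are themselves leaked,
    # which is why cycles and the interior of a leaked tree stay hidden.
    for contents in blocks.values():
        seen.update(contents)
    return sorted(addr for addr in blocks if addr not in seen)


blocks = {
    0x1000: [0x2000],          # live: referenced from a root
    0x2000: [],                # live: referenced from 0x1000
    0x3000: [0x4000],          # leaked cycle, part 1
    0x4000: [0x3000],          # leaked cycle, part 2
    0x5000: [0x6000, 0x7000],  # root of a leaked tree
    0x6000: [],                # leaked child
    0x7000: [],                # leaked child
}
roots = [0x1000]
leaks = find_unreferenced(blocks, roots)
```

In this model the cycle at 0x3000 and 0x4000 is never reported because each block references the other, and of the leaked tree only its root 0x5000 shows up, even though three blocks are lost.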
there's something to be fixed in terms
of exploring I can't give you very good
details about how to explore basically
go off and see what's out there see if
you've got any really big allocations
see if you're allocating a lot of really
small things that you didn't expect look
for odd cases look for patterns the best
advice I can give you that's really
concrete is I tend to find it much more
useful to use the inverted graph rather
than the standard one but that may be
because I tend to look at the system
libraries a lot more so hopefully you
find this useful the second tool that
I'd like to show is a tool called
sampler and you can think of this as a
really cheap profiler what sampler does
is every 20 milliseconds or every 50
milliseconds it stops the program and it
says hey where are you running and it
actually gets the call stack for all of
the threads that are currently running
so it knows the current point that's
executing like malloc debug it
provides basically the call stack so
that you can browse through those and
try to find out exactly how things are
running now the reason why Mel or why
sampler is good is that it's extremely
easy to perform you use it it works you
don't need to recompile your libraries
or recompile your application like you
would with profiling you don't need to
have special profiled versions of
libraries you don't need to make any
changes of the code it just works
you can run this on any of the
applications on the system and in fact
all these tools are on the CD so please
go out and play with them and in
addition because it's only stopping the
program every 20 milliseconds or 50
milliseconds hopefully it will be doing
very little to the application's running
behavior as opposed to let's say doing
full profiling and so this may be a way
to get really cheap data to find
performance problems that should be
explored in more depth
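What a fixed-interval sampler can and cannot see is easy to show with a small simulation. The sketch below is an illustration under assumptions, not how sampler itself is implemented: the workload is a hypothetical timeline of (function, duration) segments measured in microseconds, and `sample_timeline` simply records which segment is running at each sampling instant.

```python
import bisect
from collections import Counter
from itertools import accumulate


def sample_timeline(timeline, interval_us):
    """Count which function is running at each sampling instant.

    timeline: list of (function_name, duration_us) in execution order.
    """
    ends = list(accumulate(d for _, d in timeline))  # end time of each segment
    counts = Counter()
    for t in range(0, ends[-1], interval_us):
        index = bisect.bisect_right(ends, t)  # first segment ending after t
        counts[timeline[index][0]] += 1
    return counts


# Hypothetical workload: one 1-second call, then 1000 iterations that each
# wait 990 us and then spend 10 us in a small function.
timeline = [("parse", 1_000_000)] + [("wait", 990), ("tiny", 10)] * 1000
counts = sample_timeline(timeline, interval_us=20_000)  # 20 ms interval
```

Here `parse` was called once and `tiny` 1000 times, yet `parse` collects 50 samples and `tiny` collects none. A sample count times the interval estimates time spent (50 samples at 20 ms is about one second); it says nothing about how many calls were made.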
I'll also point out just in passing
there's also a command
line tool called sample where you
just type sample and the process ID or
the application name and how many
seconds to sample for and the interval
and it will put out a text-based report
saying where it found the program's
execution this is really good if your
application hangs or if it seems slow so
that you can actually track down what
the performance problem is and its
really good for basically cutting and
pasting and putting into a bug report so
let's do a demo okay so here's the
sampler UI once again we can select an
application we can launch it and let's
actually change the sampling rate to 20
milliseconds and then we can launch and
sample it and we can see how simple text
launches and what's going on during that
and so eventually the window will come
up there we go and we can stop sampling
and now we get a set of call stacks
showing what's going on we'll start off
with the extra threads so thread 2 so
there are 155 samples 155 times when it stopped
the program it found execution in
thread 2 and all those were basically in
mach_msg_overwrite_trap ok so it's
basically sitting there waiting for a
message we can ignore that so we can
actually add that to excluded stacks
down at the bottom to get it out of our
view thread 1 is pretty much the same
way except for about 8 samples it's
basically sitting there doing nothing so
we can ignore that one also and then in
thread 0 if we click on the 1000 block
and start and start and main now we can
start finding out what was going on so
we had 158 samples at 20 milliseconds
that's what about 3 seconds most of the
time was being spent in do event loop
the wait next event is pretty trivial
that's just when it's spinning so we can
ignore that and the last 6 samples were
actually you want to go down to that
actually in you can see the call stack
on the far side showing the entire tree
so you can see that we were in do event
loop in hand which called handle event
which called eventually resume the
current event so that was doing the
setup for the app this is a relatively
uninteresting
example feel free to go off and try your
own code and hopefully you'll find some
some very interesting things about how
your app is running and where it's
finding it there's also a way that you
can invert the call graph so you can
look from the bottom up and you can find
the common functions that it
found it running in if you find that
your functions are listed down here that
probably means you have a tight loop and
you're spending all your time there
often you'll find that the application
is stopped in system calls when it was
sampled and that's why you're seeing
calls to string compare or to
mach_msg_overwrite_trap and the like okay
there's one big caveat I should mention
although I've said that this is a cheap
method of profiling remember sampler is
not providing comprehensive accurate
data it's sampling it's a statistical
approach that means that it's not going
to show you all the calls that are
actually happening just the ones that
were happening when it decided to stop
the app second the numbers refer to how
many times it found it in that function
not how many times that function was
called if we found 150 samples in main
or in some arbitrary function that could
mean it was called 150 times it could
mean it was called once but every time
it looked it was it was in that or that
function could have been called 150
thousand times and we just happened to
see it when it was in that if you're
trying to get if you have small quick
executing functions those are going to
appear statistically based on how what
percentage of the time they actually
take to execute so with longer sample
runs and smaller sample times you'll
start getting better data and you'll
start seeing the smaller functions
appear in addition because this is
sampling there's the question of
sampling error when are we going to see
the pro or what are we going to see when
we stop the program well because the way
sampler works is it takes control of the
CPU and the other process stops that
means that the other application is
going to be at a preemption point and so
wherever the operating system decides is
good time to stop the thing is going to
be where you're going to see it in
sampler that could either be because it
ran out of time and the operating system
took control away or it could be because
the application made a system call and
the operating system said you're never
going to finish this in time I'm just
going to give control to someone else
while you
waiting for this disk access and so you
may see disk accesses you may see some
of the system calls much more frequently
than they really appear okay let's not
worry about ObjectAlloc actually let's
do it let's just demo it quickly so
another tool that's available is
ObjectAlloc this is a tool that was originally
intended for objective-c
but still can be useful for programming
in carbon and for programming in basically
just C the idea is that this is trying
to be a lot more like zone ranger but
it's trying to give you ideas about how
fast you're adding data how much data
you're using how quickly it's increasing
and what it does is it shows you a
histogram and what it does is it divides
up all the allocations based on the
class of the object so you can see
allocations or how many CF dictionaries
you had how many NS strings and in the
case of just plain malloc allocations it
just says malloc-46 for 46 byte malloc
allocations what the histograms show you
is first for the darkest bar it shows
you the current number of objects of
that type existing in the system
the next darker or the next lighter
represents the maximum number of objects
of this type that ever existed at once
and the final bar shows you how many
objects of that type have been allocated
so watching this run can give you an
idea about in general how your app might
be behaving and might give you some
hints about objects that you're creating
a huge number of that might be
performance problems there's some other
features in this and other features and
the other tools please go play with them
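The three bars in that histogram are just three counters per type. A minimal sketch of the bookkeeping, assuming a hypothetical `AllocationStats` class that is told about every allocation and free; this illustrates the idea only and is not ObjectAlloc's implementation.

```python
from collections import defaultdict


class AllocationStats:
    """Track current, peak, and total allocation counts per type name."""

    def __init__(self):
        self.current = defaultdict(int)  # objects alive right now
        self.peak = defaultdict(int)     # most objects alive at once
        self.total = defaultdict(int)    # objects ever allocated

    def allocate(self, kind):
        self.current[kind] += 1
        self.total[kind] += 1
        self.peak[kind] = max(self.peak[kind], self.current[kind])

    def free(self, kind):
        self.current[kind] -= 1

    def bars(self, kind):
        """Return (current, peak, total), darkest bar first."""
        return (self.current[kind], self.peak[kind], self.total[kind])


stats = AllocationStats()
for _ in range(3):
    stats.allocate("CFDictionary")
stats.free("CFDictionary")
stats.free("CFDictionary")
stats.allocate("CFDictionary")
```

A type whose total keeps climbing while its current count stays small is being created and destroyed in bulk, which is the kind of hint the talk suggests looking for.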
okay okay so let's do one example here
let's talk about how we'd actually debug
something for real and the example I'm
going to use is one of my own things so
that I can be very embarrassed
specifically it's the malloc debug leak
detection what happened was that when I
actually implemented or added support
for carbon memory
I found that leak detection got much
much much much much slower about a 10
times slowdown this was very bad
however I'd only change the algorithm in
small ways so I was extremely confused
about what was going on what I needed to
do was I needed to use multiple tools to
understand exactly what was going on and
this is something you're probably going
to find you really need to play around
and look from different angles to find
out why your something's behaving less
than optimally the first thing I did was
I ran sampler I had sampler look at my
process or rather at the
application when it was doing the
leak detection and what I found was that
most of the time it was actually
spending in a call known as vm_region
which is a system level call that will
tell you about what parts of virtual
memory for a specific process are
actually mapped in which ones don't
exist whether they're readable writable
this was important for being able to
identify when I was checking a pointer
figuring out whether there was anything
at the other end so that I could read
that data without worrying that the
system was going to crash or actually
the application was going to crash
because it won't the system won't crash
in OS 10 thank God
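The check described above, asking whether an address is mapped and readable before dereferencing it, gets expensive when every pointer goes back to the system. A minimal sketch of remembering the answer per page, assuming the region layout stays fixed for the duration of one scan. `RegionMap`, its fake region table, and the 4096-byte page size are hypothetical stand-ins, not the real vm_region interface.

```python
import bisect

PAGE = 4096  # assumed page size for this sketch


class RegionMap:
    def __init__(self, regions):
        # regions: sorted, non-overlapping, page-aligned (start, end, readable)
        self.regions = regions
        self.starts = [r[0] for r in regions]
        self.slow_queries = 0   # how often the expensive lookup ran
        self._page_cache = {}   # page number -> readable?

    def _query(self, addr):
        """Stand-in for an expensive per-address system query."""
        self.slow_queries += 1
        i = bisect.bisect_right(self.starts, addr) - 1
        if i >= 0:
            start, end, readable = self.regions[i]
            if start <= addr < end:
                return readable
        return False  # unmapped

    def is_readable(self, addr):
        page = addr // PAGE
        if page not in self._page_cache:
            self._page_cache[page] = self._query(page * PAGE)
        return self._page_cache[page]


region_map = RegionMap([(0x1000, 0x3000, True), (0x8000, 0x9000, False)])
same_page = [region_map.is_readable(0x1000 + 4 * i) for i in range(1000)]
```

The thousand lookups above all fall in one page, so the slow query runs once. The cache is only valid while the memory map does not change, so it has to be discarded between scans.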
the solution was that this data didn't
change during the time I was doing
analysis and so I could actually I found
I could actually cache that and I
increased the speed by about a third
better than it was the second thing was
I started listening to my machine I used
my ears another tool and I found that
the disk was chattering away using top I
looked and I found that I was swapping
about two thousand pages a minute okay
so my machine was basically spending all
its time throwing pages out to disk and
bringing them back in this is not very
efficient unless you happen to be a disk
drive and what I found was that although
it was spending all that time
swapping around all the execution was
being spent in my code it wasn't doing
other IO you know it was just trying to
swap and what it turned out to be after
commenting certain parts of the code out
was that I was checking for pointers in
places I shouldn't have been in places
that were read-only memory that
couldn't have had reasonable pointers in and
as a result I was searching around in a
lot of places and because of the change
in algorithm suddenly I was looking at a
lot more pages in random places and
instead of sort of linearly passing
through memory and looking at only a few
places I was looking everywhere randomly
and causing huge performance problems as
a result what I was able to do was
minimize the number of out of order
checks and tighten up the checks on what
I was going to look at other pages for
and as a result got the slowdown down to
about a factor of two and since then
I've looked at my algorithm and gotten
it down to like only ten seconds from
thirty which was pretty cool so the
take-home lesson here is plan to use
lots of tools plan to explore lots of
parts of the system and plan to learn a
lot of trivia welcome to the world of
performance ok for those of you that are
planning on porting to Carbon yes
we've heard great testimonials about
people who went off for lunch and
converted their app over to carbon as I
and that's really good and in fact in a
lot of cases that's probably good enough
however there as I gave you examples
there may be places in your code where
there actually are mismatches and the
algorithms that don't quite match the
new world and so porting isn't porting
is only going to be half the work you're
going to have to look at the app you're
going to have to understand how it works
and see if you can find any performance
problems plan on using these performance
tools plan on using multiple tools and
exploring as I mentioned before and in
addition remember that one of the things
that we're getting with Mac OS 10 is a
huge number of pieces of infrastructure
that are really going to help us out and
so plan on looking at them and deciding
what you can actually use examples
include memory mapped files you don't
actually have to read memory or read
stuff from disk into a buffer the
operating system will kindly just map
that file into virtual memory
and when you try to touch that page it
will actually map it into memory for you
so you don't actually need to keep
multiple buffers around and in fact if
you try to keep those buffers around you
may be being too clever because the
operating system may be keeping a copy
of the memory mapped file in your address
space and so suddenly you've got two
copies
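A minimal sketch of the memory mapped file idea, assuming Python's standard `mmap` module as a stand-in for the facility mentioned above. The helper names `read_slice_mapped` and `demo` are hypothetical. The file is mapped rather than read into a private buffer, and only the pages actually touched need to be brought in.

```python
import mmap
import os
import tempfile


def read_slice_mapped(path, start, length):
    """Return `length` bytes at offset `start` without reading the whole file."""
    with open(path, "rb") as f:
        # Length 0 maps the entire file; pages load lazily when touched.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return bytes(mapped[start:start + length])


def demo():
    """Write a small scratch file, then read its two ends through a mapping."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"header" + b"x" * 10000 + b"tail")
        os.close(fd)
        return (read_slice_mapped(path, 0, 6),
                read_slice_mapped(path, 10006, 4))
    finally:
        os.remove(path)
```

Copying the whole mapping into a second buffer would recreate exactly the duplicate the talk warns about, so the sketch copies out only the slice it needs.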
similarly we now have pthreads a really
nice thread implementation these are
threads at the level of the operating
system they don't have a lot of overhead
because they're part of the OS so plan
on looking at P threads and seeing if
you can exploit those and finally we
also now have the POSIX file i/o and
there may be cases where that's much
more useful to you than the standard Mac
OS toolkit so take a look at that and
see if that'll actually help you in some
cases in addition a certain
vice-president who shall remain nameless
hacked on a few weekends and excuse me
let me rephrase this certain people high
in the company happened to be very
interested in algorithms and happen to
be very interested in malloc one of the
problems on many Mac OS compiler
implementations was that the malloc
implementations used to be really bad
and a lot of people have used sub
allocators instead of going through
native memory management because they
want extra efficiency or they don't
think the performance is going to be
that hot we have a really nice
implementation of malloc thanks to
someone's nights and weekends so think
twice about using sub allocators try the
new malloc it's really efficient there's
some really cool new little features in
it go play and finally I will repeat
again and again and again polling bad
blocking good don't sit and wait for
something to happen have the OS go off
and tell you when it's done
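The difference between polling and blocking can be sketched with a thread and an event. This is an illustration under assumptions: Python's `threading.Event` stands in for whatever notification the OS provides, and `start_worker`, `wait_by_polling`, and `wait_by_blocking` are hypothetical names.

```python
import threading
import time


def start_worker(done, delay_s):
    """Start a thread that finishes after delay_s and then signals `done`."""
    def work():
        time.sleep(delay_s)  # stand-in for slow I/O
        done.set()
    thread = threading.Thread(target=work)
    thread.start()
    return thread


def wait_by_polling(done):
    """Spin until the flag is set, burning CPU; returns the number of checks."""
    checks = 0
    while not done.is_set():
        checks += 1
    return checks


def wait_by_blocking(done):
    """Sleep until signalled; the scheduler is free to run other work."""
    return done.wait(timeout=5.0)
```

The polling version asks the same question over and over for the whole delay, while the blocking version makes a single call and is woken when the work is finished.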
have the OS take control away from you
and give it to someone else so that
other processes can actually run and
you'll have a nice feeling of a
smoothness all through the app instead
of having yours take up CPU and as a
final warning in a horrible place some
of these tools do work with PEF
binaries some of them don't necessarily
work so well and we want to improve that
with malloc debug and sampler they
currently do not identify the PEF
symbols and so you're not going to see
the symbols in your own application if
you're running Mach-O
native binaries
this isn't a problem this is something that
didn't get on to the CD hopefully we can
actually put it out on on the developer
website so that everybody can use this
but plan that the version on the CD may
not do a good job with PEF binaries
that you may not be able to see much
about your program so to conclude Mac OS
10 is really cool but there are a lot of
differences between it and how you used
to work the algorithms
you use are probably going to need to
change so take a look at them use the
performance tools to analyze them and
have a great time with native Mac OS 10
applications so if you've got thank you
if you have questions or actually if you
want to use the tools they're in
/usr/bin for the
command-line tools the graphical tools
are in system developer applications
documentation is available as man pages
for the command-line tools malloc
debug and sampler have documentation in
them and there's also a nice release
note on malloc debug explaining some of
its idiosyncrasies for this particular
release if you've got questions or
feedback if you send mail to Mac os10
- tools - feedback it goes to I believe
the entire group we'd love to hear your
comments suggestions about other tools
that are really necessary because we're
all going to learn what's really needed
when moving over to Mac OS 10 and if you
have any other issues Godfrey de Georgie
is our technology manager for the
development tools group and I will bring
him up so that he can tell you about the
other forums oops
group Apple comm thank you very much
okay we we have about 12 minutes for Q&A
so it's
[Laughter]
whatever roadmap for the next two
sessions in the in the tools tools group
debugging applications on Mac OS 10
tomorrow morning at nine o'clock and
carbon low level would be another
another good session people interested
in performance and why don't we just get
our whole
you