---
title: WWDC2001 Session 705
framework: wwdc
role: article
path: wwdc/wwdc2001-705
---

# WWDC2001 Session 705

## Transcript

Good afternoon. It's the last session of the day; I hope we all still have a lot of energy, because we have a tremendous amount of information and a great presentation to show you. Without further ado, I think we'll just roll right in. Robert Bowditch, our performance tools guru.

Hi, my name is Robert Bowditch, and I'm a member of Apple's Developer Tools group, where I'm responsible for the performance tools. What I'd like to do today is tell you something about the tools and give you a quick introduction to them.

Now, why are we giving this talk now? I'm sure all of you have great apps that you're ready to ship at Macworld New York in just about two months, and hopefully they're all pretty much feature complete, right? Thank you. So you've got about two months to go, and right about now you should probably be concerned about whether your applications actually perform as well as you want them to, and more importantly, whether they perform well enough that your customers think they're good. Hopefully you're going to start using the performance tools to track down performance problems and make your apps as good as they possibly can be, and what I'll do is show you those tools.

So what you should hopefully learn today is, first of all, that we actually do have some cool tools available. Second, you're going to see me go through a sample program and actually find some performance bugs in it, so we're going to do some real-world examples, and hopefully you can take that home and get so excited about these tools that you'll dive into them. I'm not going to do a tutorial; I'm not going to go through in detail "well, you pull down this menu and you press that button and this is what it shows." I'm really going to show you what you can gain from these tools, in hopes of exciting you enough that you're actually going to
go off and play with them yourselves; hopefully some of you will be playing with them as I speak. If you've got questions about the frameworks, about what you should be doing in your Carbon programs to make them more efficient, I am not the right person to answer those, and I would probably tell you the wrong thing. In those cases you want to go to the framework talks or watch them on DVD. For example, the Carbon talk that was at one o'clock, or two o'clock, was absolutely wonderful; go take a look at it if you write Carbon apps. The Java talk was also really good about Java performance.

So, just a quick summary: what are the possible causes of poor performance on a Mac OS X system? One of the most obvious is excessive use of memory, that is, your application requires a working set of memory that is larger than it really needs. This can be because you have a lot of code, dead code or stuff that isn't really necessary; it could be because you're allocating too much memory via malloc; it could be because you're mapping in shared files. There are many ways you can increase your memory footprint. A second way you could be affecting performance is if your application is using the CPU too much, basically executing too much code. A third is that you might not actually be doing anything: you may be waiting for something to happen, such as going out on the network or out to disk. So we have both the case of doing too much and the case of doing too little. Finally, in a graphical application you may be doing too much drawing, and therefore too much computation: you're using memory, you're talking with the window manager, and everything slows down.

Almost all of these problems eventually turn into memory problems on Mac OS X, because all of them have some connection with how much memory is in the system and
what memory is being touched. The problem is that as soon as the memory footprint of your application and of the rest of the system becomes larger than the physical memory you actually have, we have this wonderful thing called virtual memory, which sort of gives you extra memory. To do that, it takes some pages out of memory and writes them out to disk, and later takes them back off disk and puts them into memory. As soon as this happens, your application is going to be judged by the speed of the disk and not by the speed of the processor. So anything you can do to cut memory use and keep your working set as small as possible is a great thing.

Okay, so here's a quick summary of the tools that are available, in three categories that may or may not be the most meaningful, but they seem to work. The first is execution behavior: tools that help you understand what code is executing. These include Sampler and sample, which do CPU sampling, checking occasionally to find out what code is currently running. The second set of tools is for understanding heap use, that is, how much memory you explicitly ask for while your program runs. These include MallocDebug; ObjectAlloc, which is a tool for understanding Objective-C and Core Foundation use; and the command-line tools heap, leaks, and malloc_history. malloc_history is particularly cool, because you can tell it, "Hey, in this process I've got a buffer that starts at this address; where did I allocate that?" and it'll give you a call stack saying where that came from. Finally, there's a set I call system state, because I don't have a better way to describe them. These include tools to help you with drawing, such as Quartz Debug; tools for understanding how you use system calls, such as sc_usage, or file system calls, which is fs_usage; as well as
programs like top, which gives you an overall picture of the state of the machine, or vmmap, which tells you how virtual memory is laid out for a specific process.

Okay, so how are we actually going to look at the performance tools? Well, we're going to do public embarrassment: we are going to rip apart one of my own programs, a little test app I've been working on called Thread Viewer, and see if we can find any performance problems that either I put in or that came in with code I borrowed. Okay, Scott, do you want to come up here and show Thread Viewer?

The important thing to know about Thread Viewer is, first of all, that it's a small Cocoa application intended to help us do performance analysis. Actually, we need to go to demo 2. Oh, we are on demo 2. What we can do in Thread Viewer, as Scott just did, is say, "Hey, I want to look at a specific process," in this case the Dock, and it shows us a running timeline of what's happening with the individual threads in that task. For example, we can see here that the Dock uses two threads: the one on the bottom is the main thread, and the one above is a secondary thread that's helping do computations. The little blocks represent 50-millisecond intervals in the life of that process. Green means the Dock was running when it was examined. Yellow indicates that it had been running in the last 50 milliseconds but wasn't running at that moment. Gray indicates that it's blocked, waiting for something to happen. Light green and light red are also waits: light green means it's waiting in the run loop for something interesting to happen, and light red means it was waiting on a lock. Exactly what Thread Viewer does doesn't really matter, except to understand the pretty colors; the important thing is that this is a somewhat realistic program
and that you can use it to examine other programs. Was there anything else I wanted to say on that? Yes: so what's wrong with this? Any comments? Does anyone see anything that looks obviously wrong with this program? Hmm? There's a lot of green? Oh, on the Dock. Actually, I was hoping for comments on Thread Viewer, but that's an interesting point: the Dock is doing a fair amount, though that's mostly because Scott's playing with it. As for Thread Viewer, hopefully there's nothing you see here that makes it look bad.

That's the first lesson I want you to take home tonight: usually, if you just look at a program, you can't tell whether it has performance problems. To understand performance you need to measure. You need software metrics: you actually need to look at the program and measure things like how much memory it's using and how much time it takes to do things. You need this for two reasons. The first is so that you can look at the numbers and ask whether they match your expectations: did I really expect it to take two seconds to load that web page? The second is that as the system changes you want to be able to notice regressions, and unless you wrote down how it behaved two weeks ago, you won't notice that you've been losing 5% performance every single time. So you need to write stuff down. In fact, for the projects I work on that are relatively time-critical, I have a checklist of things that I measure; I write the results down and stick them in a folder. It's low-tech, but it works.

So we don't know how the system performs, and we'd like to find out. We can start by looking at overall system behavior, and we'll start by running a text-based utility called top. In top, at the very top of the screen you see information about the system; the
information below is one line per process. Some of the things I find very important or interesting, and that you should probably look at, are, first of all, the pagein and pageout rates at the very top. The line that says pageins/pageouts tells us something about the virtual memory system: how many pages it is writing out to disk and how many it is bringing back in. The first number is the number of pages moved since the system was rebooted; the number in parentheses is the number paged in or out in the last second. The important number is the one in parentheses. What I tend to say is that you should look at top every now and then while your system is running. If you see those numbers stay above zero for any length of time, say at 20 or 50, that usually means you're paging a lot: the physical memory you have is not enough for everything you're running, which may indicate that your app is taking up too much memory, or that something else is. If you're hitting 50 or 100, your system is basically thrashing: it's spending all of its time throwing pages out to disk and bringing them back in, and somewhere around 50 or 100 you're going to hit the limits of the disk, where it can't page any faster. So if you ever see anything at that level, your system is really hosed. That's a technical term.

The other interesting thing is in the bottom part: in the list of processes there's a column that says %CPU, which shows which tasks are actually spending time running, and we can see here that Thread Viewer is responsible for three or four percent of the CPU. Okay, so it's
not taking up a lot of CPU time. We can also see a column far to the right that says RPRVT; that's the other one I think is interesting and that you should look at. RPRVT stands for resident private memory: it tells you how much memory is in physical memory on the computer and is private to that application alone. This number tends to be a good measure of how much memory your application needs in order to run, so look at it when you want a rough footprint. Note, though, that if you have a lot of things in memory, some of your application's memory may be paged out and sitting on disk, and in that case it would not be resident; so on a heavily loaded system the number may not mean as much.

Now let's show another mode of top. top has a huge number of modes, and in fact this is probably as good a time as any to sell books. There is a book called Inside Mac OS X: Performance. It's available, I believe, on the CD you got in your bags, it's available on the web, and it's available from Fatbrain, nicely bound like this. It describes a lot of the performance tools, and along with the man pages for top it will tell you what some of the flags are. Scott's going to show you an option to top, the extremely clear and extremely user-friendly top -d option, which gives us a few other pieces of information. We can see the number of page faults; the two I'm going to point at are CSW, way to the right, which is the number of context switches, and messages sent and messages received, which is how many Mach messages are being sent and received in the system. There's a reason I'm telling you this. If Scott plays around with Thread Viewer, what we see is that the number of context switches, that is, the number of times something got switched out so that something else could
run, is pretty high for both the window manager and Thread Viewer, and the number of messages sent and received is relatively high. Does anyone know why that's happening? Yes, I heard it: Thread Viewer is drawing, and the drawing is not done by Thread Viewer alone; it's done by Thread Viewer and the window manager together. So whenever Thread Viewer is drawing, it has to communicate with the window manager.

So this is the second take-home lesson today: sometimes you can't measure the program you're running just by looking at that program in isolation. You also have to look at daemons and servers that might be doing computation on its behalf. In the case of drawing, you want to look at both the window manager and your application. I've seen an application that looked like it was performing great, only taking 30% of the CPU while doing a lot of drawing, but the window manager was taking another 60% doing that drawing. So to understand the overall problem, you want to look at both.

Scott, I actually saw something kind of interesting in that last top display; can you switch back to it? Okay, does anyone see anything interesting in this display for Thread Viewer? Well, we don't know that it's leaking. Maybe there's a reason we're doing that, maybe it's intentional, but we're seeing the amount of resident private memory growing. At least I think it's growing; luckily, top shows a little plus sign there, which says the memory is growing. So it seems to be going up, and it's using 37 megabytes of memory. Hmm, this looks suspicious; let's see if we can track down the problem. One thing I can try to do is prove that it's actually growing. Yes, I know that seems kind of silly, but let's try it anyway: I want to see whether I can identify a trend over time. Now one
of the problems is that I don't really have a tool for this. Okay, I'm not doing my job. However, I can show you how to make a tool like this. Because top is a command-line tool, a UNIX, non-Mac-like tool, you can use the various features of UNIX to roll your own. What Scott has done here is type in a little shell script: while 1, call top with the option that gives a one-shot dump of the data, grep for the line that starts with Thread and print it, then sleep for a second, and keep looping. If Scott runs that, we get the top data for just the Thread Viewer line, and we can see that the amount of resident private memory, the last column visible where Thread Viewer is obscuring things, is actually growing. So, geez, I've got a memory leak.

What you should take from this is that command-line tools, however Mac-like or un-Mac-like they may be, still have uses. You can use them to roll your own tools; you can run them via telnet from another machine if you want to inspect somebody else's machine; and you can run them even when the window manager is not responding, or when the screen is taken over because you're debugging a full-screen game. So don't ignore the command-line tools; they do have a place.

Okay, so what are the possible causes of the memory leak? One possibility: we're not throwing away old data. Any other suggestions? History? Actually, that's a good one. Well, I'm not quite sure, so let's see if we can track it down. I'm going to try another command-line tool, called heap. Scott can try running it, but the slide actually has the output. heap is intended to give you a dump of how much memory you're using via malloc. Most of the output just lists all the buffers you're allocating; the important part is
the very top, though, because one possibility, although a relatively unlikely one, is heap fragmentation. We could allocate a million bytes, free it, then allocate ten bytes that end up sitting in that space, and keep going, so that the heap keeps growing even though we're not actually using much memory. With heap we can see that there are actually two heaps, which we call zones: one for Core Graphics and the default malloc zone, together representing about 2.3 megabytes of memory. We can see that from the overall allocation part. We also see that the total in malloced nodes, the space we're actually using, is about two megabytes. 2.3 minus 2 megabytes is about 300,000 bytes of free space in the zones, which indicates that heap fragmentation is not our problem: this is memory we're actually allocating.

Another thing to notice: we have two megabytes of memory allocated, but our resident private size was 38 megabytes. That leaves 36 megabytes we can't account for. So we can run another tool, called vmmap, which gives us information on how our virtual memory is laid out. Most of the time this isn't something you really care about, but every now and then it's interesting. If Scott runs it (oh yes, you've got to run that), basically it says: there's a page at address 0, it's 4K, and it's unreadable; there's a page at 4096 that holds the executable, and it runs for 14K or so; and so on through the dump. Very friendly. However, it has some interesting information, and if you ever wondered how your application is laid out in memory, this is how you'd find out, so that when you're looking in the debugger and find yourself at address 1016 you can find out why you might be there.
However, the slide shows a little more detail about what's actually there. In this case we see a malloc region at address 0x139a000 of about sixteen thousand bytes, so that's space for the heap, a couple of others, and then below those the 512 thousand bytes allocated for one of the threads' stacks. Then we see a series of small regions, at addresses like 0x14b6000 and 0x146f000, each four thousand bytes. This may be the source of our leak; see where it says 4K on the slide here. Sorry, this is a good reason why I shouldn't have done this. Over on the slide you can see those regions marked read/write. The interesting thing is that we keep allocating these four-thousand-byte buffers, which is kind of odd, and they would have been allocated via vm_allocate; that's how you'd actually create a virtual memory page. When I found this problem, I went into the debugger and asked, what do I see there? Can you bring up Thread Viewer? What I found was that the page was all zeros, except for a couple of numbers that looked oddly like those thread IDs on the left-hand side, the 5a03 and 5903. That made me think this might have something to do with a system buffer or something. It turns out that the Mach APIs will sometimes return you a VM-allocated page that you actually have to free. The code, which is in one of the libraries Thread Viewer depends on, does something like, "Hey, tell me all the threads associated with this task," and you get back a buffer containing the list, which you need to make sure to free. That yellow line on the slide wasn't there, and by adding it we got rid of the memory leak. Now, this is not a case you will ever run into; none of you
will ever, ever, ever, hopefully, use the Mach calls to find out what threads exist; I'm doing it because this is a performance tool. However, what you should take from this is, first of all, that I had to use multiple tools to track it down, and I had to compare their output to figure out what the problem was. That's one of the problems with performance work: the tools usually aren't just going to drop an answer into your hand. Plan on inspecting things and thinking about them to find out what the causes are.

Okay, let's do another example. We saw that we were using about 2.3 megabytes of memory in the heap, that is, memory we allocated via malloc, and that seems a little suspicious too. So what are some of the reasons we might allocate that memory? One is that we could just have really big data structures; we may be allocating two megabytes of data structures because we want them. A second is that we could be caching things even though we don't need them, saving them just in case we need them again. A third is that we could be creating things and forgetting to destroy them. There are lots of other possibilities, but the problem with all of them is that the more memory you allocate via malloc, the more the system will just keep giving you, and eventually memory use grows to the point where it won't fit in physical memory anymore. You start thrashing, swapping stuff out to disk, and as a result your system slows down.

So there are two tools I'm going to show you that will help you track down why memory is being allocated. The first is called MallocDebug, and it shows you, via a call graph, where memory is allocated, so that you can track down the big allocations. The second, ObjectAlloc, reports things according to how many objects exist. We'll start off with MallocDebug, and Scott can bring that up. What
we do with MallocDebug is select an application and launch it. Then, in Thread Viewer, Scott can connect and say, okay, I want to look at the Dock, and we've got our little Thread Viewer display. Now Scott can go back to MallocDebug and hit the Update button, and we find out that to get to this point we needed 2.6 megabytes of memory. Thread Viewer isn't that complex, so this seems a little suspicious to me. Like I said, MallocDebug works off a call graph, so it shows the various functions that were called. For example, Scott can start clicking on start, which is a secret function in the runtime that you don't need to know about, but start calls main. main is the top of our program, and the number beside it, 1.3 megabytes, represents the amount of memory allocated by main and everything below it, by all the functions it calls. The rest of the memory was allocated on a different thread that isn't started from main; that's why you don't see it here. To the right of main, we can see that 1.3 megabytes was allocated in NSApplicationMain and ten thousand bytes in calls to a function that allocates an unneeded buffer; both are called directly by main. If you went to the source code, you'd see that NSApplicationMain is nothing to worry about: it's the root of the Cocoa framework, that's how your app gets started, so of course a lot of memory is allocated under it. The call that allocates the unneeded buffer calls malloc once, for a single 10,000-byte buffer. Scott can look at the listing below, where it shows the buffers, double-click on it, and get a hex dump, and by looking at the buffer maybe we can detect the cause. It has a string that says "this buffer isn't really needed." Okay, that's obviously a buffer we can get rid of. But this shows you how you can walk through the call
graph, find where allocations are happening, say "hey, this looks suspicious," and try to fix it. If you understand how your program is structured top-down, this makes a lot of sense: you can walk through the tree looking for things that are interesting. It's basically a keep-your-nose-to-the-screen-until-you-see-something-interesting kind of problem.

The other thing we can do: there are times when we don't really care how we got there from main, but we're curious about who was calling malloc directly, to try to assign blame. So Scott can switch the display mode from standard to inverted, which means we're looking at the call tree from the bottom up. For example, we can see here that there are 740,000 bytes allocated in calls to malloc, and 489,000 bytes of that came from NSZoneMalloc calling malloc; that's the Objective-C version of malloc. And 290,000 of that came from calls to NSReadImage, which called NSZoneMalloc, which called malloc. Now, Thread Viewer is a pretty simple program; it doesn't have any icons or pictures, so the idea that NSReadImage is responsible for 290,000 bytes seems a little suspicious. So we can start walking up the call graph to see who called what to get to NSReadImage. If we go up past all the AppKit stuff (Scott, can you point at where the library and the application name are?), we can see that we end up in a function called imageForProcess, which is in the Thread Viewer library; that's what the name in parentheses means. So this is my code, and this is what's causing those calls to NSReadImage. Unfortunately, I know this code, and I know what that function is: it handles the Attach dialog box for Thread Viewer, which Scott can bring up. And so even though
that dialog box had been closed, those icons were still around. The reason was that Thread Viewer was trying to be clever: it decided it would be really smart to cache those icons, keeping them in memory after the dialog box closed on the off chance they'd be needed again. And that's nice: I don't have to read them in off disk; they're sitting in memory. The problem is virtual memory. As soon as the application got a little too large, the system would take those pages that hadn't been touched in a while and write them out to disk, so when we brought up the Attach dialog box again, we'd avoid reading the disk by going to memory, which would require reading the disk anyway. Saving memory is pretty important, because it's probably going to be cheaper just to get the stupid thing off disk, and so the idea of caching those icons was really stupid. It would have been much better to get rid of the icons when the dialog box went away and recreate them on the fly the next time we opened it, since that's a relatively infrequent occurrence.

One thing I haven't mentioned that I should: everything you see in MallocDebug refers to currently allocated blocks. If something was allocated and then freed, you won't see it in MallocDebug, so keep that in mind as you use it.

MallocDebug has a number of other useful features. For example, it can help you identify leaks in your program, places where you allocate memory and then forget to deallocate it. These are really important to track down, because leaked memory, stuff you forget to free and never free, just sits in memory. It occupies space and keeps the stuff you're actually using from being close together, and if you have a long-running app that goes for days, like
a server app, your memory just keeps growing and growing until, 30 days in, the app suddenly crashes and you have absolutely no idea why. And even if you have a small, short-lived app, this is still going to affect performance. This is a particularly sore point for me, because there's at least one game I've been trying to play lately that has a memory leak: after about four hours of playing, it eventually runs out of memory. So please, if you write games, don't do this.

MallocDebug can help us detect this. We can switch from showing all buffers, or the ones that have changed recently (which is the New mode), to the Definite Leaks mode, which shows only the buffers that aren't referenced anywhere in memory. The way it does that is basically garbage collection: it scans memory looking for pointers to each buffer, and any malloced buffer it can't find a pointer to, it assumes is unreferenced, because if we don't have a pointer to it, there's absolutely no way we can free it. We see here that we have about ten thousand bytes leaked, and Scott can click on malloc and see that there's a case in imageForProcess, interestingly, where we're allocating a couple of buffers and have lost the pointers to them, so we can't free them. I actually looked at this code. When I go to read the icons, I have to grab the command line to find the icon (you don't want to know), and I keep a buffer around for that list of arguments. In most cases, at the very end of the routine, I free it. However, in a few exceptional situations I knew I wasn't going to find an icon, so I would just exit the routine early, and I forgot to release the buffer. No one else does this, right? So this really is just one of those things, and I wanted to show it because the tool helped me. So this is another case
where MallocDebug helped us, and in fact it's showing us another lesson of software engineering: if you keep finding bugs in the same code, maybe you need to rewrite that routine. Now, there are a couple of little things to remember. One is that you may say, "Oh, you're only leaking 10,000 bytes; who really cares?" However, remember how MallocDebug's leak detection algorithm works: if it can't find a pointer to a buffer anywhere, then that buffer must be leaked. First of all, this means it doesn't do well with circularly linked lists, because every element has a pointer to it, so none of them will ever be reported as leaked. It also means that if you leak a tree data structure, the root doesn't have any pointers to it but everything else is pointed to, so only the root is reported and the rest of the tree never shows up as leaked. So every leak you see in MallocDebug may be more important than it looks, and you should go track it down. Finally, one more thing about MallocDebug. This doesn't necessarily impact performance, but it does impact correctness: MallocDebug can also help you track down pointer problems. There are a number of bugs that are really subtle and intermittent. Examples include cases where you free a buffer but continue to read and write it even though somebody else now has the buffer. These are miserable; suddenly data is changing and you have no idea why. The second kind of really nasty bug is a buffer overrun, where you have, say, a string that's 40 bytes long but you write 45 bytes into it, so suddenly you trash the next thing after it. MallocDebug can help you find both of these, and the reason I say "help" is that what MallocDebug does is try to encourage your program to crash if it does stupid things. Here Scott's brought up the memory dump. Let's see, do we have any? So one of the things it does is that when you free
memory, MallocDebug carefully erases the contents of that buffer and replaces them with 55 hex. So we would have seen a lot of cases (actually, there we are) of 55555555. If you try to read data from a buffer you've freed, you'll get garbage, and hopefully you'll crash or behave badly. If you try to treat those values as pointers, it's even better, because 0x55555555 is almost always unallocated memory, and as soon as you touch it your app crashes. So if your application ever crashes in MallocDebug, it's trying to tell you something. Hook it up to a debugger (there are instructions in MallocDebug on how to track down these kinds of bugs) and find the pointer problem, because it'll save you a lot of grief later. The other thing MallocDebug does is put guard words at each end of a buffer. At the beginning of the buffer it puts DEADBEEF, the first part that's highlighted; then you have the buffer; and at the end you have BEEFDEAD. I did not make this up; it just came this way. If you overrun the buffer, MallocDebug checks occasionally, on frees, to see whether anything happened, and if it ever finds that those bytes have been changed, it prints out a warning. Unfortunately, the warning goes out to the console, so please keep the console open while you're running MallocDebug. Yes, this is lame; we're working on it, and we'll try to do something about it. Okay, thanks for that, Scott. The second tool I'll show you is called ObjectAlloc, and it's intended to help you understand how many objects you have rather than where they were allocated. It's mostly useful if you're doing Objective-C or Core Foundation work. Again we select an application (in this case it doesn't understand bundles) and start the program running, and it gives us this little histogram. It shows us a bunch of
numbers grouped by the type of object, and these little histograms show how many objects we've created. The first number and the darkest bar represent the current number of objects that exist. The second bar represents the peak number of objects of that type during the run, that is, the largest number that ever existed at one time. The final bar, the lightest color, represents the total number of objects of that type ever created. This way of organizing things is really nice for certain types of tasks. One thing it's good for is identifying trends: we can see motion here, we can watch the numbers growing, so it's very nice for seeing that memory use is expanding, for example. A second thing is that it can help us when we're trying to prove various statements about our program. For example, the Thread Viewer display keeps sampling the application, finding out how it's currently running, and throws that information on the right-hand side of the display; the information that scrolls off the left disappears. That's done with a data structure called ThreadData: new ThreadData objects get put on the right side of a big array, and when the information becomes out of date because it has scrolled off, they get thrown off the left-hand side, and hopefully the objects are destroyed. So one bug I could really imagine making is forgetting to delete them correctly, so that the number of ThreadData objects grows without bound and eventually my system performance degrades. We can check that, and Scott has actually done this: we can find ThreadData in ObjectAlloc's display and see how many objects we have. What we see here is that the current number of ThreadData objects varies between about 45 and 55, the peak number was 62, but the total number is about 1400 and growing. So this implies to me that I
actually did this correctly, so this isn't a bug. We're keeping the correct number of objects in our sort of ring buffer, throwing new items on the right-hand side and pulling objects off the left, and ObjectAlloc has helped us prove that. One other thing: how many of you are Objective-C programmers, or interested in becoming Objective-C programmers? Okay. As you test your code, make sure you use ObjectAlloc, because one of the things it's really great at, which you may have seen when we brought up the attach panel, is that you can tell it to keep track of every single time you retain or release an object. Objective-C has reference counting, and an object is only deleted when nothing retains it anymore. So you can figure out whether you're actually destroying objects correctly, and if you're not, you can find out where you retained one one too many times and kept it around in memory. So use ObjectAlloc, especially on the example code. Thank you, Scott. Okay, so that's memory. The third kind of performance problem we might try to track down is what code is executing. There are a number of ways we could be using too much CPU. We could be executing code we don't need to, either because it's dead code or because it isn't really calculating a value we care about. We could have an algorithm that's much more expensive than we ever expected, say a quadratic, N-squared algorithm rather than a linear one. We could have cases where some operation is much more expensive than we thought. One example pointed out in the Carbon performance session is that now that home directories can be out on an NFS server, when you go and get your preferences you could be going across the network, so something that used to be a really quick grab-it-from-the-disk kind of operation may suddenly take seconds to return a result. So you may not
have expected that certain operations would be as time-consuming as they are. And as you may have seen in Avie's keynote, there's also the problem that you may be checking for events by polling, constantly checking where the mouse is, for example, rather than waiting for something to happen and having the system say, "Hey, by the way, something changed." In general, the normal way to solve this kind of problem is to track down the biggest problem, the most expensive routine, because if you can cut the CPU cost of that biggest routine, if you can make it faster, you're going to improve the performance of your system as a whole. So don't start with the little things; do the big things first. If we want to improve performance, we want to find the expensive calls and improve them. One tool for doing this is called Sampler, which Scott has just brought up. With Sampler we can launch an application or connect to one that's already running, and then start sampling it. Sampling means that we stop the program occasionally and ask what's going on: every 20 milliseconds we stop the program, ask where it's executing, and get back a stack backtrace; then we let the program continue, stop it again, get another backtrace, and keep going. The advantage is that, for a relatively small impact on the running system, we can find out what code is most likely to be running. This is statistical: we don't know everything that ran in between, but we get a good idea of what's going on. What Scott has done, I believe, is some sampling while Thread Viewer is drawing, so let's see if we can find out what Thread Viewer is doing when it's running. Personally, I'm very concerned about how much time it takes to grab its samples, to find out
what's running, and I also want to find out how much time Thread Viewer spends drawing. Now, I know a little about this program: I know that thread 3 happens to be where the data gathering goes on in Thread Viewer, and we see that thread 3 was found executing 486 times; that's the number of samples taken. All of those samples occurred in the function _pthread_body, which was calling sample threads, which is my code. Every time we stopped in sample threads, we found one of two things. In 470 of the samples we found ourselves in usleep, which basically stops the thread to wait for a fixed number of microseconds. In 16 of the 486 samples, though, we found ourselves in the thread viewer controller's log routine, which happens to be the code that does the logging that gets the information for Thread Viewer. If we look at that, we find the sample stack on the far right, and we see that most of the time was spent in what's called sample once all threads, which gets the stack backtraces that Thread Viewer displays. So what we've found is that only about three percent of the time we looked at thread 3 was it actually doing something, actually gathering data; the rest of the time it was doing nothing. This seems pretty good: it means the data gathering is relatively cheap, and that's really good for a performance tool. So Scott can ignore all of that, because it doesn't look like there's a performance problem; he'll prune it out of the tree so we don't have to look at it. Now we can look at thread 0, the main thread, where the drawing goes on. We sampled the main thread about 720 times, and most
of the time it was in main, which is not surprising. Then it goes down into DPSNextEvent, and block-until-next-event is basically sitting in the run loop; it's not really doing very much. We can see here that about 570 of the 720 times we were sitting in CFRunLoopRun. So this is kind of interesting: of those roughly 570 samples, in 552 we were actually in a function called mach_msg. Geez, why are we spending so much time in mach_msg? Well, I'll give you a hint: it happens to be a kernel routine, so obviously there's something wrong with the kernel, because we're just sitting in mach_msg all the time. Actually, does anybody know what's going on there? Thank you very much: we are waiting for a Mach message. mach_msg is basically sending off a message, probably to the window server or to whoever's giving us events, saying, "Let me know when something I actually care about happens," like the mouse moving or needing a redraw. So most of the time we're going to find ourselves in mach_msg_overwrite_trap. Please do not file a bug against the kernel saying, "Hey, you guys, I keep finding my code running in here." That's why you see mach_msg_overwrite_trap. The rest of the time, however, we find ourselves in CFRunLoop, and if we look up the sample stack for where the numbers change, where the tree diverges so to speak, we eventually find ourselves in ThreadView drawRect:, and we find it there in only 17 of those roughly 720 samples. So once again, only two or three percent of the time were we in ThreadView drawRect:, which is the code that actually does the drawing. In 10 of the samples where we found it in drawRect:, it was doing NSString drawing, and the rest of the time it was drawing rectangles. So this implies that the drawing code is pretty efficient too; we weren't spending
very much time doing it. This suggests that if I wanted to increase the redraw rate, so that Thread Viewer wasn't just sort of flashing the screen every second as it redraws the display, I could probably make that animation much better. That's a good thing to know. So we didn't find a bug, but we learned something about how we could improve this application. One other thing: as I said, this is sampling, and there are a few other tools that can help you out. There's a tool called sample, which gives you similar data. It's a command-line tool that's really good for finding out why the machine is hanging, for example, or why an application is stuck in a loop: you can run sample and get a stack backtrace. The other tool you may want to know about is gprof, the standard UNIX profiler. We have it on our system, and it generates a text report saying, here's the code that's running. If you want slightly more accurate data, gprof is a better way to go; however, it requires you to recompile your program, while Sampler, MallocDebug, and the others don't, so they're much easier to use. Now, another way we could have performance problems is by using the disk badly, that is, by reading the disk at the wrong time, and so on. In the Carbon session they went through how important it is not to do disk accesses while you're drawing, for example, because of the possibility of blocking and slowing down your drawing, and also to minimize the reads and writes you do during application launch, to make the application launch as fast as possible. So one thing we can try is to understand how the application uses the disk and what files it accesses. Luckily, there's a really cool tool that will help us with this. It's also a command-line tool, it's called fs_usage, and what fs_usage does is basically dump out a text
report for a given process that tells us every single file system call it makes: every open and close and read and write and getdirentries and the like. Now, it has to be run as root, because otherwise it would be a security hole: in theory you could find out what other people are doing with it. Remember, this is the fun of multi-user operating systems, and Scott did not expose his password to you, unlike me in a previous demo. So we can run fs_usage for Thread Viewer and see the reads and writes; in fact we can find the name of the file, and in the far-right column the amount of time each call took. Now, this is a relatively boring example, but you can imagine running this on SimpleText, for example. Actually, that's a bit of take-home homework for you all: go home, try running fs_usage on SimpleText when it starts up, and watch what file system accesses it does to get the list of fonts, get the resources, and the like. You'll be surprised. What we can see here, though, if we get back to my problem, is that every second Thread Viewer is doing an open, an fstat, a write, and two closes. All of these take less than a millisecond, about two ten-thousandths of a second according to the numbers on the far right, but this seems a little wrong to me. We can see that the file we're writing is the Thread Viewer log in /tmp, and we could go and look at that file if we needed to in order to understand what was going on. Actually, don't worry about that. So fs_usage has shown us that Thread Viewer is doing something really brain-dead, and the question is where that brain-deadness is in my code, which fs_usage doesn't tell us. Luckily, Sampler has a mode that will help us with this problem. Rather than just doing CPU sampling, Sampler will let us do several other things. It has a mode that helps us track down mallocs, which is very similar to MallocDebug. It also has a
mode called watch for file actions, which, instead of stopping the program and getting a stack backtrace every 50 milliseconds, does that every time you call one of the system file routines. Or it will crash. Let's try that again. Who did we kill, the Dock? No. Good. One of the problems with Thread Viewer, because it's a performance tool, is that it has a tendency to stop applications while it's looking at them so that it can snarf their memory and the like. And one of the problems with demoing it (I'm not sure why I was silly enough to do that) is that if Thread Viewer manages to crash while it has the program stopped, it has this nasty habit of leaving the Dock hung, which makes for really good demos, because suddenly you're trying desperately to get the system back. Luckily, that didn't happen. Okay, so Scott's got this up, and he can run the program in Sampler for a while, hit Update, and get a list of all the places where these file calls occurred. Here we get the normal call tree starting at the root, at main. It's a little more interesting to use the invert call tree option, and here we can see that there were 380 places where we did a read, 372 where we did an lseek, 128 opens, and the like. I think we were doing opens and writes, so let's click on open. Of those 128 opens, we see a number coming from Core Foundation, but the one that's probably interesting is the fopen call there. Is that it? Yes. It happens to be in the thread view controller's get sample array routine. What's happening here is that in my code for gathering the thread information, for some really silly reason, I put in a little loop that said, I think, something like: open this file, write out the samples to the disk, close the file. Because I'm doing this every second, that's relatively inefficient. It didn't affect Thread Viewer much, but you can imagine that if you had this in your code
you might want to know that you were opening and closing the same file a lot of times. It might be more efficient to open the file just once and then keep writing to it every time I had new data, or I could just yank this code out, because it's actually pointless in this program. Okay, so you've seen two ways to use Sampler: stopping the program occasionally to find out what code is executing, and another of Sampler's cool features, looking at file system accesses. The final type of problem I'm going to tell you about is drawing. All of our applications are graphics-based, at least most of the good ones, and drawing is very important because it's how we communicate with the user. The value of the Macintosh is basically presenting things to users in graphical ways so that they can understand them, so that they can do the creative work and let the computer do all the boring stuff. So drawing is a key issue, and you want it to be as efficient as possible so your application runs well. The problem is that if you do too much drawing, you're going to use CPU time, you're going to use memory because you've got buffers, and you're going to be using Mach messages, as we saw, to communicate with the window manager, which means you'll be blocking. So too much drawing causes a lot of blocking and a lot of extra work, and we want to minimize that. There's a really cool tool done by the Core Graphics team called Quartz Debug. Scott will first bring up Thread Viewer again, our sacrificial victim. Quartz Debug has a number of features, and I'll show you two. First of all, it has an option called flash screen updates: any time something has to redraw any part of the screen, Quartz Debug tells the window server to color that area in
yellow, and that makes the amount of drawing explicit. You can actually see what goes on in the Dock; yeah, we actually do a lot of work there. So that's really cool. If you had some case where, let's say, during the same drawing cycle you were redrawing the same thing twice, this would be a way to tell. Another thing it points out is that the way I handled drawing in Thread Viewer is to erase the entire portion I'm animating and redraw the whole thing; maybe it would be more efficient to redraw only the parts that changed each time I get some new samples. Oh, there's a really cool feature here I forgot to show you. One of the things you can do with Thread Viewer, if the program has something interesting going on that you don't want scrolling off the left (and there's no history, because I forgot to add that), is press the pause button. The pause button stops the application I'm looking at, in this case the Dock, and it also stops the sampling, because the program's not running, so I don't need to gather any data. However, someone wasn't very bright: when he implemented the pause button, although he paused the thread data gathering, he didn't bother to stop the display. The display was on a timer that caused a redraw every second, so it still gets redrawn every single second. This is a bug that would be extremely hard to track down any other way; it would be very hard to see that this happens when you hit the pause button. Even with a tool like sample, excuse me, like Sampler, you might not realize that the reason you were calling drawRect: was that you'd forgotten that bit of logic. The nice thing about Quartz Debug is that it makes drawing problems easy to perceive. It makes them direct, so you can immediately see what the problems are, and that alone is a wonderful feature. Now
another feature in Quartz Debug is called show window list, which tells you about all the windows the window manager knows about. As we can see, there are actually about six windows that are part of Thread Viewer, even though only one is on the screen. Hmm. So Thread Viewer actually has six windows open: some of them are off screen, and one is actually appearing. The problem is that every single window we create, whether it's an off-screen window or, as in this case, a window we created with Interface Builder that just doesn't appear until we bring it up, needs space in the window manager. So it occupies memory, adds to our memory footprint, and adds to the chance that we might be swapping. Once again, this is something you may not know; you might not realize how many windows you actually create, and Quartz Debug gives us a way to find that out: it tells us exactly how many windows we've created. So now I could go in, find out exactly what each of those windows was, and, in the case of dialog boxes, make sure I only create them when I actually need them, as opposed to keeping them around all the time. Okay, I didn't go through all the tools today, and there are a couple you may want to examine on your own. The first one is called sc_usage. It's somewhat like fs_usage, only it looks at system calls like gettimeofday or mach_msg, and it'll tell you how many calls you're making to each; you may find some interesting behavior in your system that you didn't expect. Second, there are a number of tools I didn't mention for examining heap use. For example, we saw a little about heap, and being able to get a text listing of all the buffers you've allocated may be interesting to you, so that may be useful. There's also a command-line tool called leaks, which is like the leak detection in MallocDebug, but its leak detection algorithm is actually a bit better than MallocDebug's: it finds any buffers that aren't referenced from things reachable from well-known spots, so to speak, which means it finds whole groups of data structures that are leaked, and so it's actually more useful. There's also a tool called malloc_history, which I mentioned earlier, that helps you identify, for a given allocation at a given address, like 0xe1c0, who is responsible for it: who actually allocated that block. So if you're in the debugger and find something interesting, you can look it up. There are also a couple of tools for understanding how your application is running. We saw Sampler; sample is the command-line equivalent; and we had a quick introduction to vmmap, which is useful for understanding how virtual memory is laid out in your application. If you're coming over from the Mac OS 9 side, everything is completely different, and it may be interesting to look at that and see how memory is laid out. As I said before, I was really just trying to tease you, saying, these are the cool tools. There are a number of hints that I should stress again, or stress for the first time. The first is that all these tools have a very nice property: you don't need to recompile your code and you don't need to instrument it; you just run the tools and they work. This makes them much more available. It's very easy to go in and look at your own app or at other people's apps; if you're curious about how some of Apple's own applications access the disk, you can use some of these tools to find out. That's a big advantage, so take advantage of it. Second, if you're working in CodeWarrior and using CFM binaries as your output so that you can run on both 9 and X, you need to do a little work to get the performance tools to find
the symbolic information in the program. First of all, make sure you compile your code with the inline traceback tables option turned on. This is part of the code generation settings, and it basically says: put the name of the function immediately after its code in the binary. Second, CodeWarrior gives you the chance to use its own version of malloc instead of the system's version. The CodeWarrior version basically asks malloc for a huge buffer, then subdivides it and hands out pieces, and if you use it, tools like MallocDebug, heap, leaks, and the like won't be able to help you with memory analysis. So make sure you set the option to use the system's malloc; unfortunately, I'm not quite sure exactly where it is. Third, ObjectAlloc, although a nice tool, has the problem that it doesn't understand CFM apps. That's probably not a big deal, because it really only understands Core Foundation and Objective-C. However, if you want to look at your app and at least see how many objects of malloc size 20 you have, which it will tell you, you need to select LaunchCFMApp, hidden in the System folder, and then name the application you're running on the command line, just as if you were launching the application from the command line. Okay, so that's my presentation for today. As I said, in two months you're hopefully all about to ship your apps at Macworld New York, and you want to make sure you give the best impression to your customers, so start tuning your code. The primary thing you want to do is cut memory use in every way you can; that's going to be the best way to make your programs efficient. So take a look at how you're using the heap, take a look at how you're using memory, and take a look at your private memory use. Also remember that just looking at the
program in isolation isn't going to be useful. Make sure to write down some metrics: measure how much memory you're using, measure how long common operations take, decide whether those numbers are appropriate, and then compare them across multiple builds so you can notice regressions. Also remember that your application is not just your binary but also some of the servers you connect to, so in the case of drawing, make sure to look at both the window manager and your application. Go out there and please create some great apps; I'll be looking forward to seeing them at Macworld. Thank you very much, and thank you, Scott. Oh, damn, I forgot to go ahead a slide. If you want more information about these tools: first of all, they're all available on the Developer Tools CD. You've all got a copy of it, so go off and play with them. If you want documentation, all the graphical applications have documentation built in, and all the command-line tools have man pages online, as standard UNIX tools should. There's also documentation in the release notes section. As I mentioned before, there are also books to help you. Inside Mac OS X: Performance is a really cool book that talks about how to tune your application; it covers everything from how the system works to documentation of the tools, and all of us engineers tried to contribute to it. There's also the Mac OS X System Overview book, which is really good for understanding the overall ideas behind Mac OS X; we suggest people read it so they understand some of the terminology. And with that, I'll turn it over to Godfrey. Thank you very much, Robert and Scott. So, this is the last session of the day, and the information resources are the ones we've put up in all of our other tools sessions; the information remains pretty much the same. For the roadmap, we wanted to point you to sessions 121 and 122, even though they've already occurred, so that when you go to the DVDs you'll receive after the show, you'll see
some other areas where we talk about performance tuning. Tomorrow, our last sessions for the tools track are in Hall 2 at 9:00 a.m., the Mac OS X debugging session, and the feedback forum for Apple developer tools at 3:30 p.m. in J1. Please attend; we're very interested to hear your feedback. At the end of the day, if you have questions on tools, you can contact me. I'm the technology manager for development tools; my information is up above, along with the developer tools feedback address, Mac OS X tools feedback group.com.
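The leaked argument buffer described in the session, which was freed at the end of the routine but skipped on an early exit, is a classic C pattern. The sketch below shows one common fix: every exit after the allocation goes through a single cleanup label. `find_icon_name`, `alloc_buffer`, `free_buffer`, and the `live_buffers` counter are hypothetical names used for illustration. They are not Thread Viewer's actual code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical counter so the sketch can show every malloc is paired with a free. */
static int live_buffers = 0;

static char *alloc_buffer(size_t n) {
    char *p = malloc(n);
    if (p != NULL)
        live_buffers++;
    return p;
}

static void free_buffer(char *p) {
    if (p != NULL) {
        free(p);
        live_buffers--;
    }
}

/* Hypothetical stand-in for "grab the command line to find the icon":
   copies the first word of command_line into out and returns 1, or returns 0
   when no icon name can be found. Every exit after the allocation goes
   through the single cleanup label, so the argument buffer is always freed. */
static int find_icon_name(const char *command_line, char *out, size_t out_len) {
    int found = 0;
    char *first;
    char *args = alloc_buffer(strlen(command_line) + 1);
    if (args == NULL)
        return 0;                  /* nothing allocated, nothing to free */
    strcpy(args, command_line);

    if (args[0] == '\0')
        goto cleanup;              /* the "extraordinary" early exit that used to leak */

    first = strtok(args, " ");
    if (first == NULL || strlen(first) >= out_len)
        goto cleanup;
    strcpy(out, first);
    found = 1;

cleanup:
    free_buffer(args);
    return found;
}
```

A single cleanup path is only one option. Restructuring the routine so it has exactly one return point is another way to make the early exits safe.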
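The session contrasts two leak-detection strategies. MallocDebug reports any buffer that nothing points at, while leaks reports anything not reachable from well-known spots. The two can be compared on a toy heap. This is a simplified sketch under stated assumptions, not either tool's implementation: `ToyHeap`, `find_unreferenced`, and `find_unreachable` are hypothetical, each block is a fixed array of pointer-sized words, and `roots` stands in for globals, stacks, and registers.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy heap for illustration: each "malloc'd block" is a few pointer-sized words,
   and roots stand in for globals, stacks, and registers. */
enum { MAX_BLOCKS = 16, WORDS_PER_BLOCK = 4 };

typedef struct { uintptr_t words[WORDS_PER_BLOCK]; } Block;

typedef struct {
    Block blocks[MAX_BLOCKS];
    size_t count;
    uintptr_t roots[MAX_BLOCKS];
    size_t root_count;
} ToyHeap;

/* Conservative test: any word whose value falls inside a block counts as a pointer to it. */
static int block_for_word(const ToyHeap *h, uintptr_t w) {
    for (size_t i = 0; i < h->count; i++) {
        uintptr_t start = (uintptr_t)&h->blocks[i];
        if (w >= start && w < start + sizeof(Block))
            return (int)i;
    }
    return -1;
}

/* Style described for MallocDebug: a block is reported only if no word anywhere
   (roots or any block, leaked or not) points at it. */
size_t find_unreferenced(const ToyHeap *h, int leaked[MAX_BLOCKS]) {
    int referenced[MAX_BLOCKS] = {0};
    size_t n = 0;
    for (size_t r = 0; r < h->root_count; r++) {
        int b = block_for_word(h, h->roots[r]);
        if (b >= 0) referenced[b] = 1;
    }
    for (size_t i = 0; i < h->count; i++)
        for (size_t k = 0; k < WORDS_PER_BLOCK; k++) {
            int b = block_for_word(h, h->blocks[i].words[k]);
            if (b >= 0) referenced[b] = 1;
        }
    for (size_t i = 0; i < h->count; i++) {
        leaked[i] = !referenced[i];
        n += (size_t)leaked[i];
    }
    return n;
}

/* Style described for leaks: mark everything reachable from the roots; whatever
   stays unmarked is leaked, including the interior of lost trees and cycles. */
size_t find_unreachable(const ToyHeap *h, int leaked[MAX_BLOCKS]) {
    int marked[MAX_BLOCKS] = {0};
    int stack[MAX_BLOCKS];          /* each block is pushed at most once */
    size_t top = 0, n = 0;
    for (size_t r = 0; r < h->root_count; r++) {
        int b = block_for_word(h, h->roots[r]);
        if (b >= 0 && !marked[b]) { marked[b] = 1; stack[top++] = b; }
    }
    while (top > 0) {
        const Block *blk = &h->blocks[stack[--top]];
        for (size_t k = 0; k < WORDS_PER_BLOCK; k++) {
            int b = block_for_word(h, blk->words[k]);
            if (b >= 0 && !marked[b]) { marked[b] = 1; stack[top++] = b; }
        }
    }
    for (size_t i = 0; i < h->count; i++) {
        leaked[i] = !marked[i];
        n += (size_t)leaked[i];
    }
    return n;
}
```

Consider a heap with one rooted block, a leaked three-node tree, and a leaked two-node cycle. `find_unreferenced` reports only the tree's root. `find_unreachable` reports all five leaked blocks. This is why a small leak shown by the first strategy may be the tip of a larger one.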
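MallocDebug's two safety nets, as described in the session, are scribbling 0x55 over freed memory and placing DEADBEEF/BEEFDEAD guard words around each buffer. Both can be imitated with a small wrapper around malloc. The sketch below assumes a 16-byte header is enough to keep the user pointer aligned like malloc's. `debug_malloc`, `debug_free`, and `debug_guards_ok` are hypothetical names, and the block layout is an illustration rather than MallocDebug's actual one.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical debug allocator; layout is
   [size][padding][DE AD BE EF][user bytes][BE EF DE AD]. */
enum { HEADER_SIZE = 16, GUARD_SIZE = 4 };
static const unsigned char head_guard[GUARD_SIZE] = {0xDE, 0xAD, 0xBE, 0xEF};
static const unsigned char tail_guard[GUARD_SIZE] = {0xBE, 0xEF, 0xDE, 0xAD};

void *debug_malloc(size_t size) {
    unsigned char *base = malloc(HEADER_SIZE + size + GUARD_SIZE);
    if (base == NULL)
        return NULL;
    memcpy(base, &size, sizeof size);                      /* remember the user size */
    memcpy(base + HEADER_SIZE - GUARD_SIZE, head_guard, GUARD_SIZE);
    memcpy(base + HEADER_SIZE + size, tail_guard, GUARD_SIZE);
    return base + HEADER_SIZE;                             /* 16-byte header keeps malloc's alignment */
}

static size_t debug_size(const unsigned char *user) {
    size_t size;
    memcpy(&size, user - HEADER_SIZE, sizeof size);
    return size;
}

/* Returns 1 if both guard words are intact, 0 if something wrote past either end. */
int debug_guards_ok(const void *ptr) {
    const unsigned char *user = ptr;
    return memcmp(user - GUARD_SIZE, head_guard, GUARD_SIZE) == 0 &&
           memcmp(user + debug_size(user), tail_guard, GUARD_SIZE) == 0;
}

/* Checks the guards (warning on stderr, i.e. the console, if they changed), then
   scribbles 0x55 over the user bytes before releasing the block. A real tool would
   also keep freed blocks out of circulation so stale reads keep seeing 0x55555555;
   this sketch hands them straight back to free(). Returns the guard check result. */
int debug_free(void *ptr) {
    unsigned char *user = ptr;
    size_t size;
    int ok;
    if (user == NULL)
        return 1;
    size = debug_size(user);
    ok = debug_guards_ok(user);
    if (!ok)
        fprintf(stderr, "debug_free: guard word overwritten on block %p\n", ptr);
    memset(user, 0x55, size);
    free(user - HEADER_SIZE);
    return ok;
}
```

As in the session, an overrun is only noticed when the guards are checked, here at free time, and the warning goes to the console rather than stopping the program.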
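ObjectAlloc's three bars (current, peak, and total) and the ring of samples that Thread Viewer keeps can be modeled together. The sketch assumes a fixed capacity of 60 samples. `Sample`, `AllocStats`, `SampleRing`, and the ring functions are hypothetical stand-ins for Thread Viewer's ThreadData handling, not its real code.

```c
#include <stdlib.h>

/* ObjectAlloc-style bookkeeping for one class of object (hypothetical). */
typedef struct {
    size_t current;   /* alive right now (the darkest bar) */
    size_t peak;      /* most ever alive at one time */
    size_t total;     /* every object ever created */
} AllocStats;

/* Hypothetical stand-in for one column of Thread Viewer data. */
typedef struct {
    double timestamp;
    int thread_state;
} Sample;

static AllocStats sample_stats;

static Sample *sample_new(double timestamp, int thread_state) {
    Sample *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->timestamp = timestamp;
    s->thread_state = thread_state;
    sample_stats.total++;
    if (++sample_stats.current > sample_stats.peak)
        sample_stats.peak = sample_stats.current;
    return s;
}

static void sample_delete(Sample *s) {
    free(s);
    sample_stats.current--;
}

/* Fixed-capacity ring: new samples go in on the right; once full, the oldest
   (leftmost) sample is deleted, so `current` stays bounded while `total` grows. */
enum { RING_CAPACITY = 60 };

typedef struct {
    Sample *slots[RING_CAPACITY];
    size_t head;    /* index of the oldest sample */
    size_t count;
} SampleRing;

static int ring_push(SampleRing *r, double timestamp, int thread_state) {
    Sample *s = sample_new(timestamp, thread_state);
    if (s == NULL)
        return 0;
    if (r->count == RING_CAPACITY) {
        sample_delete(r->slots[r->head]);   /* skipping this delete is the leak to watch for */
        r->slots[r->head] = s;
        r->head = (r->head + 1) % RING_CAPACITY;
    } else {
        r->slots[(r->head + r->count) % RING_CAPACITY] = s;
        r->count++;
    }
    return 1;
}

static void ring_clear(SampleRing *r) {
    for (size_t i = 0; i < r->count; i++)
        sample_delete(r->slots[(r->head + i) % RING_CAPACITY]);
    r->head = 0;
    r->count = 0;
}
```

Pushing 1400 samples leaves `current` at 60 and `total` at 1400. `peak` ends up one above capacity, because each new sample is created before the oldest is deleted. That is the same pattern Scott showed: a bounded current count alongside a growing total. If the delete were missing, `current` would track `total` instead.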
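For the log file that was opened, written, and closed every second, Robert suggests opening it once and then simply writing to it each time. The sketch below uses C stdio and assumes a plain text log. `SampleLog` and its functions are hypothetical names, and the output format is arbitrary.

```c
#include <stdio.h>

/* Hypothetical logger: the file is opened once, appended to on every timer tick,
   and closed once, instead of open/write/close on every tick. */
typedef struct {
    FILE *fp;
    unsigned long lines;   /* samples written so far */
} SampleLog;

int sample_log_open(SampleLog *logger, const char *path) {
    logger->fp = fopen(path, "a");
    logger->lines = 0;
    return logger->fp != NULL;
}

/* One formatted write per tick; no open, fstat, or close on this path. */
int sample_log_write(SampleLog *logger, double timestamp, int thread_count) {
    if (logger->fp == NULL || fprintf(logger->fp, "%.3f %d\n", timestamp, thread_count) < 0)
        return 0;
    logger->lines++;
    return 1;
}

void sample_log_close(SampleLog *logger) {
    if (logger->fp != NULL) {
        fclose(logger->fp);
        logger->fp = NULL;
    }
}
```

With this structure, fs_usage should show a single open followed by writes. The open, fstat, write, and closes should no longer repeat on every tick. stdio buffering may also batch several samples into one write.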
