WWDC2000 Session 195

Transcript

Kind: captions Language: en

Hi, my name is Robert Bowdidge, and today I'm going to talk about Apple's performance tools for Mac OS X, and hopefully we're actually going to have a demo.

The way I'd like to start is: why are you all here? Why should you actually care about performance tools, especially today? Well, it turns out that with the changes in Mac OS X, this is a perfect time to be very concerned about the performance of your applications. We're all working on a new operating system, where the libraries that we're used to are now working on different system routines and may not behave the way that we're used to them behaving. As a result, we need to take a look at our apps and decide whether the system calls and the library calls that we used to be doing actually have the same performance that they used to, whether there are any changes in their semantics and what operations they do, and whether there are changes in how they behave.

In addition, some of the algorithms we may have chosen in the past may no longer work as well in Mac OS X, and here are three examples that come out of some of my experience at Apple. The first one is the difference in how the heap is done. In Mac OS X we no longer have fixed-size heaps; instead, the heap will expand as far as it needs to, as long as you keep allocating memory. As a result, the idea of allocating memory and then setting the purgeable bit doesn't make sense anymore, because the operating system is never going to bother to purge this memory. There was one case in the Finder, actually, where they were loading the background image: they would load the compressed image into a buffer, then they would uncompress it into another buffer, and then they had a third copy, a working copy, that was marked as purgeable. The idea was that if the memory was ever needed, that copy would get blown away, and it could be recreated easily. On Mac OS X, at least two of these buffers weren't
really necessary. The idea of the purgeable copy didn't really make sense, because it was never going away, and the copy of the version of the file on disk didn't need to be copied into memory, because we have memory-mapped files. So with cases like that, you need to worry about exactly what your app is doing with memory.

Similarly, polling is much more expensive on a multitasking operating system than when you're only expecting one application to really be taking control of the CPU at a time. If you're sitting around looking out on the network, looking on the file system for a file to appear, or waiting for the mouse to move, those are cycles being used by the CPU that can't be used for other applications. So you don't want to poll on Mac OS X, because you're going to drag down the performance of all the other things that might be running in the background.

And finally, because we're no longer operating in a single address space, inter-process communication becomes a bit more difficult. We can't just pass a pointer and give another app a sneaky way to look into our memory. Instead, we need to explicitly use one of the real IPC mechanisms, such as Mach messaging or TCP/IP, or we need to use shared memory, or we actually need to map memory into both processes using the underlying Mach virtual memory mechanisms.

In addition, many of the tools that we're used to using may no longer work, or may not make sense anymore. A good example of this is Even Better Bus Error. This is a quick and dirty tool that basically makes sure that your app is not writing to or reading from address 0, by putting a bogus value there. On Mac OS X this is not necessary anymore, because the operating system by default makes sure that, for every task, the first page of memory ends up being non-readable and non-writable. If your application tries to read or write to it, boom, it crashes. You get immediate feedback that you're doing something bad. Isn't that nice? In other cases
there are new tasks that may be necessary, and there are other things, such as understanding purgeable versus non-purgeable memory, that no longer matter. So you need to understand different sets of tools. Hopefully what you'll learn today are some ideas about what tools are out there and perhaps what tools are necessary, as well as some ideas about things that third parties could fill in.

So, as an overview, I'm going to start out by talking about two classes of tools. The first set is a set of Unix-like command-line tools that give you information about the low-level state of the system. The second set is some graphical and exploratory tools that give you a higher-level understanding of how your application is running. Some of these may be familiar to you, such as MallocDebug or Sampler. For each tool, I'm going to try to give you a little bit of background about how it's used and what its purpose is, and also hopefully give you enough excitement to make you want to go off and try these on your own and explore them. For each of them I'll also try to give some of the details about how you interpret its data and how to use it to actually analyze your system. However, this is going to be a survey. There's just not enough time to really go into depth about what's going on, so hopefully this will at least get you to explore and ask questions.

Finally, there are two other themes I'm going to try to keep going as I talk. The first one is that I want to tell you a little about how you might approach performance problems. These won't be at a very high level, but hopefully they will be some useful tricks. The second thing I'm going to try to do is give you some little hints about performance problems that I've seen, such as what I talked about on the last slide. Once again, I'm not going to be able to go into detail on these. If you're looking for specific details about how to make your calls to, let's say, Core Foundation more efficient, or to Carbon, talking to the
people who are responsible for those libraries, or going to those sessions, such as the Core Graphics session yesterday or some of the Carbon sessions or the Core Foundation sessions, will give you more ideas about some of the obvious things you should be doing to make your app more efficient.

So, Scott, how are we doing? Ah, okay. So let's start off with command-line performance tools. How many of you have actually used Unix? Good, most of you will actually have a leg up. How many of you think that command-line tools are the work of the devil? Okay. Thank you, Scott.

Well, actually, there are some very good reasons to have these. The tools that we have here are basically meant to be quick and dirty tools to give you information about the state of your machine, and there are three really good reasons why you want to use them. The first one is that they're minimally invasive. That is, when you use these to analyze your system, you're going to get more of an idea about how your system or your application is behaving on your computer, as opposed to how the tool is affecting how your app runs on the computer. The second thing is that because all the tools are command-line tools, you can run them remotely. If you don't want to upset the screen, or if the machine is hung, you can log in via telnet, run these commands, and find out what's going on. And finally, because all of the command-line tools are basically just text-based applications, you can use any of the Unix filter commands to convert the data into a format you like. If you want to see, let's say, every 10 seconds how much memory your application is using, you can easily write a little script that every 10 seconds polls one of the tools to find out how much memory is being used. In this way you can sort of roll your own without having to do anything too deep.

The first tool that I list here is actually ps, which is a standard Unix tool that
stands for process status. It gives you information about what processes are running on the machine, it tells you how much memory is used, and so on. I'm not actually going to talk about that, because there are some other things that might be more useful.

Okay, so let's take a look first at top. top is something that you can use instead of ps to find out about the state of your system. There are implementations of top on other Unix-like operating systems; this one was specifically written by us. What it does, as you can see, is give you a list of the processes, ranked in basically newest-to-oldest order. At the top it gives you information about the status of the system. It starts out saying what the load average is, that is, what the average number of runnable tasks happens to be. It tells you how many processes there are and how much memory. The line starting with "memory" shows you how much memory is wired, that is, dedicated to uses of the kernel only; it also shows you how much memory is active, inactive, blah blah blah. Below that you can see how much virtual memory there is. There are currently 688 megabytes of memory allocated to virtual memory. Not all of that may actually have memory in it, but that's how much the virtual memory system thinks it has. In addition, it shows how many pages have been put out to disk and brought back in, with the page-ins and page-outs. The number in parentheses there is important, because that's a delta that shows you how many pages have changed in the last second.

Why don't we run QuickTime Player, so we actually get something interesting here. We'll simply run QuickTime Player, and let's think of a hypothetical problem. Let's assume that we're working on the player and we're finding that the frame rate doesn't seem high enough, and we're not sure whether we're throttling it down for some reason or whether we're not getting enough CPU. This is actually
not a problem as far as I know, but it's a good story.

So what we can see here, on the second line, is LaunchCFMApp. Here's a bit of trivia: the QuickTime Player is actually a PEF executable. It's in the same format that you would have seen on Mac OS 9, and as a result, whenever you try to execute one of those on Mac OS X, LaunchCFMApp serves as a wrapper to load it into memory. That's why you don't actually see QuickTime Player in the list of processes. What we can see is that the QuickTime Player is using about 25 to 30 percent of the CPU. We get the elapsed time, the number of threads, and the number of Mach ports, which are an abstraction for communicating between the kernel, the system, and the application.

Other interesting things include RPRVT, which is the amount of private memory, memory that is only for this particular running copy of the application. That's different from the memory that can be shared between multiple copies of QuickTime Player, if we had multiple ones running, or with other applications using the same libraries. So RPRVT is a good measure of how much memory your application is using right now. RSHRD shows how much memory is being used for the application itself, which can be shared, and all the libraries, which can be shared, and memory-mapped files and all those things that aren't dedicated to only one application.

Now we can look at this and say: gee, we're only using about a third of the CPU, what's going on here? Are we spending too much time on disk? Are we throttling? Well, one thing we can do is look down the list and understand whether our application is doing anything bizarre that depends on the rest of the system, and in this case it certainly is. We see at the bottom... actually, can everyone see the line starting with 50 Window Man... never mind, I'll just read it out. Down towards the bottom there's a line that says 50 Window
Manager. It's using about 20% of the CPU, it's run for about 51 seconds, and so on. What's happening here is that the Window Manager is responsible for doing the drawing to the hardware, so all the applications end up talking to the Window Manager. It's not too surprising to see execution divided between the two, because the QuickTime Player is spending some of its time getting all the images ready, it's shipping them off to the Window Manager, and then the Window Manager blasts them up on the screen. So we're seeing that we're spending about 50 to 60 percent of the CPU actually doing meaningful computation.

What's happening with the rest of the time? Well, there are some other tools that we could use. Let's go on the hypothesis that maybe there's something going on with the disk. Before I move on, one of the nice things about top is that it has a huge number of extra modes and features hidden in it. Please check the man page; there's probably some view that's perfect for a performance problem you're trying to track down, but I'm not going to show them all.

If we're trying to look at the file system, though, and understand how we're using that, there's another command that might be useful, and that's called fs_usage. With fs_usage, we either name an application or name a process, and we hit Return. Actually, just do QuickTime Player... or actually, do everything. Okay. What we're going to do is get a huge amount of information, and if Scott hits the spacebar we'll start seeing it. What you're seeing here are all the accesses to the disk that are going on, so you're actually seeing the file system calls being performed. We can see what was being done, like a read or a write, or page-ins and page-outs, or getting the status of a disk, that sort of thing. We find out how much time elapsed, whether the process actually had to give up the CPU to another task to let that transaction finish, and the application responsible. If Scott
actually widens that window, we'll get some more information. We'll know exactly which file handle was being accessed and how many... yeah, there we go. It'll actually say what file handle it was in the process and how many bytes. What we can see here is that the QuickTime Player is getting chunks of 16 thousand bytes and 32 thousand bytes, and we don't see any cases where it's having to wait too long. That probably means we're not doing anything too weird with the disk, and we're not waiting for stuff to come off the disk.

Another thing that we could check is how the memory is laid out. Are we using a lot of malloc space and the like? Or, if we were just curious about how applications are laid out in memory in Mac OS X, we might want some sort of tool for visualizing that. There's another command-line tool called vmmap. What we can do is name vmmap and specify the process ID or the name of the task, and vmmap will give us a listing of all the regions of memory, where they start, and how much space they take. It will start off telling us only the readable regions, the non-writable ones, and then it will tell us the writable ones at the end.

What we can see here of interest is, on the first line, a symbolic name, __PAGEZERO. We see a starting address, which is 0, and it's 4 kilobytes. Then we see the permissions, which are in the Unix-style octal, and everybody knows how to read octal, of course. It's saying that page zero is actually 0/0, which means that it's non-readable and non-writable. That's the thing that's saving us from page zero accesses. If you try to dereference, you know, a pointer which actually has the value 12, you'll know about it. You'll be able to catch those immediately, and you're not going to have to worry about strange memory corruption and the like. Below that, we can see the application starting at address 1000. There are a couple of places that are marked as guard, which are guard pages, which are again non-
readable, non-writable pages at the end of the stacks for the various threads, so that if you fill the stack, it's not going to crash the system or trash memory; it's simply going to crash when it hits that page. You can see all the libraries starting at address 4130 and going down, and you can see the names of the files that are being loaded as libraries along the right-hand side. If this is too small, don't worry. Try it at home; hopefully it will make perfect sense. If we go down a little further, we'll see the writable regions, and here we can start seeing things like the malloc-allocated regions. So we can find out where the malloc areas were placed, and most of the malloc buffers were actually placed right below the application.

Another thing we might be asking ourselves is: is the application running slowly because we're doing some obnoxious system call that's just hanging forever? There's another tool called sc_usage. What sc_usage will do is look for all the Mach system calls going down into the kernel. It will tell us which ones we were calling often and how much time we were spending. What you can see here is some information about how often the app got preempted, that is, how often the CPU gave execution time to somebody else. We can find the number of context switches, blah blah, all that interesting stuff. The second section shows us how much time we spent idle and how much time we spent busy. What we're seeing there is that we're spending a lot of our time in user mode running the application, and a fair amount of time waiting in the app, probably because we're doing a lot of disk accesses. Below that we find the most popular system calls being made, and we find we're actually spending a lot of time in semaphore_wait and mach_msg_overwrite_trap. Okay, you might say, gee, that's weird, maybe we're locked on a semaphore. That's a very good guess;
unfortunately, it's not completely true, because in a lot of applications on Mac OS X there will usually be one or two threads that are basically waiting for something really bad to happen. They sent a message off to the system saying "let me know when something bad happens," and they just sit there in mach_msg_overwrite_trap, which means: send off a message, overwrite the buffer when it comes back, and wait until we get a message back. Nothing ever comes back, so they're constantly waiting. Understanding that those have huge wait times doesn't necessarily buy us anything. However, in some cases, understanding that we're spending lots of time doing semaphore signals may tell us something about how our app is running, for instance that we're spending too much time waiting on critical sections or something.

Let's see, what else do we want to show? I guess that's about it. Okay, so those are the command-line tools. Everyone who had covered their heads because they were afraid of them can now come back up, because we're going to look at things that look nice and that don't use any nasty technologies.

The next things I'm going to show you are some graphical tools that tend to give you slightly higher-level information. They don't give you quite the immediacy, but hopefully they will help you understand what's going on. The first of these is called MallocDebug, and the point of MallocDebug is to help you understand how your application is using heap memory. For every allocation that your app makes, it keeps track of how much memory was created and where that memory was created, and it gives you a way of seeing what's currently allocated in the system. It's really good for answering questions like: How much heap memory is my application using? Am I using 500K? Am I using 10 megabytes? Are there any places where I'm using large chunks of memory? Am I allocating 3-megabyte chunks for some array that I don't know about? Are there places where I'm overrunning or underrunning buffers,
perhaps trying to trash somebody else's memory, which is a great way to make subtle memory bugs? Are there cases where I might be leaking memory, where I'm allocating things but forgetting to free them? In all cases, what MallocDebug is going to try to do is give you information about how you're creating memory, using malloc as the core idea. Unlike some other tools you might be using, it tries to give you a snapshot of how much memory you're using right now, as opposed to showing you memory that you'd allocated before and that's been freed, for example. So it's only a snapshot.

The way MallocDebug does this is kind of cool. It has its own version of malloc that's been instrumented, and it slides that version of malloc under your application when you launch it. As a result, it's very easy to use. You don't have to worry about recompiling your code to make sure that this new library is used, you don't have to change any source, you don't have to do anything. It just works, and that's one of the advantages of these tools. In addition, because we have our own version of malloc, when malloc is called we can keep track of the call stack and find out how you actually got there. And every other allocator in the system, whether that's in Core Foundation, in Carbon, or in Objective-C, eventually goes through malloc, so this is a single point at which to find out how you're allocating memory.

So let's do a demo here. Here's the MallocDebug window. Scott can either select the application by pressing the Browse button and going through a browser, or choose it off a drop-down list. He can then press Launch to actually start it up. Why don't we update it to see what the current status is. We find that in this case we're launching SimpleText, and we find that we're allocating about 700K to get to the point where we've actually started the application. And what
we see in the window below is basically a call tree. It shows us all the ways that we ended up calling malloc. So, for example, from start we called _start, and eventually we got down to main after going through some system stuff. main called malloc through about four functions: either calling InitCursor, or DoInitialize, or DoEventLoop, or some strange hexadecimal value there. Actually, let's go through that hexadecimal thing. So 4110 is actually another little secret. As you might know, 411 is information, and this actually is the place where you go to load dynamic libraries; that's where you get information about how to call other functions. Cute, huh? What happens is that when your application launches, it tries to load all these other libraries, and as a result it has to call the initialization routines for each of these libraries. Down inside that call, that implicit call that you didn't actually have to make in your code, the initialize High Level Toolbox, initialize QuickDraw, and initialize CarbonCore routines all happened automatically, and we notice that they allocated about 400K. So a good deal of the memory that was allocated during launch was actually in these initialization routines.

Now, going from the top down is sometimes interesting, especially when you know your code, but sometimes it's interesting to see why or how you got down to malloc, and what was happening down at the other end. We can not only show the tree from this side, but we can also invert it and change the style of the tree. So now, rather than looking at how we got from main down through the program to malloc, we're going to look at malloc and at the ways that malloc was called. For example, if we select malloc, these are the ways that malloc was called: allocate memory called malloc, add usage called malloc, global cache allocate
called malloc, and for each of these we can get some idea about how malloc was called and what the reasons are.

Let's go through one little example here. Actually, let's do the valloc one. It's too bad we didn't have a better example, but this will do. We can select valloc, and we see that we actually have a 65,000-byte chunk that was allocated through one of the calls down that way. valloc was called by allocate memory, which was called by allocate zeroed memory, which was called by NewHandle. I would prefer to have a better example than this, but this one will do. Here's another little bit of trivia. What's happening here is that in Mac OS X, when you create a new handle and create the memory attached to it, handles are actually suballocated. There's a big block of space that's been subdivided into handle-sized spaces, and somebody's got to create that memory. So the first time that you call NewHandle, it goes and creates the suballocated field, and that 64K chunk is the place for handles to live. It may not make sense, it's a system-level idea, but the point is that we can track down from this call graph what the purpose of that memory was, especially if you're looking at your own code and not at the innards of the Memory Manager.

What we can also do is go to something a little simpler, like malloc, and... why don't you select one of the buffers down below? No, actually, um, yeah, select a buffer. So you also get a list of all the allocations. Not only do you find how you got there, but you find a list of the buffers that were allocated by calling down that way. It'll tell you the address that the buffer was allocated at, the size, and so on, and if we double-click on it we get a memory dump, so we can actually look at memory. This is really useful if you find you're allocating six thousand bytes somewhere and you're curious why. Now you can double-click on it, take a look, and try to understand why that memory was
allocated.

Now, one other thing that you can do... actually, press the Back button... actually, quit. Um, can we run the leak example? So one of the other things that you can do with MallocDebug, as I mentioned, is a bit of analysis to find cases where you overran or underran buffers. These are really nasty bugs, because they tend to be intermittent, they tend to be really subtle, and they tend to only occur after the program's been running for a while, and then suddenly it crashes. So you'd like to track these down. What MallocDebug does... you can do Update, and then let's do inverted, or actually go to Trashed... is let you change the mode from showing all the currently allocated regions to showing only what are called the trashed ones. If Scott selects start, we see that there are two buffers that are trashed, that is, where we know we overran or underran them. The way we know that is we have some guard words on either side, and when those get overwritten, we know that we did something bad. If Scott double-clicks on one of those, you can see the ten bytes, the ten zeros. What MallocDebug does is, when it allocates space, it puts two special strings on either side. It puts the hex value BEEFDEAD at the back end of the buffer, and then, if Scott presses Back, you'll see that it puts DEADBEEF at the beginning, and the last word down at the bottom. So when those words change, MallocDebug knows you've done something bad. It also, in an extremely user-friendly fashion, ends up putting a message out to the console. Yes, we need to fix this, but it will give you some indication of when it notices that something gets trashed. So keep the console window open if you can when you run MallocDebug, so you can see this.

In addition, let's try leak analysis next. There's also the idea of leak analysis. Okay, now, for those of you who've used ZoneRanger: ZoneRanger has an idea about leaks, and what
it considers a leak to be is any memory that you allocate but then don't deallocate in some operation that should have cleaned up after itself. For example, opening a document and closing the document: if you have more memory allocated than you started out with, you've probably got a leak. MallocDebug goes off a definition that's more like Purify's: any memory that cannot be reached by a pointer probably can't be referenced, and therefore it is leaked. So we go with that definition.

The way you do leak detection is that you start the app up and change the selection mode to Leaks. MallocDebug will now scan through memory looking for anything that looks like a pointer. If it's a pointer, it sees whether that's a pointer to the beginning of a malloc region, or to a handle which points to a memory region, or a couple of other options. If it does find a pointer like that, it marks the block as reachable. If it doesn't find one, then it says the block is not reachable and it's probably a leak. After a little while it comes back and shows us only the allocations that would have been leaked. What we can see here, if Scott goes down a little further, or actually goes from the inverted side, is that we find about 182 thousand bytes.

This is not completely true, unfortunately, because there are a few cases of false positives in system routines, and let me just step through a couple of them. The calls to malloc from global cache allocate, which are from ATS alloc, are cases in the font code, and this is a case where they're doing some interesting things with pointers that this doesn't detect, and we can basically ignore those. Yes, this is ugly. There's an internal version I hope to roll out at Apple real soon now, like in the next week, to get around this problem. It didn't make it onto the CD; hopefully we can put it on the website. But for now, hopefully these will give you some hints on how you
can actually look at this stuff, and then laugh at me every time. So we can select global cache allocate and say, okay, this is immaterial. Let's select the Prune menu, the Path item, and we can pull that out so we don't have to look at it, so we're only looking at the things we think are leaks. Similarly, the allocation with new block turns out to be a case in Icon Services; prune that out. The case in valloc is the handles; we can prune that out. It will be better in the future, yes. Eventually we get down to the point where the only things left are things that are probably leaks. There's some documentation on this in the release notes. Like I said, we'll have a better version that will do this a little better, but this is a way to start looking for memory that might not be reachable. If you find memory that you allocated in your application that's leaked, this is probably a good indication that you might have a problem.

Okay, let's move on. Hmm, well, that's probably a good idea. How are we doing on time? Okay.

One of the things that's really nice about ZoneRanger is this idea that you allocate memory, or you create an object, you destroy it, and hopefully the amount of memory doesn't change. That idea is really nice for understanding the effect of certain operations, and we can do something similar. Let's say we're having a slowdown when we start typing in SimpleText, and we're curious why that's happening. We can try to see if we're doing a lot of allocations or a lot of work. What we can do is select... go back to All, or actually, that's fine. We can mark a point in time by pressing Mark. We can then go and type into the buffer, to do the event that we're trying to watch. Then we can go over to MallocDebug and change to show only the new nodes, that is, only the newly allocated memory, and we'll find after a moment... did we actually press Mark? Oh, there we go,
we find that we allocated 400,000 bytes. Oh my gosh, what are we doing? Actually, there's a good reason for this, but it's a great example. If Scott actually... actually, we should go to Standard for this one, because it's an easier way to see. This is why it's exploratory: you have to sort of dash around and explore, and it makes it an interesting thing to demo. What we end up finding is that most of that memory, if we descend down the biggest path, is inside something called voices thread. What's actually happening here is that SimpleText has been voice-enabled, so that it can do text-to-speech. To speed up the load time, which is something good that you should care about for performance, of course (that was one of those performance minutes), you want to make sure that you do as little as possible when you're launching the app, and maybe do the rest later. This is a case where they're doing that: they don't bother to load the voices until a few seconds after the window has actually appeared. One of the things that they have to do is load in the voices and build all the data structures to make sure that text-to-speech actually works. So it's okay that it's delaying things, and that's a cute trick to improve the performance of your apps. Thank you very much.

Okay, let's go through a few little moments. One thing I'll point out is that the idea of the call trees may be a little weird. As I said, the idea is that every time you do a malloc allocation, we get this call stack of all the ways you got down to malloc. Those can be thought of as the vertical lines on top, or horizontal lines on top. When you look at the normal tree, what it does is collapse together all the things at the main end of the tree to overlap the similarities, so that you can see where it starts to diverge. And notice this is a tree, so we don't pull it back together again at the other end. Similarly, when you do the
inverted tree, we do the opposite: we start collapsing things together at the malloc end, to find the ways that we called malloc that were similar, so we can start seeing where things diverge from that end. Another one of the little issues we should probably cover, just to explain MallocDebug, is the question of leak detection. As I said, the way that the leak detection works is to go scanning through memory looking for pointers to buffers that are allocated by malloc. There are cases where leaks won't be noticed; this is just part of the problem with doing leak detection. In some cases there may be a value in memory that looks like a pointer. You may have, you know, 0x5F000000 because you've got a null-terminated string. In those cases a random value and a pointer to something that's actually a malloc buffer might not be distinguishable; you can't tell why that stuff was put into memory, and as a result you might get cases where things that are actually leaks are not reported as leaks. Similarly, there are other cases where leaks don't get detected. This garbage detection algorithm is relatively simple; anyone who's played with them should immediately see some holes. One of those is that if you have a list of circularly linked structures, so you've got a big loop of things, every object points to something else, and therefore all of them are referenced, and so they'll never be detected as a leak. Similarly, a tree of data structures will only have its root unreferenced, and therefore you may only see, let's say, a 20-byte leak when you're actually leaking a huge data structure. So always pay attention to even small leaks, just in case. Now, I mentioned that there were a number of problems with various system routines that were doing clever things with pointers. In general, what was happening is that our definition of a
leak depends on whether there's a pointer to the beginning of a buffer: if there's a pointer to the beginning of the buffer, it's reachable. However, in some cases, in your own code or in other people's, there will be pointers into the middle of a buffer for various reasons and no pointers to the beginning, usually because the code is trying to hide secret information at the beginning. In those cases MallocDebug is not going to be able to do leak detection correctly. The next version may help. Another issue to keep in mind is my favorite question, or comment. People constantly come to me and say, "This tool is horrible. You know, I use it and all my application ever does is crash." This is actually the same problem that people had with Even Better Bus Error: "Gee, every time I use this my machine crashes. Why don't you write better software?" I love hearing that story from the guy who wrote that. But what's happening is that MallocDebug is trying to tell you something. It's trying to tell you something extremely loudly: you're doing bad things with pointers. Okay, there are a number of operations that can cause subtle and intermittent memory bugs. Examples include overrunning or underrunning a buffer, so you trash somebody else's buffer, or freeing memory and then continuing to use it and modify the values even though somebody else has now got that memory and you're trashing their values. MallocDebug tries to solve both of those problems. The first thing it does is that every time you free memory, it overwrites that memory with 0x7F to make sure that there's absolute garbage in there, and hopefully, if your app tries to read that, you'll notice. The second thing is that, as you saw, overruns are guarded with 0xDEADBEEF and underruns with 0xBEEFDEAD, so if you end up trying to access beyond the buffer you're going to get a bogus value. As a result, you may see your program behaving strangely: you may see odd values in variables that shouldn't be there, or you may find your application crashing
when trying to access address 0x7F7F7F7F. When you get crashes in your app under MallocDebug that don't happen normally, the first thing to do is go to the preferences panel, which has the clear freed memory option; turn that off and try it again. If your app runs, then you're doing bad things with freed memory. What you can then do is run the program inside gdb using MallocDebug's special library; there's documentation on this in the release notes, and the debugger will drop you off exactly where you should pay attention. The final bit of information about MallocDebug is about taking its advice. Once again, MallocDebug is primarily a tool for exploring your data. It's a really good tool for the writer of the code to look with, because the writer understands their own code and may be able to say, "Gee, that's odd." There are still uses for this in testing: if you have cases where you're leaking memory, if you've got block underruns or overruns, or you're referencing freed memory, that's a red flag; there's something to be fixed. In terms of exploring, I can't give you very good details about how to explore. Basically, go off and see what's out there. See if you've got any really big allocations, see if you're allocating a lot of really small things that you didn't expect, look for odd cases, look for patterns. The best advice I can give you that's really concrete is that I tend to find it much more useful to use the inverted graph rather than the standard one, but that may be because I tend to look at the system libraries a lot more. So hopefully you find this useful. The second tool that I'd like to show is a tool called Sampler, and you can think of this as a really cheap profiler. What Sampler does is, every 20 milliseconds or every 50 milliseconds, it stops the program and says, "Hey, where are you running?" It actually gets the call stack for all of the threads that are currently running, so it knows the current point that's executing. Like MallocDebug, it provides
basically the call stacks, so that you can browse through those and try to find out exactly how things are running. Now, the reason why Sampler is good is that it's extremely easy to use: you use it, it works. You don't need to recompile your libraries or recompile your application like you would with profiling, you don't need to have special profiled versions of libraries, and you don't need to make any changes to the code. It just works. You can run this on any of the applications on the system, and in fact all these tools are on the CD, so please go out and play with them. In addition, because it's only stopping the program every 20 milliseconds or 50 milliseconds, hopefully it will be doing very little to the application's running behavior, as opposed to, let's say, doing full profiling. So this may be a way to get really cheap data to find performance problems that should be explored in more depth. I'll also point out, just in passing, that there's also a command-line tool called sample, where you just type sample and the process ID or the application name, how many seconds to sample for, and the interval, and it will put out a text-based report saying where it found the program's execution. This is really good if your application hangs or if it seems slow, so that you can actually track down what the performance problem is, and it's really good for cutting and pasting into a bug report. So let's do a demo. Okay, so here's the Sampler UI. Once again, we can select an application and we can launch it. Let's actually change the sampling rate to 20 milliseconds, and then we can launch and sample it, and we can see how SimpleText launches and what's going on during that. Eventually the window will come up. There we go. We can stop sampling, and now we get a set of call stacks showing what's going on. We'll start off with the extra threads, so thread 2. There are 155 samples: 155 times when it stopped the program, it found execution in thread 2, and all those
were basically in mach_msg_overwrite_trap. Okay, so it's basically sitting there waiting for a message; we can ignore that. We can actually add that to the excluded stacks down at the bottom to get it out of our view. Thread 1 is pretty much the same way: except for about 8 samples, it's basically sitting there doing nothing, so we can ignore that one also. Then in thread 0, if we click on the 1000 block and start and main, now we can start finding out what was going on. We had 158 samples at 20 milliseconds; that's, what, about 3 seconds. Most of the time was being spent in the do event loop. The WaitNextEvent is pretty trivial, that's just when it's spinning, so we can ignore that. And the last 6 samples were actually, you want to go down to that. You can see the call stack on the far side showing the entire tree, so you can see that we were in the do event loop, which called handle event, which eventually called resume for the current event, so that was doing the setup for the app. This is a relatively uninteresting example. Feel free to go off and try your own code, and hopefully you'll find some very interesting things about how your app is running and where Sampler is finding it. There's also a way that you can invert the call graph, so you can look from the bottom up and find the common functions that it found the app running in. If you find that your functions are listed down here, that probably means you have a tight loop and you're spending all your time there. Often you'll find that the application was stopped in system calls when it was sampled, and that's why you're seeing calls to string compare or to mach_msg_overwrite_trap and the like. Okay, there's one big caveat I should mention. Although I've said that this is a cheap method of profiling, remember that Sampler is not providing comprehensive, accurate data. It's sampling; it's a statistical approach. That means that it's not going to show you all the calls that are actually happening, just the ones that
were happening when it decided to stop the app. Second, the numbers refer to how many times it found the app in that function, not how many times that function was called. If we found 150 samples in main, or in some arbitrary function, that could mean it was called 150 times; it could mean it was called once, but every time Sampler looked, it was in that function; or that function could have been called 150 thousand times and we just happened to see it when it was in there. If you have small, quick-executing functions, those are going to appear statistically, based on what percentage of the time they actually take to execute. So with longer sample runs and smaller sample intervals, you'll start getting better data and you'll start seeing the smaller functions appear. In addition, because this is sampling, there's the question of sampling error: what are we going to see when we stop the program? Well, because the way Sampler works is that it takes control of the CPU and the other process stops, the other application is going to be at a preemption point, and so wherever the operating system decides is a good time to stop the thing is going to be where you see it in Sampler. That could either be because it ran out of time and the operating system took control away, or it could be because the application made a system call and the operating system said, "You're never going to finish this in time; I'm just going to give control to someone else while you're waiting for this disk access." So you may see disk accesses, and you may see some of the system calls, much more frequently than they really appear. Okay, let's not worry about ObjectAlloc. Actually, let's do it; let's just demo it quickly. Another tool that's available is ObjectAlloc. This is a tool that was originally intended for Objective-C, but it can still be useful for programming in Carbon or for programming in just C. The idea is that this is trying to be a lot more like Zone
Ranger, but it's trying to give you ideas about how fast you're adding data, how much data you're using, and how quickly it's increasing. What it does is show you a histogram, and it divides up all the allocations based on the class of the object, so you can see how many CFDictionaries you had, how many NSStrings, and, in the case of just plain malloc allocations, it just says something like malloc-46 for 46-byte malloc allocations. What the histograms show you is, first, for the darkest bar, the current number of objects of that type existing in the system; the next lighter bar represents the maximum number of objects of this type that ever existed at once; and the final bar shows you how many objects of that type have been allocated in total. So watching this run can give you an idea about how your app might be behaving in general, and might give you some hints about objects that you're creating a huge number of that might be performance problems. There are some other features in this and in the other tools; please go play with them. Okay, so let's do one example here. Let's talk about how we'd actually debug something for real, and the example I'm going to use is one of my own things, so that I can be very embarrassed. Specifically, it's the MallocDebug leak detection. What happened was that when I added support for Carbon memory, I found that leak detection got much, much slower, about a 10 times slowdown. This was very bad. However, I'd only changed the algorithm in small ways, so I was extremely confused about what was going on. What I needed to do was use multiple tools to understand exactly what was going on, and this is something you're probably going to find: you really need to play around and look from different angles to find out why something's behaving less than optimally. The first thing I did was run Sampler. I had Sampler look at my process, or rather at the
application when it was doing the leak detection, and what I found was that most of the time was actually being spent in a call known as vm_region, which is a system-level call that will tell you which parts of virtual memory for a specific process are actually mapped in, which ones don't exist, and whether they're readable or writable. This was important when I was checking a pointer, for figuring out whether there was anything at the other end, so that I could read that data without worrying that the system was going to crash, or actually that the application was going to crash, because the system won't crash in OS X, thank God. The solution was that this data didn't change during the time I was doing the analysis, so I found I could actually cache it, and I improved the speed by about a third. The second thing was that I started listening to my machine. I used my ears, another tool, and I found that the disk was chattering away. Using top, I looked and found that I was swapping about two thousand pages a minute. Okay, so my machine was basically spending all its time throwing pages out to disk and bringing them back in. This is not very efficient unless you happen to be a disk drive. What I found was that although it was spending all that time swapping, all the execution was being spent in my code; it wasn't doing other I/O, you know, it was just trying to swap. What it turned out to be, after commenting certain parts of the code out, was that I was checking for pointers in places I shouldn't have been, in places that were read-only memory that couldn't have had reasonable pointers in them. As a result I was searching around in a lot of places, and because of the change in algorithm, suddenly I was looking at a lot more pages in random places. Instead of sort of linearly passing through memory and looking at only a few places, I was looking everywhere randomly and causing huge performance problems as a result.
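The vm_region caching fix mentioned above can be sketched roughly as follows. This is only an illustration in Python under stated assumptions, not MallocDebug's actual code: `RegionCache` and `query_regions` are hypothetical names, and `query_regions` stands in for the expensive system query (vm_region in the talk). The sketch depends on the same condition the speaker relies on: the region data does not change while the analysis is running.

```python
import bisect


class RegionCache:
    """Snapshot of mapped memory regions, queried once and reused.

    Hypothetical sketch: each pointer check becomes a binary search over
    the cached snapshot instead of a fresh, expensive system query.
    """

    def __init__(self, query_regions):
        # query_regions is an assumed stand-in for the costly call; it
        # returns (start, end, readable) tuples with end exclusive.
        self._query = query_regions
        self._regions = None
        self._starts = None
        self.query_calls = 0  # counts how often the costly call ran

    def _load(self):
        self.query_calls += 1
        self._regions = sorted(self._query())
        self._starts = [region[0] for region in self._regions]

    def is_readable(self, address):
        # Load the snapshot lazily, the first time it is needed.
        if self._regions is None:
            self._load()
        # Find the last region whose start is <= address.
        index = bisect.bisect_right(self._starts, address) - 1
        if index < 0:
            return False
        start, end, readable = self._regions[index]
        return readable and start <= address < end
```

With a cache like this, a leak scan could ask `is_readable` for every candidate pointer while paying for the region query only once. The snapshot would have to be rebuilt if the process's memory map could change during the scan.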
What I was able to do was minimize the number of out-of-order checks and tighten up the checks on which other pages I was going to look at, and as a result I got the slowdown down to about a factor of two. Since then I've looked at my algorithm and gotten it down to only ten seconds from thirty, which was pretty cool. So the take-home lesson here is: plan to use lots of tools, plan to explore lots of parts of the system, and plan to learn a lot of trivia. Welcome to the world of performance. Okay, for those of you that are planning on porting to Carbon: yes, we've heard great testimonials about people who went off for lunch and converted their app over to Carbon, and that's really good, and in fact in a lot of cases that's probably good enough. However, as I gave you examples, there may be places in your code where there actually are mismatches, algorithms that don't quite match the new world, and so porting is only going to be half the work. You're going to have to look at the app, you're going to have to understand how it works, and see if you can find any performance problems. Plan on using these performance tools, plan on using multiple tools and exploring, as I mentioned before. In addition, remember that one of the things that we're getting with Mac OS X is a huge number of pieces of infrastructure that are really going to help us out, so plan on looking at them and deciding what you can actually use. Examples include memory-mapped files. You don't actually have to read stuff from disk into a buffer; the operating system will kindly just map that file into virtual memory, and when you try to touch a page it will actually map it into memory for you. So you don't need to keep multiple buffers around, and in fact if you try to keep those buffers around you may be being too clever, because the operating system may be keeping a copy of the memory-mapped file in your address space, and so suddenly you've got two copies. Similarly, we
now have pthreads, a really nice thread implementation. These are threads at the level of the operating system; they don't have a lot of overhead because they're part of the OS. So plan on looking at pthreads and seeing if you can exploit those. Finally, we also now have POSIX file I/O, and there may be cases where that's much more useful to you than the standard Mac OS Toolbox, so take a look at that and see if it'll help you in some cases. In addition, a certain vice-president who shall remain nameless hacked on a few weekends, and, excuse me, let me rephrase this: certain people high in the company happen to be very interested in algorithms and happen to be very interested in malloc. One of the problems on many Mac OS compiler implementations was that the malloc implementations used to be really bad, and a lot of people have used suballocators instead of going through native memory management because they want extra efficiency or they don't think the performance is going to be that hot. We have a really nice implementation of malloc thanks to someone's nights and weekends, so think twice about using suballocators. Try the new malloc: it's really efficient, and there are some really cool new little features in it. Go play. And finally, I will repeat again and again: polling bad, blocking good. Don't sit and wait for something to happen; have the OS go off and tell you when it's done. Have the OS take control away from you and give it to someone else so that other processes can actually run, and you'll have a nice feeling of smoothness all through the app instead of having yours take up CPU. And as a final warning, in a horrible place: some of these tools do work with PEF binaries, some of them don't necessarily work so well, and we want to improve that. MallocDebug and Sampler currently do not identify the PEF symbols, so you're not going to see the symbols in your own application. If you're running Mach-O native binaries, this isn't a problem. This is something
that didn't get onto the CD. Hopefully we can put it out on the developer website so that everybody can use it, but plan that the version on the CD may not do a good job with PEF binaries, and that you may not be able to see much about your program. So to conclude: Mac OS X is really cool, but there are a lot of differences between it and how you used to work. The algorithms you use are probably going to need to change, so take a look at them, use the performance tools to analyze them, and have a great time with native Mac OS X applications. Thank you. If you have questions, or actually if you want to use the tools: the command-line tools are in /usr/bin, and the graphical tools are in /System/Developer/Applications. Documentation is available as man pages for the command-line tools, MallocDebug and Sampler have documentation in them, and there's also a nice release note on MallocDebug explaining some of its idiosyncrasies for this particular release. If you've got questions or feedback, send mail to macosx-tools-feedback; it goes, I believe, to the entire group. We'd love to hear your comments and suggestions about other tools that are really necessary, because we're all going to learn what's really needed when moving over to Mac OS X. If you have any other issues, Godfrey DiGiorgi is our technology manager for the development tools group, and I will bring him up so that he can tell you about the other forums. Oops: that's at group.apple.com. Thank you very much. Okay, we have about 12 minutes for Q&A, so it's [Laughter] whatever. Roadmap for the next two sessions in the tools group: Debugging Applications on Mac OS X, tomorrow morning at nine o'clock, and Carbon low level would be another good session for people interested in performance. And why don't we just get our whole ...