WWDC2004 Session 311
Transcript
Good morning, everyone. Welcome to using performance analysis on Mac OS X. Let's dive in. I'm Dave Payne, working on the performance tools. This morning we're going to cover some of the concepts of performance analysis, just general concepts for any system, then look at the performance tools that we have available for Mac OS X, and then dive into a case study of real-world use of the tools. Then we'll show some of the cool new work that we've been doing to integrate the features of the Sampler profiling tool into Shark, which is another profiling tool, to combine them and add a bunch of powerful new features to help you find performance problems faster in your applications.

So why are we interested in this? After all, the machines are getting faster and faster, right? Just buy newer hardware; that's really cool. Well, there are a lot of jobs where, no matter how fast the hardware is, people always need more power. You need it in your development tools, for example: faster compilations, etc. In general, I don't want to have to go out and buy another gigabyte of RAM to run this application, or these three in combination. I'd like my laptop battery to last all the way back to Dallas on my flight. A lot of that is how much I'm beating on the hardware, the CPU and the memory. When the fan kicks in, well, what's going on there? That takes power as well on the laptop. And in general we want applications that play well together with others. It really is important to reduce the overall amount of memory that your application is using, because I'm running more and more applications on my system every day. You guys are creating cool apps, thanks, but I don't want to have to page tremendously when I'm switching between those applications.

So rather than just diving in and saying, "Well, I think this routine is going to be slow, I want to go rewrite it because it'll be fun and cool," humans are
notoriously bad at guessing where the performance problems might be, so we need a systematic way to go about looking at this. You're the expert in using your application and what it's for. So, an approach to performance analysis: first, define the major operations you're interested in having fast for the user. Then, what are your goals to make sure that they're nice and fast for the user? For example, responsiveness: if you have slow operations, then you can either speed those operations up, which is the best case, so they just happen like that; or, if it's unavoidable that it's going to be slow, you want the user to be in control of the application again quickly, so maybe move the slow operation to a separate thread so that the UI can be responsive again. Throughput: if you're doing a game, you want lots of frames per second; if it's a network application, you want a lot of data throughput; for servers, you want lots of transactions per second. What's the goal for your arena?

Then establish precise benchmarks for that. Define what the target hardware for your customer segment is, what operating system version you're testing with, the specific data that you're going to be passing in, and specifically what operations you want to test. Then add in time measurement code, instrumentation, so you can measure that same operation time and time again and see how you're doing. Track that throughout your development. This is what the Safari team was doing to make sure that they had the fastest web browser on Mac OS X: they defined a precise set of benchmarks and then ran them time and time again, and when somebody was going to check in a major piece of new functionality, it couldn't go in if it caused the system to slow down noticeably. So don't allow regressions. And finally, if you do identify some performance problems, then focus your tuning efforts on those hot spots.

That's easily said, but how do you actually do that? How do you find what the hot spots are? Got tools to help with that? Yes,
absolutely, we've got tools. You've got tools: these have been included with your Mac OS X performance tools for several years at this point. Take a look in /Developer/Applications/Performance Tools. As with all of our developer tools, these tools are free; we really want you to use these things to create great applications. They provide full support for everything you need to do with Mac OS X, including now a lot of support that has been added for Java profiling. In addition, our GUI-based tools are integrated with Xcode for a full round trip of the development cycle: you can launch your binary under a performance tool from Xcode, and then you can get back to the source code directly from the performance tool, back into Xcode.

So we have a variety of performance tools: for monitoring performance problems, and, once you've found that you do have a performance problem, for analyzing what the problem is, where it is, and why it's happening, in a variety of areas: memory use, execution time, other types of resource use. I'm going to dive into a bunch of these tools as we go through.

In the area of high-level monitoring, there are a number of things you can look at to help answer just, in general, what is happening. One of the primary ones is the command-line tool top, which is always there for you. You can use that for just my system right now, or to look at a headless server, or to remotely log into a system if you've got a full-screen game going. But we've got a bunch of other tools as well that are a little more friendly, with graphical user interfaces. For example, a nicer user interface to top is the Activity Monitor application. It can help you analyze why my system is slow: is there some particular process that's taking time? Is its memory growing? This one ships on the user CD, so if the user calls up your support line and says, "My system's slow," well: "Fire up Activity Monitor and tell me what it looks like."

Another one you might
not be familiar with: this is one of the CHUD tools, and we've now elevated it into the mainstream set of performance tools. It's called BigTop. It takes the top information and graphs it over time, so you can really see how it's changing. I find it's very useful to watch and actually see graphically: is the private memory use of my application growing, or the virtual memory use? Those are two of the best metrics. You don't want to look at shared resident size, because that's things like your frameworks that are shared with other apps, but the private space could be growing, and that's important to note.

Oftentimes, I'm sure you've seen it, the spinning cursor comes up and you wish you could capture that. Maybe it's a two-second spin, and by the time you're off typing sample on the command line, the spin's over. But it makes your app not feel responsive. Spin Control is a great way to capture that: just have it running in the background all the time. It automatically detects when applications aren't responding to user interface events and lets you see what's going on there.

Now, I mentioned that sometimes the fan will kick in. I've noticed a number of times that my machine is sitting there idle and yet suddenly the fan fires up. Whoa, what's going on there? So I fire up top or Activity Monitor and take a look, and some process is using fifty percent of the CPU. Maybe it's drawing too much. Quartz Debug is a great way to look at this, and I've actually seen this a number of times in real applications: it's periodically drawing, and Quartz Debug flashes the graphics yellow on the screen every time it draws. You can also set it up to show duplicate drawing in normal operations. Very useful.

So once you've used the high-level performance analysis tools to see what's going on overall, we want to dive into why this is happening. We have a variety of profiling tools. We're going to talk a lot about Shark today because
we've put a lot of effort into the Shark application. Spin Control I mentioned. Thread Viewer lets you see the thread activity of your application: what all the different threads are and what each of them is doing. And for those of you who are doing OpenGL graphics programming, OpenGL Profiler is a fantastic application to help figure out where the time is going. There are command-line tools: sample is excellent for just a quick, basic sample; you give it a process name or process ID and a number of seconds. Very useful. And then we do have the venerable Unix gprof tool, but that's the only one that really requires recompiling for profiling; the others don't need any recompilation of your application.

So I mentioned Shark. We have a new version that was shown yesterday in the development tools keynote, Shark 4.0. This tool helps you figure out why time is being spent in some certain place and where it is going. You can look at specific threads, specific processes, or the overall system, and now you can actually do sampling over the network as well, again for things like full-screen games. It captures all the information about the system, both user space and kernel space. There's a full session on Shark this Friday afternoon that I'd encourage you to go to. We'll see more of it today, but we aren't going to dive into all the full-blown features of it.

One of the things I really love about Shark is that I don't even have to attach to a process. It's always waiting in the background with an Option-Escape hot key that will sample the entire system. So if I notice something seeming sluggish, I can just hit Option-Escape right away; Shark fires up and says, "OK, let's sample," and then I can hit Option-Escape to stop it, see what's going on in the system, and then dive into that specific process. But there are a lot of ways to get more specific information here: different sampling methods, memory tracing, function tracing. We'll look at a lot of these.

There are three primary views in Shark: a
profile view that lets you see a top-down call tree of your function calls, or bottom-up for real profiling information and who's calling the leaf functions. We've done a lot of work in this to really let you home in and do filtering and data mining to simplify the complex picture. There's a chart view that helps you really see the patterns of execution of your code; this is excellent for both performance analysis and just understanding what's going on in your application. It's really cool. And finally, within Shark itself there's a code browser where you can see source code or assembly, get hints about what the assembly code is doing, and get directly back to the offending lines of code in Xcode. These are all annotated with which specific lines of code the time is going into.

I haven't talked a lot about Sampler. Many of you may be familiar with using Sampler. What's going on with that? We've integrated all of the features of Sampler into Shark, and we plan to remove Sampler from the system. You can see here that Shark has a number of additional features; we'll touch on some of these, and then more on Friday. But please try Shark. The team's been very busy: Shark is there on your Tiger CDs, but there's actually a newer public beta that's got a number of additional features that have been banged in over the last couple of weeks. So please download the new version of Shark, which runs on both Panther and Tiger, and send us your feedback. I'll give an address later.

So what about memory use? I mentioned that it's really important to try to minimize the overall footprint of your application. We have a number of tools to help analyze what's going on with memory use. A very nice one is ObjectAlloc. This is great for looking at dynamic memory use: both how much memory am I using right now, and how much was the peak that I used in some particular operation. You can use this with Cocoa applications, and it's great for seeing your allocations by allocation type: what
type of objects, so Cocoa objects, Core Foundation objects like you would also have with Carbon applications, and just general malloc allocations. It says what size they are, because a lot of times you'll allocate specific sizes at specific points in your code. So you can see all that, and you can look at information about specific instances of them. It's not quite as good for pinning down precise memory leaks; MallocDebug is still the best application for that. This shows a full call tree of all the allocated memory, not by type, and it's not so good for dynamic allocation, but it is still the best tool for leaks. There's a command-line tool equivalent of this called leaks, which can also show you the backtraces of where allocations were occurring if you set the MallocStackLogging environment variable.

Another major function of MallocDebug was to help find corrupt memory operations, but we actually have a better solution for that at this point. The purpose was to help crash your application if you did something bad with memory, but really you want to be operating within a debugging environment if this happens. So we have a new malloc debugging library called Guard Malloc. This operates within the context of the Xcode debugger; there's a nice switch on the Debug menu now that says Enable Guard Malloc. What this does when you turn it on is that every allocation you make goes onto a separate virtual memory page; the end of the buffer is lined up with the end of that memory page, and the next page is unallocated. So if you overrun the buffer, you'll crash immediately, and since you're in the Xcode debugger, you can see immediately where buffer overruns are in your code. If you free the block, then we free the virtual memory page, so if you go and read or write from that page again after freeing it, then again you crash immediately. This is a great way to find really nasty memory problems. You can learn more about this in the Xcode debugging session on Thursday morning
and the libgmalloc man page. So that's not so much a performance tool, but a great solution.

Again, I've said we're putting a lot of effort into Shark. We're trying to add a number of these memory analysis features into Shark as well. Shark can now do allocation sampling and show you the sizes of your allocations and the call trees there. There are still different strengths and weaknesses among our memory analysis tools: again, ObjectAlloc is great for dynamic analysis and looking at specific object types, MallocDebug is good for leaks, and we want to add leak detection to Shark, but Shark has new capabilities too.

So that's it for a broad-brush overview of the performance tools. Now let's dive into a specific case study. I'm not actually going to do the requisite planetary motion simulator that seems to be so popular. I'm going to be looking at an application called Disk Inventory X. It's kind of cool and actually kind of useful: it uses a concept from Ben Shneiderman at the University of Maryland for representing hierarchical information in a compact two-dimensional space. It's an open source application, GPL; I'll be sending changes back to the author, and he's pretty excited about that. As we go through, we'll be looking at a number of areas of what might be slow here: time, memory, other resource use.

So in your application, as you look at something like this, what might you want to look at? Of course, major operations: how long does it take to open a large document? If the application is idle, again, you should be taking zero percent of the CPU. And again, watch for UI spins and deal with those. Memory size: I've talked about the importance of looking at dynamic memory use; we'll see that. Leaks. One thing that may not be obvious is autoreleased objects with Cocoa applications: if you create a separate thread, or if you have a Foundation-based tool, it's really easy with a lot of the Cocoa APIs to end up creating autoreleased objects but maybe not
getting back and freeing the autorelease pool very frequently. Maybe it's a long-running thread, and those objects just build up and get paged out. I've actually seen applications crash due to this problem: the system gets low on memory and you crash. Also look at disk and network activity; we'll be specifically looking at some of this with our sample application here.

So what I want to do is switch to demo one here.

This is the Disk Inventory application. We can't see the menu bar up there; if we can get the menu bar, that'd be great. What we've done is take a look at our Applications directory, which has 1.9 gigabytes of space in it, and I'm interested in where that space is going. What this application does is graphically show me the size of the files: the larger the rectangle, the bigger the file, and the color represents what kind of file it is. We can see the blue is a disk image, so, wow, I have at least one big file here. OK, that looks like the Adobe Photoshop 7 disk image; I probably don't need that. Down here, a couple of other disk images, application packages. So this is kind of cool: I can click on a directory and see how much space that directory is taking, and I can move around with the mouse and see things there. It's actually kind of useful.

Let's go ahead and quit out of this and bring up the Performance Tools folder. I'm going to launch the BigTop tool that I referred to, and I'm also going to launch Spin Control; I'll just put Spin Control down here in the background. Now with BigTop I can look at things like the CPU usage. As I move a window around, we can see that the CPU use goes up and down as expected. Let's go ahead and launch the Disk Inventory application again, and I'm going to look specifically at the Disk Inventory process and watch its memory size as I go through. What I'm doing is going to Open Recent and reopening the Applications window there, and
it's scanning and analyzing that. You can see the memory use is climbing here. I've added a little instrumentation window here, and it took a little bit of time to analyze that 1.9 gigabytes: about nine seconds to scan that folder and a little less than a second here, so about ten seconds to look at this. I haven't tried this operation on this machine. We can also show package contents, and note that we actually caught a little spin here as well with Spin Control. So it took about four seconds to show the package contents, and with the spin, I can come down here, select it, and show a text report to see what was happening in there. We were making a bunch of recursive calls to determine the file kinds inside of the package that I'm looking at. So that's interesting.

We saw the memory use climb. That's not totally surprising, because we were building up data structures to represent this, but we should look at that and see if we're as efficient as we could be. Let's look at one other thing. As I resize this window here, notice the slight pause before redrawing. And that was interesting with the memory use: a little spike there. Actually, that looks like it was probably over a megabyte of dynamic memory creation while I was resizing that window. So maybe that dynamic memory creation and deletion has something to do with why it's not as fast as it could be. Let's go back to slides.

OK, so when we tested this, we don't have such beefy hardware in the labs; we have mere mortal dual G5s. So my result testing the same directory in the lab was a little slower than that: it was actually almost 20 seconds for scanning the folder and getting the file sizes, and about 10 seconds to classify the file kinds, and showing the package contents was pretty consistent at about four seconds, for a total of almost 33 seconds to scan not quite two gigabytes of space. There are a lot of 80-gigabyte disks out there on your personal
computer systems, so what, 20 minutes to scan my disk? What if I have a terabyte disk farm and I want to use this technique? That could be nasty. Maybe we can speed this up.

Now, I've often heard the question of what the best timing APIs are for instrumentation on the system. mach_absolute_time is a Mach API; that's the fundamental call. It goes down in and reads the timebase register out of the CPU. There are a number of other APIs that you can use depending on what's convenient for you; gettimeofday, for example, is a nice portable API in the Unix environment. These all end up calling down into mach_absolute_time. This is the actual code that I used in this application. I simply call mach_absolute_time in, say, getTime, which gives me a 64-bit value back; I guess I could just call it directly. Then in subtractTime, once I have two of these, I just subtract them and apply a conversion to get a double value in seconds, which makes it easy to print.

So with that, I've identified that we have some issues. I'd like to bring on one of our experts in analyzing those issues, and also in creating tools to help with this process. It's my pleasure to introduce Christy Warren.

What makes software slow? Algorithms. Those of you who have taken computer science courses have run into things like quicksort versus bubble sort. If you use bubble sort on a large data set, it'll make your program run extremely slowly. Other things are expensive operations: file opens, network calls, IPC. Even things like malloc and locking primitives, even though they're relatively fast, can be expensive if you do them a million or a billion times. A more subtle thing, and I'm sure you've encountered this, is doing something more than once. Let's suppose that Dave and I are writing different functions in a large program. We both call quicksort on an array, and suppose it's even the same array. Because we're not in intimate communication all
the time, we can do this sort in two different places in our program. This would be bad, but it would show up in a profiler as just a call to quicksort. So this is an example of doing something more than once, but the real problem here is what I call complexity.

Now, this is a graphical depiction of a program running; in this case it's Finder Get Info. I just did a trace of the memory allocations, and this is the development version of the code, not the one that you're getting. There are over 100,000 events, and each one of these vertical bars is a sample. The vertical axis is call stack depth, so this is a picture of each of the call stacks as you go along. As you can see, there's a lot of redundant visual information here, and that's really interesting. That's a result of the complexity of modern, many-layered software.

So what is complexity? Well, complexity has, as I just said, layers and many modules. Good software engineering technique says hide your implementation: don't let your client know the details of what you're doing; you give them a black box. But there's a problem here, which is that you hide the performance cost of what you're doing. Say I have a function foo that takes a boolean value. You set foo to true, and it could set a value in a register; it takes a few milliseconds or microseconds. You set foo to true, and it could go out to a database, do some authentication, launch a rocket; it could take minutes or even hours. Same call, two totally different results. So what do we do here? Innocuous calls can lead to surprising complexity. I'm sure you must have run into things like this in your own development.

So we're back to this picture. We're going to zoom in on this graph, and it's not just complexity at the high level: as you zoom into the finest detail, you see repetition on many different levels. It's like a fractal, like the Mandelbrot set: you see coarse, grand detail and then finer
structure as you zoom in. It's amazing what we run into with software and processors today; it's just incredible.

So to analyze performance, I've come up with a simple formula: the impact of any operation you do is equal to the cost of that operation times the number of places it's used in your code. In the quicksort example with Dave and me, there are two uses of it that are redundant, so that makes it twice as expensive as it needs to be. Now, traditional profilers make it really easy to understand cost: you sample a program, and you see all the leaf functions that you spend time in. But it's hard to see use, because these are complex patterns of usage that often go through not just my library but ten of your libraries, scattered throughout the system.

To help us analyze use, we have two techniques available to us. One is called call stack data mining. This is new functionality that we're introducing; we're not aware of this being available elsewhere in other programs. The idea here is that you can strip away the stuff that you don't want to see and focus on what you really care about. I'll describe that more in a second. The other approach is graphical analysis. The idea here is that you visualize your execution trace, as I've shown. There's a technique called software fingerprinting, where similar patterns in the picture mean that you're going through the same code path. If there are repeated patterns that are the same, like heartbeats on an EKG, it means you're going through the same code path a hundred or a thousand or a million times, and it's at least worth looking to make sure that you aren't just doing it on the same data over and over again. Or, even if you are doing it on different data, can you hoist things out? In a quicksort, each time you do a compare, the compare may go through a whole bunch of layers of objects, a whole bunch of overhead;
there's nothing to do with just comparing two values in that. So remove that stuff, pull it outside of the iterative structure, and your program will run a lot faster. With this tool you'll be able to see these things; they'll shout out at you, things that you'd otherwise have to go digging through code and spend countless hours trying to find if you didn't have these tools.

OK, this is working. So, data mining concepts. I talked about stripping away what you don't want. Now I have a question for you: how many of you have profiled your program and seen not your code but countless system frameworks? Is that annoying? Some of you, a lot of you. Yeah, when I first used a profiler, this is the first problem I ran into. I was new at this, and it was like, well, what good is this? I can't do anything about libc, I can't do anything about AppKit; I can only do something about my own code. So wouldn't it be great to have a button where you push it and all that stuff goes away, and you see your functions as the leaves, and the cost of the system libraries is all ascribed to the various functions in your code? Wouldn't that be a lot better? I find that really useful.

That's a very coarse operation. If you want to do finer-grained stuff, you can strip away one library. Say you're working with the AppKit team or the Foundation team, and you want to see details in those libraries but you want to get rid of the low-level stuff: I'm going to get rid of Core Foundation, I'm going to get rid of the standard libraries. Exclude Library lets you strip out a particular library in the trace. It's all non-destructive; it just gives you a different view on the data. It removes those libraries and charges their cost to the callers. So by transforming the data, you can focus in on the hot spots that you care about.

There's also this idea of Flatten Library. Instead of just completely eliminating a
library, you can flatten it to its entry points, so you can see where your code is calling into these libraries. You don't need the details of how, say, CFDictionary is implemented; you don't care about that. You just care that you're calling CFDictionaryGetValue. And to help you see just what you want, there's a thing called Focus Symbol. With Focus Symbol, you choose a particular call tree that you want to look at, and when you do that, you strip away everything that's above it or to the sides of it.

So I just told you a lot, and it's kind of thick, so I'm going to give you some pictures to help illustrate this. Here we have a main, and it calls an init function, a doExample function, which does your work, and some cleanup. Then you have this bar function that's called four times by your work function. In a real program it's probably more like a hundred or a thousand times, but I made it simple here. This in turn calls Core Foundation; it's using CFDictionaryGetValue. Now, if I just profile this, I'll mostly see functions from these ones in yellow. These are the leaf functions, and they'll be far removed from what you care about. So if we do Exclude Library, those go away, and now bar becomes a leaf function. By doing that operation, instead of seeing things that are removed from what we care about, we're in what we care about. Flatten Library is similar, so let's go through this quickly, but it replaces the library with its entry points. Focus on doExample: strip those away, and boom.

So by doing these transformations, you can manipulate your call stacks. This is also really cool because you can make really good performance arguments. When you strip these things away, you're no longer trying to point out something here, something there, and something over there, and hoping it makes sense. You actually see the counts from the places that matter, and you can make really good arguments to other people that we need to work on this stuff, we need to fix it.

So with that, let's go do a little demo. We're going to
launch our application, and we're going to launch Shark. Now, how many of you have used some version of Shark before? It's about most of you. So this is Shark 4, and you're going to see a lot more of it in the Shark session later this week, on Friday at three thirty, but you get a little preview today. The UI is a little bit different. The original Shark was a time-based sampler that lets you sample the entire system. That's really cool, and it's good to have it around in the background and whatnot, but in this case you want to focus on particular processes. There are a number of different things we can do: we can trace memory allocations, we can do function traces, we can trace various Java things if you do Java. For this application we're going to use what's called Sampler Time Profile. We choose this because the program uses file system operations, which involve a lot of waiting on the kernel, and this trace does the best job of attributing those costs to the user code.

Let's go over to our application and do Open Recent. Shark gives you this Option-Escape hot key, which is really handy, so I can be in the middle of a UI manipulation and hit Option-Escape to start. So we're going to start our scan. This stops every thread and records a sample, whether it's doing something or it's waiting on the kernel. Now that's done, we stop, and we have a heavy view. This is a view of all the leaf functions that time was spent in and the relative percentage of the counts that we were in. So we click on this one, syscall_thread_switch, and on the right here, this is one of the nice new features, you can see a backtrace of that particular symbol. We see that we're in a heartbeat thread. Well, that's not doing anything; it's sleeping, sleeping until a date. So in this case let's get rid of that thread. We're going to go to the thread pop-up up here and choose one of the threads, and it's going to give us the one that we're interested in. The topmost function here is getattrlist, and this
gives you a very large call stack to look at. To help us navigate things a little bit, there's a nice little thing over here called Color by Library. Some of you may remember this from Sampler; that feature was in there. When you click that, we see that we've colored things by library, in different colors: libSystem is lavender, [unclear] is red, Disk Inventory is brown. This helps us make a little more sense of it visually without spending a lot of time. Let me make this a little bigger.

So we click on getattrlist, and we see that, yes, we're in user code; FSItem loadChild is our own thing. But let's use the Exclude Library operation. We're going to exclude libSystem.B.dylib, and when that happens, that goes away, and we see that a fair amount of time is spent in CarbonCore. So we're going to do this again: we're going to exclude the CarbonCore library, and we'll do this a few more times. There, one piece of user code comes up. Let's do Core Foundation and Launch Services, and you see that FSItem loadProperties and this loadChild are all floating up as pretty major players in this profile.

Before we go in and look at those in a little more detail, let's go over to the heavy and tree view. This is another new feature in Shark: it lets you see both the heavy view and the tree view, the top-down view, simultaneously. In the top-down view we start at the start of our program, kind of like that diagram I showed you, and you walk down through your program until you get to our code. But there's still a problem, and you've probably seen this before too: there are all these AppKit calls, all these system calls, and it makes it kind of hard to look around. If you expand one of these trees, the outline will be awfully big and hard to keep track of. But we're in luck: there's a button over here called Flatten System Libraries, which does that flatten operation on all the system libraries, and
when we do that, this guy simplifies and it's a lot more manageable: there are only a few layers of these calls, which gives us context. You could also exclude them if you want to, but in this case I thought it was useful to help me keep track of where I was. But then we notice another problem, which is that we've just expanded this recursive call, load child. This is a file system application, so it's very natural to write it in a recursive style. But if any of you have tried to analyze performance on a recursive function, it's rather difficult, because at each level of the recursion you may call out to a little branch function, and each of those calls individually will show up as a relatively small contributor, but there's no way to gather them together and focus on them. So, for example, this FSItem path shows up at 0.1 percent here and, you know, one percent here, at different levels, and it's kind of hard to determine if that means anything or not. Luckily, there's another option here called Flatten Recursion. We click on that, and look what happens: load child becomes a single thing, and look at that, init name with parent suddenly pops up to 43 percent of the overall time. So by using this data mining we're getting past the obstacles and getting to the parts that are interesting. And by the way, with the Shark 4 download that Dave mentioned there's actually a nice tutorial that will walk you through these things, so you don't have to remember everything I'm going through today. So let's double-click on init name with parent and pray to the demo gods. There we go. This shows you the source, now annotated with the percentage of the time. Those of you who have used Shark have seen this before, but there are a couple of new things that are really cool. You'll notice that various symbols are underlined; that means you can follow that link just like in a web browser. So we double-click on
self load properties, which is our heaviest line, and we go to another source file. You can navigate forward and back, and this way you can move around and explore your performance problem in a way that's much more concrete. At least it is to me: it's much better to deal with source than to deal with these trees of symbols, so I found this a really nice feature. So now we can look at our problem. This class is an FSItem, and even its name suggests that it's something you do on every file system item that you encounter as you iterate through these directories. If you look at the details here, it does an FSPathMakeRef, it does fileAttributesAtPath, it does FSGetCatalogInfo. So there's a bunch of these in it, about five operations, and if I look elsewhere there'd be maybe a sixth operation, that we're doing for every file in the directory. Now, Dave, there are bulk file system operations that we support. They're really cool, and you can reduce this from doing it for every file to just doing it for every directory, and this should give you a really nice speedup. So please consider using that in optimizing your program. While Dave's off working on that, I'm going to show you some function tracing. I just showed you data mining and how to analyze your program using data mining; now I'm going to show you graphical analysis using this feature called function tracing. You can specify a list of functions, and that will let you do an exact trace of the functions that are called. So I can choose Function Trace, and there are some presets down here; you can also enter your own if you have a set that you particularly like. So I'm going to File I/O, and this gives you a list. I think it's a little bit hard to read, but they're just Unix file calls. I already made a preset here called File I/O, so we're going to choose that and go back to our program, Open Recent, start recording, and this time... oh, I guess we'll just have to do this again. That's the nice thing about Shark, because
it's pretty forgiving. When you're doing an exact trace, you want to do it for a relatively short time, or you might wind up with hundreds of thousands or even millions of samples. Even in that short time we got sixty thousand samples. This is kind of a cool view: you get a distribution of the different file system calls and the percentage of time that you've used them, so it gives you a hint of what your program is doing. But there's an even better thing we can look at here. Let's go to the chart, and in the chart let me do one thing to get the selection out of the way. Here's the chart, with this kind of wavy pattern, and let's just zoom into it a little bit, right here. This is a new feature, a really nice zoom control: as you drag along you can zoom in and out, just like we did in that movie. That movie wasn't fake; it was just filmed from the actual live program. So we go in here and we see this thing that looks like we're iterating over files at different levels. If you're looking at a Finder outline view, you'll see it's a similar kind of pattern, and you'll see that load child shows up in the stack here. So let's just do Flatten Recursion, and look what it does: it completely flattens out our trace. We come down here and we find a fingerprint. This little shape here is very redundant; it occurs over and over again, even in this little region, and that happens to be in load child, then init name with parent, and then load properties. So we found our culprit with graphical analysis very quickly. Use both techniques: if you already have an idea of which functions are expensive, you can do a function trace; if you need to figure out which areas are expensive, do time tracing and use call stack data mining. Okay, back to you, Dave.

Okay, praying to the audio gods. Excellent, excellent, good job. Okay, moving on. So I did my homework while Christy was speaking, and what I learned as I studied the app is that in
each directory we're making a directoryContentsAtPath call to say, enumerate all the files and folders in this directory. Then, for each one of those items, we go through and do a number of things to gather the information the program wants to display. So again, I'm getting an FSRef for each item so that I can make additional calls with it. I want to know whether it's a folder, a file, or a symlink, because I don't want to navigate the symlink and duplicate the representation of the space taken by the file, so I make a fileAttributesAtPath call on that. I want to get the file sizes, so the data fork and the resource fork, and also the parent ID to see if I'm on the same volume, because I don't want to walk off onto multiple volumes here, so I'm making an FSGetCatalogInfo call on that. And finally, when it's doing that classifying of files, it's saying, I want to get the kind string as represented in the Finder. So if it's a .nib file, we want to show that as Interface Builder Document. So for each FSRef we end up calling down to Launch Services, saying, get me the kind string for this FSRef, this file. So, having done my homework, I did learn about the FSGetCatalogInfoBulk call. What this does is optimized: I can say, for X number of files (I can specify how many I want) in a given directory, I'm looking for a set of information. I want to get the bit that says whether it is a directory or a file. I'm going to get the parent directory ID. I want to get the resource and data fork sizes. I'm going to get the type and creator information, and we'll see what I'll do with that on the next slide. And then I want to get the full array of FSRefs for all the individual items and the full array of entry names. So I get arrays of all this from one call, where before I was making lots of file system calls. Then, in the classifying of files, again, what we were doing was hitting the file system once for each file to say, get me the kind name for it, and then the way the code was
written, it's actually storing that kind string for each individual file. But if we step back and think about it, I just don't have that many different kinds of files, and I really don't need to query the file system about the specific file. What I care about is the kind, and the information that specifies the kind is the type, the creator, and the extension. So I can build a dictionary to map this triplet of type, creator, and extension to the file kind string. I actually put all of those into a string to make it unique, and use that as a key into an NSDictionary to do a lookup. Now, if I don't find it in the cache there, I can make a different Launch Services call to say, given this triplet of information, look up the kind string for that. That's not even hitting the file system, right? So I'm going down from an order-N operation here, once for each file, to zero file system accesses, and I'm also only storing the kind string once for each different kind, not once per file, so I'm also significantly reducing my memory use. Before I show the results, let's see if there's anything else that we can determine from the application, so let's do a memory analysis demo. Let's go ahead and quit out of the app and bring up our performance tools again. Let me also point out that Shark is now up here in the performance tools. It used to be down in the CHUD folder; now it's really going mainstream. So let's look at ObjectAlloc. We double-click on that, and what we do here is launch our target application from within ObjectAlloc, because it needs to set up some of the environment for it. So with this I simply hit Go, and what I want is to keep the backtraces. I could keep reference counts on objects, but I don't need that in this situation. So it goes ahead and launches the app. It hasn't done too much yet. I'm going to change the scale here because I might have a lot of objects, and we'll see that this application is doing live updates. Let's go ahead and
walk that folder hierarchy again. As we go through (let's do an auto-sort), we see that we're building up a lot of CFStrings; these are the currently allocated items. We're building up a lot of FSItems, and all of those make sense: with FSItem, I'm getting one for each file system item, and the CFString is actually the name for that particular file system item, so that's being stored. So that's kind of useful. I can see the peak amount, how many of any particular type I had at the peak, and I can see the total amount, and you saw, again, live update and auto-sorting. If I go to Total, this is really interesting. We see the different colors of the bars here. What the red bars indicate, as opposed to blue, is the percentage of objects that you have remaining: the current number of objects versus the total you've allocated. Red means that you have less than ten percent of them remaining, so maybe you've got a dynamic memory issue where you're creating more of them than you actually need. Yellow means, I believe, 25 percent or a third. The bright color indicates the number that's currently allocated. So we can see that we have a lot of CFStrings still; we had a little bit more at peak, and we had more in total that we got rid of. But what's this NSPathStore2? We can see that we've got 24 of them left, but we allocated 180,000 of them in going through this. What's up with that? I can actually double-click on this and get an allocation chart and see what the dynamic allocation pattern looked like here. It looks almost like we just might have been walking the file system hierarchy and doing something here; it looks similar to some of the patterns that Christy showed in Sampler. I can go in and look at specific instances of these objects, where they're allocated, and what the contents are. We can see this one is a path into the Library, and there's another path there. I can look at call stacks and descend the maximum path. Another thing I can do is set a mark and
say I'm only interested in seeing the number of objects since the mark. If I do the Show Package Contents, we see that I can just watch how many objects are created during that operation. So you can get a lot of information about your application through this. Now, you'll notice that it actually took a lot longer to run under ObjectAlloc because of the amount of time that was taking, so don't do time analysis while you're doing this. But let's go back in and do some memory analysis with Shark. I'm going to switch to the Malloc Trace operation, start up Disk Inventory again, and select the Disk Inventory process. So, going back to Disk Inventory, let's now look at apps once again, and like Christy did, I'll start the sampling and stop it. I'm actually going to jump directly... well, that's right, first off I want to show that with the value here, if I switch the value, you can actually see in the call tree the amount of memory that was allocated by the various calls. And I could exclude everything that I don't have source code for and get down to just seeing the stuff that I do have, and where that allocation is going. So that's fairly interesting. Let's go directly over to the chart view and remove the Exclude No Source, and we can see from the chart that, again, this looks like we have some interesting pattern. So let's just click on one of these that might be potentially interesting. Let's zoom in a little bit and see what we might see. Zoom, zoom, zoom: interesting little sawtooth patterns here. If I click on this, we see a number of different allocations of paths, and I can just use the cursor keys to move through them. So it looks like, in fact, it's from the code as I'm walking down through the file system hierarchy. The way the code was written, when I got to each FSItem it was making several calls to say, I need to know the path right here. I'm being a good citizen for memory and I'm not storing the
full path with each object. That would be overkill, that's too much memory use, so I'll dynamically ask for it. So I'll get my path by asking what my parent folder's path is and then appending my name to it. But my parent says, well, what's my path? Let me ask my parent and then append my name to that. On up we go. So that's dynamically creating lots of autoreleased NSPathStore2 string objects with Cocoa, and the next thing that we see happening is that we spend a bunch of time actually autoreleasing them; we can see the autorelease time there. So you can see the impact of too much memory use. Since it's recursively descending down through the file system, I should be able to say at each level, well, this is the path that I'm currently at, and when I go down into the next directory level deeper, just append the path part to that and pass that down through. I don't have to recursively go back up for every file system item. So that significantly reduced the amount of memory we're using. So now let's go back through and say, okay, that's all good, but did we have any results? Well, I was busily coding away and slapped a new binary up there, so let me dynamically enable some optimizations and let's try it again. So off we go. Boom. Okay, so these are my test results here. The folder scanning, which before took over 10 seconds, in this case about nine seconds, is now a little less than two seconds. Remember, this was from vastly reducing the number of file system operations. The classifying of files, which again was asking for the file kind string for each file, is now virtually instantaneous, because you'll remember I'm doing no file system calls there now. And if I do this Show Package Contents operation, boom again: 0.16, whereas before it was about four seconds. So we can see that we've significantly reduced the amount of time the program is taking. Let's switch back to slides, please. So, to summarize, what the tools helped me do is figure out that I should use bulk file system calls, and there's documentation
about this; I actually copied much of the code from the performance documentation. I used caching of the file kind strings so I can just do rapid lookups and not query the file system, and that also helps me reduce my storage for the file kind strings. I talked about reducing the dynamic creation of the path strings as we go through. And then, as you go through, you know, optimization is an iterative process, right? You've got a hot spot, so you go in and tune that, you make that faster, and now you've got a different hot spot. So it was actually interesting to discover that once we made the file system access a lot faster, the way it was updating the UI for feedback about what it was doing (display this path, display this path, display this path) was actually starting to take a fair amount of time. So I just display fewer paths, because all you want to know is where you are, and that made things faster also, because that's not an important part of my process here. So we made significant improvements. These are the measurements I got in the lab, again on somewhat slower hardware. We ended up making the file system traversal seven times faster. Classifying file kinds depends on the size of your file system, but that's like infinitely faster. Showing file contents is much faster, for a total of, call it, ten times faster. So now this starts to be a more useful application for me. We've covered a lot of things here today, a lot of tools and a lot of techniques. We have a lot of documentation about this on the system for both of these, plus the tools have documentation in them and with them, and there are man pages for the command-line tools. So, in conclusion, we've just seen that we have some powerful tools that help you both monitor, to see if you've got performance problems, and then analyze what the problems are. We put a lot of work into Shark, working with Nathan and Sanjay in a very collaborative effort here to try to improve the power and add more new
features, but make it easier to approach and understand at the same time. So we need to know how we're doing. Does this work for you? If I remove Sampler from the system, is that going to cause you a problem? So download the beta and please send us your feedback. I'm going to bring our Mac OS X technology evangelist for this on stage. This is a feedback list that you can send feedback about this to.
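The kind-string caching described in the case study can be sketched in a few lines. This is a minimal illustration in Python rather than the Cocoa code shown in the session, and it rests on assumptions: `KindStringCache` and `fake_lookup` are hypothetical names, `fake_lookup` stands in for the Launch Services query mentioned in the talk, and a tuple is used as the dictionary key where the talk describes joining the triplet into a single string.

```python
from typing import Callable, Dict, Tuple

KindKey = Tuple[str, str, str]  # (type code, creator code, file extension)


class KindStringCache:
    """Stores one kind string per distinct (type, creator, extension) triplet."""

    def __init__(self, lookup: Callable[[str, str, str], str]) -> None:
        # `lookup` is a hypothetical stand-in for the expensive per-kind query;
        # it is not a real system API.
        self._lookup = lookup
        self._cache: Dict[KindKey, str] = {}
        self.misses = 0  # number of times the expensive lookup actually ran

    def kind_for(self, type_code: str, creator: str, extension: str) -> str:
        # Normalize the extension so "NIB" and "nib" share one cache entry.
        key = (type_code, creator, extension.lower())
        kind = self._cache.get(key)
        if kind is None:
            self.misses += 1
            kind = self._lookup(*key)
            self._cache[key] = kind
        return kind


def fake_lookup(type_code: str, creator: str, extension: str) -> str:
    # Made-up mapping, for illustration only.
    table = {"nib": "Interface Builder Document", "txt": "Plain Text Document"}
    return table.get(extension, "Document")


if __name__ == "__main__":
    cache = KindStringCache(fake_lookup)
    files = [("", "", "nib"), ("", "", "txt"), ("", "", "NIB"), ("", "", "nib")]
    kinds = [cache.kind_for(*f) for f in files]
    print(kinds, "lookups:", cache.misses)
```

With this shape, the number of expensive lookups grows with the number of distinct kinds rather than the number of files, and each kind string is stored once, which matches the time and memory savings described in the talk.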