---
title: WWDC2003 Session 305
framework: wwdc
role: article
path: wwdc/wwdc2003-305
---

# WWDC2003 Session 305

## Transcript

Thank you. So let's get the clicker working. It's just cool; it just worked when I was last up here.

Okay, let's talk about what we're going to cover in this session. First I'm going to talk a bit about some performance analysis concepts: as you're going through the process of making your applications fast, what are some of the things you want to consider? Then we'll take a look at some specific examples of using two different classes of Apple's performance applications on specific test cases. We've got some high-level tools, Sampler, MallocDebug, things like that, and then the CHUD tools that have been talked about some earlier in this conference. We'll take a look at both of those in this session and where they're applicable, and also a little bit about their integration with Xcode.

Now, one thing that this session is not really going to cover in detail is "you should not use this API, you should use that API." There are a lot of other sessions, like the Carbon performance session, which I think was Friday morning if I recall, where you can get some details on that, and there's a lot of great performance documentation that covers those details as well. This session is going to focus on the tools.

But first let's talk about motivation. Why worry about performance? It's a selling point. We see it across the board: it's a selling point for Apple with the hardware, and it's a selling point for your application. If you've got competitors that do similar things to your product and they're a lot faster than you, that's a big competitive advantage for them, and vice versa.

Performance problems may go unnoticed. I've seen a couple of examples of this in recent days, where you look at the system, it's just sitting there idle with a window up and maybe an inspector up, and then you look at the CPU usage and notice that ninety-five percent of CPU
is being used while it's sitting there idle. What is going on? Sometimes you actually have to look for these things to detect them. Or there are issues of scalability that you don't see when you're working with unit tests, but when you send it into the field and people are really throwing lots of data at it, then problems occur.

Remember, unlike with Mac OS 9, where you had control at a certain point, you are not the only app on the system. There are potentially a lot of other things going on, system daemons in the background, etc. It's not nice to nice your process up to try to fool the UNIX scheduler, and things like that. You really want to make sure that you play well with the other applications on the system.

And finally, you want to start thinking about performance from the get-go with your application. It can be really hard to finish up your whole development cycle, then say, "It's too slow," and then try to come back in and graft performance on at the end. You might want to wait and do tuning later on; don't obfuscate your code from the beginning with no purpose. But think about performance issues as you go along.

So let's talk about a process for how you might go about doing this. First, within your application, take a look at what your users are going to be doing the majority of the time. What are the common use cases that you really need to have really fast? A simple one, for example, is launch. If it takes a long time to open your application the first time around, users might notice that and say, "Hmm, this looks like a slow application." But there are a lot of other examples of common cases as well.

Then what you want to do is define goals for what your performance should be in specific cases. Let's say you're starting up some operation and it might take a bit of time. Maybe responsiveness is the big issue: how fast can the user get control again? It may not be
done with the operation, but at least it lets the user get back in and proceed with other things. For specific jobs and processing, what's your throughput? How fast can you finish certain things? And then again, scalability: okay, it works this well with a thousand objects; what happens when we take 10,000 or a hundred thousand objects or transactions through this system? I don't have any N-squared or worse algorithms in here, do I?

Then you want to establish some specific benchmarks. What I mean by specific is: define precisely what hardware you're looking at. What's the common hardware that your users are going to be using? Is it going to be an 800 megahertz PowerBook, or a dual 1.4 G4, or maybe both, or a 2 gigahertz G5? Then define specifically what you're measuring. To give an example, we've been looking a lot at compile time lately with the compiler, and we look at single-file turnaround time. Well, we have to define pretty precisely what file we're looking at, because different-sized files compile in different amounts of time, and if we measure a different one each week we'll essentially get random results.

Once you've defined your benchmarks, you want to add some instrumentation, maybe specific API calls at the start and the end, so that every time around it's really easy to collect the statistics of how long these operations took. You want to be able to measure this on a precise basis, time after time, and that's the key to tracking results throughout your development process as you go along.

People ask, "How did Safari get so fast? How did they do this?" They tracked performance throughout their development. It was a key issue from the beginning, and they never allowed performance regressions to get into their code: "I'm sorry, that makes it slower." Each engineer was required to run the performance analysis test suite in Safari before they could check their code in, and if they'd made it slower, they weren't allowed to check it in. It's a
great feature, but I'm sorry, performance is the number one feature, so it doesn't get in unless you make it faster.

And finally, once you've gone through all this effort, out pop the hot spots, and then you can go in and start to tune these where it really makes a difference. We've probably all observed that we tend to be notoriously bad at guessing in advance where our performance problems are going to be: gee, it was fun to spend a week optimizing that particular routine, but it made no difference.

When I say that for Safari it was really a part of the process: they actually embedded instrumentation into their application, into the development versions internally, in the form of a panel that their engineers, their QA staff, and managers could pop up to run tests at any point through the development process. It really made it an integral part of what they were doing, because it was so important to them. They could do things like check for memory leaks and sample directly from there. So you might want to consider adding that kind of thing to your application.

So when we talk about benchmarking, what kinds of things might you want to look at? There's a wide variety of factors that play a major role in performance on Mac OS X. You've probably heard us talk time and time again about memory use. You have a limited amount of RAM on the system, and once we eat through that, we start to page out to the disk, and that's a lot slower. So if you're using a lot of static memory or leaking memory, that can be a big problem, so measure that. Maybe you're not actually using that much more static memory over time, but your dynamic memory, how much you used during a particular operation, really spiked up, and that can cause problems. So that's an area for measurement.

CPU use. I mentioned launch time; there are other things that are fairly obvious: gee, this is one of the major operations of my system, how long does that take? If it's a fast
operation, maybe you want to scale it up, run it 10,000 times, and measure how long it takes to do 10,000 runs. Again, idle time: you're not the only app on the system. If you're not doing anything in your app, it shouldn't be taking any time. And then the spinning watch cursor, the spinning rainbow cursor: that shouldn't ever come up on our system. Well, it does occasionally, in my apps and maybe in yours, but let's go fix that. We'll show you some ways to tackle that.

Drawing: it might not be obvious, but sometimes you're drawing too many times to the screen. We've got some great tools to take a look at that. Now that we're doing live resizing of windows or live resizing of split views, are you getting smooth resizing? There's a variety of things to consider for benchmarking.

Once you've identified your benchmarks, you need some tools to take a look at the issues. We've got a variety of tools on the system, both for monitoring what's going on and then for getting in and saying, "Okay, I see I've got problems with CPU usage; where is the time actually being spent?" We can look at memory use, CPU behavior, and resource usage like file systems, system calls, and drawing. We'll cover a lot of these tools as we go through.

One thing to bear in mind as we think about performance is that there are actually a lot of different levels in the system that can make a huge impact on your overall application. So let's think about layers of design abstraction. First, your application architecture. If you're a multi-threaded application, do you get deadlocks between your threads? We have tools like Thread Viewer to take a look at that. Maybe you're multi-process and you're getting network hangs. If you've got a complex object-oriented architecture, are you sending too many messages between the various objects? Or maybe one object is acting as the bottleneck for everything, a god object that everything has to go through. These are sort of architecture-level issues
that you might want to consider from the beginning. Then, within a specific module or class, you can think about things like your data structures and algorithms. Are you allocating too much memory in this process? Is the algorithm itself a poor algorithm for scaling up? What's the interaction with the OS? Again, the documentation covers a lot of things like "this call in Carbon to enumerate the directory structure is slow; you might want to consider using this instead." Then bottleneck routines: once you've isolated it down, okay, we seem to be spending a lot of time in this routine.

On the right of this diagram, we show that we've got a number of high-level tools that let you look at some of the higher levels of the design abstraction. Once you get down to things like the interaction with the OS and bottleneck routines, the Sampler profiling tool and the Shark tool from the CHUD package, which we'll talk about later, start to overlap in their capabilities. They both let you do profiling and look at things in somewhat different ways, so both can be helpful. When you really get down to trying to optimize the use of your processor and memory, Shark is a great tool for that, plus the other CHUD tools. And Activity Monitor lets you take a look through everything as well.

We'll be taking a look at a number of new features on the system. On the user CD there's a new Activity Monitor application that replaces CPU Monitor and Process Viewer and things like that, a really nice application that the Core OS team did. Spin Control is a new application to see what's going on when the spinning cursor comes up. I'll take a look at the integration of the tools with Xcode. There are a number of new features in Sampler that we'll take a look at, and then CHUD, where you can really get in and see what's going on, now with the G5 in addition to the G4.

So with that, let me go ahead and turn it over to Robert Bowdidge, performance engineer, for a look at some
of the specific tools.

Okay, what's the first thing we need in order to actually demonstrate the performance tools? The answer: we need a victim. The victim we've chosen this year is the Sketch application. This is a small Cocoa application that's available on the developer tools CD. So for those of you who've seen us using a Carbon app through all the Xcode demos today, this allows you to realize that the tools actually do work on Cocoa as well.

Now, if you actually go and look at Sketch, you won't see any performance problems. This is a program that's intended to do simple line drawings: draw a few rectangles, put some text in, maybe do an org chart. The guys who wrote it did a pretty good job of making it a typical Cocoa app with no performance problems. So we need to add some performance problems. The way that we did it this time was, rather than salting some bugs in there, we decided to try to increase the scope. Rather than trying to do small drawings, we said, let's imagine our boss comes into our office and says, "Hey, that Sketch app, that's really good. I think we could do architectural software with that." And so suddenly, instead of drawing tens of rectangles, five rectangles, we're drawing thousands of rectangles, and the question is what's going to happen. Are we going to find any performance problems? Are we going to find that our memory use is a heck of a lot more than we ever expected? Are we going to find a CPU problem, where we're running too much code? Hopefully this is a situation that many of you run into in your own code, as you look at applications and find out that on certain data sets they don't quite behave as you expected.

So let's take a look at that. I'd like to bring Christy up, who is the performance engineer for the text team, to actually do a demonstration for us. Actually, let's go to the slides for a sec. Thank you.
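The scaling experiment described above (tens of rectangles versus thousands) can be approximated with a small timing harness before reaching for a profiler. The following is only a sketch in Python, not code from the session or from Sketch: `time_operation` and `copy_rectangles` are hypothetical names, and the "copy" here is a stand-in list duplication rather than a real pasteboard operation.

```python
import time


def time_operation(operation, sizes, repeats=3):
    """Run operation(size) for each size and keep the best wall-clock time.

    Returns a dict mapping each size to its fastest run in seconds.
    Taking the best of several repeats reduces noise from other processes.
    """
    results = {}
    for size in sizes:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            operation(size)
            best = min(best, time.perf_counter() - start)
        results[size] = best
    return results


def copy_rectangles(count):
    """Hypothetical stand-in for 'select all, copy' on a drawing.

    Builds `count` rectangles as (x, y, width, height) tuples and
    duplicates the list, which is roughly what a naive copy would do.
    """
    rects = [(float(i), float(i), 10.0, 10.0) for i in range(count)]
    return list(rects)


if __name__ == "__main__":
    # Measure the same operation at several data-set sizes to see how it scales.
    timings = time_operation(copy_rectangles, [10, 1_000, 100_000])
    for size, seconds in timings.items():
        print(f"{size:>7} rectangles: {seconds * 1e6:10.1f} microseconds")
```

If the time grows much faster than the number of rectangles, that suggests an N-squared or worse step somewhere in the operation, which is the cue to go find it with the tools shown in the rest of the session.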
So one question is how you actually find the performance problem. Dave gave us an idea of some of the processes that you might go through, whether that's looking for regressions or following a certain pattern of measuring certain things every time. But sometimes you don't have that. Sometimes you start with a new application and you're not quite sure where to start looking. The way I like to start, and the way our vice president likes to start, is to use either the command-line tool top, which hopefully you've seen in previous years, or, thanks to Erik Peyton and some of the folks on the Core OS team, a new tool called Activity Monitor, which gives us a way to look at this. If we could switch to the demo machine now. Thank you.

Okay, so we have Activity Monitor over here on the side. The way Activity Monitor is divided up, the information at the bottom of the screen represents system-wide information about your computer. In this window we're looking at system memory, and one interesting number here is the page-ins/page-outs down at the bottom, which represents the amount of swapping your virtual memory system is doing: how many pages are being written off to disk. The other numbers here represent how physical memory is divided up on your system: how much of it is used for user stuff, how much of it is used for the kernel, and how much of the memory is wired down because it holds structures the kernel doesn't dare page out, like those of the virtual memory system. The other tabs, for example CPU, give you an idea of how much work the CPU is doing in general, kind of like the CPU Monitor application does, and the tabs for disk activity, disk usage, and so on also give you summary data.

The information at the top gives you details about specific processes. We can see Activity Monitor, Sketch, and so on, and we get information not only about what's running but about how much CPU each is using. We can sort this list according to what's the
most CPU-intensive, or we can look in terms of process name, or the hierarchy in the process groups. Christy already has Sketch running, and we can double-click on that entry to get a little more detail on Sketch. The important numbers here are the percent CPU, as usual, and the private memory size down at the bottom.

Private memory size is an interesting number. It represents the amount of resident private memory that's being used by this application; that is, the memory that's resident in physical memory and that's only needed by your app. This ends up being a nice number because it represents the footprint of your application: that memory is, first of all, only based on what your application is doing, and secondly, it's memory you can control. It's the memory being used for the heap, or the memory that you're allocating via vm_allocate. So it gives you a good idea of what your fault is and how much you can reduce, as opposed to the other numbers, which tend to include a lot of memory that you can't actually reduce. We can see here that just having Sketch up took about 1.6 megabytes. It's not great, not bad; that'll do.

So if Christy can now load one of our architectural drawings: we have a factory here. Okay, we're going to build factories. The architect goes to the customer and says, "Here's your factory," and the customer says, "Oh, I want six floors, not three." Okay, we can do that. We can select the entire building, we can copy it, and we can paste it. Now we have six floors. "I don't know, that's not enough. Let's make it twice as wide." So let's do it again. We'll select all, we'll copy, and copy is taking a little while. That's not good. And we can paste. So we're drawing a couple thousand rectangles here to draw that building, but we're already noticing a couple of issues. One was that copy was getting a little slow, and we're going to find out it actually gets a lot slower as we go along. But
the other thing is, if we go over and look at Activity Monitor, we find out that we're actually using 7.6 megabytes of memory. So 7.6 megabytes minus 1 megabyte or 1.5 megabytes: we used about 6 megabytes of memory to do those two copies and pastes. Okay, so we've got a performance problem here. We have a problem with what we're doing in the copy, in terms of CPU, and we have a memory problem. Can we switch back to the slides, please?

Oh, another interesting thing about Activity Monitor: because it's looking at the entire system, you can see what's going on in other processes. One of the things to remember is that on Mac OS X, the work your application puts on the system is not just a matter of what your application itself is responsible for. There are other processes, whether they're little daemons on the side or, more importantly, things like the window server. If you're doing a lot of drawing, your application may only be taking up sixty percent of the CPU, but the window server could be taking up the other forty percent. So when you're looking at Activity Monitor, you also need to look at the whole system to understand what else your application may be causing, so that you can find other ways you might be able to optimize.

Okay, so let's attack the first problem. What do we do if memory use seems a little high? Well, why do we care? Why don't we just use as much memory as we can? This will at least make the people who sell DIMMs happy. There are a couple of reasons; generally, using too much memory is not a good thing. One of the reasons is that your application is slow, because suddenly all the data that you want the CPU to be processing as fast as possible, especially on one of these G5s, which can really race, can't fit in cache, or you'll chase it out. So suddenly you're having to rely on the speed of main memory instead of the cache. So you want to keep your application
as memory-lean as possible, so that you can have as much as possible in the cache. If you're not using the memory well, then it's wasting space, because it's sitting in physical memory and maybe you're not touching it. If I come along and start playing iTunes, or running iPhoto, or using Mail, or using Safari, which every one of your customers is also doing when they're running your app, that means that when Safari needs more memory to put in some big page, some of your pages may have to get forced out of physical memory and written off to disk. The computer is going to have to do a lot more work just because you want to keep that memory around. So you want to keep your memory footprint low for that reason.

And if you've forgotten about the memory, if you allocated it and forgot to get rid of it, it's even worse, because you can't free it at that point, and it's going to get copied around on the disk.

Because of the virtual memory system, you can actually run into some rather interesting problems, where things go worse than you might have expected. Here's an example. Let's imagine we've got some really large file, I don't know, 10 megabytes or 100 megabytes, and reading it in when we need it seems a little slow. "Okay, well, I know what I'll do. I'll just read it in before I need it, so that it's available. I'll read it into memory, and that way, when I need that file, it's right there." The problem with that is: what happens if I go off and run iPhoto, and iTunes, and Mail, and everything else? Those start to need memory, and so some of the pages that you've brought in get chased out to disk. Then when you actually need that file, or that parsed representation of the file, suddenly it has to be brought in off disk again. So in order to save that disk read, you've now read it into memory, written it out to disk, and read it back in, which is bad and really
inefficient. So you don't want to do that. You want to try to keep your memory footprint as low as possible, and you want to do that in terms of both the memory you use and the memory that you've forgotten about, that you're leaking.

There are two tools you can use to do this. One is called ObjectAlloc, and it looks at your memory use in terms of how many objects you have. The second one is called MallocDebug, and it refers to allocations in terms of where they are made, so you can see particular places in your code that tend to allocate a lot of memory. Let's take a look at the first of those, ObjectAlloc. Let's switch to the demo screen.

Okay, so here I am. I'm running Xcode, because Xcode is really cool, and I want to go and do some performance analysis. How do you do that? Well, the first step I usually do, or at least the first step I always hear from everybody, is go hunting around trying to figure out where the performance tools are. Actually, who knows where the performance tools are? Okay: the developer tools are in /Developer/Applications. That's nice; the performance tools are there too. The problem is you've got to go hunting around for them. You've got to use the Finder, which was computer-centric and not human-centric and that kind of thing, and that wasn't very good. I was going to use another word, but I won't say that.

So we've improved that. Now, when you're going along and you say, "I want to look at performance," you can go up to the Debug menu. There's now an entry called Launch Using Performance Tool, and it lists the performance tools. If we'd known this would make people happy... And in fact, if you had installed the CHUD tools, which sadly I did not, because I wasn't a good person, you'd actually have Shark there too. I suggest you install Shark so you can see it on that list. And so we can launch
ObjectAlloc here, and here's the ObjectAlloc window. Let us launch Sketch in it. What ObjectAlloc does is instrument your code and run it (you answer a few questions), and it keeps track of how many objects have been created and updates that constantly. It shows not only the current number of objects of each type that exist, but the peak number that have ever existed during the lifetime of your program, and the total number you allocated during the entire program. So we can go to our little example, we can open our factory, and we can see that we're creating huge numbers of CFStrings and all sorts of other things as part of doing this work. And here's our factory. So let's again do our select all, and copy, and our paste. ObjectAlloc is doing its good work, and you can see that things are updating. Let's do that again, if we could.

Now, you'll notice that to the far side of the numbers there's a histogram: there are some bars there indicating graphically how many objects you have. That's very nice, because it gives you a way to directly perceive how fast things are changing, so you can see, "Oh my God, I'm creating a lot of these objects really quickly." The colors actually have meaning. If a bar is colored yellow, that implies that the current number of objects of that type is only about twenty percent of the peak or less, which implies that you created a whole bunch and then got rid of most of them. That may imply that you're not autoreleasing things quickly enough, or maybe you're just creating a huge number, but it's hopefully going to make you look at that to try to figure out why you had so many. Red indicates that you have only ten percent of the peak value.

Okay, so we've now done our copy, and now what we do is what we should do in all the performance tools: we're basically rubbing our noses against the data, looking for something that looks suspicious,
because the performance tools can't really say, "Oh, here, this is the problem; if you fix this piece of code you'll be happy." In general it tends to be much more that you look at it and say, "Oh gee, I didn't expect that. Why is that happening?" and then you go and track down the bug.

What we can see immediately is the second most common object after GeneralBlock-14, that is, mallocs of size 14 (because ObjectAlloc looks at malloc blocks, CF objects, and Objective-C objects): we have 4,000 NSInvocation objects. NSInvocation? That's not in my code. And in fact, not only do we have 4,000 of those things, but if we check "count as bytes," we find out that out of 2.9 megabytes of memory used for all the objects, 800K is used for NSInvocations. So about twenty-five percent of my memory is because of these. That's odd.

Well, ObjectAlloc gives us a way to track that down. We can go over to the instance browser and select NSInvocation, and we get a list of all the objects of that type. If, when we launched the application, we had happened to check the little box that said to keep track of retains and releases, we would see all of the times that we did a retain or a release on that object in Objective-C, so that we could find over-retains. Or we could click on the allocation event, as Christy has done here, and take a look at the backtrace indicating exactly where that object was allocated. We see here that it's allocated in our code for selecting a graphic object.

This also shows another feature that's new in the performance tools. In the past you'd find some symbol and you wouldn't be able to track down where it came from. Now we take a look at the STABS information, the debugging information in your binary, and if we can find the location of that function, we highlight it in the performance tool, either with a little file icon or by underlining it. And so you can
double-click on that, and Project Builder or Xcode will actually show the code for you. What we find out is that the NSInvocation objects are being used for undo. Every time we select a rectangle, copy it, or paste it, for each of those thousands we end up creating an NSInvocation object to handle undoing that at the end. Okay, so we're creating thousands of these things.

This is an interesting problem, and we've got some solutions here. We could just decide that the undo support in Cocoa is such a big win productivity-wise that we just don't care. That's a fine answer. If we really cared, because we're going to be doing lots of architecture stuff, then maybe we actually want to change this and create our own undo mechanism. Or a third option: we could ask why we are allowing undo of select, because the HI guidelines don't require us to, and so we could actually get rid of that. So this is one of the ways that you can step through finding something suspicious, tracking down why it's happening, and tracking it down in your code to understand what the problem is. That's how you can use the performance tools. Can we go back to the slides, please? There we go.

Okay, second question: what do we do when CPU use seems too high? First of all, why do we care? Well, again, the answer is that if you're doing something that's taking too long, it's not only making your application look bad, but you're going to make my iTunes skip, and I don't like that. So you need to worry both about how your own application performs and about how you're affecting the rest of the system, because you're not alone. There are lots of other things running on all our computers.

There are a few tools you can use to track down CPU use. One of those is Sampler, our profiler. Sampler can also be used to look at what's called dynamic memory footprint, as Christy puts it, which is a way of understanding where your calls to malloc are and using
those as suggestions of where you might be doing too much work. There's also a tool called Spin Control, new in this release, that gives you a way to automatically sample when the spinning cursor comes up. I'm going to show all three of these, and leave MallocDebug for you to look at on your own.

Sampler is technically a statistical profiler. What that means is that every 10 milliseconds, or 5 milliseconds, or 20 milliseconds, Sampler stops your program and says, "Hey, what's going on?" It gets a backtrace from every thread that's running and looks to see what work is going on. It gathers a backtrace, then lets the application run for a little while longer, and keeps doing that. At the end of the sampling, it takes all those backtraces and smashes them together into a tree, so that you understand the range of ways your application is behaving, and then it presents that in a graphical way.

There are a couple of things you need to remember about Sampler. Because it's statistical, because it's only stopping the program at times, it doesn't know what happened in between. That means it may not catch all the functions, though any function should appear in the samples in proportion to the amount of time that it actually spends running. If sampling every 10 milliseconds isn't good enough for you, if you need finer resolution, then you should try the performance tool Shark, which you'll see after this. And if you need to know about every single call, then you might want to consider using gprof, which is a standard UNIX profiler that requires you to recompile your code.

So let's take a look at Sampler. Could we switch to the demo machine again? Okay, so we have Sampler up. This is a new UI for this release, and up in the upper left-hand corner you get the type of sampling you're doing. You can sample either based on time, or you can
get a backtrace every time malloc is called, or you can look for specific function calls. We're just going to do time samples here. What we'll do is launch Sketch in this, and we're going to look at that copy, because copy seemed like it was going a little slow, and I don't like that. So we open up the factory again, and let's do what we did before, and so we'll do a copy. So first of all, we need to start sampling. If you remember how Sampler used to be, you actually had to switch over to the Sampler window, press the button, and go back to your application. That was annoying, because you would often get lots of garbage data that you didn't care about from having to raise the window. Sampler now has a hotkey, so Christy can actually hit Command-Option-Control-R to start and stop sampling. Thank you, thank you, we appreciate it. And she can do the copy and paste, and again, and Christy can stop.

Now we can go over to Sampler and try to take a look at what's going on. This is the way that you used to look at samples, with a browser, and the browser has some good points and bad points. However, there were a lot of people at Apple who would write their own tools to parse this data, because they liked to display it in an outline view. So Christy was nice enough to actually put in an outline view, so make sure to thank her. You can look at the outline view and turn down the triangles to see your call tree. For example, we can see here that in every one of the samples we were in main, main always called NSApplicationMain, and so on. We can step down in there, snooping around, and we can find where we're calling into the menu code, which is right about here. Now, one of the things you saw was that the counts were originally in terms of samples, how many times the program had been stopped, and Christy just switched this so that it's showing it in terms of time, which
tends to be a better way to actually understand it, even though, remember, it's statistical, and so you can't say that took exactly 0.01 seconds. What we find is that in the time that we were doing the sampling, we spent about 2.44 seconds in copy. Okay, that seems a little odd. We can turn down the triangle and see where the time was spent in copy, and it turns out that we called four routines there that took all the time. Gee, that's weird. Well, luckily we still have that way of linking to the source code, because I can't understand it from this point of view. So Christy can double-click on copy there, and we get our source code.

And we find out that what's happening is that when we do that copy, we create a PDF clipping, we create a TIFF clipping, and we create the Sketch-internal version of the clipping. This is because Cocoa has this nice feature where you can say, hey, I can give you a clipping in any of these formats. That's very good, because then the application that you're pasting into can say, I only work in PDF, or I need TIFF, or whatever, and it works. But when we're doing thousands of objects, doing all three tends to be a little wasteful. A better way of doing this would be to use another feature of AppKit, which is to basically say, here are the things I can produce, but I'm not going to produce them until you ask for them. So we could change this code so that we only said, these are the types of clippings we create, and then only when somebody did the paste would we actually create the clipping for that. That would get rid of this performance problem. We'd make copy a lot faster at the expense of making paste a little slower. Okay, could we go back to the slides, please?

Now, the performance tools, as I said, are very good for exploring your data. They're good for looking around and trying to figure out what's going on. But that's not always the best way to work, because when you've found a
particular performance problem, such as when the Safari people found that they really cared about page load times and nothing else, having to run Sampler every time to gather the amount of time would be wasteful. So if you know what you're going to be measuring, try instrumenting your code: putting in print statements that report how much time was spent, or automatically logging that time. This is really good, because it means that you can automatically gather statistics so that you can check for regressions, and it means you're always watching exactly what you want to be watching. There's many ways you can do this. There's a number of APIs in Mac OS X for looking at time. Some of the ones that are interesting are UpTime, which tends to have nanosecond resolution, or you can use gettimeofday if you prefer BSD, or the NSDate class if you're in Objective-C.

When I actually did this on copy, it was strange, because I didn't expect to find anything like this. I started graphing out the amount of time spent for each of the clippings, and for this task at the beginning, which was called ordering the list, which was sorting the things that you clipped from back to front. So first, you know, the PDF was the longest, and second, the Sketch one was the longest, and then suddenly, when I got to about 4,000 objects, the sort would take forever. Unless I had actually measured this, and unless I had tried it on a bunch of different sizes, I never would have seen that. So this is one of the advantages of instrumenting: it makes it very easy for you to see when things go wrong and why. And if you actually look at the code, what you find is that Sketch was basically made for dealing with tens of objects, and the way that it would do the ordered list is it would use the NSArray sort method, for those of you who are Objective-C fans. So it basically said,
hey, go sort this, and you had to provide a comparison routine to say, this is how you compare two of these rectangles. The way the rectangles would be compared is it would say, hey, what's the index of each of these in the big array that lists everything that's being drawn? Is this the first one, the fifth one, the tenth one? That's pretty efficient, except that that code would take your NSArray and make a copy over here in a nice static array so that it could do the search real easily. So we'd have to allocate huge amounts of memory, and then we'd have to do a linear search. That meant that the comparison was an order-N operation, which meant the sort ended up being like order N squared log N, or something like that. So you end up with this funky thing where sort looks really fine until you get to about four thousand elements, and then suddenly it's huge. So this is why you want to instrument.

Now, we've looked at a couple of ways that you can go looking for things that are suspicious. One of the interesting things about big object-oriented systems is that you tend to have a lot of layers, because you've got this thing called encapsulation, or information hiding, which is great. So you don't really know how people implemented things below, but you sort of hope they're doing the right thing. The problem is that often they make assumptions you don't, or you're using the API in ways that they don't expect. So some call you might be making into some layer might go all the way down to the bottom of the system and back up and take huge amounts of effort, or you may have something that you think is inexpensive, like that sort, that ends up being a very expensive operation. So Christy actually came to Apple and suggested that one of the ways we should be looking at systems is to be looking for this kind of repetition, because object-oriented systems tend to suffer from
this. One of the ways that you can do this is to look for mallocs, because mallocs tend to be time-consuming and memory-intensive operations, and everybody uses them all over the place. So if we can poke around and see where mallocs are being called, we might be able to see where we're doing repetitive work we really don't intend to be doing. So let's switch over to the demo machine.

Okay, so Christy will now switch to watching memory allocations and to using what's called the trace view, which is a way to look at these mallocs in a very interesting way. We can go back over to Sketch, and we're going to do a very small example. When you're doing sampling by time, you want to have lots of stuff so that you hopefully find all the functions you're looking for. Here we're looking at every single call, so you want to make your example relatively small. We're going to look at what mallocs are being done when we do our copy. So Christy has the two rectangles there. She'll hit the hotkey to start recording, do the copy, stop recording, and we find out that to do that we required 6,000 mallocs. This probably isn't that unusual, but it's a big system and there's a lot of things going on. And in fact, if we poke around, the idea is that this graph shows you the height of the call stack going to each malloc, so how many functions you had to go through from main before you got to malloc. If we zoom in on one of those, we'll find that you start seeing these repetitive patterns. See how it's kind of like an EKG? You get these very regular patterns, which implies that there's some very regular operation going on there if we're seeing that signature over and over again. And in fact, if we go look, we find out that we're down in some code that's parsing an XML file. It turns out that when we do a clipping and it's a PDF file, the PDF file has to get information on the printer, because the printer
is used for the size of the page, and the printer ends up going through the CUPS daemon, and the CUPS daemon ends up giving us back XML. We have to parse the XML, and so we do lots of mallocs. We never would have known this, and it might not show up in Sampler, but this is a way to understand what the costs are. Some of these are cases where you might be able to say, oh gee, I shouldn't do that, and a lot of them are cases where Apple needs to say, oh gee, we ought to fix that, and we can actually fix it for you so you never run into it. Okay, can we switch back to the slides, please?

Okay, the final demo I'll do today is Spin Control, which is a new application. The problem here is that, in general, when your application takes too long to do something, when messages coming from the window server don't get responded to within about five seconds, the window server usually puts up the spinning cursor. So usually this implies that your application is behaving badly. It's not responding quickly enough for the window server, and so these tend to be bugs. You're doing too much work. The problem is you can't sample them, because first of all they're difficult to catch, because they tend to appear and disappear, and even if you could get to Sampler, usually your machine is doing other things because there's a spinning cursor up, and so there's not really a chance to go and attach to it. So the idea is that Spin Control automatically samples your application for you. So let's switch back to the demo machine.

So Christy's launched Spin Control, which is in Developer Applications, and you have to go find that yourself, sadly. It basically keeps a list of every time that it detects a spin, and you can set it for only one application or all applications. And we can do that copy that we were doing that was causing us all that grief. So we can select all again, we can copy, and we can paste. We can
do that again. Sometimes you actually need to click on the window so that there are window events that need to be noticed; that's usually when the spinning cursor comes up. And we can see here the spinning cursor just came up, because we copied one of those things that takes 800 seconds. Hopefully not; I think I need to get off the stage soon. And it automatically sampled it. Now we could copy that and paste it into email to send to a developer to say something's wrong, or we can double-click on it and we get a Sampler-like view where you can actually look at the code. And in fact, we can go and see that we're calling, oh boy, that's nice, we're in copy, which turns out to be in NSArray, which ends up calling CFArrayGetValueAtIndex, just like I was explaining. So I wasn't lying. So Spin Control gives you a way to see the invisible. It lets you actually see the kinds of things that you otherwise can't sample. So this is a cool tool. Try running it on your system, leaving it up, and seeing what you catch. Can we go back to the slides, please? Thank you very much.

There's a number of other tools that you need to check out yourself. We don't have time for everything, sadly. Hopefully you've seen these in previous years if you've been here. If you haven't, take a look at some of these tools, and take a look at the performance book to find out how to use them. They're all valuable in interesting ways. They might be able to help you on certain types of problems, and you need to explore how to use them and which sorts of problems are best found using each of these. And make sure to watch your application. With that, I'd like to bring up Nathan Slingerland to talk about the CHUD tools, which allow you to look at code one level deeper than what we've been looking at.

Good luck. Yep. There we go. Okay. As Robert said, I'm Nathan Slingerland, and I'm going to talk to you today about the CHUD tools, or Computer Hardware Understanding Developer tools, and
these are tools written by the Apple Architecture and Performance Group. They're a performance suite of tools that give you low-level access to the performance monitors. These are counters that are built into our hardware: the processors, memory controller, operating system, things like that. Using these counters, you can find problems in your code and improve your code. And of course, the CHUD tools are freely available with the developer tools CD. You can bring up Shark in Xcode, as you saw, and they're freely available on the web too, so you can check there for updates. If you were here last year, we introduced CHUD Tools 2.0. We're happy to have 3.0 this year, with a lot of great improvements.

Shark is an instruction-level profiler, so if you've ever used Shikari from the older CHUD tools, Shark is the successor to Shikari. MONster is a spreadsheet for performance events, so you can look at these counter results in either spreadsheet or chart form. And Saturn is a new tool for visualizing function call behavior. Of course, we have a set of other lower-level tools that you can use for tuning things like AltiVec code, or very CPU-intensive code that you want to simulate using SimG4 or SimG5, which will let you see exactly what's happening at the lowest levels on the processor. And of course, we provide the CHUD framework API so you can write your own tools or control the CHUD tools.

So the performance counters, as I said, are in our processor, memory controller, and operating system, and what they do is count interesting low-level performance events, things like cache misses on the processor, execution stall cycles, and page faults in the operating system. CHUD lets you control these and view the results.

So the first tool that we're going to talk about that uses these counters is Shark. Shark is a system-wide profiling tool. Using Shark, you can profile a process, a particular thread, or the entire system. In the most general usage of Shark, you can create a
time profile. This lets you visualize performance hotspots, whether in your code or not. You can see if your hotspot, your bottleneck, is actually in your code using this. You can also use it to create event profiles, so you can relate performance events, things like cache misses, to your code and find out where cache misses are coming from. It captures everything: drivers, kernel, applications. What this means is, if you're a driver writer or kernel extension writer, you can use Shark to see the call stacks and find out where the time is being spent in your driver. And it's very low overhead, because we are handling everything in the kernel.

In addition, once you have your sampling session taken, we provide automated analysis. We attempt to annotate your source code, and the disassembly of that source code, to point out common problems and other things that you can do to optimize your code. There's a static analysis feature to find suboptimal code. If you were in the earlier CHUD session, you know that there are some instructions on the G5 that we need to look for and watch out for, and this will help you find them. We also provide optimization tips. There's a scriptable command-line version, so you can telnet in and sample things. And of course, you can save and review sessions and pass those around.

So without further ado, the best way to see how to use the CHUD tools and Shark is to have a demo. For that, we're going to use the Noble Ape Simulation. This is an open source program written by Tom Barbalet. To help me demo, I'm going to bring up Sanjay Patel, also of the Architecture and Performance Group.

Okay, so the first thing we'll do is bring up Noble Ape. So here we are. We're simulating thinking apes on a tropical island. This map window is showing us an overview of the island with the little red dots. Each red dot is an ape, and we can focus in on one ape at a time. That's the ape with the red square
around him there. The brain window to the right here shows how the changes are occurring in his brain when he's walking around the island and thinking about things. Every good performance study, of course, requires a metric. In our case, that's ape thoughts per second. It turns out that the brain functionality is performance-critical in this application, so this is our metric. This is running on a dual-processor 2 GHz Power Mac G5, and we're seeing about 1,200 ape thoughts per second. Great.

So the first thing we'll do is use Shark to see what's happening in the system while we run Noble Ape. This is the main Shark window. By default, we go to the time profile. There are other built-in profiles, of course, to take advantage of performance counters, but for now we'll just use the time profile. We also have a global hotkey, so Shark doesn't have to be in the foreground to use it. So let's sample five or so seconds and see what's happening.

Okay, so here's the profile, listing the important functions from most sampled to least sampled. In the lower left here we have the process pop-up, and this lists all the things that we sampled during this time period. At the top is Noble Ape, and we kind of expect that. We know that our simulation is CPU bound. But it's only fifty percent of the time, and we kind of wonder, well, why is that? If we go to the thread pop-up, we can see that in fact this application is single-threaded, and because it's single-threaded, we're not using half of our dual-processor machine. So our first step in optimization was, hey, let's thread this thing. We used the Carbon MP API and threaded Noble Ape. Let's see what the performance improvement was like. Remember, we had 1,200 thoughts per second before, and we're getting almost double that. So that's pretty good. But let's profile again and see what we can do with this code. Great.
So now we can see that we're taking up a much greater portion of the time on the machine, and that's reassuring. We want to do that for our simulation. And we can see that we've spawned these threads now. We've got the main thread at eight percent, and then two other threads that are processing the apes in parallel, forty percent apiece. So the next step is we can double-click on any entry in this profile view, and it'll show us our source code, colored by where the samples were taken. What this tells you is which lines of source code the most time was spent on. If we look here, the scroll bar also gives us a way to jump quickly to the hot spots. For this, the hot spot is literally just this piece of the function, this for loop inside of the cycle troop brain scalar function. It turns out that this is about ninety-four percent of the time if we highlight this. And if we look, Shark gives us a hint on how to fix our code, or how to make it better. We click on this little exclamation point, and
it says, okay, this loop contains 8-bit integer computation, it's taking a lot of time, you're spending a lot of time in this loop, so maybe it would be worth the effort to vectorize this loop. So that was our next step. We went and we vectorized. So let's go back. Okay. So remember, 2,400. Turn on vector. All right, so 10,000. That's nice, but we're still not done yet. Let's look again with Shark and see what else we could do.

All right, so we see the vector function showing up there. We'll double-click, and we're in the vector code. That's good. If you're a Shikari user, you probably know that you had a disassembly view that was similar to this, and you can still get this back. This disassembly view is actually set right now to show G5 dispatch groups, and there's more detail on that in the full CHUD session. We'll go back to the source code for now. If we look closely at the scroll bar, we can see that even though we're spending a lot of time in the vector code that we optimized, now we're spending a relatively bigger portion of the time inside the scalar code that we didn't vectorize in the first step. So our next step is, hey, maybe we should vectorize the rest of this. All these loops are fairly similar, and that's what Shark suggests we do. So let's go back to Noble Ape. So about ten thousand, nine and a half thousand. Turn on vector optimized, and we're at almost 15,000. So this is around 14 or so times the original performance, and what we're able to do is take advantage of this massive bandwidth we have available on the Power Mac G5 by using AltiVec. Okay, so could we have the slides again, please? Thank you.

Okay, we did that. Oh wait, yeah. So just to summarize, we compared this against the Power Mac G4. This is the scalar code running on the current top-of-the-line Power Mac G4 against the Power Mac G5, and you can see that they're actually not all that far apart in the scalar
code. We have a longer pipeline on the G5, and so we're not entirely scaling with the higher frequency, or we're not entirely bound by the CPU for this. When we added the threading, we can see that we get a bigger jump than what the G4 got going from scalar to scalar threaded. Then with vector, an even bigger jump, and with vector optimized, an even bigger difference. The reason is that as we improve this code, we're more and more constrained by the memory bandwidth available in the system. Well, on the G5 we're simply not as constrained. We have a lot more memory bandwidth to play with here. So if we had just thrown this on the G5, we would see a very marginal improvement, but by putting the effort in to vectorize, we're able to take advantage of a lot more of the system, a lot more of what it has to offer.

So in addition to Shark, we have some other tools. MONster allows you to directly configure the performance monitor counters and collect data based on timed intervals, event counts, or a hotkey, and then look at this in spreadsheet or chart form. It also has the ability to compute metrics, things like bandwidth or cycles per instruction, and actually that's how we got our bandwidth numbers for this when we were looking at it. There's a command-line version of MONster, and you can also save and review sessions for that. Saturn is the last tool we're going to talk about. Saturn is similar in some ways to gprof.
It gives you an exact profile and allows you to visualize the call tree of an application. It uses GCC to instrument each function in your application at entry and exit, and records the function call history to a trace file. Then for each function it can give you the call count. It can also use the performance monitors to tell you the counts for each function, as well as the execution times using a low-level timer. So okay, at this point I'd like to bring up Dave Payne again for the session wrap-up.

So we've seen a lot of what we've been doing with the performance tools, and I'd like to talk a little bit about some of the ways we'd like to go with them. We've rolled out a lot of exciting work with Xcode here at the conference, and you've seen some basic integration of the performance tools with Xcode. We think that there's a lot more we can do along these fronts to really bring performance data forward to you during your development process. You can imagine, for example, that every time you run your application, at the end it would pop up and say, oh hey, by the way, did you know you leaked this much memory when you just ran that? And put that in a smart group or something like that. If you look through Developer Applications, we're starting to have a lot of different performance tools out there, and perhaps there are some opportunities to unify some of the ones that have somewhat similar functionality. And one of the things I find as I'm walking through the tools is that I'm kind of overwhelmed by the amount of data that's there sometimes. Here's a whole bunch of data, go figure it out. So I've been playing with things like just having human-readable thread names: by the way, this is the heartbeat thread, so maybe you don't need to go look at that. We think there's a lot of exciting ways we can take these tools to hopefully make them even more useful in the future for helping you find your performance problems. But as you've seen, there's a
lot of stuff out there now that you can use to tune your applications, to make the best impression of your application and of our system as a whole. So we've got a lot there now, we've added some in this release, and we've got a lot more to come.

If you want to learn more: these tools are part of the Xcode Tools package. For most of the graphical tools, there's a lot of help information buried within the tools. There's a lot of command-line versions of these things, so the sample command, and heap, and leaks; go read the man pages on those. There's a lot of information that's been newly rewritten in the developer documentation on performance on the system. There's the System Overview manual, which hopefully you've all read and memorized by now, since you've been working with Mac OS X for a while, but there's a lot of good information there. And the web pages have been redesigned for the performance and debugging tools, so there's a URL there for getting a lot of good stuff from that. We have two different feedback addresses: one for Xcode and the related performance tools like Sampler, MallocDebug, et cetera, so xcode-feedback@group.apple.com, and for the CHUD tools, CHUD tools feedback.

And with that, let's see, a roadmap to some future sessions. Tomorrow morning there's an interesting session on how you can bring your Carbon applications over to Xcode, so from CodeWarrior to Xcode; see that session. Come visit us in the labs. We have a developer tools feedback forum tomorrow afternoon, and the debugger session tomorrow afternoon, and then I mentioned the tuning Carbon applications session Friday morning at 9 a.m., as well as a testing tools session. So with that, Godfrey and the panelists, if you'd like to come on up and do some Q&A.
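
Looking back at the "ordering the list" problem described in the session, the blow-up came from a comparison routine that did a linear search on every call. The following is only a rough sketch of that effect, written in Python for illustration rather than in Sketch's Objective-C; `order_slow`, `order_fast`, and `timed` are hypothetical names, and `timed` stands in for the kind of hand-rolled timing instrumentation the speaker recommends.

```python
import random
import time
from functools import cmp_to_key


def order_slow(selected, all_graphics):
    """Sort by drawing order; every comparison does two linear searches."""
    def compare(a, b):
        # list.index walks the list, so a single comparison costs O(N),
        # and the whole sort costs roughly O(N^2 log N).
        return all_graphics.index(a) - all_graphics.index(b)
    return sorted(selected, key=cmp_to_key(compare))


def order_fast(selected, all_graphics):
    """Same result, but positions come from a table that is built once."""
    position = {graphic: i for i, graphic in enumerate(all_graphics)}
    return sorted(selected, key=position.__getitem__)


def timed(label, func, *args):
    """Hypothetical instrumentation helper: log elapsed wall-clock time."""
    start = time.perf_counter()
    result = func(*args)
    print(f"{label}: {time.perf_counter() - start:.4f} s")
    return result


if __name__ == "__main__":
    # Try several sizes, as in the talk: the problem only shows up at scale.
    for count in (250, 500, 1000):
        graphics = [f"rect{i}" for i in range(count)]
        selected = graphics[:]
        random.Random(0).shuffle(selected)
        slow = timed(f"slow, {count} objects", order_slow, selected, graphics)
        fast = timed(f"fast, {count} objects", order_fast, selected, graphics)
        assert slow == fast == graphics
```

Running it over a few sizes should show the slow version's time growing much faster than the fast version's, which is the pattern that timing a single small document would hide.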

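
The copy fix suggested in the session, declaring which clipping formats are available and only producing one when a paste asks for it, can be sketched as follows. This is a toy model in Python, not AppKit code: `LazyClipboard`, `declare`, `data_for`, and `make_renderer` are hypothetical names chosen for illustration, and the real mechanism the speaker alludes to lives in AppKit's pasteboard API.

```python
class LazyClipboard:
    """Hypothetical stand-in for a pasteboard that accepts promised data."""

    def __init__(self):
        self._providers = {}
        self._cache = {}

    def declare(self, providers):
        """Copy time: record the available formats, render nothing yet."""
        self._providers = dict(providers)
        self._cache = {}

    def available_types(self):
        return sorted(self._providers)

    def data_for(self, kind):
        """Paste time: render the requested format on first use only."""
        if kind not in self._cache:
            self._cache[kind] = self._providers[kind]()
        return self._cache[kind]


render_log = []  # records which formats were actually rendered


def make_renderer(kind, shapes):
    """Build a deferred renderer over a snapshot of the selection."""
    snapshot = list(shapes)

    def render():
        render_log.append(kind)
        return f"{kind} clipping of {len(snapshot)} shapes"

    return render


shapes = ["rect"] * 4000
board = LazyClipboard()
board.declare({kind: make_renderer(kind, shapes)
               for kind in ("PDF", "TIFF", "Sketch")})

# Copy did no rendering; only the format that is pasted gets produced.
print(board.available_types())
print(board.data_for("PDF"))
print(render_log)
```

One design point worth noting: because rendering is deferred, the renderer has to capture the selection at copy time (the `snapshot` above), since the document may change before anyone pastes.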