---
title: WWDC2001 Session 121
framework: wwdc
role: article
path: wwdc/wwdc2001-121
---

# WWDC2001 Session 121

## Transcript

Welcome to session 121. You know, performance is an important consideration for all of us, and at Apple we're certainly doing our part, working hard to improve the performance characteristics of Mac OS X. Many developers have told me that they've seen measurable performance boosts in their Carbon apps running on X, but we know there's always room for improvement. So it's my pleasure to introduce the manager of the Advanced Mac Toolbox team, John Iarocci, to tell you about how you can improve your performance running on Mac OS X. Welcome, John.

[Applause]

Good afternoon. Perhaps you've noticed there's been a little bit of an undercurrent, a theme here throughout the conference, on performance. Basically, a lot of that revolves around a realization: at some point you've carbonized your app, you've taken advantage of some of these latest features, and you're comparing your app on 9 and X and you realize, gee, this part of my app is slow. Why is it slow? There have been other talks that have gone over the high-level description of why it's slow, but basically we've seen time and time again, as more and more people bring Carbon apps to X, that there's this basic assumption that certain code paths or calls will have the same characteristics on OS X as they do on 9.

We looked at all of the APIs going into Carbon. We studied them carefully. We knew right up front that there were several APIs that just would not work well on X for technical reasons. We also knew that there were some APIs that would pose performance problems. We had to compromise between getting all the best APIs into Carbon versus making sure that you could get your apps over to Carbon readily. So the APIs that we knew wouldn't perform very well we still left in Carbon so you could port your apps easily, and we concentrated on providing alternate APIs so that you could actually eke out the
last bits of performance with your app.

Before I actually get into some of the specifics in this session, I just want to mention that this session is coming ahead of the tools talk, session 705, which is later on at 5:00 this afternoon, and this session is going to refer to tools that are described in depth in that talk. Ideally we would have been able to schedule this session after that one. I'm going to make reference to these tools, but I'm not going to go into any depth. Just know which tools are appropriate to the techniques and tips that we're talking about here, and then go and learn about the tools in the follow-up talk. Finally, there's no one answer for performance, and there are going to be all sorts of different performance problems with your apps, so look to other sessions as well, some of which have already happened, for tips on performance in your app.

Okay, so these are the topics that we're going to go through today, and they're pretty much ranked in the way I would prioritize them for you. If you're only going to do one thing with your app regarding performance, I would definitely look into application launch. I'd highly recommend going through the first three: launching, file system usage, and CPU usage. That's where you're really going to get a lot of payback in terms of the time you put in. But all of them have interesting performance benefits, and I would encourage you to really try to get performance into your planning, into your scheduling, and into the way you develop your app.

So let's start with application launch. Perhaps some of you are familiar with the bouncing icon. The interesting thing is that sometimes it bounces quite a lot; sometimes it doesn't stop bouncing. There are pretty good reasons for that. First of all, the bouncing is there as visual feedback to the user that something is actually happening, that a launch is occurring. Some very legitimate reasons why the app may take a long time to
launch are that perhaps the app is actually off on some network volume and the network is sluggish, perhaps it's on a disk that is spun down, or it may be on a CD drive. There are real, legitimate answers to some of these launch performance problems, but those aren't really interesting and they're not really under your control; they're kind of environmental. The ones that we really want to talk about today are the things that you can do something about.

So when talking about app launching, I like to refer to two different kinds of launches, two different environments in which you launch your app: the first one being a cold launch and the second one being a warm launch. A cold launch is your app launching on a bad day. Everything is going against your app. All the files that it has are not readily available or cached in the system. For all the memory that it needs, it has to fight for some of that memory if some other app is using it. This is kind of a worst-case scenario, and it's actually an extreme that you're not really likely to hit as the system is actually being used. But what we do is we mimic this. A simple way to do it is to basically write a tool that allocates all of physical memory and then touches that memory; that will flush out all the memory that's in the system. Then see how your app launches after that. That doesn't quite cover all of it, because there are cases where, even though the memory is no longer cached, you potentially have files that have been cached in the file system and other kernel objects that affect the performance. So it's actually hard to get to your real worst-case scenario.

A warm launch, on the other hand, is basically when things are going well for the app. The libraries that you depend on are all loaded; they've already been instantiated. The best case is basically that an app very much like yours, or another instance of your app, has just launched. The reason I distinguish between the two is because
they're really different in terms of how you optimize, and the biggest point is that the cold launch is dominated by what I would call low-level I/O, and I'll explain that in a little bit.

So my challenge to you is to get your warm launches for your typical app (and this will vary depending on memory, disk, and configuration) to launch in one bounce. There are apps on OS X that launch in one bounce in this situation, and there's plenty of room for improvement for even the apps that we've shipped in the first release of X.

The other thing I would encourage is two measurement techniques that help give you boundaries as to how fast or how slow your app can launch. The first one, what I refer to as the do-nothing app: take your application, and the very first thing you should do, right in your main entry point, is just put in ExitToShell. Leave everything else the same. exit will work as well. What you're trying to do here is launch an app that basically does nothing, but not just any app: your app, with all of the libraries that it depends on. Everything else is the same; it's just that you're not executing any of your initialization code. I think you'll be surprised when you actually measure that with either a stopwatch or any of the tools that we have on the system. For this case I usually use the time command at the command line. It's an interesting data point, because when you first look at it, that's your best case, right? A warm launch when you do absolutely nothing at main: that's as fast as you're going to get. Well, actually that's not quite true, because before main is run there is other code that runs that you actually have some control over. In particular, there are the init routines of the libraries that you pull in, which execute code; there's the init routine of your app itself, which executes code; and the third major category is static initializers for C++. These three areas are things you definitely have to look at because
they contribute to your best-case scenario, where you haven't run any code in the app at all. You may be doing things in your static initializers that you just don't even remember you put there, so make sure you take a look at those. So that's the do-nothing app approach, and that's a good data point to capture.

The other one is more like the best case, just to get an idea of how good things could be. There I would just recommend that you basically have a very high-end system with as much memory as you can put in it. Make sure that absolutely nothing else is running, take it off the net, launch the app once, and launch it again. Measure it. Those are two boundaries that you should keep in mind as you do performance analysis of your app, and what you should be doing as you improve your application launch performance is seeing how you can get those two to converge, essentially.

I'm not talking at length about some more conventional techniques that you are already familiar with. The whole area of perceived performance is something that you might also want to look into. I'm talking about real-time performance, about clock time. There are still definitely advantages to, for example, putting a splash screen up. Ideally you'd want to put your first window up as fast as possible, and if you can't get that down to, you know, a one-second kind of granularity, maybe a splash screen is in order as some feedback.

Oh, the other thing I forgot to mention on the bouncing icon: the bouncing icon starts when you double-click on your app, and it stops when your application is handling events. So if you're doing a whole bunch of other stuff before handling events, you're not able to respond to events, and that's going to tie into how it's perceived that your application launches. It's also going to tie into when the user can actually use your application. And there was one other thing on the bouncing icon: it will time out after some
absurd time, and at that point the user isn't quite sure if the app has actually launched or not.

Okay, so in looking at launch performance, I was able to profile two word processing apps. This is a typical profile of an untuned app. You can see the time is dominated by low-level I/O. By that I mean the virtual memory system paging, and the dynamic loader doing library loading and initializing libraries for the first time. This is largely out of your control; this is something the OS takes care of. But the other two sections are really interesting, those being the file I/O and the CPU time during launch. A typical example of file I/O is when you're actually going and reading preferences, or maybe you're enumerating plug-ins. That's the kind of file I/O I mean. On the CPU side of things, it could be as simple as, you know, maybe you've read your preferences in and now you're sanity-checking them: anything that's typically compute-bound. Now, what's interesting here is that those two, file I/O and CPU time, compete with the lower-level I/O. In the ideal case, we minimize the file I/O and CPU time and we can make much better use of low-level I/O.

The other thing that sometimes shows up in untuned applications is pauses. By that I mean an explicit call to something like Delay, or a sleep call accidentally left in to work around some bug. Those are kind of hard to detect. Usually you have to see that the app is running but it's not doing anything compute-intensive and it's not doing I/O. There are a couple of tools that I'll get into a little bit later that'll help find these kinds of problems.

The other anomaly we see sometimes is writing during launches. There's really no good reason that your application has to write to the file system during a launch. Now, I'm not talking about the first time your app ever launches on that system for that user; it's perfectly okay to go ahead and write out your preferences for the first time. But statistically speaking
your typical launch should not have any file system writes. The reason for that is, first, writes are much more expensive than reads, and the whole launch facility, the low-level I/O, has optimizations for reads. It's basically geared toward reads, and a write right in the middle will interrupt it and essentially discard some of the optimizations.

Here's the profile of a tuned app. Now, both of these are what I explained before as a cold launch. This one, you can see, has a lot more low-level I/O, and that's good, because that's the best case for us: we can optimize that into the largest chunks of I/O that we can do, and we can do them as efficiently as possible. Of course, the file I/O and the CPU time are minimized in this case. If this were a warm launch, all that low-level I/O would go away. You might see some more compute cycles, but the profile is quite different for a warm launch.

Okay, so what does that mean? For both cold and warm launches, you should concentrate on CPU usage and file system usage. Those are the areas that'll pay back the most. The best way of doing that is, first, do only what you need to do. Look at what you're doing in the launch of your application. If it's the typical app, you're probably initializing a whole bunch of stuff that you may or may not use during the life of that app. Look at deferring some of that initialization. This might be a very good use of a Carbon event timer: send yourself a one-shot timer to defer some of this initialization. Or don't even do it when the app is up and running events; do it when the user first uses that feature of your application, particularly if the feature in question is a somewhat optional feature. I'm not saying it's a bad feature or anything; I'm just saying if your typical user base isn't going to use that feature, why pay for it up front in your initialization? Then the second speed-up tip I would have for you is eliminating some of these things that you see during launch,
just outright eliminating writing and pausing of any sort. Another good example of that is dead code. Make sure your tools are working for you in this regard. I'm talking about dead code that might be in there for debug reasons, or for whatever reasons you have: tracing, profiling, things like that. Make sure that doesn't end up in your final product. Just a little bit of code sprinkled around affects your locality. It means that code that should be on the same page would potentially be split up across two pages, and that can make a difference. Then there's another kind of dead code that I would encourage you to go after: the dead code that you've inherited over time. Now is the time to get rid of the kind of code that's checking to see if QuickDraw supports color. There's no need for that anymore. You probably still have the check in your code, and you probably still have the code that supports that check. At least #define that out of your app. And then, of course, redundant I/O, and it shouldn't be understated: redundant I/O is where you can actually get a lot of time back from your launches.

And now I'd like to bring up Nitin Ganatra, who's going to go through some of the details of file system performance and help with that redundant I/O.

[Applause]

Is this on? Okay, it's on now. Good afternoon. So, as John mentioned, redundant I/O, and in fact any kind of file I/O, is a big burden on the clock time of your application. Doing anything to get rid of this file I/O or minimize it at launch will pay off immediately. The payoff ranges from, at least minimally, reducing system call overhead in the best-case scenario where you've got buffer caches that have your data and are already hot, all the way to the case where you're reading something off of a network disk and you stall because it's on a network.

So here are the areas that I'd like to cover. First is file iteration, metadata, volume iteration... well, actually, you
know, you can read those. Let's just get right into it.

First one: the venerable PBGetCatInfo. You're all familiar with this call, I'm sure. When we were creating the Carbon API, it was pretty much no question: we couldn't get rid of PBGetCatInfo. It would have just caused a huge upheaval in people's source bases everywhere, ours included. So there was just no choice. We had to provide it, and it had to be compatible. Unfortunately, we couldn't make it performance-compatible, but that was a secondary concern in the interest of getting your apps onto X quickly. The bad news is that PBGetCatInfo is non-optimal on any file system. That goes for 9 (if you have file sharing on, for example) and for X, and for the most part it's overkill for all clients. PBGetCatInfo just returns a huge amount of data, and the developer code out there that uses PBGetCatInfo ranges from using none of it (in other words, just checking an error code) to maybe using one field from this enormous data structure.

Let's take a look at that data structure. In fact, I couldn't even fit everything that PBGetCatInfo returns to you; it's just an enormous amount of stuff. Back when PBGetCatInfo was first created and exported as a system call, it made perfect sense, because it was a great reflection of the underlying volume format. On HFS disks, the catalog information is stored in one section of the disk, and it tends to be hot in the caches because everyone is using the catalog file, so PBGetCatInfo tends to be free. And, well, on a classic Mac OS system, while you've paid the trap overhead of making the call and getting into the file system and what have you, let's just return everything that we possibly can. And guess what: we did, of course. Everyone uses this call, and it's plenty fast on 9; there aren't any real problems with it. Problems slowly started creeping in with file sharing, and things
got much worse with X.

So, as a graphical example of how PBGetCatInfo works: this is an optimal case right here. This is with file sharing turned off, on an HFS or an AFP disk. In other words, all the data that's given to you in one PBGetCatInfo call is in one contiguous part of the disk, and lo and behold, it fills in the parameter block in one shot. Again, this is optimal.

Let's look at what happens with PBGetCatInfo on other file systems. With Mac OS X we now have the opportunity to support plenty of other file systems than we ever did before, and it turns out that PBGetCatInfo is just not a good reflection of the underlying volume format. In order to get some of the data, we have to go to different parts of the disk, and in fact a lot of those different parts of the disk are completely disjoint. That means that when you make one PBGetCatInfo call on one of these file systems, you're doing numerous I/O operations, and I don't think I have to say that that's bad, especially if this is a network-based disk, and as we move forward more of them will be.

Fortunately, there is a good solution, and it's available on Mac OS 9 and later. The call is FSGetCatalogInfo. The interesting design point of this API is that it takes a bitmap that allows you to specify exactly which fields you want returned to you. I didn't put a slide here with the fields that FSGetCatalogInfo returns, but it's a big honkin' parameter block, like PBGetCatInfo. It does take a bitmap, though, so whatever you're interested in, you can fill in those bits, and FSGetCatalogInfo will do the minimal I/O required to satisfy your request. Now, I can't emphasize enough that you have to pass in just those bits that you're interested in. If you go ahead and fill in all of the bits, if you pass in FFFF for everything, you may as well just use PBGetCatInfo. It's not going to buy you much, and you're going to pay the price like we saw in the previous
slide.

A good way to see exactly what's going on under the covers when you make an FSGetCatalogInfo call is to use fs_usage. This is a tool that's on your systems, and it will be covered in the performance tools talk; I believe it's session 705, which John talked about earlier. It's great to just write a simple little app, call FSGetCatalogInfo with the various bits that you are interested in, and just see what's happening, particularly on these other file systems, you know, something that's not HFS or AFP: NFS or UFS. UFS is probably the most readily available.

So here's a quick little sample: given an FSRef, tell me if this item is a folder. Notice that the only bit that's passed to the FSGetCatalogInfo call is kFSCatInfoNodeFlags, because that's the only bit that we're really interested in. So on an NFS or a UFS file system, that's all we have to worry about, and we can do that in one system call at most; in fact, in a lot of cases we can do it in zero system calls. And then, of course, the field is ANDed and returned.

The next topic is volume iteration. When it came time to create the Carbon API, we were going through and pruning out a lot of areas that we just couldn't support in Carbon on OS X. One of them was the low-memory global to get at the VCB pointer. A lot of Mac OS apps saw this as a free way to get access to all volumes, to enumerate all volumes with zero I/O, and, you know, I don't have to tell you that in-memory copies are very fast and very efficient. However, when we created the Carbon API it was pretty clear that we couldn't support direct VCB access, and our recommendation was, and continues to be, to use one of the get-volume-info type calls. Specifically, in the documentation we mention HGetVInfo. However, the problems that we have with PBGetCatInfo are the same problems we have with PBHGetVInfo: it tends to be very expensive, and it returns a large parameter
block. For most uses, you probably just don't care about a lot of that information. Exactly analogous to FSGetCatalogInfo, there is an FSGetVolumeInfo call. Again, pass the minimal bitmap that you require and we will do the minimal I/O; in a lot of cases it will just be an in-memory copy for us out to your parameter block.

The FSRef APIs are the primary APIs, and the preferred APIs, in Carbon on OS X. In fact, all of the major clients of the File Manager on OS X today (the Finder, Navigation Services, and Cocoa open/save) use the FSRef API and have actually seen huge performance gains in doing so. We've seen performance gains not just on NFS and network-based file systems; in fact, we've seen them even on local HFS disks. So it's definitely at least something worth investigating.

So, on to file I/O. The first thing I'd like to plug here is what some of you probably recognize as a relatively old tech note; I think it was put out in '93 or '92, I'm not really sure: File Manager Performance and Caching. It turns out a lot of the lessons that are taught in that tech note are relevant today, and I highly recommend that you go back and read it. Some of the biggest points in it, of course, are to use large, page-aligned I/O wherever you can. If you're picking through little bits of data in a file, don't push that down to the File Manager or the file system, and don't pay the system call overhead just because you're doing little bitty writes. Do larger, page-aligned I/O, preferably into page-aligned buffers, and you'll get the maximum throughput from the file system, a lot of times without even having to do copies from the kernel buffer if you pass us page-aligned buffers as well.

Again, let me put in another plug for fs_usage. This is a great place to look, and you can actually see exactly where reads are coming through, where writes are going through, and how much data is actually being read or written. This will help you quickly identify where you're spending a lot of
your time doing a lot of small I/Os.

The next point is: don't pollute the cache. This is covered in the performance tech note, and it's something that a lot of clients overlook, and it really shouldn't be, because it tends to pay off in big wins. What I mean by not polluting the cache is this: you, as the app developers, know exactly what your usage pattern is going to be for any bit of data or large chunks of data. You know that if you're streaming in, if you're importing a file, you're just not going to look at it again. Say you're importing from some foreign file format into your own internal representation: you're just not going to go back to that file again to do the read. If you're talking about a multi-megabyte file (or, you know, it doesn't even have to be that big; a file of multiple hundreds of K) and you just do reads without passing the noCacheMask, what you're actually doing is filling the buffer cache with data that you know you're never going to read again. It turns out that by passing the noCacheMask you're not actually hurting yourself; you're not hurting your throughput by doing those fresh reads from these files. But by filling the buffer cache with these blocks that you in fact know you're never going to read, you're evicting other blocks, blocks that the user probably cares about a lot more than you care about this imported file.

The same goes for writes. Say the user selects Save As and you're saving your internal representation out to some file format that you're not going to read again. Again, you know that you're not going to be doing this, so if you pass the noCacheBit to these writes, you're not going to pollute the user's buffer cache, and in fact you're not going to pay any performance penalty by passing the noCacheBit. You're just going to make it an overall better experience.

I think a big reason why this isn't used so often is that it doesn't look like a performance gain. In other words, when you go and change your
code, it's really hard to see the benefit of this. The reads that you are doing are just as fast, the writes that you were doing are just as fast, and if you haven't evicted pages from the buffer cache that you really cared about, you're just not going to notice. But it is still very important, and I strongly encourage you to look hard at where you're doing I/O and what kind of I/O you're doing, and pass the noCacheMask where you can.

Internally, a couple of examples of where we use it: one is when we're doing Finder copies. When the Finder is doing a copy of a folder from A to B, the Finder itself knows it's very likely that it's not going to look at that data again unless the user requests it. So there's no reason to flood the entire buffer cache with these copied blocks. Instead, the user's data can stay intact, and the Finder copy can execute just as quickly as it did before. The other area is in iTunes. When it's encoding a file, or ripping a file from CD and writing it out to disk, iTunes itself knows that the chances are very slim that it's going to go back and read those pages again, so it passes the noCacheMask. It turns out that because of that, the user experience on X is a little bit better, even though it's kind of hard to really quantify that and look at it. You kind of just have to know that it's better and know that you're doing the right thing.

Okay, writing large files. One common technique that has sort of been passed down from generation to generation, when you're writing a large file, is to first do a SetEOF or an FSSetForkSize to the final length of the file that you are writing, and then back up and start filling in the data. This has a couple of advantages, and this is why it's been done over time. First of all, it is a good preflight to let you know whether or not you have space on the disk to actually do the I/O. Second, it's also a good way to reserve a portion of the
disk, hopefully a contiguous portion of the disk, so that when you go back and you're actually doing your writes, you know that you're writing to contiguous parts of the disk, and subsequent reads of that document or that data off the disk will be fast.

The problem is that on OS X, when you extend the file, we zero-fill the entire file from the current EOF out to the very end of where you extended it. The reason we do that, of course, is security. We don't want some malicious program to run on your disk, reserve as much space as possible, and then potentially sniff through it looking for Social Security numbers or credit card numbers or what have you. So this is why we do the zero fill. Of course, it has the downside of producing double I/Os in this very common usage of the File Manager. The double I/Os come from, first, when we do the zero fill of that extended area, and then later on when you actually do your write. If you write a little app on OS X right now that just creates and opens a file and then does a SetEOF of a gig, you'll notice that before that SetEOF returns, your disk will be buzzing away, and when you go and do a subsequent read you'll see that it is all zero-filled. That's exactly what I'm talking about.

We're looking at ways to fix this in the near future, but the truth of it is we've shipped this way; this is already on customers' disks. So it's something that you should probably address now, and fortunately there are a couple of ways that you can address it. You can use the PBAllocate call. This does not have the zero-filling behavior; however, it does allow you to reserve a portion of the disk for a contiguous file. Or the other thing you can do is just start writing. As long as you're not doing SetEOFs followed by a write, you won't get this double I/O. If you're just doing writes past the end of the file, then that's enough of a trigger
to the file system that any subsequent reads are just going to pick up the data that was written, so we don't need to zero-fill, and we don't.

Finally, file assumptions. I couldn't think of a better heading for this slide, so I just put this. Since the beginning of personal computers, we've been able to make some assumptions about the layout of disks, the layout of hardware, usage patterns, and things like that. As we move forward, more and more of those assumptions will prove to be false, or can prove to be false under certain situations. One of these assumptions, of course, is that the user's disk is locally attached to the machine: that all user data is coming off of a local disk, all preferences are coming off of a local disk, and document directories and things like that are on local disks.

Well, networks are getting faster all the time, and right now they are fast enough that in some situations you can actually create an environment where user preferences, user documents, and various other bits of data are stored on a network-backed disk, and it provides lots of benefits. In fact, we have this all set up at Apple right now. You log in at one workstation, let's just call it, with your username and your password, and all of your preferences come up on that one machine. You can use your documents just as you have in the past, just as if you were, you know, in your office, and do whatever you want to. You get all your same preferences. You go back to your office and all of your preferences are updated, because everything is on the network. It's a beautiful thing, this sharing, and as we move forward it's going to be one of those things that a lot more users are going to be exposed to. But it does mean some serious considerations for your applications. In other words, preferences and documents and things like that are no longer going to
be backed by local disk and this can have this is going to have impacts on your code base that you're probably not even aware of just because when you're coding or designing with some assumptions in mind a lot of times you're not even aware you're making those assumptions and you know you'll tend to do things like let's say oh I know that I'm caching or I'm bringing this data down off the network and I need to cache it somewhere let me cache it in the Preferences directory or let me cache it to a temp file in the documents directory well if those directories are backed by a network volume you're really not buying much by caching something off the network to another network volume or this is a lot more common scenario when you launch your app and you're doing tiny little I/Os to the preferences file and this has never been a problem because it's a local disk it's very fast well if it's a network disk that's a big window that you can stall in and your users will definitely notice trust me we've noticed at Apple and we've been working with developers where we can to point out what's going on and help them work through it but the best thing that you can do is try to set up one of these hostile test environments in your own offices and see for yourself one of the best examples of this is to set up an NFS based user directory and log in as that user and just double click your app and see what happens or double-click your app with fs_usage running alongside and see what happens you'll notice that if you're logged in as either a local user or as a network user a lot of times you'll notice a great variation in the performance of your app and a lot of that can be attributed to some of these design decisions the good news is that anything that you fix for the network case will also benefit the local case so if you're working off of maybe slower media or maybe not say you're just working off of fast media you can reduce the number of system calls that you're 
making and speed things up even in your local scenario so it's definitely a good thing to look into doing and check the Mac OS 10 server documentation for more details and with that I'll bring John back up on stage talk about watching the application bounce a lot with those network directory network users when you have your system set up that way we typically see two or three times the number of bounces we do on the local directory it's really something I'd advise you to take Newton's advice on that one okay you've learned all of the details about what you can do to help with your file system performance I'm gonna talk a little bit now about your CPU usage first thing I'd like to say is you're running on a pre-emptive multitasking system but it's not magic it doesn't give you more than 100% of the CPU on a single CPU system it can't give you free cycles as a matter of fact basically the gain is that one single thread on that system is not going to take over the whole system it's not going to bring the system to its knees so if you have a hundred threads that all need to run they're all sitting there have something to do even if it's very little the scheduler has to take them into account that's why we talked about making sure that your threads are blocked the CPU is still a limited resource so make sure when you're using threads or timers cooperative threads that you're taking this into account the best tools on the system to really look at this are top time and CPU Monitor CPU Monitor you've probably seen in some of the demos I would advise just keep that thing running as you're doing development just keep it off maybe on a second monitor it'll show you very easily when there's a little bit of a CPU peak and you can go in and see is that your problem or not typically that is a great indicator for when you have CPU bound problems top is another one that you could run because it shows you a little bit more than just CPU usage both of those I would encourage 
as you know as you're just doing your ongoing development on your app keep them running in some window as clues to the possibility of a performance problem okay so responsiveness this is the next area after launching file system and CPU usage that I would encourage you to look into mostly things to do with responsiveness you should be able to fix up fairly quickly by just taking a quick look at what your app is doing with regards to event handling the biggest indicator that you're not doing event handling right is probably that you're pegging the CPU you've seen this in some demos the best and simplest workaround for that is maybe look around if your app is showing this behavior if it's CPU bound during tracking during you know interaction with your UI take a look use Sampler which is a tool that lets you actually pinpoint where in your code the problem lies search your code for StillDown and Button and look at how you're calling these older calls that we really would rather you get off of TrackMouseLocation is your friend that's what you want to be using that's the basic primitive for letting you do all sorts of tracking in the UI that blocks intelligently so after event handling oh in addition to those tools I would encourage you to look on the developer CD there's an application called Appearance Sample which has almost every widget the toolbox supports every control you've ever seen go and play with that app run CPU Monitor you'll see all of those controls block well that's what your app should do if you're seeing different behavior in your app it's either a problem in that you've done your own kind of handling your own custom control or potentially in the way that you're using the toolbox the next area that contributes to your app's responsiveness you know maybe it feels a little sluggish maybe everything else is looking good your file system performance and your launch is good but when you 
activate windows things don't appear as snappy as they do say on 9 that's probably an indication that you have a drawing problem the best tool for that is Quartz Debug you've probably seen it in some of the other sessions it should be in the performance tool session as well because it'll let you see when you're doing redundant drawing when you're drawing the same things over and over the other typical pitfall that we've seen is people back buffering that is doing their own double buffering for their drawing when the system is already doing that for them so make sure you check the port using QDIsPortBuffered and if it is then you don't have to do that buffering yourself that's being done for you the next area on responsiveness has to do with flushing because you have a back buffer it means there has to be a time when you actually get those bits in the back buffer to the screen generally you should try to avoid flushing the system will do flushing for you basically on event boundaries it'll try to do that as intelligently as possible so you shouldn't have to flush the two exceptions two examples of exceptions are when you're doing some kind of animation you want that to get to the screen right now or when you're not really involved with events at all the splash screen case those are good uses of explicit flushing otherwise let the system do it for you another common performance problem area is with regards to window resizing and this also has to do with the back buffer in the design of the window system on OS 10 and basically our advice there is to try to do this all in one fell swoop with SetWindowBounds instead of trying to use SizeWindow and MoveWindow in combination that's what that call exists for to optimize those cases also the other thing that we've seen with regard to windows is pervasive or heavy use of invisible windows windows are generally more expensive on OS 10 the back buffer in addition the overhead the interaction with the Core Graphics system in 
the window server you may have invisible windows that doesn't mean that they don't cost anything manipulating the invisible windows doesn't come for free as a matter of fact I would look into why you're using invisible windows it is often the case that you can ditch that window dispose it create a new one and redraw faster than you can by twiddling that invisible window and the last one I wasn't going to put on here at all but I figured I would try some of the tools and look at various apps a couple days ago and I just happened to notice that one of the apps that I use every day was doing file I/O when I activated a window and I'm just sitting there scratching my head trying to figure this out and it's just a bug but this is one of those things that you probably wouldn't notice unless you're running a tool that tells you that that's happening fs_usage is perfect for that yeah fs_usage lets you basically see the file I/O that's going on in the whole system or in a particular app you can filter things out another tool if you're really going after file I/O in particular is Sampler which lets you tie the usage pattern back to your code okay polling versus blocking I'm sure you've heard this a lot in various talks I'm going to talk a little bit about some of the more atypical situations in which you find polling affecting performance WaitNextEvent is actually pretty typical but the fallout of using WaitNextEvent zero is where we sometimes see some problems so just to make sure we're all on the same page here you really shouldn't be using WaitNextEvent zero a very simple way to get rid of WaitNextEvent zero is if you have something that you want to do periodically set a Carbon event timer to do it with the frequency that you need and use WaitNextEvent with a very long time now time and tick count that's something that surprised a few of us basically we did some performance profiling and various apps show up a tick 
count is taking up a significant amount of time and one of the reasons is that TickCount costs more than it does on OS nine but it's also used all over the place in a lot of UI and in places where it really doesn't have to be used first of all we're talking about something that's very like coarse-grained right ticks sixtieths of a second calling it more often than 60 times a second doesn't make a lot of sense so the best advice I have for you here is to try to use the event system try to use the time stamps that are in events and look at events like that there's often comparisons made to time now and that can get you out of polling essentially for TickCount and having it show up as being a performance problem another often asked for bit of information is the volume list Newton went through earlier how to do that as efficiently as possible but I think it's a pretty rare case where you actually need the volume list I would suggest you just get rid of that code altogether or figure out what you really need if you're trying to find out about new volumes or if you're trying to find out about volumes that have just been unmounted register yourself for a Carbon event for volume mount and unmount ask the system to tell you about it instead of periodically going out and looking at all the volumes and trying to figure out what happened same kind of thing goes for preference change notifications there is a theme change Apple event there are various new Carbon events that let you know about things actually up on the volume slide I should probably also have another bullet for processes if you're trying to find out what process just got launched or what process died there's a Carbon event for that as well generally we've been really looking at the system to try to find out if there's any legitimate need to do polling the answer should be no we're trying to notify you of everything that you might find of interest it's a much better solution on OS 10 so if you guys 
see things that you still think you have to poll for let us know about it we'll figure out a better way to do it and on this final note maybe some of you have heard this bit in the application packaging and document binding presentation yesterday we need notification too particularly in the parts of the system that present the file system visually so in the Finder in nav services in open and save panels those are showing you the file system objects they are not polling we don't poll so we need you guys to participate in this kind of thing if you're an installer or if you're copying files to a place that is likely to be visible use the FNNotify call FNNotify is a ten only API it's in Files.h it basically says something happened something changed in this directory it takes an FSRef it lets us know that something changed and we should refresh the contents of that directory in any UI elements that care do this intelligently you know if you're copying a whole bunch of files to a single directory let us know when you're done with that copy operation not at every file ok resource manager use the resource manager is very tied to the file manager so in essence I could just repeat what Newton said earlier about the file manager but it's actually worse in that the resource file format was designed way before VM systems were really commercial like they are now and really the file format is not designed for a VM system in that there's a resource map in one part of the file and resource data in the other part of the file and there's no way around going to both places every time you need data you're asking for a resource you have to go look up where the resource is in the file in the resource map that's at least one I/O then you've got to go to where it told you the resource was that's a second I/O okay that's bad enough then you look at what's in the typical resource file and you see lots of little resources that's also bad 
right you already heard how we really like to see very large items those are the ones we optimize we get the best bang for the buck on those so if you have any tools or if you have in the past done any kind of coalescing of your resources if you have resources that for example 'STR ' resources that could be combined into a 'STR#' it's a much better use of the resource manager going and reading eight bytes out of the resource manager is about one of the most expensive kinds of I/O you can do the other thing that we see with regards to resources particularly in the use of plug-ins is enumerating your plug-ins opening up their resource files to find out something about them opening and closing the same files is a pattern that we've seen and we'd really like you to avoid that perhaps what you can do is cache that you know cache the results in one file and open that minimally make sure that you just do that scan once when you actually have to find out about your plug-ins and then for historical reasons just lots of calls to UpdateResFile which writes out the map and the data for your resource forks people just kind of call it willy-nilly it's more like a flush in people's minds and that causes I/O okay the last bullet item on here is something that we added to OS 10 basically a feature in the resource manager to kind of help out with these sets of problems what we did is add a new key to your Info.plist called CSResourcesFileMapped and if you set this it's a boolean key if you set this to true it'll change the behavior of the resource manager with respect to your application's resources what it'll do is it'll open them up read-only okay you can't write to them and it'll file map them and then there's some support in the memory manager to support file mappings so that we don't have to allocate for the data that's in your resource fork which saves on your memory footprint and because it's all file mapped now we get a lot better characteristics of I/O because yes we're still 
hitting that resource map and we're still hitting the data but there's some locality there when we go to the resource map the second time it's likely to be on the same page and when we go for data if you're going through the data of a certain type this will depend on the organization of the data it's likely that we're going to get some good win there the only caveat to this and the reason why we didn't turn this behavior on by default is that it'll break some of your code at the point where you say yes turn this on then all of the resource handles that you get back essentially the pointers in them point to read-only memory if you try to modify that your application is going to crash so there's plenty of folks who've just turned this on and don't write back to the resources right I mean in general particularly the application resource file you probably don't want to write to because it might be living on a CD it might be residing on a network volume that other people are using in general it's bad practice but let's say you were writing things to the resource but not actually flushing them out to file you can still do that by detaching the resource thereby getting an in-memory copy mess with the copy and everything else still works correctly so this is something that you have to turn on yourself and it's fairly straightforward to debug because it basically leads to crashes and if you have any questions about exactly what the Info.plist is I would recommend looking at Technote 2013 ok the next section is memory usage this can be a real big problem on 10 largely because of the big difference between the memory models and that you just have a very large and sparse address space for example just right off the bat you could accidentally allocate you know an order of magnitude off what you intended to allocate and not even know it the system will give it to you memFullErr errors are relatively rare on this system and that can be a problem so in order to keep on top of this I really 
would recommend getting familiar with both the leaks and the MallocDebug tools leaks in particular you want to keep an eye on you may not notice the performance necessarily so much in your app it may be a slow leak over time but it really does affect what's going on underneath the covers in that you know you don't get reuse of the same memory blocks and it'll lead to paging it'll lead to generally bad characteristics so it'll lead to your app generally feeling sluggish aside from leaks I would really recommend that you get a good handle on the size of your application particularly make sure your tools are doing the work that they should for you make sure that things that are actually constant in your application end up in the right section so that the OS takes the maximum advantage of that we went through a lot of the Carbon frameworks early on and got a lot of gains by doing this basically marking strings and other constant sections as constant so that they could show up in a text section that gets shared across the system same thing goes for your app the third thing on memory usage is there's really been a reversal in terms of handles and pointers on OS 9 the handle was really the first-class citizen it was designed to work with that limited application partition that heap and was designed to be reused inside that limited space on 10 the reverse is the case pointers are really the first-class citizens and there's some cost to handles so in performance critical code look at rewriting to use pointers instead of handles we found one case where just removal of HLock and HUnlock in this code path made a big difference in terms of performance the reason there was that the locking costs are significantly higher on 10 than they are on 9 and what work had to be done was really impacted by that and this is something that you should do kind of carefully the OS itself I mean if you're looking into this kind of an optimization the OS 
itself doesn't this doesn't rely or doesn't go and purge and move handles out from under you that's really under your control on 10 and so if you know that you're not resizing the handles somewhere else in your code if you know that you're not looking at the handle to see if it's locked then you're likely able to make this kind of an optimization then lastly I hope it's pretty obvious that there's really no purging the calls are still there they're largely there so you can run the same app on both 9 and 10 but your purge procs are not going to get called the application heap is not going to fill up and if you're relying on basically allocating allocating allocating until you get called in your purge proc that's not going to happen that's the biggest leak you can ever have so take a look at that if you're in that kind of a category okay code loading this is something that I kind of referred to earlier on in launching a bit in that I said something to the effect of defer some of the things that you do at launch time to later on and one of the ways in which you can do this is to factor your app most application code bases start off and are organized basically you know by the people that work on them first so you get you know you get Kelley's feature and Mary's feature and John's feature and they go off and do those different pieces of it and then Newton comes along and he has a new feature to add and pretty soon you've got one little piece of the app whether it's a shared library or a plug-in for each person that's working on it and soon those features grow up and there's a whole team around each of those features and before you know it the organization of your app looks a little bit like the organization of your organization right and that's rarely the best organization in terms of performance you really want to look at the features in terms of what the app really needs you probably want to look at layering and dependencies so factoring your app in terms of 
performance is something that I would advise you to do it's usually not something that you would do quickly it's something you probably converge on over time look at plug-ins for things that are truly optional again I don't mean this is optional in the sense no one would use it an example of this in the real live OS is nav services and printing those are both good categories from the OS's point of view of services that are purely optional in that your application can run fine do lots of good work and never interact with nav services or printing so why should it pay the cost upfront the answer is it shouldn't so look for those kind of opportunities in your app maybe there's a plug-in you know that has all the bells and whistles that you could ever want but you only use it once in a blue moon factor that out so you don't pay any costs for that don't give that plug-in an initial load and say are you happy things like that will cost then finally look at your libraries look at the number of libraries that you're using libraries are backed by files that costs all the way down to the kernel the kernel has a fixed cost there's a per-process cost if your tools support it take libraries and combine them together merge your PEF libraries together into one big file that's infinitely better from the performance and resource use point of view okay in the async I/O space we've seen some problems that are kind of interesting in the combination of async I/O and threading asynchronous I/O on ten by that I mean deferred tasks file manager time manager those are all implemented based on running the operation in question synchronously on a thread that the OS creates for you and particularly when used with cooperative threads or when the two are combined there's additional costs there in general this is not performing as well as the equivalent on nine this is a case where I would recommend continuing to do async I/O and chain completion routines on nine 
factoring your app and dynamically checking and doing something entirely different on ten the simplest workaround or the simplest solution on ten is really just to use threading in a synchronous I/O model and then just a data point there was an app or two that we've seen that implement threading packages on top of the time manager the time manager is itself implemented as a thread which means you're threading on top of a thread that's scheduled by the kernel which is you're trying to run threads on top of something that's already being managed by something else not a good performance solution in the context of cooperative threads there's one basic flaw with cooperative threads and that is that in order for cooperative threads to get scheduled you have to yield and there's no blocking going on so by their very nature cooperative threads are compute bound that's the biggest problem they're still there because we know you have code that depends on it I would really look at not using cooperative threads or potentially using timers Carbon event timers instead or moving your code off to MP threads often there's a performance problem with regards to messaging between threads this usually has to do with polling to see if some message based thing is complete versus true messaging and there I would just encourage if you're doing things with multiple threads even across processes make sure that you're not getting in this situation where both of the threads are competing the case in the data point that I have was basically one thread was doing a lot of file I/O and was reading and writing to a file and the other thread was basically looking to see if it was done and the cleaner solution to that and a better performing solution is basically have the second thread just block and when the file I/O thread is completed just send it a Carbon event to get that whole thing to work well and then finally we've seen 
situations where basically people just go thread happy they have just way too many threads for no real apparent reason and just bear in mind that each one of those threads has a real cost there's a wired memory cost in the kernel and they're not free so use them diligently and then finally threading in general can be used to really help out with performance I would say particularly look for things like when you're trying to do a safe save or a fast save kind of feature that's a very good use of a thread you can create the thread do the work on it and dispose it a network listener is another model that works really well where the thread is just basically listening for incoming activity and occasionally there's a good use for threading when you're doing low priority idle kinds of computation maybe you're indexing something in the background something like that okay so finally the summary I really want to encourage you to factor performance into your planning try to really make it be a feature of your application we want that killer app to be that much better by performing well performance isn't a one-shot deal you really have to keep it in your workflow you've got to keep on top of it ideally you would you know with different builds of your app try to capture data about how it performs and see and pinpoint where performance problems were introduced and then I really encourage you to get into the tools the tools talk is later on this afternoon and all those tools are on the system you should just become experts at those tools they allow you to look at your app in various different ways and they're really helpful in pinpointing these problems and lastly just go after those performance problems all right so now let's see oh yeah one last thing so the first one the Carbon developer documentation you should just generally know the second one if you're not ready to do anything with performance at all you're stuck behind a whole 
bunch of a couple months worth of features on your app you're still carbonizing anything like that at the very least remember the second URL up here that performance PDF file has a lot of information on performance a lot of what I've gone over and what we'll be going over in other sessions is in that one document okay I'd like to bring mark up and then we can help do the roadmap and then we'll head off to Q&A thank you John there we go that won't work okay so as John mentioned we'd like you all to attend the performance tool session at 5:00 today if you possibly can we're gonna talk about the various tools that he introduced you to and also because we don't have a lot of time right now to take questions I ask you to take your questions to that session and we'll have some of the same people there to answer them but we'll just take a couple right now so if we can bring up our Q&A panel we can do maybe two or three questions oh by the way this is me so if during or after the conference you have any questions or comments about carbon or carbon performance send them to me at this address you
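
Editorial sketch (not part of the session): the file system portion of the talk warns against doing many tiny I/Os to a preferences file, because each one can stall when the user's home directory lives on a network volume. The portable C below is one way to picture the alternative, assuming a plain text preferences file with one `key=value` setting per line: read the file with a single large request, then do all the parsing in memory. The names `slurp_stream`, `count_settings` and `demo_count_settings` are hypothetical helpers invented for this illustration, and standard C `stdio` stands in for the Carbon File Manager calls a real Carbon application would use.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read an entire stream into one heap buffer using a
 * single fread request. Returns a NUL-terminated buffer the caller must
 * free, or NULL on failure. */
char *slurp_stream(FILE *fp, size_t *out_len) {
    assert(fp != NULL);
    if (fseek(fp, 0, SEEK_END) != 0) return NULL;
    long size = ftell(fp);
    if (size < 0) return NULL;
    rewind(fp);

    char *buf = malloc((size_t)size + 1);
    if (buf == NULL) return NULL;

    size_t got = fread(buf, 1, (size_t)size, fp);  /* one large read */
    buf[got] = '\0';
    if (out_len != NULL) *out_len = got;
    return buf;
}

/* Hypothetical helper: count the lines that look like "key=value".
 * Works purely on the in-memory buffer, so it performs no file I/O. */
size_t count_settings(const char *buf) {
    size_t count = 0;
    const char *line = buf;
    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        if (memchr(line, '=', len) != NULL) count++;
        line += len;
        if (*line == '\n') line++;  /* step over the newline, if any */
    }
    return count;
}

/* Demo: write a small made-up preferences file to a temporary stream,
 * read it back in one go, and count its settings. */
size_t demo_count_settings(void) {
    FILE *fp = tmpfile();
    if (fp == NULL) return 0;
    fputs("volume=7\ntheme=aqua\nrecent=3\n", fp);

    size_t len = 0;
    char *buf = slurp_stream(fp, &len);
    fclose(fp);
    if (buf == NULL) return 0;

    size_t n = count_settings(buf);
    free(buf);
    return n;
}
```

Calling `demo_count_settings()` exercises the whole path. Running the application under fs_usage, as the session recommends, is the way to confirm how many file system calls a real launch actually makes, since buffering layers can hide or merge individual reads.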
