WWDC2001 Session 121
Transcript
Kind: captions
Language: en
Welcome to Session 121. You know, performance is an important consideration for all of us, and at Apple we're certainly doing our part, working hard to improve the performance characteristics of Mac OS X. Many developers have told me that they've seen measurable performance boosts in their Carbon apps running on 10, but we know there's always room for improvement, and so it's my pleasure to introduce the manager of the Advanced Mac Toolbox team,
John Iarocci, to tell you about how you can improve your performance running on Mac OS X. Welcome, John.
[Applause]
Good afternoon. Perhaps you've noticed there's been a little bit of an undercurrent, a theme, here throughout the conference on performance, and basically a lot of that revolves around a realization: at some point you've carbonized your app, you've taken advantage of some of these latest features, and you're comparing your app on 9 and 10, and you realize, gee, this part of my app is slow. Why is it slow? There have been other talks that have kind of gone over, at a high level, a description of why it's slow, but basically we've seen time and time again, as more and more people bring Carbon apps to 10, that there's
this basic assumption that certain code paths or calls will have the same performance characteristics on OS X as they do on 9, and they may not. We looked at all of the APIs going into Carbon, we studied them carefully. We knew right up front that there were several APIs that just would not work well on 10, for technical reasons. We also knew that there were some APIs that would pose performance problems. We kind of had to compromise between getting all the best APIs into Carbon versus making sure that you could get your apps over to Carbon readily, and so the APIs that we knew wouldn't perform very well, we still left them in Carbon so you could port your apps easily, and we concentrated on providing alternate APIs so that you could actually eke out the last bits of performance with your app.
Before I actually get into some of the specifics in this session, I just want to mention that this session is coming ahead of the tools talk, 705, which is later on, at 5 o'clock this afternoon, and this session is going to refer to tools that are described in depth in that talk. Ideally we would have been able to schedule the sessions the other way around, with that one coming after this one, but I'm going to make reference to these tools without going into any depth. Just know which tools are appropriate to certain techniques and tips that we're talking about here, and then go and learn about the tools in the follow-up talk. Then finally, there's no one answer for performance, and there are going to be all sorts of different performance problems with your apps, so look to other sessions as well, some of which have already happened, for tips on performance in your app.
OK, so these are the topics that we're going to go through today, and they're pretty much ranked in terms of the way I would prioritize them for you. If you're only going to do one thing with your app regarding performance, I would definitely look into application launch. I'd highly recommend the first three, going through launching, file system usage, and CPU usage; that's where you're really going to get a lot of payback in terms of the time you put in. But all of them have interesting performance benefits, and I would really encourage you to try to get performance into your planning, into your scheduling, and into the way you develop your app.
So let's start with application launch. Perhaps some of you are familiar with the bouncing icon. The interesting thing is that sometimes it bounces quite a lot; sometimes it doesn't stop bouncing. There are pretty good reasons for that. First of all, the bouncing is there as visual feedback to the user that something is actually happening, that a launch is occurring, right? Some very legitimate reasons why the app may take a long time to launch: perhaps the app is actually off on some network volume and the network is sluggish, perhaps it's on a disk that has spun down, or it may be on a CD drive. There are real legitimate answers to some of these launching performance problems, but those aren't really interesting, and they're not really under your control, right? They're kind of environmental. The ones that we really want to talk about today are the things that you can do something about.

So when talking about app launching, I like to refer to two different kinds of launches, two different environments in which you launch your app: the first one being a cold launch and the second one being a warm launch. A cold launch is your app launching on a bad day. Everything is going against your app: all the files that it has are not readily available or cached in the system, and all the memory that it needs it has to fight for, because some other app is using it. This is kind of a worst-case scenario, and it's actually an extreme that you're not really likely to hit as the system is actually being used, but what we do is we mimic this. A simple way to do it is to basically write a tool that allocates all of physical memory and then touches that memory; that'll flush out all the memory that's in the system, and then you see how your app launches after that.
That doesn't quite cover all of it, because there are cases where, even though the memory is no longer cached, you potentially have files that have been cached in the file system, and other kernel objects that affect the performance, so it's actually hard to get to your real worst-case scenario. A warm launch, on the other hand, is basically things going well for the app: the libraries that you depend on are all loaded, they've already been instantiated. The best case is basically that an app very much like yours, or another instance of your app, has just launched. The reason I distinguish between the two is because they're really different in terms of how you optimize, and the biggest point is that the cold launch is dominated by what I would call low-level I/O, and I'll explain that in a little bit. So my challenge to you is to get your warm launches for your typical app, and this will vary depending on memory and disk configuration, to launch in one bounce. There are apps on OS X that launch in one bounce in this situation, and there's plenty of room for improvement even for the apps that we've shipped in the first release of 10.

The other thing I would encourage is two measurement techniques that help constrain, or help give you boundaries, as to how fast or how slow your app can launch.
The first one, what I refer to as the do-nothing app, is: take your application, and the very first thing you should do, right at your main entry point, is just put in ExitToShell. Leave everything else the same; exit will work as well. What you're trying to do here is launch an app that basically does nothing, but not just any app: your app, with all of the libraries that it depends on. Everything else is the same; you're just not executing any code or initialization. I think you'll be surprised when you actually measure that, with either a stopwatch or any of the tools that we have on the system. For this case I usually use the time command at the command line.
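As a minimal sketch of that do-nothing measurement (everything here except ExitToShell is placeholder):

```c
#include <Carbon/Carbon.h>

// "Do-nothing app" measurement: bail out at the very top of main, so the
// only cost left is library loading, init routines, and static initializers.
int main(int argc, char *argv[])
{
    ExitToShell();   // or simply: return 0;

    /* ...none of your normal startup below this point ever runs... */
    return 0;
}
```

You can then time it from the command line with something like `time ./MyApp.app/Contents/MacOS/MyApp` (the path is hypothetical, for illustration); the wall-clock figure is roughly the floor for a warm launch of your app.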
It's an interesting data point, because when you first look at it, that's your best case, right, for a warm launch: when you do absolutely nothing at main, that's as fast as you're going to get. Well, actually, that's not quite true, because before main is run there is other code that runs that you actually have some control over. In particular there are the init routines of the libraries that you pull in, which execute code; there's the init routine of your app itself, which executes code; and the third major category is static initializers for C++. These three areas are things you definitely have to look at, because they contribute to your best-case scenario. You haven't run any code in the app at all, so you may be doing things in your static initializers that you just don't even remember you put there, right? So make sure you take a look at those.

So that's the do-nothing app approach, and that's a good data point to capture. The other one is more like the best case, just to kind of get an idea of how good things could be, and there I would just recommend that you have a very high-end system with as much memory as you can put in it, make sure that absolutely nothing else is running, take it off the net, launch the app once, then launch it again and measure it. Those are two boundaries that you should keep in mind as you do performance analysis of your app, and what you should be doing, as you improve your application launch performance, is see how you can get those two to converge, essentially.
I'm not actually bringing up or talking at length about some more conventional techniques that you are already familiar with. The whole area of perceived performance is something that you might also want to look into; I'm talking here about real-time performance, about clock time. There are still definitely advantages to, for example, putting a splash screen up. Ideally you'd want to put your first window up as fast as possible, and if you can't get that to be at, you know, a one-second kind of granularity, maybe a splash screen is in order, some feedback.

Oh, the other thing I forgot to mention on the bouncing icon: the bouncing starts when you double-click on your app, and it stops when your application is handling events. So if you're doing a whole bunch of other stuff before handling events, you're not able to respond to events. That's going to tie into this; it's going to tie into how the launch of your application is perceived, and it also ties into when the user can actually use your application. And then there was one other thing on the bouncing icon: it will time out after some absurd time, and at that point the user isn't quite sure if the app has actually launched or not.
OK, so in looking at launch performance, I was able to profile two word processing apps. This is a typical profile of an untuned app. You can see the time is dominated by low-level I/O; by that I mean the virtual memory system paging, the dynamic loader doing library loading, initializing libraries for the first time. This is largely out of your control; this is something the OS takes care of. But the other two sections are really interesting, those being the file I/O and the CPU time during launch. For file I/O, a typical example is when you're going and reading preferences, or maybe you're enumerating plugins; that's the kind of file I/O I mean. On the CPU side of things, it could be as simple as, say, you've read your preferences in and now you're sanity-checking them; anything that's typically compute bound. Now, what's interesting here is that those two, file I/O and CPU time, compete with the low-level I/O. In the ideal case we minimize the file I/O and CPU time, and we can make much better use of the low-level I/O.

The other thing that sometimes shows up in untuned applications is pauses. By that I mean either an explicit call to a call like Delay, or a sleep call accidentally left in to work around some bug. Those are kind of hard to detect; usually you have to see that the app is running but it's not doing anything compute-intensive and it's not doing I/O. There are a couple of tools that I'll get into a little bit later that'll help find these kinds of problems.
The other anomaly we see sometimes is writing during launch. There's really no good reason that your application has to write to the file system during a launch. Now, I'm not talking about the first time your app ever launches on that system for that user; it's perfectly OK to go ahead and write out your preferences for the first time. But statistically speaking, your typical launch should not have any file system writes. The reason for that is, first, writes are much more expensive than reads, and the whole launch facility, the low-level I/O, has optimizations for reads; it's basically geared at reads, and a write right in the middle will interrupt it and essentially discard some of those optimizations.

Here's the profile of a tuned app. Now, both of these are what I explained before as a cold launch. This one you can see has a lot more low-level I/O, and that's good, because that's the best case for us: we can optimize that into the largest chunks of I/O that we can do, and we can do them as efficiently as possible. And of course the file I/O and the CPU are minimized in this case. If this were a warm launch, all that low-level I/O would go away; you might see some more compute cycles, but the profile is quite different for a warm launch.
OK, so what does that mean? For both cold and warm launches, you should concentrate on CPU usage and file system usage; those are the areas that will pay back the most. The best way of doing that is, first, do only what you need to do. Look at what you're doing in the launch of your application; if it's the typical app, you're probably initializing a whole bunch of stuff that you may or may not use during the life of that app, right? Look at deferring some of that initialization. This might be a very good use of setting up a Carbon event timer, to send yourself a one-shot timer to defer some of this initialization. Or don't even do it when the app is up and running and handling events; do it when the user first uses that feature of your application, particularly if the feature in question is a somewhat optional feature. I'm not saying it's a bad feature or anything; I'm just saying, if your typical user base isn't going to use that feature, why pay for it, why pay for it up front, in your initialization?
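As a rough sketch, assuming a hypothetical DoDeferredInit() that holds whatever work you're postponing, a one-shot Carbon event timer might look like this:

```c
#include <Carbon/Carbon.h>

// Hypothetical deferred-initialization work; fill in with whatever your app
// doesn't need before it starts handling events.
static void DoDeferredInit(void);

static void DeferredInitTimer(EventLoopTimerRef inTimer, void *inUserData)
{
    DoDeferredInit();
    // An interval of 0 below makes this a one-shot timer, so there is
    // nothing to remove here.
}

static OSStatus ScheduleDeferredInit(void)
{
    static EventLoopTimerUPP sTimerUPP = NULL;
    EventLoopTimerRef        timer;

    if (sTimerUPP == NULL)
        sTimerUPP = NewEventLoopTimerUPP(DeferredInitTimer);

    // Fire once, a couple of seconds after the app is up and pumping events.
    return InstallEventLoopTimer(GetMainEventLoop(),
                                 2 * kEventDurationSecond,  /* first fire */
                                 0,                          /* no repeat  */
                                 sTimerUPP, NULL, &timer);
}
```

The idea is that you call ScheduleDeferredInit from your startup path, go straight to the event loop, and the deferred work runs after the launch is already perceived as complete.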
Then the second speed-up tip I would have for you is eliminating some of these things that you see during launch, just outright eliminating writing and pausing of some sort. A good example of that is dead code; make sure your tools are working for you in this regard. I'm talking about dead code that might be in there for debug reasons, or in there for whatever reasons you have, for tracing, profiling, things like that. Make sure that doesn't end up in your final product. Just a little bit of code sprinkled around affects your locality; it means that code that should be on the same page could potentially be split up across two pages, and that can make a difference. Then there's another kind of dead code that I would encourage you to go after, and that's the dead code that you've inherited over time. Now is the time to get rid of the kind of code that's checking to see if QuickDraw supports color; there's no need for that anymore. You probably still have the check in your code, you probably still have the code base that supports that check; at the very least, conditionally compile that out of your app. And then of course there's redundant I/O, and that shouldn't be understated: redundant I/O is where you can actually get a lot of time back from your launches. And now I'd like to bring up Nitin Ganatra, who's going to go through some of the details of file system performance and help with that redundant I/O.
[Applause]
Is this on? OK, it's on now. Good afternoon. So as John mentioned, redundant I/O, and in fact any kind of file I/O, is a big burden on the clock time of your application. Doing anything to get rid of this file I/O, or minimize it at launch, will pay off immediately, and it can pay off anywhere from minimally, in reducing system call overhead in the best-case scenario where the buffer caches already have your data and they're hot, all the way to the case where you're reading something off of a network disk and you stall because it's on a network. So here are the areas that I'd like to cover: file iteration, metadata, volume iteration; well, actually, you can read those, so let's just get right into it.

First one: the venerable PBGetCatInfo. You're all familiar with this call, I'm sure, and when we were creating the Carbon API it was pretty much no question: we couldn't get rid of PBGetCatInfo. It would have just caused a huge upheaval in people's source bases everywhere, ours included, and so there was just no choice; we had to provide it, and it had to be compatible. Unfortunately we couldn't make it performance-compatible, but that was a secondary concern in the interest of getting your apps onto 10 quickly. The bad news is that PBGetCatInfo is non-optimal on any file system; that goes for 9 and for 10, if you have file sharing on on 9, for example. And for the most part it's overkill for all clients: PBGetCatInfo returns a huge amount of data, and the developer code out there that uses PBGetCatInfo ranges from using none of it, in other words just checking an error code, to maybe one field from this enormous data structure. Let's take a look at that data structure.
in fact I couldn't even fit everything
that get cat info returns to you it's
just an enormous amount of stuff and you
know when it came time to or back when
PB get cat info was first created and
exported as a system call it made
perfect sense because it was a it was a
great reflection of the underlying
volume format
right on HFS discs the catalog
information is stored in one section of
the disk it tends to be hot in the
caches because everyone is using the
catalog files so PB get cat info tends
to be free and well while us while
you've paid the trap overhead you know
on a classic Mac OS system while you've
paid the trap overhead of making the
call and getting into the file system
and what-have-you
let's just return back everything that
we possibly can and guess what we did of
course and and everyone uses this call
it's plenty fast on nine there aren't
any real problems with it problem slowly
started creeping in with file sharing
again and things got much worse with ten
sort of as Nixon as a graphical example
of how people get CAD info works this is
on an optimal this is an optimal case
right here this is what file sharing
turned off this is you know on an HF S
or an AFP disk in other words all the
data that's given to you in one PB get
cat info call is in one contiguous part
of the disk and lo and behold it fills
in the parabola in one shot
again this is optimal let's look at what
happens with PB get cat info on other
file systems with Mac OS 10 now we have
the opportunity to support plenty of
other file systems then than we ever did
before
and it turns out that PB get cat info is
just not a good reflection of the
underlying volume format in order to get
some data we have to go to parts of the
disk different parts of the disk and in
fact a lot of those different parts of
the disk are completely disjoint which
means you make one PB get cat info call
on one of these file systems you're
doing numerous IO operations and I don't
think I have to say that that's bad
potentially if this is a network-based
disk
And as we move forward, more of them will be. Fortunately, there is a good solution, and it's available on Mac OS 9 and later: the call is FSGetCatalogInfo. The interesting design point of this API is that it takes a bitmap that allows you to specify exactly what fields you want returned to you. Again, with FSGetCatalogInfo, I didn't put a slide here with the fields that it returns, but it's a big honking parameter block like PBGetCatInfo's; it does, however, take a bitmap, and so whatever you're interested in, you fill in those bits, and FSGetCatalogInfo will do the minimal I/O required to satisfy your request.

Now, I can't emphasize enough that you have to pass in just those bits that you're interested in. If you go ahead and fill in all of the bits, if you pass in FFFF for everything, you may as well just use PBGetCatInfo: it's not going to buy you much, and you're going to pay the price like we saw in the previous slide. A good way to see exactly what's going on when you make a request, an FSGetCatalogInfo call, to see what's going on under the covers, is to use fs_usage. This is a tool that's on your systems, and it will be covered in the performance tools talk, I believe it's session 705, that John talked about earlier. It's great to actually just write a simple little app, call FSGetCatalogInfo with the various bits that you are interested in, and just see what's happening, particularly on these other file systems, something that's not HFS or AFP: NFS or UFS, and UFS is probably the most readily available.
So here's a quick little sample: given an FSRef, tell me if this item is a folder. Notice that the only bit that's passed to the FSGetCatalogInfo call is kFSCatInfoNodeFlags, because that's the only bit that we're really interested in. So on an NFS or a UFS file system, that's all we have to worry about, and we can do that in one system call at most; in fact, in a lot of cases we can do it in zero, and then of course the field is examined on return.
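The sample itself isn't in the transcript, but a minimal sketch of that folder check might look like this (the function name is mine):

```c
#include <Carbon/Carbon.h>

// Ask only for the node flags; FSGetCatalogInfo then does the minimal I/O
// needed to answer "is this a directory?" on any file system.
static Boolean ItemIsFolder(const FSRef *item)
{
    FSCatalogInfo info;
    OSErr err = FSGetCatalogInfo(item, kFSCatInfoNodeFlags, &info,
                                 NULL /* name */, NULL /* FSSpec */,
                                 NULL /* parent */);
    if (err != noErr)
        return false;

    return (info.nodeFlags & kFSNodeIsDirectoryMask) != 0;
}
```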
The next topic is volume iteration. When it came time to create the Carbon API, we were going through and pruning out a lot of areas that we just couldn't support in Carbon on OS X, and one of them was the low-memory global that got at the VCB pointer. A lot of Carbon apps, or a lot of Mac OS apps, saw this as a free way to get access to all volumes, to enumerate all volumes with zero I/O, and you know, I don't have to tell you, in-memory copies are very fast, very efficient. However, when we created the Carbon API it was pretty clear that we couldn't support direct VCB access, and our recommendation was, and continues to be, to use one of the GetVolumeInfo-type calls; specifically, in the documentation we mentioned PBHGetVInfo. However, the problems that we have with PBGetCatInfo are the same problems we have with PBHGetVInfo: it tends to be very expensive, it returns a large parameter block, and for most uses you probably just don't care about a lot of that information. Exactly analogous to FSGetCatalogInfo, there is an FSGetVolumeInfo call; again, pass in the minimal bitmap that you require and we will do the minimal I/O. In a lot of cases it will just be an in-memory copy from us out to your parameter block.
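For instance, a sketch of iterating all mounted volumes while asking only for their size information, under the assumption that kFSVolInfoSizes covers the fields you care about, might be:

```c
#include <Carbon/Carbon.h>
#include <stdio.h>

// Walk every mounted volume by index, requesting only the size information.
// Passing a minimal bitmap keeps FSGetVolumeInfo down to little or no I/O.
static void LogVolumeSizes(void)
{
    ItemCount index;

    for (index = 1; ; index++) {
        FSVolumeInfo   info;
        FSVolumeRefNum refNum;
        OSErr          err;

        err = FSGetVolumeInfo(kFSInvalidVolumeRefNum, index, &refNum,
                              kFSVolInfoSizes, &info,
                              NULL /* name */, NULL /* root FSRef */);
        if (err == nsvErr)
            break;          // no more volumes
        if (err != noErr)
            continue;

        printf("volume %d: %llu of %llu bytes free\n",
               (int) refNum,
               (unsigned long long) info.freeBytes,
               (unsigned long long) info.totalBytes);
    }
}
```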
The FSRef APIs are the primary APIs, and the preferred APIs, in Carbon on OS X. In fact, all of the major clients of the File Manager on OS X today, the Finder, Navigation Services, and Cocoa open/save, all use the FSRef APIs and have actually seen huge performance gains in doing so. We've seen performance gains not just on NFS and network-based file systems, but in fact we've seen them even on local HFS disks, so it's definitely at least something worth investigating.
So, on to file I/O. The first thing I'd like to plug here is what some of you probably recognize as a relatively old tech note, I think it was put out in '93 or '92, I'm not really sure: File Manager Performance and Caching. It turns out a lot of the lessons taught in that tech note are relevant today, and I highly recommend that you go back and read it. Some of the biggest points in it, of course, are to use large, page-aligned I/O wherever you can. If you're picking through little bits of data in a file, don't push that down to the File Manager or the file system, and don't pay the system call overhead just because you're doing little bitty reads and writes. Do larger, page-aligned I/O, preferably into page-aligned buffers, and you'll get the maximum throughput from the file system, a lot of times without even having to do copies from the kernel buffer, if you pass us page-aligned buffers as well. Again, let me put in another plug for fs_usage: this is a great place to look, and you can actually see exactly where reads are coming through, where writes are going through, and how much data is actually being read per read or write. This will help you quickly identify where you're spending a lot of your time doing a lot of small I/Os.
The next point is: don't pollute the cache. This is covered in the performance tech note, and it's something that a lot of clients overlook, and it really shouldn't be, because it tends to pay off in big wins. What I mean by not polluting the cache is this: you, as the app developer, know exactly what your usage pattern is going to be for any bit of data, or large chunks of data. You know that if you're streaming in, if you're importing a file that you're just not going to look at again, say you're importing from some foreign file format into your own internal representation, you're just not going to go back to that file again to do the read. And if you're talking about a multi-megabyte file, or, you know, it doesn't even have to be that big, a multiple-hundreds-of-K file, if you just do reads without passing the no-cache mask, what you're actually doing is filling the buffer cache with data that you know you're never going to read.

It turns out that by passing the noCacheMask you're not actually hurting yourself; you're not hurting your throughput by doing those fresh reads from these files. But by filling the buffer cache with blocks that you in fact know you're never going to read, you're evicting other blocks, blocks that the user probably cares about a lot more than they care about this imported file. And the same goes for writes: say the user selects Save As, and you're saving your internal representation out to some file format that you're not going to read again. Again, you know that you're not going to be doing this, so if you pass the no-cache bit on these writes, you're not going to pollute the user's buffer cache, and in fact you're not going to pay any performance penalty by passing the no-cache bit; you're just going to make it an overall better experience.
I think a big reason why this isn't used so often is because it doesn't look like a performance gain. In other words, when you go and change your code, it's really hard to see the benefit of this: the reads that you are doing are just as fast, the writes that you were doing are just as fast, and if you haven't evicted pages from the buffer cache that you really cared about, you're just not going to notice. But it is still very important, and I strongly encourage you to look hard at where you're doing I/O and what kind of I/O you're doing, and pass the noCacheMask where you can.

Internally, just a couple of examples of where we use it: one is when we're doing Finder copies. When the Finder is doing a copy of a folder from A to B, the Finder itself knows it's never going to, or it's very likely that it's not going to, look at that data again unless the user requests it, so there's no reason to flood the entire buffer cache with these copied blocks; instead the user's data can stay intact, and the Finder copy can execute just as quickly as it did before. The other area is in iTunes: when it's encoding a file, or ripping a file from CD and writing it out to disk, iTunes itself knows that the chances are very slim that it's going to go back and read those pages again, so it passes the noCacheMask, and it turns out that because of that, the user experience on 10 is a little bit better, even though it's kind of hard to really quantify that and look at it; you kind of just have to know that it's better and know that you're doing the right thing.
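As a rough sketch, assuming a file already opened with FSOpenFork, adding noCacheMask to the position mode on a read you won't revisit might look like this (the same mask works for FSWriteFork on a Save As you won't read back):

```c
#include <Carbon/Carbon.h>

// Read a chunk you know you will not touch again (an import, say) and ask
// the file system not to keep those blocks in the buffer cache afterwards.
static OSErr ReadImportChunk(SInt16 forkRefNum, SInt64 offset,
                             void *buffer, ByteCount length)
{
    ByteCount actual = 0;

    // noCacheMask is combined with the position mode; throughput stays the
    // same, but the blocks won't evict data the user still cares about.
    return FSReadFork(forkRefNum,
                      fsFromStart | noCacheMask,
                      offset, length, buffer, &actual);
}
```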
OK, writing large files. One common technique that has sort of been passed down from generation to generation is, when you're writing a large file, to first do a SetEOF or an FSSetForkSize to the final length of the file that you are actually writing, and then back up and start filling in the data. This has a couple of advantages, and this is why it's been done over time. First of all, it is a good preflight, letting you know whether or not you have space on the disk to actually do the I/O, and second, it's also a good way to reserve a portion of the disk, a hopefully contiguous portion of the disk, so that when you go back and you're actually doing your writes, you know that you're writing to contiguous parts of the disk, and subsequent reads of that document or that data off the disk will be fast.

The problem is, on OS X, for security purposes, when you extend the file, what we do is zero-fill the entire file from the current EOF out to the very end where you extended it. The reason we do that, of course, is security: we don't want some malicious program to run on your disk, reserve as much space as possible, and then potentially sniff through it looking for Social Security numbers or credit card numbers or what have you. So this is why we do the zero fill. Of course, it has the downside of producing double I/Os in this very common usage of the File Manager: the double I/Os come first when we do the zero fill of that extended area, and then later on when you actually do your write. If you write a little app on OS X right now that all it does is create and open a file and then do a SetEOF of a gig, you'll notice that before that SetEOF returns, your disk will be buzzing away, and when you go and do a subsequent read you'll see that it is all zero-filled, and that's exactly what I'm talking about.
we're looking at ways to fix this in the
near future but the truth of it is we've
shipped this way this is already on
customers disks so it's something that
you should probably address now and
fortunately there are a couple of ways
that you can address it you can use the
pb allocate call this does not have the
zero filling behavior however it does
preserve it does allow you to reserve a
portion of the disk for a contiguous for
a contiguous file on the disk or the
other thing you can do is just write
just start writing if you're not doing
as long as you're not doing set e of's
followed by a right you won't get this
double i/o if you're just doing writes
pass the end of the file then that's
enough of a trigger to the filesystem
that your that any subsequent reads are
just going to pick up that data that was
written so we don't need to zero fill
and we don't
Finally, file assumptions. I couldn't think of a better heading for this slide, so I just put this. Since the beginning of personal computers we've been able to make some assumptions about the layout of disks, the layout of hardware, usage patterns, and things like that, and as we move forward, more and more of those assumptions will prove to be false, or can prove to be false under certain situations. One of these assumptions, of course, is that your disk, or the user's disk, is locally attached to the machine: that all user data is coming off of a local disk, all preferences are coming off of a local disk, and in fact document directories and things like that are on local disks. Well, networks are getting faster all the time, and pretty much right now they are fast enough that in some situations you can actually create an environment where user preferences, user documents, and various other bits of data are stored on a network-backed disk, and it provides lots of benefits. In fact, we have this all set up at Apple right now, where users can log in to their machines. You log in on one workstation, let's just call it, you log in with your username and your password, all of your preferences come up on that one machine, you can use your documents just as you have in the past, just as if you were, you know, maybe in your office, do whatever you want, you get all your same preferences; you go back to your office, and all of your preferences are updated, because everything is on the network. It's a beautiful thing, this sharing, and as we move forward it's going to be one of those things that a lot more users are going to be exposed to. But it does mean some serious considerations for your applications. In other words, preferences and documents and things like that are no longer going to be backed by local disk, and this can have impacts on your code base that you're probably not even aware of,
just because when you're coding or designing with some assumptions in mind, a lot of the time you're not even aware you're making those assumptions. You'll tend to do things like, let's say, oh, I know that I'm bringing this data down off the network and I need to cache it somewhere; let me cache it in the Preferences directory, or let me cache it to a temp file in the Documents directory. Well, if those directories are backed by a network volume, you're really not buying much by caching something off the network to another network volume. Or, and this is a lot more common scenario, when you launch your app you're doing tiny little I/Os to the preferences file, and this has never been a problem because it's a local disk, it's very fast; well, if it's a network disk, that's a big window that you can stall in, and your users will definitely notice. Trust me, we've noticed at Apple, and we've been working with developers where we can to point out what's going on and help them work through it. But the best thing that you can do is try to set up one of these hostile test environments in your own offices and see for yourself. One of the best examples of this is to set up an NFS-based user directory, log in as that user, and just double-click your app and see what happens, or double-click your app with fs_usage running alongside and see what happens. You'll notice that, between being logged in as a local user and as a network user, a lot of times you'll see a great variation in the performance of your app, and a lot of that can be attributed to some of these design decisions. The good news is that anything you fix for the network case will also benefit the local case. So even if you're just working off of fast local media, you can reduce the number of system calls that you're making and speed things up even in your local scenario. It's definitely a good thing to look into doing, and check the Mac OS X Server documentation for more details.
And with that, I'll bring John back up on stage.
Talking about watching the application bounce a lot: with those network directories, network users, when you have your system set up that way, we typically see two or three times the number of bounces that we do on a local directory. It's really something; I advise you to take Nitin's advice on that one. OK, you've learned all of the details about what you can do to help with your file system performance; I'm going to talk a little bit now about your CPU usage.
you're running on a pre-emptive
multitasking system but it's not magic
it doesn't give you more than 100% of
the CPU on a single CPU system it can't
give you free cycles a matter of fact
basically the gain is that one single
thread on that system is not going to
take over the whole system it's not
going to bring the system to its knees
so if you have a hundred threads that
all need to run they're all sitting
there have something to do even if it's
very little the scheduler has to take
into them into account that's why we
talked about making sure that your
threads are blocked that CPU is still a
limited resource so make sure when
you're using threads or timers
cooperative threads that you're taking
this into account the best tools on the
system to really look at this our top
time and CPU monitor CPU monitor you've
probably seen in some of the demos I
would advise just keep that thing
running as you're doing development just
keep it off maybe on a second monitor
it'll show you very easily when there's
a little bit of a CPU peak and you can
go in and see is that your problem or
not typically that is a great indicator
for when you have CPU bound problems
top is another one that you could write
run because it shows you a little bit
more than just CPU usage both of those I
would encourage as you know as you're
just doing your ongoing development on
on your app keep them running in some
window has clues to the possibility of a
performance problem
OK, so responsiveness. This is the next area, after launching, file system, and CPU usage, that I would encourage you to look into. Most things to do with responsiveness you should be able to fix up fairly quickly, by just taking a quick look at what your app is doing with regard to event handling. The biggest indicator that you're not doing event handling right is probably that you're pegging the CPU; you've seen this in some demos. The best and simplest workaround for that, if your app is showing this behavior, if it's CPU-bound during tracking, during, you know, interaction with your UI: take a look. Use Sampler, which is a tool that lets you actually pinpoint where in your code the problem lies. Search your code for StillDown and Button, and look at how you're calling these older calls that we really would rather you get off of. TrackMouseLocation is your friend; that's what you want to be using. That's the basic primitive for letting you do all sorts of tracking in the UI, and it blocks intelligently.
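As a rough sketch, a mouse-tracking loop built on TrackMouseLocation rather than StillDown/Button polling might look like this (the drag-handling helper is hypothetical):

```c
#include <Carbon/Carbon.h>

// Hypothetical per-point handler for whatever you are dragging around.
static void HandleDragTo(Point where);

// Track the mouse until it is released. TrackMouseLocation blocks between
// mouse events, so this loop doesn't peg the CPU the way a StillDown()/
// GetMouse() spin would.
static void TrackMyThing(void)
{
    MouseTrackingResult result = kMouseTrackingMouseDragged;
    Point               where;

    while (result != kMouseTrackingMouseUp) {
        // NULL port means "use the current port" for coordinate conversion.
        if (TrackMouseLocation(NULL, &where, &result) != noErr)
            break;

        if (result == kMouseTrackingMouseDragged)
            HandleDragTo(where);
    }
}
```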
In addition to those tools, I would encourage you to look on a Developer CD: there's an application called Appearance Sample, which has almost every widget the Toolbox supports, every control you've ever seen. Go and play with that app, run CPU Monitor, and you'll see all of those controls block well; that's what your app should do. If you're seeing different behavior in your app, it's either a problem in that you've done your own kind of handling, your own custom control, or potentially in the way that you're using the Toolbox.

The next area that contributes to your app's responsiveness: you know, maybe it feels a little sluggish, maybe everything else is looking good, your file system performance and your launch are good, but when you activate windows, things don't appear as snappy as they do, say, on 9. That's probably an indication that you have a drawing problem. The best tool for that is Quartz Debug; you've probably seen it in some of the other sessions, and it should be in the performance tools session as well, because it'll let you see when you're doing redundant drawing, when you're drawing the same things over and over. The other typical pitfall that we've seen is people back-buffering, doing their own double buffering for their drawing, when the system is already doing that for them. So make sure you check the port using QDIsPortBuffered, and if it is buffered, then you don't have to do that buffering yourself; that's being done for you.
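A small sketch of that check, assuming a window whose content you were about to double-buffer yourself:

```c
#include <Carbon/Carbon.h>

// Only fall back to our own offscreen buffering when the window's port is
// not already buffered by the system (on OS X it normally is).
static Boolean NeedMyOwnBackBuffer(WindowRef window)
{
    CGrafPtr port = GetWindowPort(window);

    return !QDIsPortBuffered(port);
}
```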
The next area on responsiveness has to do with flushing. Because you have a back buffer, there has to be a time when you actually get those bits in the back buffer to the screen. Generally you should try to avoid flushing: the system will do flushing for you, basically on event boundaries, and it'll try to do that as intelligently as possible, so you shouldn't have to flush. The two exceptions, two examples of exceptions, are when you're doing some kind of animation and you want that to get to the screen right now, or when you're not really involved with events at all, the splash screen case. Those are good uses of explicit flushing; otherwise, let the system do it for you.

Another common performance problem area is with regard to window resizing, and this also has to do with the back buffer and the design of the window system on OS X. Basically our advice there is to try to do this all in one fell swoop, with SetWindowBounds, instead of trying to use SizeWindow and MoveWindow in combination; that's what that call exists for, to optimize those cases.
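A quick sketch of the one-call move-and-resize (the target rectangle is just an example):

```c
#include <Carbon/Carbon.h>

// Move and resize a window in a single call, so the window server and the
// back buffer get reconfigured once instead of twice.
static void MoveAndResizeWindow(WindowRef window)
{
    Rect newBounds;

    // Global coordinates for the new content area: top-left at (100, 100),
    // 640 by 480 in size.
    SetRect(&newBounds, 100, 100, 100 + 640, 100 + 480);

    SetWindowBounds(window, kWindowContentRgn, &newBounds);
}
```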
Also, the other thing that we've seen with regard to windows is pervasive or heavy use of invisible windows. Windows are generally more expensive on OS X: there's the back buffer, in addition to the overhead of the interaction with the Core Graphics system and the window server. You may have invisible windows, but that doesn't mean that they don't cost anything, that manipulating the invisible windows comes for free. As a matter of fact, I would look into why you're using invisible windows at all; it is often the case that you can ditch that window, dispose of it, create a new one, and redraw faster than you can by twiddling that invisible window.

And the last one I wasn't going to put on here at all, but I figured I would try some of the tools out and look at various apps a couple of days ago, and I just happened to notice that one of the apps I use every day was doing file I/O when I activated a window. I was just sitting there scratching my head trying to figure this out, and it's just a bug, but this is one of those things that you probably wouldn't notice unless you're running a tool that tells you that it's happening. fs_usage is perfect for that: fs_usage lets you basically see the file I/O that's going on in the whole system, and for a particular app you can filter things out. Another one, if you're really going after file I/O or anything in particular, is Sampler, which lets you tie the usage pattern back to your code.
I'm sure you've heard this a lot in
various talks I'm going to talk a little
bit about some of the more atypical
situations in which you find pulling
affecting performance way next event is
actually pretty typical but the the
fallout of using way next event zero is
where we sometimes see some problems so
just to make sure we're all on the same
page here you really shouldn't be using
wait next event zero a very simple way
to get rid of waiting x event zero so if
you have something that you want to do
periodically set a Carbon event timer to
do it with the frequency that you need
and use wait next event very long time
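A sketch of that periodic version, assuming some hypothetical DoPeriodicWork() you used to run on every pass through the event loop:

```c
#include <Carbon/Carbon.h>

// Hypothetical work you previously did on every WaitNextEvent(0) pass.
static void DoPeriodicWork(void);

static void PeriodicTimer(EventLoopTimerRef inTimer, void *inUserData)
{
    DoPeriodicWork();
}

static OSStatus InstallPeriodicWork(void)
{
    EventLoopTimerRef timer;

    // Fire twice a second instead of spinning the event loop; the main loop
    // can then sleep in WaitNextEvent (or RunApplicationEventLoop) in between.
    return InstallEventLoopTimer(GetMainEventLoop(),
                                 0,                         /* start now   */
                                 kEventDurationSecond / 2,  /* every 0.5 s */
                                 NewEventLoopTimerUPP(PeriodicTimer),
                                 NULL, &timer);
}
```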
Now, TickCount. That's something that surprised a few of us: we did some performance profiling, and in various apps TickCount shows up as taking a significant amount of time. One of the reasons is that TickCount costs more than it does on OS 9, but it's also used all over the place, in a lot of UI, in places where it really doesn't have to be used. First of all, we're talking about something that's very coarse-grained, right: ticks, sixtieths of a second, so calling it more often than 60 times a second doesn't make a lot of sense. The best advice I have for you here is to try to use the event system, try to use the timestamps that are in the events, and look at events like that. There are often comparisons made against the current time, and using the event timestamps can get you out of essentially polling TickCount and having it show up as a performance problem.
Another often-asked-for bit of information is the volume list; Nitin went through earlier how to do that as efficiently as possible, but I think it's a pretty rare case where you actually need the volume list. I would suggest you just get rid of that code altogether, or figure out what you really need. If you're trying to find out about new volumes, or if you're trying to find out about volumes that have just been unmounted, register yourself for a Carbon event for volume mount and unmount: ask the system to tell you about it, instead of periodically going out, looking at all the volumes, and trying to figure out what happened.
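A sketch of that registration, using the volume-event constants as I understand them from CarbonEvents.h (the handler body is just a placeholder):

```c
#include <Carbon/Carbon.h>

// Called by the Carbon Event Manager whenever a volume is mounted or
// unmounted; no polling of the volume list required.
static OSStatus VolumeEventHandler(EventHandlerCallRef inCall,
                                   EventRef inEvent, void *inUserData)
{
    if (GetEventKind(inEvent) == kEventVolumeMounted) {
        // ... rescan whatever per-volume state you keep ...
    } else {
        // kEventVolumeUnmounted: drop cached state for that volume.
    }
    return noErr;
}

static OSStatus RegisterForVolumeEvents(void)
{
    static const EventTypeSpec kEvents[] = {
        { kEventClassVolume, kEventVolumeMounted   },
        { kEventClassVolume, kEventVolumeUnmounted }
    };

    return InstallApplicationEventHandler(NewEventHandlerUPP(VolumeEventHandler),
                                          GetEventTypeCount(kEvents), kEvents,
                                          NULL, NULL);
}
```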
The same kind of thing goes for preference change notifications: there is a theme-change notification, and there are various new Carbon events that let you know about these things. Actually, backing up to the volume slide, I should probably also have another bullet for processes: if you're trying to find out what process just got launched, or what process died, there's a Carbon event for that as well. Generally we've been really looking at the system to try to find out if there's any legitimate need to do polling, and the answer should be no. We're trying to notify you of everything that you might find of interest; it's a much better solution on OS X. So if you see things where you still think you have to poll, let us know about it and we'll figure out a better way to do it.

And on this final note, maybe some of you heard this bit in the application packaging and document binding presentation yesterday: we need notification too, particularly in the parts of the system that present the file system visually. So the Finder, Nav Services, the open/save panels: those are showing you the file system objects, and they are not polling; we don't poll. So we need you, if you participate in this kind of thing, if you're an installer or if you're copying files to a place that is likely to be visible, to use the FNNotify call. FNNotify is a 10-only API; it's in Files.h. It basically says something happened, something changed in this directory. It takes an FSRef, it lets us know that something changed, and we should refresh the contents of that directory in any UI elements that care. Do this intelligently: if you're copying a whole bunch of files to a single directory, let us know when you're done with that copy operation, not at every file.
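A sketch of that call after a batch copy finishes (the destination FSRef is assumed to already be in hand):

```c
#include <Carbon/Carbon.h>

// Tell the Finder, Nav Services, and friends that the contents of this
// directory changed, once per batch rather than once per file.
static void AnnounceDirectoryChanged(const FSRef *directory)
{
    OSStatus err = FNNotify(directory, kFNDirectoryModifiedMessage, kNilOptions);

    if (err != noErr) {
        // Not fatal; the UI will just catch up the next time it refreshes.
    }
}
```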
OK, Resource Manager use. The Resource Manager is very tied to the File Manager, so in essence I could just repeat what Nitin said earlier about the File Manager, but it's actually worse, in that the resource file format was designed way before VM systems were really common like they are now, and the file format is really not designed for a VM system: there's a resource map in one part of the file and resource data in another part of the file, and there's no way around going to both places every time you need data. You're asking for a resource: you have to go look up where the resource is in the file, in the resource map, and that's at least one I/O; then you've got to go to where it told you the resource was, and that's a second I/O. OK, that's bad enough. Then you look at what's in the typical resource file, and you see lots of little resources, and that's also bad, right? You already heard how we really like to see very large items; those are the ones we optimize, where we get the best bang for the buck. So if you have any tools, or if you have in the past done any kind of coalescing of your resources, if you have resources that could be combined, for example 'STR ' resources that could be combined into a 'STR#', that's a much better use of the Resource Manager: going and reading eight bytes out of the Resource Manager is about one of the most expensive kinds of I/O you can do.

The other thing that we see with regard to resources, particularly in the use of plugins, is enumerating your plugins, opening up their resource files to find out something about them, opening and closing the same files; this is a pattern that we've seen, and we'd really like you to avoid it. Perhaps what you can do is cache that: you know, cache the results in one file and open that. Minimally, make sure that you do that scan only once, when you actually have to find out about your plugins. And then, for historical reasons, there are just lots of calls to UpdateResFile, which writes out the map and the data for your resource forks; people just kind of call it willy-nilly, it's more like a flush in most people's minds, and that causes I/O.
OK, the last bullet item on here is something that we added to OS X, basically a feature in the Resource Manager to kind of help out with these sets of problems. What we did is add a new key to your Info.plist called CSResourcesFileMapped, and if you set this (it's a boolean key), if you set this to true, it'll change the behavior of the Resource Manager with respect to your application's resources. What it'll do is open them up read-only, OK, you can't write to them, and it'll file-map them. And then there's some support in the Memory Manager for file mappings, so that we don't have to allocate memory for the data that's in your resource fork, which saves on your memory footprint. And because it's all file-mapped now, we get much better I/O characteristics, because yes, we're still hitting that resource map, and we're still hitting the data, but there's some locality there: when we go to the resource map the second time, it's likely to be on the same page, and when we go for data, if you're going through the data of a certain type (this will depend on the organization of the data), it's likely that we're going to get some good wins there.

The only caveat to this, and the reason why we didn't turn this behavior on by default, is that it'll break some of your code. At the point where you say yes, turn this on, all of the resource handles that you get back essentially have pointers in them that point to read-only memory. If you try to modify that, your application is going to crash. So there are plenty of folks who've just turned this on and don't write back to the resources, right? I mean, in general, particularly with the application resource file, you probably don't want to write to it, because it might be living on a CD, it might be living on a network volume that other people are using; in general it's bad practice. But let's say you were writing things to the resource without actually flushing them out to the file: you can still do that by detaching the resource, thereby getting an in-memory copy; you mess with the copy, and everything else still works correctly. So this is something that you have to turn on yourself, and it's fairly straightforward to debug, because it basically leads to crashes. And if you have any questions about exactly what the Info.plist is, I would recommend looking at Technote 2013.
OK, the next section is memory usage. This can be a really big problem on 10, largely because of the big difference between the memory models, in that you now have a very large and sparse address space. For example, right off the bat you could accidentally allocate, you know, an order of magnitude more than what you intended to allocate and not even know it; the system will give it to you. memFullErr errors are relatively rare on this system, and that can be a problem. So in order to keep on top of this, I really would recommend getting familiar with both the leaks and the MallocDebug tools. Leaks in particular you want to keep an eye on: you may not notice the performance hit so much in your app, it may be a slow leak over time, but it really does affect what's going on underneath the covers, in that you don't get reuse of the same memory blocks, and it'll lead to paging, it'll lead to generally bad characteristics, so your app will generally feel sluggish.

Aside from leaks, I would really recommend that you get a good handle on the size of your application. In particular, make sure your tools are doing the work that they should for you: make sure that things that are actually constant in your application end up in the right section, so that the OS takes the maximum advantage of that. We went through a lot of the Carbon frameworks early on and got a lot of gains by doing this, basically marking strings and other constant data as constant so that they could show up in a text section that gets shared across the system. The same thing goes for your app.
The third thing on memory usage is that there's really been a reversal in terms of handles and pointers. On OS 9 the handle was really the first-class citizen: it was designed to work with that limited application partition, that heap, and was designed to be reused inside that limited space. On 10 the reverse is the case: pointers are really the first-class citizens, and there's some cost to handles. So in performance-critical code, look at rewriting to use pointers instead of handles. We found one case where just the removal of HLock and HUnlock in a code path made a big difference in terms of performance; the reason there was that the locking costs are sufficiently higher on 10 than they are on 9, and the work being done was really impacted by that. This is something that you should do kind of carefully. If you're looking into this kind of an optimization: the OS itself doesn't go and purge and move handles out from under you; that's really under your control and intent. So if you know that you're not resizing the handle somewhere else in your code, and you know that you're not looking at the handle to see if it's locked, then you're likely able to make this kind of an optimization.

Then lastly, and I hope it's pretty obvious: there's really no purging. The calls are still there, largely so you can run the same app on both 9 and 10, but your purge procs are not going to get called, the application heap is not going to fill up, and if you're relying on basically allocating, allocating, allocating until you get called in your purge proc, that's not going to happen. That's the biggest leak you can ever have, so take a look at that if you're in that kind of a category.
OK, code loading. This is something I kind of referred to earlier on, in launching, in that I said something to the effect of: defer some of the things that you do at launch time to later on, and one of the ways in which you can do this is to factor your app. Most application code bases start off organized basically, you know, by the people that work on them first. So you get Kelly's feature and Mary's feature and John's feature, and they go off and do those different pieces of it, and then Nitin comes along and he has a new feature to add, and pretty soon you've got one little piece of the app, whether it's a shared library or a plug-in, for each person that's working on it. And soon those features grow up, and there's a whole team around each of those features, and before you know it, the organization of your app looks a little bit like the organization of your organization, right? And that's rarely the best organization in terms of performance. You really want to look at the features in terms of what the app really needs; you probably want to look at layering and dependencies. So factoring your app in terms of performance is something that I would advise you to do; it's usually not something that you would do quickly, it's something you probably converge on over time.
Look at plug-ins for things that are truly optional. Again, I don't mean optional in the sense that no one would use it. An example of this in the real live OS is Nav Services and printing: those are both good categories, from the OS's point of view, of services that are purely optional, in that your application can run fine, do lots of good work, and never interact with Nav Services or printing, so why should it pay the cost up front? The answer is it shouldn't. So look for those kinds of opportunities in your app. Maybe there's a plug-in that, you know, has all the bells and whistles that you could ever want, but you only use it once in a blue moon: factor that out so you don't pay any cost for it, and don't give that plug-in an initial load just to ask whether it's happy with things; that will cost. Then finally, look at your libraries, look at the number of libraries that you're using. Libraries, when backed by files, are costs, costs all the way down to the kernel: the kernel has a fixed cost, and there's a per-process cost. If your tools support it, take libraries and combine them together; merge your PEF libraries together into one big file. That's simply better from the performance and resource-use point of view.
OK, in the async I/O space we've seen some problems that are kind of interesting, in the combination of async I/O and threading. Asynchronous I/O on 10, and by that I mean deferred tasks, the File Manager, the Time Manager, is all implemented based on running the operation in question synchronously on a thread that the OS creates for you, and particularly when used with cooperative threads, or when the two are used in combination, there are additional costs there. In general this is not performing as well as the equivalent on 9. This is a case where I would recommend continuing to do async I/O and chained completion routines on 9, factoring your app, and dynamically checking and doing something entirely different on 10. The simplest workaround, or the simplest solution, on 10 is really just to use threading and a synchronous I/O model. And then, just as a data point, there was an app or two that we've seen that implement threading packages on top of the Time Manager. The Time Manager is itself implemented as a thread, which means you're threading on top of a thread that's scheduled by the kernel; you're trying to run threads on top of something that's already being managed by something else. Not a good performance solution.
cooperative threads there's one basic
flaw with cooperative threads and that
is that the norfolk whopper threats that
gets scheduled you have to yield and
there's no blocking going on so by their
very nature cooperative threads are
compute bound that's the biggest problem
they're still there because we know you
have code that depends on it I would
really look at not using cooperative
threads or potentially using timers
Carbon event timers instead or moving
your code off to MP threads often
there's a performance problem with
regards to messaging between threads
this usually has to do with messaging as
opposed to or polling to see if some
message base thing is complete versus
true messaging and there I would just
encourage if you're doing things with
multiple threads even across processes
make sure that you're not getting in
this situation where both of the threads
are competing the casing in the data
point that I have was basically one
thread was doing a lot of file i/o and
was reading and writing to a file and
the first thread was basically looking
to see if it was done and the cleaner
solution to that and a better performing
solution is basically have the the
second thread just block and when the
file i/o thread is completed just send
it a Carbon event to get that whole
thing to work well and then finally seen
situations where basically people just
go thread happy they have just way too
many threads for no real apparent reason
and just bear in mind that each one of
those threads has a real cost there's a
in a wired memory cost in the kernel and
they're not free so use them
And then finally, threading in general can be used to really help out with performance. I would say look particularly for things like when you're trying to do a safe save or a fast save kind of feature; that's a very good use of a thread: you can create the thread, do that work on it, and dispose of it. A network listener is another model that works really well, where the thread is just basically listening for incoming activity. And occasionally there's a good use for threading when you're doing low-priority, idle kinds of computation; maybe you're indexing something in the background, something like that.

OK, so finally, the summary. I really want to encourage you to factor performance into your planning; try to really make it be a feature of your application. We want that killer app to be that much better by performing well. Performance isn't a one-shot deal; you really have to keep it in your workflow, you've got to keep on top of it. Ideally, as you build different builds of your app, try to capture data about how it performs, and pinpoint where performance problems were introduced. And then I really encourage you to get into the tools; the tools talk is later on this afternoon, and all those tools are on the system. You should just become experts at them: those tools allow you to look at your app in various different ways, and they're really helpful in pinpointing these problems. And lastly, just go after those performance problems.
let's see oh yeah one last thing so the
first one the carbon developer
documentation
you should just generally know the
second one if you're not ready to do
anything with performance at all you're
stuck behind a whole bunch of a couple
months worth of features on your app
you're still carbonizing anything like
that at the very least remember the
second URL up here that
performance PDF file has a lot of
information on performance a lot of what
I've gone over and what we'll be going
over in other sessions is in that one
document okay
I'd like to bring mark up and then we
can help do the roadmap and then we'll
head off to Q&A
Thank you, John. There we go... that won't work. OK, so as John mentioned, we'd like you all to attend the performance tools session at 5:00 today if you possibly can; we're going to talk about the various tools that he introduced you to. And also, because we don't have a lot of time right now to take questions, I ask you to take your questions to that session, and we'll have some of the same people there to answer them. But we'll just take a couple right now, so if we can bring up our Q&A panel, we can do maybe two or three questions. Oh, by the way, this is me, so if during or after the conference you have any questions or comments about Carbon or Carbon performance, send them to me at this address.