WWDC2013 Session 704

Transcript

X-TIMESTAMP-MAP=MPEGTS:181083,LOCAL:00:00:00.000
[ Silence ]
>> Good afternoon.
My name is Anthony Chivetta.
And I'm an engineer in
the OS X performance team.
And I'd like to talk to you
about building efficient
OS X apps
and cover some advanced
topics in resource management.
Now most of you are
probably familiar
with performance
testing in some form,
whereby you evaluate how long
it takes your application
to perform a specific action.
What I want to talk to you about today is not performance optimization,
but resource optimization.
Rather than looking at the latency of an action,
we'll look at how many resources it consumes to achieve its goal.
Now, one of the problems that
we face in resource management
in OS X is that it's
fundamentally a multitasking
operating system.
If you're coming over
from iOS, you're coming
from an environment where
there's one application
that a user is actively
using at a time.
And so, that application
can be provided the full use
of the system's resources.
On OS X, however, a user
may be running multiple
apps simultaneously.
And so, those apps' consumption of system resources
can affect each other's performance.
As a result, it's very important
that your app uses system
resources efficiently in order
to help create a
great user experience.
So today, we'll cover
a couple of topics
about resource efficiency
including how to profile
and reduce your app's
memory footprint,
how to optimize your access
of a disk, and how to do work
in the background without
impacting system responsiveness.
So I want to talk
first about memory.
And let's take a look at a
simplified view of a system.
So we have an OS X system with a number of apps running,
and some of those apps have been provided memory.
There's also memory that
is currently unused.
And this isn't really providing
any value to the system,
it's just sitting there.
And some memory has been devoted
to caching the contents
of files on disk.
Now, as apps request
more memory,
we'll first provide the unused
memory to those applications.
Now, apps can continue to
request memory and will continue
to provide the unused memory
until there's no more unused
memory available on the system.
And this isn't a problem.
Unused memory wasn't providing
us any value in the past.
But if apps continue to consume more memory,
we'll eventually need to start providing them
memory from the disk cache.
And this is relatively efficient
because the disk cache is just
holding data that's already
stored on disk.
So it can simply discard it,
turn it into unused memory
which is then provided
to an application.
But we now no longer have that cached data in memory,
which means accesses to disk by applications
on the system may take longer.
This is where we'll begin to see the responsiveness
of the user's system decrease.
Now, where things
get really bad is
when apps continue
to request memory.
In this case, we'll need to
do something called swapping.
We'll take the contents of
memory from one app and save it
to disk, and then provide that
memory to a different app.
Now, the problem is this takes
a long time because we have
to write out the contents
of memory to disk.
And if the original app tries to access that memory again,
we'll have to pull that memory back in from disk.
And both of these actions
can introduce large latencies
and cause responsiveness
problems for users.
But let's take a
look under the hood
at how this works in practice.
So every app on the system, that is,
every process, has an address space.
If you're a 64-bit app,
this is the 64-bit range
that your pointers use.
And that address space is
broken into 4 kilobyte pages.
And of course, the system
also has some amount
of actual physical memory.
And virtual memory allows us to establish a mapping
from that address space to physical memory.
Now, when we need to swap,
what virtual memory allows us
to do is disconnect one
of those physical pages
from the virtual page that
it's currently backing.
And then we can use that
memory somewhere else.
But if the app wants to access that location in memory again,
it will cause what's
called a page fault.
The operating system
will then be able
to pull the data back off
disk, place it somewhere else
in the RAM, and reconnect
that page
in the virtual memory mapping.
Now, what's important
to understand here is
that this happens as soon as
the application tries to access
that memory which means
that executing any code could
potentially cause a page fault.
And this is what makes
swapping so dangerous.
The application has no control
over when these accesses
to disk happen or what
thread they happen on.
And as a result, it's
very important to try
to lower the memory
footprint of your app.
This can help reduce the chance
that your memory will be swapped
when the system is under
low memory situations.
It means that more memory will
be available to you quickly
when you need it,
and it improves overall
system performance.
Now, the first step in this is
going to simply be to profile
and reduce your app's
memory use.
And Instruments comes with two templates
that can be of great help here.
The first is the
allocations template.
And this can profile the
objects that your app allocates
so that you can find
targets for optimization.
This might include large objects
that you want to make smaller,
or objects that are allocated
frequently which you can try
to reduce the quantity
of their allocations.
There's also the leaks template,
and this helps you look
for objects that are leaked.
Leaked objects are objects to which there are
no longer any references,
and so you can no longer release them.
They're simply going to stay in memory
until your app is terminated.
If your app is long-running, this can cause
unconstrained memory growth.
And so, the Leaks tool
can help you find leaks
in your application and
then analyze those leaks
to understand their
cause and fix them.
Now, both these tools will
be covered in much more depth
in the Fixing Memory Issues talk
and I highly recommend
you attend.
What I want to discuss are some more advanced tools
and techniques you can use to help keep
your application's memory usage small,
and keep your applications efficient over time.
And the first thing you should
consider doing is automating
memory testing of
your application.
Hopefully, you do some
sort of regular testing,
whether that's a nightly
test suite, unit tests,
functional tests,
continuous integration,
or simply just a set of
actions you confirm continue
to work before you
ship your app.
Whatever it may be,
integrating memory testing
into that can give
you a quick barometer
as to whether a particular
change
in your app has introduced
any memory regressions.
And you really want to
look for two things.
You want to look for
increases in memory consumption
that you don't expect, and any
new leaks in your application.
And you want to consider
any leaks that you find
to be a bug you should
immediately fix
because this is important to
reducing engineering debt.
Fixing leaks in old code that
you don't maintain familiarity
with can be incredibly
difficult.
But, if you're able to find
and fix leaks immediately,
you can help prevent incurring
an engineering debt over time.
There's a couple
of tools we provide
that can help you
automate this process.
And the first I want to
talk about is the Heap tool.
This is similar to the
allocations instrument.
But you can run it in
an automated fashion
from the command-line.
So the first thing you want
to do is simply run your app
and put it through its paces
and then run the Heap tool
and provide the name
of your application.
The tool will then analyze the
running application in memory
and provide you a list
of all the objects
that that application has
allocated including how many
times a particular
object has been allocated,
and the total amount of memory
used by that type of object.
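A sketch of that workflow, using a hypothetical app named MyApp; heap is macOS-only, and the output excerpt below is illustrative rather than captured from a real run:

```shell
# Launch the app and put it through its paces, then:
$ heap MyApp

# Illustrative excerpt of the per-class summary heap prints;
# the class names and numbers here are placeholders:
#    COUNT     BYTES    AVG   CLASS_NAME
#    12840   1643520  128.0   NSString
#     2048    524288  256.0   MyDocumentRecord
```

Saving this output for each release of your app lets you diff the count and bytes columns to spot regressions.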
Now, you can compare this
between multiple releases
of your app to understand
whether you've caused memory
regressions and look for changes
in the memory use of
your applications.
If you look at the [inaudible],
there are also a number
of other options that
can help you dive deeper.
Now, on the leaks
side of things,
we also provide a
leaks command-line tool
which you can use
to automatically detect
leaks in your application.
Before you run your app, the first thing you want
to do is turn on MallocStackLogging.
You can do this with the scheme editor in Xcode
by checking the stack logging box,
or by setting the MallocStackLogging=1
environment variable.
Then, exercise your app as you did when running heap.
But this time we'll use the leaks tool,
and leaks will then provide us
a couple of pieces of output.
The first is how many objects were leaked
by your application and how much memory they consume.
And then for each leak,
the address of the object
and the type of object.
In this case, we
leaked MyLeakedClass,
an Objective-C object
from MyApp.
And then because we're
using MallocStackLogging,
we'll also get the full call
stack that allocated the object
which can help you narrow down where the object came from,
and provides you a starting point for further analysis,
perhaps interactively in Instruments
with the Leaks instrument.
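Put together, the leaks workflow might look like this; MyApp and MyLeakedClass mirror the example from the talk, and the output lines below are illustrative, not real tool output:

```shell
# 1. Launch with malloc stack logging enabled
#    (or check the box in the Xcode scheme editor):
$ MallocStackLogging=1 /Applications/MyApp.app/Contents/MacOS/MyApp &

# 2. Exercise the app, then scan it for leaks:
$ leaks MyApp

# Illustrative output (addresses and sizes are placeholders):
# Process 12345: 1 leak for 48 total leaked bytes.
# Leak: 0x7fb4a2c08e50  size=48  MyLeakedClass  ObjC  MyApp
#   Call stack: ... | -[MyController loadDocument:] | +[NSObject alloc]
```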
Now, you may have already eliminated the leaks in your app,
ensured that you don't see any unbounded heap growth,
and optimized there.
But one other place you can
look for additional memory use
that you can slim is
duplicated objects.
Your application probably pulls in data from the network
or from files on disk, or accepts information from the user.
And it's easy to accidentally
produce extra copies
of that data.
The stringdups tool can
analyze your application
and let you know when you have
duplicated C strings, NSStrings,
and other types of objects.
To run it, you'll simply run stringdups
and provide the process ID of your app.
And there are two modes that
you might want to consider.
The first is the No Stacks Mode.
It simply gives you a listing
of all the duplicated
objects in your application.
This is really helpful for deciding which objects
you want to target for slimming down.
Now, notice when you do this, you'll see a lot
of duplicated strings from localization and frameworks.
Those are simply a result of how those frameworks work.
What you want to look for are
large numbers of duplicates
and strings that your
application has created
that contain for example
content specific to your app.
Then, once you've picked a duplicated object
that you want to dive deeper into,
you can use the call stacks view,
and this will show you all of the locations in your app
where that particular object was allocated.
Now, you may have done
all these things to try
to slim down your app.
But sometimes you're
still going to get
into a low memory situation.
We refer to this as being under memory pressure.
I want to talk about what you can do to help
the system behave better in this case.
So let's look at
just a single app.
Now, the first thing to be aware of is
that the system internally has a gauge of memory pressure.
This is, roughly, an approximation
of how difficult it is for the system to create new free memory
when it's requested by an application.
And there are two
tools you can use
to help the system
alleviate memory pressure
and restore the system
to full responsiveness.
The first is NSCache.
This is like a container for objects
that the system can automatically evict
and allow to be reclaimed.
And the second is purgeable memory:
regions of memory that the system can reclaim
automatically without having to interact with your app.
So in this case, if our app requests memory, the system can,
rather than swapping, acquire memory from the NSCache
or a purgeable memory region.
[ Pause ]
So let's dive into this
a little more deeply.
The first thing I want to
talk about is NSPurgeableData.
This is how we expose purgeable
memory through the Cocoa APIs.
As purgeable data, it's similar to NSData,
but it has the property that its contents can be discarded
automatically when the system is under memory pressure.
So in this case, we have
NSPurgeableData object
that points to a
purgeable memory region.
When the system comes under memory pressure,
the purgeable memory region is reclaimed by the system.
But the NSPurgeableData object stays around,
so we can query the status of that memory region later.
Let's look at an example
of how this works.
So in this case, we first create an NSPurgeableData
using some array of bytes we already have in our code.
And then we indicate
that we're done using it
by calling endContentAccess.
Sometime later, if we want
to access that data again,
we call beginContentAccess
and look at the return value.
If the return value is No,
then the data has been purged
from memory and we'll need
to regenerate that data.
For example, by reparsing a file or redownloading it
from the network, depending on where
the original data came from.
If the answer is Yes, then we
can continue to use the data.
And eventually we'll want to
call endContentAccess again
to indicate to the system
that we're no longer using it.
By bracketing your use of
the purgeable data with begin
and endContentAccess, you ensure
the system will never remove it
from underneath you.
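The begin/endContentAccess bracketing just described might be sketched like this in code; ProcessBytes and RegenerateData are hypothetical helpers standing in for whatever your app does with the data and however it rebuilds it:

```objc
#import <Foundation/Foundation.h>

// A freshly created NSPurgeableData starts with its content "accessed",
// so we balance the creation with an endContentAccess.
NSPurgeableData *data = [NSPurgeableData dataWithBytes:bytes length:length];
[data endContentAccess];   // done using it for now; eligible for purging

// ... sometime later ...
if ([data beginContentAccess]) {
    // Contents are still resident and now pinned; safe to use.
    ProcessBytes(data.bytes, data.length);
    [data endContentAccess];   // unpin; eligible for purging again
} else {
    // Contents were purged under memory pressure; regenerate them,
    // for example by reparsing a file or redownloading it.
    data = RegenerateData();
}
```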
Now, the other approach
I mentioned is NSCache.
NSCache is a key value store
like an NSMutableDictionary.
But it also has the advantage
that it's thread-safe, meaning,
you can use it from any
thread in your application
without requiring
additional synchronization.
But the special property of
NSCache is that it's capable
of automatically evicting
objects on memory pressure.
This means that you can put
as much data into the NSCache
as you'd like and it will
automatically size itself
to an appropriate size given
the current system conditions.
It does this by simply
releasing its strong reference
to your objects upon eviction.
So as long as you have another reference to any
of your objects, you can be sure they won't disappear
from behind you.
And it uses a version of
least-recently-used eviction.
You should expect that the contents of an NSCache
will eventually be evicted if not accessed.
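As a minimal sketch of that pattern, assuming a hypothetical RenderThumbnail function that can regenerate the cached object on demand:

```objc
#import <Foundation/Foundation.h>

// Thread-safe; usable from any thread without extra synchronization.
NSCache *cache = [[NSCache alloc] init];

// Store as much as you like; the cache sizes itself and may
// release its strong reference to entries under memory pressure.
[cache setObject:RenderThumbnail(url) forKey:url];

// On lookup, treat nil as "evicted: regenerate and re-cache".
id thumbnail = [cache objectForKey:url];
if (thumbnail == nil) {
    thumbnail = RenderThumbnail(url);
    [cache setObject:thumbnail forKey:url];
}
```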
Now, you can actually combine
NSPurgeableData and NSCache.
And this can make working with purgeable data objects
a little bit easier.
NSCache is aware of when NSPurgeableData objects
have been purged from memory.
And so, in this case, we placed
an NSPurgeableData object
in our NSCache.
The system reclaims the
purgeable memory region.
And then the NSCache will evict
the NSPurgeableData object.
So future lookups for its key will not return any object.
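Combining the two might look like the sketch below; NSCache's evictsObjectsWithDiscardedContent property (which defaults to YES) is what makes the cache drop entries whose purgeable contents are gone:

```objc
#import <Foundation/Foundation.h>

NSCache *cache = [[NSCache alloc] init];
cache.evictsObjectsWithDiscardedContent = YES;   // the default

NSPurgeableData *data = [NSPurgeableData dataWithBytes:bytes length:length];
[cache setObject:data forKey:@"parsedFile"];
[data endContentAccess];   // allow purging while it sits in the cache

// Later: nil means the purgeable region was reclaimed and the
// cache evicted the entry, so we must regenerate the data.
NSPurgeableData *cached = [cache objectForKey:@"parsedFile"];
if (cached != nil && [cached beginContentAccess]) {
    // ... use cached.bytes ...
    [cached endContentAccess];
}
```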
So I mentioned memory regions.
Well, what exactly
is a memory region?
Purgeable memory
regions are one type.
But there's a variety of types
of memory regions on a system.
Let's go back to our
view of virtual memory.
I mentioned that a process
address space is divided
into 4 kilobyte pages.
Well, there's actually one
more level of abstraction here.
The process address space is first divided
into a number of regions.
These regions, each
are then subdivided
into 4 kilobyte pages, and
those pages inherit a variety
of properties from the region.
For example, a region can be read-only
or readable and writable, it may be backed by a file,
it might be shared between processes,
and these properties are all defined
at the region level.
And then of course, these
individual pages may
or may not be backed
with physical memory.
And we've been talking mostly
so far about objects that exist
in your process' heap.
But there are a variety
of other types of regions
that consume memory
inside of your process.
Now, I want to talk a
little bit about those.
So first of all, is this
actually an important thing
to be aware of?
Well, I did some analysis of a
couple of example applications.
The first was a media
player app.
And in this case, only 34
percent of the memory consumed
by the media player application
was actually due to heap memory,
the rest came from
other types of regions.
Now, graphics memory is
often not part of the heap.
And so, a simple game might
have less than 10 percent
of its memory actually
allocated in its heap.
So what are these other
non-heap memory regions?
Well, the first thing is going
to be anonymous memory regions.
Now, these are things like
the heap that store data just
for the lifetime
of your process.
They're private to your process,
and our tools have the
ability to name them.
So as you're looking through
the anonymous memory regions,
these are some examples
that you might see.
The malloc zones, like MALLOC_TINY and MALLOC_LARGE,
are going to be used for the heap.
You'll also find Image IO regions in your process.
And these are used to store decoded image data.
What makes these interesting
is that the actual object
in your heap might be very small
but it will contain a reference
to an Image IO region in memory.
So leaking that object
will, from the perspective
of the Leaks tool, show
only a very small leak.
But because you've also
leaked the reference
to a memory region, your app
has leaked much more memory
in practice.
There are also CA layer regions, which store the contents
of rasterized, layer-backed views.
And these will actually have
annotations giving you the name
of the delegate of that layer.
And to learn more about this, you should see
the Optimizing Drawing and Scrolling on OS X talk,
which will go in-depth into the layer backing
of your views.
There's also file-backed memory.
And these are regions
whose contents are backed
by a file on disk.
And what's interesting
about these regions is
that we will populate them with
the contents of that file only
when you access the
region for the first time
and cause a page fault.
This means that the data
will only be resident
if it's been accessed.
And so, you might have a
very large file backed region
with only a very small
amount of data resident.
And these are commonly used for things
like the code of your application,
or data files that you want to randomly reference.
And so, in this case, our app has a file-backed
memory region for each of these.
And as it begins to execute
its code, it will fault
that code in from disk.
And then, as it accesses
a data file,
will pull that data file
in from disk as well.
So let's zoom in on
that data file region.
So imagine this is our data
file region and it's writable.
When we created that region, we
specified we wanted to be able
to write to it, and
we set it shared,
meaning that the changes we
make should be written back
up to disk.
Now, in this case, our region
isn't entirely resident
in memory because we haven't
accessed all the data.
And you can see here some
of the pages just
simply aren't populated.
Now, if we go and try
to modify that memory,
we're going to dirty it.
We refer to memory whose contents match what's on disk
as clean memory, and memory where we have made changes
as dirty memory.
So now, we have dirty
memory in our app.
And if the system would
like to turn that back
into clean memory, it will have
to write those pages
back out to disk.
Now, what makes dirty memory interesting is
that it's much more expensive to reclaim
than clean memory.
If we need to reclaim
clean memory to provide it
to another app, we could
simply throw that memory away
and use it for a
different purpose.
On the other hand, dirty memory needs to be written back
out to disk first, so reclaiming it is more
like swapping in that sense.
Now, given all these types of memory regions your app
might have, how do you get insight into what your app
is actually doing?
Well, as of OS X 10.9 and iOS 7,
the Allocations instrument is capable
of showing the memory regions used by your app.
What you'll notice is that there's a new
allocation type selector in the Allocations instrument,
where you can choose whether you want to see
all allocations; just heap allocations, which is
what you would have seen in previous versions;
or just the new VM regions that are being tracked.
So in this case, we're
looking at all allocations.
And you can see that some of these allocations start
with a "VM:" prefix, followed by the name
of that allocation when it's known.
And you can then drill
down to understand
where these allocations
come from.
And in many cases, see a
stack trace of the code
that created that object.
This can then help you understand why a given
allocation exists, and whether there's anything
you can do to change its size or prevent it
from being created.
Now, there's also the VM Tracker tool.
This tool can take a snapshot, at a regular interval,
of all of the virtual memory regions in your app.
It can then determine residency information
and how much of that data is dirty or clean.
You can also look at the region map view.
And the region map will
show you simply a listing
of all the regions of your
application and you can drill
down to get per page data
about residency status
and whether it's clean or dirty.
Now, given all of
these types of memory,
you're probably asking yourself,
"How do I just get a
simple number for the amount
of memory my application
is using?"
Well, this is something we've
tried to address in Mavericks.
We've written a new tool called footprint.
To run Footprint,
simply specify the name
of the process you
would like to analyze.
And in this case, we're also going to run it
with the -swapped and -categories flags.
This will provide some additional information
about our application.
Its output will look something like this.
And what we can see here is
that our application has
a 12-megabyte footprint.
This is our estimate of
what the impact of having
that application
running is on the system.
We can then see a breakdown of what types
of memory contribute to that footprint.
So in this case, we can see
we have over 5 megabytes
of private, dirty memory.
For example, heap memory
in our application.
And 2 megabytes of that
has been swapped already.
This is probably an
indication that the system was
under memory pressure
at some point.
Now, one wrinkle in
this is shared memory.
Memory regions can be shared
between multiple processes.
You'll most commonly see
this for graphics memory
or in multi-process
applications.
For example, an application
in a bundled XPC Service.
And these shared regions
may not be visible
in the allocations
instrument depending
on how they're created.
But we have a tool that can
help you understand the amount
of memory shared by
multiple processes.
And this is once again,
the footprint tool.
But instead, we're going to run it
with two process arguments and specify both processes
that we want to analyze.
And here we can see that we have memory shared
with the window server, and at the bottom of the output,
at the bottom of the output,
we get a total footprint of
all the processes we specified.
If you're developing an app
that is a bundled XPC Service,
you can use this to
get a footprint number
for both your app and
that XPC Service together.
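The two invocations described above might look like this; the -swapped and -categories flags are as given in the talk, while the second command's argument spelling and the service name are assumptions (check footprint's usage text for the exact form on your release):

```shell
# Single app, with swap information and a per-category breakdown:
$ footprint MyApp -swapped -categories

# App plus its bundled XPC service, for a combined total
# (the talk runs footprint with two process arguments):
$ footprint MyApp com.example.MyApp.HelperService
```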
All right.
So now, given all of these types of memory,
what does our picture of a system
under memory pressure look like?
I want to walk through what a system will do
to satisfy demand for new memory,
given these different types of memory.
Now of course, the first thing
the system will do when it's
under memory pressure is start
evicting objects from NSCaches
and reclaiming the contents
of purgeable memory regions.
Well, this is important, because these are the things
that applications on the system have said
they want reclaimed first under memory pressure.
And so, these are the tools you'll use
to help make sure your application is well-behaved,
and to control which of your memory
will be taken from you.
Now, once that memory
has been reclaimed,
the system will start
aggressively writing the
contents of dirty
memory to disk so that
that memory can become
clean again
and can be easily
reclaimed when needed.
Then, we'll start taking the
contents of file-backed memory.
And once the amount
of file-backed memory
has decreased,
we'll begin also taking memory
from anonymous VM regions
and from the heap
of applications.
And this is the point at
which you'll see the system
performance really
begin to decline.
Now, in Mavericks, there's
one more part of this.
And that's compressed memory.
Compressed memory allows us
to, before swapping memory
out to disk, first,
compress it in RAM.
And because compressed memory consumes a lot less space,
as we compress memory, we free up pages
which can then be put to another use.
Of course, at some point we may still need
to swap that compressed memory out to disk.
And then we'll have
reclaimed the full contents
of that memory.
Now, given all these behaviors a system can use
to create new memory,
sometimes it's hard
to get a good system-wide
picture of what's going on.
And so, in Mavericks, we've improved Activity Monitor,
and now have a few more high
level numbers that you can use
to understand where memory
is being used on your system.
We'll look at the bottom of Activity Monitor,
in the Memory tab.
On the right side, you
can see a breakdown
of where memory is being
used in your system.
App memory refers to anonymous memory regions,
like the heap and the memory regions
that frameworks allocate.
The file cache refers to
any file-backed region.
Wired memory is memory that the
operating system has wired down,
consumed for its own purposes
and can't easily be reclaimed.
And then finally, compressed
memory is the memory being used
to store other anonymous
compressed pages.
Now, if you want to dive even deeper,
the vm_stat tool has also been improved in Mavericks.
And this is just a subset of the output
you'll get from running vm_stat.
In this case, we're going to run it
with a single argument, 1,
which specifies the interval at which
we want it to report data.
Here, we're seeing
data every one second.
Now, some of these column headers are a little cryptic.
But if you run vm_stat without any arguments,
you'll get longer titles for each of those headers.
And so, we can see here, we have a couple
of statistics that cover where memory
is currently being allocated,
and these match roughly what you see
in Activity Monitor.
In this case, we can see
how much memory is used
for file-backed or
anonymous memory.
And then how much
memory we've compressed
and how much memory is being
used to store compressed pages.
And then we can also
look at, over time,
the change in memory
use on a system.
So these values represent when
pages are moving in and out
of the compressor, to and from
file-backed memory regions,
and from the compressor
to disk and back.
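The two ways of running it described above, shown as a terminal sketch (macOS-only; the column set varies by release):

```shell
# Print cumulative statistics once, with the longer column titles:
$ vm_stat

# Print a snapshot every second; each row shows page counts for
# free, active, and wired memory, file-backed and anonymous pages,
# compressor activity, pageins, pageouts, and swap traffic:
$ vm_stat 1
```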
Now, one question you might
have is, how do I know
if my app is being
affected by swapping
or other memory pressure
activity?
Well, we can do this with the Time Profiler instrument.
You're going to want to
run it with two options.
The first is to record
waiting threads.
And this will record threads
even if they're blocked trying
to swap data in from disk.
And then, you want to record
both user and kernel stacks.
So you can see what
the kernel is doing
in response to a page fault.
Then, run Time Profiler as you normally
would against your app.
And you want to look for the vm_fault frame.
This is the frame
that you'll see
in the kernel anytime it
takes a page fault as a result
of memory access your app does.
You can then dive
even deeper than that
to understand whether
it's hitting disk
or decompressing data.
And in this case, you can see we're spending
2 percent of our time in vm_fault;
that's actually a lot of time.
Really, any more than a few samples you find
in vm_fault should be taken as an indication
that your app is seeing the effects
of memory pressure.
And it means that you should begin to look
at your app's memory use and how you can improve
your app's performance under memory pressure.
Now, one problem with
this technique is
that it requires you to be
able to reproduce the problem.
And unfortunately,
memory pressure-related problems
typically depend on what's going
on in the system, what
other apps are running,
and could be very
difficult to reproduce.
So we provided something
called sysdiagnose.
This is a tool that can
automatically collect a wide
variety of performance
diagnostic information
from the system.
You simply run it from the command line,
sudo sysdiagnose, and then provide the name
of the app that you'd like to target
for data collection.
It will then run a bunch of diagnostic commands
and archive the output under /var/tmp
in a sysdiagnose archive that includes a timestamp.
And this includes things like a spindump,
which is a sample- or Time Profiler-style profile
of all apps on the system, as well as heap, leaks,
footprint, vm_stat, and fs_usage,
which I'll cover in a little bit.
You can also trigger this with the
Shift-Control-Option-Command-Period key chord,
if you can manage to mash those keys in time.
But this isn't going to collect
as much detailed information
about your specific application.
And so anytime you can
use the command-line form,
it will provide more
actionable data
about what your app was doing.
All right.
So, just to recap what we've covered about memory.
You want to make sure
that when you're looking
at the memory usage
of your application,
you're paying attention to the
entire footprint of your app,
not just the usage of your heap.
When trying to reduce your
memory usage, consider things
like leaks and heap growth.
Look for unnecessary
VM regions and check
for instances of
duplicate memory.
Consider adopting purgeable
memory or NSCache for anything
which you can easily regenerate
as this will allow you
to direct the system as
to how best take memory
from your application in
low memory situations.
And remember, the larger your app's memory footprint,
the more likely it is to slow down
under memory pressure.
[ Pause ]
So I want to talk
about disk access.
Well, why is disk
access important?
Well, I did some testing
with two scenarios
that you probably care
about in your app.
And those are app launch and the time it takes
to open a document.
And we'll look at these in a case where the system
was totally idle, and a case where there was
another app on the system trying to do IO.
And when you have multiple apps contending
to use the disk, app launch easily
regressed 70 percent.
And this is a huge increase
in time that's really
going to impact your users.
Open document increased 55 percent.
And so, it's important that you do IO
in the most efficient way possible,
to make sure that, one, you're being performant,
and two, you're not going to be affected
by other processes on the system that want
to compete with you for bandwidth to the device.
Well, what exactly are we
talking about with IO?
Well, there's a variety of
layers in the storage stack
that all interact together to
help you load data from disk.
Of course, we have your app.
And your app is going to use
some set of frameworks
to help it do IO,
but ultimately,
all accesses to the disk are
going to fall through one
of two interfaces in the kernel.
Either Memory Mapped IO, and
these are file-backed regions
like we talked about earlier,
or the virtual file
system interfaces.
And these are the open, read,
write and close system calls
with which you might be familiar.
And then on the other
end, the kernel is going
to use a file system to
organize data on disk.
Now, of course, we have to have
some sort of device driver.
But then at the end, you'll have
either a spinning magnetic hard
disk drive or solid-state
flash storage
to which your data is actually
going to be persisted.
Now, it's interesting that today
we see customers with both kinds
of storage, hard drives
and flash storage.
And so, it's important that you
consider both types of storage
when you're profiling
and performance testing
your application.
And the reason is that they have
incredibly different performance
characteristics, for example,
the solid-state drive
has no seek penalty.
On the other hand, a hard drive,
because it uses rotating media
and must first seek to
the correct location
on disk before it can read
or write data, can experience
up to 10 milliseconds of latency
every time you access a new
location on disk.
The thing is that while an
SSD might be capable of between
3,000 and 30,000 IO operations
per second,
a hard drive is only going to
be capable of maybe 80 to 100.
Solid-state drives also have
better sequential speed.
But the difference there
is much less pronounced.
But there's other
differences too.
An SSD is capable of some
limited degree of parallelism.
This means it's important
to provide multiple IOs
to the SSD's queue at a time
to take advantage
of that parallelism.
On the other hand, a hard drive
is only ever going to be able
to do one IO request at a time.
And so, it's not as
important to keep the queue
on a hard drive filled.
Finally, on a solid-state drive,
writes are significantly
more expensive than reads.
Whereas on a hard drive, they
have relatively symmetric costs.
This meant that in the past you
might mostly have focused
on what reads your
application was doing.
as these tend to be more likely
to block the user's
experience of your application.
On the other hand, with
a solid-state drive,
writes become a lot more
important as these compete
with reads much more
heavily for disk bandwidth.
Now, what I really want you
to take away from this is
that the different
performance profiles
of these devices mean that
you should be testing your
application on both.
If you're developing
on a new machine
with a solid-state drive,
your customers are going
to have a very different
experience
when running on a hard drive.
And also, high performance
IO is difficult to do well.
You need to avoid causing
thrashing on your hard drives,
keep the queue filled for SSDs,
use appropriate buffer sizes,
compute on data concurrently
with IO,
and avoid making extra
copies of the data.
So, we provided an API
to help encapsulate some
of these best practices
for doing IO.
And that comes in the
form of dispatch IO.
Dispatch IO is an API that's
part of Grand Central Dispatch.
It's been available since 10.7.
And it provides a declarative
API for file access.
What this means is that rather
than telling a system how
to access data, you tell it
what data it should access.
This allows it to automatically
encapsulate best practices
and do things in the most
performant way possible.
Now, I want to talk through two
examples of how to use this API
where doing these things
with the file system calls
directly would be significantly
more difficult.
The first is processing a large
file in a streaming manner.
This might be transcoding
media, searching for a string
in a file, or anything where you
want to do a sequential read
and do computation
concurrently with IO.
So let's take a look
at that example.
And the first thing we're going
to do is create a
serial dispatch queue
that we want our
computation to run on.
We'll then create a dispatch
IO object by providing a path
and informing dispatch IO that
we want to read this data.
We can then set a
high watermark.
And what this means
is that we would
like to be provided
opportunity to compute
on data no larger
than this size.
So in this example, we want to
see data every 32 kilobytes.
And so, the block we provide
dispatch IO will be called
with data no larger
than this amount.
And then finally,
we issue the read.
And to the read, we
will provide a block
to be called every time
data is available.
In this case, we can simply
use dispatch_data_apply
to operate on those buffers.
And this will do the appropriate
thing involving non-blocking IO
to ensure that you can have
as little data in memory
as possible while still
concurrently computing on data
and bringing in more
data from the drive.
If you've ever tried to
use file descriptors
with the O_NONBLOCK option
to do this, you'll understand
that it can be a little
hairy to implement yourself.
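The slide code itself isn't in the transcript, but a minimal sketch of the streaming read just described, using the dispatch IO calls named here, might look like this. The file path, queue label, and helper names are my own, and the semaphore only makes the demo synchronous; a plain read loop stands in on platforms without libdispatch.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#if defined(__APPLE__)
#include <dispatch/dispatch.h>

// Stream a file in <=32 KB chunks, computing on each buffer as it arrives.
long count_bytes_streaming(const char *path) {
    __block long total = 0;
    dispatch_queue_t q = dispatch_queue_create("com.example.stream", DISPATCH_QUEUE_SERIAL);
    dispatch_semaphore_t done = dispatch_semaphore_create(0);  // demo only; real apps stay async
    dispatch_io_t io = dispatch_io_create_with_path(DISPATCH_IO_STREAM, path,
                                                    O_RDONLY, 0, q, ^(int error){ (void)error; });
    dispatch_io_set_high_water(io, 32 * 1024);  // deliver data in chunks no larger than 32 KB
    dispatch_io_read(io, 0, SIZE_MAX, q, ^(bool finished, dispatch_data_t data, int error) {
        (void)error;
        if (data) {
            // Walk every contiguous buffer in this chunk of dispatch data.
            dispatch_data_apply(data, ^bool(dispatch_data_t region, size_t offset,
                                            const void *buffer, size_t size) {
                (void)region; (void)offset; (void)buffer;
                total += (long)size;  // "compute" on the buffer
                return true;
            });
        }
        if (finished) dispatch_semaphore_signal(done);
    });
    dispatch_semaphore_wait(done, DISPATCH_TIME_FOREVER);
    return total;
}
#else
// Fallback so the sketch also compiles where libdispatch isn't available.
long count_bytes_streaming(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    char buf[32 * 1024];
    long total = 0;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0) total += n;
    close(fd);
    return total;
}
#endif

// Tiny driver: write a 100,000-byte file, then stream it back.
long stream_demo(void) {
    const char *path = "/tmp/dio_stream_demo.bin";
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return -1;
    char chunk[1000];
    memset(chunk, 'x', sizeof chunk);
    for (int i = 0; i < 100; i++) write(fd, chunk, sizeof chunk);
    close(fd);
    return count_bytes_streaming(path);
}
```

In a real app you would keep everything asynchronous rather than blocking on a semaphore.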
Now, this is what
you might want to do
if you're reading
one large file.
But what if you have
a lot of small files?
Let's say for example you
want to read in a couple
of hundred thumbnails
from a disk?
Well, dispatch IO can help
you do that correctly too.
In this case, rather than
using a single serial queue
to call our blocks on,
we're going to provide a
global concurrent queue.
And then for every image whose
thumbnail we want to read in,
we're going to again,
create a dispatch IO object.
But instead of setting
a high watermark,
we're going to use
a low watermark.
And we're going to
set it to SIZE_MAX.
This informs dispatch IO that
we want the entire file contents
all at once.
Then, we issue the read
and in our callback,
we can use the dispatch
data provided
to instantiate, for
example, an NSImage.
Now, as of Mavericks, dispatch
data is bridged automatically
to NSData.
On older systems, you'll need
to use some other dispatch data
APIs to extract those contents.
Now, what's important
about this is
that if you were trying to
implement it yourself,
you would have to answer
questions like,
how many of these
operations should I have
running concurrently?
Simply putting them all
on a concurrent queue would
probably run out of threads
and trying to do it
yourself means you have
to understand the performance
of the underlying hardware.
Using dispatch IO lets
the system make choices
like that for you.
And regardless of
how you're doing IO,
You need to organize
data on disk.
And what's important
to understand is
that using large numbers
of small files can
be very expensive.
And you should consider
using Core Data
or SQLite any time you
have a large number
of objects to store.
Now, just how expensive is it?
Well, imagine we want to
insert 100,000 objects.
Storing each of those objects as
a small file on disk, say,
a couple of hundred bytes each,
would take almost 25 seconds,
whereas inserting them
to an SQLite database takes
just about half a second.
This can be a huge performance
difference and ensures
that you're going to
be less susceptible
to contention from
other processes.
Of course, using a database
provides other benefits
like control over atomicity so
you can put multiple operations
in a single transaction.
It's more space efficient
and gives you better
querying capabilities.
Now, one thing you need to think
about as you're doing
IO is write buffering.
This is our typical open,
write, and close set
of system calls we might use if
we want to write data to a file.
But what might surprise
some of you is
that the IO is actually issued
when we close the file.
For smaller [inaudible],
the system isn't going
to actually flush
the data to disk
until the file descriptor
is closed.
And there's a couple of system
calls that can cause this kind
of write flushing to happen.
If you're using the
VFS interfaces,
it's anytime you close or
fsync a file descriptor.
And if you have Memory
Mapped IO,
it's going to be
anytime you use msync.
And what's important to think
about here is how often am
I pushing data out to disk,
and am I going to
be pushing data out
to disk more often
than necessary?
If you can combine multiple
writes into a single flushing
of data, that can help
improve the IO performance
of your application and make you
less susceptible to contention.
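A small sketch of that idea (the file path and record size here are mine, not from the talk): buffer a batch of logical writes on one file descriptor and flush once, rather than syncing per record.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

// Write nrecords 128-byte records, then push them to stable storage with a
// single fsync instead of one flush per record.
long write_batch_single_flush(const char *path, int nrecords) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return -1;
    char record[128];
    memset(record, 'r', sizeof record);
    long total = 0;
    for (int i = 0; i < nrecords; i++) {
        ssize_t n = write(fd, record, sizeof record);  // lands in the file cache
        if (n > 0) total += n;
    }
    if (fsync(fd) != 0) {  // one flush covers the whole batch
        close(fd);
        return -1;
    }
    close(fd);  // close would also trigger the flush-on-close behavior described above
    return total;
}
```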
Now, of course, if you
have consistency guarantees
that you need, for example,
you want to make sure
that a file is completely
on disk in a stable storage,
these APIs won't
solve that problem.
And instead, you should
be considering a database
like Core Data or
SQLite which can help--
which can automatically
journal your changes and ensure
that data is consistent on disk.
Now I mentioned before
the file cache,
some amount of memory is devoted
to caching the contents
of files on disk.
And accessing from
the file cache can be
over 100 times faster than even
the fastest solid-state drives.
But the file cache
competes for memory
with the rest of the system.
This means that as your
application's memory usage grows,
less will be available
for the file cache.
And any time you pull new
data into the file cache,
other data is going
to need to be evicted.
You can control whether
this happens
for a particular IO you
do by using non-cached IO.
This tells the system, "Please
don't hold on to this data
and throw it away as soon
as you're done doing the IO
so that you can keep more
important data on memory."
You might want to do this
if you're, for example,
reading an archive to extract it
or streaming a large
multimedia file.
And you don't want
to impact the rest
of the file cache
in the process.
Now, there are a couple of
different APIs you can use
to indicate to the system that
you want to do non-cached IO.
If you're using NSData,
you can use the
NSDataReadingUncached option.
And that will automatically
use non-cached IO.
On the other hand, if you're
using the virtual file system
interfaces, the F_NOCACHE fcntl
can indicate that any IO
on a particular file descriptor
should be done without caching.
Now of course, you can still
use that with dispatch IO
by then providing
such a file descriptor
to dispatch_io_create.
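A minimal sketch of the fcntl route (the helper and driver are my own; F_NOCACHE exists only on OS X, so it's guarded here, and elsewhere the read is simply a normal cached read):

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

// Read a whole file, with caching disabled where F_NOCACHE is available.
long read_file_uncached(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
#ifdef F_NOCACHE
    fcntl(fd, F_NOCACHE, 1);  // don't let this IO evict other cached data
#endif
    char buf[4096];
    long total = 0;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0) total += n;
    close(fd);
    return total;
}

// Driver: create an 8 KB file and read it back uncached.
long uncached_demo(void) {
    const char *path = "/tmp/uncached_demo.bin";
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return -1;
    char block[4096];
    memset(block, 'u', sizeof block);
    write(fd, block, sizeof block);
    write(fd, block, sizeof block);
    close(fd);
    return read_file_uncached(path);
}
```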
Now I also mentioned
in the memory section,
file-backed memory regions.
And these can be
used to do Memory Mapped IO.
What's great about
Memory Mapped IO is
that it avoids creating any
additional copy of the data.
If you're using traditional
read calls, you'll have to first
pull data into the file
cache and then copy it
into a buffer in
your application.
And for small IO, this is fine.
But if you're doing random
accesses to a large file,
Memory Mapped IO can avoid
that extra copy of data.
It's ideal for random accesses
because it lets the
system control whether
or not a particular piece
of data is kept in memory
or can be evicted automatically
under memory pressure.
And when doing Memory Mapped IO,
you can use the madvise
system call
to indicate future needs
allowing prefetching
or eviction of data
as necessary.
Now if you're using
the NSData APIs,
you can use the
NSDataReadingMappedIfSafe option
to automatically
use Memory Mapped IO,
or you can use the mmap system
call to map a file into memory.
Now, regardless of how you do
IO and what data you're writing
to where, there's one
very, very important thing
that you should remember
and that is
to never do IO on
the main thread.
And hopefully, you've all heard
this before but it's important
to keep in mind as you're
running your applications
that a wide variety of our
frameworks are going to need
to do some IO to accomplish
the work you've asked of them.
And in low memory situations,
any memory access can
potentially involve a page fault
and access to the disk.
Now, this is all very important
because any time your main
thread has to block waiting
on IO, the IO could take a
very long time to complete.
And this will result in
a spinning application
which is a very poor
experience for your users.
So you should aggressively
consider moving work off
of a main thread of your
app and on to for example,
a dispatch queue
whenever possible.
Now, of course, it's-- none
of these things are important
until you understand what IO
your application is actually
doing, so you can target
the biggest offenders
in your application
for improvement.
And the fs_usage command-line
tool can help you do this.
It provides a listing
of system calls
and IO operations on a system.
It provides a couple of
options for filtering.
For example, you can use
the -f filesys option to filter
to just file system
events, or -f diskio
to get just accesses to the disk.
And you also want to
consider the -w flag to get
as much data as possible.
Let's take a look at what
fs_usage looks like in practice.
In this case, we're going to
filter to just file system events.
And this is just a couple
events from my system
when I was sitting here
writing these slides.
And we can see a
couple of things.
The first thing we
can see is the time
that a particular
event completed.
But, this is important.
These are ordered by when the
events completed, not issued.
We then see what the event
itself is, have some data
about the event, the
duration the event lasted for,
and finally, the
process and thread ID
that performed the operation.
Now, because these are
ordered by completion time,
you can use that fact
to find matching events.
So in this case, we have a read
data command and that indicates
that we actually pulled data
from the device into memory.
And then we see a pread system
call that completed immediately
after on the same thread.
This is a good indication
that that read data was a
result of the pread command.
And to help you see these when
you're looking at fs_usage output,
we'll indent commands like
read data automatically.
Now I want to talk
a little bit more
about that read data command
because that's the actual IO
to a storage device
that you want
to be focusing on optimizing.
And so, if we look at
just the disk IO commands,
by using the -f diskio
option, we can get a sense
of what type of IO we're doing.
So the command name will include
things like whether it's a write
or a read, whether
it's file system data,
or metadata about files on disk,
whether it's a page in or page
out from a file-backed region,
and whether it's non-cached.
If you see an N inside
brackets that indicates
that the IO was done non-cached.
You'll then get the file offset
on disk, the size of the IO,
the device it was to, and
in some cases, a file name.
Now, given this data,
you then want to try
to find ways you can improve
the performance of your app.
This includes things like simply
don't do unnecessary IO.
And looking at what IOs
your application is doing
with fs_usage can
be a great place
to find this.  Or, do it less.
Could you potentially
read or write less data
for a particular operation?
Do it later.
If you're looking at
something like app launch,
any IO that you do during
app launch is potentially
something that could
increase the launch time
of your app significantly.
Try to defer those to a less
critical time especially
a time when they won't contend
with other operations
your user might be doing.
And for your hard
drive-based users,
try to do IO sequentially.
Avoid accessing lots
of different files
in a random order.
Now, one thing that
you want to think
about when using fs_usage is
what impact the disk cache is
going to have on the
IO that you see.
If you're using the -f diskio
option, you're only going
to see accesses that go to
the actual hard drive itself.
Anything that hits the disk
cache won't be printed.
So, for example, this is a
case of a warm app launch.
There we go.
And by warm, I mean
that the things
that this application needs
are already in memory.
If I haven't run
the app recently,
and instead I get
a cold app launch,
it looks a little
more like this.
And this doesn't
quite fit in the slide
so let me scroll
through it for you.
[ Pause ]
Now this is potentially a little
bit of an extreme example.
But I expect that if you
were to go home and try this
on your app, you'll
see something similar.
Launching your app for
the first time, when the
files it needs
aren't cached,
is significantly
more expensive
than subsequent launches when
everything is already cached.
Now, as a result, it's
important to profile
in different warm
states for your app.
This means you want
to run your app once
and then use the purge
command to evict caches
and try running it again.
Now, remember that some data
might be automatically cached
by the operating system at boot.
So you'll need to do at least
one cycle of running your app
and then using purge to
throw away the contents
of the disk cache before
you'll get good data.
[ Pause ]
So just to recap some points
about disk IO, the best practice
for doing IO, especially to
large files or large numbers
of files, is to use the
dispatch IO APIs.
When profiling your disk
accesses, make sure to do it
in different warm states.
Consider adopting non-cached
IO for any large file access
where you don't want to evict
other data from the cache.
Pay attention to when your
data is flushed to disk,
and never ever do IO
on the main thread.
Now last I'd like to talk about
working in the background.
Your app may do some
sort of work
that isn't directly required by
the user at the time it's done.
This can include refreshing
data from the network,
syncing a user's data with
some sort of server, indexing
or backing up a user's files,
making extra copies of data,
whatever it might be,
anything that you do
that isn't directly relevant
to what the user has currently
requested has the potential
to hurt system responsiveness
by contending
with other operations the
user is doing on the system.
Backgrounding is a
technique that you can use
to limit the resource
use of your app
when performing these
operations.
Now, in the keynote, you
heard about App Nap.
And this is a kind of similar
technique whereas App Nap is
designed to automatically
put your apps in a nap state
when they're not being used.
Backgrounding is a way
you can explicitly specify
that a particular piece
of work is background.
These things work together
and so you may still need
to adopt the App Nap
APIs at the same time
as using backgrounding.
But what exactly does
backgrounding do?
Well, the first thing
it's going to do is hint
to the entire system that
this work is backgrounded,
so that whenever possible it
can be done more efficiently.
It will be used by a variety
of places in the system
to make choices about
how to do your work.
It will lower your CPU
scheduling priority, ensuring
that other things can
run first on the system.
And finally, it will apply
something called IO throttling
to any accesses that you
try to make to the disk.
Now, let's look at that
in a little more detail.
Imagine we have an application
the user is actively using.
And some sort of
background task.
The background task wants
to let's say, copy a file.
And so, it's doing lots of IO.
Then the application
tries to do an IO itself.
IO throttling will automatically
hold off the background task
giving the application
full access to the disk
to allow its IO to
complete quickly.
If the application
tries to do more IOs,
then IO throttling
helps space out the IOs
of the background task
in order to continue
to give the application as
much bandwidth as possible.
All right.
So how do we actually
accomplish this?
Let's imagine you just
have one block of code
in your application
you'd like to background.
This is probably
the easiest case.
And you can simply background
that block by dispatching it
to the background
priority queue.
Anything you dispatch
there will run backgrounded,
but it's important to remember
that that code shouldn't
take locks or in any
way block any code
that you need to execute
in response to UI operations.
Things that you run
in the background may take
an unbounded amount of time
to complete, and the system
won't necessarily complete
them as fast as possible.
You don't want them to
cause a priority inversion
with your user interface.
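For the single-block case above, a sketch might look like this (the work function is a stand-in for real background work; the semaphore only makes the demo synchronous, and a trivial inline call keeps the sketch compiling where libdispatch is absent):

```c
#if defined(__APPLE__)
#include <dispatch/dispatch.h>

// Run one block of work on the background priority queue.
int run_backgrounded(void (*work)(void)) {
    dispatch_semaphore_t done = dispatch_semaphore_create(0);
    dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_BACKGROUND, 0), ^{
        work();  // runs with lowered scheduling priority and throttled IO
        dispatch_semaphore_signal(done);
    });
    dispatch_semaphore_wait(done, DISPATCH_TIME_FOREVER);  // demo only; never block UI code on this
    return 0;
}
#else
// Fallback for platforms without libdispatch: just run the work inline.
int run_backgrounded(void (*work)(void)) { work(); return 0; }
#endif

static int g_work_ran = 0;
static void example_work(void) { g_work_ran = 1; }  // stand-in for real background work

int background_demo(void) {
    run_backgrounded(example_work);
    return g_work_ran;
}
```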
Now, you can also use XPC
to background larger tasks.
There's a new XPC activity API
that was discussed a few hours
ago in the Efficient Design
with XPC talk that you can use
to allow the system to tell you
when to perform your
background activities.
Any blocks you provide
to the XPC activity
API will also get run
on the background
priority queue.
You can also use an XPC
Service as an adaptive daemon.
So an XPC Service as of 10.9
will be backgrounded by default
and then it will be taken out of
the background only in response
to requests from an application.
This is an easy way to
do things that might need
to take locks required
by an application.
If you separate that
out into another process,
you can use this boosting
mechanism to unbackground tasks
so that they complete quickly
and service the user interface.
And again, these are
discussed in more depth
in efficient design with XPC.
Finally, if you have a
legacy service, for example,
a Launch Daemon or Launch Agent,
you can use the new ProcessType
launchd plist key to specify
that that process should
always run backgrounded,
or you can use the
setpriority system call
to background a particular
process or thread.
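A sketch of the setpriority route: the Darwin-specific constants used here are real OS X ones, guarded so that on other platforms the sketch falls back to a plain nice-style priority change, purely to stay runnable anywhere.

```c
#include <errno.h>
#include <sys/resource.h>

// Put the calling process into the background band on OS X, or just raise
// its nice value elsewhere.
int background_current_process(void) {
#if defined(PRIO_DARWIN_PROCESS) && defined(PRIO_DARWIN_BG)
    // Darwin-specific: lowers scheduling priority and throttles IO for the process.
    return setpriority(PRIO_DARWIN_PROCESS, 0, PRIO_DARWIN_BG);
#else
    // Portable stand-in: lowers CPU scheduling priority only.
    if (setpriority(PRIO_PROCESS, 0, 10) != 0) return -1;
    errno = 0;
    return (getpriority(PRIO_PROCESS, 0) == 10) ? 0 : -1;
#endif
}
```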
So those are the ways you
can adopt backgrounding.
There are a couple of
tools you can use to debug
to make sure your backgrounding
is working as expected.
The first is ps, which
normally lists processes
on the system.
But if you provide
the -aMx options,
you can see the scheduling
priority of every thread.
And in this case,
backgrounded things are running
at a priority of four.
So that indicates
that all the threads
in this particular process have
been appropriately backgrounded.
You can also use
the spindump tool.
This is similar to Time
Profiler or sample.
But it has the advantage that it
will also show you the priority
of a particular process.
So in this case, we can
see that our accounts process
is running at
the background priority.
Now, you also want to look for
the throttle_lowpri_io frame.
This frame is where
you'll see a process sit
if its IO is being throttled.
And you can see that
in the kernel stacks
in Time Profiler,
or using spindump.
There's a new taskpolicy
command which is similar
to the Unix nice command.
And it can allow you to
run a particular process
as backgrounded.
This is great if you
want to test what happens
when you background a
process or application.
And finally, fs_usage can
show you which IOs were issued
by a backgrounded
process or a thread.
And you'll see this
with the capital T
after disk IO commands.
Now, one of the things that's
been a constant theme here is
that your users will experience
different performance based
on what type of system
they're working on.
And so, as you're
testing your application,
you should consider using
multiple types of systems.
But for most of us,
setting up an entire QA lab
with different systems
is a very big task.
And so, you can, at
least as a first start,
simulate a resource-constrained
system in a variety of ways.
If you want to test
running with less memory,
you can use the maxmem boot-arg
to specify how much memory
your system should have.
In this case, we're emulating
a system that has 2 gigabytes.
Now, to revert this, you'll want
to run the [inaudible] command
but remove the
maxmem=2048 part.
You can also use an
external Thunderbolt drive
to simulate different
drive speeds.
A Thunderbolt-attached
hard drive is going
to have similar performance
to an internal hard drive.
And so, if you're running
on an SSD configuration,
this is a great way to
experience what it's
like for a hard drive user.
Simply run the OS installer
and install a separate OS
to your external hard drive
and then you can boot off
that by holding option at
boot to get the BootPicker.
Finally, you can use the
Instruments preferences
to limit the number of
CPUs in use by the system.
And this will
automatically go back
to all CPUs whenever
you restart.
Now, if you have questions,
you can contact our developer
evangelists, Paul Danbold
or David Delong, or see
our Apple Developer Forums.
There's also a variety
of related sessions you
might want to check out.
This morning, we had
Maximizing Battery Life on OS X
and Efficient Design with XPC.
But you should also look at
Improving Power Efficiency
with App Nap to learn how
App Nap will affect your app
and how you can work
best with it.
Optimizing Drawing
and Scrolling on OS X
to learn about layer backing.
Energy Best Practices
will talk about how
to use the CPU most efficiently
and is essentially the CPU
version of this talk.
And finally, Fixing Memory
Issues can show you how to dive
in with Instruments
to understand the memory
usage of your application.
So just to summarize
some key takeaways,
remember to regularly
profile and optimize your app,
not just the performance
of your app,
but also the resources it
consumes while carrying
out its actions.
Remember that your
users may have a variety
of different systems.
And so, just because a
particular operation works well
on your well-equipped
development machine doesn't mean
that the users will
have a good experience.
And ensure your app
is a good citizen
with shared system resources so
that users enjoy using your app
and don't feel they
need to quit.
Thanks.
[ Applause ]
[ Silence ]