Transcript
[Applause]
>> Good morning, and welcome to
Optimizing Swift Performance.
My name is Nadav, and together
with my colleagues, Michael
and Joe, I am going
to show you how
to optimize your Swift programs.
Now, we, the engineers on the
Compiler Team, are passionate
about making code run fast.
We believe that you can
build amazing things
when your apps are
highly optimized.
And if you feel the same way,
then this talk is for you.
Today I'll start by
telling you about some
of the new compiler
optimizations
that we have added
over the last year.
Later, Michael will describe
the underlying implementation
of Swift and give
you some advice
on writing high-performance
Swift code.
And finally, Joe
will demonstrate how
to use Instruments to identify
and analyze performance
bottlenecks in your Swift code.
So Swift is a flexible and safe
programming language with lots
of great features, like closures
and protocols and generics and,
of course, automatic
reference counting.
Now, some of you may associate
these features with slowness
because the program
has to do more work
to implement these
high-level features.
But Swift is a very fast
programming language that's
compiled to highly
optimized native code.
So how did we make Swift fast?
Well, we made Swift fast
by implementing compiler
optimizations that target all
of these high-level features.
These compiler optimizations
make sure that the overhead
of the high-level
features is minimal.
Now, we have lots of
compiler optimizations,
and we don't have enough
time to go over all of them,
so I decided to bring
you one example
of one compiler optimization.
This optimization is called
bounds checks elimination.
On the screen, you can
see a very simple loop.
This loop encrypts the
content of the array
by XORing all the elements in
the array with the number 13.
It's not a very good encryption.
Now, reading and writing
outside of the bounds
of the array is a serious bug
and can also have
security implications,
and Swift is protecting you
by adding a little bit of code
that checks that you don't
read or write outside
of the bounds of the array.
Now, the problem is that this
check slows your code down.
Another problem is that it
blocks other optimizations.
For example, we cannot
vectorize this code
with this check in place.
So we've implemented a
compiler optimization
for hoisting this check outside
of the loop, making the cost
of the check negligible,
because instead of checking
on each iteration of the loop
that we are hitting inside
the bounds of the array,
we are only checking once
when we enter the loop.
So this is a very
powerful optimization
that makes numeric
code run faster.
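The loop described above might look something like this minimal sketch (the function name and UInt8 element type are assumptions):

```swift
// A sketch of the loop described above: XOR each element with 13
// as a toy "encryption".
func encrypt(_ data: inout [UInt8]) {
    for i in 0..<data.count {
        // Each subscript access carries a bounds check; the
        // optimizer can hoist that check out of the loop,
        // enabling vectorization.
        data[i] ^= 13
    }
}
```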
Okay. So this was one
example of one optimization,
and we have lots
of optimizations.
And we know that these
optimizations work
and that they are very effective
because we are tracking hundreds
of programs and benchmarks,
and over the last year,
we noticed that these programs
became significantly faster.
Every time we added
a new optimization,
every time we made
an improvement
to existing optimizations,
we noticed that these
programs became faster.
Now, it's not going to be very
interesting for you to see all
of these programs, so I decided
to bring you five programs.
The programs that you see
on the screen behind me
right now are programs
from multiple domains.
One is an object-oriented
program.
Another one is numeric.
Another one is functional.
And I believe that these
programs represent the kind
of code that users
write today in Swift.
And as you can see,
over the last year,
these programs became
significantly faster,
between two to eight times
faster, which is great.
Now, these programs are
optimized in release mode.
But I know that you also
care about the performance
of unoptimized programs
because you are spending a lot
of time writing your code and
debugging it and running it
in the Simulator, so you care
about the performance
of unoptimized code.
So, these are the
same five programs,
this time in debug mode.
They are unoptimized.
So you are probably
asking yourself, wait,
how can improvements
to the optimizer improve the
performance of unoptimized code.
Right? Well, we made
unoptimized code run faster
by doing two things.
First of all, we improved
the Swift runtime component.
The runtime is responsible
for allocating memory,
accessing metadata,
things like that.
So we optimized that.
And the second thing that we
did is that now we are able
to optimize the Swift
Standard Library better.
The Standard Library
is the component
that has the implementation of
array and dictionary and set.
So by optimizing the
Standard Library better,
we are able to accelerate
the performance
of unoptimized programs.
We know that over the
last year, the performance
of both optimized
and unoptimized programs
became significantly better.
But to get the full picture,
I want to show you a
comparison to Objective-C.
So on the screen you can see
two very well-known benchmarks.
It's Richards and
DeltaBlue, both written
in object-oriented style.
And on these benchmarks,
Swift is a lot faster
than Objective-C.
At this point in the
talk, I am not going
to tell you why Swift is
faster than Objective-C,
but I promise you that we
will get back to this slide
and we will talk about
why Swift is faster.
Okay. Now I am going to talk
about something different.
I want to talk about a new
compiler optimization mode
that's called "Whole
Module Optimization"
that can make your programs
run significantly faster.
But before I do that,
I would like to talk
about the way Xcode
compiles files.
So Xcode compiles your
files individually.
And this is a good idea because
it can compile many files
in parallel on multiple
cores in your machine.
That's good.
It can also recompile only
files that need to be updated.
So that's good.
But the problem is that
the optimizer is limited
to the scope of one file.
With Whole Module Optimization,
the compiler is able
to optimize the entire module
at once, which is great
because it can analyze
everything
and make aggressive
optimizations.
Now, naturally, Whole Module
Optimization builds take longer.
But the generated binaries
usually run faster.
In Swift 2, we made
two major improvements
to Whole Module Optimization.
So first, we added new
optimizations that rely
on Whole Module Optimization
mode.
So your programs are
likely to run faster.
And second, we were able
to parallelize some parts
of the compilation pipeline.
So compiling projects in Whole
Module Optimization mode should
take less time.
On the screen behind me,
you can see two programs
that became significantly faster
with Whole Module Optimization
because the compiler was able
to make better decisions,
it was able to analyze
the entire module
and make more aggressive
optimizations
with the information
that it had.
In Xcode 7, we've
made some changes
to the optimization level menu,
and now Whole Module
Optimization is one
of the options that
you can select.
And I encourage you to try
Whole Module Optimization
on your programs.
At this point, I would like
to invite Michael on stage
to tell you about the underlying
implementation of Swift
and give you some advice
on writing high-performance
Swift code.
Thank you.
[Applause]
>> MICHAEL GOTTESMAN:
Thanks, Nadav.
Today I would like
to speak to you
about three different aspects of
the Swift programming language
and their performance
characteristics.
For each I will give specific
techniques that you can use
to improve the performance
of your app today.
Let's begin by talking
about reference counting.
In general, the compiler
can eliminate most reference
counting overhead
without any help.
But sometimes you may still
find slowdowns in your code due
to reference counting overhead.
Today I'm going to present two
techniques that you can use
to reduce or even
eliminate this overhead.
Let's begin by looking at the
basics of reference counting
by looking at how
reference counting
and classes go together.
So here I have a block of code.
It consists of a class C,
a function foo that takes
in an optional C, and a couple
of variable definitions.
Let's walk through the code's
execution line by line.
First, we begin by allocating
a new instance of class C
and assigning it to the variable x.
Notice how at the top of the
class instance, there is a box
with the number 1 in it.
This represents the reference
count of the class instance.
Of course, it's 1 because
there's only one reference
to the class instance
currently, namely x.
Then we assign x
to the variable y.
This creates a new reference
to the class instance,
causing us to increment
the reference count
of the class instance, giving
us a reference count of 2.
Then we pass off y to foo,
but we don't actually
pass off y itself.
Instead, we create a temporary,
c, and then we assign y to c.
This then acts as a third
reference to the class instance,
which then causes us to
increment the reference count
of the class instance once more.
Then when foo exits, c is
destroyed, which then causes us
to decrement the reference
count of the class instance,
bringing us to a
reference count of 2.
Then finally, we assign
nil to y and nil to x,
bringing the reference count
of our class instance to 0,
and then it's deallocated.
Notice how every time
we made an assignment,
we had to perform a
reference counting operation
to maintain the reference
count of the class instance.
This is important
since we always have
to maintain memory safety.
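A rough reconstruction of the code being walked through, with the reference-count effects noted in comments (the exact declarations are assumptions):

```swift
// Sketch of the walkthrough above. Every assignment performs a
// reference counting operation to keep the count correct.
class C {}

func foo(_ c: C?) {
    // While foo runs, the parameter is a third reference
    // (retained on entry, released on exit).
}

var x: C? = C()   // reference count: 1 (only x refers to it)
var y: C? = x     // reference count: 2 (retain)
foo(y)            // +1 on entry, -1 on exit: back to 2
y = nil           // reference count: 1 (release)
x = nil           // reference count: 0 -> instance deallocated
```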
Now, for those of you who are
familiar with Objective-C,
nothing new is happening here,
with increment and decrement
being, respectively, retain
and release.
But now I'd like to talk to you
about something that's
perhaps a bit more exotic,
more unfamiliar.
Namely, how structs interact
with reference counting.
Let's begin this
discussion by looking at a class
that doesn't contain
any references.
Here I have a class, Point.
Of course, it doesn't
contain any references,
but it does have two
properties in it,
x and y, that are both floats.
If I store one of these
points in an array,
because it's a class, of course,
I don't store it
directly in the array.
Instead, I store reference
to the points in the array.
So when I iterate
over the array,
when I initialize
the loop variable p,
I am actually creating a new
reference to the class instance,
meaning that I have to perform
a reference count increment.
Then, when p is destroyed at
the end of the loop iteration,
I then have to decrement
that reference count.
In Objective-C, one
would oftentimes have
to make simple data
structures, like Point,
a class so you could
use data structures
from Foundation like NSArray.
Then whenever you manipulated
the simple data structure,
you would have the
overhead of having a class.
In Swift, we can work around
this issue by using a struct
instead of a class.
So let's make Point a struct.
Immediately, we can store each
Point in the array directly,
since Swift arrays can
store structs directly.
But more importantly, since
a struct does not inherently
require reference counting
and both properties
of the struct also don't
require reference counting,
we can immediately eliminate all
the reference counting overhead
from the loop.
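As a sketch, the struct version stores the points inline, and the loop touches no reference counts:

```swift
// Point as a struct: stored directly in the array, so no
// reference counting is needed when iterating.
struct Point {
    var x: Float
    var y: Float
}

let points = [Point(x: 0, y: 0), Point(x: 1, y: 1)]
var total: Float = 0
for p in points {
    // Initializing p copies the struct; no retain/release occurs.
    total += p.x + p.y
}
```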
Let's now consider a slightly
more elaborate example of this
by considering a struct with
a reference inside of it.
While a struct itself does not
inherently require reference
counting modifications
on assignment,
like I mentioned before, it
does require such modifications
if the struct contains
a reference.
This is because assigning
a struct is equivalent
to assigning each one
of its properties
independently of each other.
Consider the struct
Point that we saw previously:
it is copied efficiently,
and no reference counting is
needed when we assign it.
But let's say that one
day I'm working on my app
and I decide that, well, I
would like to make each one
of my Points to be
drawn a different color.
So I add a UIColor
property to my struct.
Of course, UIColor
being a class,
this is actually adding
a reference to my struct.
Now, this means that every
time I assign this struct,
it's equivalent to assigning
this UIColor independently
of the struct, which
means that I have
to perform a reference
counting modification.
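A minimal sketch of the situation, using a stand-in class in place of UIColor (an assumption, to keep the example self-contained):

```swift
// ColorRef stands in for UIColor here; any class-typed property
// has the same effect on the struct's copy cost.
final class ColorRef {
    let name: String
    init(name: String) { self.name = name }
}

struct ColoredPoint {
    var x: Float
    var y: Float
    var color: ColorRef   // a reference inside the struct
}

let red = ColorRef(name: "red")
let a = ColoredPoint(x: 0, y: 0, color: red)
let b = a   // copying the struct retains `color` (one refcount op)
```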
Now, while having a struct with
one reference-counted field in it is not
that expensive, I mean, we
work with classes all the time,
and classes have
the same property.
I would now like to present
to you a more extreme example,
namely, a struct with many
reference counted fields.
Here I have a struct User, and
I am using it to model users
in an app I am writing.
And each user instance has some
data associated with it, namely,
three strings -- one for
the first name of the user,
one for the last
name of the user,
and one for the user's address.
I also have a field for
an array and a dictionary
that stores app-specific
data about the user.
Even though all of these
properties are value types,
internally, they contain
a class which is used
to manage the lifetime
of their internal data.
So this means that every time
I assign one of these structs,
every time I pass it off to
a function, I actually have
to perform five reference
counting modifications.
Well, we can work around this
by using a wrapper class.
Here again, I have my user
struct, but this time,
instead of standing on
its own, it's contained
within a wrapper class.
I can still manipulate the
struct using the class reference
and, more importantly, if I pass
off this reference to a function
or initialize a variable
with the reference,
I am only performing one
reference count increment.
Now, it's important to note
that there's been a
change in semantics here.
We've changed from using
something with value semantics
to something with
reference semantics.
This may cause unexpected
data sharing that may lead
to weird results or things
that you may not expect.
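The wrapper-class technique can be sketched like this; note the shared-storage comment, which is exactly the semantic change just described (all names are illustrative):

```swift
struct User {
    var firstName: String
    var lastName: String
    var address: String
    var extra: [String: Int]   // app-specific data
    var history: [Int]
}

// Passing a User costs five reference counting operations (one per
// property's internal storage); passing a UserBox costs one.
final class UserBox {
    var user: User
    init(_ user: User) { self.user = user }
}

let box = UserBox(User(firstName: "Ada", lastName: "Lovelace",
                       address: "unknown", extra: [:], history: []))
let alias = box                 // one reference count increment
alias.user.firstName = "Grace"
// Reference semantics: the mutation is visible through box too.
```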
But turns out there is a way
that you can have value
semantics and benefit
from this optimization.
If you'd like to
learn more about this,
please go to the Building Better
Apps with Value Types in Swift talk
tomorrow in Mission
at 2:30 p.m. It's going
to be a great talk.
I really suggest that you go.
Now that we've talked
about reference counting,
I'd like to continue by talking
a little bit about generics.
Here I have a generic
function min.
It's generic over
type T that conforms
to the comparable protocol from
the Swift Standard Library.
From a source code perspective,
this doesn't really
look that big.
I mean, it's just three lines.
But in reality, a
lot more is going
on behind the scenes
than one might think.
For instance, the code
that's actually emitted --
here, again I am
using a pseudo-Swift
to represent the code
the compiler emits --
the code the compiler emits
is not these three lines.
Instead, it's this.
First notice that the
compiler is using indirection
to compare both x and y.
This is because we could
be passing in two integers
to the min function, or we
could be passing in two floats
or two strings, or we could be
passing in any comparable type.
So the compiler must be
correct in all cases and be able
to handle any of them.
Additionally, because
the compiler can't know
if T requires reference
counting modifications or not,
it must insert additional
indirection
so the min-T function
can handle both types T
that require reference counting
and those types T that do not.
In the case of an
integer, for instance,
these are just no-op calls
into the Swift runtime.
In both of these cases, the
compiler is being conservative
since it must be able to
handle any type T in this case.
Luckily, there is a
compiler optimization
that can help us here, that
can remove this overhead.
This compiler optimization is
called generic specialization.
Here I have a function
foo that passes two integers
to the generic min-T function.
When the compiler performs
generic specialization,
first it looks at the call
to min in foo and sees, oh,
there are two integers
being passed
to the generic min-T
function here.
Then since the compiler
can see the definition
of the generic min-T
function, it can clone min-T
and specialize the cloned
function
by replacing the generic type T
with the specialized type Int.
Then the specialized
function is optimized for Int,
and all the overhead associated
with this function is removed,
so all the reference count --
the unnecessary reference
counting calls are removed,
and we can compare the
two integers directly.
Finally, the compiler
replaces the call
to the generic min-T
function with a call
to the specialized
min Int function,
enabling further optimizations.
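The generic function and its call site can be sketched like this (the function is named myMin to avoid shadowing the standard library's min; that name is an assumption):

```swift
// Generic over any Comparable T. Without specialization, the
// compiler compares through indirection and must conservatively
// handle possible reference counting for T.
func myMin<T: Comparable>(_ x: T, _ y: T) -> T {
    return y < x ? y : x
}

func foo() -> Int {
    // With the definition visible, the compiler can clone myMin,
    // substitute T = Int, and compare the integers directly.
    return myMin(3, 7)
}
```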
While generic specialization is
a very powerful optimization,
it does have one
limitation; namely, that --
namely, the visibility of
the generic definition.
For instance, this case,
the generic definition
of the min-T function.
Here we have a function compute
which calls a generic min-T
function with two integers.
In this case, can we perform
generic specialization?
Well, even though
the compiler can see
that two integers
are being passed
to the generic min-T function,
because we are compiling
file 1.swift
and file 2.swift separately,
the definition of functions
from file 2 are not
visible to the compiler
when the compiler
is compiling file 1.
So in this case, the compiler
cannot see the definition
of the generic min-T function
when it's compiling file 1,
and so we must call the
generic min-T function.
But what if we have Whole
Module Optimization enabled?
Well, if we have Whole
Module Optimization enabled,
both file 1.swift and file
2.swift are compiled together.
This means that definitions
from file 1
and file 2 are both visible
when you are compiling
file 1 or file 2 together.
So basically, this means that
the generic min-T function,
even though it's in file 2,
can be seen when we
are compiling file 1.
Thus, we are able to specialize
the generic min-T function
into min Int and replace the
call to min-T with min Int.
This is but one case
where the power
of whole module optimization
is apparent.
The only reason the compiler can
perform generic specialization
in this case is because of the
extra information provided to it
by having Whole Module
Optimization being enabled.
Now that I have spoken about
generics, I'd like to conclude
by talking about
dynamic dispatch.
Here I have a class
hierarchy for the class Pet.
Notice that Pet has a method
noise, a property name,
and a method noiseimpl,
which is used
to implement the method noise.
Also notice it has a subclass
of Pet called Dog
that overrides noise.
Now consider the
function makeNoise.
It's a very simple function,
it takes an argument p that's
an instance of class Pet.
Even though this block of code
only involves a small amount
of source code, again, a lot more
is occurring here behind the scenes
than one might think.
For instance, the following
pseudo-Swift code is not what is
actually emitted
by the compiler.
Name and noise are
not called directly.
Instead, the compiler
emits this code.
Notice the indirection
here that's used
to call name's getter
or the method noise.
The compiler must
insert this indirection
because it cannot know given the
current class hierarchy whether
or not the property name or
the method noise are meant
to be overridden by subclasses.
The compiler in this
case can only emit direct
calls if it can prove
that there are no
possible overrides
of name or noise
by any subclasses.
In the case of noise, this
is exactly what we want.
We want noise to be
able to be overridden
by subclasses in this API.
We want to make it so
that if I have an instance
of Pet that's really a dog, the
dog barks when I call noise.
And if I have an instance of
Pet that's actually a cat,
then when I call
noise, we get a meow.
That makes perfect sense.
But in the case of name,
this is actually undesirable.
This is because in this API,
name is never overridden.
It's not necessary
to override name.
We can model this
by constraining this
API's class hierarchy.
There are two Swift language
features that I am going
to show you today
that you can use
to constrain your
API's class hierarchy.
The first are constraints
on inheritance,
and the second are constraints
on access, via access control.
Let's begin by talking about
inheritance constraints,
namely, the final keyword.
When an API contains
a declaration
with the final keyword attached,
the API is communicating
that this declaration will never
be overridden by a subclass.
Consider again the
makeNoise example.
By default, the compiler
must use indirection
to call the getter for name.
This is because without more
information, it can't know
if name is overridden
by a subclass.
But we know that in this API,
name is never overridden,
and we know that in this API,
it's not intended for name
to be able to be overridden.
So we can enforce this
and communicate this
by attaching the
final keyword to name.
Then the compiler can look
at name and realize, oh,
this will never be
overridden by a subclass,
and the dynamic dispatch, the
indirection, can be eliminated.
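A sketch of the hierarchy with final applied to name (the declarations are assumed from the description):

```swift
class Pet {
    // final: no subclass can override this, so its getter can be
    // called directly, with no dynamic dispatch.
    final var name = "pet"
    func noise() -> String { return "..." }
}

class Dog: Pet {
    override func noise() -> String { return "Woof" }
}

func makeNoise(_ p: Pet) -> String {
    // noise() still dispatches dynamically; name does not.
    return "\(p.name) says \(p.noise())"
}
```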
Now that we've talked about
final inheritance constraints,
I'd like to talk a little
bit about access control.
Turns out in this API, Pet and
Dog are both in separate files,
pet.swift and dog.swift, but are
in the same module, module A.
Additionally, there is another
subclass of Pet called Cat
in a different file, cat.swift.
The question I'd like to ask is,
can the compiler emit a
direct call to noiseimpl?
By default, it cannot.
This is because by default,
the compiler must assume
that this API intended for
noiseimpl to be overridden
in subclasses like Cat and Dog.
But we know that
this is not true.
We know that noiseimpl is a
private implementation detail
of pet.swift and that it
shouldn't be visible outside
of pet.swift.
We can enforce this by
attaching the private keyword
to noiseimpl.
Once we attach the private
keyword to noiseimpl,
noiseimpl is no longer
visible outside of pet.swift.
This means that the
compiler can immediately know
that there cannot be any
overrides of noiseimpl in cat
or dog because, well,
they are not in pet.swift,
and since there is only
one class in pet.swift
that implements noiseimpl,
namely Pet,
the compiler can emit a direct
call to noiseimpl in this case.
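In file terms, the private version might look like this sketch, everything below living in pet.swift (spelled noiseImpl here, following Swift naming convention):

```swift
// Because noiseImpl is private to this file, no subclass in any
// other file can override it; the compiler can therefore emit a
// direct call to it from noise().
class Pet {
    func noise() -> String { return noiseImpl() }
    private func noiseImpl() -> String { return "..." }
}
```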
Now that we've spoken about
private, I would like to talk
about the interaction between
Whole Module Optimization
and access control.
We have been talking a lot
about the class Pet,
but what about Dog?
Remember that Dog
is a subclass of Pet
that has internal access
instead of public access.
If we call noise on an
instance of class Dog,
without more information, the
compiler must insert indirection
because it cannot know if
there is a subclass of Dog
in a different file of module A.
But when we have Whole
Module Optimization enabled,
the compiler has
module-wide visibility.
It can see all the files
in the module together.
And so the compiler is
able to see, well, no,
there are no subclasses of Dog,
so the compiler can
call noise directly
on instances of class Dog.
The key thing to notice here
is that all I needed to do was
to turn on Whole
Module Optimization.
I didn't need to
change my code at all.
By giving the compiler
more information,
by allowing the compiler to
understand my class hierarchy,
with more information I was
able to get this optimization
for free without
any work on my part.
Now I'd like to bring
back that graph
that Nadav introduced earlier.
Why is Swift so much
faster than Objective-C
on these object-oriented
benchmarks?
The reason why is
that in Objective-C,
the compiler cannot
eliminate the dynamic dispatch
through an Objective-C message send.
It can't inline through it.
It can't perform any analysis.
The compiler must assume
that there could be anything
on the other side of
an Objective-C message send.
But in Swift, the compiler
has more information.
It's able to see, with certainty,
what is on the other side.
It's able to eliminate this
dynamic dispatch in many cases.
And in those cases
where it does,
many more optimizations
become possible, resulting
in significantly faster code.
So please, use the final
keyword and access control
to communicate your
API's intent.
This will help the compiler to
understand your class hierarchy,
which will enable
additional optimizations.
However, keep in mind that
existing clients may need
to be updated in
response to such changes.
And try out Whole
Module Optimization
in your release builds.
It will enable the compiler to
make further optimizations --
for instance, more
aggressive specialization --
and by allowing the compiler
to better understand your
API's class hierarchy,
without any work on your
part, you can benefit
from increased elimination
of dynamic dispatch.
Now I'd like to turn this
presentation over to Joe,
who will show you how you
can use these techniques
and instruments to
improve the performance
of your application today.
[Applause]
>> JOE GRZYWACZ:
Thank you, Michael.
My name is Joe Grzywacz.
I am an engineer on
the Instruments Team,
and today I want to take you
through a demo application
that's running a little slowly
right now, so let's get started.
All right.
So here we have my Swift
application that's running
slowly, so what I want to do
is go ahead and click and hold
on the Run button
and choose Profile.
That's going to build my
application in release mode
and then launch Instruments
at its template chooser
so we can decide how we
want to profile this.
Since it's running slowly,
a good place to start is
with the time profiler template.
From Instruments,
just press Record,
your application launches, and
Instruments is recording data
in the background
about what it's doing.
So here we can see
we are running
at 60 frames per second
before I've started anything,
which is my target performance.
But as soon as I add these
particles to the screen,
they are moving around and
avoiding each other just
like I wanted, but we
are running at only
about 38 frames per second.
We lost about a third
of our performance.
Now that we have
reproduced the problem,
we can quit our application
and come back to Instruments.
Let me make this a
little bit larger
so we can see what's going on.
You can just drag
this, drag that around.
View > Snap Track to Fit is handy
to make your data fill
the horizontal timeline.
Now what are we looking at?
Here in the track view,
this is our CPU usage
of our application.
We can see on the left before I
did anything, CPU usage was low;
after I added those particles,
CPU usage became higher.
You can see what those values
are by moving your mouse
and hovering it inside
this ruler view.
You can see prior we were around
10% or so, not doing much.
Later on, we were at around 100%.
So we saturated our CPU.
In order to increase
our performance,
we need to decrease how
much work we're doing.
So what work were we doing?
That's where this detail
pane down below comes in.
So here's all of our threads.
Go ahead and open
this up a little bit.
You are probably familiar
with this call stack
from seeing it inside of
Xcode in the debugger.
start calls main, which calls
NSApplicationMain, et cetera.
But what Instruments
is also going
to tell you is how much time
you were spending inside
of that function,
including its children,
right here in this first
column Running Time.
We can see 11,220 milliseconds,
or 99% of our time,
was spent in NSApplicationMain
or the things it called.
The second column, Self,
is how much time Instruments
sampled inside that function
itself, excluding its children.
So what I want to
do is see where that
Self number gets large,
because that means
that function is actually
performing a lot of work.
You can continue opening these
up one by one, hunting around,
but that can take
a little while.
Instead we recommend you come
over here to the right side,
this extended detail view,
and Instruments will show you
the single heaviest stack trace
in your application.
That's where it sampled
the most number of times.
You can see again here
is our main thread,
it took 11,229 milliseconds.
It began in Start.
Symbols in gray are
system frameworks.
Symbols in black here,
like Main, are your code.
And what I'd like to do is just
look down this list and see
if there's kind of a big jump.
That means something interesting
happened around this time.
If I scan down this list,
the number is slowly
getting smaller,
but there are no big jumps going
on, until I get down here
where I see a jump from
about 9,000 to about 4,000.
So something happened there.
I am going to go ahead
and click on my code,
and Instruments has
automatically expanded the call
tree on the left side so you can
see what you just clicked on.
Let me frame this up.
And what's going on here?
Well, if I back up just a
little bit for a moment,
here is my NSTimer fire call,
which is driving my simulation,
trying to get at 60
frames per second.
Down here is my
ParticleSim.AppDelegate.update routine,
that's my Swift routine
driving my simulation.
But in between is this weird
@objc thing sitting here.
I want to point out
that's just a thunk.
Basically, it's a compiler
inserted function that gets us
from the Objective-C
world here in NSTimer
down to the Swift world
down here inside of my code.
That's all it is.
Otherwise, we can ignore it.
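A minimal sketch of that bridge, with hypothetical names (the real app's delegate and method names aren't shown here): a method targeted by an NSTimer has to be visible to the Objective-C runtime, so it is marked @objc, and the compiler emits the small thunk that crosses from the Objective-C world into the Swift method body.

```swift
import Foundation

// Hypothetical names for illustration; only the @objc/thunk
// relationship is the point being made in the talk.
class AppDelegate: NSObject {
    var timer: Timer?

    // Exposed to the Objective-C runtime; the compiler emits a
    // thunk that bridges the timer's Objective-C call into this
    // Swift method body.
    @objc func update() {
        // ... advance the simulation one frame ...
    }

    func start() {
        // The timer fires through the Objective-C target/selector
        // machinery; each tick crosses the compiler-generated
        // thunk before reaching update().
        timer = Timer.scheduledTimer(timeInterval: 1.0 / 60.0,
                                     target: self,
                                     selector: #selector(update),
                                     userInfo: nil,
                                     repeats: true)
    }
}
```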
Now, we can see my update
routine is taking 89%
of the time, so continuing
to optimize this
function is a good idea.
So everything else above it is
not really interesting to me.
I am going to go ahead
and hide it by focusing
in on just this update routine
by clicking this arrow
here on the right.
Everything else around
this has been hidden.
Running time has been
renormalized to 100%,
just to help you do a
little less mental math.
If we look in on what's
going on in this function,
Update Phase Avoid calls
Find Nearest Neighbor,
that calls down into something
really interesting here.
We see Swift release is
taking 40% of our time,
and Swift retain is taking
another 35% of our time.
So between just these two
functions, about three-quarters
of our update routine is spent
just managing reference counts.
Far from ideal.
So what's going on here?
Well, if I double-click on my
Find Nearest Neighbor routine
that calls those
retains and releases,
Instruments will show
you the source code.
However, Swift is an automatic
reference counted language,
so you are not going
to see the releases
and retains here directly.
But you can, if you go over
to the disassembly view,
click on that button there,
Instruments will show you what
the compiler actually generated.
And you can hunt around in here
and see there's a
bunch of calls here.
There's 23% of the
time on this release.
There's some more
retains and releases here.
There is another
release down here.
They are all over the place.
So what can we do about that?
Let's return to our code here
and go to my particle file.
Here is my class Particle,
so it's an internal
class by default.
And it adheres to some
collidable protocol.
All right.
Down below is the
Find Nearest Neighbor routine
that was taking all
of that time before.
Now, I know that when the update
timer fires, that code is going
to call Find Nearest Neighbor
on every single particle
on the screen, and then there's
this inner for loop that's going
to iterate over every single
particle on the screen.
We have an N-squared
algorithm here, so effectively
the stuff that happens
inside this for loop is going
to happen a really
large number of times.
Whatever we do to optimize this
thing should have big payoff.
So what is going on?
We have our for loop itself
where we access one
of those particles.
So there's some retain
release overhead.
There are property
getters being called here,
this dot ID property.
And as Michael was
talking about,
since this is an internal class,
there might be some other
Swift files somewhere
that overrides these property
getters, so we are going
to be performing
a dynamic dispatch
to these property getters,
which has retain/release
overhead as well.
Down here there is this
distance squared function call.
Despite the fact that it lives
literally a dozen source code
lines away, once
again, we are going
to be doing a dynamic dispatch
to this routine with all
of that overhead as well as
the retain release overhead.
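A rough sketch of the code being described — Particle, the id property, and findNearestNeighbor follow the talk, but the bodies and exact signatures are illustrative assumptions. Because the class is internal and not final, every property get and method call is dynamically dispatched, and each loop iteration retains and releases a Particle reference:

```swift
// Sketch with assumed bodies; internal, non-final class as in
// the talk, so all member access is dynamically dispatched.
class Particle {
    let id: Int
    var x = 0.0, y = 0.0
    init(id: Int) { self.id = id }

    // Could be overridden by a subclass elsewhere in the module,
    // so the compiler must reach it through dynamic dispatch.
    func distanceSquared(to other: Particle) -> Double {
        let dx = x - other.x, dy = y - other.y
        return dx * dx + dy * dy
    }

    // Called on every particle, iterating over every particle:
    // an N-squared hot loop full of retain/release traffic and
    // dynamically dispatched property getters.
    func findNearestNeighbor(in particles: [Particle]) -> Particle? {
        var nearest: Particle? = nil
        var bestDistance = Double.infinity
        for other in particles {               // retain/release per element
            if other.id == id { continue }     // dynamic getter call
            let d = distanceSquared(to: other) // dynamic dispatch
            if d < bestDistance {
                bestDistance = d
                nearest = other
            }
        }
        return nearest
    }
}
```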
So what can we do
about this code?
Well, this code is complete.
I wrote this application,
I am finished,
my particle class is complete,
and I have no need
to subclass it.
So what I should do is
communicate my intention
to the compiler by marking
this class as final.
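That one-word change might look like this, sketched with assumed member bodies: `final` promises the compiler that no subclass can override anything here, so the getter for id and distanceSquared(to:) can be called directly, and potentially inlined, instead of dispatched through the class's method table.

```swift
// Sketch with assumed bodies; `final` is the one-word change
// from the talk that enables devirtualization and inlining.
final class Particle {
    let id: Int
    var x = 0.0, y = 0.0
    init(id: Int) { self.id = id }

    // With the class final, this call can be resolved at
    // compile time rather than dynamically dispatched.
    func distanceSquared(to other: Particle) -> Double {
        let dx = x - other.x, dy = y - other.y
        return dx * dx + dy * dy
    }
}
```

Marking individual methods or properties final also works, when only part of a class needs to stay overridable.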
So with that one little
change, let's go ahead
and profile application
again and see what happened.
This time, the compiler was
able to compile that file,
knowing that there are
no other subclasses
of that Particle class,
and that means it's able
to perform additional
optimizations.
It can call those
functions directly,
maybe even inline them, or any
other number of optimizations
that can reduce the
overhead that we had before.
So if we record, this time
when I add the particles,
we can see they are
moving around and running
around at 60 frames per
second at this time,
so we got back 20 frames
per second with just
that one small change.
That's looking good.
However, as you may guess,
I have a second phase
here called collision
where we swap the algorithm
and now they are
bouncing off one another,
and again our frame rate
dropped by about 25 percent
down to 45 frames per second.
We reproduced the problem again,
let's return to Instruments
and see what's happening.
We will do what we did before,
make this a little bit larger,
Snap Track to Fit, and
now what do we see?
Over here on the left, this
was our avoidance phase.
Things are running much
better, around 30%, 40% or so,
so that's why we are hitting
our 60 frames per second.
But over here on the right,
this is our collision phase.
And now this is capping
out at 100% of our CPU,
and that's why our frame
rate is suffering again.
If we do what we did a moment
ago right now, this call tree data
down here in the detail
pane is going to have data
from this avoidance phase,
which is running fine,
as well as this collision phase,
which is what I really want
to actually be focusing on.
So those avoidance
samples over here are going
to water down our results.
Instead, I would like to set a
time filter so I am only looking
at my collision phase.
That's really simple to do.
Just click and drag
in the timeline view,
and now our detail
pane has been updated
to only consider the samples
from our collision phase.
Now we can do what
we did before,
head over to our
extended detail view.
Look down this list,
see where we see a jump,
and something interesting
happens here,
we went from about
8,000 milliseconds
to 2,000 milliseconds.
So I am going to click on my
collision detection class here.
Instruments once again
automatically expands this call
tree for us.
And if we just kind of look
at what's going on here,
88% of my time is spent inside
of this runtime step routine.
This is a good place to dig in.
I'll do what I did
before and click
on this Focus arrow
here on the right.
Now we are looking at just
our runtime step routine,
and let's see what it's doing.
All right.
Well, 25% of its time
is being spent inside
of Swift.Array._getElement.
When you see this A
inside of angle brackets,
that means you are calling
into the generic form
of that function and all
the overhead that entails.
You will see this
again here inside
of Swift.Array isValidSubscript,
there's that A inside
of angle brackets.
It also happens when you have
that A inside of
square brackets.
So we are calling a generic
property getter here.
So just between these
three generic functions,
we are looking at about 50% of
our time being spent inside
of generic code.
So what can we do about
getting rid of that overhead?
All right, back over to Xcode.
Here is my collision
detection file.
Here we can see that
collidable protocol
that my particle
was adhering to.
Here is that generic
class, CollisionDetection,
with type T that adheres to
the Collidable protocol.
What does it do? Well, it has
this collidables array here,
of generic type T.
And here down below is
our runtime step routine,
and that's where we were
spending all of our time.
So what does this function do?
Well, it iterates over all
our collidables, accesses one
of the collidables from
that array, calls a bunch
of property getters here.
Here's some more.
There is an inner for
loop, where we do kind
of the same thing again: we pull
out a second
collidable from that array.
Then all sorts of property
getters down below.
We're doing a lot of generic
operations here, and we'd really
like to get rid of that.
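A sketch of the generic class as described — CollisionDetection and Collidable are named in the talk, but the specific properties and the step body are assumptions. Compiled generically, every subscript and property access in the doubly nested loop goes through the generic entry points seen in the profile, unless the compiler can see the concrete T and specialize the whole class:

```swift
// Sketch with assumed members; the talk names only
// CollisionDetection, Collidable, and a step routine.
protocol Collidable {
    var x: Double { get }
    var y: Double { get }
    var radius: Double { get }
}

final class CollisionDetection<T: Collidable> {
    var collidables: [T] = []

    // In unspecialized generic code, each subscript and property
    // get here is an indirect call through the generic machinery
    // (the <A> functions the profile showed).
    func step() -> Int {
        var collisions = 0
        for i in 0..<collidables.count {
            let a = collidables[i]
            for j in (i + 1)..<collidables.count {
                let b = collidables[j]
                let dx = a.x - b.x, dy = a.y - b.y
                let r = a.radius + b.radius
                if dx * dx + dy * dy < r * r {
                    collisions += 1
                }
            }
        }
        return collisions
    }
}
```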
How do we do that?
Well, this time you can see my
collision detection class is
here inside of this Swift file.
However, the users of this class
are inside this app delegate
routine, in this Particle Swift
file, so they are in other parts
of this module, so we
are going to have to turn
to Whole Module Optimization.
Doing that's really easy,
just click on your project.
Go over here to build settings.
Make sure you are looking at
all of your build settings.
Then just do a search
for optimization.
And here is that setting that
Nadav showed you earlier.
You just want to switch
your release build
over to Whole Module
Optimization.
And now when we profile, the
compiler is going to look
at all those files together and
build a more optimized binary,
but let's check and
see what happened.
So we will launch time profiler
for the third time here,
start our recording, and
60 frames per second,
we add our particles, this
avoidance phase still running
at 60 frames per second.
Good, I expected
that not to change.
Always good to verify.
Then we move over to
our collision phase.
Now that is running at 60
frames per second as well.
All it took was a couple
minutes of analysis
and a few small tweaks,
and we made our application
a lot faster.
[Applause]
All right.
So to summarize what
we saw here today,
we know that Swift is a
flexible, safe programming
language that uses automatic
reference counting
to perform its memory
management.
Now, those powerful features
are what make it a delight
to program in, but they
can come with a cost.
What we want you to do is focus
on your APIs and your code so
that when you are writing them,
you keep performance in mind.
And how do you know what
costs you are paying for?
Profile your application
inside of Instruments,
and do it throughout
the lifetime
of your application development
so that when you find a problem,
you find it sooner and you
can react to that more easily,
especially if it involves
changing some of your APIs.
There's documentation
online, of course.
There are the Developer Forums,
where you can go and ask
questions about Swift
and Instruments
and get them answered.
And speaking of Instruments,
there's a Profiling
in Depth talk today
in Mission at 3:30.
There is an entire session
devoted to Time Profiler
and getting into even more depth
than we're able to
get into today.
And as Michael talked
about earlier,
there is a Building Better
Apps with Value Types in Swift
that will also build
upon what you saw today.
So thank you very much.
[Applause]