Transcript
>> Good morning!
Welcome!
[ Applause ]
Glad to see a number of folks
out now, bright and early,
to talk about all the
heart-pounding excitement
in the world of compilers.
And I'm Jim Grosbach, and I'm
really happy to be here today
to share with you all of the
new things that we have in LLVM.
When we normally talk about LLVM
and what first comes to mind
when we think about it is the
Apple LLVM Compiler itself.
This is what we all use to
build our apps and that's
where we really first
encounter LLVM,
but it's much more than that.
LLVM is used in a wide
variety of products and tools
that we all use every day, both
as developers and as end users.
Over the years LLVM has grown to
be a really key technology here
at Apple for building
tools, for performance,
and for modernization, and that
has been no exception this year
as we have moved swiftly along
with a wide variety
of new improvements.
To start with, back in September
we introduced the Apple A7
processor which has been
just absolutely magnificent
in what it's allowed us to do,
bringing truly desktop-class
performance
to your mobile devices and
LLVM plays a key role in this.
And now we're encouraging more
of you to use this technology
in your apps, so
building for 64-bit
in iOS is now the default.
As of Xcode 5.1 carrying
on into Xcode 6,
when you rebuild your app,
if you're using standard
architectures,
ARM64 will be included.
This does not impact
your deployment story.
You can continue to
deploy back to iOS 4.3.
We still build for
armv7 for 32-bit.
All of the development work
flows that you're familiar
with for the simulator,
the debugger, profiling,
all of these things continue
to work transparently,
just as you're familiar with
in a 64-bit environment.
Now, one thing to be aware of is
that because ARM64 is an
entirely new architecture,
your entire application
must be built 64 bit,
not just a few libraries here,
or a few files there,
but the whole app.
So, if you're relying
on third-party libraries
and those libraries have
not yet adopted 64 bit,
please work with your vendors
and encourage them to update
and support 64 bit development
so that your app
can then migrate
as well and get the benefits.
Now, during migration there
are a few things that we'd
like to bring to your
attention that might come up,
a few advancements we've
made, and a few things
that we've tightened
up in the specification
and what the possible
impact on your app is.
To start with, in 64-bit iOS all
functions must have a prototype.
This has been good style
since time immemorial
and it's been required
for C++ since the start.
It's been highly suggested in C,
for any modern version not
using a prototype is deprecated
and has been for a
very long time now.
So, we've taken advantage
of this in ARM64
to generate more efficient
calling convention code,
in particular for variadic
functions like printf,
that the number of arguments
to the function varies
by call site.
So, when you have older
code that you're using
that may not use prototypes,
what is normally a warning has
now been promoted to an error,
so the compiler will highlight
to you in your code exactly
where this is happening so
that you know which prototypes
to go add to your
header files to move on.
One place that this
does sometimes come
up in a little bit more
of a subtle way is when C
and Objective-C interworking
code with direct invocations
of objc_msgSend.
To help find this, we
have a new Xcode setting
to enable strict checking
of objc_msgSend.
This is a recommended setting
and when you first
upgrade your project
to Xcode 6 we'll encourage you
to adopt this setting.
And what's tricky is
that every invocation
of objc_msgSend
effectively has a
different type.
It has the type of what the
final receiving method is going
to be.
For example here,
a trivial piece
of code that's invoking
method foo,
with strict checking enabled,
the compiler will now tell us
that we need to tell it
what the final type is.
This is straightforward to do.
It's a little bit verbose,
but very straightforward.
We simply add the type of
the final receiving method.
We've done it here with a typedef.
This could be done with a direct
type cast, all on one line,
if you prefer, just to make sure
that the compiler knows what
the final receiving type
of the method is so that it
can generate the right code
to get the final result correct.
Another place that we've
tightened things up
and taken advantage
of our new ABI
and ARM64 is the
Objective-C Boolean type.
If any of you were at Stump
the Experts last night,
this topic actually
came up as a question.
It was rather amusing like,
"I have a slide on that!
That'll be great!"
So, BOOL is basically
now a true Boolean type.
Previously, it's been
a signed character.
And, sometimes our code -- our
code as well, not just in yours,
would put values
into the Boolean type
that weren't strictly Boolean.
Now, the compiler is going to
be taking advantage of this
type definition, so
what can happen is
that if your code does that,
the results between 32-bit iOS
and 64-bit iOS may differ.
So, if you start seeing some
odd behaviors with Booleans,
this is something
to look out for.
We also have pointers.
As we're now 64-bit
architecture, this is kind
of the core of what this
is all about, that pointers
and longs are now 64 bits.
So, old code would
often do horrible things
like casting integers to
pointers, and back-and-forth.
And hopefully, we don't write
code that does that anymore,
but we all have this legacy
code that we have to live with,
and now this can bite
us if we're not careful.
This is very similar
to what we've all dealt
with on the 32-bit to
64-bit Intel transition,
if we went through that.
That's still a problem;
we haven't magically just
solved that in the compiler.
So for example here, we're
casting an integer which came
from a pointer somewhere else.
We're casting that to a void pointer.
But now the compiler
can help a little bit.
It can at least inform us
that the problem is coming up
and tell us that, "Oh, we have a
problem here that we need to go
and look at and make sure
that this is really
what's happening."
Now, if we ignore this warning,
the runtime in the
kernel is going
to be a little bit more
forceful about this.
If we dereference that pointer
we're going to get a hard fault
because the page zero is
mapped to always give a fault
so if we miss any of these
through other warnings,
we'll still get an error.
Paying attention
to the compiler,
it's going to be a lot
friendlier because it'll be nice
and friendly and tell you the
line number and the source file
for where the problem is.
The kernel's just going to
tell you you did something bad.
To address this we use
the C language typedefs
that are 64- and 32-bit clean.
We say we want a signed
integer, an unsigned integer,
that is an appropriate type
for saving a pointer value
or for indexing into an array
for comparing the differences
between two pointers.
For example, if we would
modify our previous code
to simply use the intptr_t type,
which when we're compiling
for 32-bit iOS will be
a 32-bit signed integer,
and for 64-bit iOS will be
a 64-bit signed integer.
Slightly more subtly, this can
come up in structure layouts.
When we use a long or a pointer,
these now grow, which changes
the size and sometimes
the alignment and the offsets
of other fields in our structures.
And, we have to be careful
that this is done in
a way that's safe.
Now, most of time this is
going to work transparently,
because these structures
are used entirely
within our application
and everything gets the new
definition and works fine.
But, if we're doing something
like a representation
of an on-disk file format
communicating across a network
to another process that is going
to rely on the exact layout
of a structure, that
can go badly.
So again, on any of those
data structures we want
to use the C fixed-size
types to make sure
that we get what we want,
whether we're building
for 64-bit iOS or
for 32-bit iOS.
So in summary, building
for 64-bit iOS is easy,
it's the default, and the
compiler will help find
and resolve any issues.
But, this isn't the only
thing that we've been up to.
We've also been making
advances in Objective-C
and the compiler
can help here, too.
The language has
continued to move forward.
Some of this really helps with
the interoperability with Swift
as well, as you may be
seeing in that talk.
I highly encourage
you to check it out.
It's happening at the same time
as this one, so go and look
on the video when that
comes on the WWDC app.
And, whenever we write new
code, we've been using all
of these advancements
in the language
to get the modern best
practices, more expressive code,
but then we have all of this
older legacy code that we'd
like to adopt all of these
features in, as well.
But, that's a lot of
code to go read through
and manually find all of these
things, so we have a tool
that will help us
identify the opportunities
where we can use
these new features.
And, I think the best
way to talk about that is
to show you with the demo.
Now, rather than use some
contrived example code here,
I thought we'd maybe
look at something
that we all are familiar with,
at least as users,
and our WWDC app.
That code has been
with us for a while.
We update it every year.
And, with the modernizer we
wanted to use that to look at it
and find out if there
are perhaps some places
in the codebase that we
missed for opportunities
to use new Objective-C features.
So, let's look and
see what a few
of those things that
we found are.
If we go under Edit to refactor,
we can convert our project
to modern Objective-C syntax.
We get a dialog box telling
us what we've just selected,
so make sure that we've
got the right thing.
We can select whether --
which targets in our
project to modernize.
In this case, we're looking
at the WWDC app, itself.
In the previous versions of
Xcode the modernizer would go
through and just look
for Objective-C literals
and subscripting.
But now, we have more options.
Now personally, I prefer not
to do all of these at once.
That tends to be a little too
much to swap back-and-forth,
so I tend to want to
select a few things.
I'm going to look for
instancetype here
so that we can get our
initialization methods more
strongly typed.
I'm going to try and find if we
missed any read/write properties
where we can convert explicit
getter/setter methods.
And, we're going to
look to use NS_ENUM
for our enumeration values
so the compiler can cooperate
with the runtime to
give better results.
Click Next, and the compiler
will run over our code
and it turns out we do, indeed,
have a few more suggestions
for what we can look at.
Now, do keep in mind that
these are just suggestions,
that we need to go
through and look
at the side-by-side diff here,
where we have the new code
on the left, the old code on the
right, and we look through here.
This looks fine.
Everything looks good here.
We're converting to enums.
Let's look at our next one.
This looks a little
bit different,
because we still have this
NSInteger over here that looks
like it'd be straightforward
to clean up,
but I'd rather come
back to this later.
I just want to deal
with the things
that we can do automatically
right now.
So, I tell the modernizer
to discard that change.
It wants to make sure
that I'm doing that.
Yes, I am absolutely sure.
I can do the same here.
We could also tell it to ignore
all of the changes in this file
with this Check button here.
And now, it's also found a place
where we can use
an instance type.
And, that all looks good
so we tell it to save.
And, Xcode will now
tell us that, "Oh!
We can update our project as
well, and take snapshots."
That sounds great.
Let's let it do that
because backups are good.
And now, our project is
saved, it's been rebuilt,
and the Objective-C modernizer
works to update our code
and help us find places
where we can take advantage
of new features.
But, this isn't the only
place that we've made advances
for Objective-C and
for interoperability.
And, to tell you more about that
I'd like to invite my friend
and coworker, Bob Wilson.
Thank you, Bob.
[ Applause ]
>> Thank you, Jim.
So, modules are another way
that LLVM can help
modernize your code.
We introduced modules just last
year, but in case you missed
that, let's start
with some background.
So, before modules we
had precompiled headers,
which are often an effective
way to speed up the compilation
of your code, but they
do have some limitations.
You can only have one
precompiled header at a time,
and more importantly,
the whole approach
of using a textual inclusion
of a header file as a way
of importing a framework
is just fragile.
We have to deal with the issue
where a header file gets
included more than once
in a single compilation.
We have also a problem
of headers being fragile.
And, what I mean by
that is that the meaning
of the header can change
depending on the environment
where it's imported, and let me
show you that with an example.
So here, I've defined a macro,
count, with the value 100,
and then I import the
foundation framework.
Now, inside the foundation
header there's an include
for the NSArray definition.
NSArray has an
ivar named count.
So, the macro of count gets
substituted as literal text
in that place and we end up
with completely broken code
where instead of the ivar
name, we have a value of 100.
This is what I mean by
headers being fragile.
Modules solve this problem
by replacing the model
of textual inclusion
with a semantic import.
And, there's a lot more detail
about modules in the Advances
in Objective-C presentation
from last year's WWDC
and I encourage you to watch
that if you're not
familiar with modules.
Until now, modules have
only been available
for the system frameworks.
Now, new in Xcode 6, you
can now define modules
for your own frameworks as
well for C and Objective-C.
Besides fixing the
problems we just looked at,
this also gives you a way
of importing your own
framework into your Swift code.
And as Jim mentioned,
there's another session
on integrating Swift with
Objective-C that I encourage you
to watch the video to
learn more about that.
So if you want to do this, how?
It's really very easy.
For most frameworks
it's possible
to define a single
umbrella header
that imports all of
the framework API.
And, this is what we
recommend that you do
as it is the easiest
way to adopt a module.
Once you've done that, simply
go to the Xcode build settings
for your framework and in the
packaging section set Defines
Module to Yes, and that's it.
It really is very easy.
Now, if you have a more
complicated framework
where that single umbrella
header is not sufficient,
you can use a custom module map.
And, there's more
information to describe how
that works on the LLVM website.
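As a sketch, a custom module map (a module.modulemap file) for a hypothetical framework named MyKit might look like this; the umbrella form shown is equivalent to what the Defines Module setting generates for you:

```
framework module MyKit {
  umbrella header "MyKit.h"

  export *
  module * { export * }
}
```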
After you've created a
module you'll want to use it.
How do you do that?
There's an @import keyword
followed by the module name
that tells the compiler, "I
want to import this module."
If you haven't had a
chance to update your code
and you're still --
have a #import to include
the umbrella header,
the compiler's smart
enough to know
that this is now a modular
framework and it will go ahead
and treat that as an implicit
modular import anyway.
So just as a guideline though,
we do recommend you use @import
when you're importing your
framework into a separate target
within your project just because
it makes it clear in the source
that you really intend for
this to be a modular import.
One exception to that is
within the implementation
of your framework, itself.
It doesn't make any sense to
import a framework into itself
and so, in that case, you
really need to use #import
to textually include
the framework headers,
just within the implementation
of the framework.
And, besides those guidelines,
we have a few other rules
about modules that you
should be aware of.
First, don't expose
any non-modular headers
in your framework API.
It's fine to import
another module, like Cocoa,
but if I have an import of
something like Postgres.h,
which presumably
is not a module,
you can put that down
inside the implementation
of your framework, but
don't expose it in the API.
One other issue is that
modules can change the semantics
of your code.
We saw earlier the
problem of a fragile header
where a macro definition
inadvertently broke the code.
Sometimes you might want
to do this on purpose,
and I'm showing here an example
where I've defined a macro,
DEBUG, as a flag to enable
additional debugging APIs
in my framework.
By switching that
framework to be a module,
the DEBUG macro defined
in my source code no
longer has any effect,
which is not what I wanted.
Now, that limitation
only applies to macros
that are defined
in the source code.
So, if you really want to
do something like this,
one alternative is to define
the macro on the command line
or in the Xcode build settings.
So, that is user-defined
modules.
It's really pretty
straightforward
in the common case, and it
gives you fast compilation,
clear semantics, and a way of
interoperating with Swift code.
So far, we've been
talking a lot about ways
that LLVM helps you
modernize your code
and adopt modern Objective-C
modules, but let's turn now
and look at performance,
which is the other theme
of this presentation.
Profile Guided Optimization, or
PGO, is a new feature in Xcode 6
and it gives you a way
of getting even more
performance out of your code.
Let me give you an overall
high-level understanding
of what this is about.
One of the inherent
challenges for the compiler is
that it has no way of
knowing what the input
to your program is going to be.
The only input to the
compiler is your source code.
So, the compiler has to assume
that all inputs are
equally likely.
There are some cases
where it can guess
that certain code paths will
be more common than others.
For example, it can assume that
going through a loop is going
to happen more often than
code outside of that loop.
But, those are just guesses
and there are a lot of things
that it simply can't know.
If we provide a profile
as an additional input
to the compiler it can now try
to optimize for the common case
and do a better job
of optimization.
And, what I mean
by a profile, here,
is simply a count of how
many times each statement
in your app executes in a
typical run of your app.
You may be wondering, "How do
I get a profile like that?"
Again, we could use
the compiler here
to generate a special
instrumented app
that, as it runs, is going
to count how many times each
statement executes.
And then, when your app finishes
with this special instrumented
version, it will write
out that profile which
we can then use for PGO.
So, how does the compiler
use that profile information?
There are an awful lot of ways.
So many optimizations
can benefit from this,
but I'm highlighting
just three here
that are particularly valuable.
One is the inliner.
If we know that a
function is really hot,
and by that I mean it's
run a lot, over and over,
the inliner can be much more
aggressive about inlining it.
When we're generating
the code we can try
to lay out the common
paths through your code
so that they're contiguous,
which makes it easy
for the processor
to run them fast.
And the register allocator
can also try to keep values
in registers throughout
those most common paths.
Let's look at an example just
to give you a better
understanding of this.
This is some C++ code that's
going to iterate over a set
of colored objects and
for each one it's going
to update the position
of the object.
So, at the top I've got
a loop over the objects,
and for each one I'm going
to call my Update
Position function.
And, Update Position is
going to look and see
if the object is red it moves in
a very simple horizontal line,
so the code is really simple.
But, if the object
is blue, let's assume
that the movement is
much more complicated,
I've got a very large
block of code here.
Now, the compiler has no way
of knowing whether red objects
or blue objects are more likely,
so it just assumes they're
both equally likely.
But, with PGO I might
be able to know
that red objects
are far more common.
And so, I'm highlighting
in red here the hot code,
which is the code to iterate
over the set of objects and then
to handle the red objects.
I'm going to color-code
the cold code in blue,
which is blue objects
which are rare
for some reason in
this application.
And then, let's look at how the
compiler would handle this code.
Here's kind of the default code
layout that matches, roughly,
the original source order.
We've got the hot loop outside,
and then the Update
Position function down below,
with a little bit
of hot code in it.
Inlining is one of the most
important optimizations
and we'd really like to inline
that Update Position function.
But, the compiler
can't inline everything
or the code would bloat beyond a
point where it would be useful.
But in this case, the Update
Position function is big
because of all that cold code
for handling the blue objects
and so it wouldn't
normally be inlined.
But, because PGO tells us
there's some really hot code
here, the inliner can be much
more aggressive about that
in this particular case.
So, we take the loop iterating
over the objects and split
that in half and move the Update
Position code right inline.
So, this is much better now.
We've got a lot of the
hot code right together,
but we've still got a big chunk
of this code for blue objects,
the cold code, right in
the middle of our loop.
And, PGO can help this, as well,
by changing the code layout.
It knows that that code is cold
and can move it down below,
out of the way, and we end
up with a nice tight loop
that can run really fast.
And, it also typically
enables other optimizations
on that hot code.
So obviously, this is
a simplified example,
but hopefully gives you a
feel of the power of PGO
and just how much it
can help the optimizer.
So, you may want to use it.
When does it make sense?
The compiler does a really
good job optimizing by default.
With PGO, if you do just
a little bit of extra work
to gather the profile
you can do even better.
So obviously, if you're happy
with the performance
you're already getting,
you're probably not
motivated to do that --
even that little
bit of extra work.
But, if you need
more performance,
by all means, give it a try.
And, let me show
you some examples
of just how much it can help.
This is a graph showing
the speedup of builds with PGO
compared to
a normal optimized build.
And, I'm looking at four
different applications here;
the Apple LLVM compiler
itself, applying PGO
to the compiler itself,
the SQLite database,
the Perl interpreter, and
gzip file compression.
And, PGO gives us
speedups ranging
from about 4% all
the way up to 18%.
So, not all apps will
benefit this much.
It really varies,
depending on the app,
but clearly there's a
lot of potential here.
So, if you want to try it,
how do you go about that?
PGO is really easy to use.
The first step is to
collect a profile.
I'm going to come back and talk
about that in just a minute.
Once you've done that, simply
go in the Xcode Build settings
for your project and find
the Use Optimization Profile
setting, and set it
to Yes, typically just
for the release configuration.
And that's it!
You've enabled PGO.
Once you've done that,
as you continue developing
your app you may change it
as you fix bugs, you
add new features,
the code becomes
gradually out of sync
with the profile you've
collected earlier.
And, when that happens, the
compiler will simply fail to use
that profile information.
It won't break anything,
you just gradually lose
the optimization benefit.
And when that happens, it
will give you a warning.
So, if you see warnings
like this,
saying that your profile may
be out of date, as you see more
and more of them, it's a good
indication to you that it's time
to go back and update
your profile.
So, let's turn now and look at,
how do you generate the profile?
Xcode 6 has a new command,
Generate Optimization Profile.
When you run this command,
Xcode will build the special
instrumented version of your app
and then run it, and
you can then interact
with the running app to
generate the profile.
When it finishes running, it
will write out the profile
and add it to your project.
As you're running your app,
keep in mind it's important
to exercise all of the
code that's important
for your performance.
If I have a game with
three different levels
and I only play the first level
of my game, the compiler's going
to assume that that's the
only thing that really matters
and not work as hard
on the other levels.
Now, you may be wondering, "If
I've written a really hard game,
it may take a while to play
the whole thing to completion."
That could be a problem, right?
So, Xcode has another option,
which is to use your
performance tests as inputs
to drive the profiling.
Performance tests are a
new feature in Xcode 6.
If you'd like to
learn more about them,
there's a session right
here tomorrow morning
on testing in Xcode 6.
And, if you care about
performance you want to set
up these performance tests
anyway, to catch regressions
in your code, just to keep
track of how you're doing.
And once you've gone to
that trouble to set them up,
in most cases they're
pretty good inputs
for driving this profile.
Again though, keep in
mind it's important
that your tests cover
the code in a way
that reflects the
typical usage of your app.
Going back to my three-level
game, if I write lots of tests
for the first level and
only a few for the second
and third level, again,
the compiler's going to end
up optimizing more heavily
for that first level.
Another benefit of using tests
is it gives you a great way
of evaluating, how
much does PGO help me?
You can just run your tests.
Now, let me show you
that with a demo now.
So, with the release
of the Swift language,
we thought it would be
fun to make a demo app
that would celebrate that.
And so, rather than
the Swift language,
we thought of the swift birds
and we made an application
that uses the Boids
Artificial Life Simulation
to simulate a flock of swifts.
And, I can create a
whole bunch of them here
and let them fly around.
And, the way this Boids
application works is
that each bird, or Boid,
compares its position to all
of the other ones on the screen
and it calculates the distance
between them to find the flock
of the birds nearest to it.
And then, each Boid
has competing urges.
On the one hand, it
wants to move closer
to the center of the flock.
At the same time, it doesn't
want to get too close.
And so, if it gets too close to
another one it will move apart.
And the performance of that,
as we add more and more
of these Boids, could
become a problem.
So, we set up a performance
test to track that,
and this is a really
simple performance test.
We set up a scene with 200 Boids
and measured the time it takes
to update their positions
100 times,
and that's our performance test.
So, let's run that.
Because I care about
performance,
I'm going to edit my
current scheme to make sure
that my test step is going
to use the release build
configuration
so that we get optimized
results.
And, I'll go to the
Product Test menu
and run my performance
test here.
All right.
And now, because I haven't run
the test before I don't have a
baseline, so let's go ahead
and set the baseline
based on that first run.
And now, let's try adding PGO.
Under the Product
menu, Perform Action,
down at the bottom here is this
new command I told you about,
Generate Optimization Profile.
I get two choices; I can
either run the application
or I can use my performance
test.
And, I'd like to show you how it
works with the performance test.
I just click Build and Run,
and Xcode, very helpfully,
warns me that I haven't
yet enabled PGO
in the Build settings
and it offers to do that.
So, let's go ahead and
let it enable that.
It's now building a special
instrumented version of our app
and running it using
the performance test.
And when those tests finish --
ah, I got a warning
here, an error.
Let me just explain
what's happened here is
that because we've
run the app with a lot
of the instrumentation
code, it runs more slowly.
But, this is just being
used to generate the profile
so that's not a problem.
I'm going to go back to the
Project Navigator for a minute
and show you that Xcode has
added this new Optimization
Profiles folder.
And inside of that, if you can
see it, there's my profile data.
So, that's great!
PGO is enabled, we
have a profile.
Let's rerun those
performance tests.
We'll go back to
run Product Test,
and see how much does it help?
And the tests are running now.
And, wow, we got a 21%
improvement just like that.
We didn't have to change the
code or do anything else.
[ Applause ]
So, that is PGO.
It's a great new feature to help
you get even more performance,
when you care about
getting every last drop
out of your code.
Continuing on this
theme of performance,
I'd like to turn the stage over
to Nadav Rotem, my colleague,
to talk about advances
in vectorization.
>> Thank you, Bob.
Hi.
[ Applause ]
So, last year with Xcode 5 we
introduced a new optimization
called loop vectorization.
And, I would like to remind
you what loop vectorization is.
So, modern processors
have vector instructions.
These instructions can process
multiple scalars at once.
And loop vectorization is
the compiler optimization
that accelerates loops using
these vector instructions.
And let's see how it's done.
If you can see the code
on the screen here,
you'll see that it's
a simple program
that accumulates all
the numbers in the array
into one variable, into sum.
And, the natural way of
executing this code is
to load one number at a time and
save it into the variable sum.
And then, load another
number and save it into sum.
But, there's a better
way of executing this code.
What the Loop Vectorizer does
for you automatically
is introduce a new
temporary variable, temp4.
Now, this is a vector register,
a vector temporary variable.
And, this allows us to
load four numbers at a time
and add four numbers at a time,
and we do it for
the entire array.
So, this is obviously
much faster
because we're processing
four numbers at once instead
of processing one
number at a time.
And, when we finish
scanning the array we need
to take the four numbers
from that temporary register
and add them together,
but that extra cost doesn't matter
because usually the
array is pretty big.
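In plain C, the transformation described above looks roughly like this. This is only a sketch: the 4-wide array stands in for a real vector register, and the function names are ours, not the compiler's.

```c
#include <stddef.h>

/* The scalar loop from the talk: accumulate all the numbers
 * in the array into one variable, sum. */
float sum_scalar(const float *a, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Roughly what the Loop Vectorizer produces: a 4-wide temporary
 * accumulator (the "temp4" from the talk), then one final
 * horizontal add when the array has been scanned. */
float sum_vectorized(const float *a, size_t n) {
    float temp4[4] = {0, 0, 0, 0};   /* stands in for a vector register */
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {     /* four numbers per iteration */
        temp4[0] += a[i + 0];
        temp4[1] += a[i + 1];
        temp4[2] += a[i + 2];
        temp4[3] += a[i + 3];
    }
    float sum = temp4[0] + temp4[1] + temp4[2] + temp4[3];
    for (; i < n; i++)               /* scalar loop for the remainder */
        sum += a[i];
    return sum;
}
```

Both functions compute the same result; the second simply does four additions per trip through the loop, which is what the vector hardware makes cheap.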
So, this is how loop
vectorization accelerates loops
and makes your code run faster
without you having
to change your code.
So, in Xcode 6 we've improved
loop vectorization in a number
of ways, where first of all,
we've improved the analysis
of complicated loops.
This means that
LLVM will be able
to analyze more complicated
loops and vectorize more loops
in your code, which is great.
We've also integrated the
Loop Vectorizer with PGO,
which Bob just mentioned.
So, this means that when PGO is
available the Loop Vectorizer
will be able to make
better decisions
when vectorizing your code.
We've also improved the X86
and ARM64 encoding support.
Now this means two things.
First of all, the Loop
Vectorizer has a better
understanding of the processor,
so it can better predict
when it is profitable
to vectorize your code.
And the second thing
that it means is
that when it vectorizes your
code it'll generate better,
more optimized code sequences,
so that your code
will run faster.
And, the last feature
that I want to talk to you
about is specialization
of loop variables.
So, most variables in your
code are only known at runtime.
These variables can be arguments
or computed expressions,
and the compiler doesn't know
the values of these variables
at compile time,
only at runtime.
And many times,
the Vectorizer cannot vectorize
your code unless the value
of these variables is
known to be constant.
So, let's take a
look at the example
that I showed you earlier.
So, this is a simple loop and
I modified it a little bit
and I introduced
the Step variable.
So now, instead of consecutively
scanning all of the elements
in the array, we jump
and skip some elements,
and we go in steps
of the variable Step.
Now, we can't vectorize
this code
because these elements are
not consecutive in memory.
We can't use these vector
registers to load a few elements
and then add them together.
It won't work unless
Step is equal to one.
Well, in many cases
Step is equal to one.
So, what do we do?
Well, we've introduced a
new optimization that's
called Specialization.
What we do is we create
multiple versions of the loop.
In one version of the loop
we assume that Step is equal
to one, and then we
vectorize the code
and make the code run faster.
But, in another version of the
loop we don't assume anything
and the code runs
as-is -- scalar.
And then, we add code
for selecting at runtime
which version of
the loop to run.
If Step happened to
be one, then we go
and execute the vectorized
version.
But, if Step is not equal
to one then we execute
the regular version.
And this new compiler feature
allows the Loop Vectorizer
to vectorize a lot more
loops, and it's great.
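The multi-versioning described above can be sketched in C. This is an illustration of the idea, not the compiler's actual output; the function name is ours, and the fast path reuses the 4-wide unrolling from before to stand in for real vector code.

```c
#include <stddef.h>

/* The strided loop from the talk: sum every Step-th element. */
float sum_step(const float *a, size_t n, size_t step) {
    /* Specialization: a runtime check selects between a version
     * the vectorizer can handle (step == 1) and the original
     * scalar loop, which makes no assumptions. */
    if (step == 1) {
        /* Vectorized version: valid because the elements are
         * consecutive in memory when step is one. */
        float temp4[4] = {0, 0, 0, 0};
        size_t i = 0;
        for (; i + 4 <= n; i += 4) {
            temp4[0] += a[i + 0];
            temp4[1] += a[i + 1];
            temp4[2] += a[i + 2];
            temp4[3] += a[i + 3];
        }
        float sum = temp4[0] + temp4[1] + temp4[2] + temp4[3];
        for (; i < n; i++)
            sum += a[i];
        return sum;
    }
    /* Fallback: the loop as written, run scalar. */
    float sum = 0.0f;
    for (size_t i = 0; i < n; i += step)
        sum += a[i];
    return sum;
}
```

Both paths compute the same answer; the check at the top is the small runtime cost the compiler pays to unlock the fast path whenever Step happens to be one.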
Okay. So, this was
loop vectorization.
But, in Xcode 6 we've also added
a new kind of vectorization.
This is -- this new vectorizer
is not a loop vectorizer.
It's called SLP Vectorizer,
which stands
for Superword Level Parallelism,
and it extracts parallelism
beyond loops.
What this SLP Vectorizer
does is that it looks
for multiple scalars in your
code and it glues them together
into vector instructions.
Let's see how it's done.
So, on the screen you
see a very simple struct.
This struct has two
members, x and y.
They're consecutive in memory.
And, we have a simple
function that converts units
from feet to centimeters.
Now, this is a very
simple conversion.
All we have to do is
load the x member,
multiply it by a
constant, and save it back.
And, we do the same thing for y.
And of course, the natural
way of executing this code is
to do it consecutively;
load variable x,
multiply it, save it back.
Load variable y, multiply
it, and save it back.
But again, there's a
better way of doing it,
and this is what the
SLP Vectorizer does.
We can load x and y together
because they're consecutive
in memory, multiply
them together again,
and save them back to memory.
And, this is SLP vectorization.
SLP vectorization is very
beneficial for some
kinds of applications,
mainly numeric applications,
and we see great speedups.
It may not speed
up all programs,
but it definitely
speeds up a lot
of numerically complex
applications.
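The struct example from above can be sketched in C. The struct layout and the conversion are from the talk; the exact names and the constant are our assumptions.

```c
/* The struct from the talk: two members, x and y,
 * consecutive in memory. */
struct Point {
    float x;
    float y;
};

#define FEET_TO_CM 30.48f   /* 1 foot = 30.48 centimeters */

/* Converts units from feet to centimeters. The two multiplies
 * are independent scalar operations on adjacent memory, which is
 * exactly the pattern the SLP Vectorizer glues into one vector
 * load, one vector multiply, and one vector store. */
void feet_to_cm(struct Point *p) {
    p->x = p->x * FEET_TO_CM;
    p->y = p->y * FEET_TO_CM;
}
```

There is no loop here at all; that is the point of SLP vectorization, which finds parallelism between neighboring statements rather than across loop iterations.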
So to summarize, we've improved
loop vectorization in Xcode 6
and we've introduced a new kind
of vectorization called
SLP vectorization.
Now in Xcode 5, when we
introduced the Loop Vectorizer
we did not enable it by default
and you had to go into one
of the settings and
select Loop Vectorization
and then Loop Vectorization
worked.
Well, in Xcode 6 you
don't have to do anything
because both the
new SLP Vectorizer
and the improved Loop Vectorizer
are enabled by default
when you build your
application in release mode.
This means that you don't
need to do anything.
Just compile your
application in release mode
and the improved LLVM will
make your code run faster.
Okay. So, we talked
about a number
of performance features in LLVM.
We talked about PGO, we
talked about vectorization,
but both of these
features are features
of a static C and C++ compiler.
But, LLVM is essential
technology here at Apple,
that's used by many projects.
And, one of the projects
that I want to talk to you
about today is accelerating
JavaScript code.
Well, WebKit is another
important technology.
It's the heart of the
Safari Web Browser.
And, WebKit needs to
execute JavaScript code
because JavaScript is
everywhere in every web page.
And, WebKit has an interpreter,
so when you load your Facebook
page, or any other page,
WebKit starts executing your
code with the interpreter.
But, WebKit also has
two JIT compilers
to accelerate your code.
When WebKit sees that you
execute the same function,
the same JavaScript
functions over
and over again, it says, "Huh.
Let's take a little bit of time
to compile it really quickly
so that it will run a little bit
faster than the interpreter."
So, this is the fast JIT.
And, when WebKit sees that you
execute a function many times,
then it says, "All right,
let's also take the time
and optimize this
function real quick,
so that it will run
even faster."
So, we have the interpreter,
we have the fast JIT,
and we have the optimizing
JIT, and there are tradeoffs
between compile time and
the quality of the code.
And this works really great,
except that JavaScript
is evolving.
People started writing large,
compute-intensive
applications in JavaScript.
People then compile C++
programs into JavaScript
and run them in the browser.
You can even compile
Quake 3 and run it
in your browser today,
which is --
some people like it [laughter].
Yeah, it's great.
But, it's a new use case
and we need a new compiler
to support this use case,
and this is where LLVM
comes into the picture.
So, we're adding LLVM as a
fourth tier compiler to WebKit.
Functions that run many, many,
many times are now
compiled with LLVM.
And, LLVM is tuned for making
the most out of your code,
for really trying hard
to optimize your code
and to generate excellent
code quality.
And again, there's a
tradeoff between compile time
and the quality of the code,
so WebKit really waits for you
to execute that function
many, many times as you do
in compute-intensive
applications
that you run in the browser.
But, compiling JavaScript
with LLVM is very different
from compiling C or Objective-C
because JavaScript --
it's a great language,
it's a dynamic language,
and if you look at the code
on the screen you'll see
that there are no types.
There's this n argument
here, but what is n?
Is it an integer?
Is it double?
Is it a class?
It can be a lot of
different things.
So, how do we compile it?
Well luckily, WebKit executed
this function many, many, many,
many times before
with the interpreter,
so it knows that in the last
1000 times n was an integer.
So now, we can compile this code
assuming that n is an integer,
except that someone may decide
to pass an n that's
not an integer.
Someone may decide to
pass a double or a class
and then everything will
break and we can't allow that.
So, what do we do?
We use a technique
that's very similar
to what we did with
the vectorizer.
We add checks.
We make assumptions
and we add checks.
We assume that n is an integer.
We assume that n
does not overflow.
And then, we verify our
assumptions at runtime.
Okay, that's great.
But, what is the fallback?
What do we do?
When our assumptions
fail we have to go back
to the interpreter because only
the interpreter can handle all
these cases, all
these extreme cases.
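The guard-and-fallback idea can be sketched in C. Everything here is illustrative: the tag layout, names, and the slow path are our assumptions, and a real engine restores state through On-Stack Replacement rather than a simple function call.

```c
#include <stdint.h>
#include <stdbool.h>

/* A dynamic value carries a type tag, because in JavaScript
 * an argument like n can be an integer, a double, or more. */
enum Tag { TAG_INT, TAG_DOUBLE };

struct Value {
    enum Tag tag;
    union { int32_t i; double d; } u;
};

/* The interpreter-style slow path: handles every case. */
double slow_path_add_one(struct Value v) {
    return (v.tag == TAG_INT) ? v.u.i + 1.0 : v.u.d + 1.0;
}

/* The JITed fast path: the profile said n was an integer the
 * last 1000 times, so we compile under that assumption, add a
 * check, and bail out when the assumption fails at runtime. */
double speculated_add_one(struct Value v, bool *deoptimized) {
    if (v.tag != TAG_INT) {          /* guard: verify the assumption */
        *deoptimized = true;
        return slow_path_add_one(v); /* assumption failed: fall back */
    }
    *deoptimized = false;
    return v.u.i + 1.0;              /* fast, type-specialized path */
}
```

The fast path skips all the tag dispatch the slow path has to do; the guard at the top is the price of that speed, and it is cheap as long as the assumption keeps holding.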
But, moving back to the
interpreter is not simple
because we started
executing the function in JITed code
and it has already made changes
to the program state.
We can't just start executing
it from the beginning.
So, we developed a technology
that's called On-Stack
Replacement, which is
a technique that is used
to migrate the state of the
program from the JITed code
in LLVM back to WebKit.
And, LLVM needs to track all of
the variables in your program
and some of them may be
in registers, some of them may be
on the stack, and now we're
able to migrate them from LLVM
to WebKit and continue
the execution in WebKit.
Now, this doesn't happen all the
time, it's a very extreme case.
But when it happens, we
have to handle these cases.
Okay, now compiling code
with LLVM is very beneficial,
especially for compute-intensive
applications, and especially
for these C++ applications
compiled into JavaScript
and run in the browser.
And, we're really excited
about this technology.
It's great.
So now, we use LLVM.
So to summarize, we use LLVM
as a fourth tier compiler
in Safari, both for X86
and ARM64 on iOS and OS X,
and we get excellent
performance speedups.
To summarize this
talk, today we talked
about modernizing Objective-C
code and we also talked
about a number of
performance features.
If you have any more questions,
you can contact our Developer
Tools Evangelist, Dave DeLong
or you can go to the Apple
website or to the LLVM website.
There are a few related
sessions, and I encourage you
to attend these sessions
or to watch them online.
Thank you, very much,
and have a good week.
Bye-bye.
[ Applause ]