Transcript
[ Music ]
[ Applause ]
>> Good morning, and welcome to
this talk.
My name is Guillem Vinals
Gangollels.
And I work at the GPU Software
Performance Team here at Apple.
Game developers like you make
iOS an excellent gaming
platform.
And we at Apple obviously want
to help.
So this year we reviewed some of
the top iOS games and found some
common performance issues.
We analyzed a lot of data, and
as a result of that
investigation, we decided to put
this talk together.
So this is going to be the main
topic today.
Develop Awesome Games.
But I will only be providing
technical directions here.
So we'll [inaudible].
Before we begin, please let me
thank our friends at Croteam.
They are the developers behind
The Talos Principle, which is a
really awesome game.
You will see it featured in
these slides and in two of the
demos.
Notice that it has stunning
visuals but it really does not
compromise on performance.
And that's what this is all
about.
So let's do a quick run through
of the agenda.
I'll start with an introduction
to the tools.
This is a very good place to
start.
And then we'll talk about the
actual performance issues.
Around frame pacing, thread
priorities, thermal states, and
unnecessary GPU work.
Even though all these issues
seem unrelated, they will
compound and aggravate each
other.
So it's important to tackle them
all.
Let's start with the tools.
This is the most important
message.
You should profile early and do
it often.
Do not ship your game unless
you've profiled it.
And for that you will need to
know your tools.
Today, I will focus on two of
them.
First, we have Instruments,
which is our main profiling
tool.
You will want to use it to
understand performance, latency,
and overall timing.
Second, we have the Metal Frame
Debugger, which is also a very
powerful tool, which you will
want to use to debug your GPU
workload.
So where do we start?
This is a question we often get.
Well, this year we are making it
easier for you.
We are introducing a new
Instruments template, which will
be a great starting point.
The Game Performance Template.
It is the combination of already
existing instruments such as
System Trace, Time Profiler, and
Metal System Trace.
We configured it for you so it
records all the CPU and GPU data
that is relevant for your game.
So you can make it smooth.
So how do we launch it?
How do we get there?
Well, just open Instruments and
you will see it right there in
the center.
After you choose it, you will be
able to configure it the same as
every other template.
Once you start recording, you
will do so in windowed mode,
which will allow you to play
your game for as long as you
like, and only the last few
seconds of data will be
recorded.
And this is what those last few
seconds of data will look like.
There's a lot of information so
let's have a quick high-level
overview.
First, we have System Trace and
Time Profiler, which will give
you an overview of the system
load as well as your application
CPU usage.
For example, User Interactive
Load will record all the active
threads at a given time.
In this case, the orange color
you can see means that there are
more runnable threads available
than CPU cores.
So there is some contention.
These will offer a great view of
the system.
There's a couple of great talks
that talk about this instrument
in more depth.
Please follow-up on them.
Next on our list is Metal System
Trace, our GPU profiling tool.
It offers a great view of the
graphics stack.
All the way from the Metal
Framework down to the display.
In particular, we will want to
pay close attention to the GPU
[inaudible], which is split in
vertex, fragment, and compute if
your game uses it.
Notice as well that the display
track will be the starting point
of many of our investigations.
We will identify a long frame or
a stutter and we will work it
all the way up from there.
So it's a very natural place to
start.
There is a lot of information
about the tool because it really
is a very powerful tool.
And I encourage you all to catch
up on it.
These are a couple sessions that
will provide you a great
starting point.
Okay.
So next on our list we'll have a
thread states view which we
introduced this year.
This view will show you the
state of every thread in your
game.
In this case, each color
represents a possible thread
state, such as preempted which
is represented in orange.
Or blocked which is represented
in red.
We designed this view
specifically with you, game
developers, in mind.
Because we know the threading
systems in modern games are very
complex.
And we hope this really will
help you.
Also we have a track for each
CPU core.
It will show the thread running
on that core, as well as
the priority of that thread,
which is color coded.
By using this, you will be able
to see at a glance how busy the
system really is.
That was a short but quite
wide introduction to the tools.
So it's about time we move to
the actual performance issues.
The first one will be around
frame pacing.
And let's visualize it first.
For this we used the modified
version of the Fox II
demo.
That will help us illustrate the
issue better.
Can you guess which game renders
faster?
Well, some of you may not have
guessed it.
The game on the left is trying
to render at 60 frames per
second.
But it can only achieve 40, so
it's inconsistent, and it seems
jittery.
The game on the right on the
other hand is targeting 30
frames per second, which can
consistently be achieved.
That's why it looks smoother.
But that's a bit
counterintuitive.
How come the game that
renders faster doesn't look
smoother?
Well, this issue's known as
micro stuttering or inconsistent
frame pacing.
It occurs when the frame time is
higher than the display refresh
interval.
For example, our game may take
25 milliseconds to render or 40
frames per second.
And the display may refresh at
16.6 milliseconds or 60 frames
per second.
Same as the video we've just
seen.
This will create some visual
inconsistencies.
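The arithmetic behind that mismatch can be sketched in a few lines of Swift. This is a simplified model, not how the display pipeline is actually implemented: it assumes frames finish back to back and that each one appears at the first vsync at or after it is ready. The function name and parameters are made up for this illustration.

```swift
/// Simplified model (an assumption for illustration only): frames finish back
/// to back, and each one appears at the first vsync at or after it is ready,
/// but never sooner than `minimumIntervals` after the previous frame appeared.
func onScreenIntervals(frameCount: Int,
                       renderMicroseconds: Int,
                       refreshHz: Int,
                       minimumIntervals: Int = 1) -> [Int] {
    var presentIndices: [Int] = []
    for frame in stride(from: 1, through: frameCount, by: 1) {
        let finish = renderMicroseconds * frame
        // Integer ceiling of finish / (1_000_000 / refreshHz): the vsync index.
        let ready = (finish * refreshHz + 999_999) / 1_000_000
        if let previous = presentIndices.last {
            presentIndices.append(max(ready, previous + minimumIntervals))
        } else {
            presentIndices.append(ready)
        }
    }
    // Number of refresh intervals each frame stays visible before the next one.
    return zip(presentIndices.dropFirst(), presentIndices).map { $0 - $1 }
}

// 25 ms frames on a 60 Hz display: on-screen time alternates.
print(onScreenIntervals(frameCount: 6, renderMicroseconds: 25_000, refreshHz: 60))
// prints [1, 2, 1, 2, 1]

// The same frames held for at least two refresh intervals (a 30 fps target).
print(onScreenIntervals(frameCount: 6, renderMicroseconds: 25_000, refreshHz: 60,
                        minimumIntervals: 2))
// prints [2, 2, 2, 2, 2]
```

In this model the 40 fps game shows frames for one and two refresh intervals in turn (about 16.6 and 33.3 milliseconds), while the 30 fps game shows every frame for the same two intervals.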
So how did we get there?
What have we done to be in this
situation?
Well, we didn't do much really,
and that's kind of the whole
point of this.
After rendering the frame, we
requested the next drawable from
the display link.
And as soon as we got the
drawable, we finished the final
pass and presented it right
away.
We explicitly told the system to
present that drawable as soon as
possible, at the next refresh
interval.
After all, we are targeting 60
frames per second, right?
There's also another class of
problems that will cause micro
stuttering.
And some games are already
targeting a lower frame rate.
But we have also identified many
of those games that are using
usleep on their main or render
thread.
This is a very bad practice in
iOS, so please don't do that and
just hang in here for the
next few minutes.
And I'll tell you the actual
correct way of doing this in
iOS.
Now, let's have a deeper look
into what happens in the system
for micro stuttering to be
visible.
In this case, we see here a
timeline of all the components
involved in rendering.
And we'll start rendering our
game normally.
Notice this is a triple-buffered
case, which is quite
common in iOS.
In this case, every drawable is
represented by a letter and a
color.
And also notice the premise
here.
Rendering to drawable B takes
longer than one display refresh
interval, which is the time
between vsyncs.
In this case, it could be 25
milliseconds to render to B and
16.6 milliseconds in between
display refresh intervals.
So since that is the premise,
this means that we will need to
[inaudible] on the display for
the next interval to give time
so we can finish.
And we will do so.
And during that interval,
B will actually finish.
And we will be ready to present
it but notice that we have just
hit the issue here.
During this interval, we have
also finished rendering to C.
And we are ready to present it
right away.
So we will [inaudible] an
inconsistent frame pacing from
that moment onward.
We are stuck in this pattern.
Every other frame will be
inconsistent.
And the user will see micro
stuttering.
Now this may appear in different
shapes and forms in the real
world.
So what we'll do now is a quick
demo and I'll show you an
Instruments trace of The Talos
Principle.
And we will use it to see if we can
identify micro stuttering in the
real world case.
Okay.
So what we see here is the same
lot of information I've shown
you before.
This has been captured with the
Game Performance Template by
default.
Notice all the same instruments
I talked about here displayed on
the left.
And all the game threads here in
the middle.
In particular though, we are
looking now at micro stuttering.
So this quite intuitively will
bring us to look at the display
track because micro stuttering
by definition is frames
presented inconsistently.
In this case, we have the
display track here.
Notice as well that there are
some hints in the display track.
We [inaudible] and these are the
hints here.
They will show you when a
surface has been displayed for
longer than we would expect on a
normal rendering.
So maybe this is a great place
to start looking at it.
There's some clusters of them.
So let's zoom into one.
To zoom, we will hold the option
key and just drag the pointer to
the region of interest.
And in this case, if we keep
looking at the display track,
it's kind of evident already
that we are micro stuttering.
We can see that every display
has a different timing.
So in this case for example, we
have 50, 33, 16, back to 50, and
back to 33.
So when we see this pattern in
an Instruments capture, it means
that we are micro stuttering and
we should correct it.
So let's just do that.
Back to the slides.
Okay.
We've just seen the problem, how
it occurs in the real world.
The pattern is basically the
same.
So how do we go about fixing it?
The best practice here is to
target the frame rate your game
can achieve.
So set the minimum frame duration
to be longer than the time it
takes to render.
For that, there's a bunch of
APIs that can help you.
For example, MTLDrawable
addPresentedHandler will give
you a call back once that
drawable is presented.
So you can identify micro
stuttering as it is happening.
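As a rough sketch of that idea in Swift: the presented handler can feed each drawable's `presentedTime` into a small monitor that flags uneven on-screen intervals. `PresentationMonitor`, `track`, and the 4 ms tolerance are hypothetical names and values chosen for this example. The availability annotation reflects my understanding of when these calls became available and should be checked against the SDK.

```swift
import Foundation
#if canImport(Metal)
import Metal
#endif

/// Hypothetical helper: collects presentation timestamps (in seconds) and
/// reports whether consecutive on-screen intervals are uneven.
final class PresentationMonitor {
    private var times: [Double] = []
    private let lock = NSLock()

    func record(presentedTime: Double) {
        lock.lock(); defer { lock.unlock() }
        times.append(presentedTime)
    }

    /// True when the longest and shortest intervals differ by more than `tolerance`.
    func isStuttering(tolerance: Double = 0.004) -> Bool {
        lock.lock(); defer { lock.unlock() }
        let intervals = zip(times.dropFirst(), times).map { $0 - $1 }
        guard let shortest = intervals.min(), let longest = intervals.max() else {
            return false
        }
        return longest - shortest > tolerance
    }
}

#if canImport(Metal)
/// Call this before the drawable is presented.
@available(iOS 10.3, macOS 10.15.4, *)
func track(_ drawable: MTLDrawable, with monitor: PresentationMonitor) {
    drawable.addPresentedHandler { presented in
        // A presentedTime of 0 means the drawable was never shown.
        if presented.presentedTime > 0 {
            monitor.record(presentedTime: presented.presentedTime)
        }
    }
}
#endif
```

Fed with the 50, 33, 16 millisecond pattern from the demo, `isStuttering()` would return true; a steady 33 millisecond cadence would not trip it.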
The other two APIs will help you
to actually fix the problem.
They will allow you to
explicitly control the frame
rating-- the frame pacing.
In this case we have present
afterMinimumDuration and present
atTime.
What do we want to do here?
We set the minimum duration for
our frame longer than it takes
to render.
And we'll do just that.
Let's see how that looks.
Notice that when we start
rendering, we are already
consistent from the get-go.
Our frame spends more time on
display than it takes to render.
Every frame will be consistent.
The user will also see it as
consistent.
And that's great.
Also notice that there's a side
effect.
The frame rate will be lowered.
We went from 40 frames per
second to 30 frames per second.
So that also gave us some extra
frame time to play with.
So how did we do this?
How did we fix the-- the frame
pacing?
Well, really it's just a couple
of lines of code.
We have the same pattern as
before.
We rendered the scene.
We get the next drawable.
We do the final pass.
The only difference here is that
we specify a minimum duration
for our frame.
And present it with that minimum
duration.
That's all it takes.
That will allow us to set the
minimum duration for our frames.
And they will all be consistent.
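A minimal Swift sketch of that pattern follows. The slide code itself is not part of this transcript, so `renderFrame` and `refreshIntervalsNeeded` are hypothetical names, and the 60 Hz default is an assumption. The Metal call is `present(_:afterMinimumDuration:)` on the command buffer.

```swift
import Foundation
#if canImport(Metal) && canImport(QuartzCore)
import Metal
import QuartzCore
#endif

/// Whole refresh intervals a frame must stay on screen so that its on-screen
/// time is at least the measured frame time.
func refreshIntervalsNeeded(frameTime: Double, refreshHz: Double = 60) -> Int {
    return max(1, Int((frameTime * refreshHz).rounded(.up)))
}

#if canImport(Metal) && canImport(QuartzCore)
@available(iOS 10.3, macOS 10.15.4, *)
func renderFrame(queue: MTLCommandQueue,
                 layer: CAMetalLayer,
                 minimumDuration: CFTimeInterval) {
    guard let commandBuffer = queue.makeCommandBuffer() else { return }
    // ... encode the scene passes here ...
    guard let drawable = layer.nextDrawable() else { return }
    // ... encode the final pass into drawable.texture here ...

    // The only change from the naive version: a minimum on-screen duration.
    commandBuffer.present(drawable, afterMinimumDuration: minimumDuration)
    commandBuffer.commit()
}
#endif

// A 25 ms frame on a 60 Hz display needs two refresh intervals (about 33 ms).
let intervals = refreshIntervalsNeeded(frameTime: 0.025)
let minimumDuration = Double(intervals) / 60.0
```

Rounding up to whole refresh intervals is one way to pick a duration that is "longer than it takes to render"; a game could equally hard-code 1/30 of a second.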
And after doing so, you may be
thinking well, what about
maximum duration?
What about the concept of
priority of our work?
Or how long a thing could take?
Well, that's actually the next
issue on our list-- thread
priorities.
Let's visualize it first, same
as we did before.
Again, with the modified version
of the Fox II demo.
You may be thinking and you
would be right that there are
many things that could cause
stuttering such as this.
Maybe you are doing some
resource loading or [inaudible]
compilation.
Today we will focus on the more
fundamental but also incredibly
common type of stutter.
That caused by thread stalling.
If the work priority is not well
communicated to the system, your
game may have unexpected stalls.
iOS does plenty of stuff besides
rendering your game.
Thread priorities are used to
guarantee the quality of service
in the whole system.
So if a thread does a lot of
work, its priority will be
lowered over time so other
threads can run instead.
That's the concept known as
priority decay.
Also you see on the slide behind
me priority inversion.
This is another class of
problems that manifests in a
very similar way.
In this case, priority inversion
occurs when the render thread
depends on the lower priority
worker thread from your same
engine in order to complete the
work.
Let's see what that looks like in
the same timeline as we've seen
before.
In this case, we start rendering
at 30 frames per second so we
are cool.
But then there is some
background work.
iOS does lots of stuff.
Maybe now it's checking the
email.
And the problem here is that the
[inaudible] thread is not well
configured.
You may get preempted by that
background work.
You may not finish scheduling
all the work onto the GPU.
And there is no such thing as
maximum duration for a frame.
So that could potentially go
on for hundreds of
milliseconds.
The user will see a stutter.
So this is the theory behind
it.
And in practice it shows in
different ways that follow the
same pattern.
So let's do another demo.
I'll show you another
Instruments capture of The Talos
Principle.
That will show you how to
identify this problem.
So in this case, what you see
here is again a capture taken
with the Game Performance
Template.
But this time we have already
zoomed into the frame we are
interested in, which is this
very long frame.
It has a duration of 233
milliseconds.
So that's likely a very good
stutter that we should
investigate.
By-- by looking at it at a
glance, we can already tell that
the GPU does not seem to be
doing much.
It's idle during this time, so
this means that we are not
feeding it.
Now we can look at the CPU, of
course, and they seem to be
fairly busy down here.
Right?
They are really-- all of it
seems quite solid.
But notice what you see here is
the Time Profiler view of our
application.
And it does not seem to be
running.
Why is our game not running and
how come that causes a stutter?
Why?
Well, we can switch to the new
view I talked to you about, the
new thread states view.
To do so you will go into the
icon of your application and
click on that button here and
that would pull out the track
display.
And in this case, you can switch
to thread states.
And that will hope-- hopefully
already help you to see there is
something wrong here.
It is highlighted in orange, and
it's already telling us that the
thread has been preempted for
192 milliseconds.
So that's the actual problem
here.
Our render thread is not running.
Something preempted it.
If you want to know more, you
can expand information at the
bottom, which will contain also
the thread narrative.
And by clicking at the preempted
thread, you will see here an
explanation of what's going on.
In this case, your render thread
was preempted at priority 26,
which is very low.
It's below background priority
because the App Store was
updating.
So that's something we do not
want.
We want to tell the system that
to our user, our game is more
important than an App Store
update at that particular
moment.
So let's go back to the slides
and see how can we do that?
So the best practice here is to
configure your render thread.
We recommend the render thread
priority to be fixed to 45.
Notice that iOS
and macOS priorities have
ascending values.
So priority 31 has higher
priority than priority four.
Also, we need to opt out of the
scheduler's quality of service
in order to prevent priority
decay which could lower our
priority as well.
Let's see what a well-configured
render thread looks like.
In this case, we configure just
how I told you.
We start rendering normally.
We also have some background
work going on.
Otherwise it wouldn't be fair.
And this background work could
be updating the App Store just
as we've seen in the demo.
But notice that vsync after
vsync, our render occurs
normally.
We are preempting the background
work off the CPUs so we can run
instead.
The user does not see the
stutter.
Your game can run at 30 solid
frames per second, even though
the system is under heavy load.
That is technically awesome, and
that's what this is all about.
So let's see how we make this
happen with a little bit of
code.
And it literally is a little bit
of code.
It is only like a couple lines.
In this case, it's just about
configuring the pthread
attributes before we can create
the pthread.
We need to opt out of quality of
service, set the priority to 45.
And that's it.
We can create the pthread with
those attributes, and it will
work just fine.
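The slide code is not reproduced in this transcript, so what follows is a sketch of one way to do what is described, written in Swift against the pthread C API. It assumes that giving the thread an explicit scheduling policy and priority through its attributes is what opts it out of quality-of-service management. `SCHED_RR` is my choice of policy, and `makeRenderThreadAttributes` and `startRenderThread` are hypothetical names.

```swift
#if canImport(Darwin)
import Darwin
#else
import Glibc
#endif

/// Builds attributes for a thread with a fixed priority.
/// Assumption: an explicit policy plus priority opts the thread out of
/// quality-of-service management, as the talk describes.
func makeRenderThreadAttributes(priority: Int32 = 45) -> pthread_attr_t {
    var attributes = pthread_attr_t()
    pthread_attr_init(&attributes)
    // Set the policy before the priority so the priority is validated
    // against the right range.
    pthread_attr_setschedpolicy(&attributes, SCHED_RR)
    var param = sched_param()
    param.sched_priority = priority
    pthread_attr_setschedparam(&attributes, &param)
    return attributes
}

#if canImport(Darwin)
/// Starts a hypothetical render thread; the closure stands in for the render loop.
func startRenderThread() -> pthread_t? {
    var attributes = makeRenderThreadAttributes()
    defer { pthread_attr_destroy(&attributes) }
    var thread: pthread_t?
    let status = pthread_create(&thread, &attributes, { _ in
        // ... run the render loop here ...
        return nil
    }, nil)
    return status == 0 ? thread : nil
}
#endif
```

The attributes have to be configured before `pthread_create` is called, which matches the order described above. Whether the priority actually took effect can be confirmed in the thread states view of an Instruments capture.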
It is simple and technically
awesome.
What's not so simple though is
the next issue on our list.
That about dealing with multiple
thermal states.
The message is very clear.
Design for sustained performance
and deal with the occasional
thermal issues.
So let's see how we go about
that.
iOS devices give you access to
an unprecedented amount of
power.
But [inaudible] in a very small
form factor.
So as more apps use more resources
on the device, the system may
begin enacting measures in order
to stay cool and responsive.
Also the user may have enabled a
low power mode condition, which
will have a very similar effect.
Okay, so the best practice
really is just to adjust your
workload to the system state.
You should monitor the system
and tune the workload
accordingly.
iOS has many APIs to help you
with that.
For example, use NSProcessInfo
thermalState to either query or
register for notification when
the device thermal state
changes.
You should also check for the
low power mode condition in a
similar fashion.
Also consider querying the
GPUStartTime and GPUEndTime from the
MTLCommandBuffer in order to
understand how system loads may
impact the GPU time.
Let's see how we do that with a
simple code example.
This comes straight from our
best practices.
At its core is a very simple
switch statement where every case
corresponds to a thermal state.
We have nominal, fair, serious,
and critical.
And that is all very good.
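The slide itself is not in the transcript, but a switch of that shape might look like the following Swift sketch. `WorkloadLevel` and the mapping from thermal state to workload are invented for illustration; only `ProcessInfo.thermalState`, its four cases, and the change notification come from Foundation.

```swift
import Foundation

/// Hypothetical workload tiers; a real game would define its own.
enum WorkloadLevel { case full, reduced, low, minimal }

@available(iOS 11.0, *)
func workloadLevel(for state: ProcessInfo.ThermalState) -> WorkloadLevel {
    switch state {
    case .nominal:  return .full      // no corrective action needed
    case .fair:     return .reduced   // start trimming optional work
    case .serious:  return .low       // cut CPU and GPU usage noticeably
    case .critical: return .minimal   // do as little as possible
    @unknown default: return .reduced
    }
}

/// Applies the current level once, then again whenever the thermal state changes.
/// Keep the returned token to remove the observer later.
@available(iOS 11.0, *)
func observeThermalState(_ apply: @escaping (WorkloadLevel) -> Void) -> NSObjectProtocol {
    apply(workloadLevel(for: ProcessInfo.processInfo.thermalState))
    return NotificationCenter.default.addObserver(
        forName: ProcessInfo.thermalStateDidChangeNotification,
        object: nil,
        queue: .main
    ) { _ in
        apply(workloadLevel(for: ProcessInfo.processInfo.thermalState))
    }
}
```

Low power mode can be folded in the same way by also checking `ProcessInfo.processInfo.isLowPowerModeEnabled` where that property is available.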
So now we know that we are in a
thermal state and the system's
telling us to do something about
it.
So how can we actually
help the system stay cool?
Well, I can give you some
suggestions, but it's up to you
game developers to decide what
compromises to make in order to
help the system.
You know what's best for your
game to keep being awesome under
stress.
Some recommendations I'll give
you though are to target the
frame rate that can be sustained
for the entire game session.
For example, stay at 30 frames
per second if you cannot sustain
60 for ten minutes or more.
Tuning the GPU work is also super
helpful.
For example, consider lowering
the resolution of intermediate
render targets, or simplifying
the shadow maps, loading simpler
assets and even removing some of
the post-processes altogether.
Whatever fits your
game the best.
You should decide that one.
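As one concrete illustration of lowering the resolution of an intermediate render target, here is a hedged Swift sketch. `scaledSize`, `makeIntermediateTarget`, the 0.25 lower clamp, and the `.rgba16Float` pixel format are assumptions made for this example, not recommendations from the talk.

```swift
import Foundation
#if canImport(Metal)
import Metal
#endif

/// Scales a render target size, clamping the factor to 0.25...1.0 (an
/// arbitrary range chosen for this sketch) and never returning a zero dimension.
func scaledSize(width: Int, height: Int, scale: Double) -> (width: Int, height: Int) {
    let clamped = min(max(scale, 0.25), 1.0)
    let scaledWidth = max(1, Int((Double(width) * clamped).rounded()))
    let scaledHeight = max(1, Int((Double(height) * clamped).rounded()))
    return (scaledWidth, scaledHeight)
}

#if canImport(Metal)
/// Creates an intermediate color target at a fraction of the drawable size.
func makeIntermediateTarget(device: MTLDevice,
                            drawableWidth: Int,
                            drawableHeight: Int,
                            scale: Double) -> MTLTexture? {
    let size = scaledSize(width: drawableWidth, height: drawableHeight, scale: scale)
    let descriptor = MTLTextureDescriptor.texture2DDescriptor(
        pixelFormat: .rgba16Float,
        width: size.width,
        height: size.height,
        mipmapped: false)
    descriptor.usage = [.renderTarget, .shaderRead]
    let texture = device.makeTexture(descriptor: descriptor)
    texture?.label = "Intermediate Color (scaled)"
    return texture
}
#endif
```

Halving both dimensions cuts the pixel count of that pass to a quarter, which is why this tends to be one of the cheaper compromises to try first.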
And this will bring us to the
next issue on our list.
That about dealing with
unnecessary GPU work.
For that, please welcome my
colleague Ohad on stage.
He's going to tell you all about
it.
[ Applause ]
>> Thank you, Guillem.
[ Applause ]
Hey, everyone.
My name is Ohad, and I'm a
member of the Game Technologies
Team here at Apple.
In the previous slides, Guillem
showed you how important it is
to adapt to the system.
Responding to states like low
power mode or the varying
thermal states will require you
to tune your GPU workload in
order to maintain consistent
frame rates throughout an entire
game session.
However, for many developers,
the GPU is a bit of a black box
hidden behind the curtains of a
game engine.
Today, we'll pull back those
curtains.
Wasted GPU time is a very common
problem and it's one that often
goes unnoticed.
But I want you to remember this.
Technically awesome games don't
only hit their GPU budget.
They're also good citizens to
the system, helping it to stay
cool and save power.
All the popular game engines
provide a great list of best
practices to follow.
We won't cover those.
Instead we'll focus on how to
tell if something is expensive
to render.
And as we've done with the CPU
several times today, the best
practice here is profile your
GPU as well.
The power of our GPUs can hide
many inefficiencies in either
content or algorithms.
You will want to time your
workload, but also understand
each rendering technique that
you enable.
And only keep those that add
noticeably to the visual quality
of your games.
But how do you find these
inefficiencies?
How do you determine which parts
of your pipeline are flat-out
excessive?
This of course brings us back to
tools.
As always, your first stop
should be Instruments.
Here we're looking at Metal
System Trace.
It'll provide you accurate
timings for vertex, fragment,
and compute work being done.
But by measuring your GPU time,
you're only halfway there.
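Outside of Instruments, GPU time can also be sampled at run time from the command buffer's GPU start and end times mentioned earlier. The Swift sketch below keeps a rolling average and compares it against a budget. `GPUTimeTracker`, the 60-sample window, and `track` are hypothetical, and the availability annotation is my best understanding of the SDK.

```swift
import Foundation
#if canImport(Metal)
import Metal
#endif

/// Hypothetical helper: rolling average of GPU time per command buffer.
final class GPUTimeTracker {
    private let budgetMilliseconds: Double
    private var samples: [Double] = []
    private let lock = NSLock()

    init(budgetMilliseconds: Double) {
        self.budgetMilliseconds = budgetMilliseconds
    }

    /// `start` and `end` are in seconds, as reported by the command buffer.
    func record(start: Double, end: Double) {
        lock.lock(); defer { lock.unlock() }
        samples.append((end - start) * 1000)
        if samples.count > 60 { samples.removeFirst() }
    }

    var averageMilliseconds: Double {
        lock.lock(); defer { lock.unlock() }
        return samples.isEmpty ? 0 : samples.reduce(0, +) / Double(samples.count)
    }

    var isOverBudget: Bool {
        return averageMilliseconds > budgetMilliseconds
    }
}

#if canImport(Metal)
/// Register before commit(); the handler runs once the GPU has finished.
@available(iOS 10.3, macOS 10.15, *)
func track(_ commandBuffer: MTLCommandBuffer, with tracker: GPUTimeTracker) {
    commandBuffer.addCompletedHandler { buffer in
        tracker.record(start: buffer.gpuStartTime, end: buffer.gpuEndTime)
    }
}
#endif
```

A number like this says how long the GPU was busy, not why, which is where the per-pass view described next comes in.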
Next you want to really
understand what each of your
passes is doing.
And for this, we've added a new
tool to the Metal Frame Debugger
this year.
It's the Dependency graph.
The Dependency graph is a story
of a single frame.
It's made up of nodes and edges
and each one of these tells a
different part of the story.
Edges represent dependencies
between passes.
As you follow them from top to
bottom, you'll see where each
pass fits into your rendering
pipeline.
And how they work together to
create your frame.
Nodes on the other hand are the
story of a single pass.
They're made up of three main
components.
First, the title element will
give you the name of the pass.
Now I really want to emphasize
this.
Label everything.
It'll help you not only in the
Dependency viewer, but
throughout our entire suite of
tools.
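In Metal, labeling mostly means setting the `label` property on encoders, command buffers, and resources, plus optional debug groups around related draws. The Swift sketch below shows one way to do it for a shadow cascade pass. `passLabel`, `encodeShadowCascade`, and the label strings are made up for this example.

```swift
import Foundation
#if canImport(Metal)
import Metal
#endif

/// Hypothetical naming helper so related passes get consistent labels.
func passLabel(_ technique: String, index: Int) -> String {
    return "\(technique) \(index)"
}

#if canImport(Metal)
func encodeShadowCascade(_ index: Int,
                         commandBuffer: MTLCommandBuffer,
                         descriptor: MTLRenderPassDescriptor) {
    let label = passLabel("Shadow Cascade", index: index)

    // Label the resource written by this pass.
    descriptor.depthAttachment.texture?.label = label + " Depth"

    guard let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: descriptor) else {
        return
    }
    // Label the pass itself; this is the name the tools display for it.
    encoder.label = label

    encoder.pushDebugGroup("Shadow casters")
    // ... set pipeline state and issue draw calls here ...
    encoder.popDebugGroup()
    encoder.endEncoding()
}
#endif
```

Consistent names such as "Shadow Cascade 0", "Shadow Cascade 1", and so on make it easier to spot passes that belong to the same technique.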
Secondly, it'll allow you to
quickly tell what type of pass
you're looking at.
Render, blit, or compute.
Here from the icon we can see
that it's a render pass.
Next, you have a list of
statistics describing the work
being done in this pass.
And finally to the bottom, a
list of all the resources that
are being written to during this
pass, and each of these also
comes with a label, a thumbnail
allowing you to preview your
work, and a list of information
describing each one of those
resources specifically.
And all that together allows you
to really understand each of
your passes.
Okay, so now we know how to read
the graph.
Let's jump into a demo and see
how it all fits together.
Okay.
So I have the Fox II demo
running on my machine here.
It was built in SceneKit, which
allowed me to add all sorts of
great effects.
As you can see, I have cascaded
shadow maps, bloom, depth of
field, and all of it comes
together to create a beautifully
rendered scene.
Let's use the dependency viewer
to see how it all works.
First, we'll go to Xcode and
we'll capture a frame using the
capture GPU frame button in the
bottom.
And we'll select the main pass
on the left.
[Applause] And we'll also switch
to automatic mode which will
give us our
assistant on the right.
Now notice that the same pass
that I selected in the debug
navigator is also the one that's
showing-- is selected, and
focused in the main view.
And this is a two-way street.
So as we interact with the
graph, selecting
different passes or textures or
even buffers, both the navigator
on the left and the assistant on
the right will update to show
your selection.
So this is a really fantastic
way to navigate your frame.
Now as I zoom out, the first
thing you'll notice is that the
statistics hide and the focus
goes away from the individual
passes onto the frame as a
whole.
And I can zoom out even more to
see a great bird's-eye view of
my entire frame.
Now the really cool thing to
notice here is that since
dependencies drive the
connectivity of the graph, each
logical piece of work is grouped
together in space.
So let's zoom in and see what I
mean.
Here I have a branch of work
that's creating my shadow maps.
On the left, I can see three
passes that are rendering the
shadows.
So this is really fantastic
because I'm not just getting the
story of my entire frame.
But there's another story in
between these two layers.
One of how each rendering
technique is built up.
And this is something that isn't
always entirely obvious when
you're using a game engine to
turn these on.
For instance, with my shadow
maps, I may not have known that
cas-- that each cascade would
require its own pass.
If I considered each one of
these individually, they
wouldn't really stand out.
But now I see that I have to
consider them as a group.
And that gives me the insights
that I need to make informed
decisions on any compromises
that I make while tuning my GPU
workload.
So that's the Dependency viewer.
I'll switch back to the slides.
And please help me welcome
Guillem back onto the stage for
his final thoughts.
Thank you.
[ Applause ]
>> Thank you.
That was an awesome demo
[inaudible].
Cool.
So Ohad has just shown us what a
frame looks like through the
Dependency viewer.
And that is great for you to
inspect your GPU workload.
For example, oftentimes we may
go from a very small and simple
pipeline such as this one to a
very complex one with
post-process, multiple shadow
maps, and HDR.
And all of these can be done by
adding, you know, a couple
properties to the camera object
of your favorite game engine.
You see that the code complexity
of those changes is minimal.
But the-- but the rendering
complexity may have increased
tenfold, which will really bring
us back to the beginning right
where we started.
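To make "a couple of properties" concrete, here is what that could look like in SceneKit, the engine used for the demo. This is an illustrative sketch: the function name and the specific values are invented, and the comments about extra passes restate what the demo showed rather than documented guarantees.

```swift
import Foundation
#if canImport(SceneKit)
import SceneKit

/// A handful of property assignments that can turn a simple pipeline into a
/// much more complex one.
@available(iOS 11.0, macOS 10.13, *)
func enableExpensiveEffects(camera: SCNCamera, sun: SCNLight) {
    camera.wantsHDR = true            // HDR rendering
    camera.bloomIntensity = 1.0       // bloom post-process
    camera.wantsDepthOfField = true   // depth of field post-process

    sun.castsShadow = true
    sun.shadowCascadeCount = 3        // in the demo, each cascade had its own pass
}
#endif
```

Capturing a frame before and after a change like this and comparing the two dependency graphs is a quick way to see what those few lines really cost.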
Profile.
It is very important that you
understand what your game does.
You spend tens of thousands of
hours developing a game, you
should consider spending some of
that time profiling as well.
Everything we have seen today
can be found within minutes.
The best part?
You don't need to know what
you're looking for.
Just record the stutter, get the
long frame, and work it all up--
all the way up from there.
It's that simple.
The tool will give you all the
information you need to identify
the problems.
But you will need to use the
tool.
And that is really the takeaway.
So we have seen a bunch of
common pitfalls followed by some
best practices.
All of these issues can be found
through profiling.
That's how we found them.
We analyzed a ton of games,
found the common issues, and
decided to put a talk together.
Now, if you have access to the
engine source code, make sure
that both frame pacing and
thread priorities are well
configured.
It's just a couple lines of code
really.
But regardless, your game should
always adapt to thermals and
not submit unnecessary GPU work.
By making sure to follow all
these best practices, you too
will be developing technically
awesome games.
And that's what this is all
about.
For more information, there is
an upcoming lab at 12 PM.
We will be there.
I'll be there and we'll be
more than happy to answer any
questions you may have after
this session.
Or maybe you just want to sit
down and let us profile your
game.
Also, there were two great
talks [inaudible] about Metal
for game developers and our
profiling tools.
Thank you very much, and enjoy
the rest of the day.
And have a great one.
[ Applause ]