WWDC2019 Session 611

Transcript

[ Music ]
[ Applause ]
>> Lionel Lemarie: Hi, folks.
Welcome to our Metal session.
I'm Lionel.
I'm in the GPU software
performance team here at Apple.
And with my friends Max and
Sarah, we'll be guiding you
through how to bring your OpenGL
app to Metal.
So last year we announced that
OpenGL, OpenGL ES, and OpenCL
are deprecated.
They will continue to be
supported in iOS 13 and macOS
Catalina, but now is the time to
move.
New projects should target Metal
from their inception.
But if you have an OpenGL app
that you want to port to Metal,
you've come to the right place.
So we first introduced Metal in
2014 as our new low-overhead,
high-efficiency,
high-performance GPU programming
API.
Over the past five years,
Apple's core frameworks have
adopted Metal and they're
getting really great results.
If your application is built on
top of layers like SpriteKit,
SceneKit, RealityKit, Core
Image, Core Animation, then
you're already using Metal.
We've also been working closely
with vendors on engines like
Unity, Unreal Engine 4, and
Lumberyard to really take
advantage of Metal.
If you're using one of these
engines, you're already up to
speed.
But if you've built your own
renderer, then Metal gives you a
lot of great benefits.
Metal combines the graphics of
OpenGL and compute of OpenCL
into a unified API.
It allows you to use multithreaded
rendering in your application.
Whenever there are CPU
operations that need to take
place that are expensive, we
made sure that they happen as
infrequently as possible to
reduce overhead during your
app's execution.
Metal's shading language is C++
based and all the shaders used
in your application can be
precompiled, making it easier to
have a wide variety of material
shaders, for example.
And last but not least, we have
a full suite of debugging and
optimization tools built right
into Xcode.
So once you have ported to
Metal, you have full support to
make your application even
better.
So let's dive in.
In this session we'll take a
look at the different steps
involved in migrating from GL
into Metal, and we'll do that by
comparing a typical GL app to a
Metal app.
As an overview, let's quickly
look through the steps of our GL
app.
First, you set up a window that
you'll use for rendering.
Then you create your resources
like buffers, textures,
samplers.
You implement all your shaders
written in GLSL.
Before you can render anything
in GL, you may have to create
certain object states, such as
GL programs, GL frame buffer
objects, vertex array objects.
So once you've initialized your
resources, the render loop
starts and you draw your frames.
For each frame, you start by
updating your resources, bind a
specific frame buffer, set the
graphic state, and make your
draw calls.
You repeat this process for each
frame buffer you have.
You may have shadow maps, a
lighting pass, some
post-processing.
So potentially quite a few
render passes.
And then finally, you present
the final rendered image.
It's pretty easy.
And as you can see, the Metal
flow looks very similar.
We updated some of the original
concepts and introduced a few
new things.
But overall, the flow is much
the same.
It's not a complete rewrite of
the engine; it works in the same
manner.
So we will introduce the new
concepts while drawing parallels
between GL and Metal, comparing
and contrasting the two APIs to
help you successfully make the
transition.
When you're walking through any
tutorial on graphics, then the
first thing you learn is how to
create and draw to a window.
So let's start with the window
subsystem.
Both GL and Metal have this
concept, but it's accomplished a
little differently.
The application is required to
set up and present a drawing
surface.
Views and view delegates
manage the interface between the
API and the underlying window
system.
You might be using these
frameworks to manage your GL
views, so we have equivalent
frameworks in Metal.
NSOpenGLView and GLKView map to
MTKView.
And if you are using Core
Animation in your application
with the CAEAGLLayer, then there's
an equivalent CAMetalLayer.
As an example, let's say you are
using GLKView.
It has a single entry point with
drawRect.
So you needed to check whether
the resolution of your target had
changed since the last frame and
update your render target sizes
as needed, right from within the
render loop.
In MetalKit, it's a bit updated.
There's a separate function for
whenever the drawable needs to
change, such as when you're
rotating the screen or resizing
your window.
So you don't need to check if
your resources need to be
reallocated inside your draw
function; it's dedicated to
render code.
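A minimal Swift sketch of that split, assuming a hypothetical
Renderer class (the class name and the stored size are
illustrative and do not come from the session):

```swift
import MetalKit

// Hypothetical renderer; only the two delegate methods are MetalKit API.
final class Renderer: NSObject, MTKViewDelegate {
    // Last drawable size reported by MetalKit (illustrative state).
    private(set) var renderTargetSize: CGSize = .zero

    // Called when the drawable changes size, for example on rotation
    // or window resize. Size-dependent resources are rebuilt here.
    func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {
        renderTargetSize = size
    }

    // Called once per frame. Only render code belongs here, so there
    // is no per-frame check for a changed resolution.
    func draw(in view: MTKView) {
        // Encode and commit this frame's command buffer.
    }
}
```

An app would assign an instance of such a class to the view's
delegate property and keep a strong reference to it.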
If you need additional
flexibility, we provide the
CAMetalLayer, which you use as
the backing layer for your view.
While the CAEAGLLayer defined
the properties of your drawable
such as its color format, the
CAMetalLayer allows you to set
up your drawable size, pixel
format, color space, and more.
Importantly, the CAMetalLayer
maintains a pool of textures and
you call nextDrawable to get
the drawable to render your
frame to.
It's an important concept that
we'll revisit in a short while
when it's time to present.
So now we have a window.
Next we're going to introduce
some new concepts in Metal.
So the command queues, command
buffers, command encoders.
These objects work together in
Metal to submit work to the GPU.
They're new because the
underlying glContexts managed
the submission for you.
GL is an implicit API, meaning
that there is no code that tells
GL when to schedule the work.
As a developer, you have very
little control over when
graphics work really happens,
such as when shaders are
compiled, when resource storage
is allocated, when validation
occurs, or when work is actually
submitted to the GPU.
The glContext is a big
state machine, and a
typical workflow would look like
this.
Your application creates a
glContext, sets it on the
thread, and then calls arbitrary
GL commands.
The commands are recorded by the
context under the hood and would
get executed at some point in
time.
Let's take a closer look to see
what actually goes on.
Say your application just sent
GL these calls: a few state
changes, a few draw calls.
In a perfect scenario, the
context would translate this
into GPU commands to fill up an
internal buffer.
And then when it's full, it
would send it to the GPU.
If you insert a glFlush to
enforce execution, you know for
sure they'll be kicked off by
that point.
But actually, the GPU could
start execution at any point
beforehand.
Alright.
So, for example, if we change
one draw call, introducing a
dependency, suddenly execution
is kicked off at that point and
you could experience massive
stalls.
So, again, when does work
actually get submitted?
It depends.
And that was one of the
downsides of OpenGL -- it wasn't
consistent in performance.
Any one small change could force
you down a bad path.
Metal, on the other hand, is an
explicit API, meaning the
application gets to decide
exactly what work goes to the
GPU and when.
Metal splits the concept of a
glContext into a collection of
internal working objects.
The first object an app creates
is a Metal device object, which
is just an abstract
representation of the GPU.
Then it creates a key object
called a Metal command queue.
The Metal command queue
maintains the order of commands
sent to the GPU by allocating
command buffers to fill.
And a command buffer is simply a
list of GPU commands your app
will fill to send to the GPU for
execution.
So we saw this command buffer
concept in GL -- in the GL
example we just studied.
Let's work with that command
buffer from this point on.
But an app doesn't write the
commands directly to the command
buffer; instead, it creates a
Metal command encoder.
Let's look at the main three
types of encoders.
First one we'll use will be
filled with blit commands that
are used to copy resources
around.
The command encoder translates
API calls into GPU instructions
and then writes them to the
command buffer.
After a series of commands have
been encoded, for example,
a series of blits to copy
resources, then your app will
end encoding, which releases the
encoder object.
Additionally, Metal supports a
compute encoder for parallel
work that you would normally
have done in OpenCL before.
You enqueue a number of kernels
that get written to the command
buffer and you end the
encoder to release it.
Lastly, let's use a render
encoder for your familiar
rendering commands.
You enqueue your state changes
and your draw calls and end the
encoder.
So here we have a command buffer
full of different workloads, but
the GPU hasn't done any work
yet.
Metal has created the objects
and encoded commands all within
the CPU.
It's only after your application
has finished encoding commands
and explicitly committed the
command buffer that the GPU
begins to work and executes
those commands.
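The sequence of device, command queue, command buffer, encoder,
and commit can be sketched in Swift roughly as follows. The
function name and the 256-byte buffer are assumptions made for
illustration. The blocking wait at the end is there only so the
sketch can read the result back.

```swift
import Metal

// Records one blit command and commits it.
// Returns false when no Metal device is available or the fill did not land.
func submitFillExample() -> Bool {
    guard let device = MTLCreateSystemDefaultDevice(),
          let queue = device.makeCommandQueue(),
          let buffer = device.makeBuffer(length: 256, options: .storageModeShared),
          let commandBuffer = queue.makeCommandBuffer(),
          let blit = commandBuffer.makeBlitCommandEncoder() else {
        return false
    }

    // Encoding happens on the CPU; the GPU has not done any work yet.
    blit.fill(buffer: buffer, range: 0..<256, value: 0xFF)
    blit.endEncoding()

    // The explicit commit is the point where work is handed to the GPU.
    commandBuffer.commit()

    // Blocking wait, used here only to keep the sketch self-checking.
    commandBuffer.waitUntilCompleted()
    return buffer.contents().load(as: UInt8.self) == 0xFF
}
```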
So now that we have encoded
commands, let's now compare and
contrast GL and Metal's command
submissions.
In GL there's no direct control
of when work gets submitted to
the GPU -- you rely on big
hammers like glFlush and
glFinish to ensure code
execution; glFlush submits the
commands and pauses the CPU
thread until they're scheduled,
and glFinish pauses the CPU
thread until the GPU is
completely finished.
Work can still get submitted at
any time before these commands
happen, introducing potential
stalls and slowdowns.
And Metal has equivalent
versions of these functions; you
can still explicitly commit and
wait for a command buffer to be
scheduled or completed.
But these wait commands are not
recommended unless you
absolutely need them.
Instead, we suggest that you
simply commit your command
buffer and then add a callback
so that your application can be
notified later when the command
buffer has been completed on the
GPU.
This frees your CPU to continue
doing other work.
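A short sketch of the commit-plus-callback pattern, assuming a
hypothetical helper named commitAsync and a caller-supplied
onDone closure (both names are illustrative):

```swift
import Metal

// Commits a command buffer without blocking the calling thread.
// `onDone` is a hypothetical callback supplied by the caller.
func commitAsync(_ commandBuffer: MTLCommandBuffer,
                 onDone: @escaping @Sendable () -> Void) {
    // The handler has to be registered before commit().
    commandBuffer.addCompletedHandler { _ in
        // Invoked by Metal once the GPU has finished this command buffer.
        onDone()
    }
    commandBuffer.commit()
    // No waitUntilCompleted() here, so the CPU can carry on with other work.
}
```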
So now that we have reviewed
command queue, command buffer,
command encoder, let's move on
and talk about resource
creation.
There are three main types of
resources that any graphics app
is likely to use: Buffers,
textures, and samplers.
Let's take a look at buffers
first.
In GL, you have a buffer object
and the memory associated with
it.
The API calls you use can modify
the object state, the memory, or
both together.
So here, for example,
glBufferData can be used to
modify both the memory and the
state of the object.
The buffer dimensions can be
modified again later by calling
glBufferData, in which case the
old object and its contents will
be discarded internally by
OpenGL.
In Metal, the API to create and
fill a buffer looks very
similar, but the main difference
lies in the fact that the
produced object is immutable.
If at any point you need to
resize the buffer, you simply
need to create a new one and
discard the old one.
Both OpenGL and Metal have ways
to indicate how you intend to
use an object; however, in GL
the enum is simply a usage hint
about how the data in a buffer
object would be accessed.
The driver uses that hint to
decide where to best allocate
memory for the buffer, but
there's no direct control over
storage.
OpenGL ultimately decides where
to store the objects.
In Metal, the API allows you to
specify a storage mode which
maps to a specific memory
allocation behavior.
Metal gives you control, since
you know best how your objects
are going to be used.
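As a rough Swift sketch of buffer creation with an explicit
storage mode, and of "resizing" by allocation: the helper names
makeVertexBuffer and resized are assumptions, and the sketch
assumes shared storage and a non-empty input array.

```swift
import Metal

// Creates a buffer whose contents are copied from `vertices`.
// Assumes `vertices` is not empty.
func makeVertexBuffer(device: MTLDevice, vertices: [Float]) -> MTLBuffer? {
    let length = vertices.count * MemoryLayout<Float>.stride
    // Shared storage: the CPU and the GPU both access this memory.
    return device.makeBuffer(bytes: vertices, length: length,
                             options: .storageModeShared)
}

// A Metal buffer cannot change size, so "resizing" allocates a new
// buffer and copies over whatever still fits.
// Assumes `old` uses shared storage so contents() is CPU-readable.
func resized(_ old: MTLBuffer, to newLength: Int) -> MTLBuffer? {
    guard let new = old.device.makeBuffer(length: newLength,
                                          options: .storageModeShared) else {
        return nil
    }
    new.contents().copyMemory(from: old.contents(),
                              byteCount: min(old.length, newLength))
    return new // The caller drops its reference to `old`.
}
```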
It's an important concept in
object creation, so we'll come
back to it in a short moment
right after we look at texture
APIs.
In GL, each texture has an
internal sampler object, and
apps commonly set up the sampling
mode through that sampler.
But you also have the option to
create a separate sampler object
outside of your texture.
Here's an example for creating
and binding your texture,
setting up your sampler, and
then finally filling in the
data.
One thing worth mentioning is
that GL has a lot of API calls
to create and initialize textures
with data.
It also has what are called
named resource versions of the
same API.
There are even more APIs when it
comes to managing samplers.
The list just goes on and on.
One of the design goals with
Metal was to give a simpler API
that would maintain all of the
flexibility.
So in Metal, texture and sampler
objects are always separate and
immutable after creation.
To create a texture, we create a
descriptor, set various
properties to define texture
dimensions like pixelFormat and
sizes, amongst others.
Again, an important property we
set is the storage mode to
specify where in memory to store
the texture.
And finally, we use that
descriptor to create an
immutable object.
In a similar fashion, you start
with a sampler descriptor, set
its properties, and create the
immutable sampler object.
It's pretty easy.
To fill a texture's image data,
we calculate the bytes per row.
And just like we did in OpenGL,
we specify the region to load.
Then we call the texture's
replaceRegion method, which
copies the data into the texture
from a pointer we specify.
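The descriptor, sampler, and upload steps can be sketched in
Swift as follows. The function name, the 4x4 size, the pixel
format, and the all-white image data are assumptions chosen to
keep the example small.

```swift
import Metal

// Builds a 4x4 RGBA texture filled with white, plus a linear sampler.
func makeWhiteTexture(device: MTLDevice) -> (MTLTexture, MTLSamplerState)? {
    let width = 4, height = 4

    // 1. Describe the texture, then create the immutable object.
    let textureDescriptor = MTLTextureDescriptor.texture2DDescriptor(
        pixelFormat: .rgba8Unorm, width: width, height: height, mipmapped: false)
    textureDescriptor.usage = .shaderRead
    #if os(macOS)
    textureDescriptor.storageMode = .managed // CPU-writable on macOS
    #else
    textureDescriptor.storageMode = .shared  // CPU-writable on iOS
    #endif
    guard let texture = device.makeTexture(descriptor: textureDescriptor) else {
        return nil
    }

    // 2. Samplers are separate objects, created the same way.
    let samplerDescriptor = MTLSamplerDescriptor()
    samplerDescriptor.minFilter = .linear
    samplerDescriptor.magFilter = .linear
    samplerDescriptor.sAddressMode = .repeat
    samplerDescriptor.tAddressMode = .repeat
    guard let sampler = device.makeSamplerState(descriptor: samplerDescriptor) else {
        return nil
    }

    // 3. Fill the image data: bytes per row, region, then replace.
    let bytesPerRow = width * 4 // 4 bytes per RGBA8 pixel
    let pixels = [UInt8](repeating: 255, count: bytesPerRow * height)
    texture.replace(region: MTLRegionMake2D(0, 0, width, height),
                    mipmapLevel: 0,
                    withBytes: pixels,
                    bytesPerRow: bytesPerRow)
    return (texture, sampler)
}
```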
Once you load your first
texture, you're likely to
observe that it's upside down.
That's because in Metal the
texture coordinates are flipped on the
y-axis compared to GL.
And it's also worth mentioning
that Metal APIs don't perform
any pixelFormat transformation
under the hood.
So you need to upload your
textures in the exact format
that you intend to use.
Now let's get back to storage
modes.
As mentioned, in GL the driver
has to make a best guess on how
you wanted to use your
resources.
As a developer, you can provide
hints in some cases, like when
you created a buffer or by
creating render buffer objects
for frame buffer attachments.
But in all cases, these were
still hints and the
implementation details are
hidden from you.
A few minutes ago, we briefly
saw the additional storage mode
property in Metal that you can set
on a texture descriptor and also
when creating a buffer.
Let's look at the main use cases
for those.
The simplest option is to use shared
storage mode, which gives both
the CPU and GPU access to the
resource.
For buffers, this means you get
a pointer to the memory
backing of the object.
For textures on iOS, this means
you can call some easy-to-use
functions to set and retrieve
image data.
You can also use a private
storage mode, which gives the
GPU exclusive access to the
data.
It allows Metal to apply some
optimizations that it wouldn't
normally have been able to use
if the CPU had access to it.
But only the GPU can directly
fill the contents of the data.
So you can indirectly fill the
data from the CPU by using a
blitEncoder from a second
intermediate resource that uses
shared storage.
On devices with dedicated
video memory, setting the
resource to use private storage
allocates it in video memory
only, single copy.
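One way to sketch the indirect fill of a private resource in
Swift, using a shared staging buffer and a blit encoder. The
function name makePrivateBuffer is an assumption, and the sketch
does not wait for the copy to finish.

```swift
import Metal

// Uploads `data` into a GPU-only buffer through a shared staging buffer.
func makePrivateBuffer(device: MTLDevice,
                       queue: MTLCommandQueue,
                       data: [UInt8]) -> MTLBuffer? {
    guard !data.isEmpty,
          // Intermediate resource the CPU can write to.
          let staging = device.makeBuffer(bytes: data, length: data.count,
                                          options: .storageModeShared),
          // Destination that only the GPU can access.
          let destination = device.makeBuffer(length: data.count,
                                              options: .storageModePrivate),
          let commandBuffer = queue.makeCommandBuffer(),
          let blit = commandBuffer.makeBlitCommandEncoder() else {
        return nil
    }

    // The GPU performs the copy; the CPU never touches `destination`.
    blit.copy(from: staging, sourceOffset: 0,
              to: destination, destinationOffset: 0,
              size: data.count)
    blit.endEncoding()
    commandBuffer.commit()

    // Later command buffers on the same queue run after this copy.
    return destination
}
```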
On macOS there's a managed
storage mode which allows both
the CPU and GPU to access an
object's data.
And on systems with dedicated
video memory, Metal may have to
create a second mirrored memory
backing for efficient access by
both processors.
So because of this, explicit
calls are necessary to ensure
that your data is synchronized
for CPU and GPU access, for
example, using didModifyRange.
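A small macOS-only sketch of a CPU-side update to a managed
buffer. The helper name updateManaged is an assumption, and the
caller is assumed to pass a buffer created with managed storage
and a range that fits inside it.

```swift
import Metal

#if os(macOS)
// Writes `bytes` into a managed buffer at `offset`, then tells Metal
// which range changed so the GPU-side copy can be synchronized.
func updateManaged(buffer: MTLBuffer, offset: Int, bytes: [UInt8]) {
    buffer.contents()
        .advanced(by: offset)
        .copyMemory(from: bytes, byteCount: bytes.count)
    // Without this call the GPU may keep seeing the old contents.
    buffer.didModifyRange(offset..<(offset + bytes.count))
}
#endif
```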
So to recap, we reviewed some of
the typical uses for each mode.
On macOS you would use the
private storage mode for static
assets and your render targets.
Your small dynamic buffers could
use the shared storage mode.
And your larger buffers with
small updates would use the
managed storage mode.
On iOS, your static data and
rendering targets can use the
private storage mode.
And since our devices use
unified memory, dynamic data of
any size can use the shared
storage mode and still get great
performance.
Next, let's talk about
developing shaders for your
graphics application and what
APIs you use to work with
shaders.
When it comes to shader
compilation in GL, you have to
create a shader object, replace
the shader source in the object,
perform just-in-time compilation,
and verify that the compilation
succeeded.
And while this workflow has its
benefits, your application had
to pay the performance costs of
compiling all your shaders every
time.
One of the key ways in which
Metal achieves its efficiency is
by doing work earlier and less
frequently.
At build time, Xcode will
compile all the Metal
shader source files into a
default Metal library file and
place it in your app bundle for
retrieval at runtime.
So this removes the need to
compile a lot of it at runtime
and cuts the compilation time in
half when your application
runs.
All you need to do is create a
Metal library from a file
bundled with your application
and fetch the shader function
from it.
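In Swift, that lookup might look like the sketch below. The
helper name loadShaderFunction is an assumption, and the shader
name is whatever entry point your own .metal files define.

```swift
import Metal

// Fetches a precompiled shader function from the app's default library.
func loadShaderFunction(device: MTLDevice, named name: String) -> MTLFunction? {
    // Returns nil when the main bundle contains no default Metal library.
    guard let library = device.makeDefaultLibrary() else {
        return nil
    }
    // Returns nil when the library has no function with this name.
    return library.makeFunction(name: name)
}
```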
In GL you use GLSL, which is
based on the C programming
language.
The Metal shading language or
MSL is based on C++.
So it should look reasonably
familiar to most GL developers.
Its foundation in C++ means that
you can create classes,
templates, and structs.
You can define enums and
namespaces.
And like GLSL, there are
built-in vector and matrix
types, numerous built-in
functions and operations commonly
used for graphics.
And there are classes to operate
on textures and specify sampler
state.
Like Metal, MSL is also unified
for graphics and compute.
And finally, since shaders are
pre-compiled, Xcode is able to
give you errors, warnings, and
guidance to help you debug at
build time.
So let's take a look at actual
code for MSL and compare it with
GLSL.
We're going to walk through a
simple vertex shader, GLSL on
top, MSL on the bottom.
Let's start defining our
shaders.
These are the prototypes.
In GLSL, void main.
There's nothing in the shader
that specifies the shader stage.
It's purely determined by the
shader type passed into the
glCreateShader call.
In MSL the shader stage is
explicitly specified in the
shader code.
Here the vertex qualifier
indicates that it will be
executed for each vertex
generating per-vertex outputs.
In GLSL, every shader entry
point has to be called main and
accept and return void.
In MSL each entry point has a
distinct name.
And when you're building shaders
with Xcode, the compiler can
resolve include statements in the
preprocessing stage the same way it
would do for regular C++ code.
At runtime you can query
functions by their distinct name
from the precompiled Metal
library.
Then let's talk about inputs.
Because each entry point in GLSL
is a main function with no
argument, all of the inputs are
passed as global variables.
This applies to both vertex
attributes and uniform
variables.
In Metal all the inputs to the
shader stage are arguments to
the entry function.
The double brackets declare C++
attributes.
We'll look at them in a second.
One of the inputs here that we
have is a model view projection
matrix.
In OpenGL, your application had
to be aware of the GLSL names
within the C++ code in order to
bind data to these variables.
And that made shader development
error-prone.
In MSL the uniform binding
indices are explicitly
controlled by the developer
within the shader, so an
application can bind directly to
a specific slot.
In the example here, slot number
one.
The keyword constant here
indicates that the intention for
the model view projection is to
be uniform for all vertices.
The other input to the shader is
a set of vertex attributes.
In GLSL you typically use
separate attribute inputs.
The main difference here is that
MSL uses a structure of your own
design.
The stage_in keyword indicates
that each invocation of the
shader will receive its own
arguments.
Once you have all the inputs to
the shaders set up, you can
actually perform all the
calculations.
Then for the outputs, in GLSL
the output is split between
varying attributes like
glTexCoord and predefined
variables, in this case
gl_Position.
In MSL, the vertex shader output
is combined into your own
structure.
So we've used a vertex and
vertex output structure.
Let's scroll up in the MSL code
to see what they actually look
like.
As mentioned previously, GLSL
defines the input vertex
attributes separately, and Metal
allows you to define them within
a structure.
In MSL there are a few special
keywords for vertex shader
input.
We mark each structure member
with an attribute keyword and
assign an attribute index to it.
Similar to GLSL, these indices
are used in the Metal API to
assign the vertex buffer streams
to your vertex attributes.
And GLSL predefines special
keywords like gl_Position to
indicate which variable contains
vertex coordinates that have
been transformed with the model
view projection matrix.
Similarly, for the vertex
output structure in MSL, the
special keyword position signals
that the vertex shader output
position is stored in that
structure member.
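The slide code is not part of the transcript, so the following is
only a sketch of what such a vertex shader could look like. The
struct and function names are assumptions. To keep a single
self-contained Swift example, the MSL source is held in a string
and compiled at runtime, which is a swapped-in technique for
illustration; the session itself recommends precompiling with
Xcode.

```swift
import Metal

// Hypothetical MSL vertex shader, mirroring the points made above.
let shaderSource = """
#include <metal_stdlib>
using namespace metal;

// Vertex inputs live in one structure; each member gets an attribute index.
struct Vertex {
    float3 position [[attribute(0)]];
    float2 texCoord [[attribute(1)]];
};

// Outputs are one structure too; [[position]] marks the clip-space position.
struct VertexOutput {
    float4 position [[position]];
    float2 texCoord;
};

// `vertex` names the stage, the entry point has its own name, and the
// uniform matrix is bound explicitly to buffer slot 1.
vertex VertexOutput vertexShader(Vertex in [[stage_in]],
                                 constant float4x4 &mvp [[buffer(1)]]) {
    VertexOutput out;
    out.position = mvp * float4(in.position, 1.0);
    out.texCoord = in.texCoord;
    return out;
}
"""

// Compiles the source above and looks the function up by its name.
func compileExampleShader(device: MTLDevice) throws -> MTLFunction? {
    let library = try device.makeLibrary(source: shaderSource, options: nil)
    return library.makeFunction(name: "vertexShader")
}
```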
Similar to GLSL vector types, MSL
defines a number of simd types
via the simd.h header that can
be shared between your CPU and
GPU code.
But there's a few things you
need to remember about them.
Vector and matrix types in your
buffers are aligned to 16 bytes
or 8 bytes for half precision.
So they're not necessarily
packed, for example, a float3
has a size of 12 bytes but is
aligned to 16 bytes.
This is to ensure that the data
is aligned for optimal CPU and
GPU access.
There are specific packed
formats you can use if you need
them.
But you will need to unpack them
in the shader before using them.
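The CPU side of this can be checked in plain Swift, where
SIMD3<Float> is the counterpart of simd_float3. The tuple-based
PackedFloat3 alias below is a stand-in invented for this sketch,
not a Metal type.

```swift
// Alignment and per-element spacing of the SIMD vector type.
let simdAlignment = MemoryLayout<SIMD3<Float>>.alignment
let simdStride = MemoryLayout<SIMD3<Float>>.stride

// Hypothetical tightly packed layout: three floats back to back.
typealias PackedFloat3 = (Float, Float, Float)
let packedAlignment = MemoryLayout<PackedFloat3>.alignment
let packedStride = MemoryLayout<PackedFloat3>.stride

// Bytes needed for 100 positions in each layout.
let simdBytes = 100 * simdStride
let packedBytes = 100 * packedStride
```

The SIMD layout spends 16 bytes per element where the packed
layout spends 12, which is why a buffer written with one layout
and read with the other ends up misaligned.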
So we've just reviewed the main
differences between GLSL and
MSL.
And to make this transition
smooth and easy, my colleague
Max will show you a really cool
tool to help you breeze through
it.
Thank you.
[ Applause ]
>> Good evening.
Metal, it's not just an API and
a shading language, it is also a
powerful collection of tools.
My name is Max, and I'm going to
minimize your hassle porting to
Metal.
Let's take a look at this scene.
This is the very first draw call
from an old OpenGL demo that we
here at Apple also ported to
Metal.
It's drawing a model of a temple
and a tree, both illuminated by
a global light source.
Let's port the fragment shader
together.
So the very first thing I did, I
just copied and pasted my entire
old OpenGL code directly into my
Metal shader file.
Based on this, I've already
created my input structure, as
well as my function prototype.
Let's begin.
So what we are going to do is
just copy and paste the contents
of the main function directly
into our Metal function.
And here we see the very first
powerful thing about Metal.
Because the shader's
precompiled, we are getting
errors instantly.
Let's take a closer look.
Of course, the built-in vector
types have different names now.
So vec2 becomes a float2; the
vec3 becomes the float3; and the
vec4 becomes a float4.
So we quickly fix that.
The next error we are going to
see is that like all of our
input structures -- all of our
global variables are now coming
from our input structure.
And because I just used a
similar naming scheme, this is
also very easy.
And, of course, we have to do
the exact same thing for our
uniforms.
The next error is a little bit
more complex.
Sampling in Metal is different,
so let's take a look.
We are going to start from
scratch.
So we directly can call a sample
function on our colorMap.
And here we can see how powerful
it is to have full auto
completion.
So this function expects us to
put in a sampler and a texture
coordinate.
We already have the texture
coordinate.
We could pass in the sampler as
an argument to our function or,
conveniently in Metal, we can
just declare one in code like
this.
We need to do the exact same
thing for our normalMap.
The last error that we are
seeing is that we are writing
into, like, one of many OpenGL
magic variables.
Instead, we are just going to
return our final computed color.
We can also see that all the
other functions, like normalize,
dot product, and my favorite
function max, are still exactly
the same.
Our shader now compiled
successfully.
Let's run it.
Something went wrong.
In OpenGL when you're
experiencing an error with your
shader, what you usually do is,
like, you look at your source
code, you look at your output,
and you think really hard.
We're just going to use the
shader debugger instead.
Clicking on the little camera
icon in the debug area will
capture a GPU trace.
This is a recording of every
Metal API call we made.
And we can now navigate to our
draw calls.
Here we are drawing the tree.
And here we are drawing the
temple.
Let me long press on the stairs
of the temple to bring up the
pixel inspector, which allows us
to start the shader debugger.
What we are seeing here now is
the values per line for the code
that we have ported together and
for the pixel we have just
selected.
Let's take a look at our
colorMap first.
We can see this looks like a
reasonable texture.
And we can also see that our
stairs are, like, in the upper
half of this texture; however,
if we were taking a look at our
texture coordinate, we can see
that we are sampling from the
lower half.
Let me quickly verify if this is
the case.
What we are going to do is to
invert the y coordinate of our
texture.
We can now update our shaders --
looks reasonable -- and we can
continue our execution.
There, much better.
This is a pretty common error
that you will experience when
porting from OpenGL to Metal.
And, of course, the real fix is
you go into your texture loading
code and make sure your texture
is loaded at the right origin so
you don't have to do this fix in
every shader.
However, the combination of a
feature-rich editor and mighty
debugging tools will also help
you in porting your games to
Metal.
Thank you very much.
My colleague Sarah will now
guide you through the rest of
the slides.
[ Applause ]
>> Sarah Clawson: Thanks, Max.
Hi, I'm Sarah Clawson.
And I'm here to take you through
the rest of the port from GL to
Metal.
So far in the life of a graphics
app, we've gone through a lot of
setup.
We've got a window to render to,
a way to get your commands to
the GPU, and a set of resources
and shaders ready to go.
Next up, we're going to talk
about setting up the state for
your render loop.
OpenGL has several key concepts
when it comes to state
management.
The vertex array object defines
both the vertex attribute
layout, as well as the vertex
buffers.
The program is a linked
combination of vertex and
fragment shaders.
And the framebuffer is a set of
color and depth stencil
attachments that your
application intends to render
to.
These state objects are created
during initialization and are
used throughout your frames.
Let's walk through an example to
show how OpenGL manages state.
Here we have a sample render
loop where an OpenGL application
binds a framebuffer, sets a
program, and then makes other
state modifications, like
enabling depth, or face culling,
or changing the colorMap before
making a draw call.
If you look at this same API
trace from OpenGL's perspective,
it has to track all these
changes on each API call.
And then when a draw call
happens, it has to stop and
validate to be sure that the
previous changes to primitive
assembly, depth state,
rasterizer, and programmable
stages are all compatible with
each other.
This validation can be super
expensive.
And while OpenGL does try to
minimize its negative impact,
there's limited opportunity to
do so.
It is worth noting that the
OpenGL state objects were ahead
of the curve when they were
first introduced.
Framebuffer objects combine
attached render targets,
programs linked fragment and
vertex shaders together, and
vertex array objects were larger
objects combining some of the
vertex attribute APIs and
vertex buffer setup.
But even with all these changes,
although they yielded positive
results, OpenGL still has to
validate many things on a draw
call, such as will the -- can
the ColorMask help optimize the
fragment shader?
Is the fragment shader output
compatible with the attached
frame buffer?
Is the vertex layout compatible
with the bound program?
Or are the attached render
targets blendable?
So as we redesigned the graphics
state management for Metal, we
took the program shaders
combined with the vertex input
layouts from the VertexArray
objects and added the
information about attachment
pixelFormat and blend state, and
we combined them into one object
called the PipelineDescriptor.
This structure describes all the
relevant states in the graphics
pipeline.
To set up the descriptor, first
you initialize it.
And then you set all the state
we just talked about, like
vertex and fragment shaders,
vertex information, pixel
formats, and blend state.
And then you take that
descriptor and you create what
is called a pipeline state
object or PSO.
This immutable object fully
describes the render state.
And what's great about it is
that you create it once, have it
validated for correctness, and
then use it throughout your
program.
In a similar way, we combined
all the depth and
stencil-related settings into a
depth/stencil state descriptor.
And, again, it is a collection
of all the depth/stencil state.
And you take this descriptor and
you create what's called a
depth/stencil state object.
This object is also immutable
and used throughout your
program.
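A Swift sketch of both descriptors and the objects created from
them. The helper names, the vertex layout (a float3 position
followed by a float2 texture coordinate, tightly packed), and the
pixel formats are assumptions for illustration.

```swift
import Metal

// Vertex input layout, the part that used to live in the vertex array object.
func makeVertexDescriptor() -> MTLVertexDescriptor {
    let descriptor = MTLVertexDescriptor()
    descriptor.attributes[0].format = .float3   // position
    descriptor.attributes[0].offset = 0
    descriptor.attributes[0].bufferIndex = 0
    descriptor.attributes[1].format = .float2   // texture coordinate
    descriptor.attributes[1].offset = 12
    descriptor.attributes[1].bufferIndex = 0
    descriptor.layouts[0].stride = 20
    return descriptor
}

// All depth and stencil settings collected in one descriptor.
func makeDepthDescriptor() -> MTLDepthStencilDescriptor {
    let descriptor = MTLDepthStencilDescriptor()
    descriptor.depthCompareFunction = .less
    descriptor.isDepthWriteEnabled = true
    return descriptor
}

// Creates the two immutable state objects once, ahead of the render loop.
func makeStates(device: MTLDevice,
                vertexFunction: MTLFunction,
                fragmentFunction: MTLFunction)
    throws -> (MTLRenderPipelineState, MTLDepthStencilState?) {
    let pipelineDescriptor = MTLRenderPipelineDescriptor()
    pipelineDescriptor.vertexFunction = vertexFunction
    pipelineDescriptor.fragmentFunction = fragmentFunction
    pipelineDescriptor.vertexDescriptor = makeVertexDescriptor()
    pipelineDescriptor.colorAttachments[0].pixelFormat = .bgra8Unorm
    pipelineDescriptor.depthAttachmentPixelFormat = .depth32Float

    // Validation happens here, not at draw time.
    let pipelineState = try device.makeRenderPipelineState(descriptor: pipelineDescriptor)
    let depthState = device.makeDepthStencilState(descriptor: makeDepthDescriptor())
    return (pipelineState, depthState)
}
```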
So the render loop we were
looking at in OpenGL now looks
like this in Metal.
With all of the prevalidated
state objects, there's no longer
any state validation or
tracking.
Let's look through the
comparison.
In Metal, the render encoder is
the start of a render pass,
similar to binding your frame
buffer.
Now that your depth state is
prebaked into an object, you
simply set it on the
renderEncoder.
The PipelineState object
represents a combination of
program shaders, VertexArray
properties, and a pixelFormat.
And it's also set on the
renderEncoder.
And now the renderEncoder
manages your rasterizer state
directly.
And it's important to note here
that there is still flexibility
in your pipeline, as not
everything is prebaked into your
PipelineState object.
Here's the list of state that
we've just been discussing that
you prebake into your PSO: State
like vertex and fragment
functions and pixel formats,
etc.
On the other hand, here's all
the state that you still set
while drawing -- state like
primitive culling mode and
direction, fill mode.
Scissor and viewport areas are
still set just like in OpenGL.
And ultimately, the draw calls
remain the same.
The main difference here is that
instead of enabling new state,
which could incur hidden
validation costs, you simply
swap out a new PipelineState
object that had blending enabled
in its descriptor.
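Put together, one render pass could be encoded along these
lines. The function name encodePass and its parameter list are
assumptions; the state objects are expected to have been created
up front.

```swift
import Metal

// Encodes a single render pass that draws one triangle list.
func encodePass(commandBuffer: MTLCommandBuffer,
                passDescriptor: MTLRenderPassDescriptor,
                pipeline: MTLRenderPipelineState,
                depthState: MTLDepthStencilState,
                vertexBuffer: MTLBuffer,
                vertexCount: Int) {
    // Creating the encoder starts the pass, much like binding a framebuffer.
    guard let encoder = commandBuffer.makeRenderCommandEncoder(
        descriptor: passDescriptor) else {
        return
    }

    // Prebaked, prevalidated state objects are simply set.
    encoder.setRenderPipelineState(pipeline)
    encoder.setDepthStencilState(depthState)

    // Rasterizer state is still set directly while drawing.
    encoder.setCullMode(.back)
    encoder.setFrontFacing(.counterClockwise)

    encoder.setVertexBuffer(vertexBuffer, offset: 0, index: 0)
    encoder.drawPrimitives(type: .triangle, vertexStart: 0,
                           vertexCount: vertexCount)
    encoder.endEncoding()
}
```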
I want to discuss one more
possible optimization that you
may have used in OpenGL in order
to hide certain expensive
operations.
As an OpenGL developer, you may
have seen that your render loop
has an unexpected hiccup on the
first draw call after making a
bunch of state changes.
And if this is the case, you
probably use an optimization to
hide that called shader
pre-warming.
In shader pre-warming, an
application uses dummy draw
calls for the most common GL
programs in order to have OpenGL
create all the state that's
necessary ahead of time.
If you were doing this in your
engine already, then it's going
to be very easy for you to
replace it with PSO creation.
Now shader pre-warming in Metal
is accomplished through creating
separate PSO objects with
different state enabled.
First, you create your
descriptor, and then you set all
of the state up until the first
draw call and create your first
PipelineState object.
Then you can take that same
descriptor, change a bit of
state on it -- like here we're
enabling blending -- and you
create a second PipelineState
object.
Both of these are prevalidated
so that during draw time you can
just swap them out between draw
calls.
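A sketch of creating the two variants from one descriptor. The
helper names and the particular blend factors are assumptions,
and the function modifies the descriptor that the caller passes
in.

```swift
import Metal

// Turns on standard alpha blending for the first color attachment.
func enableAlphaBlending(on descriptor: MTLRenderPipelineDescriptor) {
    descriptor.colorAttachments[0].isBlendingEnabled = true
    descriptor.colorAttachments[0].sourceRGBBlendFactor = .sourceAlpha
    descriptor.colorAttachments[0].destinationRGBBlendFactor = .oneMinusSourceAlpha
}

// Builds an opaque and a blended pipeline state ahead of time.
// `descriptor` must already carry shader functions and pixel formats.
func makeOpaqueAndBlended(device: MTLDevice,
                          descriptor: MTLRenderPipelineDescriptor)
    throws -> (opaque: MTLRenderPipelineState, blended: MTLRenderPipelineState) {
    // First PSO: the descriptor as configured so far.
    let opaque = try device.makeRenderPipelineState(descriptor: descriptor)

    // Second PSO: same descriptor with a bit of state changed.
    enableAlphaBlending(on: descriptor)
    let blended = try device.makeRenderPipelineState(descriptor: descriptor)

    return (opaque, blended)
}
```

At draw time the app would call setRenderPipelineState with
whichever of the two it needs.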
Hopefully if you're porting from
OpenGL to Metal, this is a
straightforward change.
Now, as we conclude the setup
stage of our application, I'd
like to bring up one of the main
benefits of porting your app
from OpenGL to Metal, and it is
that it will start doing
expensive operations less often.
In OpenGL, your application
would have to wait until draw
time in order to do things like
compile and link shaders or
validate states, which means
that these expensive operations
happen many times per frame.
Once you port your app to Metal,
your application moves these
operations to different stages
of its lifetime.
With precompiled shaders, shader
compilation has moved out of
initialization and into build
time so it's only done once.
Then with PSOs, state
definition is moved to content
loading.
So that leaves your draw time
free to actually make draw
calls.
So now that we've completed the
setup stage of your application,
let's talk about using all these
resources, shaders, and objects
to render frames.
In order to draw a single frame,
your application needs to first
update textures and buffers,
then establish a render target
to render to, and then make
several render passes before
finally presenting your work.
Let's talk about updating
resources.
Typically, at least some
resources have to be updated
continuously throughout your
render loop.
Examples include shader
constants, vertex and index
buffers, and textures.
And these modifications can be
accomplished between frames
through synchronization between
the GPU and the CPU.
A typical GL resource update can
be any combination of the
following calls: A buffer can be
updated by the CPU; or you can
update a buffer through the GPU
via buffer-to-buffer copy.
Similarly, a texture can be
updated by the CPU or it can be
updated via texture-to-texture
copy on the GPU.
At a glance, Metal offers
similar functionality.
But as Lionel mentioned earlier,
the containers for buffers and
textures are immutable and are
created during initialization;
however, their contents can be
modified through any combination
of the following.
A buffer with shared or managed
storage mode can be updated
through its contents property on
the CPU.
And on the GPU, the blitEncoder
is in charge of doing all data
copying.
And so you can update a buffer
from the GPU via the
copyFromBuffer methods on the
blitEncoder.
Similarly, a texture with shared
or managed storage mode can be
updated on the CPU through its
replaceRegion method.
Or on the GPU, you can update a
texture through the
copyFromTexture methods on the
blitEncoder.
Note that storage mode matters
here when it comes to these
updates as only buffers and
textures with shared or managed
storage modes can be updated by
the CPU.
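The four update paths just listed might look something like this in Swift. It is a sketch under assumptions: the function name, the sizes, and the pixel format are made up for illustration, and the `waitUntilCompleted` call at the end is there only so the example can read the result back.

```swift
import Metal

/// Performs one CPU write and one GPU copy for a buffer and for a texture.
/// Returns the first float read back from the GPU-copied buffer.
func updateResources(device: MTLDevice, queue: MTLCommandQueue) -> Float? {
    let values: [Float] = [1, 2, 3, 4]
    let byteCount = values.count * MemoryLayout<Float>.stride

    // Shared storage mode: the CPU can write these buffers directly.
    guard let source = device.makeBuffer(length: byteCount, options: .storageModeShared),
          let destination = device.makeBuffer(length: byteCount, options: .storageModeShared)
    else { return nil }

    // CPU update of a buffer through its contents pointer.
    source.contents().copyMemory(from: values, byteCount: byteCount)

    // Textures with the platform's default (CPU-accessible) storage mode.
    let textureDescriptor = MTLTextureDescriptor.texture2DDescriptor(
        pixelFormat: .rgba8Unorm, width: 2, height: 2, mipmapped: false)
    guard let textureA = device.makeTexture(descriptor: textureDescriptor),
          let textureB = device.makeTexture(descriptor: textureDescriptor)
    else { return nil }

    // CPU update of a texture through replace(region:...).
    let pixels = [UInt8](repeating: 255, count: 2 * 2 * 4)
    textureA.replace(region: MTLRegionMake2D(0, 0, 2, 2),
                     mipmapLevel: 0,
                     withBytes: pixels,
                     bytesPerRow: 2 * 4)

    guard let commandBuffer = queue.makeCommandBuffer(),
          let blit = commandBuffer.makeBlitCommandEncoder()
    else { return nil }

    // GPU updates: buffer-to-buffer and texture-to-texture copies.
    blit.copy(from: source, sourceOffset: 0,
              to: destination, destinationOffset: 0, size: byteCount)
    blit.copy(from: textureA, sourceSlice: 0, sourceLevel: 0,
              sourceOrigin: MTLOrigin(x: 0, y: 0, z: 0),
              sourceSize: MTLSize(width: 2, height: 2, depth: 1),
              to: textureB, destinationSlice: 0, destinationLevel: 0,
              destinationOrigin: MTLOrigin(x: 0, y: 0, z: 0))
    blit.endEncoding()

    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()  // only so this sketch can read back

    return destination.contents().load(as: Float.self)
}
```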
OpenGL managed the
synchronization between the GPU
and CPU for you, though
sometimes at exorbitant costs to
your application as it waited
for one or the other to be done.
In Metal, because you control
how the memory is stored, you
also control how and when the
data is synchronized.
And this is true for both
buffers and textures.
If you port your GL app to Metal
and only use a single buffer for
your resource updates, the flow
will look like this.
First, your CPU will update your
resources during the setup of a
render pass.
And then once complete, the
buffer will be available for the
GPU to consume during the
execution of that render pass.
However, while the GPU is
reading from this buffer, the
CPU may begin setting up for the
following render pass and will
need to update the same buffer,
which is a clear race condition.
So let's look at one approach to
solve this problem.
A simple solution would be to
commit this resource to the GPU
with the waitUntilCompleted call
on the commandBuffer it is used
in.
As we discussed earlier, this is
similar to glFinish and it
places a semaphore on all CPU
work until the GPU is done
executing the render pass that
uses that buffer.
After the execution is
completed, a callback is
received from the GPU, and this
way you can ensure that your
single buffer will not be
stomped on by the CPU or the
GPU.
However, as you can see, the CPU
is idle while the GPU is
executing, and the GPU is
starved waiting for the CPU to
commit work.
So while this can be helpful for
you at the beginning while
you're working out these race
conditions, it is not
recommended to use
waitUntilCompleted as it
introduces latency into your
program.
Instead, an efficient way to
synchronize your updates is to
use two or more buffers
depending on your application's
needs so that the CPU can write
to one while the GPU reads from
another.
Let's look at a simple triple
buffering example.
So here we start with the first
resource ready to be consumed
by the GPU.
But instead of
waitUntilCompleted, we just add
a completion handler so that
once the corresponding frame is
finished on the GPU, it can let
the CPU know that it is done.
But now we don't have to wait
for it to be done.
While the GPU is executing, with
triple buffering the CPU can
jump two updates ahead because
it's in different buffers.
So here we are with the frame
done executing on the GPU, and
this is where the completion
handler comes in.
It notifies that GPU work is
done and then returns the buffer
to the buffer pool so that it
can be used by the CPU in the
next frame while the GPU
continues execution.
I think most developers will
find that they'll need to
implement triple buffering to
achieve optimal performance.
As for implementation, for
triple buffering, of course, you
need to start with a queue of
three buffers.
You also need to initialize your
frameBoundarySemaphore with a
starting value of three.
And this semaphore will be
signaled at each frame boundary
when the GPU is done executing,
letting the CPU know that it is
safe to overwrite that buffer.
And finally, we need to
initialize the buffer index to
point at the current frame's
buffer.
Inside the render loop, before
we write to a buffer, we need to
ensure that the GPU is
completely done executing the
corresponding frame.
So at the beginning of each
render pass, we need to wait on
our frameBoundarySemaphore.
And then once the signal has
been received, we know that it's
safe to grab its buffer and
reuse it for new frame data.
And now we encode commands and
bind this resource to the GPU to
be used in the next frame.
But before we commit it, we have
to add our completion handler to
the commandBuffer and then we
commit it.
And once the GPU has finished
executing, our completion
handler will signal our frame
semaphore, allowing the CPU to
know that it is done and it can
reuse the buffer for the next
frame's encoding.
And this is a simple triple
buffer implementation that you
can adopt for any dynamic
resource updates.
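The steps above can be sketched as a small Swift class. Only `frameBoundarySemaphore` and the starting value of three come from the talk; the class name `FrameUniformRing`, the `encodeFrame` method, and the single `Float` written into the buffer are hypothetical placeholders for whatever per-frame data an app actually updates.

```swift
import Dispatch
import Metal

/// Minimal triple-buffered dynamic resource. Names are illustrative.
final class FrameUniformRing {
    static let maxFramesInFlight = 3

    private let buffers: [MTLBuffer]
    // Starts at three: the CPU may get at most three frames ahead of the GPU.
    private let frameBoundarySemaphore =
        DispatchSemaphore(value: FrameUniformRing.maxFramesInFlight)
    private var currentIndex = 0

    init?(device: MTLDevice, length: Int) {
        var created: [MTLBuffer] = []
        for _ in 0..<FrameUniformRing.maxFramesInFlight {
            guard let buffer = device.makeBuffer(length: length,
                                                 options: .storageModeShared)
            else { return nil }
            created.append(buffer)
        }
        buffers = created
    }

    /// Writes this frame's data and commits a command buffer.
    /// Returns the index of the buffer that was used.
    @discardableResult
    func encodeFrame(queue: MTLCommandQueue, time: Float) -> Int {
        // Wait until the GPU is done with the frame that last used this buffer.
        frameBoundarySemaphore.wait()

        let index = currentIndex
        currentIndex = (currentIndex + 1) % FrameUniformRing.maxFramesInFlight

        // Safe to overwrite: no in-flight frame reads this buffer any more.
        buffers[index].contents().storeBytes(of: time, as: Float.self)

        guard let commandBuffer = queue.makeCommandBuffer() else {
            frameBoundarySemaphore.signal()
            return index
        }

        // ... encode render passes that bind buffers[index] here ...

        // Add the completion handler before committing.
        let semaphore = frameBoundarySemaphore
        commandBuffer.addCompletedHandler { _ in semaphore.signal() }
        commandBuffer.commit()
        return index
    }
}
```

Calling `encodeFrame` once per frame cycles through buffers 0, 1, 2, 0, and so on; the fourth call blocks only if the GPU has not yet finished the first frame.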
Okay.
So now we have our resources
updated, so let's talk about
render targets.
In OpenGL, framebuffer objects
are the destination for
rendering commands.
An FBO collects a number of
textures and render buffer
objects under one umbrella and
facilitates rendering into them.
The state of a framebuffer is
mutable, and the render pass is
loosely outlined by binding a
framebuffer and ultimately
swapping them for display.
This is a typical OpenGL
workflow with framebuffers.
During the application's
initialization stage, a
framebuffer is created.
And then you make it current by
binding it.
And then you attach resources
like textures and then check the
framebuffer status to make sure
it's valid to use.
During draw time, you make a
framebuffer current by binding
it, which is an implicit start to a
render pass.
And then you have to clear it
before you make any draw calls
to it.
And then at the end you can
signal that certain attachments
can be discarded to let OpenGL
know that it's not necessary to
store these contents into
memory.
These discard events can serve
as hints to end the render pass,
but it's not a guarantee.
In Metal, the render command
encoder is the destination for
rendering commands.
A render command encoder is
created from a render pass
descriptor, which, similar to an
FBO, collects a number of
rendering destinations for a
render pass and facilitates
rendering into them.
A render command encoder is
directly responsible for
generating the hardware commands
for your GPU, and a render pass
is explicitly delineated by the
starting and ending of encoders.
Here's a render pass in Metal.
You start by creating your
renderPassDescriptor.
And the renderPassDescriptor
describes all the attached
resources and also specifies the
operations that happen at the
beginning and end of a render
pass -- these are called load
and store actions.
In contrast to GL, in Metal you
do not clear a resource
directly; instead, you specify a
load action to clear it and also
the color.
Here, it is black.
The store action here is don't
care, which is similar to
glDiscardFramebufferEXT in our GL
example.
If you want to store the results
to memory, you would use the
store action here instead.
And at render time, you use your
descriptor to create your
encoder so the state is set.
You make all your draw calls and
then explicitly end encoding.
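A render pass along these lines could be set up as in the sketch below. The helper names `makeClearPassDescriptor` and `encodePass` are hypothetical; the load action, clear color, and store action follow the description above.

```swift
import Metal

/// Builds a descriptor that clears the target to black at the start of the
/// pass and does not store its contents at the end.
func makeClearPassDescriptor(target: MTLTexture?) -> MTLRenderPassDescriptor {
    let descriptor = MTLRenderPassDescriptor()
    descriptor.colorAttachments[0].texture = target
    // Clearing is a load action, not a separate call as in GL.
    descriptor.colorAttachments[0].loadAction = .clear
    descriptor.colorAttachments[0].clearColor =
        MTLClearColor(red: 0, green: 0, blue: 0, alpha: 1)
    // Use .store instead if the results must be written to memory.
    descriptor.colorAttachments[0].storeAction = .dontCare
    return descriptor
}

/// Creates the encoder from the descriptor, then explicitly ends the pass.
func encodePass(commandBuffer: MTLCommandBuffer,
                descriptor: MTLRenderPassDescriptor) {
    guard let encoder =
        commandBuffer.makeRenderCommandEncoder(descriptor: descriptor)
    else { return }
    // ... draw calls go here ...
    encoder.endEncoding()
}
```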
But before discarding
framebuffers or ending encoding,
let's actually draw something.
A series of render commands is
often referred to as a render
pass.
Inside the render pass, you set
up state and draw call inputs
like textures and buffers and
then issue your draw commands.
This is a typical OpenGL draw
sequence.
A well-behaved OpenGL app tries
to set all of its state ahead of
time, and then it binds its
target and a GL program to link
shaders.
Then it will bind resources such
as vertex buffers, uniforms, and
textures to different stages in
the program.
And finally, it will draw.
As we've discussed a few moments
ago, OpenGL state changes can
cause hidden validation checks.
And if you're already grouping
your state changes together in
OpenGL to avoid these
performance hits, then you'll
get the most out of Metal's
pre-validated state objects.
In Metal, because validation
only happens when you create
your PipelineState object and
because shaders are precompiled,
your render loop becomes much
smaller.
But for a programmer, there
aren't that many changes to make.
Here is the same code that we
looked at in OpenGL but now in
Metal.
You start with your render
command encoder, which is
equivalent to binding the GL
framebuffer.
And then you set your prebuilt
PipelineState object, which is
equivalent to glUseProgram.
And after that, we assign
resources for our Metal program,
starting with the VertexBuffer
and uniforms.
And you can note here that you
have to set your uniforms per
shader stage, whereas in GL
you set them for the whole GL
program.
And here, because we ported it
directly from OpenGL, we're
sending the same set of
uniforms; but in Metal you can
send different ones if you want.
And then you set your textures
and issue the draw call.
And finally, once you've done
all the draw calls, you can end
your render pass.
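As a rough illustration of that draw sequence, the function below encodes one draw call. The `Uniforms` struct, the function name `encodeDraw`, and the binding indices are assumptions made for the sketch; in a real app they have to match what the shaders declare.

```swift
import Metal

/// Hypothetical per-draw constants; the layout must match the shader's struct.
struct Uniforms {
    var tint: SIMD4<Float>
}

/// Encodes one draw call on an already created render command encoder.
func encodeDraw(encoder: MTLRenderCommandEncoder,
                pipelineState: MTLRenderPipelineState,
                vertexBuffer: MTLBuffer,
                texture: MTLTexture,
                uniforms: Uniforms,
                vertexCount: Int) {
    var uniforms = uniforms
    let uniformLength = MemoryLayout<Uniforms>.stride

    // Prebuilt, prevalidated state: the counterpart of glUseProgram.
    encoder.setRenderPipelineState(pipelineState)

    // Vertex data for the vertex stage (index 0 is an assumed binding).
    encoder.setVertexBuffer(vertexBuffer, offset: 0, index: 0)

    // Uniforms are set per shader stage; here both stages get the same data.
    encoder.setVertexBytes(&uniforms, length: uniformLength, index: 1)
    encoder.setFragmentBytes(&uniforms, length: uniformLength, index: 1)

    encoder.setFragmentTexture(texture, index: 0)
    encoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: vertexCount)
    // The caller ends the pass with encoder.endEncoding() after its last draw.
}
```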
And now, once the work is
submitted, there's still the
matter of presenting.
As the GPU renders the scene, it
writes out to a framebuffer to
display.
In OpenGL, in order to present a
rendered frame, when you return
from drawInRect, the context
calls presentRenderbuffer for
you.
Metal, on the other hand,
accomplishes this directly
through Core Animation's pool of
drawables.
And drawables are textures for
on-screen display.
And you can encode a render pass
to render to drawables.
You fetch the current drawable,
and then after your render loop
tell the command buffer to
present it.
Remember our code from the very,
very beginning of this talk when
we were talking about the
window subsystem.
Here we're going to dive into
glkView and drawInMTKView to see
how you can present what you've
rendered.
So here it is.
In glkView you bind your
framebuffer; perform your render
commands; and then when you
return from drawInRect, the
present is managed for you.
In Metal it's much the same: You
create your commandBuffer,
perform your render commands by
creating and ending encoders, and
then the one extra step you have
to take is to call
presentDrawable yourself before
finally committing your
commandBuffer.
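For the Metal side, a single-encoder draw method might be sketched like this. The class name `SketchRenderer` is hypothetical, and the view's own `currentRenderPassDescriptor` is used as the render target, which is one common choice rather than the only one.

```swift
import MetalKit

/// Minimal MTKViewDelegate that renders one pass and presents the drawable.
final class SketchRenderer: NSObject, MTKViewDelegate {
    private let queue: MTLCommandQueue

    init?(device: MTLDevice) {
        guard let queue = device.makeCommandQueue() else { return nil }
        self.queue = queue
        super.init()
    }

    func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {
        // Nothing to resize in this sketch.
    }

    func draw(in view: MTKView) {
        // Fetch the pass descriptor and drawable that the view provides.
        guard let passDescriptor = view.currentRenderPassDescriptor,
              let drawable = view.currentDrawable,
              let commandBuffer = queue.makeCommandBuffer(),
              let encoder = commandBuffer.makeRenderCommandEncoder(
                  descriptor: passDescriptor)
        else { return }

        // ... render commands go here ...
        encoder.endEncoding()

        // The one extra step compared with GLKView: present explicitly,
        // then commit the command buffer.
        commandBuffer.present(drawable)
        commandBuffer.commit()
    }
}
```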
And if your render loop is very
simple with a single encoder,
then this is all you have to do;
however, if you do have a more
complex app, you may want to
check out the talk we have on
delivering optimized Metal apps
and games for how to handle your
drawables.
And that concludes our frame.
So we've shown how the window
subsystem can be migrated
easily.
We've gone over the resource
creation steps.
We've ported our shaders and
used the great tools to quickly
find issues.
We created our command
queue, command buffers, and
command encoders to set up our
render passes.
And we created our prevalidated
state objects.
Then to render each frame, we
used triple buffering to update
our resources.
We used render command
encoders for our render passes,
where we drew our geometry
before ultimately presenting
the rendered frame.
We've walked through the life of
a graphics app and showed how
Metal is a natural evolution.
Many of OpenGL's established
concepts have migrated into
Metal to work alongside new
concepts that we've added to
address specific problems raised
in the graphics community.
If you can take one thing away
from this session, we hope it's
that porting your applications
from OpenGL to Metal is not
intimidating and that your
application will actually
benefit from it.
But if you have room for two
things, it's that Metal also
offers an awesome set of tools
to enhance your development
experience.
Max already demoed Xcode's
built-in frame capture and
shader debugger to offer deeper
insight into subtle issues
within your code.
But Xcode also offers the new
GPU memory viewer to understand
and optimize how to use memory
in your application.
In Instruments we have a Game
Performance template that
includes the Metal System Trace
to visualize submission issues
which might cause frame drops.
And new this year we also have
support for Metal in the
simulator.
Yay, you can get excited.
[laughs]
New with Xcode 11 on macOS
Catalina, we have full hardware
acceleration to run your games
and apps in the iOS and tvOS
simulators using Metal.
The simulator supports the
MTLGPUFamilyApple2 feature set
and should meet the majority of
your needs to run all of your
apps and games in all available
screen resolutions.
For a deeper dive into the
simulator and how it achieves
hardware acceleration, please
check out the simulator talk
tomorrow morning.
If you're looking to solve a
specific issue with Metal, you
can see our many, many sessions
online.
For more information, you can
check out our documentation on
our website or you can visit us
in the Metal lab tomorrow
morning.
And with that, thank you all for
coming, and I hope to see you at
the bash.
[ Applause ]