Transcript
>> My name is Dan Omachi.
I work for the Apple GPU Software Team
on the OpenGL framework for Mac OS X,
as well as the OpenGL ES framework on iOS 4.
I hope you guys are here today, because you'd really
like to add some stunning visual
effects to your Mac or iOS applications.
Perhaps you thought of adding some shadows,
reflections, or refractions into your app.
Maybe you've heard of some advanced techniques,
such as parallax occlusion mapping,
tone mapping, or deferred shading.
I'm not going to be talking so much about those advanced
techniques today, however, I am going to be talking
about some essential design practices that you'll need
to consider in your applications if you want to add
such advanced techniques, or invent
your own techniques using OpenGL.
So what is OpenGL?
So many of you know OpenGL as a 3D graphics API.
The OpenGL specification, which is
the definitive document on OpenGL,
actually has what I think is a
slightly more accurate definition.
OpenGL is a software interface to graphics hardware.
In other words, it's an interface
with a graphics processing unit.
Every device that ships with iOS 4 and
Mac OS X has a pretty capable GPU on it.
So what does this GPU do?
Well, many people believe that the GPU is
just there to make your graphics look good.
Actually, you can make some pretty high
quality renderers using just the CPU.
Movie studios do this all the time.
They make very high quality renderers,
and you see some great special effects.
However, their renderers take many,
many minutes to render a single frame.
This isn't so good for an interactive application.
So what does the GPU do?
It accelerates your rendering.
And when you're talking about interactive
frame rates, this matters.
Faster rendering equals better image quality.
Drawing efficiently allows drawing more: more models,
more vertices in those models, more pixels, longer shaders,
better special effects in your application.
All at an interactive frame rate.
All right, so what will you learn today?
So let's say you've got a Formula One car.
Just because you've got it and you've driven it to work every day
doesn't mean that you're going to be winning any Formula One races,
or even qualifying for them,
even though you've got this very advanced,
almost 1,000 horsepower machine.
You need to know how to use it effectively.
Same thing with the GPU and OpenGL.
It's a very complex machine.
You need to know how to utilize it and use it
efficiently in order to harness that power.
So I'll tell you a little bit about
how OpenGL works under the hood.
A good Formula 1 driver knows exactly
the strengths and weaknesses of his machine:
where it excels, and where he needs to work at it more.
I'll talk a lot about this process called state validation.
This is where OpenGL translates API calls into GPU commands.
Now this is actually a CPU intensive operation,
and a lot of applications stumble on it.
So you need to be aware of what happens there.
I'll also give you some fundamental OpenGL techniques to
avoid these CPU bottlenecks, and efficiently access the GPU.
Now it's important to note that these
techniques apply equally to iOS 4 and Mac OS X.
So the code you write on one platform, and the knowledge
that you've gained here, you can
leverage on the other platform.
They've got pretty different architectures,
very different GPUs, very different devices.
But the OpenGL software stack is quite similar,
so you can leverage that knowledge pretty easily.
So let me talk a little bit about OpenGL's design.
OpenGL is designed to be a low-level API,
to allow direct access to the GPU's capabilities.
It allows you to get in and get out;
the software stack doesn't interfere with your code.
Really lean.
However, it's high enough level
to drive many heterogeneous GPUs.
On the Mac we support many GPUs.
On iOS we support many as well.
So there is a fair amount of work to
translate from these API calls to GPU commands.
It's not the lowest-level way to actually get to the GPU;
I mean, we could make an API that's right to the metal.
However, what this high level allows you to do
is write code that's portable, and also write code today
that will run on devices that have changed
drastically underneath your application tomorrow.
So as architecture changes, your code
remains the same and works quite efficiently.
So OpenGL is a state machine.
And this state maps roughly to the GPU pipeline.
If you were to look at Apple's implementation of
OpenGL, what you'd see is this gigantic C struct.
OK? And to see what would be in this struct, you could
actually look at the OpenGL specification.
At the back you'd see this pretty
large table that says OpenGL state.
And it has things like, you know, what stuff is enabled:
the various GL enables in OpenGL, whether they're on or off.
It has things like what's bound at certain points.
And so that's basically what's in this context.
All right?
So much of OpenGL's lifetime is just
spent tracking state changes that your app makes.
So in your application, there's a certain class
of calls that OpenGL has that just exists
to change this context state, this state vector.
So for instance, you call glEnable, and some state changes.
You call glBindTexture, and some more state changes.
You call glUseProgram, and again, the
context is updated with new state.
So at this point we're actually
doing nothing that's GPU specific.
It's all GPU agnostic.
There's no hardware specific command getting generated.
The real work happens when you make a draw call.
This is when API calls get translated into GPU commands.
All of that work is deferred until you draw.
So for instance you call glDrawArrays, and what
happens here is we're crunching all of this context state,
taking that draw call, and creating a GPU command.
So one thing to note about this is that this
translation step is a CPU-intensive operation.
We need to do a lot of processing on the CPU to
figure out exactly what your application has done,
what it wants, and translate that into a GPU command.
All right, so now we've got this
GPU command, what happens to it?
Well, GPU commands are inserted into a command
buffer that's allocated by the kernel.
So we've put that in there.
And as the command buffer fills up, or your
application calls glFlush, we flush it to the GPU.
The GPU now can process it.
But actually there's one step that needs to happen first.
We need to actually transfer all the
resources that are needed for those commands.
For instance, we need to take the textures
that are used by those draw commands
and download them to the GPU.
And that can make this a CPU-intensive process.
In addition to state validation, this is another
potential bottleneck your application could incur.
Now this is why we recommend not
calling glFlush for really any reason.
There are some very specific reasons where you might want
to call glFlush for multicontext, multithreading rendering.
But for the majority of applications,
you're only talking about a single thread,
and you should never need to call glFlush.
It's an expensive operation.
Now you've got the command buffer on the
GPU, and the GPU can begin processing it.
And it starts by fetching vertices, and
then it pushes that data down the pipeline.
So it's important to note that there are
many potential bottlenecks on the GPU.
Any one of these stages could be a bottleneck.
However, a very common bottleneck is actually the CPU.
All of those stages that I just talked about.
So a key point, the GPU is another
processor that runs in parallel to the CPU.
Like in any good multithreaded app, you don't want
one thread blocking and locking up the other thread.
You want both cores running at the same
time, really maximizing the use of the CPU.
Well, same thing happens with a GPU.
As OpenGL pushes commands into a command buffer, the GPU
fetches commands that have already been submitted to it.
So, let me show that again.
OpenGL is simultaneously adding commands to one command
buffer while the GPU is fetching commands from another.
So you really don't want to let the GPU wait for the CPU.
That's just wasting the GPU's resources.
You could be rendering 1,000 enemies in your
game, and maybe you could be rendering 3,000
if you were utilizing the GPU fully.
If you're stalling it, well, you're
not utilizing the GPU as much,
and you're not rendering as many
cool effects as you could be.
Also, some applications tend to use
the CPU to do some graphics work.
Well, as I mentioned before, the
CPU is good at a lot of things.
And you can do some high quality
rendering with it, but it's very slow.
And you really want to use the GPU for
this work, really offload the work.
So this is what happens when you let
the GPU wait: it just sits there.
There is a certain class of calls where the
CPU may need to wait for the GPU.
Anytime the GPU needs to process something and hand
the result back to the CPU, the CPU will sit there waiting for it.
These are things like glReadPixels, queries, and fences.
Here's what happens: the CPU sort of just sits there
waiting until it gets the data it needs from the GPU.
Now there are ways you can use glReadPixels, queries, and
fences efficiently where you're not stalling the CPU,
and I can go into that a little bit later.
So one thing I really want to reiterate
here is that for each draw call,
context state is translated into a GPU command.
This is a very CPU-intensive operation.
We call this state validation.
Here's what happens: you call
glDrawElements, the CPU processing goes way
up, and we translate this into a GPU command here.
If you were to profile your application,
what you would see is that the draw calls,
in this case glDrawElements, take substantially
more time than any of the other OpenGL calls.
OK, here you see it takes an order of magnitude more
than any of the other OpenGL calls that you see here.
And what's important to note about this, is
that there is some cost in making a draw call.
It's actually not the draw call itself that is very
expensive, or causing this expense; it's the state-setting calls
that your app has made before the draw call.
So you enable something, you bind something, those look
cheap on a Shark profile, but that cost does show up.
It doesn't show up in that particular call,
but it shows up in the subsequent draw call.
So state-setting calls look cheap, but they're
not really as cheap as they might appear.
So I'm going to go over a couple of techniques,
which you can use to reduce your CPU overhead.
The first of which is to use OpenGL's various objects.
The second is to manage your rendering state.
Sort it, so that you're not making redundant state changes,
unnecessary state changes that incur more CPU cost.
And batch state: reduce some state
by combining objects into one.
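To make the redundant-state-change idea concrete, here's a minimal sketch of a caching wrapper. The names use_program_cached and set_gpu_program are hypothetical; set_gpu_program is a counting stand-in for a real call like glUseProgram, so the sketch stays self-contained:

```c
/* Stand-in for a real state-setting call such as glUseProgram;
   here it just counts how many times real work was done. */
static unsigned int real_binds = 0;

static void set_gpu_program(unsigned int program)
{
    (void)program;
    real_binds++;
}

/* Track the last program we set; ~0u means nothing set yet. */
static unsigned int current_program = ~0u;

/* Only touch the context when the state actually changes. */
void use_program_cached(unsigned int program)
{
    if (program != current_program) {
        set_gpu_program(program);
        current_program = program;
    }
}
```

Setting the same program twice in a row then costs nothing; only genuine changes reach OpenGL and, later, state validation.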
So objects.
Whenever you use an OpenGL object, some of that
state is cached in that object and pre-validated.
It's not validated at draw time; it's validated up front.
So when it's bound, it's easily translated into a GPU command.
So let's take texture objects for example.
You call glGenTextures, and you create this texture object.
All right, you call glTexImage2D, and
you see this object state here:
it's this little state vector that the object itself has.
And that gets updated and validated.
You call glTexParameter, and that state
gets updated and validated again.
And finally, when you call glBindTexture, that
state is merged with the rest of the context state,
and cached; it's much easier to validate, with much
less CPU processing required to create that GPU command.
Now it's important to note that this process of
setting up objects is actually somewhat expensive.
So you really want to do this up front, when
your app, your level, your document is loading,
when the user isn't expecting some high frame rate.
Don't wait until you're in the middle
of your real-time run loop.
Do it up front.
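Putting those texture-object steps together, here's a sketch of load-time setup; the 256 x 256 size and the pixels pointer are hypothetical, and a current GL context is assumed:

```c
GLuint texture;
glGenTextures(1, &texture);              /* create the texture object */
glBindTexture(GL_TEXTURE_2D, texture);   /* work with this object now */

/* Each call below updates and validates the object's own state vector. */
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 256, 256, 0,
             GL_RGBA, GL_UNSIGNED_BYTE, pixels);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);

/* Later, in the run loop, binding merges this pre-validated
   state into the context cheaply. */
glBindTexture(GL_TEXTURE_2D, texture);
```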
So now I've described one way in
which you can reduce CPU validation.
I'll also describe a few other ways.
And I'll also describe some ways in which not
using objects can be an inefficient use of the API.
The first thing I'll describe is how the fixed-function
vertex and fragment pipeline can be
an inefficient use of the API,
and instead, how using GLSL program
objects can be a much more efficient use.
So modern hardware doesn't have any fixed
function vertex or fragment pipeline.
OpenGL needs to convert all of your fixed-function
calls, glEnable(GL_LIGHTING), glLightfv, etc., into a shader internally.
And this can be pretty costly.
Here's what happens.
You call glEnable(GL_LIGHTING), and some context
state changes, as it usually does.
glTexEnv, another fixed-function call,
and that state gets updated.
glDrawArrays.
Here's what happens: we generate this large
shader under the hood for your application.
Now you don't have to pay attention
to exactly what's going on.
We're just emulating what you've told us,
what the state that you'd like to use.
And this can be a pretty expensive operation.
We actually cache the shader away,
so we don't generate the shader
and compile it every single time that you use fixed function.
We cache it away in a cache table.
And when you use fixed function, we need
to generate this cache key, which is somewhat expensive,
and then go inside of our cache and
pull out your shader and set it.
This can be pretty expensive.
So instead, what we'd rather have you do, and what
can be a much more efficient use of the API,
is to use program objects, shader objects.
This is the most efficient way to set up the pipeline.
So you specify shader code, and you compile and link it
into a program object while your level,
your document, your app is loading,
before you enter your real-time run loop.
And then when you want to use that pipeline,
you set it, and bam, it's easily set on the GPU.
No looking into some weird hash table
to pull out the fixed-function shader.
Very efficient use of the API.
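Here's a sketch of that load-time shader setup; error checking is omitted, a current GL context is assumed, and vtxSource and frgSource are assumed to be your shader source strings:

```c
/* Compile the two shader stages. */
GLuint vtxShader = glCreateShader(GL_VERTEX_SHADER);
glShaderSource(vtxShader, 1, &vtxSource, NULL);
glCompileShader(vtxShader);

GLuint frgShader = glCreateShader(GL_FRAGMENT_SHADER);
glShaderSource(frgShader, 1, &frgSource, NULL);
glCompileShader(frgShader);

/* Link them into a program object, once, while loading. */
GLuint program = glCreateProgram();
glAttachShader(program, vtxShader);
glAttachShader(program, frgShader);
glLinkProgram(program);

/* In the real-time run loop: one call sets the whole pipeline. */
glUseProgram(program);
```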
So some people would like a little bit more
understanding of how vertex and fragment shaders work.
They are kind of a complex piece.
So I'm just going to go through and
describe a little bit about how they work.
So the vertex shader is the first
part of the programmable pipeline.
The shader is executed once for every vertex.
So let's say you've got a model with 1,000 vertices.
And you draw that model.
Your vertex shader will run 1,000 times.
OK, now this is great, this is fine, because
GPUs are very efficient at that.
They're designed for this.
The GPU is a high-bandwidth processor specifically
designed to do this sort of processing.
OK, so let's talk about the inputs.
Inputs are per-vertex attributes that
are specified outside the shader.
Things like pre-transformed positions,
pre-transformed normals, untransformed texture coordinates.
Specified outside the shader.
Now there are two classes of outputs.
The first is a position in clip space.
OK? This is the gl_Position built-in variable.
You assign to this gl_Position variable.
And this clip space is used for
OpenGL to map to the screen.
OK? It's to map your 3D model onto a 2D screen.
So that's one output of the vertex shader.
The second are one or many varyings.
And the type of data that these varyings usually
consist of are colors, normals, and texture coordinates.
Values that you'd like to read in your fragment shader.
Now these values are interpolated
across any rasterized triangle.
So, let's say this triangle was generated by three vertices.
And on one vertex you assigned 0.4 to one of these varyings.
On the other you assigned 0.8.
You output that in your vertex shader.
OK, so halfway through, a pixel generated
on this polygon will have a value of 0.6.
One-quarter of the way along, it'll have 0.5.
Three-quarters, 0.7.
So in other words, it's distributed across the polygon from
the two varyings that you output in your vertex shader.
Now it's not always linearly interpolated like
it looks here, where it's evenly distributed.
In fact, if the polygon is facing you, it will be.
But if it's a little bit on edge, you'll have
this sort of perspective-correct interpolation,
where values that are closer to you will
actually be a little further apart.
That's to give it this 3D effect that you need.
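For the linear case described above, the interpolation is just a weighted blend between the two endpoint values. Here's a runnable sketch (the function name is hypothetical) using the 0.4 and 0.8 values from the example:

```c
/* Interpolate a varying along an edge: t = 0 at the vertex that
   output `a`, t = 1 at the vertex that output `b`. The GPU's
   perspective-correct version also divides through by clip-space w,
   but for a polygon facing the camera it reduces to this. */
float interpolate_varying(float a, float b, float t)
{
    return a + (b - a) * t;
}
```

With a = 0.4 and b = 0.8, t = 0.25 gives 0.5, t = 0.5 gives 0.6, and t = 0.75 gives 0.7, matching the values above.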
OK, so let me give you an example
of a pretty simple vertex shader.
So here is a varying that we define, varTexCoord.
And this works just like the varyings I just
described, that are distributed across the polygon.
And here is the main body where all this work is going on.
On the first line, what we're doing is taking
the incoming gl_Vertex, the pre-transformed position,
and multiplying it by the gl_ModelViewProjectionMatrix.
And then we output it to gl_Position.
So now this position is in clip
space, because we transformed it.
And then the second step here is we take in a texture
coordinate that is specified outside the shader.
We take the two components, S and T
components, and we assign it to this varying.
And this varying, again, it works just
like that diagram I just showed you.
And you can read varTexCoord in the fragment shader.
The second stage of the programmable pipeline.
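The slide itself isn't in the transcript, but reconstructed from the description, the shader would look something like this sketch, using the legacy built-in variables:

```glsl
varying vec2 varTexCoord;

void main()
{
    // Transform the incoming pre-transformed position into clip space.
    gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;

    // Pass the S and T components of the texture coordinate through,
    // to be interpolated and read in the fragment shader.
    varTexCoord = gl_MultiTexCoord0.st;
}
```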
Now there are a couple of things to note about this shader.
We're using these built in variables here.
Now these are sort of a throwback
to the fixed-function days.
And they're only available on the Mac;
they're not available if you use ES 2.0 on iOS 4.
They're based on, sort of, the legacy pipeline.
And actually we would prefer that you
do not use them for a couple of reasons.
There needs to be some mapping
performed in order to use them.
And they're not forward compatible.
OpenGL ES doesn't have them, as I said.
And code that uses them isn't the future of OpenGL.
In fact, there are a number of extensions that we
ship on Mac OS X where you can't use these variables.
There's a whole set of these variables.
On the right-hand side are the varyings
and attributes, you know,
these legacy fixed-function based variables.
And on the left here we've got uniforms.
The model-view-projection matrix, lighting, points: stuff that
just doesn't exist anymore in the programmable pipeline.
But OpenGL has them for legacy reasons.
So instead of using these variables, you can use generics.
Here's an example of a shader, which does
the exact same thing as the previous shader.
Except instead of using built in variables we use our own.
We define our own, so that we have portable code and OpenGL
can map them to the programmable pipeline much more easily.
So we define an input position
and an input texture coordinate.
We don't use the built-in gl_Vertex
or gl_MultiTexCoord0; we use our own.
And we define our own model-view-projection matrix;
we don't use gl_ModelViewProjectionMatrix.
And we set that outside the shader.
And just as in our last shader, we have varTexCoord,
another varying that we output to.
And in the main body, we multiply our model-view-projection
matrix by our input position that we've defined,
and we output it to gl_Position.
Now gl_Position is a built-in variable, but it's
not one of these legacy throwback variables.
You still need to use it.
And then again, we take our input texture coordinate
variable and pass it through to varTexCoord,
so that it may now be read in the fragment shader.
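Reconstructed from the description, the generic-attribute version would look something like this sketch; the names inPosition, inTexCoord, and modelViewProjectionMatrix are assumptions:

```glsl
// Our own uniform replaces gl_ModelViewProjectionMatrix.
uniform mat4 modelViewProjectionMatrix;

// Our own attributes replace gl_Vertex and gl_MultiTexCoord0.
attribute vec4 inPosition;
attribute vec2 inTexCoord;

varying vec2 varTexCoord;

void main()
{
    gl_Position = modelViewProjectionMatrix * inPosition;
    varTexCoord = inTexCoord;
}
```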
The fragment shader.
So this runs once per pixel produced by each polygon.
So let's say you've got a model with 1,000
vertices, and it draws 100,000 pixels.
This shader will be run 100,000 times.
Again, GPUs are efficient at processing this, so that's great.
But in general, if you have some processing
that could be done higher up in the pipeline,
in the vertex shader instead of the fragment shader,
you probably want to do it up there,
because it will be run fewer times.
A little bit less processing, a
little bit less work for the GPU.
The cool thing about this programmable
pipeline is that you can render effects
that aren't possible using the fixed function pipeline.
So here we've got this toon shading effect.
OK? And here's how that works.
Per fragment we calculate this edginess factor.
So if the polygon on which the pixel lies
is more on edge to the user,
we'll give it a value that's closer to zero.
OK? However, if the polygon is facing the camera,
the user, we'll give it a value that's closer to 1.
So we now know whether this polygon is on edge or facing.
And then we could use this edginess
factor to give it a color.
So, for instance, if it's on edge, we probably
want to give it that sort of toon effect, you know,
the silhouette that the teapot
has, so we'll give it a black value.
And otherwise we'll give it a blue value.
And then we assign this color that we've determined
to gl_FragColor, another built-in variable,
which is the ultimate color that
will be rendered for that pixel.
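Here's a sketch of a fragment shader along those lines. The exact math is an assumption; one common way to get the edginess factor is the dot product of the interpolated surface normal and the direction toward the viewer:

```glsl
varying vec3 varNormal;   // interpolated surface normal
varying vec3 varEyeDir;   // interpolated direction toward the viewer

void main()
{
    // Closer to 0 when the polygon is on edge to the viewer,
    // closer to 1 when it faces the camera.
    float edginess = abs(dot(normalize(varNormal),
                             normalize(varEyeDir)));

    // Black silhouette on edges, blue everywhere else;
    // the 0.3 threshold is an arbitrary choice.
    vec3 color = (edginess < 0.3) ? vec3(0.0) : vec3(0.1, 0.2, 0.8);

    gl_FragColor = vec4(color, 1.0);
}
```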
Let's talk about the inputs to the vertex shader.
There's this call, glVertexAttribPointer, which
points to data that can be fed to the vertex shader.
Input position, for instance, or
texture coordinates, things like that.
Here's how it works.
You allocate some memory in your
application; in this case we're using malloc.
And then you load data into this
buffer that you've allocated.
You call glEnableVertexAttribArray to let OpenGL
know that, hey, I'm going to use a vertex array.
And you give it an index, 0 to 15, that maps to the shader.
And then you call glVertexAttribPointer, and you
give it this position data that you've allocated.
OK? This basically tells
OpenGL, hey, my vertex data lives here.
So there are some issues with using OpenGL this way.
Because you're allocating the buffer on your end,
not telling OpenGL to allocate it itself,
OpenGL has to copy this data into its own stream.
So CPU cycles will be required to copy that vertex data.
Here's what happens.
You call glVertexAttribPointer, and OpenGL now
knows about this buffer that you've allocated.
But then when you call glDrawArrays, or otherwise draw
with this buffer, OpenGL copies it into the command buffer.
And there's a double whammy here.
Because the CPU also needs to do the copy,
there are some CPU cycles being incurred.
But also we're filling up the command
buffer much more quickly.
And then a flush will occur.
Flushes will happen much more often, a second whammy.
So instead of doing this, we'd like you to be
able to just cache that vertex data on the GPU.
And here's how you do it:
you can store it in a VBO, a vertex buffer object.
You call glBufferData, and glBufferData allocates some space
on the GPU and then loads your vertex data into the GPU.
Then you'll call glDrawArrays, a command is created and
it simply references this data that's already on the GPU.
You would call glBufferData probably when your application
loads, before you're in the real-time run loop.
There is some cost to it.
But if you do that it'll be cached ready
to be used in a real time run loop.
So here's how it works.
You call glGenBuffers, that creates
this vertex buffer object.
You call glBindBuffer to bind it to the context.
Tell OpenGL, hey I'm going to work with this object now.
And you call glBufferData to allocate and load your data.
Then you call glVertexAttribPointer, the same call
that you made before with client-side vertex arrays.
But this time, instead of giving it
a pointer, you give it an offset
into the vertex buffer object, where your vertex data lives.
So in this case we give it 0, which says, hey, my
vertex data is at the beginning of this buffer.
You can actually store many attributes within a single
vertex buffer object, so it doesn't have to be 0.
Let's say you've got color data 50 bytes down, then
you give it a value of 50 for the color attributes.
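Putting the VBO steps together, here's a sketch of the load-time setup; vertices, the Vertex type, and POS_ATTRIB_IDX are hypothetical names, and a current GL context is assumed:

```c
GLuint vbo;
glGenBuffers(1, &vbo);                /* create the buffer object   */
glBindBuffer(GL_ARRAY_BUFFER, vbo);   /* work with this object now  */

/* Allocate GPU storage and upload the vertex data in one call. */
glBufferData(GL_ARRAY_BUFFER, sizeof(vertices), vertices,
             GL_STATIC_DRAW);

/* Same call as with client-side arrays, but the last argument is now
   an offset into the VBO instead of a pointer; 0 means the data
   starts at the beginning of the buffer. */
glVertexAttribPointer(POS_ATTRIB_IDX, 3, GL_FLOAT, GL_FALSE,
                      sizeof(Vertex), (void *)0);
```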
You may want to modify your data
for animations or some other reason.
If you have a constant number of animations, if you
have few enough of them, just create multiple VBOs.
Let's say you've got 10 frames, 10 models
that you want to animate for your character.
Just make 10 VBOs.
That way all 10 of them are cached on the GPU.
However, you may generate data on the fly,
or maybe you'll load it from disk;
some way that OpenGL may not know about up front.
In this case you can call glBufferSubData or
glMapBuffer to modify this cached vertex buffer object.
Here's how this works.
You call glBufferData as you normally would,
but you give it the GL_DYNAMIC_DRAW hint.
And that says to OpenGL, hey, I'm going to modify this
buffer, so put it someplace that's easily accessible
by the GPU, but can be easily modified by the CPU.
Then you modify the data that you want
to update in the vertex buffer object.
And you call glBufferSubData with this updated data
pointer, and it's loaded into the vertex buffer object.
There are some caveats.
There are some potential problems where, if
you're updating buffers a lot,
you can encounter some problems.
You can force the GPU to sync with the CPU.
So all of a sudden you're running full out in parallel,
and then you call glBufferSubData and
then, lock, one depends on the other.
And you don't want that.
And you don't want that.
So basically what will happen is the CPU will wait for the GPU
to finish drawing with the buffer before it updates the buffer.
OK? Both the CPU and GPU can't read and
write from the buffer at the same time.
So this can happen whenever you use
glBufferSubData or glMapBuffer.
You can use a double buffering
technique to avoid this problem.
And let me explain how that works.
So, you have two vertex buffer objects.
On an odd frame, you'll bind and load an odd buffer.
OK? And you draw with it just as you normally would.
But on an even frame you bind and load this other buffer.
This way the CPU is loading this even buffer, while the
GPU is reading from an odd buffer, a different object.
They don't need to synchronize, because they're
not accessing the same object, the same data.
OK? And then you draw with an even
buffer as you normally would.
And so what you would do is you'd ping-pong between these
two buffers updating one, while the other is being drawn.
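A sketch of that ping-pong, assuming a two-element VBO array and a per-frame update (the names here are hypothetical):

```c
GLuint vbos[2];   /* two VBOs created and allocated at load time */

void update_and_draw(unsigned int frame, const void *data, size_t size)
{
    /* Odd frames use one buffer, even frames the other, so the CPU
       never writes the buffer the GPU is still drawing from. */
    GLuint vbo = vbos[frame & 1];

    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferSubData(GL_ARRAY_BUFFER, 0, size, data);

    /* ...set attribute pointers and issue the draw as usual... */
}
```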
Vertex array layout.
So, glVertexAttribPointer is a really important GL call,
because not only does it tell OpenGL where
your vertices live, it also tells it the vertex layout.
So you call glVertexAttribPointer and you give
it some data about the attribute.
Like, what kind of data is it?
It's floating point in this case.
The size of the data: it's a position,
so maybe it has an X, Y, and Z,
so it needs a value of 3.
The stride, the number of bytes from
one vertex's attribute to the next:
in this case there are 16 bytes
to the next attribute 0 in the array.
And the offset, where it lives within the vertex buffer object.
Call glVertexAttribPointer again and some more
data is updated for a different attribute.
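To make the stride and offset concrete, here's a runnable sketch of a hypothetical interleaved vertex whose attribute 0 entries are 16 bytes apart: a 3-float position followed by a 4-byte RGBA color:

```c
#include <stddef.h>

/* One interleaved vertex: 12 bytes of position + 4 bytes of color
   gives the 16-byte stride from the example above. */
typedef struct {
    float         position[3];  /* attribute 0, offset 0  */
    unsigned char color[4];     /* attribute 1, offset 12 */
} Vertex;

/* The stride argument to glVertexAttribPointer is sizeof(Vertex),
   and each attribute's offset comes from offsetof, e.g.:
   glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex),
                         (void *)offsetof(Vertex, position));       */
```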
OK? So wouldn't it be nice to cache this, so that
you're not always having to call glVertexAttribPointer?
Well, now you can, because now you
have a vertex array object.
And the way this works is you call glGenVertexArrays.
And this creates this vertex array object.
And any subsequent glVertexAttribPointer call
actually changes the data within this VAO.
So it's all cached right there.
Let me show you some code.
You call glGenVertexArrays, create the vertex array object.
You bind it to the context, tell OpenGL, hey;
I'm going to work with the vertex array object.
You call glEnableVertexAttribArray just as you
normally would, and glVertexAttribPointer.
But instead of this getting set in
the context, it's set in the VAO.
You can set multiple vertex attributes
and it'll all cache within the VAO.
And then you can call glBindVertexArray
when you want to use this VAO to draw with.
You don't have to call glVertexAttribPointer many, many
times to set up your model data to be drawn.
You just call glBindVertexArray once, and it's
all cached, ready to go, ready to be drawn.
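A sketch of that flow; the attribute index, the layout, and vertexCount are hypothetical, and a current GL context is assumed:

```c
GLuint vao;
glGenVertexArrays(1, &vao);  /* create the vertex array object      */
glBindVertexArray(vao);      /* subsequent pointer calls land in it */

/* These settings are cached in the VAO, not in the context. */
glEnableVertexAttribArray(0);
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 16, (void *)0);

/* Later, in the run loop: one bind restores the whole layout. */
glBindVertexArray(vao);
glDrawArrays(GL_TRIANGLES, 0, vertexCount);
```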
Framebuffer objects.
These are pretty cool objects.
So with EAGL and OpenGL ES you must
always use an FBO in some form.
The EAGL API guide requires that you use an FBO
to allocate the backing store to which you will render.
However, you can do some pretty cool effects by
attaching a texture to a frame buffer object.
So, here we've got this little demon character.
And what I've done here is we've rendered
this demon character to a texture.
All right?
And then we bind that texture and texture
this plane that you see here on the bottom.
And this plane shows the image that we've rendered
to, which we're now mapping onto this polygon.
So you can do all sorts of reflections,
refractions, shadows.
Some pretty neat effects with a
renderable texture and framebuffer object.
The way this works: you call glGenTextures, create your
texture as you normally would, bind it to the context,
and then you allocate some data with glTexImage2D.
In this case we're making a 512 x 512 texture.
OK? And then we create a framebuffer
object, calling glGenFramebuffers.
And bind that to the context.
Later, when the texture is no longer bound to the
context, we can attach it to this framebuffer object,
which basically says: any drawing that
you do in OpenGL, draw it to this texture.
Don't draw it to the screen.
OK? And then later on you can bind that
texture to the context, and you can read it
and map your rendered image to some other object.
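A sketch of the render-to-texture setup described above, assuming a current GL context; the 512 x 512 size matches the example:

```c
GLuint texture, fbo;

/* Create and allocate the texture that will receive the rendering;
   passing NULL allocates storage without supplying data. */
glGenTextures(1, &texture);
glBindTexture(GL_TEXTURE_2D, texture);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 512, 512, 0,
             GL_RGBA, GL_UNSIGNED_BYTE, NULL);
glBindTexture(GL_TEXTURE_2D, 0);

/* Create the framebuffer object and attach the texture to it. */
glGenFramebuffers(1, &fbo);
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                       GL_TEXTURE_2D, texture, 0);

/* While this FBO is bound, drawing goes into the texture,
   not the screen. */
```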
Some notes about object mutability.
OpenGL objects can be changed at
any time during their lifetime.
So in this case we've created a texture,
and we've given it GL_LINEAR filtering.
And then later on, maybe in a real-time run loop when
the user is expecting an interactive frame rate,
we can set something like GL_LINEAR_MIPMAP_LINEAR
and change that texture object.
Change the way that it works.
I would really avoid doing this, because this forces
OpenGL to revalidate the object the next time it's used.
If you need an object with two different
states, just create two different objects.
Set them up up front; don't change
them, because then OpenGL needs
to do some revalidation, and that's
going to cause a stutter.
Or it's potentially going to cause
a stutter in your application.
So OpenGL objects aren't actually
pre-validated when they're created.
Instead they're validated when they're first used to draw.
Now there's some reasons for this.
Shader objects can't be compiled until they're first used
to draw, because the compiler needs to know some context
state that that shader's going to be used with.
For instance, it may need to know the FBO, VAO, or
textures that are bound, or the blend state that is used
in conjunction with that shader, in order for it to
do a good job compiling and optimizing your shader.
Texture objects and vertex buffer objects, memory resources
don't get cached in VRAM until they're first used to draw.
Now this lazy validation step that I just spoke
about can cause some hiccups in your run loop.
And there are some methods to avoid this.
So you can avoid this validation, this
hiccup by pre-warming your objects.
And the way this works is, you bind the object
and draw with it to an offscreen buffer.
Maybe the back buffer and you don't
present that to the user.
And you use the state and other objects it's used with.
So let's say you've got a shader.
Make sure you use it: you turn on all the blending state,
you bind all the textures that that shader is used with,
and draw with it first, before you're
in your real-time run loop.
Don't wait until you're in your
run loop and cause a stutter.
Here's a little bit of pseudo code.
Oh, one thing I should note is to really only consider
this if your application experiences a hiccup,
particularly when you've bound an object
that you've never used to draw with previously.
So here's some pseudo code for how this works.
For every program in your scene bind it to the context.
For every VAO that's used with that program bind that.
And for every texture used with that VAO program, bind that.
And then for every blend state used with all those
other objects, etc., etc., you set that state.
And then you draw.
Draw to the offscreen buffer.
This isn't to present to the user,
it's just to warm up OpenGL.
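As a concrete sketch of that traversal, here's a small C version of the pseudo code above. The bind and draw functions here are hypothetical stand-ins, not real GL calls; in an actual app they would be glUseProgram, glBindVertexArray, glBindTexture, and a draw into an offscreen buffer. Here they just count invocations so the loop structure is checkable.

```c
/* Hypothetical stand-ins for glUseProgram, glBindVertexArray,
   glBindTexture, and an offscreen draw. They only count calls. */
static int binds = 0, draws = 0;
static void bind_program(int p)  { (void)p; binds++; }
static void bind_vao(int v)      { (void)v; binds++; }
static void bind_texture(int t)  { (void)t; binds++; }
static void draw_offscreen(void) { draws++; }

/* Pre-warm every (program, VAO, texture) combination the scene will
   use, so lazy validation happens here instead of mid-frame. */
void prewarm(int nprograms, int nvaos, int ntextures)
{
    for (int p = 0; p < nprograms; p++) {
        bind_program(p);
        for (int v = 0; v < nvaos; v++) {
            bind_vao(v);
            for (int t = 0; t < ntextures; t++) {
                bind_texture(t);
                draw_offscreen();   /* never presented to the user */
            }
        }
    }
}
```

In a real app you would only iterate the combinations your scene actually uses, not the full cross product.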
Note about object size.
Know how much memory your objects take.
All current graphics resources
need to fit in memory somehow.
And memory is limited.
On iOS 4 there's no virtual memory
system, so this constrains you.
On Mac OS X, some of the devices have limited VRAM.
So, you know, if you're using too
much memory there's some cost to it.
There's some paging that might need to happen, and
you don't want to have that in a real time run loop.
Try to use compressed textures.
Textures take up a lot of memory.
There are a number of compressed
texture formats that you can use.
Also, don't use data types like a 32-bit float
for your textures when you could be
using an 8-bit unsigned byte.
Use what you need.
Use the smallest sizes possible
that you need for your scene.
A lot of the time, we see applications allocate a humongous
texture, a 2,000 by 2,000 pixel texture, for a little tiny model
that's going to fill up maybe only 200 pixels on the screen.
You're never going to see most of that texture.
It's just a waste of resources.
So fit the texture to the size of the model that's rendered.
Use a 256 x 256 texture in that case.
And really, to ensure a smooth frame rate, try to fit the
entire frame's resources into VRAM, so that we never need
to page out a texture from VRAM in the middle of a frame.
And if possible, fit the entire level
or scene's textures into VRAM.
That way we will never need to
page in the middle of your scene.
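To put some numbers on that, the arithmetic is just width times height times bytes per texel for each mip level. This is a minimal sketch; the 2,048-pixel size and the 16-byte-per-texel RGBA float format below are illustrative assumptions, not figures from the session.

```c
#include <stddef.h>

/* Bytes for one mip level: width * height * bytes per texel.
   A full mipmap chain adds roughly another third on top. */
size_t texture_bytes(size_t w, size_t h, size_t bytes_per_texel)
{
    return w * h * bytes_per_texel;
}
```

For example, a 2048 x 2048 RGBA texture stored as 32-bit floats (16 bytes per texel) costs 64 MB, while a 256 x 256 RGBA texture at 8 bits per channel (4 bytes per texel) costs only 256 KB.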
So, OpenGL's objects are a very
efficient way to use the API.
However, there is still some cost to binding an object.
You really need to determine this
cost through profiling, however.
Batch your draw calls to reduce the
binding of more expensive objects.
OK? Let's say you've determined that some texture
takes a really long time to bind, or takes a fair
number of CPU cycles to bind, and two objects use this texture.
Don't bind it, draw, bind it again, and then draw.
Instead, bind it once, draw the first object,
and then draw the second one.
Visibility.
OpenGL processes everything sent to it in some
form, even if it's ultimately not visible.
You know, we're not going to draw something
that's, you know, behind the camera.
But there is some processing that needs to occur.
The CPU needs to process the draw call.
The vertex shader needs to run to determine whether
or not the camera can actually see this object.
So try not to send objects that can't be seen to OpenGL at all.
Imagine this is the scene of your application.
Here's the view point, the user, the camera.
Here's its field of view.
And here is the frustum.
OK, this includes the right and left clip planes,
the top and bottom clip planes,
and the Z near and far clip planes.
So now your app will iterate through your list of objects
and determine whether or not they're visible.
Anything within this area we'll send to OpenGL.
Anything outside of this frustum we'll
just discard, we won't send it to OpenGL.
We check this robot; he's clearly behind the camera.
We won't draw him.
We check this hero character; he's off to the right.
We won't draw him.
We check this demon; hey, he is definitely
visible, so we'll mark him as object 0.
We check this other robot; hey, you know, he's visible.
So we'll draw him.
We check this other demon character;
he's not visible way off to the left.
We won't send him to OpenGL.
This hero character, he's definitely visible.
He's also in the frustum.
So we'll send him to OpenGL and then
this demon character also visible.
We'll send him to OpenGL.
OK? So now we have the set of objects
that we want to send to OpenGL.
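A common way to implement that frustum test is a bounding-sphere-versus-planes check: an object is culled if its bounding sphere is entirely behind any one of the six planes. This is a minimal sketch assuming the planes are stored with normals pointing into the frustum; the Vec3 and Plane types are my own, not from the session.

```c
typedef struct { float x, y, z; } Vec3;

/* Plane nx*x + ny*y + nz*z + d = 0, normal pointing into the frustum. */
typedef struct { float nx, ny, nz, d; } Plane;

/* Returns 1 if the bounding sphere is at least partly inside all six
   planes, 0 if it is fully outside any one of them. */
int sphere_visible(const Plane planes[6], Vec3 c, float radius)
{
    for (int i = 0; i < 6; i++) {
        float dist = planes[i].nx * c.x + planes[i].ny * c.y
                   + planes[i].nz * c.z + planes[i].d;
        if (dist < -radius)
            return 0;   /* completely behind this plane: cull it */
    }
    return 1;
}
```

Objects that pass go into the visibility list; everything else never reaches OpenGL.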
You can put this inside of a visibility list.
Ok? Now one thing to note about this
is you don't want to draw the objects
in the same order they were determined visible.
Ok? You want to sort it by render state.
What would happen here is you'd bind the demon's texture,
draw him, bind the robot's texture, draw him,
bind the hero's texture, draw him,
and then we'd rebind the demon's texture
that we bound originally and draw the second demon.
OK? Instead what you want to do is sort them, so
that now the demons are together: one bind, two draws.
All right, so here's an algorithm that
you can use to sort your rendering state.
It's called a state tree.
So let's say you've got these four characters.
You've got these two guys and they're
clearly using the same texture.
And then you've got this human character
with some metal armor on him, and this robot
that's clearly all metal, so you want a shader
that does a little shininess effect.
They use the same shader.
You stick them inside of a tree
and you traverse it in order.
So starting from the top, you bind the first shader.
You bind the texture that's used by these two guys.
You bind the VAO, and you render the demon.
You go up and you've already bound
the texture, don't do it again.
You bind the new VAO and you render this guy.
Go up to the top and now we bind this new GLSL program.
This shininess program.
And we bind the texture, bind the
VAO, render this robot, go on up.
You've already bound the GLSL program, so now we just bind
the texture for this human guy, bind his VAO, and render him.
Now it looks kind of like a binary tree here,
but it would actually be an n-ary tree.
This is a very simplistic scene.
You probably would have, you know, many, many
shaders, which would make this tree much wider.
Also, you might want to account
for different rendering state.
Like you might want to account for depth, blend, clip
state, etc., which would make this a deeper tree.
OK? In this case I determined that the GLSL program
objects I said, well those are pretty expensive to bind.
So I'm going to put them at the top of the tree.
We're going to set those first, so that
we don't have to rebind them very often.
OK? So the more expensive objects at the top of your tree,
less expensive objects like vertex
arrays towards the bottom.
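One simple way to get the same effect as a state tree, without building an actual tree, is to sort a flat list of visible objects keyed on the most expensive state first. This is a sketch under that assumption, with made-up integer handles standing in for GL object names:

```c
#include <stdlib.h>

/* One visible object, tagged with the GL objects it needs.
   Integer handles stand in for real GL object names. */
typedef struct { int program, texture, vao; } DrawItem;

/* Order by the most expensive object first (program), then
   texture, then VAO, so equal states end up adjacent and
   each object only has to be bound once per run. */
static int by_state(const void *a, const void *b)
{
    const DrawItem *x = a, *y = b;
    if (x->program != y->program) return x->program - y->program;
    if (x->texture != y->texture) return x->texture - y->texture;
    return x->vao - y->vao;
}

void sort_by_state(DrawItem *items, size_t n)
{
    qsort(items, n, sizeof *items, by_state);
}

/* How many texture binds a given draw order would cost. */
int count_texture_binds(const DrawItem *items, size_t n)
{
    int nbinds = 0, last = -1;
    for (size_t i = 0; i < n; i++) {
        if (items[i].texture != last) {
            nbinds++;
            last = items[i].texture;
        }
    }
    return nbinds;
}
```

For the demon/robot/hero/demon example above, the unsorted order costs four texture binds and the sorted order costs three.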
One way in which you can reduce CPU overhead, the CPU
overhead of draw calls, is to combine the draw calls.
Basically, make fewer draw calls.
And there are two methods that I'll talk about.
One is texture atlasing.
And this is basically combining
multiple textures into a single texture.
And the second one is instancing.
Instancing requires some special hardware
that's only available on Mac OS X.
My colleague Matt Collins will be talking about
that a little bit more tomorrow and how to use that.
I'm actually going to talk about texture atlasing.
Here you have these four characters, and you've
got these four textures to map to these characters.
The naive way to draw this would be to bind the first
texture, draw with it, bind the second texture, draw with it,
bind the third texture, draw, bind the fourth texture, draw.
OK, you're binding four times for four textures.
Instead you can combine them into one uber texture.
A texture atlas.
Then you bind that one texture, draw, draw, draw, draw.
One bind, four draws.
Much more efficient use of OpenGL.
Here's an example of a texture atlas used in the Quest demo.
Here we've got a lot of different
elements in a single texture.
We've got some flags on the upper right.
We've got a stone wall in the upper left.
We've got stairs, we've got doors, we've
got a statue, all in one texture, one bind,
and they can draw a ton of different
things in their dungeon.
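When tiles are packed into an atlas, the model's UVs have to be remapped from per-tile space into atlas space. Here's a minimal sketch for an atlas laid out as a uniform grid; real atlases, like the one in the Quest demo, are usually packed irregularly, so the offsets and scales would come from a lookup table instead.

```c
typedef struct { float u, v; } UV;

/* Map a tile's local UV (each in 0..1) into atlas UV space, for an
   atlas laid out as a grid of cols x rows equally sized tiles. */
UV atlas_uv(int tile_col, int tile_row, int cols, int rows,
            float u, float v)
{
    UV out;
    out.u = (tile_col + u) / (float)cols;
    out.v = (tile_row + v) / (float)rows;
    return out;
}
```

You'd typically bake this remapping into the vertex data once at load time rather than doing it per frame.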
Multithreading in OpenGL.
So because there is a fair amount of CPU overhead that
OpenGL incurs, there are reasons to multithread it,
so that you can amortize the CPU
costs across multiple cores.
This makes sense on iOS 4 devices as well,
even though they only have a single core.
CPU intensive calls can block.
And that means that while they're blocking, while
you're in your main loop doing some OpenGL processing,
you can't handle UI events, you can't
handle audio, you can't do your app logic.
Additionally, if you're doing a lot of stuff on the main
thread, there is this watchdog process that looks
at your app and determines whether
you're in some infinite loop,
whether you're behaving badly, and it may kill your app.
So it may mistake your app for being stuck in an infinite
loop, or blocked in some sort of deadlock,
and just kill your application.
So you could put some of this processing on another
thread, and the watchdog won't do this to you.
So here is the simplest multithreading
technique I'll describe.
And basically you create a second, maybe a third
thread, each with its own OpenGL context.
And you can use these threads to load data.
Load textures.
Load vertex data.
Compile shaders.
A lot of CPU heavy lifting.
An important thing to know is, once you're done with
all that loading, you want to kill your other threads.
You only want to have one thread
running with an OpenGL context.
OK? Because when two OpenGL contexts
are running at the same time,
there is some CPU overhead that might be
incurred, because there's some locking.
And so two OpenGL threads can block one another.
So just have one OpenGL thread running at a time.
Another more advanced technique is
to use a producer consumer paradigm.
So in this case the main thread produces data:
it can produce, you know, the animation frame
your characters are in, their positions on the screen
or in the world, and it can compute the visibility
of your objects using that frustum culling technique.
On this thread you won't have an OpenGL context; you're
just producing data to be rendered by a second thread.
When the producer thread is done,
you signal the render thread to begin.
And then the render thread can take all that
data, consume it, and render with OpenGL.
That render thread has the only OpenGL context on it;
you don't have two OpenGL contexts.
The main thread can then process
audio input and other app logic
in parallel while the render thread
is actually drawing stuff.
And let's say you still have some CPU
overhead and it's not well balanced.
You can move some work, like maybe the visibility
test, to the render thread to get
a more even distribution.
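Here's a minimal sketch of that producer/consumer handoff using a single-slot mailbox and pthreads. The "frame data" is reduced to an integer and the GL drawing to a counter, which are my simplifications; in a real app the consumer thread would own the only OpenGL context and do the actual rendering.

```c
#include <pthread.h>

/* Single-slot mailbox: the producer (main thread) deposits one
   frame's worth of render data; the consumer (render thread,
   which would own the only GL context) picks it up. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  ready;
    int frame;        /* the "data": here just a frame number */
    int has_frame;
    int done;
    int consumed;     /* frames the consumer has "rendered" */
} Mailbox;

static void *render_thread(void *arg)
{
    Mailbox *m = arg;
    for (;;) {
        pthread_mutex_lock(&m->lock);
        while (!m->has_frame && !m->done)
            pthread_cond_wait(&m->ready, &m->lock);
        if (m->has_frame) {
            m->has_frame = 0;
            m->consumed++;               /* draw with GL here */
            pthread_cond_signal(&m->ready);
        } else {                         /* done and no frame left */
            pthread_mutex_unlock(&m->lock);
            break;
        }
        pthread_mutex_unlock(&m->lock);
    }
    return NULL;
}

int run_frames(int nframes)
{
    Mailbox m = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER,
                  0, 0, 0, 0 };
    pthread_t t;
    pthread_create(&t, NULL, render_thread, &m);
    for (int i = 0; i < nframes; i++) {
        pthread_mutex_lock(&m.lock);
        while (m.has_frame)              /* wait for previous frame */
            pthread_cond_wait(&m.ready, &m.lock);
        m.frame = i;                     /* animate, cull, etc. */
        m.has_frame = 1;
        pthread_cond_signal(&m.ready);
        pthread_mutex_unlock(&m.lock);
    }
    pthread_mutex_lock(&m.lock);
    m.done = 1;
    pthread_cond_signal(&m.ready);
    pthread_mutex_unlock(&m.lock);
    pthread_join(t, NULL);
    return m.consumed;
}
```

While the render thread is consuming a frame, the main thread is free to handle input, audio, and app logic before producing the next one.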
So I've talked a lot about using the
CPU, or the CPU and the GPU together.
You need to also consider using the
GPU with the display, the other device.
That is important to consider when coding your OpenGL.
One thing to note is you can only render
as fast as the display can refresh.
It doesn't make sense for you to render at 200 frames
a second if the user can only see 60 frames a second.
OK? It just wastes battery power.
You're doing a lot of processing whose
result the user will never see.
On iOS 4 you can use the CADisplayLink API.
We see a lot of apps using the NSTimer
API to initiate their per-frame rendering.
Instead, use the CADisplayLink API, because when NSTimer
fires is arbitrary with respect to the display.
What will happen is NSTimer might fire right before
the refresh, and so there's going to be some latency
between the time you draw and then you can see it.
Or it might be fired right after display has refreshed.
So it's not going to be consistent with respect to display.
And in some very pathological cases
it can reduce your frame rate.
On Mac OS X you can use the analogous API, CVDisplayLink.
Now we see a lot of games trying to control their main loop.
This is a kind of an outdated way of doing it.
Because, again, you don't need to display, or don't
need to render any faster than the user can see.
Looping more than needed wastes power, particularly on
these MacBooks where people want a fairly long battery life.
You could implement some benchmark mode for advanced
users or developers that runs on the main loop.
But in your shipping game, under normal running, you may
just want to use CVDisplayLink to initiate your rendering.
A note about coding for both platforms.
So OpenGL ES is a subset of OpenGL on the desktop.
So if you code using OpenGL ES 2.0, you can port the code
that you've invested a lot of time in on iOS 4 onto the Mac.
And vice versa.
OK? So if you've got this Mac application, and if you
stick to all the calls that OpenGL ES 2.0 provides,
you can pretty easily port it to the iPhone OS 4.
There are some things to be aware of.
Clearly there are more memory and performance
constraints on the embedded iOS 4 devices.
So you need to consider that.
There are different compressed formats.
You will need to translate for that.
And there are, for some kind of silly
reason, slightly different function names.
Now, the parameters of these functions are exactly the same.
On OpenGL ES we have this BindVertexArrayOES
function, and it's the exact same function
as BindVertexArrayAPPLE; it just
has a slightly different name.
So you just need to rename your functions.
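One low-tech way to handle the renaming is a macro shim, so shared code calls a single name on both platforms. This sketch uses stub functions and a made-up TARGET_GL_ES define as stand-ins; in real code the macros would map directly onto glBindVertexArrayOES and glBindVertexArrayAPPLE.

```c
/* The two platforms' entry points differ only by suffix, so one
   macro per function lets shared code use a single name.
   These stubs just record the bound VAO for illustration. */
static int bound_vao = -1;

void glBindVertexArrayOES_stub(int vao)   { bound_vao = vao; }
void glBindVertexArrayAPPLE_stub(int vao) { bound_vao = vao; }

/* TARGET_GL_ES is an illustrative define, not a real GL symbol. */
#ifdef TARGET_GL_ES
#define glBindVertexArrayShim glBindVertexArrayOES_stub
#else
#define glBindVertexArrayShim glBindVertexArrayAPPLE_stub
#endif
```

With a small header of such shims, the rest of your rendering code compiles unchanged on both platforms.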
The sample code that I provided for this session that
you can find for the session's site, compiles for both
and you can kind of use it as a template or
maybe a starter for creating your application
that you might want to port and ship on both platforms.
It's this kind of cool little reflection demo.
And it works pretty well on both platforms.
So in summary, there is a fair amount of
CPU overhead that OpenGL needs to incur,
and you can minimize this to efficiently access the GPU.
Validation is where a lot of this CPU overhead occurs.
You can use OpenGL objects to cache this validation,
and minimize state changes and draw
calls to reduce it.
For more information you can contact our Graphics
and Game Technologies Evangelist, Allan Schaffer.
As well there's a ton of documentation
on OpenGL at the OpenGL Dev Center.
And there are a lot of engineers from Apple at
these Apple Developer Forums, so you can get help
and ask questions throughout the
year at this devforums.apple.com.
Also, I've posted a bunch of code snippets
and the sample code at this link down here.
So you can check that out.
There's a Q&A on the different variables that you shouldn't
use in your GLSL program and some other information.
This is just the first of six OpenGL
sessions at this year's WWDC.
And there's some great information for both
Mac and iPhone developers, or iOS 4 developers.
The first is the OpenGL ES Overview for the iPhone,
and that's mainly geared toward iOS 4 developers.
And the second is OpenGL ES Shading and Advanced Rendering.
And this is a pretty cool one.
My colleagues have come up with some great demos.
And even if you're a Mac developer, a lot of
these demos are easily portable to the Mac.
So they're going to be talking about some shadows,
some reflections, some really cool techniques
that you should probably check out
even if you're a Mac developer.
There's also an OpenGL ES Tuning and Optimization session.
And this is going to be great for developers
coding for iOS, but also some of the techniques
that they mention you'll be able
to use and leverage on the Mac.
They're going to be describing a tool that's
available on iOS 4, the OpenGL Analyzer.
And even though it works only on iOS 4,
if you're a Mac developer you can use some
of the same techniques to profile your applications.
They're also going to be talking
about some of the GPU bottlenecks.
Now I talked a lot about CPU bottlenecks, but there is a
whole class of bottlenecks on the graphics processing unit
that you should be aware of and potentially could run
into after you've optimized the CPU portion of your app.
And then tomorrow morning at 9 o'clock there's OpenGL
for Mac OS X, and my colleague Matt Collins will be talking
about a number of the newly available features
on Mac OS X: instancing, texture arrays.
He's also got some really cool demos
that you might want to take a look at.
And then finally there's this Taking
Advantage of Multiple GPUs session,
and they're going to be talking
about OpenCL, the other GPU API.
And it's used in conjunction with OpenGL
and leveraging multiple GPUs on the Mac Pro
to do some really cool processing with OpenCL and really
great rendering with OpenGL on two different devices.
So thanks a lot for coming.
I really appreciate it.
And I'm hoping you guys will be able to take these
techniques and really efficiently use OpenGL.
[ Applause ]