Transcript
>> My name is Gokhan Avkarogullari. I'm
with the iPhone GPU software group at Apple.
And my colleague Richard Schreyer and I will be talking
about OpenGL ES, the iPhone implementation of OpenGL:
what's new in it and all the new things we added in iOS 4.
So last year at WWDC we introduced
OpenGL ES 2.0 on the iPhone 3GS.
Since then, we introduced third-generation
iPhone, third-generation iPod Touch, iPad,
and we're going to very soon release iPhone 4 and iOS 4.
With all this new hardware and software, there
are a lot of new features added to the system.
Today we're going to give you an overview
of the new extensions that we added.
We're going to talk about the retina display, the higher
definition display, the higher resolution, and what it means
from OpenGL's perspective, which is a pixel-based API.
We're going to talk about the impact of high
resolution displays on OpenGL based applications.
And finally, we're going to talk about multitasking,
and how you can make the user experience the greatest
with the changes to your OpenGL API calls.
We're going to talk about new extensions first.
Multisample Framebuffer is an extension we implemented
to help resolve aliasing issues in rendering operations.
So let's look at what aliasing is.
Here is an example screenshot from
Touch Fighter application.
If you draw your attention to the edges of the wings of the
fighter plane, you can see that there's a staircase pattern,
a jaggedness to it, on all edges on the wings of the plane.
If you do a close-up, you can see the staircase
pattern clearly in the zoomed-in version.
This is actually significantly worse in the
live application, because the pattern changes
from frame to frame, and the edges look like busy lines.
So multisampling helps us deal with this problem
by smoothing out the edges: rendering
into a higher resolution buffer and then generating
the result from that higher resolution buffer.
So how can we do it?
This is your regular way of creating
framebuffer objects without multisampling.
You would have a framebuffer object that you would use
to display the results of your rendering operation,
and it would have, normally, a color attachment, a
depth attachment, sometimes a stencil attachment.
If you want to do multisampling in your
application, you will need two of these framebuffer objects.
One to display the results of your rendering
operation, and the other one to generate the --
to do the rendering into where you
can get higher resolution images.
On the right is the one that you will use for
displaying the results of your rendering operation.
It only has a single attachment in this case,
a color attachment, and it's not filled in yet.
On the left is the multisample framebuffer object
where the rendering operations initially take place.
You can see that it has a depth attachment;
depth testing takes place on this framebuffer object,
which is why the other one doesn't have a depth attachment.
And you can see that the buffers in the
multisample framebuffer object are filled in.
Basically, they hold the result of your rendering commands.
Note also that the sizes of the buffers
are different on the right and left.
We're assuming 4x multisampling in this example,
and therefore the buffers attached
to the multisample framebuffer object are four times the size
of the buffer attached to the display framebuffer object.
So at the end of your rendering operation, you do a resolve,
which means that you take the average
of the color values of the samples for each pixel,
generate a single color value for the pixel,
and write it out to the framebuffer object that you're
going to use to display your images on the screen.
So let's look at how you can do it in the code.
Here's a single-sampled framebuffer creation. You have
a color buffer: you generate it, bind it to your FBO,
get backing storage for it from the CALayer,
and finally attach it
to the color attachment of your framebuffer object.
Similar operations take place for the depth buffer.
You generate it, bind it, and get storage for it through the
render buffer storage API call, and now finally attach it
to the depth attachment of your framebuffer
object in the single sample case.
Now, as I told you before, you have to create two
framebuffer objects for the multisampling operation.
So let's look at how the framebuffer objects are
created for the multisample framebuffer object.
The difference is mainly in how you
allocate storage for your buffers.
In the multisample framebuffer case, the color buffer storage
comes from the glRenderbufferStorageMultisampleAPPLE API,
just like in the depth renderbuffer case.
The difference between this API
and the previous one is you specify
to OpenGL how many samples there will be in your buffer.
So in this case, in this example, it could be four.
Four samples to use.
And the buffers that are created will take into account
that you are planning to use four samples per pixel,
and they allocate buffers based on that information.
Let's look at how you would normally
do a render in a single sample case.
You would bind your framebuffer object, set your viewport,
issue your draw calls, and finally bind the color buffer
that you want to display on the
screen, and then present it on the display.
With the multisampling case, you have two framebuffer
objects, and you do the rendering operations
to one and display operations on the other one.
So to do the rendering operation on
the multisample framebuffer object,
you need to bind multisample framebuffer object first,
and set your viewport to your regular
draw operations, just like you did before.
But you need to get the data from the multisample
framebuffer object to the single sample framebuffer object.
Now, for that, we need to set the multisample
framebuffer object as the read target of the resolve operation
and the display framebuffer object as the draw target
of the resolve operation, and finally call
the glResolveMultisampleFramebufferAPPLE API
to get the contents of the multisample framebuffer object,
downsampled to screen size, into the display framebuffer object.
And just as before, you would bind
the color buffer so you can display it on the screen.
So as you can see, the changes to your application
to get multisampling enabled are very simple.
You need to change your initialization code
to generate a multisample framebuffer object,
and you need to change your rendering just a little
so that you can do a final resolve
operation at the end of the rendering loop.
You might be thinking, "What kind of performance
implications that would have on my application?"
And the performance implications differ
depending on what kind of GPU the product has.
All iPhone OS devices starting with the
iPhone 3GS have a PowerVR SGX GPU.
And the PowerVR SGX GPU has native hardware support
for the multisampling operation. It understands the difference
between what a sample is and what a pixel
is, and therefore it runs your shaders
to generate a color value per pixel, not per sample.
But it keeps depth values for each sample, and it also
does depth testing per sample, not per pixel.
So in the four-sample case, it would do four
depth tests and generate four depth values,
but a single color value.
And depending on whether the depth tests pass or fail,
it might have the same depth value and same color value
for all four samples, or different depth
and color values for each sample, depending on whether the pixel is
on the edge of a polygon or in the interior of a polygon.
And after all these values are computed for each
sample and your rendering operation is done,
it takes the averages of the color values and that one
would be your final color value after the resolve operation.
PowerVR MBX Lite is the GPU that we use
on all our products before the iPhone 3GS.
Unfortunately, it doesn't have native
hardware support for multisampling.
Therefore, we implemented multisampling
in that case as supersampling.
That means that we use high resolution buffers, and since
the MBX Lite doesn't know the difference between a sample
and a pixel, it does full processing
for each pixel in these high resolution buffers.
So it generates a color value for each sample, generates
depth values and does depth testing for each sample,
and finally all these values are resolved into a single
value written into your result framebuffer object.
That means there is more performance impact on the
PowerVR MBX Lite GPUs than on the PowerVR SGX GPUs.
So you might be thinking to yourself, "I could
already have done that with render-to-texture.
I could have created an FBO with larger color
attachments as textures, read them back, sampled
from them, and had the multisample
behavior in my application already.
So what is the difference?"
The difference is that you're giving us the
information that your intention is to do multisampling.
So we can actually avoid the readback-and-average
operation you would do at the end
with render-to-texture multisampling.
When we generate the high resolution
values, we know that you're going to use those values
to create a single-sampled version,
so we write out both the high resolution
and the lower resolution ones.
And I'm going to talk about another extension that helps
performance with the multisampling next, that's the discard.
With the discard, the difference between using
multisampling and render-to-texture becomes even larger.
But first, let's see a small demo of the quality
impact multisampling has on your application.
So, in this case, as you can see, there is a plane
in the front, the edges of the wings are really busy,
you can see the patterns changing from frame to frame.
And if you look at the planes
farthest away, it's even more visible over there.
When I turn on multisampling, it becomes
significantly better than what it was before.
To eliminate aliasing entirely you
would need infinite resolution,
but this is significantly better
than what we had before.
And if I go back and forth, it will be more visible to you,
the change from non-multisampled
case to the multisampled case.
As I said, multisampling is very
easy to get into your application.
And you should try, experiment, see what kind
of quality difference it makes and what kind
of performance impact it has on your application.
We're going to talk about a second extension,
the Discard Framebuffer extension.
Discard Framebuffer helps fillrate performance with multisampling.
And it even helps with the fillrate
performance in the non-multisampled case.
Usually once a frame is rendered, the depth
and stencil values that are associated
with that frame are not needed
for rendering of the next frame.
In a multisampled case, not only the
depth and stencil but the color attachment
of the multisampled framebuffer object
is not needed to render the next frame.
So by using this extension, you can
tell OpenGL that you don't need it,
and we can discard those values
without writing out to those buffers.
Which means that a significant amount of
memory bandwidth can be preserved and not used
for this operation, the writing-out operation.
If your application is fillrate-bound, that is,
bound by the amount of memory activity,
then this extension will help significantly in getting
better performance from your application.
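To see why this matters, you can estimate the write traffic that discarding an attachment avoids. The arithmetic below is a back-of-the-envelope sketch; real GPU traffic also depends on tiling and compression, and the 480x320 at 60 fps figures are just example numbers:

```c
#include <assert.h>
#include <stdint.h>

/* Bytes per second written for one attachment that is flushed to
 * memory every frame. Discarding the attachment avoids exactly
 * this write traffic. */
static uint64_t write_bandwidth(uint32_t width, uint32_t height,
                                uint32_t bytes_per_pixel, uint32_t fps) {
    return (uint64_t)width * height * bytes_per_pixel * fps;
}
```

For a 480x320 depth buffer at 4 bytes per pixel and 60 frames per second, that is roughly 36.9 MB/s of writes saved, and four times that for a 4x multisample buffer.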
Here's how it conceptually works.
Here's our framebuffer object,
it's a single sampled example.
Our framebuffer has color and depth attachment.
Right now not filled in.
And the GPU has finished rendering
and has generated the color and depth values.
Without the Discard extension,
the next step would be the GPU writing
out those values to the color and depth attachments.
In most cases, what we don't really need
is the depth values for rendering the next frame.
So if you use Discard, you can avoid writing out
the depth value, and it will never leave the GPU.
And this way, the write-out section, the amount of memory
used for writing out those values will be available
to the application, to read more
data into your rendering operations.
Let's look at the code example.
This is how you normally render in a
single-sample case when you're not using Discard.
You would have your framebuffer bound, and you
would set your viewport, issue rendering commands,
and finally present your color buffer on the screen.
But with Discard, all you need to do is define
which attachments you're going to discard,
and specify them through the
glDiscardFramebufferEXT call.
Whenever possible, OpenGL will then
avoid writing out to those buffers.
You can imagine that in a multisample case,
you've created buffers, they are four times larger
than the original single sample case, so there's
significantly more memory activity for writing out color
and depth values of the multisample buffer.
So Discard makes a significant difference
in terms of fillrate performance
of your application, especially in the multisampled case.
But even in the non-multisampled case, you will still
be avoiding writing to the depth and stencil buffers
and it will help your fillrate in your application.
That was the Discard Framebuffer extension.
We have another extension that might help
you, that might help on the performance side
of your application, and that's Vertex Array Objects.
If you were at the morning session about OpenGL,
GL objects in general were discussed in detail,
and Vertex Array Objects were one of them.
What Vertex Array Objects do is they encapsulate
all the data for vertex arrays into a single object.
Things like what the offset for your pointer
is into your vertex arrays for positions, normals,
or texture coordinates; what the size of each
element is within those arrays; what the stride is;
which arrays are enabled; whether there's an index buffer or not.
They're all encapsulated into a single object.
So when you switch between different Vertex Array Objects,
once you bind one, all of that information becomes
immediately available for the GL to take advantage of.
So how does this help you?
First of all, it provides a great convenience.
Once you load your assets for an
object, you can load the vertex arrays
and encapsulate the entire state into
a single object at load time.
And if you're not dynamically updating your
object ever after, it means that you're not going
to issue these commands ever again, except for
binding it and drawing from the Vertex Array Object.
It also allows us to do optimizations.
Since this state is validated only once, at
load time, we don't have to revalidate it.
Unless you update it, we know that
nothing has changed and nothing has to be revalidated.
So it saves you CPU time on state validation.
On top of that, the Vertex Array Object gives us
very good information about the vertex layout,
the layout of your data for your vertices for the arrays.
So we can make use of that to find
out how to reorder them if necessary
to get better performance out of the drawing operations.
Let's look at the code example.
This is how you would render when you are
using a Vertex Buffer Object.
You would bind your VBO, and you would set
your pointers for your position, for your normal,
for your texture coordinates, and you would
enable the non-default arrays, such as the normal array
and the texture coordinate array, and then you will do the
draw operation by calling glDrawArrays or glDrawElements.
And finally, you have to set the state back so it
doesn't negatively impact the next rendering operation.
This has to be done at every draw call.
You have to specify these things when
you're using VBOs at every draw call.
With VAOs, you only need to specify
which VAO you're going to use,
because all that information is
already encapsulated in the VAO.
We know about that state, so you don't have to re-specify it;
we use it and then draw based on that VAO.
This is possible because you've done the work to
define the state only once, at setup time.
You basically bind your VAO, you specify where
the pointers and strides and all these things are,
and you specify which states are enabled, and they are
basically written out, captured in one place at one time,
and can be re-used over and over again
for the subsequent drawing calls.
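Conceptually, the one-time capture works like the sketch below: all of the pointer, stride, and enable state is recorded in one object at setup, and a later bind is just a reference to that object. The struct names and the interleaved position/normal/texcoord layout are assumptions for illustration; this is a model of what a VAO stores, not the GL API:

```c
#include <assert.h>
#include <stddef.h>

/* A conceptual model of the state a Vertex Array Object captures. */
typedef struct {
    size_t offset;  /* byte offset of the first element in the VBO */
    int    size;    /* components per element, e.g. 3 for xyz      */
    size_t stride;  /* bytes between consecutive vertices          */
    int    enabled; /* whether this array is enabled               */
} ArrayState;

typedef struct {
    ArrayState position, normal, texcoord;
} VertexArrayObject;

/* One-time setup: capture the whole interleaved layout
 * (3 position floats, 3 normal floats, 2 texcoord floats). */
static VertexArrayObject make_vao(size_t stride) {
    VertexArrayObject vao = {
        { 0,                 3, stride, 1 },  /* position */
        { 3 * sizeof(float), 3, stride, 1 },  /* normal   */
        { 6 * sizeof(float), 2, stride, 1 },  /* texcoord */
    };
    return vao;
}
```

After setup, "binding" is just selecting which captured object to use, which is why the per-draw cost drops to a single call.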
That was Vertex Array Object.
As you can see, it's very easy to take
advantage of, and it gives you a performance boost.
It will hopefully also make your code less error-prone
and more readable, because there
will be fewer lines of code.
There are six more extensions I'd like to talk about.
I'll give information on each one and how
you can use them in your applications.
The first one I'd like to talk about is
the APPLE_texture_max_level extension.
This extension allows the application to
specify the maximum (coarsest) mipmap level
that may be selected for texturing operation.
So it helps us control the filtering
across atlas boundaries.
Because texture atlases contain textures for multiple
different objects, as mipmap levels get smaller,
some filtering across atlas boundaries
takes place and generates visual artifacts.
This extension is implemented to solve that problem.
You can enable this extension by making a glTexParameter
call with GL_TEXTURE_MAX_LEVEL_APPLE,
and you can specify up to which mipmap level
you want to use for the texturing operation.
Let's visualize this.
This is a texture atlas from the Quest game.
You can see that there are textures in one, single texture
object, textures for walls, for stairs, for statues,
and all of them are here in one single texture atlas.
If you look at the mipmap levels, I have them
from 256 by 256 down to 1 by 1 in this picture;
this is a visualization of the
mipmap levels in terms of pixels.
But let's look at the mipmap levels
in terms of the coordinates.
If you were to use 0 to 1 coordinates
for the entire texture atlas,
the 0 to 1 coordinate on this mipmap
levels will look like this.
At the very top level, it will
be exactly correct and perfect.
It will have all the necessary pixels in it.
This is a 256 by 256.
But at the lowest level, 1 by 1, the 0 to 1 coordinate
span is only one pixel, and therefore it has only one color.
So if you can imagine that you have the stairs, the walls
and the statues, close up, they will use the first one,
the 256 by 256 one, but farther away, they will use this
one, or something small, something closer in mipmap level
to this one, and they will all look the same color.
So texture_max_level extension
avoids this problem by allowing you
to specify the mipmap levels you
care about and only use those.
So in this example, you can say the max level is 3,
and the texturing hardware will only use
mipmap levels 0 through 3 for
texturing from that particular texture object.
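Numerically, each mipmap level halves the base dimension, so clamping the max level simply bounds how coarse the selected image can get. A small helper, not a GL call, makes this concrete:

```c
#include <assert.h>

/* Dimension of mipmap level `level` for a square texture whose base
 * level is `base` pixels wide (floored at 1, as mipmap chains are). */
static int level_size(int base, int level) {
    int s = base >> level;
    return s > 0 ? s : 1;
}
```

With a 256-pixel base and the max level set to 3, the coarsest image ever sampled is 32 by 32, large enough that separate atlas entries don't collapse into a single color.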
Let's look at another extension that
modifies texturing behavior:
the APPLE shader texture level-of-detail extension.
It gives you explicit control over setting the
level of detail for your texturing operations.
So if you'd like to have an object look
sharper, you can specify a mipmap level
that is finer than the one the hardware would choose.
Or if you want better fillrate performance at the
expense of having fuzzier textures on your objects,
you can choose a coarser mipmap
level through this extension.
So all you need to do in your
shader is enable the extension
and control the mipmap level through the texture lod calls.
There are further APIs in this extension
for controlling the gradients and such.
I'd like you to go through and read the extension
to find out what more you can do with this extension.
And this is a shader-based extension,
so it's only available on devices that have PowerVR SGX GPUs.
We also added the depth texture extension to our
system, so that you can capture the depth information
into a texture and use it for things like
shadow mapping or a depth of field effect.
In shadow mapping, you would render your
scene from the perspective of the light,
and capture the depth information,
from the perspective of the light, into a texture.
You will do it for every light.
And then once you're rendering from the perspective
of the camera, you can basically calculate
whether that particular pixel sees any of the lights. If it
sees all of the lights, it will be illuminated by all of them.
If it sees some of them, it will
be illuminated by some of them.
If it sees none of them, it will be entirely in shadow.
So you can use the depth texture extension to do shadow mapping.
But since you're rendering the scene multiple times, there's
a lot of fillrate this consumes, so you need to be careful
about performance implications of using this technique.
Another example of using depth textures
is the depth of field effect.
I will show a small demo of that,
so I will talk about it in the demo.
One thing I need to remind you of: when you are using a
texture as the depth attachment of your framebuffer object
to capture the depth information of the
scene, you can only use NEAREST filtering.
The reason is that filtering across different
depth values produces incorrect values.
When something close to the viewer and something
far away from the viewer are filtered,
the filter generates a depth value in between,
though there's no object between those two depth values.
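The arithmetic shows the problem directly: linearly filtering two depth samples manufactures a depth at which no surface exists. With a near sample at 0.25 and a far sample at 0.75, the filtered result is 0.5 even if nothing in the scene sits at that depth, which is why NEAREST is required:

```c
#include <assert.h>

/* A 50/50 linear filter between two depth samples: the operation
 * that is invalid for depth textures, because the result is a
 * depth where no actual surface may exist. */
static float linear_filter(float a, float b) {
    return 0.5f * (a + b);
}
```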
Okay. So how can you get the depth texture
extension into your application?
You just generate your texture as usual, and when you
create your framebuffer object, you attach this texture
to the depth attachment of your framebuffer object.
And in subsequent rendering operations, the depth
information will be captured in this texture,
and then you can use it for texturing later to do whatever
post-processing effects or whatever effects you want to do.
And let's look at the demo, how we can generate a
depth of field effect with the depth_texture extension.
So here again, we are using three
planes, and I'd like to point
out that this is how you would normally
render it without the depth of field effect.
Everything is sharp, in focus, the plane in
the back, the stars, the plane in the front.
So here's the visualization of the depth information.
This is the depth texture, captured by rendering
to texture and then displaying the texture.
Things that are black are closer to the screen;
things that are white are farther away. And we
re-render the same scene in a blurred version,
at a lower resolution and with blurring introduced.
When the human eye focuses on something, there's a
range of objects around the focal point that are sharp,
but things that are closer or farther
away than that range are blurrier.
So this texture, this will capture that blurrier part,
and the original scene will capture the sharper part.
You can see that they are quite different.
So the operation between the two is
basically generating a mixture of these two.
So here, I'm visualizing which texture I'm going to
use for my final depth of field rendering.
If the focus is at the near plane and covers
this entire range, it will be black,
so it comes entirely from the darker value, the sharper image.
And as I make the range smaller, it will
start using the values from the blurred image.
So that when I have the range at 0, it means
that everything is blurred, nothing is in focus,
and everything will be used from the blurred image.
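One plausible way to express that focus-and-range control is a mix factor that is 0 inside the in-focus range (use the sharp image) and saturates to 1 outside it (use the blurred image). The linear falloff and the function below are assumptions for illustration; this is not the demo's actual shader code:

```c
#include <assert.h>

/* 0 = fully sharp, 1 = fully blurred. A zero range means nothing
 * is in focus, so everything comes from the blurred image. */
static float blur_mix(float depth, float focus, float range) {
    float d = depth > focus ? depth - focus : focus - depth;
    if (range <= 0.0f)
        return 1.0f;
    float m = d / range;
    return m > 1.0f ? 1.0f : m;
}
```

The final pixel is then a blend of the sharp and blurred renderings weighted by this factor.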
So let's look at it visually.
So if I have full range, I end up with the original scene.
Original rendering without the depth of field effect.
Then if I set the range shorter, you can
see that the stars have become blurrier,
and the two planes in the back are blurry.
Now the entire thing is blurry.
I'll move the range out and it gets
things into focus and it becomes sharper.
Or I can use a short range and move my focal point away and
the things in the front will become blurrier and the stars
and the third plane in the back will become sharper.
As you can see, the things -- the
planes in the front are blurrier.
But if I move my range to capture the entire --
near and far planes, again everything is sharper,
because we have enough range everywhere
to have focal point covering everything.
So that's one of the examples of how
you can use depth_texture extension.
Another shadowing technique that's
very popular is stencil shadow volume.
This extension, the stencil_wrap extension,
helps us to improve performance for that.
With this extension, the value in
the stencil buffer wraps around
as the ray goes in and out of the shadow volume.
Now, stencil shadow volumes is a large topic, and
it's a very nice way of creating real-time shadows.
We're spending a significant amount of time in the
next session here, the OpenGL ES Shading
and Advanced Rendering session, on
how to generate them and how to use them.
There are really cool demos visualizing
this technique and its implementation.
We added two new data types for
texturing operations: float textures.
You can specify your textures to
contain 16-bit or 32-bit floats.
So again, this requires hardware support that is
only available on devices that have PowerVR SGX GPUs.
The float values that you can store in your textures
can be used for high dynamic range imaging:
you can use them for tone mapping
and display high dynamic range images
in all their glory on iOS 4 based devices.
They can also be used for general-purpose
GPU operations, for GPU math.
You can load your data in the float textures and do
FFTs and other signal processing or other applications,
whatever math you want to use with this extension.
And the way you specify a texture to hold
float values is basically by telling OpenGL
that its type is GL_HALF_FLOAT_OES or GL_FLOAT_OES.
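For reference, GL_HALF_FLOAT_OES texel data uses the IEEE 754 half-precision layout: 1 sign bit, 5 exponent bits with a bias of 15, and 10 mantissa bits. Here is a minimal encoder for normal, in-range values (no NaN, infinity, denormal, or rounding handling; an illustration of the format, not a GL utility):

```c
#include <assert.h>
#include <stdint.h>

/* Convert a normal, in-range float to IEEE 754 half precision by
 * rebiasing the exponent (127 -> 15) and truncating the mantissa
 * from 23 to 10 bits. */
static uint16_t float_to_half(float f) {
    union { float f; uint32_t u; } v = { f };
    uint32_t sign = (v.u >> 16) & 0x8000u;
    int32_t  exp  = (int32_t)((v.u >> 23) & 0xFFu) - 127 + 15;
    uint32_t mant = (v.u >> 13) & 0x3FFu;
    return (uint16_t)(sign | ((uint32_t)(exp & 0x1F) << 10) | mant);
}
```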
This is the last extension I'm going to talk about today.
It's the APPLE_rgb_422 extension.
And this extension enables getting video textures onto GPU.
Specifically, interleaved 422 YCbCr type of video.
In this extension, we do not specify the color
space of the video. It could be captured
from a 601 standard definition video format,
it could be coming from a 709 HD color
format, or it could be JPEG full-range video.
Since we do not specify it,
we give you the freedom and flexibility
to implement your color space conversion in your shader.
With this extension, when you specify YCbCr data, the
Y values are copied to the G channel, the Cr values
to the R channel, and the Cb values to the B channel,
and once you do your color space conversion,
you end up with RGB values that
originally came from a video texture.
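As a concrete example of the shader-side conversion, here is the standard full-range BT.601 matrix applied in C. Which matrix you actually need depends on how your video was encoded (601 vs. 709, video vs. full range), which is exactly the flexibility the extension leaves to you:

```c
#include <assert.h>

/* Full-range BT.601 YCbCr -> RGB, one common choice for the
 * shader-side conversion. Inputs and outputs are in [0, 1], with
 * chroma centered at 0.5. HD 709 sources need a different matrix. */
static void ycbcr_to_rgb(float y, float cb, float cr,
                         float *r, float *g, float *b) {
    *r = y + 1.402f    * (cr - 0.5f);
    *g = y - 0.344136f * (cb - 0.5f) - 0.714136f * (cr - 0.5f);
    *b = y + 1.772f    * (cb - 0.5f);
}
```

In the extension's layout, Y arrives in the G channel and Cr/Cb in the R/B channels, so your shader would feed those swizzled components into a conversion like this.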
Then again, this extension relies on hardware support,
so it's only available on devices that have
PowerVR SGX GPUs.
So, let's look at how you can use this extension.
This is how you would do texturing
from a texture in the non-RGB-422 case:
you specify the type of your image, then you
create a texture and sample from it.
With the 422 extension, you need to specify
the format of your texture as RGB_422_APPLE,
and you need to specify its type as unsigned short,
either 8_8 or 8_8_REV for forward or reverse ordering of the Cb and Y values.
Finally, in your shader, you need to
convert from YCbCr values to RGB values,
and then you can do whatever effect you want:
black and white, or just applying the texture
to another object that you want to use it on.
So those are the other six extensions:
texture max level, shader texture level of detail,
depth texturing, stencil wrap, float texturing, and RGB 422.
and with that, I'd like to invite
Richard to talk about retina display,
the impact of high-resolution displays
on performance, and multitasking.
>> Thank you.
So thank you.
So, Gokhan has just given us a description of all
the new features you'll find within OpenGL in iOS 4,
so I'm going to continue the what's new topic, but really
going to focus on what's new in the rest of the platform
around you that impacts you as
an OpenGL application developer.
First and foremost among these is the new
Retina Display you'll find on iPhone 4.
So, you've undoubtedly seen the demo.
The Retina Display is a 640 by 960 pixel display.
That's in effect four times the pixels
we've seen on any previous iPhone.
One of the really big points I want to drive home about
the Retina display is that we're not cramming a whole bunch
of content into the upper left-hand corner.
All of the various views and other widgets remain
physically exactly the same size on the display.
The status bar, the URL bar, are all exactly the same size.
What's changed is the amount of detail
you find within any specific view.
This is equivalently true to the UIKit
content as it is to OpenGL content.
So how do you actually adopt -- really
make the best use of the Retina display?
For OpenGL applications, it requires
a little bit of adoption.
It's not something you get out of the box.
The steps are pretty simple.
Right off the bat, we want to render more pixels.
We need to allocate a larger image.
The second step is, now that you're rendering to a
different size image, we've found that a large number
of applications have, for their own convenience,
hard-coded various pixel dimensions in their applications.
That's something that we'll need to flush out.
And finally, this is where it really gets interesting,
taking advantage of the new display
to load great new artwork.
So, step 1.
Generating high resolution content is
actually done on a view by view basis,
and this is controlled with a new
UIKit API called Content Scale Factor.
So, you configure your view with the same bounds that
you always have, in a 320 by 480 coordinate space.
What changes is that you set the content scale factor
to, say, 1 or 2, and that will in turn affect the number
of pixels that are allocated to
back the image behind that view.
For UIKit content, this is generally set on your behalf
to whatever is appropriate for the current device.
Right out of the box, all of your buttons and your text
fields are going to be as sharp as they possibly can be.
But that is not true for OpenGL views.
For OpenGL views, the default value
for content scale factor remains at 1,
and you have to explicitly opt in by setting that otherwise.
Usually, the straightforward thing to do is to query
the scale of the screen that you're running on,
and then set that as the content scale factor. On
an iPhone 3GS, this will be 1, and nothing changes.
On an iPhone 4, this will be 2, and you're effectively
doubling the width and height of your render buffer.
At the time you call render buffer storage, core animation
is going to snapshot both the bounds of your view
and the scale factor, and it will do that to arrive at
the actual width of the image you'll be rendering into.
Knowing what that width and height are
is usually pretty convenient,
so you can derive them by doing your own bounds-times-scale
computation, or, even easier and more foolproof, just go ahead
and ask OpenGL what the allocated width and height are.
So, I just want to -- this is actually a pretty good
idea to just ask OpenGL and take these two values
and stash them away somewhere on the side.
They're really useful to have.
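The points-to-pixels relationship is just the view bounds times the content scale factor, rounded to whole pixels. A tiny helper makes the arithmetic concrete (illustrative only, not a UIKit or GL call):

```c
#include <assert.h>

/* Pixel dimension of the backing renderbuffer for a view dimension
 * given in points and a contentScaleFactor. */
static int backing_size(int points, float scale) {
    return (int)(points * scale + 0.5f);
}
```

On an iPhone 4 (scale 2), a 320 by 480 point view backs a 640 by 960 pixel renderbuffer; on an iPhone 3GS (scale 1), nothing changes.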
That brings us to step 2, and that is, fixing any place
where you have any hard-coded dimensions
that may no longer be valid.
If your application is already universal,
running on both iPhone and iPad,
you've probably already done this, and you can move on.
If you haven't done that, you may find that you have
a few of these cases, and I want to point out a couple
of the most common cases you'll find in your application.
First is that while Core Animation has chosen the size
of your color buffer, the depth buffer is something
that you allocate, and the sizes of
those two resources have to match.
And this is a case where we'll want to
use that saved pixel width and pixel height,
and pass it right on through to renderbuffer storage.
If you don't do this, you'll find yourself with an
incomplete framebuffer and no drawing will happen.
Another common case is GL Viewport.
Viewport is a function which chooses which subregion of
the view you're rendering into at any given point in time,
and every single application has to set it at least once.
You'll find it somewhere in your source code.
Most applications really don't ever use anything other
than a full screen viewport, so this is another case
where you'll just want to pass pixel
width and pixel height right on through.
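To make that matching rule concrete, here's a toy Python model of the failure mode described above (on device the real check is glCheckFramebufferStatus; this just mirrors the mismatched-dimensions case):

```python
# If the depth buffer's pixel dimensions don't match the color buffer's,
# the framebuffer is incomplete and no drawing happens.
def framebuffer_status(color_size, depth_size):
    if color_size != depth_size:
        return "INCOMPLETE_DIMENSIONS"
    return "COMPLETE"

# Retina color buffer with a correctly resized depth buffer:
print(framebuffer_status((640, 960), (640, 960)))  # COMPLETE
# ...and with a stale hard-coded 320x480 depth buffer:
print(framebuffer_status((640, 960), (320, 480)))  # INCOMPLETE_DIMENSIONS
```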
Step 3 is actually where it gets really interesting.
At this point, your application is a basically
correct adopter of the Retina display.
You've now got much greater detail on your polygon
edges, but there's still more room to improve things
and really take advantage of this display.
And so, you know, this is the right place to, for example,
load higher resolution textures and other artwork.
Again, if your application is universal,
you may already have a library of assets
that are perfectly relevant, that
you can use right away on this.
Usually, the easiest way to do this is take your
existing bitmap textures and just add a new base level,
and leave all the existing artwork in place.
This can really significantly improve
the visual quality of your application.
Just one word of caution here: you can do this
on any iPhone OS device, but it's going to be a waste
on the devices that don't have large displays,
so you really want to be selective
about which devices you choose to
load the largest level of detail on.
Otherwise you're just burning memory.
One other word of warning is about using UIImage to load textures.
UIImage has a size property which refers to the
dimensions of that image, but those dimensions are measured
in device-independent points, not pixels.
So if you have a higher resolution image that's
256 by 256 pixels, its size in points might only
be 128 by 128, so you can't just take those
values and pipe them into glTexImage2D.
So, this is another one of those cases where you'll have to
do your own size times scale, or you can just drop down a level
to CGImageGetWidth and CGImageGetHeight, which will give
you the image dimensions straight out in pixels.
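Here's a small, hypothetical Python model of that pitfall: UIImage's size property is in points, and multiplying by the image scale recovers the pixel dimensions that glTexImage2D needs (which CGImageGetWidth and CGImageGetHeight report directly):

```python
# points * scale = pixels; passing the point size to glTexImage2D
# would upload the texture at the wrong dimensions.
def texture_pixel_size(size_in_points, image_scale):
    w, h = size_in_points
    return (int(w * image_scale), int(h * image_scale))

# A 256x256-pixel @2x image reports a point size of only 128x128:
print(texture_pixel_size((128, 128), 2.0))  # (256, 256)
```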
If you get caught up by this, you'll probably
see some really really strange effects.
That's really about all there is to say
about making the most of the Retina display.
Tomorrow there's going to be a session
that talks about the UIKit changes in detail,
which is where you'll hear all about how UIKit
measures in points, where OpenGL is a pixel-based API.
So UIKit can do almost everything
for you with no application changes,
whereas you do need some changes for OpenGL.
But really, when you get right down
to it, the one line of code change
that really matters is setting content scale factor.
There's one more really interesting topic about the Retina
display, is that you're drawing four times as many pixels.
That can have some pretty significant
performance implications.
This is equivalently true if your application runs on
iPad, or even if your application uses an external display.
TVs can be quite large as well.
So I want to talk a little bit about this too.
So really, the first thing to do here is to roll up
your sleeves and start working through the standard set
of fillrate optimizations and investigations.
You have to think about how many pixels is your
application drawing, in this case, X and Y got a lot bigger.
You also have a lot of control
over how expensive each pixel is.
Proper use of mipmaps can significantly
improve GPU efficiency.
You are in direct control over the complexity of your
fragment shaders, and the costs of operations like Alpha Test
and Discard also add up pretty
quickly with screen size as well.
I'm going to stop here, and not
get into the details of performance optimization,
because that's a gigantic subject,
and we're going to spend a whole session
on it this afternoon, in OpenGL
ES Tuning and Optimization.
That being said, in our experience, there are a
lot of interesting applications that do have room
for performance optimization, and do end up being
satisfied with the performance they get on these devices,
even when running at higher resolutions,
both iPhone 4 and iPad.
But that's not universally true.
There are -- some developers are really aggressive.
They're already using everything
these devices have to offer.
And for these particularly complex applications,
you may find that you've used up all the --
you've optimized everything there is to optimize.
And so there's one more big tool in our
toolbox, and we're actually going to go back
to how many pixels are you actually drawing.
So you don't necessarily need to render
at the size that matches the display.
For example, iPhone 4 has a 640 by 960 screen.
If you instead render at 480 by 720, that's still a
significant step up in quality when compared to a 3GS,
and on the other hand, you're only
filling about half as many pixels
as you would be had you gone all
the way up to match the display.
You're going to find very few other opportunities out there
to find a 2x performance jump in a single line of code.
So if this becomes an option that you
want to pursue, how do you do this?
You could just throw some black bars
around the sides, but not really.
What you really want to do is take that lower resolution
image and actually scale it to fill the whole display.
Okay? How do you do that?
Well, you actually don't need to do that at all.
This is something that Core Animation will do for you.
In fact, this is something that Core Animation
will do for you really well and really efficiently.
Much more so than you could do yourself.
That is, in the end, a really nice tradeoff
between performance and visual quality.
So actually, how do you really make use of this?
Well, the answer is that you literally have to do nothing.
This is how your applications already
work right out of the box.
The API that controls this is, again, Content Scale Factor.
As I said, for compatibility reasons, today your
applications render into a view with a Content Scale Factor
of 1, which means that Core Animation is already taking your
content and scaling it to fit the display, very efficiently.
So right out of the gate, your
application already performs as well
as it always has, and looks as good as it always has.
That's a pretty decent place to start, if you
can change your application such that you can run
at the native resolution of these
devices, well, that's pretty good too.
But we see that there's going to be a fairly large class
of applications that do have performance headroom to, say,
step it up a little bit, but that can't take
a four times jump in the number of pixels.
And so there's some really interesting
middle grounds to think about here.
One of these is to stick with a scale
factor of 1, but adopt anti-aliasing.
We just saw Gokhan discuss our addition of the
Apple framebuffer multisample extension.
This can also do a really good
job of smoothing polygon edges.
In our experience, many applications that adopt
multisampling end up looking almost as good
as if you were running at native resolution,
but the performance impact can be much,
much less severe than increasing
the number of pixels four times.
So this is a very compelling tradeoff to think about.
Another interesting option is that you don't
necessarily have to pick an integer content scale factor.
You can pick something between 1 and 2.
In the example I started from, 480 by 720: to get that
effect, you can set a content scale factor of 1.5,
and that's actually all you really have to do.
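The arithmetic behind that example, sketched in Python (illustrative only):

```python
# A fractional contentScaleFactor trades quality against fill cost:
# 320x480 points at scale 1.5 gives a 480x720-pixel buffer.
def pixels(bounds_points, scale):
    return tuple(int(d * scale) for d in bounds_points)

buf = pixels((320, 480), 1.5)
print(buf)  # (480, 720)

# Fill cost relative to the full 640x960 native resolution:
print((buf[0] * buf[1]) / (640 * 960))  # 0.5625 -- roughly half the pixels
```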
I want to change the subject a bit now.
I want to talk about iPad.
iPad has an even larger display, and so the motives for
wanting to do application optimization are just as true,
and for some applications, the motives for wanting to
render to a smaller render buffer are just as true.
There's just one unfortunate catch, and that is
that our very convenient new API is new to iOS 4.
It's just not there for you to use in iPhone OS 3.2.
Fortunately, there is another way, and that
is using the UIView Transform property,
which has been there since iPhone SDK first shipped.
So, I'm going to put up a snippet of sample code here.
So think back to the beginning of the
presentation, when I said that the size
of your render buffer was your bounds times your scale.
On iPad, the scale is implicitly 1.
There's no API, so we'll call it implicitly 1.
In which case, if you want to render an 800 by 600
image on iPad, you can set the bounds to 800 by 600,
and then on the Transform property
you can set a scaling transform that will take
that and scale it up to fill the display.
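A sketch of the scale factors involved in that transform (this helper is hypothetical; on device you'd build the transform with CGAffineTransformMakeScale):

```python
# With no contentScaleFactor API on iPhone OS 3.2, you size the view's
# bounds to the smaller render size and scale it up to the display.
def scale_to_fill(render_size, display_size):
    rw, rh = render_size
    dw, dh = display_size
    return (dw / rw, dh / rh)  # (sx, sy) for the view's scaling transform

# An 800x600 render buffer stretched to iPad's 1024x768 display:
print(scale_to_fill((800, 600), (1024, 768)))  # (1.28, 1.28)
```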
Performance-wise, and actually quality-wise too,
these two methods are pretty much equivalent.
The advantage of this method is that
you can start using it on iPhone OS 3.2,
whereas Content Scale Factor, on iOS 4
and later, is just more convenient.
That's what I have to say about large display performance.
We're going to talk a lot about this kind of performance
investigation in detail later this afternoon in the Tuning
and Optimization session, and if you just run out of steam
there, then you've got some really fine-grained control
over what resolution you actually render at, which can
significantly reduce the number of pixels you have to fill.
That brings us to the last topic of
the day, and that's multitasking.
So, I want to start this by providing an example.
Say your product is a game, and the user is
playing, and they receive a text message.
They leave your game to go write a response.
Sure, dinner sounds great.
They come right back to your game.
They probably spent 10 seconds outside of it.
So what do they see when they return?
They see that they're going to get to wait.
That's not a very good user experience.
In fact, they end up waiting for longer than they
spent outside of your application in the first place,
that's really not a good user experience.
And so, this is what we're talking about
when we talk about fast app switching.
If you went to the session, you heard
that there are a bunch of other scenarios:
there's voice over IP, there's location tracking,
there's task finishing, there's audio.
All of those are about various modes
of doing work while in the background.
Fast App Switching is different.
The Fast App Switching scenario is an application
that does absolutely nothing in the background.
It's completely silent, completely idle.
It's there simply to lie in wait, so that it can leap back
into action the instant that the user relaunches that app.
And in fact, for OpenGL, that is the only mode.
GPU access is exclusively for the foreground
application to ensure responsiveness.
So while you can use some of these other scenarios,
like creating a finishing task to do CPU processing in the
background, you do not have access to the GPU in the background.
There's one really important point I want to make,
is that if you think back to that progress bar,
a lot of what that application was probably
doing was loading things like OpenGL textures.
That tends to be pretty time consuming.
So one thing you don't have to do is
de-allocate all of your OpenGL resources.
You can leave all of your textures and all of
your buffer objects and everything else in place.
You just have to go hands off and not touch
them for awhile, but they can stay there.
This means that when a user does bring your application
back to the foreground, all of those really expensive
to load resources are already there, ready to go right away.
Keeping all that stuff in the background
does have some implications on memory usage,
and that leads to a really interesting
tradeoff that you should think about carefully.
It is generally a really good idea to reduce your
application's memory usage when you run in the background.
For example, the system memory
as a whole is a shared resource.
If the application in the foreground needs more
memory, the system will go find applications
in the background to terminate to make room for it.
That list is ordered by who's using the most memory.
Guess who's going to be on top?
So, you have a really compelling,
even perfectly selfish reason,
to want to reduce your memory use as much as possible.
Because that means that your process is probably going to be
more likely to be there to come back to in the first place.
On the other hand, if you're making the resume
operation slow by spending a whole bunch
of time loading resources, we've
kind of defeated the purpose.
There's really a balancing act to be made here.
So the way you should think about it is look at
your application on a resource-by-resource basis,
and think about what you really need to
pick up right where the user left off.
And also think about how expensive is this to re-create.
If you've got your standard textures, those
are probably pretty expensive to re-create.
You want to keep those around.
On the other hand, there are some
that are really cheap to re-create.
Think about your color and depth buffers.
There's no drawing in the background, so they're
really just sitting there, not doing anything.
Re-creating them is not like a texture,
where you have to go to the file system
and load data and decompress it and so on.
Reallocating your color and depth buffers
is just conjuring up empty memory.
It's really fast.
Also think about cases where you actually have idle
resources that aren't actually needed for the current scene.
You know, if you've got a bunch of GL textures that
are around because you needed them in the past,
and you're keeping them around pre-emptively because you
might need them in the future, this is a really good time
to clear out all the idle textures in that
cache and leave all the active ones in place.
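The triage being described can be modeled as a simple rule: free anything that's cheap to re-create or no longer in use, and keep everything that's expensive and active. A toy Python sketch (the resource names are made up for illustration):

```python
def resources_to_free(resources):
    # Free a resource if it's cheap to re-create (color/depth buffers)
    # or idle (cached textures the current scene doesn't need).
    return sorted(name for name, r in resources.items()
                  if r["cheap_to_recreate"] or not r["in_use"])

resources = {
    "level_textures":    {"cheap_to_recreate": False, "in_use": True},
    "old_menu_textures": {"cheap_to_recreate": False, "in_use": False},
    "color_buffer":      {"cheap_to_recreate": True,  "in_use": True},
    "depth_buffer":      {"cheap_to_recreate": True,  "in_use": True},
}
print(resources_to_free(resources))
# ['color_buffer', 'depth_buffer', 'old_menu_textures']
```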
A little bit more about the mechanics of it.
How do you actually enter the background and come back?
Your application is going to receive
an applicationDidEnterBackground notification.
When this happens, we have to stop our usage of the GPU.
Specifically, your access to the GPU ends
as soon as this notification returns.
So you have to be done before you return from this function.
The second is that you want to save application state.
In this case, if you're writing a painting application
with OpenGL, that might involve using read pixels
to pull back what the user has painted
and save it to the file system.
And this is really important, because
your application may be terminated
in the background to free up memory.
And then finally, here's our example of
releasing memory for this application.
You know, we say we're going to go release our framebuffers,
because we can re-create them really
fast, without slowing the user down.
On the other side, when your application
wants to enter the foreground,
you'll receive an applicationWillEnterForeground notification.
We'll spend a tiny fraction of a second allocating
our framebuffer, and then we're ready to go.
This is exactly where we want to be.
So that's great.
Except that there's one other case here that
you really have to think about very carefully.
And this is that you might receive
applicationDidFinishLaunching instead.
If your process was terminated while in the background,
then you have lost all of your GL
resources, as well as everything else.
And you now have to reload them,
and that's going to take time.
The more interesting part here
is restoring that saved state,
the state that we saved
when entering the background.
Because the user doesn't know
when an application is terminated
in the background to free up memory.
It's a completely invisible implementation detail to them.
So to you, it's effectively unpredictable which one of these
paths your application will take to reenter the foreground.
And so, if in one of these cases you put them
right back in their game, and everything's golden,
whereas in the other case, you make them page through a
parade of logos and select a menu and click Load Game,
the user is effectively going to see random behavior here.
They won't know which case to expect when they
press on your icon, and that's fairly disconcerting.
And so it's really critical here that
regardless of which path your application takes,
you put them back in exactly the same
place, say, reloading their game.
Ideally, the user just can't tell the
difference between these two cases.
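One way to picture the requirement: both resume paths should converge on the same saved state. A hypothetical sketch (none of these names are real APIs):

```python
# Whether the process survived in the background (willEnterForeground)
# or was terminated and relaunched (didFinishLaunching), the user
# should land in exactly the same place.
def resume(saved_state, process_survived):
    if process_survived:
        return saved_state["scene"]       # GL resources still resident
    return reload_from_disk(saved_state)  # rebuild textures, then resume

def reload_from_disk(saved_state):
    return saved_state["scene"]  # stand-in for the slower reload path

state = {"scene": "level 3, checkpoint 2"}
print(resume(state, True) == resume(state, False))  # True: same place
```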
Practically speaking, there will
be a performance difference.
The whole point of Fast App Switching is to keep
those resources around, and if they're not around,
you're going to have to spend time to load them.
But for the best-behaved applications, the
application's performance will be the only difference
in behavior the user can see.
That's Fast App Switching.
You want to free up as much memory as possible, but not
to the extent that you're going to slow down Resume.
And then you also need to think about doing a really
good job of saving and restoring your GL state.
Which could include actually the contents of your
OpenGL resources, if you're modifying those on the fly.
So that actually brings us to the
end of today's presentation.
To just give you a quick recap of where we've
been, we've talked about some new OpenGL extensions
to improve the visual quality of your application:
multisample, float texture and depth texture.
We have new features to improve the performance of your
application: Vertex Array Objects and Discard Framebuffer.
Discard and Multisample go together particularly well.
We talked about how to adopt the Retina display.
Ideally, that's one line of code.
We talked about resolution selection for large displays.
This is really your big hammer to solve
fillrate issues if you have no other option.
And finally we talked about multitasking, where the key
is always think about the phrase Fast App Switching.
We have a number of related sessions.
Coming up later this afternoon is OpenGL ES Tuning and
Optimization, where we'll go through the process of how
to actually look at fillrate performance in your
application, as well as introduce a new developer tool
that can really help you understand
what your application is really doing.
Shading and Advanced Rendering is more of
an applied session, where we're going to go
through some really classic graphics
algorithms and then talk
about how we practically applied
them to the graphics in Quest.
OpenGL Essential Design Practices happened earlier today,
and this is a really great talk which goes into general
OpenGL design practices: when you want to use which
kind of object, and modern API changes.
This kind of stuff is equally applicable
to both desktop and embedded OpenGL.
And then finally, there's a couple sessions that
talk about multitasking and the Retina display,
as they apply to the whole platform, not just OpenGL.
You can contact Alan Schaffer directly, he's
our Game and Graphics Technologies Evangelist,
and we also have a great collection of written
documentation in the OpenGL ES programming guide for iPhone.
So, with that, I hope this talk was useful to
you today, and I hope to see you at the labs.
Thank you.