WWDC2001 Session 408
Transcript
Kind: captions
Language: en
good morning everyone to present session
408 OpenGL advanced optimizations
I'd like to introduce OpenGL manager John
Stauffer hi so today we're gonna talk
about advanced optimizations in OpenGL
so hopefully we'll learn a few things
about how OpenGL works and things you
can do to try to tune your applications
so so what we'll learn is the key
components that you need to look at when
you're trying to tune your application
for higher frame rates the thing that I
always like to start talking about is
the application component and the reason for
that is that about 75% of the time is
spent in the application in a common
OpenGL app and so therefore since 75% of
the time is spent there that's where you
have the potential for getting the most
benefit so if you don't tune your
application obviously you're not going
to get a lot out of OpenGL because
you'll spend too much time in the application
so we'll spend some time just talking
about techniques for tuning your
application to drive OpenGL better
and some hints and tips on how to do
that the second thing is setup so how to
properly setup OpenGL how to get some
machine information how to properly
configure and scale your application
such that it will run well on the
machine that you're targeting the third
thing is state management so state
management basically is where a large
percent of the time that is spent in the
actual OpenGL time frame is spent in
state management so state management
actually is more important than a lot of
people think if you do a lot of
thrashing of state in OpenGL you can
actually decrease your performance quite
a bit texture management texture
management is important to keep your
application correctly scaled for
the hardware so that you're not paging a
lot you're not spending a lot of
time running out of video memory and
paging on and off vertex operations so
vertex operations are important
obviously to be able to get a lot of
data to the card have an optimal format
for sending the data keeping the data
flow moving quickly to the card and
per-fragment operations so per fragment
operations are operations that the
card itself is going to do so it's not
CPU related it's what
the graphics card is going to have to do
to generate your final image and there's
some tips there to
offload some of the work the graphics
card is going to need to do extensions
so a lot of times there's extensions
that you can utilize that are either
directly geared towards optimizing your
application or will help you get the
animation effect you're looking for with
a simpler path so you won't have to do
all you can simplify your CPU work by
utilizing an extension multi CPU or
multi thread utilization obviously if
there's a machine that has two CPUs it's
an ideal situation to spawn another
thread maybe move your graphics off to
that other CPU and lastly what we'll
talk about briefly is where to look for
more information so starting off here
just to get the image of what OpenGL
looks like and how data goes to OpenGL
it's important to think about OpenGL as
a data stream so OpenGL fundamentally is
a data stream going to the card and how
the data is organized in that stream is
very important because it will give you
hesitations if you have too many
operations of one type or if you're
flushing and breaking that stream and
causing discontinuities so the
fundamental type of data that goes to
OpenGL is vertices which is your 3d data
and state so fundamentally this
is a simplistic view but you can
fundamentally break it down into those
two types of data sets that go to
the card and how that data again gets
organized and sent to the card can make
a big difference
so application so the thing to remember
when you're looking at writing an OpenGL
application is first you have to decide
obviously what type of performance
you're looking for and to do that you
need to obviously decide what type of
user interaction there's going to be you
know whether you need high frame rates
because the user needs a fast response
time on the graphical feedback which may
mean you need 30 50 60 frames a second
to get the proper feel quality display
so you'll need to decide what kind of
quality display in your application
you're going to need and obviously those
two things can be related so adjusting
the right quality with the frame rate is
going to give your user the best
experience so it's important to keep
those in mind your target platform so
deciding what your ideal platform is
going to be and what you're going to run
best on is going to be important so that
you can potentially scale your
application to run well on those those
target platforms and the things to
remember about the target platform are
video memory size how much system memory
you're going to be needing for the
application and potentially what
graphics cards in the system so that you
can have the animation effects that
you're looking for so the thing that a
lot of applications provide obviously is
a mechanism for users to adjust the
quality settings within the application
and this is usually important such that
an application or user can himself or
herself select the trade-off they want
between performance and quality such
that they can have some influence on
their preferences as to how fast the
application will run or what the quality
will look like so the first thing we do
and we do this a lot at Apple is we'll
take an application to try to analyze
where the time is being spent we'll take
the application and we will
run it with a null layer of OpenGL we'll
try to figure out
if OpenGL was infinitely fast
how fast will an application run and
this gives us an upper bounds and this
helps us understand what the application
itself is doing and what the application
what profiling may be needed to be
further done in the application to tune
it so two ways depending on your
programming environment and just a
reminder actually all the code that I'll
be showing today is Mac OS X cocoa
based I'm going to since we have limited
slide space I'm going to stick to those
function calls so to no-op the
OpenGL layer there's a couple ways you
can do it in your application very easily
for the CGL layer if you're programming
straight to the core OpenGL layer
you can simply set your OpenGL context
to null and what that does is that
actually internal to OpenGL that will
have OpenGL set all the entry
points to a no-op so they will do
nothing and if you're at the AGL
layer then you can use an AGL
call just to clear the current context
and it's equivalent to setting it to
null and again that will just set all
the entry points in OpenGL to no-ops and
so what you want to do once you've done
that is you want to measure the time
that's spent in your application to get
a feel for what level of performance
your application has and here we see a
little code snippet using gettimeofday
to just quickly calculate time spent
in the application so once we've done that
we can calculate an open loop OpenGL
no-op frames per second that your
application is capable of so obviously
once you've gotten to this
point you realize that you've no-op'd
OpenGL it's infinitely fast
and you're not achieving the frame rates
that that you would like to be at you
can immediately start you know thinking
about going into your application and
tuning your application what we do is
you can do a quick calculation assuming
an average application spends about 25%
of the time in OpenGL you can take that
open loop frames per second and
just multiply by 0.75 so lower that
frame rate down and get an estimate of
what you're going to run
what your performance is going to be
once you enable OpenGL and if this
estimated frames per second isn't where
you want to be again you're gonna have
to start looking at either OpenGL or
you're gonna have to start looking at
your application so to start tuning your
application on OS X there's a variety
of tools to do this one tool that's very
useful is called Sampler for anybody
that hasn't used Sampler it's a
threaded tool
that will go out and look where your
call stack is at any given time and it
will generate a sample and heuristic of
where the time is being spent in the
application so this tool actually is
very useful it works for CFM apps and
Mach-O apps and it's part of the
developer install so it's on your disk
at /Developer/Applications/Sampler and
it's a very useful tool
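Stepping back to the timing measurement described before Sampler, here is a minimal sketch in C of that calculation. Only `gettimeofday` and the 0.75 scale factor come from the talk. The function names and the `draw_frame` callback (standing in for one frame of application work with the OpenGL context cleared) are hypothetical and are not the code shown on the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Seconds elapsed between two gettimeofday() samples. */
double elapsed_seconds(struct timeval start, struct timeval end)
{
    return (double)(end.tv_sec - start.tv_sec) +
           (double)(end.tv_usec - start.tv_usec) / 1.0e6;
}

/* Frame rate with OpenGL no-op'd: an upper bound for the application. */
double noop_fps(unsigned frames, double seconds)
{
    assert(seconds > 0.0);
    return (double)frames / seconds;
}

/* Rule of thumb from the talk: roughly 25% of the time goes to OpenGL,
   so scale the no-op frame rate by 0.75 to estimate the real one. */
double estimated_fps(double noop_frames_per_second)
{
    return noop_frames_per_second * 0.75;
}

/* Hypothetical helper: time `frames` calls of the application's frame
   function. The caller is assumed to have cleared the current OpenGL
   context first so that the GL entry points do nothing. */
double time_frames(void (*draw_frame)(void), unsigned frames)
{
    struct timeval start, end;
    gettimeofday(&start, NULL);
    for (unsigned i = 0; i < frames; ++i)
        draw_frame();
    gettimeofday(&end, NULL);
    return elapsed_seconds(start, end);
}
```

As an illustration of the arithmetic, 150 frames in 1.5 seconds is a no-op rate of 100 frames per second, which the 0.75 rule turns into an estimate of about 75 frames per second once OpenGL is enabled.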
we suggest everybody become familiar
with how to use it and it will show you
where all the hot spots are
in your application code it'll even show
you where the hot spots are
in the operating system itself but you
may want to run this without OpenGL
and just run your application open loop
and just stress your application and
find out where the hot spots are okay so
that's enough talking about the
application so setting up OpenGL the
first thing you need to do obviously is
to go out and query for devices and find
what devices you have
how many devices and such so I've got a
couple code snippets up here that will
show you in core graphics how to get
your main device and how
to from the main device generate an
OpenGL display mask so the first code
snippet here is just the main device if
you wanted to go through all the devices
you could get all the active display
devices from core graphics loop through
them generate a display mask that
represents all of the devices on
the system and really it's going to
depend on whether you're a full screen
or window to application
as to what you're going what the right
thing for you to do is so what we can do
with this information is we can find out
how much video memory is in the system on
each graphics card so here we've got a
code snippet that will query the
renderers for video memory and it goes
through the loop and it will look at
each device querying it for video memory
size and this is going to be important
because as we start to try to adjust or
tune our application we're going to want
to make sure that the amount of textures
we have the resources we're going to be
consuming on the card are gonna fit in
video memory so we're going to want to
know this usually upfront if we have a
texture intensive application okay so
when we look at the video memory size
there's several things that we may want
to adjust again we may want to adjust
texture sizes but we may also want to adjust
the screen resolution if we're going to
be switching into a full-screen mode
let's say we're going to have the
opportunity for picking a screen depth
and a screen resolution if you have
determined your application needs more
video memory than is potentially
available in the current display mode
then you'll want to switch it down to
16-bit color potentially or you'll want
to switch down the resolution give the
application more breathing room on the
graphics card and that will help with
keeping your application out of a texture
paging mode and give it a higher
frame rate during the running of the
application so the other thing you'll
want to do is to find out what CPU
you're on one thing that we find very
useful is internally OpenGL
obviously is using AltiVec and AltiVec
can give you substantial performance
boosts if you utilize it so finding out
if you're on a G3 or a G4 is very useful
and tuning to that condition can be very
beneficial the other thing to remember
about that is that typically the
difference between G3 and G4 is that G4
systems are going to be
faster and you may want to think about
adjusting your data set size to
accommodate faster systems so quickly
talking about state management so state
management again is the process of
switching what mode OpenGL is running in
to get your proper configuration for
drawing your graphics the thing to
remember is state changes you want to
minimize those what we have found is
that in a lot of applications the amount
of time that's actually spent in OpenGL
a considerable portion of that is
actually in doing state management and
if you unnecessarily change state you
can cause a lot of thrashing down on the
card because OpenGL has to go through a
lot of setup to properly configure the
graphics card for each state change some
state changes are obviously more
expensive than others and we'll go
through a few of those which ones to
avoid but in general you want to group
your data to minimize state changes and
that will have a significant impact on
what performance you can ultimately
achieve so some general calls you want
to avoid glFlush so you want to avoid
glFlush because what it actually does
is if you again think about OpenGL as a
command stream going to the graphics
card glFlush tells OpenGL
terminate the current command stream
send it to the graphics card and start
me a new one so you've just chopped that
command stream and sent it on its
way and the reason that you don't want
to do this necessarily is because
there's only so many command buffers
that you can have allocated to your
application at a given time so if you
sit there and call glFlush a lot you
will use up the buffers that you have
available to your application and your
application may be starved for available
space to stick you know put data on the
stream so unless you have to
don't call glFlush and there's
actually very few reasons to ever call
it usually you can find some other way
to do what you're looking to do if
you want the user
to see something immediately usually you
just call swap buffers to get the data
swapped to the screen and swap buffers
actually implicitly calls a
flush so when you call swap it
terminates the stream sends it to the card
and so you don't necessarily have to
call glFlush yourself another call
that's even more expensive is glFinish
so glFinish is like a glFlush except
that it sends the data to the card and it
actually will block there waiting for
the graphics card to finish its drawing
so once all the commands have gone to
the graphics card finished and come back
then glFinish will actually
return to your application so an
important performance thing to keep in
mind is that glFinish is very
expensive it can be a blocking call that
can take quite a while to return so you
want to avoid reading data back from
OpenGL and when it comes to state
management typically what you want to do
is you want to keep the data in the
application that you will need later and
not ask OpenGL for it back depending on the
driver and what you're reading back it
can get very expensive reading back data
can actually be the same cost as calling
a glFinish because if you're reading
pixels back for instance the pixels
actually have to represent the current
state that you are expecting and that is
you've issued all these
drawing commands you're
expecting the pixels to be in the
buffer well OpenGL realizes
this and when you try to read some
pixels back it's going to have to call
finish wait till all the commands are
finished wait till it's drawn everything
before it can give you the valid pixels
back so so you don't want to read the
frame buffer unless you have to you
don't want to be reading state some
state could be expensive to read and you
don't want to read textures unless you
have to they all can have varying
penalties depending on what mode you're
running in so what you also want to
think about when you're writing
OpenGL is avoid complex state settings if
you don't know what a state setting does
usually it's a bad idea to just
arbitrarily throw state changes in there
what you want to do is keep the state as
simple as possible because this will
help the graphics card run in its most
optimal mode this will also usually lead
to less state thrashing when you are
trying to transition from one drawing
routine to another you won't have to do
as much state setup and teardown
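One common way to act on the advice about redundant state changes (and about not reading state back from OpenGL) is to shadow the state in the application and skip calls that would change nothing. The sketch below is an illustration under stated assumptions, not code from the session: the `StateCache` type and the capability names are hypothetical, the counter stands in for the real `glEnable` / `glDisable` call, and it assumes the shadowed capabilities start out disabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical capability list; a real application would map these to
   the OpenGL enums it actually toggles. */
enum { CAP_LIGHTING, CAP_BLEND, CAP_TEXTURE_2D, CAP_COUNT };

typedef struct {
    bool     enabled[CAP_COUNT]; /* application-side copy of the state   */
    unsigned driver_calls;       /* state changes actually passed along  */
} StateCache;

/* Start with everything disabled (assumed to match the context). */
void state_cache_init(StateCache *cache)
{
    for (int i = 0; i < CAP_COUNT; ++i)
        cache->enabled[i] = false;
    cache->driver_calls = 0;
}

/* Forward a change only when it differs from the shadowed value. */
void set_capability(StateCache *cache, int cap, bool on)
{
    assert(cap >= 0 && cap < CAP_COUNT);
    if (cache->enabled[cap] == on)
        return;                  /* redundant: nothing reaches OpenGL */
    cache->enabled[cap] = on;
    cache->driver_calls++;       /* glEnable/glDisable would go here  */
}

/* Answer state queries from the shadow copy instead of asking OpenGL. */
bool capability_enabled(const StateCache *cache, int cap)
{
    assert(cap >= 0 && cap < CAP_COUNT);
    return cache->enabled[cap];
}
```

With this in place, a drawing routine that sets the same capability on every pass through its loop only triggers one real state change.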
so it'll lead to less state transition
so keeping it simple
it's obviously a simplistic concept but
it's something to keep in mind so some
basic complex states that you want to
avoid are lighting user clipping planes
and full scene well anti-aliasing
like anti-aliased lines anti-aliased points
polygons and the reason you want to
avoid those is because they can be very
expensive to do with modern hardware
lighting and user clipping planes and
even anti aliased lines are pretty fast
so again it may depend on the particular
graphics card you're running on but in
general lighting is very computationally
expensive and unless you have a real
need for it you'll want to keep that
disabled even on the high power graphics
cards today if you enable
lighting you will cause the
graphics card to do more processing and
you will ultimately lower the
performance now whether you actually see
that will depend on how fast you know
what kind of demands your applications
putting on the graphics card but those
are very complex operations for the
graphics card to perform so texture
management so this is a very important
topic because a lot of games nowadays or
applications in general are using a lot of
textures and how to properly manage
those can make a big difference in
the applications performance so several
things to remember avoid uploading the
texture more than once ideally what you
want to do is you want to give OpenGL
the texture and not keep handing it to
OpenGL don't delete it and then give
it back to it later if at all
possible and instead let OpenGL do the
management the bookkeeping of whether
the texture should be in video memory or
not so again
avoid keeping a copy
in the
application that will save system
memory the thing to remember here is
that OpenGL will keep a copy and so
you're gonna have two copies if you keep
one in the application
and one is gonna be kept in OpenGL
you're gonna have two times
that texture size so it's best if you
delete yours if possible so ways to get
texture data into the graphics
card or into the driver fast there is an
extension called Apple packed
pixel so this is the fastest way to get
pixel data into OpenGL and it's a very
flexible format it'll support all the
standard OpenGL pixel types by Apple
pixel type so it'll also support a
number of rather odd types that may be
useful for you
you know like five six five or three
three two depending on what your quality
requirement is or whether you
don't need a deep bit depth
per component you can get away with some
of the smaller bits per pixel
components minimize how often you
change your current texture so changing
your current texture is actually one of
the most expensive operations you can do
and what that means is that changing
your current texture is a GL bind call
and when you bind from one
texture to another you're basically
just causing OpenGL to potentially
reconfigure all of its texture combiners
in the hardware for the new texture
because the new texture it's going to
require different blending modes and it
can be fairly expensive to do that setup
so typically what you want to do in the
application if you have a lot of data is
you want to group your data in groups
with common texture types so that's
the the best way to group the data such
that you minimize your texture changes
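A small sketch of grouping by texture, in C. It is not from the session: the `DrawItem` record is hypothetical, and the sketch assumes the items can be drawn in any order (for example opaque geometry with depth testing). The bind counter stands in for the texture bind call that a renderer would issue each time the texture changes between consecutive items.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record for one batch of geometry. */
typedef struct {
    unsigned texture;       /* texture object used by this batch */
    unsigned first_vertex;
    unsigned vertex_count;
} DrawItem;

/* qsort comparator: order batches by texture id. */
static int by_texture(const void *a, const void *b)
{
    const DrawItem *x = a;
    const DrawItem *y = b;
    return (x->texture > y->texture) - (x->texture < y->texture);
}

/* Number of texture binds needed to draw the items in the given order. */
size_t count_texture_binds(const DrawItem *items, size_t n)
{
    size_t binds = 0;
    assert(items != NULL || n == 0);
    for (size_t i = 0; i < n; ++i)
        if (i == 0 || items[i].texture != items[i - 1].texture)
            ++binds;
    return binds;
}

/* Reorder the items so batches sharing a texture are adjacent. */
void group_by_texture(DrawItem *items, size_t n)
{
    assert(items != NULL || n == 0);
    if (n > 1)
        qsort(items, n, sizeof items[0], by_texture);
}
```

Six batches that alternate between two textures need six binds as submitted and only two after grouping.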
so scale textures to your Hardware size
so again earlier we looked at finding
the vram size
so what you want to do is you can do
some basic rudimentary math in your
application and just fundamentally try
to scale your your application to fit on
the graphics card so if you have a lot
of textures you'll need to calculate how
many you're going to need necessarily on
the graphics card at any time it's
not terribly important to get it exact
but you would like to keep it within
reasonable bounds OpenGL is very
efficient at paging so what you'll not
want to do is try to keep OpenGL
always out of a paging mode you don't
want to try to second-guess the exact
size of the video memory available and
where exactly OpenGL is going to go
into paging mode because if you do that
you're not gonna let OpenGL grow and
utilize some of the mechanisms
internal to the driver that will try
to optimally page textures on and off so
what OpenGL uses internally for paging
textures on and off is called an LRU MRU
algorithm that stands for least
recently used most recently used so
depending on how committed you are how
many textures are committed per scene
whether you're over committed in that scene
it will actually switch to different
mechanisms for paging textures on and
off trying to optimally keep the right
set on the graphics card and not unduly
page off ones that are going to be
needed again so that that algorithm
actually works pretty well also
particular to OS X is we've built a
mechanism that causes almost no CPU work
to page a texture so once the texture is
in OpenGL and has to get paged off
back into system memory let's say it
costs very little CPU work to get it
back into the stream and
uploaded back on to OpenGL so while it will
cost a little bit of memory bandwidth
while it's getting read and it's going
to cause some AGP traffic the CPU cycles
spent are gonna be pretty minimal so we
find that letting OpenGL do the paging
isn't expensive for the CPU the CPU can
keep on going and as long as you're not
causing too much bandwidth across the
AGP you can get away with a fair amount
of paging so
depending on what you're doing you're
going to also want to split your
textures into tiles and I've got a
demo of this in a bit where if you're
wanting to do smooth animations of
some sort
trying to amortize the data stream as it
goes to the card and trying to keep the
drawing moving while large images are
moving up the stream so again if you
look at the whole process of OpenGL as a
big data stream if you have a four
megabyte texture that's a big block of
data in the middle of your stream so you
can envision that under some
circumstances it'd be good to interlace
that upload with some polygon drawing
maybe a frame here and there such that you
can amortize the texture upload time
going across the bus and keep animations
flowing so so here's a little diagram
for texture management one thing that we
recommend on OS X is to split your
texture loading off to a separate thread
if you're going to be spooling through a
lot of textures it's a good idea to
maybe spawn a thread that will do that
work for you and the reason for that is
that there's a couple reasons one is you
can utilize a second CPU and two you can
utilize pre-emptive multitasking to
balance out the loading the act of maybe
reading a texture from disk the cost of
loading it in OpenGL you can use
the pre-emptive capability of OS
X to spread that cost out so you
don't end up with a single point in
your OpenGL rendering stream or
in your CPU cycles that are blocked
trying to get this texture uploaded and
processed so it's a good idea so if we
look at this this is a basic diagram of
how to set up a two threaded or what
happens when you set up a two threaded
application one loading the OpenGL
textures and one doing the drawing so
what happens is the first thread is
loading the textures and those textures
will get processed and put into the
driver into the kernel driver so the
kernel will have them at this point and
they will be sitting in the kernel
waiting to be uploaded to the card so
you'll have done most of the work in CPU
cycles
on the primary thread of
getting the data into the kernel and
then you could have your second thread
come along and issue the drawing
commands
and as long as you have your thread
synchronization correctly organized then
your data will be there by the time you
need it and everything will just flow
much smoother so I've got a demo of this
and this demo shows this basic concept
that the diagram had there so what this
demo tries to show is a couple concepts
one is how to balance the requirements
of your application with quality and
smoothness of frame rate so what we have
here on the left is we have a slider
that will adjust the quality of these
images so for instance down here at the
bottom I can get 64 by 64 textures and
up on the top I get 1024 by 1024
and everything in between so what's
interesting to to look at here is if
you're trying to say write a screen
saver for instance and you're trying to
get these images up to the graphics card
while maintaining smooth animation
you'll see that we get a hesitation and
that hesitation is because one we only
have one thread doing the loading and
the animation so we get a large
hesitation while we spool the texture
off disk we decompress that JPEG and we
load it into OpenGL and give it to the
driver so we can see that that this
isn't going to lead to a very nice
screensaver so we start looking for
techniques to smooth that out and one
thing we can do is we can spawn a thread
and we can give the that thread the job
of spooling the texture off of disk and
loading it into OpenGL so what we see
now is we see that it's a lot smoother
but it's not perfect
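The two-thread arrangement described above (one thread loading textures, one drawing) can be sketched with POSIX threads. This is an assumption-heavy illustration rather than the demo's code: the `Mailbox` type and function names are hypothetical, an integer id stands in for decoded pixel data, no OpenGL calls are made, and a real application would also need to arrange for the two threads to share textures safely.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* One-slot hand-off between the loader thread and the drawing thread. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  slot_free;  /* signalled when the drawer empties the slot */
    bool            full;       /* a loaded texture is waiting in the slot    */
    int             texture_id; /* stand-in for decoded pixel data            */
    int             total;      /* number of textures the loader will produce */
} Mailbox;

/* Loader: does the slow work off the drawing thread, then posts the result. */
static void *loader_thread(void *arg)
{
    Mailbox *box = arg;
    for (int id = 1; id <= box->total; ++id) {
        /* Disk read and JPEG decode would happen here, outside the lock. */
        pthread_mutex_lock(&box->lock);
        while (box->full)
            pthread_cond_wait(&box->slot_free, &box->lock);
        box->texture_id = id;
        box->full = true;
        pthread_mutex_unlock(&box->lock);
    }
    return NULL;
}

/* Drawing loop: never waits for the loader, it just picks up a texture
   when one is ready. Returns how many textures it received, or -1 if
   the loader thread could not be started. */
int run_draw_loop(int total)
{
    Mailbox   box;
    pthread_t loader;
    int       received = 0;

    assert(total >= 0);
    pthread_mutex_init(&box.lock, NULL);
    pthread_cond_init(&box.slot_free, NULL);
    box.full = false;
    box.texture_id = 0;
    box.total = total;

    if (pthread_create(&loader, NULL, loader_thread, &box) != 0) {
        pthread_cond_destroy(&box.slot_free);
        pthread_mutex_destroy(&box.lock);
        return -1;
    }
    while (received < total) {
        pthread_mutex_lock(&box.lock);
        if (box.full) {
            assert(box.texture_id == received + 1); /* arrives in order */
            box.full = false;
            ++received;
            pthread_cond_signal(&box.slot_free);
        }
        pthread_mutex_unlock(&box.lock);
        /* Draw one frame here with whatever textures are available. */
    }
    pthread_join(loader, NULL);
    pthread_cond_destroy(&box.slot_free);
    pthread_mutex_destroy(&box.lock);
    return received;
}
```

The design point is that the expensive work happens outside the lock, so the drawing loop only ever pays for a short hand-off and keeps animating while the next texture is prepared.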
so here's where you can start deciding
whether
frame rate and quality are important one
thing you can do is obviously if you're
not needing to achieve those kinds of
rates of uploading and animating you can
slow it down and the hiccups are almost
gone another thing you can do obviously
is if you want to stay relatively fast
animations is you can lower your image
quality so we're still going a little
bit too fast to get absolutely smooth
animation but so you can see with this
technique we've basically eliminated the
pauses in the animation stream and we're
able to get smooth animations while
we're spooling through a large quantity
of textures this demo actually will
spool through 200 megabytes of textures
simulating a fairly large scenario and
then the third thing we can do after we
decided on frame rate and quality we can
also go to a tiled mode so a tile mode
is an attempt to split the texture into
many pieces and to amortize the cost of
uploading that across the bus I've had a
little bit of problem with the tiling
mode so we're gonna give it a shot
though
so the tiling mode theoretically now is
using the primary thread to load the
images and then the drawing thread
is well there we go so I've got some
thread synchronization issues it's an
attempt to try to amortize the cost of
moving the data across to the card so
with the MP case when we went to
the multi-threaded case we offloaded from
the main thread its job of loading all
of the data from disk and then giving it
to OpenGL but what we were not able to
do in the multi-threaded case is we're not
able to amortize the cost of moving that
image across the bus across AGP up to
the video memory so so we still see a
small hiccup in the MP case
so as soon as we go to tile mode what
I've done here is I've taken a small
piece of the tile a small piece of the
texture and I've uploaded one small
piece at a time
so I'm able to upload one small piece
per frame and that way not see a big 4
megabyte chunk of data in the data
stream as it goes to the graphics card
and done correctly you can get a
lot of data up in the system with very
smooth animations so again if you look
at the different scenarios looking at
the stream case so there it is
multi-threaded it's a lot smoother and
if we go tiled so that's a little
example of how to try to get through a
large large amount of texture data and
techniques to get it through the system
without hesitating your animation okay
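The tiling idea can be sketched as simple rectangle arithmetic in C. This is a hedged illustration and not the demo's code: the `Tile` type and function names are hypothetical. In a real renderer each rectangle would be handed to a sub-image upload such as `glTexSubImage2D`, one tile per frame.

```c
#include <assert.h>
#include <stddef.h>

/* One rectangular piece of a larger texture, in pixels. */
typedef struct {
    int x, y, width, height;
} Tile;

/* Number of square tiles of side tile_size needed to cover the image. */
int tile_count(int image_w, int image_h, int tile_size)
{
    assert(image_w > 0 && image_h > 0 && tile_size > 0);
    int cols = (image_w + tile_size - 1) / tile_size;
    int rows = (image_h + tile_size - 1) / tile_size;
    return cols * rows;
}

/* Rectangle of tile `index` in row-major order; tiles on the right and
   bottom edges are clipped to the image. */
Tile tile_rect(int image_w, int image_h, int tile_size, int index)
{
    Tile t;
    int cols;
    assert(index >= 0 && index < tile_count(image_w, image_h, tile_size));
    cols = (image_w + tile_size - 1) / tile_size;
    t.x = (index % cols) * tile_size;
    t.y = (index / cols) * tile_size;
    t.width  = (image_w - t.x < tile_size) ? image_w - t.x : tile_size;
    t.height = (image_h - t.y < tile_size) ? image_h - t.y : tile_size;
    return t;
}

/* Bytes moved across the bus when this tile is uploaded. */
size_t tile_bytes(Tile t, int bytes_per_pixel)
{
    assert(bytes_per_pixel > 0);
    return (size_t)t.width * (size_t)t.height * (size_t)bytes_per_pixel;
}
```

For a 1024 by 1024 texture at 4 bytes per pixel (the 4 megabyte case mentioned in the talk), 256 pixel tiles give 16 uploads of 256 KB each instead of one 4 MB block in the stream.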
so now we're going to talk about vertex
operations so vertex operations
obviously are the process of
actually getting 3d data to the
graphics card and there's lots of good
information about how to do that
correctly and it'll vary depending on
how the data is organized for your
application and potentially you know
what's best for your animation
technically or what you're animating so
if we look at the standard OpenGL path
which is called the immediate mode path
which uses glBegin glEnd the thing to
remember with glBegin glEnd always is
that you want to pass as much data as
possible between the glBegin and glEnd
you want to call glBegin glEnd as
infrequently as possible and the reason
for that is that there's a lot of
function call overhead glBegin will try
to do some card management some state
management and it will induce function
calls to the lower-level system so
reducing the begin end is the first
thing you can do to get better
performance and I'll go through an
example a little bit of code a little
bit after these couple slides here that
shows how to do that so use efficient
primitives is the next thing to remember
triangle strips are obviously a good
primitive to use because you get a lot
of triangles
per vertex if you're using individual
quads or individual
triangles you're gonna get about three
times the amount of vertex data going
through the system and it will hurt your
performance quite a bit in some
scenarios where you are cpu limited use
vertex arrays so vertex arrays is the API
for passing a whole strip of data to
OpenGL at once so it has the benefit of
reducing the number of function calls
you're making so you save right
there but it also gives OpenGL the
opportunity to optimize how the data is
moving into the stream and there can be
a big win there so the other thing that
you can use in conjunction with vertex
arrays is compiled vertex array so
compiled vertex array is probably one of
the most optimized paths in OpenGL for
getting data through the system
currently and it has the benefit of
highly optimized assembly code runtime
generated assembly code the deficit
is that if you are passing small amounts
of data there's a little bit of overhead
of logic to get into the routines so
you're not going to want to call a
compiled vertex array with three
vertices because you're better off going
to glBegin glEnd because that's lower
overhead for a small amount of data so
if you have large arrays of
data let's say greater than sixteen
sixteen may be pushing the smaller end
of it but say greater than 16 vertices per
array
try using compiled vertex array it'll
probably get you some benefit so looking
at a chart here that shows you
primitives along the x-axis and number
of triangles that you can render per
second along the Y you can see that that
the type of primitive can make a large
impact on the number of triangles that
you can send through the system so down
at the very bottom is polygons polygons
is the most rudimentary way to send data
to OpenGL and then near the upper end of
the spectrum is triangle strips so
triangle strips is the best way to send
data through the begin end immediate mode
path and then at the very far right is
compiled vertex array so you can see
that compiled vertex array if fed correctly
can give you substantial boost in
performance now the green bar shows what
you can do on a g3 and the blue and the
orange bar shows you on a g4 there's not
a huge difference but it can make a big
difference ultimately in your
performance and that's primarily because these
numbers actually were on a graphics
card that did not have
transform and lighting on the graphics
card so for a card that does do
transform and lighting it'll make less of
a difference if you have a g3 or g4 okay
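The "about three times the vertex data" figure for individual triangles versus strips follows from counting vertices. A minimal sketch in C, with hypothetical function names:

```c
#include <assert.h>

/* Independent triangles send three vertices per triangle. */
unsigned vertices_independent(unsigned triangles)
{
    return 3u * triangles;
}

/* A single strip sends two vertices to start, then one per triangle. */
unsigned vertices_strip(unsigned triangles)
{
    return triangles == 0 ? 0u : triangles + 2u;
}

/* How many times more vertex data independent triangles need. */
double independent_to_strip_ratio(unsigned triangles)
{
    assert(triangles > 0);
    return (double)vertices_independent(triangles) /
           (double)vertices_strip(triangles);
}
```

For 100 triangles that is 300 vertices against 102, and the ratio approaches three as the strip gets longer.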
so looking at how to potentially
optimize OpenGL I've got a number of
slides here to just basically walk
through the process that everyone
should look at when they're
trying to figure out how to simplify
their code and how to make it more
optimal so we start off with a basic
loop that is going through setting up a
smooth shaded color mode setting up a
color and then going and then drawing a
triangle so we're doing this every time
through the loop so we're drawing one
triangle we're doing a state change per
triangle and obviously we're not going
to get a lot of data through this
because it breaks every rule we have and
that is you're doing state changes and
you're not passing a lot of data per
begin end so the first thing we do is
remove state changes out of the loop and
that will obviously give you the benefit
that now we're passing a lot of
data we're not changing the state and
we're not causing OpenGL to have to
do a lot of state management below but
we still haven't you know
done any optimizations with how we're
passing vertex data so the next thing we
do then is well actually we simplify the
state and we simplify by just going to a
flat shaded so we notice that we're not
passing a color per vertex meaning the
triangles are flat shaded so we're
gonna change that to flat but then we
pull the triangles out of the loop and
that's an attempt to maximize the amount
of data per begin end and by doing this
we can increase the performance by quite
a bit and in fact after this I have
another demonstration to show you the
effect of that
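The first two steps described above (state changes out of the loop, then one begin/end around all the triangles) can be sketched without an OpenGL context by counting calls. The slides are not part of this transcript, so the loop bodies are an assumed reconstruction, and every `stub_` function is a hypothetical stand-in, not an OpenGL entry point.

```c
#include <assert.h>
#include <stddef.h>

/* Counters standing in for real driver work. */
typedef struct {
    unsigned state_changes;   /* stands in for shade model / color calls */
    unsigned begin_end_pairs; /* stands in for glBegin/glEnd pairs */
    unsigned vertices;        /* stands in for per-vertex calls */
} DrawStats;

static void stub_set_state(DrawStats *s) { s->state_changes++; }
static void stub_begin(DrawStats *s)     { s->begin_end_pairs++; }
static void stub_vertex(DrawStats *s, const float *v) { (void)v; s->vertices++; }
static void stub_end(DrawStats *s)       { (void)s; }

/* Untuned: one state change and one begin/end pair per triangle.
   xyz holds 9 floats (3 vertices) per triangle. */
DrawStats draw_per_triangle(const float *xyz, size_t triangle_count)
{
    DrawStats s = {0, 0, 0};
    assert(xyz != NULL || triangle_count == 0);
    for (size_t t = 0; t < triangle_count; t++) {
        stub_set_state(&s);
        stub_begin(&s);
        for (size_t k = 0; k < 3; k++)
            stub_vertex(&s, &xyz[(t * 3 + k) * 3]);
        stub_end(&s);
    }
    return s;
}

/* Tuned: state set once, one begin/end pair around every triangle. */
DrawStats draw_batched(const float *xyz, size_t triangle_count)
{
    DrawStats s = {0, 0, 0};
    assert(xyz != NULL || triangle_count == 0);
    stub_set_state(&s);
    stub_begin(&s);
    for (size_t t = 0; t < triangle_count; t++)
        for (size_t k = 0; k < 3; k++)
            stub_vertex(&s, &xyz[(t * 3 + k) * 3]);
    stub_end(&s);
    return s;
}
```

Both versions submit the same vertices; only the number of state changes and begin/end pairs differs.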
it can be pretty dramatic just doing
that step alone then what we do is we
try to simplify the API that we're
utilizing instead of passing all the
data through registers we pass
a pointer to the vertex data and it allows
OpenGL to potentially optimize how it's
copying the data you're not doing a lot
of register setup to get the data
through then we take the step of
realizing that what we're actually
passing is a triangle strip so we
change the type to triangle strip and we
reorganize how we're passing it and so
now we've just reduced the amount of
data going to OpenGL by a factor of
three again getting a big performance
boost out of doing a step like that and
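The factor of three comes from vertex counts. A minimal sketch of the arithmetic, with hypothetical function names:

```c
#include <assert.h>
#include <stddef.h>

/* Vertices submitted for n triangles sent as independent triangles. */
size_t vertices_as_triangles(size_t n)
{
    return 3 * n;
}

/* Vertices submitted for the same n triangles sent as one strip:
   the first triangle costs three vertices, each later one costs one. */
size_t vertices_as_strip(size_t n)
{
    assert(n > 0);
    return n + 2;
}
```

For 1000 triangles that is 3000 vertices against 1002, so the saving approaches a factor of three as the strip gets longer.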
then what we do is we realize that we
have all the data actually in an array
so we start using a vertex array and
using draw elements to draw out of that
array so now we've eliminated the loop
all together and we are simply making
five function calls to handle all the
drawing whereas if we looked at the
beginning of the slides we were probably
making hundreds or thousands so we've
eliminated all the function call
overhead and we have given OpenGL an
opportunity to try to optimize
internally for how it's going to want to
get the data into the command stream so
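For the last step, drawing out of an array with draw elements, the application side mostly comes down to building an index list. The sketch below assumes a mesh stored row-major as a grid, which the talk does not specify; `build_band_strip` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Fills `out` with triangle strip indices for the band between grid rows
   `row` and `row + 1` of a mesh stored row-major with `cols` vertices per
   row. `out` must have room for 2 * cols entries.
   Returns the number of indices written. */
size_t build_band_strip(unsigned cols, unsigned row, unsigned short *out)
{
    size_t n = 0;
    assert(cols >= 2 && out != NULL);
    /* Indices must fit in an unsigned short. */
    assert((size_t)(row + 2) * cols <= 65536u);
    for (unsigned x = 0; x < cols; x++) {
        out[n++] = (unsigned short)(row * cols + x);       /* upper row */
        out[n++] = (unsigned short)((row + 1) * cols + x); /* lower row */
    }
    /* With a live context and the vertex pointer already set, a call like
       glDrawElements(GL_TRIANGLE_STRIP, (GLsizei)n, GL_UNSIGNED_SHORT, out)
       would draw the whole band. */
    return n;
}
```

For a grid with 4 vertices per row, the first band comes out as 0 4 1 5 2 6 3 7, which is 8 indices for 6 triangles.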
now I have another demo showing some of
that effect this is actually a pretty
neat demo and so
what this demo is is a spherical map
mesh that's being animated with a wave
motion and where we start this
application right now is in a mode where
the application hasn't been tuned and
the rendering hasn't been tuned and the
way we can tell that is that the red bar
represents the time being spent in
the application the green bar
represents the time being spent
calculating the wave motion and the blue
bar is the time being spent in OpenGL so
we can see we're spending quite a
bit of time in all of these but
spending most of the time in the application
so
a little experiment that's interesting
to run is if I take this application
tuning slider and I bump it all the way
up so that the application becomes tuned
we can see we go from 20
frames per second to almost 40 so we
almost double our frame rate by doing
that ok so now if I move this slider
over here which simulates optimizing
OpenGL through the basic steps I just
went through the first one is individual
triangles the second one here has now
moved the begin end
outside the loop and it's passing as
much data as possible per begin end so
we can see that we immediately get some
performance out of that we can see the
blue bar's changing by about a factor of 2
but overall performance hasn't changed a
whole lot it only went up about 5 frames
so by doing that step we didn't get a
whole lot now if we go to the top one
that's using vertex arrays and again it
didn't change a whole lot so the
interesting thing to learn about this is
that if we take the slider and we move
it up for the application now we realize
that we have gotten 100% improvement on
just optimizing the application so
optimizing one or the other only got us
a marginal to 2x improvement
but if I optimize both I go from 20
frames a second to 60 so I get a 3x
improvement so the combined effect is
very important so it's important to
realize that where the time is being spent
can be the application or OpenGL so
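The combined effect can be illustrated with a simple additive model of frame time. This assumes a single-threaded frame whose cost is application time plus OpenGL time; the function name is hypothetical and the timings in the note below are made up to roughly echo the demo, not measured from it.

```c
#include <assert.h>

/* Frame rate for a single-threaded frame whose cost is the application
   time plus the OpenGL time, both in milliseconds. */
double frames_per_second(double app_ms, double gl_ms)
{
    assert(app_ms >= 0.0 && gl_ms >= 0.0 && app_ms + gl_ms > 0.0);
    return 1000.0 / (app_ms + gl_ms);
}
```

With illustrative timings of 35 ms in the application and 15 ms in OpenGL, the model gives 20 frames per second. Cutting only the application to 10 ms gives 40, cutting only OpenGL to 5 ms gives 25, and doing both gives about 67, because each part stops hiding behind the other.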
the second thing we can do then is like
we've been showing here is to spawn a
thread now if we spawn a thread and we
move the green bar on to the thread we
can see that now we are utilizing
both CPUs in this machine this
machine is a dual 500 so now we're
animating at 200 frames a second and we
started off at 20 so we got a 10x
improvement out of this and so now it's
animating silky-smooth whereas before it
was barely crawling along at 20 frames a
second so this is a good
example of where you can start from a
pretty dismal performance and do some
simple things and all of a sudden the
whole application comes alive and you're
getting you know 1.5 million triangles a
second and able to deliver a much better
application okay so now we've
kind of talked about the application
setup basically how to drive OpenGL and
all those things are fundamentally CPU
oriented operations and that's a process
of optimizing how effectively you're
utilizing the CPU so now we're
going to talk about per fragment
operations a little bit per-fragment
operations are fundamentally what the
graphics card is going to have to do to
convert your data from a triangle to the
image that you see in the frame buffer
and what types of blending or texturing
operations need to be done and there's a
few things just to keep in mind while
you're doing this while you're
programming one is to utilize multi
texture instead of multi pass so
basically all the graphics cards
that are accelerated on OS X today
have multiple texture
units so if you want to apply two
textures let's say you can do it in one
pass you can load two textures one
in texture unit 0 one in texture unit one
and you can apply both textures
simultaneously and this actually has two
benefits one is that it again lowers the
CPU overhead because your application is
not having to loop through the rendering
twice and reissue drawing commands to do
the second pass but the second one is
that it helps the graphics card optimize
its memory traffic because you're not
writing to the frame buffer on one pass
and then having to come back on a second
pass and write the pixel again instead
you're allowing the graphics
card to read two texels out of the
textures you've
defined combine them and write it once
out to the frame buffer so it lowers the
ultimate bandwidth
that you're consuming on the graphics
card so you can get performance
a couple ways by going to a multi
texture instead of multi pass so avoid
when possible obviously anything I'm
going to say here is just a suggestion
of things to try to work your way around
and sometimes you're looking for an
effect and you have to do these
operations but avoid read-modify-write
operations on the frame buffer so
read-modify-write operations are
anything that requires the frame
buffer value to participate in the
color calculation whose result will
finally be put back into the frame buffer
so things that do that are blending so
if you're blending with the final
destination of the frame buffer it's
gonna have to read the frame buffer it's
gonna have to read its textures it's
going to combine it all then write it
back out so you're gonna get 2x the
bandwidth utilized on the graphics card
of the frame buffer as opposed to an
algorithm that say didn't use a blending
mode now again blending is pretty common
so obviously if your application
requires it you'll have to use it Z
buffering is another thing that
usually will result in a read modify
write and when possible just avoid
enabling the Z buffer so that you're not
doing a read modify write on that Z
buffer okay another thing that's important
to keep in mind is that modern graphics
cards have the ability to do high-level
culling for you so if you're drawing lots
of triangles
it can ahead of time
cull those out and it's called hierarchical
Z and hierarchical Z will very quickly
take a primitive that you're drawing and
throw it out and it won't result in a
read modify write to the frame buffer
because it knows that that's occluded
through some varying techniques in the
graphics card and it will save you
memory bandwidth on the
graphics card so the way to utilize this
is to render front to back and the
reason you render front to back is
because you
want to draw things that are near you
and then when you draw something that's
behind it the graphics card has
techniques for very early
determining if that's behind something
already and discard it very early before
it has to do a read-modify-write
operation on the z-buffer and this is
the most effective way to let the
graphics card do its job of utilizing
these silicon gates that have been
dedicated to this and can be quite
effective if you do render front to back
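One way to get front to back order is to sort the draw list by distance from the viewer before issuing it. This is a sketch under assumptions: `DrawItem` and its fields are hypothetical, and the talk does not say how the depth per object is obtained.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-object record; eye_depth is distance from the viewer. */
typedef struct {
    int   id;
    float eye_depth;
} DrawItem;

/* qsort comparator: smaller eye_depth (nearer) sorts first. */
static int nearer_first(const void *a, const void *b)
{
    float da = ((const DrawItem *)a)->eye_depth;
    float db = ((const DrawItem *)b)->eye_depth;
    return (da > db) - (da < db);
}

/* Orders items nearest first so that farther ones drawn later can be
   rejected early by the depth hardware. */
void sort_front_to_back(DrawItem *items, size_t count)
{
    assert(items != NULL || count == 0);
    if (count > 1)
        qsort(items, count, sizeof items[0], nearer_first);
}
```

After sorting, the application would walk the list in order and draw each item as usual.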
so some of this is a review we're
gonna talk about OpenGL extensions and
what OpenGL extensions can help you
we already talked about compiled vertex
array but just to review really quickly
this is good for large number of
vertices it reduces the number of
transformations it reduces memory
traffic it allows OpenGL to pre compile
data into an AGP buffer
ready for transmitting to the card so
whenever possible again use compiled vertex
array texture compression texture
compression is also a very good extension
to use it allows you to minimize the
system memory bandwidth of moving that
texture around it saves you system
memory itself
it saves the bandwidth of moving that
texture up to the graphics card and it
also can benefit the graphics card
itself by lowering the bandwidth it
takes to read the texels out
and to render with it because it
will do on-the-fly texture decompression
and it will better utilize the cache
memory on the graphics card so texture
compression can be effective it's really
gonna depend on where your limitations
where your performance bottlenecks
are in your application but it's a good
one to keep in mind multi texture is the
extension for doing multi texturing like
we've mentioned utilizing more than one
texture unit Apple packed pixel again is
the extension for the best way to pass
pixel data to OpenGL and allows
you to get the best
bandwidth utilization of
pixel data to the OpenGL system and the
other thing it does actually is it saves
system memory so if you're able to store
a texture in a more optimal format for
your your application let's say one five
five five obviously that's going to be
half the memory utilization of an
eight bits per component
texture so that will give you some
system memory savings some bandwidth
savings and it'll also save video memory
on the graphics card so quick summary so
again going over some of the priorities
so the thing we always tell people is
they again need to optimize their
application because 75% of the time is
typically spent in the application
and 25% in OpenGL so optimizing your
application is going to be important and
you won't get good performance until
you've gotten that to an acceptable
level scale your application to the
target platform try to determine your
vram how much video memory is available
determine how much system memory is
available try to stay within acceptable
bounds that won't cause the system
memory to go into paging determine maybe
the number of texture units on the
graphics card so you can do multi
texturing instead of multi pass look at
your CPU type
try to utilize a number of OpenGL
extensions that will help simplify how
the data is being passed to OpenGL as well
as potentially give you better effects
allow the user to adjust the graphics
settings such that if the user is
experiencing problems on a particular
platform for one reason or another
allowing the user to vary the quality
settings such that they can get the
performance that they're looking for
obviously it's gonna be a friendly
thing to do for the user it will give
the user control over some of the
aspects of how the application runs so
for more information there's two good
books on OpenGL and anybody that's doing
OpenGL programming should have these
books one is the OpenGL programming
guide and the other one is the reference
manual
so these books are invaluable they're
very well written and if you're just
starting with OpenGL or whether you're an
expert these books are always sitting
right next to me on my desk so for
online help there's some good resources
you'll want to go to www.opengl.org
this is the official opengl web page and
it's got all kinds of neat news
announcements resources it has lists of
applications that are utilizing opengl
documentation it's got all the all the
resources you'll need for finding out
what's the latest in OpenGL and then
there's lists.apple.com where
you can join the Apple OpenGL list and
there's lots of Macintosh specific
discussions going on in that list where
you can participate or learn from some
of the discussions that are going on
there or send an email of your own and
ask a particularly difficult problem
that you need answered and lastly we met
Sergio at the beginning here so if you
have any questions about OpenGL at Apple
you can contact Sergio and here's his
contact information Sergio is our
product representative at Apple and he can
direct you to somebody
in Apple if he's not the right person or
help you with some of your product needs
and lastly we have after this session we
have advanced OpenGL rendering
techniques it's a very interesting
presentation that will go into utilizing
some OpenGL extensions for doing
advanced rendering I highly suggest it
for people that are looking for new
techniques and some capabilities of
graphics cards today it'll show you some
interesting demos and some nice effects
you