WWDC2004 Session 211

Transcript

Kind: captions
Language: en
session 211 OpenGL optimization live
I'm Dave Springer this is Chris
Niederauer he's going to be running demos
so it's three-thirty on thursday it's like
nearly the last day of the conference
right you guys have been here a long time
you're feeling a little conference burn
here's a thought that keeps me going in
the whole of human history there's been
only one Dave Springer and for my
life my entire life I get to be him okay
all right we're going to talk about a
couple tools that we developed at Apple
OpenGL Profiler and OpenGL Driver
Monitor and we're going to have live
demos because we like to live on the
edge and this software you see is pretty
fresh right so anything could happen at
any time but what we're going to do
during these demos and we're going to do
it is show some of the performance
bottlenecks that we've seen that are
common among OpenGL apps okay so we get
a lot of applications come through the
shop and we see a lot of things like
immediate mode when display lists might be
more appropriate we see things like
texture upload usage that could be a little
better we see state changes that aren't
always necessary so also we'll show you
about how to debug your OpenGL
applications using these tools as well
okay now why did we build this tool well
really those were the issues that
we're running into all the time we see
a lot of performance problems and we
notice common themes so we built this
tool to quickly identify those areas
where you may be losing performance in
your OpenGL apps
also there were a lot of
misconceptions about why performance was
being lost a lot of people would you'd get
a lot of finger-pointing well now we
have a tool that will exactly measure
and precisely identify where the
performance is going so okay so what
does it do profiler will show you your
usage of the OpenGL engine library
collect a lot of data and a lot of
statistics you can also control your
application kind of at a debugger level
and we'll also show the graphics state
that your application has in it and
we'll get into what all that really
means in the demos okay now here's how
profiler works now I got to refer to my
cheat sheet here because my memory's
like a sieve and besides this makes it
look like you're prepared beforehand
profiler is it's a runtime system and
what happens is it gets in between your
app and the OpenGL engine okay so you
don't have to recompile your application
run in a special mode or anything like
that profiler really does work like a
debugger in that sense in that you can
just run your app under the profiler
environment and now how it does it is
that profiler gets into the OpenGL
library at the library level and wraps
all those functions so imagine like the
old days you have jump tables with
loaded libraries we get in there and
masquerade each one of those function
calls to go into profiler and then from
there into the engine or the CGL shim
and it does wrap both CGL which is our
OS dependent layer of OpenGL and also
OpenGL and there's a quick note under
the X Window System if you're using that
platform you actually end up profiling the
server not your client app because
it's a runtime system like this it means
that you can launch an app under profiler
and then using gdb or another favorite
debugger you can attach to that same app
so you can have a full debugger at the
same time that you're running profiler
there's some little tricks in how to
keep it synchronized there and you'd
have to experiment with that that's an
exercise for the reader okay here's a
screenshot this is all new for Tiger
Profiler 3.0 first thing we
did was take the two panel approach
startup approach that we used to have
and compress them to one panel so now
there's no start and then start and then
really start this is you select your app
at the top now let me let me get out of
these fancy builds there we go you first
of all set up the app that you want to
profile in this top table here if you
click attach that table's automatically
populated with all the running
applications on your system okay then
you can in the new 3.0
profiler set environment variables this
window by the way is opened large this
doesn't default to this size normally
this part of it is hidden when you
run profiler you can set a custom
pixel format which I'll talk about more
later you can sort of emulate graphics
drivers again I'll talk about that in more
detail later and new for Tiger you can
set environment variables so like
if you're launching from a
shell in UNIX you can set
environment variables that you can
getenv inside your app now you can do it
right from profiler and I do this all the
time to tell my target apps to launch
different dynamic libraries like debug
versions of a framework for example
you can do that through environment
variables okay and the third part of
this panel is down here the most
important part really of this panel is
the frame rate now this is an out of the
gate estimation of your apps performance
and it's not a real precise nailed down
this is where my performance really is
but this is going to give you a general
idea oh yeah i'm getting about 200
frames per second and I'm expecting 500
okay so that that's what that gives you
now let me get into some of the things
that we collect this is the data that
profiler gathers out of your app it
takes the amount of time that you
actually spend in OpenGL engine so when
you make a call we start a timer go into
the engine come out of the engine
stop the timer add all those up and
so this is a pretty precise measurement
of your actual usage of the engine per
call now these are cumulative values and
they are also they are
per context or global so
if you have a lot of threads and a lot of
contexts you can see globally how much
time you're spending in functions or you
can look at it per context as well now
one of the really important numbers on
this now here is the estimated percent
time spent in OpenGL engine and here
we're spending about a quarter of the
time if the app is a hundred percent
we're spending about a quarter of the
time actually in GL in other words on
the GPU now what this number is going to
tell you is is profiler the tool you want
to keep using to measure performance or
do you want to move on to something like
Shark and the CHUD tools in order to work on
your performance on the CPU side and
we'll go into more detail later about
how to balance those two off okay with
that I'm going to turn it over to Chris
he's going to show us the demo okay so
we're on computer number three okay
that's good so here's a new profiler
window and as we see it's actually a
little bit smaller than Dave's screenshot
is how it starts up by default so we
can start an application that shows some
of the common pitfalls that we see with
a lot of applications today that use
OpenGL on Mac OS X and so this
application we call it what did we call
it charge event tri-tips bandit that was
its name we wrote it and we'll
use OpenGL Profiler use these tools
and improve its performance so one of
the first things that you generally want
to do is you want to figure out what
percentage of the time your application
is spending in OpenGL and what it's doing
with that time so we've got the
application right here ready to launch
and so launch that
so we've got it showing up here it's
just running as it would normally if I
were to start it from the Finder and one
thing you may notice is we already have
the frame rate showing in the
bottom of the profiler panel because
it's non-invasively able to capture
the calls to OpenGL and it's displaying
this information so we're seeing we're
getting around 33 frames per second
right now so let's go and find out where
that time is going to make that thirty
three frames per second so I'm going to
check this collect
statistics right here and this brings up
a window of all the functions and for
each of those functions we have the
total time that it's spending in that
particular function of OpenGL the
average time of each of those calls the
percent of time that is of all the
OpenGL usage in that application and
then the percentage of time that is of
the overall entire application so one of
the things I like to do is sort by
percent of time in OpenGL this gives us
a good gauge to start with to see where
which calls are actually taking up
the most time so we look at this list we
see Vertex3f TexCoord2f those are
the top two but then I'm also noticing
here there's a Finish and so Finish
despite only being called fourteen
hundred times we see it's taking quite a
bit of time for each Finish call to
execute and if you attended John
Stauffer's talk earlier on OpenGL
optimization Finish isn't always
a necessary command to use
like Flush is sometimes useful
but not really there's
cases where it's useful but in this
particular application it's not useful
so what I'm going to do is I'm going to
show you part of the control
functionality of the OpenGL profiler and
so what I'm going to do is I'm going to
disable that function so let's go
so I'm going to open up the breakpoints
window and this window gives a list of
all the functions of OpenGL and you can
control them through different ways
we're going to show you how if you
don't understand all of it yet don't worry
we're going to go over how to use this
window in depth a little bit further
later on so I'm going to look for
glFinish we see the command here and
we see this column execute and
i'm going to simply turn off that column
and we see already in our statistics
that it's turned red which basically
means that we're no longer calling that
function so i'm going to clear this
recheck and we can confirm that
glFinish is no longer in the statistics so
already it's about five percent
faster from what I've measured just by
taking out the function and it allows you
to do this all on the fly so let's
send that back to you Dave thanks
Chris okay i want to mention here that
Chris and I work about 200 miles apart
or so and I have never seen this demo
until now and it was awesome so thanks
it's a silver auction
[Applause]
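A minimal sketch of the timing idea described earlier in the session (start a timer when a call enters the engine, stop it on return, add the times up per function). This is not Profiler's implementation: `CallStats`, the `table` dictionary, and the two lambda stand-ins are hypothetical names for illustration, and no real OpenGL call is made.

```python
import time
from collections import defaultdict


class CallStats:
    """Accumulates call counts and time spent inside wrapped functions."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        self.calls = defaultdict(int)
        self.total = defaultdict(float)

    def wrap(self, name, fn):
        """Return a stand-in for fn that times every call, like a jump-table shim."""
        def shim(*args, **kwargs):
            start = self.clock()
            try:
                return fn(*args, **kwargs)
            finally:
                self.calls[name] += 1
                self.total[name] += self.clock() - start
        return shim

    def report(self, app_time):
        """Rows of (name, calls, total, average, % of GL time, % of app time)."""
        gl_time = sum(self.total.values())
        rows = []
        for name in sorted(self.total, key=self.total.get, reverse=True):
            total = self.total[name]
            rows.append((
                name,
                self.calls[name],
                total,
                total / self.calls[name],
                100.0 * total / gl_time if gl_time else 0.0,
                100.0 * total / app_time if app_time else 0.0,
            ))
        return rows


# Hypothetical stand-ins for two entry points; no real OpenGL is involved.
stats = CallStats()
table = {"glVertex3f": lambda x, y, z: None, "glFinish": lambda: time.sleep(0.001)}
table = {name: stats.wrap(name, fn) for name, fn in table.items()}

for i in range(1000):
    table["glVertex3f"](float(i), 0.0, 0.0)
table["glFinish"]()

for row in stats.report(app_time=1.0):
    print(row)
```

Sorting the rows by total time mirrors the statistics window in the demo: a call made rarely can still stand out when its average time per call is large.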
okay let's go on to another section of
usage data that we harvest with with
profiler now this is a call trace whereas
in the last demo you saw that we
capture every function and time it and
in this usage data capture we grab every
function as you call it and store it so
you've got a whole trace of all the GL
function calls you make as you make them
and you can see here that the output is
kind of C style you couldn't actually
just grab this and compile it you'd get
errors but it does print out the
symbolic names of the parameters so that
makes it a little easier to read and I
want to point out a couple new features
for tiger one is that you can apply a
filter to this trace a couple of you
guys developers out there had this
excellent excellent idea of taking
this text and running it through various
Python and Perl scripts to come up with
statistical analysis oh now it's built
right in so you just say enable filter
you pick the filter and it'll push it
right through there and show you your
output the other thing that we have on
here that's new for Tiger is timing
information per function so before you
saw in the statistics window that the
timing information is cumulative for all
the function calls so when you read
glVertex3f for example that timing
information is the sum of all your
glVertex3f calls okay in this case you
get the time for just the individual one
that's on this list now what that's
really useful for is finding hot spots
because you might have instances for
example of CGLFlushDrawable I'm just
pulling this out of the air that might
take a short amount of time and one or
two that are super long because of state
changes and things like that well on the
stats window it's going to show
up as taking a long time cumulatively
but what you really want to do is narrow
down those one or two that are really
soaking up all the time and figure out
why that is well this timing information
that is coming in Tiger will
help you find those really fast plus
attached to each one of these lines will
be a reveal for a full backtrace so you
can click on the function and find out
where in your code that specific call was
made narrow it down and again this is
all per context or global so if you have a
bunch of contexts you can narrow down to
looking at function calls just in one
context okay and i think with that we're
going to turn back over to chris go back
to that computer three thanks so I've
got here I have a second application
here which I've already taken the finish
out of and I'm going to demonstrate how
to more effectively pass down vertices
through your application to OpenGL so
let's launch this application up again
and so as Dave was showing OpenGL
Profiler allows you to get the trace of
all the OpenGL functions that are
being called and I'm going to go ahead
and do that so let's click this button
create trace and I'm going to stop that
because otherwise it might fill up the hard
drive so looking through this trace we
see that there's a lot of begin
end basically calls like
glBegin glVertex
glTexCoord glEnd and this is
actually for static data
actually it
would be more efficient to use display
lists for this particular case for
instance the land here is all static yet
I'm passing it down through immediate
mode and so what I'm going to change
about it is
you can either add vertex array
range vertex buffer objects display lists
all of these allow you to
pick the type that you feel
is most appropriate for the type of data
that you're trying to draw with and use
that to more efficiently take advantage
of the video card more efficiently take
advantage of the bandwidth of the system
so one thing to note is when you do have
all these calls like immediate mode
requires a lot of calls with all the
begin end versus if you were to say use a
display list which is a single call
glCallList so again also as John Stauffer
went over in his talk earlier today
on optimization in general in OpenGL
you can use CGL macros in order to
cut down on the overhead that each
function call has
basically the overhead of making the
function call itself so in the case
of this particular application
simply switching to the CGL macros will
definitely get us a gain a pretty
good gain but I've already written
this to use display lists and well first
let's check out the statistics again and
we can see sorting by the GL time that
we've got Vertex3f TexCoord2f
Vertex3d Color4f Begin and all
these calls are immediate mode calls and so
let's launch well and so we get about 35
frames per second so I'm going to stop
this application start up one using
display lists vertex array range and with that
alone we're already up to 160 170
frames per second 165 simply by passing
the data using display lists vertex array range
and let's go back to statistics and now
we see that we're actually so most of
our time is being spent
in glCallLists and then the rest of
the time most of the rest of the time is
being spent in CGLFlushDrawable which
basically means it's waiting for the
video card to be ready for more data
so we've pretty effectively used OpenGL
on the cpu side right now so back to you
alright thanks Chris alright let's move
on to application control and some of
these features that are in profiler for
this one of the ways you can control
your app is by setting a custom pixel
format and what we do here is inject a
different pixel format than what you
have what you asked for in your code so
again this is without recompiling your
application or changing any code like
that we can do things for example change
the depth buffer size so you want to see
if your app will run with a 16-bit
z-buffer instead of a 32-bit you can do
that through profiler without having to
rerun your app or recompile your app I
mean yes you rerun it the other thing you
can do is what we call driver emulation
we don't actually fully emulate the
graphics drivers because we can't
it's hardware there's caches there's all
kinds of stuff involved in there but
what we can do in profiler as a runtime
system is get in the way of the glGet
calls and make it seem like you're
running on another card so this is
useful if you want to make sure your app
is following correct code paths for
example you've got different code paths
depending on the return from a
glGetString because you're looking at
different card features and you're going
to enable or disable certain functions
you see this in games all
the time you're going to change a menu that
allows you to turn on certain
features in a card that's what this
really lets you do
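A small sketch of the kind of code path selection that driver emulation is meant to exercise, assuming the application branches on the space-separated extensions string it gets back from the driver. The function names and the returned path labels are hypothetical; the two extension names are real ones, but the choice of which path to prefer is only an illustration.

```python
def has_extension(extensions: str, name: str) -> bool:
    """True if `name` appears as a whole token in a space-separated extensions string.

    Matching whole tokens avoids treating "GL_EXT_texture" as present just
    because "GL_EXT_texture3D" is.
    """
    return name in extensions.split()


def choose_vertex_path(extensions: str) -> str:
    """Pick a (hypothetical) vertex submission path from the reported extensions."""
    if has_extension(extensions, "GL_ARB_vertex_buffer_object"):
        return "vertex_buffer_object"
    if has_extension(extensions, "GL_APPLE_vertex_array_range"):
        return "vertex_array_range"
    return "display_list"


# Feeding different strings here plays the role of the emulated driver:
# each string should send the application down a different path.
for reported in (
    "GL_ARB_multitexture GL_ARB_vertex_buffer_object",
    "GL_ARB_multitexture GL_APPLE_vertex_array_range",
    "GL_ARB_multitexture",
):
    print(reported, "->", choose_vertex_path(reported))
```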
now when you use this feature and you
you change the driver strings that are
getting returned your graphics may not
show up on the card because it's not the
right card you know your app thinks
it's something else an NVIDIA card and
really it's an ATI so use it with
care but it does have value
now another way we can control the
application chris already showed in his
demo with glFinish is that you can enable
and disable GL calls so you want to see
what your app looks like without ever
calling glFinish you turn it off and we saw
that the app not only looked the same but
ran way faster so we can do that you can
also attach scripts at breakpoints now
what that means is you can write little
pieces of GL code and profiler will take
and inject those into your application
at breakpoints while you're running and
then we're going to see an example of
that later on ok this is the breakpoint
window and Chris showed this earlier
it's like a debugger you can set
breakpoints but unlike a debugger you can
only set them on certain functions which
is all the GL calls this is not a
general debugger feature this is just a
way to stop your app on certain GL calls
and the interesting thing is that you
can stop it just before it goes into the
engine or you can stop it right after it
comes back from the engine why this is
useful is because along with the
backtrace you know standard debugger
backtrace you also get a full snapshot
of the OpenGL state so you can
see what kind of state changes are going
on in your GL calls and verify for
yourself whether they really are
happening or or you know passwords or my
budget
and again this handles the multiple
contexts case now what we do here though
is you know like a debugger if you have
a bunch of threads running and you put a
breakpoint on you know function foo
then it's going to stop in every thread
well it's the same here if you put a
breakpoint on glFlush then it's going
to stop at every context in every thread
that calls it ok and with that I'm
going to go back to Chris catching up
yelling so OpenGL Profiler is good for break
points and one of those types of break
points that you can set is a break point
on any GL error so what I'm going to do
is run my application this time with a
breakpoint set any time that a GL error
might occur so I'm going to go up
and go back to views set a
breakpoint and we can see here we've
got a list of different types of errors
we can break on I'm going to break on
error which refers to the normal GL
error and let's start this program up so
already it's caught a GL error and it says that
I'm calling glEnable with GL_BLEND_EQUATION
when in fact blend equation is
not something that's supposed to be
enabled and disabled GL_BLEND
is actually what's supposed to be
there so we get the error
GL_INVALID_ENUM and we can see the
backtrace and the actual line of code
somewhere here you can see the line of
code where this error is
occurring and so using this I was able
to quickly realize where this
was fix it correct it to GL_BLEND instead
of GL_BLEND_EQUATION and I'll show
you the result
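A toy model of the error in this demo, assuming the usual OpenGL behavior that enabling a token that is not a capability leaves state unchanged and records an invalid-enum error until the error is read back. `FakeGL` is a hypothetical stand-in, not a binding to any real OpenGL library; the numeric constants are the standard OpenGL token values.

```python
GL_NO_ERROR = 0
GL_INVALID_ENUM = 0x0500
GL_CULL_FACE = 0x0B44
GL_DEPTH_TEST = 0x0B71
GL_BLEND = 0x0BE2
GL_BLEND_EQUATION = 0x8009  # a query token, not something you enable


class FakeGL:
    """Hypothetical stand-in that models only enable() and the error flag."""

    CAPABILITIES = {GL_CULL_FACE, GL_DEPTH_TEST, GL_BLEND}

    def __init__(self):
        self.enabled = set()
        self._error = GL_NO_ERROR

    def enable(self, cap):
        if cap not in self.CAPABILITIES:
            # The call is ignored; the first error sticks until it is read.
            if self._error == GL_NO_ERROR:
                self._error = GL_INVALID_ENUM
            return
        self.enabled.add(cap)

    def get_error(self):
        """Return the recorded error and clear it."""
        error, self._error = self._error, GL_NO_ERROR
        return error


gl = FakeGL()
gl.enable(GL_BLEND_EQUATION)          # the bug: blending is still off
print(hex(gl.get_error()), GL_BLEND in gl.enabled)
gl.enable(GL_BLEND)                   # the fix
print(hex(gl.get_error()), GL_BLEND in gl.enabled)
```

Because the bad call is silently ignored, the only visible symptom is that blending never turns on, which is why breaking at the moment the error is raised is so much faster than hunting for it by eye.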
we got one taker yeah let's see so the
blending that's actually if you notice in the
function backtrace that was in draw sky
and so blending basically was not
enabled for that so let's start up the
version without errors I'll set that
breakpoint again let's see break on error
start it up and as we can see it's not
breaking on any errors and now we've got
the clouds blending pretty well so let's
go back to you Dave and you can explain
some of the other types of errors that
you can break on thanks Chris okay we
saw breaking on OpenGL errors very
useful another way that you can track
these errors in your app is on thread
conflict John talked earlier about
multiple contexts multiple threads and
what's legal what's okay and what's not
well you can have more than one thread
talking to one single GL context but
it's up to you to make sure that the
thread is locking correctly and
not more than one thread is in
the context at the same time if you end
up with more than one thread talking to
the same context you can get all kinds
of funky data corruption problems and
bad things can happen to your computer
so you don't want that well
what really is happening here in this
thread conflict is that you are supposed
to have a mutex lock on the threads if
you're going to talk to one context
profiler has this mode where it applies
the locks that you're supposed to have
so if you get into the case where
threads are going to conflict
it'll trip over one of those locks and
stop and say hey you know there's an
error here then you can go back into
your app again by using the backtrace
and you can apply the locks and clean it
up personally I recommend that you have
one context per thread but that's just
my personal opinion now this
thread conflict stuff is only
detected in the OpenGL APIs we don't
detect it in the CGL layer so you're on
your own there another way we can detect
errors is there's a break
on VAR error it stops on vertex array
range and vertex array object errors
essentially these breakpoints say
any time an index that you're using to
draw with veers outside of an array
range that you've specified or if you're
going to hand in a pointer that is
outside of one of those ranges or that you
haven't properly set up then we'll stop
and we'll break and again show you the
backtrace the full GL state everything
you need to see and we validate your
vertex array range on any of these
functions that you see up there
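A minimal sketch of the index check described here, assuming an indexed draw over an array of `vertex_count` vertices. `check_indices` is a hypothetical helper for illustration, not the validation Profiler actually performs.

```python
def check_indices(indices, vertex_count):
    """Return (position, index) pairs for every index outside [0, vertex_count)."""
    return [
        (position, index)
        for position, index in enumerate(indices)
        if not 0 <= index < vertex_count
    ]


# Two triangles over a 4-vertex array; the last index runs off the end.
bad = check_indices([0, 1, 2, 2, 3, 4], vertex_count=4)
print(bad)
```

Running a check like this before each indexed draw is cheap compared with tracking down the corruption an out-of-range read can cause.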
okay we talked about the full snapshot
of OpenGL state it is a full snapshot so
every GL get call that you can make is
done right here and we put it in this
list this reveal list now what happens
is that the state is gathered it's
harvested every time you stop at a
breakpoint and the changes in the state
are shown in red and the changes being
since the last break point and to show
that what I did here in this
screenshot is I've got a stop on
glEnable before it goes into the engine and
then another stop as soon as it comes
out so what I would expect is that it's
glEnable so I would expect the state
to be turned on right that I'm
changing and that's in fact what happens
you can see down here at the bottom it
says it broke after glEnable in
other words it's gone into the engine
come back out and stopped again and then
you can see there that cull face is
now enabled so this is really useful for
detecting errors where you think you
have state set up that may not be or
state that's set with incorrect
values you can watch the change you know
that's we had another taker for that
awesome okay just another couple
quick points on this window
under that actions pull-down there's these
shortcut menu options to stop
everywhere before stop everywhere after
stop nowhere you know it turns on all
those buttons or turns them all off you
can also execute no GL functions so if
you want to see and we've had examples
of this in the lab where people say you know
your graphics is slow my app runs
slow because your graphics is just not up
to par well so we said okay we'll take
your app we'll turn off all graphics and
notice it goes the same
speed guess what so you can run your app
open loop and decide oh well maybe I
better get Shark and the CHUD tools out and
you know make that go a little faster
first before I start blaming people
randomly not that I've ever done it and
of course you can just ignore
all the breakpoints too if you just want
to run your app without
stopping anywhere okay and with that
turn it back over to Chris and we'll
talk about unnecessary state changes so
already I talked about the immediate
mode I talked about how making a lot of
calls actually will result in function
overhead and the same holds true for
setting state except there's also the
fact that setting state can also itself
the actual setting of the state can take
up time and even if the setting of
the state doesn't take up time you may
not know but some of the state
changes will be deferred until your draw
command and that will cause your draw
commands to go a little bit slower
so like in the statistics for
instance you'll see DrawArrays taking a
longer time than usual because you've
accidentally turned something on or turned
it on multiple times or just switched some
sort of state that you didn't need to
switch so one thing that developers
should try and do is they should try and
avoid state changes when they can but
they should also keep in mind that
OpenGL does keep track of that as a
state machine that is keeping track of
what you're doing and depending on the
type of state that you're setting it may
be more efficient for you as the
developer with the semantics of the
application to decide whether or not to
do the state change yourself so I'm
going to launch up an
application here and I'm going to go
look at the statistics and one thing I
wanted to reiterate that Dave
said earlier was the estimated
percent time in GL is a really
useful feature to look at like here we
see the application's taking ninety-one
percent excuse me ninety-one
percent of the application is going to
OpenGL and so depending on
your application that percentage will be
different but for this particular
application since I'm just pounding on
the graphics hardware I'm not doing
any
cpu calculations such as physics or
anything similar to that so because of
that I have a pretty high percentage of
time in OpenGL sometimes it's better to
have a higher percentage of OpenGL
time because that means that you're
giving more data to OpenGL in general
but so let's look at I wanted to look in
particular at the number of function
calls with glEnable and glDisable
and if you look at the number of calls
between those two you'll notice
they're very different
so there's a hundred and eighty 190,000
disable calls while there's only 125
enable calls so obviously this is not
necessary that means that there's some
sort of imbalance there and that's just
one example of a state change which is
unnecessary so by taking that
state change out you
don't just gain the time in the actual
function itself like here
the average time here is very small
for the enable and disable however this time
might be actually showing up in your
other calls such as glBegin and other
similar drawing commands so back
to you
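One way an application can avoid redundant enable and disable calls is to remember what it last set and skip calls that would not change anything. The sketch below assumes the application is the only thing changing that state; `StateCache` and the callbacks passed to it are hypothetical names, and the recorded `issued` list stands in for calls that would otherwise reach OpenGL.

```python
class StateCache:
    """Forwards an enable/disable only when it changes the last known value."""

    def __init__(self, enable, disable):
        self._enable = enable
        self._disable = disable
        self._known = {}      # capability -> last value we set
        self.skipped = 0      # redundant calls that were filtered out

    def set(self, capability, on):
        if self._known.get(capability) == on:
            self.skipped += 1
            return
        # First use of a capability is always forwarded, since the
        # cache does not know the initial state.
        self._known[capability] = on
        (self._enable if on else self._disable)(capability)


issued = []
cache = StateCache(
    enable=lambda cap: issued.append(("enable", cap)),
    disable=lambda cap: issued.append(("disable", cap)),
)

# A draw loop that disables the same capability before every object.
for _ in range(1000):
    cache.set("GL_LIGHTING", False)

print(len(issued), "call forwarded,", cache.skipped, "skipped")
```

Whether a cache like this pays off depends on the state involved, which matches the point made in the talk that the application's own knowledge of its semantics is what decides it.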
this is a 9600 an ATI 9600 card on a dual two
all right let's talk about some of the
graphics state that your application
keeps now there's a differentiation
between state in OpenGL which is
a state machine and graphics state that
your application owns the difference
being that your app is going to own
things like textures vertex programs and
as this slide shows a depth buffer back
buffer things like that it's not
strictly speaking GL engine state ok but
because it's important especially when
you're debugging and in performance
analysis to know what's going on with
that state profiler captures it all too
so this view is the depth buffer and
what profiler does here is grab the
z-buffer the depth buffer and then gray
scales it ok so that on your gray scale
here the black pixels are minimum Z and
the white pixels are maximum Z the slider
at the top is showing you your
Z range when you get the depth buffer up
and you click that magnifying glass
profiler will automatically analyze the
image and say alright your minimum Z
value in the depth buffer is such in
this case point three four and your
maximum is one now in OpenGL
the default is that the z values in the
depth buffer are always between 0 and 1
there is a way to change that but
generally speaking the
floating point range of Z in a depth
buffer is 0 to 1 now
and the idea here is to show how much of
the precision you're using so
one of the common problems you run into
which chris is going to demo later on is
something we call z-fighting and that's
our colloquial term for it and how that
manifests itself is you get these little
flashing polygons because there's not
enough Z precision to tell which part is
in front and which part is behind
consistently and so you don't have
enough precision in your depth buffer
how you can see if you have enough
precision or not is by using this view
and if that orange bar at the top is
really tiny then you've got almost no Z
precision the wider that bar is the more z
precision you have so that's what you're
striving for and the way that you affect
the precision is by changing the near
and far planes in your glFrustum call
and this sometimes has been
a point of confusion especially on the
OpenGL list where the values of Z in
the z-buffer the range of those values i
should say is not affected by
glFrustum they always go from 0 to 1 what
changes with glFrustum is how
many of those bits you're going to use
for the z compare in the z-buffer make
sense in other words you
don't want to have just the top two bits
being used for all your z compares you
want to try and get all 32 or all 16 or
whatever your depth is and your near and
far planes are going to be the
determining factors for how much
precision you get the actual values are
always 0 to 1 the range is always
0 to 1
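A worked sketch of why the near and far planes decide how much of the 0 to 1 range a scene actually uses. It assumes a standard perspective projection and the default depth range, where a point at distance d in front of the eye lands at depth far * (d - near) / (d * (far - near)). The function names and the example distances are hypothetical.

```python
def window_depth(distance, near, far):
    """Depth-buffer value in [0, 1] for a point `distance` in front of the eye,
    under a standard perspective projection and the default depth range."""
    if not 0 < near < far:
        raise ValueError("need 0 < near < far")
    return far * (distance - near) / (distance * (far - near))


def range_used(closest, farthest, near, far):
    """Fraction of the 0..1 depth range covered by geometry between two distances."""
    return window_depth(farthest, near, far) - window_depth(closest, near, far)


# The same scene (everything between 10 and 100 units away), two plane choices.
loose = range_used(10.0, 100.0, near=0.1, far=1000.0)
tight = range_used(10.0, 100.0, near=5.0, far=200.0)
print(f"near=0.1 far=1000: uses {loose:.4f} of the depth range")
print(f"near=5   far=200 : uses {tight:.4f} of the depth range")
```

With the loose planes the scene occupies under 1 percent of the depth range, all of it crowded near 1.0, while the tight planes spread the same scene over about 46 percent. The stored values stay between 0 and 1 in both cases; only how much of that range the geometry covers changes.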
okay then another kind of buffer you can
look at with profiler is the stencil
buffer what profiler does is pseudo
color the stencil planes the way you use a
stencil buffer is that you set
individual bit planes so profiler
can pseudo color those in this example
here we've got three bit planes being
used in the stencil buffer and
profiler pseudo colored them with blue
green red so you can see here there's
black where there's no stencil bit set at
all then which planes have stencil bits
set in the red and the green and the
blue and then where it's purple profiler
composites bit planes together and
picks another pseudo color so the
purple areas are where you have red and
blue set so both of those bit planes
have been set in your rendering there
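A small sketch of pseudo-coloring stencil bit planes, assuming one color per plane and a simple per-channel combination where planes overlap. The mapping of plane 0 to blue, plane 1 to green, and plane 2 to red is an assumption for illustration; the talk names the colors but not which plane gets which.

```python
# Assumed plane-to-color mapping: plane 0 blue, plane 1 green, plane 2 red.
PLANE_COLORS = [(0, 0, 255), (0, 255, 0), (255, 0, 0)]


def pseudo_color(stencil_value):
    """Return an (r, g, b) color for a stencil value by compositing its set planes."""
    r = g = b = 0
    for plane, (pr, pg, pb) in enumerate(PLANE_COLORS):
        if stencil_value & (1 << plane):
            r, g, b = max(r, pr), max(g, pg), max(b, pb)
    return (r, g, b)


for value in (0b000, 0b001, 0b100, 0b101):
    print(format(value, "03b"), pseudo_color(value))
```

A pixel with no stencil bits set comes out black, and one with both the red and blue planes set comes out purple, as in the screenshot described above.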
now other buffer views that you can get
to the back buffer and that's pretty
straightforward it just looks like the
front buffer before it got swapped you
can look at the Alpha buffer which is
also greyscale colored and you can look
at all your auxiliary buffers so
depending on how many you asked for in
your pixel format or how many the engine
or card supports that's how many you can
look at buffer views are all static so
they're just what your app put in there
you can't edit them and then shove them
back in and say oh well what happens if
I really had a Z precision range of you
know much bigger than I really do you
can't do that it's just it's just
reporting what you did it's just static
static images
okay now with that I'll turn it over to Chris
so I'm going to show an example of being
able to look at those buffers so looking
closely at this application we can see
in the background where the water and
the land are meeting you see
the land doesn't quite look
right there's not a smooth line there
and what
I think this is is z-fighting so
the way that I would check on this the
first thing I would do is
take a look at the depth buffer so to do
this I have to set a breakpoint in order
to specify exactly where I want to look
at the buffer so I go up to views and
select breakpoints and
since I want to look at the depth buffer
basically when everything's
been drawn i'm going to set a breakpoint
right before glClear is called so
that means everything's done and it's
going on to the next frame but since I
set it before it won't actually execute
it yet so here I set my breakpoint
and I go up to views let's look at the
depth buffer so well you can set the
sliders here but i'm just going to use
the auto find min and max and we can see
that the range between the min and the
max z value that we're using is actually
very small so we have a
32 bit depth buffer in this case but
we're only using a very small amount of
it from 0.996 to 1.0 and what we'd
like is for this range to be
a lot bigger we'd like to
use a lot more of that zero to one so
let's go ahead and figure out why
we're using so little of the z-buffer
and i'm
going to set a breakpoint on glFrustum
and I think that I don't actually call
frustum unless I resize the window
here we go frustum and we see the
frustum is being set with the x and y
bounds basically
and then these two values are
the z min and the z max we're going
from one to a hundred thousand or a
million or something like that which is
pretty large considering that I'm only
really drawing from zero to forty so
because of that that's making our depth
buffer look incorrect so let's see so
I'm going to show
this same application with the
frustum modified so it will clip between
zero and 40
and again I'm going to look at the depth
buffer well we can see already that the
water is looking much nicer we've got
clear lines here where we used to be
having some z-fighting issues so let's
set a breakpoint on glClear and you
can see the depth buffer automatically
updated and we're actually using a lot
more of that range so in effect we've
gotten rid of those issues and back to
you alright thanks Chris z-fighting
that in the past has been a
real hard one to find you know we've got
a lot of chatter on the OpenGL list
about my polygons keep flashing in and
out and it is not obvious
that your z precision is related to the
glFrustum call you'd think glFrustum has
nothing to do with the depth buffer
right so there's not that instant
correlation ok more of the application
graphics state profiler will capture all
your textures that you're uploading
vertex programs and fragment programs
and you can look at those and
verify for yourself with profiler
that you really did upload what you
think you uploaded and one place where
this is really useful and chris is going
to show later is in your mipmaps
because you can get a lot of weird
texturing errors when you think you've
got a mipmap up there that you really
don't this screenshot up here is showing
a cube map and what profiler does there is
capture each of the individual six faces
that go on the cube map and we stick
them on a cube
you can rotate that around and it will
show you which map is being applied to
which face of the cube so again verify
that you've uploaded the right texture to
the right face plus there's a bunch of
information up there that talks about
the internal format and the source format
so when you're looking at performance
issues in terms of
what kind of texture formats the card's
going to perform best with you can see oh
well if I change the internal format and
ask for a different internal format you
can maybe get better performance also the
texture dimensions there's a mipmap
slider down there which chris is
going to get into more detail on as well
and other little buttons and things like
that you can flip the texture
upside down and so with
that I'll turn it over to Chris okay so
we've got the profiler running here this
time we launched the application
just like normal but this time I set
collect resources and so this
brings up the resources window right
here and so right now I'm viewing the
textures so looking at this application
we see it's a nice sunny day you know
the sun's out it looks pretty warm here
but looking down at the ground looks
kind of cold like snow but that's
actually because one of my textures
isn't uploading correctly and so by
default when a texture image is not
specified correctly it defaults to a
white texture and so let's go and see
why this texture is being white so we
see well here's that grass texture we
don't see that grass texture in here
I'll show you all these resources
like the cloud
and so the mipmap slider down here
actually serves a dual purpose in that
when you have mipmapping enabled
it will let you slide between all the
mipmaps and see each one and when it's
disabled this actual slider here
will be disabled
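The white ground in this demo is the usual symptom of a texture whose minification filter expects mipmaps when not every level was uploaded. A minimal sketch of that rule follows. It is a simplification (it ignores base and max level settings and per-level size checks), and the function names are hypothetical. One fact worth keeping in mind is that OpenGL's default minification filter is a mipmapped one, so uploading only level 0 without changing the filter produces this result.

```python
# Minification filters that sample from mipmap levels.
MIPMAP_FILTERS = {
    "NEAREST_MIPMAP_NEAREST",
    "LINEAR_MIPMAP_NEAREST",
    "NEAREST_MIPMAP_LINEAR",   # OpenGL's default min filter
    "LINEAR_MIPMAP_LINEAR",
}


def required_levels(width, height):
    """Number of levels in a full mipmap chain, down to 1x1."""
    return max(width, height).bit_length()


def is_complete(width, height, uploaded_levels, min_filter):
    """Simplified check: does the texture have every level its filter needs?"""
    if min_filter not in MIPMAP_FILTERS:
        return 0 in uploaded_levels
    return all(level in uploaded_levels
               for level in range(required_levels(width, height)))


# A 256x256 texture with only the base level uploaded.
print(is_complete(256, 256, {0}, "LINEAR_MIPMAP_LINEAR"))  # False
print(is_complete(256, 256, {0}, "LINEAR"))                # True
```

Both fixes discussed in the session correspond to the two ways of making this check pass: switch the min filter to a non-mipmapped one, or upload the full chain of levels.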
so let's go look at one of the textures
we know is uploaded correctly and go
look through these mipmaps it looks
like they're all specified correctly so
let's go back to the grass texture and
we notice that these mipmap
levels have not been uploaded so to fix
this either we could turn off
mipmapping for this particular texture or
I could
specify the mipmap levels for
all of those and
actually this is a great way to
show off the scripting ability in
profiler so I'm going to on-the-fly
disable mipmapping for the
textures so that hopefully
we can verify that this is why this
texture is not showing up so I'm going
to go to the breakpoints window which is
where you can set up your script and I'm
going to have a script that turns off
mipmapping so a logical place for me to
do this is after every bind texture call
so after each bind texture
call I'm going to call glTexParameteri
with the target texture 2D
and set the min filter to
linear as opposed to linear mipmap
linear so let's go ahead and do that so
I'm going to attach the script
through the actions here so
open up my script you know Chris while
you're doing that I want to jump in here
sure you'll notice that whenever chris
is up there looking for a function
he's not moving the mouse around he's
typing on here because it finds it yeah
chris is a real keyboard oriented guy
and so we
put in these ways to find functions
fast by just typing okay so let's attach
that script and for this particular
script the
mipmap script I just specified I'm
going to have it execute after
the bind texture call you can have it
execute either before or after so I'm
going to have it execute after and after
it does execute the script I'm going to
have it continue you can have it
otherwise pause and show you the state
after the script is
done so let's watch this attach
and as you can see on the fly we've
corrected that and everything looks much
better than it did before well this is a
live demo ladies and gentlemen that just
really worked awesome yeah so back
to you alright thanks Chris okay so let's
move on to the OpenGL Driver Monitor the
second tool in our suite we can
call it a suite as it has more than one
tool it has two okay
profiler attaches to your software
and shows how your software's
interacting with OpenGL driver monitor
attaches to the hardware and it shows
you what's going on in the GPU now
earlier versions of driver monitor had
these really bizarre obscure parameter
names like dart wait time and stuff like
that one of my favorites and we got a
lot of questions like
what does that mean so we developed a
decoder ring to say well when you look
at these arcane cryptic parameter names
this is what's really going on and you
had to go to this URL to get that
well for tiger we built all that into
driver monitor so not only are the
parameter names text that's even sort of
human readable you can roll over it
and it'll pop up the decoder ring for
that particular parameter and tell you
what it means you're welcome driver
monitor does
remote monitoring too which means if you
have a full screen app it's pretty hard
to run another app on top of it and see
it well you actually can't so what you
can do is run your full screen app on
one computer and then as long as you're
connected on a LAN with a second
computer you can run driver monitor on
that second one and monitor the other
GPU over the network
okay let's have a demo of driver monitor
so I'm going to show the driver monitor
in use I'm going to start out my
application just using profiler just
because it's handy got my list of
applications and I've launched it up and
everything looks good let's bring up the
driver monitor and so we've got the list
of all the
parameters and well by default whoops
you can set use descriptive names by
default it will be like this and we've
got actual English but you can change
back to the old names if you like those
if you're weird like me I guess nerd
yeah and you also have mouse overs which
explain everything that you'd want
to know about these things so here I've
added right now I'm viewing on the graph
the current free video memory the
texture page off and page on data and so
we see that right now we've got let's
switch this to linear and we've got
about seven megabytes of VRAM eight
megabytes of VRAM free and we can see
that there's only about one or two
megabytes being paged on of texture data
each frame if you don't
understand any of these
things you can always just go over
these in your free time set it up figure
it out pick the ones that you think are
going to work for what you want to
figure out and so let's actually make
this window pretty large and
oh this video card must have more
VRAM than I expected look
they really beefed up these demo machines
ah that's to make us look
good I don't know let's see so we see
that actually the
current free video memory is bobbing up
and down let's go up to 1
gigabyte we see that the
texture page on data has
reached about 400 megabytes per
second simply because I've made the
window so large and I've got
multisampling all those nifty features on
so it's taking up a lot of VRAM so
because I've used the driver monitor to
figure this out I can see that it's the
VRAM issue that's causing it to slow
down so much when I'm at full screen and
what I'm going to do is I decided
to fix this by using compressed textures
which allow me to stay at the same
resolution but i'm actually using only a
quarter of the memory in VRAM for these
textures using some OpenGL extensions
which allow you to do this let's do that
again and look at the driver monitor we
can see that the VRAM has flattened out
well it's a
little bit faster but in more extreme
cases you'd see huge benefit from
doing things such as compressing
textures saving your VRAM and there's so
many other things that you can check you
can see where your time is being
spent using the driver monitor so all
right that's it thanks
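A rough sketch of the memory arithmetic behind the compressed-texture fix. The session does not name the compression format, so the S3TC block sizes used here (8 bytes per 4x4 block for DXT1, 16 bytes for DXT3 and DXT5) are an assumption chosen because DXT5 against 32-bit RGBA matches the "quarter of the memory" figure mentioned in the demo. Function names and the texture size are illustrative.

```python
def uncompressed_bytes(width, height, bytes_per_pixel=4):
    """Size of one uncompressed level, 4 bytes per pixel for 8-bit RGBA."""
    return width * height * bytes_per_pixel


def s3tc_bytes(width, height, bytes_per_block):
    """Size of one S3TC-compressed level, stored as 4x4 pixel blocks.
    bytes_per_block is 8 for DXT1 and 16 for DXT3 or DXT5."""
    blocks_wide = (width + 3) // 4
    blocks_high = (height + 3) // 4
    return blocks_wide * blocks_high * bytes_per_block


# Hypothetical 1024x1024 texture, base level only.
rgba = uncompressed_bytes(1024, 1024)
dxt5 = s3tc_bytes(1024, 1024, bytes_per_block=16)
dxt1 = s3tc_bytes(1024, 1024, bytes_per_block=8)

print(f"RGBA8: {rgba} bytes")
print(f"DXT5:  {dxt5} bytes ({dxt5 / rgba:.3f} of RGBA8)")
print(f"DXT1:  {dxt1} bytes ({dxt1 / rgba:.3f} of RGBA8)")
```

The resolution stays the same in each case; only the storage per pixel changes, which is why less texture data has to be paged into VRAM each frame.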
all right quick words on what's new for
tiger in profiler and driver monitor you
saw the single control panel you saw the
decoder ring built in the new trace info
stuff for the call trace
we've also worked on better
integration between OpenGL and Shark a
lot of you have said all my time
in the shark trace is being spent in
gldGetString what is that well it's not
really there we fixed it also coming in
tiger remote profiling so as with driver
monitor where you can hook up across the
network and monitor the GPU of a full
screen app you can do the same thing with
OpenGL profiler so you can run your full
screen app really in full screen and get
the full OpenGL profiler benefit you're
welcome quick note you need to have
the same OS and profiler versions
running on both computers to make
that work okay now to wrap up let's
talk about really your performance
issues it's a balancing act and we've seen
here a lot of talk about how you can
improve the performance of your GPU usage
and you use profiler to do that and
driver monitor but you also have a CPU in
the computer so you need to be sure that
you're on top of its performance too so
your performance improvement cycle is
going to work like this first of all
your GPU usage might be very very high
and your CPU usage low as a ratio the
whole app is one hundred percent GPU to
start with might be way up in the high
90s and CPU down in the tens then you use
profiler and improve your performance on
the graphics card well now what's going
to happen is your GPU usage as a
percentage is going to drop maybe driving
your CPU usage higher switch over to
CHUD tools and Shark start driving that
CPU usage back down well
because of the ratio that's going to start
pulling your GPU usage up and then this
is a cycle and what
you want to ultimately get towards is
where they're just about fifty-fifty
well you're never going to be you know
that's a perfect ideal you may not reach
but that's how you would use these tools
in conjunction with each other