WWDC2001 Session 408
Transcript
good morning everyone to present session 408 OpenGL advanced optimizations I'd like to introduce OpenGL manager John Stauffer hi so today we're gonna talk about advanced optimizations in OpenGL so hopefully we'll learn a few things about how OpenGL works and things you can do to try to tune your applications so what we'll learn is the key components that you need to look at when you're trying to tune your application for higher frame rates the thing that I always like to start talking about is the application component and the reason for that is that about 75% of the time is spent in the application in a common OpenGL app and so therefore since 75% of the time is spent there that's where you have the potential for getting the most benefit so if you don't tune your application obviously you're not going to get a lot out of OpenGL because you'll spend too much time in the application so we'll spend some time just talking about techniques for tuning your application to drive OpenGL better and some hints and tips on how to do that the second thing is setup so how to properly set up OpenGL how to get some machine information how to properly configure and scale your application such that it will run well on the machine that you're targeting the third thing is state management so state management basically is where a large percent of the time that is spent in the actual OpenGL time frame is spent so state management actually is more important than a lot of people think if you do a lot of thrashing of state in OpenGL you can actually decrease your performance quite a bit texture management texture management is important to keep your application correctly scaled for the hardware so that you're not paging a lot you're not spending a lot of time running out of video memory and paging on and off vertex operations so vertex operations are important obviously to be able to get a lot of data to the card have an optimal format
for sending the data keeping the data flow moving quickly to the card and per-fragment operations so per-fragment operations are the operations that the card itself is going to do so it's not CPU related it's what the graphics card is going to have to do to generate your final image and there's some tips there to offload some of the work the graphics card is going to need to do extensions so a lot of times there's extensions that you can utilize that are either directly geared towards optimizing your application or will help you get the animation effect you're looking for with a simpler path so you can simplify your CPU work by utilizing an extension multi CPU or multi thread utilization obviously if there's a machine that has two CPUs it's an ideal situation to spawn another thread maybe move your graphics off to that other CPU and lastly what we'll talk about briefly is where to look for more information so starting off here just to get the image of what OpenGL looks like and how data goes to OpenGL it's important to think about OpenGL as a data stream so OpenGL fundamentally is a data stream going to the card and how the data is organized in that stream is very important because it will give you hesitations if you have too many operations of one type or if you're flushing and breaking that stream and causing discontinuities so the fundamental type of data that goes to OpenGL is vertices which is your 3D data and state so this is a simplistic view but you can fundamentally break it down into those two types of data sets that go to the card and how that data again gets organized and sent to the card can make a big difference so application so the thing to remember when you're looking at writing an OpenGL application is first you have to decide obviously what type of performance you're looking for and to do that you need to obviously decide what type of user interaction
there's going to be you know whether you need high frame rates because the user needs a fast response time on the graphical feedback which may mean you need 30 50 60 frames a second to get the proper feel quality of display so you'll need to decide what kind of quality of display your application is going to need and obviously those two things can be related so adjusting the right quality with the frame rate is going to give your user the best experience so it's important to keep those in mind your target platform so deciding what your ideal platform is going to be and what you're going to run best on is going to be important so that you can potentially scale your application to run well on those target platforms and the things to remember about the target platform are video memory size how much system memory you're going to be needing for the application and potentially what graphics card is in the system so that you can have the animation effects that you're looking for so the thing that a lot of applications provide obviously is a mechanism for users to adjust the quality settings within the application and this is usually important such that a user can himself or herself select the trade-off they want between performance and quality such that they can have some influence on their preferences as to how fast the application will run or what the quality will look like so the first thing we do and we do this a lot at Apple is we'll take an application to try to analyze where the time is being spent we'll take the application and we will run it with a null layer of OpenGL we'll try to figure out if OpenGL was infinitely fast how fast would the application run and this gives us an upper bound and this helps us understand what the application itself is doing and what profiling may need to be further done in the application to tune it so two ways depending on your programming
environment and just a reminder actually all the code that I'll be showing today is Mac OS X Cocoa based since we have limited slide space I'm going to stick to those function calls so to no-op the OpenGL layer there's a couple ways you can do it in your application very easily for the CGL layer if you're programming straight to the core OpenGL layer you can simply set your OpenGL context to null and what that does is internal to OpenGL that will have OpenGL set all the entry points to a no-op so they will do nothing and if you're at the AGL layer then you can use an AGL call just to clear the current context and it's equivalent to setting it to null and again that will just set all the entry points in OpenGL to no-ops and so what you want to do once you've done that is you want to measure the time that's spent in your application to get a feel for what level of performance your application has and here we see a little code snippet using gettimeofday to just quickly calculate time spent in the application so once we've done that we can calculate an open loop OpenGL no-op frames per second that your application is capable of so obviously once you've gotten to this point if you realize that you've no-oped OpenGL so it's infinitely fast and you're not achieving the frame rates that you would like to be at you can immediately start thinking about going into your application and tuning your application what you can do is a quick calculation assuming an average application spends about 25% of the time in OpenGL you can take that open loop frames per second and just multiply by 0.75 so lower that frame rate down and get an estimate of what your performance is going to be once you enable OpenGL and if this estimated frames per second isn't where you want to be again you're gonna have to start looking at either OpenGL or you're gonna have to start
looking at your application so to start tuning your application on OS X there's a variety of tools to do this one tool that's very useful is called Sampler for anybody that hasn't used Sampler it's a threaded tool that will go out and look where your call stack is at any given time and it will generate a sample and heuristic of where the time is being spent in the application so this tool actually is very useful it works for CFM apps and Mach-O apps and it's part of the developer install so it's on your disk at Developer Applications Sampler and it's a very useful tool we suggest everybody become familiar with how to use it and it will show you where all the hot spots are in your application code it'll even show you where the hot spots are in the operating system itself but you may want to run this without OpenGL and just run your application open loop and just stress your application and find out where the hot spots are okay so that's enough talking about the application so setting up OpenGL the first thing you need to do obviously is to go out and query for devices and find what devices you have how many devices and such so I've got a couple code snippets up here that will show you in Core Graphics how to get your main device and how to from the main device generate an OpenGL display mask so the first code snippet here is just the main device if you wanted to go through all the devices you could get all the active display devices from Core Graphics loop through them generate a display mask that represents all the devices on the system and really it's going to depend on whether you're a full screen or windowed application as to what the right thing for you to do is so what we can do with this information is we can find out how much video memory is in the system on each graphics card so here we've got a code snippet that will query the
renderers for video memory and it goes through the loop and it will look at each device querying it for video memory size and this is going to be important because as we start to try to adjust or tune our application we're going to want to make sure that the amount of textures we have the resources we're going to be consuming on the card are gonna fit in video memory so we're going to want to know this usually up front if we have a texture intensive application okay so when we look at the video memory size there's several things that we may want to adjust again we may want to adjust texture sizes but we may also want to adjust the screen resolution if we're going to be switching into a full-screen mode let's say we're going to have the opportunity for picking a screen depth and a screen resolution if you have determined your application needs more video memory than is potentially available in the current display mode then you'll want to switch it down to 16-bit color potentially or you'll want to switch down the resolution give the application more breathing room on the graphics card and that will help with keeping your application out of a texture paging mode and give a higher frame rate during the running of the application so the other thing you'll want to do is to find out what CPU you're on one thing that we find very useful is internally OpenGL obviously is using AltiVec and AltiVec can give you substantial performance boosts if you utilize it so finding out if you're on a G3 or a G4 is very useful and tuning to that condition can be very beneficial the other thing to remember about that is that typically the difference between G3 and G4 is that G4 systems are going to be faster and you may want to think about adjusting your data set size to accommodate faster systems so quickly talking about state management so state management again is the process of switching what mode OpenGL is running in to get your proper configuration for drawing your graphics
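Going back to the video memory sizing mentioned a moment ago, the arithmetic can be sketched in plain C. This is only a rough illustration and not the code from the session slides: the function names `texture_budget_bytes` and `textures_that_fit` are hypothetical, and the assumption that the display takes a front, a back and a depth buffer, all at the same bit depth, is a simplification of what a real driver allocates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: estimate the video memory left for textures once the
 * display buffers are allocated. Assumes three full-screen buffers (front,
 * back, depth) at the same bit depth; real drivers may allocate differently. */
size_t texture_budget_bytes(size_t vram_bytes,
                            size_t width, size_t height,
                            size_t bits_per_pixel)
{
    assert(bits_per_pixel % 8 == 0);          /* whole bytes per pixel only */
    const size_t buffer_count = 3;            /* assumed: front + back + depth */
    size_t per_buffer = width * height * (bits_per_pixel / 8);
    size_t display_bytes = per_buffer * buffer_count;

    /* If the display mode alone exceeds VRAM, nothing is left for textures. */
    return display_bytes >= vram_bytes ? 0 : vram_bytes - display_bytes;
}

/* Hypothetical helper: how many square textures of a given edge length and
 * texel size fit in the remaining budget (ignores mipmaps and padding). */
size_t textures_that_fit(size_t budget_bytes, size_t edge, size_t bytes_per_texel)
{
    size_t one_texture = edge * edge * bytes_per_texel;
    return one_texture == 0 ? 0 : budget_bytes / one_texture;
}
```

Under these assumptions a 16 MB card at 1024 by 768 in 32-bit color leaves about 7 MB for textures, while dropping to 16-bit color leaves about 11.5 MB, which is the breathing room the talk describes.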
the thing to remember with state changes is you want to minimize those what we have found is that in a lot of applications of the amount of time that's actually spent in OpenGL a considerable portion of that is actually in doing state management and if you unnecessarily change state you can cause a lot of thrashing down on the card because OpenGL has to go through a lot of setup to properly configure the graphics card for each state change some state changes are obviously more expensive than others and we'll go through a few of those which ones to avoid but in general you want to group your data to minimize state changes and that will have a significant impact on what performance you can ultimately achieve so some general calls you want to avoid glFlush so you want to avoid glFlush because what it actually does is if you again think about OpenGL as a command stream going to the graphics card glFlush tells OpenGL terminate the current command stream send it to the graphics card and start me a new one so you've just chopped that command stream and sent it on its way and the reason that you don't want to do this necessarily is because there's only so many command buffers that you can have allocated to your application at a given time so if you sit there and call glFlush a lot you will use up the buffers that you have available to your application and your application may be starved for available space to put data on the stream so unless you have to don't call glFlush and there's actually very few reasons to ever call it usually you can find some other way to do what you're looking to do if you want the user to see something immediately usually you just call swap buffers to get the data swapped to the screen and swap buffers actually implicitly calls a flush so when you call swap it terminates the stream sends it to the card and so you don't necessarily have to call glFlush yourself another call that's even more expensive is glFinish
so glFinish is like a glFlush except it sends the data to the card and it actually will block there waiting for the graphics card to finish its drawing so once all the commands have gone to the graphics card and finished then glFinish will actually return to your application so an important performance thing to keep in mind is that glFinish is very expensive it can be a blocking call that can take quite a while to return so you want to avoid reading data back from OpenGL and when it comes to state management typically what you want to do is you want to keep the data in the application that you will need later and not ask OpenGL for it back depending on the driver and what you're reading back it can get very expensive reading back data can actually be the same cost as calling a glFinish because if you're reading pixels back for instance the pixels actually have to represent the current state that you are expecting and that is you've issued all these drawing commands you're expecting the pixels to be in the buffer so OpenGL realizes this and when you try to read some pixels back it's going to have to call finish wait till all the commands are finished wait till it's drawn everything before it can give you the valid pixels back so you don't want to read the frame buffer unless you have to you don't want to be reading state some state could be expensive to read and you don't want to read textures unless you have to they all can have varying penalties depending on what mode you're running in so what you also want to think about when you're writing OpenGL is avoid complex state settings if you don't know what a state setting does usually it's a bad idea to just arbitrarily throw state changes in there what you want to do is keep the state as simple as possible because this will help the graphics card run in its most optimal mode this will also usually lead to less state
thrashing when you are trying to transition from one drawing routine to another you won't have to do as much state setup and teardown so it'll lead to less state transition so keeping it simple it's obviously a simplistic concept but it's something to keep in mind so some basic complex states that you want to avoid are lighting user clipping planes and full scene well anti-aliasing like anti-aliased lines anti-aliased points polygons and the reason you want to avoid those is because they can be very expensive to do with modern hardware lighting and user clipping planes and even anti-aliased lines are pretty fast so again it may depend on the particular graphics card you're running on but in general lighting is very computationally expensive and unless you have a real need for it you'll want to keep that disabled even on the high power graphics cards today if you start to enable lighting you will cause the graphics card to do more processing and you will ultimately lower the performance now whether you actually see that will depend on what kind of demands your application is putting on the graphics card but those are very complex operations for the graphics card to perform so texture management so this is a very important topic because a lot of games nowadays or applications in general are using a lot of textures and how to properly manage those can make a big difference in the application's performance so several things to remember avoid uploading the texture more than once ideally what you want to do is you want to give OpenGL the texture and not keep handing it to OpenGL don't delete it and then give it back to it later if at all possible and instead let OpenGL do the management the bookkeeping of whether the texture should be in video memory or not so again avoid keeping a copy avoiding keeping a copy in the application will save you system memory the thing to remember here is that OpenGL will keep a
copy and so you're gonna have two copies if you keep one in the application and one is gonna be kept in OpenGL you're gonna have two times that texture size so it's best if you delete yours if possible so ways to get texture data into the graphics card or into the driver fast there is an extension called Apple packed pixel so this is the fastest way to get pixel data into OpenGL and it's a very flexible format it'll support all the standard OpenGL pixel types by Apple pixel type so it'll also support a number of rather odd types that may be useful for you you know like five six five or three three two depending on what your quality requirement is or whether you don't need a deep bit depth per component you can get away with some of the smaller bits per pixel components minimize how often you change your current texture so changing your current texture is actually one of the most expensive operations you can do and what that means is that changing your current texture is a glBind call and when you bind from one texture to another you're basically just causing OpenGL to potentially reconfigure all of its texture combiners in the hardware for the new texture because the new texture is going to require different blending modes and it can be fairly expensive to do that setup so typically what you want to do in the application if you have a lot of data is you want to group your data in groups with common texture types so that's the best way to group the data such that you minimize your texture changes so scale textures to your hardware size so again earlier we looked at finding the VRAM size so what you want to do is you can do some basic rudimentary math in your application and just fundamentally try to scale your application to fit on the graphics card so if you have a lot of textures you'll need to calculate how many you're going to need necessarily on the graphics
card at any time it's not terribly important to get it exact but you would like to keep it within reasonable bounds OpenGL is very efficient at paging so what you'll not want to do is try to keep OpenGL always out of a paging mode you don't want to try to second-guess the exact size of the video memory available and where exactly OpenGL is going to go into paging mode because if you do that you're not gonna let OpenGL grow and utilize some of the mechanisms internal to the driver that will try to optimally page textures on and off so what OpenGL uses internally for paging textures on and off is called an LRU MRU algorithm that stands for least recently used most recently used so depending on how committed you are how many textures are committed per scene whether you're over committed in that scene it will actually switch to different mechanisms for paging textures on and off trying to optimally keep the right set on the graphics card and not unduly page off ones that are going to be needed again so that algorithm actually works pretty well also particular to OS X is we've built a mechanism that causes almost no CPU work to page a texture so once the texture is in OpenGL and has had to get paged off back into system memory let's say it costs very little CPU work to get it back into the stream and uploaded back to OpenGL so while it will cost a little bit of memory bandwidth while it's getting read and it's going to cause some AGP traffic the CPU cycles spent are gonna be pretty minimal so we find that letting OpenGL do the paging isn't expensive for the CPU the CPU can keep on going and as long as you're not causing too much bandwidth across the AGP you can get away with a fair amount of paging so depending on what you're doing you're going to also want to split your textures into tiles and I've got a demo of this in a bit where if you're wanting to do smooth animations of some sort you're trying to amortize the data stream as it
goes to the card and trying to keep the drawing moving while large images are moving up the stream so again if you look at the whole process of OpenGL as a big data stream if you have a four megabyte texture that's a big block of data in the middle of your stream so you can envision that under some circumstances it'd be good to interlace that upload with some polygon drawing maybe a frame here and there such that you can amortize the texture upload time going across the bus and keep animations flowing so here's a little diagram for texture management one thing that we recommend on OS X is to split your texture loading off to a separate thread if you're going to be spooling through a lot of textures it's a good idea to maybe spawn a thread that will do that work for you and the reason for that is there's a couple reasons one is you can utilize a second CPU and two you can utilize pre-emptive multitasking to balance out the loading the act of maybe reading a texture from disk the cost of loading it into OpenGL you can use the pre-emptive capability of OS X to spread that cost out so you don't end up with a single point in your OpenGL rendering stream or in your CPU cycles that is blocked trying to get this texture uploaded and processed so it's a good idea so if we look at this this is a basic diagram of what happens when you set up a two threaded application one thread loading the OpenGL textures and one doing the drawing so what happens is the first thread is loading the textures and those textures will get processed and put into the kernel driver so the kernel will have them at this point and they will be sitting in the kernel waiting to be uploaded to the card so you'll have done most of the work of CPU cycles on the primary thread of getting the data into the kernel and then you could have your second thread come along and issue the drawing commands and as long as you have your
thread synchronization correctly organized then your data will be there by the time you need it and everything will just flow much smoother so I've got a demo of this and this demo shows this basic concept that the diagram had there so what this demo tries to show is a couple concepts one is how to balance the requirements of your application with quality and smoothness of frame rate so what we have here on the left is we have a slider that will adjust the quality of these images so for instance down here at the bottom I can get 64 by 64 textures and up on the top I get 1024 by 1024 and everything in between so what's interesting to look at here is if you're trying to say write a screen saver for instance and you're trying to get these images up to the graphics card while maintaining smooth animation you'll see that we get a hesitation and that hesitation is because we only have one thread doing the loading and the animation so we get a large hesitation while we spool the texture off disk we decompress that JPEG and we load it into OpenGL and give it to the driver so we can see that this isn't going to lead to a very nice screen saver so we start looking for techniques to smooth that out and one thing we can do is we can spawn a thread and we can give that thread the job of spooling the texture off of disk and loading it into OpenGL so what we see now is we see that it's a lot smoother but it's not perfect so here's where you can start deciding whether frame rate and quality are important one thing you can do is obviously if you're not needing to achieve those kinds of rates of uploading and animating you can slow it down and the hiccups are almost gone another thing you can do obviously if you want to stay with relatively fast animations is you can lower your image quality so we're still going a little bit too fast to get absolutely smooth animation but you can see with this technique we've basically eliminated the pauses in the animation stream
and we're able to get smooth animations while we're spooling through a large quantity of textures this demo actually will spool through 200 megabytes of textures simulating a fairly large scenario and then the third thing we can do after we've decided on frame rate and quality we can also go to a tiled mode so a tiled mode is an attempt to split the texture into many pieces and to amortize the cost of uploading that across the bus I've had a little bit of a problem with the tiling mode so we're gonna give it a shot though so the tiling mode theoretically now is using the primary thread to load the images and then the drawing thread is well there we go so I've got some thread synchronization issues it's an attempt to try to amortize the cost of moving the data across to the card so with the MP case when we went to the multi-threaded case we offloaded from the main thread its job of loading all of the data from disk and then giving it to OpenGL but what we were not able to do in the multi-threaded case is we're not able to amortize the cost of moving that image across the bus across AGP up to the video memory so we still see a small hiccup in the MP case so as soon as we go to tiled mode what I've done here is I've taken a small piece of the texture and I've uploaded one small piece at a time so I'm able to upload one small piece per frame and that way not see a big 4 megabyte chunk of data in the data stream as it goes to the graphics card and done correctly you can get a lot of data up through the system with very smooth animations so again if you look at the different scenarios looking at the stream case so there it is multi-threaded it's a lot smoother and if we go tiled so that's a little example of how to try to get through a large amount of texture data and techniques to get it through the system without hesitating your animation okay so now we're going to talk about vertex operations so vertex operations obviously are the
process of actually getting 3D data to the graphics card and there's lots of good information about how to do that correctly and it'll vary depending on how the data is organized for your application and potentially you know what's best for what you're animating so if we look at the standard OpenGL path which is called the immediate mode path which uses glBegin glEnd the thing to remember with glBegin glEnd always is that you want to pass as much data as possible between the glBegin and glEnd you want to call glBegin glEnd as infrequently as possible and the reason for that is that there's a lot of function call overhead glBegin will try to do some card management some state management and it will induce function calls to the lower-level system so reducing the begin end calls is the first thing you can do to get better performance and I'll go through an example a little bit of code a little bit after these couple slides here that shows how to do that so use efficient primitives is the next thing to remember triangle strips are obviously a good primitive to use because you get a lot of triangles per vertex if you're using individual quads or individual triangles you're gonna get about three times the amount of vertex data going through the system and it will hurt your performance quite a bit in some scenarios where you are CPU limited use vertex arrays so vertex arrays is the API for passing a whole strip of data to OpenGL at once so it has the benefit of reducing the number of function calls you're making so you save right there but it also gives OpenGL the opportunity to optimize how the data is moving into the stream and there can be a big win there so the other thing that you can use in conjunction with vertex arrays is compiled vertex array so compiled vertex array is probably one of the most optimized paths in OpenGL for getting data through the system currently and it has the benefit of highly
optimized assembly code runtime generated assembly code the deficit is that if you are passing small amounts of data there's a little bit of overhead of logic to get into the routines so you're not going to want to call a compiled vertex array with three vertices because you're better off going to glBegin glEnd because that's lower overhead for a small amount of data so if you have large arrays of data let's say greater than sixteen sixteen may be pushing the smaller end of it but say greater than 16 vertices per array try using compiled vertex array it'll probably get you some benefit so looking at a chart here that shows you primitives along the x-axis and number of triangles that you can render per second along the y you can see that the type of primitive can make a large impact on the number of triangles that you can send through the system so down at the very bottom is polygons polygons is the most rudimentary way to send data to OpenGL and then near the upper end of the spectrum is triangle strips so triangle strips is the best way to send data through the begin end immediate mode path and then at the very far right is compiled vertex array so you can see that compiled vertex array if fed correctly can give you a substantial boost in performance now the green bar shows what you can do on a G3 and the blue and the orange bar shows you on a G4 there's not a huge difference but it can make a big difference ultimately in your performance and that's primarily because these numbers actually were on a graphics card that did not have transform and lighting on the graphics card so for a card that does do transform and lighting it'll make less of a difference if you have a G3 or G4 okay so looking at how to potentially optimize OpenGL I've got a number of slides here to just basically walk through the process that everyone should look at when they're trying to figure out how to simplify their code and how to make it more optimal so we start off with a
basic loop that is going through setting up a smooth-shaded color mode, setting up a color, and then drawing a triangle. We're doing this every time through the loop, so we're drawing one triangle and doing a state change per triangle. Obviously we're not going to get a lot of data through this, because it breaks every rule we have: you're doing state changes and you're not passing a lot of data per begin/end. So the first thing we do is move the state changes out of the loop. That will obviously give you the benefit that now we're passing a lot of data, we're not changing the state, and we're not causing OpenGL to have to do a lot of state management below. But we still haven't done any optimizations with how we're passing vertex data. So the next thing we do is simplify the state, and we simplify by just going to flat shading. We notice that we're not passing a color per vertex, meaning these are flat-shaded triangles, so we change the shading to flat. Then we pull the begin/end for the triangles out of the loop, and that's an attempt to maximize the amount of data per begin/end. By doing this we can increase the performance by quite a bit, and in fact after this I have another demonstration to show you the effect of that; it can be pretty dramatic just doing that step alone. Then what we do is try to simplify the API that we're utilizing. Instead of passing all the data through registers, we pass a pointer to the vertex data, and that allows OpenGL to potentially optimize how it's copying the data; you're not doing a lot of register setup to get the data through. Then we take the step of realizing that what we're actually passing is a triangle strip, so we change the type to triangle strip and reorganize how we're passing it. So now we've just reduced the amount of data going to OpenGL by a factor of three, again getting a big performance boost out of doing a step like that. And then
what we do is realize that we have all the data actually in an array, so we start using a vertex array and using glDrawElements to draw out of that array. So now we've eliminated the loop altogether and we are simply making five function calls to handle all the drawing, whereas if we looked at the beginning of the slides we were probably making hundreds or thousands. So we've eliminated all the function call overhead, and we have given OpenGL an opportunity to try to optimize internally for how it's going to want to get the data into the command stream.

So now I have another demo showing some of that effect. This is actually a pretty neat demo. What this demo is, is a spherical-mapped mesh that's being animated with a wave motion. Where we start this application right now is in a mode where the application hasn't been tuned and the rendering hasn't been tuned. The way we can tell that is that the red bar represents the time being spent in the application, the green bar simulates the time being spent calculating the wave motion, and the blue bar is the time being spent in OpenGL. So we can see we're spending quite a bit of time in all of these, but we're spending most of the time in the application. A little experiment that's interesting to run: if I take this application tuning slider and bump it all the way up so that the application becomes tuned, we can see we go from 20 frames per second to almost 40, so we almost double our frame rate by doing that. Okay, so now I move this slider over here, which simulates optimizing OpenGL through the basic steps I just went through. The first one is individual triangles. The second one here has now moved the begin/end outside the loop and is passing as much data as possible per begin/end. So we can see that we immediately get some performance out of that; we can see the blue bar changing by about a factor of 2, but the overall performance hasn't changed a whole
lot; it only went up about 5 frames per second. So by doing that step we didn't get a whole lot. Now if we go to the top one, that's using vertex arrays, and again it didn't change a whole lot. The interesting thing to learn from this is that if we take the slider and move it up for the application, we realize that we have gotten a 100% improvement from just optimizing the application. So optimizing one or the other only got us somewhere from a marginal improvement to a 2x improvement, but if I optimize both I go from 20 frames a second to 60, so I get a 3x improvement. The combined effect is very important, and it's important to realize that where the time is being spent can be the application or OpenGL.

The second thing we can do then, like we've been showing here, is to spawn a thread. If we spawn a thread and move the green bar onto the thread, we can see that now we are utilizing both CPUs in this machine; this machine is a dual 500. So now we're animating at 200 frames a second, and we started off at 20, so we got a 10x improvement out of this. Now it's animating silky-smooth, whereas before it was barely crawling along at 20 frames a second. So this is a good example of where you can start from pretty dismal performance, do some simple things, and all of a sudden the whole application comes alive; you're getting 1.5 million triangles a second and are able to deliver a much better application.

Okay, so now we've talked about the application and setup, basically how to drive OpenGL. All those things are fundamentally CPU-oriented operations; that's a process of optimizing how effectively you're utilizing the CPU. Now we're going to talk about per-fragment operations a little bit. Per-fragment operations are fundamentally what the graphics card has to do to convert your data from a triangle to the image that you see in the frame buffer, and what types of blending or texturing operations need to be done, and
there are a few things just to keep in mind while you're programming. One is to utilize multitexture instead of multipass. Basically all the graphics cards that are accelerated on OS X today have multiple texture units. So if you want to apply two textures, let's say, you can do it in one pass: you can load two textures, one in texture unit 0 and one in texture unit 1, and apply both textures simultaneously. This actually has two benefits. One is that it again lowers the CPU overhead, because your application is not having to loop through the rendering twice and reissue drawing commands to do the second pass. The second is that it helps the graphics card optimize its memory traffic, because you're not writing to the frame buffer on one pass and then having to come back on a second pass and write the pixel again. Instead, you're allowing the graphics card to read two texels out of the textures you've defined, combine them, and write the result once out to the frame buffer. So it lowers the ultimate bandwidth that you're consuming on the graphics card, and you can get performance a couple of ways by going to multitexture instead of multipass.

Next, avoid when possible (obviously anything I'm going to say here is just a suggestion of things to try to work your way around, and sometimes you're looking for an effect and you have to do these operations) read-modify-write operations on the frame buffer. Read-modify-write operations are anything that requires the frame buffer value to participate in the color calculation whose result is finally put back into the frame buffer. Things that do that are blending: if you're blending with the final destination in the frame buffer, the card has to read the frame buffer, read its textures, combine it all, and then write it back out. So you're going to get 2x the bandwidth
utilized on the graphics card's frame buffer, as opposed to an algorithm that, say, didn't use a blending mode. Now again, blending is pretty common, so obviously if your application requires it you'll have to use it. Z buffering is another thing that usually will result in a read-modify-write, so when possible just avoid enabling the Z buffer so that you're not doing a read-modify-write on that Z buffer.

Another thing that's important to keep in mind is that modern graphics cards have the ability to do high-level culling for you. So if you're drawing lots of triangles, the card can cull those out ahead of time. It's called hierarchical Z, and hierarchical Z will very quickly take a primitive that you're drawing and throw it out, and it won't result in a read-modify-write to the frame buffer, because the card knows through some varying techniques that the primitive is occluded. That will save you memory bandwidth on the graphics card. The way to utilize this is to render front to back. The reason you render front to back is that you want to draw things that are near you first; then when you draw something that's behind them, the graphics card has techniques for determining very early that it's behind something already drawn, and it discards it before it has to do a read-modify-write operation on the Z buffer. This is the most effective way to let the graphics card do its job of utilizing the silicon gates that have been dedicated to this, and it can be quite effective if you do render front to back.

So some of this is a review. We're going to talk about OpenGL extensions and how OpenGL extensions can help you. We already talked about compiled vertex array, but just to review really quickly: this is good for a large number of vertices, it reduces the number of transformations, it reduces memory traffic, and it allows OpenGL to precompile data into an AGP buffer ready for transmitting to the card. So whenever possible, again, use compiled vertex array.
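As a rough illustration of the front-to-back ordering described above, here is a minimal sketch in C that sorts a list of objects by distance from the viewer before they are drawn. The `DrawItem` type, its `eye_depth` field, and the `sort_front_to_back` function are hypothetical names invented for this example; the talk does not show code for this step, and a real application would compute the depth from its own camera and object data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-object record (not from the talk): eye_depth is the
 * object's distance from the viewer, so smaller means nearer. */
typedef struct {
    const char *name;
    float eye_depth;
} DrawItem;

/* qsort comparator: nearer objects sort first. */
static int compare_near_first(const void *a, const void *b)
{
    const DrawItem *ia = (const DrawItem *)a;
    const DrawItem *ib = (const DrawItem *)b;
    if (ia->eye_depth < ib->eye_depth) return -1;
    if (ia->eye_depth > ib->eye_depth) return 1;
    return 0;
}

/* Reorder the items so the nearest is drawn first. Drawing in this order
 * gives the card's early depth rejection a chance to discard occluded
 * geometry before it touches the Z buffer. */
void sort_front_to_back(DrawItem *items, size_t count)
{
    assert(items != NULL || count == 0);
    if (count > 1)
        qsort(items, count, sizeof items[0], compare_near_first);
}
```

An application would call `sort_front_to_back` on its list of opaque objects once per frame (or whenever the camera moves enough to change the order) and then issue its draw calls in array order. Sorting whole objects rather than individual triangles keeps the CPU cost small, which matters given how much of the frame time the talk attributes to the application.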
Texture compression: texture compression is also a very good extension to use. It allows you to minimize the system memory bandwidth of moving that texture around, it saves you system memory itself, and it saves the bandwidth of moving that texture up to the graphics card. It can also benefit the graphics card itself by lowering the bandwidth it takes to read a texel out of the texture and render with it, because the card will do on-the-fly texture decompression and will better utilize the cache memory on the graphics card. So texture compression can be effective. It's really going to depend on where your performance bottlenecks are in your application, but it's a good one to keep in mind.

Multitexture is the extension for doing multitexturing like we've mentioned, utilizing more than one texture unit. Apple packed pixel, again, is the extension for the best way to pass pixel data to OpenGL; it allows you to get the best bandwidth utilization of pixel data into the OpenGL system. The other thing it does, actually, is save system memory. If you're able to store a texture in a more optimal format for your application, let's say 1555, obviously that's going to be half the memory utilization of an eight-bits-per-component texture. So that will give you some system memory savings and some bandwidth savings, and it'll also save video memory on the graphics card.

So, a quick summary, again going over some of the priorities. The thing we always tell people is that they need to optimize their application, because 75% of the time is typically spent in the application and 25% in OpenGL. So optimizing your application is going to be important, and you won't get good performance until you've gotten that to an acceptable level. Scale your application to the target platform: try to determine your VRAM, how much video memory is available; determine how much system memory is available; try to stay
within acceptable bounds that won't cause the system memory to go into paging; determine maybe the number of texture units on the graphics card so you can do multitexturing instead of multipass; look at your CPU type. Try to utilize a number of OpenGL extensions that will help simplify how the data is being passed to OpenGL, as well as potentially give you better effects. Allow the user to adjust the graphics settings, so that if the user is experiencing problems on a particular platform for one reason or another, they can vary the quality settings to get the performance that they're looking for. Obviously that's going to be a friendly thing to do for the user; it will give the user control over some of the aspects of how the application runs.

So for more information, there are two good books on OpenGL, and anybody that's doing OpenGL programming should have these books. One is the OpenGL Programming Guide and the other one is the Reference Manual. These books are invaluable and very well written, whether you're just starting with OpenGL or you're an expert; they are always sitting right next to me on my desk. For online help there are some good resources. You'll want to go to www.opengl.org. This is the official OpenGL web page, and it's got all kinds of neat news, announcements, and resources. It has lists of applications that are utilizing OpenGL, and documentation; it's got all the resources you'll need for finding out what's the latest in OpenGL. Then there's lists.apple.com, where you can join the Apple OpenGL list. There are lots of Macintosh-specific discussions going on in that list, where you can participate or learn from some of the discussions that are going on there, or send an email of your own and ask about a particularly difficult problem that you need answered. And lastly, we met Sergio at the beginning here, so if you have any questions about OpenGL at Apple you can contact Sergio, and here's his contact
information. Sergio is our product representative at Apple, and he can direct you to somebody in Apple if he's not the right person, or help you with some of your product needs. And lastly, after this session we have Advanced OpenGL Rendering Techniques. It's a very interesting presentation that will go into utilizing some OpenGL extensions for doing advanced rendering. I highly suggest it for people that are looking for new techniques and some of the capabilities of graphics cards today. It'll show you some interesting demos and some nice effects.