WWDC2000 Session 194
Transcript
Kind: captions Language: en

Good afternoon, ladies and gentlemen, and welcome to the session Tuning for Velocity Engine and MP. I'm Glenn Fisher. I run the performance marketing group in Worldwide Product Marketing at Apple Computer, and as such I have been involved with a lot of the Velocity Engine and MP technologies for the last several years. I'm delighted to have all of you here today. We have a great crew of presenters from Metrowerks, and I would like to thank them for the work they've done in preparing this session, and also Chris Cox from Adobe, who has been instrumental in helping us with some of the demos that you'll see today.

So why are we here? Two key technologies for Apple going forward are Velocity Engine and MP. The processor technologies that we've been developing make those technologies available to our customers now and well into the future, and we need to make sure that you're on board supporting those technologies, to get the best performance out of your application and to take advantage of the performance that's there in the machine for our customers. At the same time, we recognize that it isn't always easy to take advantage of these technologies, so we have been working closely with Motorola and Metrowerks over the last few years to make it as easy as possible for you to take advantage of these powerful technologies.

For information: how many of you have actually written Velocity Engine code, or AltiVec code? How many of you have played around with it? A few more, okay. And how many of you are here to find out how to get started? Great. It's wonderful to have new recruits to the crowd.

So what are we covering today? How to debug Velocity Engine applications using Metrowerks CodeWarrior, and we'll give you some specific examples of tuning code for Velocity Engine using Metrowerks CodeWarrior. We'll also give you a brief introduction to CodeWarrior support for MP coding and debugging. To do that, I'd like to bring up Bob, Bob
Campbell, who's a lead compiler engineer at Metrowerks. Bob? [Applause]

Hello. I have to admit I don't actually write a lot of AltiVec code, and that's one of the reasons why we're very grateful to Chris Cox for providing us one of his examples. I look at a lot of AltiVec code, but my tendency is to critique other people's and not actually write it.

Along that line, changing from scalar code to vector code requires that you put some thought into what you're doing, and the standard things that apply to scalar code still apply to Velocity Engine. So you should profile your code. If you're going to make something run fast, you should make sure it's something that you're actually spending time doing. So find your hot spots, look at them, and think about ways to rewrite them.

Another issue for AltiVec is that alignment is important. The engine will run much faster if you have 16-byte aligned data; if you've got things misaligned, then you're going to have to waste cycles realigning them.

Another thing is that because you're going to work on things in big chunks of four, eight, or sixteen elements, you can't really have an if statement that does something different on one element than it does on the other elements. So you need to take advantage of instructions like vsel that allow you to basically do conditional assignments within straight-line code with no ifs.

And probably a third point is that one of the most powerful features of AltiVec is the data streaming instructions. So if you're writing code that's going ripping through memory, then you ought to tell the CPU: hey, I'm going to be ripping through memory, and I'm going to be reading every so often. Tell it how much you're going to be reading and what your stride is. That way it will read the data into the cache before you even get there.

As the example that we're going to show you today illustrates, you can almost plan on restructuring your code. Algorithms that work
well in scalar code may be significantly different from the algorithms that you can get to work on vectors. The example that we have from Chris Cox is a rotate example, and it is interesting because you change the way you move through your data in order to enable you to speed up. We'll actually talk about that when we run the example.

Another point about AltiVec is that you see this great instruction that will do a sum-across and let you add four values together and produce a result. Well, you can't just go stick that in every time you need to do that one instruction. You need to be doing more than just using one vector instruction, because there's setup overhead, and you won't get back your setup overhead if you're only running one instruction. So it's not a good idea to write a bunch of tiny functions that each call one single AltiVec function.

Along with that point, one of the things that we've discovered is that there's some bookkeeping regarding saving AltiVec registers. So one of the comments has been: if you're going to do a bunch of AltiVec code, there's a pragma you stick in at the beginning that says, I'm going to be using AltiVec code. You generate all your code with all the function calls, and then turn it off at the end. That saves some of the intermediate bookkeeping regarding saving the registers, because it sets it up at the beginning and cleans it up at the end.

And last, you need to look for parallelism in your algorithms: places where you can do four things at a time or eight things at a time, instead of iterating through one at a time.

At that point I want to bring up Richard Atwell, and we're going to run through a demo of some rotate code from Adobe. Oops, and I'm in charge of the monitors.

So here, basically, Richard has launched the program in the debugger. We're at the beginning here? Yeah. And you want to run to the first breakpoint. The part that's interesting is the
original rotate algorithm just went through rows at a time, writing out columns at a time. It's a 90-degree rotate, so you're going along like this and you're writing down like this. Can you make the font bigger, Richard?

The change for the AltiVec algorithm is, instead of thinking about working in rows and columns, to take the input and break it into tiles, where each tile is a 16-byte by 16-byte square. So what we're going to do is load a 16-byte by 16-byte square into AltiVec registers, rotate it in the registers, and then write it back out where it goes. So instead of going along a word at a time, we're going to grab a chunk, rotate it, and write it back out.

So Richard is here at the function which is going to do this, and I'm not really sure how deep we want to go into the instructions, but it essentially... Do you want to step through it? Oh, we were going to show off... Richard wants to show us some debugger features, which is what he works on. So we've got a nice register window. You can look at all the AltiVec registers, you can step through code, and you can watch the registers as they change as it loads them. So basically at this point he's going to do one more step, and it has loaded this whole tile into memory.

And so then you're looking at Chris, going, you should explain that stuff. But it's actually a pretty neat algorithm, because instead of doing what I would have done, which is individually move the elements within the vectors around to the right spots, he used a trick with the merge instruction to merge partway, and then a second merge which puts everything exactly where it wants to go. And if you really want to know... are we going to make this code available? Okay, so we won't promise that yet. But it's a pretty nice case of thinking about the problem differently and looking at what AltiVec can do, and instead of saying, you know, move my elements
individually, I move half my elements halfway to where I want them, and then the second set of merge instructions moves them the other half. Scroll along. So basically that set of eight instructions causes everything to rotate. And I think Richard wanted to show a few debugger features.

Can you hear me now? Okay. So one of the problems that we had was how to represent the vector registers, because they're so large. What we decided to do is make use of the struct paradigm that we already have in the variable view, so you can take a look at all of the scalar elements that live within the vector elements, and you can modify these individually in order to help you with your debugging.

Because we have the ability to do that, we also wondered what we could do for breakpoints. We have a conditional breakpoint feature in the IDE, but because of the way Motorola specified the struct as being anonymous, you couldn't access the elements inside, so we had to invent some syntax for you to get inside those things. If you look down in the breakpoints window here, we have a conditional breakpoint. We're going to set a breakpoint on line 32, and it's going to be on the src1 variable. If you take a look at the syntax, it's src1, a dot, followed by what looks like array notation. This is just something that we've borrowed, because we're using this array-like notation to indicate the scalar elements inside each vector. So what we can do is change the PC and move it up here. If I go run, it's going to hit the breakpoint. I change it back again and change the condition, and it should bypass the breakpoint, as it did.

Any other vector features? The vector register window is so huge, so we made it scrollable, in case you're short on screen real estate on your PowerBook or whatever. Now, we don't have a G4 PowerBook yet, but when we do, you'll be able to grow the window and inspect register values and then shrink it back down, and it remembers the position you had it scrolled to before, so you can
kind of manage the debugging a bit better and not let the debugger get in the way of your programming.

Okay, so could we quit the debugger and run the non-debug version? Sure. I want to show this because Chris was kind enough to set us up with an example that runs the scalar code several different ways and then runs the AltiVec code, so we could just run that. And of course it's going to build it. The scalar code was what I was talking about earlier, where it goes through memory row by row, writing out columns. Then an additional one tried to do it the other way, writing out by rows but reading by columns, inverting the way you do it to look at the differences in the way it walks through memory. And I can hardly read that. No, it's not your fault, it's our terminal window.

The first attempt up there took, what, 4.3 seconds. The second attempt took 5.3... 5.2, and the third one was 5.1. Those were all scalar attempts at different ways of doing it. The one thing I like about this is the fact that Chris very religiously has a base case: he writes a new case, he keeps the base case, and he always keeps comparing the times to make sure that we're actually getting better.

The fourth case is the vector case, which is the one that we stepped through in the debugger, and that one took, what does it say there, 2.7 seconds. So not quite half. But the last one, which is actually the interesting one, was where he combines all the features. He added in the data stream touch instruction, where he tells the CPU which memory he's going to read ahead, and that one runs at 1.6, which is, what did we get to? We got better than half, right? Yeah.

So it kind of shows you that there are two parts to the Velocity Engine. One part is the vector unit, but to get full performance out of it, especially for data streaming operations,
you really need to take advantage of the cache hints and start telling it what you're going to read ahead of time. Richard, do you want to open the AltiVec... No, yeah, you're right, go here. There. So it's basically the data stream touch instruction. It takes the pointer, it takes a cache pattern (you need to read through the manual a little bit), and what's the one on the end, the last parameter? Oh, that's right, this is stream one. I should know this. So anyway, that's what I was going to talk about. I was actually going to save stuff for Q&A, and I want to bring up... let's push the right button... bring up Ken. He's going to talk about MP debugging.

Well, we've been doing a lot of work with the IDE and the debugger, trying to Carbonize it and get the tools ready for the Pro 6 beta CD that we've got here. But we've also kept Richard really busy working on other Apple technologies too: all the AltiVec debugging support you saw, and then also support for the newer flavors of the MP APIs. We had some MP support in the debugger before, but Richard's really revved it and made it work with the newer APIs. There's a new MetroNub extension and a newer plug-in, and some new user interface he's done too.

You've already seen some of this in the demo, but just to recap: there's the vector register window for AltiVec; you saw how vector registers and variables can be shown as structs for easy viewing; and then there's the work we've done to support that in the expression parser and also for conditional breakpoints.

For MP, we already had a paradigm of having separate thread windows in the debugger. You can also view all the threads in one window with the pop-up menu if you want to look at them that way. So we just use the same paradigm for MP tasks. We also list the MP tasks along with the cooperative threads inside the processes window, so you can look at specific processes and see exactly which MP tasks and
which cooperative threads they have. And then we've also tweaked the registers window interface a bit so that you can look at registers for separate MP tasks in separate register windows. So now I'll go back to Richard for a demo of some of the things I just talked about.

There we go. Okay, so if you were in the session this morning on Mac OS 9 and multitasking, George Warren was up on stage and he showed you this little demo called CloseView. It's a regular Mac OS program that's drawing into this window, but all the drawing is being done by an MP task. If you tried to do this without MP tasks, you probably couldn't keep the interface alive while dragging down the menus and so forth. So what we want to be able to do is debug something like this.

Previously we had two versions of MetroNub: we had an MP MetroNub and we had a regular MetroNub. We split those out a few releases ago because of some memory issues. Apple has fixed all those things, and since they've basically jettisoned the old MP 1.4 API, we've decided to jettison support for it too. So we combined the extensions together, and there's only one MetroNub extension for debugging. You don't have to swap this thing out to debug MP stuff, and it makes it a lot easier.

So I'm going to debug this thing. The program starts up, and the first thing I want to show you is the processes window, so we can see CloseView MP under control of the debugger here. We've got the main thread, which is a cooperative thread, and it's suspended. I'm just going to step through into the code to the point where we're just about to create the MP task to do the work. Can anyone read the text? Okay, I guess I'll make it a bit bigger. Here we go.

So what we're going to do is step over this line, and it's going to create an MP task. MP tasks are created in the runnable state, and in the main blit routine that's going to draw into that window, I've got a breakpoint set. So what
should happen is, when I step over this thing, we're going to get another thread window and we're going to hit the breakpoint. So here we go. Let me just resize the window here, because we had a little bug. So under the control of the CodeWarrior debugger we have the cooperative task that was drawing the interface, and we also control the MP task. These things are running completely independently of each other. So what I can do is step in the main task, then go back to the MP task, and I can step the two independently.

So let's get the main task running again. It's stopped at WaitNextEvent, so I'm going to take the breakpoint off and hit run. So we've got the program running back here again, except what's happening is we've got the breakpoint still in the MP task, so we're not getting any drawing happening. But the cooperative task is free to run, and that's what's drawing the window and letting us drag it around. So I'm going to go back to the window here and move this around a bit so you can see it in the corner.

The blitter is running in a loop, so I've got the CodeWarrior debugger stopping every time it hits the breakpoint. And you notice that the way this thing works is it updates the display underneath the mouse, so as I move the mouse around, hitting resume, we're going to get the display updating.

So let's go back to the processes window. We can see that we've got a few tasks: we've got the cooperative task, we've got our MP task, which reads as stopped, and the task above it is called the deathwatch task. It's something the OS creates, and it tears down all the MP tasks when the application exits. So let's go back to the breakpoints window here and temporarily disable the breakpoint, get the MP task running again, and that's updating. Let's set the breakpoint again; the next time we blit, it stops. So basically we've got MP 2.0 support for debugging. [Applause]

Okay, kill that. That's it. Okay, thanks, Richard. Mostly this stuff is a
little bit newer, and it's going to be finished up next week and then put up for download. So if you get the beta 6 tools CD that we have here at WWDC, and then just check our website in about a week, you can get the same stuff that Richard's been demoing here. Well, we'll have them outside the door after this session is over.

I'm sorry? We have the same architecture for showing multiple threads, whether it's Java or MP, but it's up to MetroNub to actually figure out what the threads are doing. Okay, that's it for our MP demo. So now here's Godfrey.

Howdy. I want to thank our Metrowerks friends for coming and showing us this stuff. We left a lot of time for Q&A today so that we could field your questions and just run it kind of informally from that point. So without further ado, we have three mics set up, and people should queue up at the mics. And a little road map of some more sessions: this afternoon we have Apple's performance tools for Mac OS X, and tomorrow we'll have a debugging session in the main hall at 9 a.m. So while we have people queuing up, would all of our presenters please step up to the stage.
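
Editorial sketch of the tiled rotate described in the demo. The code shown in the session was not published, so what follows is only a minimal, portable C illustration of the traversal change the speakers describe: instead of reading rows and writing columns one element at a time, the image is processed in 16-byte by 16-byte tiles that are loaded, rotated locally, and written back where they belong. The function names, the clockwise direction, and the requirement that both dimensions be multiples of 16 are assumptions made for this example; the real demo rotated each tile inside AltiVec registers with merge instructions, which this scalar stand-in does not attempt to reproduce.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TILE 16  /* tile edge in bytes: one 16-byte vector per tile row */

/* Baseline: rotate 90 degrees clockwise, reading rows and writing columns.
   src is height rows by width columns; dst is width rows by height columns. */
void rotate90_rowwise(const uint8_t *src, uint8_t *dst,
                      size_t width, size_t height)
{
    for (size_t y = 0; y < height; y++)
        for (size_t x = 0; x < width; x++)
            dst[x * height + (height - 1 - y)] = src[y * width + x];
}

/* Tiled variant (hypothetical name): load a 16x16 tile, rotate it in a
   small local buffer, then store whole 16-byte rows at the destination. */
void rotate90_tiled(const uint8_t *src, uint8_t *dst,
                    size_t width, size_t height)
{
    /* Assumption for this sketch: no partial tiles at the edges. */
    assert(width % TILE == 0 && height % TILE == 0);

    uint8_t in[TILE][TILE], out[TILE][TILE];

    for (size_t ty = 0; ty < height; ty += TILE) {
        for (size_t tx = 0; tx < width; tx += TILE) {
            /* Load the tile, one 16-byte row at a time. */
            for (size_t r = 0; r < TILE; r++)
                memcpy(in[r], src + (ty + r) * width + tx, TILE);

            /* Rotate the tile locally (the step the demo did in registers). */
            for (size_t r = 0; r < TILE; r++)
                for (size_t c = 0; c < TILE; c++)
                    out[c][TILE - 1 - r] = in[r][c];

            /* Store the rotated tile at its rotated position in dst. */
            for (size_t r = 0; r < TILE; r++)
                memcpy(dst + (tx + r) * height + (height - TILE - ty),
                       out[r], TILE);
        }
    }
}
```

Calling both functions on the same input and comparing the outputs is a simple way to follow the practice mentioned in the session of always keeping a base case to check new versions against.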