WWDC2003 Session 111
Transcript
Kind: captions Language: en

This is Session 111, Writing Threaded Applications for Mac OS X. As you know, writing a threaded application is one way that you can maximize resources and performance on any system, whether it's single or dual processor. This session will go over a lot of the tips and tricks, the do's and don'ts of writing a threaded application, why you would want to do it, and the times when you would want to write a thread. Hopefully you've been able to take a look at the resources online at the developer site; this session will supplement a lot of that information. I'd like to bring up George Warner, who will deliver the presentation this afternoon.

Thank you, Mark. Click here. OK, before I get off this slide, I just want to touch on my new job title, which some people have noticed and asked me about. For any of you who don't know, I'm the engineer formerly known as the Mixed Mode Magic Fragment Scientist. For obvious reasons those technologies have kind of faded out somewhat, so people have suggested that I update my job title. So it's Schizophrenic Optimization Scientist now. There's a little debate over whether it should be plural or not, but OK.

So, the agenda. We'll be covering some terminology, because there are some name conflicts in some of the threading models; the why and the why not; different architectures; the do's and the don'ts; then we'll be doing some demos, and then we'll break for some Q&A.

The original Thread Manager was a cooperative thread manager: everything was scheduled cooperatively, and we referred to them as threads. When we came out with the MP implementation in 8.6, we didn't want to call them threads, so we called them tasks. Now we're running on a UNIX system, where everybody knows what a task is, and there's a bit of a clash in the namespace. So for the purposes of today's presentation, when I'm talking about a thread, we're talking about an independent path of execution. Everybody is pretty much up on what a thread is. When I talk about a process, it's
a collection of threads plus their resources. It's basically an application or a task that's running. In the UNIX world you'd call it a task, but I have a tendency to call it a process. And I want to point out that multiprocessing is the general case of multitasking. Back in 8.6, when we introduced all this, I'd suggest to a lot of people, "Why don't you use an MP thread?" and they'd go, "Well, I don't have an MP box." It took me a while to educate the market and make them understand that multithreading really doesn't have anything to do with multiprocessing per se. It's just that if you do have another processor, the multithreading API can help you take advantage of it.

So, why use threads? There's a customer expectation that your application is going to be very responsive: when I click a button, something's going to happen. There's nothing worse than getting the little spinning beachball.

Scalability: like I said earlier, if you actually have a dual-processor box, customers really want to see both processors getting utilized. If you write your code threaded, there's a better chance that the second processor's resources will get utilized.

Way back when I first started working on the MP stuff, these benchmark numbers came out, and they keep getting regurgitated and reused every year. Anybody who's come to a previous session has seen these numbers. The really interesting number there is the top one: on a dual-processor box we went to 2.3 times. Everybody goes, "Wait a minute, how can you get better than two times the performance when you only have twice the horsepower?" That's exactly what I thought the first time I ran the numbers. So I ran the numbers again, and when I analyzed them and tried to figure out what was going on, what we finally determined was that we don't just have twice the processors, we also have twice the cache. Our data set just happened to not quite fit in a single processor's one-megabyte cache, but fit extremely well in a
two-megabyte cache, and so we actually got better than twice the performance. It's called superscalar.

So, why use threads? This is what I call the shelf space theory: the more things you have scheduled and running out there, the more CPU time you'll get. If there are two apps running at the same priority, they'll each get approximately fifty percent of the time. If you've got three threads running and there's only one other thread running at the same priority, you're getting three quarters of the time. So it's kind of a cheesy way to borrow more time. Not that I would ever do that.

Synchronous requests. Talk about the spinning beachball: if you call an API in your event loop and it stalls the processor for whatever the time is, all of a sudden you get the spinning beachball, and your customer goes, "Well, this isn't a very responsive app. I don't like this one." So you can put blocking calls in their own thread and let the main thread continue running, dispatching events, updating the screen, and letting the user do what he wants to do. When the synchronous call finishes, the thread can signal back to the main thread that, hey, I'm done with this, and you can update the progress bar and continue running.

Polling is bad. One of my favorite examples: if anyone has ever taken a vacation with a nine-year-old, this is the equivalent of "Are we there yet? Are we there yet? Are we there yet?", and it's about as annoying. Blocking is much better. You get away from all the asynchronous calls, playing with callbacks, dealing with "did it happen when I thought it would?", chaining events, et cetera. You can just put everything in its own thread.

A good example of this was the Finder copy, I think around 8.1 or so. Once upon a time you'd do a Finder copy and go get coffee, because the copy was the only thing it could do. Then a very clever engineer, I won't mention his name, it wasn't me, now, I did say clever
engineer, didn't I, wrote the asynchronous copy. Basically he chained all these I/O completion routines with a state machine that was about four pages of really complicated code, the kind that gave you a headache just being in the same room with it. It was really sweet. Everybody liked the fact that you could have multiple copies going on and all this kind of fun stuff. That was great up until 8.6 came out, when we took those four pages of state machine and replaced them with about six lines of code. We spun it off on another thread: open the source file, open the destination file, read from the source, write to the destination, loop until the end of file, close the source, close the destination, done. Pretty simple. The only other thing going on was updating the progress bar over in the Finder, and that was all happening behind its back. So it's very simple code, easy to maintain. We let the interns do that.

So, when to avoid threads. You want to avoid threads when it adds complexity. If you have global data in your application that you're going to share between multiple threads, you're going to have locks on it, and that can introduce deadlock conditions, so you've got to program to prevent that. Global data requires locks. If it's just shared, it may not require locks: if you've got one thread that's only reading the data, it may not require locking the data.

Another reason to avoid threads: if you have non-thread-safe APIs. There are alternatives; look into that.

Added overhead. It takes about 200 microseconds on a dual-gigahertz machine to create a thread, from the Mach thread on up, and the preemption time is about 40 microseconds on a dual gig. Interestingly enough, I had the numbers for the new hardware, but I couldn't put them in my slides for what should be obvious reasons: I couldn't rehearse to an audience that hadn't been briefed on the new numbers. On the new box, it's around 170 microseconds for creation, and it's still exactly 40 microseconds on the
new box, because preemption has to save registers that are twice as wide: not twice the number of registers, twice the width. So I was pretty happy to break even on the preemption time, but we'll continue working on it and hopefully bring that down some more.

The memory footprint. Each thread that you create gets a 512K virtual stack. Obviously that's not using your physical memory, but it will eat up your virtual memory space, so you don't want to create a whole bunch of threads and run out of memory.

Kernel resources. Every thread you create is a Mach thread and has kernel resources that we allocate to it: the hardware context storage, which is all those registers. I said 32 here, but now that probably changes to 64. It's about 2K per thread, and that memory is physical and it's locked down, so we want to avoid using too much of it.

Hundreds of threads. When I talked about the preemption and the 40 microseconds, that's cumulative. If you've got thousands of threads running, you're probably going to spend as much time preempting and switching between all those threads as running useful code. So you don't want to get crazy with spawning threads.

Other options: cooperative threads. If you've got non-thread-safe APIs, you can use the cooperative Thread Manager and schedule those cooperatively. You can use timers, Carbon timers, Cocoa timers, et cetera, to have tasks run at predetermined intervals.

So, threading architectures. Parallel tasks with parallel I/O buffers: the example of this would be if you've got multiple independent tasks that really don't have that much to do with each other. One of the very first multiprocessing projects I ever worked on, and I'm going to date myself, was 20 years ago. We were running a driving simulator. We had an AI that was running all the cars, we had a physics engine doing collision detection and the Newtonian physics, and then the graphics rendering thread, et cetera,
and when I tell people this, they say, "Wow, you guys were doing that 20 years ago? That's pretty impressive." Then I kind of qualify that: we were producing about five minutes' worth of video in a month. So things have progressed considerably. No real time back then, or not much. It was real slow time.

Parallel tasks with shared I/O buffers. This would be an example where you've got the same thing to do on a lot of different pieces of data, and you can split it up into little pieces. If you've got a graphic image, you can take little postage stamps and feed each little postage stamp off to a different processor or a different task. They all do their crunching, put their results in the output buffer, and it all gets put back together on the other side. This is the best model for dealing with exactly that case, and you typically want as many tasks as there are processors.

Sequential tasks with multiple I/O buffers. I call this the pizza oven; some people call it the assembly line. One of the other engineers in DTS wrote an application where the first input buffer was nothing more than some FSSpecs that you got from a drag and drop. The first task took the list of FSSpecs and, for each one, found out whether it was a file or a folder. If it was a folder, it would feed it back in; if it was a file, it would send it to the output of the first task. For a folder, it would iterate over everything in it and send each file to the output. The second task would take these iterated, flattened file specs and start reading the files in and streaming them out. The next task would take the data streaming through and compress it, and the next would send it over a network port, et cetera. On the opposite side of the network, it would uncompress it and then unflatten it. So it would basically do a backup across the network, and with the compression and everything else, we actually
got better throughput than the Finder did doing a flat, uncompressed copy across the network. This is an example of sequential tasks, where you're basically doing one thing after another after another to the same data.

So, threading architectures. Many applications have both parallel and serial, or sequential, execution paths. Basically, everyone has to know their app best and know which model works best with what they're doing. In a word processor, you could have a grammar or spell check running independently of the font, text, and kerning rendering, et cetera. In a driving simulator, like I mentioned earlier, you could have AIs driving the other cars at the same time that you've got a physics engine running, at the same time that you've got a rendering engine putting bits on the screen.

Implementations. I'm not going to go through the APIs one by one or anything like that. Whichever architecture you're programming in, there are thread implementations in all of them. Java has Java threads, Carbon has NSThreads, Cocoa has... that's all mixed up. They just wanted to see if I was paying attention. OK. So you use the one appropriate for your environment. I have to say, having looked at the implementation of these, the top layers are extremely thin. They're sitting right on top of pthreads, so I wouldn't worry too much about the overhead in that model, and I wouldn't go to pthreads unless you really, really, really want to.

The common concepts between the different threading models are basically thread management: creating threads, destroying threads, setting the thread priority or weight, et cetera. The synchronization primitives: you've got mutexes and semaphores and queues and event groups, et cetera, ways of communicating between different threads. And you've got thread-safe services: currently that's things like malloc and the memory allocators, file I/O, ... the thread state, et cetera.

So this is probably the most important part of my talk. The do's and the don'ts
are kind of a collection of all the things that I ran into from developers coming to me with issues, sitting down, going over code, and figuring out why we were having problems.

I mentioned earlier that the 200-microsecond creation time can add up if you're dynamically creating and destroying a lot of threads. A good way to avoid that is to preallocate and use pools. Same thing with memory, et cetera.

Try to be as data-driven as possible. You want your code to be CPU-driven, so you want to feed as much data to it as you can and have it ready to go when you get ready to run. Typically this isn't a problem. Most people write their code in this format: they have a prologue where they open files, allocate memory, et cetera, then crunch, crunch, crunch, crunch, and when they're done they close their files, dispose of the memory, et cetera.

Some of the implementations on other operating systems, for some reason, love doing the kill and cancel kind of thing, where they have one thread stopping another thread. Our system doesn't play well with what I call asynchronous behavior, and we didn't design it that way, for good reasons. If you want to take advantage of the way we do things, you really want to avoid using those kinds of APIs whenever at all possible. Use the synchronization methods instead: use a mutex or a semaphore, et cetera, to control it.

When you use semaphores or mutexes on a data structure, you have to kind of find a balance between having too many and too few. If you have one big huge data tree and you've only got one lock on it, then, if you've got multiple threads running, the chances of someone else needing that lock are a lot higher than if you put your locks down in the branches of the tree. But you don't want to lock everything, because you'll probably spend more time locking and unlocking things than actually accessing your data. So the trick is to
kind of find the balance.

Now this is back again to "Are we there yet?": avoid spin-waiting. One of the common mistakes I see is that occasionally we'll have a developer who wants to wait on two things at once. What they'll typically do is check one with a timeout, and after the timeout they'll check the other one with a timeout, and then they'll loop back and check the first one again. Now they're basically still polling: "Are we there yet? Are we there yet?" What you can do instead is have two threads, each waiting on a different event or a different mutex, and when either one of those threads gets its signal, it signals a third thread that "I've got my signal." When the third thread gets that signal, that's the OR case, and it can continue. If you're waiting on two things and you want both events to happen before you continue, then just block on one and wait for it to finish, and when it finishes, go and block on the other one and wait for it to finish. If the second one has already happened, you'll just continue running. So there's never a reason to poll.

GUI. This is another one of the areas I get a lot of questions on. It seems like for everybody that ever jumps in, the very first thing they want to do is try to do some GUI things. The good news is that things are getting much better. In the Mac OS 9 days it was just a definite don't. Nowadays we have Quartz, and you can draw output to the screen; we have OpenGL, and you can output to the screen. And if you're in Carbon, to pick on Carbon here: if, for example, you're doing that file copy and you want to update a progress bar, a real easy way to do that is to use PostEventToQueue. That way you can tell your main event thread to update the progress bar. One nice thing about the Carbon event model in particular is that I can send it 14 update events and it doesn't put 14 events in the queue. It's smart enough to know it has already got an update event in the queue, and it just leaves that one
in there, so you don't have to worry about overflowing the queue.

So I've got a demo. All right, this would be the non-threaded version, and usually the menus are dead and I can't drag the menu down. Oops, I could drag the menu down; it finished. OK, turn it back off again. I've got the thread viewer running down at the bottom so you can see what's going on. We've got three threads; we can see that down here. OK, so if I click here, you'll see it taking up all the CPU time down in the main thread here. I can't move the window, and there's the spinning beachball. So that's an example of a non-responsive application. If I try that threaded, you can see it's running over here, in the second thread over here. I can drag the window around, the menus work, and it's the behavior that your users are going to expect. All right, back to the slides.

As you can see, there hasn't been a whole lot of new information. The best thing we can say about what we've been working on is that we haven't broken anything that we know about. Everything that's there works the way it has, pretty well, and hopefully we'll continue working on that. I'll keep working on the preemption time to keep that down, but other than that, that's about the extent of it. Hopefully we'll cover anything else you want to know in the Q&A. So, thank you.

Just to give you a wrap-up of some of the sessions: Kernel Extension Programming Techniques was on Monday, so on the DVD you'll get the opportunity to take a look at that session. For contacts, there's George Warner in DTS; you may already be familiar with George through your contacts. You can also email myself, as the desktop hardware evangelist. Let's go ahead and start Q&A and invite a few folks up.
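
The advice from the do's and don'ts about waiting on two things at once, without polling, can be sketched roughly as follows. This is only an illustration in Python, not the Carbon, Cocoa, or pthreads APIs discussed in the session: the names wait_for_either and watch are hypothetical, and threading.Event stands in for whatever event or semaphore primitive your environment provides.

```python
import threading


def wait_for_either(event_a, event_b):
    """Block until either event is set, with no timeouts and no polling.

    Hypothetical helper: returns "a" or "b" for whichever event was seen first.
    """
    either = threading.Event()   # the signal the waiting (third) thread blocks on
    winner = []                  # records which watcher fired first
    lock = threading.Lock()      # protects winner

    def watch(name, event):
        event.wait()             # each watcher blocks on exactly one event
        with lock:
            if not winner:
                winner.append(name)
        either.set()             # wake the waiting thread

    for name, event in (("a", event_a), ("b", event_b)):
        # Daemon threads, so a watcher whose event never fires
        # does not keep the process alive.
        threading.Thread(target=watch, args=(name, event), daemon=True).start()

    either.wait()                # the OR case: continue as soon as one has fired
    return winner[0]


# Usage: only the second event is ever signaled, from a timer thread.
first, second = threading.Event(), threading.Event()
threading.Timer(0.05, second.set).start()
result = wait_for_either(first, second)
print("second" if result == "b" else "first")   # prints "second"
```

The AND case described in the talk needs no helper threads at all: call wait() on one event, then on the other; if the second has already fired, its wait() returns immediately.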