WWDC2003 Session 100
Transcript
Good afternoon, and welcome to the I/O Kit session. I'm Craig Keithley, the I/O technology evangelist in Apple's Worldwide Developer Relations group. When we started, about five or ten minutes ago, I could probably have identified everybody in this room; it's hard to go up against Xcode. One of the toughest things is properly architecting and writing I/O Kit kernel extensions to get optimum performance. You can improve your techniques by using multithreading. There's a common, I almost want to say, misconception that you can't do threading in kernel extensions. You can, and to go into that I'll bring up Godfrey.

G'day. My name is Godfrey van der Linden. I'm an I/O Kit architect, and like the vast majority of the people I expected to be here, I would probably prefer to be at the Xcode session right now myself. Also, the handouts suggest that I'll be talking about memory. Memory is interesting, but most of this presentation will be about threading. The new hardware has some interesting memory issues. I'll be available in the porting lab after this session and on Wednesday if anybody wants to talk to me about setting up memory maps on the new hardware, but there's no formal material on memory in this presentation.

So, as an introduction: Mac OS X has something like a hundred threads running at any one time, even when it's idle. I was running top on my system today and it had 140 threads going. That's what makes kernel programming so much fun. With 140 threads operating in that environment at once, you get some interesting, mind-bending problems, which is exactly why I enjoy kernel programming. In this session I'm going to discuss threading in general and how high-priority threads work inside the system, then I/O Kit threading, and finally the synchronous
teardown of device drivers on a hot unplug. That's not really threading, but if you don't know how to do it properly it's very easy to get nasty explosions.

So what I'm hoping you'll come away with is a better understanding of three things. First, how threads are scheduled on Mac OS X. Second, how I/O Kit does its synchronization. The I/O Kit synchronization model is very unusual; I've never seen it in any other operating system, probably because it's my invention, and I think it's very cool. Third, how Mac OS X handles hot unplug. It took us two attempts to get that right, and I don't think it has really been presented before. It works very well, so this is its presentation.

The first part of the presentation is on threading, specifically how threads work inside the operating system. I won't be talking so much about threads in the kernel as about how threads interact, how the scheduler interacts with them, and how the dispatcher works. Mac OS X doesn't differentiate between kernel threads and user threads. Yes, we have different priority bands, but threads in every band operate the same way. So over the next few slides I'll discuss:

- the thread priority bands,
- how the dispatcher works,
- what the scheduler does,
- what it means to take high priority, because it's different from what most programmers think,
- and priority inversions, which are a very standard problem that we're seeing now.

These are the priority bands. As you can see there are quite a few of them, and a couple aren't really threads at all. Primary interrupts and the idle thread aren't really threads, but you can consider them thread contexts. When a primary interrupt is running, no thread is running, and when the idle thread is running, by definition, no thread is running. The band we would like you to be in is the regular user band. But most high-end hardware (and let's face it, that's what I/O Kit is all about) tends to have tighter requirements, and they'd like
to go for higher priorities. We're now experiencing a lot of what I call priority arms races: people go for higher and higher priorities, and it's really degrading overall system performance. So I'm hoping in this presentation to convince you to get out of the real-time band if at all possible and down into the top of the user band. I think that's probably the best place for most of us to be.

OK. Mac OS X is a dispatcher-based system. There's a subtle difference between a dispatcher and a scheduler. We do have some scheduling, but we're basically a dispatcher-based system. What that means is that the dispatcher takes the thread that's currently executing, blocks it, then selects the next thread and runs it.

So what does blocking mean? As I said earlier, I had 140 threads running, but in fact they were all blocked. They were waiting for a user event, waiting for some I/O to complete, waiting for the lid to close or open, or for the battery to run dead, for all I know. That's what we mean by blocked. In our operating system those threads are put onto wait queues and we just ignore them. So yes, we have 140 threads, but they're really all asleep, which in my opinion is the best way to have a thread.

The next thing that happens is preemption. What is preemption? Essentially, you've used up your quantum, and the system says: OK, I'm going to put you at the end of your priority band's run queue (we'll discuss run queues in a second) and select the next thread to run. Every time the dispatcher is invoked, it selects the highest-priority thread available and runs it. That has some potentially very nasty side effects. If you have an infinite loop at high priority, that thread never blocks, and nothing at lower priority will run. So let's say you overload the real-time band. Remember, the real-time band is the highest-priority band in the system. If that
overloads, you're not going to get any time at all for the I/O band. The I/O band is probably where you're trying to store your data onto a disk or read it off a disk. So you've just stopped the system from doing what you need done, and that's probably not what you're after.

So what does the scheduler do? Its job is sort of an oversight committee. We do have a scheduler, and it will get better in the future. The scheduler we have right now is essentially for timesharing: when your thread has run for long enough, we lower your priority a little. That doesn't necessarily mean you're going to start running slower, at least not straightaway. If no other runnable thread is competing with you, you keep going, and the lower priority makes no difference. If you are competing with another thread, however, the scheduler's job is to make sure that the system balances its load appropriately.

The other thing is that aforementioned spinning real-time thread. We had this problem early on. A spinning real-time thread will not give any time to the rest of the system, including the keyboard, so you can't stop it. So one of the scheduler's jobs is to say: hey, this real-time thread is taking far too much time. In that case it turns the thread over to timeshare and notes that it has been running for eight seconds, so its priority gets depressed very quickly. That's a good thing, because it means you can use kill -9 and get rid of the thing.

This came quite late in the original development of Mac OS X. When we got real-time threads, it was quite a common problem, because everybody has written infinite loops. A real-time infinite loop meant taking the big hammer out and hitting reboot, which was really painful before we had journaling file systems.

OK, so what does the scheduler usually do? Essentially, the communication mechanism between the scheduler and the dispatcher
is the run queues. Earlier I mentioned that the dispatcher finds the highest-priority runnable thread in the system and runs it; that's what the run queues are for. Logically you can think of it as one run queue per priority level. A runnable thread sits in one location in the run queues, and the scheduler just manipulates locations in the run queues. The scheduler also collects statistics, so that you can use top, latency, and a number of other tools to find out what the system is really doing for you and on your behalf.

One example of what the scheduler does is timeshare, as I mentioned: if your thread has run for enough quanta, we drop your priority. As I said, dropping your priority isn't really a bad thing. It's only bad sometimes. Suppose you need that much CPU power and the user launches another task; your thread will fall back. Well, the user did launch that other task, and perhaps they really do want it to run. So let timeshare do its job. The exception is when you're certain that the user really cannot afford to let any CPU go to other tasks, in which case you would use different policies. And, as I said, handling misbehaving, infinitely looping real-time threads is another example of what the scheduler does.

So what does high priority really mean? Nothing can make slow code go fast. If your code is slow, high priority will not make it faster. You will get slightly more CPU time, but it really is measured in percent, maybe one or two percent more. Higher priority won't give you faster code; the only way to get faster code, I'm afraid, is to run it through performance analysis and clean it up. It's very easy to write bad algorithms, unfortunately. What high priority does give you is a reasonable
chance of running with very low latency. Say your thread is blocked and a MIDI event comes in: Mac OS X will probably get your thread running in less than a millisecond on average. As for maximum jitter, I haven't measured this for a while. The last time I saw it, the maximum jitter for the real-time band with no competition was about 600 microseconds. Unfortunately, in the real world there is always a little other competition in that high band, so I think we're running jitter of around 3 milliseconds. Again, I'm not sure of the exact numbers.

Of course, with high priority you can very easily end up using so much CPU time that the low-level paths of the system, like disk I/O, get no time at all. Recently a developer raised a problem: they were using so much high-priority time that the FireWire work loop, which is an I/O thread, wasn't getting enough time even to acknowledge packets on the bus. When that happens you start getting weird little disk errors. The system hasn't got time to clean up, because you're using all of the time at high priority. We call that a priority inversion. Priority inversions are really hard to get rid of, and that's the biggest problem with arms races: once you're in a priority inversion, fixing it may take a very fundamental redesign of the way you've set up your workloads.

So how do you decide your thread priority? It really comes down to your latency requirements, not the performance you're after. User interface events really do need very low latency; take the keyboard for a MIDI sequencer, for instance. A human has decided to move a finger, and if the person doesn't hear the result within a certain amount of time, then the
keyboard feels wrong. That time is very short. In computer terms it seems enormous, about five milliseconds, but five milliseconds isn't very long on a modern operating system, especially since our standard quantum is 10 milliseconds. So if you really do need extremely low latency, that's when you go for high priority, but you want to be certain that you need it.

If you're reacting to data off the Internet, frankly, who cares about latency? You're dealing with 30-second timeouts. I'm not suggesting that you go timeshare; I don't think timeshare is appropriate if you're doing some sort of stream-based information processing. However, you probably don't need to be real time, because the Internet itself is arbitrary. Finally, there's the moderately low-latency case: you're waiting for a local resource, something off the disk, local FireWire, or something like that. It's low latency without being ultra-low latency.

So that's how you would use the bands:

- For extremely low latency, I suggest the time-constraint policy, the real-time band.
- For the who-cares case, I would probably use the high user band, possibly below the Carbon async threads, but you can experiment a bit.
- For local disk work, I suggest you go to the top of the user band and disable timeshare altogether.

You can look all of these up on the ADC website to find out how to do it.

Priority inversions: "I want the highest priority, except when I don't." Priority inversions can happen almost anywhere in the system, but the most common ones we're seeing are, again, in the real-time band. Mac OS X's real-time band is very good and extraordinarily powerful. But to get really low maximum jitter we've had to give you enough power to effectively hang the system, and that means your code now has to be far more
complicated, because you have to work out how to back off your high-priority thread to give the rest of the system some time. Traditionally, on most operating systems (Mac OS 9 and Windows, for instance), I/O is really high priority. There's nothing you can do to get it out of the way, and you have to take whatever jitter is around. With Mac OS X we deliberately chose to make real-time threads the highest-priority threads in the system, higher even than I/O. That gives you extraordinarily good jitter characteristics, but it comes at the cost of complexity.

So there are a couple of priority inversion strategies. The best strategy of all is to get out of that high-priority band. If you're experiencing priority inversions, drop your priority if you can, provided the resulting jitter is acceptable. Measure it. There are some wonderful tools in the system; my favorite is latency, which shows you a histogram of the latencies in the system. If you have really hard numbers for the performance you need, latency will tell you which priority band will work well on your target system. So if you can lower your priority, that's the best thing ever.

If not, you're going to have to complicate your algorithm. You'll need to split it into a producer/consumer model, where small amounts of work are done at very high priority and larger amounts of work are done at low priority. Take streaming off a disk, for instance. I don't usually recommend having multiple threads, but this is the time to use them. You'd have a lower-priority thread feeding a high-priority thread, at the cost of introducing some latency. The high-priority thread just takes whatever data it needs when it's available. That's a producer/consumer. It's complex, but it works very well indeed.

The worst choice, and it's really bad because it doesn't give you 100% of the CPU, is to
deliberately say, I'm going to let the system have some time. So approximately every 10 milliseconds, or every buffer or two, however your workload is divided, you go to sleep for a millisecond. That guarantees the system, or at least some other threads, some time. I don't like that. The problem with this solution is that if you're not competing with anything else, the extra 8 or 10% you gave away is simply gone; you can't use it. And you're only doing it to save yourself the complexity of a good producer/consumer queue, or of lowering your priority in the first place. You see, if you lower your priority, you can use 100% of the CPU.

There's a really nice anecdote. Early on, iTunes used timeshare threads for its ripping, and the thread drops in priority because it uses 100% of the CPU. I recently did this myself with AAC encoding: I was ripping my entire record collection to 128 kbps AAC. The system stayed very responsive while it was going, yet I never had any idle time at all; it was at 0% idle. That's because the ripping ran at a very low priority. So lower-than-regular priority is actually a good thing if you're using 100% of the CPU.

Anyhow, that's just the introduction to threading. There's a lot more to be said, and I could talk for hours, but unfortunately I don't have them, so we'll have to move on.

The next section is the I/O Kit work loop. This is essentially how I/O Kit does its synchronization, and in this part I'll discuss the work loop and event sources. If you're writing a traditional I/O Kit driver, this is the mechanism we recommend, and it's really quite hard to avoid. Unfortunately, "work loop" itself is an unfortunate name. The way it was originally designed, we did have a thread that all I/Os went through. We could guarantee single-threaded access to hardware because we only had
one thread that talked to the hardware. The difficulty is that the I/O system was taking context switches, which slowed down all I/O. So we came up with the idea of the gate, which lets us schedule I/O on the hardware directly without taking a context switch. And what's a gate? The gate is a lock, a recursive lock. It's not very complicated at all, and it's sort of obvious, but it took us a while to come up with it, and it made a big difference to our performance.

So what a work loop really is on our system now is three things. It's a container for the gate, which is a recursive lock. It's a list of event sources that need to synchronize with respect to that lock. And, by the way, it has a thread. Yes, OK, it has a thread. In fact the thread is optional: one day I'm going to get rid of it and only create it if you have interrupt event sources. The single threading is provided by the work loop's gate being closed across all event source action routines; I'll define that term in a little while.

Traditionally, the Unix solution for MP is to have one big lock, one uber lock, that protects the whole operating system. Whenever you need to do anything, you take the uber lock, and you're safe until it gets dropped. There's only one lock, so naturally you get contention, and only one thread can run in the kernel at a time. At the other extreme is Mach. Mach has hundreds and hundreds of micro locks and extraordinarily complicated locking hierarchies to make sure you take locks in the right order. Lots of tiny little locks is great, but they're very heavy. The hierarchies are nasty, too: locks have to be taken in one direction, which makes completion routines in I/O systems painful. So we needed to come up with something different.

What we came up with is the work loop scheme: we have one work loop, one gate as it were, per major interrupt delivery source in the system. A PCI SCSI card, for instance, has a work loop; a USB controller has a work loop; a FireWire controller has a work loop. So on a typical running system we have maybe 13 work loops. It's a compromise between the hundreds of micro locks that Mach uses and the two uber locks that BSD uses. It turns out to be very powerful, because it allows us to deliver completion routines: all of our drivers stack on top of that one lock.

By far the majority of our drivers, as I say, don't create their own work loop; they use their provider's work loop. If you've used I/O Kit for any length of time, you'll have seen the client/provider model and client/provider stacking, and you'll see that this statement is recursive. If I call my provider's getWorkLoop and the provider doesn't implement it either, it calls its provider's. Eventually you get down to the bottom of the system, which says: here's the work loop, use this. So high-level drivers always synchronize against the bottom of the system.

As I mentioned, only PCI device drivers and motherboard device drivers tend to create work loops. In most cases your hardware will not need a work loop, and it's probably better if you don't create one. In fact, if you create a work loop that builds on top of another work loop, you can be in for a whole world of hurt. I'm sure we'll have a RAID developer around here; if you want to see somebody who really experiences pain, discuss device teardown with a RAID developer.

Because the statement is recursive, there has to be a way of terminating the recursion: there is a system work loop that you get just by walking down the stack. Eventually you hit the root of the provider tree, and bingo, there's a work loop. It's not a bad work loop to use, and we really do encourage you to use it, because we'd like to limit the number of threads in the
system. That's a good thing for system performance. However, it's a shared resource, so don't be too greedy with it. If you expect a lot of interrupts or you have very tight timing requirements, it's probably better not to use the system work loop but to create your own.

So, event sources. An event source has an action routine, which I'm about to define; essentially, it's a routine that is synchronous with respect to the work loop. All event sources have an action routine and an owner, and they are usually registered on a work loop. In fact, an event source is really only meaningful when it's registered on a work loop. But people can temporarily register one, remove it, and register and remove it again, because registration is a fairly lightweight operation. An action routine is just a callout function. When you create an event source, you're saying to the system: I expect this event to occur at some time in the future, and when it does, call this function.

All action routines are synchronous with respect to all the event sources registered on a particular work loop. If you're familiar with Java, you may have seen synchronized methods: you mark a number of methods in a class as synchronized, and only one of them runs at a time on a given object. That's how I think of event source actions. All of the event sources up and down the entire stack are synchronous with each other. That sounds like a recipe for contention, but it hasn't proved to be so far.

There are some tricks you need to be aware of, though. In general, don't go to sleep while you're in an action routine; very bad things happen. Again, we recently found a driver that went to sleep in an action routine for eight milliseconds, which introduced eight milliseconds' worth of latency. We do have ways of pointing
fingers in the system, so you won't get away with it for long.

When you register an event source with the work loop, generally all you do is call IOService::getWorkLoop to find the work loop to add it to. That's the entry point into the recursive statement I mentioned; it's how you find the work loop. We'll cover that more later.

So, the first event source. I was about to say real hardware developers, which is my own background, unfortunately. What most PCI hardware developers think of first is: how do we get interrupts? It's one of the fundamental things that varies from OS to OS. Our filter interrupt event source is the mechanism we recommend for PCI hardware.

The event source delivers hardware interrupts to a driver: it takes the interrupt and causes the work loop to schedule. In fact, it's the only thing that causes the work loop to schedule. At primary interrupt time it's very quick. It just increments a count, tells the work loop it has some work to do, kicks it, and returns, which automatically gets back into the dispatcher I mentioned earlier. The dispatcher looks for the highest-priority thread in the system, finds the work loop, and the work loop starts running. So the latencies are very short, and filters generally don't have to do any work at all.

We do recommend that you always implement a filter, though, because you don't know whether your hardware will be in a shared chassis, sharing an interrupt. When you're sharing interrupts, it's a very good idea, if your hardware supports it, to say "this wasn't me": just return false from the filter. The action routine is synchronous with respect to the work loop (you're going to hear that statement a lot), but the filter is totally asynchronous. It runs at primary interrupt time, so you have to do special things to synchronize with it, which
is why I recommend single-producer, single-consumer queueing or something of that nature between the filter routine and the action routine. If you need to synchronize with the filter routine, you've got to be very careful.

The other major event source is the timer event source. There are lots of reasons to use timers. Polled-mode drivers are one; we don't recommend them, but people write them. The most common reason, though, is hardware timeouts: oh dear, nothing has responded in 30 seconds, I have to do something. The I/O Kit timer event source is built on top of the kernel's thread call APIs. They're wonderful APIs, I just love them: they're very lightweight, and they're a great solution.

There is a problem, though. If you remember my earlier diagram, thread call threads are very high priority, higher than work loops. That means that if your timeout and your interrupt occur at exactly the same time, the timeout will be scheduled first. So the best thing is to check in the timeout code whether your hardware is done. If it is, fine: you've beaten the interrupt before it got delivered. If not, a timeout really has occurred.

OK, here I have to make an embarrassing admission. This is my bug; it's been my bug for a long time now, and I will fix it soon. There is no synchronous way of canceling a timeout. It's just painful, it's embarrassing, and I'm turning red up here. The safest way to delete a timer is to let it expire and then delete it on another thread; don't rearm it. I have to give you the warning because it's the big caveat with these things. It's really a problem and I'm hoping to fix it, but I can't go back in time and fix it in Jaguar and Cheetah, I'm afraid. If your drivers have to run on older Puma and Jaguar systems, then you are going to have to let the timer expire. And guess what: the timer's action routine is synchronous with respect to the I/O work loop, same as usual. OK, the
command gate. Command gates are rather interesting. A lot of people think a command gate is a lock. It isn't really; it's just a sort of container, a pointer to the lock that's in the work loop. Remember I said the I/O work loop should really have been called the work gate? The command gate gives you access to that gate, so for all the command gates on a particular work loop there is still only one gate. Command gates allow you to run code synchronously with respect to the work loop but without a thread switch. The command gate just takes the gate, lets you run some code, and then drops the gate, fairly quickly.

Now, I admit that the runAction/runCommand API is clunky, especially if you're used to writing locks: take the lock, drop the lock, take the lock, drop the lock. But it turns out runAction has come to our rescue several times. First of all, debugging a recursive lock where the lock/unlock pairs are mismatched is really painful. With runAction you can't get it wrong, because it's a subroutine. It takes the lock, calls your subroutine, and releases the lock on the exit path; there's no avoiding it.

The other thing it gives you shows up when you use showallstacks, a really wonderful command for tracking down deadlocks and other problems in a running system. showallstacks shows the runAction calls; they're right there, and we have caught so many deadlocks because showallstacks shows runAction in the backtrace. If you just take a lock, you have to memorize everybody else's drivers, even the ones you didn't write. You'd have to say: oh look, this routine 15 levels down the stack takes a lock, and I know that because, well, I can read minds. With runAction you don't have to read minds; there it is in the backtrace.

OK, this is the really cool part about command gates: commandSleep and commandWakeup. It's another thing that came a bit late. When a client thread is calling
into your driver, it often says: hey, I want some data. Your hardware may not have any data available yet, for whatever reason; for streaming, say, the device you're talking to is slow. What you can do is block the client thread by calling commandSleep, and it will block until some event occurs. This is in fact the blocking mechanism I was describing for the dispatcher: it's how you block a thread until some event occurs. There are lots of other ways of doing it, but this is the one that's built into the way the command gate does its job.

Data acquisition drivers are a typical case. We don't have any direct hardware callouts. One of the most common requests we get is: we can't write our application because the interrupt routine doesn't call out into user land. Well, no, we're not going to call out into user land; we can't allow that thread to disappear into some code that we don't trust. But commandSleep and commandWakeup give you something very close. Suppose you have a sufficiently high-priority thread blocked in commandSleep. When you take your interrupt in your action routine and the hardware says it has some data available, you use commandWakeup to wake up the thread, and it's just so fast it's amazing. So you can use commandSleep to emulate interrupt callouts to user land. It's your application, so you provide the thread, block it in your kernel extension using commandSleep, which is very lightweight, and wake it up using commandWakeup.

So that's it for work loops. What we're about to do next is show how we use this stacking model to synchronously tear down device drivers.
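The dispatcher behaviour described in the threading part of the talk can be modelled in a few lines. The model has one logical run queue per priority level. The dispatcher always runs the highest-priority runnable thread. A preempted thread goes to the tail of its own band's queue, and blocked threads are simply absent from the run queues. This is a hypothetical user-space sketch with made-up names (`RunQueues`, `ModelThread`) and illustrative priority numbers, not the Mach scheduler's actual data structures or band boundaries. It also shows the starvation problem the talk warns about.

```cpp
#include <array>
#include <cassert>
#include <deque>
#include <optional>
#include <string>

// Hypothetical model: one FIFO run queue per priority level (0 = lowest).
constexpr int kNumPriorities = 128;

struct ModelThread {
    std::string name;
    int priority;  // illustrative values only, not real band boundaries
};

class RunQueues {
public:
    // A thread that becomes runnable joins the tail of its band's queue.
    void makeRunnable(const ModelThread& t) { queues_[t.priority].push_back(t); }

    // Dispatcher: take the highest-priority runnable thread, if any.
    std::optional<ModelThread> dispatch() {
        for (int p = kNumPriorities - 1; p >= 0; --p) {
            if (!queues_[p].empty()) {
                ModelThread t = queues_[p].front();
                queues_[p].pop_front();
                return t;
            }
        }
        return std::nullopt;  // nothing runnable: the idle "thread" would run
    }

    // Preemption: the quantum is used up, so go to the end of your own band.
    void preempt(const ModelThread& running) { makeRunnable(running); }

private:
    std::array<std::deque<ModelThread>, kNumPriorities> queues_{};
};

// Demo: a high-priority thread that never blocks is picked again on every
// dispatch, so the lower-priority "io" thread never runs.
inline std::string demoStarvation(int rounds) {
    RunQueues rq;
    rq.makeRunnable({"io", 80});
    rq.makeRunnable({"spinner", 97});
    std::string ran;
    for (int i = 0; i < rounds; ++i) {
        std::optional<ModelThread> t = rq.dispatch();
        ran += t->name[0];  // record the first letter of whoever ran
        rq.preempt(*t);     // it never blocks, it just uses up its quantum
    }
    return ran;
}
```

For example, `demoStarvation(3)` yields `"sss"`: the spinner runs every time. Two threads at the same priority, by contrast, alternate, because preemption puts each at the back of the same queue.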
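The talk recommends a producer/consumer split as a fix for priority inversions: a lower-priority thread does the bulk work (reading from disk, say), and a high-priority thread just takes whatever data is ready. The following is a minimal portable C++ sketch of that split, assuming a small bounded buffer. The class name `BoundedBuffer` is hypothetical. Standard C++ threads have no portable way to set priorities, so the priority difference is only indicated in comments. On Mac OS X you would raise the consumer's priority with Mach thread policies (for example `THREAD_TIME_CONSTRAINT_POLICY` via `thread_policy_set`), which this sketch does not call. One design choice is deliberate: the high-priority side uses `try_to_lock` and never waits on the producer, so it cannot be held up behind the low-priority thread.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <thread>
#include <vector>

// Hypothetical bounded buffer between a low-priority producer (bulk work)
// and a high-priority consumer that must never block.
class BoundedBuffer {
public:
    explicit BoundedBuffer(std::size_t capacity) : capacity_(capacity) {}

    // Producer side (low priority): may block while the buffer is full.
    void push(int chunk) {
        std::unique_lock<std::mutex> lk(m_);
        notFull_.wait(lk, [&] { return items_.size() < capacity_; });
        items_.push_back(chunk);
    }

    // Consumer side (high priority): never blocks. If the lock is busy or no
    // data is ready, it returns nothing and the caller handles the underrun.
    std::optional<int> tryPop() {
        std::unique_lock<std::mutex> lk(m_, std::try_to_lock);
        if (!lk.owns_lock() || items_.empty()) return std::nullopt;
        int chunk = items_.front();
        items_.pop_front();
        lk.unlock();
        notFull_.notify_one();  // let a waiting producer refill the slot
        return chunk;
    }

private:
    std::size_t capacity_;
    std::mutex m_;
    std::condition_variable notFull_;
    std::deque<int> items_;
};

// Demo: the producer feeds `count` chunks in order. The consumer polls until
// it has them all; a real consumer would poll on its own deadline instead.
inline std::vector<int> demoStream(int count) {
    BoundedBuffer buf(4);
    std::thread producer([&] {
        for (int i = 0; i < count; ++i) buf.push(i);
    });
    std::vector<int> received;
    while (static_cast<int>(received.size()) < count) {
        if (std::optional<int> chunk = buf.tryPop()) received.push_back(*chunk);
        else std::this_thread::yield();  // underrun: nothing ready this time
    }
    producer.join();
    return received;
}
```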
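The talk describes the work loop as a few pieces working together:

- a recursive lock (the gate),
- a list of event sources whose action routines all run with that gate closed,
- and command gates, whose runAction runs extra code under the same gate without a thread switch, releasing it on every exit path.

The following is a simplified user-space model of that idea. `ModelWorkLoop`, its nested `EventSource`, and the `runAction` shown here are hypothetical stand-ins, not the real `IOWorkLoop` or `IOCommandGate` classes, whose signatures differ. The demo shows that the gate alone keeps an unprotected counter consistent across threads. It also shows that re-entering the gate from the same thread is fine and that an exception cannot leave the gate closed.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>
#include <stdexcept>
#include <thread>
#include <vector>

// Hypothetical model of a work loop: a recursive gate plus event sources.
class ModelWorkLoop {
public:
    struct EventSource {
        std::function<void()> action;      // the "action routine"
        std::atomic<bool> pending{false};  // set asynchronously, e.g. by an interrupt
    };

    // Registration is cheap: add the source to the list under the gate.
    EventSource* addEventSource(std::function<void()> action) {
        std::lock_guard<std::recursive_mutex> g(gate_);
        sources_.push_back(std::make_unique<EventSource>());
        sources_.back()->action = std::move(action);
        return sources_.back().get();
    }

    // runAction analogue: close the gate, call fn, and reopen the gate on every
    // exit path (normal return or exception), so lock/unlock can't be mismatched.
    template <typename Fn>
    void runAction(Fn&& fn) {
        std::lock_guard<std::recursive_mutex> g(gate_);
        fn();
    }

    // What the work loop thread does when kicked: run each pending action
    // with the gate closed, so all actions are serialized with each other.
    void runPendingActions() {
        std::lock_guard<std::recursive_mutex> g(gate_);
        for (auto& s : sources_)
            if (s->pending.exchange(false)) s->action();
    }

private:
    std::recursive_mutex gate_;
    std::vector<std::unique_ptr<EventSource>> sources_;
};

inline int demoGate() {
    ModelWorkLoop wl;
    int counter = 0;  // deliberately not atomic: only the gate protects it
    ModelWorkLoop::EventSource* src = wl.addEventSource([&] { counter += 1000; });

    auto client = [&] {
        for (int i = 0; i < 10000; ++i)
            wl.runAction([&] {
                wl.runAction([&] { ++counter; });  // re-entry is fine: the gate is recursive
            });
    };
    std::thread a(client), b(client);
    src->pending = true;     // an "interrupt" marks the source as having work
    wl.runPendingActions();  // runs that action once, with the gate closed
    a.join();
    b.join();

    try {
        wl.runAction([] { throw std::runtime_error("simulated hardware error"); });
    } catch (const std::runtime_error&) {
    }
    // If the exception had left the gate closed, this other thread would block forever.
    std::thread c([&] { wl.runAction([&] { ++counter; }); });
    c.join();
    return counter;  // 2 * 10000 + 1000 + 1
}
```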
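The talk gives a recursive rule for finding a work loop. A driver that does not create its own work loop asks its provider. The recursion bottoms out at the root of the provider tree, which supplies the system work loop. A minimal model of that lookup follows. `ModelService` and `WorkLoopId` are hypothetical names; in I/O Kit the corresponding call is `IOService::getWorkLoop`, which this sketch only imitates.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a work loop; only its identity matters here.
struct WorkLoopId {
    std::string name;
};

class ModelService {
public:
    explicit ModelService(ModelService* provider, WorkLoopId* own = nullptr)
        : provider_(provider), own_(own) {}

    // Use our own work loop if we created one; otherwise ask our provider.
    // The root of the tree (no provider) hands out the shared system work loop.
    WorkLoopId* getWorkLoop() const {
        if (own_) return own_;
        if (provider_) return provider_->getWorkLoop();
        return &systemWorkLoop();
    }

    static WorkLoopId& systemWorkLoop() {
        static WorkLoopId wl{"system"};
        return wl;
    }

private:
    ModelService* provider_;  // next driver down the stack, or nullptr at the root
    WorkLoopId* own_;         // non-null only for drivers that create a work loop
};
```

In a stack where only the PCI SCSI controller creates a work loop, a partition driver several levels up gets the controller's work loop. A driver attached directly under the root gets the system work loop. This mirrors the talk's advice that most drivers should not create their own.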
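The talk recommends a filter/action split for interrupts, with a single-producer, single-consumer queue between the two halves. The following sketch illustrates that pattern under stated assumptions. The filter stands in for code that runs at primary interrupt time. It reads a simulated status register and returns false if the interrupt wasn't ours. Otherwise it pushes the data into a lock-free ring and returns true to ask for the action routine. The action then drains the ring on the work loop. The names `SpscRing`, `ModelDevice`, `modelFilter`, and `modelAction`, and the status-bit layout, are hypothetical. Real drivers use `IOFilterInterruptEventSource`, whose filter likewise returns a bool saying whether the action should run.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Lock-free single-producer/single-consumer ring. Producer = filter routine,
// consumer = action routine. One slot stays empty to tell "full" from "empty".
template <typename T, std::size_t N>
class SpscRing {
public:
    bool push(const T& v) {  // called only by the producer
        std::size_t head = head_.load(std::memory_order_relaxed);
        std::size_t next = (head + 1) % N;
        if (next == tail_.load(std::memory_order_acquire)) return false;  // full
        buf_[head] = v;
        head_.store(next, std::memory_order_release);  // publish the slot
        return true;
    }
    std::optional<T> pop() {  // called only by the consumer
        std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) return std::nullopt;  // empty
        T v = buf_[tail];
        tail_.store((tail + 1) % N, std::memory_order_release);  // free the slot
        return v;
    }

private:
    std::array<T, N> buf_{};
    std::atomic<std::size_t> head_{0};
    std::atomic<std::size_t> tail_{0};
};

// Hypothetical device: bit 0 of the status register means "this device raised
// the interrupt"; the remaining bits carry a data word.
constexpr std::uint32_t kIrqPending = 0x1;

struct ModelDevice {
    std::uint32_t status = 0;  // simulated register
    SpscRing<std::uint32_t, 8> events;
    unsigned dropped = 0;      // counted by the filter when the ring is full
};

// Filter (primary interrupt context): do almost nothing and never block.
// Returning false says "this wasn't me" on a shared interrupt line.
inline bool modelFilter(ModelDevice& dev) {
    std::uint32_t s = dev.status;
    if (!(s & kIrqPending)) return false;
    dev.status = 0;  // acknowledge the simulated interrupt
    if (!dev.events.push(s >> 1)) ++dev.dropped;
    return true;     // ask for the action routine to run on the work loop
}

// Action (work loop context): drain everything the filter queued. A real
// driver would process each data word; this sketch just sums them.
inline std::uint32_t modelAction(ModelDevice& dev) {
    std::uint32_t total = 0;
    while (std::optional<std::uint32_t> word = dev.events.pop()) total += *word;
    return total;
}
```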
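The talk describes the commandSleep/commandWakeup pattern. A client thread blocks inside the driver until the interrupt side has data. On a hot unplug it should instead be woken immediately with an offline error. The following sketch models this with a mutex standing in for the work loop gate and a condition variable standing in for the sleep event. `ModelGate`, `ModelStatus`, and the `waitForData`/`deliverData`/`goOffline` methods are hypothetical. The real `IOCommandGate::commandSleep` and `commandWakeup` take an event pointer, and commandSleep must be called with the gate held (from inside runAction). This sketch imitates that requirement by always waiting while holding `gate_`, which the wait releases until the thread is woken.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>

// kOffline plays the role of kIOReturnOffline in this model.
enum class ModelStatus { kSuccess, kOffline };

class ModelGate {
public:
    // Client thread: block until data arrives or the device goes offline.
    // While waiting, the gate is released so the interrupt side can run.
    ModelStatus waitForData(int& out) {
        std::unique_lock<std::mutex> gate(gate_);
        event_.wait(gate, [&] { return !data_.empty() || offline_; });
        if (offline_) return ModelStatus::kOffline;  // fail immediately on teardown
        out = data_.front();
        data_.pop_front();
        return ModelStatus::kSuccess;
    }

    // Action-routine side: queue data and wake sleepers (commandWakeup analogue).
    void deliverData(int value) {
        {
            std::lock_guard<std::mutex> gate(gate_);
            data_.push_back(value);
        }
        event_.notify_all();
    }

    // Teardown side (willTerminate analogue): wake every sleeper with an error.
    void goOffline() {
        {
            std::lock_guard<std::mutex> gate(gate_);
            offline_ = true;
        }
        event_.notify_all();
    }

private:
    std::mutex gate_;
    std::condition_variable event_;
    std::deque<int> data_;
    bool offline_ = false;
};

// Demo: one client is woken with data; a second is woken by the unplug.
inline bool demoSleepWake() {
    ModelGate gate;
    int value = 0;
    ModelStatus first = ModelStatus::kOffline;
    std::thread client([&] { first = gate.waitForData(value); });
    gate.deliverData(42);  // "interrupt": data arrived
    client.join();

    ModelStatus second = ModelStatus::kSuccess;
    std::thread blocked([&] {
        int unused = 0;
        second = gate.waitForData(unused);
    });
    gate.goOffline();      // "hot unplug": wake the sleeper with an error
    blocked.join();
    return first == ModelStatus::kSuccess && value == 42 &&
           second == ModelStatus::kOffline;
}
```

Because the wait rechecks its condition under the gate, it does not matter whether the wakeup happens before or after the client starts waiting.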
Oops, sorry, off by one. Sorry, I was right. OK.

Remember I was mentioning locking. BSD currently does its locking with funnels. Mostly this doesn't affect I/O Kit developers, but kernel extension developers generally must be aware of funnels. There are two funnels in the system, and we do share: on a dual-CPU system you can issue an I/O on the network funnel on one processor and on the system funnel on the other. It's a compromise on the traditional BSD uber lock. Funnels are good, but they're really not locks, and this isn't the right forum to discuss them. Writing funnel code that can switch between the system funnel and the networking funnel is difficult, bordering on impossible. NFS does it and works, but it's bloody close to impossible.

Funnels can cause long delays on work loops, though, so you do have to be aware of them. Your hardware may deliver into BSD, such as the TTY layer for serial ports, the disk system, or the networking system. If so, you must be aware that those completion routines will probably try to take a funnel. Because there are only two funnels, they're going to cause some latency.

OK, now on to synchronous device teardown. My device is gone: this can be a nightmare. Tearing down a stack is just so hard, and I'm hoping this animation will demonstrate what's going on. As you can see, I'm trying to emulate the stacking we have in our system. On the left is your bus, let's call it a USB bus, and on the right is the client thread running. The first step is that the bus detects the device is gone. This is how we implemented it at first, and it didn't work real well. We made the device disappear, but at the same time, since we're on an MP system, a client thread had just come down and issued an I/O request. That's a bit of a problem, because the two are going to meet eventually, and when they do you get a panic, and very bad things happen. When that
happens. No blue screen of death, whatever; panics are really hard to debug, and this particular one is nasty because everything looks perfectly all right, but your hardware has crashed and it's not really obvious. So how do we deal with this? Well, we do it synchronously; I guess that's obvious. Our solution is to use the work loop stacking. This is why drivers really can't opt out of the work loop system, not if they want to do dynamic unloading, and most of our developers like the idea that they can unload their drivers. So it means you have to be at least partially aware of work loops to do unloading.

What we do is, when we get an unload, we tell the nub that has disappeared to terminate, and terminate does a few things. It goes recursively up the stack marking everybody as inactive; it does that through requestTerminate. But basically it calls a function called doTerminate, and doTerminate is a recursive function. As you can see, as I've implemented it here in pseudocode, it essentially just does head-first recursion with willTerminate calls and tail recursion on didTerminate. You can rely on willTerminate messages turning up in your driver before any of your clients get willTerminate, and you can rely on didTerminate arriving after all of your clients have got their notifications.

So your responsibility in willTerminate sort of depends on where you are. If you're an intermediate driver and you have a series of commands that you know are outstanding, in your own queues, that you haven't handed off to the next driver down, then it's your responsibility to return those I/O requests with errors immediately. If you have client threads blocked in your driver on commandSleep/commandWakeup, you should return those immediately with an error as well: wake them up and notify them that they're waking up with an offline error. The error we use is kIOReturnOffline. And by the way, if you're higher in the stack and you start
seeing offline errors coming by, you know what's happening now: somebody's got a willTerminate, and you can expect a willTerminate yourself fairly soon. By the time you get to the top of the driver stack, it should be expected that all outstanding I/Os and blocked threads have ideally been returned, so that makes the top-of-stack driver's job much easier. Notice we haven't torn anything down yet; all of our pointers are still valid. One other thing with willTerminate: you should be returning errors immediately, if the API makes that possible. If any other I/O commands come down while you're doing this, you should return errors for them once you've seen willTerminate.

Okay, if the driver is on the top of the stack, you're expected to implement didTerminate. Now, top of the stack varies: you are top of the stack because there is nobody on top of you, which means that when you're tearing down, eventually you're going to be top of the stack. In didTerminate you must stop all future calls down to your provider, and you must wait, asynchronously (that's a bit subtle), for all provider calls to return. So if you have threads that have gone through you, you should be aware of those threads, and you should not call close on your provider until all client calls have come back. Unfortunately, you have to do that asynchronously; you have to return from didTerminate. So your primary responsibility is to close your provider as soon as you reasonably can. As soon as you can synchronously guarantee that no client threads will get through you and no client threads are currently inside you, you can call close on your provider, but not before. If you cannot make that determination, because you would have to wait for some threads to return, then you must return from didTerminate immediately anyhow. It's a bit subtle. What then happens is that when the client thread does return, it takes the command gate and then calls close. It's really tricky to implement well. In general you don't
have to worry about it; you can make certain assumptions if you're an intermediate driver. The only drivers that really have to be aware of this are top-of-stack drivers, and we write those; usually Apple writes those. We've got the user clients for USB and FireWire, we have the media BSD client, and I would like to say the serial BSD client, which we own, but it's broken; I own that one as well. And that's about it, really.

In conclusion, I guess threading comes down to: please lower your priority. We don't want an arms race, and the system will work a whole lot better if you use a lower priority. The other thing that was interesting is work loops. Work loops are way cool; they integrate well with the system. You can't get deadlocks if you're on a work loop, unless you're a RAID driver, and if you are a RAID driver, heaven help you; that's what Darwin is for, I guess. And finally, synchronous teardown: please implement willTerminate and didTerminate properly. And by the way, synchronous teardown applies even to PCI devices. I mean, you could be a PC Card, but also whenever you do a kext unload, you're essentially going through a device teardown.

So, further things that might be interesting: we have an open source presentation that will be discussing how xnu works, among other things; that's coming up tomorrow. We have Kernel Programming Interfaces on Wednesday, and we have Writing Threaded Applications on Mac OS X. Writing threaded applications isn't a direct hit on what we're trying to do, since it's very, very high-level, but it should be interesting. Also there's a series of hardware talks coming up tomorrow, Bluetooth, USB, and FireWire, and some feedback forums. Who to contact is Craig Keithley, and I think I'll hand over to him. [Applause]
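The termination ordering described in the talk (everybody marked inactive first, then willTerminate delivered on the way up the stack and didTerminate on the way back down) can be sketched as a small standalone C++ model. This is a hedged illustration, not IOKit code: `Node`, `markInactive`, `doTerminate`, and the string log are hypothetical stand-ins, and the real `IOService::willTerminate`/`didTerminate` methods take a provider and option bits rather than a log.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a driver stack. Names echo the talk; this is not
// the IOService API, which also passes the provider and option bits.
struct Node {
    std::string name;
    std::vector<Node*> clients;  // drivers stacked on top of this one
    bool active = true;

    explicit Node(std::string n) : name(std::move(n)) {}

    void willTerminate(std::vector<std::string>& log) { log.push_back("will:" + name); }
    void didTerminate(std::vector<std::string>& log) { log.push_back("did:" + name); }
};

// Phase 1 (done through requestTerminate in the talk): mark the stack inactive.
void markInactive(Node& n) {
    n.active = false;
    for (Node* c : n.clients) markInactive(*c);
}

// Phase 2, the "doTerminate" pseudocode: each client hears willTerminate
// before we recurse to its own clients (head first), and didTerminate only
// after that recursion has returned (tail).
void doTerminate(Node& provider, std::vector<std::string>& log) {
    for (Node* client : provider.clients) {
        client->willTerminate(log);
        doTerminate(*client, log);
        client->didTerminate(log);
    }
}

// Builds nub -> intermediate driver -> top-of-stack client, then tears it down.
std::vector<std::string> terminateStack() {
    Node nub("nub"), driver("driver"), top("userClient");
    nub.clients = {&driver};
    driver.clients = {&top};

    markInactive(nub);
    assert(!nub.active && !driver.active && !top.active);

    std::vector<std::string> log;
    doTerminate(nub, log);
    return log;
}
```

Calling `terminateStack()` yields `will:driver`, `will:userClient`, `did:userClient`, `did:driver`: the intermediate driver hears willTerminate before its client does, and hears didTerminate only after its client has been fully notified, which is the guarantee the talk says drivers can rely on.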
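For an intermediate driver, the willTerminate duties from the talk (fail the commands still sitting in your own queues, wake threads blocked in commandSleep with an offline error, and refuse new I/O) can be sketched in standard C++. This is a hedged model under stated assumptions, not driver code: `IntermediateDriver`, `waitForResource`, `releaseResource`, and the `Status` enum are hypothetical, a `std::mutex` stands in for the command gate, a condition variable stands in for commandSleep/commandWakeup, and `Status::kOffline` plays the role of kIOReturnOffline without using its real value.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Stand-in status codes. kOffline plays the role of kIOReturnOffline; these
// enum values are illustrative, not IOReturn's real numbers.
enum class Status { kSuccess, kOffline };

// Hypothetical intermediate driver. std::mutex stands in for the command gate
// and std::condition_variable for commandSleep/commandWakeup.
class IntermediateDriver {
public:
    using Completion = std::function<void(Status)>;

    // Queue a command that has not been handed to the driver below yet.
    Status enqueue(Completion done) {
        std::lock_guard<std::mutex> gate(gate_);
        if (terminating_) return Status::kOffline;  // fail new I/O after willTerminate
        queue_.push_back(std::move(done));
        return Status::kSuccess;
    }

    // A client thread sleeps here until a resource frees up or we go offline.
    Status waitForResource() {
        std::unique_lock<std::mutex> gate(gate_);
        ++sleepers_;
        wake_.wait(gate, [this] { return terminating_ || resourceFree_; });
        --sleepers_;
        return terminating_ ? Status::kOffline : Status::kSuccess;
    }

    void releaseResource() {
        {
            std::lock_guard<std::mutex> gate(gate_);
            resourceFree_ = true;
        }
        wake_.notify_all();
    }

    int sleepers() {
        std::lock_guard<std::mutex> gate(gate_);
        return sleepers_;
    }

    // willTerminate: nothing is freed and all pointers stay valid. We only
    // fail queued commands and wake sleepers with the offline error.
    void willTerminate() {
        std::deque<Completion> failed;
        {
            std::lock_guard<std::mutex> gate(gate_);
            terminating_ = true;
            failed.swap(queue_);
        }
        wake_.notify_all();
        for (Completion& done : failed) done(Status::kOffline);  // run outside the gate
    }

private:
    std::mutex gate_;
    std::condition_variable wake_;
    std::deque<Completion> queue_;
    int sleepers_ = 0;
    bool terminating_ = false;
    bool resourceFree_ = false;
};
```

The failed completions run after the mutex is dropped because `std::mutex` is not recursive, so a completion routine that calls back into the model (for example `enqueue`) would otherwise deadlock; that choice belongs to this sketch, not necessarily to IOKit.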
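The top-of-stack rule from the talk is the subtle one: didTerminate must stop new calls down to the provider, must not block waiting for threads already inside the driver, and the provider must be closed exactly once, either right away or by the last client thread to come back through the gate. A minimal standard C++ sketch under those assumptions follows; `TopOfStackDriver`, `Provider`, its `onRead` hook, and the in-flight counter are all hypothetical, `std::mutex` again stands in for the command gate, and `Status::kOffline` stands in for kIOReturnOffline.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <mutex>
#include <thread>

enum class Status { kSuccess, kOffline };  // kOffline stands in for kIOReturnOffline

// Hypothetical provider we hold open. Calling into it after close() is the
// kind of bug that panics a real system, so the model asserts on it.
struct Provider {
    std::function<void()> onRead;  // test hook to keep a call "in flight"
    bool closed = false;
    void read() {
        assert(!closed);
        if (onRead) onRead();
    }
    void close() {
        assert(!closed);
        closed = true;
    }
};

// Hypothetical top-of-stack driver (think user client or BSD client).
class TopOfStackDriver {
public:
    explicit TopOfStackDriver(Provider& p) : provider_(p) {}

    Status read() {
        {
            std::lock_guard<std::mutex> gate(gate_);
            if (terminated_) return Status::kOffline;  // stop all future calls down
            ++inFlight_;
        }
        provider_.read();                          // call down without holding the gate
        std::lock_guard<std::mutex> gate(gate_);   // retake the gate on the way back
        if (--inFlight_ == 0 && terminated_) closeLocked();  // last thread out closes
        return Status::kSuccess;
    }

    // didTerminate never blocks waiting for clients: close now if nobody is
    // inside us, otherwise return and let the last returning thread close.
    void didTerminate() {
        std::lock_guard<std::mutex> gate(gate_);
        terminated_ = true;
        if (inFlight_ == 0) closeLocked();
    }

private:
    void closeLocked() {
        if (!closed_) {
            closed_ = true;
            provider_.close();
        }
    }

    Provider& provider_;
    std::mutex gate_;
    int inFlight_ = 0;
    bool terminated_ = false;
    bool closed_ = false;
};
```

In the scenario the test exercises, a client thread is parked inside `Provider::read` when `didTerminate` runs, so the provider stays open and new reads get `kOffline`; once that thread returns, it performs the close itself. A driver with no threads inside it closes immediately from `didTerminate`.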