WWDC2003 Session 100
Transcript
Kind: captions
Language: en
Good afternoon, and welcome to the I/O Kit session. I'm Craig Keithley, the I/O technology evangelist in Apple's Worldwide Developer Relations group. When we started about five or ten minutes ago, I could have probably identified everybody in this room; it's hard to go up against Xcode. One of the things that's toughest is properly architecting and writing I/O Kit kernel extensions to get optimum performance. You can improve on this, you can improve your techniques, by doing multi-threading. There's a common, well, I almost want to say misconception, that you can't do threading in kernel extensions. You can, and to go into that we'll bring up Godfrey.

G'day. My name is Godfrey van der Linden. I'm an I/O Kit architect, and like the vast majority of people who I expected to be here, I would probably prefer to be in Xcode right now myself.
Another thing: the handouts suggest that I shall be talking about memory. Even though memory is interesting, most of this presentation will be about threading. The new piece of hardware has got some interesting memory issues, and I will be available in the porting lab after this, and on Wednesday, if anybody wants to talk to me about how to set up memory maps on the new hardware, but there's no formal session on memory in this presentation.

Okay, so as an introduction: Mac OS X has something like a hundred threads running in the system at any one time, even when it's idle. I mean, I was running top on my system today and it had 140 threads going. That's what makes kernel programming so much fun: when you've got 140 threads operating inside that environment at one point in time, it can lead to some interesting mind-bending problems, and that is why I enjoy kernel programming a lot. So in this session I'm going to be discussing threading generally,
how high-priority threads work inside the system. I'll also be talking about I/O Kit threading, and then finally I shall discuss the teardown, the synchronous teardown, of device drivers when you get a hot unplug. It's not really threading, but if you don't know how to do it properly, it's very easy to get nasty explosions. So what I'm hoping you'll learn is a better understanding of how threads schedule on Mac OS X, and how I/O Kit does its synchronization; the I/O Kit synchronization model is very unusual, I've never seen it in any other operating system, probably because it's my invention, and I think it's very, very cool. And also Mac OS X's hot unplug: we took two attempts or so before we got it right, and I don't think that's really been presented before, and it works very well.

Okay, so the first part of the presentation will be on threading,
specifically, how threads work inside the operating system. I won't really be talking about threads in the kernel so much as how threads interact: how the scheduler interacts with them, how the dispatcher works. Mac OS X doesn't differentiate between kernel threads and user threads; yes, we have different priority bands, but we don't really differentiate the way they operate. So over the next few slides I'll be discussing the thread priority bands, how the dispatcher works, what the scheduler does, what it means to take high priority (because it's different to what most programmers think), and also priority inversions, which is a very standard problem that we're having now. So these are the priority bands; as you can see,
there are quite a few of them, and there are a few there that aren't really threads at all. The primary interrupts and idle threads aren't really threads, but you can consider them thread contexts, at least to the extent that when a primary interrupt is running, no thread is running, and when the idle thread is running, by definition, no thread is running. The band that we would like you to be in is the regular user area, but most high-end hardware (and let's face it, that's what I/O Kit is all about) tends to have tighter requirements and would like to go for higher priorities. Now, we're experiencing a lot of what I call priority arms races: people are going for higher and higher priorities, and it's really degrading the overall system performance. So I'm hoping in this presentation to convince you to get out of the real-time band if at all possible, and down into the top of the user band, because I think that's probably the best place for most of us to be. Okay, so
Mac OS X is a dispatcher-based system. There is a subtle difference between what a dispatcher is and what a scheduler is; we do have some sort of scheduling, but we're basically a dispatcher-based system. What that means is that the dispatcher takes the current thread that's executing, blocks it, then selects the next thread and runs it. So what does blocking mean? Well, generally, a thread in our operating system... as I said earlier, I had 140 threads running, but in fact they were all blocked: waiting for some user event, waiting for some I/O to complete, waiting for the lid to close or open, or for the battery to run dead, for all I know. That's what we mean by blocked in our operating system: those threads are put onto wait queues and we really just ignore them. So yes, we have 140 threads running, but they're really all asleep, which is the best way to have a thread, in my opinion. The next thing that happens is a preemption. So what is preemption? Essentially, you've used your quantum, and the system says, okay, I'm going to put you on the end of your priority band's run queue (we'll discuss that in a second) and then I'll select the next guy to run. Every time the dispatcher is invoked, it selects the highest-priority thread available and runs it. That really
has some potentially very nasty side effects. If you have an infinite loop at high priority, and that high-priority thread runs to completion, then nothing at lower priority will run. So let's say you overload the real-time band; remember from earlier, the real-time band is the highest-priority band in the system. If that overloads, then you're not going to get any time at all for the I/O band, and the I/O band is probably where you're trying to store your data onto a disk or take it off a disk. So you've just stopped the system from doing what you need to have done, and that's probably not what you're after. So what does the scheduler do? Well,
its job is sort of that of an oversight committee. We do have a scheduler; it will get better in the future. The scheduler we have right now is essentially for timeshare: when your thread has run for long enough, we will change your priority down a little. That doesn't necessarily mean you're going to start running slower, at least not straight away: if there's no other runnable thread that's competing with you, then you're going to continue going, and it won't make any difference that your priority is down. However, if you are competing with another thread, the scheduler's job is to try to make sure that the system balances its loads appropriately. The other thing is that aforementioned spinning real-time thread, and we had this problem early on with the system: spinning real-time threads will not give any time to the system, including to the keyboard, so that you can stop them. So one of the jobs of the scheduler is to say, hey, this real-time thread is taking far too much time, in which case it will taint it over to timeshare and then say, oh, by the way, I've been running for eight seconds, so it gets depressed very quickly. That's a good thing, because it means that you can use kill -9 and get rid of the thing.
That came quite late, really, in the original development of Mac OS X. When we got the real-time threads, that was quite a common problem, because, you know, everybody's written infinite loops, and a real-time-thread infinite loop meant taking the big hammer out and hitting reboot, and that's really, really painful when we didn't have journaling file systems.
Okay, so how does this work? Essentially, the communication mechanism between the scheduler and the dispatcher is the run queues. Now, earlier I mentioned that the dispatcher finds the highest-priority thread that's runnable in the system and then runs it; well, that's the run queues. Logically, you can think of it as one run queue per priority in the system, and if a thread is runnable, it's in one location in the run queues, and the scheduler just manipulates the locations in the run queues. The scheduler also collects statistics, so that you can run things like top and latency and a number of other tools, and find out what the system is really doing for you and on your behalf. So, an example of what the scheduler does is the timeshare thread: as I mentioned, if your thread has run for sufficient quanta, we will drop your priority. Dropping your priority isn't, as I said, really a bad thing; it's only sometimes bad if you're using a lot of CPU power, and you need that much CPU power, and another thread comes up. Say the user launches another task, and then your thread will fall behind. Well, you know, the user did launch that other task; perhaps he really does want that task to run. So let the timeshare do its job, except for when you are certain that the user really, really cannot afford to let any CPU go to the other guys, in which case you would use different things. And, as I said, the misbehaving, infinitely looping real-time thread is another example of what the scheduler handles. So,
what does high priority really mean? There's nothing that can make slow code go fast. If your code is slow, high priority will not make your code go faster; you will get slightly more CPU time, but it really is measured in percent, maybe one or two percent more CPU time. Higher priority won't give you faster code, and the only way to get faster code, I'm afraid, is to run your code through performance analysis and clean it up. It's very easy to write bad algorithms, unfortunately. What high priority does give you is a reasonable chance of running with a very low latency. So your thread is blocked, a MIDI event comes in, for example, and Mac OS X will probably get your thread running in around less than a millisecond on average. Our max jitter: I haven't run this for a while, but the last time I saw it, the max jitter for the real-time band with no competition was running at about 600 microseconds or something. Unfortunately, in the real world there is always a little bit of other competition at that high band, so I think we're running our jitters at around 3 milliseconds; again, I'm not sure exactly what those numbers are. Of course, when you have high priority, you can very, very easily end up using so much CPU time that you're not allowing the low-level parts of the system, like disk I/O, any time at all. It's a bit of a shame: we recently had a developer raise a problem where they were using so much high-priority time that the FireWire work loop, which is an I/O thread, wasn't getting sufficient time to even acknowledge packets on the bus, and when that happens you start getting weird little disk errors, and the system itself hasn't really got time to clean up, because you're using all of the time at high priority. We call that a priority inversion, and priority inversions are really hard to get rid of. This is really the biggest problem with arms races: if you're in a priority inversion, that's going to cause some problems, and it may take a very fundamental redesign of the way you've set up your workloads. So how do you
decide your thread priority? Really, it comes down to exactly what your latency requirements are; it's not what performance you're after, it's what your latency is. User interface events, for instance a keyboard for a MIDI sequencer or something like that, really do need very low latency, because a human has said, I will move my finger, and if the sensors can't detect it within a certain amount of time, then the keyboard feels wrong. And that time is very short. I mean, for computer time it seems enormous, it's about five milliseconds, but five milliseconds isn't really very long on a modern operating system, especially because our standard quantum is 10 milliseconds. So if you really do need those extremely low latencies, that's when you go for high priority, but you really want to be certain that you need a very low latency. If you're reacting to data off the internet, well, frankly, who cares what the performance is? You're dealing with 30-second timeouts. Now, I'm not suggesting that you go timeshare; I don't think timeshare is appropriate if you're doing some sort of stream-based information processing. However, you probably don't need to be real-time, because the whole internet itself is arbitrary. And finally there's the sort-of-low-latency stuff, when you're waiting for a local resource, something off the disk or local FireWire or something like that: it's low latency without being ultra-low latency.
So that's how I would suggest you use your bands: for the extremely low-latency stuff you would use the time-constraint stuff, the real-time band; for the who-cares stuff I would probably use the user band, possibly below the Carbon async threads, but you can play with it a bit; and then for the low-latency disk stuff I would suggest that you go to the top of the user bands and disable timeshare altogether. These are all things that you can look up on the ADC website to find out how to do.
Then there are priority inversions: I want the highest priority, except for when I don't. Priority inversions can really happen almost anywhere in the system, but the most common ones we're seeing are, again, in the real-time band. Mac OS X's real-time band is very, very good, extraordinarily powerful, but unfortunately, to get a really good low maximum jitter, we've had to give you enough power to hang the system, effectively. And that means that your code now has to be far more complicated, because you have to work out how to back out of your high-priority thread to give the rest of the system some time. Now, traditionally on most operating systems, Mac OS 9 and Windows for instance, I/Os really are high priority; there is nothing you can do to get them out of the way, and you would have to take whatever jitter is around. With Mac OS X we have deliberately chosen to make the real-time threads the highest-priority threads in the system, even higher priority than I/O, which gives you extraordinarily good jitter characteristics, but it comes at the cost of complexity. So there are a
couple of priority-inversion strategies. The best strategy of all is: get out of that high-priority band. If you're experiencing priority inversions, drop your priority if you can, if the jitter is appropriate. Examine it: there are some wonderful tools in the system, and my favorite is latency. latency will show you a histogram of the performance in the system; if you can evaluate, and have really hard numbers for, what performance you need, latency will let you know what priority band will work well on your target system. So if you can lower your priority, that's the best thing ever. If not, you're going to have to complicate your algorithm: you'll need to split into a producer-consumer model, where you have small amounts of work being done at very, very high priority and larger amounts of work done at low priority. So, for instance, if you're streaming off a disk, you would have a low-priority thread in your system (and you know I don't usually recommend having multiple threads, but this is the time to use them), a lower-priority thread in the system that's feeding a high-priority thread, at the cost of introducing some latency; the high-priority thread would just take whatever data it needs when it's available, and that way you get a producer-consumer. It's pretty good: it's complex, but it works very well indeed. The worst choice, and it's really bad because it doesn't give you 100% of the CPU, is to deliberately say, I'm going to let the system have some time: approximately every 10 milliseconds or thereabouts, one buffer in every two buffers, or however it is that your workload is divided, go to sleep for a millisecond, and because you sleep for a millisecond, you will guarantee the system, or at least some other threads, some time. I don't like that. The problem with this solution is that if you're not competing with anything else, then that extra 10% or 8% that the system used is gone and you can't use it, and you're only doing that to save yourself the complexity of a good producer-consumer queue, or of lowering your priority in the first place. You see, if you lower your priority, you can use 100% of the CPU. There's a really nice anecdote: iTunes started early on with timeshare threads for its ripping, and the thread actually drops in priority because it uses 100% of the CPU. One of the cool things (I'm not sure if you've done it; I've recently done it with the AAC encoding): I was ripping my entire record collection over to 128-kilobit AAC, and the system was very, very performant while it was going, and I never had any idle time on the system at all, it was just 0% idle. That's because the ripping was at a very low priority. So look out: lower-than-regular priority is actually a good thing if you're using a hundred percent of the CPU anyhow.
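The producer-consumer split just described can be sketched in user space. This is only an analogue, not kernel code: the queue, the thread roles, and every name here are invented for illustration, with the C++ standard library standing in for I/O Kit primitives.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Sketch: a lower-priority producer stages buffers so that the
// high-priority consumer never does the expensive work itself.
class BufferQueue {
public:
    void push(std::vector<int> buf) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(buf));
        }
        cv_.notify_one();  // wake the high-priority consumer
    }

    // Blocks cheaply until a staged buffer is available.
    std::vector<int> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !queue_.empty(); });
        std::vector<int> buf = std::move(queue_.front());
        queue_.pop();
        return buf;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::vector<int>> queue_;
};

// Producer side: slow work (standing in for reading off a disk).
// Consumer side: just sums whatever data is ready.
inline long consumeAll(int nBuffers, int bufferSize) {
    BufferQueue q;
    std::thread producer([&] {
        for (int i = 0; i < nBuffers; ++i)
            q.push(std::vector<int>(bufferSize, i));  // "read" a buffer
    });
    long total = 0;
    for (int i = 0; i < nBuffers; ++i)
        for (int v : q.pop()) total += v;  // cheap high-priority work
    producer.join();
    return total;
}
```

In a real driver the consumer would be the time-constraint thread and the producer a timeshare thread; the point is that the expensive part (the allocation here, the disk read in real life) only ever happens on the producer side.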
That's just the introduction to threading; there is a lot more to be said, and I could talk for hours, but unfortunately I don't have them, so we'll have to move on. The next section is work loops in I/O Kit; this is essentially how I/O Kit does its synchronization, and I shall be discussing the work loop and the event sources in this part. If you're a traditional I/O Kit driver, this is the mechanism we're recommending, and it's really quite hard to avoid now. Unfortunately, "work loop" itself is an unfortunate name. The way it was originally designed, we did have a thread that all I/Os went through, and we could guarantee single-threaded access to hardware because we only had one thread that talked to the hardware. But the difficulty was that the I/O systems were taking context switches, which was slowing down all I/O. So we came up with this idea of the gate, and the gate allows us to schedule I/O on hardware directly without having to take a context switch. And, you know, what's a gate? Well, the gate's a lock, a recursive lock; it's not really very complicated at all, and it's sort of obvious, but it took us a while to come up with it, and it made a big difference in our performance. So
what a work loop really is on our system now is: it's a container for the gate, which is a recursive lock; it's a list of event sources that need to synchronize with respect to that lock; and, by the way, it has a thread. Yes, okay, it has a thread. In fact, the thread's optional: one day in the future I'm going to get rid of the thread and only have it if you have interrupt event sources. So, a work loop's gate: the single-threading is provided by the work loop's gate being closed across all event-source action routines. I shall define what that term means in a little while. So,
traditionally, the Unix solution for MP is to have one big lock, one uber-lock, that protects the whole operating system. Whenever you need to do anything, you take the uber-lock, and then you are safe until the uber-lock gets dropped. There's only one lock, so naturally you get contention, and only one thread can run on the system at a time. The other end is Mach: Mach has hundreds and hundreds of micro-locks and extraordinarily complicated locking hierarchies, so that you can make sure that you take locks in the right order. It's got lots and lots and lots of tiny little locks, which is great, but they're very heavy. Also, extraordinarily complicated locking hierarchies are nasty, and they have to be traversed in one direction, which means that for I/O systems, completion routines are painful. So we needed to come up with something different. What we came up with
is the work loop. We have one work loop, one gate as it were, per major interrupt delivery path in the system. So a PCI SCSI card, for instance, has a work loop; a USB controller has a work loop; a FireWire controller has a work loop. On a typical running system we have maybe 13 work loops. This is a compromise between the hundreds of micro-locks that Mach uses and the two uber-locks that BSD uses. It turns out it's very, very powerful, because this allows us to deliver completion routines, and all of our drivers stack on top of this one lock. So by far the majority of I/O Kit drivers, as I say, don't create their own work loop: they use their provider's work loop. Now, if you've used I/O Kit for any
length of time, you would have seen the client-provider model and the client-provider stacking, and you will see that this statement is recursive: if I call my provider, and the provider also doesn't implement getWorkLoop, it calls its provider, and eventually you get down to the bottom of the system, which says, hey, here's the work loop, use this. So high-level drivers always synchronize against the bottom of the system. As I mentioned earlier, only PCI devices and
motherboard device drivers tend to create work loops. In most cases your hardware will not need a work loop, and it's probably better if you don't create one. In fact, if you do create a work loop that builds on top of another work loop, you can be in for a whole world of hurt; I'm sure we'll have a RAID developer around here, and if you want to see somebody who really experiences pain, discuss device teardown with a RAID developer. So, you can use the system's work loop. Because the statement is recursive, there has to be a way of terminating the recursion: there is a system work loop that you can grab hold of just by walking down the stack; eventually you hit the root of the provider tree and, bingo, there's a work loop. It's not a bad work loop to use, and we really do encourage you to use it, because we'd like to limit the number of threads in the system; this is a good thing for system performance. However, it's a shared resource, so don't be too greedy with it. If you expect a lot of interrupts, or you have very tight timing requirements, it's probably better not to use the system work loop but to create your own.
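The getWorkLoop recursion just described can be sketched like this. These are stand-in classes, not the real IOService and IOWorkLoop interfaces; the point is only that every service defaults to asking its provider, so the whole stack ends up sharing the root driver's work loop.

```cpp
#include <cstddef>

// Stand-in for IOWorkLoop: just an identity we can compare.
struct WorkLoop { int id; };

// Stand-in for IOService's default behavior:
// "I don't own a work loop; ask my provider."
class Service {
public:
    explicit Service(Service* provider) : provider_(provider) {}
    virtual ~Service() {}

    virtual WorkLoop* getWorkLoop() {
        return provider_ ? provider_->getWorkLoop() : nullptr;
    }

private:
    Service* provider_;
};

// Only the driver at the root of the provider tree (e.g. a PCI
// controller) overrides getWorkLoop and terminates the recursion.
class ControllerService : public Service {
public:
    ControllerService() : Service(nullptr), loop_{1} {}
    WorkLoop* getWorkLoop() override { return &loop_; }

private:
    WorkLoop loop_;
};
```

Stack a controller, a nub, and a high-level driver on top of each other, and all three answers to getWorkLoop are the controller's single loop, which is exactly why high-level drivers always synchronize against the bottom of the stack.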
So, event sources. An event source has an action routine, which I'm about to define, but essentially an action routine is synchronous with respect to the work loop. All event sources have an action routine and an owner, and are usually registered on a work loop; in fact, an event source is really only meaningful when it's registered on a work loop, but of course people can temporarily register one and then remove it, and register and remove it again, because registering an event source is a fairly lightweight operation. An action routine is just a call-out function: when you create an event source, you're saying to the system, I expect this event to occur at some time in the future, and when it does, call this function. That's what an action routine is. All action routines in the system are synchronous with respect to all registered event sources on a particular work loop. I mean, if you're familiar with Java, you may have seen Java's synchronized-method concept, where you can have a number of methods in a class and you say, these methods are synchronized with respect to each other: only run one of them at a time. That's how I think of event-source actions: all of the event sources up and down the entire stack are synchronous with each other.
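Here's a user-space sketch of that gate idea, with invented names (the real mechanism is IOWorkLoop's gate; this only imitates the semantics with a recursive mutex): every action routine runs with the one gate closed, so two event sources' actions can never interleave.

```cpp
#include <mutex>
#include <thread>

// One recursive lock per work loop, closed across every event
// source's action routine.
class WorkLoopGate {
public:
    // Run an action with the gate closed. The lock is recursive,
    // so an action may directly invoke another action that is
    // synchronized on the same work loop.
    template <typename Action>
    void runAction(Action action) {
        std::lock_guard<std::recursive_mutex> closed(gate_);
        action();
    }

private:
    std::recursive_mutex gate_;
};

// Two "event sources" hammering one counter through the same gate:
// because both actions close the same gate, the increments never race.
inline int runSynchronizedActions(int perThread) {
    WorkLoopGate gate;
    int counter = 0;
    auto worker = [&] {
        for (int i = 0; i < perThread; ++i)
            gate.runAction([&] { ++counter; });
    };
    std::thread a(worker), b(worker);
    a.join();
    b.join();
    return counter;
}
```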
Now, that sounds as though it's a recipe for contention, but it hasn't proved to be so far. There are some tricks there that you need to be aware of, though. In general, don't go to sleep while you're in an action routine: very bad things happen. Again, we recently found a driver which was going to sleep in an action routine for eight milliseconds, and that introduced eight milliseconds' worth of latency. We do have ways of pointing fingers in the system, so you won't get away with it for any length of time. Okay, and when you register an event source with the work loop, what you generally do is IOService::getWorkLoop, and that's the mechanism that gives you the entry into the recursive statement, or, as I was saying, that's how you find the work loop. One of the things... actually, we'll cover that later.
Okay, so the first event source. Most PCI hardware developers (I was about to say real hardware developers, which is the side my background is on), unfortunately, think first of: okay, how do we get interrupts? Because it's one of the fundamental things that vary from OS to OS. Our filter interrupt event source is the mechanism we recommend for PCI hardware. The event source is used to deliver hardware interrupts to a driver: it takes the interrupt and causes the work loop to schedule; this is the only thing that causes the work loop to schedule, in fact. So at primary interrupt time it's very quick: it just comes along, increments a number, and says, hey, work loop, you've got some work to do, kick, and then it goes back to sleep again, which automatically gets back into the dispatcher that I mentioned earlier. The dispatcher says, hey, look, I'm looking for the highest-priority thread in the system, and it's a work loop, so the work loop starts scheduling. The latencies are very, very short, and the filters generally don't have to do any work at all. But
we do recommend that you always implement a filter, because you don't know if your hardware is going to be in a shared chassis or not, and when you're sharing interrupt event sources it's a very good idea, if your hardware supports it, to say, hey, this wasn't me: just return false from the filter. Now, the action routine is synchronous with respect to the work loop (you're going to see this statement a lot), but the filter is totally asynchronous: it's a primary interrupt, and you have to do special things to stop it from coming in, which is why I would recommend single-producer, single-consumer queueing or something of that nature with the filter routine. If you need to synchronize with the filter routine, you've got to be very careful.
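The single-producer, single-consumer queueing suggested above might look like this in user space (a sketch with invented names; a real driver would size and place this buffer quite differently): the filter side only ever advances the head and the action side only ever advances the tail, so neither has to take a lock at primary interrupt time.

```cpp
#include <atomic>
#include <cstddef>

// Lock-free SPSC ring: the "filter" (primary interrupt) is the only
// writer of head_, the "action" routine is the only writer of tail_.
// Capacity is N - 1 because one slot stays empty to tell full from empty.
template <typename T, size_t N>
class SpscRing {
public:
    // Filter side; never blocks. Returns false when the ring is full.
    bool push(const T& value) {
        size_t head = head_.load(std::memory_order_relaxed);
        size_t next = (head + 1) % N;
        if (next == tail_.load(std::memory_order_acquire))
            return false;  // full: a real driver would drop or coalesce
        buffer_[head] = value;
        head_.store(next, std::memory_order_release);
        return true;
    }

    // Action side, on the work loop. Returns false when empty.
    bool pop(T& out) {
        size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire))
            return false;  // nothing pending
        out = buffer_[tail];
        tail_.store((tail + 1) % N, std::memory_order_release);
        return true;
    }

private:
    T buffer_[N];
    std::atomic<size_t> head_{0};
    std::atomic<size_t> tail_{0};
};
```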
Okay, so now the other major event source is the timer event source. There's lots of reasons to use the timer: polled-mode drivers, which we don't recommend, but people are doing it, so that's one of the reasons for using it; but the most common one is hardware timeouts. Oh dear, nothing has responded in 30 seconds, I have to do something. I/O Kit timers, the timer event source, are built on top of the kernel's thread_call APIs. They're very wonderful APIs, and I just love them: they're very, very lightweight, and they're a great solution. There is a problem, though. If you remember back to my earlier diagram, thread_call threads are very high priority, higher priority than work loops, which means if your timeout and your interrupt occur at exactly the same time, the timeout will schedule first. So, best thing: check to see if your hardware is done in the timeout code, and if it is, fine, you've beaten the interrupt before it got delivered; if not, a timeout has occurred.
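That check-the-hardware-first advice can be sketched as follows. This is illustrative user-space code, not a real driver: the atomic flag stands in for reading a status register, and every name is invented.

```cpp
#include <atomic>

// Per-transaction state shared between the interrupt side and the
// timer's action routine.
struct Transaction {
    std::atomic<bool> hardwareDone{false};
    bool timedOut = false;
    bool completed = false;
};

// Interrupt side: mark completion as soon as the hardware signals.
inline void markDone(Transaction& t) {
    t.hardwareDone.store(true, std::memory_order_release);
}

// Timer action: the timeout can schedule ahead of the interrupt's
// action routine, so first check whether the hardware actually
// finished. Only treat it as a timeout if it genuinely didn't.
inline void timeoutAction(Transaction& t) {
    if (t.hardwareDone.load(std::memory_order_acquire))
        t.completed = true;   // we just beat the interrupt delivery
    else
        t.timedOut = true;    // real timeout: start error recovery
}
```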
Okay, here I have to make an embarrassing admission: this is my bug, it's been my bug for a long time now, and I will fix it soon. There is no synchronous way of cancelling a timeout. Really, it's just painful; it's embarrassing; I'm turning red up here. The safest way to delete a timer is to let the timer expire and then, on another thread, delete it. Don't re-arm the timer. Sorry, I have to give you the warning, because it is the big caveat with these things; it really is a problem, and I'm hoping to fix it, but I can't go back in time and fix it in Jaguar and Cheetah, I'm afraid. If your drivers have to run back in time on Puma and Jaguar systems, then you are going to have to let the timer expire, and guess what: the timer's action routine is synchronous with respect to the I/O work loop, same as usual.
Okay, the command gate. Command gates are rather interesting: a lot of people think it's a lock; it isn't, really. It's just a sort of container, a pointer to the lock that is in the work loop. Remember I said the I/O work loop should be called the work gate? Well, the command gate gives you access to that work loop. So for all command gates on a particular work loop, there is still only one gate. Command gates allow you to run code synchronously with respect to the work loop, but without a thread switch: it just takes the gate, allows you to run some code, and then you drop the gate fairly quickly. Now, I admit that the runAction/runCommand API is clunky, especially if you're used to writing with locks and just saying, hey, take the lock, drop the lock, take the lock, drop the lock, you know. But it turns out runAction has really come to our rescue several times. First of all, debugging recursive locks where you mismatch the lock-unlock pair is really painful; with runAction you can't get it wrong, because it's a subroutine. It just says: take the lock, call the subroutine, return the lock on the exit path. There is no avoiding it, so you can't get it wrong.
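Here's a user-space imitation of why that shape is hard to misuse (IOCommandGate::runAction is the real call; everything else here is invented, and the error unwind is shown with an exception purely for this demonstration, since kernel code wouldn't use one): the lock and unlock are written exactly once, around the call-out, so every exit path reopens the gate.

```cpp
#include <mutex>
#include <stdexcept>

// Imitation of the runAction shape: one place takes the lock, one
// place (the guard's destructor) returns it, so a mismatched
// lock/unlock pair is impossible to write.
class Gate {
public:
    template <typename Action>
    void runAction(Action action) {
        std::lock_guard<std::recursive_mutex> closed(gate_);
        action();  // the gate reopens on every exit path from here
    }

private:
    std::recursive_mutex gate_;
};

// Even when the action fails partway through and unwinds, the gate
// is open again afterwards, so later actions still run normally.
inline bool gateReopensAfterFailure() {
    Gate gate;
    try {
        gate.runAction([] { throw std::runtime_error("I/O failed"); });
    } catch (const std::runtime_error&) {
        // the lock_guard already reopened the gate during the unwind
    }
    bool ranAgain = false;
    gate.runAction([&] { ranAgain = true; });
    return ranAgain;
}
```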
The other thing that it gives you: when you use showallstacks (and it's a really wonderful command for tracking down deadlocks and other problems in a running system), showallstacks will show up runActions; they will be there on the stacks. We have caught so many deadlocks because showallstacks and runAction are there on the system. Whereas if you just take a lock, you have to memorize everybody else's drivers, even the ones you didn't write, and say, oh look, this routine 15 levels down in the stack takes a lock, and I know that because, well, I can read minds. With runAction you don't have to read minds: there it is, it's in the back-trace, you know?
Okay, this is the really cool part about command gates: commandSleep and commandWakeup. This is another thing that came a bit late. When a client thread is calling into your driver, it often says, hey, I want some data, and your hardware hasn't got any data available yet, for streaming, for whatever reason, like the device you're talking to is slow. So what you can do is block the client thread by calling commandSleep, and it will block until some event occurs. This is in fact the mechanism I was talking about that the dispatcher uses: this is how you block a thread until some event occurs. Now, there's lots of other ways of doing it, but this is the one that's built into the way the command gate does its job. Data acquisition drivers are a typical case for this. We don't really have any hardware direct call-outs; one of the most common requests we get is, well, we can't write our application because the interrupt routine won't call out into userland. Well, no, we're not going to call out into userland: we can't allow that thread to disappear into some code that we don't trust. But commandSleep and commandWakeup give you something that is very, very close to that. If you have a sufficiently high-priority thread blocked in commandSleep, then when you take your interrupt routine, and your hardware turns up and says, I have some data available, you schedule it using commandWakeup to wake up the thread. It's just so fast, it's amazing. So you can use commandSleep to emulate interrupt call-outs out to userland: have the user provide a thread (it's your application, you provide the thread), block it in your kernel extension using commandSleep (very lightweight), and wake it up using commandWakeup.
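A user-space analogue of that commandSleep / commandWakeup pattern: the client thread blocks until the "interrupt" side reports data. The real calls are IOCommandGate::commandSleep and IOCommandGate::commandWakeup; this sketch imitates them with a condition variable, and all the surrounding names are invented.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// The client's thread sleeps in waitForData (the commandSleep step)
// until the interrupt side calls dataAvailable (the commandWakeup step).
class DataGate {
public:
    // Client thread: block until the hardware has produced data.
    int waitForData() {
        std::unique_lock<std::mutex> gate(mutex_);
        cv_.wait(gate, [this] { return dataReady_; });
        dataReady_ = false;
        return data_;
    }

    // "Interrupt" side: publish data, then wake the sleeping client.
    void dataAvailable(int value) {
        {
            std::lock_guard<std::mutex> gate(mutex_);
            data_ = value;
            dataReady_ = true;
        }
        cv_.notify_one();  // the commandWakeup step
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool dataReady_ = false;
    int data_ = 0;
};

// The client provides the thread; the "driver" just blocks and wakes it.
inline int clientReadOnce() {
    DataGate gate;
    int result = 0;
    std::thread client([&] { result = gate.waitForData(); });
    std::thread interrupt([&] { gate.dataAvailable(42); });
    client.join();
    interrupt.join();
    return result;
}
```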
So that's it for our work loops. What we're about to do now is see how we use this stacking model to synchronously tear down device drivers. So... oops, sorry, off by one; sorry, I was ahead. Okay, so remember I was mentioning the locking BSD does? It currently does its locking with funnels. Mostly it doesn't affect I/O Kit developers; however, kernel extension developers generally must be aware of the funnels. There are two funnels in the system, and we do share: if we go dual-CPU, you can issue an I/O on the network funnel on one processor and on the system funnel on the other processor. It's a compromise on the traditional BSD uber-lock. Funnels are good, but they're really not locks, and this is not the right forum to discuss funnels. Writing funnel code that can switch between the system funnel and the networking funnel is difficult; I can't say impossible, because NFS works, but it's bloody close to impossible. Funnels can cause long delays on work loops, though, so you do have to be aware of them. If you've got a piece of hardware that's delivering into BSD via the TTYs of the serial ports, the disk-drive system, or the networking system, you must be aware that those completion routines will probably try to take a funnel, and those funnels are going to cause some sort of latencies, because there's only two of
them okay now we can do a synchronous
device teardown. So, "my device has gone": this can cause nightmares; tearing down a stack is just so hard, and this animation, I'm hoping, will demonstrate what's going on. As you can see here, I'm just trying to emulate the stacking that we have in our system so far: on the left is where your bus is, let's call it a USB bus, and on the right you have the client thread running. So the first step is we've got to tear down; the bus has detected the device is gone, and this is how we implemented it at first. It didn't work real well. We disappeared the device, but at the same time, we're on an MP system, and a client thread has just come down and issued an I/O request. That's a bit of a problem, because they're going to meet eventually, and when they do, you get a panic, and very bad things happen when that happens. No blue screen of death or whatever; panics are really hard to debug, and this particular one is nasty, because everything looks perfectly all right, but your hardware has crashed, and it's not really obvious. So how do we deal with this? Well, we do it synchronously; I guess that's obvious. Our solution is to use the work loop stacking. This is why drivers really can't opt out of the work loop system, not if they want to do dynamic unloading, and most of our developers like the idea that they can unload their drivers, so it means you have to be at least partially aware of work loops to do unloading. What we do is, when we get an unload, we tell the nub that has disappeared to terminate,
and the terminate does a few things: it goes recursively up the stack, marking everybody as inactive. It does that through requestTerminate, but basically it calls a function called doTerminate, and doTerminate is a recursive function, as I've implemented here in pseudocode. It essentially just does a head-first recursion with the willTerminate calls and tail recursion on didTerminate. You can rely on willTerminate messages turning up in your driver before any of your clients get their willTerminate, and you can rely on didTerminate arriving after all of your clients have got their notifications. So, your responsibility in willTerminate, well, it sort of depends on where you are. If you're an intermediate driver, that means you have a series of commands that you know are outstanding, sitting in your own queues, that you haven't handed off to the next driver down; it's your responsibility to return those I/O requests with errors, immediately. If you have client threads blocked in your driver in commandSleep, you should return those immediately with an error as well: wake them up and notify them that they're going to wake up with an offline error. The error we use is kIOReturnOffline. And by the way, if you're higher in the stack and you start seeing offline errors coming by, you know what's happening: somebody below you has got a willTerminate, and you can expect your own willTerminate fairly soon. By the time you get to the top of the driver stack, it should be expected that all outstanding I/Os and blocked threads, ideally, have been returned, and that makes the top-of-stack driver's job much easier.
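The recursive doTerminate pass just described can be sketched in user space. This toy Driver class is my own invention, not the real IOService code (which lives in the kernel); it simply records the order of willTerminate and didTerminate calls so the head-first/tail-recursive shape is visible:

```cpp
#include <string>
#include <vector>

// Toy model of a driver stack. Real drivers override
// IOService::willTerminate/didTerminate; here they just log their name so
// the call ordering of the recursive teardown can be inspected.
struct Driver {
    std::string name;
    std::vector<Driver*> clients;        // drivers stacked on top of us
    static std::vector<std::string> log; // call-order trace

    void willTerminate() { log.push_back(name + ".will"); }
    void didTerminate()  { log.push_back(name + ".did"); }

    // Head-first recursion with willTerminate, tail recursion on
    // didTerminate: our willTerminate fires before any client's, and our
    // didTerminate fires only after every client has been notified.
    void doTerminate() {
        willTerminate();                 // head: before clients hear anything
        for (Driver* c : clients)
            c->doTerminate();
        didTerminate();                  // tail: after all clients are done
    }
};
std::vector<std::string> Driver::log;

// Tear down a three-deep stack: nub at the bottom, "top" at the top.
std::vector<std::string> runTeardown() {
    Driver::log.clear();
    Driver top{"top"};
    Driver mid{"mid", {&top}};
    Driver nub{"nub", {&mid}};
    nub.doTerminate();                   // the bus nub starts the teardown
    return Driver::log;
}
```

Running the teardown from the nub produces the ordering the talk relies on: every driver sees its willTerminate before any driver above it, and its didTerminate after all drivers above it.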
Notice we haven't torn anything down yet; all of our pointers are still valid. One other thing with willTerminate: you should be returning errors immediately, if possible given the API. If any other I/O commands come down while you're doing this, you should be returning errors once you've seen willTerminate. Okay. If the driver is on the top of the stack, you're expected to implement didTerminate. Now, "top of the stack" varies: you are top of the stack because there is nobody on top of you, which means as you're tearing down, eventually you're going to be top of the stack. Now, in didTerminate,
you must stop all future calls down to your provider, and you must wait, asynchronously (and that's a bit subtle), for all provider calls to return. So if you have threads that have gone through you, then you should be aware of those threads, and you should not call close on your provider until all client calls have gone through. Now, unfortunately, you have to do that asynchronously; you have to return from the didTerminate. Your primary responsibility, though, is to close your provider as soon as you reasonably can: as soon as you know that you can synchronously guarantee that no client threads will get through you, and no client threads that have already gone through you are still outstanding, then you can call close on your provider, but not before. If you cannot make that determination, if you would have to wait for some threads to return, then you must return from didTerminate immediately anyhow (it's a bit subtle), and what then happens is, when the client thread does return, you take the command gate and then call close. It's really tricky to implement well. In general, you don't have to worry about it; you can make certain assumptions if you're an intermediate driver. The only drivers that really have to be aware of this are top-of-stack drivers, and usually Apple writes those: we've got the user clients for USB and FireWire, we have the media BSD client, and I would like to say the serial BSD client, which we own, but it's broken (I own that one as well). And that's about it, really.
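The didTerminate bookkeeping just described (count the client threads through you, refuse new I/O once terminating, and let the last returning thread do the close) might look roughly like this user-space sketch. TopOfStack, clientEnter, and clientExit are invented names; the real code would be an IOService subclass calling provider->close(this) under its command gate.

```cpp
#include <mutex>

// Sketch of a top-of-stack driver's teardown bookkeeping. closeProvider()
// just flips a flag so the logic is checkable; in a real driver it would be
// provider->close(this), taken under the command gate.
class TopOfStack {
    std::mutex gate;          // stands in for the command gate
    int outstanding = 0;      // client threads currently through us
    bool terminating = false; // didTerminate has been seen
    bool closed = false;

    void closeProvider() { closed = true; }

public:
    // A client call arrives; refuse it once we're terminating, the way
    // real drivers fail new I/O with kIOReturnOffline.
    bool clientEnter() {
        std::lock_guard<std::mutex> g(gate);
        if (terminating) return false;
        ++outstanding;
        return true;
    }
    // A client call returns; if we're terminating, the last thread out
    // performs the deferred close.
    void clientExit() {
        std::lock_guard<std::mutex> g(gate);
        if (--outstanding == 0 && terminating && !closed)
            closeProvider();
    }
    // didTerminate must not block waiting for client threads: close now if
    // nobody is through us, otherwise mark terminating and return at once.
    void didTerminate() {
        std::lock_guard<std::mutex> g(gate);
        terminating = true;
        if (outstanding == 0 && !closed)
            closeProvider();
    }
    bool isClosed() { return closed; }
};
```

The two paths match the talk: if didTerminate can synchronously guarantee no client threads are in flight, it closes immediately; otherwise it returns, and the close happens asynchronously when the last client thread comes back through the gate.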
In conclusion, I guess it really comes down to: please, lower your priority. We don't want an arms race, and the system will work a whole lot better if you use the lower priority. The other thing that was interesting is work loops. Work loops are way cool: they integrate well with the system, and you can't get deadlocks if you're on a work loop, unless you're a RAID driver; and if you are a RAID driver, heaven help you. That's what Darwin is for, I guess. And finally, synchronous teardown: please implement it properly, willTerminate and didTerminate. And by the way, synchronous teardown applies even to PCI devices; I mean, you could be PC Card, but also, whenever you do a kext unload, you're essentially going through a device teardown. So, further things that might be interesting: we have an open source presentation that will be discussing how xnu works, among other things; that's coming up tomorrow. We have Kernel Programming Interfaces on Wednesday, and we have Writing Threaded Applications on Mac OS X. Writing Threaded Applications isn't a direct hit on what we're trying to do, it's very, very high-level, but it should be interesting. And also there's a series of hardware talks coming up tomorrow, the Bluetooth, USB, and FireWire sessions, and some feedback forums. Who to contact is Craig Keithley, and I think I'll hand over to him.
[Applause]