WWDC2003 Session 111

Transcript

Kind: captions
Language: en
This is Session 111, Writing Threaded Applications for Mac OS X. As you know, writing a threaded application is one way that you can maximize resources and performance on any system, whether it's a single or dual processor. This session will go over a lot of the tips and tricks, the do's and don'ts, of writing a threaded application, why you would want to do it, and times when you would want to write a thread. Hopefully you've been able to take a look at the resources online at the developer site; this session will supplement a lot of that information. I'd like to bring up George Warner, who will deliver the presentation this afternoon.
Thank you, Mark. OK, before I get off this slide I just want to touch on my new job title; some people have noticed and asked me about it. If any of you don't know, I'm the engineer formerly known as the Mixed Mode Magic Fragment Scientist. For obvious reasons those technologies have kind of faded out somewhat, so people have suggested that I update my job title. Excuse me. So: the Schizophrenic Optimization Scientist. There's a little debate over whether it should be plural or not.
OK. So, the agenda: we'll be covering some terminology, since there are some name conflicts, and some of the threading models; the why and the why not; we'll be covering different architectures; we'll talk about the do's and the don'ts; we'll be doing some demos; and then we'll break for some Q&A.
So, the original Thread Manager was a cooperative thread manager; everything was scheduled cooperatively, and we referred to them as threads. When we came out with the MP implementation in 8.6 we didn't want to call them threads, so we called them tasks. Now we're running on a UNIX system, where everybody knows what a task is, and there's a bit of a clash in the naming space. So for the purposes of today's presentation, when I'm talking about a thread, we're talking about an independent path of execution; everybody is pretty much up on what a thread is. When I talk about a process, it's a collection of threads plus the resources; it's basically an application, or a task that's running. In the UNIX world you'd call it a task, but I have a tendency to call it a process. And I want to point out that multiprocessing is the general case of multitasking. Back in 8.6, when we introduced all this, I'd suggest to a lot of people, "Why don't you use an MP thread?" and they'd go, "Well, I don't have an MP box." It took me a while to educate the market and make them understand that multithreading really doesn't have anything to do with multiprocessing per se; it's just that, if you do have another processor, the multithreading API can help you take advantage of it.
So why use threads? There's a customer expectation that your application is going to be very responsive: that when I click a button, something's going to happen. There's nothing worse than when you get the little spinning beachball. Then there's scalability: like I said earlier, if you actually have a dual-processor box, customers really want to see both processors getting utilized, and if you write your code threaded there's a better chance that that second processor's resources will be getting utilized.

Back when I first started working on the MP stuff, these benchmark numbers came out, and they keep getting regurgitated and reused every year; anybody who's come to a previous session has seen these numbers. The really interesting number there is the top one: on a dual-processor box we went to 2.3, and everybody goes, wait a minute, how can you get better than two times the performance? You only got twice the horsepower. That's exactly what I thought the first time I ran the numbers, so I ran the numbers again. And when I went and analyzed the numbers and tried to figure out what was going on, what we finally determined was that we didn't just have twice the processors, we also had twice the cache. Our data set just happened to not quite fit in a single processor's one-megabyte cache, but it fit extremely well in two megabytes of cache, and so we actually got better than twice the performance. That's called superlinear speedup.
So, why use threads? This is what I call the shelf-space theory: the more things you have scheduled and running out there, the more CPU time you'll get. If there are two apps running with the same priority, then they'll both get approximately fifty percent of the time. If you've got three threads running and there's only one other thread running at the same priority, then you're getting three-quarters of the time. So it's kind of a cheesy way to borrow more time — not that I would ever do that.
So, synchronous requests. Talk about the spinning beachball: if you call an API in your event loop and it stalls the processor for whatever the time is, all of a sudden you get the spinning beachball, and your customer goes, well, this isn't a very responsive app, I don't like this one. So you can put blocking calls in their own thread and let the main thread continue running, dispatching events, updating the screen — let the user do what he wants to do. And when the synchronous call finishes, it can signal back to the main thread that, hey, I'm done; you can update the progress bar, you can continue running.
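That pattern — the blocking call on its own thread, with the main thread free to keep dispatching events — can be sketched with POSIX threads (rather than the Carbon or MP APIs the talk covers). All the names here, `slow_sync_call`, `worker`, `run_call_threaded`, are made up for illustration:

```c
#include <pthread.h>
#include <unistd.h>

/* Hypothetical stand-in for any long blocking API call
 * (a network read, a big file copy, ...). */
static int slow_sync_call(void) {
    usleep(20000);               /* pretend this blocks for a while */
    return 42;                   /* hypothetical result */
}

static int g_result = 0;         /* worker posts its result here */

static void *worker(void *arg) {
    (void)arg;
    g_result = slow_sync_call(); /* the blocking call lives on this thread */
    return NULL;
}

/* Spawn the blocking call on its own thread; the caller (the "main
 * thread") is free to keep running its event loop while it works. */
int run_call_threaded(void) {
    pthread_t t;
    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return -1;
    /* ...main thread would dispatch events, update the screen, etc... */
    pthread_join(t, NULL);       /* rendezvous once the call completes */
    return g_result;
}
```

In a real application the join would be replaced by the worker signaling completion back to the event loop, as described above.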
So, polling is bad. One of my favorite examples: has anyone ever taken a vacation with a nine-year-old? This is the equivalent of "are we there yet? are we there yet? are we there yet?" — and for me it's about as annoying. Blocking is much better: you get away from all the asynchronous calls and playing with callbacks and dealing with "did it happen when I thought it would" and chaining events, etc. You can just put everything in its own thread.

A good example of this: back in, I think, 8.1 or so, you'd do a Finder copy and go get coffee, because it could only do the one copy. Then a very clever engineer — I won't mention his name; it wasn't me... I did say clever engineer, didn't I — wrote the asynchronous copy. Basically he chained all these I/O completions with this state machine that was about four pages of really complicated code that gave you a headache just being in the same room with it. And it was really sweet; everybody liked the fact that you could have multiple copies going on and all that kind of fun stuff. That was great up until 8.6 came out, when we took those four pages of state machine and replaced them with like six lines of code. We spun it off on another thread: we do source file open, destination file open, read from the source, write to the destination, loop until the end of file, close the source, close the destination, done. Pretty simple. The only other thing going on with all that was updating the progress bar over in the Finder, and that was all happening behind its back. So: very simple code, easy to maintain — we let the interns do that.
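Those "six lines of code" might look something like this minimal sketch; `copy_file` is a hypothetical stand-in for the threaded copy loop (open source, open destination, read/write until end of file, close both), with error handling kept deliberately thin:

```c
#include <stdio.h>

/* Sketch of the simple threaded copy loop described in the talk.
 * In the real thing this whole function would run on its own thread,
 * posting progress updates back to the main thread as it goes. */
int copy_file(const char *src, const char *dst) {
    FILE *in = fopen(src, "rb");                      /* source file open */
    if (!in) return -1;
    FILE *out = fopen(dst, "wb");                     /* destination file open */
    if (!out) { fclose(in); return -1; }
    char buf[4096];
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0)   /* read from source... */
        fwrite(buf, 1, n, out);                       /* ...write to destination */
    fclose(in);                                       /* close the source */
    fclose(out);                                      /* close the destination */
    return 0;                                         /* done -- pretty simple */
}
```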
So, when to avoid threads. You want to avoid threads when it adds complexity: if you have global data in your application that you're going to share between multiple threads, you're going to have locks on it, and that can introduce deadlock conditions that you've got to program to prevent, etc. So global data requires locks. If it's just shared it may not require locks — if you've got one thread that's only reading the data, it may not require locking the data. Another reason to avoid threads: if you have non-thread-safe APIs, then there are alternatives.
Look into those. Then there's the added overhead: it takes about 200 microseconds on a dual-gigahertz machine to create a thread, from the Mach thread up, and the preemption time is about 40 microseconds on a dual gig. Interestingly enough, I had the numbers for the new hardware, but I couldn't put them in my slide, for what should be obvious reasons: I couldn't rehearse to an audience that hadn't been briefed on the new numbers. On the new box it's around 170 microseconds on the creation, and it's still exactly 40 microseconds, because the preemption has to save the wide registers — twice the width of the registers. So I was pretty happy to break even on the preemption time, but we'll keep working to hopefully bring that down some more.

Then the memory footprint: each stack that you create gets a 512K virtual stack. Obviously that's not using your physical memory, but it will eat up your virtual memory space, so if you create a whole bunch of threads you can run out of memory. Kernel resources: every thread you create — every Mach thread — has kernel resources that we allocate to it, the hardware context storage with all those registers in it; I said 32 here, but now it's probably changed to 64 — about 2K per thread. That memory is physical and it's locked down, so you want to avoid using too many — hundreds — of threads. And the preemption, the 40 microseconds I talked about, is cumulative: if you've got thousands of threads running, you're probably going to spend as much time preempting and switching between all those threads as running useful code. So you don't want to get crazy with spawning threads.
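The creation and preemption figures above are for 2003-era hardware; if you want numbers for your own machine, a rough sketch like this one (hypothetical `thread_create_cost_us`, POSIX threads assumed) measures the create-plus-join round trip:

```c
#include <pthread.h>
#include <sys/time.h>

/* Thread body that does nothing: we only want the create/join cost. */
static void *noop(void *arg) { return arg; }

/* Returns the average microseconds per pthread_create + pthread_join
 * over `iters` runs, or -1.0 on failure. This measures the round trip,
 * not the bare creation cost, but gives a usable order of magnitude. */
double thread_create_cost_us(int iters) {
    struct timeval t0, t1;
    gettimeofday(&t0, NULL);
    for (int i = 0; i < iters; i++) {
        pthread_t t;
        if (pthread_create(&t, NULL, noop, NULL) != 0)
            return -1.0;
        pthread_join(t, NULL);
    }
    gettimeofday(&t1, NULL);
    double us = (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_usec - t0.tv_usec);
    return us / iters;
}
```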
So, other options: cooperative threads. If you've got non-thread-safe APIs, you can use the cooperative Thread Manager and schedule those cooperatively. And you can use timers — Carbon timers, Cocoa timers, etc. — to have tasks run at predetermined intervals.
So, threading architectures: parallel tasks with parallel I/O buffers. An example of this would be if you've got multiple independent tasks that don't really have that much to do with each other. I think on the very first multiprocessing project I ever worked on — I'm going to date myself, twenty years ago — we were actually running a driving simulator: we had an AI running all the cars, we had a physics engine doing the collision detection and the Newtonian physics, and then the graphics rendering thread, et cetera. When I tell people this they say, wow, you guys were doing that twenty years ago, that's pretty impressive — and then I kind of qualify it: we were doing about five minutes' worth of video in a month. So things have progressed considerably. No real time back then — or not much; it was real slow time.
So, parallel tasks with shared I/O buffers. This would be an example of where you've got the same thing to do on a lot of different pieces of data and you can split it up into little pieces. Like, if you've got a graphic image, you can take little postage stamps and feed each little postage stamp off to a different processor, a different task; they all do their crunching, put the results in the output buffer, and you put it all back together on the other side. This is the best model for dealing with exactly that case, and you typically want as many tasks as there are processors.
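The postage-stamp model can be sketched with POSIX threads — `brighten` and `brighten_parallel` are invented names, and `NTHREADS` stands in for "as many tasks as there are processors":

```c
#include <pthread.h>

#define NTHREADS 4               /* ideally, one task per processor */

struct slice { unsigned char *data; int len; };

/* The per-stamp crunch: here, a toy brighten pass over one slice. */
static void *brighten(void *arg) {
    struct slice *s = (struct slice *)arg;
    for (int i = 0; i < s->len; i++)
        if (s->data[i] < 255) s->data[i]++;
    return NULL;
}

/* Fan the buffer out across NTHREADS workers, then join them all.
 * Reassembly is free because each slice is processed in place. */
int brighten_parallel(unsigned char *buf, int len) {
    pthread_t t[NTHREADS];
    struct slice s[NTHREADS];
    int chunk = len / NTHREADS;
    for (int i = 0; i < NTHREADS; i++) {
        s[i].data = buf + i * chunk;
        s[i].len = (i == NTHREADS - 1) ? len - i * chunk : chunk;
        if (pthread_create(&t[i], NULL, brighten, &s[i]) != 0)
            return -1;
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return 0;
}
```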
Sequential tasks with multiple I/O buffers — I call this the pizza oven; some people call it the assembly line. One of the other engineers in DTS wrote an application where the first input buffer was nothing more than some FSSpecs that you got from a drag and drop. The first task took the list of FSSpecs and, for each one, found whether it was a file or a folder; if it was a folder it would feed it back in and iterate over everything inside, and if it was a file it would send it to the output of the first task. The second task would take this iterated list of file specs and start reading them in, flattening them, and streaming them out; the next task would take the data streaming through and compress it; and then the next one would send it over a network port. Then, over on the opposite side of the network, it would uncompress it and unflatten it. So it would basically do a backup across the network, and with the compression and everything else we actually got better throughput than the Finder did doing a flat, uncompressed copy across. This is an example of sequential tasks, where you're basically doing one thing after another after another after another to the same data.
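A minimal two-stage version of the pizza oven, assuming POSIX threads: one producer and one consumer handing items through a single-slot buffer, blocking (not polling) on a condition variable. Names like `run_pipeline` are made up, and a real pipeline would use a deeper queue and more stages:

```c
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int slot, slot_full = 0, done = 0;
static long consumed_sum = 0;

/* Stage 1: generate work items 1..n and hand them downstream. */
static void *producer(void *arg) {
    int n = *(int *)arg;
    for (int i = 1; i <= n; i++) {
        pthread_mutex_lock(&lock);
        while (slot_full)                      /* block until slot empties */
            pthread_cond_wait(&cond, &lock);
        slot = i; slot_full = 1;
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);
    }
    pthread_mutex_lock(&lock);
    done = 1;                                  /* no more items coming */
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Stage 2: crunch each item as it streams through (here, just sum). */
static void *consumer(void *arg) {
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        while (!slot_full && !done)            /* block until work or done */
            pthread_cond_wait(&cond, &lock);
        if (!slot_full && done) { pthread_mutex_unlock(&lock); break; }
        consumed_sum += slot; slot_full = 0;
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* Run both stages to completion and return what the consumer saw. */
long run_pipeline(int n) {
    pthread_t p, c;
    consumed_sum = 0; slot_full = 0; done = 0;
    pthread_create(&p, NULL, producer, &n);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return consumed_sum;
}
```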
So, threading architectures: many applications have both parallel and serial — or sequential — execution paths, so basically you have to know your threads at their best and know which model works best with what you're doing. A word processor: you could have a grammar or spell check running independently of the font, text, kerning, rendering, etc. A driving simulator, like I mentioned earlier: you could have AI driving the other cars at the same time as you've got a physics engine running, at the same time as you've got a rendering engine running that's putting bits on the screen.
So, implementations. I'm not going to go into an API count or anything like that, depending on which architecture you're programming to. In the end there are thread implementations in all of them: Java has Java threads, Carbon has... NSThreads? Cocoa has... they're all mixed up — they just wanted to see if I was paying attention. OK. So you use the one appropriate for your environment. I have to say, having looked at the implementation of these, the top layers are extremely thin; they're sitting right on top of pthreads. So I wouldn't worry too much about the overhead in that model, and I wouldn't go to pthreads unless you really, really, really, really want to.
So, the common concepts between the different threading models are basically: thread management — creating threads, destroying threads, setting the thread priority or weight, etc.; the synchronization primitives — you've got mutexes and semaphores and queues and event groups, etc., ways of communicating between different threads; and then you've got thread-safe services — currently that's malloc and the memory allocators, the file I/O, signals, errno and other per-thread state, etc.

So this is probably the most important part of my talk: the do's and the don'ts. It's kind of a collection of all the things that I ran into from developers coming to me with issues, and sitting down and going over code and figuring out why we were having problems. I mentioned earlier that that 200-microsecond creation time can add up if you're dynamically creating and destroying a lot of threads, so a good way to avoid that is to pre-allocate and use pools — same thing with the memory, etc. And try to be as data-driven as possible: you write your code CPU-driven, so you want to feed as much data to it and have it ready to go when you get ready to crunch. Typically this isn't a problem; most people write their code in this format: they have a prologue where they open files, allocate memory, etc., then crunch, crunch, crunch, and when they're done they close their files, dispose of the memory, release things, etc.
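The pre-allocate-and-pool advice can be sketched as a tiny fixed pool, assuming POSIX threads; `run_jobs_on_pool`, `POOL_SIZE`, and the squaring "crunch" are stand-ins for real work:

```c
#include <pthread.h>

#define POOL_SIZE 3              /* workers created once, up front */

static pthread_mutex_t job_lock = PTHREAD_MUTEX_INITIALIZER;
static int next_job, job_count;
static long total;               /* combined job results */

/* Each pooled worker pulls job indices off a shared counter until
 * the jobs run out -- no per-job thread creation cost. */
static void *pool_worker(void *arg) {
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&job_lock);
        if (next_job >= job_count) {          /* no work left */
            pthread_mutex_unlock(&job_lock);
            return NULL;
        }
        int job = next_job++;                 /* claim a job */
        pthread_mutex_unlock(&job_lock);
        long result = (long)job * job;        /* crunch crunch crunch */
        pthread_mutex_lock(&job_lock);
        total += result;
        pthread_mutex_unlock(&job_lock);
    }
}

/* Prologue: create the pool once. Crunch: workers drain the jobs.
 * Epilogue: join everything and return the combined result. */
long run_jobs_on_pool(int njobs) {
    pthread_t pool[POOL_SIZE];
    next_job = 0; job_count = njobs; total = 0;
    for (int i = 0; i < POOL_SIZE; i++)
        pthread_create(&pool[i], NULL, pool_worker, NULL);
    for (int i = 0; i < POOL_SIZE; i++)
        pthread_join(pool[i], NULL);
    return total;
}
```

A longer-lived pool would keep the workers blocked on a condition variable between batches instead of joining them, so the creation cost is paid exactly once.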
Some of the other implementations on other operating systems, for some reason, love doing the kill and cancel kind of things, where they have one thread stopping another thread. Our system doesn't play well with what I call asynchronous behavior — we didn't design it that way, for good reasons — and if you want to take advantage of the way we do things, you really want to avoid using those kinds of APIs whenever possible. Use the synchronization methods instead: use a mutex or semaphore, etc., to control things.

Now, when you use semaphores or mutexes on a data structure, you have to find a balance between having too many and too few. If you have one big huge data tree and you've only got one lock on it, then the chance of someone else needing it — if you've got multiple threads running — is a lot higher than if you put your locks down in the branches of the tree. But you don't want to lock everything, because you'll probably spend more time locking and unlocking things than actually accessing your data. So there's a trick to finding the balance.

Now, this is back again to "are we there yet," the polling we've been avoiding. One of the common mistakes I see: occasionally we'll have a developer that wants to wait on two things at once, and what they'll typically do is check one with a timeout, and after the timeout they'll check the other one with a timeout, and then they'll loop back and check the first one again — and now they're basically still polling: are we there yet, are we there yet. What you can do instead is have two threads, each waiting on a different event, a different mutex, and when either one of those threads gets its signal, it signals a third thread: "I've got my signal." So when that thread gets the signal, that's the OR, and it can continue. If you're waiting on two things and you want both events to happen before you continue, then just block on one and wait for it to finish; when it finishes, go and block on the other one and wait for it to finish. If the second one has already been posted — it's already happened — you'll just continue running. So there's never a reason to poll.
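The "wait for both" case can be sketched with POSIX threads, where each "event" is a thread finishing and the blocking wait is `pthread_join`: if the second event has already fired by the time you get to it, the second wait returns immediately. `fast_task` and `slow_task` are made-up names:

```c
#include <pthread.h>
#include <unistd.h>

/* Two independent "events", modeled as threads completing. */
static void *fast_task(void *arg) { (void)arg; return (void *)1L; }
static void *slow_task(void *arg) { usleep(20000); return (void *)2L; }

/* Block on the first event, then block on the second -- no timeouts,
 * no polling loop. Returns the combined results once BOTH are done. */
long wait_for_both(void) {
    pthread_t a, b;
    void *ra, *rb;
    pthread_create(&a, NULL, slow_task, NULL);
    pthread_create(&b, NULL, fast_task, NULL);
    pthread_join(a, &ra);    /* block until the slow one finishes */
    pthread_join(b, &rb);    /* fast one is long done; returns at once */
    return (long)ra + (long)rb;
}
```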
The GUI: this is another one of the areas I get a lot of questions on. It seems like everybody that ever jumps in, the very first thing they want to do is try to do some GUI things. The good news is that things are getting much better. In the Mac OS 9 days it was just a definite don't; nowadays we have Quartz — you can draw output to the screen — and we have OpenGL — you can output to the screen. And, if I can pick on Carbon here: if, for example, you're doing that file copy and you want to update a progress bar, a real easy way to do that is to post an event to the queue, and that way you can tell your main event thread to update the progress bar. One nice thing about the Carbon event model in particular is that I can send it fourteen update events and it doesn't put fourteen events in the queue; it's smart enough to know there's already an update event in the queue and just leaves that one in there, so you don't have to worry about overflowing the queue.
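The coalescing behavior described for update events can be sketched with a pending flag. This is not the Carbon event queue itself — it does that collapsing internally — just an illustration of the idea, with invented names `post_update` and `drain_update`:

```c
#include <pthread.h>

static pthread_mutex_t ui_lock = PTHREAD_MUTEX_INITIALIZER;
static int update_pending = 0;
static int updates_drawn = 0;

/* Called from the worker thread after each chunk of the copy.
 * Fourteen posts still leave just one pending update. */
void post_update(void) {
    pthread_mutex_lock(&ui_lock);
    update_pending = 1;
    pthread_mutex_unlock(&ui_lock);
}

/* Called from the main thread's event loop; redraws at most once per
 * batch of posts. Returns 1 if it drew, 0 if there was nothing to do. */
int drain_update(void) {
    pthread_mutex_lock(&ui_lock);
    int pending = update_pending;
    update_pending = 0;
    pthread_mutex_unlock(&ui_lock);
    if (pending)
        updates_drawn++;         /* stand-in for redrawing the bar */
    return pending;
}
```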
So, I've got a demo. All right — this would be the non-threaded version. Usually the menus are dead and I can't drag the windows around... oops... OK, turn that back off again. I've got the thread viewer running down at the bottom so you can see what's going on; we've got three threads, and we can see our own down here. OK, so if I click here, you'll see it taking up all the CPU time down in the main thread; I can't move the window, and there's the spinning beachball. So that's an example of a non-responsive application. If I try that threaded, you can see it's running over here, in its own thread over here; I can drag the window around, the menus work, and that's the behavior that your users are going to expect. So, all right, back to the slides.
As you can see, there hasn't been a whole lot of new information. The best thing we can say is: you know what we've been working on; we haven't broken anything that we know about; everything works the way it has, pretty well, and hopefully we'll continue working on that. I'll keep working on the preemption time to keep that down, but other than that, that's about the extent of it. So hopefully we'll cover anything else you want to know in the Q&A. Thank you.
Just to give you a wrap-up of some of the sessions: the Kernel Extension Programming Techniques session was on Monday, so on the DVD you'll have the opportunity to take a look at that session. For contacts: George Warner in DTS — you may already be familiar with George through your contacts and email — and myself, as the desktop hardware evangelist. Let's go ahead and start the Q&A and invite a few folks up.