WWDC2004 Session 435

Transcript

Kind: captions
Language: en
Hi everybody, welcome to Session 435. My name is Xavier, and I work in Developer Relations. Most of my job is actually to go around the world and tell you about the latest technologies, and to encourage you to adopt modern Mac OS X technologies. What has been coming up quite often is that, very often, people are afraid of threading their applications. So we've been doing a lot of what we call workshops, where we take groups of 20, 25, 30 people and teach them one topic: we have Cocoa workshops, we have HIToolbox and Carbon workshops. When we do this presentation we get very, very good feedback, and very interestingly, I think it really helps with actually threading your application. So that's going to be the topic today.

So what are we going to talk about? Today we're going to talk quickly about why you, as a developer, should care about threading. We'll go through some threading terminology, some buzzwords, to make sure that we all speak the same language here (French, obviously). We'll go through a couple of examples of threading architectures: I'll go through three of the main architectures that I think are used for threading. And the best part, I think, is that I'm going to try to take you step by step and teach you how to thread your application using the MP APIs. Then we'll have some do's and don'ts, and something that I think is a cool demo.
OK, so why should you use threads? Well, because if you promise to thread your application, we're going to give you fifty percent off one of these brand-new G5s and a brand-new display. That's what I was told. And you're probably thinking: really? No, naturally, of course not. But why should you thread your application? The first thing that comes to mind, of course, is scalability. We are now shipping a lot of these boxes with two CPUs inside. Specifically, if you as a developer, with your nice application, only use one CPU, it's like having our users invest three thousand dollars and use half of a machine. I think a lot of applications could actually use threading, and we're going to go through an example with image content to explain why it's sometimes a big deal to thread part of an application. What I'm showing you here is a couple of results with some drastic image transformations: in this case we have a Gaussian blur, and then a motion blur. When you thread a part of your application for highly CPU-intensive tasks, you can expect between 1.3 and 2.3 times faster. And here you're probably wondering: why 2.3 times faster? If I have two CPUs, how can I get more than twice as fast? In this case, if I'm not mistaken, we're getting superlinear results. The main thing to understand is that in those boxes we have a little bit more than just two CPUs: we have a very, very strong system architecture designed to actually take advantage of both of the G5s in the box. Specifically, why do we get numbers such as 2.3 times faster? For the simple reason that each CPU has its own bus to the memory controller. That makes a big, big difference in certain cases: maybe your routine is going to be memory bound, and in that case, when threading your application, just imagine that you're giving yourself twice as much bandwidth to main memory. So, why should you thread? Obviously, customer expectations, as I talked about, and scalability. Once again, all three of the G5 models shipping right now have dual CPUs. This is a big deal, and you've seen the industry (Intel has been making announcements about dual core); this is something that vendors are really moving toward, so please keep that in mind for your future development. OK, so
let's get set: we're going to go through a couple of buzzwords, and trust me, by the end of the presentation you'll all be speaking French. So, what's a thread? Think of a thread as an independent execution code path. This is very, very important if you're new to threading: if you have no clue about how you should thread your app, think about the function that you're going to want to thread. Can it be executed independently of the rest of the code? Specifically, think of it as something that can have its own stack and register set. So what's a process? The process is actually going to be a collection of threads, with the resources necessary to run; specifically, a process contains code. When you launch an application, you launch a process, and inside that process you can have different threads going on. The cool thing here is that the process has its own address space. That means that the threads inside that address space can access global variables, and memory that you pass from the main thread to another thread: it's memory from the same address space, obviously. OK. Now, before we go
and explain why and how you should use threads, let's think a little bit about when you should not use threads. Obviously, in the case where it's going to add complexity to your application, you don't want to get into that business. There is no need for you to spend six months threading your application if, every time you want to add a feature, it's going to be another six months of trying to figure things out and doing the thread management. Obviously, things that are going to require a lot of locks can be a bad idea. Here, think of a database, with a lot of people trying to write to the same record: depending on the granularity of the locks, you're going to have to be very, very careful. And of course, there's using non-thread-safe APIs. That means, for example: don't try to thread your Carbon drawing code, because in the Toolbox, UI elements cannot be threaded, and I'll go into more detail about that. Some of the options you could use instead: cooperative threading, which is probably what most of you have been using on Mac OS 9. I also put timers here, and I'm talking about Carbon event timers. Very often people don't understand this, but if you're doing a bunch of processing on the hard drive, and you want to show the user some progress and give them a chance, maybe, to cancel the operation, you could use a Carbon event timer for that. You install a Carbon event timer that's going to fire, say, every second, or twice per second, it's up to you; then the Toolbox will call you inside your Carbon event handler, your timer event handler, and that will enable you to update whatever you want on the screen. OK, so that's why I put timers up there.
OK. Now, hopefully you get the idea, and if you're here in this room, obviously you want to learn, and you're interested in threading your application. So now I'm going to go through what I think are three of the main threading architectures that we see out there. The first is parallel tasks with separate buffers. You can read the slide better than me, but I think it's better if I show you a picture. In this first case, one example to think about for the parallel-task, separate-buffer architecture would be, let's see, a simulator, a flight simulator. You get data in one thread; that data is going to be computed, and the result is going to be, maybe, atmospheric settings. This is a totally independent task: you have one input buffer and one output buffer, and they don't depend on the rest of the data. Then, on thread number two, what you could do is compute, let's say, the background. You get the data, say, from the Internet, or from some geostationary satellites; you get the data in another thread, that's your input buffer; then you do the processing to compute maybe some fractal terrain, some 3D world; and at the end you get, you know, a GWorld, or a CGContext, or an OpenGL surface, whatever you want, etc. And you could have thread number three, for instance, do the computation of velocities, or collisions. The main idea in this architecture is that you have n different input buffers and n output buffers, and you just do different processing on each of the threads. All right.
This second architecture is actually the one that I think works best, and a lot of applications could use it. Here, in this case, we have one buffer of data. It could be an image that you want to apply a transform to, or it could be pretty much anything: a huge array of floating-point values, say, where you want to compute the cosines, the sines, the tangents, and generate one big output buffer. This is what I'm going to use in my demo, and, not to ruin it, what's going to happen is that I have this application that computes a fractal. So what do I have as input? As input, what I have is this buffer, which is actually a pointer to my offscreen. I have a GWorld, an offscreen, and it points to the beginning of my image. And to compute the Mandelbrot space, which is the fractal I'm going to be computing, I just need to compute whether each pixel is in or out of the Mandelbrot set. What I can do very quickly, and in an easy way that hopefully I will show you, is take that initial buffer and pass it to n threads: I'm going to divide my picture into n different parts, and n different threads are going to be computing the data for me. And at the end, because of some pointer magic, I actually pass different pointers inside my image. Think of it as just slicing the initial image and passing one slice to each of the threads. I don't even need to recombine the results, because I just have the initial pointer to the offscreen. And that's something you could apply to pretty much anything. Think of it like this: let's say you have a hard drive and you need to compress all the files one by one. Well, you could spawn ten threads, and each of the threads will take one of the files from the directory. Or think of it, for instance, if you're doing HPC, scientific computation: you have a huge array of data that you need to crunch through, and you apply some transformation, say a rotation, to the data set. You could use the same architecture to go through your data set: you slice the input data, pass it to n threads, and then recombine the data.
The last one sounds more difficult, but actually works pretty well. Here, in this case, we have sequential tasks with multiple buffers. The type of application that could use this is an application that needs to execute, let's say, n different tasks on an initial data set. One example that I like to give people is, for instance, a word processor. Let's say you're in this word processor, and you have to run different tasks on the data set. You open a file; it's a huge file, maybe a megabyte, two megabytes, whatever, it could be a hundred K. And what you have to run is spell checking, then grammatical analysis, and then maybe after that you're going to want to translate the result into French, German, you name it. Here, in this case, what will happen is that the input buffer, the initial buffer, will be the first paragraph of the document. We're going to pass that to thread number one; thread number one is going to do the spell checking. When the spell checking is done, you take the output buffer and you pass it to thread number two, which is going to be doing the grammatical analysis, for instance. So what do we have at this point in time? We have thread number two doing grammatical analysis on paragraph number one, and then thread number one will be grabbing, let's say, paragraph number two of the document and doing the spell checking on it, etc., etc. So think of it as cascading the results of the previous thread: the nth thread is actually dependent on the result of the (n minus 1)th thread, but after n operations, all the threads will actually be doing full work. And the output buffer will obviously be the French text, the spell-corrected English text, whatever you want.
OK, so now let's talk about the different implementations, and what APIs you can use as a developer. What's important to understand is that on Mac OS X, the three different implementations I'm going to be talking about are actually implemented on top of pthreads, which is really good news. I mean, if you've been coming from Mac OS 9, my gosh, this is a true preemptive multitasking system, and it's great to have that underlying implementation; and obviously, if you're coming from a Unix background, you're probably very pleased with that. On top of that, we have different types of presentation. Java has the Java threads, implemented on top of pthreads, with their own API. Carbon has what we call the MP APIs, and I'll go into a little bit more detail about that in a second. And Cocoa has NSThread: the same thing, a set of APIs implemented on top of pthreads. Usually, the first question I get after my presentation is: so, we have all these choices, what should I do? Why should I use MP instead of pthreads? Well, there is no single answer to that question, because the idea here is that we give you as many choices as we can, and it's up to you guys, as developers, to find what fits best for you. I'm going to be using the MP APIs, because when I started, I had no clue; I needed to thread my application, and I wondered what I should use. Pthreads were a bit too low-level for me, and the documentation was kind of hard to follow, you know, coming from the Mac days and not being a Unix developer; and the MP APIs give you a really nice abstraction. So that's why I decided that. But if you're more comfortable with pthreads, and you've been developing with pthreads, please do so: use pthreads, there's no big deal. And then Cocoa or Java, depending on what type of application you're developing; it's up to you. I said the MP APIs are for Carbon development, and I should rephrase that, because we've had some folks actually doing Cocoa, but using the MP APIs, in one of our workshops. It's a C API, so Cocoa applications can use it as well; you can do pretty much everything you want from Cocoa. But if you have a Carbon application, the MP APIs are available in Multiprocessing.h, and hopefully that's clear for you guys in the back. We offer services and objects such as MP semaphores, MP queues, and MP tasks, and I'm going to be talking about those in more detail. Once again, it's important to understand that what we did, in fact, is offer you an abstraction level on top of pthreads, so you're going to get the same kind of result and the same quality as pthreads. OK. So, threading
implementations: this is where things get interesting. You have two approaches. The first, and I remember this from talking to some folks out there, who were thinking, "oh, I'd love to thread my application, but this is going to be a nightmare; you don't realize, we have the menu management, I need to keep track of what's going on; this is going to take me a year to thread my application." So the first approach, of course, is the difficult one, I think, which is: you have an application that is not threaded right now, and you're going to rethread everything. And here the main idea, of course, is to give users as much responsiveness as you can. But I think there is a better way to start threading your application, depending on what type of application you have, and that would be to just thread the CPU-intensive operations. Here, the main advantage of that approach is that you don't have to re-architect your whole application. So let's say you have an application, and you do some computing: you need to compute, I don't know, a generated 3D model, or you have to do some compression. Well, what you could do, when the user executes that task, is thread just that part of the processing. And I'm going to show you techniques that actually enable you to do that without having to re-architect the rest of the app, because we're going to work on that part of the code, and we won't touch the rest of the application.
If you wanted to thread your whole application, you'd have a couple of concepts to implement. Thread management, obviously: you need to know which threads are going on; if the user wants to redo the operation and the thread is not done, you have to actually kill the thread and restart it, this kind of thing. Then you have to do some synchronization, of course: when a thread is done (maybe you're going to spawn, say, five threads, and they're each going to do one part of something), you have to notify the main event loop that a thread is done, or that there was an error, or that it crashed, you name it. And of course, you'd have to make sure that you implement thread-safe services in your application as well. It forces you to think: "well, this global data is being accessed not only as a read, but as a write as well, and I'm going to have to put a lock on it so my threads can access it." Now let's go with what I think is the simplest approach, which is to thread just one part of the application: an operation in your application that takes a relatively long time, but is really CPU-intensive. The way to go about that is to identify a tight loop that uses a lot of CPU, or that just takes a very long time; that's always a good candidate. The main idea here is that if you fit in that category, I think it's very, very straightforward to actually just split it up. But you have to ensure that the work can be divided. In my example, for instance, I don't have any data dependencies from one pixel to the other. If you wanted to do something a little bit more elaborate, where the value of a pixel depends on one that is, you know, maybe ten rows below, or five pixels before, it may be a bit more difficult to achieve, because you have to make sure that that pixel value has been computed.
So, typically, what are you going to be confronted with? Here, in this case, in my code I had a ComputeMandelbrot routine that was taking a bunch of parameters, and a loop that was actually doing the work, going line by line. Typically, the best way to look at it is to try to find, in your code, the juicy part, and you do it by searching for loops. The main idea here is to identify something with the shape of a big loop, up to a large number, that does some processing. And here, remember what I said at the beginning: you need to make sure that the code can be executed independently. In this case, I had to ensure that ComputeMandelbrot, the function that does all the work, could actually be executed as a separate entity; which was no big deal, because it was pretty straightforward. So let's look at a nice graphic of what's going on. What's going on when you're not threaded? Remember, I have one process, and one thread in that process, the process being the application. In that case, I get called from my Carbon event handler; I do some benchmarking, you know, of computing the fractal; then I get into the main thread and I call ComputeMandelbrot; after that, the API does the real work, that loop that goes through each line; then we come back from ComputeMandelbrot once the full buffer has been computed; and then I go back to the main event loop and display the buffer result. OK, so now, how are we going to
re-architect that part of the code, that routine, in order for it to be threaded? Here's what's going to happen. As before, somebody is going to ask to do the benchmark, to compute the Mandelbrot space. What I'm going to do this time is spawn two threads: I'm going to divide my buffer into two different parts. For the first part, I'm going to spawn thread number one, and that's going to run this routine, which is pretty much an exact copy of the one I had before, but just with an adjustment of the parameters for the beginning and the end of the computation. What I do here is pass the offset, you know, a pointer to the beginning of the picture; and then another parameter is actually the end, where I want to stop, which in this case is the size of the picture divided by two, for the number of threads. Thread number two is going to be spawned as well, and here, as you can see, the start value has been changed: what happens is that I pass it the second half of the picture. OK, and here you're
of the picture ok and here you're
probably wondering well why just to
write I mean and this is actually
something that happened during one of
our workshops where like you know some
folks were wondering where the prime
here is that you know you think that you
know there's only two cpu's with what
happens is one day you know you have
more cpus and that's true you should not
make that type of assumption your code
should be able to actually divide and
fly at runtime and it's not very
difficult just from the number of CPUs
and you can actually do not divide your
picture like that which actually I
dealer as well the crew thing here is
The crucial thing here is that, remember, I'm going to spawn these threads, but I don't want to get into the business of doing thread management, OK? I don't want to re-architect my whole application. So I still want the application to be blocked: when I get inside that routine, CalcMandelbrot, I want to spawn my two threads, but I want to wait there. I don't want to go back to the main event loop, because, you know, I don't want the user to be able to click again and recompute, because then I'd have to do all that management: find out that the threads are done computing, restart them, or kill them and restart them with a new parameter. So here my idea was very simple. I wanted to take advantage of the fact that the 2.5 GHz G5 has, you know, two processors inside; so what I wanted to do was make that computation as fast as possible. I didn't want to re-architect everything. So here, what I'm going to do, in CalcMandelbrot, is wait: I'm going to sit down, I'm going to wait, and we're going to see how we're going to do that. And then, obviously, remember I said that when those threads are done, we need to find a way to signal, or notify, the main thread that we're actually done. Because remember, we're inside the routine CalcMandelbrot; I spawned two threads; these two threads are going to be off doing the work; but, you know, it takes maybe 10 milliseconds to spawn a thread, and then we go on to the next call, and then there is no way for me to get back into my routine, because these are independent execution code paths; remember that. So we need to signal: we need to get back to the main thread and say, "hey, I'm done." OK. So, how are we going to
achieve that? Step number one, and hopefully the slide is big enough for you guys in the back: step number one is that you have to initialize the MP library. And here, in this example, I'm going to be using the MP APIs: Multiprocessing.h, that is, Multiprocessing Services, in the CoreServices framework. So the first thing I'm going to do is count the number of processors. Then I'm going to create a queue, and I'm going to explain in a minute what that is about. And then, after that, I have this loop that goes from zero to the number of processors, and I create a task. Think of a task as a thread; well, kind of; let me go into more detail about that. The MP API is a cool abstraction, and I really like it, personally, because I think it made my life very easy for implementing this feature. Think of it this way: what happens is that we're going to have a queue where we're going to submit jobs, and the MP library is going to be the one actually dispatching them to the different threads, and finding out when a thread is done, when it's not done, this kind of thing. This is good, because I don't want to have to deal with all of that myself. So here, in this case, we create a queue, and that queue is going to be actually a global object. This is where, when I spawn a thread, I'm going to schedule my job to be executed; and then, after that, the MP APIs are going to be the ones actually distributing that load to the different tasks. Even if you have a dual processor, for instance, you can create four tasks, or eight tasks if you wanted, or six, for that matter: it's up to you. And I'll show you in the demo some actually interesting things about the kind of overhead of creating more tasks than processors.
There is another thing that is kind of cool. We had George Warner, who works in DTS, our psycho-frenetic optimizer guy, write some sample code; because when I started, I said, "OK, I'd like to thread this; what do you think I should do, how should I thread it?" And he wrote actually yet another abstraction layer on top of the MP APIs, which is pretty cool. So, to do the job I showed you here, you can just do it with a single call. And I'm going to be posting the sample code, that code specifically, for you guys, so you can use a very easy set of APIs to actually submit jobs and initialize the whole thing; I think it's three or four routines, and it's very cool. So there's an MPJobsInit, as I showed you, and an MPJobSubmit, and what I'm actually doing in the sample code is, on my main thread, submit a job to a work queue, as you'll see in a
second. Then, step number two: we're going to have to move our loop inside, you know, a new routine. So remember what I said about the calculation routine that does the work: now we're going to create something that can be executed independently. And here, what happens is that I'm going to create a new routine. I could have reused the other one, but in this case, because I want to be able to reuse my sample code, I'm going to pass one void pointer, because then I can do the type casting myself, and I can reuse that code later on in another project if I want to. That's going to give me a routine, the function that is going to be called by the thread. This is my threaded execution path, which is going to be executed independently of the rest of my application. So that routine is going to be the one doing the crunching. What do you do in there? First, you should prepare your data. What I mean by that is that I'm going to cast the void pointer back to some internal data, so I can get back, you know, the beginning of the loop, the end of the loop, a pointer into the picture; and, because it's a Mandelbrot workspace, the real and imaginary numbers needed to compute the deltas and find out whether each point is in or out of the Mandelbrot space; etc. Then I do my crunching: it's that loop that's going to be executing here, and I'm going to compute whether each pixel is in or out of the set. And then, once I'm done, I signal. I need to find a way to notify, because once that routine gets to the end, we're out of the loop; I need to find a way to say, "hey, you know what, I did my job, I'm done, I computed my half of the picture, it's finished."
OK, so step number three. What I'm going to do, to simplify things once again (I don't want to go back to the main event loop), is create a new routine, which is a wait-for-completion routine. What that routine is going to do is sit tight: it's just going to be waiting there. It's going to be a routine that's called from my main thread, and that's going to be my routine that waits to be signaled. And once again, that enables me to keep my existing architecture and not have to re-architect the whole application. OK, so here
you have the way that I'm actually going to schedule the work. First, I'm going to create a semaphore, and the semaphore is going to be the object that I keep between my threads; I'll go into more detail about the semaphore in a couple of slides. The API I'm going to use, MPJobSubmit, is actually the one from the sample code that I'm going to give you guys; it enables me to just submit a job. And here, in this case, it takes a pointer to my routine, which is the threaded CalcMandelbrot, and the two parameters you see after that: remember, a pointer to the data. Now, don't make the same mistake that I did, which is that I created one pointer. First, remember that each of the threaded routines has its own registers and stack. That means that when you pass a pointer to memory, you want to make sure that that memory is unique. What I did the first time was, you know, create a new pointer and set my data inside: I set the start of the loop at zero, and the end at half of the picture; then I spawned my thread; then I used the same pointer and just set the parameters inside so that, you know, it starts from half of the picture. But the fact of the matter is, when I was doing that, I was modifying memory that was actually being used by another thread, because I had passed it to my first thread. So don't make that mistake. Here, in this case, I create two pointers, p1 and p2, which have been typecast to void pointers, void stars. And then, after that, you have to understand that MPJobSubmit, when you submit the job, doesn't wait until the job is finished; the main idea is deferring that part of the routine. So it comes back; then, you know, I submit the second thread's job; it comes back; it doesn't wait for it to be finished. And then, we wait for completion: this is actually the routine that's going to block; that routine is going to stay and wait for the work to be finished. OK, so the semaphore is
the topic object that can I neighbors
that's going to enable us to actually be
notified when the thread is finished I
I had a first version of this slide that used a traffic light — a sémaphore, in French, is the signal light with the states — bad idea. Think of it instead as a little object whose state changes; think of it as a state variable. And this is what we're going to use: we're going to use that object to find out when the threads are done. So here, in this case, we're going to call MPCreateSemaphore, which is in Multiprocessing.h. The first two parameters are the maximum state and the initial value; here the maximum is going to be 2, because we spawn two threads, and the initial state is going to be 0 — I want to start at zero — and it hands me back my semaphore ID. And that ID is global, because I wanted to be able to access it from the different threads — from the main thread and from the spawned threads. Remember, the threads are all in the same memory space, so I can do that. So now, remember, we have the threaded proc: how do we notify, how do we signal that we're done? How do we change the state in the semaphore? Very easy: MPSignalSemaphore, available as well in Multiprocessing.h, and you just pass it your global semaphore. Then, waiting on the semaphore — that's what I call the waiting game — you have two ways to wait. MPWaitOnSemaphore is the one we're going to be using if you want to sit tight and wait until you've been notified: if you pass kDurationForever, what happens is that you're going to wait — that code is going to block — until somebody changes the state in the semaphore. So here is what happens: remember, we spawned thread number one and thread number two, and we call this API, MPWaitOnSemaphore, so we're waiting there because we passed kDurationForever. When the signal is done in the thread, the semaphore state changes — it goes to one — and then the API comes back, because the state has been changed, and puts it back to zero. The other duration, kDurationImmediate, changes the state in the semaphore as soon as you call the API; it doesn't wait. So, for instance, if the state was at two and you called MPWaitOnSemaphore with kDurationImmediate, it would decrement the state: it goes back to one, and then to zero if you call it twice.
So now let's look at our nice graphic again, and let's see what's going on in the threaded case. In the main thread we've been passed the buffer; in CalculateMandelbrotProc we create the semaphore — two states, initial state 0. What's going to happen afterwards is that I'm going to spawn thread number one — and here the MPWaitOnSemaphore box should come after; that's my mistake on the slide — I spawn thread number one and thread number two, as we did before. So now, what's going on at that point? We have thread number one doing some computation, thread number two doing some computation on the other half of the picture, and the main thread is blocked on MPWaitOnSemaphore — which is good, that's what we want; we don't want to go back and draw before the computation is done. Now what happens? Boom: MPSignalSemaphore — we're done, we've completed our half of the picture, we're finished computing. When the thread is finished, MPSignalSemaphore actually increments the state in the semaphore, so from zero we go to one. MPSignalSemaphore is done, the whole thread is finished — it's done with that routine; it doesn't exist anymore. But then what happens? The state changed, so in our main thread MPWaitOnSemaphore comes back and doesn't block, and in doing so we put the state of the semaphore back to zero. In this case I said thread number one finishes first, but it doesn't really matter: semaphores are re-entrant, so if both finish at the same time, the kernel knows what to do — don't worry about that. So now let's say — and it doesn't matter — thread number two finishes second; we don't really care about the order. Thread number two is done, we're done with our loop, and then we call MPSignalSemaphore in that routine. That increments the count on the semaphore back to one — the semaphore state has changed. Then the second MPWaitOnSemaphore comes back, the state goes back to zero, and that means CalculateMandelbrotProc is done: we don't block in that routine anymore, we go back to the main event loop, and we display the result. So remember: everything above — the semaphore, CalculateMandelbrotProc — was the only thing we touched; our main event loop and the rest of the application we didn't have to change at all. Does that make sense? Good.
Okay, now let me show you a demo of that — could we switch to demo number one, please? Great. First things first: I wanted to mention that Richard, who is one of our long-time developers, sent me this code — thank you, Richard. We were working on some things, and he sent me that code, and I decided, hmm, it would be good to use it as an example for threading, so what I did is I threaded the application. So thank you, Richard. Here, what we have is just a basic Mandelbrot set application. You should know that before doing anything, you should put in some kind of benchmarking if you want to do some work on performance. Here, in this case, I have a benchmark — let me move that a little bit so everybody can see — I put in a benchmark. It's a pretty easy space; you have to understand that the difficult part to compute is actually the part in black. So what happens is that I compute that picture something like 10 or 20 times — I don't know exactly — but... oh, what happened? We crashed; the app disappeared. Oh, that's a good demo. Let's see — I'm sorry, I don't know what I clicked; let me get that out of the way. Okay. So what happens here is that I assigned the benchmark to that action, and it just tells us how long it took to compute, and you can see that it took 0.95 seconds to compute that space. Once again, it's important to understand that the black part is the difficult part to compute.
Obviously, here, there's nothing very difficult to compute. What I wanted to show you, too, is one of the tools that ships with our system, which is called Thread Viewer. Do you guys know about Thread Viewer? Raise your hands... okay, good — pretty good; seems like a lot of you know about threading already. And here you can see that what I did is initialize, at the beginning, the MP library with three processors — but that was just for kicks; there's a routine that enables you to create more threads if you want, and I'll show you that. So what happens here is that we can see all the work that is going on. I'm going to go to a more difficult part — I have a cheat sheet to make things faster. Here we're going to try to fill the screen with more black, so that we really use the CPU power; computing the white part is pretty straightforward and easy — it just takes a second. Okay, actually that's good enough; let's not worry too much about it.
Okay, good. So we're here — let me get that out of the way — and I'm going to benchmark it, and I want to show you that here we're using only one thread, at the bottom. You can see that the thread is running in user space, and we're computing there, which is kind of sad, because this is the typical example where you want to use threads: for that type of computing it obviously makes a lot of sense. So here you can see that doing my benchmark takes quite some time — and this is a G5, and I have something like two gigs of RAM. Now, obviously you guys don't sell software that computes the Mandelbrot space — or maybe not — but I think you can probably relate this to some part of your code that you could use this in. So here we're done; you can see it ran on one thread, and it took 6.31 seconds, and you have a min and a max computed. What we're going to do now is that I'm just going to switch on threading, and I'm going to do the benchmark again.
I'll show you the code after that. What you can see here is that now both CPUs are being utilized. The white space you can see between the threads is because I'm doing a slideshow: the picture takes a certain number of seconds, but I run the test something like 20 or 50 times, so in between we go back to the main event loop — the threads are gone, and you see only one thread at some points. And here you see about four seconds, so we went from — what was it — eight point something down to four, so we get almost a two-times speed improvement. I did some testing before, which was rather interesting: between the different results I got, on average, depending on how difficult the space is, about 1.7 times faster. The cool thing, too, is that if you then, on top of that, put in AltiVec, you get some dramatic performance, because then you use both AltiVec units — 128-bit computing per cycle — and you get to something that is very, very cool. So here, in this case, depending on what you're doing, with AltiVec and threading I get to something like 1.8 to 4.9 times faster, depending on the space.
So I can show you that: if I do the benchmark now, we're at eight or nine seconds, and now if I do AltiVec plus threading, we get a huge speed improvement — 1.78 — so we're talking about six times faster with the threading and the AltiVec together. Okay, so that's a cool demo.
What I want to show you is that it's always hard — and I got this question — how do you slice your picture? Say you have two CPUs: what happens if, let's say, I make four slices and I want four threads, and I have only two CPUs? What's the overhead? And this is where you see that Mac OS X is truly great at multitasking, because I'm going to put in four threads and four jobs — and I'm going to slow the animation down, because it goes way too fast; it actually doesn't matter — but if I benchmark here, you can see we have four threads going on, and you're going to see that there is not that much overhead; in certain cases, depending on the memory usage, you can actually get some pretty good results. So it's very interesting, because what I'm getting at is that you could ship code that is threaded: let's say you decide on two threads, and you divide your picture — or whatever processing you're doing — into two threads, and then if you run it on your PowerBook, you'll see that there is not much of a big overhead; in certain cases it's probably going to be the same speed, depending on what other processes are running on the system, of course. So this is very important to understand: it's a truly great multitasking system.
Okay, I showed you that; now let me show you the code quickly. This is the library I told you about, and these are the two files I'm going to be posting for you — we'll get into details in the Q&A; I don't want to take too much time right now. Here we have a couple of wrappers, and here you have the init routine. What we do, just as I showed you, is get the number of processors from the system — that's going to be one or two — and then after that we create the queue; this is actually a global variable. And then I just have this basic loop that creates a task per processor.
Okay, now let me go back to the compute code. This is the fractal — is it large enough, can everybody see in the back? Good. Okay, thank you. So here I just have a global that I check — it's there for demo purposes — that tells me whether the threaded case is on, and here I have something that comes with the number of jobs; that's another global that I use, and it's set from the menus, as I told you. I create a pointer for each of the data structures I want to pass to each of the threads, and then I have a loop that submits the jobs. And this is very cool, because, seriously, with the MP wrapper API that I had, it took me a couple of hours to implement this. The guy who gave the session here — he has a wavelet compression algorithm — in less than four hours we actually changed his code to use this type of threading; it took us about three hours, and that's because I was the one coding — somebody used to this is probably way faster. And here the cool thing is that you can see I'm using the job-submit call, which is an API that's part of the wrapper I'm going to give you guys, and I just pass my routine and some pointers, and here you see the job data, which tracks the stuff I have. Let me show you — I don't know why certain comments seem to be hiding things — oh, I'm sorry; okay, let me open the project again. Great, awesome.
I'm going to go back here, and I want to show you the compute code. Okay, so we're here. This is, remember, the routine that's going to be called by each thread — remember, an execution code path that's going to be run independently. This is what gets called; I just passed the address of that routine. Very cool. The first thing I do is retrieve the data. Obviously, you could pass the fractal data directly — which is the structure I created — but because I wanted to be able to reuse the MP .c and .h in another program, I decided to use void *: a way of not depending on the data type. So I recover all the data; we're good. Then we compute the delta of the start: remember, I have to adjust that routine so it varies, going from the start to the end of the computation. And here, don't worry about that — I should have removed that — but the main thing is that here I do the loop. Why is it fast? Well, because think of it as two different code paths: one is going to start and do the first part of the picture, and the second one the second part — I just change the beginning and the end of the set. And we do the computation — in floating point here, and in the Velocity Engine here — and then, when I'm done, I signal the semaphore; remember what I showed you in the diagram. Okay, if we could switch back to the slides.
Okay, some recommendations. Don't — you know the MP init stuff I showed you, where you initialize the MP library, create the semaphores, create the queue, and create the tasks? Don't do that every time somebody requests an action; that's going to cost you. Do it in your main, when you start the program, and then, when your program quits, just clean up after yourself. Don't recreate them each time. In this demo I do that, because I wanted to have an example of why you'd use, say, four threads, and to show you the results with only two CPUs, but in the typical case you would not want to have this overhead.
Do be data-driven. What I mean by that is: think of it as, you do your setup — your memory management, you create your semaphores — and then your threads should be the routines that do the real work. You don't want a thread to just sit there and wait to be notified by another thread; you want to use your threads for doing the data crunching. If a thread is just waiting for something to happen from another thread, what's the point of having the thread? There can be cases for it, but the main idea is that you want to use the bandwidth of the G5 — you want the threads to do the data crunching. So before you spawn the threads, set up all the semaphores, set up the memory, and do all the things you need to do to make the data crunching effective; and then, when you come back — when you're closing the window, or before that — do the cleaning: free the memory you allocated for the threads, that kind of thing.
Okay, so let's go back to what happened in the threaded case — you're going to love this slide. Remember: we create the semaphore, initial state 0 — that's good — we spawn the two threads, thread number one and thread number two, and then we wait on the semaphore; and what happens is it blocks, as I told you before. But fine: our initial goal was to really get the data crunching done — we wanted to do that operation as fast as we could, and we didn't want to rearchitect the whole application. So let's call that step number one: you ship your application with step number one — that was the first step. Cool: your customers are very happy, because some operations are up to two times faster. That's good; everybody's happy.
But now let's make the whole experience better, and let's try to actually thread the whole application. What should we do in this typical case? Well, this is going to be the easy one. What we're going to do: we'll create the semaphore, as you saw before — that's good — and we can spawn the threads. But what about the waiting? Remember, before, in that routine I had something doing the wait — the two calls to wait for completion. What I'm going to do now is take that code, put it in a routine, and spawn that routine as one thread. So what does that thread do? You know where it sits: right now it's just waiting. Then we're going to spawn the other two threads — thread number two and thread number three — and when we do that, what happens? Well, we go back to the main event loop. So be careful, because that means the user could go back and hit benchmark again while the threads could still be running — and here maybe we go back to the first part of the presentation, where I said: hey, guys, you have to be careful; if you want to thread the whole application, you're going to have to do some thread management. It's possible — I just want you guys to understand that there are different steps and different ways in which you can thread the application.
So let's say we did it, and it works well. Very cool — and what happens when it works? We're going to signal: remember, the signal is going to bump the semaphore — to one, in this case. Then the MPWaitOnSemaphore in our routine — because, once again, the semaphore is global — is going to come back, so that code doesn't block; now we're blocked on the number-two wait. MPSignalSemaphore is then called inside thread number two; the state changes to one, and MPWaitOnSemaphore turns it back to zero. At that point in time we have a thread — thread number one — that tells us: hey, my other two threads are done doing the work. So now, what do we do? We need to notify the main event loop — remember, we have to tell the event loop, "hey, I'm done." How are we going to do that? Well, very cool: there is a very nice Carbon event call, which is called PostEventToQueue, and what it does is this: we create a Carbon event, pass it to that API, and it sends it to the event manager, and the event manager dispatches it to the main thread. This is what you're going to do when you want to update the UI, for instance. PostEventToQueue — very good: you pass it the Carbon event, and you just need to install a Carbon event handler. The Carbon event handler can be installed on a window, on a control, a widget, or on the application — it's up to you; a lot of flexibility. And then, inside my application, in the main event loop, I get notified, and then I can display my picture when we're done.
Okay, so some dos and don'ts. In that case, when you start threading the whole application, be careful with the UI. It's okay to draw with Quartz — you may have some issues depending on what you're doing, but it's okay to draw with Quartz from different threads, and we have some sample code on the developer.apple.com website; I'd invite you to check that out, and George is going to be here and can give you the complete URL. OpenGL is okay as well. And once again, if you want to notify the UI — for drawing a button, for updating a scroller — please use PostEventToQueue. PostEventToQueue is very cool, because you can call it from wherever you want: you create your own Carbon event, with your own type — it's up to you — and then your Carbon event handler, on your window or your application, is just going to be called. This is the way to do user interface from different threads.
All right, a quick summary. Once again: thread your application when it's appropriate. Obviously, some of the exact examples I gave you here may not apply to you; don't go into a frenzy and start threading everything just because it's easy. The main idea is that I would encourage you guys to go back and think, maybe for a couple of minutes, about things like: what part of my application is taking a long time right now? What part could I do better for the user? And once again, you can have two motivations for threading the application. It could be responsiveness: because you're doing a lot of things, and sometimes the user can't do anything — the menus don't drop down, you block — and you want to improve that user experience. Or you want it for a case like the one I showed you, where you're doing a lot of CPU-intensive work: maybe you're computing on a huge array — matrix manipulation — or your job is to process, say, an MRI and find out whether it shows a cancer or something; you get the idea, and it can take a long time. In the typical case, I want you to think in these terms: can this be divided? Could I use different threads? Can my code paths be executed independently? So think about that. I think a lot of developers sometimes just don't think about it, because we're all chasing features, but I think it's very, very important — with what our users are buying now, these new G5s — that you actually think about responsiveness and high performance through threading. And once again, I'll be posting the sample code, probably tonight — actually, I figure I can do it now. We'll do the Q&A after this, and I'll go into more detail in a second.
If you want more information: we have some stuff on Carbon threads and the Multiprocessing Services on the ADC home — you know, developer.apple.com; I'll let you read the slide. If you're interested in Cocoa, obviously Cocoa has threading as well. If you're interested in POSIX — pthreads — you can just do a man; the man pages are actually pretty good. I know if you're coming from the Mac you're probably thinking, "man pages? I don't want to deal with the Terminal" — but I personally encourage you to check them out; they're a very, very good start when you're looking for information. There are other repositories if you really want them, but technically I think the Open Group collection has pretty good and updated information on pthreads. Okay, and we have some technical notes on the threading architectures, and there is a technote on the Multiprocessing Services as well.