WWDC2004 Session 435

Transcript

Kind: captions Language: en

Hi everybody, welcome to session 435. My name is Xavier Legros and I work in Developer Relations. Most of my job is actually to go around the world, tell you about the latest technologies, and encourage you to adopt modern Mac OS X technologies. What has been coming up quite often is that people are afraid of threading their application. So we've been doing a lot of what we call workshops, where we take groups of 20, 25, 30 people and teach them one topic. We have Cocoa workshops, we have HIToolbox Carbon workshops. We did this presentation there with very, very good feedback, and interestingly I think it really helped developers in actually threading their application. So that's going to be the topic today.

So what are we going to talk about? Today we're going to talk quickly about why you as a developer should care about threading. We'll go through some threading terminology, some buzzwords, to make sure that we all speak the same language here (French, obviously), and we'll go through a couple of examples of threading architectures. I'll go through three of the main architectures that I think are used around threading. And I think the best part is that I'm going to try to take you step by step and teach you how to thread your application using the MP APIs. Then we'll have some dos and don'ts, and something that I think is a cool demo.

Okay, so why should you use threads? Well, because if you thread your application, we're going to give you fifty percent off one of these brand new G5s and a brand new display. I usually get a reaction; that's what I thought. And you're probably thinking, "Really?" No, not really, of course. But why should you thread your application? Well, the first thing that comes to mind, of course, is scalability. We are now shipping a lot of these boxes with two CPUs inside.
Specifically, you as a developer with your nice application: if you use only one CPU, it's like having our users invest three thousand dollars to use half of a machine. I think a lot of applications could actually use threading, and we're going to go through the examples in the main content and explain why it's sometimes not that big a deal to thread part of an application.

Here what I'm showing you is a couple of results with some graphics transformations. In this case we have a Gaussian blur, and then we go to a motion blur. When you thread the part of your application that does highly intensive tasks, you can expect it to run between 1.3 and 2.3 times faster. And here you're probably wondering, why 2.3 times faster? If I have two CPUs, how can I get more than twice as fast? In this case, if I'm not mistaken, we're getting superscalar results. The main thing to understand is that in these boxes we have a little bit more than just two CPUs. We have a very, very strong architecture that enables us to actually take advantage of both of these G5s in the box. Specifically, why do we get things such as 2.3 times faster? For the simple reason that each CPU has its own bus to the memory controller. That makes a big, big difference. In certain cases, maybe your algorithm is going to be CPU-bound. In this case, when threading your application, just imagine that you're giving it twice as much bandwidth to main memory.

So why use threads? Obviously customer expectations, as I talked about, and scalability. Once again, the three G5s shipping right now all have dual CPUs. This is a big deal. And you've seen the industry, I mean Intel, making some announcements about dual core. This is something that vendors are really moving toward, so please keep that in mind for your future development. Okay, so let's get started.
So we're going to throw out a couple of buzzwords, and trust me, by the end of the presentation you'll probably be speaking French. Okay, so what's a thread? Think of a thread as an independent execution code path. This is very, very important. If you're new to threading and you have no clue how you should go about threading your app, think about the function that you're going to want to thread: can it be considered as something that can be executed independently of the rest of the code? And specifically, think of it as something that can have its own stack and register set.

So what's a process? A process is a collection of threads with the resources necessary to run. Specifically, when you launch an application, that's the process, and inside that process you can have different threads going on. The cool thing here is that the process has its own address space. That means that the threads inside the address space can access the global variables that you have, and the memory that you pass from the main thread to the other threads is memory from the same address space, obviously.

Now, before we explain why and how you should use threads, let's think a little bit about when you should not use threads. Obviously, in the case where it's going to add complexity to your application, you don't want to get into that business. There is no need for you to spend six months threading your application if every time you want to add a feature it's going to be another six months of doing the thread management. Obviously, things that are going to require a lock are a bad idea. Think of it as a database where a lot of people are trying to write to the same record: depending on the granularity of the lock, obviously you're going to
have to be very, very careful. And the last one, of course, is using non-thread-safe APIs. That means don't try to thread your Carbon drawing code. In the HIToolbox, UI elements cannot be threaded for drawing, and I'll go into more detail about that.

Some of the options: you could use cooperative threading, which is probably what most of you have been using on Mac OS 9. I put timers here, and I'm talking about Carbon event timers. This is something people very often don't understand. They say, "Well, I'm doing a bunch of processing on the hard drive, and I want to show the user the progress and give them a chance to cancel the operation." You could use a Carbon event timer for that. Use a Carbon event timer that's going to fire every second, or twice per second, it's up to you, and then the toolbox will call you inside your timer event handler and enable you to update whatever you want on the screen. So that's why I put timers.

Okay, so hopefully you get the idea, and if you're here in this room, obviously you want to learn and you're interested in threading your application. Now I'm going to go through what I think are three of the main threading architectures that we see out there: parallel tasks with shared I/O buffers, and so on. You can read the slide better than me; I think it's better if I show you a picture.

So in the first case, one example of the parallel tasks with parallel I/O buffers architecture would be, let's say, a simulator, a flight simulator. You get data in one buffer, that data is going to be computed, and the result is maybe going to be the atmospheric settings. So this is totally independent: you have one input buffer and one output buffer, and they don't depend on the rest of the data. Then on thread number two, what you could do is compute, let's say,
the background. You get the data, let's say, from the Internet or from some geostationary satellites, so you get the data in another thread; that's your I/O buffer. Then you do the processing to compute maybe some fractal terrain, some 3D world, and at the end you get a GWorld, an offscreen, a CGContext, an OpenGL surface, whatever you want, et cetera. In thread number three you could, for instance, do the computation of velocity or collision, whatever you want. The main idea in this architecture is that you have n different input buffers and n output buffers, and you just do different processing on each of the threads.

All right. The next architecture is the one that works best, in my mind, and I think a lot of applications could use it. In this case we have a buffer of data. It could be an image that you want to apply a transform to, or it could be pretty much anything. It could be a huge array of floating-point values for which you want to compute the cosine, the sine, and the tangent and generate this huge output buffer. This is what I'm going to use in my demo, and, not to ruin it, what's going to happen is that I have this application that computes a fractal. So what do I have as input? I have this buffer, which is actually a pointer to my offscreen. I have a GrafPort, an offscreen, and it points to the beginning of my image. To compute the Mandelbrot space, which is the fractal that I'm going to be computing, I just need to compute whether each pixel is in or out of the Mandelbrot space. What I can do very quickly and in an easy way, which hopefully I will show you, is take that initial buffer and pass it to n threads. I'm going to divide my picture into
n different parts, and n different threads are going to be computing the data for me. At the end, because of some magic, we'll actually have different pointers inside my image. Think of it as just slicing the initial image and passing one slice to each of the threads. I don't need to recombine all the results, because I just have the initial pointer to the offscreen. But that's something you could apply to pretty much anything. Let's say you have this hard drive and you need to compress all the files one by one. You could spawn ten threads, and each of the threads will take one of the files from the directory. Or think of HPC, or scientific computation, where you have a huge array of data that you need to crunch through and apply, let's say, some FFTs or a transformation or rotation of the data set. You could use the same architecture to go through your data set: you slice the input data, pass it to n threads, and then recombine the data.

The last one sounds more difficult but actually works pretty well. In this case we have sequential tasks with multiple I/O buffers. The type of application that could use that is one that needs to execute, let's say, n different tasks on an initial data set. One example that I like to give people is a word processor. Let's say you're this word processor and you have to run different tasks on the data set. So you open a file, and it's a huge file, a megabyte, two megabytes, whatever, it could be a hundred K. You have to run spell checking, then grammatical analysis, and then maybe after that you're
going to want to translate the result into French, German, you name it. In this case, what will happen is that the initial input buffer will be the first paragraph of the document. We're going to pass that to thread number one, and thread number one is going to do the spell checking. When the spell checking is done, you take the output buffer and pass it to thread number two, which is going to be doing the grammatical analysis, for instance. So what do we have at this point in time? We have thread number two doing grammatical analysis on paragraph number one, and thread number one will be grabbing, let's say, paragraph number two of the document and doing the spell checking on it, et cetera, et cetera. So think of it as cascading the result of the previous thread. In this case, thread n depends on the result of thread n minus 1, but after n operations all the threads will be fully doing work. And then the output buffer will obviously be the French text, the corrected English text, whatever you want.

Okay, so now let's talk about the different implementations and which APIs you can use as a developer. What's important to understand is that on Mac OS X the three different implementations I'm going to be talking about are all implemented on top of pthreads, which is really good news. If you've been coming from Mac OS 9, Mac OS X is a true multitasking system, a preemptive multitasking system, and it's great to have that implementation. And obviously, if you're coming from a UNIX background, you're probably very pleased with that. On top of that we have different types of implementation. Java has java.lang.Thread, implemented on top of pthreads, with its own API, obviously. Carbon has what we call the MP APIs, and I'll talk a little bit more about that in a second. And Cocoa has NSThread: the same thing, a set
of APIs implemented on top of pthreads. Usually the first question I get after my presentation is, "So we have all these choices; what should I do? Why should I use MP instead of pthreads?" Well, there is no answer to that question, because the idea here is that we give you as many choices as we can, and it's up to you as a developer to find what fits best for you. I'm going to be using the MP APIs because when I started I had no clue. I needed to thread my application and I wondered what I should use. pthreads were a bit too low-level for me, and the documentation was kind of hard to follow coming from the Mac days and not being a UNIX developer, and the MP APIs provided a nice abstraction. So that's why I decided on them. But if you're more comfortable with pthreads and you've been working with pthreads, please do so, use pthreads. It's no big deal. And then Cocoa or Java, depending on what type of application you're developing; it's up to you.

Okay, for Carbon development, and I should rephrase that, because we've had some folks doing Cocoa who used the MP APIs in one of our workshops. Since it's a C API, Cocoa applications can use it as well; you can call pretty much everything you want from Cocoa. But if you have a Carbon application, the MP APIs are available in Multiprocessing.h, and hopefully that's clear for you guys in the back. We offer services and objects such as MP semaphores, MP queues, and MP tasks, and I'm going to talk about those in more detail. And once again, it's important to understand that what we did, in fact, is offer you an abstraction level on top of pthreads. So you're going to get the same kind of results and the quality of pthreads.

Okay, so, threading implementation. This is where things get interesting. You have two approaches. The first, and I remember from
talking to some folks out there, and they say, "Oh, you want me to thread my application, but this is going to be a nightmare. You don't realize, we have the menu management, I need to keep track of what's going on. This is going to take me a year." So the first approach, of course, is the difficult one, I think: you have an application that is not threaded right now and you're going to rearchitect everything. The main idea there, of course, is to give users as much responsiveness as you can. But I think there is a better way to start threading your application, depending on what type of application you have: another approach is to just thread the CPU-intensive operations. The main advantage of that approach is that you don't have to rearchitect your whole application. So let's say you have an application and you do some computing, you need to generate, I don't know, a 3D model, or you have to do some compression. When the user executes that task, you could thread just that part of the processing. I'm going to show you techniques that enable you to do that without having to rearchitect the rest of the app, because we're going to work on that part of the code and we won't touch the rest of the application.

If you wanted to thread your whole application, you'd have a couple of concerns to keep in mind. Thread management, obviously: you need to know which thread is going on. If the user wants to redo the operation and the thread is not done, you have to kill the thread and restart it, this kind of thing. And then you have to do some synchronization, of course. When a thread is done, maybe you're going to spawn five threads and each is going to do part of something, and you're going to have to notify back to the main event loop that
a thread is done, or there is an error, or it crashed, you name it. And of course you'd have to make sure that you implement thread-safe services in your application as well. That forces you to think, "Well, that global data is being accessed not only by this thread but by other threads as well, and I'm going to have to put a lock on it so that my threads can access it."

Now let's go with what I think is the simplest approach, which is to thread just one part of the application: thread an operation in your application that takes a relatively long time and is really CPU-intensive. The way to go about that is to identify a tight loop that uses a lot of CPU or just takes a very long time; it's always the 80/20 rule. If you fit in that category, I think it's very, very straightforward to just split the loop, but you have to ensure that the data set can be divided. In my example, for instance, I don't have any data dependencies from one pixel to the other. If you wanted to do something a little bit more elaborate, where the value of a pixel depends on one that is maybe ten rows below or five pixels before, it may be a bit more difficult to achieve, because you have to make sure that that pixel value has already been computed.

So typically, what are you going to look for? In this case, in my code I had Compute Mandelbrot, which was taking a bunch of parameters, and a loop that was actually doing the work, and I was going line by line. Typically the best way to look at it is to go through your code and search for loops. The main idea is to identify something along the lines of a big loop up to a large number that does some processing. Remember what I said at the beginning: you need to
make sure that the code can be executed independently. In this case I had to ensure that Compute Mandelbrot, the function that does all the work, could be executed as a separate entity, which was no big deal; it was pretty straightforward.

So let's look at a nice graphic of what's going on when we're not threaded. Remember, I have one process and one thread in that process, the process being the application. I get a command from my Carbon event handler; at that point I do some benchmarking and go compute the fractal. I'm in the main thread, I call Compute Mandelbrot, and after that I call the API that does the real work, the loop that goes through each line. Then we go back to Compute Mandelbrot, the buffer has been filled, it's been computed, and I go back to the main event loop and display the resulting buffer.

Okay, so now how are we going to rearchitect that part of the code, that routine, in order to be threaded? Same as before, somebody is going to do the benchmark and ask to compute the Mandelbrot space. What I'm going to do is spawn two threads and divide my buffer into two different parts. First I'm going to spawn thread number one, and it's going to run a routine that is pretty much an exact copy of the one I had before, just with an adjustment of the parameters for the beginning and the end of the computation. What I do is pass the offset, a pointer to the beginning of the picture, and then another parameter is the end, where I want to stop. In this case it's the size of the picture divided by two, for the number of loops. Thread number two is going to be spawned as well, and here, as you can see, the red
values have been changed, and what's going to happen is that I pass the second half of the picture. And here you're probably wondering, "Why just two?" This is something that actually happened during one of our workshops, where some folks were wondering about that. The problem here is that you assume there are only two CPUs, but what happens if one day you have more CPUs? And that's true, you should not make that type of assumption. Your code should be able to divide on the fly at runtime, and it's not very difficult: just get the number of CPUs and divide your picture accordingly, which is actually what I do as well.

The cool thing here is that, remember, I'm going to spawn these threads, but I don't want to get into the business of doing thread management. I don't want to rearchitect my whole application, so I still want the application to be blocked. When I get inside that routine, Calcul Mandelbrot, I want to spawn my two threads, but I want to wait there. I don't want to go back to the main event loop, because I don't want the user to click again and recompute, because then I'd have to do all that management: find out whether the threads are done computing, kill them, restart them with new parameters. My idea was very simple. I wanted to take advantage of the fact that the dual 2.5 has two processors inside. What I wanted to do was make that computation as fast as possible; I didn't want to rearchitect everything. So what I'm going to do in Compute Mandelbrot, in Calcul Mandelbrot, is wait. I'm going to sit down and wait, and we're going to see how we do that. And then obviously, remember, I said that when those threads are done we need to find a way to signal or notify the main thread that we're actually done. Okay? Because remember,
we're inside the routine Calcul Mandelbrot. I spawned two threads, and these two threads are going to be doing some work, but it takes something like ten milliseconds to spawn a thread and then we go on to the next call. There is no way for me to get back to my routine, because that's an independent execution code path, remember that. So we need to signal. We need to get back to the main thread and say, "Hey, I'm done."

Okay, so how are we going to achieve that? Step number one, and hopefully it's big enough for you guys in the back: you have to initialize the MP library. In the example I'm taking, I'm going to be using the MP library, Multiprocessing.h, in the CoreServices framework. The first thing I'm going to do is count the number of processors. Then I'm going to create a queue, and I'm going to explain in a moment what that is about. After that I have this loop that goes from zero to the number of processors, and I create a task. Think of a task as a thread. Well, kind of; let me go into more detail about that. The MP APIs offer a cool abstraction. I really like it personally, because I think it made my life very easy when implementing that feature. Think of it this way: we're going to have a queue where we submit jobs, and the MP library is going to be the one dispatching them to the different threads and finding out when a thread is done, when it's not done, this kind of thing. This is good, because I don't want to have to deal with all that stuff. So in this case we create a queue, and that queue is going to be a global object. When I spawn a thread, this is where I'm going to schedule my job to be executed, and after that the MP APIs are going to be the ones distributing that load to the different tasks. Even if you have a dual-CPU machine, for instance, you
can create four tasks or eight tasks if you want, or six for that matter; it's up to you. I'll show you in the demo some interesting things about the overhead of creating more tasks than processors.

There is another thing that is kind of great. We had George Warner, who works in DTS and is a schizophrenic optimizer guy, write some sample code. I went to see him and said, "Okay, I'd like to thread this; what do you think I should do? What should I use?" And he wrote yet another abstraction layer on top of the MP APIs. It's pretty cool. To do the job that I showed you here, you can just make one call, MP Jobs Init. I'm going to be posting that sample code for you guys, so you can use a very easy set of APIs to submit jobs and initialize the stuff. I think it's three or four routines; it's very cool. So you just call MP Jobs Init, as I showed you, and then MP Job Submit. What the sample code does is submit a job to the work queue, and we'll look at that in a second.

Then step number two: we're going to have to move our loop inside a new routine. Remember what I said about the Calcul routine that does the work? Now we're going to create something that can be executed independently. What happens is that I'm going to create a new routine. I could have overwritten the other one, but in this case, because I want to be able to reuse my sample code, I'm going to pass one void pointer, because then I can typecast it dynamically and use that code later on in another project if I want. That's going to be the routine, the function, that gets called by the thread. This is my execution code path that's going to be executed independently of the rest of my application. So that routine is going to be the one doing
the crunching. So what are you doing there? First you should prepare your data. What I mean by that is that I'm going to typecast the void pointer to some internal data structure, so I can get back the beginning of the loop, the end of the loop, a pointer to the picture, and, because it's the Mandelbrot space, the imaginary number used to compute the deltas and find out whether the number is in or out of the Mandelbrot space, so the real part, the imaginary part, et cetera. Then I do my crunching. It's a loop that's going to be executing here, and I'm going to compute whether each pixel is in or out of the space. And then once I'm done, I signal. I need to find a way to let you know, because once we get out of that routine, we're lost in the blue. So I need to find a way to say, "Hey, you know what, I did my job, I'm done, I computed half of the picture, it's finished."

Okay, so step number three. To simplify, once again, I don't want to go back to the main event loop. I'm going to create a new routine, a wait-for-completion routine, and what it's going to do is sit tight, just waiting there. It's a routine that's going to be called from my main thread. I don't mind that I'm going to be waiting there, and that's the routine that's going to be waiting to be signaled. Once again, that enables me to keep my existing architecture and not have to rearchitect the whole application.

So here is the way I schedule the work. First I'm going to create a semaphore, and the semaphore is going to be the object that I share between my threads; I'll go into more detail about the semaphore in a couple of slides. The API I'm going to use, MP Job Submit, is the one in the sample code that I'm going to give you guys, and it enables me to just
submit a job. In this case it takes a proc pointer to my routine, which is the Calcul Mandelbrot threaded proc, and the parameter you see after it is a pointer to the data. Don't make the same mistake that I did. First, remember that each of the threaded routines has its own registers and stack, which means that you want to pass a pointer to memory, and once you're in there you want to make sure that that memory is unique. What I did the first time was create a new pointer and set my data inside, so I said start the loop at zero and end at half of the picture. Then I spawned my thread. Then I used the same pointer and just put the new parameters inside, say, start from half of the picture. But the fact of the matter is that when I was doing that, I was modifying memory that was being used in another thread, because I had passed it to my first thread. So don't make that mistake. In this case I create two pointers, p1 and p2, which are typecast to void pointer, to void star.

After that, you have to understand that when you submit the job, it doesn't wait until the job is finished; that's the main idea of threading that part of the routine. So it comes back, and then I spawn the second thread, which comes back and doesn't wait for it to be finished either, and then we call wait for completion. That is the routine that's going to block; that routine is going to stay and wait for the threads to be finished.

So the semaphore is the object that's going to enable us to be notified when a thread is finished. I had this first version of the slide that used a semaphore; a semaphore in French is the light with the states. Bad idea. Think of it as a little object with state changes; think of
it as a state table, and this is what we're going to use. We're going to use that object to find out when the threads are done. So in this case we call MPCreateSemaphore, which is in Multiprocessing.h. The first two parameters are the maximum value and the initial value. Here the maximum is going to be 2, because we spawn two threads, and the initial state is going to be 0; I want to start at zero. It gives me back the semaphore, and here that data is global. I wanted it as a global because I wanted to be able to access it from the different threads, from the main thread and from the spawned threads. Remember, the threads are all in the same memory space, so I can do that.

So now, remember, we have the threaded proc. How do we notify, how do we signal that we're done? How do you change the state in the semaphore? Well, very easy: MPSignalSemaphore, available as well in Multiprocessing.h, and you just pass your global semaphore.

So, waiting on the semaphore; that's what I call the waiting game. You have two ways to wait on the semaphore with MPWaitOnSemaphore, which is what we are going to be using. If you want to sit tight and wait until you're notified, you pass kDurationForever. What happens is that you're going to wait: that code is going to block until somebody changes the state in the semaphore. So here is what happens. You remember we spawned thread number one, we spawned thread number two, and we call this API, MPWaitOnSemaphore. We're waiting there because we passed kDurationForever. When the signal is done in the thread, the semaphore state changes, it goes to one, then the API comes back because the state has been changed, and puts it back to zero. I'm going to show you that graphically on the next slide, which is better. kDurationImmediate
changes the state in the semaphore as soon as you call the API; it doesn't block. So for instance, if the state was at two and you called MPWaitOnSemaphore with kDurationImmediate, it would subtract: the state would go back to one, and then to zero if you called it twice.

So now let's look at our nice graphic again and see what's going on in the threaded case. We're in the main thread, we've been passed the buffer, we're in the Mandelbrot calculation routine. We create the semaphore: maximum state two, initial state zero. What's going to happen afterwards is that I'm going to spawn thread number one, and here on the slide the MPWaitOnSemaphore should come after, my mistake, but I'm going to spawn thread number one and thread number two, as we did before. So what's going on at that point? We have thread number one doing some computation, thread number two doing some computation on the other half of the picture, and the main thread is blocked on MPWaitOnSemaphore, which is good; that's what we want. We don't want to rearchitect anything, so that's good.

Now what happens? Boom, MPSignalSemaphore. We're done, we've completed our half of the picture. The thread says: you know what, I'm done computing, do whatever you want now, my part is done. The thread is finished. What happens is that the signal increments the state in MPWaitOnSemaphore; I'm sorry, it increments the state in the semaphore, so from zero we go to one. MPSignalSemaphore is done, the whole thread is finished, that routine is not running anymore. But then the state has changed, so in our main thread MPWaitOnSemaphore comes back and doesn't block anymore, and in doing so it puts the state of the semaphore back to zero. In this case I said thread number one finishes first, but it doesn't really matter. Semaphores are reentrant, so if both threads finish at the same time the library knows what to do. So don't worry about that.
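The semaphore behaviour described so far (a state that a signal bumps up and a wait brings back down, with one wait that blocks and one that returns right away) can be sketched in portable C. This is not the MP API from the session: POSIX threads stand in for it, and `CountingSemaphore` and every `csem_` function are hypothetical names invented for the illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical counting semaphore; a stand-in, not the MP semaphore itself. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  changed;
    unsigned        value;    /* current state */
    unsigned        maximum;  /* highest state the semaphore may reach */
} CountingSemaphore;

/* Create with a maximum state and an initial state (2 and 0 in the talk). */
void csem_create(CountingSemaphore *s, unsigned maximum, unsigned initial) {
    assert(initial <= maximum);
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->changed, NULL);
    s->value = initial;
    s->maximum = maximum;
}

/* Signal: bump the state by one (never past the maximum), wake one waiter. */
void csem_signal(CountingSemaphore *s) {
    pthread_mutex_lock(&s->lock);
    if (s->value < s->maximum)
        s->value++;
    pthread_cond_signal(&s->changed);
    pthread_mutex_unlock(&s->lock);
}

/* "Forever" wait: block until the state is above zero, then take one. */
void csem_wait_forever(CountingSemaphore *s) {
    pthread_mutex_lock(&s->lock);
    while (s->value == 0)
        pthread_cond_wait(&s->changed, &s->lock);
    s->value--;
    pthread_mutex_unlock(&s->lock);
}

/* "Immediate" wait: never blocks. Takes one and returns true if the state
   was above zero, otherwise leaves the state alone and returns false. */
bool csem_wait_immediate(CountingSemaphore *s) {
    pthread_mutex_lock(&s->lock);
    bool taken = s->value > 0;
    if (taken)
        s->value--;
    pthread_mutex_unlock(&s->lock);
    return taken;
}

/* Read the current state (for inspection only). */
unsigned csem_value(CountingSemaphore *s) {
    pthread_mutex_lock(&s->lock);
    unsigned v = s->value;
    pthread_mutex_unlock(&s->lock);
    return v;
}
```

With a semaphore created as maximum 2, initial 0, two signals bring the state to 2, and two immediate waits bring it back to 1 and then 0, which is the sequence described for the immediate case.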
Let's say, and it doesn't matter if thread number two finishes before one, we don't really care, that thread number two is now done. We're done with our loop, then we call MPSignalSemaphore in that routine. What happens is that it increments the count on the semaphore; we're back to one. The semaphore state has changed, so the second MPWaitOnSemaphore we had comes back, and the state goes back to zero. That means the Mandelbrot calculation is done. We don't block in that routine anymore; we go back to the main event loop and we display the result. So remember that everything around the Mandelbrot calculation has not been touched: our main event loop, the rest of our application, we didn't have to do anything. Does that make sense? Good.

Okay, now let me show you a demo of that. If we could switch to demo number one, please. Great. First things first, I wanted to mention that Richard, who is one of our long-time developers, sent me that code, and thank you, Richard, for that. We were working on some things, and then I decided, hmm, it would be good to use that as an example for threading. So what I did is that I threaded the application. So thank you, Richard. There we go.

So here what we have is just a basic Mandelbrot space. You should know that before doing anything, you should put in some kind of benchmarking if you want to do some work on performance. In this case I have a benchmark; let me move that a little bit so everybody can see. Here I put a benchmark, and this is a pretty easy space. You have to understand that the difficult part to compute is the part in black. What happens here is that I compute the picture something like 10 or 20
times, I don't know exactly. But, oh, what happened here? Did we crash? It disappeared. Oh, that's a good demo. Now let's see. I'm sorry. Okay. So what happens here is that I run the benchmark, Richard wrote that code, and it just tells us how long it takes to compute. Here you can see that it took 0.95 seconds to compute that space. Once again, it's important to understand that the black part is the difficult part to compute; obviously here there's nothing very difficult to compute.

What I wanted to show you too is one of the tools that ships with our system, which is called Thread Viewer. Do you guys know about Thread Viewer? Raise your hands. Okay, good, pretty good; it seems like all of you know about threading already. Here you can see that what I did is initialize the MP library at the beginning with three processors, just for kicks; there is a routine that enables you to create more threads if you want, but I'll show you that. So here we can see the work that is going on.

I'm going to go to a more difficult part; I have a cheat sheet to make things faster. Here we're going to try to fill the screen with more black, so that we really use the CPU power; computing the white part is pretty straightforward and easy. This will just take a second. Okay, actually it's good enough, let's not worry too much about it. Good, so we're here. Let me turn off AltiVec, and now I'm going to benchmark it. I want to show you that here we're using only one thread, at the bottom. You can see here that the thread is in user space, so we're computing here, which is kind of sad, because that's a typical example where you want to use threads; for that type of computing it obviously makes a lot of sense. So here you can see that
doing my benchmarking takes quite some time, and that's a G5, and I have something like two gigs of RAM. Obviously you guys don't sell software that computes the Mandelbrot space, or maybe nothing that close, but I think you can probably relate this to some part of your code where you could use it. So here we're done. You can see the average, and it took 6.31 seconds, and you have a min and a max computed.

What we're going to do now is that I'm just going to turn on threading and do the benchmark again, and I'll show you the code after that. What you can see here is that now both CPUs are being utilized. The white space you can see between the threads is there because one picture takes a certain amount of time, but I do that processing something like 20 or 50 times, so in between we go back to the main event loop, because the threads are gone, and you see one thread at that point. And here you see four seconds, so we went from, what was it, eight point something to four. So we get almost twice the speed.

I did some testing before, which was rather interesting: across the different results I got an average, depending on how difficult the space is, of something like 1.7 times faster. The cool thing too is that if you then put AltiVec on top of that, you get some dramatic performance, because then you use both AltiVec units, 128-bit computing per cycle, and you get to something that gets very, very cool. In this case, depending on what you're doing, with AltiVec and threading I get to something several times faster, depending on the space. So I can show you that: now if I do the benchmarking, we're at eight or nine seconds, and now if I do AltiVec plus threading, we
get a huge speed improvement here, 1.78 seconds, so we're getting something like six times faster between the threading and the AltiVec acceleration.

Okay, so that's a cool demo. What I also want to show you, and I get this question a lot, is: how many slices, how do you slice your picture? You have two CPUs, but what happens if, let's say, I have four slices and I want four threads and I have only two CPUs? And I was thinking, well, that's true, so what's the overhead? This is where you see that Mac OS X is truly great at multitasking. I'm going to put in four threads and four jobs, and I'm going to turn off AltiVec because it goes way too fast; well, actually it doesn't matter. If I benchmark here, you can see we have four threads going on, and you're going to see that there's not so much overhead. In certain cases, depending on the memory usage, you can actually get some pretty good results.

It's very interesting, because what I'm getting at is that you could ship code that is threaded. Let's say you decide on two threads and you divide your picture, or whatever processing you're doing, into two threads, and then you run it on your PowerBook: you'll see that there is not much overhead. In certain cases it's probably going to be the same speed, depending on what other processes are running on the system, of course. So this is very important to understand: it's a truly great multitasking system.

Okay, I showed you that; now let me show you the code quickly. This is the library I told you about, and these are the two files I'm going to be posting for you; I'll tell you more during the Q&A. I don't want to take too much time right now, but here we have a couple of wrappers, and here you have the init routine. What we do, just as I showed you, is get the number of processors that are scheduled; it's going
to be one or two. After that we create the queue, this is a global variable, and this is where you submit your code for execution. And then I just have this basic loop that creates a task per processor.

Okay, now let me go to the compute code. This is the fractal code. Is that large enough? Can everybody see in the back? Is it clear? Yes, everybody can see? Okay, good, thank you. So here I just have a global that I check, it's for demo purposes, to see whether the threaded case is on. Here I have something that gives me the number of jobs, and that's another global that I set from the menus. As I told you, I create a pointer for each of the data structures I want to pass to each of the threads, then I have a loop that submits the jobs.

And this is very cool, because seriously, with the MP wrapper API that I had, it took me a couple of hours to implement that. There was a guy here whose software does compression, pretty good stuff; he has a wavelet compression algorithm, and in less than four hours we changed his code to use that type of threading. It took us about three hours, but that's because I was coding; somebody else would probably be way faster.

But here the cool thing is that you can see I'm using the MP job submit call, which is an API that is part of the code I'm going to give you guys, and I just pass my routine and some pointers. Here you see the job data, which is a struct that I have. Let me show you. I don't know why, something seems to be hiding things. Oh, I'm sorry. Okay, let me open the project again. Great, awesome. I'm going to go back here, and I want to show you the compute code. Okay, so we're here. So what
happens: this is, remember, the routine that's going to be called by each thread. Remember, it's an execution code path that's going to run independently. So this is what's going to be called; I just pass the address of that routine. Very cool. The first thing I do is retrieve the data. Obviously you could pass the fractal data directly, which is the struct I created, but because I wanted to use the MP wrapper code in other programs, I decided to use void star, so that it doesn't depend on the data type. So I retrieve all the data. Then we compute the delta from the start; remember, I have to adjust that routine so that it goes from its own start to its own end of the computation.

Here, don't worry about that, I should have removed it. The main thing is that here I do the loop. Why is it fast? Well, think of it as two different code paths: one is going to do the first part of the picture and the second one the second part, and I just change the beginning and end offsets. We do the computation in floating point here, and with the Velocity Engine here, and then when I'm done I signal the semaphore. Remember what I showed you on the slides. Okay, if we could switch back to the slides.

Okay, some recommendations. Don't: the MP init stuff I showed you, where you initialize the MP library, create the queue, and create the tasks, don't do that each time somebody requests an action that's going to call your threaded code. Do that in your main, when you start the program, and then when your program quits, just clean up after yourself. Don't recreate it each time. In this case I do that because I wanted to have an example where I use
four threads, for instance, and show you the results with only two CPUs, but in the typical case you would not want to have this overhead.

So, do be data-driven. What I mean by that is: set up your memory management, create your data, and then your threads should be the routines that do the real work. You don't want to be in a thread and wait for something to happen somewhere, to be notified by another thread. You want to use your threads for doing the data crunching. You don't want to sit in there, because then what's the point of having a thread, if you're waiting for something to happen in another thread? It could be the case, but the main idea is that you want to use the bandwidth of the G5; you want to have the threads do the data crunching. So before you spawn the threads, set up your data, set up the memory, and do all the things you need to make the data crunching effective. And then when you come back, do the cleaning: clean up the memory that you allocated for the threads, and that kind of thing.

Okay, so let's go back to what happened in the threaded case; you're going to love that slide. Remember, we create the semaphore, initial state zero, that's good. We spawn the two threads, thread number one and thread number two, and then we wait on the semaphore, and what happens is that it blocks, as I told you before. But that's fine: our initial goal was really to get the data crunching done, we wanted to do that operation as fast as we could, and we didn't want to rearchitect the whole application. So let's say that was step number one, the first step. It's cool, your customers are very happy because some operations are up to two times faster. That's good.
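The blocking step just recapped can be sketched in C. This is a sketch under assumptions, not the session's sample code: POSIX threads stand in for the MP API, a global counter guarded by a mutex and condition variable plays the role of the global semaphore, and `JobParams`, `compute_rows`, and `compute_picture` are hypothetical names. It shows the two points made earlier: each job gets its own parameter block, and the caller waits once per spawned thread before returning.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

enum { WIDTH = 64, HEIGHT = 64 };

/* Hypothetical parameter block. Every job gets its OWN instance, so filling
   in the second job never touches memory the first thread is reading. */
typedef struct {
    int first_row, last_row;   /* half-open row range [first_row, last_row) */
    unsigned char *pixels;     /* shared output; the row ranges never overlap */
} JobParams;

/* Global completion counter playing the role of the global semaphore. */
static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t g_changed = PTHREAD_COND_INITIALIZER;
static int g_state = 0;

static void signal_done(void) {
    pthread_mutex_lock(&g_lock);
    g_state++;                              /* 0 -> 1, or 1 -> 2 */
    pthread_cond_signal(&g_changed);
    pthread_mutex_unlock(&g_lock);
}

static void wait_done(void) {
    pthread_mutex_lock(&g_lock);
    while (g_state == 0)                    /* block until somebody signals */
        pthread_cond_wait(&g_changed, &g_lock);
    g_state--;                              /* consume one signal */
    pthread_mutex_unlock(&g_lock);
}

/* Escape-time count for the point c = cr + i*ci, capped at 255. */
static unsigned char escape_time(double cr, double ci) {
    double zr = 0.0, zi = 0.0;
    unsigned char n = 0;
    while (n < 255 && zr * zr + zi * zi <= 4.0) {
        double t = zr * zr - zi * zi + cr;
        zi = 2.0 * zr * zi + ci;
        zr = t;
        n++;
    }
    return n;
}

/* The data crunching: fill the rows named in the parameter block. */
void compute_rows(const JobParams *p) {
    assert(0 <= p->first_row && p->first_row <= p->last_row && p->last_row <= HEIGHT);
    for (int row = p->first_row; row < p->last_row; row++)
        for (int col = 0; col < WIDTH; col++)
            p->pixels[row * WIDTH + col] =
                escape_time(-2.0 + 3.0 * col / WIDTH, -1.5 + 3.0 * row / HEIGHT);
}

/* Thread entry point: takes void * so it does not depend on the data type. */
static void *worker(void *arg) {
    compute_rows((const JobParams *)arg);
    signal_done();
    return NULL;
}

/* Split the picture in two, spawn two threads, block until both signal.
   Returns 0 on success, -1 if a thread could not be created. */
int compute_picture(unsigned char *pixels) {
    JobParams top    = { 0, HEIGHT / 2, pixels };       /* two distinct blocks; */
    JobParams bottom = { HEIGHT / 2, HEIGHT, pixels };  /* valid until we return */
    pthread_t t1, t2;
    if (pthread_create(&t1, NULL, worker, &top) != 0)
        return -1;
    if (pthread_create(&t2, NULL, worker, &bottom) != 0) {
        wait_done();
        pthread_join(t1, NULL);
        return -1;
    }
    wait_done();   /* first completion, in whichever order the threads finish */
    wait_done();   /* second completion */
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```

Because `compute_picture` does not return until both waits have come back, the caller needs no other changes; that is the property the blocking version relies on.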
Everybody's happy. But now let's make the whole experience better, and let's try to actually thread the whole application. What should we do in this typical case? Well, that's going to be an easy one. We're going to create a semaphore, as we did before, that's good. We can spawn the threads. But the waiting: remember, in the routine before I had something that was doing the wait, with the two calls to wait for completion. What I'm going to do now is put that code in a routine, and I'm going to spawn that routine as one thread. So what that thread does is sit there, just waiting. Then we're going to spawn the other two threads, thread number two and number three, and when we do that, what happens? Well, we go back to the main event loop.

So now be careful, because that means the user could go back and hit benchmark again while the threads are still running. That takes us back to the first part of the presentation, where I said: hey guys, you have to be careful, if you want to thread the whole application you're going to have to do some thread management. It's possible; I just want you to understand that there are different steps and different ways in which you can thread the application.

So let's say we did that and it works well. Very cool. What happens when it works? We're going to signal; remember, the signal is going to bump the semaphore to one in this case. Then the MPWaitOnSemaphore in our routine, because once again the semaphore is global, is going to come back, so that call doesn't block anymore. Now we're blocked on the second wait. MPSignalSemaphore is called inside the second worker thread, the state changes to one, and then MPWaitOnSemaphore is going to turn it back to zero. At that point in time we have a thread that
tells us, thread number one here says: hey, my other two threads are done doing the work. Now what do we do? We have that, but we need to notify the main event loop. Remember, we have to tell the event loop: hey, I'm done. So how are we going to do that? Well, very cool, there is a very nice Carbon Events call, which is PostEventToQueue. What we're going to do is create a Carbon event and pass it to this API, which sends it to the event manager, and the event manager is going to dispatch it to the main thread. This is what you're going to do when you want to update the UI, for instance.

PostEventToQueue, very good. You pass the Carbon event; you just need to install a Carbon event handler. The Carbon event handler can be installed on a window, on a control, a widget, an HIView, or on the application; it's up to you, a lot of flexibility. And then inside my application, in the main event loop, I get notified and I can display my picture when we're done.

Okay, so some dos and don'ts in that case. When you start threading the whole application, be careful with the UI. It's okay to draw with Quartz; you may have some issues depending on what you're doing, but it's okay to draw with Quartz from different threads, and we have some sample code from DTS on our developer.apple.com website, so feel free to check that out; George is going to be here and can give you the complete URL. OpenGL is okay as well. And once again, if you want to notify the main thread, for drawing a button or updating a scroller, please use PostEventToQueue. PostEventToQueue is very cool because you can call it from wherever you want. You create your own Carbon event with your own type, it's up to you, and then the Carbon event handler on your window or your application is just going to be called. This is the way to do user interface from different threads.

All right, quick summary. So once again,
thread your application when appropriate. Obviously some of the examples I gave you here maybe don't apply to you. Don't go into a frenzy and start threading everything, even if it's easy. But I would encourage you to go back and think, maybe for a couple of minutes: what part of my application is taking a long time right now? What part can I make better for the user? Once again, you can have two motivations for threading the application. It could be responsiveness, because you're doing a lot of things and sometimes the user can't do anything, the menus don't pull down, you block, and you want to improve that user experience. Or the other one, as in the case I showed you, is that you're doing a lot of CPU-intensive work: maybe you're doing computation on a huge array, matrix manipulation, or your job is to take an MRI and find out whether there's a cancer or something, you get the idea, and that can take a long time.

In the typical case I want you to think in terms of: can this be divided? Could I use different threads? Can my code paths be executed independently? So think about that. I think a lot of developers sometimes just don't think about it, because we're busy adding all these features, but I think it's very, very important, with what our users are buying now, these new G5s, that we think about responsiveness and high performance through threading. And once again, I'll be posting the sample code, probably tonight; actually, I think I can do it right after this session, and I'll go into more details about that in a second.

All right, if you want more information, we have some material on Carbon threads and on Multiprocessing Services on the ADC home page, at developer.apple.com; I'll let you read the slide. If you're interested in Cocoa, obviously
Cocoa has documentation on threading as well. If you're interested in POSIX pthreads, you can just do a man; the man page is actually pretty good. I know if you're coming from Mac OS 9 you're probably thinking, a man page, what would I want with the Terminal? But personally I encourage you to check it out; it's a very, very good start when you look for information. There's the Darwin CVS repository, if you really want it, but I think the Open Group has a collection of pretty good and updated information on pthreads. Okay, we have some technical notes: one on threading architectures, and we also have a technote on the MP-safe routines