WWDC2003 Session 615
Transcript
[Applause] Thank you, and good afternoon. This afternoon we're going to talk a little bit about building computational clusters with Mac OS X Server and with Xserve. Before we get too far into the presentation, I think it's really important to make sure we all understand what we're here to talk about. Clustering is one of those really powerful, overloaded words that means a lot of things to a lot of people, and I get asked all the time, "I want to cluster my Xserves." Well, what does that mean? The term clustering is actually used in two very distinct areas. The first is clustering for high availability: I want to take a server service and cluster two or more servers together so that if one goes down, another takes its place, ideally with no network interruption. That's very different from what we're here to talk about today, which is clustering for computational capability: aggregating the computational performance of several servers to build a larger compute farm. So just to be clear, that's what we're talking about this afternoon. If you're interested in learning more about high-availability applications with Xserve, I invite you to my session tomorrow afternoon on deploying Xserve, where we'll touch on some approaches to the other kind of clustering. So let's talk a little bit about what we want to cover in this session, and then dive right in. Obviously the first thing is: why would you want to build a cluster? What goals would it deliver for me? How can we use Xserve and Mac OS X Server in a cluster, and what are the benefits and advantages? Then typical cluster architecture: what does it mean to build a cluster, what is the topology, the network, and the basic system and physical requirements needed to do that? We'll talk a little bit about deploying applications on a cluster: what kinds of things can
be distributed, and some approaches to that. Physical and network concerns: when you bring a large number of CPUs into one area, there are obviously requirements that go beyond the number of machines. It requires planning for power, cooling, and networking, and we'll touch on the requirements in those areas as well. Then tools and techniques for deploying a cluster: what resources and tools are available, and we'll highlight some of those capabilities. So of course the first big question is: why clusters? Clustering is something we've seen really take over, or become predominant, in the industry over the last year or two. As a matter of fact, we've seen a lot of success with early adopters of Xserve deploying Xserves in large clusters to aggregate computational power. And the reason is really quite simple: the problems that researchers in many different domains are trying to solve are significantly greater computational challenges than a single CPU can handle. You could have even the fastest CPU sit and churn on a very challenging problem, but it's going to take an incredible amount of time. What better way than to aggregate many machines to tackle that problem? The other challenge is that computational capability is not growing nearly as fast as the need to process data. You can look at any number of fields, and we'll give some specific examples, where the only way to solve the problems being tackled today is by throwing more CPU horsepower at them; the chips are just not getting fast enough compared to the problems we're trying to solve. The other reality is that the typical workhorses in the compute space, very large 16-, 32-, or 64-way SMP boxes, are incredibly expensive and complex. It's not always easy to throw very large, quote-unquote supercomputer-type
hardware at these kinds of problems, and it's not inexpensive either. The fact is that small, quote-unquote commodity hardware clusters are beginning to take significant places in the top supercomputing rankings, and this trend is going to continue. The other advantage of clusters, and we'll talk more about this, is that it's very easy to scale a cluster to both budget and problem. For a given budget, you can very granularly add computing power by adding more compute elements, and of course it's also easy to scale out the cluster based on the size of your problem: if you need more computational power, add more modular servers to your cluster. That makes it very flexible in both areas. Some example applications, taking the more obvious ones first: image manipulation, rendering, and compositing. The digital effects you see in theaters today can only be generated by massive computation behind the scenes, and clustering is a way to achieve that. Simulation: biology, simulating car crashes and airplanes, and financial models of the markets all map very well to compute cluster environments. And one of the mainstays of the cluster market, and one of the areas where we're seeing some of the biggest, strongest, earliest successes with Xserve, is the life sciences, with genomics and other kinds of life science analysis. Let me give you some more dramatic examples. Pixar has an amazing little section on their pixar.com website, a whole category called "How We Do It" that goes behind the scenes, and there they talk about what it took to make Toy Story 2. The example they give is that the average frame in Toy Story 2 took six hours to render, and I'll stress average, because they also point out that some of the more complex work took closer to 80 hours a frame. But just as
an example, take six hours a frame and do a little quick math: multiply that by 24 frames a second, times 60 seconds a minute, times 92 minutes in the feature film, and it comes out to just shy of 800,000 hours, or over 90 years of computation, and that's just for the rendering piece of the film. So you can imagine that the only way to deliver this kind of result is with massive computational clustering power behind the scenes. It's also interesting that this is Toy Story 2, which was a number of years ago; I can only imagine what Finding Nemo took to render. Another example is on the genomics front, in the life sciences. The data that needs to be analyzed is growing significantly faster than Moore's Law, so processors aren't getting faster as quickly as the data is growing. By Moore's Law, processor speed doubles every 18 months; based on data from GenBank at NCBI, in roughly the same time frame the amount of data grows on the order of 8x. So again, the only way to keep up is by aggregating computing power. So let's talk a little bit about Xserve. Why do we think Xserve has a story in the computational space? For a couple of reasons. First and foremost, in the 1U form factor we're able to deliver quite a bit of processing power, and when we couple that with the Velocity Engine in the G4, Xserve becomes a very potent machine for computational analysis. Now, I'll address the G5 question right now. A lot of people ask me about a G5 Xserve, and I'm sure it will come up in Q&A, so I'll mention now that I'm not able to comment on or answer that question today. But anyone who saw the G5 machines here earlier in the week knows the heat sink is rather significant, so getting that into an Xserve is a bit of a challenge, but it will be an interesting challenge.
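To make the back-of-the-envelope numbers above concrete, here is a small Python sketch. It reproduces the Toy Story 2 estimate from the figures quoted in the talk (6 hours per frame, 24 frames per second, 92 minutes) and compares an 18-month doubling of processor speed with roughly 8x data growth over the same period, as attributed to GenBank. The function names are illustrative, and the growth model assumes constant rates, which real data growth does not follow exactly.

```python
HOURS_PER_YEAR = 24 * 365  # ignores leap years


def render_cpu_hours(hours_per_frame: float, fps: int, minutes: int) -> float:
    """Total single-CPU hours to render a film at a given average cost per frame."""
    frames = fps * 60 * minutes
    return hours_per_frame * frames


def data_to_cpu_gap(months: float, doubling_months: float = 18.0,
                    data_growth_per_period: float = 8.0) -> float:
    """How many times faster data has grown than CPU speed after `months`.

    Assumes CPU speed doubles every `doubling_months` while data grows by
    `data_growth_per_period` over the same interval (figures quoted in the talk).
    """
    periods = months / doubling_months
    cpu_growth = 2.0 ** periods
    data_growth = data_growth_per_period ** periods
    return data_growth / cpu_growth


if __name__ == "__main__":
    total = render_cpu_hours(6, 24, 92)
    print(f"Toy Story 2: {total:,.0f} CPU-hours, "
          f"about {total / HOURS_PER_YEAR:.1f} years on one CPU")
    for months in (18, 36, 54):
        print(f"after {months} months, data outgrows CPU speed "
              f"by {data_to_cpu_gap(months):.0f}x")
```

Dividing the CPU-hours by the number of processors gives an idealized wall-clock time, assuming frames render independently: on 1,000 CPUs that is roughly 795 hours, or about 33 days.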
Mac OS X is a tremendous advantage for Xserve in the cluster space: it's a very powerful, open source, BSD-core UNIX operating system that can leverage the major open source projects and compile the major applications out there in industry, and yet it's very easy to deploy and easy to maintain. This is a real highlight of Xserve and Mac OS X Server, and we'll talk about some of the ways you can very easily and rapidly deploy Mac OS X Server and Xserve in this space. And finally, remote management. Rack-mounted servers are meant to live in a rack, not necessarily with a system administrator in front of the rack at all times, and clusters even more so, so having powerful remote management tools is essential to maintain and monitor the status of a cluster. Finally, when we introduced Xserve there was a big request from customers for an Xserve that was really streamlined for compute needs, and based on that request we designed a special configuration of Xserve optimized for this. This is our Xserve compute node, and you can see several units in the rack up here on stage. This machine is streamlined and optimized just for compute tasks. Just to highlight the two configurations and their differences: the current model, which we affectionately call the slot-load Xserve, was the speed bump to Xserve we introduced in the February time frame. We offer two standard configurations, single or dual processor, with up to four hard drive bays and up to 720 gigabytes of storage, dual Gigabit Ethernet, a CD-ROM or Combo drive, VGA standard, and of course Mac OS X Server software preloaded and ready to go with an unlimited-client license. The compute node, in comparison, is the streamlined version: always two processors, again optimized for compute, and a single drive bay, so there aren't even blanks; it actually has a different front bezel on the
machine. It has on-board Gigabit Ethernet and no optical drive or video; you obviously don't want optical and video in every single unit in your cluster. And it comes with Mac OS X Server with a 10-client license rather than unlimited, since it's obviously not being used as a file server, and 10 clients covers all the remote management tools that machine needs. Let's talk a little bit about a typical cluster deployment: what does it typically look like when we deploy Xserve in a cluster environment? There are three main pieces in Xserve cluster deployments. Start with what we'll call the top, the head node; the cluster we have here on stage is built on this model, with the head node as the top machine. Typically this is a full Xserve with multiple drive bays, and it's the machine responsible for managing the cluster. It typically runs the software to distribute the load and manage user access, and it provides storage to the compute elements, through either internal or external storage; in the example on stage, an Xserve RAID provides the storage for the head node. One of the most important pieces in a cluster, of course, is the interconnect network. Typically the head node is the only machine connected to your campus or corporate network, your production network, and it's typically the only thing directly accessible to end users on that network. The interconnect network connects the head node to the compute elements, and this can be done in a number of ways. The most traditional is Ethernet networking, either 100-megabit or Gigabit. For added performance, we have Myrinet capabilities on Mac OS X and Mac OS X Server. Myrinet is a high-performance PCI interconnect card with extremely low latency between nodes, and for certain kinds of applications this is very, very important, so Myrinet is an available option.
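The trade-off between Ethernet and a low-latency interconnect such as Myrinet can be sketched with a simple first-order model: each message costs a fixed latency plus its size divided by the link bandwidth. The Python below is illustrative only. The link names and latency values in `ASSUMED_LINKS` are assumptions chosen to show the shape of the trade-off, not measurements of any product mentioned in this session; the bandwidths are simply nominal line rates.

```python
# First-order interconnect model: time = latency + size / bandwidth.
# All link figures below are illustrative assumptions, not measurements.
ASSUMED_LINKS = {
    # name: (one-way latency in seconds, bandwidth in bytes per second)
    "100 Mb Ethernet": (100e-6, 100e6 / 8),
    "Gigabit Ethernet": (50e-6, 1e9 / 8),
    "low-latency interconnect": (7e-6, 2e9 / 8),
}


def transfer_time(message_bytes: int, latency_s: float, bandwidth: float) -> float:
    """Seconds to deliver one message; `bandwidth` is in bytes per second."""
    return latency_s + message_bytes / bandwidth


def latency_share(message_bytes: int, latency_s: float, bandwidth: float) -> float:
    """Fraction of the transfer time spent on the fixed latency (0 to 1)."""
    return latency_s / transfer_time(message_bytes, latency_s, bandwidth)


def exchange_time(n_messages: int, message_bytes: int,
                  latency_s: float, bandwidth: float) -> float:
    """Time for a tightly coupled job that sends n messages one after another."""
    return n_messages * transfer_time(message_bytes, latency_s, bandwidth)


if __name__ == "__main__":
    for name, (lat, bw) in ASSUMED_LINKS.items():
        small = latency_share(64, lat, bw)
        large = latency_share(100_000_000, lat, bw)
        print(f"{name}: latency share 64 B = {small:.0%}, 100 MB = {large:.4%}; "
              f"100,000 x 64 B exchanges take {exchange_time(100_000, 64, lat, bw):.2f} s")
```

For small messages the fixed latency dominates, so a tightly coupled job's run time scales with its number of round trips; for large transfers, bandwidth dominates. That is the distinction between latency-bound and embarrassingly parallel workloads that comes up again when the talk turns to networking.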
There's also an accompanying Myrinet switch that provides that interconnect. FireWire also becomes very interesting as an interconnect for smaller clusters, since for the cost of a couple of cables you can chain several Xserves together, especially with FireWire 800 built into Xserve. When we introduced FireWire 800 on the Xserve, we made sure there were two FireWire 800 ports on the back, so you can chain down the back of a small cluster and have FireWire connectivity. FireWire has a lot of interesting properties as an interconnect network: it has extremely low latency, and it has DMA capability between nodes, which means one node can DMA memory out of another node without any processor intervention by the second host. In the latest revisions of Mac OS X Server we've also added IP over FireWire, so we can run standard IP networking, and any application that uses a standard IP stack can take advantage of the FireWire network built right into Mac OS X Server. So for the cost of a FireWire 9-pin to 9-pin cable, you can chain down a small cluster of somewhere between four and eight machines; this is really ideal, with no additional switching costs. And of course the workhorses of the cluster are the compute elements: any number of compute elements can be added to the environment to scale it out to your specific tasks and problems. The way we see this deployed with Xserve hardware is a standard Xserve server configuration as the head node, providing storage and network access; optionally an Xserve RAID for up to two and a half terabytes of RAID-protected storage over Fibre Channel, represented by the heavy white line; Xserve compute nodes; and, in this particular example, a 10/100 Ethernet switch. So if we look beyond the hardware now, let's talk about some of the
other issues we have to deal with when deploying clusters, which are some of the most interesting from an Xserve perspective: deployment and management. Obviously the first things we have to look at are physical concerns: where is this equipment going to live, and what are the power, environmental, and networking requirements? Then there are logical concerns: software installation, configuration, and management. So let's take each of these individually, starting with power and environmental. Xserve actually has quite an interesting advantage here: with the G4 processor, it has quite low power consumption and heat output compared to other competing processors in its class. On the back of the Xserve data sheet, we rate it at 3.6 amps and 345 watts. Now, I should add, that's for a fully loaded system, with every possible thing you could stuff into the box, and it has margin on top of that. What we've found when measuring real-world usage is that when you max out the processors at 100 percent, running Velocity Engine-optimized code and really leveraging the processing power of the Xserve, you have to work hard to draw around two amps at 134 watts. So real-world power consumption is much lower than the system rating. We publish these numbers in our Knowledge Base at kbase.info.apple.com, and we also publish BTUs per hour, which is just shy of 460 BTUs an hour for a dual-processor system. For one system that's not much of a challenge; you can plug an Xserve in just about anywhere and not have a problem. But when you rack a whole bunch of these in one space and multiply those numbers out, you really have to take both power and heat requirements into account. Multiply those figures by 16 and you get nearly 60 amps and about 5,500 watts of power. Real-world consumption is obviously much lower than that, but you do have to plan for start-up currents, which are actually close to rated consumption. One thing you can do is stagger the startup of the cluster to avoid the maximum current load at power-up time. And of course you're looking at over 7,000 BTUs an hour, so from an environmental standpoint, making sure you have adequate cooling in a room that's going to host a cluster is critical. The big thing here is that you need to plan for these requirements; a UPS becomes critical, and you need to plan appropriately for that. One of the strategies we're starting to see is that the most important elements to provide power and backup power for in a compute cluster are the head node and the storage. Very rarely do you put all the compute elements on protected backup power, since if you have resource management software, any terminated computation will be restarted automatically when an element becomes available. So the head node is really the critical piece in this whole puzzle. The next issue is networking: what kinds of problems do you have, and what kinds of interconnects does this cluster require? One of the big factors is how much I/O the particular compute task does, and whether you get the bang for the buck from deploying, for example, Gigabit Ethernet across a series of compute elements. If there's heavy I/O, there might be dramatic advantages to adding that kind of network behind the scenes. Other types of compute jobs don't require that; they're very compute-focused and sit and process a small amount of data. A more dramatic example of that is SETI@home: if you've ever run the SETI@home client, even over a very low-speed modem connection, it will download a block of data and sit and churn on it for hours.
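The rack-level power and cooling figures quoted above follow directly from the per-node numbers. Here is a small Python sketch, assuming the per-node values from the talk (3.6 A and 345 W rated, about 2 A and 134 W measured under full load) and the standard conversion of about 3.412 BTU per hour per watt. The staggered start-up estimate rests on a simplifying assumption: a node draws roughly its rated current while starting, as the talk notes, and its measured current once running. The function names are illustrative.

```python
WATTS_TO_BTU_PER_HOUR = 3.412  # 1 W is roughly 3.412 BTU/h

RATED_AMPS, RATED_WATTS = 3.6, 345        # data-sheet rating, fully loaded
MEASURED_AMPS, MEASURED_WATTS = 2.0, 134  # real-world figure quoted in the talk


def rack_totals(nodes: int) -> dict:
    """Rated electrical load and measured heat output for `nodes` Xserves."""
    return {
        "rated_amps": nodes * RATED_AMPS,
        "rated_watts": nodes * RATED_WATTS,
        "measured_watts": nodes * MEASURED_WATTS,
        "heat_btu_per_hour": nodes * MEASURED_WATTS * WATTS_TO_BTU_PER_HOUR,
    }


def staggered_peak_amps(nodes: int, group: int) -> float:
    """Peak current if nodes are powered on `group` at a time.

    Assumption: nodes that are starting draw about their rated current,
    and nodes already running draw the measured current.
    """
    peak = 0.0
    started = 0
    while started < nodes:
        starting = min(group, nodes - started)
        draw = started * MEASURED_AMPS + starting * RATED_AMPS
        peak = max(peak, draw)
        started += starting
    return peak


if __name__ == "__main__":
    t = rack_totals(16)
    print(f"16 nodes: {t['rated_amps']:.1f} A and {t['rated_watts']} W rated, "
          f"about {t['heat_btu_per_hour']:,.0f} BTU/h at measured load")
    for group in (16, 8, 4):
        print(f"power on {group} at a time: peak about "
              f"{staggered_peak_amps(16, group):.1f} A")
```

Under these assumptions, powering on all 16 nodes at once peaks at the full rated 57.6 A, while bringing them up four at a time peaks at about 38 A, which is the point of staggering the startup.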
So having a high-performance network back to the machine doling out the work is not really critical. For other problems, however, that's not the case. The other question is latency. A lot of computational problems are what you might call embarrassingly parallel: a job can be sent out across a whole bunch of nodes with no dependencies, and each piece will just sit and churn and then return its results. Other problems have very tight dependencies, where the results of one computation get fed into another and the computations are tightly coupled, so having low latency between the machines, not necessarily bandwidth but low latency, becomes very, very critical. This is where solutions like Myrinet become important, because of that low-latency interconnect. And of course the other thing you'll always want to manage is who can connect to this cluster. Typically access is provided and secured through the head node: you authenticate to the head node, submit your job to the head node, and the work gets managed from there. Let's take a look at some of the logical concerns, the management, and this is an area where we actually have a lot of advantages with the tools provided in Mac OS X Server. What's interesting is that a lot of the tools we provide in Mac OS X Server for desktop management become very applicable to cluster management, so tools like cloning tools, NetBoot, and Network Install become very, very valuable. The reality is that in a cluster, more often than not, every compute element needs the exact same system image and the exact same access to tools, so NetBoot becomes a very viable way to have, basically, a single system image across your entire cluster. And when we add some of the headless tools we provide out of the box with Mac OS X Server and with Xserve, literally you can
take a brand-new Xserve out of the box, hold down a button on the front panel, and have it NetBoot off your head node. It really provides quick out-of-the-box deployment for Xserve. Of course, remote management tools are essential; again, clusters are meant to live in a rack somewhere, and you should be able to access them from anywhere. You may choose to provide command-line access with things like SSH, or provide web interfaces, and you'll see some examples of that in this session. Then user accessibility, which I just touched on: web interfaces and terminal access. And finally backup: having a backup strategy for the head node, to back up that critical data. Typically in these deployments the compute elements themselves are thought of as disposable; they're quick and easy to replace and reimage. It's the head node that's critical, along with the storage it provides, particularly the computation programs and the results. Before I introduce my next speaker, I wanted to bring up to the front some of the things we typically save for the end of these presentations, and highlight some of the key cluster resources available for Mac OS X and Mac OS X Server. At one of the lunch sessions yesterday, someone asked about Apple providing an MPI stack and whether we thought that was important. One of the interesting things you'll see here is that this is an area where we have a wealth of solutions to choose from, and we actually prefer having a large number of excellent open source and third-party solutions available. So let's look at some examples. One of the keys to any cluster deployment is DRM software, that is, distributed resource management, and we have a number of solutions to choose from: Platform LSF, Sun Grid Engine, OpenPBS, and PBS Pro are some of the top examples. For high-performance computing tools,
again, in the MPI stack area, MPICH, MPI/Pro, LAM/MPI, Linda, Paradise, and Pooch from Dauger Research are all great examples of solutions available for Mac OS X. For grid computing tools, there's GridIron Software, a really excellent piece of distributed application software, and the Globus Toolkit, which was actually ported to Mac OS X by The BioTeam; the port is available from them. For science and research applications, you'll see a demo of gridMathematica a little later in this session, an excellent tool for a number of kinds of computation. For life science applications, there are TurboBLAST, TurboHub, and TurboBench from TurboWorx; the Bioinformatics Toolkit, a set of life sciences applications ported to Mac OS X with a single double-clickable installer; and iNquiry, which wraps the same kinds of solutions up into a really easy deployment solution. You'll hear more about that in a minute. So with that said, I'd like to introduce our next speaker, Michael Athanas from The BioTeam. Michael is a principal investigator and a founding partner, and he's going to talk a little bit more about accessible clustering. Thanks, Doug. Thanks. Again, my name is Michael Athanas, and I'm a scientist and co-founder of a bioinformatics consulting group called The BioTeam. One of the interesting things I learned from one of my hosts here at the conference is that one out of nine attendees is a scientist. Is that true? Any scientists here? I'm actually quite impressed that this platform seems to resonate with the scientific community; it's very well matched. Anyway, what gets me up in the morning as a bioinformatics consultant is how to take advantage of boundless computing. In this presentation I'm going to briefly talk about some of the pressures in life science computing, expanding a little beyond what Doug talked about, then briefly talk about computing solutions with an emphasis on clustering, and then talk about instant clustering, and I'll explain
that as we go along. Quickly, The BioTeam is a group of scientists focused on delivering life science solutions. What makes us somewhat unique is that we're vendor-agnostic: we work with all sorts of platforms, including Apple platforms. The principals of The BioTeam have been working together on several projects over the past few years, and most recently on a great many projects with Apple. This list shows some of the clients we've worked with, and I've actually bumped into representatives of some of these organizations at this conference as well. So again, what motivates me as a bioinformatics consultant is getting my clients to think about what they could do with boundless computing. That is, what if CPU were not a limitation in modeling and simulation? What if you had very fast access to terabytes of information? And what's the most appropriate way to visualize the knowledge derived from these data and analyses? To get there, the computing has to be accessible, and for that, some level of abstraction has to be defined. Apple seems to be great at this; in the Finder, for example, the emphasis is on the user experience as opposed to the nuts and bolts of the computing behind the scenes. The same is true for enabling scientific computing: it's not about the computers, it's about the applications and pipelines involved in the scientific computation. What is important from a scientific perspective is quick data access, reliable and fast execution, and application interoperability. What is not so important for carrying out science is the nuts and bolts of the computing: the details of the storage, how the storage is laid out, or even what type of processor is used. It really doesn't matter, and it shouldn't matter from a scientific perspective. There are many approaches to solving vast computing problems, and clustering can be an appropriate solution. There are benefits to clustering, as Doug
pointed out, and perhaps some arguments against clustering, and I'm going to add to what Doug said about why clustering. I think one of the most compelling reasons to go with clustering is scalability, because in scientific research it's very difficult to forecast what you're going to be doing tomorrow. Clustering is inherently a blueprint for growth: if you architect the system properly, you can increase your computing power in step with your computational needs. Another compelling reason is the price/performance of commodity hardware. We've seen that with the announcement of the dual-processor G5 for only three thousand dollars; the same computing power four or five years ago would probably have cost tens if not hundreds of thousands of dollars. Computing is getting ridiculously cheap; if you look at the curve, they're going to be giving it away pretty soon. So the trick is how to take advantage of it. Another important aspect of architecting clusters is flexibility: you can construct your architecture based on the scientific demands placed on your infrastructure. There are many parameters you can tweak from a hardware perspective. For example, as Doug was pointing out, there are network alternatives, and that decision depends on the applications in your workflow. There are also storage options: do you take advantage of local caching on the individual nodes, or do you use some kind of network- or SAN-attached storage? And even the processors: some applications may take advantage of certain processors or accelerators better than others. An interesting and compelling aspect of that flexibility is that clusters transcend a single-vendor solution: you can build a cluster from components that are optimal for your workflow, as opposed to a single vendor providing everything, in which some
of the components may be ideal and some may not. Reliability is another reason for clustering, and this may not be so obvious. Doug pointed this out a little, but with careful architecture and careful identification of single points of failure, your cluster can have extremely high availability and high uptime. The architecture can be such that if a compute element dies, you just pop it out, like replacing a light bulb in your home: you wouldn't shut down your home's power and unsolder the old bulb to put a new one in; you just unscrew it and screw in a new one. But clustering is not appropriate for all applications, all workflows, or all types of scientific research. Not all applications map to loosely coupled architectures; an example of this could be a relational database engine. Another reason clustering may not work out well is management complexity: if you don't pay careful attention to the initial architecture, the effort required to maintain your system may scale with the number of elements in your cluster, and that's a sign of failure. I think one of the more compelling reasons clustering can be difficult is user application complexity. You may have a really good application that runs great on a single processor, but you need a thousand times that; how do you break it up and run it in parallel? It really depends on the application and the tools we can use to construct it. There isn't a silver bullet that automatically parallelizes an application; it still takes special skill to do that. Reliability is also on this list because if the architecture is not correct and you don't pay attention to single points of failure, the system may be very difficult to maintain. Achieving high utilization also seems to be an important consideration, and it comes back to the user application: if you don't get the degree of parallelization necessary to utilize the cluster, you're kind of wasting all your compute
elements. This is a different topological view of what Doug showed earlier; we call it the portal architecture. Again, this is a great way of abstracting the computing resources, from both an administrative perspective and a user perspective. Neither administrators nor users are allowed to access the individual nodes within the private subnet on which the cluster elements reside, and because of this, the nodes become anonymous, which allows you to replace them if something goes wrong. I mentioned scalability before, and I think it's a crucial issue that clustering can address, but scalability has many dimensions: the quantity of data you're distributing on the cluster, the number of users hitting the system, and so on. All of those have to be addressed in the architecture. I think one of the more significant components of achieving scalability is fault tolerance: how is your system going to become aware of some kind of adverse event within your cluster? The flip side of fault tolerance is automation: how are you going to respond to that adverse condition so that processing continues, without needing extensive management or monitoring to ensure your workflow completes? Okay, so I want to contrast two different approaches to computing: the mainframe, SMP, monolithic approach and the clustering approach. Again, one thing going against the clustering approach is application complexity; it's definitely more difficult to make full use of a cluster than of an SMP-type machine. What's going against the mainframe SMP approach is the upfront cost: a mainframe-type system with compute power comparable to a cluster can be about 4 to 20 times more expensive. And as I mentioned, a cluster architecture can give you better scalability. But countering the upfront cost of the mainframe system is the total cost of ownership
of maintaining that system: if you don't architect your cluster correctly, it can be extremely expensive. So if you can address the application complexity and the management complexity, the clustering solution can be very compelling.

Okay, I'm going to switch gears a little and talk about the BioTeam iNquiry. iNquiry won an award last night, an Apple Design Award in the category of server solutions. So what is iNquiry? The concept behind it is instant, scalable informatics: just add hardware. The idea is to provide a fully functioning informatics solution starting from an empty cluster, and the fun part is that we can do this in about 20 minutes. We've deployed many clusters of many types, and there are common denominators across our deployments. We've taken those best practices in network configuration, OS configuration, various optimizations, and deploying the right administration and monitoring tools. But we don't stop there; if we did, it would be a fine cluster to use on an IT basis. Our goal is to enable the scientist, in this case the bioinformatics scientist. The iNquiry cluster is loaded with more than 200 open source applications, all cluster-enabled, and we provide a consistent user interface, a web interface, to all of them. On top of that we deploy about 100 gigabytes of genomic data, so as soon as the cluster is up, you're ready to go. The idea is to go from the many-computer concept to a single virtual computing resource that a scientist can use. As I said, we don't necessarily want to train scientists to become computer scientists; we want to empower them by abstracting the command line into something more accessible. So iNquiry is an orchestration of many open source tools and utilities. Just to mention a couple of the components behind
iNquiry: the first one is PISE. PISE is a very cool tool from the Pasteur Institute, and at its heart is essentially a collection of XML documents describing a set of command-line bioinformatics tools. Starting from that set of XML documents, we can render an interface, whether a web interface or a web services interface. Once the application is presented, we connect its execution to the cluster using Sun Grid Engine, or we can use Platform LSF; that is completely abstracted away from the user and integrated within iNquiry. We've also deployed several monitoring and administrative tools that are commonly available in the open source domain. For example, this is Ganglia, which provides a very nice snapshot of your cluster and lets you drill down into more detail on the health of individual nodes. In addition, we provide another perspective on your cluster from the load management system: which jobs are running on your system, who's running them, and which jobs are pending, all from the user's job submission perspective. Because we're using PISE, we have a great deal of flexibility in how application interfaces are presented to the user. For each application within iNquiry we provide two interfaces: a simple view, which gives you just the bare bones you need to execute that application, and an expert view with all the bells and whistles of the executable, along with complete documentation for every flag of every application within iNquiry. iNquiry also manages the results generated by the various applications: results calculated at different times are accessible and can be retrieved, piped into other applications, or simply re-examined. Okay, so one of the fun things about iNquiry: like I said, we can deploy it within 15 or 20 minutes, and this is iNquiry. We deploy it on an iPod.
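To make the PISE idea concrete: one declarative description of a command-line tool can drive both the simple and the expert view, as well as the command line that is finally handed to the scheduler. The following Python sketch is only an illustration under stated assumptions. The XML format (`<tool>`, `<param>`, and the `flag`, `default`, and `expert` attributes) is a simplified, hypothetical schema, not PISE's actual document format, and the `blastall` description is an illustrative example rather than anything shipped with iNquiry. Submission to Sun Grid Engine or Platform LSF is not shown.

```python
import shlex
import xml.etree.ElementTree as ET

# Hypothetical, simplified tool description; PISE's real XML format differs.
TOOL_XML = """
<tool name="blastall">
  <param name="program" flag="-p" default="blastn"/>
  <param name="database" flag="-d" default="nt"/>
  <param name="query" flag="-i"/>
  <param name="evalue" flag="-e" default="10" expert="true"/>
  <param name="word_size" flag="-W" expert="true"/>
</tool>
"""


def load_tool(xml_text):
    """Parse a tool description into (command name, list of parameter dicts)."""
    root = ET.fromstring(xml_text.strip())
    params = [dict(p.attrib) for p in root.findall("param")]
    return root.get("name"), params


def view(params, expert=False):
    """Names of the parameters shown in the simple view, or in the expert view."""
    return [p["name"] for p in params if expert or p.get("expert") != "true"]


def build_command(name, params, values):
    """Assemble an argv list; user values override defaults, unset params are omitted."""
    argv = [name]
    for p in params:
        value = values.get(p["name"], p.get("default"))
        if value is not None:
            argv += [p["flag"], str(value)]
    return argv


if __name__ == "__main__":
    name, params = load_tool(TOOL_XML)
    print(view(params))               # simple view
    print(view(params, expert=True))  # expert view
    print(shlex.join(build_command(name, params, {"query": "seq.fasta"})))
```

The design point is that the interface and the command line are both derived from the same description, so adding a tool means adding a document rather than writing new interface code. A production system would also validate the values before submitting the job.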
Okay, and the idea behind that is this. First, we plug the iPod into the head node, mount it, and run an application called the cluster configuration tool, shown here. This tool lets you set the number of nodes in the cluster, the external IP addresses, and the other external details needed to describe the cluster. Step two in configuring your cluster is to boot the head node off the iPod. That takes about five or six minutes, and when it's done, images for the entire cluster are loaded onto the head node; you're essentially done with the iPod at that point. The third step is to boot each individual node of the cluster from the head node. That can take anywhere from a few minutes up to ten minutes, depending on the network infrastructure you've deployed within the cluster, but it can be done in parallel, so in aggregate it takes about 15 to 20 minutes before you have a fully working cluster. And that's it. [Applause]

Congratulations on your award. I'd now like to introduce Theodore Gray, director of user interfaces for Wolfram, to talk a little about gridMathematica and its solution. Can we have the demo machine?

Okay. Mathematica is also a presentation tool, so we use it instead of Keynote, and I just typed into my presentation; isn't that typical? The first thing I should say is that this is actually not my talk. Ordinarily it would be given by Roger Germundsson, our director of R&D, but he couldn't be here, so I'm giving the talk, which should be interesting. At least I didn't have to prepare it; that's one plus. gridMathematica is basically the grid version of Mathematica. Mathematica itself is a desktop application you can buy; it's a very general-purpose programming language and system for doing mathematics. There's a product called Network Mathematica, which is basically a network license server, and there's an application package called the Parallel Computing Toolkit
which is an application package that lets one Mathematica session manage multiple others on a network. gridMathematica is essentially a marketing bundle of those two together, and it's cheaper per node than a regular copy of Mathematica, but you can actually put those elements together separately. One of the goals of gridMathematica, as has been mentioned several times here, is to abstract as much as possible the details of your cluster's configuration, what brand of computer it is, and things like that. From the system point of view, you think of a cluster in terms of processors: you start processes on them, schedule them, and exchange data. In the Mathematica view, you think of kernel processes (the Mathematica kernel is what we call the computational engine) and of expressions in the Mathematica language that you want evaluated, and the grid clustering element is distributing those expressions to different kernels running on a cluster. We try to be buzzword-compliant. I guess we have the same general arrangement as before: a head machine, which is the master, and multiple nodes that are not accessible from the outside world, with Mathematica handling the communication strictly between Mathematica processes, without involving any other resource management software. The system is written entirely in top-level Mathematica code, which means it's completely machine-independent and platform-independent, and you're not restricted to C data types or anything like that. You can use arbitrary Mathematica expressions, which could be numbers, arrays of numbers, or strings, but also structured symbolic expressions that represent either mathematical objects or, you know, protein
structures or whatever; it's a general thing. So it's not just for numerical or data-analysis work; you can do quite abstract mathematical things too. Communication between processes is through MathLink, our high-level communication protocol, which uses whichever underlying protocol you like. In this case I think we have it configured to use TCP between nodes, but we have devices for various kinds of connections, so you can have relatively tightly clustered machines or more distant, loosely clustered ones. Our resource management mechanism, which I'll demonstrate in a little while, supports both automatic load balancing and letting you do it manually. I have to admit I'm not an expert in cluster computing, but it sounds great. The concurrency-control structures also deal with a kernel that dies or doesn't come back, and they'll shuffle things around, which we'll also see in a minute.

Okay, so let's actually do this. We're going to start Mathematica here. These evaluations are running on the head machine, which just told us its name is xserve0; we'll use the machine name as a way of telling where a calculation is going, and this is a Mac OS X version. This configuration took most of yesterday to get right, but that's because I didn't know how to do it; the bit about plugging in an iPod and having it all be automatic would be great. Now we're going to launch the kernels. We're launching 10 kernels because there are five machines with two processors each, so we're putting one kernel on each processor. That's finished. Now this command says: take this expression and evaluate it on each of the clients. And you see it's returned
xserve1 through xserve5, twice, and each of them is running Mac OS X. I should note at this point that from here on, absolutely nothing about any of these demos would be different in any way if this had returned a list of Sun machines, PCs, Linux boxes, or anything else. There's nothing machine-dependent, hardware-dependent, or platform-dependent, which is a nice advantage: if you built a big Linux cluster and then found you could get Macs that are cheaper per CPU cycle, you could just add some Macs to it, or vice versa. Let me show you some simple examples of how you actually use the parallel commands. This one is just running on the local machine, this one says run the same command on a particular node, number one, and this one means run it on all of them. Obviously you could put something more interesting here; you could compute a hundred factorial on each one and get that back. Those of you who are familiar with Mathematica will probably find this makes more sense than those who aren't, but this shows a basic Mathematica command: Table builds a table of expressions like this. Here we're building a table of machine names, always on the same machine, and now we're farming it out to the processors. You'll notice it used the same machine over and over again. I discovered just a few minutes ago that that's because this command is so fast that each evaluation is done before the load management moves on, so it just keeps using the same machine. But if we slow it down a little, say by computing a thousand factorial and suppressing the output... still too fast. All right, now you see it's distributing a bit more, because the processes are not actually finishing instantly. Okay, so another
function is Map. Map takes a function and applies it to each element of a list. We do the same thing here in parallel, and again it's kind of boring because it's too fast, but you get the idea that many of the programming constructs Mathematica has for building tables or applying functions to data can be parallelized very easily, and as long as there's no data dependency between the instances of the function, it just works. There's also a host of other built-in operations, such as dot products, inner products, animation, and plotting, that have prepared parallelized versions. This is an example of how you distribute data and code. This is a Mathematica program; in Roger's version of the talk it just added up the numbers 1 to n, but I thought that was silly, so what I did is add up the numbers 1 through n, add the process ID, and take the factorial of that, just so we'd get a more interesting number. It also proves I have great confidence in this system, because I have no idea what the process IDs are going to be. We made that definition on the head machine, and when we execute it there, we get a number that depends in some way on the process ID. Now we export this definition: that command took the definition we made on the head machine and distributed it to all the nodes, which you can do because it's not a C program, it's not something you have to compile; it's a Mathematica expression that can be interpreted by the Mathematica interpreter. Now we evaluate it along with the machine name, which will let us see which node each one executed on, and now we have the 10 results. You'll see each number is a little different because each had a different process ID. Okay, since I'm short on time, let me skip these, and this one. This is
basically showing the lower-level operations: rather than just doing a map, you can set up a queue. I'm not going to go through the details here, but you basically tell it to queue up these processes, and then you can ask it to wait for certain ones to finish, or you can wait on a list, in which case whichever one finishes first is returned, sort of like a select call, if you're familiar with UNIX. That gives you a foundation on which you can build your own, more sophisticated manual load balancing and process management. Now here's an example. As I mentioned, this is actually Roger's talk, and I don't know that much about parallel computing, but I thought I'd whip up a little example for the demo to see if I could do it. What this does is recreate the fractal I used in the keynote demo a couple of days ago, and this is the code for that fractal. First we'll run this example on just the head machine as a single process, and you see it goes through; this is a little graphical animation progress monitor that I wrote a while ago. As you can see, it's kind of poking along. You may notice it's not much slower than the G5 demo; that's because I'm not computing as many points and because it's also not doing the bignum calculation in the background. It's not in fact the case that this is as fast as a G5. Now let's run it on the grid, and for those of you who can see the lights, look at the lights here; it's always very important to watch the lights on your cluster. Here we go. Notice the first frame doesn't come any faster, because they're all computing at the same time, but once it gets going you basically get 10 frames at a time. Okay, if I'm reading my clock right, I really need to hurry. So basically the advantages are: it's much, much cheaper than buying separate copies of Mathematica if you buy the
node licenses. It works in a completely open-ended, heterogeneous environment, with absolutely no restrictions as long as it runs Mathematica. We have a high-level symbolic representation of the parallel structures and of the parallel control you need to get good performance. It's pretty easy to take existing code that's suitable for parallelization and parallelize it, and you can do this in the rich world of Mathematica rather than the more limited worlds of C or Java, where you have to do a lot more by yourself. And as a prototyping environment for parallel algorithms, of course, it's very nice. I guess that's about it, and we have to remember to close these; otherwise we leave things running. Thank you. [Applause]

Okay, well, I'd like to wrap up by pointing to a variety of resources for more information. First of all, there have been a lot of tracks and sessions relevant to this topic, unfortunately mostly earlier in the week, so hopefully you had plenty of opportunity to see some of these sessions on the Enterprise IT track. If you didn't, I encourage you to watch the videos, since there were some excellent sessions. There will be a session on Deploying Xserve tomorrow afternoon and one on Deploying Xserve RAID on Friday afternoon, and I encourage you to attend those. On the developer side, there were some excellent sessions on development tools for the UNIX layer and on performance tools, again either earlier today or yesterday, so I encourage you to watch the videos if you weren't able to attend. Who to contact: for information or follow-up, you're welcome to contact me, Doug Brooks; my email is up there. Michael and Theodore have contacts up there as well. Our server technology evangelist wasn't able to
attend the session, but from a developer perspective he's your contact for server technology. Additional resources: we gave you a list of solutions earlier, and I wanted to point you to a key page we put up about two or three weeks ago, the compute cluster solutions page. If you go to apple.com/server, you'll find a number of solution-specific pages, one specifically on clustering, which highlights all the key solutions referenced today. The same page provides product information and, of course, information on both BioTeam and gridMathematica. This is an area with a wealth of community support from mailing lists, so I wanted to make sure you're aware of several key mailing lists that Apple and some third parties host. Apple hosts the scitech and unix-porting lists, which are both very relevant to the cluster space, and bioinformatics.org has some excellent mailing lists for BioClusters and for BioDarwin development under Darwin.
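The queue-and-wait operations from the gridMathematica demo (queue up jobs, then wait on a list and take whichever finishes first, much like UNIX `select`) are a general pattern for building your own load balancing. The sketch below is a rough Python analogue using the standard `concurrent.futures` module; it is not gridMathematica's API. Threads stand in for remote kernels, and `task` and `in_flight` are hypothetical names chosen for illustration.

```python
from collections import deque
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
import time


def task(n):
    """Hypothetical unit of work: sleep a varying amount, then return n squared."""
    time.sleep(0.01 * (n % 3))
    return n * n


def run_queued(items, in_flight=4):
    """Queue jobs, keep at most `in_flight` outstanding, collect whichever finishes first."""
    queue = deque(items)
    pending = {}             # future -> input value it was submitted with
    results, order = {}, []  # result per input, and the order jobs completed in
    with ThreadPoolExecutor(max_workers=in_flight) as pool:
        while queue or pending:
            # Top up: submit new jobs until `in_flight` are outstanding.
            while queue and len(pending) < in_flight:
                n = queue.popleft()
                pending[pool.submit(task, n)] = n
            # Like select(): block until at least one outstanding job is done.
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                n = pending.pop(fut)
                results[n] = fut.result()
                order.append(n)
    return results, order


if __name__ == "__main__":
    results, order = run_queued(range(10))
    print("completion order:", order)
    print("results:", [results[n] for n in range(10)])
```

Keeping a bounded number of jobs in flight and topping up as each one completes is one simple form of the manual load balancing described in the demo. For CPU-bound work in Python, `ProcessPoolExecutor` would be the more appropriate executor, since threads in one interpreter share a global lock.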