---
title: WWDC2004 Session 646
framework: wwdc
role: article
path: wwdc/wwdc2004-646
---

# WWDC2004 Session 646

## Transcript

All right, welcome to Apple Solutions for Science at WWDC 2004. Thank you. I'm Bud Tribble, vice president for software technology, and what I'd like to do is go through a few of the trends we're seeing in scientific computing.

For those of you who've been watching Apple and scientific computing over the past few years, it's been an incredible explosion for us. Every year we're seeing more and more scientists adopting the Mac. I occasionally attend scientific conferences, and you just look around the room and see how many people are using the Mac; these days at a typical bioinformatics conference, maybe 30 to 40-plus percent of people have Macs. So it's really gratifying to see. My original background is in science, so I sort of warm up every time I see a scientist using the Mac, because I know they're just being a lot more productive.

Here are the trends we're seeing. For any of you who are in the scientific community, there'll be nothing really surprising about these, because I'm sure you're seeing them too, every day. The first is exponential growth of scientific data. Next is clustering for cost-effective performance; for those of you who are running clusters, you know this already, but if you're not running a cluster, a cluster is probably in your near future. Next is a strong focus on application optimization. Getting to results more quickly is what science is all about, getting your results published quickly, and tuning your app so you can crunch through the data more quickly can be incredibly productive; I'll talk about that a little bit. Then there's ease of deployment and administration. If you're a scientist, your job is not to tinker with the computer; your job is to do science. One of the things Apple does best is ease of use, and in the case of scientific computing that includes ease of management, ease of administration, and ease of setup. Then there's portable UNIX: taking your laptop or your PowerBook with you, with your complete environment on it. You know,
scientists are a very mobile population; they're on planes, going to conferences, and being able to take your environment with you is incredibly productive. Apple does a great job at that. I'll spend a little bit of time on 64-bit, and I think you're seeing the first instance of that with the 64-bit address space in Tiger. It's in your preview release; there's a 64-bit compiler there. I think this is going to be a big deal for the scientific community. And then finally, open-standards-based tool development. Let's just get started.

So, exponential data growth. This graph shows, over 18 months, the size of the genomic database that's available on the web (the top curve) against Moore's law (the lower curve). Bioinformatics is just one example of where this is happening, but the growth of data is huge. Luckily, disk storage is outpacing Moore's law in terms of what we can offer you. As I was saying yesterday, we're down to about three dollars per gigabyte with the Xserve RAID. That changes the equation. In fact, the fact that you can have huge amounts of storage online sometimes also changes the algorithm you use. One trend that's not noted here, but that I'm noticing, is that many problems are becoming more amenable to a brute-force approach where in the past they would have taken very arcane algorithms. These days you don't spend a lot of time on that if you can just brute-force it, because hardware and storage costs have come down so much.

Here's an interesting case study. Swinburne University's Centre for Astrophysics and Supercomputing in Australia chose Xserve RAID for price and performance. They now have over 13.5 terabytes of astrophysical data accessed by a 130-node cluster, and they generate over one terabyte of data daily. So that's a huge amount of storage and a huge amount of bandwidth being consumed. The Xserve RAID
is connected by Fibre Channel to their server cluster, and Professor Matthew Bailes says, "The performance of Xserve RAID is quite exceptional. It can easily handle sustained read and write operations at 100 megabytes per second on a single channel." That's twice as fast as the previous-generation RAID equipment they were using. So that's Xserve RAID with Fibre Channel. And now, if you need a SAN storage solution, Apple has that too, with Xsan. It's really a complete, very cost-effective solution for storage performance.

Next, clustering. Clustering is just exploding in the scientific community; there's a move from custom supercomputer architectures to clusters of fairly inexpensive 1U systems. With Xserve, we actually offer a special configuration specifically for building clusters. It's a stripped-down version, a little bit less expensive, and the licensing is tuned for building clusters; I forget exactly what we call it, but it's on the web page. Combine that with inexpensive storage from Xserve RAID. We also have a product called Xgrid. Xgrid 1.0 is on your disk; you get it with Tiger, it's included with Tiger. It's a grid computing solution: distributed computing for the rest of us. It's produced by our Advanced Computation Group, led by Dr.
Richard Crandall. It's basically an easy way to submit and run computational tasks. As you know, grid computing typically suits situations where you're using spare desktop cycles around your institution. So grid computing lets you run tasks on computers you don't necessarily manage or own, whereas with clustering everything is managed very closely. As I mentioned, Xgrid supports either dedicated, managed resources or ad hoc resources, where someone just offers up some of their desktop cycles. Xgrid handles the hard work of connecting the nodes into a cluster, monitoring node activity, scheduling tasks on the nodes, copying the executables and data to the nodes, staging the output, and collecting the results. And most recently, with Xgrid 1.0, we now have MPI support. There are other grid computing solutions on the market, commercial and otherwise, but this one has the Apple added value: it just works out of the box and is incredibly simple. Nodes can use Rendezvous to find each other on the network, so it's a very simple, easy-to-use grid computing solution.

I'll mention Princeton's Center for the Study of Brain, Mind and Behavior. This group is using clusters: they have a 64-node cluster, with one head node and 63 nodes doing computation. They do brain activity mapping and neural net simulation. They take MRI scans and look at the spin decay to figure out the blood flow through the brain while the subject does different activities, watching movies or whatever. That shows them which parts of the brain get activated in terms of blood flow. They also do some neural net simulation. The data set from a single MRI scan is 2 to 10 gigabytes, so these are huge data sets they're dealing with on their cluster. They use a variety of applications, including MATLAB, SUMA, and BrainVoyager. This is a mix of commercial applications like MATLAB as well as open source
applications, so the full combination of applications, whether proprietary or open source, is available on a Mac cluster. One of the things they found in moving to this cluster was that a single 2 GHz G5 was up to ten times faster than their previous SGI Origin, so this was a huge step up for them. The other thing they noticed was about noise. They had a couple of Dell PowerEdge servers running in their server room. When they brought up the 64-node Mac cluster, they stood there listening to it, and it seemed really, really loud. It turns out all the noise was coming from the Dell PowerEdge servers; they were way louder than sixty-four 1U nodes from us. So that's an interesting advantage.

By the way, I'll just go back to note that this is a big advantage for G5 clusters. We do a lot to make sure the power and cooling you need for a G5 cluster is optimized, and power and cooling can be a major cost in deploying clusters. Easily 20, 30, even 40 percent of the cost of deploying a cluster can be power and cooling. So when you price these things out, that's one of the aspects you have to look at as well, and we do quite well there.

I mentioned application optimization, and this is an area where we've spent a lot of effort providing tools. In Xcode itself, the key is fast turnaround, so that you can modify your code to make it faster and try it out. But we also have a number of tools that plug into Xcode or can be used with Xcode. One of those is the Shark set of tools. Shark lets you do extreme profiling of your application to find out exactly where the bottlenecks are and where the application is spending its time. Then you can focus your efforts on redoing that algorithm, or hand-coding that inner loop, or whatever it takes to get that code running as fast as possible. We also have the CHUD tools; the
CHUD tools go the next level down, looking at what's going on at the instruction-pipeline level and at the cache hit-and-miss level, so you can really optimize that inner loop. That's very important, because as you do this sort of work, you find that there are huge performance gains. We also have third parties with tools that help you optimize. IBM now supports XL C and XL Fortran, their highly optimized C and Fortran compilers, for Mac OS X on the G5. There's Crescent Bay, which offers an auto-vectorizing compiler, and there's NAG, which is optimizing its Fortran tools. So there's a ton of tools to help you squeeze the most performance out.

Here's an example of the kinds of improvements you can see by doing optimization. PyMOL, which I'm sure a lot of you are familiar with, is a molecular visualization tool. Warren DeLano achieved a 400 percent speedup in just a few days using Shark for profiling, zeroing in on those bottlenecks and fixing those algorithms or that code. That's four hundred percent for a few days' investment. HMMER does hidden Markov modeling for protein sequence analysis, and Erik Lindahl achieved a 650 percent speedup in just a few hours using the CHUD tools I mentioned. So the message here is that the tools are very simple and easy to use, and they're there on the disk. If you're using the G5 or the G4 at all for performance computing, take the time to run your code through the tools. It won't be a huge investment of your time, and it'll be a huge payback in terms of performance.

Ease of deployment and administration: Apple has a number of tools that make administration, especially of clusters, very easy and simple. As I mentioned, we have Xsan for storage area networks. We also offer our Workgroup Cluster for Bioinformatics, the box shown in the corner there, which is a preconfigured cluster with everything on it you need for
bioinformatics, preloaded with applications. It's a turnkey solution to get your lab up and running on a clustered system for bioinformatics, so it's kind of the ultimate in ease of use.

Portable UNIX: I can't stress enough the niceness of this. If you're in scientific computing, you've got a portable version of the complete system you would use in your lab, with all the tools, whether it's Perl scripting or Xcode or the optimization tools I mentioned. You've also got the panoply of things you need to stay connected: iChat, iChat AV as we saw yesterday, Safari, Mail. FileVault is an interesting one, especially if you're in the commercial or government market and you've got sensitive data on your system. You can encrypt your home directory, or encrypt another volume on the system, and make sure that if you happen to leave your PowerBook on the seat of a taxi, that data doesn't fall into the wrong hands. There's VPN for communication back to your home network, SSH, and so on. You saw the new AirPort product yesterday; that's going to be great for people in hotel rooms. You just plug it in when you get to the hotel room. It's about the size of the power adapter for the PowerBook. Then you can take your laptop around the hotel room and, if you want to, share your network with other people in the hotel, but only if it's legal.

Here's an example of the power of taking UNIX with you: Dr.
Jamie Cate at Berkeley does crystallography, and he had a large set of crystallography applications that ran on UNIX workstations. When he left the lab, he basically had to leave his research behind. That's no longer true. As he says, "Apple's PowerBook G4 and Mac OS X have allowed me to use the same tools on an airplane on my way to a conference that I could only use before on my lab workstation." So for those of you who've not had the pleasure of taking a PowerBook on a plane and using it, I'd highly recommend it.

I'll spend a little bit of time talking about 64-bit computing. 64-bit computing is not necessarily for everyone. But if you have massive data sets that you need to iterate over as part of a problem, you're going to need a 64-bit address space. Today, with Panther, you can put 8 gigabytes of memory into a Mac OS X G5 system, but an individual process can only use 32 bits of address space. In fact, you can probably get to only about 2 gigabytes of address space before things start to bump into limits. With Tiger, you will be able to compile and build an application with a 64-bit address space. We have a 64-bit version of GCC that will compile 64-bit code, and we have a 64-bit version of libSystem. We're taking a staged approach here: we did libSystem first, so libc, libm, and the other libraries in it have been converted to 64-bit versions. Those are the libraries you're compiling against, and we compile them 64-bit. What you don't have are the GUI libraries; for example, Cocoa and Carbon have not been converted to 64-bit. So to leverage 64-bit with Tiger, you will build the computational section of your application as a single process running in up to eight gigabytes of memory. If you need a graphical user interface, that process will communicate with the front end of your application. And certainly on a cluster system, the code running on
the cluster nodes typically doesn't have a GUI anyway, so that's a great fit. Eventually we'll expand the number of 64-bit libraries. For Tiger, though, 64-bit support is confined to libSystem and the non-GUI libraries. It's specifically targeted at scientific computation like very large modeling and simulation, things that really need a full 64-bit address space. We're using what's called LP64, which means that longs and pointers are promoted to 64 bits, while int stays at 32 bits. This is the standard for UNIX systems, so if you have 64-bit code running on other UNIX systems or Linux, it should be easily portable to Mac OS X. The compiler has been outfitted so that, if you turn on warnings, it will give you a complete set of warnings if your app is not 64-bit clean. For example, if you're depending on an int being the same size as a pointer, it will flag that for you. So if you have large data sets, I highly recommend you take a look at the Tiger preview release and try out the compiler. One caveat: the binary format for 64-bit executables will be changing with the final release of Tiger. If you've compiled something for the preview release, you will have to recompile once the final Tiger comes out. 64-bit apps run right alongside 32-bit apps; the executable is flagged as either a 64-bit or a 32-bit executable. In fact, you can build your app fat if you want, so that you can launch it either as a 64-bit app or as a 32-bit app.

Here's an example where this might be useful. Vertex Pharmaceuticals uses the Power Mac G5 to accelerate drug development targeting viral diseases, inflammatory diseases, and cancer. Tiger is going to let them move their critical molecular modeling application, which really has 64-bit addressing requirements, to the G5. Here's a quote from Joshua Boger, the chairman and CEO: "Mac OS X 64-bit memory
management will allow Vertex to rapidly interact with huge libraries of chemical structures and advance our drug discovery process."

Leveraging open source: I can't stress this enough. We have over 100 open source technologies and projects incorporated into Tiger, everything from Apache to Perl to Python to OpenLDAP, Berkeley DB, MySQL, JBoss; you name it, we pretty much have it. Those packages are included in the release, so the code runs out of the box. More importantly, when updates come out from those projects, we incorporate them into a system update for the Mac. That way you get the most recent versions and the security patches are kept up to date, which makes these incredibly easy to use. Beyond the ones we package, of course, there are thousands of open source packages available from SourceForge or from Fink. The number of open source projects available on the Mac is roughly doubling every year, so that's an incredible resource for you, and you don't have to reinvent the wheel. If you need to get something done, the first thing you should do on Mac OS X is see if there's an open source package that accomplishes the task, so you don't have to write code from scratch. The kinds of applications available include the NCBI toolkits, EMBOSS, PyMOL (which I mentioned earlier), Globus, and WU-BLAST. We also have a version of BLAST optimized by Apple and Genentech, A/G BLAST, which is highly optimized for the G5. And there's AMBER. So there's a huge number of tools out there for scientific computation.

To summarize the trends we're seeing: number one, huge growth in scientific data. You can expect Apple to continue to focus, with products like Xsan and Xserve RAID, on providing cost-effective storage and very high bandwidth to storage. Clustering for cost-effective performance: that is our strategy. We have 1U
servers. We don't make huge big iron, and we don't make 64-way SMP. We're all about optimizing the 1U form factor for building clusters and making sure we can give that to people as inexpensively as possible, plus Xgrid for building ad hoc clusters. A strong focus on application optimization: I mentioned Shark and the CHUD tools, and you can expect additional performance-related tools coming from Apple. This is the way to squeeze the most performance out of your G5 and make sure you're getting the absolute most you can for your application. Ease of deployment and administration: turn on the Workgroup Cluster for Bioinformatics out of the box, and you've got a cluster under the desk in your office if you want. Portable UNIX: I can't stress enough the productivity gain you get from being able to take your entire lab's software with you wherever you go. 64-bit address space: this is something people have asked us for, and we are very pleased to be able to offer the non-GUI 64-bit address space with Tiger. If you try it out on the preview release, please give us your feedback on what you're finding. You're the people with the 64-bit apps, and we're committed to making this the best 64-bit system we can. And then finally, open-standards-based tool development. There's basically no open source tool out there that has not been ported to Mac OS X at this point. That's a huge leverage point for you, so you don't have to reinvent the wheel. So Apple, in my mind, is really the best platform for scientific computing today. Look at all the tools available on it, the things we're doing to enable clustering, and the things our partners are doing to enable scientific computing in a variety of areas. I really can't point to a system today that makes a better scientific computer. So thanks, thanks a lot. I'm going to turn it over at this point to Dr.
Liz Kerr, the director of scientific marketing, who's going to talk about Apple in the sci-tech market. So, thank you.

Thanks, Bud. It's really my pleasure to be here, and it's great to see so many faces out here this morning. For the next 20 minutes, I'm going to talk about how my team, the sci-tech marketing team, and many other groups at Apple are working to provide solutions and build awareness in the market. The goal is to drive adoption of the Apple platform for scientific computing. We really think this is a perfect solution, and we want to help get that message out there. One of the most important aspects of that (sorry, wrong way; there we go) is driving awareness: communicating to our customers and also hearing from them. I'll go through some of the ways we're doing that.

One of the simplest ways is through trade shows. We've done a number of these and plan to do more this year. Bio-IT World is one, and this image is from our booth there. There's also ISMB, a big bioinformatics show coming up in Glasgow, Scotland. And there's Drug Discovery Technology, a show that focuses more on the commercial side of the biological sciences: pharmaceuticals and biotech. These shows are really important to us because they let us talk to our current customers, let other people know this is an area we're interested in, and hear from people we maybe don't normally talk to. Another type of event we're doing is focused customer events, where we actually go to a customer site and give them hands-on experience with some of our newer tools. The events shown in these images promoted the Power Mac G5 and its performance for scientific computing applications. Another thing we're doing is focusing on advertising that goes specifically to scientists. This is a little tongue-in-cheek; obviously this isn't an iPod,
and the point isn't that we're focused away from that. It's just that in many cases our consumer advertising overshadows what our scientists see, so they don't think of us as a company that makes computers that are really suited to the scientific market. This is an example of an advertisement that's currently running in peer-reviewed journals like Science and Nature, as well as magazines like The Scientist and Genome Technology. It's a great ad because it focuses specifically on the Power Mac G5, with customers talking about why it's great for their use. We're also doing some online advertising, which is another great way to reach people who maybe don't normally think of Apple. This is an online ad for the Workgroup Cluster for Bioinformatics that Bud alluded to; we'll talk a little bit more about this solution later.

Another thing we're really pleased about is launching a science website, apple.com/science. This is the home page. It's geared toward surfacing all the more technical information for our scientific developers and our scientific customers, giving that information a home so they can find it more easily. We have lots of downloads and a focus on both Apple solutions and third-party solutions. We also have success stories showing how customers are using Apple products, as examples for people interested in how our technology might help solve their problems. This is just a blowup, because I'm going to focus on a couple of these areas and dig down a little to show you what type of information is there. In the upper right-hand corner, we're going to look at Applications for Research. I wanted to point this one out because this is where most of the information from third-party developers and open source developers lives on the website. We've got featured applications on this
part of the website to raise awareness of particular applications. They rotate on a regular basis, so we don't play favorites; we try to give everybody a chance to feature their applications. There's also the Macintosh Products Guide, which is the comprehensive list of all the applications that run on Mac OS X, scientific and otherwise. There's also a download section. If you find an application you're interested in, or somebody wants to download your application, they can go to either the Math & Science area or the Open Source & UNIX area, and each of these has its own download section. Another interesting part, I think, is the resources section, which is broken down into different categories. So if you're looking for a particular type of information, for example high-performance computing or software development, it's there. There's a section for Darwin resources and for third-party products. There are also mailing lists and communities, if you want to discuss your challenges or throw something out there and get a response back. There are also a lot of links to technical information; maybe you can see, on the right-hand side, where you can download PDFs about Apple technology.

We've also been doing what we call sci-tech initiatives and solutions, and I want to talk a little bit about how we're gauging our momentum in the scientific market. One of the cornerstones of this is the Apple Workgroup Cluster for Bioinformatics. We're really pleased with it because it ties together the highly technical side of what we're providing to the scientific marketplace and the ease of use Apple is known for. The idea, as I think Bud alluded to, is to take setting up the compute cluster out of the hands of the scientists and make it really easy, make it so
that they can have the compute power without having to know how to manage a cluster, how to code in Linux, or any of that. It's designed so they can take it out of the box, set it up themselves, and have it running in no time. We announced it at Macworld in January, and we are really pleased that it won the Best in Show award for IT infrastructure at Bio-IT World. We're just really proud of that, and I think it speaks to how the scientific community views it. Many users are adopting it. It's a bioinformatics workgroup cluster, but people are using it for biological research and for application development. Interestingly, they're also using it to develop bioinformatics curriculum and teaching programs at the university level.

Here are a couple of examples. This one's from the Naval Medical Research Center, where Dr. Michael Shoot is using his Workgroup Cluster for bioterrorism research. He installed it and maintains it himself, and he has no computer science background whatsoever. His favorite thing to say is that all he needed was a screwdriver: he set the whole thing up himself and had it running in 30 minutes. They really like the security aspect of this cluster, because of course their work is very critical to the security of the country. They also like that the applications that come with the Workgroup Cluster have a web-based interface. That makes them accessible without the command line, which a lot of bench scientists don't know how to use; a familiar GUI is much easier for them. Another thing a lot of our customers like about the Workgroup Cluster, and one of the deciding factors for the Naval Medical Research Center, is its scalability. You can always add to it; if you find that your eight nodes aren't enough, you can double
that, or add two more nodes, or whatever you need. Another example is from Idaho State University, where Dr. Mike Thomas set out to design a bioinformatics curriculum for the university. They bought a five-node Workgroup Cluster and set it up so much faster than they had planned that they were able to offer their bioinformatics course an entire semester early. He had also hired a person, a head count, to manage the cluster. But once it was set up and running, the guy had nothing to do, because it just kept going and working. So they reassigned 75 percent of his time to something else. They're using the cluster to teach the very first bioinformatics course at Idaho State University. Here's a quote from him about how this bleeds over into other areas of the university: "I think the cluster is going to have a huge effect in our research environment, and I think it will help scientists here generate additional funding." So he sees it as a resource other scientists at the university can cite, hopefully boosting the value of their grant applications.

To raise awareness of the Workgroup Cluster for Bioinformatics in the marketplace, my team and the higher education marketing team put together a Workgroup Cluster Awards program to recognize innovation in research. The goal was to give away five fully provisioned clusters, each with four dual-processor Xserve G5s with 2 GB of RAM. Each comes with software included, the BioTeam iNquiry package with over 200 bioinformatics applications. Each also includes all the hardware infrastructure (the power supply, the cables, and so on) and three years of AppleCare support. This is a great thing to win. The response was tremendous: hundreds of applications came in from all over the U.S., and we were just blown away by their quality and the time and effort people took to put them together,
and they came from all areas of research: higher ed, government, nonprofit, and commercial customers. I hope it's not pink on the screen, because it's big there. Okay, we'll go with pink. Out of the hundreds, we picked five winners. We also picked five honorable mentions, because again the quality was so incredible that we wanted to extend the acknowledgment to at least ten of the applicants. So, very quickly, here are the five honorable mentions. The first is from the University of Washington, where they're doing HIV evolution research. At Yale University, Dr. Kevin White is doing genomic research on model organisms. At Caltech, Dr. Barbara Wold is working on gene regulatory networks. At the University of Pennsylvania, Dr. David Roos and colleagues are studying the genomics of parasites. And at The Institute for Genomic Research, or TIGR, Dr. John Quackenbush is doing all kinds of things, including software development and a lot of genomic database work.

So now to the very pink winners of the Workgroup Cluster Awards. These are not ranked first through fifth; they're all winners. At UCLA, Dr. Christopher Lee, for work in comparative genomics: an incredible application and an incredible project. At Duke University, Dr. Simon Lin, representing a group of scientists doing oncology research. It's an enormously extensible project with lots of software development that would be used by the entire oncology research community. At MIT, Dr. Edward DeLong, for environmental microbial genomics, a really interesting and very unique topic. And at the University of Wisconsin, Mike Newton, Dr.
Mike Newton he's developing statistical techniques for genomic research to really like a light show different genomic research to to really expand the type we have the types of algorithms and such that people can use for that and then finally a Children's Hospital in Oakland the research institute there dr. Deborah Dean is doing really really state-of-the-art chlamydia genomics research much more in the health care area so those are our five winners of the app will work group cluster words I'd like to stop here and give a round of applause to all the applicants and winners okay moving right along and just talking again about the momentum and awareness we have gotten an enormous amount of press coverage both from this awards program but really primarily starting when we when we launched the work group cluster for bioinformatics and started showing up at things like bio IT world and it's been really nice to see the press bhosle the Mac trade press as well as more general press and scientific press really want to hear what Apple's doing in this space and paying attention to the efforts we're making to provide really great solutions to our scientific customers I want to turn a little bit to talk about the developers and some of the work that you all have been doing I think the amount of the number of new applications that have come on to Mac OS 10 and continue to come on to Matt OS 10 is overwhelming the list just keeps growing these are for that are a relatively new either updated or new to the platform from the chemical computing group we have the molecular operating environment or as we like to call it mo matlab 7 enormously popular program for our physical science customers Gio's Fiza is a company that does the Finch sequencing center a great tool for managing sequencing labs and gene codes with sequencer another really popular program for managing sequence DNA sequence data what really drives that I think is the amount of developer support that our world wide 
developer group provides to our scientific developers as well as others, and I just wanted to highlight a few things that we have to offer our developers. There's ADC, the Apple Developer Connection: software development tools, hardware support, technical support and services, as well as business services, and that moves back into my area a little bit: co-marketing programs and program discounts. This is a blow-up; I'm not sure how well you can see it, but this is what you would see for a particular application on our website. It's a nice highlight with a description of the program and information about where to get it and the company or individual that makes it. These all live on apple.com/science, and they're also all in the Macintosh Products Guide. We do press release support: for developers doing a big release, we'll help with promoting that. This year, at all of our scientific conferences, we're inviting specific partners to join us in our booth to help show the solution of Apple hardware and Mac OS X with some of the key scientific applications for the particular audience we're addressing. And then success stories: we're not just doing success stories of our customers; we really want to focus on developers who are using Mac OS X, as examples for other scientific developers to look at and use for their own work. So I'll finish here with a quote that came off the ad, which I'm sure you couldn't see because the type was so small, from Dr.
Sean Morrison at the University of Michigan. He said: "The Power Mac G5 is the fastest computer I have ever used. I can have eight different memory-intensive applications open on my desktop at the same time with no problems whatsoever. In my personal opinion, the system is so reliable, user friendly, and powerful that I don't understand why people endure PCs." Now, I'd like to close by saying that what's not really covered there is the key of matching the really powerful hardware and operating system that Apple makes with the really incredible applications that our developers provide, because those things have to go hand in hand to provide the right solution to our scientists. It's just so tremendous to see the people here really focused on developing and working toward scientific apps, maybe just for personal use but also for commercial use, because I really believe those two things together make the solution that addresses the needs of our scientific community. So with that, I would like to introduce our next speaker. Chan Peng is from the Temasek Life Sciences Laboratory in Singapore. They have a 75-node Xserve cluster, the largest Apple cluster currently in Asia, and he both installed it and managed it. He's going to tell you all about his work there. Please welcome him.

Okay, thank you, Elizabeth. Good morning, everyone. It's my pleasure to be here to share with you our experience of building and using the Xserve cluster for bioinformatics at Temasek Life Sciences Laboratory in Singapore. Our group is involved in creating a computational biology division that will focus on comparing DNA between different species. Our current research projects are a genome annotation project and the study of non-coding regions across chordate genomes. In parallel to the annotation project, we are furthering the development of the workflow management software Biopipe to suit our large-scale cluster-based
computational needs, and smaller workflow solutions for other projects in TLL. Inside TLL, we work actively with other scientists to provide computational biology support. For example, we work with the reproductive genomics lab on the automation of filtering, clustering, and annotation of in-house generated sequence data and its integration with public databases. The foremost large-scale project we are doing is the genome annotation. The genome size of the species in question has been estimated at 360 million bases, with approximately fifteen thousand genes. The 400 million bases of raw data delivered from the sequencing lab are organized into 66,000 contigs. We typically run a series of programs, including some well-known algorithms like BLAST and in-house developed solutions, to analyze each of these 66,000 sequence pieces. Each analysis program generally takes somewhere between five minutes and two hours to complete. When a large amount of data has to be passed from hard disk to memory, data I/O speed is extremely important for us. So for the annotation project, we needed to set up a cluster that could meet the requirements listed on this slide. The cluster must be able to deliver tremendous computational power. It should be easy to install and ready to extend in the future. We require high-quality hardware and a robust operating system that allows most bioinformatics tools to run without any problem. In addition, these applications should be optimized to achieve the best performance on the platform. We also require sophisticated software to manage distributed resources and thousands of computational jobs. And finally, the hardware and software solution must be cost-effective. This is the Xserve cluster we built in 2003. It has 75 Xserve units running Mac OS X Server. Each Xserve unit has dual G4 processors, 2 GB of memory, fast disk storage, and Gigabit Ethernet. Our cluster hosts more than 20 terabytes of disk storage, and the nodes are managed by Platform LSF. So,
with help from BioTeam and Apple, we figured out a way to do a mass rapid installation. We boot each Xserve unit from an external hard disk that contains a prebuilt disk image. During the boot period, a script automatically restores the image, and with it the operating system, onto the local storage. We parallelized the installation with four external hard disks and set up the 64 cluster nodes within three hours. Mac OS X is a BSD-based operating system, and we find it very friendly to bioinformatics tools originally designed for Linux or UNIX. This slide shows the bioinformatics tools available on our cluster. We compiled most of the tools directly from source code ourselves, although some of them needed to be modified a little to cope with the differences between BSD and Linux. It is not difficult if you have some experience with C programming. After the basic system was up, we spent a lot of time optimizing performance. As explained in the previous slides, we focused on improving data I/O speed. For each Xserve node, we striped the two local hard disks to build a RAID 0 set, so that it provides 240 gigabytes of local storage at an average speed of 66 megabytes per second. We store most of the BLAST databases locally on each node to reduce network traffic, and we connect all the Xserve units with Gigabit Ethernet. At the software level, we try to find MPI-enabled versions to replace the normal versions. If the application itself supports multi-CPU execution, we instruct users to run it with the proper options, for example specifying the -a option for NCBI BLAST so that it runs in multithreaded mode. In addition to these efforts, we also optimized at the compiler level with proper GCC options: a lot of bioinformatics tools can speed up by about forty percent if they were originally built with the default configuration. Unlike other, simpler biology analyses, our genome annotation involves running a series of programs for each of the 66,000 pieces, and each step of the analysis must be automated
so that the entire process won't stop in the middle. Biopipe is open source workflow management software maintained by the Open Bioinformatics community. It was designed to address some of the complex issues in large-scale biological analysis. Our group contributes to the project, and we use Biopipe to manage our genome annotation project. Biopipe is written entirely in Perl, and the Mac OS X Developer Tools CD provides all the tools we needed for development. This screenshot shows the job status on our cluster in April 2004: there were more than 40,000 jobs in the queue, and hundreds were running. This is the situation we deal with almost every day. We use Platform LSF to manage the thousands of jobs generated by Biopipe effectively. LSF is the most robust distributed resource management software we have ever used; with Mac OS X Server and LSF, we are able to perform large-scale biological analysis without worrying about system stability. Setting up a cluster is a one-time task, but maintenance is everyday administrative work. Luckily, we have a few effective tools that help us a lot in daily system administration. One I would like to mention is Server Monitor. It took us only two hours to set up Server Monitor so that it provides an overview of all 75 Xserve units; we only needed to configure the monitoring server with the IP address of each cluster node and the administrative account. Server Monitor retrieves all the important hardware information for us in a few seconds. If we were using other UNIX systems, the administrator would have to log into each node manually for configuration, which would take much longer to complete. Server Monitor also features hard disk warnings, which are very useful for quickly identifying disks with potential problems, and we also use Server Monitor to collect information such as the serial number and the MAC address of each network adapter. Another important GUI tool is Apple Remote Desktop, which enables the administrator to operate a remote machine as if it were local. This tool is needed for headless access, especially for the new Xserve G5 without a VGA card. The most charming feature of Apple Remote Desktop is the ability to install software by drag and drop simultaneously on multiple nodes. We find this feature extremely useful during cluster-wide system upgrades: we were able to update the 64 cluster nodes to a newer version within 13 minutes. Our previous experience managing a similar-sized Alpha system involved doing updates from the command line, and it took us at least half a day for the same task. There are other command line tools we use frequently to facilitate cluster management. We are gratified that Panther has great support for command line tools; almost every GUI application has a corresponding command line interface. Just to mention a few of my favorites: SSH is used to log into the remote nodes every day; bash is the default shell in Panther; rsync is the utility for data synchronization; and we use dsh as a distributed shell. So, to summarize our experience with the Xserve cluster in TLL: the Xserve units provide the superior computational power we expected; the cluster was quickly set up, and we are able to run and optimize most of the bioinformatics tools; the entire cluster is robust for our genome annotation project; and daily administrative work is made easy with sophisticated Mac OS X monitoring tools and open source command line tools. Thank you.

Well, thank you very much. I'm just going to point out a few places you can get more information while we're bringing some of the Apple people up here for Q&A. In terms of questions, please use the microphones if you've got any. In terms of contacts: Liz Robert Kara, our science partnership manager; Elia Stupka, TLL bioinformatics program manager; or Cheng Pan, senior system engineer. Other resources: Liz mentioned the website; there's the Apple science website
and other related sessions you may be interested in. I just want to mention specifically the Science Lounge on the fourth floor; you should check that out. There are going to be roundtable discussions ongoing there throughout the conference. All right, let's take the first question over here.
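The RAID 0 set described in the TLL talk stripes each node's data across its two local disks, so consecutive chunks alternate between drives and large sequential reads can draw on both at once. The Python sketch below illustrates that address mapping under stated assumptions: the 64 KB stripe size is hypothetical (the talk does not give one), and the helper names `locate` and `raid0_capacity` are illustrative, not part of any Apple tool. The capacity rule (number of members times the smallest member) is why two equal drives, assumed here to be 120 GB each, would give the 240 GB figure from the talk.

```python
# Minimal sketch of RAID 0 striping address math (illustrative only).
# Assumption: a 64 KB stripe unit; the talk does not state the real value.
STRIPE_SIZE = 64 * 1024  # bytes per stripe unit (hypothetical)
NUM_DISKS = 2            # two local disks per Xserve node, as in the talk


def locate(offset: int, num_disks: int = NUM_DISKS,
           stripe_size: int = STRIPE_SIZE) -> tuple[int, int]:
    """Map a logical byte offset on the striped volume to
    (disk index, byte offset on that disk)."""
    stripe_index, within_stripe = divmod(offset, stripe_size)
    disk = stripe_index % num_disks   # stripe units rotate round-robin across disks
    row = stripe_index // num_disks   # full rows of stripe units before this one
    return disk, row * stripe_size + within_stripe


def raid0_capacity(disk_sizes: list[int]) -> int:
    """Usable RAID 0 capacity: each member contributes only as much as the smallest disk."""
    return len(disk_sizes) * min(disk_sizes)


if __name__ == "__main__":
    # Consecutive 64 KB chunks alternate between disk 0 and disk 1.
    for chunk in range(4):
        print(chunk, locate(chunk * STRIPE_SIZE))
    # Two hypothetical 120 GB drives yield 240 GB of striped storage.
    print(raid0_capacity([120, 120]))
```

Because neighbouring stripe units live on different disks, a long read keeps both drives busy; the trade-off is that losing either disk loses the whole volume, which is acceptable for scratch data and local BLAST database copies that can be restored from elsewhere.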
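The annotation workflow in the TLL talk runs a fixed series of programs over each of the 66,000 contigs, automated so that one failing job does not halt the whole run; TLL did this with Biopipe (written in Perl) on top of Platform LSF. The Python sketch below is a hypothetical, single-machine stand-in for that idea, not Biopipe's or LSF's API: the step names, `annotate_all`, `run_with_retries`, and the retry count are all assumptions. Each contig's steps run in order, and a failed step is retried. If it still fails, the error is recorded, the remaining steps for that contig are skipped, and the loop moves on to the next contig.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# A step is a (name, function) pair; the function raises an exception on failure.
Step = tuple[str, Callable[[str], None]]


@dataclass
class RunReport:
    # contig -> names of the steps that finished successfully
    completed: dict[str, list[str]] = field(default_factory=dict)
    # contig -> (name of the failing step, error message)
    failed: dict[str, tuple[str, str]] = field(default_factory=dict)


def run_with_retries(fn: Callable[[str], None], contig: str,
                     max_retries: int) -> Optional[str]:
    """Run one step, retrying on failure.
    Returns None on success, otherwise the last error message."""
    last_error = ""
    for _ in range(max_retries + 1):
        try:
            fn(contig)
            return None
        except Exception as exc:  # a real pipeline would catch narrower errors
            last_error = f"{type(exc).__name__}: {exc}"
    return last_error


def annotate_all(contigs: list[str], steps: list[Step],
                 max_retries: int = 1) -> RunReport:
    """Run every step on every contig; a failure only affects its own contig."""
    report = RunReport()
    for contig in contigs:
        finished: list[str] = []
        for name, fn in steps:
            error = run_with_retries(fn, contig, max_retries)
            if error is not None:
                report.failed[contig] = (name, error)
                break  # later steps depend on this one, so skip them for this contig
            finished.append(name)
        report.completed[contig] = finished
    return report


if __name__ == "__main__":
    def similarity_search(contig: str) -> None:
        pass  # placeholder for a BLAST-like step (hypothetical)

    def gene_prediction(contig: str) -> None:
        if contig == "contig_2":
            raise RuntimeError("simulated failure")

    result = annotate_all(["contig_1", "contig_2", "contig_3"],
                          [("similarity_search", similarity_search),
                           ("gene_prediction", gene_prediction)])
    print(result.completed)
    print(result.failed)
```

On a real cluster, each (contig, step) pair would more likely be submitted as its own scheduler job, which is how tens of thousands of jobs end up queued at once; the sequential loop here only shows the failure-isolation logic.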
