WWDC2004 Session 646

Transcript

all right welcome to Apple solutions for
science at WWDC
2004 thank you I'm Bud Tribble
vice president for software technology
and what I'd like to do is go through a
few of the trends we're seeing in
scientific computing and for those of
you who've been watching Apple and
scientific computing over the past few
years it's been an incredible explosion
for us we're seeing every year more and
more scientists adopting the Mac and I
occasionally attend scientific
conferences and you just look around the
room and see how many people are using
the mac and these days a typical
bioinformatics conference maybe 30 40
plus percent of people with macs so it's
really gratifying to see my original
background in science and so I sort of
warm up every time I see a
scientist using the Mac I know that
they're just being a lot more
productive trends that we're seeing and
any of you who are in in the scientific
community there'll be nothing really
surprising about these because i'm sure
you are seeing them too every day the
first is exponential growth of
scientific data next is clustering for
cost effective performance and for those
of you who are running clusters you know
this already but if you're not running a
cluster a cluster is probably soon in
your future strong focus on application
optimization and getting to results more
quickly is what science is all about
getting your results published quickly
tuning your app so you can get crunch
through the data more quickly can be
incredibly productive and I'll talk
about that a little bit ease of
deployment administration if you're a
scientist your job is not to tinker with
the computer your job is to do science
and one of the things Apple does best is
ease of use and in the case of
scientific computing that includes ease
of management ease of administration
ease of setting up portable UNIX taking
your your laptop or your powerbook with
you with your complete environment on it
you know scientists are a very mobile
population they're on planes going to
conferences being able to take your
environment with you is incredibly
productive and Apple does a great job at
that I'll spend a little bit of time on
64 bit and 64bit I think you're seeing
the first instance of that with 64-bit
address space in Tiger it's in your
preview release there's a 64-bit
compiler there I think this is going to
be a big deal for the scientific
community and then finally open
standards-based tool development and
let's just get started so exponential
data growth this graph shows over 18
months the top curve is the size of the
genomic database that's out there
available on the web and the lower curve
is actually Moore's law you can see that
the growth of data and you know
bioinformatics is just one example of where
this is happening but the growth of data
is huge luckily disk storage is
exceeding Moore's law in terms of what
we can offer you in terms of disk
storage and I was saying yesterday we're
down to about three dollars per per
gigabyte with the xserve raid so this
kind of changes the equation in fact it
also changes sometimes the algorithm to
use the fact that you can have huge
amounts of storage online one of the
trends it's not noted as a
trend here that I'm noticing is that
many problems are becoming
more amenable to a brute-force
approach that in the past would take you
know very arcane algorithms and these
days you don't spend a lot of time if
you can just brute force it because the
hardware and the storage costs have come
down so much interesting case study here
Swinburne University Centre
for Astrophysics and Supercomputing in
Australia they chose Xserve RAID for
price and performance now they have over
13.5 terabytes of Astrophysical data
accessed by a 130 node cluster they have
over one terabyte of data generated
daily
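As a rough sanity check on those figures, the sketch below converts one terabyte per day into an average sustained rate and splits 13.5 terabytes evenly across 130 nodes. It assumes decimal units (1 TB = 10^12 bytes) and treats the quoted "over" figures as exact, so the results are averages and lower bounds, not measurements. The function names are illustrative only.

```python
# Back-of-the-envelope arithmetic for the storage figures quoted above.
# Assumption: decimal units, so 1 TB = 10**12 bytes and 1 MB = 10**6 bytes.
TB = 10**12
GB = 10**9
MB = 10**6
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_rate_mb_per_s(bytes_per_day):
    """Average write rate needed to absorb a given daily data volume."""
    return bytes_per_day / SECONDS_PER_DAY / MB


def per_node_share_gb(total_bytes, nodes):
    """Data per node if the archive were spread evenly (a simplification)."""
    return total_bytes / nodes / GB


if __name__ == "__main__":
    print(f"1 TB/day averages {sustained_rate_mb_per_s(1 * TB):.1f} MB/s")
    print(f"13.5 TB over 130 nodes is {per_node_share_gb(13.5 * TB, 130):.1f} GB per node")
```

One terabyte per day works out to an average of roughly 11.6 MB/s, and an even split of 13.5 TB is roughly 104 GB per node. Real traffic is bursty, so peak demand would be higher than the average.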
so this is it this is a huge amount of
storage a huge amount of bandwidth
being consumed Xserve RAID is connected
by Fibre Channel to their
server cluster and a quote from
Professor Matthew Bailes says the
performance of Xserve RAID is quite
exceptional it can easily handle
sustained read and write operations at
100 megabytes per second on a single
channel which is twice as fast as the
previous generation RAID equipment that
they were using so Xserve RAID with
Fibre Channel and now if you need a SAN
storage solution Apple has that too with
Xsan really a complete very cost-effective
solution for performance let me first
storage clustering and you know
clustering is just exploding in the
scientific community there's a move from
supercomputer custom architectures to
clusters of fairly inexpensive 1U
systems Xserve we actually offer a
special configuration of Xserve
specifically for building clusters so
it's sort of a stripped-down version a
little bit less expensive and the
licensing is sort of tuned for building
clusters I forgot exactly what we call
it but it's on the web page combine that
with inexpensive storage with Xserve
RAID we have a product called Xgrid
Xgrid 1.0 it's on your disk you
get it with Tiger it's included
with Tiger and it's a grid
computing solution distributed computing
for the rest of us it's produced by our
Advanced Computation Group led by Dr.
Richard Crandall it's basically an
easy way to submit and run computational
tasks as you know grid computing is
typically amenable to situations where
you're even using spare desktop cycles
around your institution so grid
computing lets you run tasks on
computers you don't necessarily manage
or own versus clustering where
everything is managed very very closely
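The idea of submitting independent computational tasks and letting whatever workers are available pick them up can be sketched as follows. This is not Xgrid's interface: local threads stand in for remote machines, and `run_task` and `run_on_grid` are hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_task(task):
    """Stand-in for a real computational job: sum of squares below `task`."""
    return task, sum(i * i for i in range(task))


def run_on_grid(tasks, workers=4):
    """Hand independent tasks to a pool of workers and collect the results.

    Assumption: each task is independent, which is the case grid computing
    handles best. The thread pool here only mimics a set of remote nodes.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(run_task, tasks))


if __name__ == "__main__":
    print(run_on_grid([3, 4, 1000]))
```

A real grid also has to copy executables and data to the nodes and cope with machines that come and go, which this sketch leaves out.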
Xgrid supports as I mentioned either
dedicated managed resources or ad hoc
resources where someone just offers up
some of their desktop cycles Xgrid
handles the hard work of connecting the
nodes into a cluster monitoring the node
activity scheduling the tasks on the
nodes copying the executables and data to
the nodes and staging the output and
collecting the results and most recently
with Xgrid 1.0 we now have MPI support
so this is grid computing for
the rest of us there are other solutions
commercial solutions and otherwise on
the market but this is one with the sort
of Apple added value of it just works
out of the box incredibly simple nodes
can use Rendezvous to find each other on
the network so a very simple easy to use
grid computing solution I'll mention a
Princeton Center for the Study of Brain
Mind and Behavior and this group is
using clusters they have a 64 node
cluster one head and 63 nodes in the
cluster is doing computation they do
brain activity mapping and neural net
simulation so what they do is take MRI
scans and look at the spin decay to figure
out what the blood flow is through the
brain and have the subject do different
activities I don't know watching
watching movies or whatever and look at
what parts of the brain get activated in
terms of blood flow they also do some
neural net simulation the data sets from
a single MRI scan will be 2 to 10
gigabytes so huge data sets that they're
dealing with on their cluster they use a
variety of applications MATLAB inquiry
episode SUMA BrainVoyager this is a mix
of both commercial applications like
MATLAB as well as open source
applications so kind of the full
combination of applications whether
proprietary open source is available on
a Mac cluster one of the things they
found in moving to this cluster was that
a single G5 at two gigahertz was up to
ten times faster than their previous SGI
Origin
so this was a huge step up for them the
other thing they noticed is that they had
been running in their server room Dell
PowerEdge they had a couple of Dell
PowerEdge servers running and they
brought up the 64 node Mac cluster and they
were standing there listening to it and it
seemed really really loud and it turns
out that all the noise was coming from
the Dell PowerEdge the Dell PowerEdge
was way louder than 64 1U nodes from us
so interesting advantage there by the
way I'll just go back you know
that is a big advantage for G5
clusters in that the power and cooling
that you need for a G5 cluster is
actually we do a lot to make sure that
that is optimized and power and cooling
can be a major cost in deploying
clusters easily up to twenty thirty
forty percent of the cost of deploying a
cluster can actually be power and
cooling so you know when you when you
price these things out one of the things
you have to look at is that aspect as
well we do quite well there I mentioned
application optimization and this is an
area that we spent a lot of effort in
providing tools Xcode itself of course
the key there is fast turnaround so that
you can modify your code and make it
faster and try it out but we also have a
number of tools that plug into Xcode and
can be used with Xcode one of those
is called the Shark set of tools Shark
allows you to do kind of extreme
profiling of your application find out
exactly where the bottlenecks are where
the application is spending its time so
you can focus in your efforts on either
redoing that algorithm or you know hand
coding that inner loop or whatever it
takes to get that code running as fast
as possible we have CHUD tools the
CHUD tools are kind of the next level
down at actually looking at what's going
on in the instruction pipeline at the
instruction pipeline level at the cache
hit and miss level so you can
really optimize that inner loop and
that's very important because as you as
you do this sort of activity you find
that there are huge performance gains
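The profile-first workflow described here can be illustrated without Apple's tools. The sketch below uses Python's standard `cProfile` and `pstats` modules as a stand-in for Shark. The workload functions (`slow_inner_loop`, `cheap_setup`, `run_analysis`) and the helper `hottest_function` are made up for the example.

```python
import cProfile
import pstats


def slow_inner_loop(n):
    """Deliberately expensive loop; this is the bottleneck we expect to find."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def cheap_setup():
    """Trivial work that should barely register in the profile."""
    return list(range(10))


def run_analysis(n=200_000):
    cheap_setup()
    return slow_inner_loop(n)


def hottest_function(func, *args):
    """Profile one call and return the name of the function with the most self time."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    stats = pstats.Stats(profiler)
    # stats.stats maps (file, line, name) -> (calls, prim_calls, tottime, cumtime, callers)
    (_, _, name), _ = max(stats.stats.items(), key=lambda item: item[1][2])
    return name


if __name__ == "__main__":
    print("spend optimization effort on:", hottest_function(run_analysis))
```

The point is the same one made in the talk: measure first, then put the effort into the function that dominates the run time.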
we also have third parties that have
tools that help you optimize so IBM now
supports xlc and xlf they're highly
optimized C and Fortran compilers
for Mac OS X with the G5 there's
Crescent Bay which is an auto
vectorizing compiler and then NAG with
NAGWare their optimizing Fortran
compiler so a ton of tools to help you
squeeze the most performance
out and just an example of the kinds of
improvements you can see by doing
optimization PyMOL which I'm sure a
lot of you are familiar with molecular
visualization tool Warren DeLano
achieved a four hundred percent speed up
using shark for profiling in just a few
days so zeroing in on those bottlenecks
and fixing those algorithms or
fixing that code four hundred percent
with a few days investment HMMER hidden
Markov modeling for protein sequence
analysis Erik Lindahl in just a few
hours using the CHUD tools that I
mentioned achieved six hundred and fifty
percent speed up so the message here is
the tools are very very simple and easy
to use they're there on the disk if you're
using the G5 or the G4 you know at all
for performance computing take the
time to run the tools through the
optimization it won't be a huge
investment in your time it'll be a huge
payback in terms of performance ease of
deployment administration so Apple has a
number of tools that make administration
of especially clusters very easy very
simple we have as I mentioned Xsan for
a storage area network solution and we
also offer our workgroup cluster for
bioinformatics which is that box shown
in the corner there which is a
pre-configured cluster with everything
on it you need for bioinformatics
preloaded with applications it's kind of
you know a turnkey solution to get your
lab up and running on a clustered system
for bioinformatics so kind of the
ultimate in ease of use portable UNIX I
can't stress enough the niceness
of this if you're in scientific
computing you've got not only you know a
portable version of your complete system
that you would use in your lab with all
the tools whether it's Perl scripting or
Xcode or the optimization tools I
mentioned but you've got the panoply of
things you need to keep connected so you
know iChat iChat AV as we saw
yesterday Safari Mail FileVault is an
interesting one if you're in especially
if you're in the commercial market or
government market you've got sensitive
data on your system you know you
can encrypt your home directory you can
encrypt you know another volume on
the system and make sure that if
you happen to leave your PowerBook on
the seat of a taxi that that data
doesn't fall into the wrong hands VPN
for communication back to your home
network SSH etc etc you saw the new
AirPort product yesterday that's going
to be great for people in the hotel room
you just plug that in when you get to
the hotel room and you know it's about
the size of the power module
for the PowerBook and you can take your
laptop around the hotel room share your
network with other people in the hotel
rooms if you want to only if it's legal
example of you know the power of taking
UNIX with you Dr. Jamie Cate at Berkeley
does crystallography and he had a large
set of applications for crystallography
that that ran on UNIX workstations and
when he left the lab he had to basically
leave his research behind that's no
longer true so what he says is Apple's
PowerBook G4 and Mac OS X have allowed
me to use the same tools on an airplane
on my way to a conference that I could
only use before on my lab workstation so
for those of you who've not had the
pleasure of having a
PowerBook and taking it on a plane and
using it I'd highly recommend that I'm
going to spend a little bit of time
talking about 64-bit computing so 64-bit
computing is not necessarily for
everyone but if you have massive data
sets that you need to iterate over as
part of a problem you're going to find
yourself in need of a 64-bit address
space and you know today with
Panther you can put eight gigabytes onto
a Mac OS X G5 system but an
individual process can only use up to 32
bits of address space and in fact you
can probably get up to about 2 gigabytes
of address space before things start to
bump into limits with Tiger you
will be able to build compile and build
a 64-bit address space application so we
have a 64-bit version of GCC that will
compile 64-bit code we have a 64-bit
version of libSystem we're taking a
staged approach here so we did
libSystem first so that's libc and
libm and other libraries those are
converted to be 64 bit versions the
libraries you're compiling against those
libraries are compiled 64-bit what you
don't have are the GUI libraries so for
example Cocoa has not been or Carbon has
not been converted to 64-bit so what you
will do to leverage 64-bit with Tiger is
build a computational section of your
application as a single process running
in you know up to eight gigabytes of
memory and that process will communicate
with a front end of your application
that if you have a need for a graphical
user interface and certainly on a
cluster system you know the code running
on the cluster nodes typically does not
have a GUI anyway and so that's that's a
great fit eventually in the future we'll
expand the number of 64-bit libraries
but for Tiger it is confined to
libSystem and the non-GUI
libraries and specifically targeted at
scientific computation like very large
modeling and simulation things that
really need to have a full 64-bit
address space we're using what's called
LP64 that means that longs and pointers
are promoted to be 64 bits integers int
will stay at 32 bits this is the
standard for UNIX systems so if you have
64 bit code running on other UNIX
systems or Linux it should be easily
portable to Mac OS X the compiler has
been outfitted so that if you
turn on warnings it'll give you a complete
set of warnings if your app is not 64
bit clean for example if you're
depending specifically on the size of
int or the size of a pointer
it will flag that for you so I highly
recommend that you if you have large
data sets you take a look at the tiger
preview release try out the compiler one
caveat to note is that the binary format
for 64-bit apps 64-bit executables will
be changing with the final release of
tiger so if you've compiled something
for the preview release you will have to
recompile once the final tiger comes out
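The LP64 rule and the classic "not 64-bit clean" mistake can be shown with a little arithmetic. This is a sketch only: the width table restates the rule from the talk (int stays 32 bits, long and pointer become 64), `store_pointer_in_int` merely simulates what happens when a pointer is squeezed into a 32-bit int, and all names are illustrative. `host_widths` reports whatever the machine running the sketch uses, which may differ.

```python
import ctypes

# Type widths in bytes under the two data models discussed above.
DATA_MODELS = {
    "ILP32": {"int": 4, "long": 4, "pointer": 4},  # 32-bit process
    "LP64": {"int": 4, "long": 8, "pointer": 8},   # longs and pointers promoted
}


def max_addressable_bytes(model):
    """Size of the address space implied by the pointer width."""
    return 2 ** (8 * DATA_MODELS[model]["pointer"])


def store_pointer_in_int(address):
    """Simulate the bug a 64-bit-clean check looks for: keeping a pointer
    in a 32-bit int silently drops the upper 32 bits."""
    return address & 0xFFFFFFFF


def host_widths():
    """Widths on the machine running this sketch (platform dependent)."""
    return {
        "int": ctypes.sizeof(ctypes.c_int),
        "long": ctypes.sizeof(ctypes.c_long),
        "pointer": ctypes.sizeof(ctypes.c_void_p),
    }


if __name__ == "__main__":
    print("ILP32 address space:", max_addressable_bytes("ILP32") // 2**30, "GiB")
    high_address = 0x1_0000_0010  # just above the 4 GiB line
    print(hex(high_address), "->", hex(store_pointer_in_int(high_address)))
    print("this host:", host_widths())
```

An address just above 4 GiB comes back as a small, wrong value once truncated, which is why code that assumes a pointer fits in an int has to be fixed before it can use a 64-bit address space.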
64-bit apps run right alongside
32-bit apps so it's
flagged in the executable whether this
is a 64-bit executable or 32-bit
executable in fact you can build your
app fat if you want so that you can
launch either as a 64-bit app or
as a 32-bit app an example where this
might be useful so Vertex
Pharmaceuticals they use Power Mac G5 to
accelerate their drug development which
targets viral diseases and inflammatory
diseases and cancer and Tiger's going to
allow them to transition their critical
molecular modeling application which
really has 64-bit addressing
requirements transition that to a G5 and
a quote from
Joshua Boger who's the chairman and
CEO Mac OS X 64-bit memory management
will allow vertex to rapidly interact
with huge libraries of chemical
structures and advance our drug
discovery process leveraging open-source
I can't stress this enough we have over
100 open source technologies or
projects that are incorporated into
Tiger everything from Apache to Perl
to Python to OpenLDAP to Berkeley DB
MySQL JBoss you name it we pretty much
have it and those packages are included
in the release so that the code runs out
of the box and and more importantly when
updates come out from those projects we
incorporate them into a system update
for Mac so you get the most recent
versions or the security patches kept up
to date for these projects makes it
incredibly easy to use these beyond the
ones we package of course there's a huge
number thousands really of open source
packages available from SourceForge or
from Fink and those applications are
basically you know the rate is
about doubling every year in terms of
number of open source projects available
on the Mac so that's an incredible
resource for you so that you won't have to
reinvent the wheel if you need to get
something done the first thing you
should do with Mac OS X is go look and see
if we've got an open source package that
that accomplishes the task you want to
accomplish so you don't have to write
code from scratch kinds of applications
available include things like NCBI
toolkits EMBOSS PyMOL I mentioned
earlier Globus WU-BLAST we also have
a version of BLAST that has been optimized
by Apple and Genentech A/G BLAST highly
optimized for the G5 AMBER so a huge
number of tools available out there for
for scientific computation so to
summarize the trends we're seeing number
one huge growth in scientific data so
you can
expect Apple to continue to focus
with products like Xsan and Xserve
RAID continue to focus on providing
cost-effective storage very high
bandwidth to storage clustering for
cost-effective performance and you know
that is our strategy we have 1U
servers we don't make huge big iron we
don't make 64-way SMP we are all about
optimizing the 1U form factor for
building clusters making sure that we
can give that to people as inexpensively
as possible Xgrid for building you know
ad hoc clusters strong focus on
application optimization I mentioned the
Shark tools and CHUD you can expect
additional performance related tools
coming from Apple this is this is the
way to squeeze the most performance out
of your g5 make sure you're getting
getting the absolute most you can get
for your application ease of deployment
administration out-of-the-box turn it on
with the workgroup cluster for bioinformatics
and you've got a cluster you know
under the desk in your office if you
want portable UNIX I can't stress
enough the productivity gain that you
get from being able to take your your
entire lab software with you wherever
you go 64-bit address space and this is
something that has been you know
requested from us and and we are very
pleased to be able to offer the non GUI
64-bit app address space with Tiger and
this is an area where please if you try
it out on the preview release please
give us your feedback on what you're
finding you're the guys that have 64-bit
apps and we're committed to make this
the best 64-bit system we can and
then finally open standards-based tools
development and there basically
is not an open source tool out there
that has not at this point been ported to
Mac OS X and that's a huge leverage
point for you to not have to reinvent
the wheel
so Apple really in my mind is the best
platform for scientific computing today
if you look at all the tools oriented
around it available on it the things
that we're doing to enable clustering
the things that our partners are doing
to enable scientific computing in a
variety of areas I really can't point to
a system today that makes a better
scientific computer so thanks thanks a
lot I'm going to turn it over at this
point to Dr. Liz Kerr and she's the
director of scientific marketing she's
going to talk about apple in the sci
tech market so thank you thanks bud it's
really my pleasure to be here and it's
great to see so many faces out here this
morning what I'm going to talk to you
about for the next 20 minutes is how
both my team the sci-tech marketing team
and many other groups at Apple are
working towards providing solutions and
awareness out to the market to help
really drive adoption of the Apple
platform for scientific computing we
really think this is a perfect solution
and we want to help get that message out
there one of the most important aspects
of that sorry wrong way there we go one
of the most important aspects of that is
really driving the awareness and
communicating both to our customers and
hearing from our customers I'll go
through some of the ways that we're
doing that one of the most simple ways
is through trade shows we've done a
number of these and plan to do more this
year Bio-IT World is one and that's what
this image is from our booth there also
ISMB which is coming up in Glasgow
Scotland a big bioinformatics show and
Drug Discovery Technology which is a
show that focuses more on the commercial
aspects of science biological sciences
pharmaceutical and biotech these
shows are really important to us because
they allow us not just the ability to
talk to our current customers but to let
other people know that this is an area
we're interested in and to hear from
people who maybe we don't normally talk
to
another type of event that we're doing are
focus customer events where we actually
go to a customer site and give them
hands-on experience with some of our
newer tools these events shown in the
images were to promote the power mac g5
and the performance of those computers
for applications and scientific
computing another thing we're doing is
focusing on advertising that goes
specifically to our scientists this is a
little tongue-in-cheek obviously this
isn't an ipod and the point isn't that
we're focused away from that but just
that in many cases our consumer
advertising overtakes what our
scientists see and they don't think of
us as a company that maybe makes
computers that are really specific for
the scientific market so this is an
example of an advertisement that's
currently running in both peer-reviewed
trade journals like Science and Nature
as well as magazines like The Scientist
and Genome Technology and it really is a
great ad because it focuses specifically
on the power mac g5 and customers
talking about why it's great for their
use we're also doing some online
advertising this is another great way to
reach people who maybe don't normally
think of Apple and this is an online ad
for the workgroup cluster for
bioinformatics that bud alluded to we'll
talk a little bit more about this
solution later another thing that we're
really pleased about is launching a
science website at apple.com/science
this is the homepage this is really
geared towards up leveling all that
information that's more technical and
more geared towards both our scientific
developers and our scientific customers
so that they can find sort of a home for
that information and find it more easily
we have lots of downloads and focus on
both Apple solutions as well as our
third-party solutions but we also have
success stories that focus on how
customers are using Apple products as
examples and to serve as an example for
people who are interested in how they
might use our technology to help solve
their problems this is just a blow up
because I'm what I'm going to do is
focus in on a couple of these areas and
dig down a little bit just to show you
what type of information is there so in
the upper right hand corner we're going
to look at the applications for research
I wanted to point
this one out because this is where most
of the information from third-party
developers and open source developers lives
on the website so we've got featured
applications on this part of the web to
raise awareness for particular
applications and this rotates on a
regular basis so we don't play favorites
or anything we try to give everybody a
chance to focus on their applications
there's also the Macintosh Products
Guide which is the comprehensive list of
all the applications that are available
that run on Mac OS X both scientific
and otherwise there's also a download
section so if you find an application
you're interested in or somebody wants
to download your application for example
they can go to either the math and
science part or the open source and unix
part and all these have specific
download sections so another part of
this that's interesting I think is the
resource section and here
we have it broken down into
different categories so if you are
looking for a particular type of
information for example high-performance
computing or software
development there's a part for Darwin
resources third-party products there are
mailing lists and community so if you're
interested in joining a mailing list or
community to discuss your your
challenges or throw something out there
and get a response back that that's
right there and there's also a lot of
links to technical information you can
maybe see on the right hand side
where you can download PDFs about the
Apple technology we also have been doing
what we call sci-tech initiatives and
solutions and also I want to talk a
little bit about how we're judging
momentum that we're getting in the
scientific market so one of the one of
the cornerstones of this is the Apple
workgroup cluster for bioinformatics
we're really really pleased with this
because it really ties together the the
highly technical aspects of what we're
providing to the scientific marketplace
plus the ease of use that Apple is known
for the idea as I think Bud alluded
to is to take the setting up of the
compute cluster out of the hands of the
scientists make it really easy make
it so that they can have the compute
power without having to know how to
manage a cluster how to code in Linux
how to do any of that this is
geared to be something
they can take out of the box set up
themselves and have it running in no
time we announced this at Macworld in
January and we are really pleased it won
the Best in Show award at Bio-IT World
for IT infrastructure we're
just really proud of that and I think it
really speaks to how the scientific
community is viewing this it's really
being adopted for many uses I mean it's a
bioinformatics workgroup cluster but
people are using it for biological
research they're also using it for
application development and
interestingly they're using it to
develop curriculum and teaching programs
for bioinformatics at the university
level just a couple examples this one's
from the Naval Medical Research Lab dr.
Michael shoot is using his work group
cluster for bioterrorism research and he
installed and maintained this himself he
has no computer science background
whatsoever his uh his favorite thing is
to say all he needed was a screwdriver
and he was able to set the whole thing
up himself get it up and running in 30
minutes they really like the security
aspect of this cluster because of course
they're working on something that's very
critical to you know the security of
the country they also
liked having the applications which come
with the workgroup cluster with a
web-based interface they like having
that accessibility
without having to know the command line
because a lot of bench scientists don't
know how to do that it's much
easier for them to have a familiar GUI
interface the other thing about the
workgroup cluster that a lot of our
customers like and which is one of the
things that was a deciding
factor for the Naval Medical Research
Center is the scalability of the cluster
you can always add to this if you find
that your eight nodes isn't enough you
can double that or add two more nodes or
whatever you need another example is
from idaho state university dr mike
thomas set out to design a
bioinformatics curriculum for the
university they bought a five-node
workgroup cluster what happened was they
set it up so much faster than they had
planned that they were able to offer
their bioinformatics course an
entire semester
early the other thing that he had done
he hired a person a head count to manage
the cluster but once it was set up it
was running the guy had nothing to do so
because it was just going and it was
working so they reassigned this person
75% of his time to do something else so
they're using this to teach the very
first course in bioinformatics at Idaho
State University a quote from him which
talked a little bit about how this
bleeds over into other areas of the
university is I think the cluster is
going to have a huge effect in our
research environment and I think it will
help scientists here generate additional
funding so he sees this as a way of
other scientists at universities
referring to this resource and being
able to hopefully boost up the value of
their grant applications so one of the
things we did to raise the awareness of
this solution the work group cluster for
bioinformatics out to the marketplace is
my team and the higher education
marketing team put together a workgroup
cluster awards program to recognize
innovation and research the goal was to
give away five fully provisioned
clusters with four dual-processor Xserve
G5s with two gigs of RAM and each comes
with the software included the
BioTeam iNquiry package with over 200
informatics applications all the
hardware infrastructure the power supply
the cables etc and applecare support for
three years this is a great thing to win
the applicants were tremendous we had
hundreds of applications come in from
all over the US and we were just blown
away by the quality and the
time and effort people took to
put these together and from all aspects
of research from higher ed government
nonprofit as well as commercial
customers I'm like I hope it's not pink
on the screen because it's big there
okay we'll go with pink so first I'd
like to say of the hundreds we we picked
five winners but we also picked five
honorable mentions because again the
quality of these was so incredible that
we felt we wanted to extend the
acknowledgement to at least ten of the
applicants so just very quickly these are
the five honorable mentions the first
from University of Washington where
they're doing HIV evolution research
at Yale University Dr. Kevin White doing
genomic research on model organisms
Caltech's Dr. Barbara Wold doing gene
regulatory networks at University of
Pennsylvania Dr. David Roos and
colleagues are studying parasites and
genomics of parasites and at the
Institute for Genomic Research or TIGR
Dr. John Quackenbush is doing all kinds
of things but also software development
and a lot of genomic database work so
now to the very pink winners for the
workgroup cluster awards the first one on
the list and these are not in the order
of first second third fourth or fifth
they're all winners UCLA Dr. Christopher
Lee for doing work in comparative
genomics an incredible application
incredible project at Duke University
dr. Simon Lin who is representing a
group of scientists doing oncology
research an enormously extensible
project that he's looking at doing with
lots of software development that would
be used by the entire oncology research
community at MIT we have dr.
Edward DeLong for environmental
microbial genomics really interesting
topic very unique and at University of
Wisconsin Mike Newton dr. Mike Newton
he's developing statistical techniques
for genomic research to really like a
light show different genomic research to
really expand the
types of algorithms and such that people
can use for that and then finally at
Children's Hospital in Oakland the
research institute there dr. Deborah
Dean is doing really really
state-of-the-art chlamydia genomics
research much more in the health care
area so those are our five winners of
the Apple work group cluster awards
I'd like to stop here and give a round
of applause to all the applicants and
winners
okay moving right along and just talking
again about the momentum and awareness
we have gotten an enormous amount of
press coverage both from this awards
program but really primarily starting
when we launched the work group
cluster for bioinformatics and started
showing up at things like bio IT world
and it's been really nice to see the
press both the Mac trade press as well
as more general press and scientific
press really want to hear what Apple's
doing in this space and paying attention
to the efforts we're making to provide
really great solutions to our scientific
customers I want to turn a little bit to
talk about the developers and some of
the work that you all have been doing I
think the amount of the number of new
applications that have come on to Mac OS
10 and continue to come on to Mac OS 10
is overwhelming the list just keeps
growing these are four that are
relatively new either updated or new to
the platform from the chemical computing
group we have the molecular operating
environment or as we like to call it MOE
MATLAB 7 enormously popular program for
our physical science customers
Geospiza is a company that does the Finch
sequencing center a great tool for
managing sequencing labs and Gene Codes
with Sequencher another really popular
program for managing sequence DNA
sequence data what really drives that I
think is the amount of developer support
that our world wide developer group
provides to our scientific developers as
well as others and I just wanted to
highlight a few things that we
have on offer for our developers there
are at ADC the Apple Developer
connection software developer tools
development tools hardware support
technical support and services as well
as business services and that kind of
moves back into my area a little bit but
co marketing programs and program
discounts this is a blow up I'm not sure
how well you can see that but this is
what especially now this is what you
would see for a particular application
on our website and it just is a nice
highlight with a description of the
program and information about where to
get it and who the
company is that makes it or the
individual these all live on apple.com/science
they also are all in the
Macintosh products guide we do press
release support for developers that are
doing a big release we'll help with
promoting that this year at all of our
scientific conferences we're inviting
partners specific partners to join us in
our booth to help show the solution of
Apple hardware and Mac OS 10 with
some of the key scientific applications
for that particular audience that we're
addressing and then success stories
excuse me we're not just doing success
stories of our customers but we really
want to focus on how our developers are
using Mac OS 10 as examples for other
scientific developers to look at and use
as examples for their own work so I'll
finish here and this is a quote that
came off the ad which I'm sure you
couldn't see because the type was so
small from dr. Sean Morrison at
University of Michigan he said
the power mac g5 is the fastest computer
i have ever used i can have eight
different memory intensive applications
open on my desktop at the same time with
no problems whatsoever in my personal
opinion the system is so reliable user
friendly and powerful that i don't
understand why people endure pcs now yes
I think I'd like to close by saying
what's not really covered there is
really the key of matching the really
powerful hardware and operating system
that Apple makes with the really
incredible applications that our
developers provide because those things
have to go hand in hand to provide
the right solution to our scientists and
I feel like it's just so tremendous to
see the people here really focused on
developing and working towards
scientific apps maybe just for personal
use but also for commercial use because
I really believe that those two things
together really make the solution that
help address the needs of our scientific
community so with that I would like to
introduce our next speaker chan peng is
from the temasek life science institute
laboratory in singapore they have a 75
node xserve cluster it's the largest
cluster currently in asia an apple
cluster in asia and he both installed
it and managed it and he's going to tell
you all about his work there please
welcome him okay Thank You Elizabeth
good morning everyone it's my pleasure
to be here to share with you our experience
of building and using the xserve
cluster for bioinformatics in temasek
life sciences laboratory in singapore so
our group is involved in creating a
computational biology division that will
focus on comparing DNA between different
species our current research project is
the genome annotation of a cyst good
wishes and the study of non-coding
regions across cortes genome in parallel
to the annotation project we are furthering
the development of workflow management
software biopipe to suit our large-scale
cluster-based computational needs and
smaller workflow solutions for other
projects in TLL inside TLL we work
actively with other scientists to
provide computational biology support
for example we work with the lab of
reproductive genomics on the automation
of filtering
clustering and annotation of in-house
generated sequence data and its
integration with public databases the
foremost large-scale projects we are
doing is soon as I reading genome
annotation the genome size of the fifth
grading question has been estimated to
360 million bases with approximately
fifteen thousand genes the 400 million
pieces of raw data delivered from
sequence lab is organized into
66,000 continuous reads we typically
run a series of programs including some
well known algorithms like blast and
in-house developed solutions to analyze
each of these 66,000 sequence pieces each
analysis program generally takes
somewhere between five minutes and two
hours to complete if a large amount of
data has to be passed from hard disk to
memory the data i/o speed is extremely
important for us so for the annotation
projects we need to set up a cluster
that can meet our requirements as listed
in the slide so the cluster must be able
to deliver tremendous computational
power it should be easy to install and
ready to extend for the future and we
require high quality hardware and robust
operating system that allows most of the
bioinformatics tools to run without any
problem in addition these applications
should be optimized to achieve the best
performance on the platform we also
require sophisticated software to manage
distributed resources and thousands of
computational jobs and finally the
hardware and software solution must be
cost-effective this is the xserve cluster
we built in 2003 it has 75 xserve
units running Mac OS X Server each xserve
unit has dual g4 processors 2gb memory
fast disk storage and gigabit ethernet
our cluster hosts more than 20
terabytes disk storage and the jobs
across the nodes are managed by platform
lsf so with the help from bioteam
and apple we figured out a way to
conduct a mass rapid installation we boot
up an xserve unit from an external hard
disk which contains a prebuilt disk image
during the boot-up period a script
automatically restores the image and
builds the operating system on the local
storage we parallelized the installation
with four external hard disks and set up
the 64 cluster nodes within three hours
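The arithmetic behind that rollout can be sketched roughly as follows. This is only an illustration: it assumes the four external disks were moved from node to node in lockstep rounds, which the talk does not state, and the function names are hypothetical.

```python
import math


def install_rounds(nodes: int, disks: int) -> int:
    """Number of rounds needed when each boot disk images one node per round."""
    return math.ceil(nodes / disks)


def minutes_per_round(total_hours: float, nodes: int, disks: int) -> float:
    """Average wall-clock minutes available for each round of installs."""
    return total_hours * 60 / install_rounds(nodes, disks)


if __name__ == "__main__":
    # Figures quoted in the talk: 64 nodes, 4 external disks, 3 hours.
    rounds = install_rounds(64, 4)
    per_round = minutes_per_round(3, 64, 4)
    print(f"{rounds} rounds, about {per_round:.2f} minutes per node image")
```

Under that assumption each node had a little over eleven minutes for boot, image restore, and disk swap.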
so Mac OS X is a BSD based operating
system and we feel it is very friendly
to the bioinformatics tools originally
designed for Linux or UNIX this slide shows
the bioinformatics tools available in
our cluster most of the tools are
compiled directly from source code by
ourselves although some of them need to
be modified a little to cope with the
differences between BSD and linux
it is not difficult if you have some
experience with c programming after the
basic system is up we spend a lot of
time to optimize the performance so as
explained in the previous slides we focus
on improving data i/o speed for each xserve
node we stripe the two local hard disks to
build a raid 0 set so that it provides
240 gigabytes of local storage at an
average speed of 66 megabytes per second
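A back-of-envelope comparison suggests why local disk throughput matters here. The 66 megabytes per second figure is from the talk; the 10 GB database size and the single file server sharing one gigabit link across all 75 nodes are assumptions made purely for illustration.

```python
def read_seconds(size_mb: float, throughput_mb_s: float) -> float:
    """Time to stream size_mb of data at a sustained throughput."""
    return size_mb / throughput_mb_s


LOCAL_RAID_MB_S = 66.0          # average speed quoted in the talk
GIGABIT_PEAK_MB_S = 1000 / 8    # theoretical peak of one gigabit link, 125 MB/s
NODES = 75                      # nodes assumed to read at the same time
DATABASE_MB = 10_000            # hypothetical 10 GB blast database

if __name__ == "__main__":
    local = read_seconds(DATABASE_MB, LOCAL_RAID_MB_S)
    shared = read_seconds(DATABASE_MB, GIGABIT_PEAK_MB_S / NODES)
    print(f"local raid 0 read: {local:.0f} s")
    print(f"shared gigabit link, 75 readers: {shared:.0f} s")
```

Even at theoretical peak, a single shared link divided among every node is far slower per node than the local stripe set.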
we store most of the blast databases
locally on each node to reduce network traffic
and connect all the xserve units with
gigabit ethernet on software level we
try to find the MPI enabled version
to replace the normal version if the
application itself supports multi CPU
execution we instruct the users to run
with proper options for example to
specify -a for NCBI blast so that it
runs in multi-threaded mode in addition
to these efforts we also optimized at
the compiler level with proper GCC
options a lot of bioinformatics tools can
speed up by about forty percent if they
were originally made with the default
configuration different from other
simple biology analysis our sis coach
genome annotation involves running a
series of programs for each of the 66,000
sequence pieces each step of the analysis
must be automated so that the entire
process won't stop in the middle
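The idea of running a fixed series of steps on every piece without letting one failure halt the whole run can be sketched like this. It is a minimal illustration only, not the design of any particular workflow tool; the step functions and piece names are hypothetical.

```python
from typing import Callable, Dict, List

# A step takes the output of the previous step and returns its own output.
Step = Callable[[str], str]


def run_pipeline(pieces: List[str], steps: List[Step]) -> Dict[str, str]:
    """Run every step in order on each piece; record failures instead of stopping."""
    status: Dict[str, str] = {}
    for piece in pieces:
        data = piece
        try:
            for step in steps:
                data = step(data)
            status[piece] = "done"
        except Exception as exc:  # keep going with the remaining pieces
            status[piece] = f"failed: {exc}"
    return status


def mask_repeats(seq: str) -> str:
    """Stand-in for a real masking step."""
    return seq.lower()


def similarity_search(seq: str) -> str:
    """Stand-in for a real search step; rejects empty input."""
    if not seq:
        raise ValueError("empty sequence")
    return seq + ":hits"


if __name__ == "__main__":
    print(run_pipeline(["ACGT", "", "GGCC"], [mask_repeats, similarity_search]))
```

Failed pieces stay in the status table so they can be resubmitted later while the rest of the run carries on.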
biopipe is an open source workflow
management software maintained by the open
bio community it was designed to
address some of the complex issues in
large-scale biology analysis our group
contributes to the project and uses
biopipe to manage our genome annotation
project biopipe is entirely written
in perl and the Mac OS 10 developer tools CD
provides all the necessary tools we
needed for development this screenshot
shows the job status in our cluster in
April 2004 there are more than 40,000
jobs in the queue and all the hundred
are running this is the situation we
need to deal with almost every day
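A queue that deep typically comes from submitting one job per sequence piece and analysis step. Below is a minimal sketch of how such submissions might be generated for Platform LSF with a multi-threaded NCBI blast step. The database name, file layout, and job names are hypothetical, and this is not a description of how Biopipe itself builds its jobs; it only uses the `bsub` options `-J`, `-n`, `-o` and the legacy `blastall` options `-p`, `-d`, `-i`, `-o`, `-a`.

```python
import shlex
from typing import List


def blast_job(piece: str, database: str = "genome_db", threads: int = 2) -> List[str]:
    """Build a bsub command that runs blastall on one sequence piece."""
    blast = [
        "blastall", "-p", "blastn",
        "-d", database,            # hypothetical locally stored database
        "-i", f"{piece}.fa",       # hypothetical input file naming
        "-o", f"{piece}.blast",
        "-a", str(threads),        # number of processors blast may use
    ]
    submit = [
        "bsub",
        "-J", f"blast_{piece}",    # job name
        "-n", str(threads),        # slots requested from LSF
        "-o", f"{piece}.lsf.log",  # LSF output log
    ]
    return submit + blast


if __name__ == "__main__":
    for piece in ["piece_00001", "piece_00002"]:
        # Print the command lines instead of submitting them.
        print(shlex.join(blast_job(piece)))
```

Printing the commands first makes it easy to inspect a handful before handing tens of thousands of them to the scheduler.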
we use platform lsf to manage the
thousands of jobs generated by biopipe
effectively lsf is the most robust
distributed resource management software
we have ever used with mac OS x server
and lsf we are able to perform large
scale biology analysis without worrying
about system stability setting
up a cluster is a one-time task and
maintenance is the administrator's
everyday work luckily we have a few
effective tools that help us a lot in
daily system administration one tool i would
like to mention is server monitor it
took us only two hours to set up the
server monitor so that it provides an
overview for all the 75 xserve units we
only need to configure the monitoring
server with the IP address of each
cluster node and the administrative
account the server monitor retrieves all
the important hardware information for
us in a few seconds if we were using
other UNIX systems the administrator would have
to manually log into each node for
configuration which would take much
longer to complete server monitor also
features hard disk pre-failure warning that
is very useful for us to quickly
identify the disk with potential
problems and we also use server monitor
to collect remote information such as
serial number or mac address for each
network adapter another important GUI tool
is apple remote desktop that
enables the administrator to operate a
remote machine as if it is local this
tool is needed for headless xserves
especially the new xserve g5 without vga
card the most charming feature of
apple remote desktop is the ability to
install software
by drag and drop simultaneously on multiple
nodes we find this feature extremely
useful for us during cluster wide
system upgrades we are able to update the
64 cluster nodes to a newer version within
13 minutes our previous experience of
managing a ribbon a size alpha system
involves doing updates from command line
and it took us at least half day for the
same task there are other command line
tools we use frequently to
facilitate cluster management we
are gratified Panther has great support
for command line tools
almost every GUI application has
the command line interface accordingly
just to mention a few of my favorite SSH
is used to log into the remote node
every day bash has been set as default
in Panther rsync is the core utility
for data synchronization and we use
dsh for distributed shell so to summarize
our experience with xserve cluster
in TLL basically the xserve unit
provides the superior computational power we
expected the cluster was quickly set
up and we are able to run and optimize
most of the bioinformatics tools the
entire cluster is robust for our genome
annotation project and the daily
administrative work is made easy with
sophisticated Mac os10 monitoring tools
and open source command line tools thank
you
well thank you very much and I'm just
going to point out a few places you can
get more information while we're
bringing some of the Apple people up
here for Q&A and in terms of questions if
you could please use the microphones if
you've got any questions in terms of
contacts Liz Robert kara our science
partnership manager and Elia Stupka for
TLL bioinformatics program manager or
Cheng pan senior system engineer other
resources Liz mentioned the website
there's the apple science websites and
other related sessions you may be
interested in I just want to mention
specifically the science lounge on the
fourth floor you should check that out
there's going to be round table
discussions ongoing throughout the
conference there all right so let's take
let's take the first question over here