WWDC2004 Session 618
Transcript
Kind: captions
Language: en
Good afternoon. This is session 618, Distributed Computing Made Easy with Xgrid, and I'm David Creamer, the engineering manager for the Xgrid group. We're having a little bit of fun with the names down at the bottom: there are two of us in the Xgrid group, David Creamer and David Kramer. There'll be a test at the end, so don't get us confused. This is the first session after lunch, so we're going to do a little bit of an exercise: how many folks here have downloaded and/or played with Xgrid? Raise your hand. How many folks have not downloaded or played with Xgrid? OK, so we've all just done a distributed computing problem, an embarrassingly parallel one, and we'll get to that. Thank you very much for the input.
So, what we're going to talk about today: Xgrid is a solution for cluster computing, and we believe it helps make distributed computing easy. Although, full disclosure, easy is a bit of a relative term, of course, it is significantly easier than it's been in the past. Most importantly, you can start today, using the technology preview of Xgrid as well as what's coming with Tiger, to grid-enable your applications and your workflows, whether you're an engineer or a user of the technology. Today we're going to talk about the system architecture, give a general overview of the capabilities and how Xgrid works, talk about the new APIs that we'll be introducing with Tiger, the Grid Foundation APIs, and then discuss how you can use those in your applications, as well as some of the other components of Xgrid, to grid-enable your workflow. So let's get started with the overview.
We had a couple of goals with Xgrid. First, we wanted to make distributed computing painless and easy, and by easy we mean easy to install, configure, and use. You obviously have to have enough of a brain to understand some of the concepts, but you shouldn't need that much brain, or fingers, to go through all the pain of setting up ten different computers, or a thousand different computers; we want to take care of the hard things for you. Secondly, it should be non-intrusive. One of the goals we have with Xgrid is to be able to do what we call desktop recovery, that is, to take advantage of some of the unused CPU resources that are sitting around your environment. And if that caused your friend's or colleague's or student's computer to crash, or if they noticed that something was wrong or the behavior was a little bit different, that would be a bad thing, so we want to make sure we don't mess that up, obviously. And we want to solve a range of problems with a range of architectures; as we sort of glibly say, from Beowulf to SETI@home, that is, from pretty massively parallel kinds of problems to very highly distributed problems, and we'll go through a couple of examples of that. Our target customers for these products are really scientists, engineers, and creative professionals. Those are the people who can really take advantage, because they have the kinds of problems that take a long, long, long time to execute and tend to be somewhat parallelizable; again, we'll talk about those problems in a moment. In support of those folks are typically the software developers and system administrators who need tools to help them solve their problems. So hopefully you fit into one, or perhaps more, of those camps; if not, hopefully you'll learn something here that will be just fun and interesting anyway.
We solve two primary classes of problems with Xgrid. First of all, what we call embarrassingly parallel problems: problems that are fairly easy to divide, where I have the same executable that I just want to run on lots of different machines with slightly different, or maybe vastly different, inputs, and then I want to collect the outputs together. Maybe I'm looking for a single output, sort of a needle-in-the-haystack kind of problem, or I want to collapse together all the different outputs, or I just want to go off and process something and get the result back. A good example: I have a batch image filter and thousands of images I want to apply that same filter to, maybe to make thumbnails, for example, or something like that. I can use the services of Xgrid to distribute that work across a large number of computers so I don't have to sit and wait for it to get done. That's what we call embarrassingly parallel problems. The second class of problems that Xgrid is good for are tightly coupled problems. These tend to be more in the science domains, physics and simulation kinds of problems. These are problems where you have to think really hard about how to parallelize your algorithm, and then use tools like MPI (we'll talk a little bit about that later as well) to enable the massively parallel communications between different nodes in the cluster. So this is not "I have an image and I just want to transform it"; this is "I have a simulation problem where this proton is hitting that proton is hitting this proton, and I want to make sure that I get the math right across a large number of computers." We can help you there too; that's the message. Thank you.
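The embarrassingly parallel shape can be sketched in a few lines of plain Python. This is not the Xgrid API; a local worker pool stands in for the agents, and `make_thumbnail` is a hypothetical placeholder for the real filter executable:

```python
from concurrent.futures import ThreadPoolExecutor

def make_thumbnail(image_name):
    # Hypothetical stand-in for the real filter executable; with
    # Xgrid, each call would instead be an independent task doled
    # out to whichever agent is free.
    return image_name + ".thumb"

def batch_filter(images):
    # Embarrassingly parallel: the same code, different inputs, no
    # communication between tasks; just collect the outputs at the
    # end. Local threads stand in for remote agents in this sketch.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(make_thumbnail, images))

print(batch_filter(["fish1.png", "fish2.png", "fish3.png"]))
```

The key property is that the tasks never talk to each other, which is exactly what lets the controller scatter them across any available machines.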
So how do we accomplish that? What's the basic functionality that you'll get as part of Xgrid, and that you do get now in TP2? First and foremost, we collect all the nodes and group them together into a cluster. We should back up for a second: we use the terms grid and cluster somewhat interchangeably. In the more in-depth grid and cluster world there are slightly different definitions that may be important to you; in this case, we're just talking about perhaps different kinds of computers. In any case, we gather them all together, identify them, and bring them together in one spot so that you can start adding work to each of those computers. Then we maintain a queue of jobs, and perhaps subtasks of each job, that we're going to be doling out to each of those computers. We manage and monitor the availability of those computers, and then of course dole out the tasks to each of them as the fastest one becomes available; David will talk a little bit more about scheduling. And we make sure that the right executables are on the right nodes. That's an important task that's often overlooked. It's interesting, if you actually step back and look at what people who run clusters do: a relatively small percentage of their time is spent doing the interesting parallel cluster work, and a very large percentage of their time is spent doing almost mind-numbing system administration, because they've got to do it on node 1, and node 2, and node 3, and node 4. So we really want to help you with that, and one concrete thing we do is distribute, if you want it, the executables and the datasets to all of the nodes that are going to be doing the computation. And then, of course, we gather up the output and the results from each of those nodes and bring it back to the client machine, and we'll talk a little bit more about how that works too.
As for the history of how we got to this point: as you can see here, we released Technology Preview 1 and then, fairly recently, Technology Preview 2. We've been trying to develop this project somewhat in the open and gather feedback from the community, because many of our customers have very specialized kinds of tasks, and we want to make sure that the tool is applicable to those tasks. Today we're at WWDC, and we'll be talking about features of Xgrid that will be part of Tiger, and Tiger Server especially.
So this is an example of something that I did when I first came to the Xgrid team. I wanted to get to understand Xgrid, and I had a few computers lying around that I'm sort of glibly calling my el cheapo cluster. This is a little representation of what I found in my cube and in the cube next to me: a PowerBook, a couple of G4s, and a dual G5. I wrote a few lines of shell script that would call Blender, which is a 3D animation and modeling program, take the input file, which is this little fish that you saw running along, and say "render this," and it chopped the render up into different pieces; that's my code to do the chopping. The end result was that I was getting about three times the number of frames per minute of render with my el cheapo cluster, and that's the characterization of the cluster. This took me, honestly speaking, about a half day, three quarters of a day (my boss is right there, so maybe less than that), mostly playing around with the fish animation to get it about the way I wanted. Then I just took the resultant little snips of movies, dragged them into iMovie, and there was my rendered movie, and that's what you saw. So it really was pretty well that simple, for a simple task like that; I didn't have to think too hard.
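The frame-chopping part of that shell script can be sketched like this. This is a hypothetical reconstruction, not the original script; it only computes the per-task frame ranges that would be handed to each Blender invocation:

```python
def chop_frames(first, last, pieces):
    """Split an animation's frame range into contiguous chunks, one
    per task. Returns (start, end) pairs covering first..last
    inclusive, with sizes as even as possible."""
    total = last - first + 1
    base, extra = divmod(total, pieces)
    chunks, start = [], first
    for i in range(pieces):
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # more pieces than frames; skip empty chunks
        chunks.append((start, start + size - 1))
        start += size
    return chunks

# e.g. a 250-frame animation split across 4 machines:
print(chop_frames(1, 250, 4))
# → [(1, 63), (64, 126), (127, 188), (189, 250)]
```

Each chunk then becomes one independent render task, and the resulting movie snips are concatenated at the end, as described above.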
Now, if you want to get more complex and do more interesting things, you're going to have to learn about the architecture and also the APIs of Xgrid, and for that I want to bring up David Kramer, who is the engineer who wrote just about all of it.
Hi, I'm David Kramer, Xgrid engineer. The Xgrid system architecture has three tiers: there's a client, a controller, and agents. The agents are the ones that do the work; the client is the one that has the work to be done; the controller is there in the middle to make sure everything's going the way it's supposed to go. The way it works is that the client submits a job to the controller, and the controller takes this job and splits it up into tasks. This isn't really that fancy or exciting; a job is basically just a list of tasks. It splits the job up and starts scheduling it on the available computational resources, the agents. The agents start to compute the tasks, and as they finish, the results go back to the controller, and the controller collects them all and sends them back as the job results to the client.
So why have three tiers? Traditionally you might imagine having two tiers, where you have a client and the computational agents, and you send out the work and you get the results. But there are situations in which that's not the way you want it to work, especially if you have clients that aren't always online, for example laptops. If you submit a job that's going to take a day to run, you don't want to have to leave your laptop plugged in when you'd rather take it home. The controller is there to keep track of what's going on, make sure it all gets done, and collect the results for you, so when you come back the next day you can get them. It also allows you to have multiple agents and multiple clients, and they all find each other by way of the controller. It also supports a variety of distributed computing styles: in one case you could have a dedicated rack of Xserves that just sit in the closet and are always being used for computational tasks, or you might just have some iMacs and some laptops sitting around that are sometimes plugged in, sometimes used by their users, and sometimes just sitting there idle, and so you can recover those resources at night, or at lunch, or whenever they're not being used.
The first tier is the client. You can either create an application using the Cocoa framework and have that be a client (and with TP2 there is an application called Xgrid that is that kind of client), or you can use the command-line tool, xgrid, to submit jobs, monitor the grid, and get your results. That's basically all there is to it: submitting, monitoring, and retrieving. The agent, on the other side of the three tiers, is a background daemon. It is configured with preferences; there's a plist file, and there are scripts to start and stop the agent. You can run it in dedicated mode, like on an Xserve, or you can run it in screensaver mode, in which it only accepts computational tasks when the computer is idle.
The controller in the middle is also a background daemon; it's the server. It listens on a socket and accepts connections from agents and clients. It handles all the resource management and scheduling: it monitors the agents, accepts the jobs, splits them up, submits them to the agents, collects the results, and then returns them to the client. There's also a limited controller built into the client framework. With it, you don't get the benefits of three tiers, since you've reduced it down to a two-tier system, but what you gain is that you don't need a dedicated controller, so you can just walk up to a network that has some agents on it and start submitting jobs and getting results. But again, if you need to go home and take your laptop with you, you're not going to be able to continue doing the work, because there's no dedicated controller.
The controller does all the scheduling, as I said, and the scheduler knows about files, jobs, tasks, and nodes. It schedules the jobs as they come in: they sit in a queue, and it takes the one off the top of the queue, looks at what tasks need to be done, and sends them off to the fastest available computers. It's also fault-tolerant, which means that if one of the agents goes offline while it's doing the work, the controller will notice and will reschedule the work on the next available agent. The scheduler also handles dependencies, which means that jobs and tasks can depend on arbitrary URIs, Uniform Resource Identifiers; each job, each task, and each data set is given a unique resource identifier. So in the first example here, you can have a job depend on another job: you have a second job with two tasks, and none of those tasks will run until all three tasks of the first job complete. In the second case, you have one job, but it has these five tasks. The first task runs; this might be a pre-processing task. Then the next three tasks run once that one completes; those are doing the work. And then the final task won't run until all the rest of the tasks finish, and that could be your post-processing stage.
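A minimal sketch of that dependency idea (not the controller's actual scheduler): a task becomes runnable only once everything it depends on has finished, and tasks within the same batch are free to run in any order or in parallel:

```python
def run_order(dependencies):
    """Given {task: set of tasks it depends on}, yield batches of
    tasks whose dependencies are all satisfied. Raises on cycles."""
    done = set()
    pending = {t: set(d) for t, d in dependencies.items()}
    while pending:
        # Everything whose dependencies are all done is ready now.
        ready = sorted(t for t, deps in pending.items() if deps <= done)
        if not ready:
            raise ValueError("dependency cycle")
        for t in ready:
            del pending[t]
        done |= set(ready)
        yield ready

# The second example from the talk: one pre-processing task, three
# workers, one post-processing task (task names are illustrative).
deps = {"pre": set(),
        "work1": {"pre"}, "work2": {"pre"}, "work3": {"pre"},
        "post": {"work1", "work2", "work3"}}
print(list(run_order(deps)))
# → [['pre'], ['work1', 'work2', 'work3'], ['post']]
```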
So the Xgrid software architecture looks like this: you've got the daemons and the applications up top, and they're all built on the XG framework and the Xgrid protocol. That, in turn, is built upon the Blocks Extensible Exchange Protocol (BEEP, RFC 3080), and all of this sits on top of BSD sockets, the Core Foundation library, and the Foundation framework.
In Xgrid, various aspects of the communications are important to note. The controller advertises via Rendezvous, and the controller is the only tier that actually accepts connections; it's the server. The agents and the clients sit around browsing for services, and when an agent finds the controller it is configured to connect to, it connects automatically. The agent and the client can each be configured to connect either to a Rendezvous service name, or to a standard host name or IP address; so you can do this on a local subnet with Rendezvous, or you can do it across a wider network. Another thing to note is that the agents and the clients leave the connection open after they've connected to the controller, and this allows for asynchronous peer-to-peer communication: any time an event occurs on the job, for example when the job completes, the client is immediately notified; there's no polling. The controller also stores and streams the data from the agent to the client. This means that as the agent is running the task and generating output, that output gets streamed back to the controller while the task is still running, and is then, in turn, streamed back to the client, so you can start getting results before your tasks have completed.
Then one more note, about MPI; it's a little bit tricky. When we run an MPI task with Xgrid, you don't know ahead of time which computers you're going to run on, because that's managed by the controller. So you just submit the job, the controller schedules it, and at that point there's communication between the nodes to find each other. The master node collects all the IP addresses and saves them to disk, and then runs the actual MPI executable, which uses that configuration file to connect to all the rest of the processes running on the other agents.
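The machines-file step can be sketched like this. The addresses and file name are hypothetical, and the real file format depends on the MPI implementation in use; the point is only the shape of the handoff:

```python
import os
import tempfile

def write_machines_file(addresses, path):
    # The master node gathers the agents' addresses once the
    # controller has placed the tasks, and writes them one per
    # line; the MPI executable then reads this file to connect to
    # the processes on the other agents.
    with open(path, "w") as f:
        for addr in addresses:
            f.write(addr + "\n")
    return path

# Hypothetical addresses reported by three agents at schedule time:
agents = ["10.0.1.11", "10.0.1.12", "10.0.1.13"]
machines = write_machines_file(
    agents, os.path.join(tempfile.gettempdir(), "machines_demo"))
print(open(machines).read())
```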
So there are two models for security in Xgrid. The first is the ad hoc model, and this is what you see in TP2: password-based mutual authentication. With this, if you want to have someone join your grid, you can tell them, or they can tell you, the password for their agents, and then you can allow your controller to connect. The second model is managed security, and this is where all the components, the client, the controller, and the agent, are bound to an Open Directory administrative domain, so it's the same set of users on all these computers. In both of these models, the agent and controller can be protected, and what we're protecting against here is unauthorized use: an agent can be configured to only allow connections from trusted controllers, and the controller can be configured to only allow connections from trusted clients. In that way you are assured that you're not doing work for someone who's not authorized to use the resources.
In ad hoc security, the communication occurs in the clear, although it may be encrypted by the task itself, but at no point are passwords sent in the clear; a two-way random protocol is used to make sure they're unintelligible to packet sniffers. The agent runs tasks as an unprivileged user, because there's no notion of shared users here. What this means is that if you're running computational tasks on an agent in ad hoc security mode, you don't get any connection to the window server, you don't have a home directory, and you only get world access to the files on the disk. You can write into /tmp and read various public system configuration files, but you don't have the ability to read private files in home directories. In the managed security case, all the components, as I said, are bound to Open Directory, and so the agent can run tasks as a privileged user. This is useful for running the tasks as the person who initially submitted them. It requires delegation of credentials, but it allows the tasks to access home and network directories as the user who submitted them, so you don't have to make all of your files world-readable and world-writable to have them be used in a distributed computation.
Next, I'd like to talk about how to actually develop using the Xgrid APIs. First, there's the xgrid command-line tool. You can use it from shell scripts, or from your applications in Cocoa using NSTask, any way you want to do it. What you do is factor your computational code into a command-line executable, and then you use xgrid to submit a job. You can include the executable with the job if it's not already installed on the remote computers, and then you can collect the results, all from the command line; I'll illustrate with an example in a moment. That is how the Blender example that David talked about was done. You can also integrate Xgrid into your application, so you don't have to write a new application or a shell script; you can just link in the Cocoa framework and then use it to distribute tasks when grids are available, monitor the status of your work, and retrieve the results. So, as you've probably caught on to by now, the life cycle of a job is that you submit it, you monitor it, and you get the results, and that's the theme of the talk here today.
So, the command-line example. First you submit the job; in this case we're running the cal program, which prints out a calendar, and we're passing it June 2004. You submit it, and what gets returned is a job identifier. Then, with that job identifier, you can retrieve the status of the job. In this case we get back the status, and we see when the job was started, when it was submitted, and what its state is; here, the job has already finished. It didn't take very long; it is just printing out a calendar, after all. So we'd like to retrieve the results, which we do like this: again, you use that same identifier that was returned at submission, and it just prints the results to standard output here. And then you delete the job. The job sticks around because, in case there's an error retrieving the results, it doesn't get automatically deleted; you have to explicitly delete it yourself.
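That submit/status/results/delete life cycle can be sketched as the following helper. The flag spellings here are illustrative, modeled on the session's description rather than the tool's exact syntax, so check the xgrid tool's own usage message before relying on them:

```python
def xgrid_lifecycle(executable, args):
    """Build argument vectors for the job life cycle just shown:
    submit, then status/results/delete by job identifier. The flag
    names are illustrative, not the tool's verified syntax."""
    submit = ["xgrid", "-job", "submit", executable] + list(args)

    def with_id(job_id):
        # Every later step reuses the identifier returned by submit.
        return {
            "status":  ["xgrid", "-job", "attributes", "-id", job_id],
            "results": ["xgrid", "-job", "results", "-id", job_id],
            "delete":  ["xgrid", "-job", "delete", "-id", job_id],
        }

    return submit, with_id

submit, with_id = xgrid_lifecycle("/usr/bin/cal", ["6", "2004"])
print(submit)
print(with_id("42")["results"])
```

These vectors could then be handed to a process launcher (NSTask from Cocoa, or `subprocess` here) one step at a time.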
The Grid Foundation API is a number of classes, and I'll talk about them in detail in a minute, but this is the big list. The one to notice here is XGResource, which is the base class for a lot of these other classes. The XGResource classes all represent a remote resource that you're monitoring with your application, so they act as proxy objects: they aren't the actual job, they aren't the actual node that lives in the controller; they are your view onto those things.
Using the client APIs, what you begin with is this: you connect using the XGConnection object, and you authenticate using the XGAuthenticator object. Then you create a job, first by creating a specification of what that job entails, which would be the command, the arguments, and what files it depends on. Then you create a submission using that specification and submit it, and if the submission succeeds, you receive back an XGJob object. You can then monitor the job using the action monitors and the related objects, the update monitors, and you use these to get your asynchronous callbacks to find out when the job has changed. Finally, when you're ready to download the results, you use the XGFileDownload object, which allows you to retrieve them.
First of all, you want to connect and authenticate. This uses the connection, and we use a subclass of XGAuthenticator in the technology preview called the two-way random authenticator. First you create the connection; in this case we're using a hostname and a port number, where 0 means use the default port. You can also use an NSNetService if you're using Rendezvous and you've browsed for the service. Then you create an authenticator, set the username, and set the password (you might get these from the keychain), and then you set the authenticator on the connection and tell it to open, which we do right here.
Before you do that, you create a grid controller, which is a subclass of XGResource, so it's your view into the controller that's running on another computer. You create the controller object, you set yourself as the delegate so that you get callbacks when interesting events occur, you set the connection on the controller when you initialize it, and you open the connection. Then, when you'd like to find out what the grid controller is doing, what tasks it's running, what jobs it's running, what file sets it's aware of, which nodes it's aware of, you tell the grid controller to update, and you get back an update monitor object. When you'd like to know whether that update has succeeded or failed, your delegate gets a callback. So here's the callback, the update monitor's "resource did update" method that your delegate should implement. In this case we check that the resource is the grid controller, and if so, we see if the grid controller is available, and if it is, we call a method, which I'll describe in a minute, submitJob. That isn't part of the API; it's a method that you would implement yourself.
When you submit a job, you use the specification object, XGSpecification, you use XGSubmission, and again you use the XGController in many places. To create a specification, first you need the job info dictionary, and rather than showing you code on how to create this dictionary, I'm showing you the actual property list here. The info dictionary contains a job dictionary, which holds information about the default task; these are parameters that apply to all the tasks. Then you have a list of your tasks. In this case we want all the tasks to use cal as the executable, but each task should have a different set of arguments; here we're getting June 2004 and June 2005. The type of this job is unordered: the scheduler is free to run the tasks in any order, simultaneously or sequentially, whatever it needs to do to make the best use of the resources.
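The property list being described looks roughly like the fragment below. This is a reconstruction for illustration only; the exact key names in the TP2 job-info dictionary may well differ:

```xml
<dict>
    <!-- defaults that apply to every task -->
    <key>taskDefaults</key>
    <dict>
        <key>command</key>
        <string>/usr/bin/cal</string>
    </dict>
    <!-- each task overrides just its arguments -->
    <key>tasks</key>
    <array>
        <dict>
            <key>arguments</key>
            <array><string>6</string><string>2004</string></array>
        </dict>
        <dict>
            <key>arguments</key>
            <array><string>6</string><string>2005</string></array>
        </dict>
    </array>
    <!-- unordered: the scheduler may run tasks in any order -->
    <key>type</key>
    <string>unordered</string>
</dict>
```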
With this job info, we create a job specification; you can imagine that this info dictionary is what's returned by the jobSpecificationInfo method in the second line of this method. You create the specification object using a type, the info, an application identifier, and application info. The application identifier and application info are completely uninterpreted by Xgrid; they're just there so you can filter jobs, so that you only look at jobs that you've submitted and don't have to worry about jobs submitted by other clients. The application info would be information about the job that's not relevant to how Xgrid uses it, but is relevant to how you might want to process it when it's done. Using the job specification that we created in that method, you then create a job submission and the job submission monitor, and you set yourself as the delegate on it, so that when the submission succeeds or fails, you get a callback.
Here we see the callback again; it's the submission monitor's "resource did submit". We check that it's the submission monitor we're expecting, and we take the resource, which we expect to be a job in this case, because we were submitting a specification with a type of job. With this job, we tell it to update continuously: we don't want to just find out what it's doing right now, we want to know everything that happens to it from now until we don't care anymore, until the job is done. So we get a job update monitor object, and we set ourselves as the delegate.
So we've now submitted the job, and it's time to monitor it to see what happens. We're going to use the action monitor classes, and the job, task, node, and node list objects, to watch what's going on with the grid while this is running. The grid controller object can be used to get a list of all these resources; the node list object is used to get a list of the nodes. A node list, you might say, is kind of like a virtual cluster: it's just a collection of nodes that are meaningfully grouped together, so you can submit jobs to run on a specific node list, or you can submit jobs to run on all of the nodes. The XGNode object contains the information about a node, such as how many processors it has, how fast it is, that kind of thing. When you're monitoring a job, you of course use the submission object to first get the job, then you use the job object to get the list of tasks, which is available once the job has started running. Then, with these XGTask objects, you wait until they're done executing, and then you can retrieve the results.
First, waiting for the job to update: again we use the update monitor's "resource did update" method, and here we check whether the resource is the job, and if the job is finished, we retrieve the tasks. This is what retrieving the tasks looks like: you ask the job for its list of tasks, then you walk through that list and ask each task to update, and you'd want to set the delegate on those too. Then, when you get the callback, you're going to want to retrieve the results, and for that you're going to use XGFile and XGFileDownload.
As I mentioned before, you get the standard output and standard error streams from the task, and these are treated as files too, special files with no attributes, so you still use the XGFile object to retrieve these streams. An XGFile is a reference to a file's attributes and content; it's not the actual content itself. To get the content, you use the XGFileDownload object, which lets you do an asynchronous download of the file from the controller to the client, either into memory or onto disk. You might want to download into memory if you want to do further processing on the data or display it to the user immediately, and you might want to save it to disk if it's really large; if you're downloading 4 gigabytes of output, you probably don't want to load it all into memory just so that you can save it to disk later, so you can download directly to disk. The XGFileDownload object can also be used to begin downloading the standard output of a task that hasn't completed yet. It won't stop downloading if there's no more output ready but the task is still running; it'll just wait for more output.
Here's an example of how to start downloading files. First you get the output files from the task, and then you enumerate them and walk through them, but before we do that, we create a base path; this is where we're going to save all the results. For each file, we create a file path using the path of the file, which in most cases is probably just a file name, and we append that to the end of the results directory path. Then we create a file download object with the output file, set a delegate, and set a destination for where we want this download to go. Finally, we need to retain the file download object, so we have a set here, and we just add the file download to the file download set, so that we properly manage our memory. When a file download finishes, the delegate gets this callback, so we make sure it's one of the file downloads we're paying attention to, and we remove it from our set. Then, if there are no more file downloads left, we know that we've successfully downloaded all the files, and we're ready to notify the user that the job results have all been saved.
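That delegate bookkeeping reduces to a small pattern, sketched here in language-neutral form (this is not the XGFileDownload API, just the retain/remove/notify logic described above):

```python
class DownloadTracker:
    """Mirror of the delegate bookkeeping just described: retain each
    in-flight download in a set, drop it when its callback fires, and
    report completion once the set is empty."""

    def __init__(self):
        self.pending = set()

    def start(self, download_id):
        # Retaining the download object so it isn't deallocated
        # mid-transfer corresponds to adding it to this set.
        self.pending.add(download_id)

    def did_finish(self, download_id):
        # Ignore callbacks for downloads we aren't tracking.
        if download_id not in self.pending:
            return False
        self.pending.discard(download_id)
        # True once every file has been saved: time to notify the user.
        return not self.pending

tracker = DownloadTracker()
for name in ["out.txt", "err.txt"]:
    tracker.start(name)
print(tracker.did_finish("out.txt"))  # False: one download still pending
print(tracker.did_finish("err.txt"))  # True: all results saved
```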
At this point, I'd like to turn it over to Charles Parnot to talk about actually using Xgrid with biochemical models.
[Applause]
I would like to thank David for giving me the opportunity to talk here. I want to present to you today some of the work that I'm doing in Brian Kobilka's lab at Stanford University. I will start with a short introduction, with quite a bit of biology, to explain our scientific goal and the kind of biophysical studies we are doing to achieve these goals, and then I'll turn to the challenge that we're now facing in terms of data analysis, and how we use Xgrid to try to deal with it. You probably all know that the brain regulates heart rate by releasing adrenaline, particularly when you're a little stressed, like I am right now. But what you probably don't know is that the adrenaline is actually released by the neurons at the surface of the heart cells, where it is recognized by the beta-2 adrenergic receptor, and this is the receptor we are interested in. This receptor is a protein with a very precise 3D shape, and here is a schematic of it. On the top is the outside of the cell; that's where the adrenaline comes from. When the adrenaline binds, the receptor goes through a series of changes in shape, in 3D structure, through a series of states, and we really want to understand this process, because it is what ultimately regulates the heart rate, and if we understand these changes, we'll be able to develop new drugs for heart disease.
The questions we're asking are quite simple: is this model true, and if it is, how many states do we have, and how fast are the transitions? The way we try to answer these questions is by using fluorescent probes that we can attach at a very precise location in the receptor. The nice thing with this approach is that each state then has a different brightness, one state, one brightness, so we can monitor the transitions. Imagine you put several trillion of these molecules in a tube: you're measuring the average response of this whole population of receptors. Here is some real data obtained in the lab, with fluorescence as a function of time, where you add the drug at time zero, and you can generate more data by using different concentrations.
the model, the model I showed you before: what we want to do is take this data, in green, and try to fit it with the model, in red. For that we have an OS X application that we have written to do the simulation and the fitting, but the problem that we rapidly faced is that we have really many parameters to fit: for each state, as I told you, we have a different brightness, and then for each reaction, back and forth, we have a different rate. So here's the problem: we have many parameters to fit, and when you do a fit like this you want to give the computer an initial guess, initial values that are not too far from the actual best values that you can find, and this is very difficult with so many parameters. So it boils down to one simple problem: we have a very large parameter space that somehow we need to scan, and this means a lot of computer time. And to address this problem, of
course, we opted for Xgrid to dispatch the workload to several computers. And the reasons why we turned to Xgrid: first of all, because we are no cluster experts, so that saves us a lot of time, because Xgrid is really easy to install and to use; then, we have a typically embarrassingly parallel problem, because we just want to run many independent fits, each with different starting values; and finally, we already had a Mac OS X application written, so that meant very little additional code to write, with such a familiar API. This application has a Cocoa-based graphical front end, and there's also a terminal version of it that we bundled into the application package.
So here is how the tasks were designed to run on Xgrid. Each parameter is sampled over a reasonable range of values, then each combination of the parameters constitutes a set of starting values for one fit. Each task, for one Xgrid agent, typically consists of about a hundred fits, and the positive results can be selected based on a threshold set by the user.
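As a rough sketch of that task design, with parameter names, ranges, and the threshold all invented for illustration, sampling each parameter over a range, turning every combination into one set of starting values, and grouping about a hundred fits per task might look like this:

```python
import itertools

# Hypothetical parameter ranges: a brightness per state and a rate per
# transition, four sample values each, purely for illustration.
ranges = {
    "brightness_1": [0.2, 0.4, 0.6, 0.8],
    "brightness_2": [0.1, 0.3, 0.5, 0.7],
    "rate_forward": [0.01, 0.1, 1.0, 10.0],
    "rate_reverse": [0.01, 0.1, 1.0, 10.0],
}

# Each combination of sampled values is one set of starting values for one fit.
names = sorted(ranges)
starting_sets = [dict(zip(names, combo))
                 for combo in itertools.product(*(ranges[n] for n in names))]

# One task per Xgrid agent, about a hundred fits each.
FITS_PER_TASK = 100
tasks = [starting_sets[i:i + FITS_PER_TASK]
         for i in range(0, len(starting_sets), FITS_PER_TASK)]

def is_positive(fit_result, threshold=0.05):
    """Select positive results based on a user-set threshold (invented)."""
    return fit_result["residual"] < threshold
```

Here 4 parameters with 4 sample values each give 256 starting sets, which chunk into three tasks; the real sweep is much denser, which is exactly why it needs a grid.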
We first implemented this using Technology Preview 2 with the plug-in architecture, so we had to use a different format for the jobs, which could be created inside the main application, then read and processed by the plug-in, saved back to a file, and then displayed and analyzed further in the main application.
Now we're implementing the new version of it with the Xgrid framework, which makes it much simpler, with a much better integration of the job control with the Xgrid code. We have this command line tool that is bundled in the application package that we can use to submit a job that contains this executable, together with a temporary file that contains a description of the fit that we want to run. Of course, that's just one job, and what we do is send several hundreds of these jobs to Xgrid, which then takes care of dispatching the jobs to all the agents, as David showed you, and Xgrid also takes care of retrieving the successful jobs and sending them back to the application for further analysis and to display the results. So far, using Xgrid, we've been
able to test really extensively those two models, which we call the three-state model and the four-state model. It was a big surprise for us, because we were not able to fit the data, but because we used Xgrid, and with all this computation, we're really sure that those models don't work: we have scanned the parameter space extensively, so we can now dismiss those models, and this is already a very important result for us, a surprising result but an important result, and of course now we are testing other models.
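The submission workflow Charles describes, a command line tool bundled in the application package, one temporary file describing each fit, and several hundred jobs sent to the controller, might be driven like this. This is a sketch only: the controller hostname and file layout are made up, and the `xgrid -job submit` syntax is the Tiger-era command line client, so the Technology Preview tool's flags may differ.

```python
import tempfile

def build_submit_command(executable, fit_description,
                         controller="controller.example.edu"):
    """Write one fit description to a temporary file and build the Xgrid
    submission command for it. The command is returned rather than run,
    so this sketch works without a live Xgrid controller."""
    with tempfile.NamedTemporaryFile("w", suffix=".fit", delete=False) as f:
        f.write(fit_description)
        fit_file = f.name
    # Tiger-era syntax; pass this list to subprocess.run() on a real grid.
    return ["xgrid", "-h", controller, "-job", "submit", executable, fit_file]

# Several hundred of these in practice, one per set of starting values.
commands = [build_submit_command("./fitter", f"starting-set {i}")
            for i in range(3)]
```

Xgrid then handles dispatching each job to an agent and returning the results, so the application only has to generate fit descriptions and collect the outputs.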
So I want to thank the people in my lab, especially Brian, Gayathri, and Aaron, who were very helpful with this project, but also I want to thank all the people at Stanford, they are listed here, who have contributed a CPU to the cluster, and finally I want to thank all the people around the world who have contributed to this project, and this has been really amazing: we're now reaching 40 gigahertz, actually, and so I want to thank them all, and maybe some of you are in the room, and I want to encourage you to join the cluster, send me an email. And finally, all of this wouldn't have been possible without Xgrid, so thank you Apple, thank you David, and thank you David. So I just
wanted to say, I love this slide, absolutely love this slide. The first time, when Charles sent me his slides and I saw this one, I knew right away, and I forwarded it off to David and to Richard Crandall, who was sort of the originator of this thinking, that we had essentially achieved, if you will, SETI@home for the rest of us. This is something that Charles, who as you can tell is a scientist, he has a task he's trying to get done, he is not interested in doing distributed computation from the inside, he's interested in using the capabilities of distributed computation to solve his task, and he managed to assemble this amazing grid of people through a bunch of essentially social engineering, sending out a really nice email message to his friends, and their friends, and their friends, and so on, and built himself a 40 gigahertz cluster just by being a nice guy. So here
we are: key issues and takeaways. So Technology Preview 2 is available right now, you can download it off of the website, there are some URLs at the end, and of course you can always just go to apple.com/acg/xgrid, that URL is coming up at the end, and get it. This is part of Tiger Server; there will be enhanced functionality as part of Tiger Server, as well as some components that will be part of Tiger, so stay tuned to see sort of how we package this up, but Xgrid is part of Tiger Server.
The agents themselves, because we realized that a lot of you especially will have Panther clients out there running, and it's not feasible to imagine that every single Panther client will be updated instantly to Tiger, although we certainly hope that's the case, if you're doing desktop recovery, the agents will run on Panther as well. And we really, really, really want your feedback; that's why we put TP2 out to the whole community. Use it: there are bugs that we know about, I'm sure there are bugs we don't know about, and we want to fix those and get this to be great quality. So first of all, download TP2 of Xgrid right now, or take the
information you've learned today, think about it, and send your feedback to the addresses that are shown here. If you're a user of Xgrid, there's a great users list and archive through our email system that you can look through, lots of posts from Charles and others on things that they found out, as well as just generic feedback that David and I read. And if you need more information, of course, go to the website; this is obviously in your kits and on the web page, and give us a call. Lastly, I want to
also thank, and bring up here for the Q&A, James Reynolds from the University of Utah, who has kindly worked with us to put together the demonstration that you can see running right now in the Enterprise IT lab. We have harvested the power of a number of machines running around WWDC right now; we've built a grid that's roughly, plus or minus 10 depending on whether these machines are being used by someone else or not, about 100 gigahertz, and actually it's a little bit more than that. And what James has done is provide us with a number of scripts and some interesting models, and he's doing a massively distributed POV-Ray render of a 3,300 frame movie, I think, and as the show has been going on, that movie has been getting longer and longer. It's really quite beautiful, so check that out in the Enterprise IT lab.