WWDC2001 Session 501

Transcript

Kind: captions
Language: en
Good morning. I was thinking about just sitting in the audience and starting this talk, because to some extent the Java virtual machine is an invisible thing, right? It sits there and does your stuff: when you're programming in Swing, or when you're running your application, the virtual machine is just there. It's just a utility; it's just the thing that makes your Java happen. And hopefully, if my group does its job right, you hardly even have to know about it. That's our goal: we want you to just enjoy the benefits of it. But in fact it's an incredible piece of technology. I don't know whether you have heard of or played the game The Incredible Machine, or have kids who have played it, but The Incredible Machine is this great little app on the Mac that lets you build and put together really fun little contraptions, and I kind of think of the Java virtual machine as an incredible machine.

So anyway, in this talk I'm going to get up on stage, as if I were still sitting in the audience saying all this, and tell you about it. First of all, if you just came from Steve Naroff's talk, you heard Larry Abrahams get up there and talk about how HotSpot is this next-generation virtual machine. It's true; it is a fabulous piece of technology. So we're going to talk about what HotSpot is about, why it's so good, and some of the things we like about it. Beyond that, we're going to talk about what Apple does to add value to the virtual machine as we get it from Sun. We'll talk a little bit about what is in the Java developer preview that we're going to be releasing, I guess, later this week, or any day now, or by the end of the week, whenever we get it out on the website. And finally, we'll have some time for some Q&A.
So, first of all: Java virtual machine basics. What does a Java virtual machine do? If I say JVM, by now I hope you know that's the acronym for the Java virtual machine. The virtual machine is responsible for managing the threads of execution: your code is written in Java, and it executes as various threads, so the virtual machine makes those threads happen. That's pretty easy for us, because we have Mach, and native threads, underneath us, but we keep track of what's going on there.

The machine executes your bytecodes. It is an operating system in a lot of respects, and having managed operating systems before, let me assure you: it is an operating system in many respects.

The machine collects your garbage. Your garbage is your unused objects; these eat up memory, and one of the virtues of the Java programming model is that you don't have to worry about getting rid of your memory when you're not using it. It helps us if you null out your references when you're not using them anymore (it helps you also), but you let the machine take care of that stuff.

The machine also helps shift control back and forth between your bytecodes and the native code underneath. Bytecodes are great, but so are our native application libraries that do great things like speech recognition, speech synthesis, QuickTime, GUI drawing, all that kind of stuff. There's so much you can get done in bytecodes, and some stuff you have to get done in native code, so the machine also handles the transitions to and fro.

/Library/Java/Home is where we put, you know, the standard properties and things like that, and that's something you get to extend. Cocoa Java, if you wander around with Java Browser or something, is located in a couple of other places; it's under /System/Library/Java. So if you poke around and find strange things in strange places, this is meant to help tell you where some of our stuff belongs.
The next slide is about what we think of as your place. On our system, Java home is /Library/Java/Home, and the things in there that you extend with your code, with your applications, are these. There's the bin area: sometimes you've got little helper utilities and stuff, and Java home's bin is a great place to drop those. If you have jar files that need to be part of, you know, the standard extensions, that's what the extensions directory is all about; put stuff in there and you don't have to fool around with classpaths anymore. That is great; classpaths were not such a great idea, obviously. The other stuff there: fonts, images, your properties, security data. It's the general kind of dumping ground for all sorts of support data for Java programs. Cocoa on Mac OS X has different bundling mechanisms that let you package some of that stuff, but if you're coming to us from another platform and don't want to rethink that part, Java home is where you already put your stuff. This is the part of Java home that we want you to extend. And of course /Applications: when you write stuff, you put your resulting product there.
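If you're curious where these locations land on a given install, the standard system properties will tell you. A minimal sketch (nothing Apple-specific here, just the standard property names):

    // Prints where this VM thinks its home, extensions, and classpath are.
    public class WhereAmI {
        public static void main(String[] args) {
            System.out.println("java.home       = " + System.getProperty("java.home"));
            System.out.println("java.ext.dirs   = " + System.getProperty("java.ext.dirs"));
            System.out.println("java.class.path = " + System.getProperty("java.class.path"));
        }
    }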
So that's it for the basics. I want to talk a little bit about HotSpot. What is HotSpot? We take HotSpot from Sun and we port it to Mac OS X. That's not hard compared to what it was on Mac OS 9; on 9 we had to do things like invent a threading model, we had to do things like... well, anyway, I won't go into it. The adapting-to-Darwin phase is pretty easy, because there are native threads underneath, and there's a natural filesystem model so I/O just happens, and stuff like that. So it's a porting job. The main thing where we add value at this stage is that we write the interpreter. You might think, well, the interpreter is just a C file, right? You just kind of compile it? That's not quite the way it is, and I'll talk about that a little bit later. But we get the interpreter up and running, and now you can run HotSpot in an interpreted mode. And then, of course, to make it run fast we have to write a compiler: a runtime JIT compiler that compiles your bytecodes dynamically into machine code as your program runs, and makes it go. So that's the basics of what we do with HotSpot, and then the point comes where we want to make it better. I'm going to talk a lot about how we make it better, but let's just step back to HotSpot first.

HotSpot, this next-generation technology: what is that really all about? Well, first of all, it's 700 files. I have four people in my group who work on HotSpot, as well as other things. So, 700 files, 200 thousand lines of code; it's actually 714 files and 220,000 lines of C++ code. I don't know whether any of you just started programming six years ago and have only known Java, but C++ code can be intricately complex if it's not done well. Luckily for us, HotSpot is done very well, but it is still an enormous undertaking just getting that part up.
So let me talk about that interpreter for a minute. The interpreter is actually combined, assembled if you will, out of code templates every time you launch. There is no interpreter.c file; there are templates for the interpreter. Now why would you, you know, jam together an interpreter every time you launch the VM? Well, let me tell you, there are two reasons. One (we don't quite make use of this right now) is that if you're on a G4, we might be able to have a faster little loop that could take advantage of the G4 processor; if we had just one version, it would have to be tuned for either a G3 or a G4. The other one, which we do make use of, is debugging. If you're debugging, every time you execute a bytecode you often have to ask: am I supposed to stop here? Is this a breakpoint? Am I supposed to do something else? So there's at least an "if debugging" check that you have to do on every bytecode you interpret. Well, not if you assemble the interpreter on the fly. We check and say, are we running this in a debug mode? And if not, we assemble the interpreter without that little if check, so the interpreter instructions just go flat out, as fast as they can. It's really important to have a very fast interpreter, because just-in-time compilation takes cycles away from your program, and you should only do it when you have to, when it's going to be to your program's advantage. So having a very fast interpreter is very important, and that's why that part of the technology is actually pretty sophisticated. I could tell you more, like how, when you're about ready to do garbage collection, you actually swap out the whole interpreter with a little jump table, such that you jump in and start synchronizing with the garbage collector thread. It has a bunch of very sophisticated technology; it's really cool.
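To make the template idea concrete, here is a toy sketch in Java (the real interpreter is generated machine code, and these names are made up): the dispatch table is built once at launch, and only the debugging build of the table pays for the breakpoint check on every bytecode.

    // Toy model: "assemble" the interpreter once at startup, with or
    // without the per-bytecode debug check, instead of testing a
    // debugging flag inside the hot dispatch loop.
    interface OpHandler { void execute(); }

    class ToyInterpreter {
        private final OpHandler[] table = new OpHandler[256];

        ToyInterpreter(boolean debugging) {
            final OpHandler base = new OpHandler() {
                public void execute() { /* do the bytecode's real work */ }
            };
            OpHandler chosen = base;
            if (debugging) {
                chosen = new OpHandler() {
                    public void execute() { checkForBreakpoint(); base.execute(); }
                };
            }
            for (int op = 0; op < 256; op++) table[op] = chosen;
        }

        void run(byte[] bytecodes) {
            for (int i = 0; i < bytecodes.length; i++) {
                table[bytecodes[i] & 0xFF].execute();
            }
        }

        void checkForBreakpoint() { /* ask the debugger whether to stop */ }
    }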
But if you're not going to be spending your life in the interpreter, what you want is a fast compiler. You want a compiler that compiles fast, because it's again taking cycles away from the running time of your program, and you need it to build good code when it does spend that time. Our compiler is a fast compiler, and it compiles into pretty good code. We are anxiously awaiting the next-generation technology from Sun, the 1.4 train, because it actually has a better technology for generating code. We're looking at that because, as Steve Naroff said, the pendulum is swinging back to the compiler part. So we generate pretty good code, and we're looking to make that code a little bit better for you.
Finally, HotSpot has a patented (I believe) implementation of synchronized. Now, synchronized is an interesting notion from the Java language viewpoint. When you go to access a Vector object, for example, its methods are synchronized, right? What that means is that they're safe if several threads are trying to do operations on that object at the same time. Well, the reality is that most of the time when you use a Vector, only one thread is operating on it at a time; in fact, maybe only one thread will ever operate on it. So the idea with this synchronized implementation is that to acquire the lock for that object, there's a very, very fast hand-assembled compare-and-swap operation, and the data structure for the lock is built on the stack of the caller, so basically it's very, very cheap: there's no extra data allocated for it. Only if a second thread comes in and says "I need to operate on this object also" do we build a heavier-weight locking operation and actually go to the operating system to say: block that thread. We don't want it to spin; we want to put it to sleep until the other guy is done. So, a very fast synchronized implementation. It's a trademark... no, it's a patent. It's one of the really cool things about HotSpot.
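Here's a minimal example of the pattern being described (nothing Apple-specific, just standard Java): the lock on counter is uncontended most of the time, so the common case takes the cheap stack-based path, and only genuine contention escalates.

    // Two threads bumping a synchronized counter. In the usual,
    // uncontended case HotSpot takes the cheap compare-and-swap path;
    // only real contention falls back to blocking in the OS.
    public class CounterDemo {
        private int count;

        public synchronized void increment() { count++; }
        public synchronized int get() { return count; }

        public static void main(String[] args) throws InterruptedException {
            final CounterDemo counter = new CounterDemo();
            Runnable work = new Runnable() {
                public void run() {
                    for (int i = 0; i < 1000000; i++) counter.increment();
                }
            };
            Thread a = new Thread(work);
            Thread b = new Thread(work);
            a.start(); b.start();
            a.join(); b.join();
            System.out.println(counter.get()); // prints 2000000
        }
    }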
A fabulous thing about HotSpot is the garbage collector. I mean, when I grew up I never thought I'd be extolling the virtues of garbage collectors, but garbage collection is actually a fabulous technology that lets you program a lot more easily, and I'm going to talk a bit about garbage collection right now. Well, not quite yet; sorry. First, the benefits: I want to talk a bit about what all this fabulous technology does for you in combination. One of the people who works for me put together a little tiny benchmark; we call it the allocation microbenchmark. It runs from one to sixteen threads, or something like that. Let me tell you about it: it's several threads running in a loop that allocates objects and frees them, allocates objects and frees them, with however many threads you can get going at this, and it measures the peak rate of allocation. The fun thing about it is that he wrote it about four times: he wrote the code in Java, he wrote the code in C, he wrote the code in C++, and he wrote the code in Objective-C, which is what Cocoa is based on. Then, of course, you run it on a multiple-processor machine, to make sure you get two threads actually trying to do the same thing at the hardware level at the same time. Now, I have to warn you: microbenchmarks should be taken with a grain of salt and a lot of water. Don't think about them too much, and don't trust them to predict your performance, because they often focus on one very atypical usage pattern; I mean, you might use it a little bit, but you don't use it a lot. So anything you see from a microbenchmark about a particular little usage pattern, it's really hard (impossible, really) to extrapolate from that to any kind of win for your program. To emphasize the point: your mileage will vary; obviously it'll be much less.
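The Java flavor of such a benchmark might look roughly like this (a minimal sketch; the actual benchmark's thread counts, object sizes, and timing details weren't shown):

    // Rough sketch of an allocation microbenchmark: N threads allocate
    // small objects flat out, and we report objects per millisecond.
    public class AllocBench implements Runnable {
        static final int OBJECTS_PER_THREAD = 5000000;

        public void run() {
            for (int i = 0; i < OBJECTS_PER_THREAD; i++) {
                Object o = new Object(); // allocate an object...
                o = null;                // ...and drop it for the collector
            }
        }

        public static void main(String[] args) throws InterruptedException {
            int nThreads = Integer.parseInt(args[0]);
            Thread[] threads = new Thread[nThreads];
            long start = System.currentTimeMillis();
            for (int i = 0; i < nThreads; i++) {
                threads[i] = new Thread(new AllocBench());
                threads[i].start();
            }
            for (int i = 0; i < nThreads; i++) threads[i].join();
            long elapsed = System.currentTimeMillis() - start;
            long total = (long) nThreads * OBJECTS_PER_THREAD;
            System.out.println((total / elapsed) + " objects/ms");
        }
    }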
So let's talk about the results. With C, this allocation benchmark gives us about 200 objects per millisecond. Objective-C is a little bit less; I'm not sure why, but there's a little bit of message overhead in there. C++ actually does a little bit better than C: you peak out around 225 objects per millisecond. Java, in that interpreted mode, allocates faster than compiled C++ code. Not bad. And when the compiler has run and is running those threads, the allocations are eight times faster. That is pretty phenomenal: this is two threads trying to go after objects, and they get up to eight times faster. For a point of reference, Java in MRJ on Mac OS 9, compiled, going as fast as it could, was still faster than C or C++, but only a little bit faster than HotSpot interpreted.
So let me talk about garbage collection again. Garbage collection is 41 years old. The first paper on garbage collection was John McCarthy, 1960, where he talks about mark-and-sweep. Mark-and-sweep is the idea that you've got your objects laid out in memory, you go and mark every one that's still alive, and then you get to reclaim the stuff between the objects. That's pretty cool; it was used, obviously, on a Lisp system. Three years later, Marvin Minsky, of other fame, came along and provided an interesting paper on a copying, and hence compacting, collector: not only do you mark all the objects that are alive by descending through their roots, but you copy them into a new space. That compacts your memory, so you don't have the fragmentation issues that plagued C programmers all the time, because your objects get packed into the smallest memory they need to survive; this hugely extends the running lifetime of a program. It was a long time before the next major advance in garbage collection came along: that was in 1984, when Dave Ungar put out a paper about generational collecting. Over the course of these 41 years there have been over a thousand papers written on garbage collection; it's a great topic. Java is the first system where it really comes into the mainstream for folks, though. I got my data from this great book called Garbage Collection, and if any of this talk interests you or intrigues you a little bit, I highly recommend you go buy it; it reviews all the algorithms in a very, very great way.
Let's talk about generational collecting. What's the idea of generational collecting? Most objects die young, right? You use an object just a little bit, and then it's dead. So the idea is that you split memory into generations, such that you can minimize the number of CPU cycles spent allocating a new object, and minimize the number of CPU cycles spent remembering and keeping track of the old ones. Old objects often don't change that much in terms of what objects they hang on to, and if you never have to worry about an object because it never changes, then you don't have to spend cycles even remembering that it's still alive. To make this really happen, the compiler and the interpreter implement what's known as a write barrier: if an object in one generation gets stored into an object in another generation, we keep track of that, so we can say, hey, you'd better go look at these objects over here, because we might have an intergenerational reference. That way we can keep track of which objects are alive. So that's the basic background technology.
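For a flavor of what a write barrier does, here is a toy card-marking sketch (purely illustrative: HotSpot's real barrier is a couple of generated machine instructions, and the details differ):

    // Toy card-marking write barrier: every reference store "dirties"
    // the card covering the updated object, so the collector only has
    // to re-scan dirty cards for intergenerational pointers instead of
    // scanning the whole old generation.
    class CardTable {
        static final int CARD_SHIFT = 9; // 512-byte cards (an assumption)
        final byte[] cards;

        CardTable(int heapBytes) { cards = new byte[heapBytes >> CARD_SHIFT]; }

        // Conceptually called on every "obj.field = value" reference store.
        void writeBarrier(int objAddress) {
            cards[objAddress >> CARD_SHIFT] = 1; // mark the card dirty
        }
    }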
So what I want to talk about is how that's employed in HotSpot. HotSpot has four generations running at once. The first generation, the Eden, is where you allocate objects, and basically it's as simple as this: you've got a pointer to the top of memory, you add the size to it, and you're done; you have an allocation. The only complication is that the assignment to that pointer, the "mem += size", is an atomic compare-and-swap, because you've got multiple threads that may be going after it at the same time, and you might have missed. So there's actually a little loop to make sure that you stored what you wanted; if not, you loop back up and re-add from the top. It's very, very fast.
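In Java-flavored pseudocode, Eden allocation looks something like this (a sketch of the idea, not HotSpot's actual code; AtomicLong stands in for the hardware compare-and-swap):

    import java.util.concurrent.atomic.AtomicLong;

    // Sketch of Eden's bump-pointer allocation: add the size to the top
    // pointer with a compare-and-swap, and retry if another thread won.
    class Eden {
        private final AtomicLong top = new AtomicLong(0);
        private final long limit;

        Eden(long sizeBytes) { this.limit = sizeBytes; }

        /** Returns the offset of the new object, or -1 when Eden is full. */
        long allocate(long size) {
            while (true) {
                long oldTop = top.get();
                long newTop = oldTop + size;
                if (newTop > limit) return -1;        // time to collect
                if (top.compareAndSet(oldTop, newTop)) {
                    return oldTop;                    // we won the race
                }
                // else: another thread moved the pointer; loop and retry
            }
        }
    }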
The so-called new generation is where objects that survived that first run go. I mean, the only thing you do with objects in the Eden is allocate them; you never worry about them again, because the only way they stay alive is if they got stored into an older object. Other than that, they're dead. You just assume that everything in the Eden space is dead, because you actually keep track of which objects stay alive in the other generations. The new space is a two-space copying collector (you know, Marvin Minsky kind of technology, from 1963) where objects just get copied over to the other space and compacted. If they survive this kind of back-and-forth for a while, then we say they're no longer a child, they're an adult: we push them into what's known as the tenured generation, and they stay there from adulthood till death. Actually, they can die at any stage, but that's where the adult objects go. HotSpot has two different algorithms for maintaining the tenured generation. The one we ship with is a pretty classical mark-and-sweep algorithm. There's another one, called the train collector, which you can get to with -Xincgc, I believe it is. We haven't done much experimenting or much qualification on that; we intend to, though, because the virtue of the train collector is that it spends more cycles keeping track of your objects, but you get smaller pause times when it goes to find some dead memory. And pause time is actually kind of important for GUI-based apps, isn't it? So we're going to work on the train collector and see if we can get it into shape to ship. You're invited to go play with it yourself; maybe it works just fine for you.
There's another generation, however, which is used for support objects. Those 220,000 lines of C++ code are possible because their objects are, for the most part, garbage collected, and garbage collected with the same collector that is used for the rest of your Java objects. HotSpot eats its own dog food: it implements its own collector, and it uses it for its own purposes. So the permanent generation is where the support objects for the program live. In other implementations those usually just come out of the malloc heap, but in HotSpot's case they come out of the so-called permanent generation, and that uses a mark-and-sweep algorithm. Objects there rarely die, so it's rare that we actually worry about those too much.
Let me shift gears a bit and talk about the things we do to make it better, to make Java better. From the VM perspective, one of the things we do is provide better integration, better language integration, which lets you see more APIs to use to write programs. We also like to provide better performance. Performance is critical to how your program looks and how it behaves, and we really believe in better performance.

There are general observations about performance, all different ways to think about it. In general, you want to do more with less memory. One of the things HotSpot does: other implementations had an extra word per object just to keep track of the lock, just to keep track of whether or not a lock was around for an object, and they had another data structure, the handle, to keep track of where the object really was, so everything stored handles. All of that pays off: HotSpot runs in about 10% smaller memory, simply because it doesn't use handles and it doesn't have extra data space for that rarely used monitor on every object.

For the client: Steve Naroff talked about how much effort is being spent on the server; well, for the client, we think that scalability means running more apps in the same amount of memory. When he got up there and said (I can't remember what the graph said exactly) that it takes 60-some megabytes to run two Java applications: well, we sell systems with 64 to 128 megabytes of memory, and we would love for you guys to write apps, ship them, and have them run well on our out-of-the-box configuration. For us, that means we have to make sure we use that memory in the most efficient way we can.

Another attribute of performance we work on is launch time. Nobody wants to buy an app and sit there and wait 20 seconds for it to launch. I mean, you know, they put up with it, but it's not one of the things they're happy about, so if we can make launch times faster, we're going to do it for you. And of course, faster running time: especially from the VM perspective, the fewer cycles we spend thinking about what you're supposed to be doing, the more CPU cycles there are for you to actually do it.
integration J&I is the standard there
used to be others but j'ni is now the
standard if you've programmed the j'ni
it can be a little cumbersome right
because you can't really get to the tune
to an array you know in a rate you got a
copy the the array contents over muck
with them and copy them back and that's
you know it's just cumbersome you get
these J object references and stuff like
that but the value of that is that it
allows that precise collection to go on
within hotspot since you never see a
pointer to a real object we can move it
around we don't have to examine all of
memory and try to figure out whether or
not that bit pattern really represents a
pointer to one of our objects or is just
happens to be some you know the your
current net value of your portfolio
sitting in money dance so that's the
benefit you get for J and I so what we
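From the Java side, using JNI looks like this (a minimal sketch; the library and method names here are made up):

    // The Java half of a JNI binding: declare native methods and load
    // the native library that implements them (the C half is not shown).
    public class Portfolio {
        static {
            // On Mac OS X this finds libportfolio.jnilib (name assumed)
            System.loadLibrary("portfolio");
        }
        public static native double currentNetValue();
        public static native void revalue(double[] prices); // JNI copies the array
    }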
So we do two things to extend your ability to program beyond JNI. We provide JDirect; we use that internally for Swing and AWT somewhat, and QuickTime for Java (QTJ) uses it too. JDirect lets you sit in your Java code and get to the C routines. We've talked about it in the past, so I'm not going to go into too much more detail (I have a code slide a little bit later), but in using it, what happens is you just write a little wrapper class for the C functions you're going to be using, and then you have one piece of code where you ask JDirect to build you a library: it generates and writes the JNI stub code, links it in, and then you just start using your code. In your static initializer you just say, you know, "build me one, load me in," and JDirect does the rest. That's pretty sophisticated.

We also have the Java Bridge, which implements the technology that lets Cocoa Java happen. There's a standalone bridging tool that starts with a mapping file and says: this Java class maps to that Objective-C class underneath it. The benefit is that, for the most part, those Objective-C frameworks can now be subclassed in Java. Because the Cocoa frameworks are used through setters and getters, whenever they do a setter it comes across to the Java side and does things; when you implement methods, we transform the method names and actually dispatch on the Java side, and vice versa: when you do super in Java, it actually gets translated by the bridge and dispatched into Objective-C below.

So, an example. I don't know how well you all can see this; no, not too bad. This is the JDirect 3 example; I pulled this pretty much straight off the web, at developer.apple.com/java. As you see, the first line of code, the public static linkage, typically needs to be done in a static initializer (for some reason that didn't show up here, hmm). Anyway, the "new Linker" part is the part that you should do in your static initializer, and it tells JDirect to go fabricate something for the class Prime; it's a reference to itself, right? JDirect goes and finds, through reflection, what static native methods are in there, what their names are, and what the types of their parameters are. Then it looks up in the runtime and says: hey, is there (in this case) a ComputePrime function around? If you haven't loaded the library, it'll actually look for that magic string and load the library for you, in case your stuff's out there. And from that point on, you can call ComputePrime: send it a short, it will return a long long, and you're in business. You're writing in Java, and you're using this C-based library underneath you.
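Reconstructed from that description, the wrapper would look something like this (hedged: the class and function names follow the talk, and the Linker class and magic-string convention are recalled from JDirect 3 documentation; check the actual sample at developer.apple.com/java):

    import com.apple.mrj.jdirect.Linker;

    // JDirect 3 wrapper for a C library. Constructing a Linker for the
    // class is the whole trick: JDirect reflects over the static native
    // methods, generates and links the JNI stubs, and loads the library.
    public class Prime {
        // The "magic string" naming the native library (path made up).
        public static final String JDirect_MacOSX = "libPrime.dylib";

        static Object linkage = new Linker(Prime.class);

        // C side: long long ComputePrime(short seed);
        public static native long ComputePrime(short seed);

        public static void main(String[] args) {
            System.out.println(ComputePrime((short) 10)); // call straight into C
        }
    }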
The counterpart is for Cocoa. Cocoa is a pretty rich framework. Steve Naroff says he's been working with Steve Jobs for 15 years; my tenure isn't quite that long, it's only about 11, but I had something to do with some of the Cocoa APIs in a role previous to the one I have now, and I wanted to pull up a little bit of something I did a long time ago that I can get to from Java. There's a date formatter in there. The date formatter takes a string and turns it into a date; more than that, it can take a date and turn it into a formatted string. This is an example that does that. The key element here is (let's see, where do we go) "next Tuesday at dinner". That's a pretty simple little English string (it was a weekend's worth of hacking, and it's kind of fun), but it actually turns into a real date. So you can get to Cocoa from your Java and make use of it without having to wade through Objective-C, without having to wade through JNI. And I invite you to take a look at the Cocoa examples that are shipped under /Developer/Examples/Java/AppKit. There are actually two or three programs there completely written in Java: there's a game called BlastApp, there's the Sketch program, which is a simple MacPaint or MacDraw kind of program, and there's a text editor in there. So go play with Cocoa; it's kind of fun.
Let me talk now about better performance. I said we try to innovate in two areas: one was better language integration; the next is better performance. Better performance we all want; the question, of course, is how. I mean, it's not like you can just walk up to your program, unless you haven't optimized it yet, and say "how do I make it faster?" and have it be obvious. Ours actually is optimized, so in our case we had to scratch our heads a little bit. We said, hmm, what are the basic principles of performance?

Well, if you've ever done performance work before, you should know that memory is evil, right? If you are wasting memory, you are going to spend more time taking it away from a system that might not have it; you might have to bring it in from disk. Memory is evil: if you can use less memory to get your job done, your system is going to run faster. The disparity between the rate of increase of CPU cycles and memory bandwidth just keeps getting larger and larger, and to ameliorate that we keep putting more and more caches onto the chip, because memory has to be really close to the CPU. So just think: memory is evil. Remember that. The next thing is that, of course, you should steal good ideas. I mean, why invent totally new stuff if there are already some good ideas out there?

So if we think about memory and we think about good ideas, where do we come to? In C technology, a long time ago, they put shared libraries into the system. Shared libraries are a mechanism for C libraries, for programs, to share instructions, right? So we keep thinking: the C libraries, what do they share? Well, they share their instructions; that's the dominant savings. There's a little bit of utility beyond that: with a dynamic shared library you can swap implementations out without having somebody relink, so there's a little bit of code portability in there. But sharing the machine code, the actual assembly instructions, is the dominant savings of shared libraries. Another large savings, though, is the data that goes along with it. So, obviously: what about building some kind of sharing for jar files?

If we look at an initial memory configuration for your running app (this is the memory layout for something that's just getting started; I put in realistic numbers, real numbers, for your Java application), the point here is that the Eden space is actually pretty large to start out with; the new space, where that little back-and-forth copying happens, is fairly small; the tenured generation, remember, is the one where your objects live in adulthood; and then there's that permanent generation, which you don't know about, but it actually costs you a chunk of space right when you get running. HotSpot keeps the ratios of Eden to new, and of the total of those to tenured, the same. Put it in a 25, or in this case a 35, megabyte application: the tenured space is where most of your stuff lives, but doggone it, that permanent generation, the place where we keep things like your bytecodes, takes up a fair amount of space.

Now wait a minute: bytecodes? What about all the bytecodes for things like Swing, things like java.lang.String? I mean, does your program have a different version of the bytecodes for java.lang.String? Of course not; it's the same bytecodes. Well then, why does your program in memory have a different copy of them? No good reason whatsoever.
So when we took a look at what we could share, we figured out that it's that space for the bytecodes, that space for the metadata for your program that comes out of the standard shipping system libraries. So what we did: imagine that red space getting split up into three sections. There's a section that is completely shareable, a completely read-only part; there's a section that is mostly shareable (it can be touched, but it's mostly shareable); and then there are still your classes, the bytecodes for your classes, which aren't really shareable to anybody.

This is a review slide: what we did was add a new generation. We call it the shared generation. It has no CPU cost to maintain, because it's there to start out with, and it doesn't die, because these objects are immortal. So that's pretty cool: if we don't even have to build these objects, and we don't have to maintain them, that offers us a CPU savings as well. In addition to reducing memory, we get to reduce the CPU cycles needed to get to this initial configuration and to maintain it during the running time of your program.

So, the shared generation. It's based on the observation that some objects never change and never die. Those are the objects we want to share: the objects we maintain on your behalf for the bytecodes, for the strings and stuff that are in your jar files, or in the system jar files at least. What we do is process those standard jar files once. We have an ordered list of the classes that typically get used in a Swing application, and we load them into the VM using a special option which I'm not going to tell you about. Oh yeah, a key point here: we don't execute any bytecodes. Typically, when you load classes into HotSpot, you of course run static initializers. Well, static initializers can do things like look at your command-line arguments; they can go look at disk, look at memory; they can run arbitrary code, right? And that would change the state of the program. The idea here is that we want to just preserve the jar file, to have an in-memory version of the jar file; the useful, running part of the jar file is the part we want to save and share. So we don't execute any bytecodes.

And then we use that fabulous garbage collector technology. There's a little part of it that just says: iterate over every object in this generation and do something to it. Yeah, something like a closure, only it's written in C++. We reapply that garbage collection technology to pack all the objects that ever got created into these two spaces, the shared read-only space and the shared read-write space, and then of course we write that space to disk. The next time you start a HotSpot, you just map that into memory, do a little bit of fixup, and you're running. Piece of cake; simple. This is called pickling... swizzling... no, it's not pickling; map-and-go? I can't remember. There's a term for it. Map-and-go, maybe that's the right term.
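That "closure written in C++" is a simple visitor over the heap; a Java-flavored analogue of the pattern would look like this (names made up, since HotSpot's internal C++ types aren't a public API):

    // Java analogue of the "iterate every object and do something to it"
    // closure; HotSpot's real version is a C++ visitor over a generation.
    interface ObjectClosure {
        void doObject(Object obj);
    }

    class ToyGeneration {
        private final java.util.List objects = new java.util.ArrayList();

        void forEachObject(ObjectClosure closure) {
            for (int i = 0; i < objects.size(); i++) {
                closure.doObject(objects.get(i));
            }
        }
    }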
The shared generation benefits: I think I hit on some of those already. There are virtually no CPU cycles used for the shared generation; that's what the asterisk is about. For the read-only part that's true; for the read-write part we do spend some cycles, actually a few more than we need to, but it's almost totally free. We rarely read the standard jars, the classes jar and the UI jar; we don't even read them to get you started. That saves the cycles to process them; it saves you the memory to read and map the index of the jar file, to wander through it, and to copy the stuff out to make our versions of it; and obviously it saves the disk I/O to get those things off the disk. And if you never have to read them, they're not sitting in your disk cache, so that helps the rest of your system's performance as well. One of the benefits from all this is that a hot start (you know, the second start of any Java program) is always faster, because we save all those cycles to begin with.

A secondary benefit of this technology is that we can be smarter about how we lay those runtime data objects out in memory. For example, there are linkage strings in your class's metadata: roughly, whenever you reference another class, there's a little linkage string laid down that says, you know, "java.lang.Double", your reference to that class. Those strings are rarely used, but your bytecodes end up sandwiched between those rarely used strings. By pulling the strings out, putting them in their own space that's hardly ever touched, and keeping your bytecodes hotter, we never even pull in off of disk the pages that hold the data you never use. So your working set actually gets smaller, because we've done packing to put the hot data into the memory pages that you actually pull in off of disk. Those disk I/Os pack more punch, because they bring in more usable data, thanks to this packing benefit.

The sharing benefit was the one I started out with, and it's the last one I want to talk about. Steve showed you how, all together, the combined benefits were 20 megabytes for two applications, and the benefits for three and four and five are, you know, the same. Sharing alone saves three to six megabytes, we've measured; the other processing, the other reductions in the working set, add up to the rest of those benefits. So if you're writing a Swing app, and most of you are, you're going to be getting that for free, using our shared generation technology.

There are just a few caveats. We don't yet know how to share your application jars. Now, your application jars actually aren't all that often shared, but getting that launch-time benefit would be pretty cool, so we're going to at least try to figure out how to map-and-go your stuff, so that your stuff launches faster.
The first start of a Java application is actually a little slower, and that is because we have to do some processing for all those Swing classes up front, processing that we'd typically meter out as you load them on demand; we're working on ways to not have to do that. The interpreter is another caveat: those bytecodes that you execute have to be slightly slower. But since in HotSpot you spend 90 percent of your time in compiled code, slowing down the 10 percent you spend interpreted by one or two percent isn't a big deal. I just want to be truthful.

Another caveat is that what we share are the classes on your boot class path, and for programs that alter the boot class path, HotSpot takes a look and says: uh-uh, we don't know what they're doing. In JBuilder's case, for example, what they've done is provide their own implementation of certain AWT classes, so that they can use them in their great designer; the designer is a great tool.
So if you're using JBuilder for designing Swing applications, here's a tip. You can get sharing for JBuilder 5 (which just got pre-released, and is in your bags) by adding a line reading addskippath ./lawt.jar; that lawt.jar is their jar file for giving you, giving them, a better AWT. That line goes in a file (I called it JSA, but you can call it anything you want; the magic is the .config extension) in the open tools area of JBuilder 5.
So, directions for our sharing work. First of all, we know how to improve the hot-start launch time even further, and we can. We know how to eliminate, for the most part, that first-start penalty. We want to extend this fast-start launching behavior to all the jar files that exist in that extensions directory, or at least the ones we're told to. And we were really pleased with that second-order benefit of packing data, so what we would like to do, rather than gathering all the data for a class in one lump, is make the observation that some methods in a class are never used; the bytecodes for those methods shouldn't be on the pages that get brought in. We want to start packing based on the methods that are used, not just the classes that are used. This may well double the benefit of our sharing, by reducing your working set even more. We of course have to finish the GC work on the read-write shared generation, and we could also figure out how to share more of the runtime data structures that live in that permanent generation.
There are a few things we're not trying to do right now: the non-directions. It's important, when you're setting out to build something, to know what your goals are and, if you can, to identify the goals you're not going to worry about. The biggest one for us: we're not going to share the machine code that gets compiled from your bytecodes. I mean, that's the first thing that traditional C shared libraries share. But you've got to remember what HotSpot is about: HotSpot is about compiling the methods that you're actually using, and not only compiling them but inlining the methods they use, so you get one long pile of code that is really hot, because it has everything it needs to get its job done. That tight code is really good for you. When we've measured how much code we compile, it has never exceeded two megabytes; running HotSpot applications like JBuilder and stuff, we never end up compiling more than about two megabytes of code. That code is not worth sharing. That code is the stuff that's hot for your particular run, because every time you run an app, of course, you get different hot spots: you shift into this area and it needs to do that, then you shift into another app; it's all based on your working program. The idea with HotSpot is that it optimizes what your program is doing right now, and if we tried to share that, we wouldn't do as good a job. So we're not going to share the compiled machine code that we've built on your behalf. The other reason is that it's kind of hard, right? If you do try to share it, then it has to have relocation data in it, and rather than folding in a branch to a direct address you have to fold in an indirection; it just gets messy, and it's not very good. There's only one place where sharing compiled bytecodes might make a difference, and that might be for, say, the static initializers, the code that you actually run to get up and running. If the interpreter is a dominant cost in getting a program up and running, it might be better to start out with precompiled code: not compiled as well as HotSpot would normally do it, but compiled better than the interpreter. That's sort of precompiled stuff, though, and I wouldn't even characterize it in the same way; we might look at that.
Another thing we decided at the outset: we are not going to have any kind of shared read/write buffer. Not a shared read/write buffer of loaded class information, not a shared read/write buffer of compiled code information, not a shared read/write buffer of anything, because you know what happens when you have a shared read/write buffer of something: some other app can make you crash. We do not want that to happen, so that's just not a design point.

Now, to give the status of the shared generation: this code that I'm talking about is in Mac OS X. We shipped it on March 24th; you're getting it already if you're using Java. As for Merlin: we talked to Sun about a year ago and said, you guys really ought to do something about sharing, because that's what scalability means for the client. And they said, well, the way you do this is you file a little... I can't remember, a JSR; you put a feature request into Merlin through the community process and stuff. So we sponsored one of those, and it's a feature request in Merlin, which is their code name for JDK 1.4. More than that, we talked with those folks (the VM teams know each other) and said, have you guys thought about doing this, and what about that, and stuff like that. We've worked with them, testing our design out with them as we developed this thing, and we've provided this code back to Sun so that they can use it for their implementation of that little feature request.
As for current status, well, you'd have to talk to Larry about the current status of that. We've had very positive interactions with Sun on this work. It comes about in two ways. With WebObjects, for example: with WebObjects stress testing, they ran into some bugs, and we chased them down and went, hmm, this is a bug in what we call portable code. So we call up our friends across the street and say, did you know about this? And they go, hmm, no, we didn't. So we're actually feeding bug fixes through those indirect channels, making the HotSpot that Sun ships better for everybody. And of course they've given us feedback on approaches to take when we've run into problems, so the feedback goes both ways.

I'd like to spend the last section of this talk (well, not the last section, that's for Q&A, but the last section before that) on what's in Developer Preview 1, which you're going to be getting either today, tomorrow, or the next day; before Friday.
The JVM in Java DP1 has basically got about two fixes since we shipped it in Mac OS X; I just talked about one of them, actually. The WebObjects stress testing showed us two things once they started kicking it, and we've upped our mean time to failure to at least days. I'm not sure exactly; we don't know of a failure right now, but it used to run in terms of hours: after about 24, then 48, hours of continuous hammering, a bug would show up, and that's the bug I alluded to that we figured out with Sun's help. The other thing that was not quite right in Mac OS X GM was that debugging was really slow (painfully slow) and profiling didn't work, and that's kind of bad. So in DP1 we fixed both problems: we fixed profiling, and we fixed the speed of debugging. The way we did that was we took HotSpot 2.0 from the 1.3.1 technology train and packaged it as an extra VM, sitting somewhere in that little implementation space I told you about. So there are actually two HotSpots in DP1: the one that's configured for normal use, and the one that is secretly utilized whenever you do debugging or profiling.

Now why would we do that? Where did we get that VM from? Well, obviously we're working on 1.3.1, right? We wanted to get 1.3.1 out to you in some ways, especially for debugging and profiling, because we think those are really important. The benefits from HotSpot 2.0: again, it's the client compiler technology from Sun (they also have a server compiler). Debugging, as I said, now works fast; profiling works, period, where before it didn't work at all. HotSpot 2.0 is actually, you know, a next generation of the next-generation stuff, and it has a register allocator technology in there that we can use right away, and we do use it, so we get better register allocation when we're compiling. It also foreshadows the compiler they're working on for 1.4, which does even better code gen, so we're prepping ourselves for getting on board with the 1.4 work.
But we didn't stop there. I mean, this is Apple, right? I want you guys to come to expect more from us than just what you can read on the web pages at Sun. So what we've done since we shipped Mac OS X GM: we put some smarts in to recognize when you're on a G4. Now, what could you do differently on a G4? Well, a G4 comes with this thing called a Velocity Engine. What's a Velocity Engine? You're supposed to do graphics with that, right? Well, it's a special processing unit for doing highly pipelined graphics operations in a high-speed way, and to do pipelined graphics operations in a high-speed way, you've got to read memory like mad off the bus. And if you can read memory like mad off the bus, you can use it for simple things like copying memory, can't you? So we put a copy-memory implementation in there that, on G4s, uses AltiVec, and it is dramatically faster than any kind of C loop or assembly loop you can write for PowerPC yourself. So we have that, and it's in our 1.3.1 version of HotSpot.
We put in an optimized instanceof; I mean, this is just an example of lots of little things we do for you that you guys will never hear about, where all you're ever going to see is that it improves your run time. When you do instanceof, you'd typically think: well, how would you do it? If it's not this class, I go look at the parent class, then go look at that parent's class, and so on. Well, we put a table in there such that instanceof is a constant-speed operation. So it's fast.
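The trick is roughly the classic "display" of supertypes. Here's a toy Java sketch of the idea (assumed mechanics; the real tables live inside the VM and also handle interfaces and arrays):

    // Toy constant-time subtype test: each type carries an array of its
    // ancestors indexed by depth, so a check becomes one array load and
    // one compare instead of a walk up the parent chain.
    final class TypeInfo {
        final TypeInfo[] ancestors; // ancestors[i] = supertype at depth i
        final int depth;

        TypeInfo(TypeInfo parent) {
            depth = (parent == null) ? 0 : parent.depth + 1;
            ancestors = new TypeInfo[depth + 1];
            if (parent != null) {
                System.arraycopy(parent.ancestors, 0, ancestors, 0, depth);
            }
            ancestors[depth] = this;
        }

        boolean isSubtypeOf(TypeInfo t) {
            return t.depth < ancestors.length && ancestors[t.depth] == t;
        }
    }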
We put in even better register allocation than what came with 1.3.1; 1.3.1 still doesn't deal with floating-point registers very well, so now we have a better floating-point register allocation method for when you're doing those graphics operations. Sharing is not in this little profiling-and-debugging-use-only VM; we actually know how to make startup times even faster, but since that's part of sharing, it's also not in there yet. And 1.3.1 also has a technology known as per-thread allocation pools. Remember that very fast Eden technology I talked about, where you bump the pointer, but the reassignment of the memory pointer was that compare-and-swap instruction? Well, with a per-thread allocation pool you don't even need the compare-and-swap. So it really is just about three instructions to allocate memory, instead of a "stall the processor and check with the other CPUs next to you to make sure they're not using this memory" kind of instruction. It's going to be really, really fast.
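Sketched in the same toy style as the Eden example above (assumed mechanics): each thread carves a private chunk out of the shared Eden up front, and then allocation within that chunk needs no atomic instruction at all.

    // Toy per-thread allocation pool: one compare-and-swap buys a whole
    // private chunk, then each object is a plain pointer bump, with no
    // CAS and no processor stall per allocation.
    class ThreadAllocationPool {
        private static final long CHUNK = 64 * 1024; // chunk size assumed
        private final Eden eden; // the shared Eden from the earlier sketch
        private long top;        // private to one thread: plain math is safe
        private long limit;

        ThreadAllocationPool(Eden eden) { this.eden = eden; refill(); }

        long allocate(long size) {
            if (top + size > limit) refill();
            long result = top;
            top += size;         // just a bump: about three instructions
            return result;
        }

        private void refill() {
            top = eden.allocate(CHUNK); // the only synchronized step
            limit = top + CHUNK;        // (toy code ignores a full Eden)
        }
    }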
I put this slide up because we're starting this kind of beta-train thing with DP1, and I want you guys to play with it. If you want to use it casually, use it via the command line, like this: you say java -hs1_3_1, and -hs1_3_1 will run this new version of HotSpot on whatever program you throw at it. If you really like it, try using it all the time. There's a symlink (I'll let you go explore) under the JavaVM framework that points to the version of HotSpot that actually gets used; it's the libjvm.dylib symlink. You'll find an HS 1.3.1 dylib somewhere in there; if you slam the libjvm.dylib symlink to point to it, you'll get HotSpot 1.3.1 all the time. Tell us about it.
So when is this thing going to be available? I wish I knew. No, it's going to be coming real soon now. To get to it, you sign up at developer.apple.com (you're all developers, you're here, right?), then you go to connect.apple.com and you download it. When you download it, what does it do? It preserves your existing 1.3 implementation: 1.3 is a subdirectory under the Java frameworks, so it pushes it aside, in case you don't like what you got. It preserves what we find under Java home that we think you've augmented, specifically the stuff that's in lib, including your extensions; you know, any third-party stuff, even QuickTime is in there, right? Stuff we ship gets packaged up in extensions too, so we preserve everything that's in extensions, because we actually put some other stuff in there, and we preserve everything we find in Java home's bin. That's the main motivation for that first set of slides telling you which stuff we consider our implementation and which stuff we think you should extend: we do need to upgrade (you want us to upgrade), so we've got to agree on some rules about the stuff we can upgrade and the stuff we shouldn't.

There is a mailing list, java-dev, that you can go to. As I said, the page where I pulled the JDirect example from, developer.apple.com/java, has a section that talks about the java-dev mailing list and how to sign up. Members of our extended Java team read it and respond to it; we've found it very useful, and we appreciate your comments there.
from that so a quick road map the first
one wrapping MacOS api's and beans if
you went to Steve narrows one you saw
Steve Llewellyn Steve one is I'm proud
to say works for me and does I've
empowered him to go do great stuff
making more Java happen at Apple and so
he came up with some great api's the
stuff you saw there our API is there
beans you can use them inside jbuilder
to add that kind of technology to your
apps so find out all about it by going
to the section 502 that's today at five
o'clock java development tools steve now
i've talked about that's tomorrow at
10:30 java performance performance is
critical to us so we have a whole
session on how how you can add
performance to your programs how you can
discover it things to avoid things to do
part of the java development tools to
talk is the optimizer demonstration and
jbuilder debugging and PBS or a project
builder debugging and if if that's not
enough if you really I put the J builder
reference up here as well because J
builder is just an awesome tool for
building pure job applications ah that's
about it ah how about that there is the
feedback forum as well on Friday at
10:30 that should've been on the first
one so please come tell us what you like
what you don't like and give us
suggestions as to what you'd like to see
even better Allen Samuel is the contact
he was the guy that introduced Steve
Nara find him as Blucher one at Apple
calm