WWDC2004 Session 428
Transcript
Kind: captions
Language: en
Hello, welcome to Maximizing Java Virtual Machine Performance. This session will be given by three of us: myself, Victor Hernandez, and Roger Hoover from the Java Virtual Machine team, and also Christy Warren, who is responsible for responsiveness in Tiger.

What we'll be talking about is the HotSpot virtual machine in Mac OS X that is used to actually execute your Java applications. This HotSpot Java virtual machine comes from Sun; we take that source, tailor it for Mac OS X, and optimize it specifically for PowerPC. Currently we are supporting J2SE 1.4.2, and as announced yesterday, we will be supporting J2SE 5.0. Throughout this talk we'll be referring to it as Java 1.5.0, which is what we all know it as, but the official name is J2SE 5.0.
With the Java virtual machine on Mac OS X you get a variety of features: a client just-in-time compiler, a variety of garbage collection algorithms, an implementation of class data sharing — innovated by Apple starting in Java 1.3 — native G5 support, a JDK whose classes are optimized specifically for Mac OS X, and finally the debugger and profiler interfaces, JVMDI and JVMPI, which development tools can use to analyze your application. Oh, what's going on? This thing is a little confused... sorry, I wasn't looking at the slides.
Now, with 1.5 there are a whole bunch of new features. Specifically, you will be getting new language features in the Java language that will simplify a lot of development. There's also a new client compiler feature called safepoint polling, startup time should be a bit faster, and there will be explicit concurrency exposed in the Java APIs.

Also — we're really excited about this one — the class data sharing implementation from Apple has been adopted by Sun and will be available on all their platforms in Java 1.5. They've not only taken our source, they've also optimized it themselves, so what you will be seeing are improvements to our initial sharing implementation. Finally, there will be a new tools interface, replacing the debugger and profiler interfaces, which have been deprecated.

Java 1.5 is available for you today. It is equivalent to the beta 2 version that is presently being previewed by Sun. It installs on the Tiger preview CDs or DVD that you got at the conference this week, and you can go ahead and download it from connect.apple.com.
Today's talk will be divided into three parts. First, Roger Hoover will discuss what's new in Java 1.5; in the second part, I will discuss how the HotSpot virtual machine optimizes your application; and finally, Christy Warren will introduce a very exciting new Mac OS X application for profiling your Java applications. So here's Roger.
Hello. I'm going to give you a brief overview of the new, pretty exciting things that are in 1.5. I was over at JavaOne a little bit earlier, and they've got whole talks for each of my slides, so this is going to be very high-level and very quick. I'm doing this to get you interested in the new things in 1.5 if you haven't seen them before, and also to point you to where to find more information about the pieces you're interested in.

These are some of the biggest changes in the history of Java. There are lots of language changes, and I'll go into those in the coming slides. There are also a bunch of library and runtime changes, which are also pretty interesting. Note the blue bubble with the JSR numbers in it: the Java Community Process has these Java Specification Request numbers that correspond with the specifications for this new stuff. If there's something you want more information on, remember the number in the blue bubble and you'll be able to look it up with a URL that I'll have at the end.
OK, why change the language? Well, there are some great things in here that I think are going to give a lot of improvements in productivity. Most of these changes are handled by javac, but there are a few things that touch the VM. There's a flag, -source 1.5, for javac that turns these things on. It was not the default in earlier betas, but as of beta 2 — the build we're giving you — it is the default. There's a new keyword, enum; if you used enum as an identifier in your old code, you're going to have to say -source 1.4 to turn these things off in order to compile it.
Let's look at some of these great features. The first one is autoboxing and auto-unboxing. Well, what's boxing? I consider a primitive type like int or boolean to be an unboxed type, and capital-I Integer and capital-B Boolean to be boxed types — the primitive value is inside an object box. With 1.4, if you used the boxed types and the unboxed types together, you had to do lots of conversions; in this example you are all the time creating new containers for things. With 1.5 the compiler does this for you and type-checks it, and the code is much more readable and simple. This is excellent for anybody dealing with these types.
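The slide's code isn't preserved in the captions, so here is a minimal sketch of the difference being described; the class and method names are my own, not the slide's.

```java
import java.util.ArrayList;
import java.util.List;

public class BoxingDemo {
    // J2SE 1.4 style: box and unbox by hand, cast on the way out.
    static int oldStyleSum() {
        List list = new ArrayList();
        list.add(new Integer(40));                       // box by hand
        list.add(new Integer(2));
        int total = 0;
        for (int i = 0; i < list.size(); i++) {
            total += ((Integer) list.get(i)).intValue(); // cast + unbox by hand
        }
        return total;
    }

    // Java 1.5 style: the compiler inserts the conversions and type-checks them.
    static int newStyleSum() {
        List<Integer> list = new ArrayList<Integer>();
        list.add(40);              // autoboxed to Integer
        list.add(2);
        int total = 0;
        for (int i = 0; i < list.size(); i++) {
            total += list.get(i);  // auto-unboxed to int
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(oldStyleSum() + " " + newStyleSum()); // prints 42 42
    }
}
```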
Generics. People have talked about doing generic types in Java for a while, and it's finally here. Before, you had a dilemma in Java: either write a very general type using Object, and then do all these casts in and out, worrying about the type checking yourself and getting runtime cast exceptions when you did it wrong; or make something very specific and not be able to reuse it. Now you can write code with parameters for the types that are embedded in the data type, and get both the reusability and the safe type checking from the compiler, which then generates the casts. This is for object types only; you can't use primitive types as the parameters.

Here's an example of Pair that takes arbitrary left and right types, capital L and capital R, with a constructor and public accessor functions for looking inside. On the next-to-last line at the bottom, we're creating a new Pair of Integer and String — and note that the 17 in that line is going to be autoboxed for you, so these features interact. And, as a teaser, in the last line we can now do C-style printing; I'll get to that in a second, but that's where we're calling the accessor functions to pull the values out.

For those of you who know C++, here's equivalent code that does exactly the same thing. The major difference is that C++ compilers typically instantiate the template for every instance, and thus you can use primitive types, which you can't do in Java — so here we're using int and char *.
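The Pair code from the slide isn't in the captions; this is a plausible reconstruction of it — the names Pair, getLeft, and getRight are my assumption, not necessarily the slide's.

```java
public class Pair<L, R> {
    private final L left;
    private final R right;

    public Pair(L left, R right) {
        this.left = left;
        this.right = right;
    }

    // Public accessors for looking inside; the compiler generates the casts.
    public L getLeft()  { return left; }
    public R getRight() { return right; }

    public static void main(String[] args) {
        // The 17 here is autoboxed to Integer for us -- the features interact.
        Pair<Integer, String> p = new Pair<Integer, String>(17, "seventeen");
        // And the teaser: C-style printing via printf.
        System.out.printf("%d is %s%n", p.getLeft(), p.getRight());
    }
}
```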
Another feature: static import. Why static import? Well, there are two reasons why you'd want this. One is that it eliminates a binary compatibility issue with importing an entire interface — you're just pulling out the static methods and fields of another class, so it's simpler for the compiler. But probably the main reason you want to use this is that it simplifies naming, because you can actually import those names into the current namespace. In 1.4, if you were going to use this MyMath class that has a constant PI and a method times, you had to be saying MyMath-dot-this and MyMath-dot-that all the time. In 1.5, if you do an import static, you can use PI and times without qualification. Simpler code; the compiler does all the work to make it right.
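The slide's MyMath class isn't reproduced here, so this sketch uses the real java.lang.Math instead (static import can't pull from the unnamed package, so a self-contained example needs a packaged class anyway); the circumference method is my own illustration.

```java
import static java.lang.Math.PI;
import static java.lang.Math.max;

public class StaticImportDemo {
    // In 1.4 you had to write Math.PI and Math.max(...) everywhere;
    // with static import the names live in the current namespace.
    static double circumference(double r) {
        return 2 * PI * r;
    }

    public static void main(String[] args) {
        System.out.println(max(circumference(1.0), 6.0));
    }
}
```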
The for-loop has been enhanced. Instead of having to specify the induction variable in the loop, if you're using arrays or the new java.lang.Iterable type you can simply say for (Type variable : collection). Here's an example of a string concatenation in the old style, and here's what you can write now: you simply name the variable that represents each element of the array — the piece of data you're interested in — and use it in the loop. Again, simpler code; the compiler does the work for you.
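The string-concatenation example from the slide can be sketched roughly like this; the method names are mine.

```java
public class ForEachDemo {
    // Old style: explicit induction variable.
    static String joinOld(String[] words) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            sb.append(words[i]);
        }
        return sb.toString();
    }

    // 1.5 style: name the element and let the compiler drive the loop.
    // Works on arrays and on anything implementing java.lang.Iterable.
    static String joinNew(String[] words) {
        StringBuilder sb = new StringBuilder();
        for (String w : words) {
            sb.append(w);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] words = { "a", "b", "c" };
        System.out.println(joinOld(words) + " " + joinNew(words)); // prints abc abc
    }
}
```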
I showed you the printf example before; that's enabled by variable-arity methods. Here's an example: you can say Type name... in a method definition, and the compiler automatically converts the arguments into an array, so you simply use the parameter as an array. This function here chooses the maximum of a bunch of strings that you give it — and you can pass any number of strings to it.
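A minimal sketch of a variable-arity method in the spirit of the slide's "maximum of a bunch of strings" example; the name longest is my own.

```java
public class VarargsDemo {
    // The String... parameter arrives inside the method as a String[].
    static String longest(String... words) {
        String best = "";
        for (String w : words) {
            if (w.length() > best.length()) best = w;
        }
        return best;
    }

    public static void main(String[] args) {
        // Any number of arguments works -- the same mechanism that
        // makes printf-style formatting possible.
        System.out.printf("longest: %s%n", longest("fig", "banana", "kiwi"));
    }
}
```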
Enumerations. This is similar to the enum type in C: you specify a bunch of constants that get instantiated by the compiler, and you can use them as such — in this case MyColor.YELLOW picks out an individual one. This also interacts with static import: if you import this stuff, you can just say YELLOW and RED. But there's a lot more to enumerations than simply a list of constants; you can actually have methods inside enumerations. Here's an example where I've taken the color and defined another enumeration, Fruit, which has a method that does a switch on the type of fruit and returns its color. Note that we were able to say RED and YELLOW because we did a static import; the third one, ORANGE, is actually both a fruit and a color, and if I had just said ORANGE there instead of MyColor.ORANGE, the compiler would have complained that it was ambiguous. So we can go on, in another file, import both color and fruit, and do computations on them — here I'm doing APPLE.myColor(), which will return the color of an apple. And because the compiler keeps these things as unique instances, you can do == on enumeration types and it gives you equality.
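A reconstruction of the color-and-fruit example just described; the static imports from the slide are left out so the sketch stays self-contained, which is why MyColor is written with qualification here.

```java
enum MyColor { RED, YELLOW, ORANGE }

enum Fruit {
    APPLE, BANANA, ORANGE;

    // Enums can carry methods, not just a list of constants.
    MyColor myColor() {
        switch (this) {
            case APPLE:  return MyColor.RED;
            case BANANA: return MyColor.YELLOW;
            default:     return MyColor.ORANGE; // the ambiguous one
        }
    }
}

public class EnumDemo {
    public static void main(String[] args) {
        // Constants are unique instances, so == is a real equality test.
        System.out.println(Fruit.APPLE.myColor() == MyColor.RED); // prints true
    }
}
```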
Another big thing is metadata. This is currently not hooked in with the Spotlight metadata you heard about yesterday — although we'd love to do that at some point in the future — but this is metadata inside the Java program that allows you to add additional information into your program. There are three parts that make this work: declarations, which say what you're going to keep track of; annotations, which say where you use it, instantiating it inside the program; and runtime access — you can write programs that actually look at this metadata via reflection. The good thing about this is that it eliminates encoding data into flag classes, like the JAX-RPC stuff did just to indicate that these are special functions. You can do this more cleanly with metadata, and it doesn't have the compiler implications of being dependent upon other classes. This will be used for programming and documentation tools; I suspect we'll see a lot of neat stuff that uses metadata.
So how does it work? A metadata declaration is similar to an interface declaration, except you say @interface instead of interface. It has a bunch of members — value is going to be special, and I'll talk about that in a minute — and you can give default values. Here's an example: we've got a bunch of bugs in our code, so we define metadata called FixMe that has a value, the problem that needs to be fixed, and a reward, what the programmer gets for fixing it; the default is going to be a cookie, which comes from the enumeration Reward. Then I have a second one, just called Debug since we want to use it while we're debugging, that has no members — and that's special too; I'll talk about it in the next slide.
OK, so once we have the declaration, we need an annotation in order to place it, and we can place annotations before any declaration in our Java program. There's a special way, with a file, that we can put a package annotation — because Java doesn't really have an explicit declaration of a package in code — and we can also put these in front of enum constants.

So what do they look like? Here are several in a piece of code. I've got this class called PerpetualMotion, and you'll note that it is preceded by a FixMe annotation that says there's no such thing, and you get a holiday if you fix it. Inside, we have a method sum that also has a FixMe annotation, and this time I just have a string: why does sum subtract? If you don't specify a member name, it assumes value — that's what's special about value — and I didn't have to say what the reward was because there's a default; it'll use cookie as the default reward. Finally, notice the Debug annotation on the last line: since Debug has no members, I don't have to write the open-close parens. So I can simply, cleanly put these where I need them.
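The two slides just described can be reconstructed roughly as follows — the declarations and the annotated PerpetualMotion class together. The exact member names are my reading of the captions, not a verbatim copy.

```java
// The enumeration the default reward comes from.
enum Reward { COOKIE, HOLIDAY }

// A metadata declaration: like an interface, but with @interface.
@interface FixMe {
    String value();                         // the problem to be fixed
    Reward reward() default Reward.COOKIE;  // default if unspecified
}

@interface Debug { }                        // no members: a marker annotation

// Annotations can precede any declaration.
@FixMe(value = "No such thing", reward = Reward.HOLIDAY)
class PerpetualMotion {
    // A bare string fills the member named 'value'; reward defaults to COOKIE.
    @FixMe("Why does sum subtract?")
    int sum(int a, int b) {
        return a - b;   // the bug the annotation is complaining about
    }

    @Debug              // no members, so no parentheses needed
    void run() { }
}
```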
So how do I use these things? I can write tools that actually look at the source to use them — because otherwise the metadata just gets compiled out — but I can also look at them at runtime via reflection. There's another special annotation, @Retention, that is used to tell the compiler to retain the annotation and put it in the class file so reflection can find it. So in my previous declaration of FixMe, if I said @Retention blah-blah-blah, it would remember this stuff for runtime access. In this example, I'm using reflection to get at the Method that corresponds with sum; I do a getAnnotations(), which gets me back an array of these annotations, and then I use the enhanced for-loop to print them out on the screen. So this is how you'd write tools to use the metadata information.
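The retention-plus-reflection step can be sketched like this; the class name AnnotationReader is my own, and FixMe is trimmed to a single member to keep the sketch short.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;

// RUNTIME retention tells the compiler to keep the annotation in the
// class file, where reflection can see it.
@Retention(RetentionPolicy.RUNTIME)
@interface FixMe {
    String value();
}

public class AnnotationReader {
    @FixMe("Why does sum subtract?")
    int sum(int a, int b) { return a - b; }

    public static void main(String[] args) throws Exception {
        // Get the Method that corresponds with sum...
        Method m = AnnotationReader.class.getDeclaredMethod(
                "sum", int.class, int.class);
        // ...getAnnotations() hands back an array, and the enhanced
        // for-loop prints it out.
        for (Annotation a : m.getAnnotations()) {
            System.out.println(a);
        }
    }
}
```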
For people who write concurrent programs with multiple threads, there are some great new classes; I'm going to kind of look at these inside-out. java.util.concurrent.atomic exposes an API for single-access atomic operations on individual variables. On top of that, there are locks built on those, completely independent of Java's internal synchronization. And then there's java.util.concurrent, which has a bunch of pretty useful classes built on all of this. This was brought to a JSR by Doug Lea, whom some of you know of.

Here's a brief overview of some of the things you can do. For threads, there's an Executor interface that gives you fairly convenient thread pools without lots of work. There are lots of different kinds of queues. There's nanoTime, for nanosecond clock time within a given JVM, which performance people are going to love. There are lots of synchronization primitives that match the literature better than what's in Java, so you can implement published algorithms more easily, and there's concurrent access to various kinds of collections. This is great stuff.
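As a taste of the "convenient thread pools without lots of work" point, here is a minimal sketch using the java.util.concurrent API; the task (summing squares) is just an illustration of mine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PoolDemo {
    static int sumOfSquares(int n) throws Exception {
        // A fixed-size thread pool without writing any pooling code.
        ExecutorService pool = Executors.newFixedThreadPool(4);

        List<Future<Integer>> results = new ArrayList<Future<Integer>>();
        for (int i = 1; i <= n; i++) {
            final int k = i;
            results.add(pool.submit(new Callable<Integer>() {
                public Integer call() { return k * k; }
            }));
        }

        int total = 0;
        for (Future<Integer> f : results) {
            total += f.get();   // blocks until that task is done
        }
        pool.shutdown();
        return total;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sumOfSquares(10));   // 1 + 4 + ... + 100 = 385
    }
}
```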
Also in terms of multithreaded programming: a new Java memory model. It says what to expect with a multithreaded program accessing shared storage. There was a specification in the original thread specification for Java, but it was widely ignored, because it says you can't do things that were widely done by optimizing compilers, as well as by processors like the PowerPC that reorder instructions. What this does is present a realistic model of what can actually happen. Among other things, it also guarantees that when you build a new object, the final fields are set inside the constructor, so you don't see an intermediate state.

Practically speaking, what does this mean? Well, we're going to make the Apple 1.5 JVM obey the Java memory model, so you'll be able to count on it. In particular, you really have to use synchronization any time you're mucking with shared storage — either the Java synchronized statement or java.util.concurrent. Also, if you're doing a multithreaded thing where one thread sets up a bunch of data and then a bunch of other threads take off and start working on it, the variable that says the setup is done has to be volatile. The write of that volatile by one thread releases all of the writes that were done before it, and the read of that volatile in the other threads acquires all of that stuff. This is the standard way you should do thread communication via shared variables: they've got to be volatile.
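The setup-then-hand-off pattern just described can be sketched as follows (with one reader thread for brevity); the names are mine.

```java
public class Handoff {
    static int[] data = new int[1000];
    // Without volatile, the writer's stores to 'data' may become visible
    // to the reader out of order on a weakly ordered machine like the G5.
    static volatile boolean done = false;

    static int handoff() throws InterruptedException {
        Thread writer = new Thread() {
            public void run() {
                for (int i = 0; i < data.length; i++) data[i] = i;
                done = true;   // the volatile write releases everything before it
            }
        };
        writer.start();

        while (!done) { Thread.yield(); }  // the volatile read acquires it all

        int sum = 0;
        for (int d : data) sum += d;
        return sum;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(handoff());   // 0 + 1 + ... + 999 = 499500
    }
}
```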
Otherwise, don't be surprised if things happen out of order. In particular, things that you've tested and debugged on a multiprocessor G4 are most likely going to fail at some point when you run them on a G5, if you haven't followed the rules.

There's a new tools interface that replaces JVMPI and JVMDI, which are now deprecated and going away, presumably in the next version of Java. It has a slew of functionality that lets you implement these tools, plus hopefully a whole lot more. Basically, agents plug into the JVM; they're C or C++ programs, or at least a piece of your program is. They register which callbacks they want and are then notified, and there's also a whole bunch of functions with which they can query the VM. It'll be exciting to see what comes of that in the coming months and years.
There's also a new monitoring and management interface, basically designed so you can do things like load balancing in server environments. You can look at memory usage in classes and thread information inside the JVM, and at things like the number of processors and CPU utilization in the OS.

Finally, there are lots more things that I don't have time to talk about; here's a list of some more you may be interested in. Note the last two lines are URLs: in the next-to-last one, fill in the JSR number and you'll find where to get the specs for that particular JSR on Sun's site, and the last line also has some great information.

Thank you, Roger.
So, I will be going over some implementation details of the HotSpot virtual machine that should give you a better idea of how your application can be better optimized when running specifically on Mac OS X. What do you get with the HotSpot Java virtual machine? You get a client just-in-time compiler, a variety of garbage collection algorithms, automatic G5 optimization, and the sharing of class data between JVM instances when you have multiple Java applications running on the same machine. You also get the tuned JRE implementation that we have done for Mac OS X. So let's talk a little about the client compiler.
The client compiler dynamically compiles your application's hot methods: after you've called a particular method a certain number of times, we stop executing it in the interpreter and continue execution in a JIT-compiled version of that method. We have specifically optimized the client compiler from Sun for PowerPC: that means we've come up with optimal code sequences for each Java bytecode, and we've also figured out how to make the best use of the full PowerPC register set.

The client compiler has a bunch of object-oriented optimizations that it performs on your methods. In particular, for example, there are instanceof and checkcast: their implementations require knowledge of the full class hierarchy — knowing who is a subclass of whom — and internally HotSpot keeps track of all that information, making the implementation of instanceof and checkcast as fast as possible.
Another object-oriented optimization is an inline cache for virtual methods. It turns out that most times you call a virtual method, even though there might be multiple implementations of that virtual method loaded, you're probably still ending up at the same target. So we cache the most recent target and try that one first; if it doesn't work, we have a way of rolling it back, but it's a pretty good bet. The other thing we're taking advantage of is the ability to inline your Java methods. This has actually been one of the biggest areas of performance improvement we've been able to make, so I'm going to go into it in more detail.
So what exactly is inlining? It's pretty straightforward. In the example I have up on the slides, you've got average calling sum; there's extra overhead in actually calling the function sum, so ideally you'd want a situation where the body of sum is inlined right into the body of average. You don't want to do that by hand in your own code, because it makes the code harder to read and maintain, so we do it for you dynamically. In what situations can we do it? Well, there are a few opportunities that are very straightforward. One is field accessor methods — there's no reason to access fields directly by name when you can use methods to do it — and another is the constructors in your Java classes.
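The accessor point deserves a concrete illustration; this Account class is my own sketch, not the slide's code.

```java
public class Account {
    private int balance;

    // A trivial field accessor like this is inlined by the JIT, so it
    // ends up costing the same as reading the field directly -- there is
    // no performance reason to avoid accessors.
    public int getBalance() { return balance; }

    // Small mutators inline into their hot callers the same way.
    public int deposit(int amount) {
        balance += amount;
        return balance;
    }

    public static void main(String[] args) {
        Account a = new Account();
        for (int i = 0; i < 1000; i++) {
            a.deposit(1);   // after inlining, no call overhead in this loop
        }
        System.out.println(a.getBalance()); // prints 1000
    }
}
```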
We're also able to inline a bunch of intrinsics. Intrinsic methods are methods where we don't even need to look at the implementation of the bytecodes: we know what the behavior of the method is, and we go ahead and execute it as the optimal PowerPC code sequence — we know how to do arraycopy, for example. This basically applies to JDK classes, and we keep adding methods as it either becomes possible to come up with an optimal code sequence for them or they're used more heavily in the JDK; there are examples of a bunch of inlined intrinsic methods on the slide.

But this doesn't cover a huge set of methods that are used in your applications, and that's virtual methods.
As I said before, it's possible that the target of an invokevirtual bytecode is not always the exact same method; it depends on which implementations of that virtual method have actually been loaded. But it turns out that we are able to inline those too, if we know that only one implementation of that virtual method has been loaded — in which case the method can be considered monomorphic. In most usage patterns this is actually the case, so we're hitting a large percentage of invokevirtuals with this optimization.

There are, however, limitations. Since we're able to inline so many methods, the limiting factor is no longer finding opportunities to inline, but the size of the compiled method: if your hot method happens to be really large, it might not be able to inline the methods it calls, or might not be able to be inlined by its callers. You also need to be aware that we are unable to inline methods that are synchronized, and also methods that have exception handlers — that's a limitation of the client compiler. And I want to leave you with a tip: in previous versions of Java it was useful to use the final keyword to make a virtual method inlinable. That's not needed at all now, and you should basically be using final only for its object-oriented purpose.
OK, safepoint polling. This is a new feature in the client compiler in HotSpot 1.5. What is a safepoint? A safepoint is the state that a Java thread needs to reach for exact garbage collection; specifically, the location of all Java objects needs to be known at that point. A compiled method reaches a safepoint in Java 1.4.2 as follows: you have a Java thread executing through compiled code, and the virtual machine has to suspend that thread, make a copy of the code, insert traps at the pre-designated locations, and then continue executing at the previous location in the copy until it actually hits one of the traps, at which point the virtual machine can take over — that's the safepoint. The fact that we're suspending threads makes this a potentially dangerous thing, and it also requires a lot of extra overhead in the compiler to keep track of the locations where traps get inserted.

This is now greatly simplified in 1.5. What we do in 1.5 is basically this: your compiled method is executing through all its instructions, and every once in a while there is an access to a safepoint page in memory that is essentially a no-op — it's not reading or writing anything that's actually useful. At some point the virtual machine decides to memory-protect that page, so the next time the code comes around to that access it actually hits a trap, and the virtual machine is able to take over. The overhead for your compiled methods to get into a state ready for garbage collection will be greatly reduced in Java 1.5 because of this.
Let's talk a little about garbage collection in HotSpot. Right now we're supporting three different garbage collection algorithms, each designed to meet the needs of different kinds of applications; there is no longer this notion of one garbage collector to meet all needs. The original garbage collector — the serial collector you're familiar with from HotSpot since Java 1.2, or before then — is still there, and it is still the default collector. But since Java 1.4 two more garbage collection algorithms have been introduced: a concurrent mark-and-sweep algorithm, designed to have shorter pause times on larger Java heaps, and the parallel scavenge algorithm, designed for higher throughput. What I recommend is that you run your application with all three, and also vary the Java heap parameters, to see where you can fine-tune your application. There is not one garbage collection algorithm that will work for everybody.
There are a few small changes to garbage collection happening in Java 1.5. The main thing is that the parallel scavenge collector will now be the default in all Mac OS X Server installations. This matches Sun's approach to garbage collection configuration for Java 1.5: they're making it so that all server installations automatically get the parallel GC, and we're applying that as well. The way they detect a server is that any dual-CPU machine with more than 2 gigabytes of memory gets classified as a server machine, and therefore gets the parallel garbage collector. This will definitely be very good for compatibility between various installations of Java 1.5; we don't want performance characteristics to suddenly be different just because you didn't happen to get parallel scavenge on our platform.

The other set of differences in Java 1.5 is that they're making it more convenient for you to configure the heap parameters for performance purposes in each garbage collection algorithm. Specifically, in parallel scavenge you can now designate a percentage of time that you want your application to spend doing GC. Under the covers, that gets translated into particular heap sizes — the permanent generation, the new size. Basically, in the past you had to kind of know how these garbage collection algorithms were implemented, and now the goal is to abstract that away for you. If you've been using -XX:+UseAdaptiveSizePolicy in Java 1.4, that also gets a few convenience flags: specifically, you can specify how long you want a pause to take, and also what ratio of your full Java application's time is spent doing garbage collection.
Finally, I want to talk about how we've optimized Java for the G5. These optimizations require absolutely no code changes on your part, and no recompilation. Basically, we've taken full advantage of the double-word registers available on the G5, and also all of the double-word instructions. This has been done in both the HotSpot interpreter and the compiler, and there should be big gains overall — but especially for those people doing arithmetic on longs, doubles, and floats, who will see a substantial improvement. There are specific reasons why: we can do casts inline, from float to integer for example; bit extractions from those values are much faster; and the square-root instruction is actually available on the G5, so we call directly into that instead of implementing it in software — as you can imagine, much faster. Synchronization has also been improved by taking advantage of the extra lightweight synchronization instruction available on the G5, so we are taking full advantage of all of the PowerPC instructions for synchronization.
How have we actually been able to measure that performance gain? Well, we've been tracking SciMark 2.0. SciMark 2.0 is a very good example of a few scientific algorithms: it does fast Fourier transform, it does Monte Carlo approximation. You might be familiar with the composite score numbers from the Java State of the Union yesterday, but there are a few points I want to make that are different from that. As you can see, these are our scores on a 1.25 GHz G4; the 98 number is definitely pretty low, and our goal with the G5 was definitely to get that number up — we expected at least twice as fast. The second bar is basically a projected G5 running just twice as fast, so its scores are exactly twice the first numbers. What we actually get on the G5 is something substantially larger, and we're pretty excited about that: it makes the composite score very competitive with scores reported on Pentium and other platforms.

But one thing I do want to point out is, for example, why Monte Carlo is still low. It turns out Monte Carlo is doing unnecessary synchronization, by attaching synchronized to the front of a method. If you remove that, the score increases dramatically — it increases dramatically on the G4 as well.
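SciMark's actual source isn't shown in the session, so this is only a hypothetical sketch of the pattern being described: a method marked synchronized even though each thread uses its own instance, so the lock buys nothing and only costs time.

```java
import java.util.Random;

public class MonteCarloPi {
    private final Random rng;

    public MonteCarloPi(long seed) { rng = new Random(seed); }

    // Imagine this were declared 'synchronized': every call would pay
    // lock overhead even though no state is shared between threads.
    // Dropping the keyword removes that cost without changing behavior
    // when each thread owns its instance.
    public double estimate(int samples) {
        int inside = 0;
        for (int i = 0; i < samples; i++) {
            double x = rng.nextDouble();
            double y = rng.nextDouble();
            if (x * x + y * y <= 1.0) inside++;
        }
        return 4.0 * inside / samples;
    }

    public static void main(String[] args) {
        System.out.println(new MonteCarloPi(42).estimate(1000000));
    }
}
```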
The reason I want to point that out is that the client compiler is unable to detect unnecessary synchronization; it therefore falls into the hands of the Java developer to do analysis on their application to see if this could be the case. And that's it.
I'd like to invite Christy to introduce Shark for Java.
Hi everyone, hope you're having a good afternoon today. I'm here to talk about Shark for Java and high-level performance analysis. How many of you have heard of Shark or used Shark? Wow, I didn't expect quite that many of you to know about this program — but now we're going to show it for Java, so I'm going to go through this first part pretty quickly.

Shark is the ultimate profiler you can get on Mac OS X. It's a really neat program; in the past it has been great for analyzing our C, C++, and Objective-C programs. It does both what's called cost and use analysis — I'll go into that a little more later. It can profile a running process, a thread, or even the entire system; for Java it's limited to just a single process. It can do time samples like other profilers, but also some things that we used to be able to do in the old Sampler program, such as allocation tracing and even exact method tracing — it can record every invocation of a method. Those two modes are a really nice addition to the usual time profiling you see in other profilers. It also does non-Java profiling, as I mentioned — time, memory, function, even low-level hardware events if your application is a real-time type of thing — and as usual you can use it to study JNI calls.

You can download this beta from developer.apple.com, and please get it, because the version of Shark on your Tiger CD does not have the Java support. We worked really hard in the last few weeks to deliver this for you guys for WWDC; I hope you enjoy it, but you have to get it off the website. The good news is it runs on both Panther and Tiger — so you can take this back to your development system as it is, use it, and just rock with it. It's awesome.
Some key features of Shark: it provides a profile view that gives you a simultaneous heavy and tree perspective — that will become clearer when I show you the demo — and we're also introducing sophisticated data mining and filtering. We also provide a chart view to visualize the execution of your program. And, especially for enterprise applications, there's a really neat feature: you can do remote profiling over a network. You run a command-line tool on your Xserve sitting in a cage somewhere, and you can talk to it from Shark via Rendezvous and control it, so you have minimal impact on your server — you can analyze Tomcat, your JSPs, whatever. You can learn more about the detailed features of Shark in its session on Friday at 3:30 p.m.
a few general principles here that motivate the data mining. What makes software slow? Probably best known is bad algorithms: you're using a bubble sort instead of a quicksort, and if your data is large, your stuff is going to go really slow. Excessive memory allocations and locking: these primitives are expensive; Victor just talked about the Monte Carlo example where the overuse of a synchronization primitive just hosed performance. Disk I/O, network calls, IPC: these are all really expensive operations compared to doing an add, so these are things you want to do as little as possible. Now, a more insidious thing that happens in software is doing the same operation more than once. Let's suppose I
write a module that quicksorts some properties I read out of a file, and say Victor has written another function that does the same quicksort. These will just show up as calls to quicksort in the profile, but it won't show the fact that two different pieces of code, in two different parts of the program, made this call to quicksort. This is a simple example of what we call complexity in software. I have a little graph of an execution trace here
of a program. The horizontal axis is time slices, or sample slices (in this case memory allocations), and the vertical axis is the call stack depth. So what we're really doing is taking slices of your program as it's running, and when you plot this you can see these interesting patterns; I'll go into that a little more in a minute. So what
do we mean by complexity? Large-scale software has multiple layers and many modules, and the bigger the system, the more of these things you get. And because we're good programmers, we hide the implementation details from our clients, so a function or method called foo could do something as simple as setting a bit in a class somewhere, or it could cause a transaction to a database, updating rows, or even result in the launch of a rocket through some I/O devices. One takes microseconds or milliseconds; the other can take minutes or hours. So innocuous-looking calls can result in crazy, complex, unexpected execution paths. So going back to this
example, which is actually the Finder Get Info dialog: we zoom in, and the patterns of repetition show up at deeper levels. At this fine level you see repeated structure, and zooming out it shows up again. This is like two layers that are both doing iteration and repetition, and they're layered. Now imagine this multiplying with five layers: imagine adding AWT, all the Sun libraries, all of your libraries, your huge application. This thing can be insane. So how do we deal with this? Well,
in analyzing performance you can break the impact of an operation into two pieces: the cost of the operation times the number of places it's used. Traditional profilers have for a long time made it easy to analyze cost: I can tell what leaf function I'm in. The hard part is understanding the patterns of usage. Did Victor and I unintentionally both quicksort the same array, when we could have just done it once and one of us accessed a cached copy? And when you introduce over-modularization, people tend to over-abstract, to design too much
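The duplicated-quicksort situation can be sketched in Java. This is a hypothetical illustration (the SortCache class and its names are invented for this sketch): two independent callers share one cached sorted copy instead of each paying for the sort.

```java
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical sketch: two independent modules both need the same array
// sorted. Instead of each one calling quicksort (paying the cost twice),
// they share one cached sorted copy, keyed by array identity.
public class SortCache {
    static int sortCount = 0;  // instrumentation for this sketch only

    private static final Map<int[], int[]> cache = new IdentityHashMap<>();

    // Returns a sorted copy of data, computing it at most once per array.
    public static int[] sorted(int[] data) {
        return cache.computeIfAbsent(data, d -> {
            sortCount++;
            int[] copy = d.clone();
            Arrays.sort(copy);  // the expensive operation we want to do once
            return copy;
        });
    }
}
```

With this, the profile shows one sort no matter how many modules ask for the sorted view of the same array.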
and you go kind of crazy with design: you get these really crazy multi-level call stacks, multi-level things that just really kill performance. So for analyzing usage, Shark provides two classes of features to help you. The first one is called call stack data mining, and in this case what you do is filter unwanted information. How many of you have profiled something and not seen a line of your code in the top profile, but rather seen all these Java libraries and system libraries? Have any of you had that problem? Yeah, I bet you have; that was the first thing that happened to me when I did a profile. Now, the other side of this is
graphical analysis. In this case you can visualize the dynamic behavior of your program, like those plots that I showed you; those are not just cute graphs to make a point, those are real data from a real program that we were able to use to find performance problems. You do this through a technique called software fingerprinting. In software fingerprinting, you recognize that if the pattern in the picture looks the same over and over again, it means you're going through the same code path, and if you're going through the same code path, you're either just doing the same thing over and over again or you're iterating over some array or other data structure. In that case you can still look at the opportunity to hoist information. In other words, in quicksort you have to do a compare function; but suppose your compare operator has to go through a whole bunch of different classes and call stacks to actually get down to the code doing the real "a is less than b". Well, that's not good: you should de-encapsulate that stuff and remove the amount of overhead to do that compare. So software fingerprinting can also identify those kinds of cases. Shark supports both approaches, so whichever one works for your application, it's
there. So, data mining concepts. To eliminate what you don't want to see, you can eliminate functions without source; this doesn't work that well in Java right now, because almost every symbol in Java indicates it has source available, so we're going to work on making that a little better. A really powerful one is exclude package. Shark calls these things libraries, because it was originally a C, C++, Objective-C tool, but you basically choose a library, say you don't want AWT in your trace, and exclude it, and it'll charge the cost of any samples found in that library to the things that call it. Exclude symbol works similarly, except it works on an individual symbol. To help you see what you do want to see, you can focus: with focus symbol you choose a particular symbol you want to look at, look at the tree rooted there, and focus in on that; it'll get rid of main, it'll get rid of everything else around it. And focus package is the same thing for all the functions within a package. So I'm just going to show
you this graphically, because that was a lot to cover. So, excluding a library: we have an example of a main program that calls an init function, a doExample, and a cleanup, and doExample in this case calls the function bar four times. Let's say bar uses java.util, and java.util uses a Hashtable. In this case, when you profile it, you're just going to see all these samples (indicated in yellow) in java.lang and java.util, and not in bar, so we don't know that we've been using bar to do this. But excluding turns bar effectively into a leaf function, and now you can see: well, I'm making four calls to bar, all computing the same thing; I don't need to do that. I can hoist it. Another operation, very similar to exclude library, is called flattening a library, and instead of making the library go away completely, it replaces the library with all of the entry points into it, so you can observe your usage of the library. And finally, focusing: if we focus on doExample, it makes main, init, and cleanup go away, and you're just left with this subtree. So these are various ways you can kind of trim the tree and see what's going on.
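The slide's example can be sketched in Java. The names here are invented to mirror the slide, not taken from real code; the sketch shows why excluding java.util exposes bar as the hot leaf, and what the hoisted fix looks like.

```java
import java.util.Hashtable;

// Hypothetical reconstruction of the slide's example: doExample calls bar
// four times with the same input, and bar's real work happens down in
// java.util. Excluding java.util in Shark charges those samples back to
// bar, exposing the four redundant calls; the fix is to hoist bar's result.
public class DoExample {
    static int barCalls = 0;  // instrumentation for this sketch only

    // Expensive: its samples land in java.util until you exclude it.
    static int bar(String key) {
        barCalls++;
        Hashtable<String, Integer> table = new Hashtable<>();
        table.put(key, key.length());
        return table.get(key);
    }

    // Before: four identical calls, four times the cost.
    static int doExampleNaive() {
        int total = 0;
        for (int i = 0; i < 4; i++) total += bar("config");
        return total;
    }

    // After: hoist the repeated computation out of the loop.
    static int doExampleHoisted() {
        int cached = bar("config");
        int total = 0;
        for (int i = 0; i < 4; i++) total += cached;
        return total;
    }
}
```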
Now let's do a demo, so please switch to the demo machine. Okay, thank you. So we have a sort of modified version of the Java2D application here, and to use Java with Shark you add an -Xrun parameter: it's using the JVMPI interface (it'll migrate to JVMTI in the future), and you add a -XrunShark argument. So with that in mind, let's run this. You get a message on the console that Java for Shark is enabled, and here is our familiar example. We added a new pane here called Bouncing Strings, and this is kind of a cooked example, in that it has some performance problems introduced that we want to go find. So let's go over
and launch Shark. Shark has all sorts of traces, but we're going to choose the Java time trace, and when you do so, you can pick the Java process. In this case we ran it from a command-line shell, so you just see "java"; it would show your application name if you made it double-clickable. So let's just start sampling. Oh, by the way, did you notice that it just paused? It does that every so often, and it's because we're garbage collecting; see, it just did it again. That's kind of odd. Let's just start sampling and let it go for a few seconds; given the sampling rate, it's probably good to sample for about ten seconds. Let's stop sampling, and we now have a typical profile view: a list of the various symbols and the percentage of samples that occurred in them. On the right here you'll see a backtrace of the calls: you can see drawString, and it goes down to BouncingStrings.paint. You click on another symbol, you see its backtrace. And one thing to help keep track of things a little better: there's a neat feature called color by library. When you click on that, look what happens: it colors all the strings, and this will help us identify everything.
AWT is colored in this red color, brown for this, and so on. We have one little problem here: the Java runtime and JVMPI aren't perfect about reporting all the symbols, so we have a method with an unknown library. We're going to use exclude library to get rid of that and attribute those samples to things that are more meaningful, and when this happens, you'll see that these percentages went up: paintStrings went up, NativeFontWrapper went up, and so on. If we look at NativeFontWrapper, we can exclude the library again, and now it pushes initializeFont up. That's interesting: initializeFont? Instead of drawing or painting strings, why are we initializing a font? This is taking up almost as much time as it's taking to draw the string. So let's take a look at
the heavy and tree view. We've been looking kind of from the bottom up, at the leaves of the execution tree; now we can look from the top down. Here's our event dispatch thread's run, it works its way down, and here's BouncingStrings.paint. So we see that this BouncingStrings.paint is an important place to look at. And here is one of the really cool features that we worked hard to get in for you guys: double-click on BouncingStrings.paint and get source. The source here is annotated with the relative sample densities in the column: about ten percent of the time is spent in fillRect, and eighty-nine percent is spent in this paintStrings function. You'll note that these things are underlined; that means you can double-click on one and navigate to the associated function, and like a plain old web browser you have a backwards and a forwards arrow. And you see here there are three areas of interest, and we found one that looks like a problem: we're calling setFont with a new Font, Lucida, so we're constructing a font every time we're painting, actually inside of a for loop. That's pretty bad, so let's go fix that. And yeah, we kind of rigged this a little bit, but I've made errors like this in programs, and I'm sure other people might have too, so
this is worth doing. I happen to have the corrected code here, just to save time, so I'm going to change this. Let's quit the app, run it again, give it a second to load, and go back to Bouncing Strings; look at that, we just about doubled its speed. So just by doing some analysis we were able to speed up a program pretty significantly.
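The bug found in the demo, and its fix, can be sketched like this. The class is hypothetical and the font name is an assumption; a counter stands in for what the profiler measured.

```java
import java.awt.Font;

// Hypothetical sketch of the demo's bug: constructing a Font inside paint
// (worse, inside paint's for loop) initializes a font on every frame.
// The fix hoists the Font into a field that is created exactly once.
public class PaintStrings {
    static int fontsCreated = 0;  // instrumentation for this sketch only

    static Font makeFont() {
        fontsCreated++;
        return new Font("Lucida Grande", Font.PLAIN, 12);  // name is an assumption
    }

    // Before: a new Font per string, per repaint.
    static void paintNaive(String[] strings) {
        for (String s : strings) {
            Font f = makeFont();  // stands in for g.setFont(new Font(...))
            // ... draw s with f ...
        }
    }

    // After: one Font for the lifetime of the component.
    private static final Font CACHED = makeFont();

    static void paintHoisted(String[] strings) {
        for (String s : strings) {
            Font f = CACHED;  // reuse; no per-frame font initialization
            // ... draw s with f ...
        }
    }
}
```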
Now let's do one more trace. Since we're feeling lucky, we made some progress, so let's go and make some more. I'm
going to do a memory trace, because this is a different kind of technique than you might be familiar with. We're going to start, and the memory trace slows it down a little bit, because we're sampling every memory allocation. Let's stop that, and now we see: wow, sixty-nine percent of our allocations occur in Component.getBounds, and if we change this column to value, we can see that in just those few seconds we allocated half a megabyte of memory. No wonder we were garbage collecting so much: we are allocating all these bounds objects. So let's look at what's up with that. If you look in the backtrace here, you'll see that the Bouncing Strings tick is doing most of it, and when we go in here, you see that we have this bounds equals getBounds being called on every tick. I've already got code in here to switch that off and cache it, which would be the obvious thing: just compute it once, since the window size doesn't change. So let's go ahead and make that change. We made a convenient little boolean here called cacheBounds, and by the way, this is not a cooked-up example; this was something that we found in the program just as it was given to me. So we're going to do that, and we're going to run it again.
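The getBounds fix can be sketched like this. The class and method names are invented for this sketch; a counter stands in for the profiler's allocation count.

```java
import java.awt.Rectangle;

// Hypothetical sketch of the demo's fix: getBounds() returns a new
// Rectangle on every call, so calling it on every animation tick creates
// garbage and drives GC. Cache the Rectangle and refresh it only when
// the component is actually resized.
public class BoundsCache {
    static int allocations = 0;  // instrumentation for this sketch only

    // Stand-in for Component.getBounds(): allocates on each call.
    static Rectangle queryBounds() {
        allocations++;
        return new Rectangle(0, 0, 640, 480);
    }

    private Rectangle cached;  // refreshed only on resize

    Rectangle bounds() {
        if (cached == null) cached = queryBounds();
        return cached;
    }

    void onResize() {      // e.g. called from a ComponentListener
        cached = null;     // invalidate; the next tick refetches
    }

    // One animation tick: the bounce logic would use bounds() here.
    void tick() {
        Rectangle r = bounds();
        // ... move the bouncing strings within r ...
    }
}
```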
Go to Bouncing Strings, and look at that: we're now at about 183, 190; I've seen this thing go over 200. So just doing simple memory optimizations, because memory allocation is so expensive, can get you huge wins in Java. I was working on an application server a few years ago where we reduced the number of allocations by 10x, and we got a 3x throughput improvement in that server. So doing just memory analysis and memory reduction is a really amazing technique. Thank you very much; I'm going to give this back to
Victor. Thank you, Christy. So I just wanted to conclude with one recommendation for optimizing for HotSpot. The main thing you need to know is exactly what your hot methods are: you can find bottlenecks in your own code, and you can identify them using Shark for Java. The other thing you need to be aware of is that even if there's nothing more to be done in your own code, you also need to make sure that the hot method is as amenable to HotSpot's optimization opportunities as possible. I want to highlight again that inlining is probably one of the biggest optimizations that we're able to do, and therefore it's in your best interest to make your hot methods inlinable. So here's a reminder of the key things that keep a method from being inlined: it being too large, it being synchronized, or it having exception handlers.
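Those three rules can be illustrated with a hypothetical sketch (all names invented): keep the hot arithmetic in a tiny, unsynchronized, handler-free method so HotSpot can inline it, and keep locking and try/catch at the edges.

```java
// Hypothetical sketch of the session's advice: HotSpot declines to inline
// methods that are too large, synchronized, or contain exception handlers.
// Keeping the hot arithmetic in its own tiny method leaves it inlinable
// even when the caller needs locking or error handling.
public class Inlinable {

    // Hard to inline: synchronized AND wrapped in an exception handler.
    static synchronized int distanceSquaredGuarded(int[] a, int[] b) {
        try {
            int dx = a[0] - b[0], dy = a[1] - b[1];
            return dx * dx + dy * dy;
        } catch (ArrayIndexOutOfBoundsException e) {
            return -1;
        }
    }

    // Inlinable: tiny, unsynchronized, no handlers; the hot core.
    static int distanceSquared(int ax, int ay, int bx, int by) {
        int dx = ax - bx, dy = ay - by;
        return dx * dx + dy * dy;
    }

    // The caller keeps locking and validation at the edges, outside the core.
    static synchronized int guardedCall(int[] a, int[] b) {
        if (a.length < 2 || b.length < 2) return -1;
        return distanceSquared(a[0], a[1], b[0], b[1]);
    }
}
```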
And the last tip I want to tell you is that we do have a Java lab here at WWDC all week, and if you want to see your application running on a Mac and you haven't done so before, do go down there; or if there's any performance or bottleneck issue you want to identify to our engineers, we're definitely able to do that there as well. So that concludes everything. I want to point out a few URLs: you can get
Java reference documentation from Apple at the ADC website; you can also get Java 1.5 documentation at the Sun website; and finally, that is the URL again where you can download the Shark that has the Java support, the one that runs on both Panther and Tiger. Finally, if you have any more questions, the people to contact are Alan Samuel, who is our Java technologies evangelist; Bob Fraser, who is our product manager; and finally the manager of all things Java at Apple.