WWDC2000 Session 182

Transcript

Kind: captions
Language: en
Good afternoon everyone, welcome to Session 182, which is Java: Getting the Best Performance. And here to begin our presentation this afternoon, please welcome Jim Laskey. Good afternoon. We're going to talk a little bit about performance. Yesterday we talked a little bit about some of the new technology that is being introduced on Mac OS X, in particular HotSpot and some of Java 2. We're going to focus on different aspects of these two technologies, but we're trying to zero in on performance. We're going to talk about how we've made improvements to performance in the Java VM, and we're going to give you some hints or ideas about how you can modify your code to gain performance. We've broken the talk up into three parts, because we wanted to cover three different categories of things. The first part will be done by Ivan Posva of the VM team, who's going to talk about the VM, the new memory management, and thread synchronization. Then I'll come back for part two and talk about code execution and how we boosted the performance of code. Finally, John Burkey will come up and discuss how we can get some performance out of the AWT and Swing and the new Java classes. So I guess I'll get Ivan to come up now.
[Applause]
Well, thanks Jim for the introduction. So let's look first at what the factors of Java performance are. First of all, it's the design of your application — that's the most important part. Second is the speed of byte code execution inside the VM, the speed at which we execute the byte codes of your program. Next — and that's what I will focus on — is the speed of VM operation: the speed of class loading, garbage collection, threading, synchronization, and the associated libraries. And last is the speed of the hardware you're running on and the underlying operating system. There's not much we can do about that, but we are pressing on the kernel team in that area. So the two main attractions of the Java language are automatic memory management and language-level support for threading and synchronization. Unfortunately, these two points are also associated with having the most negative impact on performance, and we are here to clean up some of those misconceptions and show you what you can do to give the VM hints in that area.
So first I will deal with memory management. In the HotSpot VM we have direct object references to objects — we don't go through handles as the classic VM does. Thus we have C-speed access to fields of instances, and we also have basically C-speed access to static fields in your classes. We also have a low per-object overhead: two words per object reserved for use by the VM. When you compare that to the classic VM, which has three words per object, it doesn't seem significant, but this is actually where we get some of the memory advantages, because Java tends to allocate a lot of small objects. Studies have shown that for the mtrt benchmark in the SPEC JVM98 suite, each additional word in the object header costs you twenty percent more memory; for javac this additional cost is twelve percent. So the lower the per-object overhead, the less memory you use, and it can be a considerable gain.
The HotSpot VM uses a generational copying garbage collector, and on this slide I will dive into a bit more detail. It is accurate: the garbage collector is accurate, which means that at all times in the executing Java program we know where we have live references to objects. We are not conservative when we walk the stack, so we know which words on the stack are actual integers, even though they may look like object references, and we can collect those objects. In contrast to conservative collectors, this can really make a big difference in memory usage, in the sense that you're not keeping alive objects that only appear to be referenced — conservative collectors can actually leak in that area. It is generational, which means — since the majority of objects that you allocate in Java, when you actually study it, are very short-lived — we allocate all objects in a new-object heap, or what we call the nursery, and once we have exhausted this nursery we copy the surviving objects out of there into an old space and can start from scratch in that nursery. This means we have fast allocation in the nursery, and we don't have to deal with the garbage in there — we don't do anything with the garbage, we just copy out the objects that survived from this new generation to the old generation. You could say that garbage collection is actually the wrong term; it's more like a search-and-rescue operation, where you rescue the few survivors out of the nursery into the old space. It is also a search where, due to the accurate nature, we know exactly who's surviving, so the search is pretty fast. I mentioned copying: we actually move those objects out of the nursery into a separate memory area. Studies have shown that five to ten percent of allocated objects survive from the new allocation space, the nursery, into the old generation. So if we say this nursery is half a megabyte big, we copy 25 to 50 kilobytes worth of objects, which is not very much. Also, since we have this copying infrastructure already in place, for the rather big old object space we compact it regularly: every time we run the old-space collector we can compact the heap and keep memory usage to a minimum. The old-space collector, since it has to deal with a much bigger memory area, is also incremental: it works on a chunk at a time, and every time it's invoked it introduces only barely perceivable pauses. So you don't have this big stop in the middle where you're collecting the whole heap, but instead many small pauses, which makes your UI applications or server applications respond much faster.
respond much faster so the benefits to
you as a programmer are very fast
allocation since we always allocated a
bit nursery and reallocate in a stock
like fashion so we always do is
increment the pointer and this is our
new object all we have to do after we
increment at the pointer is check if we
exhausted the nursery space and then we
have to trigger this new space
collection this this allocation code is
actually in line in the compiled code so
all we have to do is to allocate a new
new objects when you have the new
operation this is equivalent to 11
instructions when you compare that to
the a/c style my log function call you
have to go through cross library
function glue and then you have to do
the c.c prologue to allocate after those
11 instructions you're not even
allocating objects in the c language yet
versus in java you have your object and
are ready to go as i mentioned before we
have an accurate collector so we do
aggressive reclamation of of this
nursery as well as of the old generation
so for example you don't leave of the
trends that are not accessible anymore
and due to these two factors you have
essentially free free temporary objects
temporary objects are by definition
short-lived which we don't have to deal
with you have no overhead for
short-lived objects because all we have
to do is but we copy the few survivors
out of there so I claim when you have
essentially free of essentially free
temporary object especially since javis
multi-threaded if you have an allocation
cash you would have to lock that
allocation cash flows you would have to
take the object out of the allocation
cash and then unlock but by that time
you already allocated the obstacles
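To illustrate what "essentially free temporary objects" means in practice, here is a made-up example (class and method names are invented for illustration): idiomatic Java like this creates short-lived temporaries on every call, and with a pointer-bump nursery and a copying collector those dead temporaries cost nothing at collection time.

```java
public class Labels {
    // Each call creates a short-lived StringBuffer plus several
    // temporary String objects; none of these survivors-free
    // temporaries need to be pooled or cached by hand.
    public static String label(int[] values) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) sb.append(",");
            sb.append(values[i]);
        }
        return sb.toString();
    }
}
```

The point of the sketch is what it does not do: there is no object pool and no reuse of buffers across calls, which is exactly the style the nursery makes cheap.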
So what do you need to do? Well, as I mentioned: do not build allocation caches. But what you also have to do is tell the VM that you're done with an object. So if you have an object that you don't need anymore — or even worse, an object hierarchy, especially if you have it in a static field — you have to set that reference to null, so we know this object hierarchy, or this object, is not reachable anymore, and we don't copy it out into the old generation — or even worse, if it's in the old generation, keep compacting it and keeping that memory alive. So make sure that you null out the objects you're not using anymore.
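As a sketch of that hint (the class and field names here are hypothetical), dropping the one static reference makes the whole hierarchy unreachable in a single assignment:

```java
import java.util.Vector;

public class DocumentCache {
    // A static field keeps everything it references reachable for
    // the lifetime of the class, so it deserves special attention.
    private static Vector openDocuments = new Vector();

    public static void open(Object doc) {
        if (openDocuments == null) openDocuments = new Vector();
        openDocuments.addElement(doc);
    }

    public static void closeAll() {
        // Tell the VM we are done with the entire hierarchy: the
        // collector no longer copies or compacts any of it.
        openDocuments = null;
    }

    public static boolean hasDocuments() {
        return openDocuments != null && !openDocuments.isEmpty();
    }
}
```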
While I'm mentioning allocation caches: what you also have to do is, if you have native code via stub libraries based on the old native call conventions from Java 1.0 — JDK 1.0.2 or JDK 1.1 — or if you have JDirect 2 code in your project, you have to convert those projects to use JDirect 3. There is a talk tomorrow that covers how to convert projects to JDirect 3; otherwise you have to convert your native stub libraries to use JNI native calls. And if you're using the Objective-C-to-Java bridge for wrapping Objective-C frameworks for Java, then you have to make sure that you recompile your wrapper project with the new JNI-based bridge that comes with DP4 on the CD.
Before I go into synchronization, I wanted to mention that Java threads are one-to-one mapped to pthreads, and therefore are also one-to-one mapped to kernel threads. They are fully preemptive, which means that inside the VM we do not have to deal with scheduling — the kernel does that for us. We are multiprocessing-ready: as we've seen in the hardware keynote this morning, if we have hardware that the kernel can use to schedule more threads onto a second CPU, we in the VM will make use of that, and therefore your applications will naturally run faster as well. We integrated the native and Java invocation stacks into one memory area, so you have better locality of reference. This is VM-internal — it's not something you will notice — but it is one of the reasons why you have to go to JNI, for example, as is the accurate garbage collector.
So my last slide is on synchronization. First I wanted to explain, when we talk about synchronization, what the contended case is. The contended case is when you're executing in one thread a synchronized block or a synchronized method — or you execute the synchronized statement on a particular object — you're inside that block, and a different thread comes in and tries to synchronize on the same object. At that point we say you have contention on that object. The uncontended case, therefore, is when one thread synchronizes on a particular object, executes the whole block, and exits the synchronized block without any other thread trying to synchronize on the same object. Studies have shown that the contended case is very rare. In those instances we use pthread primitives and kernel primitives to make sure that we do the right thing in blocking the thread, so it doesn't use any CPU cycles from then on.
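To make the terminology concrete, here is a minimal sketch (class and method names invented): the synchronized methods are uncontended whenever a single thread enters and leaves with no competitor, and contended only if a second thread tries to enter while the first is still inside.

```java
public class Counter {
    private int count = 0;

    // One thread entering and leaving with no competitor is the
    // uncontended case: a few instructions of constant-time
    // overhead, no heap allocation, no kernel or pthread calls.
    public synchronized void increment() {
        count++;
    }

    // Only if another thread calls increment() while a first thread
    // is still inside does the VM fall back to pthread/kernel
    // primitives to block the waiter without burning CPU cycles.
    public synchronized int get() {
        return count;
    }
}
```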
But what is much more important is that we have very fast synchronization in the uncontended case. What you have is basically a constant-time overhead: a couple of instructions to set up that synchronized block. We do not allocate memory on the heap or anything; the memory for that synchronized block is all stack-allocated — we allocate it inside the invocation stack. And most important of all, we don't use any OS resources: we don't use any kernel resources, we don't make any pthread calls, and we don't make any kernel calls, which is important, since we want this constant-time overhead. So I will hand the podium back to Jim to talk about code execution.
I want to focus basically on what we've done to improve performance in code generation, but I want to give you some background first. So we're going to go into a little bit of history, then talk about optimization, and at the end I have ten slides of what I'm calling code generator hints — things that you can do in your code which the code generator can look at and say, "Ah, this means I can do this optimization" — and hopefully you can pick up a few of those things and put them in your own code.
I would say at this point — and I guess hopefully most of you would agree — that Java has matured fairly well. The first versions of the Java VM were simple interpreters that took byte codes: take the opcode off the byte code stream, go through this humongous C switch statement, figure out what instructions were needed to implement the instruction, and repeat the process one opcode at a time. That worked out — you know, it initially got you interested in Java, because it was actually a real thing — but it turned out to be a very slow performer, and so once the initial interest in Java had sort of passed, we had to find different ways of speeding up the performance. The next generation of interpreters usually ended up being a hand-coded assembler program, implemented in the native language of the platform, and it would go through the byte codes and interpret them a little bit faster, in a little more optimal way than what had been generated by the original interpreters — and that way we got maybe a two or three times improvement in performance. Then there's the third generation — HotSpot I would classify as being fourth generation. We found that the native code we created with the assembler was hard to manage; assembler code is always hard to manage, which is why we program in higher-level languages to begin with. So a lot of these newer interpreters are actually using templates, which makes it easy for us to just go in and insert the instructions that we want for particular sequences of codes, and then the particular VM that you're working on — in this case HotSpot — gathers all those templates together and produces an interpreter engine that allows us to interpret the code. So in HotSpot we gain another two or three times improvement in speed just by using this implementation. Now, interpretation gets you so far, but it's really not the end game, and what we need to be able to do at that point is get into some kind of native code generation, so that we get to the point where we're actually running at the same sort of speed that you would see in C++.
So we get into code generators. The first JITs that came out used a JIT compiler interface that plugs into the classic VM, and what they did was intercept the execution of methods and convert the byte codes into native machine code. As with a lot of the earlier JITs, what they basically did was take the byte code sequences and convert them one-to-one into sequences of native code. That worked — again you got another boost in performance, probably about a five times boost — but the problem is it wasn't really utilizing the true performance of the machine: it wasn't scheduling the instructions, it wasn't looking at the sequences of instructions and whether you could get any kind of optimization. So then we got a round of static code generators, where people would take the back end off of a C compiler, put a Java front end on, and produce static executable code. That sort of works for some types of applications, but the problem is that Java is a very rich and dynamic environment to work in, and static applications don't really fit in with what the spirit of Java is about — you need that dynamic environment to run in.
So then we get into the high-performance JITs, and you're familiar with the Symantec JIT that we've been using in MRJ. There are a couple of other high-performance JITs, like C2, the server compiler, that's in HotSpot, and what these JITs would do is basically optimize the heck out of the byte codes and try to reduce them down to something quite close to what you would expect from a C or C++ compiler. Okay, now, the positive part with these high-performance JITs is that we're getting really good performance — really, really good performance. But the problem was that there was a lot of competition between the JIT companies, and they were trying to squeeze out as much performance as they possibly could to get the best CaffeineMark score they could. Working with the Symantec JIT we were getting infinite scores on CaffeineMark, because we were optimizing methods right down to the point where they were just simple return statements. But the problem — the negative side — is that it was taking more and more time to compile these things, and more and more memory, and that's where the cost was. So we have to find some kind of balance between getting the optimization done and keeping the compile time and memory requirements down to a minimum, and that's where we are with HotSpot's client version.
So the traditional types of optimizations you would expect would be, you know, expression reduction, common subexpression elimination, loop unrolling or loop optimization, dataflow analysis — all the standard sorts of things you would see in the dragon book, I guess the standard compiler book. And typically what would happen is that you would go through a round of these optimizations, and they would reduce the application down a little bit smaller, and then you'd have to go back and repeat them again, because that would bring it down again. So we have heuristic-type algorithms to actually limit the optimization, and this is really where a lot of these high-end optimizers got into trouble: they would loop and loop and loop, and it could loop several hundred times before they gave up and said, "Well, this is the best I can do with this method," and then actually executed it.
Now, one of the great things about a runtime just-in-time compiler is that you can do optimizations that you wouldn't be able to do in a static situation. One of the most important is being able to determine whether a virtual method is monomorphic or not. What monomorphism is about is that in Java we have the ability to create subclasses of a particular class and the ability to override its methods. So in order to call a particular method on a particular object, as the dispatch goes on, it asks which method is associated with this object — it's basically virtual dispatch. But it turns out that in most applications a little bit over eighty percent of the classes that you have in your environment are not overridden — they're basically leaf classes — so there's really no need to go through this dispatch mechanism: you can make a direct call to that method and not worry about hitting the wrong method. The just-in-time compilers try to exploit this, and one of the things you can do, besides just simplifying the actual call to the method, is inline the code of the method that you're dispatching to. If you take a look at some of your own code, you'll probably find lots of places where you're calling a method that's rather simple — you know, maybe a few lines of code. A good example would be accessors — getters and setters for your class — and isn't it a shame to have to go off and call this method when this code could fit right in line and be very inexpensive? But this is what the JIT does for you: it goes off and says, "Ah, that's a really simple method, and I've determined that it's monomorphic — there's nobody overriding this — so I'm just going to inline its code," and the call becomes very inexpensive.
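As a sketch (a hypothetical class), this is the kind of accessor the JIT will typically inline once it has proven nothing overrides it — a call like p.getX() compiles down to a plain field load:

```java
public class Point {
    private int x;
    private int y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Simple monomorphic accessors like these are prime inlining
    // candidates; you pay nothing for the abstraction once the
    // just-in-time compiler has inlined them.
    public int getX() { return x; }
    public int getY() { return y; }
    public void setX(int x) { this.x = x; }
}
```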
Now, there's always the possibility that that method may get overridden at some other time. So what will happen in the just-in-time compiler environment is that it may create several flavors of the same method — several different versions, one for the case where the call is overridden and another for the case where it isn't. The method that needs to be executed will be chosen at run time, and we will run whichever one suits the situation. I should also make a point — and this is another great advantage of this type of compiler — that we can do processor-specific optimizations at runtime. The great beauty of Java is that I can take this byte code and port it to any machine and have it run on that machine. Well, even on the same general architecture, like the PowerPC, I can do different kinds of optimization on the G3 than I would on a G4, because of scheduling and so on and so forth. The types of things you could do would be to say: on the PowerPC I can use mask and shift operations, or if I'm running on the G4 I can use the Velocity Engine instructions — right? — so I can do that on the fly. And then I can do instruction scheduling, and once I've done that on the particular machine that I'm running on, I can cache the code that I generated, so the next time I go and execute it I'm going to use that cached code — I don't have to recompile it, and that code has already been tailored for the machine it's running on.
Okay, so what gets compiled? There are probably all kinds of myths about all these magic heuristics that we use to figure out which methods get compiled, and there is some truth to the rumors, but generally the things that do get compiled into native machine code are primarily methods that have loops, and methods that have been interpreted N number of times. Those are the two primary triggers for whether something gets converted to native code or not. A method that has a loop may get executed once in the interpreter, but then each subsequent time it may get converted to native code and run as native code. As for the trigger of N number of times — I say N because different JITs trigger at different levels; HotSpot triggers at around 1500 executions before it actually goes and converts to native, but that fluctuates depending on different kinds of criteria. What doesn't get compiled? Typically, say you have a method that's currently running, and it's looping and calling other things, and it seems to be looping for a long time: if it didn't meet the original loop criterion and didn't get compiled, it may sit there and be interpreted continuously. Fixing that would require on-stack replacement, and in the current version of HotSpot we don't have that in place yet — eventually we will be able to replace something that's currently being interpreted with something that's been compiled. But that doesn't prevent the method from being compiled: if that method is called from any other point in your application, that call will use the compiled version, because it's already been triggered for compilation. Class initializers typically don't get compiled, because of the fact that they're only run once: they usually do the initialization of their statics and create whatever things they need, and they don't need to go beyond that. And finally, things written in Java assembler that are very convoluted, with goto structures or whatnot, where it's really hard to do the analysis of that code — we can't generate native code for them, or we may try, but it's typically not worth the trouble. The only time I've ever really run into that is with the JCK: there are a lot of JCK tests that try to, you know, see what they can do to trip up the JIT, so you just don't have to worry about them.
Okay, so now I have ten hints — things that you can do, that you can provide to the code generator, that will actually help your performance. There are various degrees of performance improvement here; some may be a little bit more dramatic than others, but you don't necessarily have to go out and use them all — they're just ideas that you can keep in the back of your mind when you're trying to tune your application at the end of your development cycle.
The first is probably the most important: write small and concise methods. Try to avoid methods that have two thousand lines of code in them, because what happens is that when the just-in-time compiler kicks in, it's going to compile the whole thing, and maybe you're only going to use a couple of lines of it — you've got this big case statement, and maybe two lines of it get executed most of the time while the other ones run very rarely. So what you should try to do is keep methods small so that they compile quickly, and if you've got some code that's not going to be used very often, move it into a separate routine, so that it gets compiled only if it's necessary. Otherwise, don't worry about your methods being too small, because method inlining will take care of that: the JIT will figure out a nice balance to get a nice-sized routine that inlines well and so on. So don't worry about whether a single method is too small or too simple. And finally, always remember that accessor methods are almost always inlined — even in the classic interpreter the accessor methods were often inlined — so it's good to actually use accessor methods instead of accessing the fields directly.
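A sketch of that first hint (names invented): keep the hot path small and push the rarely taken code into its own method, so the JIT compiles only what actually runs hot.

```java
public class EventDispatcher {
    // Hot path: small enough to compile quickly and inline well.
    public int dispatch(int code) {
        if (code == 0) return 0;            // the common cases...
        if (code == 1) return handleOne();  // ...stay in a tiny method
        return dispatchRare(code);          // rare cases live elsewhere
    }

    private int handleOne() { return 1; }

    // Slow path: only compiled if it ever becomes hot, and its bulk
    // no longer weighs down compilation of dispatch() itself.
    private int dispatchRare(int code) {
        switch (code) {
            case 2:  return 20;
            case 3:  return 30;
            default: return -1;
        }
    }
}
```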
The next hint is to trust the supplied classes. What we try to do is look for hot spots in your code — things that take a long time — and try to gain performance there, and one of the things we do in that analysis is identify methods that get executed a lot and tune them so that they execute very quickly; often we tune them by writing them directly in assembler. So for methods in the classes String and StringBuffer and Vector, which are used a lot, we actually have intrinsic, or built-in, methods to deal with a lot of those situations, and you get better performance. So if you think you can write it better than Sun did — well, just remember that maybe we're going to do a little bit better for you in the background; that's something to keep in mind. Array copy is something we gain performance on: if you're doing an array copy running on a G4 — this is not currently in place, but hopefully we'll be able to use the Velocity Engine to help do the copy. Sine and cosine: on Intel you would call the hardware directly to do those; we call the library directly, so you don't actually go through the glue, so it's a bit of a performance win.
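As an illustration of trusting the supplied classes (the helper name is invented), System.arraycopy goes through the VM's tuned built-in rather than a hand-written loop:

```java
public class CopyDemo {
    // Prefer the built-in copy over a manual element-by-element
    // loop: the VM supplies a tuned (and, on the G4, potentially
    // Velocity Engine backed) implementation.
    public static int[] duplicate(int[] src) {
        int[] dst = new int[src.length];
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }
}
```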
And since we don't have a 64-bit architecture, there is a cost in using long. So if you don't really need long — you're just using it because you think you might need the precision later — then maybe rethink it a little bit and go back to using straight ints. A long multiply takes five instructions; a long divide has to call a subroutine; a shift operation may take several instructions — so it's not as simple as it seems. And if you think things are going to overflow, there are lots of techniques to get around some of the problems you might otherwise solve with long. For instance, people reach for long to deal with the unsigned integer problem, but when you want to do unsigned compares there's a way of actually doing that without having to resort to long.
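One such technique — a well-known trick, not something specific to this talk — is that flipping the sign bit maps unsigned order onto signed order, so an unsigned compare needs only int arithmetic:

```java
public class Unsigned {
    // Compares a and b as 32-bit unsigned values using only int
    // operations: XOR-ing each with 0x80000000 flips the sign bit,
    // after which an ordinary signed compare gives unsigned order.
    public static boolean lessThan(int a, int b) {
        return (a ^ 0x80000000) < (b ^ 0x80000000);
    }
}
```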
Floats versus doubles: floats obviously are smaller and take up less memory, and in most circumstances floats and doubles have equivalent execution, but there are some circumstances, like divide, where a divide of a double actually takes twice as long, or almost twice as long, as a float. So if you don't really need the precision, stick with float. The other reason I'm recommending float is that as we progress to the Velocity Engine — the Velocity Engine doesn't support doubles, it only supports floats. So if you're thinking about declaring an array of doubles, see if you can use an array of floats instead, because that's most likely where we will be able to apply the Velocity Engine. There's no commitment on that project; just keep it in mind.
Try to avoid the use of generic types. It actually costs to use these generic types, especially when you're doing assignment between generic types and specific types, because the VM has to do a type check to make sure it's valid to do that, and it does that check at runtime, and it may have to search up the class hierarchy in order to determine whether that's a member or not. That's just something you should keep in mind, especially when you're doing assignment from a generic-type array to a specific array, because that means, even if you're doing an array copy, it has to validate everything being moved into that array, and it has to go through a class check. So try to use subclassing and method overloading as much as you can, because that will actually be better in the long run than writing one routine that takes a generic type and then doing an instanceof check inside of it — it's better to do the overloading.
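A sketch of that idea (hypothetical shape classes): overloaded methods resolve the type at compile time, instead of one Object-taking routine doing runtime instanceof checks and casts.

```java
public class AreaCalc {
    // Preferred: one overload per concrete type, resolved statically,
    // so no runtime type check or class-hierarchy search is needed.
    public static int area(Square s) { return s.side * s.side; }
    public static int area(Rect r)   { return r.w * r.h; }

    // Discouraged: a generic routine that must type-check and cast
    // at runtime for every call.
    public static int areaGeneric(Object o) {
        if (o instanceof Square) return area((Square) o);
        if (o instanceof Rect)   return area((Rect) o);
        return 0;
    }
}

class Square { int side; Square(int s) { side = s; } }
class Rect   { int w, h; Rect(int w, int h) { this.w = w; this.h = h; } }
```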
Copy values to locals. Now, some of the optimizers in the high-end JITs will actually do this optimization for you, but it's better in most cases — better for the interpreter, better for the lower-end JITs, and so on and so forth — for you to move the value out, work with the copy, and then stuff it back in if you need to. In this particular example, on the left-hand side, we extract a value from the table, increment the value, check whether the value has exceeded 100, and reset it to zero if it has. Every time you index that array, it's going to have to do an array bounds check. Again, the higher-level optimizers will take care of that and hoist it out, but you can't rely on that, so it's probably a good idea to move the value into a separate local variable.
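The slide itself isn't in the transcript, so this is a hedged reconstruction of what that example probably looked like: repeated indexing versus one read, local work, and one write back.

```java
public class LocalCopy {
    // Before: every access to counts[i] repeats the array bounds check.
    public static void bumpDirect(int[] counts, int i) {
        counts[i] = counts[i] + 1;
        if (counts[i] > 100) counts[i] = 0;
    }

    // After: one read into a local, work on the local, one write back.
    public static void bumpLocal(int[] counts, int i) {
        int v = counts[i] + 1;
        if (v > 100) v = 0;
        counts[i] = v;
    }
}
```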
Okay, the other thing is there's a semantic issue there: if there's another thread that goes and changes that field or changes that array entry, you don't know which copy you're going to get. So if you extract the copy, work with that copy, and stuff it back in again, you know exactly which value you're using. In the situation where you have multiple threads that are accessing something, you should use volatile if you're not using synchronization. Volatile is a little bit cheaper than synchronization, because what it says is you need to reload that value every time you access it. If volatile's not there, then what will happen is a highly optimizing JIT will say, oh well, I've got this value, I don't have to reload it; but meanwhile another thread changes the value, and you're sitting there in your loop waiting for it to change, and it won't change. Okay, so use the word volatile when you've got shared values that multiple threads access.
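A minimal sketch of that waiting-on-a-flag situation (the class and method names are illustrative):

```java
// Sketch of why a flag shared across threads should be volatile.
// Without volatile, a highly optimizing JIT may load `done` once,
// hoist it out of the loop, and never see the other thread's write.
class StopFlag {
    private volatile boolean done = false;

    void finish() {
        done = true;        // the write from one thread...
    }

    void await() {
        while (!done) {     // ...is guaranteed to become visible here
            Thread.yield();
        }
    }

    boolean isDone() {
        return done;
    }
}
```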
Final. We've had a lot of discussion internally about this one, but it's one of my favorite words as far as just-in-time compiling is concerned, because it gives me a lot of hints about what the class can be, or what kinds of optimizations I can do on the class. But it's something that you don't need to overuse, okay. Write your application, and if you feel that a class is not ever going to be overridden, for instance your application class or a class that was specifically not going to be subclassed, declare it as being final. What this wins you is the fact that it says all of the methods in this class are now monomorphic: they'll never be overridden, so I can make direct calls, so it improves performance. It also says that if I do an instanceof on that class, all I have to do is compare whether its class pointer is equal to the class pointer of, say, the class String, which is declared as final; the check to see if that is a String is a simple compare to see if the classes are the same, and I don't have to search up the class hierarchy in order to find out what's going on.
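As a sketch (the class name and methods here are made up for illustration):

```java
// Sketch of a leaf class declared final, as the talk suggests.
final class Point2D {
    private final double x;
    private final double y;

    Point2D(double x, double y) {
        this.x = x;
        this.y = y;
    }

    // No subclass can ever override this, so the JIT can treat the
    // call as monomorphic and call (or inline) it directly.
    double lengthSquared() {
        return x * x + y * y;
    }
}
```

And because the class is final, a check like `o instanceof Point2D` reduces to a single class comparison instead of a walk up the class hierarchy.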
The other use of final, of course, is on statics, and this says this value is constant: it's not going to change. So once the just-in-time compiler knows this constant, it will just grab it and say, okay, I've got this, I can apply it to optimizations in the code. In this code sequence, what I can gain here is that I know that the allocation of that character array is a fixed size, so all I have to do is increment the allocation pointer by a fixed amount. And with my array declared in the loop, I know that it's a fixed-size loop, the loop is going to iterate a fixed number of times, so I can actually get rid of the loop and maybe do a blanket initialization of that array into spaces.
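A sketch of the kind of code sequence being described (the names and the value of `WIDTH` are illustrative, not from the slide):

```java
// Sketch of a static final constant feeding the JIT.
class Padding {
    static final int WIDTH = 32;   // constant-folded by the compiler/JIT

    static char[] makeBlankLine() {
        // The allocation size and the loop trip count are both known
        // constants, so a good JIT can unroll the loop or replace it
        // with a single block initialization of spaces.
        char[] line = new char[WIDTH];
        for (int i = 0; i < WIDTH; i++) {
            line[i] = ' ';
        }
        return line;
    }
}
```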
So if you have a choice between declaring a virtual class hierarchy or using interfaces, you'll get better performance from virtual calls than you will through interface calls. That's because a virtual call requires a simple index into an array to get the address of the method that you want to call, where an interface requires an actual search of the class to make sure we find the implementer of the interface, and then it does an indexing into a table, so there's a little bit of overhead. Now, HotSpot is very clever, where it actually caches the last method that you called from a particular call point, so it's a little bit better in HotSpot, but it still has to go through a verification to make sure that what's being passed through really is an instance of that class.
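A sketch contrasting the two call shapes (all names here are made up; the behavior is identical, only the dispatch mechanism the VM uses differs):

```java
// Virtual dispatch vs interface dispatch, side by side.
interface Shaper {
    double area();
}

abstract class Shape {
    abstract double area();
}

class Circle extends Shape implements Shaper {
    final double r;

    Circle(double r) { this.r = r; }

    public double area() { return Math.PI * r * r; }
}

class Dispatch {
    // Virtual call: a simple index into the receiver class's method table.
    static double viaClass(Shape s) {
        return s.area();
    }

    // Interface call: the VM may first have to find where Shaper's
    // methods live in the receiver's class (HotSpot caches the last
    // target per call site to soften this).
    static double viaInterface(Shaper s) {
        return s.area();
    }
}
```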
Limit the use of JNI and JDirect. What I wanted to convey to you is that if you feel you can get better performance out of C, you should probably really rethink it a little bit, and think that Java is the way to actually write your code. Try to avoid going off and doing things in native code where you can, because the optimization levels that you're going to get in Java will be pretty close to C, if not better, depending on which kind of level of optimization you're doing. When you're using JNI there's a translation layer that has to take place: you have to translate into the C world, and then, coming back again, you have to actually do a lookup for the methods in order to find out which method is called back into the VM. So use Java as much as possible.
So in conclusion, I just want to repeat what Ivan said earlier: the best thing, first of all, is to make sure that your application has a good design, okay, and once you have a good design, then go back and look at places where you need to improve performance. We didn't talk about performance tools here, because we're not really finished with them yet, but the beta actually releases with -Xprof, okay, which will give you a profile of the methods that have been executing and what percentage of time you spent in them. As we go on we're going to have better tools, such as the HPROF tools, that will give you much more detailed reports on performance. So get a good design for your application, then go back and start tweaking it, and maybe apply a few of these hints so that you can get the just-in-time compiler to produce better code for you, okay.
[Applause]
So hi, I'm John Berkey, and I'm going to talk about a little bit different side of performance, and that's how we can work together to improve performance, specifically from the framework level. A lot of what you just heard is about how you write good methods and good classes yourself, but from the AWT perspective we're more concerned with just highlighting a few things that will help you use our frameworks well. So anyway, there are five major areas to cover; I won't read them all here, but basically you'll see a lot of usage-pattern kind of stuff.
So for image creation, the main thing is there's a new call in Java 2 called createCompatibleImage, and for a lot of the usage, specifically Swing and the kinds of things that are in the Toolkit class, a lot of that will take care of this for you. What this will do is, depending on the decisions we make based on device depths and stuff, make sure that it's the optimum image. Most users of Image don't need to dig deeper than this, and if you do need to dive into bit-level manipulation of the image, then number one, check and make sure you really want to do that, then go into the imaging classes, and be real careful about the types you pick. One of the cases here is that there are some image types on Windows, for example, that aren't as common on Mac, and so they may not perform as expected. So again, the createCompatibleImage stuff should be your first choice. The next thing is a
thing called rendering hints, and if you have a Graphics object you can both get the list of default hints and also set your own. The basic idea is that with Java 2 there's an ability to do a lot of really nice graphics: you can do anti-aliased text, anti-aliased primitives, you can do image blitting with different kinds of convolutions, and do really high-quality work. But the fact is, for a lot of what we do today, including most of our normal GUI framework operations, we don't need quite that quality, so that's why, for example, a lot of these will be defaulted to lower quality, for instance when Swing is using them. That's also because a lot of times in GUI framework building, anti-aliasing can get in the way and cause fuzzing and other problems that you've experienced. So the key here is that we haven't actually fine-tuned all that stuff yet in our implementation, but what I recommend is that first you get familiar with these and experiment with them, understand the different ones, and I will explain them in a sec here, and then as we move towards shipping you try these again with our final candidates, because then you will start to see differences. This is really important for us, because we can make serious changes in our implementation, especially when we're in fast mode. So RENDERING is a key; I've actually deleted the KEY_ prefix off the front of these, but they're in the RenderingHints class. RENDERING is a key that you can pass into this little hash map, and you can say basically quality or fast. ANTIALIASING you can turn on and off for both primitives and text, and along with the text one there's FRACTIONALMETRICS, to specify sub-pixel positioning for text.
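A minimal sketch of setting these hints, drawing into a BufferedImage so it runs anywhere (the keys shown are the actual java.awt.RenderingHints constants the talk abbreviates):

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Sketch of choosing quality-vs-speed through rendering hints.
class Hints {
    static BufferedImage drawSmoothed(int w, int h) {
        BufferedImage img = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        // Favor quality for this drawing pass; framework defaults lean
        // toward speed, so set these only where you need the quality.
        g.setRenderingHint(RenderingHints.KEY_ANTIALIASING,
                           RenderingHints.VALUE_ANTIALIAS_ON);
        g.setRenderingHint(RenderingHints.KEY_TEXT_ANTIALIASING,
                           RenderingHints.VALUE_TEXT_ANTIALIAS_ON);
        g.setRenderingHint(RenderingHints.KEY_RENDERING,
                           RenderingHints.VALUE_RENDER_QUALITY);
        g.drawOval(2, 2, w - 4, h - 4);   // drawn with anti-aliased edges
        g.dispose();
        return img;
    }
}
```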
One thing I'll point out here too is there's a new way to do text in Java 2, which is glyph vectors, and it is the highest-performance way to do text. Don't assume that you can do those kinds of things better yourself, and if you really are doing text stuff and you want to see high quality and speed, look at the way the Swing examples do the stuff before you go and roll your own, because they're using glyph vectors, and that is the fastest way to do text. So fractional metrics comes up because there's an additional cost with doing sub-pixel positioning of the letters in your text rendering. For dithering there are a couple of different choices there; again it's quality versus speed. Interpolation, there's bilinear and bicubic, and that's for your image blitting: basically your image will look a little better when scaled to different sizes if you use the higher-quality ones, but it's slower, so keep that in mind. Alpha interpolation, same thing, quality versus speed, and color rendering, same thing for speed.
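A small sketch of the glyph-vector path described above (the class name `FastText` and the Dialog font choice are illustrative; lay the text out once, then draw the cached vector as often as you like):

```java
import java.awt.Font;
import java.awt.Graphics2D;
import java.awt.font.FontRenderContext;
import java.awt.font.GlyphVector;

// Sketch of the glyph-vector text path.
class FastText {
    static GlyphVector layout(String text) {
        Font font = new Font("Dialog", Font.PLAIN, 12);
        // true/true: anti-aliasing and fractional metrics enabled.
        FontRenderContext frc = new FontRenderContext(null, true, true);
        return font.createGlyphVector(frc, text);
    }

    static void draw(Graphics2D g, GlyphVector gv, float x, float y) {
        g.drawGlyphVector(gv, x, y);  // reuses the precomputed layout
    }
}
```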
So, bitmap image manipulation: this is tricky for our platform. We're optimizing first to make Swing apps and normal usage fast. What that means is we actually are not going to use the same data buffer types internally that are used on the Windows implementation, and that's because then we can do hardware-accelerated blits between our offscreens and the screen. In order for us to do that we have a different implementation class under there, so don't assume that you can cast down to specific DataBuffer types; they won't be down there. You can create the other types yourself and they will work, but they will be slower, so just be careful. Again, this next one is kind of obvious, but there's a whole bunch of APIs in the new imaging stuff that are useful for some high-quality advanced imaging work, and they were developed with that in mind, but they're not as useful for the typical case, so I'm calling them low-call-frequency methods. The point is that between Raster and BufferedImage there's two basic ways to do things, and these are the ones that I think are good for the typical case: basically you can pass in a whole rect, of whatever size you want, of pixels to push between your buffer and the image you're after, and this is better because you can control the frequency of the operation versus the number of pixels you want to move. So for example, if you want the whole image, you can basically have a full copy that gets pushed across and make one call to copy it; or at the very lowest I would recommend going per scanline, so you make one call per scanline. That's much better for speed than the per-pixel calls, which are an existing API but one you want to beware of: that's where you basically make a method call per pixel, and that's shooting yourself in the foot in most cases.
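A sketch of the rectangle-at-a-time approach versus the per-pixel calls (`BulkPixels` is a made-up name; the bulk and per-pixel `setRGB` overloads are the real BufferedImage methods):

```java
import java.awt.image.BufferedImage;

// One bulk setRGB call moves a whole rectangle of pixels,
// versus one method call per pixel in the slow variant.
class BulkPixels {
    // Fill an image from an ARGB array with a single bulk call.
    static BufferedImage fromArgb(int[] argb, int w, int h) {
        BufferedImage img = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
        img.setRGB(0, 0, w, h, argb, 0, w);   // whole rect in one call
        return img;
    }

    // The slow road for comparison: a method call for every pixel.
    static BufferedImage fromArgbSlow(int[] argb, int w, int h) {
        BufferedImage img = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                img.setRGB(x, y, argb[y * w + x]);
            }
        }
        return img;
    }
}
```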
double buffering this is kind of
interesting so I Mac os10 we double
buffer all carbon windows right now in
most cases so what that means is
interrupts already double buffered and
we'll take care of flushing that up of
efficiently if you want to have the
effective double buffering use the swing
stuff because on Windows that stuff will
be double buffering and on our platform
you'll feel just you double buffers
whereas if you create your own Java
image and drawn to that which is really
easily doing Java is mostly now I'm sure
and then with that image you'll actually
be triple buffer done that with 10 which
is kinda graceful so again we take care
of the swing so what I do in the case
where I want to implement this is I just
again you the swing stuff and new jpanel
etc and then it will all just take two
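A minimal sketch of leaning on Swing's buffering: do the drawing in `paintComponent` and let the framework supply the back buffer, instead of drawing into your own Image (`PlainPanel` and its offscreen `render` helper are illustrative):

```java
import java.awt.Color;
import java.awt.Graphics;
import java.awt.image.BufferedImage;
import javax.swing.JPanel;

// Draw in paintComponent; Swing (or, on Mac OS X, the window
// buffer) provides the back buffer, so no extra Image is needed.
class PlainPanel extends JPanel {
    @Override
    protected void paintComponent(Graphics g) {
        super.paintComponent(g);
        g.setColor(Color.BLUE);
        g.fillRect(10, 10, getWidth() - 20, getHeight() - 20);
    }

    // Offscreen helper so the drawing can be exercised without a window.
    BufferedImage render(int w, int h) {
        setSize(w, h);
        BufferedImage img = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics g = img.getGraphics();
        paint(g);
        g.dispose();
        return img;
    }
}
```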
So that's another issue on Mac OS X that's performance related: basically, the hard work of doing live window resizing is yours. As developers we will of course make all the primitive stuff fast, but when you do live window resizing, suddenly your component's paint action has to be fast, right? It can be called every time you move the little corner of the window. So if your code can't handle that, you'll either have really chunky performance when you move that window, where it will act funny, you know, like you have a low frame rate or something, or the mouse will become unresponsive. What you can do is some kind of threaded rendering. You know the classic cases, like progressive JPEG loading, where you show what you've got and don't wait; or if you've got just a really complicated image, maybe you do a first pass with some kind of simple version of it, and queue up a thread that does the rest of the work. At that point, since you're off the event thread, maybe you would use another buffer, in fact triple buffer, and draw to an image and then blit it later. That gets a little more complicated, because then if the size changes, maybe you want to scale that image while the real rendering catches up; you can talk to me afterwards if there are specific examples. The main thing I want to point out is that with live resizing, your paint can get called a lot.
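One possible shape for that threaded-rendering idea, as a sketch under stated assumptions (`DeferredRenderer` and all of its names are made up; a real component would call `repaint()` from the `invokeLater` once the detailed image is ready):

```java
import java.awt.Color;
import java.awt.Graphics;
import java.awt.image.BufferedImage;
import javax.swing.SwingUtilities;

// "Draw something cheap now, finish the expensive work on a thread."
class DeferredRenderer {
    private volatile BufferedImage detailed;  // filled in by the worker

    // Called from paint: the caller shows a cheap placeholder
    // immediately, while the real rendering is queued on a thread.
    void requestRender(int w, int h) {
        Thread worker = new Thread(() -> {
            BufferedImage img = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
            Graphics g = img.getGraphics();
            g.setColor(Color.DARK_GRAY);
            g.fillRect(0, 0, w, h);           // stand-in for slow rendering
            g.dispose();
            detailed = img;
            // Hop back to the event thread before touching components.
            SwingUtilities.invokeLater(() -> { /* component.repaint() */ });
        });
        worker.start();
    }

    BufferedImage result() { return detailed; }
}
```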
So that's all for my side. Jim? I guess I really don't have much to say as far as the summary is concerned, but the environment that you'll be working in is a little bit different than MRJ, and there are going to be some things that you may have spent a lot of time trying to get performing well in MRJ that are going to be a little bit different when you get to Mac OS X. So hopefully we've cast a broad net over the areas that you need to keep in mind as you're working. I guess we're going to get Alan to come up and coordinate a Q&A session with the time we have left.