WWDC2004 Session 428

Transcript

Hello, welcome to Maximizing Java Virtual Machine Performance. This session will be given by three of us: myself, Victor Hernandez, and Roger Hoover, from the Java Virtual Machine team, and also Christy Warren, who is the person responsible for responsiveness in Tiger.

What we'll be talking about is the HotSpot virtual machine in Mac OS X that is used to actually execute your Java applications. This HotSpot Java virtual machine comes from Sun; we take that source, tailor it for Mac OS X, and optimize it specifically for PowerPC. Currently we are supporting J2SE 1.4.2, and as announced yesterday, we will be supporting J2SE 5.0. Throughout this talk we will be referring to it as Java 1.5.0, which is what we all know it as, but the official name is J2SE 5.0.

With the virtual machine on Mac OS X you get a variety of features. You get a client just-in-time compiler; a variety of garbage collection algorithms; an implementation of class data sharing, which was innovated by Apple starting in Java 1.3; native G5 support; a JDK whose classes are optimized specifically for Mac OS X; and finally the debugger and profiler interfaces, JVMDI and JVMPI, which development tools can use to analyze your application.

Oh, what's going on? This thing is a little confused. Sorry, I wasn't looking at the slides.

Now with 1.5 there are a whole bunch of new features. Specifically, you will be getting new features in the Java language which will simplify a lot of development. There's also a new client compiler feature called safepoint polling, startup time should also be a bit faster, and there will be explicit concurrency exposed in the Java APIs. Also, and we're really excited about this one, the class data sharing implementation from Apple has been adopted by Sun and will be available on all their platforms in Java 1.5. They've not only taken our source, but they've also optimized it
themselves, so what you will be seeing are improvements to our initial sharing implementation. Finally, there will be a new tools interface, which will be replacing the debugger and profiler interfaces, which have been deprecated.

So Java 1.5 is available for you today. It is equivalent to the Beta 2 version that is presently being previewed by Sun. It installs on the Tiger preview CD, or DVD, that you got at the conference this week, and you can go ahead and download it from connect.apple.com.

Today's talk will be divided into three parts. First, Roger Hoover will be discussing what is new in Java 1.5. In the second part, I will be discussing how the HotSpot virtual machine optimizes your application. And then finally, Christy Warren will be introducing a very exciting new Mac OS X application for profiling your Java applications. So here's Roger.

Hello. I'm going to give you a brief overview of the new, pretty exciting things that are in 1.5. I was over at JavaOne a little bit earlier, and they've got whole talks for each of my slides, so this is going to be very high-level and very quick. I'm doing this to get you interested in new things in 1.5, if you haven't seen them before, and also to point you to where to find more information about the pieces that you're interested in.

These are the biggest changes pretty much in the history of Java. There are lots of language changes, and I'll go into those in the coming slides. There are also a bunch of library and runtime changes, which are also pretty interesting. Note the blue bubble with the JSR numbers in it: the Java Community Process has these Java Specification Request numbers that correspond with the specifications for this new stuff. If there's something you want more information on, remember the number in the blue bubble and you'll be able to look it up with a URL that I'll have at the end.

OK, why change the language? Well, there are some great things in here that I think are going to give a lot of improvements
in productivity. Most of these changes are handled by javac, but there are a few things that touch the VM. There's a flag, -source 1.5, for javac that turns these things on. It was not the default in earlier betas, but with Beta 2, the stuff that we're giving you, it is the default. There's a new keyword called enum. If you used enum as an identifier in the past in your old code, you're going to have to say -source 1.4 to turn off these things in order to compile your old code.

OK, let's look at some of these great features. The first one is autoboxing and auto-unboxing. Well, what's boxing? I consider a primitive type like int or boolean to be an unboxed type, and capital-I Integer and capital-B Boolean to be boxed types; namely, the primitive is inside an object box. With 1.4, if you used the boxed types and the unboxed types together, you had to do lots of conversions. In this example here, I show that you are all the time creating new containers for things. With 1.5, the compiler does this for you and it type checks. The code is much more readable and simple. This is excellent for anybody dealing with these things.

Generics. People have talked about doing generic types in Java for a while; it's finally here. Before, you had the dilemma in Java of either writing a very general type using Object, then having to do all these casts in and out and worrying about doing the type checking, or having runtime cast exceptions when you did it wrong; or you could make something very specific and not be able to reuse it. Now you can write the code with parameters for the types that are embedded in the data type, and get the reusability and the safe type checking from the compiler, which then generates the casts. This is for object types only; you can't use primitive types as the parameters. Here's an example of Pair, which takes arbitrary left and right types, capital L and capital R, with a constructor for that and public accessor functions for
looking inside. Then, in the bottom two lines, the next-to-last line has a place where we're creating a new Pair of capital-I Integer and String, and note that the 17 in that line is going to be autoboxed for you, so these features interact. Also, as a teaser, in the last line there we can now do C-style printing, and I'll get to that in a second; that's where we're calling the accessor functions to pull the values out.

For those of you who know C++, here's equivalent code that does exactly the same thing in C++. The major difference is that C++ compilers typically instantiate the template for every instance, and thus you can use primitive types, which you can't do in Java. So here we're using int and char*.

OK, another feature: static import. Why static import? Well, there are two reasons why you'd want this. One reason is that it eliminates a binary compatibility issue with importing an entire interface: you're just pulling out the static methods and fields of another class, so it's simpler for the compiler. But probably the main reason you'd want to use this is that it simplifies the naming, because you can actually import those names into the current namespace. In 1.4, if you were going to use this MyMath class, which has this constant PI and this method times, you had to say MyMath.this and MyMath.that all the time. In 1.5, if you do an import static, you can use PI and times without qualification. So, simpler code; the compiler does all the work to make it right.

The for loop has been enhanced. Instead of having to specify the induction variable in the loop, if you're using arrays or the new java.lang.Iterable type, you can simply say for, the type, the variable, and a colon. Well, here's an example of a string concatenation in the old method, and here's what you can write now: you simply name the variable that represents, for each iteration over the array, the piece of data you're interested in, and you can use it in the loop. Again, simpler code; the compiler
does the work for you.

I showed you the printf example before. This is enabled by variable-arity methods. Here's an example: you can say a type name followed by dot dot dot in a method definition, and the compiler automatically converts that into an array, so you simply use it as an array. This function here just concatenates... or no, it chooses the maximum of a bunch of strings that you give it, but you could pass any number of strings to it.

Enumerations. This is similar to the enum type in C. You specify a bunch of constants that get instantiated by the compiler, and you can use them as such: in this case, saying MyColor.yellow picks out an individual one. But this also interacts with static import, and if you import this stuff, you can actually talk about yellow and red. There's a lot more to enumerations than simply a list of constants, though: you can actually have methods inside enumerations. Here's an example where I've taken the color and defined another enumeration, fruit, which has a method that does a switch on the type of fruit and returns the color. Now note that we were able to say red and yellow because we did a static import. The third one, orange, is actually both a fruit and a color, and if I had just said orange there instead of MyColor.orange, the compiler would have complained that it was ambiguous. So we can go on, and in another file import both color and fruit and do computations on them. Here I'm doing apple.myColor; that will return the color of an apple. And because the compiler keeps these things as unique instances, you can do == on enumeration types and it gives you equality.

OK, another big thing is metadata. This is currently not hooked in with the metadata stuff you heard about yesterday with Spotlight, although we'd love to do that at some point in the future. This is metadata inside the Java program that allows you to add additional information into your program, and there are three parts needed in order to make this work:
there are declarations, there are annotations, and there is runtime access. Declarations say what you're going to keep track of. Annotations say where you use that; you instantiate it inside the program. And then there's runtime access: you can write programs that actually look at this metadata via reflection. The good thing about this is that it eliminates the encoding of data into flag classes, like the JAX-RPC stuff did just to indicate that these are special functions. You can do this more cleanly with metadata; it doesn't have the compiler implications of being dependent upon other classes. This will be used for programming documentation tools, and I suspect we'll see a lot of neat stuff that uses metadata.

So how does it work? A metadata declaration is similar to an interface declaration, except you say @interface instead of interface. It has a bunch of members; value is going to be special, and I'll talk about that in a minute; and you can give default values. Here's an example. We've got a bunch of bugs in our code, so we define metadata called FixMe that has a value, which is the problem that needs to be fixed, and a reward, which is what the programmer gets for fixing it, and the default is going to be a cookie, which comes from the enumeration Reward. Then I have a second one just called Debug, since we want to be using this one when we're debugging. It has no members, and that's special; I'll talk about that in the next slide.

OK, so once we have the declaration, we need an annotation in order to place it, and we can place annotations before any declaration in our Java program. There's a special way, with a file, that we can put a package annotation, because Java really doesn't have an explicit declaration of a package in code, and we can also put these in front of enum constants. So what do they look like? Well, here are several in a piece of code. I've got this class called PerpetualMotion, and you'll note that it is preceded by a FixMe annotation that says
there's no such thing, and you get a holiday if you fix it. Inside, we have a method sum that also has a FixMe annotation, and this time I just have a string: why does sum subtract? Well, if you don't specify a member name, it assumes value; that's what's special about value. And I didn't have to say what the reward was, because there's a default on that: it'll use cookie as the default reward. Finally, notice the Debug annotation in the last line. Since Debug has no members, I don't have to do the open and close parens, so again I can simply and cleanly put those where I need them.

OK, so how do I use these things? I can write tools that actually look at the source to use these, because it just gets compiled out. But I can also look at them at runtime via reflection, and there's another special annotation called @Retention that is used to tell the compiler to retain that and put it in the class file so reflection can find it. So in my previous declaration of FixMe, if I said @Retention, blah blah blah, it would remember this stuff for runtime access. In this example I'm using reflection to get at the method that corresponds with sum, and I do a .getAnnotations(), which gets me back an array of these annotations, and then I use the enhanced for loop just to print them out on the screen. So this is how you'd write tools to use the metadata information.

There are great things for people who write concurrent programs with multiple threads: there are some new classes. I'm going to look at these inside out. java.util.concurrent.atomic does single-access atomic operations on individual variables; that's exposed in this API. On top of that there are locks that are built on those, and those are completely independent of Java's internal synchronization. And then there's java.util.concurrent, which has a bunch of classes that are pretty useful and is built on all of this. This was done as a JSR by Doug Lea and company, who some of you know of.
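
The layering described here, atomics at the bottom and explicit locks above them, can be sketched roughly as follows. This is an illustrative sketch and not code from the session slides: the class name LayeredConcurrencySketch and the helpers runInThreads, countWithAtomic, and countWithLock are hypothetical names chosen for the example. Only AtomicInteger, Lock, and ReentrantLock come from the real java.util.concurrent packages.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo class; the names are illustrative, not from the talk.
class LayeredConcurrencySketch {

    // Hypothetical helper: start `threads` threads running `work`, then wait for all of them.
    static void runInThreads(int threads, Runnable work) {
        Thread[] pool = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            pool[i] = new Thread(work);
            pool[i].start();
        }
        try {
            for (Thread t : pool) {
                t.join(); // join() makes each worker's writes visible to the caller
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
    }

    // Lowest layer: an atomic read-modify-write on a single variable, no lock involved.
    static int countWithAtomic(int threads, final int perThread) {
        final AtomicInteger counter = new AtomicInteger();
        runInThreads(threads, new Runnable() {
            public void run() {
                for (int i = 0; i < perThread; i++) {
                    counter.incrementAndGet();
                }
            }
        });
        return counter.get();
    }

    // Next layer: an explicit lock object, independent of the synchronized keyword.
    static int countWithLock(int threads, final int perThread) {
        final Lock lock = new ReentrantLock();
        final int[] counter = {0}; // plain shared state, guarded only by `lock`
        runInThreads(threads, new Runnable() {
            public void run() {
                for (int i = 0; i < perThread; i++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock(); // always release, even if the body throws
                    }
                }
            }
        });
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(countWithAtomic(4, 10000));
        System.out.println(countWithLock(4, 10000));
    }
}
```

Both helpers should report the same total, threads times perThread. The atomic version suits a single shared variable, while the lock version extends to invariants that span several fields.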
Here's a brief overview of some of the things you can do. In threads, there's an Executor interface that gives you fairly convenient thread pools without doing lots of work. There are lots of different kinds of queues. There's nanoTime, for nanosecond clock time within a given JVM, which performance people are going to love. There are lots of synchronization primitives that more closely match the literature than what's in Java, so you can implement published algorithms more easily, and there's concurrent access to various kinds of collections. This is great stuff.

Also in terms of multithreaded programming, there's a new Java memory model. It says what to expect with a multithreaded program accessing shared storage. There's a specification in the original thread specification for Java; it was widely ignored, because it says you can't do things that were widely done by optimizing compilers, as well as by processors like the PowerPC that reorder instructions. What this does is present a realistic model of what can happen. Among other things, it also guarantees that when you build a new object, the final fields are set by the constructor, so you don't see an intermediate state.

Practically speaking, what does this mean? Well, we're going to make the Apple 1.5 JVM obey the Java memory model, so you'll be able to count on it. In particular, you really have to use synchronization any time you're mucking with shared storage: either the Java synchronized statement or java.util.concurrent. Also, if you're doing a multithreaded thing where one thread sets up a bunch of data and then a bunch of other threads take off and start working on it when everything's done, the variable that says that things are done has to be volatile. The write of that volatile by one thread releases all of the things that were done before it, and the read of that volatile in the other threads acquires all of that stuff. This is the standard way you should do thread communication via shared
variables: they've got to be volatile. Otherwise, don't be surprised if things happen out of order. In particular, things that you've tested and debugged on a multiprocessor G4 are most likely going to fail at some point when you run them on a G5 if you haven't followed the rules.

There's a new tool interface that replaces JVMPI and JVMDI, which are now deprecated and going away, presumably in the next version of Java. This has a slew of functionality that lets you implement these tools, plus hopefully a whole lot more. Basically, agents plug into the JVM; they're C or C++ programs, or at least a piece of your program. They say which callbacks they want to get and then are notified, and there's also a whole bunch of functions with which they can query the VM. So it'll be exciting to see what comes of that in the coming months and years.

There's a new monitoring and management interface, basically designed so you can do things like load balancing in server environments. You can look at the memory usage, classes, and thread information inside the JVM; you can look at the number of processors and CPU utilization in the OS; things like that.

And finally, there are lots more things that I don't have time to talk about. Here's a list of some more things that you may be interested in, and note the last two lines, which are URLs. In the next-to-last line, fill in the JSR number and you'll find out where to get the specs for that particular JSR, and Sun's site, in the last line, also has some great information.

Thank you, Roger. So I will be going over some implementation details of the HotSpot virtual machine that should give you a better idea of how your application can be better optimized when running specifically on Mac OS X.

So what do you get with the HotSpot Java virtual machine? You get a client just-in-time compiler, a variety of garbage collection algorithms, automatic G5 optimization, and the sharing of class data between JVM instances when you have multiple Java applications running on the same machine. You also
get the tuned JRE implementation that we have done for Mac OS X.

So let's talk a little about the client compiler. The client compiler dynamically compiles your application's hot methods. After you've called a particular method a certain number of times, we go ahead and stop executing it in the interpreter and continue execution in a JIT-compiled version of that method. We have personally optimized the client compiler from Sun for PowerPC. That means we've come up with the optimal code sequences for each Java bytecode, and we've also figured out how to make the best use of the full PowerPC register set.

The client compiler has a bunch of object-oriented optimizations that it does on your methods. For example, there's instanceof and checkcast. instanceof and checkcast require knowledge of the full class hierarchy, because their implementation requires knowing who is a subclass of whom. Internal to HotSpot, we keep track of all that information, making the implementation of instanceof and checkcast as optimal as possible.

Another object-oriented optimization is an invocation cache for virtual methods. It turns out that most times you call a virtual method, even though there might be multiple implementations of that virtual method loaded, you're probably still ending up at the same target. So we cache the most recent target and try that one first. If it doesn't work, we have a way of rolling that back, but it's a pretty good thing.

The other thing we're taking advantage of is the ability to inline your Java methods. This has actually been one of the biggest areas of performance improvement that we've been able to make, so I'm going to go into it in more detail. What exactly is inlining? It's pretty straightforward. In the example I have up on the slides, you've got average, which calls sum. There's extra overhead in actually calling the function sum, so ideally you would want to have a situation where the body of sum is just inlined
right into the body of average. You don't want to have to do that in your own code, because the code is not as expressive, so we do it for you dynamically.

In what situations can we do it? Well, there are a few opportunities that are very straightforward. One is field accessor methods: there's no reason for you to access fields directly via their names; you can use methods to do that. Also constructors in your Java classes. We're also able to inline a bunch of intrinsic methods. Intrinsic methods are methods where we don't even need to look at the implementation in the bytecodes: we know what the behavior of the method is, and we go ahead and execute it with the optimal PowerPC code sequence. For example, we know how to do array copy. This basically applies to JDK classes, and we keep on adding methods as it becomes possible to come up with an optimal code sequence or as they're used more heavily in the JDK. Here are examples of a bunch of inlined intrinsic methods.

Well, this doesn't include a huge set of methods that are used in your applications, and that's virtual methods. As I said before, it is possible that the target of an invokevirtual bytecode is not always the exact same method; it depends on which of those virtual methods have actually been loaded. But it turns out that we are able to inline those if we know that only one implementation of that virtual method has been loaded, in which case the method can be considered monomorphic. It turns out that in most usage patterns this is actually the case, so we're hitting a large percentage of invokevirtuals with this optimization.

There are, however, limitations to this. Since we're able to inline so many methods, the limiting factor is no longer finding opportunities to inline, but actually the size of the compiled method. So if your hot method happens to be really large, it might not be able to inline methods
that it calls, or might not be able to be inlined by its callers. You also need to be aware of the fact that we are unable to inline methods that are synchronized, and also methods that have exception handlers; that's a limitation in the client compiler. And I want to leave you with this tip: in previous versions of Java it was useful to use the final keyword to make a virtual method inlinable. Now that's not needed at all, and you should be using final only for its object-oriented purpose.

OK, safepoint polling. This is a new feature in the client compiler in HotSpot 1.5. What is a safepoint? A safepoint is the state that a Java thread needs to reach for exact garbage collection; specifically, the location of all Java objects needs to be known at that location. Currently, in Java 1.4.2, compiled methods reach a safepoint as follows. You have a Java thread executing through compiled code, and the virtual machine has to suspend that thread, make a copy of the code, insert traps at the predesignated locations, and then continue executing at the previous location in the copy until it actually hits one of the traps, at which point the virtual machine can take over; that's known as the safepoint. The fact that we're suspending the threads makes this possibly a dangerous thing, and it also requires a lot of extra overhead in the client compiler to keep track of the locations at which to insert traps.

This is now greatly simplified in 1.5. What we're doing in 1.5 is basically this: you have your compiled method that is currently executing through all the instructions, and every once in a while there is an access to a safepoint page in memory that is basically a no-op; it's not reading or writing anything that's actually useful. But at some point the virtual machine decides to memory-protect that page, so the next time you come around to that access it actually hits a trap, and the virtual machine is able to take over. This
will result in much more optimal... basically, the overhead taken for your compiled methods to be able to get to a state for garbage collection will be greatly improved in Java 1.5 because of this.

OK, let's talk a little about garbage collection in HotSpot. Right now we're supporting three different garbage collection algorithms. They're each designed to meet the needs of different kinds of applications; there is no longer this notion of one garbage collector to meet all needs. The original garbage collector, the serial garbage collector that you're familiar with from HotSpot since Java 1.2 or before then, is still there, and it is still the default collector. But since Java 1.4 there have been two garbage collection algorithms introduced. First, there's a concurrent mark-and-sweep algorithm, which has been designed to have a higher throughput for larger Java heaps, and there's also the parallel scavenge algorithm, which is designed to have a shorter pause time. What I recommend is that you run your application with all three, and also change the Java heap parameters, to see where you can fine-tune your application. There is not one garbage collection algorithm that will apply to everybody.

There are a few small changes to garbage collection happening in Java 1.5. The main thing is that the parallel scavenge collector will now be the default in all Mac OS X Server installations. This is similar to Sun's approach to garbage collection configurations for Java 1.5: they're also making it so that all server installations will automatically get the parallel GC, and we're applying that as well. The way they detect that is that any dual- or multi-CPU machine with greater than 2 gigabytes of memory gets classified as a server machine, and therefore it should get the parallel garbage collector. This will definitely be very good for compatibility between various installations of Java 1.5. We don't necessarily want all of a
sudden performance characteristics to be different just because you didn't happen to get parallel scavenge on our platform.

The other set of differences in Java 1.5 is that they're making it more convenient for you to configure the heap parameters for performance purposes in each garbage collection algorithm. Specifically, in parallel scavenge you can now designate a percentage of time that you hope your application spends in GC. Under the covers, that gets translated to particular heap sizes: the size of the permanent generation, the new size. Basically, in the past you had to kind of know how these garbage collection algorithms were implemented, and now the goal is to abstract that away for you. If you've been using UseAdaptiveSizePolicy in Java 1.4, that also has a few convenience flags. Specifically, you can specify how long you want a pause to take, and also what ratio of the time of your full Java application is spent doing garbage collection.

Finally, I want to talk about how we've optimized Java for the G5. The optimizations that we've done on the G5 require absolutely no code changes on your part, and also no recompilation. Basically, we've taken full advantage of the doubleword registers available on the G5, and also all of the doubleword instructions. This has been done in both the HotSpot interpreter and the compiler, and there should be big gains overall, but especially those people who are doing arithmetic on longs, doubles, and floats will see a substantial improvement. There are specific reasons why: we can do casts inline from float to integer; bit extractions from those values are much faster; and the square root instruction is actually available on the G5, so we call directly into that instead of writing source for it. As you can imagine, that's much faster. Also, synchronization has been improved by taking advantage of the extra
lightweight synchronization instruction available on the G5, so we are taking full advantage of all of the PowerPC instructions for synchronization.

How have we actually been able to measure that performance gain? Well, we've been tracking SciMark 2.0. SciMark 2.0 is a very good example of a few scientific algorithms: it does fast Fourier transform, it does Monte Carlo approximation. You might be familiar with the composite score numbers from the Java State of the Union yesterday, but there are a few things I want to point out that are different from that. As you can see, these are our scores on a 1.25 GHz G4. The 98 number is definitely pretty low, and our goal with the G5 was definitely to get the number up; we expected it to be at least twice as fast. The second bar is basically a pretend 2.5 GHz G5 that is just twice as fast: the scores are twice as high as the first numbers. What we actually get on the G5 is something substantially larger, and we're pretty excited about that. This makes the composite score very competitive with scores reported on Pentium and other platforms, and we're very excited about that.

But one thing I do want to point out is why, for example, Monte Carlo is still low. It turns out that Monte Carlo is doing unnecessary synchronization by attaching synchronized to the front of a method. You remove that and the score increases dramatically; it increases dramatically on the G4 as well. The reason I want to point that out is that the client compiler is unable to detect unnecessary synchronization, and it therefore falls into the hands of the Java developer to do analysis on their application to see if this could be the case. And that's it, and I'd like to invite Christy to introduce Shark for Java.

Hi everyone, hope you're having a good afternoon. Today I'm here to talk about Shark for Java and high-level performance analysis. How many of you have heard of
Shark, or used Shark? Wow, I didn't expect quite that many of you to know about this program, but now we're going to show it for Java. So I'm going to go through this part pretty quickly, then.

Shark is the ultimate profiler you can get on Mac OS X. It's a really neat program. In the past it has been great for analyzing our C, C++, and Objective-C programs. It does both what's called cost and use analysis; I'll go into that a little more later. It can profile a running process, a thread, or even the entire system; for Java, it's limited to just a single process. It can do time samples like other profilers, but it can also do some things that we used to be able to do in the old Sampler program, such as allocation tracing and even exact method tracing, so it can act like gprof and record every invocation of a method. Those two other methods are a really nice addition to the usual time profiling you see in other profilers. It also does non-Java profiling, as I mentioned, for time, memory, functions, even low-level hardware events if your application is a real-time type of thing, and it's useful for studying JNI calls.

You can download this beta from developer.apple.com, and please get that, because the version of Shark on your Tiger CD does not have the Java support. We worked really hard in the last few weeks to deliver this for you guys for WWDC. I hope you enjoy it, but you have to get it off the website. The good news is that it runs on both Panther and Tiger, so you can take this back to your development system as it is, use it, and just rock with it. It's awesome.

Some key features of Shark: it provides a profile view that gives you simultaneous heavy and tree perspectives, which will become more clear when I show you the demo. We're also introducing in this Shark sophisticated data mining and filtering. We also provide a chart view to visualize the execution of your program. And, especially for enterprise applications, there is a
really neat feature: you can do remote profiling over a network. You run a command-line tool on your Xserve sitting in a cage somewhere, and you can talk to it from Shark via Rendezvous and control it, so you have minimal impact on your server. You can analyze Tomcat, your JSPs, whatever. So that's a really neat thing, and you can learn more about the detailed features of Shark at the Got Shark? session, which is Friday at 3:30 p.m.

So I'm going to talk about a few general principles here that motivate the data mining. What makes software slow? Well, probably best known is bad algorithms: if you're using a bubble sort instead of a quicksort and you have large sets of data, then your stuff is going to go really slow. Excessive memory allocations and locking: these are primitives, and they're expensive. Victor just talked about this example with Monte Carlo, where the overuse of a synchronization primitive just hosed performance. Disk I/O, network calls, IPC: these are all really expensive operations compared to doing an add, and you want to do these things as little as possible.

Now, a more insidious thing that happens in software is doing the same operation more than once. Let's suppose I write a module that quicksorts some properties I read out of a file, and let's say Victor has written another function that does the same quicksort. These will just show up as calls to quicksort in the profile, but it won't show the fact that two different pieces of code in different parts of the program made this call to quicksort. This is a simple example of what we call complexity in software.

I have a little graph of an execution trace of a program here. The horizontal axis is time slices or sample slices, in this case memory allocations; the vertical axis is the call stack depth. So what we're really doing is taking slices of your program as it's running and doing this plot, and you can see these interesting patterns there. I'll go
into that a little more in a minute.

So what do we mean by complexity? Large-scale software has multiple layers and many modules, and the bigger the system, the more of these things you get. And because we're good programmers, we hide the implementation details from our clients. So a function or method called foo could do something as simple as set a bit in a class somewhere, or it could cause a transaction to a database, update some rows, and even result in the launch of a rocket through some I/O devices. One takes microseconds or milliseconds; the other one can take minutes or hours. So innocuous-looking calls can result in crazy, complex, unexpected execution paths.

Going back to this example, which is actually the Finder Get Info dialog: we zoom in, and the patterns of repetition show up at deeper levels. At this fine level you see repeated structure, and zooming out, it shows up again. This is two layers that are both doing iteration and repetition, and they're layered. Now imagine this multiplied with five layers. Imagine adding AWT, all the Sun libraries, all of your libraries, your huge application. This thing can be insane.

So how do we deal with this? Well, in analyzing performance you can break the impact of an operation into two pieces: the cost of the operation times the number of places it's used. Traditional profilers have for a long time made it easy to analyze cost: I can tell what leaf function I'm in. The hard part is understanding the patterns of usage. Did Victor and I unintentionally both quicksort the same array, when we could have just done it once and one of us accessed a cached copy? And when you introduce over-modularization, when people tend to over-abstract, design too much, go kind of crazy with design, you get these really crazy multi-level call stacks that just really kill performance.

So, analyzing use: Shark provides two classes of features to help you analyze usage. The first
one is called call stack data mining, and in this case what you want to do is filter unwanted information. How many of you have profiled something and not seen a line of your code at the top of the profile? Rather, you've seen all these Java libraries and system libraries. Have any of you had that problem? Yeah, I bet you have. That was the first thing that happened to me when I did a profile.

Now, the other side of this is graphical analysis. In this case you can visualize the dynamic behavior of your program, like those plots that I showed you. Those are not just cute graphs to make a point; those are real data from a real program that we were able to use to find performance problems. You do this through a technique called software fingerprinting. In software fingerprinting, you recognize that if the pattern in the picture looks the same over and over again, it means you're going through the same code path. And if you're going through the same code path, you're either just doing the same thing over and over again, or you're iterating over some array or other data structure, and in that case you can still look for the opportunity to hoist information. In other words, in quicksort you have to call a compare function, but suppose your compare operator then has to go through a whole bunch of different classes and call stacks to actually get down to doing the real "a is less than b". Well, that's not good. You should de-encapsulate that stuff and reduce the amount of overhead to do that compare. So software fingerprinting can also identify those kinds of cases. Shark supports both approaches, so whichever one works for your application, it's there.

So, data mining concepts. To eliminate what you don't want to see... this one doesn't work that well in Java right now, but you can eliminate functions without source. The thing is, almost every symbol in Java indicates that it has source, so we're going to work on making that a little better. A really powerful one is
Exclude Package. Shark calls these things libraries because it was originally a C, C++, Objective-C tool, but you can basically choose the library. So if you don't want AWT in your trace, you exclude it, and it'll charge the cost of any of the samples that were found in that library to the things that call it. Exclude Symbol works similarly, except it works on an individual symbol.

To help you see what you do want to see, you can use Focus Symbol. With Focus Symbol you choose a particular call tree you want to look at, by picking the thing it's rooted at, and you can focus in on that, and it'll get rid of main, it'll get rid of everything else around it. And Focus Package is the same thing for all the functions within a particular package.

I'm just going to show you this graphically, because that was a lot to cover. So, Exclude Library. We have an example of a main program that calls an init function, a doExample, and a cleanup. doExample in this case calls the function bar four times, and bar, let's say, uses java.util, say the java.util Hashtable. In this case, when you profile it, you're just going to see all these samples, as indicated in yellow, in java.lang and java.util, and not in bar. So we don't know that we've been using bar to do this. But excluding the library effectively turns bar into a leaf function, and now you can see: well, I'm making four calls to bar, all computing the same thing. I don't need to do that; if they are, I can hoist that.

There's another operation, very similar to Exclude Library, called flattening a library. Instead of making the library go away completely, it replaces the library with all of the entry points into it, so you can observe your usage of the library in that situation. And finally, focusing: we're going to focus on doExample. It makes main, init, and cleanup go away, and you're just left with this subtree. So these are various ways you can kind of trim the tree and
see what's going on. So now let's do a demo. Please switch to the demo machine. Okay, thank you.

So we have a sort of modified version of the Java 2D application here. To use Java for Shark you add an -Xrun parameter. We're using the JVMPI interface; it'll migrate to JVMTI in the future. So you add a -Xrunshark argument. With that in mind, let's run this. You get a message in the console that Java for Shark is enabled, and here is our familiar example. We added a new pane here called Bouncing Strings, and this is kind of a cooked example in that it has some performance problems introduced that we want to go find.

So let's go over and launch Shark. Shark has all sorts of traces, but we're going to choose the Java Time Trace, and when you do so you can pick the Java app. In this case we ran it from a command-line shell, so you just see "java"; you would see your application name if you made it double-clickable. So let's just start sampling. Oh, by the way, you notice that it just paused? It does that every so often, and it's because we're garbage collecting. You see, it just did it again. So that's kind of odd.

Let's just start sampling. Let's go for a few seconds; given the sampling rate, it's probably good to sample for about 10 seconds. And let's stop sampling. We now have a typical profile view: we have a list of the various symbols and the percentage of samples that occurred in them. On the right here you'll see a backtrace of the calls. You can see drawString, and it goes down to the Bouncing Strings paint. You click on another symbol, you see its backtrace. And one thing to help keep track of things a little better: there's a neat feature called Color by Library. When you click on that, look what happens: it colors all the symbols, and this will help us identify everything. AWT is colored in this red color, brown for this, and so on. And yet there's one little problem here. In a
Java runtime, JVMPI isn't perfect about reporting all the symbols, so we have a method with an unknown library. We're going to use Exclude Library to get rid of that and attribute those samples to things that are more meaningful. When this happens, you'll see that these percentages went up: paint strings went up, the native font wrapper went up, and so on. If we look at the native font wrapper, we can exclude the library again, and now it pushes initialize font up. That's interesting: initialize font. Instead of drawing or painting strings, why are we initializing a font? This is taking up almost as much time as it's taking to draw the string.

So let's take a look at the heavy and tree view. We've been looking from the bottom up; we've been looking at the leaves of the execution tree. Now we can look from the top down. Here's our event dispatch thread run, and it works its way down, and here's the Bouncing Strings paint. So we see that this Bouncing Strings paint is an important place to look at. And here is one of the really cool features that we worked hard to get in for you guys: double-click on the Bouncing Strings paint and you get source. The source here is annotated with the relative sample densities in the column: about nine or ten percent of the time is spent in fillRect, and eighty-nine percent is spent in this paint strings function. You'll note that these things are underlined. That means you can double-click on one and navigate to the associated function, and, like a web browser, you have a backward and a forward arrow.

You see here there are three areas of interest, and we found... oh, that looks like a problem. We're calling setFont with a new Font, Lucida, so we're constructing a font every time we're painting, actually inside of a for loop. That's pretty bad. So let's go fix that. And yeah, we kind of rigged this a little bit, but I've made errors like this in programs, and I'm sure other people might have, too.
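The problem found in the demo, constructing a font inside the paint loop, can be sketched roughly as follows. This is not the demo's actual source, which the session does not show: `SketchFont` and `BouncingStringsSketch` are hypothetical stand-ins, and "drawing" is reduced to appending text so the sketch runs without a display. A static counter makes the number of constructions visible.

```java
// Hypothetical stand-in for an expensive-to-construct object such as a font.
final class SketchFont {
    static int constructed = 0; // counts constructions so the difference is visible

    final String name;
    final int size;

    SketchFont(String name, int size) {
        this.name = name;
        this.size = size;
        constructed++;
    }
}

// Hypothetical painter; "drawing" just appends text to a buffer.
final class BouncingStringsSketch {
    private final String[] strings;
    // The font never changes between paints, so it is built once here.
    private final SketchFont cachedFont = new SketchFont("Lucida", 12);

    BouncingStringsSketch(String[] strings) {
        this.strings = strings;
    }

    // Before: a new font object is constructed for every string, on every paint.
    String paintRebuildingFont() {
        StringBuilder out = new StringBuilder();
        for (String s : strings) {
            SketchFont font = new SketchFont("Lucida", 12);
            draw(out, font, s);
        }
        return out.toString();
    }

    // After: the loop-invariant font is hoisted out of the loop and reused.
    String paintWithCachedFont() {
        StringBuilder out = new StringBuilder();
        for (String s : strings) {
            draw(out, cachedFont, s);
        }
        return out.toString();
    }

    private static void draw(StringBuilder out, SketchFont font, String s) {
        out.append(font.name).append('/').append(font.size)
           .append(':').append(s).append('\n');
    }

    public static void main(String[] args) {
        BouncingStringsSketch sketch =
            new BouncingStringsSketch(new String[] {"a", "b", "c"});

        int before = SketchFont.constructed;
        sketch.paintRebuildingFont();
        System.out.println("fonts built by the slow paint: "
            + (SketchFont.constructed - before));

        before = SketchFont.constructed;
        sketch.paintWithCachedFont();
        System.out.println("fonts built by the fast paint: "
            + (SketchFont.constructed - before));
    }
}
```

Both paint methods produce the same output; only the number of objects built per paint differs, which is the kind of difference a time or allocation profile would surface.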
This is worth doing. I happen to have the corrected code here, just to save us time, so I'm going to change this. Let's also quit the app, and we're going to run it again. Give it a second to load, and we go back to Bouncing Strings, and look at that: we just about doubled its speed. So just by doing some analysis, we were able to speed up a program pretty significantly.

Now let's do one more trace. Since we're feeling lucky and we made some progress, let's move on and make some more. I'm going to do a memory trace, because this is a different kind of technique that you might be familiar with. So we're going to do start, and the memory trace slows it down a little bit, because we're sampling every memory allocation. Let's stop that. And now we see... wow, sixty-nine percent of our allocations occur in Component bounds. If we change this to value, we can see that in just those few seconds we allocated half a megabyte of memory. No wonder we were garbage collecting so much: we are allocating all these bounds objects.

So let's look at what's up with that. If you look in the backtrace here, you'll see that the Bouncing Strings ball tick is doing most of it. We go in here, and you see that we have this bounds equals getBounds, and it's being called in every tick. I've already got code in here to turn that off. You can cache it, which would be the obvious thing: just compute it once, since the window size doesn't change. So let's go ahead and make that change. We made a convenient little boolean here called cache bounds. And by the way, this is not a cooked-up example; this was something that we found in the program just as it was given to me. So we're going to do that, we're going to run it again, go to Bouncing Strings, and look at that: we're now at about 183, 190. I've seen this thing go over 200.

So just doing simple memory optimizations, because memory allocation is so expensive in Java, can get you huge wins. You know, I was working on an application
server a few years ago where we reduced the number of allocations by 10x and got a 3x throughput improvement in that server. So doing just memory analysis and memory reduction is a really amazing technique. So thank you very much. I'm going to give this back to Victor.

Thank you, Christy. I need the... So I just wanted to conclude with one recommendation for optimizing for HotSpot. The main thing you need to know is exactly what your hot methods are. You can find the bottlenecks in your own code, and you can identify them using Shark for Java. The other thing that you need to be aware of is that even if there's nothing more to be done, you also need to make sure that the hot method is as amenable to HotSpot's optimization opportunities as possible. I want to highlight the fact again that inlining is probably one of the biggest optimizations that we're able to do, and therefore it should be in your best interest to make your hot methods inlinable. So here's a reminder of the key things that keep a method from being inlined: it being too large, it being synchronized, or it having exception handlers.

The last tip I want to tell you is that we do have a Java lab here at WWDC all week. If you want to see your application running on a Mac and haven't done so before, do go down there, or if there are any performance bottlenecks you want to identify to our engineering, we're definitely able to do that there as well.

So that concludes everything. I want to point out a few URLs. You can get Java reference documentation from Apple at the ADC website. You can also get Java 1.5 documentation at the Sun website. And finally, that is the URL again for where you can download Tiger that has Shark... sorry, where you can download a Shark that has the Java support, that runs on both Panther and Tiger. Finally, if you have any more questions, the people to contact are Alan Samuel, he's our Java technologies evangelist; Bob Fraser, he's our product manager; and finally
Francois Jouaux, who is the manager of all things Java at Apple.
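The inlining advice from the end of the session (keep hot methods small, unsynchronized, and free of exception handlers) can be illustrated with a minimal sketch. `InliningSketch` and its methods are hypothetical names, not code from the session, and the sketch only shows the shape of the refactoring. Whether a given virtual machine actually inlines either method is not something the code can demonstrate, and the list of obstacles is the one given in the talk for the HotSpot of that time.

```java
final class InliningSketch {
    private final int[] data;

    InliningSketch(int[] data) {
        this.data = data;
    }

    // Shape the session warns about for a hot method:
    // synchronized, and carrying an exception handler.
    synchronized int elementGuarded(int i) {
        try {
            return data[i];
        } catch (ArrayIndexOutOfBoundsException e) {
            return 0;
        }
    }

    // Small, unsynchronized, no handler: the shape the session recommends.
    int element(int i) {
        return data[i];
    }

    // Hot loop that pays for the lock and the handler on every call.
    long sumGuarded() {
        long sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += elementGuarded(i);
        }
        return sum;
    }

    // Hot loop with the lock taken once by the caller. The loop condition
    // keeps indices in range, so no handler is needed. Note that this holds
    // the lock for the whole loop, which changes locking granularity.
    long sumHoisted() {
        long sum = 0;
        synchronized (this) {
            for (int i = 0; i < data.length; i++) {
                sum += element(i);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        InliningSketch sketch = new InliningSketch(new int[] {1, 2, 3});
        System.out.println(sketch.sumGuarded() == sketch.sumHoisted());
    }
}
```

Both loops compute the same sum; the second simply moves the synchronization out of the per-element method so that the method called in the loop stays small and simple.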