WWDC2001 Session 501

Transcript

Good morning. I was thinking about just sitting in the audience and starting this talk from there, because to some extent the Java Virtual Machine is an invisible thing, right? It sits there and does your stuff. When you're programming in Swing or running your application, the virtual machine is just there. It's the utility, the thing that makes your Java happen, and hopefully, if my group does its job right, you hardly even have to know about it. That's our goal: we want you to just enjoy the benefits of it. But in fact it's an incredible piece of technology. I don't know whether you have heard of or played the game The Incredible Machine, or have kids who have played it, but The Incredible Machine is this great little app on the Mac that lets you build and put together really fun little stuff, and I kind of think of the Java virtual machine as an incredible machine.

So anyway, I'm going to get up on stage, as if I were still sitting in the audience saying all this, and tell you about it. First of all, if you just came from Steve Naroff's talk, you heard Larry Abrahams get up there and talk about how HotSpot is this next-generation virtual machine. It's true; it is a fabulous piece of technology. So we're going to talk about what HotSpot is about, why it's so good, and some of the things we like about it. Beyond that, we're going to talk about what Apple does to add value to the virtual machine as we get it from Sun. We will talk a little bit about what is in the Java developer preview that we are going to be releasing later this week, or any day now, whenever we get it out on the website. And finally, we'll have some time for Q&A.

So first of all, Java virtual machine basics. What does a
Java virtual machine do? If I say JVM, by now I hope you know it's the acronym for the Java virtual machine. The virtual machine is responsible for managing the threads of execution. Your code is written in Java and it executes on various threads, so the virtual machine makes those threads happen. It's pretty easy for us because we have Mach and its threads underneath us, but we keep track of what's going on there. The machine executes your bytecodes. It is an operating system in a lot of respects, and having managed operating systems before, let me assure you it is an operating system in many respects.

The machine collects your garbage. Your garbage is your unused objects; these eat up memory. One of the virtues of the Java programming model is that you don't have to worry about getting rid of your memory when you're not using it. It helps us if you null out your references when you're not using them anymore, and it helps you also, but let the machine take care of that stuff.

The machine helps shift control back and forth between your bytecodes and the native code underneath. Bytecodes are great; so are native libraries that do great things like speech recognition, speech synthesis, QuickTime, GUI drawing, all that kind of stuff. There's so much you can get done in bytecodes, and some stuff you have to get done in native code, so the machine also handles the transition to and fro.

/Library/Java/Home is where we put the standard properties and things like that, and that's something that you get to extend. Cocoa Java, if you wander around with Java Browser or something, is located in a couple of other places; it's under /System/Library/Java. So if you poke around and find strange things in strange places, this is meant to help tell you where some of our stuff belongs. The next slide is about what we think of as your place. In our system, Java home is /Library/Java/Home, and the things in there that you extend with
your code, with your applications, are these. There's the bin area: sometimes you've got little helper utilities and stuff, and Java home's bin is a great place to drop those. If you have jar files that need to be part of the standard extensions, that's what the extensions directory is all about. Put stuff in there and you don't have to fool around with class paths anymore. That is great; class paths were not such a great idea. Then there's the other stuff: fonts, images; you put your properties there, security data. It's the general dumping ground for all sorts of support data for Java programs. Cocoa and Mac OS X have different bundling mechanisms that let you package some of that stuff, but if you're coming to us from another platform and don't want to rethink that part, Java home is where you already put your stuff. This is the part of Java home that we want you to extend. And of course, in Applications, when you write stuff, you put your resulting product up there.

So that's it with the basics. I want to talk a little bit about HotSpot. What is HotSpot? We take HotSpot from Sun and we port it to Mac OS X. It's not hard compared to what it was on Mac OS 9. On 9 we had to do things like invent a threading model; we had to do things like, well, anyway, I won't go into it. The adapt-to-Darwin phase is pretty easy because there are natural threads underneath, there's a natural file system model, so I/O just happens, and stuff like that. So it's a porting job. The main place where we add value at this stage is that we write the interpreter. You might think, well, the interpreter is just a C file, right, and you just kind of compile it. That's not quite the way it is, and I'll talk about that a little bit later. But we get the interpreter up and running, and so now you can run HotSpot in an interpreted mode. And then of course, to make it run fast, we have to write a compiler. We have to write a runtime, a JIT
compiler that compiles your bytecodes dynamically into machine code, folds it into your program, and makes it run. So that's the basics of what we do with HotSpot. And then the point comes where we want to make it better, and I'm going to talk a lot about how we make it better. But let's just step back to HotSpot.

HotSpot, this next-generation technology: what is that really all about? Well, first of all, it's 700 files. I have four people in my group who work on HotSpot as well as other things. So 700 files, that's 200,000 lines of code. It's actually 714 files and 220,000 lines of C++ code. I don't know whether any of you just started out programming six years ago and have only known Java, but C++ code can be intricately complex if it's not done well. Luckily for us, HotSpot is done very well, but it is still an enormous undertaking just getting that part up.

So let me talk about that interpreter for a minute. The interpreter is actually assembled, if you will, out of code templates every time you launch. So there is no interpreter.c file; there are templates for the interpreter. Now why would you jam together an interpreter every time you launch the VM? There are two reasons. One, which we don't quite make use of right now, is that if you're on a G4 we might be able to have a faster little loop that could take advantage of the G4 processor. If we had just one version, it would have to be tuned for either a G3 or a G4. The other one, which we do make use of, is debugging. If you're debugging, every time you execute a bytecode you have to ask: am I supposed to stop here? Is this a breakpoint? Am I supposed to do something else? So there's at least an "if debugging" check that you have to do on every bytecode that you interpret. Well, not if you assemble the interpreter on the fly. So we check and we say, are we running this in a debug mode? And if
not, we assemble the interpreter without that little "if" check, so the interpreter instructions just go flat out, as fast as they can. It's really important to have a very fast interpreter, because just-in-time compilation takes cycles away from your program, and you should only do it when it's going to be to your program's advantage. So having a very fast interpreter is very important, and that's why that part of the technology is actually pretty sophisticated. I could tell you more about how, when you're about ready to do garbage collection, you actually swap out the whole interpreter with a little jump table such that you jump in and start synchronizing with the garbage collector thread. It has a bunch of very sophisticated technology; it's really cool.

But if you're not going to be spending your life in the interpreter, what you want is a fast compiler. You want a compiler that compiles fast, because it's again taking cycles away from the running time of your program, and you need it to build good code when it does spend that time. Our compiler is a fast compiler, and it compiles into pretty good code. We are anxiously awaiting the next-generation technology from Sun, the 1.4 train, because it actually has a better technology for generating code. We're looking at that because, as Steve Naroff said, the pendulum is swinging back to the compiler part. So we generate pretty good code and we're looking to make that code a little bit better for you.

Finally, HotSpot has a patented, I believe, implementation of synchronized. Now, synchronized is an interesting notion from the Java language viewpoint. When you go to access a Vector object, for example, its methods are synchronized, right? What that means is that they're safe if several threads are trying to do operations on that object at the same time. Well,
the reality is that most of the time when you use a Vector, only one thread is operating on it at a time. In fact, maybe only one thread will ever operate on it. So the idea with this synchronized operation is that acquiring the lock for that object is a very, very fast, hand-assembled compare-and-swap operation. The data structure for it is built on the stack of the caller, so it's very, very cheap; there's no extra data allocated for it. Only if a second thread comes in and says, "I need to operate on this object also," do we build a heavier-weight locking operation and actually go to the operating system to block that thread. We don't want it to spin; we want to put it to sleep until the other guy is done. So there's a very fast synchronized implementation. I know it's a patent, and it's one of the really cool things about HotSpot.

A fabulous thing about HotSpot is that garbage collector. When I grew up I never thought I'd be extolling the virtues of garbage collectors, but garbage collection is actually a fabulous technology that lets you program a lot more easily, and so I'm going to talk a bit about garbage collection right now. Well, not quite yet, sorry.

What are the benefits? I want to talk a bit about what all this fabulous technology does for you in combination. One of the people who works for me put together a tiny benchmark; we call it the allocation microbenchmark. It goes from one to sixteen threads or something like that. It's several threads running in a loop: allocate an object, free it, allocate an object, free it. How many threads can you get going at this? And it measures the peak rate of allocation. The fun thing about it is that he wrote it four times: he wrote the code in Java, he wrote the code in C, he wrote the code in C++, and he wrote the code in Objective-C, which is what
Cocoa is based on. Then of course you run it on a multiprocessor machine to make sure you get two threads actually trying to do the same thing at the hardware level at the same time. I have to warn you: microbenchmarks should be taken with a grain of salt and a lot of water. Don't think about them too much and don't trust them to predict your performance, because they often focus on one very atypical usage pattern. You might use it a little bit, but you don't use it a lot, and so from anything you see in a microbenchmark about a particular little usage pattern, it's really hard, impossible really, to extrapolate to any kind of win for your program. So to emphasize the point: your mileage will vary. Obviously it'll be much less.

So let's talk about the results. With C, this allocation benchmark gives us about 200 objects per millisecond. Objective-C is a little bit less; I'm not sure why, but there's a little bit of messaging overhead in there. C++ is actually a little bit better than C; you peak out around 225 objects per millisecond. Java in interpreted mode allocates faster than compiled C++ code. Not bad. When the compiler has run and is running those threads, the allocations are eight times faster. That is pretty phenomenal. This is two threads trying to go after objects, and they go eight times faster when you're writing in Java. For a point of reference, MRJ on Mac OS 9, compiled as fast as it could go, was still faster than C or C++, but it's just a little bit faster than HotSpot interpreted.

So let me talk about garbage collection again. Garbage collection is 41 years old. The first paper on garbage collection was by John McCarthy in 1960, where he talks about mark-and-sweep. Mark-and-sweep is the idea that you've got your objects laid out in memory, you go and mark every one that's still alive, and then you get to reclaim the stuff between the
objects. That's pretty cool. It was used, obviously, on a Lisp system. Three years later, Marvin Minsky, of other fame, came along with an interesting paper on a copying, and hence compacting, collector, where not only do you mark all the objects that are alive by descending from the roots, but you copy them into a new space. That compacts your memory, so you don't have the fragmentation issues that plague C programmers all the time, because your objects can be packed into the smallest memory they need to survive. This hugely extends the lifetime of a long-running program. It was a long time before the next major advance in garbage collection came along, and that was in 1984, when Dave Ungar put out a paper about generational collecting. Over the course of these 41 years there have been over a thousand papers written on garbage collection. It's a great topic. Java is the first system where it really comes into the mainstream for folks, though. I got my data from this great book called Garbage Collection, and if any of this talk interests or intrigues you a little bit, I highly recommend you go and buy this book. It reviews all the algorithms in a very, very good way.

Let's talk about generational collecting. What's the idea of generational collecting? Most objects die young, right? You use an object just a little bit and it's dead. So the idea is that you split memory into generations such that you can minimize the number of CPU cycles spent allocating a new object, and you can minimize the number of CPU cycles spent remembering and keeping track of the old ones. Old objects often don't change that much in terms of what objects they hang on to, so if an object never changes, you never have to worry about it, and you don't have to spend cycles even remembering that it's still alive. In order to make this really happen, the compiler and the interpreter implement what's known as a write barrier, such that if an object in one
generation gets stored into an object in another generation, we keep track of that to say, hey, you'd better go look at these objects over here, because we might have an intergenerational reference here. That way we can keep track of which objects are alive. So that's the basic background technology.

What I want to talk about is how that is employed in HotSpot. HotSpot has four generations running at once. The first generation, Eden, is where you allocate objects, and basically it's as simple as this: you've got a pointer to the top of memory, you add the size to it, and you're done; you have an allocation. The only complication is that the assignment, the mem += size, is an atomic compare-and-swap, because you've got multiple threads that may be going after that pointer and you might have missed. So there's actually a little loop to make sure that you stored what you wanted, and if not, you loop back up and re-add from the current top. It's very, very fast.

The so-called new generation is where objects that survived the first run go. The only thing you do with freshly allocated objects is allocate them; you never worry about them again, because the only way they stay alive is if they got stored in an older object. Other than that, they're dead. So you just assume that everything in the Eden space is dead, because you actually keep track of which objects stay alive from the other generations. The new space is a two-space copying collector, Marvin Minsky kind of technology from 1963, where objects in one space just get copied over to the other and compacted. If they survive this back-and-forth space for a while, then we say they're no longer a child, they're an adult, and we push them into what's known as the tenured generation, and they stay there from adulthood till death.
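The Eden allocation described here, bumping a pointer and retrying if another thread got there first, can be sketched in Java as follows. This is only an illustration under assumptions: the class name BumpAllocator, the use of AtomicLong offsets in place of raw memory addresses, and the -1 return when the space is full are all hypothetical, and HotSpot's real allocator is C++ and generated machine code, not this.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration of Eden-style bump-pointer allocation.
// Offsets stand in for memory addresses; this is not HotSpot's code.
class BumpAllocator {
    private final AtomicLong top = new AtomicLong(0); // next free offset
    private final long limit;                         // end of the space

    BumpAllocator(long capacityBytes) {
        this.limit = capacityBytes;
    }

    // Returns the offset of the new object, or -1 when the space is full
    // (an assumption; a real VM would start a collection at that point).
    long allocate(long size) {
        while (true) {
            long oldTop = top.get();
            long newTop = oldTop + size; // the "mem += size" step
            if (newTop > limit) {
                return -1;
            }
            // Atomic compare-and-swap: succeeds only if no other thread
            // moved the pointer since we read it; otherwise loop and re-add.
            if (top.compareAndSet(oldTop, newTop)) {
                return oldTop;
            }
        }
    }

    long used() {
        return top.get();
    }

    public static void main(String[] args) throws InterruptedException {
        BumpAllocator eden = new BumpAllocator(1 << 20);
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    eden.allocate(16);
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        // 4 threads x 1000 allocations x 16 bytes each
        System.out.println("bytes allocated: " + eden.used());
    }
}
```

The common case is one read, one add, and one successful compare-and-swap; the loop only repeats when two threads race for the same pointer value, which is why the talk describes the path as very fast.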
Actually, they can die at any stage, but that's where adult objects go. HotSpot actually has two different algorithms for maintaining the tenured generation. The one we ship with is a pretty classical mark-and-sweep algorithm. There's another one called the train collector, which you can get to with -Xincgc, I believe it is. We haven't done much experimenting or qualification on that. We intend to, though, because the virtue of the train collector is that it spends more cycles keeping track of your objects, but you have shorter pause times when it goes to find some dead memory. Pause time is actually kind of important for UI-based apps, isn't it? So we're going to work on the train collector and see if we can get it into shape to ship with. You're invited to go play with it yourself; maybe it works just fine for you.

There's another generation, however, which is used for support objects. Those 200,000 lines of C++ code are possible because those objects are, for the most part, garbage collected. They're garbage collected with the same collector that is used for the rest of your Java objects. So HotSpot eats its own dog food: it implements its own collector and uses it for its own purposes. The permanent generation is where the support objects for the program live. In other implementations those usually just come out of the malloc heap, but in HotSpot's case they come out of the so-called permanent generation, and that uses a mark-and-sweep algorithm. Objects there rarely die, so it's rare that we actually worry about those too much.

Let me shift gears a bit and talk about the things we do to make Java better. First of all, from the VM perspective, one of the things we do is provide better language integration, which lets you see more APIs to use to write programs. We also like to provide better performance. Performance is critical to how your program looks and how it behaves, and we
really believe in better performance. There are some general observations about performance, and there are all different ways to think about it. In general, you want to do more with less memory. Other implementations had an extra word per object just to keep track of whether or not a lock was around for that object, and they had another data structure, the handle, to keep track of where the object really was, so they stored handles everywhere. HotSpot does away with that, and it pays off: HotSpot runs in about 10% less memory simply because it doesn't use handles and doesn't have extra data space for that rarely used monitor on every object.

Steve Naroff talked about how much effort is being spent on the server. Well, for the client, we think scalability means running more apps in the same amount of memory. When he got up there and said it takes, I can't remember what the graph said, 60-some megabytes to run two Java applications, well, we sell systems with 64 to 128 megabytes of memory, and we would love for you guys to write apps and ship them and have them run well on our out-of-the-box configuration. For us that means we have to make sure we use that memory in the most efficient way we can.

Another attribute of performance that we work on is launch time. Nobody wants to buy an app and sit there and wait 20 seconds for it to launch. They put up with it, but it's not one of the things that they're happy about, so if we can make launch times faster, we're going to do it for you. And of course there's faster running time. Especially from the VM perspective, the fewer cycles we spend thinking about what you're supposed to be doing, the more CPU cycles there are for you to actually do it.

So, better language integration. JNI is the standard. There used to be others, but JNI is now the standard. If you've
programmed to JNI, it can be a little cumbersome, right? You can't really get directly to an array; you've got to copy the array contents over, muck with them, and copy them back. It's just cumbersome. You get these jobject references and stuff like that. But the value of that is that it allows precise collection to go on within HotSpot. Since you never see a pointer to a real object, we can move it around. We don't have to examine all of memory and try to figure out whether or not that bit pattern really represents a pointer to one of our objects or just happens to be, say, the current net value of your portfolio sitting in Moneydance. So that's the benefit you get from JNI.

We do two things to extend the ability to program to JNI. We provide JDirect. We use that internally for Swing and AWT somewhat, and QuickTime for Java, QTJava, uses it also. It lets you sit in your Java code and get to the C routines. We've talked about that in the past, so I'm not going to go into too much detail; I have a code slide a little bit later. In using it, you just write your little wrapper class for the C functions you're going to be using, and then you have one piece of code where you ask JDirect to build you a library. It writes the JNI stub code and links it in, and then you just start using your code. So in your static initializer you just say, "load me in," and JDirect does the rest. That's pretty sophisticated.

We also have the Java Bridge, which implements the technology that lets Cocoa Java happen. There's a standalone tool, the bridget tool, that starts with a mapping file that says this Java class maps to this Objective-C class underneath it. The benefit is that, for the most part, those Objective-C frameworks can now be subclassed in Java, because the Cocoa
frameworks are used with setters and getters. Whenever they call a setter, it comes across to the Java side and does things, and when you implement methods, we transform the method names and actually dispatch on the Java side, and vice versa. So when you call super in Java, it actually gets translated by the bridge and dispatched into Objective-C below.

So, an example. I don't know how well you all can see this; it's not too bad. This is the JDirect 3 example. I pulled this pretty much straight off the web at developer.apple.com/java. As you see in the first line of code, the public static linkage needs to be done, typically in a static initializer. For some reason that didn't show up here. Anyway, the new Linker part is the part that you should do in your static initializer, and it tells JDirect to go fabricate something for the class Prime; it's a reference to itself, right? JDirect goes and finds, through reflection, what static native methods are in there, what their names are, and what the types of their parameters are. Then it looks in the runtime and says, hey, is there, in this case, a computePrime function around? If you haven't loaded the library, it'll actually look for that magic string and load that library for you, in case your stuff's out there. And that's it. From that point on you can call Prime.computePrime, send it a short, and it will return a long long, and you're in business. You're writing in Java and you're using this C-based library underneath you.

Now the counterpart for Cocoa. Cocoa is a pretty rich framework. Steve Naroff says he's been working with Steve Jobs for 15 years. My tenure isn't quite that long, it's only about 11, but I had something to do with some of the Cocoa APIs in a role previous to the one I have now, and I wanted to pull up just a little bit of something I did a long time ago that I can get to from Java. There's a date
formatter there. The date formatter takes a string and turns it into a date, and more than that, it can take a date and turn it into a formatted string. This is an example that does that. The key element here is, let's see, "next Tuesday at dinner." That's a pretty simple little English string. It was a weekend's worth of hacking and it's kind of fun, but that actually turns into a real date. So you can get to Cocoa from your Java and make use of it without having to wade through Objective-C and without having to wade through JNI. I invite you to take a look at the Cocoa examples that are shipped under /Developer/Examples/Java/AppKit. There are actually two or three programs completely written in Java. There's a game called BlastApp; there's the Sketch program, which is a simple MacPaint or MacDraw kind of program; and there's a text editor in there. So go play with Cocoa; it's kind of fun.

Let me talk now about better performance. I said we try to innovate in two areas: one was better language integration, and the next one is better performance. Better performance we all want, right? The question is, of course, how. It's not like you just walk up to your program and say, "How do I make it faster?" and it's obvious, unless you have OptimizeIt, in which case it actually is. In our case we had to scratch our heads a little bit. We said, hmm, what are the basic principles of performance? Well, if you've ever done performance work before, you should know that memory is evil, right? If you are wasting memory, you are taking it away from a system that might not have it; you might have to bring it in from disk. Memory is evil. If you can use less memory to get your job done, your system is going to run faster. The disparity between the rate of increase of CPU cycles and memory bandwidth just keeps getting larger
and larger, and to ameliorate that we keep putting more and more caches onto the chip, because memory has to be really close to the CPU. So just think: memory is evil. Remember that.

The next thing is that, of course, you should steal good ideas. Why invent totally new stuff if there are already some good ideas out there? So if we think about memory and we think about good ideas, where do we come to? With C technology, a long time ago they put shared libraries into the system. Shared libraries are a mechanism for programs to share instructions. What do the C libraries share? They share their instructions; that's the dominant cost. There's a little bit of utility in that, with a dynamic shared library, you can swap implementations out without having somebody relink, so there's a little bit of code portability in there. But sharing the machine code, the actual assembly instructions, is the dominant savings of shared libraries. Another large savings, though, is the data that goes along with it. So obviously: what about building some kind of sharing for jar files?

If we look at an initial memory configuration for your running app, this is the memory layout for something that's just getting started. I put in some realistic numbers for your Java application. The point here is that the Eden space is actually pretty large to start out with. The new space, where the little back-and-forth compaction happens, is fairly small. The tenured generation, remember, is the one where your objects live in adulthood. And then there's that permanent generation, the kind of place you don't know about but that actually costs you. When you get running, that whole space grows. HotSpot keeps the ratios of Eden to new, and of the total of those to tenured, the same. In a 25 or, in this case, a 35
megabyte application, the tenured space is where most of your stuff lives, but doggone it, that permanent generation, the place where we keep things like your bytecodes, takes up a fair amount of space. Now wait a minute. Bytecodes? What about all the bytecodes for things like Swing, things like java.lang.String? Does your program have a different version of the bytecodes for java.lang.String? Of course not; it's the same bytecodes. Well, why does your program have a different copy of them in memory? No good reason whatsoever. So when we took a look at what we could share, we figured out that it's that space for the bytecodes and the metadata for your program that comes out of the standard shipping system libraries.

So imagine that red space gets split up into three sections. There's a section that is completely shareable, a completely read-only part. There's a section that is mostly shared; it can be touched, but it's mostly shareable. And then there are still the bytecodes for your classes, which aren't really shareable with anybody.

This is a review slide. What we did was add a new generation; we call it the shared generation. It has no CPU cost to maintain, because it's there to start out with and it doesn't die; these objects are immortal. So that's pretty cool. If we don't even have to build these objects and we don't have to maintain them, that offers us a CPU savings as well. So in addition to reducing memory, we get to reduce the CPU cycles needed to get to this initial configuration and to maintain it during the running time of your program.

So we've talked a bit about the shared generation. It's based on the observation that some objects never change and never die. Those are the objects we want to share; those are the objects we maintain on your behalf for the bytecodes, the strings, and stuff in your jar files, or in the
system jar files at least. What we do is we process those standard jar files once. We have an ordered list of the classes that typically get used in a Swing application. We load them into the VM using a special option, which I'm not going to tell you about. Oh yeah, a key point here is that we don't execute any bytecodes. Typically when you load classes into HotSpot you of course run static initializers. Well, the static initializers can do things like look at your command-line arguments, they can go look at disk, at memory; they can run arbitrary code, right? And so that would change the state of the program. So the idea here is that we want to just preserve the jar file. We just want to have an in-memory version of the jar file. The useful, running part of the jar file is the part we want to save and share, and so we don't execute any bytecodes. And then we use that fabulous garbage collector technology. There's a little part of it that just says: iterate over every object in this generation and do something to it. Yeah, something like a closure, only it's written in C++. Anyway, we reapply that garbage collection technology to pack all the objects that ever got created into these two spaces, the shared read-only space and the shared read-write space. And then of course we write that space to disk, and the next time you start HotSpot you just map that into memory, do a little bit of fix-up, and you're running. Right? Piece of cake, simple. This is called pickling... swizzling... I know, it's not swizzling, it's not pickling. Map and go? I can't remember. Oh, I don't know, there's a term for that. Map and go, maybe that's the right term. The shared generation benefits: I think I hit on some of those already. There are virtually no CPU cycles used for the shared generation. That's what the asterisk is about: for the read-only part that's true; for the read-write part we do spend some cycles, and
actually a few more than we need to, but it's almost totally free. We rarely read the standard jars, classes.jar, ui.jar; we don't even read them to get you started. That saves the cycles to process them. It saves you the memory to map the index to the jar file, to map it in, to wander through it and copy the stuff out to make our versions of it, and stuff like that, and obviously the disk I/O to get those things off the disk. So if you never have to read them in, they're not sitting in your disk cache, so that helps with the rest of your system's performance as well. So one of the benefits from that is that a hot start, the second start of any Java program, is always faster, because we save all those cycles to begin with. A secondary benefit of this technology is that we can be smarter about how we lay those runtime data objects out in memory. So for example, there are linkage strings. Whenever you reference another object, there's a little linkage string that goes in and says, you know, java.lang dot dot dot, or your reference to your classes. That's actually laid down in the metadata, and those strings are rarely used. But your bytecodes are sandwiched between those rarely used strings. What we do is pull the strings out and put them in their own space that's hardly ever used, keeping your bytecodes hotter. Then we never even pull in off of disk those pages that hold the data that you never use, and so your working set actually gets smaller, because we've done packing to put the hot data into the memory pages that you actually pull in off of disk. So those disk I/Os pack more punch, because they bring in more usable data due to this packing benefit. The sharing benefit was the one I started out with; it's the last one I want to talk about. Steve showed you how, all together, the combined benefits were 20 megabytes for two applications. You know, the
benefits for three and four and five are, you know, the same. So sharing alone saves, we've measured, three to six megabytes. The other processing adds up to some of that; other reductions in the working set add up to some of those other benefits. So if you're writing a Swing app, and most of you are, you're going to be getting that savings for free using our shared generation technology. There are just a few caveats. We don't yet know how to share your application jars. Well, your application jars actually aren't all that often shared, but getting that launch-time benefit would be pretty cool, so we're going to at least try to figure out how to map and go your stuff so that your stuff launches faster. The first start of a Java application is actually a little slower, and that is because we have to do some processing for all those Swing classes up front that we typically meter out as you load them on demand, and we're working on ways to not have to do that. The interpreter: those bytecodes that you execute have to be slightly slower. But since in HotSpot you spend 90 percent of your time in compiled code, slowing down the 10 percent you spend in interpreted code by one or two percent isn't a big deal; I just want to be truthful. A caveat here is that what we share are the classes on your boot class path. Now, for some programs that alter your boot class path, HotSpot takes a look at that and says, uh-uh, we don't know what they're doing. So in JBuilder's case, for example, what they've done is they've provided their own implementation of certain AWT classes so that they can use them in their great designer. The designer is a great tool. So if you're using JBuilder for designing Swing applications, here's a tip: you can get sharing for JBuilder by configuring JBuilder 5, which is in your bags, by adding a line called addskippath ./lawt.jar. That lawt.jar is their jar file for giving you, giving
them a better AWT. And that line goes in a file; I called it JSA, you can call it anything you want, the magic is .config, in the OpenTools area of JBuilder 5. So, directions for our sharing work. First of all, we know how to, and we can, improve the hot start launch time even further. We know how to eliminate, for the most part, that first-start penalty. We want to extend this fast-start launching behavior to all the jar files, or at least the ones we're told to, that exist in that extensions directory. We were really pleased with that second-order benefit of packing data. So what we would like to do is, rather than gather all the data for a class and jam it in, we want to make the observation that some methods are never used in a class, so the bytecodes for some methods shouldn't be on some of those pages that get brought in. So we want to start packing based on the methods that are used and not just based on the classes that are used. This may well double the benefit of our sharing by reducing your working set even more. We of course have to finish the GC work on the read-write shared generation, and we of course could figure out how to share more runtime data structures that live in that permanent generation. So there are a few things we're not trying to do right now, the non-directions. It's important when you're setting out to build something to know what your goals are and, if you can, to identify the goals that you're not going to try to worry about. So the biggest one for us is we're not going to share the machine code that gets compiled for your bytecodes. I mean, that is the first thing that other shared libraries, the traditional C libraries, share. But in HotSpot's case, you've got to remember what HotSpot's about. HotSpot is about compiling the methods that you're actually using, and not only compiling them but inlining methods that they use, so you get one
long compile that is really hot, because it has everything it needs to get its job done. So that highly tight code is really good for you. When we've measured how much code we compile, it has never exceeded two megabytes. When you're running HotSpot with applications like JBuilder and stuff, we never end up compiling more than about two megabytes of code. That code is not worth sharing. That code is the stuff that's hot for your run, because every time you run an app, of course, you get different hot spots, right? You shift into this area and it needs to do that, and then you shift into another area; it's all based on where you are in the program. So the idea with HotSpot is it's going to optimize what your program is doing right now, and if we tried to share that we wouldn't do as good a job. So we're not going to share the compiled machine code that we've built on your behalf. The other reason is it's kind of hard, right? Because if you do try to share that, then it has to have relocation data in it, and so rather than folding in a branch to a direct address we have to fold in an indirection. It just gets messy; it's not very good. There's only one place where sharing compiled bytecodes might make a difference, and that might be for, say, the static initializers, or the code that you actually run to get up and running. If the interpreter is a dominant cost in getting a program up and running, it might be better to start out with precompiled code, compiled not as well as HotSpot would do normally but better than the interpreter. But that's sort of precompiled stuff, and I wouldn't even characterize it in the same way. So we might look at that. Another thing we just decided at the outset was that we are not going to try to have any kind of shared read/write buffer of loaded class information, a shared read/write buffer of compiled code information, a shared
read/write buffer of anything, because you know what happens when you have a shared read/write buffer of something: some other app can make you crash. We do not want that to happen, so that's just not a design point we're going to provide. The status of the shared generation: this code that I'm talking about is in Mac OS X. We shipped it on March 24th; you're getting it already if you're using Java. Merlin: we talked to Sun about a year ago and said, you guys really ought to do something about sharing, because that's what scalability means for the client. And so they said, well, the way you do this is you file a little, oh, I can't remember the term, not a JSR; you put a feature request into Merlin through the open community process and stuff. So we sponsored one of those, and it's a feature request in Merlin, which is their code name for JDK 1.4. And more than that, we talked with these folks; the VM teams know each other, and we talked with them and said, have you guys thought about doing this, and what about that, and stuff like that. But anyway, we've worked with them, testing our design out with them as we developed this thing, and we've provided this code back to Sun so that they can use it for their implementation of this little feature request. So, current status: well, you'd have to talk to Larry about the current status of that. We've had very positive interactions with Sun on this work. This comes about in two ways. With WebObjects, for example, with WebObjects stress testing, they ran into some bugs, and we kind of chased them down and go, hmm, this is a bug in what we call portable code. So we call up our friends across the street and say, did you know about this? And they go, hmm, no, we didn't. So we're actually feeding bug fixes across through the indirect channels and making HotSpot as shipped by Sun better for everybody. And of course they've given us some feedback on
approaches to take when we run into some problems and stuff, so the feedback goes both ways. For the last section of this talk, not the very last section, that one is for Q&A, I'd like to talk about what's in Developer Preview 1, which you're going to be getting either today, tomorrow, or the next day, before Friday. The JVM in Java DP1: it's basically got about two fixes since we shipped it in Mac OS X. I just talked about them, actually. The WebObjects stress testing showed us two things once they started kicking it off, and so we've upped our mean time to failure to at least days. I'm not sure; we don't know of a failure right now. But it was running in terms of hours, and after about 24 hours, now 48 hours, of continuous hammering a bug would show up, and that's the bug I alluded to that we figured out with Sun's help. The other thing that was not quite right in Mac OS X GM was that debugging was really slow, painfully slow, and profiling didn't work. So that's kind of bad. So in DP1 we've fixed both problems: we fixed profiling and we fixed the speed of debugging. And the way we did that was we took HotSpot 2.0 from the 1.3.1 technology train and packaged it as an extra VM sitting somewhere in that little implementation space I told you about. So there are actually two HotSpots in DP1: the one that's configured for normal use, and the one that is secretly utilized whenever you do debugging or profiling. So now, why would we do that? I mean, where'd we get that VM from? Well, obviously we're working on 1.3.1, right? So we wanted to get 1.3.1 out to you in some ways, especially for debugging and profiling, because we think that's really important. The benefits from HotSpot 2.0: again, it's the client compiler technology from Sun (they also have a server compiler). Debugging, as I said, now works fast. Profiling works, where it didn't work at all. 1.3.1, or actually HotSpot 2.0, is a next generation of the next
generation stuff, and they have a register allocator technology in there that we can use right away, and we do use that, so we get better register allocation when we're compiling. And it foreshadows the compiler that they're working on for 1.4, which does even better code gen, so we're prepping ourselves for getting on board with the 1.4 work. But we didn't stop there. I mean, this is Apple, right? I want you guys to come to expect more from us than just what you can read on the web pages at Sun. So what we've done since we shipped Mac OS X GM was we put some smarts in to recognize when you're on a G4. Now what could you do differently on a G4? Well, a G4 comes with this thing called a Velocity Engine. Now what's a Velocity Engine? You're supposed to do graphics with that, right? Well, it's a special processing unit for doing highly pipelined graphics operations. To do pipelined graphics operations in a high-speed way, you've got to read memory like mad off the bus. Well, if you can read memory like mad off the bus, you can use it for simple things like copying memory, can't you? So we put a copy-memory implementation in there that, when on G4s, uses AltiVec, and it is dramatically faster than any kind of C loop or assembly loop you can write in PowerPC yourself. So we have that, and it's in our 1.3.1 version of HotSpot, the 2.0 one. We put in an optimized instanceof. I mean, this is just an example of lots of little things we do for you that you guys will never hear about; all you're ever going to see is that it improves your run time. But when you do instanceof, you typically think, well, how would you do it? Well, if it's not this class, I've got to look at the parent class, go look at the parent class, and so on. Well, we put a table in there such that instanceof is always a constant-speed operation, so it's fast. We put in even better register allocation than what came from 1.3.1. 1.3.1 still doesn't deal with
floating-point registers very well, so now we have a better floating-point register allocation method for when you're doing those graphics operations. Sharing is not in this little for-profiling-and-debugging-use-only VM. We actually know how to make startup times even faster, but since that's part of sharing, it's also not in that thing yet. And 1.3.1 also has a technology known as per-thread allocation pools. So remember that very fast Eden technology that I talked about, where you bump the pointer, but that reassignment to memory was that compare-and-swap instruction? Well, with a per-thread allocation pool you don't even need the compare-and-swap. So it really is just about three instructions to allocate memory, instead of a stall-the-processor, check-with-the-other-CPUs-next-to-you, make-sure-they're-not-using-this-memory kind of instruction. So it actually is going to be really, really fast. So I put this up here because, I mean, we're starting this kind of beta train thing with DP1, and I want you guys to play with it. So if you want to use it for casual use, use it via the command line like this: you say java -hs1_3_1. -hs1_3_1 will run this new version of HotSpot on a program you throw at it. If you really like it, try using it all the time. I'll let you go explore, but there's a symlink under the JavaVM framework that points to the version of HotSpot that actually gets used. It's the libjvm.dylib symlink. Slam it to point to that thing: you'll find an hs_131 dylib somewhere, and if you slam the libjvm.dylib symlink to point to it, you'll get HotSpot 1.3.1 all the time. Tell us about it. So when is this thing going to be available? I wish I knew. No, it's going to be coming real soon now. So to get to it, you sign up at developer.apple.com, you go to connect, after you sign up as a developer. You're all developers, you're here, right?
Okay, you go to connect.apple.com and you download it. When you download it, what does it do? It preserves your existing 1.3 implementation. 1.3 is a subdirectory under the Java frameworks, so it pushes that aside in case you don't like what you got. It preserves what we find under Java home that we think you've augmented, specifically the stuff that's in lib, including your extensions, you know, any other third-party stuff. Even QuickTime is in there, right? Stuff we ship gets packaged up in extensions, so we preserve everything that's in extensions, because we actually put some other stuff in there, and we preserve everything we find in the Java home bin. So that's the main motivation for that first set of slides that tell you the stuff that we consider our implementation and the stuff we think you should extend. Because we do need to upgrade, you want us to upgrade, and we've got to agree on some rules as to the stuff we can upgrade and the stuff that we shouldn't upgrade. So there is a mailing list, java-dev, that you can get to. Go to, as I said, that page where I pulled the JNI example, or yeah, the JDirect example, from: developer.apple.com/java. There's a section on there that talks about the java-dev mailing list; sign up. Members of the extended Java team read that and respond to it. We found it very useful and we appreciate your comments from that. So, a quick road map. The first one, wrapping Mac OS APIs in beans: if you went to Steve Naroff's one you saw Steve Lewallen. Steve Lewallen, I'm proud to say, works for me, and I've empowered him to go do great stuff, making more Java happen at Apple. And so he came up with some great APIs. The stuff you saw there are APIs, they're beans; you can use them inside JBuilder to add that kind of technology to your apps. So find out all about it by going to session 502; that's today at five o'clock. Java development tools, which Steve Naroff talked about: that's tomorrow at 10:30. Java performance: performance is critical to us, so we
have a whole session on how you can add performance to your programs, how you can discover it, things to avoid, things to do. Part of the Java development tools talk is the Optimizeit demonstration, and JBuilder debugging, and PBX, or Project Builder, debugging. And if that's not enough, I put the JBuilder reference up here as well, because JBuilder is just an awesome tool for building pure Java applications. That's about it. Oh, how about that: there is the feedback forum as well, on Friday at 10:30; that should've been on the first one. So please come tell us what you like, what you don't like, and give us suggestions as to what you'd like to see even better. Alan Samuel is the contact; he was the guy that introduced Steve Naroff. Find him as Blucher one at apple.com.
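
The per-thread allocation pools mentioned in the DP1 discussion can be illustrated with a small sketch. The Java below is a hypothetical model, not HotSpot code: the class names (AllocationSketch, SharedEden, ThreadLocalPool), the use of integer offsets in place of real addresses, and the chunk size are all assumptions made for illustration. It contrasts a shared Eden, where every allocation bumps one pointer with a compare-and-swap, against a per-thread pool that pays for one compare-and-swap to claim a chunk and then bumps a private pointer with plain arithmetic.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model of bump-pointer allocation; names are illustrative only. */
final class AllocationSketch {

    /** Shared Eden: every allocation bumps one shared pointer with compare-and-swap. */
    static final class SharedEden {
        private final int capacity;
        private final AtomicInteger top = new AtomicInteger(0);

        SharedEden(int capacity) {
            this.capacity = capacity;
        }

        /** Returns the start offset of the new block, or -1 if Eden is full. */
        int allocate(int size) {
            if (size <= 0) throw new IllegalArgumentException("size must be positive");
            while (true) {
                int oldTop = top.get();
                if (size > capacity - oldTop) return -1;   // a real VM would collect here
                if (top.compareAndSet(oldTop, oldTop + size)) return oldTop;
                // Another thread moved the pointer first; retry with the new value.
            }
        }
    }

    /** Per-thread pool: one compare-and-swap per chunk, then unsynchronized bumps. */
    static final class ThreadLocalPool {
        private final SharedEden eden;
        private final int chunkSize;
        private int cursor = 0;   // next free offset inside the current chunk
        private int limit = 0;    // end of the current chunk (empty until first refill)

        ThreadLocalPool(SharedEden eden, int chunkSize) {
            this.eden = eden;
            this.chunkSize = chunkSize;
        }

        /** Must only be called by the owning thread. Returns -1 if Eden is full. */
        int allocate(int size) {
            if (size <= 0) throw new IllegalArgumentException("size must be positive");
            if (size > chunkSize) return eden.allocate(size);   // large blocks bypass the pool
            if (size > limit - cursor) {
                int start = eden.allocate(chunkSize);           // the only synchronized step
                if (start < 0) return -1;
                cursor = start;
                limit = start + chunkSize;
            }
            int result = cursor;   // fast path: compare, add, return
            cursor += size;
            return result;
        }
    }

    public static void main(String[] args) {
        SharedEden eden = new SharedEden(1024);
        ThreadLocalPool pool = new ThreadLocalPool(eden, 256);
        System.out.println("pool block at offset " + pool.allocate(16));
        System.out.println("pool block at offset " + pool.allocate(16));
        System.out.println("shared block at offset " + eden.allocate(8));
    }
}
```

In this model the unused tail of a chunk is simply wasted when the pool refills; that trade of a little space for an uncontended fast path is the design choice the sketch is meant to show.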