WWDC2000 Session 410
Transcript
Kind: captions
Language: en
good morning everyone please take a seat
we're about ready to get started if you
are in the overflow room we have plenty
of room still here in the main hall
probably the only time that's true today
so come and take advantage of it so glad
to see you all here this morning how
many of you made it to the
community bash last night did you have a
good time
meet some people learn a few things
excellent so as you can see we got tired
of having bill preaching at us from the
aisles we decided to just bring him up
on stage and make it a lot easier on all
of us
so we're pleased to have Alex Cone and
Bill Bumgarner here from CodeFab
these people between them probably have
a good fraction of the WebObjects
experience on the planet and so we're
really pleased to have them here to talk about
how to tune the last drop of performance
out of your applications Alex and Bill
good morning
thank you all for getting up at the
crack of dawn and getting down here I
know next time we'll make sure that this
session is later in the day so everyone
can sleep in in the morning that's the
first trick to optimizing is make sure
you get a good night's sleep okay so you
know basically you know this is about
the good stuff yeah you've got an
opportunity to do killer high
performance applications this is when
somebody wants to do the big high
transaction rate e-store that's going
to make a bazillion dollars and you know
this thing has actually got to work it's
got to work well it's got to work well
with a lot of people doing a lot of
transactions through this thing and you
know and in truth you know you have to
do a little bit of thinking and planning
before you can get something that works
exactly the way you want it to be
ultra-high performance and also reliable
also scalable and so forth
and we're going to try and give you some
good points on how to make a WebObjects
app that can take all that
punishment and make it look easy and and
we want to help you guys build an
application like that and make it look
easy
ok so this is what you're going to learn
how to do leaner meaner faster
nastier WebObjects applications that will
kick everyone's butt
most importantly you know optimize them
before you build them and then
you know what to do when the application
turns out not to be quite as fast as you
wanted it to be and how to get that last
ounce of performance out of it okay so
what can go wrong with an application I
mean what will stop an application from
running you know perfectly with billions
of transactions and you know without blowing
up well you know it can work too hard you
know the the application server is
cranking away at 100% and there's no
spare CPU cycles and so the response
times just start to slow down because
you know you're just competing for
actual cycles on the server similarly
you can run out of memory and the
machine is working too hard getting
stuff in and out of swap space you can
be bound by the network you know
literally just you know have trouble
getting enough packets in and out of the
machine and you can be bound by you know
the actual code you've written you know
just you're doing too much per
request/response loop and it just takes
too long to get those responses out and
of course you know common one is that
the database is working too hard and you
know takes too long for your database
calls to come back because it's doing
too much the first three can be fixed by
spending more money you know CPU bound well
just buy bigger computers you know
memory bound pack those things full of
RAM network bound buy a bigger pipe I
mean in general when you're building a
big site like this it's important to say
you know you have to have enough
computer power to do this kind of thing
it's not going to run on you know some
Windows box in the corner you're going
to talk big you know Solaris machines
with lots of CPUs in them similarly
you're going to talk about gigabytes of
RAM the basic idea is to make sure that
you know you want to get as close as
possible to not swapping at all
similarly you want to get you want to
buy CPUs until you've got a lot of idle
time left on your CPUs because you know
you get that Slashdot effect going on
you get mentioned in the popular press
and suddenly you know you get ten times
the people hitting your site as you used
to and you know when it's running at its
sort of normal rate you want to have you
know the CPU usage down where you've got
some extra space but you know the last
two problems require optimization and
that's what we're here to talk about um
do you want to start from here sure okay so
one of the things is that with
optimizations you don't want to optimize
something before it works however there
are some things you can do before you
start coding that will lead to an
application that is both relatively
optimal or at least passable as well as
something that can be optimized one of
the philosophies we work by is to
make it work then make it work right and
then make it fast and it's very
important to make it fast last the big
mistake people make is trying to
optimize something before it exists in
that what we're going to talk about is
design what you can do in the design and
then the initial coding process to
really lead to an application or a
server solution that can be scaled and
managed and extended in that within that
understand usage patterns you want to
optimize the most used areas first you
really want to focus your design around
the areas that you think the users are
going to use if you're talking about a
large site and you have the budget to do
it do some user testing do some focus
testing where you take some of your
potential user audience and run them
through the design and get their
feedback there's companies out there
that do this the other thing is make
your entry page fast one of the things
you'll see on the web especially when
you get nailed by the Slashdot
effect or get touted on AOL or something
like that is that you'll get hundreds
and hundreds of users that will hit your
entry page and go no deeper kind of
depressing but it's the reality to that
end one of the things that Dave Newman
mentioned in his security thing
yesterday which was very interesting if
you're talking about a site or well not
even the site where there's logins but
in general if you're talking about a
site where you don't need to create a
user session when the user hits the
entry page you can avoid a tremendous
amount of overhead by not creating an
individual user session when the user
hits that entry page and that can be a
great boon to performance plan your
business logic around response
generation one of the things we
commonly find ourselves doing is we're
building back-end applications like
entry tools etc and while those are more
complex what you really want to do is
design your business logic around how
front-end the high traffic piece is
going to be used you want to avoid
repeating expensive calculations use
caching you know just avoid the
expensive ones altogether use less
precision don't provide as much
information or provide user interfaces
where if the user wants the expensive
information they have to drill down a
little bit they have to ask for it
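A minimal sketch of the "avoid repeating expensive calculations" point above, written in Python rather than WebObjects code. The function `shipping_estimate`, its rates, and the `calls` list are made up for illustration; the only point is that a repeated request with the same arguments is answered from memory instead of being recomputed.

```python
from functools import lru_cache

calls = []  # records every time the expensive work really runs


@lru_cache(maxsize=1024)
def shipping_estimate(region: str, weight_kg: int) -> float:
    """Hypothetical expensive calculation, cached per argument pair."""
    calls.append((region, weight_kg))
    base = {"us": 5.0, "eu": 9.0}.get(region, 15.0)
    return base + 0.5 * weight_kg


first = shipping_estimate("us", 4)
second = shipping_estimate("us", 4)  # same arguments: served from the cache
```

Passing a whole-number weight is one way to read "use less precision": coarser inputs mean more requests share the same cache entry.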
retain and reuse data know when it is
out of date that's a huge issue there's
a caching session gof caching session I
recommend everyone go to that and manage
your cache data carefully this is a huge
issue as well think carefully about how
often you really need to refresh that
data and how you're going to go about
doing that one common mistake is
invalidating your cache simultaneously
across all your applications so every
single app hits the database at the same
time bad idea isn't it you also want to
minimize your memory footprint by doing
that you can run more instances which
gives you more opportunity for scaling
by spreading your traffic out across
instances share data across your
sessions what that means is when
the application starts up you pre cache
information you want to clean up
thoroughly and you want to clear
transient instance variables you no
longer need now what that means is
that if you're doing Java programming
just because you have a garbage
collector don't be a lazy programmer
okay when you're done with something set
the pointers to null I mean not pointers
set the variables to null not
only will that make your code cleaner it
also means that you're going to avoid
issues like using a variable that you
really didn't mean to use anymore it
also means that when the garbage
collector does run it has less of less
of an object graph to traverse use
stateless components stateless component
is a component that literally has no
state within it it's cached by the
application not by
session use shared sessions if
appropriate if you're talking about a
news site maybe you don't need user
sessions at all and set the session
timeout value to something appropriate
you don't want these things sitting on
your server forever you want to plan
your data access your queries caching
cache updating and understand your data
latency you really want to try for zero
requests or zero data requests per
response now obviously you can't do that
I mean you're never going to achieve
zero or else your app isn't going to
display anything but you really want to
minimize those because if you can
minimize the trips to the database you
can really increase app response time it
also leads to better scaling if you have
fewer applications talking to the
database simultaneously as you hit that
huge flood of traffic if you're doing
your caching and you're sharing that
data across sessions you can increase
your scalability because as your as your
traffic peaks you don't have sudden
bursts of activity against your database
use in-memory searches where possible
obviously anytime you can avoid traffic
across the network in your server
environment you're going to get a huge
boost in efficiency you want to manage
your faulting and manage your caching
again this is just about making sure
your data is up-to-date at the same time
making sure you're not expending huge
amounts of processor time network
bandwidth etc updating these caches and
use the shared editing context for
reference data and there's a shared
editing context functionality that was
new in 4.5 this allows you to
relatively easily share data across
sessions I mean if you've got to go to
the database and read the the upcoming
calendar events every session doesn't
need a copy of that and you can use time
outside your request response loop
for housekeeping or you can use time you
manage the time with which you're doing
the housekeeping in the request response
loop very carefully for example you can
load reference data at the application
startup instead of forcing the first
person to hit the site to refresh your
caches or fill your caches use the
application will finish launching
notification to pre fill those caches
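As a rough illustration of prefilling shared reference data at startup, here is a small Python model. `Application`, `Session`, and `did_finish_launching` are hypothetical stand-ins, not WebObjects classes or the real notification API; the sketch only shows one fetch at launch with every session reading the same read-only copy.

```python
class Application:
    """Hypothetical application object that owns shared reference data."""

    def __init__(self, fetch_reference_data):
        self._fetch = fetch_reference_data
        self.reference_data = None
        self.fetch_count = 0

    def did_finish_launching(self):
        # Runs once at startup, before the first request arrives,
        # so no visitor pays for filling the cache.
        self.reference_data = tuple(self._fetch())
        self.fetch_count += 1


class Session:
    """Hypothetical per-user session; it keeps no private copy of the data."""

    def __init__(self, app):
        self.app = app

    def calendar_events(self):
        return self.app.reference_data  # shared, read-only


app = Application(lambda: ["keynote", "lunch", "beer bash"])
app.did_finish_launching()
s1, s2 = Session(app), Session(app)
```

Storing the data as a tuple is a deliberate choice here: sessions cannot accidentally edit what they share.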
you can use timers or perform after
delay to do database access or to do
cache invalidation cache updating etc
got to be a little careful with that
because of the way WebObjects works
and the way threading works you can end
up with a thread issue but there's
discussions of that on the Omni Group
mailing list which everyone should be
subscribed to I'm going to try to keep
this high-level serialize and lock
request handling that's very important
and this is when you get into really
advanced WebObjects programming and you
start doing things like multi-threaded
cache updates cache invalidation or
the timers what you want to do is you
want to make sure and this is kind of a
warning
we have scars you want to make sure that
you're locking your request handling
when you're in situations where your
caches are being updated because you
could find yourself in multi-threaded
situations that can lead to some serious
data destruction and these are again
things that have been discussed on the
Omni Group list and we're not really going
to go into them
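One possible way to keep request handling from ever seeing a half-updated cache, sketched in Python with the standard `threading` module. This is not how WebObjects locks requests; `PriceCache` and `load_prices` are invented names, and the technique shown is simply "build the new data off to the side, then swap it in under a lock".

```python
import itertools
import threading


class PriceCache:
    """Hypothetical shared cache refreshed by a background thread."""

    def __init__(self, loader):
        self._loader = loader
        self._lock = threading.Lock()
        self._data = {}

    def refresh(self):
        fresh = dict(self._loader())  # slow work happens outside the lock
        with self._lock:
            self._data = fresh        # readers see old or new, never a mix

    def snapshot(self):
        with self._lock:
            return self._data         # never mutated in place after the swap


versions = itertools.count(1)


def load_prices():
    v = next(versions)                # every value in one load shares a version
    return {sku: v for sku in ("a", "b", "c")}


cache = PriceCache(load_prices)
cache.refresh()

updater = threading.Thread(target=lambda: [cache.refresh() for _ in range(200)])
torn_reads = 0
updater.start()
for _ in range(200):                  # "request handling" on the main thread
    if len(set(cache.snapshot().values())) != 1:
        torn_reads += 1
updater.join()
```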
you want to partition your functionality
into multiple applications one of the
temptations with WebObjects and with
the ease with which you can add
functionality to applications is to make
a monolithic application that just does
everything and part of it is that it
is very convenient if you have a single
session for the user and that that
session contains everything in the world
it's very efficient well yes it is but
at the same time it also greatly limits
your scalability if you talk about
spreading the user session across
multiple applications such as the user
say comes into your site and browses
that's a different application than say
drilling down into a product or doing
searches what this means is that you
have much greater opportunities for
scalability if you need to if the search
tool proves to be a bottleneck you can
just run more of them versus having to
run more of your monolithic application
move more expensive operations from live
site to data entry that's just about
building administration tools think
about when you're building larger sites
building a front-end application that
the users see
that's all optimized and oriented to
performance building a back-end
application which is what the
administrators see which is optimized to
flexibility and power and manipulating
the business and by doing that you can
move your expensive operations into the
backend if you're doing something like
you know storing images you know off on
the server and you can keep track of
information like how big the images are
and things like that or you've got to do other
sorts of processing that's necessary to
prepare the user interface for the
person visiting your site to see you
know move those calculations to like the
data entry time when I upload the image
let's calculate what the height and
width of the image is and store that
information in the database rather than
grabbing that you know at runtime and
you know there's a variety of things
like that if you can you know compose
you know compose images by you know
compositing and put them store them when
you enter the data rather than when the
user comes to view the data you can save
time you know if you construct cached HTML
pages when you're doing the data entry
and you know thus have sort of
information predigested ready for the
site to work because there's a small number
of people using the administrative
application relatively infrequently and
we can move functionality from sort of
presentation time to data entry time you
know that will significantly increase
the speed of your presentation time tool
which is actually the thing that your
speed is all about nobody really cares if
the admin tool is fast or not I mean yes
they'll complain a little bit if it
takes too long to save something but now
the real throughput that you're looking
to optimize you know is the thing that
the customers see and one of the the
real benefits of the web objects
environment is the ease with which you
can create modules and you can assemble
these modules and put them all together
and that's really the last three points
here and one of the things you can do is
create a different view of your data for
the front end application versus the
back end if the front end application is
primarily read-only which generally they
are I mean if you're in a store it's not
like the customer can edit the price so
if that's the case then the front end
application doesn't need to have the
business logic or the expense of
supporting that so you can create an EO
model or say two EOModels one that has a
very simple view of the data that's
optimized for speed and a second EO
model that's used by the entry tool
application or the administration
application suite that is optimized to
the functionality and the power required
by the business managers and all of this
can be leveraged through frameworks
while generally what we're finding is
that our applications end up being
extremely thin
there'll be almost no code in the
applications themselves all they do is
load a bunch of frameworks everything is
in the frameworks and by doing that you
can reuse those frameworks across as
many applications as you need to realize
the site you have minimize use of frames
in the user interface it's just an
optimization clearly if frames are
necessary frames are necessary but
frames can cause a lot of issues they
can cause a lot of extra traffic against
your site as well when you're doing
dynamic applications where you have to
update content across multiple frames
you'll find situations where you end up
having to reload the whole page which
means you get one hit to load the frame
set one hit to load each frame and that
can be very very expensive it also means
that when those hits for the frames come
from the browser it can lead to a lot
of bugs because the browser you don't
know what order the browser is going to
load the frames in and it's going to
load both of them simultaneously
whichever one gets there first is going
to be the first one to load and if the
user hits the stop button then okay one
frame loaded the other one didn't how do
you know you don't so that can just be a
lot of confusion there use direct
actions wherever you can direct actions
are wonderful not only do they allow for
bookmarkable sections within your
application they can also be very very
fast because they don't go through the
full request response handling they
don't have to do form processing or
things like that
you can certainly use form values those
work with the direct action
request handler but you
don't have to and beware of mixing Java
and Objective-C yes certainly the
environment does support fully mixing
these things in just about any way you
want to there are a couple of little
subtle limitations you can run into
however there's some serious performance
issues with going across that bridge
between Java and Objective-C and it
should be avoided it's also very
difficult to debug okay so I'm going to
turn it back over to Alex here ok ok so
those are some good pointers just sort
of upfront you know think about when
you're structuring your application
you're taking apart your problem and
figuring out how am I going to go about
building a solution you know we organize
things well we do some good planning
about database access and some good
planning about caching of our data in
order to you know minimize our round
trips to the database and we've thought
through our framework design and
everything else and the apps are done
and it's up and running
but ok you know it's a bit of a pig you
know let's assume maybe the opposite
situation is you've inherited a pig that
somebody else's built and now it's time
to figure out a way to make that pig fly
so this is you know now all the planning
up front is all well and good but as we
all know that you know no good plan
survives contact with you know the enemy
or reality or your customers and what
have you there's just a limit to how
much you can get right planning upfront
because you get half way through this
thing and the design changes are
remarkable your client calls you up and
says look our business model has changed
and you know so it doesn't always end up
working out by the time you get the app
written that it's exactly the way you
thought when you started out so ok so
it's a pig it's a little too slow you
know it's that's more memory you know
yeah damn that's a big application
instance size isn't it 50 megabytes
nobody's even started a session yet you
know you go off into this request and
it's like on my machine that's 2 seconds
before it even starts loading in my
browser now you're testing this out with
a hundred users and the CPUs on your
multiprocessor Sun are pegged and it's like
wow all right now what do you do okay
first of all don't pee fill it I you
know this seems like this seems like a
trivial advice here but this is actually
good we've we've we've had some
situations where we've had clients who
got stuck with you know they had
WOCachingEnabled turned off you know for
debugging purposes and you know they
managed to get into production with the
big site and caching was turned off so
every time somebody loaded any
WebObjects component it
loaded it from disk in fact that
particular client I'm thinking of had
actually coded into the code for the
application to turn this off explicitly
in eight places in eight places
and so yeah everybody kept saying ah I
found it you know they take this line
out it would still be still sucked all
right but you know this sort of thing
doesn't really show up when you're doing
the desktop development the pages load
fast and the app's right there anyway
but you know once everybody starts
hitting this this uses up a lot of
you know resources on the server reading
all that stuff off disk make sure WO
debugging is off all right so you've
you know gone into Monitor you add the
application in and it's running in
Monitor and what have you well Monitor
doesn't automatically turn off WO
debugging you have to actually go in
there and edit the command-line
arguments and say you know we don't need
all those debugging messages spewing to
the log file during while the
application is running in production so
turn that off and of course you know the
corollary is when you're actually doing
logging you know use debugWithFormat
as opposed to say logWithFormat which
doesn't get turned off when you turn
WO debugging off and in general you
can achieve some speed pickup by having
your application's methods you know your
action methods that return the same page
return you know this.context().page() as
opposed to returning nil what that basically
does is it short-circuits the action
processing within WebObjects so that
whenever basically what happens is
whenever WebObjects invokes the
invokeActionForRequest method that's the
thing that says okay which button did
the user click on oh it was this button
okay I got to do something well when you
got to do something if you return nil
WebObjects doesn't know that you
actually did anything so it's going to
keep searching for whatever user
interface element was clicked on by
returning something in this case
this.context().page() which just simply reloads
the page you're already on which does exactly
the same thing as returning nil but what
it does do is it gives WebObjects a
signal oh you can stop searching for
whatever the user clicked on or whatever
the user did and depending on how
complex your page is and how many nested
components and what have you
this can save significant amounts of CPU
processing okay so you got to start
cleaning this stuff up where do we start
well we've got to start with the most
frequently used bits this is the the
classic thing about all optimisation is
you know you could have this one page
that totally sucks but nobody goes there
so don't bother worrying about that one
for now start off with the stuff that
they do usually you know you got a store
they're doing a certain amount of
browsing you know they got the whole
checkout process and so forth so on and
so forth
handle that you know if you're dealing
with the lost your password you know the
whole section of the site and trying to
optimize that you're optimizing the
wrong thing people don't spend their
time there and they don't complain that
it takes too long for this little form to
send me an email to give me my lost
password no so watch your user
activity know what they're actually
doing use the WOStatisticsStore
logging this is a great thing it's only
gotten better in 4.5 it gives
you a tremendous amount of information
on which pages the users are using and
how long they're taking and what the
average response time is you look right
down this look this one's got an average
response time of ten seconds like okay
sure sometimes it gets by with half
a second but we've got thirty five
seconds here and there that gives you a
good good indication of where to go to
start cleaning stuff up capture your
direct action activity the direct action
information is not there by default most
of the WOStatisticsStore logging
deals with component actions it'll also
keep track of your direct actions and
what's happening but you can code your
direct actions in such a way that you're
always going to some direct action with
the same action method and then it's got
some other arguments to tell it what to
do a statistical show that okay the
default direct action you know have
these 50,000 hits on it but doesn't
really tell you that much if you go to
the same method and then you branch
based on other conditions so put in
some logging stuff so you can tell which
direct actions are doing you know doing
the most work and then tune the most
visited areas first alright this is this
is generally where your butt gets bit
the most I mean by and large WebObjects
applications are big database
applications and the thing that most you
know after you've cleared up the fact
that you were running this on too small a
computer or you know you cheaped out when
it came to putting RAM in the machine or
what have you
generally what it comes down to
is that you're bottlenecked on talking to the
database so you need to start out by
making sure that the app doesn't do
amazingly stupid things with the
database so you know a common thing is
you know going there on that search page
and you know if nobody fills in anything
on any of the fields and hits search
don't go off and search and return all
records you know you know say hello you
got to put in at least one qualifier you
know something like that that's a
tremendous help because you know the big
search and the big result return results
you know is going to take time to do the
search move the data across the wire and
instantiate all those objects
and they're just going to look at the first page
and then say oh well that's way too much
information and type something in anyway
so you know a little bit of sort of
smart modifying the users behavior can
go a long way use fetch limits this is
you know this simplifies a bunch of
things
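The empty-search guard and the fetch limit mentioned above can be sketched together. This uses Python's built-in `sqlite3` as a stand-in database, not EOF; the `product` table, `FETCH_LIMIT`, and `search_products` are assumptions made up for the example.

```python
import sqlite3

FETCH_LIMIT = 20  # hypothetical page size


def search_products(conn, name_contains=""):
    """Refuse unqualified searches and cap how many rows come back."""
    term = name_contains.strip()
    if not term:
        raise ValueError("enter at least one search term")
    rows = conn.execute(
        "SELECT id, name FROM product WHERE name LIKE ? ORDER BY id LIMIT ?",
        (f"%{term}%", FETCH_LIMIT + 1),  # one extra row tells us there is more
    ).fetchall()
    return rows[:FETCH_LIMIT], len(rows) > FETCH_LIMIT


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO product (name) VALUES (?)",
    [(f"blue shirt {i}",) for i in range(50)] + [("red hat",)],
)
page, has_more = search_products(conn, "shirt")
```

Fetching one row past the limit is just one way to know whether to offer the user a "see more" link without counting the full result set.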
I mean net-net you're mostly doing a
bunch of the same work as doing it you
know a large query to the database you
know has to process the the database
request in the first place but you can
choke off returning back tens of
thousands of records by putting in a
fetch limit I mean nobody wants to look
at you know more than a hundred items on
a return anyway except in very rare
circumstances and if you put a fetch
limit on there and take bring back the
first hundred or the first twenty
records and then make sure that the user
wants to see more you know you can limit
the amount of data moved across the wire
the amount of objects you instantiate
the size of your cache and so on and so
forth it's often useful to cache search
results this is kind of an interesting
thing as an interesting design pattern
here if you go to the search page and
you search for all you know blue
t-shirts for men that are medium or
larger or what have you you get one you
go down and look at that t-shirt and say
I didn't really want that when you go
back to the search page it turns out to
be very nice if you go back to the
search page and your results of your
previous search are there now then they
can go down to the second one and the
list and go down there this may involve
you having to write code to keep that
search around and keep the search
results around on a per session basis as
opposed to when you go back to the
search page you clear everything out and
I got to do a search again because one
of the most expensive operations
generally on your site is doing these
sort of big database searches that's one
where the user is expecting to have a
long response time because you're
connecting to the database and returning
a bunch of stuff if you can just sort of
minimize that that's going to be you know in the last
half-dozen apps I've done that's turned
out to be the page that had the worst
performance with the big unlimited
database search page and so just by
having that thing be there
automatically when they come back that just
you know drastically reduces the
number of searches an individual user
will do and you know last but not least
on this particular subject you know if
you have a small enough set of objects
and you're doing a store but you have a
hundred products or 200 products you
know maybe it makes sense to have you know a
read-only set of product data in memory
that you initialize when the app starts
up and you do the searches against this
cache using in-memory searches and don't
bother going to the database you know if
you've got a you know CD store with
400,000 records in it you know maybe it
doesn't make sense to bring all of that
into memory and do searches but you know
if they're doing searches on relatively
small things definitely use in-memory
searches they're fast and all sorts of
precious resources network
you know bandwidth going over to the
database server the database server's
resources etc you know suddenly
disappear from the equation you know
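For a catalog of a couple of hundred products, the in-memory search described above might look roughly like this. It is plain Python, not EOF's in-memory qualifier evaluation; `Product`, `CATALOG`, and `search` are illustrative names, and the catalog is hard-coded where a real app would load it once at startup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    sku: str
    name: str
    color: str
    size: str


# Read-only product set held in memory (hypothetical sample data).
CATALOG = (
    Product("t1", "t-shirt", "blue", "M"),
    Product("t2", "t-shirt", "blue", "S"),
    Product("t3", "t-shirt", "red", "M"),
    Product("h1", "hat", "blue", "M"),
)


def search(catalog, **criteria):
    """Return products whose attributes equal every given criterion."""
    return [
        p for p in catalog
        if all(getattr(p, field) == wanted for field, wanted in criteria.items())
    ]


blue_medium_shirts = search(CATALOG, name="t-shirt", color="blue", size="M")
```

No network hop and no database work is involved in the search itself, which is the whole appeal for small data sets.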
you know when you got to do some
fetching let's optimize this a bit you
know it seems obvious but it's
definitely good things you've got
pop-ups you've got reference data you've
got stuff that is constant across
everybody stuff you know
fetch it at the application level in a
shared editing context and keep it there
you know it's really easy to start
coding up using you know the default
editing context for a session and start
doing stuff there and you can end up
with you know copies of data in every
session in every editing context you
just don't need to do that you know use
the session's editing context only and I
mean only for data that the session's
user will actually edit you know if he's
not actually changing the values in it
you can share the data with everybody
else yeah you can have a list of you
know you have a session specific list of
things I'm interested in or what have
you but it's not doesn't mean you need a
session specific copy of the actual data
you know and that's just a general good
rule of thumb you know is the user going
to edit this piece of data in their
session no then we don't need it in the
session editing context okay you know
you get to the stage where you want to
avoid doing fetches in order to
draw the pages the user's looking at
now
a good idea is to cache and share data
that you know that's used to draw the
pages the users are looking at and
you know try and keep that cache data
up-to-date you know you end up with
a situation you know like we've done
some financial sites where people are
putting in bids and offers and doing
trades and such so if session A you
know app instance a is going to put some
data into the database everybody else
needs to see you know you need to find a
good way to make sure that everybody's
information is up-to-date in a timely
fashion you know there's some really
neat stuff where you can do inter
application messaging so that the the
individual applications don't have to
fetch from the database every time there
was some good work that Dave Newman
posted originally on doing a snapshot
updating and we've done a bunch of stuff
to modify that stuff but that just avoids
you having to go to the database to get
the cache you know to update your data
you can also use the time between
request response loops as we mentioned
earlier in the design section you
can just when nobody is actually
requesting something go in and do a
fetch and update the cached data it's a
little less efficient than you know
notifying the various app instances that
the data has changed but you know
when it comes time to handle a request
for a response you know you know you've
got up-to-date data in your application
you don't have to go to the database and
of course if you've got to get some sort
of non object based data out of the
database you know go ahead and use the
raw rows stuff it's quite fast and you
know it doesn't doesn't involve
instantiating objects I mean do not you
know don't try and get around the whole
object mechanism using this stuff but if
you want to know if there are you know
if there have been any changes to the
database with you know it's in this
particular time frame or you want it you
know you can use wall rows for certain
specialized stuff and it's quite fast
right the thing that really bites you in
the butt is you have this picture of
what the application is doing and you
know you think it's being very efficient
because you optimized the design before
you wrote the whole thing and it's still
slow and the database is still cranking
away so you know obviously you're doing
fetching where you didn't expect to do
fetching so EOAdaptorDebugEnabled
is your friend now turn this on you'll
see all the SQL that's being generated
you go to this page and you think
there's no SQL you know no queries
involved in this page and you go hit
this page and it's like query query query
query query query query query and you're
like where's this all coming from and
you know it's very easy to discover that
you know in your WOD file you're referencing
object.relationship.relationship.value and you
know for all your smart caching where you
preloaded the data up front you didn't
use any prefetching or anything else
like that so all the stuff on the other ends
of these relationships hasn't been fetched
yet and so you go to visit the
page and you've got some binding here
and that forces several fetches in order
to get the data to answer the binding
now especially bad is when you're just
saying you know you're like testing to
see whether or not we should show this
component or not you know you know does
this object have you know one of these
things on the end of this relationship
and so you fault in the relationship only
to find out no it doesn't have anything
and you're not going to display anything anyway
be very careful about you know what you
bind to and how you answer some of these
questions
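A rough way to see the point about bindings that walk a relationship just to answer a yes or no question. This is a minimal sketch in plain Python with an in-memory SQLite database, not WebObjects or EOF code; the `article` and `image` tables and the `run` helper that logs each statement are made up for illustration, with the log standing in for what adaptor debug output would show.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE image (id INTEGER PRIMARY KEY, path TEXT);
    CREATE TABLE article (
        id INTEGER PRIMARY KEY,
        title TEXT,
        image_id INTEGER REFERENCES image(id)
    );
    INSERT INTO image VALUES (1, '/i/a.gif');
    INSERT INTO article VALUES (1, 'first', 1), (2, 'second', NULL), (3, 'third', NULL);
""")

sql_log = []  # every statement sent to the database, like a debug log


def run(sql, params=()):
    """Hypothetical helper: record the statement, then execute it."""
    sql_log.append(sql)
    return conn.execute(sql, params).fetchall()


# The page draws three articles and asks "is there an image?" by following
# the relationship for each one: one extra round trip per article.
articles = run("SELECT id, title, image_id FROM article ORDER BY id")
has_image_slow = []
for article_id, title, image_id in articles:
    rows = run("SELECT id, path FROM image WHERE id = ?", (image_id,))
    has_image_slow.append(bool(rows))
queries_following_relationship = len(sql_log)

sql_log.clear()

# The same question answered from a value that is already on the article row.
articles = run("SELECT id, title, image_id FROM article ORDER BY id")
has_image_fast = [image_id is not None for _, _, image_id in articles]
queries_using_local_value = len(sql_log)

print(queries_following_relationship, queries_using_local_value)
```

Both approaches give the same answers, but the first one costs a query per row drawn, which is the pattern that shows up as a wall of unexpected SQL in the log.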
this actually raises an interesting point
about WebObjects in general I know
virtually nothing about databases I'm
lost when I hit a relational
database you give me a raw SQL window
I don't know what to do but with
EOModeler even I can set up a really
complex database generate it use it and do
very useful things of course I can't
make it go fast I mean the power of
these tools can be intoxicating it can
lead you into some trouble and these
things here you know using
EOAdaptorDebugEnabled looking at the
database plans things like that are
critical because it's very likely you're
going to have someone on the project
like me who can make this thing work at
the object level and it's going to
be a really nice application it's just going
to be a pig in production at the
database level so yeah there you are
cleaning up after the pig but now one of
the good things you can
do to avoid excess faulting is what
we were talking about before having
separate data models for data entry and
data display now you can have instances
of you know like the product you're going to
display on the screen or whatever the
article that you're displaying on this
article page that you've tuned for the
runtime application and you do things
like flatten relationships in and so you
know testing to see if you have a
picture and if the article has a picture
then we need to now put in the picture
component or what have you if it's been
flattened in you know we can check the
value without causing faulting if you
have this thing as you know a separate
relation you know you know article.image
you know you know in order to say
if article.image is you know not
equal to null you know you have to fire a
fault so you can you can optimize you
know your fetching behavior by tuning
the EO model and flattening
relationships in for
presentation you know one of the most
common mistakes that we've had to clean
up I'm sure none of you would do this
you're all very good is you know you
build this you're building the
components up one at a time and you know
in this component I'm going to need this
pop-up list of all the states in
the country and so like in your init
method you write a little thing in there
that does a fetch of all the state you
know you know all the state objects so that
you can populate the pop-up because you
know you're just writing this one
component at this moment you're not thinking
about it and you know six other
developers on the project for six other
pages that have a list of all the states
also write the same thing and so every
time these components are initialized
they go off to the database and fetch
this stuff in and components come out of
the cache and they are recreated and they
fetch again and components are cached in
different sessions and each one in its
init is fetching in this list of 50 states
you only need one copy it's not like the states
change all that often you know think you
know you go through and you clean all
the stuff up and you know move the stuff
off to the application in the shared
editing context everybody who has got
pop-ups or browsers or things like that you
know valid you know regions that we
ship to so on and so forth can get
this common reference data out of one
place and not try and do this stuff on
each thing if need be you know fetch you
know all the objects you need and then
you can use filtering you know to
produce the stuff that you need for each
individual page the other common
mistake that involves excess fetching
is okay you've got this you know you've
got this shared thing in the shared
editing context
and you know you accidentally cause
stuff to be fetched into the session's
editing context by sort of not managing
you know which which objects are in
which editing context now if you must
you know if you you've got to the point
where the user is going to edit some
object then you use localInstanceOfObject
you know to get local copies of
the object without doing faulting I mean
basically without going to the database
to get this now basically it's creating
a new instance from the snapshot data in a
particular editing context and doesn't
require a round-trip to the database so
you carefully manage when you move
things across the boundary between the
shared editing context and the
session's editing context and you know follow
this stuff all around have a policy you
know these objects all live here we'll
only have this object when we do this or
whatever and you know stick to it and
that'll soup things up a bunch okay
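The list-of-states cleanup described above comes down to fetching reference data once per application instance and filtering it in memory per page. The following is only a sketch of that shape in plain Python; `StateStore`, `Application` and `StatePopup` are invented names, and a real WebObjects app would keep the shared objects in the application's shared editing context rather than in a tuple.

```python
class StateStore:
    """Stands in for the database and counts how often it is hit."""

    def __init__(self):
        self.fetch_count = 0

    def fetch_all_states(self):
        self.fetch_count += 1
        return [
            {"code": "CA", "name": "California", "ships_to": True},
            {"code": "NY", "name": "New York", "ships_to": True},
            {"code": "HI", "name": "Hawaii", "ships_to": False},
        ]


class Application:
    """One per app instance; holds read-only reference data for all sessions."""

    def __init__(self, store):
        self._store = store
        self._states = None

    def states(self):
        if self._states is None:  # fetch once, on first use
            self._states = tuple(self._store.fetch_all_states())
        return self._states


class StatePopup:
    """A page component; created many times, in many sessions."""

    def __init__(self, app, only_shippable=False):
        states = app.states()  # shared copy, no fetch in the component
        if only_shippable:  # filter in memory for this particular page
            states = [s for s in states if s["ships_to"]]
        self.choices = [s["name"] for s in states]


store = StateStore()
app = Application(store)
popups = [StatePopup(app) for _ in range(100)]
shipping_popup = StatePopup(app, only_shippable=True)

print(store.fetch_count, shipping_popup.choices)
```

A hundred and one components get built here and the store is only asked once. If each component did its own fetch in its initializer, the count would track the number of components instead.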
optimize your EO models again this is a
there's a there's a tendency to go
batshit on your EO model and come up
with like the perfect you know
normalized abstracted EO model where
everything is an object and so on and so
forth I mean you know we had one client
who had you know you know a table
for gender objects where there was a row for
male and a row for female so that they
could have all the people who'd signed
up for their site you know have a
reference to either the male object or
the female object now it's like oh
please no use flags
you know simplify some of this stuff
down it may not make everything an
object but you know they were faulting
these things in all over the place and
like no um okay the other cool thing is
EOF and inheritance it's cool you can do
just amazing things with this and I know
I've been given the you know EOF
inheritance abuse award a few times you
know think seriously about you know how
much of the inheritance stuff you
absolutely need to have in your model
you know if worst comes to worst you can
do a complex hierarchy for your editorial
tool and simplify it for your
application but there are a lot of cases
in which having a complex inheritance
hierarchy hurts especially when you're doing
deep fetches which is now I've got 15 kinds
of users and I want to select all users
who haven't been here since last week
and I've got to do a fetch against each
one of the 15 tables even when you're
doing something like single table
inheritance that's going to do you know
fetch against table A where flag equals
one and fetch against table A where flag
equals two so each round trip is expensive
what you're trying to minimize is not
the data that's pulled across but the
actual number of round trips to the data
server and so the complexity of your
inheritance hierarchy especially when
you're using deep fetches can cause a
lot of round trips to the data server
now this again brings up another point
where a person like me can get you in a
lot of trouble because I think in
objects you know I look at a bunch of
users and I think a big you know an
inheritance hierarchy yeah that makes
total sense but EOF provides a brilliant
object-oriented interface to a
relational database and a relational
database doesn't do inheritance well
object-oriented databases do but there
are other issues there
so keep the object models simple not
because the object model being simple is
great but because it's going to make the
database that much faster and you can
overload these tables too I mean you can
have complex objects that you use for
editing and then slap over on top of the
same table a simplified object with
flattened attributes and what-have-you
that you use for presentation you know
maybe you know we're doing on this page
some simple piece of information
processing and you know we can take a
you know and create a new user entity
that you know spans the important shared
part of all the other user entities and
we'll just do a query against that and
doesn't give us the whole complex
hierarchy but it gives us enough
information to answer the questions that
we need to answer and it doesn't
have an inheritance hierarchy at all
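To make the round-trip cost concrete, here is a small sketch of the single-table case: one query per sub-entity with a type flag, against one query through a simplified entity over the same table. It is plain Python with SQLite, not EOF; the `app_user` table, its `user_type` flag column and the `run` helper are assumptions for illustration, and three user types stand in for the fifteen mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_user (
        id INTEGER PRIMARY KEY,
        user_type INTEGER,      -- which subclass this row maps to
        name TEXT,
        last_visit TEXT         -- ISO date, so string comparison orders correctly
    );
    INSERT INTO app_user VALUES
        (1, 1, 'ann',  '2000-05-01'),
        (2, 2, 'bob',  '2000-05-14'),
        (3, 3, 'cruz', '2000-04-20');
""")

round_trip_log = []  # one entry per statement sent to the database


def run(sql, params=()):
    """Hypothetical helper: count the round trip, then execute."""
    round_trip_log.append(sql)
    return conn.execute(sql, params).fetchall()


cutoff = "2000-05-08"
user_types = [1, 2, 3]  # imagine fifteen of these

# Deep fetch over the hierarchy: one restricted query per sub-entity.
stale_deep = []
for user_type in user_types:
    stale_deep += run(
        "SELECT name FROM app_user WHERE user_type = ? AND last_visit < ?",
        (user_type, cutoff),
    )
deep_round_trips = len(round_trip_log)

round_trip_log.clear()

# Simplified entity over the same table: no type restriction, one query.
stale_flat = run("SELECT name FROM app_user WHERE last_visit < ?", (cutoff,))
flat_round_trips = len(round_trip_log)

print(deep_round_trips, flat_round_trips)
```

The rows that come back are the same either way. What changes is how many times the application has to go to the data server to get them.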
now there's tricks you can play like
that that will simplify simplify things
you know again you know think about what
you're going to use these things for use
batch faulting where appropriate you
know basically what
this does is you set batch
faulting in your EO model to say when
you're going to fetch this object
go fetch the next ten because we might
need them basically what you're doing is
you're pre-populating the cache that's
stored by EOF
the snapshot dictionary of your objects
but then you need to make sure that
you're using that appropriately if
you've got to-one relationships in the
same editing context and the object on
the other side of the to-one
relationship is already in your cache
it'll go find that
without faulting to the database but you
know it's not going to know if you've
got a to-many relationship it's going
to have to go to the database anyway it
can't tell whether it's got you know all
the children in there for the parent
because even though you know you
and I know that you know all three
children have already been brought into
the snapshot dictionary you know it
doesn't really have anything that it can
use to make sure that the you know the
list is complete so it has to go to the
database even if the result of this is
that it could have satisfied the to-many
relationship out of the cache you know
use prefetching this is you know this
has transmogrified from the earlier
days to the current days from you know
hints to actual directives and you can
just say when you're going to populate
this object you know populate these
things on the relationship this is
useful for when you're building up your
cache to make sure that later when
people start using the objects and
following the relationships to things
the objects on the other end of the
relationship are already there and just
you know beware of excess complexity in
your model in general you end up with
extra pointers to various objects
in there that can then cause further
fetching activity or in certain
cases excess back pointers can prevent
prefetching from working the way it's
supposed to so once you've set up all
this prefetching you've got to actually
watch the stuff with EOAdaptorDebugEnabled
to make sure that the right objects
are being fetched when you expect it to
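The batch faulting idea, when one fault fires go get the next ten as well, can be sketched outside of EOF like this. It is an illustration in plain Python with SQLite under made-up names: the `snapshots` dictionary stands in for the snapshot cache, and `fault_one` and `fault_batch` are hypothetical helpers, not framework calls.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO artist VALUES (?, ?)",
    [(i, f"artist {i}") for i in range(1, 26)],
)

sql_log = []    # one entry per query sent to the database
snapshots = {}  # primary key -> row, a stand-in for a snapshot cache


def fault_one(artist_id):
    """Fetch a single row on demand, unless it is already cached."""
    if artist_id not in snapshots:
        sql_log.append("single")
        snapshots[artist_id] = conn.execute(
            "SELECT id, name FROM artist WHERE id = ?", (artist_id,)
        ).fetchone()
    return snapshots[artist_id]


def fault_batch(artist_id, pending_ids, batch_size=10):
    """When one fault fires, also fetch up to batch_size - 1 other pending ids."""
    if artist_id not in snapshots:
        others = [i for i in pending_ids if i != artist_id and i not in snapshots]
        batch = [artist_id] + others[: batch_size - 1]
        marks = ",".join("?" * len(batch))
        sql_log.append("batch")
        for row in conn.execute(
            f"SELECT id, name FROM artist WHERE id IN ({marks})", batch
        ):
            snapshots[row[0]] = row
    return snapshots[artist_id]


ids = list(range(1, 26))

names_single = [fault_one(i)[1] for i in ids]
single_queries = len(sql_log)

sql_log.clear()
snapshots.clear()

names_batched = [fault_batch(i, ids)[1] for i in ids]
batched_queries = len(sql_log)

print(single_queries, batched_queries)
```

With 25 rows and a batch size of ten, the one-at-a-time version issues a query per row and the batched version issues three. As with the real thing, it only pays off if the extra rows actually get used.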
all right so EOF there's all this great
stuff where it'll build your tables it'll
build your database so on and so forth
you know really nice stuff the one thing
it doesn't do for you out of the box is it
doesn't create indexes all right so
you've gone off and you've created all these
objects that have unique primary keys
you're doing all this fetching based on unique
primary keys create indexes on those
things also look at your queries and see
what you're doing people are doing these
sort of queries where they've got fields
they can type in values and do a search
what are they searching on create
indexes on those values you can speed up
your database activity tremendously by
you know properly indexing things if
you're not quite sure how the
database is using stuff
this is a great thing and everybody who's
not like a database geek doesn't know
about this but this is a database
propellerhead thing you know
for Sybase it's
showplan in Oracle it's explain
plan you turn this on and run your query
and it says well you know I was going to
check in this table in that table and
then I was going to gather this
information here and then I'm going to
process it and do that stuff there and
it tells you exactly how it's going to
go about giving you back the three rows
of data you would actually get from your
complex query and one of the useful
things this will tell you is you know
and then since you asked the question in
just this way I decided not to use your
index and to do a table scan instead to
get the results out and you know
you know by sort of doing explain
plans fiddling with your indexes and
what-have-you
you're going to actually make sure the
database is doing what you want it to do
not what it thinks it has to
do now how are we doing on time yeah we can go
over it was a two-hour session right okay
no problem now we're getting close to
the end
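The plan-checking habit described above works on pretty much any database. Here is a small sketch using SQLite from Python, chosen only because it runs anywhere; SQLite's command is `EXPLAIN QUERY PLAN` rather than the Sybase or Oracle ones mentioned, and the `customer` table and index name are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, last_name TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO customer (last_name, city) VALUES (?, ?)",
    [(f"name{i}", f"city{i % 50}") for i in range(1000)],
)


def plan(sql, params=()):
    """Return the database's own description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the readable detail


# The kind of query a search field on a page would generate.
search = "SELECT id FROM customer WHERE last_name = ?"

before = plan(search, ("name42",))
conn.execute("CREATE INDEX customer_last_name ON customer (last_name)")
after = plan(search, ("name42",))

print("before:", before)
print("after: ", after)
```

The exact wording of the plan text varies between SQLite versions, but the first plan is a scan of the whole table and the second one names the new index, which is the thing to look for.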
another good tuning thing is you know if the
database is running exactly the way it's
supposed to it puts most of the
information that you're going to access
on a regular basis into a memory cache
and you can check the database
statistics to find out is it doing that
or just going to the disk every time for
your data and you can tune that and
also just more silliness and you may
have to get somebody who's a database whiz
to come in and do some tuning in the
operating system
if you've got a multiprocessor machine
oftentimes you have to actually tell the
database to use all the processors
similarly databases often have a bunch
of parameters about how much memory they
use you know how much data they put in
there and how many stored procedures they
put into memory tune that appropriately
you can have a big piece of iron that's
basically sitting there idle because the
database is trying to run in a little tiny
slice of memory on one processor doesn't
do anyone any good speaking from
experience
databases run really really slow when
they're tuned for say 512 megs of RAM
but you only have 256
and it turns out not to work quite as
well as you'd like and last but not
least actually look at the generated
SQL you know it'll suggest additional
indexes you know you shouldn't ever need
to do hand-optimized SQL and put that
into the EO model it's definitely a last
resort but once in a blue moon
based on the way you've constructed
your object model and what-have-you EOF
may construct SQL that is less than
completely optimal and there may
occasionally be special purposes where
you need stored procedures you know it's
basically compiled SQL that runs on the
server faster than you know on the
fly SQL and it can be useful in
certain circumstances all
right now that we've gotten out of the
scary database part I'll give this back
to Bill ok once you get the database
going fast because that is generally
where most of the bottlenecks are you
need to start looking at optimizing your
application itself and optimizing your
components one of the first things is
there's great temptation to componentize
everything make everything a reusable
component that's actually a significant
performance hit do it carefully simplify
your component nesting you know don't
make every image in the navbar an
individual WebObjects component make the navbar
itself a component things like that
define your own compiled subclass of
WOComponent and put your common
functionality there what this does is
just simplifies your overall component
hierarchy what we always do
is we always have a subclass of
WOComponent and every single component be
it Java Objective-C WebScript doesn't
matter inherits from that specific
subclass by doing that not only do we
gain the benefits of sharing all this
functionality across the component
hierarchy we can also push some
debugging information into there some
little debugging triggers do some
logging things it's a great place when
you start to get into debugging and
performance optimization to be able to
put breakpoints and print information
etc you should also consider caching pages or
using the new stateless components on any
page where you're not displaying
information specific to the user or even
if you are to a limited degree there's
no reason to not use the stateless
component stateless
components are great that means they're
cached at the application not in the
session the other thing is caching your
pages if again if you're talking about a
page where you're say selecting a region
for some store application or something
well regions in the United States aren't
going to change that often so cache that
information and finally make static
content static and this is one thing
that a lot of people miss there's a
great temptation to serve everything
from WebObjects or everything from the
dynamic content generator if you use
static content you get an order of
magnitude performance improvement static
content comes straight off the disk
straight out the web server there's no
state associated it's blazingly fast
because it's exactly what the web was
designed to do I mean in effect all
these middleware things we're doing all
this WebObjects stuff is doing something
to the web that it was never designed to
do and there's a big performance penalty
for making something do what it was not
designed to do refactor your software
once you get the thing built
once it's working right and you found
where your bottlenecks are start to you
know compile anything that does serious
calculations look at optimizing your
calculation engines look at generalizing
that and moving it out of sort of the
application layer and into the backend
layer and really treat it like it's
a serious calculation engine that you
want to maximize the performance of and
then use it from the upper layers
simplify your application and session
objects this is more this isn't really
about optimization as much as
facilitating optimization what you want
to do is if you have say something that
does region management going back to the
store thing where you've got multiple
regions at the application level and
there's some complex product selection
or product availability by region as an
example we did a record store and with record
stores there's certain records you can't
sell in certain countries in the world
so we have a region manager we push that
region manager into an object of its own
that you can access through the
application by doing that it moves that
functionality out of the application
level and it means that as we're
optimizing that or
optimizing other things we're modifying
something that's relatively isolated and
finally don't forget about the web
server is the web server optimized for
the environment a classic example of
this is uh okay so you're running
against Apache you've got that WebObjects
adaptor in there you've tuned your
application out to the nth degree oops
you're only running five Apache servers
I did that once tune it make sure it has
the appropriate configuration for the
amount of load you expect to have use a
mixture of your static and dynamic
content wherever you can use static
content again that's just going to boost
performance direct actions allow you to
integrate the static content with the
dynamic content okay in WebObjects
one of the great things about
HTTP since it's totally stateless is
that when you have a user having a user
experience with your site you have to
pass around a user ID a session
identifier well normally by default that
session identifier goes back and forth
in every single URL
so every time a hyperlink is generated
in the dynamic content that hyperlink
has to have this big long nasty number
that identifies that user such that then
when the user clicks on that WebObjects
can figure out what session to associate
that hit with well if you move that to
a cookie now cookies have their
own problems but pretty much they're
supported everywhere now by doing that
pretty much all the URLs in your content
no longer have to have user specific
information in them this allows you to
integrate your dynamic content and your
static content so for example again
using a record store example we may have
static pages that describe albums static
pages that describe artists well those
don't change very often leave them on
disk as static let the WebObjects
application navigate over to them have
direct actions in those pages that bring
the users back into the WebObjects
application the more hits you can get
against the static stuff the better off
you are
and we'll get quickly to Q&A okay
actually do we have 10 minutes after to
do Q&A 15 great thank you
optimize for fast browser display this
is another little war story here we had
a client and the content generation the
content delivery was really really slow
and this was back in the days when
tables didn't really quite work right
and you couldn't really specify image
sizes quite right and to do layout you
had a spacer.gif everywhere I mean
if anyone's been on the web for more
than two years you probably remember
this well the path to the spacer.gif was like
/WebObjects/SomeApp.woa/WebServerResources/images/spacer.gif
and all we
did is we put spacer.gif as s.gif in the
root level of the web server and reduced
the amount of HTML generated by about
forty percent across the entire site
that was smaller pages display faster
the less HTML you generate the faster it goes
out the less dynamic content you're
generating the faster it goes out you
want to batch your displays of long sets
of data I mean not only does the user
not want to see three thousand products
all at once this makes things go faster
show them ten at a time show them fifteen at
a time generate short URLs this again
gets back to the spacer.gif thing
instead of /images use /i you'll
also do better with images and just
everything surrounding the static
resources that are associated with every
web site split installs in WebObjects
are very very convenient they're very
useful we never do them and it's not
because they don't work or anything like
that we never do them because we put all
of our static resources as close to the
top level of the web server as we can we
leave it there and it reduces the amount
of HTML we generate
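The spacer image story is easy to check with a little arithmetic. This sketch builds one layout row both ways and measures the difference; the row markup and the count of 20 spacers are made up for illustration, so the percentage it prints is for this toy row only and is not the forty percent figure from the site described above.

```python
long_src = "/WebObjects/SomeApp.woa/WebServerResources/images/spacer.gif"
short_src = "/s.gif"


def layout_row(src, spacers=20):
    """One table row padded out with spacer images, as in old table layouts."""
    cell = f'<td><img src="{src}" width="1" height="1"></td>'
    return "<tr>" + cell * spacers + "</tr>"


long_html = layout_row(long_src)
short_html = layout_row(short_src)

# Every spacer reference saves the difference between the two paths.
saved = len(long_html) - len(short_html)
percent_saved = 100 * saved / len(long_html)

print(f"{len(long_html)} bytes vs {len(short_html)} bytes, {percent_saved:.0f}% smaller")
```

The saving scales with how many times the path is repeated on the page, which is why a layout full of spacers gains so much from a short URL.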
you want to also improve the structure
of your HTML now this isn't as much
an optimization as it is
optimizing towards a working application
not a fast one use an HTML code checker
such as web lint and everyone should
be on the WebObjects mailing list at
Omni Group www.omnigroup.com if you're ever
planning on doing any development with
WebObjects or you're even interested
immediately sign up for that list and
the reason why I mention that as well is
because we're going to be throwing a
bunch of code out there next week when
we get a chance to go back and one of
them is this thing called web lint what
it does is it looks through your
WebObjects HTML or your generated HTML
checks the structure of it makes sure
everything lines up simplify your table
structures it's tempting to nest tables
deeper and deeper and deeper especially
when you have an object hierarchy or
component hierarchy you want to reuse
all those components every component
needs to guarantee that it displays
correctly so it has its own little table
that's really slow it's really really
slow in Netscape it's just very slow in
Internet Explorer and watch for nesting
problems especially things like nested
forms if you open a tag don't close the
tag until you've closed every other tag
inside of it and always make sure you
close the tag in the HTML standard that
seems to have come out of the early
browser implementations closing
a table cell closing a table row even
closing a table closing forms is pretty
much optional that doesn't work when
you're talking about dynamic content
generation and it's going to break
things and also when it gets
back to actual forms one of the
risks there especially if you have forms
that are misstructured is that you can
get incomplete data back to your
application or you can get broken data
and you can get a performance hit as
your application goes you know oops
exception and has to go and deal with
maintenance stuff associated with an
error condition an exceptional state
the classic one is the overlap problem I
can't tell you how many times we've worked
with HTML producers or we have been
doing HTML ourselves and just a simple
HTML overlap where you open a form you
open a table then you close the form
then close the table there's a problem
there and this can cause some serious
problems the forms don't work the
processing must be broken the code must
be broken it's like no it's actually in
the HTML and one of the things to keep
in mind is that especially when
you're developing you focus on a single
component I'm doing this component well
sometimes problems can span across
multiple components and what we like to
do is we use web lint on the
components themselves as well as on the
entire generated content and for more
information on this as well please
sign up to the Omni Group WebObjects
mailing list and there will be a lot
more information coming out after WWDC
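Going back to the form and table overlap problem for a moment, this kind of mistake can be caught mechanically. What follows is not the web lint tool mentioned in the session, just a minimal sketch of the same idea using Python's standard `html.parser`; the `NestingChecker` class, the set of tags it tracks and the `check` function are assumptions made for the example.

```python
from html.parser import HTMLParser


class NestingChecker(HTMLParser):
    """Reports close tags that do not match the most recently opened tag."""

    TRACKED = {"form", "table", "tr", "td"}

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of tracked tags that are currently open
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag in self.TRACKED:
            if tag == "form" and "form" in self.open_tags:
                self.problems.append("nested <form>")
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.TRACKED:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()  # properly nested close
        elif tag in self.open_tags:
            self.problems.append(
                f"</{tag}> closed while <{self.open_tags[-1]}> still open"
            )
            self.open_tags.remove(tag)
        else:
            self.problems.append(f"</{tag}> without a matching open tag")


def check(html):
    """Return a list of structural problems found in the given HTML."""
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    for tag in checker.open_tags:
        checker.problems.append(f"<{tag}> never closed")
    return checker.problems


good = "<form><table><tr><td>x</td></tr></table></form>"
overlap = "<form><table><tr><td>x</td></tr></form></table>"

print(check(good))
print(check(overlap))
```

Running a check like this over the whole generated page, and not only over each component's own template, is what catches an open tag in one component that gets closed in another.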
there's always discussion on that
mailing list following up from the
sessions et cetera I'm sure Dave Newman
who's making a bunch of code available
will post information there as well well
we hope this stuff was
a good start at optimizing your
application so we've got some time for a
bunch of Q&A now the usual who
to contact and let's do a little
question-and-answer first of all a big
hand for our presenters
you