WWDC2000 Session 194

Transcript

Kind: captions
Language: en
good afternoon ladies and gentlemen and
welcome to the session tuning for
velocity engine and MP I'm Glenn Fisher
I run the performance marketing group in
worldwide product marketing at Apple
Computer and as such have been involved
with the velocity engine and MP
technologies for the last several years
delighted to have all of you here today
we have a great crew of presenters from
Metrowerks and would like to thank them
for the work they've done in preparing
this session and also Chris Cox from
Adobe who has been instrumental in
helping us with some of the demos that
you'll see today so why are we here two
key technologies for Apple going forward
are velocity engine and MP the
processor technologies that we've been
developing make those technologies
available to our customers or well in
the future and we need to make sure that
you're on board supporting those
technologies to get the best performance
out of your application and to take
advantage of the performance that's
there in the machine for our customers
at the same time we recognize that it
isn't always easy to take advantage of
these technologies so we're looking for
ways to make it as easy as possible and
have been working closely with Motorola
and Metrowerks over the last few years
to make it as easy as possible for you
to take advantage of these powerful
technologies for information how many of
you have actually written velocity
engine code or altivec code how many
of you played around with it a few more
okay and how many of you are here to
find out how to get started great so
that's wonderful to have new recruits to
the to the crowd so what we're covering
today how to debug velocity engine
applications using Metrowerks CodeWarrior
and we'll give you some specific
examples of tuning code for velocity
engine using Metrowerks CodeWarrior we'll
also give you a brief introduction to
CodeWarrior support for MP coding and
debugging and to do that I'd like to
bring up Bob Bob Campbell who's a lead
compiler engineer at Metrowerks Bob
[Applause]
hello I have to admit I don't actually
write a lot of altivec code and that's
one of the reasons why we're very
grateful to Chris Cox for providing us
one of his examples I look at a lot of
altivec code but my tendency is to
critique other people's and not
actually write it along that line
changing from scalar code to vector code
requires that you you put some thought
into what you're doing and standard
things that applied to scalar code still
apply to velocity engine so you should
profile your code you should make sure
that if if you're going to make
something run fast or make something
that you're actually spending time doing
run faster so find your hotspots and
look at them and think about ways to
rewrite them another issue for altivec
is that alignment is important the
engine will run much faster if you have
16-byte aligned data than it will run if
you're if you've got things misaligned
then you're going to have to waste
cycles realigning them another thing is
because you're going to work on things
in big chunks of four eight or sixteen
elements you can't really have an if
statement that does something different
on one element that you're going to do
on the other element so you need to take
advantage of instructions like vsel
that allow you to basically do
conditional assignments within straight
line code with no ifs and probably a
third point is that one of the most
powerful features of altivec is the data
streaming instructions so if you're
writing code that's going ripping
through memory then you ought to tell
the CPU hey I'm going to be going
ripping through memory and I'm going to
be reading you know every so often tell
it how much you're going to be reading
and what your stride is that way it will
read the data into the cache before you
even get there as the example that we
have that we're going to show you today
you can almost plan on restructuring
your code algorithms that work well
in scalar code may be significantly
different than the algorithms that you
can
to work on in vectors example that we
have from Chris Cox is a rotate example
and the rotate example was interesting
because you change the way you move
through your data in order to enable you
to speed up and we'll actually talk
about that when we run the example
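
Editor's sketch: the earlier point about replacing a per-element if statement with a select-style conditional assignment can be illustrated in portable C. This is not code from the session. The names `select_u8` and `clamp_max_u8` are made up for illustration, and the scalar loop only models, one byte at a time, what a vector select does across a whole register: take bits from the second operand where the mask is set and from the first operand elsewhere.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise select: where a mask bit is 1 take the bit from b,
   otherwise take the bit from a. */
static inline uint8_t select_u8(uint8_t a, uint8_t b, uint8_t mask) {
    return (uint8_t)((a & (uint8_t)~mask) | (b & mask));
}

/* Clamp every byte to at most `limit` with no if in the loop body.
   Hypothetical helper, used only to show the pattern. */
void clamp_max_u8(uint8_t *data, size_t n, uint8_t limit) {
    for (size_t i = 0; i < n; i++) {
        /* The comparison yields 0 or 1; negating and narrowing
           turns that into an all-zeros or all-ones byte mask. */
        uint8_t mask = (uint8_t)-(data[i] > limit);
        data[i] = select_u8(data[i], limit, mask);
    }
}
```

Because every element runs the same straight-line sequence, the loop body maps naturally onto a compare that produces a mask followed by a select, which is the shape the speaker describes.
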
another point about altivec is that you
know you see this great instruction and
it'll it'll do sum across to let you
add four values together and produce a
result well you can't just go stick that
in every time you need to do that one
instruction you need to be doing more
than just using one vector instruction
because there's setup overhead and you
won't get back your setup overhead if
you're only running one instruction so
it's not a good idea to write a bunch of
little tiny functions that call one
single altivec function and kind of in
that point one of the things that we've
discovered is that there's some there's
some bookkeeping regarding saving
altivec registers so one of the comments
has been if you're going to do a bunch
of altivec code there's a pragma you
stick in at the beginning that says I'm
going to be using altivec code
generate all your code with all the
function calls and then turn it off at
the end that saves some of the
intermediate bookkeeping regarding
saving the registers because it's sort
of like sets it up at the beginning and
cleans it up at the end and last is you
need to look for parallelism in your
algorithms places where you can do four
things at a time or eight things at a
time instead of looking for sort of
iterating through one at a time at that
point I want to bring up Richard Atwell
and we're going to run through a demo of
some rotate code from Adobe oops and I'm
in charge of the monitors so
so here basically Richard has
launched the program in the debugger and
we're at the beginning here yeah we're to be
and you want to run to the first
breakpoint the part that's interesting
is the original rotate algorithm just
went through rows at a time writing out
columns at a time so it's a 90-degree
rotate so if you're going along like
this and you're writing down like this
can you make the font bigger Richard the
change to do for the altivec algorithm
is instead of thinking about working in
rows and columns is to take the input
and break it into tiles where the tiles
exactly fit basically 16 bytes by 16
bytes square so what we're going to do is
we're going to load a 16 byte by 16 byte
square into altivec registers rotate it
in the registers and then write it back
out where it goes so instead of going
along word at a time we're going to grab
a chunk we're going to rotate it and
we're going to write it back out so
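
Editor's sketch: the tile-at-a-time traversal described here can be written in plain scalar C. This is not the Adobe code. The function name `rotate90_tiled`, the clockwise direction, the one-byte-per-pixel layout, and the requirement that both dimensions are multiples of 16 are all assumptions made to keep the example short. In a vector version the inner copy loops would be replaced by loading the tile into registers, rotating it there, and storing it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { TILE = 16 };  /* 16 x 16 bytes, the tile size mentioned in the talk */

/* Rotate a w x h 8-bit image 90 degrees clockwise into dst,
   which is h pixels wide and w pixels tall.
   Source pixel (x, y) lands at column h - 1 - y, row x. */
void rotate90_tiled(const uint8_t *src, uint8_t *dst, size_t w, size_t h) {
    assert(w % TILE == 0 && h % TILE == 0);  /* assumption of this sketch */
    for (size_t ty = 0; ty < h; ty += TILE) {
        for (size_t tx = 0; tx < w; tx += TILE) {
            uint8_t tile[TILE][TILE];
            /* Grab one tile: reads walk 16 short rows of the source. */
            for (size_t y = 0; y < TILE; y++)
                for (size_t x = 0; x < TILE; x++)
                    tile[y][x] = src[(ty + y) * w + (tx + x)];
            /* Write the tile back out where it goes in the rotated image. */
            for (size_t y = 0; y < TILE; y++)
                for (size_t x = 0; x < TILE; x++) {
                    size_t dcol = h - 1 - (ty + y);
                    size_t drow = tx + x;
                    dst[drow * h + dcol] = tile[y][x];
                }
        }
    }
}
```

The point of the restructuring is that both the reads and the writes stay inside a small square at a time, instead of one side of the copy striding through the whole image.
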
actually Richard is here at the at the
function which is going to do this and
I'm not really sure how in depth I want
to go into the instructions but it
essentially does want to step through it
oh we were going to show off Richard
wants to show us some debugger features
easier what do you works on so we've got
a nice register window you can look at
all the altivec registers you can step
through code you can watch the registers
as they change as it loads them so
basically at this point he's going to do
one more step
and it's going to like basically load it
all loaded this whole tile into memory
and so then you're looking at Chris
going you should explain that stuff but
it's actually pretty neat algorithm
because instead of doing what I would
have done which is individually move the
elements within the vectors around to
the right spots he used a trick with the
merge instruction to sort of merge
partway and then second merge which puts
everything exactly
where it wants to go and if you really
want to know are we going to make this
code available weekly okay so we won't
we won't promise that yet but it's it's
a pretty nice thinking about the problem
differently and looking at what altivec
can do and instead of saying you know
move my elements individually I move I
move half my elements halfway where I
want them and then the second set of
merge instructions moves them the other
half scroll along
so basically that set of eight
instructions causes everything to rotate
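
Editor's sketch: the two-stage merge idea can be modeled in portable C. The code shown in the session is not reproduced in the transcript, so this is only an illustration under stated assumptions: a 4 x 4 block of 32-bit elements, a `vec4` struct standing in for a vector register, and `merge_high` / `merge_low` helpers that mimic the interleaving behavior of the altivec merge-high and merge-low operations. All of these names are hypothetical. The first set of merges moves elements partway, and the second set finishes the job, giving a transpose in eight merges.

```c
#include <stdint.h>

/* Stand-in for a 128-bit register holding four 32-bit elements. */
typedef struct { uint32_t e[4]; } vec4;

/* Interleave the first halves of a and b: a0 b0 a1 b1. */
static vec4 merge_high(vec4 a, vec4 b) {
    vec4 r = {{ a.e[0], b.e[0], a.e[1], b.e[1] }};
    return r;
}

/* Interleave the second halves of a and b: a2 b2 a3 b3. */
static vec4 merge_low(vec4 a, vec4 b) {
    vec4 r = {{ a.e[2], b.e[2], a.e[3], b.e[3] }};
    return r;
}

/* Transpose four rows in place using two rounds of merges. */
void transpose4x4(vec4 r[4]) {
    /* Round one: pair row 0 with row 2 and row 1 with row 3. */
    vec4 t0 = merge_high(r[0], r[2]);
    vec4 t1 = merge_high(r[1], r[3]);
    vec4 t2 = merge_low(r[0], r[2]);
    vec4 t3 = merge_low(r[1], r[3]);
    /* Round two: merging the intermediates yields the columns. */
    r[0] = merge_high(t0, t1);
    r[1] = merge_low(t0, t1);
    r[2] = merge_high(t2, t3);
    r[3] = merge_low(t2, t3);
}
```

A transpose on its own is not a 90-degree rotate. The rotate also needs the row order or the column order reversed, which can be folded into the order in which the rows are loaded or stored.
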
and I think Richard wanted to show a few
debugger features look can hear me now
okay so so one of the problems that we
have was how to represent the vector
registers because they're so large so
what we decided to do is make use of the
struct paradigm that we already have in
the variable view and you can take a
look at all of the scalar elements that
live within the vector elements and you
can modify these things individually in
order to help you with your debugging so
because we have the ability to do that
we also wonder what we can do for
breakpoints and we have a conditional
breakpoint feature in the IDE but
because of the way Motorola specified
the struct as being anonymous you
couldn't access the elements on the inside
we had to invent some syntax for you to
get inside those things so if you look
down the breakpoints window here we have
a conditional breakpoint and we're going
to set a breakpoint on line 32 and it's
going to be on the source 1 variable if
you take a look at the syntax its SRC 1
dot followed by what looks like an array
notation this is just something that
we've borrowed because we're using this
array like notation to indicate the
scalar elements inside each vector so
what we can do is we can change the PC
and move it up here
if I go run it's going to hit the
breakpoint I change it back again and
change the condition should bypass the
breakpoint as it did any other debugger
features the vector register window's so
huge so we made it scrollable and you're
small on real estate on your PowerBook
or whatever now we don't have G4 for
PowerBook yet but when we do you'll be
able to you know grow the window and
inspect register values and then shrink
it back down and it remembers the
position you had it scroll to before so
you can kind of you know manage the
debugging a bit better and not let the
debugger get in the way of your
programming okay so could we quit the
debugger and run the non debug version
sure I want to show because basically
Chris was kind enough to set us up with
with an example that runs the scalar
code several different ways and then
runs the altivec code so we could just
run that and of course it's going to
build it the scalar code was what I was
talking about earlier where it goes
through memory row by row writing out
columns then an additional one that was
tried to do it the other way writing out by
rows but reading by columns sort of
inverting the way you do it to kind of
look at the differences in the way it
walks through the way it walks through
memory so then you know and I can't
hardly read that no it's not your fault
it's our terminal window the first the
first attempt up there took what four
four point three seconds the the second
attempt took five point three five point
two and the third one was five point one
those were all scalar attempts of
different ways of doing it and the one
thing I like about this is the fact that
Chris very religiously has a base case
he writes a new case
he keeps the base case and he always
keeps comparing the times to make sure
that we're actually getting better the
fourth case is the vector case which is
the one that we step through in the
debugger and that one took what does it
say there
two point seven two point seven seconds
so not quite half but the third one
which is actually the interesting one
was where he combines all the features
he added in the data stream touch
instruction where he sort of tells the
CPU which memory he's going to read
ahead and that one runs at one point one
point six which is what did we get to we
got better than half right yeah so it
kind of shows you that that there's two
parts to the velocity engine one part is
the vector unit but to get full
performance out of it especially for
data streaming operations you really
need to take advantage of the cache hints
and start to tell it what you're
going to read ahead of time I'm sure
Richard you want to open open the
altivec
No yeah you're right
go here
there so it's basically the datastream
touch instruction it takes the pointer
it takes a cache pattern you need to read
through the manual a little bit and
what's the one the one on the end the
last parameter oh that's right this is
stream one I should know this so anyway
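
Editor's sketch: the middle argument of the data stream touch call packs together how much you are going to read and what your stride is. The helper below builds that 32-bit control word in portable C so the layout can be checked on any machine. The field layout used here (block size in vectors in the top byte, block count in the next byte, signed byte stride in the low 16 bits) is my reading of the altivec manuals and should be verified there. The name `dst_control` is hypothetical, and the sketch leaves out the special stride value of zero.

```c
#include <assert.h>
#include <stdint.h>

/* Build a data-stream-touch control word.
   block_vectors: 16-byte vectors per block, 1..32 (32 is encoded as 0)
   block_count:   number of blocks, 1..256 (256 is encoded as 0)
   stride_bytes:  signed distance between block starts, nonzero here */
uint32_t dst_control(unsigned block_vectors, unsigned block_count,
                     int stride_bytes) {
    assert(block_vectors >= 1 && block_vectors <= 32);
    assert(block_count >= 1 && block_count <= 256);
    assert(stride_bytes != 0 &&
           stride_bytes >= -32768 && stride_bytes <= 32767);
    return ((uint32_t)(block_vectors & 31u) << 24)
         | ((uint32_t)(block_count & 255u) << 16)
         | ((uint32_t)stride_bytes & 0xFFFFu);
}
```

On a machine with the velocity engine the resulting word would be passed as the second argument of the stream touch call, along with the starting pointer and a stream number, for example to describe 16 blocks of 2 vectors spaced 64 bytes apart with `dst_control(2, 16, 64)`.
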
that's what I was going to talk about I
was actually going to save stuff for QA
and I want to bring up let's go push the
right button push the right thing bring
up Ken he's going to talk about MP
debugging well we've been doing a lot of
work with the IDE and the debugger trying
to carbonize it and get the tools ready
for the pro 6 beta CD that we've got
here but we've also kept Richard
really busy working on other Apple
technologies too all the altivec
debugging support you saw and then also
support for the newer flavors of the MP
debugging or MP APIs and we had some MP
support in the debugger before but
Richard's really revved it and and made
it work with the newer APIs there's a
new MetroNub extension and a
newer plug-in and some new user
interface he's done too so
see we've already you've already seen
some of this in the demo but just to
recap there's the vector register window
for altivec
you saw how vector registers and
variables can be shown as structs
for easy viewing of those and then the
work we've done to support that and the
expression parser and also for
conditional breakpoints for MP we
already had sort of a paradigm of having
separate thread windows in the debugger
you can also view all the threads in one
window with the pop-up menu if you want
to look at them that way and so we just
use the same paradigm for MP tasks we
also list the MP tasks along with the
cooperative threads inside the processes
window so you can look at specific
processes and see exactly which which
MP tasks and which cooperative threads they
have and then we've also tweaked the
registers window interface a bit so that
you can look at registers for separate
MP tasks and in separate register
windows and so now I'll go back to
Richard for a demo of some of the things
I just talked about there we go okay so
if you were in the session this morning
on Mac OS 9 and multitasking George
Warren was up on stage and he showed you
this little demo called CloseView MP
and it's a regular Mac OS program that's
drawing into this window but all the
drawings are being done by an MP task so
if you try to do this without MP tasks
you probably couldn't keep the interface
alive by dragging down the menus and and
so forth so you know what we want to be
able to do is debug something like this
so previously we had we had two versions
of MetroNub we had an MP MetroNub
and we had a regular MetroNub and we
split those out a few releases ago
because of some memory issues Apple has
fixed all those things and since they've
basically jettisoned the old MP 1.4 API
we've decided to jettison support for it
too so we combined the extensions
together so there's only one MetroNub
extension for debugging so you don't have to
swap this thing out
to do MP stuff and it makes it a lot
easier so so I'm going to debug this
thing so the program starts up and first
thing I want to show you is the
Preferences in the processes window so
we can see CloseView MP under
control of the debugger here we've got the
main thread which is a cooperative
thread and it's suspended and I'm just
going to step through into the code to
the point where we're just about to
create the MP tasks to do the work so
can anyone read the text ok I guess I'll
make it a bit bigger
here we go so what we're going to do is
we're going to step over this line and
it's going to create an MP task and MP
tasks are created in the runnable state
and in the main blit routine that's
going to draw into that window I've got
a breakpoint set so what should happen
is I should step over this thing
we're going to get another thread window
and we're going to hit the breakpoint so
here we go so let me just resize the
window here because we had a little bug so
under the control of the CodeWarrior
debugger we have the co-operative task
that was drawing the interface and we
also control the MP task so you know
these things are running completely
independently of each other so so what I
can do is I can step in the main task
then go back to the MP task and I can
step independently the two so let's get
the main task running again let's stop
at WaitNextEvent so I'm going to take
the breakpoint off hit run so we've got
the program running back here again
except what's happening is we've got the
breakpoint still in the MP task so
we're not getting any drawing happening
but the cooperative task is free to run
and that's what's drawing the window
letting us drag it around so I'm going
to go back to the window here move this
around a bit so you see in the corner so
the blitter is running in a loop so I'm
going to set the CodeWarrior
debugger stopping every time it hits the
breakpoint and you notice that the way
this thing works is it updates the
display underneath the mouse so as I
move the mouse around hitting resume
we're going to get the display updating
so let's go back to the processes
window so we can see that we've got a
few tasks so we've got the co-operative
tasks we've got our MP task which reads
as stopped and the task above it is
called the death watch task it's
something the OS creates and it tears
down all the MP tasks when the
application exits so let's go back to the
breakpoints window here and let's
temporarily disable the breakpoint get
the MP task running again and that's
updating
let's set the break point again next
time we blit stop so basically we've
got MP 2.0 support for debugging
footwork
[Applause]
hey kill that's ready no dates that's it
okay thanks Richard mostly this stuff is
a little bit newer and it's going to be
finished up next week and then put up
for download yeah so if you if you get
the beta 6 tool CD that we have here at
at WWDC and then just check check our
website in about a week and you can get
the same stuff that Richards been
demoing here well we all have them
outside the door after this session is
over I'm sorry
we have the same architecture for
showing multiple threads whether it's
Java or MP but it's up to MetroNub
to actually figure out you know what the
threads are doing okay that's it for our
MP demo so now here's Godfrey howdy I
want to thank our Metrowerks friends
for coming and showing us this stuff we
left a lot of time for Q&A today so that
we could feel their quest feels your
questions and just just run it kind of
informally from that point so without
further ado we have three mics set up
people should queue up at the mics and a
little road map of some more sessions
this afternoon we have Apple's
performance tools for Mac OS X and
tomorrow we'll have a debugging session
in the main hall at 9 a.m. so we have
people queuing up put all of our
presenters please step up to the stage
you