WWDC2004 Session 217
Transcript
Kind: captions
Language: en
Good afternoon, and welcome to the penultimate session for WWDC and the ultimate QuickTime session. Today we're going to be talking about the next-generation video format in QuickTime. My name is Tim Cherna, I'm the manager of the QuickTime video foundation team, and the QuickTime team is going to be talking to you about H.264 AVC, the new video codec that we're shipping in the next version of QuickTime and in Tiger, the technologies that are required in QuickTime to support H.264, as well as the changes that were required to support IPB video frame coding. We're going to talk a lot about what that actually is, so you're going to see a bunch of abbreviations in my slides and it'll be a little bit strange, but first let me talk about the QuickTime video technologies before Tiger.

Basically we have this software stack, and at the top of the software stack is the movie toolbox. The movie toolbox is used for creating, editing and navigating movies; it's also used for playing back movies and stepping through movies, so it's typically what the highest-level applications use. Now, the movie toolbox uses the video media handler to sequence video frames from files or from a network device to the image compression manager. The image compression manager is the service within QuickTime that deals with compression and decompression, and so we've created these compressor components and decompressor components underneath the ICM, which are created by Apple and third parties, and that serves as the codec model. There's a base compressor and a base decompressor which help to implement encoders and decoders, and we recommend you actually use those; you'll see today that they're actually essential for the new compressors and decompressors for this format.
So before Tiger there were some limitations in the movie toolbox, the video media handler, and the ICM. This wasn't really a big issue, because most of the codecs were either I-frame or keyframe codecs, such as DV or Motion JPEG, or difference-frame or IP-coded video, such as MPEG-4 simple profile, Cinepak, the Sorenson codecs, etc. What we didn't support is the more complex IPB frame ordering that's used within H.264 and MPEG-2 and MPEG-4. Now, we do support MPEG-1 and 2, but that's via the MPEG media handler, not the video media handler; it's a different code path, and we're not changing that code path in Tiger. So today we're going to cover some fundamentals about the new H.264 video codec, details about what IPB is and how it differs from I and IP. By the end you'll really know the user-level impacts of the changes that we're doing in QuickTime: changes to the movie toolbox for navigation and editing, and of course the changes to the ICM to support these new kinds of video codecs. With that, I'd like to bring up Thomas Pun to
talk about H.264. Hi, I'm Thomas, and I work on the video codecs in QuickTime. So today I'm going to briefly talk about H.264. I'm sure you guys have all heard about it the whole week, so I'm just going to recap. H.264: what is it? It's a joint effort from the two biggest standards organizations for video, one being the ISO and the other one the ITU. They brought us video standards such as MPEG-1 and MPEG-2, and also H.261 and H.263. Now, because of the joint effort, the different bodies each seem to give this codec a different name, so you may also hear MPEG-4 Part 10, JVT, H.264 and also AVC. It was standardized just last year, so it's a very recent addition to the video standards, and it has all the new technologies, and it works very well at various bitrates, all the way down to 3G, all the way up to HD. Because of that it has recently been chosen as one of the video codecs for HD DVD and also for the 3GPP standards. And because QuickTime always stands behind standards, it will for sure become a new video codec in QuickTime.
And with that, I'm just going to show some demos. Can we go to the demo machine, please? Now, as I said, it has been recently chosen for HD DVD, so let's see how it looks at HD resolution. I have a clip here; it's encoded at HD resolution, 1280 by about 550, and this is actually only at six megabits. We couldn't really do that with MPEG-2 at this kind of quality before. So let's see, I'm going to play the whole clip.
[Music]
by the age of 25 he had conquered the known world
and changed the course of mankind forever
[Music]
come on young man
and I promise your concert dad
[Music]
So that's how it looks at six megabits, and if you actually play with our encoded video, you'll notice that at the beginning the sandstorm is actually really hard to code, and the codec does it really well. The other demo I'm going to show is to give you an idea of how it compares with an existing standard; the one I chose is MPEG-4. So I'm going to open these two files first; can you open them, please? Okay. Now, when we try to compare with a different standard, we usually use three guidelines: fix the bitrate and compare how the quality looks; fix the quality and compare the bitrate, since one could be a lot bigger than the other; and how much information you're actually packing into the stream, which goes with frame rate, with resolution or frame size. So here I have two clips, both of them encoded at a megabit; the H.264 one is about four times as big in frame size as the MPEG-4 one that we shipped earlier. So I'm going to play this; I'm just going to play a short section, about 30 seconds, of the movie. So the quality is about the same.
[Music]
identify detective Wow richest man in
the world Kanaka coffee sure why not I
don't think anyone saw us coming so
whatever I can do to help sugar I'm
sorry for coffee sugar ah oh oh you
thought I was calling you sure that
you're not that rich. Okay, I'm going to stop it now; we can play the whole clip later if anyone wants. Let's just continue. Can we get back to the slides? Okay, so what makes
this codec a lot better than, say, the MPEG-4 and MPEG-2 that we already have? So here's a big table; I'm sure some of you may have seen it in another session. What I really want to point out here are three things. The first one is that there's really not one single technology that gives you all the gains; it's a combination of advances in different technologies. Most of the technologies that we use in H.264 are based on technologies that we already had, like ten years ago, but there has been a lot more improvement, and we know how to use the technologies a lot better. The second thing I want to bring up is, as we mentioned earlier, that for example with MPEG-2 it took a couple of years to get the best out of it, and H.264 is a very new standard; it was just standardized last year, so you should expect the quality of H.264 streams to get better and better once we know how to use the tools even more efficiently. And the last thing is, of the many technologies here (I'm not going to go over all of them, that would be boring), most of them are self-contained within the codec: your codec is where they're implemented, for example a different transform or a different way of packing the bits in the stream. But there's one particular technology, which is the IPB frames, the third column down, where H.264 gives you a lot more options. It's very flexible, you can do almost anything you want. At its simplest form it's about the same as the previous MPEG IPB, which we're going to explain more, but if you really need to take advantage of it, then the higher layers will require some changes, and thus QuickTime has had to make some structural changes as well. And that's going to bring up our next presenter, who is going to talk about what changes in QuickTime were needed to support H.264. All
right. So how are we going to deliver H.264 in QuickTime? We've added four new H.264-specific components, a compressor, a decompressor, a packetizer and a reassembler, and made a whole lot of changes inside the infrastructure to support these components. With Tiger, applications will be able to play back H.264 content; they can also play back H.264 streams, and if they use the high-level movie toolbox APIs they'll be able to do this without any changes in their apps. In addition to QuickTime movie files, we'll be able to store these H.264 streams in MP4 files and 3GPP files. And the streaming realm will fully support H.264, in that you can play back H.264 streams, you can take H.264 content, hint it, put it on the QuickTime Streaming Server and stream it to clients, and you can broadcast using QuickTime Broadcaster. These H.264 streams are in the standard format as defined by the IETF. On the authoring front, applications will be able to edit H.264 movies, and if they call the high-level movie toolbox APIs they'll be able to cut, copy and paste with no changes. They'll be able to produce H.264 content and store it in QuickTime files, and also MP4 and 3GPP
files. All right, so if you want to compress H.264 content, you can do it using the movie exporter components, and we've modified them so that they can generate H.264 B-frame content. Now, if you call Standard Compression and the ICM APIs yourself instead of the movie exporter APIs, then H.264 will show up as a new item in the codec list; however, B-frames won't be enabled by default, and in order to get B-frame content you'll have to opt in to B-frames by calling new APIs that we'll describe in this session. If you call the sequence grabber APIs and you want H.264 B-frame content, you'll have to call Standard Compression and compress the frames yourself.

So what's in the seed? With the seed you'll be able to play back H.264 streams and edit them. We've been working really hard on the H.264 exporter, but we haven't fully integrated it into all our exporters yet. We really wanted to get you something in your hands, though, so we've included a preview H.264 exporter. It appears as a new menu item in the exporter list, and it does support multipass encoding. One thing to note is that it produces an interim format right now, and the format is guaranteed to change before GM, so don't produce any content that you want to stick around for a long time and be able to play back with the final version. Anyway, the APIs we talked about today are in the seed, so please try them out. And a couple of things: with H.264 there's a lot going on, so it requires a G4 or G5, and also the seed doesn't contain a compressor, packetizer or reassembler yet.
So say you want to take advantage of H.264 in your application: what do you have to do, and what do you have to change? Well, what you have to change depends on what level of APIs you're calling. If you're calling into the high-level QuickTime APIs, then chances are you don't have to do anything in order to gain support for H.264 in your app. However, if you call some of the lower-level APIs, such as the media level, you might or might not have to change your application, depending on what specific APIs you're calling. And if you access the sample-level APIs yourself, if you access the samples yourself, you'll have to change your app.

All right, so as I said, if you call the high-level APIs you'll be able to gain access to H.264 with no changes to your app. Some examples of the high-level APIs are the various views that QuickTime provides, such as the new QTMovieView, part of QTKit, the new HIMovieView, and the older carbon movie control. If you use those views, you're all set; you don't have to do anything in order to use H.264. If you call the movie and track level APIs, you'll still be able to play back, step through the movies, edit them, and navigate through the movies without any changes in your app. So let's have a look at that.
Okay, here I have an H.264 movie that I've compressed, and if I drop it on the currently shipping version of Adobe GoLive, it opens as expected and it plays the video. And just for fun I can bring up the timeline editor in GoLive, and if I click around in there, I can click around in the movie and step around in the movie.

I can also take this movie into QuickTime Player, select a portion of the movie, go over here, copy that small portion of the movie, and, in the currently shipping version of Word, I can create a new document, and if I go and paste, then that small section of the movie is pasted into the document, and Word will play it back. So use the high-level APIs if at all possible, because when you do that, with each new release of QuickTime you'll gain a lot of new functionality, and usually you won't have to make any changes to your app; it'll just magically work. Can we go back to the slides? Okay, so if you
can't just call the high-level APIs and you have to call some of the lower-level APIs, well, in order to use this B-frame content you might have to change your application. So if you're calling the media-level APIs, and you call APIs that don't reference time durations or sample flags, then those APIs haven't changed and you don't have to do anything. However, if you do make calls at that level that reference time durations or sample flags, they won't work with the new B-frame content, and you'll have to change your application. Obviously you can still use those APIs, and they will still work with content that doesn't contain B-frames, but once you start trying to use them with B-frames, those APIs will return errors, and we've added some new errors so you know that that's the cause of the problem. Instead of using those older APIs, we've added some new APIs for you to use. If you use sample references, then we've added a whole new set of QTSampleTable APIs, which I'll describe later, and for everything else we've added similar-looking APIs, which I'll also describe later. Okay, and one last thing before Sam comes up: I want to stress that these new APIs work for content that contains B-frames, but they also work for all the other content too, so please switch to them whenever you can. And here's Sam to talk about B-frames.
Thank you. Hi, I'm Sam. Let's talk for a moment about video compression technology. Lossy video codecs provide you with a trade-off between quality and bitrate: if you want more quality, you need to use more bits; if you can't use so many bits, you might have to accept a lower quality. We're constantly trying to improve this quality curve and move it towards higher quality at a lower bitrate, and we do this by adding more tricks. As Thomas said, many of these tricks are self-contained within the codecs, but some of them require awareness outside the codec, in other parts of the system, in other modules, and that's what we're going to talk about.

So suppose you had some video that you wanted to compress. Here's a clip of some guy parking a car; it's prosaic, but this is educational. We could encode each of these frames independently. If we did this, it's called spatial compression, because we're only compressing in the spatial domain. If every frame is self-contained, we call them keyframes, we call them sync samples, we call them I-frames (I stands for intra), and random access is fast, which is good. But the data rate isn't so good, because if we're compressing everything independently, we're not taking advantage of the similarities between frames. I've got frames four and five of the previous six on the screen here, and you can see that the tree and the building are practically the same, and the car has moved a little, but it's mostly the same. So we can improve compression performance substantially by using one frame as the basis for describing another frame, and the jargon for this in codec terminology is temporal prediction. The way it works is you start off by saying these are the areas of the new frame that are similar to areas of the old frame. For example, here we're describing frame 5 in terms of frame 4. First, in the yellow parts of the screen, we're saying these pixels are more or less the same as the pixels in the same location in frame 4; and then in the green part, we're saying these pixels are like the pixels in frame 4 if you just move over so many pixels to the right. But these are only first approximations; there's still a fix-up that has to be added, because the wheel is turning, and the reflection doesn't move with the car, it sort of appears to stay in place, and so you can see that there's an additional image that must be added as well. The first part is called motion compensation, and the fix-up is called the residue. You'll notice that there's a strip of the car that is in frame 5 that wasn't there in frame 4; that part might need to be encoded from scratch. So this is what we get if we encode the last five frames out of those six as a motion compensation piece and then a residue. We call these difference frames, or P-frames; P stands for predicted. We get better compression, because motion compensation can be described extremely compactly relative to describing something from scratch, and as a result the bitrate that we get is a whole lot better.
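The motion-compensation-plus-residue idea can be sketched in a few lines. This is purely illustrative (one-dimensional "frames", a single whole-frame motion vector), not anything from the QuickTime APIs, and all names are made up for this example:

```python
# Conceptual sketch of temporal prediction: a P-frame is stored as a motion
# vector plus a residue against the previous frame.

def shift(frame, dx, fill=0):
    """Shift a 1-D 'frame' right by dx pixels (motion compensation)."""
    if dx == 0:
        return list(frame)
    return [fill] * dx + frame[:-dx]

def encode_p_frame(prev, cur, dx):
    """Encode cur as (motion vector, residue) relative to prev."""
    prediction = shift(prev, dx)
    residue = [c - p for c, p in zip(cur, prediction)]
    return dx, residue

def decode_p_frame(prev, dx, residue):
    """Reconstruct the frame: motion-compensated prediction plus residue."""
    prediction = shift(prev, dx)
    return [p + r for p, r in zip(prediction, residue)]

frame4 = [10, 10, 50, 60, 50, 10]   # background plus a 'car' at pixels 2-4
frame5 = [10, 10, 10, 50, 60, 50]   # the car moved one pixel right
mv, residue = encode_p_frame(frame4, frame5, dx=1)
# Most of the residue is zero, which is what makes P-frames cheap to store.
assert decode_p_frame(frame4, mv, residue) == frame5
```

The point of the sketch is that the residue is mostly zeros when motion compensation predicts well, which is exactly why difference frames cost so few bits.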
There's something else that's worth paying attention to here, which is that each of these encoded frames can only be interpreted with reference to the previous one, which means each frame in a way depends on the previous one. If you want to decode and display the last frame in this sequence and you haven't decoded the previous frames, well, you'd better go and do that right away. So random access into a sequence like this can be somewhat expensive. When we have I-frames and P-frames, or keyframes and difference frames, this is what it's like; we call it IP, for I-frames and P-frames. It gives you much better compression than I-frames only, but random access can be somewhat slower. For example, if the keyframe rate is 20 frames, you might have to decode 20 frames before you can display the one that you want to see.
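That random-access cost can be sketched directly: with IP-only coding you have to walk back to the most recent keyframe. A toy illustration, not QuickTime code:

```python
# Sketch: random access to frame n in an IP stream means decoding everything
# from the most recent keyframe up to n.

def frames_to_decode(frame_types, target):
    """frame_types: list like ['I','P','P',...]. Returns how many frames must
    be decoded to display frame `target` starting from nothing."""
    start = target
    while frame_types[start] != 'I':
        start -= 1
    return target - start + 1

gop = ['I', 'P', 'P', 'P', 'P'] * 4      # a keyframe every 5 frames
assert frames_to_decode(gop, 0) == 1     # a keyframe decodes on its own
assert frames_to_decode(gop, 9) == 5     # worst case: a whole group
```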
Another thing to pay attention to is that gradually appearing images are constructed incrementally, like the car in this clip: the image of the car that you see in frame 6 was constructed out of strips in five different frames. This might not be the most efficient way of doing things, so let's introduce an alternative. What if we encode the first frame in that sequence as an I-frame, self-contained, and then go all the way to the end and encode frame 6 as a P-frame based on that I-frame? Well, if we do that first, then we can encode all the frames in between using motion compensation, part from the previous frame, which is the yellow piece, and part from the later frame, which is the blue piece, and you can see that these frames are almost entirely motion compensation, with very little residue to encode. Here's what it looks like if we encode our six frames with all of the four frames in the middle encoded as B-frames, which stands for bi-directional prediction, based on the frames at the ends. Again, these four frames in the middle are almost entirely motion compensation, and another thing to notice about them is that random access can be a bit faster: for any of the frames in the middle, starting from scratch, you only need to decode three frames, the one at the beginning, the one at the end, and the one in the middle. So these are B-frames: they refer to information in a future frame as well as to information from a previous frame. The good news about B-frames is that they let us enhance the compression and improve the quality-at-a-given-bitrate curve even further. But there's more; there are two benefits. You get better compression, especially when objects appear gradually, for the reasons that we've described, and also random access is faster, as I illustrated: accessing any of those frames, the worst case for random access is having to decode three frames. Another example to think about is if you're playing in fast-forward: you could skip the frames you didn't need to display, and if they were B-frames, well, you wouldn't have to decode them at all. The jargon for this is temporal scalability.
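Temporal scalability can be sketched as a filter over per-frame droppable flags. The frame names and flags below are invented for illustration, not derived from a real stream:

```python
# Sketch of temporal scalability: droppable B-frames can simply be skipped
# during fast-forward, since nothing else depends on them.

def fast_forward_decodes(frames):
    """frames: list of (name, droppable). Returns the frames a player must
    still decode when it is skipping droppable frames."""
    return [name for name, droppable in frames if not droppable]

clip = [('I0', False), ('B1', True), ('B2', True),
        ('P3', False), ('B4', True), ('P5', False)]
assert fast_forward_decodes(clip) == ['I0', 'P3', 'P5']
```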
But there's something tricky about B-frames. The decoder that's displaying these can only use motion compensation from frames it's already decoded, and if one of those frames is going to be displayed later, then that means the order in which frames are decoded and the order in which frames are displayed is different. So the frames have to be reordered somewhere, and this reordering is why your application might need to understand B-frames. Now, some of you have been working with IPB codecs for some time, and this is no news to you, but I want to speak to you for a moment, because there's an important point I want to drive home. With some other IPB codecs you can implement playback using a small finite state machine, which is driven with different transitions for I-frames, P-frames and B-frames. This works for MPEG-2 because only one frame can be held at a time; there's only one future frame that would ever need to have been decoded but not displayed. And this is not true for H.264.
The H.264 standard allows up to 16 future frames to be held. In fact, H.264 allows the encoder an enormous new amount of flexibility in how it chooses to find material for motion compensation. P and B frames can depend on up to 16 frames. Not all I-frames reset the decoder completely; we have a new tag for those that do, named in H.264 as IDR frames, which stands for instantaneous decoder refresh. Some B-frames can be used to provide material for motion compensation, so not all B-frames can be skipped, and some I-frames and P-frames can be skipped, because they don't provide material for motion compensation. So on the left you can see that the pattern for MPEG-2 is fairly regular, and in fact you can entirely derive the dependency graph of the frames just from knowing the frame letters; that's how the finite state machine works, everything can be worked out from the frame letters. But with H.264 the encoder is free to do things in a much wilder way, and just knowing those frame letters doesn't let you derive the graph. In fact, as you can see, I don't know that you'd really want to try and store that graph unless you were the decoder itself.
So the new rules: if you want to work with H.264, it's no longer sufficient to use the frame type letters to derive frame dependency information and the dependency graph. Instead you should pay attention to four things. First: is a frame a synchronization sample? Not all I-frames are sync samples, and this is because decoding an I-frame may not prime the decoder with all of the motion compensation material that it'll need for the P and B frames that follow it. So instead you want to pay attention to whether a frame is a sync sample, which in the new world is equivalent to an IDR frame. Number two: is a frame droppable? Some B-frames are not droppable, and some I and P frames are, and if you're outside of the codec, that's the information you really want to know; you want to know whether you need to decode that frame. Number three: what order should the frames be decoded in? And sometimes it's also sensible to include information about what time each frame should be decoded at. Number four: what time should each frame be displayed at? This is how we know how the frames are reordered.

So, to summarize: one, dependencies between frames are getting weirder, but it's all in the cause of improving the quality-versus-bitrate trade-off. Two, IPB means someone needs to know about frame reordering, and if you work with the compressed media, it could be you. And three, some of the convenient rules, things like the one-frame delay and the ability to build this little finite state machine, although they're okay for MPEG-2, they don't hold for H.264.
[Applause]
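The four per-frame facts Sam listed can be sketched as a small record carried outside the codec, instead of being guessed from frame-type letters. Everything here, the class name, the fields, the sample values, is invented for illustration:

```python
# Sketch of the four per-frame facts needed outside the codec.

class FrameInfo:
    def __init__(self, decode_order, display_time, sync, droppable):
        self.decode_order = decode_order   # 3: what order to decode in
        self.display_time = display_time   # 4: when to display
        self.sync = sync                   # 1: true sync sample (IDR), not just any I-frame
        self.droppable = droppable         # 2: nothing depends on this frame

def random_access_start(frames, target_index):
    """Random access: back up to the nearest sync sample at or before target."""
    i = target_index
    while not frames[i].sync:
        i -= 1
    return i

stream = [FrameInfo(0, 0, True, False),
          FrameInfo(1, 30, False, False),  # an I-frame need not be a sync sample
          FrameInfo(2, 10, False, True),
          FrameInfo(3, 20, False, True),
          FrameInfo(4, 40, True, False)]
assert random_access_start(stream, 3) == 0
assert random_access_start(stream, 4) == 4
```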
So what changes do we have to make in the movie toolbox in order to support B-frames? Well, first we had to change the file format. For those of you who care and parse the files yourself: we've added four new tables in the QuickTime files when there are B-frames. One other thing to note is that samples are stored in the files in decode order. Now, they've actually always been stored in the files in decode order, but decode order and display order were always the same before, so you couldn't tell the difference. We've added a bunch of new APIs to distinguish between decode time and display time, because, as Sam explained, with B-frame content they're not necessarily the same anymore. Some of those APIs take something called a display offset, and a display offset is simply the difference between the decode time and the display time; just note that sometimes the display offset is a negative number.
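Concretely, a display offset is just display time minus decode time, and B-frames that are displayed before the anchor frame they follow in decode order get negative offsets. The numbers below are made up for illustration:

```python
# Sketch: display offset = display time - decode time.

def display_offsets(decode_times, display_times):
    return [disp - dec for dec, disp in zip(decode_times, display_times)]

# Decode order: I0, P3, B1, B2. The P-frame is decoded early so the
# B-frames between the anchors can refer to it.
decode_times  = [0, 10, 20, 30]
display_times = [0, 30, 10, 20]
assert display_offsets(decode_times, display_times) == [0, 20, -10, -10]
```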
so for example where where you used to
call sample num2 media time if you're
processing be frame content that call is
going to return an error so instead of
calling that you should call either
sample num2 media display time or sample
num2 media decode time and which one you
call depends on which time it is that
you want we've added a whole bunch of
new sample flags and increase the size
from 16 bits to 32 bits most of them are
optional but the main one that you need
to know about is media sample droppable
which usually but not always indicate
the bee frame if you need to know
whether a movie or track contains be
framed content don't hardcode in you
know if track is encoded with h.264
because not all h.264 movies necessarily
contain d frames and we might add new
codex in the future that use B frames so
instead call media contains display
offsets
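That advice can be reduced to one line of logic: branch on a capability of the media, never on the codec name. The dictionary below is a stand-in for illustration, not the real QuickTime Media type:

```python
# Sketch: decide whether reordering handling is needed from the media's own
# answer, not from the codec identifier.

def needs_reordering(media):
    # Wrong approach: media['codec'] == 'avc1'. H.264 movies may contain no
    # B-frames, and future codecs may introduce them.
    return media['contains_display_offsets']

assert needs_reordering({'codec': 'avc1', 'contains_display_offsets': False}) is False
assert needs_reordering({'codec': 'future', 'contains_display_offsets': True}) is True
```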
Okay, so if you're using sample references, we've added a whole new set of APIs, the QTSampleTable APIs, for you to work with. QT sample tables represent media sample references in a movie. They're reference counted; you use retain and release, similar to other Apple APIs, and you can use these APIs for all media types, audio, video, text, etc., not just video and B-frames. In order to get sample references out of a movie, call CopyMediaMutableSampleTable to get the sample table. From that you can get the number of samples in there, and then you can index through the samples and get information about each sample, such as data offset, sample flags, a whole bunch of things. These samples are in decode order, the same as they're stored in the files. In order to add sample references to a movie, call QTSampleTableCreateMutable to create an empty sample table, then add your sample references to the sample table, similar to the old AddMediaSampleReference call, and when you're done, call AddSampleTableToMedia to actually add the sample references to the movie.
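The shape of that flow, a table built in decode order with a display offset per sample, can be modeled in a few lines. This is a conceptual stand-in, not the real QTSampleTable C API; the class and its methods are invented for illustration:

```python
# Conceptual model of a sample table: references kept in decode order, with
# display order recoverable from the display offsets.

class SampleTable:
    def __init__(self):
        self.samples = []                 # kept in decode order, as in the file

    def add_sample_reference(self, data_offset, size, display_offset, flags=0):
        self.samples.append({'data_offset': data_offset, 'size': size,
                             'display_offset': display_offset, 'flags': flags})

    def display_order(self):
        """Decode time is just the index here; display = index + offset."""
        return sorted(range(len(self.samples)),
                      key=lambda i: i + self.samples[i]['display_offset'])

table = SampleTable()
table.add_sample_reference(0,   100, 0)    # I
table.add_sample_reference(100,  40, 2)    # P, displayed after the two Bs
table.add_sample_reference(140,  10, -1)   # B
table.add_sample_reference(150,  10, -1)   # B
assert table.display_order() == [0, 2, 3, 1]
```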
We've also included a whole bunch of more advanced sample table APIs, so if these aren't quite what you need, you can have a look in the headers and the documentation to see if what we provided helps. And here's Sam to talk about changes in the ICM.
I'm still Sam. So, as we explained, if you call the high-level APIs you might not need to worry about B-frames, because they might not make a difference for you. But if you're the kind of application that deals with compressed frame data yourself, or if you write a codec, then there's no hiding this information from you, and you wouldn't even want to anyway. So let's talk about how the APIs at this layer have been changed to support B-frame codecs. There are three things that were missing from the current APIs in order to support B-frames: frame reordering, new frame information like the droppable flag, and multiple buffers in flight at once.

The image compression manager provides APIs both for compression and decompression; it provides high-level client APIs and also defines the interface to the decompressor and compressor components underneath. Let's go through each of these in turn. In Tiger we're introducing a new multi-buffer API for compression; we're also extending the existing GWorld-based decompression API to support B-frames, and we're introducing a new multi-buffer decompression session API. These new multi-buffer APIs are based on Core Video pixel buffers. Underneath, we're introducing a new multi-buffer API for compressor components, and we have extended the decompressor component API to support B-frames. So if you write code that works at any of these levels, you'll want to look at this stuff.
Let's start here with the new compression API. The existing GWorld-based API is one frame in, one frame out. What this means is that the compressor has to give you back the compressed data for frame 1 before it'll get the image data for frame 2; this makes it really difficult to reorder frames. Also, the current compression API is almost completely unaware of time. So the new compression session API is based on Core Video pixel buffers instead of GWorlds. If you're using a new-style compressor component, a B-frame-aware compressor, then multiple buffers may be in flight at once. This allows the compressor to reorder the frames and encode B-frames; it also allows the compressor to have a look-ahead window for better rate control. Timestamps can flow all the way through the compression chain, and the new API supports multipass encoding. Whereas in the previous GWorld-based compression API you'd draw each frame into the same buffer and then each time pass that buffer off to the ICM, with the new API you take a fresh Core Video pixel buffer each time, put your source frame in it, and then pass that over to the compression session. It will retain those buffers until it's done with them, and then it will release them, so you can release a buffer as soon as you've passed it to the compression session.
So this uses standard retain and release semantics, and you could just allocate these buffers each time if you wanted, but mapping and unmapping the large pieces of virtual memory that you use for pixel buffers can involve some memory management overhead, and that can be somewhat expensive. So we have a pixel buffer pool, part of Core Video, that does efficient recycling.
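The reason a pool helps can be shown with a toy model: recycle freed buffers instead of allocating fresh memory per frame. This models only the idea, not the Core Video pixel buffer pool API, and all names are invented:

```python
# Sketch of buffer pooling: reuse compatible freed buffers, allocate only on
# a pool miss.

class BufferPool:
    def __init__(self):
        self.free = []
        self.allocations = 0

    def create_buffer(self, size):
        for buf in self.free:
            if len(buf) == size:          # reuse a compatible buffer
                self.free.remove(buf)
                return buf
        self.allocations += 1             # only allocate on a pool miss
        return bytearray(size)

    def release(self, buf):
        self.free.append(buf)

pool = BufferPool()
a = pool.create_buffer(16)
pool.release(a)
b = pool.create_buffer(16)                # recycled, no new allocation
assert b is a and pool.allocations == 1
```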
this is how reordering happens you push
source code video pixel buffers in
display order and the session will call
a callback function that you provide
with the frames that have been encoded
in decode order the session will also
call you when it's releasing those pixel
buffers so you can perform your own
frame buffer recycling if you want
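That push-in-display-order, callback-in-decode-order flow can be modeled with a toy session. This is not the real ICM API: the class, the fixed group size, and the I/P/B decision (every group of frames becomes anchors plus B-frames in between) are all invented purely for illustration:

```python
# Toy model of a compression session: frames go in in display order, and a
# caller-supplied callback receives them in decode order.

class CompressionSession:
    def __init__(self, encoded_callback, group_size=4):
        self.cb = encoded_callback
        self.group = []
        self.group_size = group_size

    def encode_frame(self, frame):
        self.group.append(frame)
        if len(self.group) == self.group_size:
            self._emit()

    def complete_frames(self):
        """Explicit request to finish whatever the encoder is holding."""
        if self.group:
            self._emit()

    def _emit(self):
        g, self.group = self.group, []
        # Decode order: first frame, then the last (the forward anchor),
        # then the bidirectional frames in between.
        order = g if len(g) < 3 else [g[0], g[-1]] + g[1:-1]
        for frame in order:
            self.cb(frame)

out = []
session = CompressionSession(out.append)
for f in ['f0', 'f1', 'f2', 'f3', 'f4']:
    session.encode_frame(f)
session.complete_frames()
assert out == ['f0', 'f3', 'f1', 'f2', 'f4']
```

Note that `complete_frames` here plays the role of the explicit "finish what you're holding" request described below, which matters for latency-bound uses.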
Now, in some cases you might not want the compressor to hang on to too many frames at a time. Perhaps you're a network application, like a video conferencing application, and there's a maximum latency before you have to send those frames out over the network. Well, in those cases you can set a maximum number of frames that the compressor is allowed to hang on to at once, and you can also make an explicit request that forces the compressor to finish encoding the frames that it's currently hanging on to. The
new compression session API has a bunch of other features that make it a big jump forward. It's easier than before to add encoded frames to a movie. You can use a fixed or flexible GOP pattern (if you don't know what a GOP pattern is, it's not politics, don't worry). You can set a CPU time budget, you can set data rate limits, and, as I said before, it supports multipass encoding; in fact, the movie exporter that's in the seed supports multipass encoding as well. In the final version of Tiger, the compression session API will be compatible with existing compressors, but no B-frames will be generated. However, in the Tiger seed that you've got, it's not yet compatible with existing compressors, and also we don't have an H.264 compressor, so it would be a good idea to try and get on our seed program if you want to exercise this API. So what's next? Let's talk about what's underneath the compression session API, which is the new compressor component interface. New-style compressor components still use the four-character code 'imco', but they support three new component calls for B-frames, and if you want to opt in to multipass support, there are three more APIs to implement as well. Let's also talk for a moment about decompression.
There we go. So there are two flavors of decompression API. In the GWorld-based mode we have synchronous APIs, and these are all one frame in, one frame out. We also have a second mode for decompression, called scheduled decompression, where you can queue multiple frames, each with a frame time; when that time arrives the frame is triggered and we decode it and display it. With B-frames, as we've gone through, the decode and the display order can be different — in fact you may need to decode several frames before you come to the first frame to display — so immediate, one-frame-in-one-frame-out APIs aren't a good match. Look again at that example of the little clip of the car parking. In decode order, the first frame happens to be the first frame both to decode and to display, so we decode it and then we display it. But the second frame in decode order doesn't need to be displayed until time 60; it does need to be decoded before the next frame in decode order, which is the frame at time 20, and before the frames after that, at times 30, 40 and 50. After all of those, it's okay for that frame to be displayed at time 60. So
the new model for doing decompression is that you always queue frames in decode order, and then you provide the display timestamps so that we know how the frames should be reordered. As before, frames can be scheduled against a time base, in which case the frames will automatically be output according to that time base when the trigger time happens. But we have a new mode called non-scheduled display times, in which case there's no time base and you have to make an explicit call to the ICM to say, "I would like this frame back." You can also optionally supply decode timestamps, which are a hint saying this is when it would be a good time to decode that frame. Many of you will have loops in your code where you decode some frames by calling the ICM, and generally the pattern has been: you go through all the frames in order (there was only one order before), you read the frame into a buffer, you call the ICM to decode the frame and decompress it immediately, and then you use that output frame somehow. Well, as I've been saying, immediate mode — one frame in, one frame out — is very awkward for B-frames, at least if you want to get the frames out in display order, which is what makes sense to the user. So we need to enhance this a bit. Here we have an outer loop and an inner loop: the outer loop queues frames in decode order, and the inner loop retrieves frames in display order. So frames go in in decode order and you pull them out in display order, and there may not be a one-to-one correspondence — that's why we have the outer loop and the inner loop. The inner loop isn't necessarily run once per pass; I'll show you in a second. One other thing: because
you're queuing multiple frames, you need to load them into multiple buffers. These aren't Core Video pixel buffers, these are just data buffers, and the ICM will call you back to say when it's time to release them, because the codec no longer needs them. You can do this both with the existing GWorld API, and with a new multi-buffer, Core Video pixel buffer based API that we've introduced, called decompression sessions. Now, the decompression session API does not support any drawing operations: it doesn't do clipping, it doesn't do matrix transformations, it doesn't do transfer modes or any of that other GWorld stuff. Instead it just gives you the buffers in the format you want. There's a flavor of this API that supports sending buffers directly to a visual context, and there's one that just gives you the buffers back. So let's
do a demo. Okay, so I have a little clip here — I like the Harry Potter trailer, and I've cut out a little bit of it in the middle. It's not very long, but it plays in display order: see, everything's moving forward, the bus is moving on down the road. QuickTime Player has not been revised to understand B-frames, so as I said, if you're using the high-level APIs you don't need to change anything. What's more, I'm going to step through this by pressing the arrow key, and you'll notice that the frame is moving forward. What's happening here is that we are telling the movie to move to new movie times and rendering the frame for each of those times. If you do that in your application — if you step through with something like SetMovieTime and MoviesTask appropriately — then you don't need to worry; you'll get the frames out in display order and everyone will be happy.
Great. But this movie does have frames that are being reordered. I have a copy of Dumpster here — it's great when we get to demo Dumpster; many of you will know what Dumpster is. Dumpster is a tool that shows you the internal structure of movie files — actually of the movie header, not the place where the compressed media data is, but the movie header itself — and we've modified this version of Dumpster to show you the new B-frame tables that Anne was mentioning. You probably can't see the detail, so I'll just pop them open, and you can take my word that they're there. There's a piece of information here that tells you the size of individual frames; these are numbers varying between 40 and 80 kilobytes. It also stores the timing information: each of these frames has a duration of 1,000, and this is from film, so the speed is more or less 24 frames per second — the time scale here is something around 24,000, and each frame has the same duration, 1,000. We also now store — this is a new table for Tiger — the display offsets, and you probably can't see them, but the first is 0, the next one's 1,000, then minus 1,000, then 1,000 and minus 1,000. What does this mean? Well, think about the durations, which are now interpreted as decode durations: the frames are at decode times 0, 1,000, 2,000, 3,000, and the display offsets are added to those to find the display times, so we'll see 0, 2,000, 1,000 — the second and third frames are reordered, exchanged in pairs, and the same for the next ones. So here's the difference between the decode order and the display order: the decode order is 1, 2, 3, 4, 5 and so forth; the display order is 1, 3, 2, 5, 4, 7, 6 and so on. Okay, how about keyframes? There's also information here that shows you the sync samples, or keyframes; there's one of them — it's the first frame. So this is a new version of Dumpster that shows you the new tables, and this version of Dumpster, I believe, is in the disk image for this session, if you want to use it with movies that you've encoded yourself.
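The arithmetic behind those two Dumpster tables — running decode times built from the decode durations, plus the per-frame display offsets — can be checked with a few lines of C. The values mirror the example in the talk; the function name is mine:

```c
/* Decode times are running sums of the decode durations; adding each
   frame's display offset yields its display time. */
enum { FRAMES = 5 };

static void display_times(const long *durations, const long *offsets,
                          long *out) {
    long decode_time = 0;
    for (int i = 0; i < FRAMES; i++) {
        out[i] = decode_time + offsets[i]; /* display = decode + offset */
        decode_time += durations[i];       /* advance by decode duration */
    }
}
```

With durations of 1,000 and offsets 0, +1,000, -1,000, +1,000, -1,000, this produces display times 0, 2,000, 1,000, 4,000, 3,000 — the exchanged-in-pairs pattern from the demo.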
So, on to some code. This is a little command-line tool, which we've included in the disk image for this session. It steps through a movie and does that new kind of loop I was describing: it calls the new decompression session API to decode each frame into a Core Video pixel buffer. It takes command-line arguments — I've made it scale the frame down, since at full size there's not a lot of space left for the debugger — and it also takes a command-line path to the movie file.
So let's try it out. I'm pausing in between the frames so that you can see them, so they don't race past. Wasn't there something odd? Did you see that? I'll play it again. It's Harry Potter — but where did he come from? He wasn't in this movie in QuickTime Player. What's going on? Well, this little tool is going down to the media layer of the movie and accessing the sample table directly, but when you do that you're circumventing the higher layers. At the track layer there's a thing called the edit list, and when I cut that little piece out and copied and pasted it out of the longer movie, I had edited out the frame of Harry Potter — but it's still inside that sequence of frames at the media layer, so we're going around the edit list. If we wanted the tool to see the movie in the same way that it appears to someone who's using it, we'd need a bit more code that would walk through the edit list; an easier way would be to do what I did with the arrow keys, stepping through the movie. So briefly, let's have a look at this in the debugger.
So, this little tool. We're doing some things that aren't very special to B-frame movies, and I'll skip over those: we're opening the movie, we're getting the video track, we're getting the image description, and then we're making a window that's the scaled size of that image description — there's the window in the background there, nothing big. And then we're making a decompression session; I want to show you this. Yep — local variables, love them. So we're creating a pixel buffer attributes dictionary; this is how we describe the kinds of pixel buffers we'd like to get back from the decompression session. We give it the width and height that we'd like it to give us buffers in, because we want them to be a specific width and a specific height, and we ask for one specific pixel format. We could also say, "Here's a list of pixel formats — pick us the best out of these." We also pass a callback when we're creating the decompression session. This callback is called with the decoded pixel buffers — there's no GWorld here; the callback is called with a fresh buffer each time, and when you release them they can go back into the pool that the ICM is using to recycle pixel buffers. A tracking callback is also called when the data buffers can be recycled. Okay, so now we've got a
decompression session, but nothing's been decoded into it. Now we've got to have a look at the media and pull out those frames. We get the number of samples in the media and we allocate some storage for each of them. Now here's a first interesting question: what's the first frame that we want to display? Well, we're starting at time 0 in display time, but that might not be the same as the first frame, so we need to go and do a mapping, and this is calling a new API in Tiger. Previously we'd call MediaTimeToSampleNum to get this information, but now we need to specify which kind of time we want to talk about, so it's MediaDisplayTimeToSampleNum. And I think I have debug expressions — aha, my little expressions. Some of these variables are uninitialized, so you can't see this, but I'm telling you that the number in red is one. So that's the next sample number we want to display; it's also the first sample that we're going to decode, just as we saw in Dumpster. Okay, so we're approaching the
outer loop here. The outer loop queues frames in decode order — that's just ascending sample numbers — so we use ++ to get to the next one. We translate that sample number to a decode time so that we can get the size of that sample, we allocate some storage, and we load the frame into that storage. Along the way we've found out what the decode time and display offset are; we add those to find the display time for that frame, and now we have enough information to queue that frame with the decompression session. We hand it to the ICM, and nothing happens yet — nothing happens because we're using this new mode where the display times you pass are non-scheduled: there's no time base, and frames are not going to come out until we ask for them back. So the next question is: have we queued the frame that we need to get out? We just compare these two numbers — we have queued sample one and we want to get back sample one — so yes, let's go into the inner loop, where we retrieve samples in display order. We do that by saying, "Here's the non-scheduled display time I would like to get the frame back for," and there's a frame. All right, next we want to know the next time that we want to display, so we call GetMediaNextInterestingDisplayTime — you might be familiar with GetMediaNextInterestingTime; now we have to say which kind of time we want — and then we translate that time to a sample number. We go from time 0 to time 1,000, but that's frame 3. So let's go around this loop again — I've got a breakpoint here — and the next thing that we queue is frame 2. Now we've queued frame 2, but we want frame 3, so we have to go around again: we'll skip over the inner loop and come back to the outer loop, where we'll queue frame 3, and then we'll ask for frame 3 back, which is in the queue now. And here we go — the bus moves a bit down the road. Then we ask for the next time, and that's 2,000, which is frame 2. So remember this order: we're going through the frames in display order, thanks to GetMediaNextInterestingDisplayTime and so forth — we're going through these frames in the order 1, 3, 2, 5, 4, 7, 6 — so as we go through, we're going to get back frame 2 now. Okay, big deal. We carry on in this pattern: because the frames are exchanged in pairs, we'll queue two frames and then retrieve two frames, queue two and retrieve two, and it carries on like this. I'll clear that breakpoint and just continue, and in fact here's us going through the rest of the frames — and here's Harry Potter. Great.
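The outer/inner loop pattern we just stepped through can be simulated in plain C: queue frames in decode order, then retrieve whichever queued frame matches the next wanted display time. This is a toy stand-in for the decompression session calls; all names are invented:

```c
/* Toy model of the two-loop pattern: the outer loop queues frames in
   decode order; the inner loop retrieves them in display order. */
#define N 7

typedef struct { int frame; long display_time; int ready; } Pending;

/* Retrieve the queued frame with the given display time, if any. */
static int retrieve(Pending *q, int n, long want, int *out) {
    for (int i = 0; i < n; i++) {
        if (q[i].ready && q[i].display_time == want) {
            q[i].ready = 0;
            *out = q[i].frame;
            return 1;
        }
    }
    return 0;
}

/* Runs the outer/inner loops; writes frame numbers in display order. */
static void run(const long *display_times, int n, int *display_order) {
    Pending q[N];
    long want = 0;  /* next display time we want back */
    int got = 0;    /* frames output so far */
    int f;
    for (int i = 0; i < n; i++) {   /* outer: queue in decode order */
        q[i].frame = i + 1;
        q[i].display_time = display_times[i];
        q[i].ready = 1;
        /* inner: pull out every frame that is now ready, in display
           order; may run zero, one, or several times per pass */
        while (got < n && retrieve(q, i + 1, want, &f)) {
            display_order[got++] = f;
            want += 1000;           /* each frame lasts 1000 units */
        }
    }
}
```

With the exchanged-in-pairs display times from the Harry Potter example, the decode-order input 1, 2, 3, … comes back out as 1, 3, 2, 5, 4, 7, 6 — and, as in the demo, the inner loop sometimes retrieves nothing and sometimes retrieves two frames at once.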
Okay, so one more thing — it's not like a Steve "one more thing," I'm sorry. We've enhanced the decompression component interface to support B-frames. This is all based on the base decompressor; we introduced the base decompressor six years ago, in QuickTime 3, and it's been helping you write video codecs ever since. The base codec helps by implementing the queue that holds those scheduled frames, and now it'll also help you with the frame reordering for B-frames. Here are the new rules for B-frame-aware decompressors, if you want to write one yourself. First, you opt in, in your Initialize function, by setting a flag. Second, you classify frames in your BeginBand function, which means you say whether the frame is a keyframe, a difference frame, or a droppable difference frame; it's always been a good idea to do this, but for B-frames it's mandatory. Third, we've split the work in the DrawBand function: this used to be where both decode and display happened, but we've separated those — decode happens in the DecodeBand function and display happens in the DrawBand function — and only DecodeBand will get called if we just need to decode the frame in order to prime the state inside your decoder. A final note — and this is not just for B-frame codecs but for any codec that wants to work in the new player: don't cache the pixmap base address in BeginBand. It might change between BeginBand and DrawBand, and then you'd draw in the wrong place. So these are the new APIs we've
provided: Image Compression Manager support for multiple buffers and for B-frames. There's a video track movie exporter in the seed; I encourage you to try it, and to try out the new APIs that exercise it, because the new APIs will work with the B-frame content and you'll see how that works. Also exercise your application and examine whether it needs revising in order to cope with B-frame content. And let me give you one small warning about one bug, because it's likely to bite you, and I'm a kind guy. The bug is this: the video track movie exporter only creates video tracks — it doesn't copy the soundtrack — so it's likely that you're going to want to extract the soundtrack and paste it into the new movie with the Add command so they're next to each other. If you do that, save the movie, but don't save it self-contained. Save-as-self-contained is now the default in the new QuickTime Player, so carefully switch it back to save a reference movie, or what used to be called "save normally." The reason for this is that when you save a movie self-contained, we do a thing called interleaved flattening — we interleave half-second chunks of audio and video — and the code that does this has a bug: it doesn't do the right thing for H.264, and the movies won't play properly. They're not going to cause any harm; they just won't play right. So avoid saving those movies
self-contained. So, more information: both on the CD, the DVD, and on the net at connect.apple.com, you can download a bunch of documentation for this. There's a "What's New in QuickTime 6.6" document inside the big 60-megabyte Tiger documentation — I'll download it here while Apple's paying for the bandwidth. Also download the disk image for this session: you can go to connect.apple.com, log in, click Download Software, and see what's new — it'll be there under the developer conference material. The API reference won't be updated until Tiger is final. And one more thing — another dull "one more thing" — just around the back here in the hands-on labs, in the QuickTime graphics and media lab, we have special extended hands-on lab time so that you can talk to me and other folks about IPB and about visual contexts, and that's starting more or less right after this session.
And that will go on till six-thirty; they'll be tearing down everything else, but they're going to let us stay in the room so that we can help you, so come along. Also, seeding: you've got the Tiger seed, and it's possible that we'll do further QuickTime seeds. If you would like to be involved in such seeds, please send an email with your name, company, product, and technology interest to quicktime-seed at apple.com. I have some reminder cards that I can give you if you come and see me after this, or you can contact your friendly evangelist.