WWDC2013 Session 612

Transcript

[ Silence ]
>> Hello and welcome
to the Advanced Editing
with AV Foundation session.
My name's Scott Johnston,
and I'm on the AV
Foundation team.
[ Applause ]
So today we have two
main topics for you.
Introducing custom video
compositing, and secondly,
a really useful debugging
technique
for debugging compositions.
So let's start with
custom video compositing.
First of all, we're going
to take a few moments just
to review the existing AV
Foundation architecture.
AV Foundation has been
available for a few years now,
ever since iOS 4 and Lion.
It's used in a wide variety
of video editing applications
in the App Store,
including those from Apple.
It provides the core,
fundamental tools
to create cuts-only
temporal compositions,
video mixing compositions,
and of course audio mixing.
In Mountain Lion and iOS
6, it's quite possible
to create a wide variety
of visual effects.
Dissolves, scaling,
transformation effects,
layering, picture-in-picture
effects, wipes, and slides.
And now, new for
Mavericks and iOS 7,
we'd like to give you guys
the opportunity to place your
code inside the video
composition render pipeline,
giving you direct access
to the pixels on the CPU
and the GPU to colorize,
warp, and texture map,
and achieve a much wider
variety of complex effects.
And we'll see more of the
sea otters later [laughter].
So, the custom video
compositor. But what
is a video compositor?
Well, it's basically a chunk
of video mixing code,
which takes zero or more
source frames, blending
or modifying the pixels to
deliver a single output frame.
We'll find the existing
built-in compositor inside the
composition architecture.
So looking inside
compositions as a whole,
we first see the AVComposition,
which is our temporal
cuts-only edit.
We place time ranges of
source assets in tracks.
Here we have two
tracks, A and B,
and in this example we're going
to be dissolving from track A
to B and back to A again.
In order to achieve
the dissolve,
we have the AVVideoComposition,
which stores the video mixing
parameters and time ranges.
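For reference, setting up those
two pieces in code looks roughly
like this; a minimal sketch, not
the session's code, where
sourceVideoTrack (an AVAssetTrack)
and clipDuration are hypothetical
placeholders:

    #import <AVFoundation/AVFoundation.h>

    // Build the cuts-only edit...
    AVMutableComposition *composition = [AVMutableComposition composition];
    AVMutableCompositionTrack *trackA =
        [composition addMutableTrackWithMediaType:AVMediaTypeVideo
                                 preferredTrackID:kCMPersistentTrackID_Invalid];
    NSError *error = nil;
    [trackA insertTimeRange:CMTimeRangeMake(kCMTimeZero, clipDuration)
                    ofTrack:sourceVideoTrack
                     atTime:kCMTimeZero
                      error:&error];

    // ...and the video composition that carries the mixing parameters.
    AVMutableVideoComposition *videoComposition =
        [AVMutableVideoComposition videoComposition];
    videoComposition.frameDuration = CMTimeMake(1, 30); // 30 fps
    videoComposition.renderSize = CGSizeMake(1280, 720);
    // The instruction time ranges must tile the edit with no gaps.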
So let's take a closer look at
the video composition itself.
Now, the time ranges in
the video composition are
covered by instruction objects,
which store the mixing
parameters.
But some instructions
are simpler than others.
For example, one may take a
single source track;
another can take more than one.
Looking at the first
simple segment there,
that first instruction is
effectively just passing
through frames from track A
through the compositor,
rendering the first segment.
Now the more interesting
segment is where the initial
dissolve happens,
where we take two frames
and use the parameters
from the instruction to tell
the compositor to blend them.
And with that, we render
the rest of the edit.
Focusing even more
on the compositor,
we have multiple source
frames and a single frame out.
Along with the source frames,
we have the instruction object
with the mixing parameters.
In this case, a dissolve,
which we encode as a property
in the instruction:
a ramp from one down to zero.
At this point, we're
at the compositor,
so let's introduce the new
custom compositing API.
The built-in compositor is
shown in green; in Mavericks
and iOS 7 you can now replace
it and do your own stuff.
In addition, you can implement
your own instruction to gather
up and carry your own
set of mixing parameters.
It's bundled up with the source
frames in the request object.
So you'll be implementing the
new protocol, AVVideoCompositing,
which receives the new object,
AVAsynchronousVideoCompositionRequest.
And, in addition, you'll
be implementing the
AVVideoCompositionInstruction
protocol.
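To opt in, you point the video
composition at your compositor
class. A minimal sketch, where
MyCustomCompositor and
myCustomInstruction are
hypothetical names:

    // Hypothetical wiring; MyCustomCompositor adopts AVVideoCompositing.
    videoComposition.customVideoCompositorClass = [MyCustomCompositor class];
    // Your own instruction objects adopt the
    // AVVideoCompositionInstruction protocol.
    videoComposition.instructions = @[ myCustomInstruction ];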
Now looking at the
methods involved here,
you'll receive your request
when you implement
startVideoCompositionRequest:,
and then deliver the
frame once you've rendered with
finishWithComposedVideoFrame:.
Now in the event of an error,
you can call finishWithError:,
or acknowledge a cancelled
request with
finishCancelledRequest.
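Put together, the compositor's
shape is roughly the following;
a minimal sketch, not the
session's code (the two
pixel-format properties the
protocol also requires are shown
a little further down):

    #import <AVFoundation/AVFoundation.h>

    @interface MyCustomCompositor : NSObject <AVVideoCompositing>
    @end

    @implementation MyCustomCompositor

    - (void)renderContextChanged:(AVVideoCompositionRenderContext *)newContext {
        // Called when the render size, scale, or pixel format changes.
    }

    - (void)startVideoCompositionRequest:(AVAsynchronousVideoCompositionRequest *)request {
        // Render, then finish exactly once:
        //   [request finishWithComposedVideoFrame:outputBuffer];
        // or, on failure or cancellation:
        //   [request finishWithError:error];
        //   [request finishCancelledRequest];
    }

    @end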
Okay, so what about the
format of the source frames?
How are the pixels arranged?
Well you've probably
looked in our header files
and seen a wide variety
of formats.
You'll probably only see a
small subset here, but generally
when we're decoding H.264 video,
you'll receive 8-bit 4:2:0 YUV.
But that could vary, and
your code may not be able
to render in YUV.
So you can express the exact
format that you require
by implementing
sourcePixelBufferAttributes.
Now here you can provide
an array of formats
that you would like to receive.
In this case, our example
requires 32-bit BGRA,
and by doing so the
system will, on the fly,
convert the source
frames into that format.
Similarly for the
output pixel format,
you implement
requiredPixelBufferAttributesForRenderContext,
and we're requiring
32-bit BGRA here.
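In code, those two requirements
might look like this; a minimal
sketch returning 32-bit BGRA:

    - (NSDictionary *)sourcePixelBufferAttributes {
        // The value is an array: list every format your renderer accepts.
        return @{ (NSString *)kCVPixelBufferPixelFormatTypeKey :
                      @[ @(kCVPixelFormatType_32BGRA) ] };
    }

    - (NSDictionary *)requiredPixelBufferAttributesForRenderContext {
        // The format of the output buffers vended to you.
        return @{ (NSString *)kCVPixelBufferPixelFormatTypeKey :
                      @[ @(kCVPixelFormatType_32BGRA) ] };
    }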
Now in order to get hold of a
new empty frame to render into,
we go back to the
request object,
ask it for the render context,
which contains information
about the aspect ratio and the
size that we're rendering to,
and also the required
pixel format.
So we ask it for a
new pixel buffer.
Now this comes from
a managed pool.
So once the frame is allocated,
we can render into it,
in this case our dissolve,
and send the output on its way.
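Inside startVideoCompositionRequest:,
that flow reads roughly like
this; a sketch assuming
hypothetical trackIDA/trackIDB
values and a renderDissolve
helper standing in for your
blending code:

    - (void)startVideoCompositionRequest:(AVAsynchronousVideoCompositionRequest *)request {
        // Fish the source frames out of the request by track ID.
        CVPixelBufferRef frameA = [request sourceFrameByTrackID:trackIDA];
        CVPixelBufferRef frameB = [request sourceFrameByTrackID:trackIDB];

        // Ask the render context for a fresh buffer from its managed pool.
        CVPixelBufferRef output = [request.renderContext newPixelBuffer];

        renderDissolve(frameA, frameB, output); // hypothetical helper

        [request finishWithComposedVideoFrame:output];
        CVBufferRelease(output); // newPixelBuffer returns a +1 reference
    }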
Now let's jump into Xcode
and we'll have a demo.
Okay, so, in Xcode, here we are
with the Hello World equivalent
of a custom compositor.
So, here we're implementing
our AVVideoCompositing protocol
and those three methods that
we called out in the slides.
So again, we're requiring
BGRA, and it's simply a case
of putting in that value.
Now remember, it's an array,
so if we do support YUV
in some flavor, we
can drop it in there.
Similarly for the output,
in this case it's identical.
Now the action happens
inside the rendering function,
startVideoCompositionRequest:,
receiving our request object.
Let's open this guy.
Okay, so, the first
thing we do here
(it's slightly hard-coded,
just for simplicity)
is fish out the source frames
from the passed-in
request object.
Now you'll get these as
Core Video pixel buffers,
and we'll try our hardest
to make sure they're
IOSurface-backed,
so they're going
to be on the GPU.
The next thing we do
is we allocate a fresh,
new pixel buffer to render into
so, as we saw in the slides,
we ask the request object
for the render context
and allocate a new pixel buffer.
Then we do the rendering and
deliver the output frame.
So before we look at what's
inside that little render block,
let's run up and
see what this does.
Okay, here we are
with the deer again.
Okay, so this is a simple
app, and it's really simple
because we've managed to use
the new AVPlayerView from AVKit,
which gives you a nice, simple
player view with a scrubber
and various other goodies.
So let's play and
we'll see what happens.
Okay so we've got a
really simple wipe,
with that white line
coming down.
Let's play it again
if you missed it.
Okay, so there's nothing
particularly exciting here,
but the main thing is this
illustrates how we can get hold
of those pixels, in
this case on the CPU.
So you can lock the pixel
buffers to the base address,
at which point you can ask
for the usual things
like the bytes per row
and the base address,
and then start
splatting the pixels.
Now, you'll see this rather
ugly (and don't do this
at home) memcpy
and memset stuff.
It's purely for simplicity
here, but I'll close that.
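The CPU access pattern being
described is roughly this; a
minimal sketch that paints a
white band into the BGRA output
buffer, where bandY and
bandHeight are hypothetical
values:

    // Lock before touching pixels on the CPU; unlock when done.
    CVPixelBufferLockBaseAddress(output, 0);
    uint8_t *base = (uint8_t *)CVPixelBufferGetBaseAddress(output);
    size_t bytesPerRow = CVPixelBufferGetBytesPerRow(output);

    // Paint a white horizontal band, bandHeight rows tall, at row bandY.
    for (size_t row = bandY; row < bandY + bandHeight; row++) {
        memset(base + row * bytesPerRow, 0xFF, bytesPerRow);
    }

    CVPixelBufferUnlockBaseAddress(output, 0);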
Now, up here, when we
play that transition,
obviously we're doing some
sort of animation
to calculate the vertical
position of the white band.
We have to interpolate
across our time range to work
out exactly at what
point we are.
We use a technique called
tweening here:
we calculate this tween factor.
Now we're going to come back to
this in the slides in a moment,
but basically there's some basic
arithmetic going on in here
and it's all to do with
the time range associated
with the current video
composition instruction.
We'll come back to
that in a sec.
But this isn't a
particularly exciting example
of a compositor, so
let's have a look
at something using
the GPU, using OpenGL.
So let's run this guy again.
And this time we're going
to change the composition,
and I've got some pre-prepared
ones fortunately, so,
let's switch to this guy.
Okay, so, let's play
it this time.
Coming in quite fast, and
there's our otters again.
I'll play that again.
It goes through quite quickly,
but we can do something
about that: we can change the
[pause] video mixing parameters
that we've placed
in our composition,
and I can reveal some controls
wired directly to properties
in our instruction, in our
custom instruction here.
In this case, stiffening
and damping.
Now these are going to influence
the way we're calculating
the physics.
In this case we're
texture mapping the
video onto a mesh,
and then applying some
physics to the vertices.
So let's see if we can slow
this down a little bit, so,
[pause] let's play
that through again.
Okay, it's a little
bit different.
You can see, as it comes in,
it's a strange, hypnotic effect.
But we can play
with it even more
by adjusting the damping.
Woops. Okay.
Of course that was all planned.
Okay so, we've set up
the stiffness and damping
to be a little bit
wilder this time.
Yeah. [Pause] Okay, so
anyway, that's an example
of OpenGL doing
something useful.
Let's switch back to the slides.
Okay, we mentioned
tweening there,
which is an interpolation.
So let's have a quick
overview of that,
in case you're not familiar
with how it works.
Now we'll focus in on
the second segment,
which does the dissolve
in this example.
Okay our instruction
contains our opacity ramp
from 100 percent
down to 0 percent.
Of course the interesting
properties in there,
as well as the opacity
ramp is the time range
and our first example
is arbitrarily starting
at 5 seconds with a duration
of ten-thirtieths of a second.
Now the first frame sits right
at the start of the time range,
so no time has elapsed yet,
at which point the opacity
should be 100 percent.
Now we calculate
this tween factor
by dividing the elapsed
time by the duration,
in which case we calculate
zero, which lands right
at the beginning of
the opacity ramp.
Now things get a
bit more interesting
when we move forward one frame.
We've elapsed one-thirtieth
of a second in.
The tween factor is now
calculated to be 0.1,
which means we're a tenth of the
way through the opacity ramp.
By applying the tween factor
to the opacity ramp,
we arrive at the desired opacity
for this point in time:
90 percent.
We do the same calculation
for each frame all the way
through to the end,
arriving at a tween
factor of 1,
which means we get the
exact destination opacity
of 0 percent.
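As a worked sketch of that
arithmetic, where request is the
composition request, instruction
is your custom instruction, and
startOpacity/endOpacity stand in
for the ramp's endpoints:

    // Elapsed time within the instruction's time range, over its duration.
    CMTime elapsed = CMTimeSubtract(request.compositionTime,
                                    instruction.timeRange.start);
    double tween = CMTimeGetSeconds(elapsed) /
                   CMTimeGetSeconds(instruction.timeRange.duration);

    // Linear interpolation along the ramp: here 1.0 down to 0.0.
    double opacity = startOpacity + (endOpacity - startOpacity) * tween;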
Okay, so there are some
performance considerations.
[Pause] But before we get to the
performance, there's a caveat.
When your app on
iOS is backgrounded,
OpenGL won't be available.
So you'll have to
have a CPU equivalent.
Of course, you can use the
CPU in the foreground too.
So, the three key properties
inside the new composition
instruction protocol.
It's important to look at these
because there are
performance wins to be had.
First up, passthroughTrackID.
So some instructions
are simpler than others.
Some just take one source and
don't modify the frame at all;
they simply pass through
frames from track A.
In this example, that gets
us the first segment,
simply passing through
frames on track A.
[Pause] If we're not modifying
the frame, then we might
as well bypass the
compositor completely,
which is exactly what setting
passthroughTrackID does.
Now this is a power win.
Next up is
requiredSourceTrackIDs.
So let's say that first
instruction is actually
modifying those single
source frames coming through.
Say it's doing some
colorization effect.
In which case we do
require frames from track A
and we do want the
compositor to be called.
So we set required
source track IDs
to be track A, that
single track.
Now what we don't want
to do is just leave it
at the default value of nil,
because that's equivalent
to saying, deliver me
frames from all the tracks,
and it's going to
be wasted effort.
So containsTweening.
This is about our
animation again.
Now consider this composition.
We have a moving
picture-in-picture effect.
But all the source frames
coming at us are the same.
Perhaps we've time-extended
the source, so we get a run
of identical frames coming
through the compositor,
but we're still doing animation,
so every output frame
is going to be unique.
Because of that, we set
containsTweening to YES.
[Pause] But if we leave
containsTweening as yes,
and then we stop doing
animation for some reason,
so we have a static
picture-in-picture effect.
The picture-in-picture is at
the top right for every frame.
The same source frames
every time.
We're just re-rendering
identical output.
So be careful to set
containsTweening to NO
when you have no animation.
Then, after the initial frame,
the system is at liberty
to optimize away those renders
and reuse the identical output.
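Pulling those three together, a
custom instruction might be
declared like this; a minimal
sketch, not the session's code:

    // A hypothetical instruction carrying a dissolve's mixing parameters.
    @interface MyDissolveInstruction : NSObject <AVVideoCompositionInstruction>
    @property (nonatomic) CMTimeRange timeRange;
    @property (nonatomic) BOOL enablePostProcessing;
    // YES while animating; NO lets identical output frames be reused.
    @property (nonatomic) BOOL containsTweening;
    // Only the tracks you actually need; nil means "all tracks".
    @property (nonatomic) NSArray *requiredSourceTrackIDs;
    // kCMPersistentTrackID_Invalid unless this segment is a passthrough.
    @property (nonatomic) CMPersistentTrackID passthroughTrackID;
    // Your own mixing parameters ride along too:
    @property (nonatomic) float startOpacity;
    @property (nonatomic) float endOpacity;
    @end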
And then we look at the
pixel buffer formats.
There's another performance
win to be had here.
As I said earlier, if we can
work in a wider variety
of formats, in particular
YUV 4:2:0, then do so,
because that will generally
avoid a conversion hit
on the source frames coming in.
With output it's less critical,
since the display, for example,
can accept multiple
formats, like BGRA and YUV,
so there will be no conversion.
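In practice, that just means
listing the decoder-native
format first in the attributes
array; a small sketch, assuming
your renderer also handles
bi-planar 4:2:0:

    - (NSDictionary *)sourcePixelBufferAttributes {
        // Prefer the decoder's native 4:2:0 output to avoid a conversion.
        return @{ (NSString *)kCVPixelBufferPixelFormatTypeKey :
                      @[ @(kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange),
                         @(kCVPixelFormatType_32BGRA) ] };
    }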
Okay, so we have some
sample code, AVCustomEdit.
In there you'll find
some OpenGL examples.
Okay, the second topic
is debugging compositions.
Well, it's quite possible
to construct elaborate
and complex compositions
with entirely legal values
while producing
unexpected results.
Sometimes that's
just a black frame.
I like to think of this a bit
like building an OpenGL scene,
but inadvertently pointing
the camera view vector
in the wrong direction.
In all these cases, again,
there's a black frame.
How do you debug that?
What's the best way?
Well, we've found something
really useful internally.
Let's just consider
the types of problems
that you may encounter. [Pause]
First of all, and probably
one of the worst ones,
is leaving gaps between
the segments
in the video composition.
It will almost certainly
result in black frames
or just hanging onto
the last frame.
And then there's just getting
the alignment in the tracks
wrong. Maybe it's a rounding
error when working
with CMTime or something.
Or misalignment between the
video composition time ranges
and the AVComposition
track segments.
Or a mistake in one of the
ramps, too long for example.
Or an error in the
transformation matrix,
spinning your layers off
into the ether beyond the
boundaries of the output frame.
[Pause] As I've said, we've
found something really useful
internally, and that is to
visualize the composition.
Rather than looking at the
code, you look at the picture.
So let's switch back
to the demo machine.
I think I see that, yeah, okay.
Alright, so if I run,
a little lagging here,
hopefully it doesn't crash.
Okay, so, we're looking
at our simple wipe again.
Now let's see, this is
AVCompositionDebugViewer.
This is going to be sample code.
I can display it like this.
And what we're looking at here,
let me get it big enough, okay,
so, what we're seeing
here in green
and dark blue are
AVComposition tracks;
green is the video tracks,
dark blue is the audio tracks.
And remember this is
the cuts only part.
So our source media starts
here for video track 2
and there for video track 1.
And then down here's
the video mixing part,
the AVVideoComposition
containing our instructions.
In this case this
instruction makes use
of that passthroughTrackID,
so we can label it as such.
And then into the
custom instruction.
And then at the bottom
is the audio mix.
Now you see the yellow
volume ramp there.
So, this looks like it
could be quite useful.
So let's put it to the test.
Let's give it a composition
that's broken in some way.
So let's switch to our
pre-prepared compositions.
Alright, I wasn't going to
show you the debug viewer yet.
Let's play this guy.
It's going to dissolve
into the deer,
there's something
visually wrong there,
let's just play that
slowly through.
If we dissolve away from the
boat, the deer comes through,
and then it jumps straight
through. That is not right.
Okay, so we've got
something wrong here.
Rather than look at the code,
we'll see if we can
spot it with this guy.
Okay, you can probably spot it
straight away: you can see
that for the video instruction,
the dissolve,
the opacity ramp descends
all the way down here.
So the video instruction
starts here and ends there;
that's supposed to be
our dissolve, which is going
to take two source video tracks.
One starts here and
lasts for the duration,
but this guy is too short.
[Pause] So.
And I just happen to know
where that problem is.
And your bugs won't
be this easy.
I tried to obfuscate
that, but [laughing].
[Pause] Okay.
Switch to the, hopefully
now fixed, composition.
And there we see the
first video track.
It looks good.
Now we get the dissolve.
Awesome. But there are worse
ways for the composition
to be broken than that.
In fact, let me just
run that again.
Okay, so let's have
a look at the gap in
the video instruction.
Black. Not good.
We'll have a look there.
And here, now, we can spot
this guy straight away.
Down here we have a gap between
the dissolve instructions
X-TIMESTAMP-MAP=MPEGTS:181083,LOCAL:00:00:00.000
and the last instruction,
which isn't good.
So, that should be nice and easy
to fix just like the last one.
Now this guy here.
[Laughing]
[ Silence ]
Now the sea otters.
[ Silence ]
And there we go,
job done, fixed.
Okay. Okay so that's
AVCompositionDebugViewer,
which is a sample for
Mavericks and iOS 7.
Now we've made it simple so
that it's particularly easy
for you guys to drop it
into your app and modify it.
Extend it to draw your
own video instructions.
Now remember, it's just a
noninteractive view, okay.
Now it's going to help you
spot those overlaps and gaps,
as we saw, in the tracks,
the video instructions,
and the audio.
But don't forget the validation
API that we have in there, the
AVVideoCompositionValidationHandling
protocol.
You'll want to implement that,
and then you'll get callbacks
when we encounter
something a bit untoward
in your composition.
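A minimal sketch of that
validation hook, assuming the
videoComposition and composition
from earlier; the two fragments
live in different places (a call
site, and the delegate's
implementation):

    // Ask the video composition to validate itself against the edit;
    // the delegate hears about each problem found.
    BOOL valid = [videoComposition
          isValidForAsset:composition
                timeRange:CMTimeRangeMake(kCMTimeZero, composition.duration)
       validationDelegate:self];

    // And one of the AVVideoCompositionValidationHandling callbacks,
    // implemented on the delegate:
    - (BOOL)videoComposition:(AVVideoComposition *)videoComposition
        shouldContinueValidatingAfterFindingEmptyTimeRange:(CMTimeRange)timeRange
    {
        // A gap between instruction time ranges: the classic black-frame bug.
        return YES; // keep validating so you hear about every problem
    }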
So what have we
looked at today?
Well, we've spent a few minutes
just reviewing the composition
architecture, to see where
the built-in compositor
fits in. [Pause]
And we've discovered the
new custom compositing API
with the two new protocols
and the request object.
And we've seen how
important it is
to choose the source pixel
buffer formats and the output.
And we looked a little
bit at tweening
and interpolating your
video mixing parameters
across the time range
of your effect.
And we talked a bit
about performance,
so remember those three
properties inside the
instruction protocol;
have a good look at them.
And lastly, we saw
our sea otters again
in debugging compositions.
So that's
AVCompositionDebugViewer,
which is a great visualization
technique and much simpler
than looking at reams of code.
So over the next few months,
I want to see you guys drop
in some great OpenGL visuals,
and I want to see them appearing
in those video editing apps.
We're going to see
awesome effects, transitions,
and video generators. [Pause]
So if you need some information,
then do e-mail our
evangelist, John Geleynse.
We'll have some related
sessions that you can catch
on video now in the WWDC app.
Thank you very much.
[ Applause ]