Transcript
>> Good morning, everyone, and
welcome to Friday of WWDC.
[applause] Thank you.
My name is Erik Turnquist and
today Brad and I are going to
talk about working with HEIF and
HEVC.
So, first off, what is HEVC?
HEVC stands for High Efficiency
Video Coding and it is the
industry standard next
generation video encoding
technology.
It's the successor to H.264.
Now for the more important question: why?
Why is Apple going through all
of the effort to deliver a new
codec?
H.264 has been really good to us
for over ten years.
Now we thought about this a lot,
and we really want to enable new
features and unfortunately,
H.264 has reached the limits of
its capabilities.
We want to enable new features
like 4K and larger frame sizes,
high bit depths like 10-bit and
wider color spaces like Rec.
2020. Now, we want to do all of this while lowering the bit rate, not raising it.
So, how do we do that?
Well, we do that with HEVC.
So, now how much lower are the
bit rates we're actually seeing?
Well, for general content we're seeing up to a 40 percent bit rate reduction compared to H.264.
So, this is a really big deal.
And for camera capture, we're
seeing up to a 2 times better
compression compared to H.264
and JPEG.
So, another really big deal
here.
And we're making all of these
changes today.
So, if you've installed the iOS 11 seeds, we've enabled HEVC
Movie and HEIF Image Capture by
default.
So, that means, many of you have
already captured HEIF images or
HEVC movies without even knowing
it.
And it just works on our
platforms.
Let's go over what we're going
to talk about today.
I'm going to cover the HEVC
Movie side of things, and Brad's
going to cover the HEIF Image
side of things.
We're going to cover accessing
this content, playing it back
and displaying it, capturing and
creating HEIF and HEVC Movies,
and then export and transcode.
So, first let's cover access.
So, many of you are using
PhotoKit and PhotoKit will
deliver HEVC assets for
playback.
So, if you're using requestPlayerItem or requestLivePhoto, they will give you automatic playback without adopting any new APIs, so this should just work.
PhotoKit can also deliver you
HEVC assets.
So, if you're calling requestExportSession, it will
transcode to the existing preset
you're already using.
So, if you're using one of the
dimension presets that used to
give you H.264, it will still do
that.
But we'll cover new presets
we've added for HEVC.
If you're calling
requestAVAsset, it will give you
access to the HEVC media file
and this will have an HEVC video
track inside of it.
Now, if you're a backup application and you want access to the raw bits, you're probably calling requestData, so I want to make note that the movie file you receive will actually contain the HEVC video track inside it, so you need to be able to handle this.
Now that you have this content,
let's talk about playback and
display.
HEVC playback is supported in
our modern media frameworks like
AVKit, AVFoundation, and
VideoToolbox.
We support HTTP live streaming,
play-while-download, and local file playback.
And we support MPEG-4 and
QuickTime file formats as the
source, and here there's no API
opt-in required.
Things should just work.
We support Decode on macOS and
iOS and now let's go over where
we have Hardware Decode support.
So, we have 8- and 10-bit
decoders on our A9 chip, so
that's the iPhone 6s and we have
8-bit Hardware Decode on our 6th
generation Intel Cores, that's
Skylake and that's the MacBook
Pro with Touch Bar.
We also have 10-bit Decode on
the 7th Generation Intel Core
processors and that's Kaby Lake
and that's the brand-new MacBook
Pro with Touch Bar.
We also have 8- and 10-bit
Software Decode fallbacks on
macOS and iOS.
So, now let's go over some code
you might have and let's convert
it to HEVC playback.
So, here we're playing "My
Awesome Movie" making a URL,
then a player and playing it.
So, this is the H.264 version.
And now here's the HEVC version.
There's no changes.
So, to play an HEVC movie file,
you don't need to change any of
your code.
We want to have you think about
a couple things.
So, the first is about Decode
capability.
And if you're asking the
question is there a decoder on
the system that can handle this
content, this API is for you.
This is useful for non-realtime
operations, like sharing or
image generation.
And it can be limited by
hardware support.
So, not all of our hardware
decoders support every frame
size.
Now, for the more important
question is about playback
capability.
If you're asking, how do I have
the best playback experience for
my customer, this API is for
you.
And many of you are already
using this API.
So, not all content can be
played back in realtime and we
have differing capabilities on
different devices.
So, if you want to have a one
stop shop for the best user
experience for playback, whether
that's 1x or 2x playback,
rewind, scrubbing, or fast
forward, this is the API for
you.
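The session doesn't name the APIs on this slide, but one way to ask these two questions in AVFoundation is with isDecodable and isPlayable; a rough sketch, where movieURL is an assumed local file URL:

import AVFoundation

let asset = AVURLAsset(url: movieURL)

// Decode capability: is there some decoder that can handle this track?
// Useful for non-realtime work like sharing or image generation.
let canDecode = asset.tracks(withMediaType: .video).first?.isDecodable ?? false

// Playback capability: will this asset play well in realtime on this device?
let canPlay = asset.isPlayable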
Now, let's go on to Hardware
Decode availability.
If you want to get the best
battery life during playback,
you want to playback on systems
that have Hardware Decode
support.
This will also get you the best
Decode performance.
So, we have new VideoToolbox API that you can query: is there a Hardware Decoder for this codec?
Here I'm showing you HEVC, but
you can also use it for any
other codec.
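That query is a one-liner; a sketch:

import VideoToolbox

// Is there a hardware decoder for HEVC on this system?
let hasHardwareHEVCDecode = VTIsHardwareDecodeSupported(kCMVideoCodecType_HEVC)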
Now, for the final question for
playback, which codec do I use
for playback?
Do I choose H.264 or HEVC?
Well, if you're concerned about
delivering the most compatible
content or want to deliver one
asset that just works
everywhere, choose H.264.
Our platforms have supported
this format for over 10 years
and there's broad adoption in
the third-party ecosystem.
However, if you want the smallest file size and the latest and greatest encoding technology, like 10-bit, choose HEVC.
You'll have to decide what works
in your application.
And with that, let's move on to
capture.
So, capturing HEVC is supported
with AVFoundation and we support
MPEG-4 and QuickTime file
formats as the destination.
We support HEVC capture on our
A10 chip, so that's iPhone 7,
and now let's go over the capture graph that many of you are already familiar with.
This starts with an
AVCaptureSession, this needs to
get data from somewhere.
You create an AVCaptureDevice,
you add it as the input, then
data needs to go somewhere.
In this case you're using a movie file output to compress and write the output file.
These are all connected with an
AVCaptureConnection and this
creates your movie file.
So, let's convert this into
code.
And many of you probably have
this in your app.
First, create an AVCaptureSession. Here we're making a 4K capture session.
Then you create the
AVCaptureDevice, add it as the
input.
Create your MovieFileOutput and
this does the compression file
writing, add it as the output.
And then startRunning and
startRecording.
And then we're capturing.
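A minimal sketch of that graph, assuming camera permission has been granted; outputURL and recordingDelegate are hypothetical, and error handling is elided.

import AVFoundation

let session = AVCaptureSession()
session.sessionPreset = .hd4K3840x2160  // a 4K capture session

// Create the device and add it as the input.
guard let device = AVCaptureDevice.default(.builtInWideAngleCamera,
                                           for: .video, position: .back),
      let input = try? AVCaptureDeviceInput(device: device) else { fatalError() }
session.addInput(input)

// The movie file output does the compression and file writing.
let movieOutput = AVCaptureMovieFileOutput()
session.addOutput(movieOutput)

session.startRunning()
movieOutput.startRecording(to: outputURL, recordingDelegate: recordingDelegate)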
So, how do we opt in to HEVC?
Well, with iOS 10 we added an
API to check for the available
video codecs during capture.
And new with iOS 11, you can check whether it contains HEVC.
On supported devices, it will
return true and you can go ahead
and use that in your output
settings.
And if it doesn't support it you
can go ahead and fall back to
another codec like H.264.
Now I want to make an important point here: order matters with availableVideoCodecTypes, and for this seed we made HEVC the first option.
So, that means, if you do
nothing else, you'll be
capturing HEVC content.
We really want to get you used
to handling this content.
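A sketch of that opt-in, continuing from the movieOutput above:

// Opt in to HEVC when available, otherwise fall back to H.264.
if let connection = movieOutput.connection(with: .video) {
    let codec: AVVideoCodecType =
        movieOutput.availableVideoCodecTypes.contains(.hevc) ? .hevc : .h264
    movieOutput.setOutputSettings([AVVideoCodecKey: codec], for: connection)
}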
Now, let's move on to Live
Photos.
So, we have the same capture
graph here, but we use our
AVCapturePhotoOutput, and that
makes all the Live Photos we
love and enjoy.
So, first let's go over a couple of new Live Photo enhancements we've made in the past year.
We now support video
stabilization, so no more shaky
playback during Live Photos.
We also no longer pause music
playback during Live Photo
capture, and we support much
smoother Live Photos up to 30
frames per second.
So, let's go over capturing HEVC
with Live Photos.
So, we have new API in iOS 11 where you can query availableLivePhotoVideoCodecTypes, see if it contains HEVC, and it will return true on supported devices.
Then if it does, go ahead and use it; if it does not, you can fall back to another existing codec like H.264.
I also want to note that the same consideration applies here: order matters with availableLivePhotoVideoCodecTypes, and for this seed we made HEVC the first option.
So, again, if you do nothing
else, you will capture HEVC Live
Photos.
You might be sensing a pattern
here.
We really want to get you used
to handling this kind of
content.
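A sketch of that Live Photo opt-in, assuming a configured AVCapturePhotoOutput named photoOutput and a hypothetical temporary movie URL:

let settings = AVCapturePhotoSettings()
settings.livePhotoMovieFileURL = livePhotoMovieURL  // hypothetical temp URL

// Prefer HEVC for the Live Photo movie, falling back to H.264.
if photoOutput.availableLivePhotoVideoCodecTypes.contains(.hevc) {
    settings.livePhotoVideoCodecType = .hevc
} else {
    settings.livePhotoVideoCodecType = .h264
}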
Now, let's go over the most
customizable capture graph, and
that's with
AVCaptureVideoDataOutput, and
AVAssetWriter.
So, you use this if you want to
modify the sample buffers in
some way.
So, you might be performing some
cool filtering operation.
When configuring AVAssetWriter for HEVC, you have two options.
So, you can either configure
custom output settings where you
explicitly specify HEVC, or the
video data output can actually
recommend those settings for
you.
And we recommend this API.
In iOS 7 we added recommendedVideoSettingsForAssetWriter.
Now this always recommends
H.264.
So, if you want to stick with
that, that's fine.
However, in iOS 11 we've added
new API where you can actually
pass in the codec type and we
will give you recommended
settings for that codec type on
supported devices.
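A sketch of that recommendation API, assuming a videoDataOutput on a running session:

// Ask the video data output for AVAssetWriter settings for a codec (iOS 11).
let settings = videoDataOutput.recommendedVideoSettings(
    forVideoCodecType: .hevc,
    assetWriterOutputFileType: .mov)  // nil on unsupported devices

let writerInput = AVAssetWriterInput(mediaType: .video, outputSettings: settings)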
And with that let's move onto
the Export and transcode side of
things.
So you can transcode to HEVC
with AVFoundation and
VideoToolbox.
And we support MPEG-4 and
QuickTime file formats as the
destination.
And here API opt-in is required.
We support HEVC Encode on macOS
and iOS and now let's go over
where we support HEVC Hardware
Encode.
So, we have an 8-bit Hardware
Encoder on our A10 Fusion chip,
that's iPhone 7, and we support
8-bit Hardware Encode on our
macOS on our 6th generation
Intel Core processors, that's
the Skylake family, and that's
the MacBook Pro with Touch Bar.
And on macOS we have a special
10-bit non-realtime, high
quality software encoder that
you can use and we'll talk about
that in a little bit.
Now, let's start with the
highest-level export APIs, and
that's transcoding with
AVAssetExportSession.
So, with this, you give us an
asset, then you pick a preset
and we do all the operations for
you including compression and we
produce an output movie.
So, there's no change in
behavior for existing presets.
If you're using one of the
existing dimension-based
presets, and it used to give you
H.264, it will still do that.
We've added new presets here.
And those will convert from
H.264 or any other codec to
HEVC, and these will produce
smaller AVAssets, up to 40
percent in some cases, with the
same quality.
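A sketch using one of the new preset names, with asset and outputURL assumed:

// AVAssetExportPresetHEVCHighestQuality is one of the new HEVC presets.
if let export = AVAssetExportSession(asset: asset,
                                     presetName: AVAssetExportPresetHEVCHighestQuality) {
    export.outputURL = outputURL
    export.outputFileType = .mov
    export.exportAsynchronously {
        // Inspect export.status and export.error when done.
    }
}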
Now, let's move one level down
the stack, to compressing with
AVAssetWriter.
So, with AVAssetWriter, you're either generating the sample buffers yourself, or getting them from another one of our APIs like AVCaptureVideoDataOutput or AVAssetReader.
And AVAssetWriter's responsible
for compression and file
writing.
Again, like I discussed
previously, there's two options
for AVAssetWriter.
You can either explicitly set
custom output settings, in this
case we're specifying use HEVC.
You can also specify your bit rate and dimensions, or you can use one of our convenience settings: in capture you can use the VideoDataOutput, and for general encode you can use the AVOutputSettingsAssistant.
We've added two new presets here
that on supported devices will
return HEVC output settings.
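A sketch of the assistant path; the preset is one of the new iOS 11 values, and the initializer returns nil where HEVC isn't supported:

// Get HEVC output settings for AVAssetWriter via the settings assistant.
if let assistant = AVOutputSettingsAssistant(preset: .hevc1920x1080) {
    let videoSettings = assistant.videoSettings
    // Pass videoSettings to an AVAssetWriterInput for video.
}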
Now, if you're in the business
of creating your own custom
output settings, it can be a
little tricky.
So, not all encoders support all
output settings.
We've fixed that problem in iOS
11 and macOS High Sierra so you
can now query the encoder for
supported properties to use in
your output settings.
To do that you pass in HEVC
here, and it will return the
encoder ID and a list of
supported properties.
The encoder ID is the unique
identifier for that specific
encoder, and with that the
properties and the encoder ID
can be specified in the output
settings and you can be sure
that it actually works for
compression.
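A sketch of that query; the dimensions are arbitrary:

import VideoToolbox

var encoderID: CFString?
var supportedProperties: CFDictionary?

// Ask which HEVC encoder would be used and which properties it supports.
let status = VTCopySupportedPropertyDictionaryForEncoder(
    width: 3840, height: 2160,
    codecType: kCMVideoCodecType_HEVC,
    encoderSpecification: nil,
    encoderIDOut: &encoderID,
    supportedPropertiesOut: &supportedProperties)
// On success, include the encoder ID and only validated properties
// in the output settings you hand to AVAssetWriter.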
Now, let's move to the lowest
level compression interface and
that's compressing samples with
VTCompressionSession.
So, just like with AssetWriter
you might be generating the
samples yourself or getting them
from another one of our APIs.
VTCompressionSession compresses them and produces our compressed media data.
So, to create a compression
session with an HEVC encoder,
it's very simple.
In this case we're creating one
that's compressing to H.264.
Let's go ahead and convert it to
HEVC.
There we go, and now we're
compressing with HEVC with
VideoToolbox.
So, that was pretty easy.
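A sketch of that creation call; the dimensions are arbitrary, and frames would be fed in afterwards with the encode-frame calls:

import VideoToolbox

var compressionSession: VTCompressionSession?
let status = VTCompressionSessionCreate(
    allocator: kCFAllocatorDefault,
    width: 1920, height: 1080,
    codecType: kCMVideoCodecType_HEVC,  // was kCMVideoCodecType_H264
    encoderSpecification: nil,
    imageBufferAttributes: nil,
    compressedDataAllocator: nil,
    outputCallback: nil,  // nil here means use the output-handler encode variant
    refcon: nil,
    compressionSessionOut: &compressionSession)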
Now, let's go over a couple of
considerations on macOS.
So, for optimal encoding
performance on macOS you want to
opt-in to hardware.
This will use hardware when available and, when it's not, fall back to software.
So, to do that, set the EnableHardwareAcceleratedVideoEncoder property to true in your encoderSpecification and then pass it into VTCompressionSessionCreate.
Now, if you're doing realtime encode, you'll often want to require hardware and never fall back to software. So, to do that, you set RequireHardwareAcceleratedVideoEncoder to true in your encoderSpecification and then pass it into VTCompressionSessionCreate.
Again, on systems where hardware encode is supported, this will succeed, but on systems where there's only software encode, this will fail.
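A sketch of both encoderSpecification variants on macOS:

// Prefer hardware, fall back to software when it's unavailable:
let enableSpec: [String: Any] = [
    kVTVideoEncoderSpecification_EnableHardwareAcceleratedVideoEncoder as String: true
]

// Realtime: require hardware, and fail rather than fall back:
let requireSpec: [String: Any] = [
    kVTVideoEncoderSpecification_RequireHardwareAcceleratedVideoEncoder as String: true
]

// Pass one of these (as CFDictionary) as the encoderSpecification
// argument to VTCompressionSessionCreate.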
All right, now let's go onto a
couple advanced encoding topics.
And the first is bit depth.
So, if you've ever seen a nice gradient in a user interface, or a nice sunrise or sunset, you'll notice that what it looks like in real life versus what it looks like in a movie isn't exactly the same. So, you might see these color banding effects in the video version of your movie.
And that's because with 8-bits
we don't have enough precision
to represent the subtle
differences between colors.
Now, the great thing about
10-bit is we actually do.
So, you get these really
beautiful gradients.
Now, with our macOS software
encoder, we actually support
10-bit encode.
So, first check that the
property is supported, and if it
is go ahead and use our HEVC
Main10 profile for our software
encoder.
And we want to make sure your
entire pipeline is 10-bit.
We don't want you going from
8-bit to 10-bit and then back to
8-bit, because that loses
precision.
So, we've added new CoreVideo
pixel buffer formats to ensure
that you can stay in 10-bit.
One is listed here.
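A sketch of the 10-bit opt-in on the macOS software encoder; session is assumed to be a non-nil VTCompressionSession from earlier:

// Check that ProfileLevel is supported, then select HEVC Main10.
var supported: CFDictionary?
VTSessionCopySupportedPropertyDictionary(session,
                                         supportedPropertyDictionaryOut: &supported)
if let dict = supported as? [String: Any],
   dict[kVTCompressionPropertyKey_ProfileLevel as String] != nil {
    VTSessionSetProperty(session,
                         key: kVTCompressionPropertyKey_ProfileLevel,
                         value: kVTProfileLevel_HEVC_Main10_AutoLevel)
}

// One of the new 10-bit CoreVideo pixel formats:
let tenBitFormat = kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange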
So, now for the first time you
can render in 10-bit, encode in
10-bit, decode in 10-bit, and
for the first time ever on iOS
and macOS our display pipeline
also supports 10-bit, so we get
it across everything.
[ Applause ]
Now, let's go over our second
advanced topic and that's
Hierarchical Encoding.
And so to understand a little
bit about this we need to go
over a little bit of video
encoding 101.
There's three major frame types
that compress video, and the
first is an I Frame.
You can think of I Frames like
an image file and they can be
decoded independently.
Then we have a P Frame, and P
Frames refer to previous frames,
so think of them like a 1-way
diff and they only contain
information that isn't in the
previous frame.
Now we have their cousin, the B
Frame.
B Frames refer to previous and
future frames and they're like a
fancy multidirectional diff.
So, they only contain
information that isn't in either
frame they're referencing.
Now let's pretend we have a
decoder that can only handle 30
frames a second, and let's say
we have content that is 240
frames a second.
Well that means we need to drop
some frames before we can
decode, because it can't keep
up.
So, when can we drop frames?
We can drop frames when another
frame doesn't depend on it.
So, in this case we can drop the
last P Frame, because it refers
to another frame, but no frames
refer to it.
So, let's go ahead and drop it.
We can also drop the B frame
because it refers to other
frames, but no frames refer to
it.
So, let's go ahead and drop it.
Now, let's move to a real-world
case of encoding 240 frames per
second content.
So, this is a typical encoding
scheme used when creating
content compatible with low end
devices.
So, for example, when encoding
240 frames a second content,
we'll have one non-droppable
frame for every seven droppable.
So, this gives us a lot of
flexibility during playback.
On devices that support 120 frames per second decode we can handle that, and on devices that only support 30, we can also play back there.
Now, let's throw in our frame
references.
Because these frames are
droppable, they can't refer to
each other and they all refer to
the non-droppable frame.
Now, those of you with compression experience are already seeing one problem: compression suffers because we can't refer to the nearby frames.
So, they're all referring to the
non-droppable frame and a lot
might have changed between the
non-droppable and the droppable
frame.
All right, so that's problem
number one that we're going to
fix.
Now, let's step through and
decode down to 30 frames a
second.
So, first let's say we can't
handle 240 frames a second,
let's go ahead and drop some
frames.
So here we're dropping down to 120 frames a second, and let's say we still can't keep up.
We need to go down to 60 frames a second. And since we have a decoder that can only handle 30 frames a second, we can't even handle 60. So, we go ahead and drop this last frame.
Now, I was really guessing about
what frames to drop.
So, there's no indication at all
about whether I should drop
every other frame, or just the
first half, or just the second
half.
So, let's fix this problem too.
We can fix that with a concept known as temporal levels, and this allows us to organize frames by which ones to drop first.
So, let's go ahead and re-encode
our content.
And you can already see that
this is way more organized.
So, first we drop temporal level
three, and then two, and then
one, and there's no guessing
involved.
So, this really helps.
Now, let's throw in our frame
references.
And you can already see there's
a big difference here, is that
the reference frames are much
closer together and they're
often referring to frames that
are just before, or just
afterwards.
So, this really improves
compression.
Now, let's go through and let's
say we have our same decoder
that can only handle 30 frames a
second.
We need to drop some frames.
Well, there's no guessing
involved.
We drop temporal level three; now we're down to 120 frames a second. Then we drop level two; now we're down to 60. And dropping level one leaves the base layer, a level that our decoder can actually handle.
So, this reduces guessing with
frame dropping.
Let's go over what we've
learned.
So, with HEVC hierarchical
encoding, we have improved
temporal scalability.
There's a much more obvious
frame dropping pattern and it
removes frame drop guessing
during playback.
We also have improved motion
compensation, the reference
frames are much closer to each
other, so we can use more parts
of other frames and it also
improves compression.
We're also using file annotations, and for those of you who like to read specs, check out MPEG-4 Part 15, section 8.4. Basically, we're using sample groups, so no bitstream parsing is necessary to get at this information.
So, that really helps.
All right.
How do we opt-in to this?
So you want to opt in to this if you want to create compatible high frame rate content, and there are two properties you should set: the base layer frame rate and the expected frame rate.
First check that they're
supported on the encoder you're
using, then set the
BaseLayerFrameRate, this is the
temporal level 0 frame rate, in
our previous example this was
the 30 frames a second, and then
set the ExpectedFrameRate, in
our previous example this was
240 frames a second.
The base layer must be decoded,
and we can decode or drop other
levels.
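A sketch of that opt-in on a VTCompressionSession, matching the numbers from the example (30 fps base layer, 240 fps capture); session is assumed non-nil:

// Check support, then set the base layer and expected frame rates.
var supported: CFDictionary?
VTSessionCopySupportedPropertyDictionary(session,
                                         supportedPropertyDictionaryOut: &supported)
if let dict = supported as? [String: Any],
   dict[kVTCompressionPropertyKey_BaseLayerFrameRate as String] != nil {
    VTSessionSetProperty(session,
                         key: kVTCompressionPropertyKey_BaseLayerFrameRate,
                         value: NSNumber(value: 30))   // temporal level 0
    VTSessionSetProperty(session,
                         key: kVTCompressionPropertyKey_ExpectedFrameRate,
                         value: NSNumber(value: 240))  // full capture rate
}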
So, now that you're all experts
in hierarchical encoding, let's
move it over to Brad for the
image side of things.
Thank you.
[ Applause ]
>> Thanks, Erik.
I'm Brad Ford from the camera
software team, and I get to talk
to you about the other
four-letter acronym that begins
with HE. Here's the agenda for the rest of the session.
First, we're going to cover what is HEIF at a high level.
We'll start at the very lowest
level when we talk about reading
and writing files with HEIF.
Then we'll go up to the top of the stack and talk about general use cases and common scenarios with HEIF, and we'll end with a topic that's near and dear to me, which is capturing HEIF.
So, first off, what is HEIF?
HEIF is the High Efficiency
Image File Format.
The second F is implied and
silent.
You don't need to call it HEIF
[extra F sound].
You'll just embarrass yourself
in front of your compressionist
friends if you do that.
It's a modern container format
for still images, and image
sequences.
It's part of the MPEG-H Part 12
specification, and by way of
curiosity it was proposed in
2013 and it was ratified in
summer of 2015, just 1.5 years
later.
If any of you know anything
about standards organizations, a
year and a half is kind of like
two days in real people time.
So, you know it must be an
awesome spec. The technical detail I'm sure you're most interested in, and the reason that you came today, is how to pronounce it.
So-- .
[ Laughter and Applause ]
I used the scientific method: I polled all the engineers on my floor, and the voting was largely along party lines.
The German speaker said "hife",
the French said "eff", and the
Russian said "heef".
And "heef" was the runaway
winner though.
That's "heef" as in I can't
belief how big, or how small the
files are.
Now, my Finnish office-mate was
quick to point out that Nokia
researchers were the ones that
came up with the spec, so the
Finnish pronunciation should
win, that would be the 1 percent
"hafe".
Well, as for me and my floor
we're going to call it "heef".
It can use HEVC intra-encoding,
which unsurprisingly compresses
much better than the 20-year-old
JPEG, two times as well as a
matter of fact.
That's an average of two times
smaller, not up to two times
smaller.
We used qualitative analysis on
a large data set of images to
arrive at this number, ensuring
visually equal quality to JPEG.
It supports chopping up an image and compressing it as individual tiles.
This allows for more efficient
decompression of large images in
sections.
HEIF also has first class
support for auxiliary images,
such as alpha, disparity, or
depth maps.
Here's a gray scale
visualization of the depth map
that's embedded in this HEIF
file.
Having depth information opens
up a world of possibilities for
image editing, such as applying
different effects to the
background and foreground like
this.
Here I've applied the Noir black
and white filter to background,
and the fade filter to the
foreground.
So, notice that the little
girl's tights are still in pink,
while everything behind is in
Noir.
Knowing the gradations of depth,
I can even move the switch-over
point of the filters like this,
keep an eye on her flower.
Now, just her hand and the
flower are in color, while
everything else is black and
white.
You can even control foreground and background exposure separately, like this.
Now, she looks like you
Photoshopped her into her very
own photo.
I'm not saying you should do it,
I'm saying you could do it.
That was just a teaser for a
two-part session that we had on
depth, and that's sessions 507
and 508.
I hope you'll make some time to
look at those videos.
When it comes to metadata, HEIF
has a great compatibility story.
It supports industry standard
Exif and XMP as first-class
citizens.
HEIF isn't just for single
images, it also supports image
sequences such as bursts,
exposure brackets, focus stacks.
It also has affordances for
mixed media, such as audio and
video tracks.
Let's do a demo, shall we?
Okay, this is a showcase that
takes place in Apple's very own
Photos app.
All right, I'm going to start
with a pano and this is a nice
looking pano, this one is from
Pothole Dome in Yosemite.
It looks great, it's sort of
what you'd expect from a pano
until you start zooming in.
So, let's do that.
Zoom in a bit.
Looks nice, let's zoom in a
little more.
And then zoom in a little more.
And zoom in a little more.
And keep zooming.
And keep zooming, oh my gosh I
can see what the speed limit is,
and wow.
[ Applause ]
There are cars there, and there
are Porta Potties.
I can even go and take a look at
the peaks in the background.
Notice how it snaps into clarity
as I go.
This is actually a 2.9 gigapixel
pano.
It's 91,000 pixels by about
32,000 pixels.
The RGB TIFF file for this is
well over 2 gigabytes, and I assure you it brings any fast Mac
to its knees, whereas the HEIF
file is 160 megabytes, you
literally cannot do this with
JPEG, since JPEG maxes out at
64k by 64k pixels.
HEIF does not max out.
It supports arbitrarily large
files and it keeps the memory in
check by efficiently loading and
unloading tiles.
So, while I have this enormous
data sitting in front of me, I'm
never using more than 70
megabytes of memory at a time in
the Photos app.
So, it's responsive and I can
zoom in and zoom out.
I could do this all day long,
but I should probably go back to
slides.
[ Applause ]
On all iOS 11 and macOS 10.13
supported hardware, we read and
decode three different flavors
of HEIF.
The three different extensions
you see here relate to how the
main image in the file is
encoded.
The first flavor is .HEIC, with the UTI public.heic; that refers to HEIF files in which the main image is compressed with HEVC. The second flavor is .AVCI, in which the main image is compressed with H.264. And then the .HEIF extension is reserved for anything else; it could be JPEG inside, or any of the supported codecs.
We only support one form of HEIF for encode and writing, and that's the HEIC format, in other words, the one in which you use HEVC.
We figure if you've gone far
enough to adopt the new file
container, you might as well
adopt the greatest compression
standard as well.
Support is currently limited to
iOS 11 devices with the A10
Fusion chip.
All right, let's go over to
low-level access to HEIF.
The lowest level interface on
our platform for reading and
writing images is ImageIO.
It encapsulates reading from
either a file or in-memory data
source using an object called
CGImageSource.
It also supports writing to files or to mutable data using CGImageDestination.
These objects have been around
for a long time.
You've probably used them.
To open a JPEG image file on
disk, this is how you would do
it using ImageIO.
First you create the URL, then
you call
CGImageSourceCreateWithURL to
create your source.
The last argument is an options
dictionary where you can
optionally pass the UTI of the
input.
It's not needed when you're
opening a file on disk, because
the UTI can be inferred from the
file path extension.
Once you've got a CGImageSource,
you can do several things with
it, such as copy the properties
at any index, that's getting
metadata out of it such as Exif.
You can also create a CGImage
from any of the images in the
file.
For JPEG there's typically only
one image in the file.
CGImage is of course like a
promise, a rendering promise.
The JPEG data can be lazily
decoded when necessary using
CGImage such as when you're
rendering it to a CG bitmap
context.
You can also get a thumbnail
image using a variety of
options.
For instance, the maximum size
that you would like, what to do
if there's none available in the
file, and when you call CGImageSourceCreateThumbnailAtIndex it does decode right away.
Now, here's the analogous code
for opening a .HEIC file.
Can anyone spot the differences?
Here, I'll make it easy for you.
That's it.
It's a comment and it's a file
path, that's it.
In other words, CGImageSource
just works.
The one difference you don't see
is how the HEVC is being
decoded.
On recent iOS devices and Macs
the decode is hardware
accelerated, whereas on older
devices it's done in software
and will thus be slower.
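A sketch of that read path for HEIC; the file path is hypothetical:

import ImageIO

let url = URL(fileURLWithPath: "/path/to/MyAwesomeImage.heic")  // hypothetical
if let source = CGImageSourceCreateWithURL(url as CFURL, nil) {
    // Metadata (Exif and friends) for the image at index 0.
    let properties = CGImageSourceCopyPropertiesAtIndex(source, 0, nil)

    // A lazily decoded CGImage of the main image.
    let image = CGImageSourceCreateImageAtIndex(source, 0, nil)

    // A thumbnail, decoded immediately.
    let options = [
        kCGImageSourceThumbnailMaxPixelSize as String: 320,
        kCGImageSourceCreateThumbnailFromImageIfAbsent as String: true
    ] as CFDictionary
    let thumbnail = CGImageSourceCreateThumbnailAtIndex(source, 0, options)
}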
A quick word on the tiling
support that we just saw in the
demo, CGImageSource can provide
a dictionary of properties about
the image by calling CGImageSourceCopyPropertiesAtIndex, and the properties dictionary is a synonym for metadata: Exif, Apple Maker Note, et cetera.
There's also a subdictionary
called the TIFF subdictionary,
in which you'll find the size of
the encoded tiles as the tile
length and tile width.
By default they are encoded as
512 by 512 pixels.
CGImageSource provides you with CGImages as we saw, and CGImage has a nifty method called cropping(to:) that takes advantage of the tiling.
This call creates a new CGImage
containing just a subsection of
another image.
This isn't a new API, but it
works really well with HEIF
where the tiles are encoded
individually.
You don't need to worry about
the underlying encoded tile
size, you can simply ask for the
subregion that you want to
display or render, and know that
under the hood you're getting
all of the tile-y goodness.
It's only decoding the tiles
that are necessary for that
subregion.
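A sketch, with an arbitrary subregion; image is a CGImage from a HEIF-backed CGImageSource as above:

// Decode only the tiles covering this subregion of a large image.
let region = CGRect(x: 8192, y: 4096, width: 2048, height: 2048)
if let tile = image.cropping(to: region) {
    // Render `tile`; untouched tiles stay unloaded.
}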
Now, let's talk about the
writing side.
Here's how you write a JPEG with
ImageIO.
First you create a CGImageDestination by calling CGImageDestinationCreateWithURL, where I should point out you do need to specify what the UTI is. Here I'm using AVFileType.jpg, which is the same as the UTType public.jpeg. I'm being careful with the result; I'm using guard let just in case destination is nil.
Now, with the current JPEG, the only reason it would be nil is if you asked to write to a file that's outside your sandbox, but to be defensive you should really write code in this manner.
Next, you add your CG image or
images, one at a time with
accompanying metadata if you
would like.
And then when you're done, you
call CGImageDestinationFinalize
which closes the container for
editing and then writes it to
disc.
Now, let's look at the HEIC
writing.
Again, differences are very
small.
Just the file path extension,
the UTI, the comment.
One important difference here
though between JPEG and HEIF is
that creating a
CGImageDestination will fail on
devices with no HEVC hardware
encoder.
And when it fails, destination
is nil.
So, the good defensive code that
I wrote on the previous slide,
is even more important to do
with HEVC where there is now a
new reason that the destination
might be nil.
Please always make sure that you check; this is the one and only way to know whether writing to HEIC is supported on your current platform.
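A sketch of that defensive HEIC write; the guard is the HEVC-encoder check:

import AVFoundation
import ImageIO

func writeHEIC(_ image: CGImage, to url: URL) -> Bool {
    // Returns nil on devices without an HEVC hardware encoder.
    guard let destination = CGImageDestinationCreateWithURL(
        url as CFURL, AVFileType.heic as CFString, 1, nil) else {
        return false
    }
    CGImageDestinationAddImage(destination, image, nil)  // metadata is optional
    return CGImageDestinationFinalize(destination)
}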
Also worth noting is that
ImageIO has added support for
reading and writing depth maps
as I talked about earlier.
We've done that both for HEIC and, in strange sorcery ways that we probably shouldn't talk about, for JPEG. I'm not going to delve deeply into that though, because it's covered in the dedicated sessions 507 and 508 where we talk about depth. And I hope you'll go look at those sessions, because there are many segues to the auxiliary image format in HEIF.
All right, it's time to move on
to our next major topic which is
high level access to HEIF.
But before we do that, I feel
that WWDC should be a cultural
experience, culturally
enriching, not just an
educational one.
And that's why I want you to
rest your brains for a moment
with some compression poetry.
All right.
Wait for it.
JPEG is yay big, but HEIF is
brief.
[laughter] Thank you.
[ Applause ]
See it's compression poetry, so
it's small.
Did you like that?
Do you want to hear some more?
Okay, let's do another one.
Here's a compression haiku.
HEVC has twice as many syllables as JPEG. Progress.
Thank you.
All right let's move on.
[applause] I'm sure they'll edit
that out later.
Okay, we're going to talk about
HEIF and PhotoKit.
PhotoKit is actually two
frameworks, it's Photos
framework and PhotosUI and it's
very high level, it's even above
UIKit.
We're going to cover just briefly the way that you work with HEIF in PhotoKit when applying adjustments, and we're going to talk about how you apply adjustments in three different scenarios: photos, videos, and Live Photos.
And then we'll talk about common
workflows that you would use
with PHPhotoLibrary.
Let's briefly outline the steps
involved in applying an edit or
an adjustment to an asset using
PhotoLibrary.
You ask the PHPhotoLibrary to
performChanges and in that
change request you start with a
PHAsset that you want to edit,
such as a photo.
And you call requestContentEditingInput on the asset to get a PHContentEditingInput.
This is the guy that gives you
access to all the media
associated with your asset such
as a UIImage, a URL, an AVAsset,
or a Live Photo.
Next you create a PHContentEditingOutput by initializing it with the content editing input. The editing output tells you where to place all of your rendered files on disc by providing you with a renderedContentURL.
You then perform your edits to
the media that's provided you
from the editing input, and then
you write them to the specified
location.
Finally, the PHPhotoLibrary
validates your changes and
accepts them as a whole or
rejects the change.
So, the rules with respect to rendered output images are unchanged, but you may not have been aware that they were in force.
In iOS 10 your output images must be rendered as JPEG with an Exif orientation of 1; that is, if there's any rotation to be done, it is baked into the image in the rendered output file.
You may have overlooked this
detail since probably 99 percent
of the content that you are
editing was provided as JPEG and
then you just outputted it to
the same format.
But now you will see a
proliferation of input content
that is HEIC, so you should be
well aware that you must still
render all of your output
content to JPEG with Exif
orientation 1.
Here's the code for it. First you make a CIImage; this would be one way of doing it. You could make a CIImage from the content editing input's file URL, and then apply your edits.
Here I'm doing both an
application of a filter and
baking in the orientation.
And then when I'm done, I call
ciContext's handy dandy
writeJPEGRepresentation, which
if you've used this boilerplate
code in the past, it still works
correctly because it's
outputting to a JPEG regardless
of what the input was.
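A sketch of that render, assuming input is the PHContentEditingInput; the filter choice is arbitrary, and committing the change request is elided:

import Photos
import CoreImage

let output = PHContentEditingOutput(contentEditingInput: input)

let ciImage = CIImage(contentsOf: input.fullSizeImageURL!)!
    .oriented(forExifOrientation: input.fullSizeImageOrientation)  // bake in rotation
    .applyingFilter("CIPhotoEffectNoir", parameters: [:])          // example edit

// Always write JPEG, even when the input was HEIC.
try? CIContext().writeJPEGRepresentation(
    of: ciImage,
    to: output.renderedContentURL,
    colorSpace: CGColorSpace(name: CGColorSpace.sRGB)!)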
Our second applying-adjustments use case relates to videos, and the rule, again the same as in iOS 10, is that no matter what the format of your input movie content is, you must produce a movie compressed with H.264 as your output.
Yes, even if the source movie is
HEVC, you still need to render
to H.264 for output.
Here's some boilerplate code to
edit video content that looks
like this.
First you get an AVAsset from
the PHContentEditingInput, then
you can create an
AVVideoComposition in which you
are handed each frame one at a
time and you can get them as
CIImages and then request an
object that has a mouthful of a
name, AVAsynchronousCIImageFilteringRequest.
You get a CIImage and then you
produce a CIImage, when you're
done rendering it you call
request.finish and then as a
final step, you export your
AVAsset to a file on disc at the
URL told to you by the
PHContentEditingOutput.
Now here's the important part. The preset to use is AVAssetExportPresetHighestQuality or any of the existing ones; as Erik said, they still compress to H.264. Don't use the similarly named new ones which have HEVC in the name, because your change request will fail with an error.
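A sketch of that export, continuing from the editing flow above; avAsset and filteredComposition come from the PHContentEditingInput steps just described:

// Use an H.264 preset; the new HEVC-named presets are rejected here.
if let export = AVAssetExportSession(asset: avAsset,
                                     presetName: AVAssetExportPresetHighestQuality) {
    export.outputFileType = .mov
    export.outputURL = contentEditingOutput.renderedContentURL
    export.videoComposition = filteredComposition  // the CIFilter-driven composition
    export.exportAsynchronously {
        // On success, commit the change request to the PHPhotoLibrary.
    }
}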
Finally, applying adjustments
using Live Photos, the video
content of Live Photos.
What I'm talking about here is
the moving aspect of a picture
when you either swipe between
photos that were Live Photos or
when you Force Touch on a
picture or swipe between
pictures.
This is the simplest use case as
you never get to deal directly
with the input or output files.
You're passed CIImages and you
produce CIImages.
The encoding is done on your
behalf.
There's a lot of good code to
look at here, but I'm not going
to spend a lot of time on it.
You can pause the video later
and take a good long look at it.
The one take-home point is that after you've filtered each frame in a Live Photo movie, you can tell the Live Photo editing context to save your Live Photo to a given URL, and that's it.
The Live Photos will be saved
out using H.264 on your behalf
just as the stills will be
encoded as JPEG.
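The shape of that code is roughly this, assuming input and output are the content editing input and output; the filter is arbitrary:

import Photos
import CoreImage

if let context = PHLivePhotoEditingContext(livePhotoEditingInput: input) {
    // Called once per frame, for the still and the video frames alike.
    context.frameProcessor = { frame, _ in
        frame.image.applyingFilter("CIPhotoEffectNoir", parameters: [:])
    }
    // Encoding (H.264 for the movie, JPEG for the still) happens for you.
    context.saveLivePhoto(to: output, options: nil) { success, error in
        // Commit the change request on success.
    }
}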
Okay, let's move over to the
common workflows with PhotoKit.
When displaying content from
your photo library, you use an
object called the PHImageManager
and this provides you with one
of three things.
You could get a UIImage if it's
an image, a PlayerItem if it's a
video, or a PHLivePhoto if it's
Live Photo content.
Here you don't need to make any
changes because all of these are
high level abstractions in which
you don't care where the sources
came from, all you're doing is
displaying them.
No code changes needed here.
The next is backup.
When using PhotoKit for backup
purposes, you probably want to
access the raw assets such as
the HEIC files and the QuickTime
movies.
And you do that using
PHAssetResourceManager.
It will give them to you in the
native format.
The only thing to be aware of
here is that you might get
different file types coming than
you're used to, so make sure
that you're ready for it.
The third and most complicated
case is sharing.
Here you're sort of leaving
Apple's nice walled garden.
You have to think about your
compatibility requirements.
Are native assets okay?
You might be doing your clients
a favor or you might be doing
them a disservice by giving them
HEIC content depending on
whether they're ready for it.
So, here you must weigh
compatibility versus the
features that HEIC affords.
If you do choose compatibility
over features, you can ensure
format compatibility by
specifying the output format
explicitly.
For images, you can just check
the UTType that you get, and see
that it conforms to say JPEG,
and if it doesn't, explicitly
convert it.
With videos, you can always
force compatibility by
requesting an export session
with a preset that you know will
deliver H.264 such as
PresetHighestQuality.
All right, onto our last topic
of the day, capturing HEIF.
Finally, one where I know what I'm talking about.
But let's do compression haiku
number two, please would you let
me?
It's fun for me.
Here we go.
HEIF a container, compresses
four times better than HEVC.
Think about that.
Okay, so why are we wasting our lives saying H-E-V-C? It's supposed to be a good codec, right? Why aren't we calling it "hevick"?
All right.
So, Erik mentioned that
AVCapturePhotoOutput added
support for Live Photo movies
encoded with HEVC.
This class was introduced last
year as the successor to
AVCaptureStillImageOutput.
It excels at handling complex
still image capture requests
where you need multiple assets
delivered over time.
It is currently the only way on
our platform to capture Live
Photos, Bayer RAW images, Apple
P3 Wide Color Images, and new in
iOS 11 it is the only interface
on our platform for capturing
HEIF content.
HEIF capture is supported on the
A10 chip devices which are
iPhone 7 Plus, iPhone 7, and the
newly announced iPad Pros.
We'll do a brief refresher on
how to request and receive
images with the photo output.
First, you fill out an object called an AVCapturePhotoSettings; this is sort of like a request object where you specify the features that you want in your photo capture.
Here it's the orange box.
Here, I've indicated that I want
auto flash, meaning photo output
only use the flash if it's
necessary, only if the light is
low enough to warrant it.
I've also asked for a preview
sized image to accompany the
full-sized image so that I can
have a quick preview to put on
screen.
I don't know exactly what the
final aspect ratio of it will be
so I just ask for a box that's
1440 by 1440.
I then pass this settings object
with a delegate that I provide
to the photo output to start or
kick off a capture request.
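A sketch of that request, assuming a configured photoOutput and a delegate object; the preview keys live in the previewPhotoFormat dictionary:

let settings = AVCapturePhotoSettings()
settings.flashMode = .auto  // fire only if the light warrants it

// Ask for a screen-sized preview image alongside the full-size photo.
if let previewFormat = settings.availablePreviewPhotoPixelFormatTypes.first {
    settings.previewPhotoFormat = [
        kCVPixelBufferPixelFormatTypeKey as String: previewFormat,
        kCVPixelBufferWidthKey as String: 1440,
        kCVPixelBufferHeightKey as String: 1440
    ]
}
photoOutput.capturePhoto(with: settings, delegate: photoCaptureDelegate)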
Now the arrow on top shows when
the request was made, and now
I'm sort of tracking this
package delivery, PhotoOutput
calls my delegate back with one
method call at a time.
Very soon after I make the
request the PhotoOutput calls
the delegates first callback
which is willBeginCaptureForResolvedSettings, and it passes
you this blue box which is a
ResolvedPhotoSettings.
This is sort of like the
courtesy email that you get
saying we've received your
order, here's what we'll be
sending you.
And this ResolvedPhotoSettings sort of clears up any ambiguity in the settings that you provided at the beginning.
In this case, we can now see that flash is no longer auto; it's true or false. Since it's become true, we know that the flash is going to fire.
Also, we now know what the final
preview image resolution is
going to be.
After we get willBeginCaptureForResolvedSettings, the second callback that we receive is willCapturePhotoForResolvedSettings.
This is delivered coincident
with the shutter sound being
played.
And then shortly thereafter
comes didCapturePhotoForResolvedSettings just after the
image has been fully exposed and
read out.
Then some time typically passes
while the image or images are
processed, applying all the
features that you asked for.
When the photo is ready, you
receive the
didFinishProcessingPhoto sample
buffer call back and the image
or images are delivered to you.
Here I got the main image, and
the preview image.
They're delivered together in
the same call back.
Finally, you always always
always get the
didFinishCaptureForResolvedSettings callback.
And that is guaranteed to be
delivered last.
It's the PhotoOutput's way of
saying we're done with this
transaction, pleasure doing
business with you, you can clean
up your delegate now.
This programming model has
proved to be very flexible.
We've had a lot of success with
it because we've been able to
add new methods to the delegate
as needed when we add new
features.
For instance, we added support
for RAW images.
There's a call back for that.
We added support for Live
Photos, there's a separate call
back for that, for getting the
movie.
So, it would seem like HEIF
would be an easy addition to
this very flexible programming
paradigm.
Unfortunately, it's not.
The incompatibility lies in the
CoreMedia SampleBuffer which is
and has been the coin of the
realm in AVFoundation for many
many years.
We have used it for still images
since iOS 4.
It's a thin container for media
data such as video samples,
audio samples, text, closed
captions.
HEIF on the other hand is a file
format, not a media format.
It can hold many media types.
Also, CMSampleBuffers can of course carry HEVC compressed video, but that HEVC compressed video doesn't look like the HEIF containerized HEVC.
Remember, HEIF likes to chop
things up into individual tiles
for quick decode.
You can't store that kind of
HEVC compression in a frame in a
QuickTime movie, it would just
confuse the decoder.
So, at this point, you might be
asking yourself, if we have this
fundamental tension between file
container and media container,
how would we be able to use
CMSampleBuffer for so many years
with photo output and still
image output?
Well the answer is JPEG.
We got away with this because of
the happy coincidence that JPEG,
the image codec, and JFIF the
file format are virtually
indistinguishable from one
another.
Both are acceptable as images,
in another container such as a
QuickTime movie.
So, the answer to our quandary
is to come up with a new purpose
built in-memory wrapper for
image results and we call that
the AVCapturePhoto.
It's our drop-in replacement for
CMSampleBuffer.
It is in fact faster than
CMSampleBuffer because we are
able to optimize delivery of it
across the process boundary from
the media server, so you get
even better performance than you
did in iOS 10.
It's 100 percent immutable, unlike the CMSampleBuffer, so it's easier to share between code modules.
It's also backed by
containerized data.
I'm going to talk more about
that in a minute.
Let's talk about some of its
attributes.
It has access to critical
information about the photo such
as the time at which it was
captured, whether or not it's a
RAW, Bayer RAW photo, and for
uncompressed or RAW photos, you
get access to the pixel buffer
data.
Also, side band information
travels with the AVCapturePhoto
too, such as the second smaller
preview image that you can ask
for.
You can also now request a third image that's even smaller, to be embedded as a thumbnail in the container.
An ImageIO property style
metadata dictionary is provided
that can contain Exif, or other
metadata that you've come to
expect.
And with the iPhone 7 Plus dual
camera, you can request that a
depth data map be delivered with
the AVCapturePhoto results as
well.
AVCapturePhoto also provides a
number of convenience accessors
such as a reference to the
resolvedSettings object that we
saw in previous slides.
Also, it gives you easy access
to bookkeeping about the photos.
For instance, if you've fired
off a request for a RAW plus
HEIC, you would expect to get
two photos.
So, the photo count accessor will tell you whether this is photo one or photo two.
If this photo is part of a
bracketed capture, such as an
auto exposure bracket of three
or four different EV values, it
can tell you which bracket
settings were applied to this
particular result as well as its
sequence number and whether lens
stabilization was engaged.
AVCapturePhoto also supports
conversions to different
formats, so it's friendly and
able to move to other frameworks
that you would use with image
processing.
First and foremost, it supports
conversions to data
representations if you just want
to write to file.
And it can produce a CGImage of either the full-size photo or the preview photo.
Now the mechanism for opting in to get an AVCapturePhoto instead of a CMSampleBuffer is just that you need to implement one new delegate method in your AVCapturePhotoCaptureDelegate, and that's this one here.
It's very simple; it just has three parameters.
It gives you the AVCapturePhoto
and optionally an error.
Now, error or not, you always
get an AVCapturePhoto with as
much information about it as
possible, even if there's no
backing pixel data.
The following two really lengthy
delegate methods have been
deprecated to help steer you
towards the new and better.
We used to have separate callbacks for getting the RAW or the uncompressed or compressed results: didFinishProcessingPhoto, which would give you a CMSampleBuffer, or didFinishProcessingRawPhoto, which would also give you a SampleBuffer. You needn't use these anymore. You can just use the new single callback, which subsumes both of them into one.
All right, in iOS 10 we
supported the following formats.
For compression, all you could
get was JPEG.
For uncompressed you had your
choice of two flavors of 420 or
BGRA, and of course we supported
Bayer RAW.
Now, in iOS 11, in addition to
adding HEVC support, we're
adding a new dimension to this
as well.
Every image format that you request is also backed by a file container format.
In other words, implicitly,
every image that you capture is
being containerized.
For HEVC the implicit container
is HEIC, for JPEG it's JFIF, for
the uncompressed formats it's
TIFF, and for RAW formats as
before it's DNG.
Now, why would file
containerization be a good
thing?
The answer is performance.
Let me explain using a case
study.
So, here's the old way you would
get a JPEG and write it to disc.
PhotoOutput would deliver you a
SampleBuffer with a full-sized
image and a preview image and it
would attach some metadata to it
such as Exif.
If you wanted to mutate that in
any way, you would have to wait
until it delivered the call back
and then you would get the
attachment that had the Exif,
manipulate it, and re-add it to
the SampleBuffer.
Then when it came time for
writing it to disc, you would
call PhotoOutput's jpegPhotoDataRepresentation and pass it the two buffers. Out comes JPEG data, ready to write to disc.
While in code it looks simple, a
lot is happening under the hood.
Because we conflated preview
image with embedded thumbnail
image, we had to take something
that was sized for the screen
and scale it down, compress it
to JPEG, incorporate all of your
Exif changes, and rewrite the
full-size image.
So, a lot of scaling and
compression done just because
you wanted to include a
thumbnail with your image and
manipulate a little bit of
metadata.
Not efficient at all.
Now in the new way,
AVCapturePhoto lets you specify
up front what you want in the
container.
If it has enough information to
prepare the file container right
the first time, then it's done
before you ever get the first
call back.
The way you do this is you fill out some extra features in the AVCapturePhotoSettings.
This time you can specify in
advance the codec that you want,
and optionally the file type.
You specify metadata that you
would like to add such as GPS
location, you can now do this
before you've even issued the
request.
You can also tell it I would
like an embedded thumbnail and I
would like it using these, these
dimensions.
You then submit your request to
the AVCapturePhotoOutput and
eventually it gives your
delegate an AVCapturePhoto as
its result.
This AVCapturePhoto is backed by
something that's already in a
HEIC container.
It's already been compressed in
tiles.
It's already embedded that
thumbnail image that you asked
it to.
It's already put the metadata in
the correct place.
So, the final call that you would make to write it to disc, photo.fileDataRepresentation, is much simpler than in the previous example.
All it's doing is a simple byte
copy to NSData of the backing
store.
No additional compression, or
scaling, or anything.
It's all done in advance.
This is much more efficient and
especially when we're dealing
with HEIF, it's necessary to get
all of the performance of that
great tiling format that I
talked about earlier.
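Putting the new flow together, a sketch; photoOutput is assumed configured, the GPS metadata line is hypothetical, and the actual disk write is elided:

// Request HEVC (falling back to JPEG), an embedded thumbnail, and
// metadata, all up front.
let codec: AVVideoCodecType =
    photoOutput.availablePhotoCodecTypes.contains(.hevc) ? .hevc : .jpeg
let settings = AVCapturePhotoSettings(format: [AVVideoCodecKey: codec])
settings.embeddedThumbnailPhotoFormat = [AVVideoCodecKey: AVVideoCodecType.jpeg]
// settings.metadata = [kCGImagePropertyGPSDictionary as String: gpsDictionary]
photoOutput.capturePhoto(with: settings, delegate: captureDelegate)

// In the delegate, the photo arrives already containerized (HEIC here):
final class CaptureDelegate: NSObject, AVCapturePhotoCaptureDelegate {
    func photoOutput(_ output: AVCapturePhotoOutput,
                     didFinishProcessingPhoto photo: AVCapturePhoto,
                     error: Error?) {
        // A plain byte copy of the backing container; no re-encode, no rescale.
        if let data = photo.fileDataRepresentation() {
            // Write `data` to disk.
        }
    }
}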
Now, let's switch over to a few
performance considerations with
HEVC and HEIF.
The first is what to do about photos that are taken during movie capture.
When you take a HEIC photo while
capturing a movie, you should be
aware that the same hardware
block that's compressing video,
that is the one that does H.264
or HEVC compression, is also
being asked to do double duty if
you want to encode a HEIC file
where HEVC is the compression
format.
That hardware block may be very
busy indeed if you are capturing
high-bandwidth video such as 4k
30 or 1080p 60.
Video is on a real-time
deadline, so it gets priority
over stills.
This means that it may take longer to get your still results back, and it also may mean that they are up to 20 percent larger than they would be otherwise, because the encoder is too busy to use all of the features that it would if it didn't have to meet that real-time deadline for 30 or 60 frames a second.
So, our recommendation is if
you're capturing video, and
taking stills at the same time,
you should use JPEG for the
photos to leave the encoder for
HEVC as available as possible
for the video.
Another concern is HEVC and HEIF
bursts.
This is where you mash on the button and you're trying to get a constant frame rate, maybe 10 frames a second, of captured images.
HEVC encode obviously is doing a lot more work than JPEG did; it's delivering a file that's less than half the size of JPEG.
Therefore, HEVC encode does take
longer.
Now we've benchmarked and we're
comfortable that HEVC HEIF can
meet the 10 fps minimum
requirement for bursts, but if
you need to capture at a higher
frame rate than that, our
recommendation is to go back to
JPEG for bursts.
And we've heard a lot about
compression today, and I feel I would be remiss if I didn't give you my thoughts on WWDC.
It is after all a compression
talk.
So, I can't just leave this
dangling there.
World Wide Developer Conference,
nine syllables.
W-W-D-C, eight syllables.
That is like the worst
compression format ever.
It's lossy, with like a 1.1 to 1 compression ratio, which is even worse than lossless JPEG.
So, please, as a service to me, for the rest of the conference, would you please only refer to the conference as WWDC or Wuh-Duck.
[laughter] All right.
Let's summarize what we learned
today.
HEVC movies are up to 40 percent
smaller for general content than
H.264 and for camera content on
iOS they are 2x smaller.
Also, HEVC playback is supported
everywhere on iOS 11 and High
Sierra, sometimes with software
sometimes with hardware.
And to create HEVC content you
need to opt in to new capture
APIs or new export APIs.
Also, we learned about HEIC files: they are twice as small as JPEGs, decode is supported everywhere on iOS 11 and macOS High Sierra, and capture is supported on iOS only, on devices with an A10 chip, using the new AVCapturePhoto interface.
For more information, here is
the URL for today's session.
I also wanted to point you to some sister sessions to this one. The first one in the list, High Efficiency Image File Format, is one that went straight to video.
This is where we really delve
deeply into the bits in the HEIF
file.
It's a great, great
presentation.
You should definitely listen to
it.
It's performed by Davide so you
get the nice Italian accent
going at the same time.
Also, Introducing HEIF and HEVC, which was on Tuesday, gave a higher-level introduction to what we talked about today.
And finally, the depth sessions
that I've made several
references to, they have several
segues to the auxiliary image
format that we use to store
depth in HEIF.
Thank you and enjoy the rest of
the show.
[ Applause ]