Transcript
>> Hi and welcome to session
513.
In this talk you will learn
about the lower level details of
the new High Efficiency Image
File format or HEIF and the many
advantages that this new file
format standard affords.
My name is Davide Concion and I
manage the Image Compression
Team at Apple.
During the talk, we will briefly
touch upon the current de facto
standard for image compression,
a standard that everybody's
familiar with JPEG.
We will go through the
requirements that Apple
identified as mandatory for a
new image format.
We will explain why we think
HEIF is the answer to those
requirements and we will get to
know some of the flexible tools
that HEIF implements.
We will then present the reasons
why Apple thinks HEVC is the
right codec to be used within
the HEIF file format.
Let's start with JPEG.
JPEG is still the most popular compression technology for images, omnipresent on the web and on consumer electronics devices, such as DSLR cameras, point-and-shoot cameras and cell phones.
Cloud services also use JPEG
because of its universal
support.
JPEG though has several limitations, among them compression efficiency.
Several new compression algorithms have been developed in recent years that can shrink the file size much more than JPEG while still maintaining the same objective and subjective quality.
Auxiliary images like alpha or
depth are not easily supported.
Also, in recent years new ways
to present and display animated
images have been developed.
Apple Live Photo is one of them.
JPEG unfortunately does not
support animation.
Let's look at a timeline of compression standards developed by JPEG and ITU/MPEG.
JPEG is really starting to show its years, especially in terms of compression efficiency when compared to recent advancements.
As you can see in the slide,
JPEG has been finalized as a
standard in 1992, a quarter of a
century ago.
Since then, several new
compression standards have been
developed.
The latest one in the list is HEVC.
And here is HEIF for comparison
in the timeline, which has been
finalized in 2015.
Apple invested a lot of time to
find a successor for JPEG and
many options were evaluated.
The requirements were extensive.
The new format needs to support
all the features available in
JPEG, but at the same time
provide better performance.
It needs to be friendly to
professional photography tools,
the web and the cloud.
The new format also needs to be
flexible and extensible to cope
with the ever-changing
photography ecosystem.
Here is a list of features Apple
considered paramount.
The compression needs to be
state-of-the-art both on the
[inaudible] front.
It needs to be competitive with
natural images, but also when
compressing text or graphics.
The format needs to be friendly to hardware-accelerated encode and decode operations on modern CPUs, GPUs and ISPs.
Performance and power are very high on the list of requirements.
It needs to support high bit depth and wide color gamut, which is the new frontier for images captured on consumer devices.
It needs to be able to compress
4:4:4 color sampling and also
describe HDR content, including
HDR metadata, transfer function
and color space definitions.
Auxiliary images for example
alpha or depth need to have a
commonly defined place in image
files.
New editing tools will be able
to utilize auxiliary data for
new presentation and editing
experiences.
In recent years, new ways to present and display animated images have been developed.
Apple Live Photo is one example.
A Live Photo includes animated content together with static images.
The new common format needs to store animated information efficiently, ideally using temporal compression techniques, and be able to instruct players about the presentation intention, for example a looping sequence.
The new format needs also to
support multiple images in the
same file.
For example, multi-exposure
stacks or stereo images.
This is to aid the development
and implementation of new
computational photography
algorithms.
Multiple representations of the same image within the same file are of great importance.
For example, multi-resolution,
including progressively
increasing level of details or
the ability to represent the
same image encoded with
different codecs.
Tiles are an important tool the new format must implement.
They allow for scalable operations on images of any size.
We'll be looking into tiles
later in the talk.
The new format needs support for rich metadata associated with each image in the file, and also support for timed metadata, for example for a sequence of images.
There is also a desire for the new format to be able to include other media types, for example audio or text.
Last but not least, the new
format should be flexible and
extensible enough to provide a
solid foundation for the future.
We believe that HEIF is the
answer to all these
requirements.
What is HEIF?
HEIF stands for High Efficiency
Image File Format.
Version one of the spec became
an ISO standard in June of 2015.
Version two should be released
imminently.
A C reference model for HEIF is
available upon request at this
link.
The reference model is meant to
provide guidance for HEIF
implementation and to understand
the specifications.
As a side note, the open source project GPAC MP4Box has recently added basic functionality to parse HEIF files.
The video world learned long ago that containers and codecs are different entities and that there are several advantages in keeping them separated.
But historically in the image
world containers and codecs are
tied together and JPEG is no
exception.
It makes sense to make the
distinction in the image world
to get the flexibility to
[inaudible].
HEIF does exactly that, it
specifies a structural format, a
container for individual images,
as well as image sequences.
It is built on top of the widely used ISO Base Media File Format, which is based on Apple QuickTime technologies.
It also uses and enhances
structures defined in the MP4
specification and MPEG-21
specifications.
Sequences, for example bursts or animations, are stored as tracks of timed media, MP4 style.
Images, coded or derived, are stored as items, MPEG-21 style.
Any compression codec can be
included in a HEIF container.
The HEIF specification
explicitly mentions HEVC, H264
and JPEG in terms of file
extensions, [inaudible] types
and decoder configuration.
The basic building block of a HEIF file, like the ISO Base Media File Format, is a data structure called a box.
A box is comprised of a four-character type (for instance, in the example on the right, the ftyp box, the metabox or the mdat box), the size of the box in bytes, and the payload of the box.
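The box layout just described, a four-character type, a size in bytes, and a payload, can be sketched with a minimal parser. This is an illustrative sketch, not a full ISOBMFF reader: it handles only the common 32-bit size form (no 64-bit largesize, no size-zero "to end of file" case), and the toy file below is hand-built.

```python
import struct
from io import BytesIO

def iterate_boxes(stream):
    """Yield (box_type, payload) pairs from an ISOBMFF-style stream.

    Minimal sketch: 32-bit size form only.
    """
    while True:
        header = stream.read(8)
        if len(header) < 8:
            break
        size, box_type = struct.unpack(">I4s", header)
        payload = stream.read(size - 8)  # size counts the 8-byte header
        yield box_type.decode("ascii"), payload

# A toy file with one 'ftyp' box whose payload declares a 'heic' brand
# followed by a 4-byte minor version of zero.
data = struct.pack(">I4s", 16, b"ftyp") + b"heic" + struct.pack(">I", 0)
boxes = list(iterate_boxes(BytesIO(data)))
```

A real reader would recurse into container boxes such as meta and moov using the same loop over each payload.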
The metabox gives a full
description of what is included
in the file.
The handler type of the metabox, for whoever is familiar with the ISOBMFF specification, is of type pict, indicating to a reader that this metabox handles images.
Before going into the anatomy of
a HEIF file a note on file
extensions.
The standard defines explicitly
the file extension of a HEIF
file depending on the particular
codec being used to compress
single images or sequences.
The list of extensions can be
found in the table above.
iOS 11 can capture and store
HEIF images using the HEVC
codec.
Therefore, the extension you
will be encountering is .HEIC.
In iOS 11 and macOS 10.13 we
support all three single image
HEIF flavors for decoding and
displaying.
Note also that a HEIF file that includes sequences will have a different extension than a HEIF file that contains only single images.
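The codec-to-extension mapping can be modeled as a small lookup table. The list below is reproduced from memory of the HEIF extension table, so treat the exact entries as an assumption and verify them against the standard:

```python
# Extensions keyed by (codec, kind): single images vs. sequences.
# Assumed from the HEIF spec's extension table; check the standard.
HEIF_EXTENSIONS = {
    ("hevc", "image"):    ".heic",
    ("hevc", "sequence"): ".heics",
    ("avc",  "image"):    ".avci",
    ("avc",  "sequence"): ".avcs",
    ("any",  "image"):    ".heif",
    ("any",  "sequence"): ".heifs",
}

def extension_for(codec, kind):
    """Fall back to the generic .heif/.heifs extension for other codecs."""
    return HEIF_EXTENSIONS.get((codec, kind), HEIF_EXTENSIONS[("any", kind)])
```

Under this mapping, iOS 11 captures (HEVC single images) get ".heic", matching the extension mentioned in the talk.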
We will now dive into the HEIF
format and its anatomy.
Let's start with the concept of
item.
Every element in a HEIF file is
an item.
There can be coded items for
instance, HEVC encoded frame or
tiles.
There can be derived items for
instance, an image overlay or an
image grid.
There can be metadata items for
instance, EXIF, XMP or MPEG-7
metadata.
Each item can also come with
several properties associated to
it.
Everything is then connected via
structures that link certain
items to other items or
properties.
Images are items and because
multiple images can be stored in
the same file the HEIF standard
differentiates between them by
assigning certain roles.
Some of the roles specified in
HEIF are listed in the table
above.
The primary, or cover, image is the representative image of a file.
The primary image should be displayed when no other information is available or decodable by a player.
Only one primary image can be
present in a HEIF file.
Other full-resolution images in
HEIF files are called master
images.
The thumbnail is a small
resolution representation of a
master image.
Multiple thumbnails can be
stored in a HEIF for example,
with different sizes.
It's a very useful feature for
progressive decoding and
displaying very high-resolution
images.
The auxiliary image is an image
that complements a master image.
For example, an alpha plane or a
depth map.
Auxiliary images can assist in
displaying master images, but
are not typically displayed.
A hidden image is an image that
should never be displayed.
It can be present in the file
for example, as an input image
of a derived image.
The iOS 11 HEIF implementation makes extensive use of hidden images, which are called tiles.
Each tile is used to compose the final master, or canvas, image.
A derived image is an image that is rendered by an indicated operation being performed on other input images.
For instance, the canvas image
described before is rendered
after stitching together
multiple tiles.
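The stitching operation for a grid-derived canvas can be sketched as a simple loop over row-major tiles. This is illustrative only, with pixels as plain nested lists; a real reader also honors the output width and height stored with the grid item, cropping any overhang:

```python
def stitch_grid(tiles, rows, cols, tile_w, tile_h):
    """Compose a derived 'grid' canvas from decoded tiles.

    tiles: 2-D pixel arrays (lists of rows) in row-major order,
    each tile_h x tile_w. Returns the stitched canvas.
    """
    canvas = [[0] * (cols * tile_w) for _ in range(rows * tile_h)]
    for index, tile in enumerate(tiles):
        r, c = divmod(index, cols)  # position of this tile in the grid
        for y in range(tile_h):
            for x in range(tile_w):
                canvas[r * tile_h + y][c * tile_w + x] = tile[y][x]
    return canvas
```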
Equivalent images are
alternative images for instance,
encoded with a different codec.
A server could distribute the
same input content to players
that may have different decoding
capabilities.
Once the role has been defined for each image, properties can be associated with them.
Properties are either
descriptive or transformative.
They can also be essential for
example, the codec
initialization info or
nonessential.
The table above provides a
non-exhaustive list of
descriptive properties for
images inside a HEIF file.
All the usual suspects can be found in there, like the image size, the color information, the type of auxiliary image, which can be alpha or depth, and also the configuration parameters to initialize the decoder.
The table above provides a
non-exhaustive list of
transformative properties.
The presence of these properties instructs a HEIF reader that the image needs to go through extra steps before being displayed.
For example, the clean aperture property instructs a HEIF reader that the crop operation must be performed before rendering the final image.
All the properties for each
image are grouped together in
the same item property box.
Each image can then be associated with its properties via the association box.
We will use an example to
describe how the association
works.
The above HEIF container on the
left describes the file with one
main image and one thumbnail.
The main image is comprised of
four tiles.
The item property box or ipco
box contains all the decoder
configuration and the sizes of
the main image, the tiles and
the thumbnails.
Note that the order matters for
this box.
The association box, or ipma box, on the right links properties, by their position in the property box, with the item IDs in the file.
As explained before, there is a
total of six items in the file,
one image, four tiles and one
thumbnail.
Items 1 through 4 are the tiles; these are hidden images with properties in position one, the decoder configuration, and position two, the size, which is 500 by 500 pixels.
Item five is the main image; only the size property is defined, since this is a derived image.
The size is 1,000 by 1,000
pixels.
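The ipco/ipma indirection in this example can be modeled directly: the property box is an ordered list, and the association box maps item IDs to 1-based positions in that list. The property names and the thumbnail's entries below are illustrative assumptions; the tile and main-image positions mirror the example:

```python
# Ordered property list (ipco); order matters because associations
# reference positions. Names like 'hvcC'/'ispe' are assumptions here.
ipco = [
    ("hvcC", "decoder configuration"),  # position 1
    ("ispe", (500, 500)),               # position 2: tile size
    ("ispe", (1000, 1000)),             # position 3: main image size
    ("ispe", (160, 160)),               # position 4: thumbnail size (made up)
]

# Association box (ipma): item ID -> positions in ipco.
ipma = {
    1: [1, 2],  # tiles 1-4: decoder config + 500x500 size
    2: [1, 2],
    3: [1, 2],
    4: [1, 2],
    5: [3],     # derived main image: only its 1000x1000 size
    6: [1, 4],  # thumbnail: decoder config + its own size (assumed)
}

def properties_of(item_id):
    """Resolve an item's properties through the ipma indirection."""
    return [ipco[pos - 1] for pos in ipma[item_id]]
```

The indirection lets many items share one property entry, which is why all four tiles can point at the same decoder configuration and size.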
Next, we will briefly talk about
image sequences in HEIF.
When sequences are embedded in a HEIF file, the moov box and its sub-boxes are also present in the file.
The moov box is fully described in the ISO MP4 file format specification from which HEIF derives.
Each sequence of images or
samples is described via the
trak box where all the timing
information to play back the
track is included.
HEIF specifies a new track
handler for picture called pict.
The key difference is that while the timing information given for a video or an audio track is used to synchronize the playback, the timing information in an image sequence track can represent either the capture time, for example for a burst, or the suggested display time, for example to drive a slideshow.
Roles can be used for image sequences too.
For example, a HEIF file could
embed a track of thumbnails or a
track of auxiliary images
associated with the master
track.
One of the most important HEIF
features is the ability to
control the playback by
signaling in the file the intent
of the creator.
For example, an edit list
enables modifying the playback
order and pace of each sample.
HEIF also allows indicating edit
list repetitions for example,
for looping animations.
The repetition can be indicated
to last for a certain duration
or be infinite.
Given that ISO tracks can be used in HEIF files, interframe prediction is also available.
Inter prediction is the ability
to remove coded information by
predicting the content of the
current frame from similar
frames in the past or in the
future.
This gives a tremendous
advantage in terms of
compression.
Inter prediction can also introduce a delay at decode time, because the previous frames must be decoded first before being able to decode the current frame.
HEIF allows inter prediction,
but also includes constraints in
the file to limit frame
interdependencies.
For instance, each predicted image can be restricted to point only to a non-predicted, or intra, image.
In this case, the time to decode
each frame in a sequence becomes
deterministic.
Last but not least, a HEIF image
can be subdivided into tiles.
Tiles are rectangular regions
within an image.
They are completely independent
items in a HEIF file and they
can be of different or same
size.
If their size is different a
relative location property
describes their position in the
final image.
If their size is the same, the final image is described as a grid.
There are several reasons why tiles make HEIF extremely flexible.
A player can exploit parallelism at decode time.
For example, each tile can be separately and independently decoded.
Tiles can be used to reduce memory consumption when resizing an image: rather than decoding the whole image and then applying a rescale operation, each tile can be independently decoded and rescaled, and then placed in a smaller buffer for rendering.
Cropping becomes very fast
because a player does not need
to decode the whole image to
extract a certain region.
This property is extremely useful for zooming operations.
For instance, a gigapixel image could be decoded, displayed and zoomed in with ease, without the need to decode the whole image into a multi-gigabyte buffer.
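The tile-based cropping and zooming idea boils down to computing which tiles a region of interest touches, so a reader decodes only those. A small helper can sketch this, assuming square tiles and a region that lies inside the image:

```python
def tiles_for_region(x, y, w, h, tile_size=512):
    """Return (row, col) indices of the tiles a crop region touches.

    Integer division maps pixel coordinates to tile indices; the -1
    on the far edge keeps an exactly-aligned region in one tile.
    """
    first_col, last_col = x // tile_size, (x + w - 1) // tile_size
    first_row, last_row = y // tile_size, (y + h - 1) // tile_size
    return [(r, c)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]
```

For a gigapixel image, a viewport-sized crop touches only a handful of tiles regardless of the total image size, which is what makes the zooming case cheap.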
Of note, tiles can also be used as an encoding tool.
A smart encoder can make different decisions based on the content of each tile.
Apple HEIF implementation uses
tiles extensively.
Note though that the HEVC
specification also supports
subdividing a frame into tiles
as a parallelization tool.
Apple does not use tiles in the HEVC parlance; rather, each tile is a whole HEVC frame. We call them system tiles.
Next, we will talk about HEVC,
the codec Apple has chosen to
compress HEIF photos.
There are two major reasons for selecting HEVC.
First, HEVC is the latest
technology in the compression
standard world.
With HEVC we see an average of 2X better compression compared to JPEG while maintaining the same visual quality.
Second, HEVC hardware support is
becoming available in most CPUs
and GPUs.
For instance, HEVC hardware support is available starting from the sixth-generation Intel Core processors.
This means exceptional performance without sacrificing battery life.
Several intra coding tools have been added to the standard that allow HEVC to outperform JPEG.
In the next few slides we will
mention some.
You will notice that the common
theme here is flexibility.
First, the block size.
JPEG divides each image into a
grid of blocks of 8 by 8 pixels.
These blocks are then transformed and quantized.
HEVC has the flexibility of dividing an image into blocks from 64 by 64 pixels down to 4 by 4 pixels.
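The flexible block sizes can be pictured as a quadtree split in the spirit of HEVC's coding tree units. In the sketch below, an arbitrary is_flat predicate stands in for a real encoder's rate-distortion decision about whether a block is uniform enough to stop splitting:

```python
def split_blocks(x, y, size, is_flat, min_size=4):
    """Recursively partition a block, quadtree style.

    Start at 64x64 and split into four half-size blocks wherever
    is_flat(x, y, size) says the content is not uniform, down to
    min_size. Returns a list of (x, y, size) leaf blocks.
    """
    if size == min_size or is_flat(x, y, size):
        return [(x, y, size)]
    half = size // 2
    blocks = []
    for dy in (0, half):
        for dx in (0, half):
            blocks += split_blocks(x + dx, y + dy, half, is_flat, min_size)
    return blocks
```

JPEG's fixed 8 by 8 grid corresponds to the degenerate case where every block splits to the same size; the quadtree is what lets HEVC spend large blocks on flat areas and small blocks on detail.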
The transform size is also
flexible within the block.
A new optional discrete sine transform has been added to the standard, and three possible scanning orders are available to group coded coefficients.
Next, the block prediction.
JPEG allows the top left corner
coefficient also called the DC
component or the constant
component of an 8 by 8 block to
be predicted from the block on
the left.
HEVC adds the flexibility to
predict every pixel value within
a block.
Up to 35 intra prediction modes are available, 33 of them angular.
Being able to remove redundant
information in a block by
exploiting similar information
available in neighboring blocks
is one of the most efficient
tools inside HEVC.
Entropy coding.
JPEG uses Huffman coding as the
engine for statistical encoding.
The idea is to assign variable-length codes to input coefficients, with shorter codes assigned to coefficients that occur more frequently.
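The variable-length idea can be sketched with a classic Huffman construction. Note this is an illustration of the principle, not JPEG's actual machinery, which typically uses predefined code tables; here a table is built from made-up symbol frequencies:

```python
import heapq

def huffman_codes(freqs):
    """Build prefix codes from {symbol: frequency}.

    Repeatedly merge the two least frequent subtrees, prefixing
    their codes with '0' and '1'; rare symbols end up deeper in
    the tree and therefore get longer codes.
    """
    # A unique counter breaks frequency ties deterministically.
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]

# Illustrative frequencies: 'a' dominates, 'd' is rare.
codes = huffman_codes({"a": 50, "b": 30, "c": 15, "d": 5})
```

The most frequent symbol gets a one-bit code while the rarest gets three bits, which is exactly the statistical saving the talk describes; CABAC goes further by adapting its statistics to context as it codes.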
HEVC on the other hand, employs
an arithmetic coder called CABAC
which stands for Context
Adaptive Binary Arithmetic
Coding.
CABAC is notable for providing
much better compression than
most other entropy encoding
algorithms.
Quantization.
Quantization is a lossy compression technique achieved by mapping a range of values to a single quantum value.
JPEG utilizes global quantization matrices for each 8 by 8 block.
HEVC on top of the quantization
matrix adds the flexibility of
assigning a different
quantization parameter for each
block.
This allows smart encoding algorithms to compress more aggressively the areas of an image where the human visual system is less likely to detect artifacts, for instance high-frequency content.
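Mapping a range of values to one quantum value is a one-line operation. The sketch below also shows the per-block flexibility HEVC adds on top of the quantization matrix, using made-up coefficients and step sizes:

```python
def quantize(coeff, step):
    """Map a range of values onto a single quantum value.

    Larger steps discard more detail, which is the lossy
    core of both JPEG and HEVC compression.
    """
    return round(coeff / step) * step

coeffs = (3, 14, 27)
# HEVC can pick a different quantization parameter per block:
block_a = [quantize(c, 10) for c in coeffs]  # coarse step for busy areas
block_b = [quantize(c, 2) for c in coeffs]   # fine step for smooth areas
```

With the coarse step the three coefficients collapse onto widely spaced values, while the fine step preserves them almost exactly, which is the trade-off a smart encoder tunes block by block.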
Next is deblocking, a tool that is available only in HEVC.
Blocking artifacts are visible
discontinuities occurring at
block boundaries.
The HEVC deblocking filter is a
filter applied to the pixels
around the block edges to
smoothen the transition and get
more pleasing visual results.
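A toy one-dimensional version of the idea: blend the two pixels that sit on either side of each block boundary toward their average. This is purely illustrative; HEVC's actual deblocking filter is more elaborate and adapts its strength to the local signal:

```python
def deblock_1d(row, block_size=8, strength=0.5):
    """Soften the jump across each block boundary in a 1-D row.

    At every multiple of block_size, pull the boundary pixel on
    each side partway toward their common average.
    """
    out = list(row)
    for edge in range(block_size, len(row), block_size):
        left, right = out[edge - 1], out[edge]
        avg = (left + right) / 2
        out[edge - 1] = left + strength * (avg - left)
        out[edge] = right + strength * (avg - right)
    return out

# Two flat 8-pixel blocks with a visible step between them.
smoothed = deblock_1d([0] * 8 + [4] * 8)
```

The hard 0-to-4 step becomes a gentler 0, 1, 3, 4 ramp across the boundary, which is the discontinuity-softening effect described above.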
SAO which stands for Sample
Adaptive Offset is an extra
filtering step available in HEVC
that is applied to the output of
the deblocking filter to further
improve the quality.
It's a local filter that can attenuate ringing artifacts or changes in sample intensity in some areas of a picture for better visual quality.
Both these techniques allow for
more pleasing images, especially
when the compression is very
high.
We have gone through several HEIF and HEVC features and tools.
I wanted to take a second to mention a few characteristics of HEIF files captured on iOS 11.
First, the extension for HEIF
images captured with iOS 11 will
be .HEIC because of the HEVC
codec.
The HEVC profile utilized to
compress images is the main
still profile.
Also, we use HEVC monochrome
profile for depth data.
Images are encoded using tiles
that are 512 by 512 pixels.
They are positioned in a grid
fashion to cover the whole
image.
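The number of 512 by 512 tiles needed to cover an image follows from simple ceiling division, with the right and bottom tiles allowed to overhang the image (the grid's output size then crops the excess). The 12-megapixel dimensions below are just an example:

```python
import math

def tile_grid(width, height, tile=512):
    """Rows and columns of fixed-size tiles needed to cover an image."""
    cols = math.ceil(width / tile)
    rows = math.ceil(height / tile)
    return rows, cols

# An example 4032x3024 (12 MP) photo:
rows, cols = tile_grid(4032, 3024)
```

Each of those tiles is an independent hidden-image item, which is what enables the parallel decode, cheap cropping, and low-memory resizing described earlier.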
The thumbnail is a 320 by 240 HEVC-encoded image.
It is four times the size of a common 160 by 120 JPEG thumbnail; this helps show better thumbnail quality when images are displayed on modern screens with high pixel density.
EXIF metadata is part of the HEIF file, like in JPEG, for backward compatibility.
Depth data is stored as an
auxiliary image and the metadata
pertinent to depth is stored as
XMP payload associated with the
depth image.
Last, a note about file
creation.
The HEIF standard does not
mandate any order for the boxes
a reader could find at the top
level of a HEIF file, but we
found that ordering them in a
certain way greatly helps
parsers and decoders.
For example, having the thumbnail early in the file allows parsing and displaying huge amounts of HEIF images without the need to parse the whole file.
For [inaudible] transmission or web applications, once the metabox is received, all the information for the file is available, and therefore readers can start configuring the decoding and display pipelines before having received the whole coded data.
Let's summarize what we have
learned today.
The photography world needs a
better image file format to
replace the rather old JPEG.
We looked at the extensive list
of requirements that Apple
considered paramount when
searching for a JPEG
replacement.
We believe HEIF is the answer
for all the requirements.
Its flexibility allows it to handle with ease and elegance the advancements available in iOS 11, and its extensibility also allows HEIF to be a solid foundation for the future.
We then analyzed the various
features available in the HEIF
standard.
And finally, we looked at the
HEVC tools that make it the best
choice, both in terms of
compression efficiency and
friendliness toward hardware
architecture for performance and
power.
For more information, please
visit the URL for the High
Efficiency Image File format
session 513.
And if you're still at the show
we invite you to visit the two
related sessions about HEIF and
HEVC.
Thank you for watching the talk
and enjoy the rest of WWDC 2017.