Transcript
[ Pause ]
>> Hello, and welcome to
Advanced Media for the Web.
My name is Jer Noble
and I'm an engineer
on the Safari Layout
and Rendering Team.
And today we're going to talk
about what's new with audio
and video in Safari and
how media integrates
with today's modern web.
So this is the <video>
element, the heart
and soul of web-based media.
On one hand it's a container for
some truly advanced technology,
but <video> elements are also
just another brick in the DOM.
They can participate in
layout and rendering.
They can be styled with CSS.
With the <video> element,
media can now be integrated
into the same responsive,
dynamic designs being
written for the modern web.
Video now helps tell
stories rather
than being the story itself.
A modern example, such as the New
York Times' "Snow Fall" article,
shows how you can weave video
into a rich storytelling
experience.
And video can add emotion
and energy to a page,
even to something already as
exciting as the new Mac Pro.
And it can still
take center stage.
But things weren't
always this easy.
Let's take a look back
at how the <video>
element became a part
of this exciting modern web.
So in 1999 this was how you
added video to your website.
The QuickTime plug-in,
and at the time the
QuickTime plug-in was amazing.
It could decode high
bit rate video
and it exposed a rich
API to JavaScript.
For a while, plug-ins
were the only way
to add video to your website.
Now fast forward seven years
and in 2006 this was how you
added video to your website:
the QuickTime plug-in, again.
By now the QuickTime plug-in
supported the H.264 codec
and it delivered even
higher quality video,
but it was still a
plug-in, one which users had
to find, download and install.
Well, not you Mac users, of
course, but it wasn't something
which website authors could
depend upon being available
across their
entire audience.
In 2007, though,
this all changed
when the <video>
tag was introduced.
This was an amazing
breakthrough.
No longer did web
developers have
to depend upon a
proprietary plug-in
to deliver video in their pages.
Video was now integrated
directly into the web layer
and, as part of the HTML5
specification, the <video>
element provided a
consistent experience and API
across all browsers
and platforms.
Browsers could build on,
improve and add video features
without waiting for
plug-in developers,
and this triggered a
virtuous cycle of innovation.
In 2009 the <video> tag came to
mobile browsers in iPhone OS 3,
when support for the
<video> element was added to Safari.
Previously, the primary way
users of Safari would interact
with a video was by clicking
on a YouTube.com link,
which would open
up the YouTube app,
but now video was a
first class member
of mobile browsers as well.
And today's iOS devices
are almost as powerful as,
if not more powerful than,
desktop computers sold in 2009.
We've talked a lot about the
<video> element at past WWDCs.
All the videos are at
developer.apple.com
or on the WWDC app you
have on your phone.
In 2010 we covered the basics
of adding a
<video> element to your web page.
In 2011 we talked about how
to take that <video> element
and add CSS and JavaScript
to make your own custom
media controllers.
And in 2012 we showed you how
to play back multiple
media elements synchronized
with one another, how to do
advanced low latency effects
with the web audio API and how
to take your JavaScript-based
controls into full screen
with the full screen API.
So what will you
learn at WWDC 2014?
You'll learn how we've narrowed
the differences between Safari
on iOS and Safari
on OS X and what
that means for your web pages.
You'll learn the best way
to stream adaptive media
on your websites,
how to use less power
when playing back video and how
to coordinate your media's
timeline with elements
in your page with a
timed metadata API.
But before we get started,
let's talk a little
bit about plug-ins.
Now how good is the
<video> element on iOS?
It is so good that whenever
I encounter a page on Safari
on OS X that insists that I
need to use the Flash plug-in
to view its content,
the first thing I try is
to switch the user agent to iPad.
Most of the time it works.
What's that all about?
Well, I know that no one here
would deliberately write a page
that insisted on using a plug-in
when HTML5 video was available.
I'm just going to assume
you've updated your iPad sites
recently, but please update
your desktop site as well.
Plug-ins have a time
and a place,
but as web standards
evolve and browsers improve,
those times are getting fewer
and the places further between,
and that's a good thing.
So speaking of browsers
improving,
let's talk about how we've
narrowed the differences
between Safari on iOS and OS X.
We've removed some
of the distinctions
between the platforms by
giving you more control
over media loading with the
preload attribute on iOS
and by allowing the
<video> element
to fully participate
in CSS layering.
But first the preload attribute.
The <video> element's preload
attribute lets page authors
control when and how their
media's data is loaded.
A preload value of "none"
instructs the browser
to preload no metadata.
A value of "metadata"
asks the browser
to only download
enough media data
to determine the media's width,
height, duration et cetera.
And a value of "auto" means
begin loading media data
sufficient to begin playback.
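To make that concrete, here is
a rough sketch of the three
values in markup; the file
name is hypothetical:

    <!-- load nothing until the user acts -->
    <video src="movie.m4v" preload="none"></video>

    <!-- load dimensions, duration, etc.; the new iOS 8 default -->
    <video src="movie.m4v" preload="metadata"></video>

    <!-- load enough media data to begin playback -->
    <video src="movie.m4v" preload="auto"></video>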
Now in the early days of
iOS, there was a lot of media
on the internet which couldn't
be played by iOS devices.
So in order to be able to
tell users whether the media
in the page was playable or not,
Safari would download enough
media data to check playability.
But in order to keep
users' data costs down,
it would ignore the
preload attribute and behave
as if it was set
to preload="none".
In 2014, unplayable web media
is much less of a problem,
so new in iOS 8, Safari will
honor two preload values:
"metadata", which is the
new default, and "none".
Why is this the right
thing to do?
Most sites will see no change in
behavior, either in the browser
or on their server, but even
loading just metadata can add
up for sites with a lot
of <video> elements.
So now the preload value
of "none" will be honored.
Now it's still true that on iOS
loading beyond metadata will
still require user interaction,
and we still believe
that this restriction is in
the user's best interest,
but it does get rid of one
frustrating distinction
between Safari on OS X and iOS.
So why is this important?
<video> elements which don't
explicitly specify a preload
of "none" will begin
emitting new events,
specifically the
"loadedmetadata" event.
Now, during development, we
came across a certain site,
which had the following
on their mobile page.
They had a <video> element with
default controls enabled that,
when it received the
"loadedmetadata" event,
would hide those controls.
And it did so to enforce users
watching their pre-roll ads.
So in iOS 7, when the
<video> element was shown,
nothing would happen.
And when they hit "play",
loading would progress,
the "loadedmetadata" event
would then fire
and the controls would hide.
In iOS 8, as soon as the
video was added to the page,
loading would begin, the
"loadedmetadata" event would fire,
and the controls would hide,
leaving the users no way
of actually playing the video.
Now how could they fix this?
What they shouldn't do is
revert to the old behavior
by adding preload="none",
that just leaves
in place the implicit assumption
that loadedmetadata means
the video has begun playing.
Instead they should
listen for the "play" event
and hide the controls when that
occurs, letting the users play.
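A minimal sketch of that fix,
assuming the element's default
controls are in use:

    var video = document.querySelector('video');
    video.addEventListener('play', function () {
        // Hide the controls only once playback has actually begun.
        video.controls = false;
    });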
So that's new in loading.
Let's talk about layering.
In previous versions of iOS, the
<video> element was implemented
as a UIView, which was
placed on top of web content.
New in iOS 8, we have integrated
the <video> element directly
as a native part of
the render tree,
just as it is in Safari on OS X.
And, as a result, the
<video> element will now fully respect
CSS layering rules.
However, there is a caveat:
websites which did not
exclusively place their video
topmost with the CSS
z-index property may see some
weird behavior.
They could have their video
appear below other layers
that it didn't appear
below before.
Or other layers appearing
transparently on top
of the video layer could
intercept touch events,
leaving the users no way
of actually playing the
video in that case either.
So please be on the lookout
for these breaking
changes in your websites.
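As a rough sketch, explicitly
layering the video topmost, and
letting touches pass through any
transparent overlay, avoids both
problems; the class name and
z-index values are hypothetical:

    video    { position: relative; z-index: 10; }
    .overlay {
        position: relative;
        z-index: 20;
        /* don't intercept touches meant for the video */
        pointer-events: none;
    }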
That was platform differences.
Now let's talk about
the best way
to add adaptive streaming
to your websites.
So today's web devices
run the gamut
from small battery-powered
mobile devices,
to desktop computers,
to big screen TVs.
And that has led to a movement
called responsive web design,
whose goal is to provide an
optimal viewing experience
across a wide variety of devices
by tailoring a page to respond
to different characteristics
of the device
on which it's running.
Now most responsive web design
concerns itself with the size
of the viewport in
which the page is shown.
But, for video, other properties
of the device are as important.
So, yes, what screen size
is available on your device,
but also what video resolution
can the device decode?
What codecs and profiles
does the device support?
And how much bandwidth
does the device's internet
connection provide?
At its most basic, a
<video> element points
to a single file
on a web server.
With only a single file,
a page author is left
with the unenviable task
of picking a single version
that will apply to
all of their viewers.
So perhaps a desktop device
with a fat internet connection
should get a high bit rate
stream, while a mobile
device on wireless
or on cellular might need a
small, lower bit rate version.
But that same device,
plugged in and on Wi-Fi,
should get the large
bit rate version, too.
None of this is easy
with a single file
sitting on your server.
Instead, this is a job for
HTTP Live Streaming, or HLS.
So HTTP Live Streaming
is a mechanism
for delivering multiple streams
in a single manifest file.
The master playlist,
or manifest,
describes the characteristics
of each substream and the URL
where that stream
can be accessed.
And the browser picks the
appropriate stream based
on the characteristics
of the current device.
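For illustration, a master
playlist is just a short text
file; the bandwidths, resolutions
and paths below are made up:

    #EXTM3U
    #EXT-X-STREAM-INF:BANDWIDTH=500000,RESOLUTION=640x360
    low/prog_index.m3u8
    #EXT-X-STREAM-INF:BANDWIDTH=2000000,RESOLUTION=1280x720
    mid/prog_index.m3u8
    #EXT-X-STREAM-INF:BANDWIDTH=8000000,RESOLUTION=1920x1080
    high/prog_index.m3u8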
Now Safari on OS X and iOS use
the AVFoundation framework
to play HLS streams so you get
the same high quality streaming
experience as the native apps.
And AVFoundation will
seamlessly switch streams
when conditions of the
network or the device change.
To show you how easy it is
to create an HLS playlist
with multiple streams of
different characteristics,
my colleague, Brent Fulgham,
will walk you through
the process.
Brent?
>> Thank you, Jer.
My name is Brent Fulgham,
and I'm also an engineer
on the WebKit Layout
and Rendering Team.
And today I wanted to show
you a few examples of HLS
and how it might make
your life better.
Now I'm sure that,
getting up here,
the first thing you thought
was, "This guy is a skater."
Right? I mean, I love it.
I film it and I had some great
video that I wanted to show off
that we filmed in Utah.
It's high-fidelity video,
beautiful cinematography,
if I do say so myself.
Wonderful, wonderful content.
And I wanted to share this
with my friends and family,
who couldn't be there that day.
So what I wanted to do
was put together a website
that would show this content.
Let me return to this.
All right, so now I have a
single <source> element playing
a video.
This is the content
that I showed you
in QuickTime Player
just a second ago.
And let's take a look at what
that would look like
for our viewers.
Great, it looks exactly
the same as what I did
in QuickTime Player,
so I'm done, right?
All my friends can look at this
and tell me how great I am?
Well, no, it turns out that a
number of people were trying
to view this with lower
resolution devices: iPhones
and iPads and things that don't
have the full pixel content
of a giant display
projection system like this.
And it turns out they
didn't even watch it
because it just took
too long to play.
Going back to the original
video, we can kind of see why
that is, it's about 150
megabytes for eight seconds
of video, and that's not
going to make many people want
to stick around and
wait for that.
So what do I do?
Well, the first thing
I would want
to do is take this
original video
and create multiple encodings
that are targeted or optimized
for different devices.
So if I have iPhones and
iPads that I want to support,
I want video streams
that are more suitable
or optimized for that.
And so you could do this
using a variety of tools.
We have iMovie; we
have Final Cut Pro.
If you're doing a lot of
these you might want to look
into Compressor, which
is a great application
for doing this.
We all have QuickTime Player
installed on our computers,
and so let me just show
you what we would do here.
In QuickTime Player,
we can export the video
in a variety of formats.
So we have 1080p, 720p, and
we have a set of presets
that are already laid out for
different types of devices.
And so in QuickTime Player
I would have to go to each
of these presets individually
and output a 1080p version
and output an iPhone 3GS
version, and so forth.
Now I'm not going
to make you wait
around while I export
these, since that's boring,
but what I will show
you is the set
of video encodings
that I wound up with.
And since I was running through
this briefly before we did this,
I have stuff here that
you don't need to see yet.
All right, so I have these video
encodings, I've created a bunch
of different versions that
support the different types
of devices that I
want to support.
I've got high resolution
for broadband users.
I've got lower resolutions
for people on cellular.
I've got a broadband
Wi-Fi version.
And so now I'm all set.
Once I've uploaded
these to the web server,
then I'm pretty much
ready to go, except I need
to make some changes
to my webpage.
So if I go back to
my web page example,
instead of just having
this one <source> element
that is giving me my
high-quality, ProRes video, I
need to add a version
that supports, say, my iPad Air.
So I have a Retina iPad.
I have a source that is a
slightly different location,
a different file encoding,
so in this case I'm
driving the broadband media,
and I'm using a CSS media
query that limits the clients
that are going to receive this
video to items that have a 1024
by 768 resolution, like
you would have on an iPad,
and a device pixel ratio of 2.
And so I say, "Okay, this is my
iPad Air and, while I'm at it,
I probably want to have
something for iPhones 5 and 5S
and maybe something for
a bunch of older stuff."
And pretty soon we have a pretty
large set of sources for us
to serve from this web server.
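A sketch of what that markup
might look like; the file names
and the exact media queries here
are illustrative:

    <video controls>
        <!-- Retina iPad: 768x1024 points at a device pixel ratio of 2 -->
        <source src="video-ipad.m4v"
                media="(device-width: 768px) and (-webkit-min-device-pixel-ratio: 2)">
        <!-- iPhone 5 and 5S -->
        <source src="video-iphone5.m4v"
                media="(device-width: 320px) and (-webkit-min-device-pixel-ratio: 2)">
        <!-- everyone else -->
        <source src="video-full.m4v">
    </video>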
Okay, so if we look at this
now and refresh the page,
what does it look like?
Well, it looks exactly the same.
I'm getting the same
stream that I had before
because I'm still connecting to
this with a high-quality, well,
a loopback network
and I'm showing it
on a giant screen
with lots of pixels.
I'm still getting
what I expected.
So I should be done now, right?
I mean I'm able to
deliver the right content
to all these different
people on different devices.
Time to go home and
put my feet up
and get the congratulatory
e-mails
from everyone, I assume, right?
Well, it turns
out my brother-in-law was
camping the weekend I posted
this and had really spotty
internet connectivity.
He was, I think, on the
Edge network or something.
So I asked him what he thought
of it, and he said, "Well,
I didn't even bother watching
it because it took too long
to download and it never
made any progress."
And I realized, well, we've
dealt with the resolution here,
but we haven't talked, at
all, about network bandwidth,
and that plays a role, as well.
Now as a web developer what
would we do in this case?
I could write some kind of
network sniffing algorithm
to try to figure out how
much bandwidth is being used
and the download rates
and so on, but it seems
like that'd be really
hard to do properly
and it would be really
easy to get wrong
and would have to be maintained.
But what about this
HLS technology
that Jer just finished
telling us about?
In theory that should
take care of everything.
Well, it seems like
a good solution.
I already have all my encoded
video here, so there's really --
I've already done the hard work
of creating the different
encodings
for the different device
types, so now all I need
to do is generate the HLS
master manifest and information
that HLS will use to
display this content.
Now to do this we need
to use the dynamic duo
of Media File Segmenter and
Variant Playlist Creator.
And these are fantastic
tools that you can download
from our website but, as you
might imagine from these names,
they have a dizzying array
of flags and entry points
that you have to provide, and so
it's very difficult to remember.
We ended up just creating a
shell script to do this for us.
And so let me just put
this up here and, okay,
and then let me give you a
minute to write that down?
And then-oh, well, that's
probably not a great idea.
How about if you come by
our lab later this week,
and we'll be happy to
give you a copy of this?
All right, so what does this
look like when we run it?
Well, what I do is-I'll run
this, make an HLS script.
I'll provide us with-I'll
feed it the input
of the various files
that we want to use,
and we process each
of the files.
And what that ends up looking
like is I have this
magic index.m3u8,
which is the master manifest
file, and I have a series
of transport streams
that have been generated
for each of my encodings.
So in this case I have a
broadband, high-bandwidth
version of this, and the script
has done the same thing for all
the different options.
So what I need to do is upload
all of this stuff to my website,
so I'd upload the
different transport streams
and my m3u8 file, and at that
point I'm basically done,
except for one change that
I need to make to my video,
I mean, to my website.
What I need to do is get rid of
all of this stuff, all of it,
and replace it with one line,
the line that I gave you a
sneak peek of at the beginning.
This is the index.m3u8,
our master manifest
file for HLS,
and let's just make
this say "best".
All right, and let's see
what that looks like.
I bet you can guess.
It looks exactly the same, but
now we're streaming this content
in an adaptive fashion, where
it will change and adapt
to the type of devices that
are being used and it'll change
and adapt to the
network conditions.
So if I were to start playing
this eight-second video
and leave the room, in theory,
the connection speed would drop,
and it would degrade to a
lower bandwidth version.
And if I were to
return to an area
where I had high bandwidth
it could then pick it back up
and return to this beautiful,
high-resolution imagery.
So I hope that
this brief demo and this example
of how simple it is on your
website will show you why we're
so excited about this
technology and why we hope
that you'll try it for
your next projects.
Thank you.
>> Thanks, Brent.
That was great.
So for more information about
how to use HTTP Live Streaming
and how specifically
to encode your videos
for all the wide
variety of iOS devices,
take a look at Tech Note 2224,
which specifies all the settings
you'd need in Compressor
to generate multiple
encodings of your video
for a variety of Apple devices.
And, also, you can download
the Variant Playlist Creator
and Media File Segmenter
Tools as part
of the HTTP Live
Streaming toolset
from developer.apple.com.
And we have a live
streaming developer page,
where you can learn
all about HLS
in webpages and in native apps.
But new in Safari
on OS X is support
for a media streaming technology
called Media Source Extensions,
or MSE.
This is an extension to
the HTML5 specification,
where a <video> element's
source is replaced
by a MediaSource object,
which requires the page
to completely control the
loading of media data.
Now MSE is primarily
intended for only the largest
of video providers, who have
large and complicated CDNs
and who need to micromanage
every aspect
of their network stack.
We built support
for MSE into Safari,
but for most websites we
don't actually recommend
that you use it.
And let's talk a little
bit about why that is.
With great power comes great
responsibility-except I remember
it being someone else.
Oh, that's right.
The MSE API will accept
raw data, demux it,
parse it into samples,
decode those samples
and queue the samples for
display, but that's it.
That's all you get.
For everything else, your
website has to do it manually.
The browser will not
fetch data for you.
It must be fetched explicitly
by your page through XHR.
The browser will not
preload metadata for you.
You have to do that yourself
to make sure playback
buffers don't run dry.
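To give a feel for how bare-bones
that is, here is a minimal sketch
of fetching one segment and
handing it to a SourceBuffer; the
segment URL and codec string are
hypothetical examples:

    var video = document.querySelector('video');
    var mediaSource = new MediaSource();
    video.src = URL.createObjectURL(mediaSource);

    mediaSource.addEventListener('sourceopen', function () {
        var buffer = mediaSource.addSourceBuffer(
            'video/mp4; codecs="avc1.42E01E, mp4a.40.2"');
        var xhr = new XMLHttpRequest();
        xhr.open('GET', 'segment0.mp4');
        xhr.responseType = 'arraybuffer';
        xhr.onload = function () {
            // The page, not the browser, hands every byte to the decoder.
            buffer.appendBuffer(xhr.response);
        };
        xhr.send();
    });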
And once you've done
these two steps,
you will have reproduced
basic video playback,
but then again the
<video> element could do that already.
For all of the benefits
of streaming media,
your page must implement
it manually.
So your page must monitor
network conditions to make sure
that your user's device can keep
up with the bit rate
that you are serving.
You also have to monitor whether
your users are dropping frames
if their hardware can't keep
up with the media
that you are serving.
And when conditions change you
have to manually switch streams
by pre-fetching and
then starting
over for a more appropriate
stream.
And you have to do all of this
without detailed information
about the current state of
the device whereas, with HLS,
it can use its detailed view
of the device's current state
to make better adaptation
decisions.
So, for example, HLS knows what
other processes might be running
on the device and knows
whether the device is
on a metered cellular
connection or on Wi-Fi.
HLS is aware of the
current battery conditions
of the device, and it knows
about the current memory
pressure the system is under.
So writing an MSE player
involves re-implementing an
entire streaming media stack in
JavaScript, whereas HLS has all
of this data available
and yet writing a player
for HLS requires a
single line of HTML.
And, what's more, MSE is
only available on OS X.
So, to reach iOS users,
you're likely to have to set
up an HLS stream anyway.
For almost every conceivable
situation, HLS is going
to be a better choice
for streaming media.
Okay, what about
cross-browser support?
HLS is supported
across all versions
of Safari on iOS and OS X.
MSE is only supported
on Safari on OS X.
The Android browser and
Android Chrome both support HLS,
but not MSE.
IE 11 supports MSE, but not HLS.
Google Chrome supports
media source extensions,
but apparently its developers
are investigating implementing
HLS on top of MSE in JavaScript.
And Firefox only supports
MSE in its nightly builds,
but they are also looking
at adding support for HLS
through an MSE implementation.
So, as you can see, the
web hasn't really settled
on a single streaming
media technology yet.
So, if you take nothing
else away,
for your Safari users, use HLS.
Okay, that was streaming,
now let's talk
about power efficiency.
At Apple we care
deeply about power.
We make devices with
simply amazing battery life.
But it's not just
about batteries.
We care about the impact our
devices have on the environment
as a whole, and that's evident
in how much performance
we can squeeze
out of a single watt
of power use.
We've done this through
a combination of hardware
and software engineering, but
the last mile is up to you.
It's easy to do this wrong and
drain your users' batteries.
And a user with a dead
battery is one that's not using
your website.
So today we're going to show
you how to minimize the amount
of power you use when
playing back video in Safari.
So, first, we're going to talk
about using fullscreen mode
and we're going to talk
about how sleep cycles
affect battery life.
But, first, fullscreen.
It may sound counterintuitive,
but going into fullscreen mode
can dramatically reduce the
amount of power your
system uses as a whole.
Apps which are hidden behind a
fullscreen browser window can go
into a low-power mode called
App Nap, and you can learn more
about App Nap specifically
at-I believe there's a session
on Thursday, something
about programming,
low-power programming
and, anyway, look it up.
It's Thursday at 10:30.
But, in addition, when
the system determines
that it doesn't have
to do compositing to
get video on screen,
it can go through a low-power
mode, but to explain how
that works we're
first going to have
to talk about pixel formats.
So every web developer
should be familiar with RGB.
The web platform is
written in RGB values,
where every pixel is broken into
a red, green and blue component,
and each component is
given eight bits of depth.
But video is different,
video is decoded
into a pixel format called YUV,
where Y is a luminance plane,
and U and V are two
color planes.
The Y plane actually encodes
the green and brightness values,
and the U and V planes
encode the blue and the red,
respectively.
And typically the Y plane
is given twice as much depth
as the U and V planes, which
is why we call it YUV 4:2:2,
or other formats, like YUV 4:1:1.
All of those describe the ratio
of the bit depths between the Y
and the U and V planes.
And we give the Y
plane more depth
because the human visual system
is much better at distinguishing
between values of green and
values of light and dark
than it is between
values of red and blue.
So if you're a mantis shrimp,
then this makes total sense
to you-if you're a mantis
shrimp who longboards, that is.
So YUV 4:2:2 only requires
about 16 bits per pixel
to encode, whereas an RGB with
an alpha channel requires 32.
And since there's typically
less variance in the U
and the V planes, they can
be compressed much more easily
than RGB values.
And this is why video
prefers to use YUV over RGB.
The side effect, though, of
all these decisions is that,
since the web platform was
written in RGB and video was
in YUV, we have to convert from
one to the other when we need
to draw on top of the video.
That's how this works.
It's called compositing, where
layers are drawn together,
top to bottom, in order
to present the actual webpage
your viewers are going to see.
So typically it works like this,
you start with coded video
frames, you decode those frames
into YUV, and then convert them
into RGB, draw your web content
on top of them and
then send them
out to the video
card to be displayed.
Simple, right?
Now if the system determines
that it can display a
video frame without having
to draw anything on top
of it, it can skip all
of these format conversion
steps and go straight
from YUV directly
to the video card.
It dramatically reduces
the amount
of power required
to display video.
It does have a few
prerequisites, though.
For one you must support
the Fullscreen API.
If you have JavaScript custom
controls, you should have
at least one control that uses the
requestFullscreen method
to bring your controls and your
video into fullscreen mode.
Black is the new black.
You should only have a black
background visible behind
your video.
And no DOM element should be
visible on top of your video
as well, and this is
tricky because elements
which have an opacity of zero
are still technically visible.
So don't hide your
controls with opacity
or at least don't only
hide them with opacity;
use "display:none" as well.
And everything that's not
currently being displayed
in fullscreen mode won't
ever be visible, so you might
as well hide it as well.
And we'll show you a quick
little snippet of CSS
that will hide all of your
non-fullscreen elements
when your video is
in fullscreen mode.
So, first, the Fullscreen API.
Now we've talked about
this at a previous session,
so for more information about
how the Fullscreen API works,
check out-I think it's
the 2011 video session.
But, really quickly, if
you just call this method
from, say, a fullscreen button
handler, it will toggle you
in and out of
fullscreen mode.
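A sketch of such a handler; the
exact WebKit-prefixed names of
this era are an assumption:

    function toggleFullscreen(video) {
        if (document.webkitFullscreenElement)
            document.webkitExitFullscreen();
        else
            video.webkitRequestFullscreen();
    }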
To hide everything that's
not in fullscreen we're going
to give you a little
snippet of CSS to use.
So, first, for a
fullscreen element,
all of its ancestors are
given a pseudo-class called
"full-screen-ancestor".
So this will select every child
of a fullscreen ancestor
that's not an ancestor itself
and is not the fullscreen
element itself, and hide it.
So just add this line
of CSS to your websites.
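The snippet is along these lines;
the exact spelling of the
WebKit-prefixed pseudo-classes is
an assumption based on the names
described above:

    :-webkit-full-screen-ancestor > :not(:-webkit-full-screen-ancestor):not(:-webkit-full-screen) {
        display: none !important;
    }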
None of the elements that
aren't part of the fullscreen
presentation will be visible
and, because they aren't visible,
they won't be in the render tree
wasting CPU cycles and memory.
Okay, so that was compositing.
Now let's talk about how video
playback affects your sleep.
Something you should
be aware of is the way
that media playback affects
your users' system sleep cycles.
When Safari plays a video it
will conditionally block the
display from sleeping
using a sleep assertion,
and it does this to avoid
the annoying behavior
of your display going to sleep
halfway through an episode
of "Orange Is the New
Black" or whatever.
Safari will only block
this sleep from happening
under certain conditions,
though.
So, the video must have an
audio track and a video track.
It has to be playing and
it must not be looping.
If any of these conditions
are not met,
we won't keep the
system from sleeping.
However, this has kind of
a dramatic failure mode.
So there was a website
we came across.
They were trying to
do something very cool
with <video> elements.
They used a full-page
<video> element as the backdrop
of their landing page and,
in order to do a fancy
CSS transition at the end
of the video, they
didn't use looping.
They had two <video> elements
that they faded between and,
even though their video was
effectively silent, it still
counted as having audio
because it had a
silent audio track.
So if you loaded this
page, walked away
from your computer and
came back in a few hours,
your battery would be completely
dead because, to Safari,
this looks like the user
is just watching a playlist
of different videos.
So how could they fix this?
Well, for one they could
strip the audio track,
the silent audio track,
out of their media.
They could also burn the fade
effect into the video itself,
the video media itself,
and use the loop property
to loop the video
over and over again.
Either one of those would
let the display sleep again.
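A sketch of the second fix, with
the fade baked into a
hypothetical file:

    <!-- No audio track, and looping, so the display may sleep. -->
    <video src="backdrop-with-fade.m4v" loop autoplay></video>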
But we have also updated
our requirements in Safari.
In addition to the
<video> element having an audio
and video track, not
looping, and playing,
it must also be visible.
That means it must be
in the foreground tab
in the visible window
and on the current space.
If the <video> element
is in a background tab
or the window is hidden, it
will let the system sleep again.
So, with these changes, even
if you do the wrong thing,
your page will still keep the
system from sleeping but only
when your page is
actually visible.
Okay, that was power efficiency.
Now let's talk about how
to use timed metadata
to coordinate events
in your page.
So what is timed metadata?
Timed metadata is data
delivered alongside your video
and your audio data where each
piece of data has a start time
and an end time that's
in the media's timeline.
"But, Jer," I can hear you
asking, "that sounds a lot
like text tracks."
And that's true.
Text tracks are one
kind of timed metadata,
but metadata isn't limited to
text and other text-like things.
You can include arbitrary
binary information;
you can include geolocation
information; you can include
images; you can include text;
you can include anything
you'd like in a metadata track
and have that be available
through the Timed Metadata API.
Now, timed metadata has been
available to native apps
in API form on iOS and OS X for
some time, but new in Safari
on iOS and OS X, it's easy to
use from JavaScript as well.
It appears in the
<video> element as a text track,
just like the caption tracks we
talked about at previous WWDCs.
These tracks will have
a kind of "metadata",
meaning they won't be
displayed by the browser.
Instead they'll be available to
your script running in the page,
and you can use the same
Text Track APIs to watch
for incoming metadata
events as text track cues.
Now a TextTrack contains
a list of cues,
each of which has a start
time and an end time.
You can add event
handlers to these objects
that will get fired as
the media timeline enters
and leaves each cue.
Now a WebKitDataCue is a
subclass of a TextTrackCue and,
because this interface
is experimental,
it has a WebKit prefix.
So this is not in
the HTML5 spec yet.
Be prepared for this
interface to change.
That said, we're pushing
to get these proposed
changes into the spec soon.
So each cue will have a type
property, which allows you
to interpret the value
property correctly.
So what does a type look like?
Well, the metadata cue type
indicates the source the
metadata came from.
So metadata can be found in
QuickTime user data atoms;
it can be found in
QuickTime metadata atoms;
it could have been inserted
by iTunes; it could be found
in the MP4 metadata
box; or, finally,
metadata can be inserted
as ID3 frames directly
into your media stream.
So these values allow you to
interpret the value property,
which looks like this.
Each value has a key and,
between the key and the type,
you can uniquely identify
the meaning of the data
property, which can
be a string, an array, a number,
an ArrayBuffer,
any JavaScript type.
And the value may optionally
have a locale so you can choose
between a variety of available
cues for the current time based
on the user's current
system locale.
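Putting those pieces together, a
cue handler might inspect fields
like these; the property names
follow the description above:

    function inspectCue(cue) {
        // cue is a WebKitDataCue; this interface is experimental.
        console.log(cue.startTime, cue.endTime); // media timeline position
        console.log(cue.type);         // which container the metadata came from
        console.log(cue.value.key);    // identifies the meaning of the data
        console.log(cue.value.data);   // string, number, array, ArrayBuffer...
        console.log(cue.value.locale); // optional
    }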
So what would you use this for?
An extremely simple example
would be displaying the title
of a song in a long-playing
audio stream.
Another example would be to
add entry and exit points
to various places in your
media stream so you can track
where the users, say,
watch through an entire
ad or skipped over it.
But because you can package
any type of binary data you
like in an HLS stream or a
media file, the possibilities
for this API are
functionally endless.
And so Brent has an amazing
demo showing what kind
of awesome things you can
do with timed metadata.
Brent?
>> Thank you.
So let's get back
to longboarding.
I have some more footage here
of a really nice day in Utah,
and we thought it would be
really neat to take advantage
of some of the metadata that
can be encoded in these videos.
Now our iDevices can already
collect a lot of things,
such as geolocation data
and, with newer devices,
we can now collect motion data.
There are a variety of
applications that you can use
that will encode other types
of data that are related
to other devices and things
that you may work with.
This can all be encoded together
and brought along as
part of the media.
And so I thought it'd
be really interesting
to see what we could
do with that.
Now, in this video, we had
some content that was embedded
in the media stream as ID3 tags,
so each contains a text
entry that's an encoded
JSON object.
And I wanted to get a feel
for what that looked like,
so I put together a page
that showed the same
longboarding video
with the metadata
displayed on the side.
And so you can see some
of the types of content
that would be in here.
Now this content was from a
specific use case, so it's going
to vary depending on where your
media comes from and what kinds
of tools are being used
to put it together.
But, in this case, we have a
speed; we have an ordered list
of skaters; we've
got a notes field.
So I thought it'd be interesting
to see what we could
do with that.
Now let me just briefly
show you what I did
to get this data to display.
Okay, so just like before,
we have a video source
that in this case is
another .m3u8 encoded video.
When the video starts, I've
added an onloadstart handler
so that, when the stream starts
playing, we can do something.
And what we do is, we need
to register an event listener
for the 'addtrack' event
so that we can know
when tracks are being
added to the streams.
So the
video will start playing
and then the metadata will
be recognized by the system,
and it'll fire this event.
When the track has been added,
I want to add another event
listener for 'cuechange'.
This is the part where WebKit
will be firing these cue change
events as this metadata is
encountered in the playback.
Very important is this, where
we set the mode to 'hidden'.
Now the tracks in the system
come through in a default state
of disabled, which means
that you will not get events.
We set it to 'hidden'
because we don't really want
to see this content,
and we don't want WebKit
to necessarily do anything
with it, but we do want
to receive events when
these cues are encountered
so that we can do
something with that.
And so, finally,
the meat of this is
in the 'cuechange' event
handler where, because I know
that this data is JSON, I was
able to just take the data cue,
which is the WebKitDataCue
object that Jer just
told us about,
and retrieve the
data portion of that
and parse it as a JSON object.
Once I did that, obviously, the
first thing I did was to take
that reconstituted JSON object
and immediately re-stringify it.
Why? Because I just wanted to
pretty-print it for the screen
and I didn't want to have to
like write anything to do that.
So JSON.stringify will do
that for you and then you end
up with something that looks
a little bit like this.
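Pulling those steps together, the
wiring might look roughly like
this; the JSON payload is
specific to this demo's stream:

    video.addEventListener('loadstart', function () {
        video.textTracks.addEventListener('addtrack', function (event) {
            var track = event.track;
            if (track.kind !== 'metadata')
                return;
            // 'hidden' suppresses rendering but still fires cue events.
            track.mode = 'hidden';
            track.addEventListener('cuechange', function () {
                for (var i = 0; i < track.activeCues.length; ++i) {
                    var payload = JSON.parse(track.activeCues[i].value.data);
                    console.log(JSON.stringify(payload, null, 2));
                }
            });
        });
    });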
And that's great, we're getting
metadata that's firing a few
times a second and we have this
information we can do something
with, but it's not really
that compelling an example.
It wasn't that interesting,
so I thought,
"What else could we do this?"
Well, we have a speed.
What if we modified it so
that I could show a HUD
with a speed indicator on it?
That would be kind of
cool, we could kind
of see how fast people were
going, so to do that we need
to make a few changes.
First, I'm going to
go ahead and get rid
of this brief little style
sheet I put in here just to kind
of get things on the
screen, and replace it
with a more full-featured
style sheet.
And I'm going to modify
the 'cuechange' event.
I don't need to re-stringify the
JSON object now that I have it.
All I need to do is grab the
speed out of the JSON payload
and stick it in a <div>
that I just named "speed",
and then I'm going to
go ahead and add a <div>
to hold the HUD for that.
We'll call this "AWEsome-er".
All right, so what
would this look like?
So now we have our little
HUD on the top of the screen.
And, if I begin video playback,
now we have a little
speedometer running.
So now we have live overlay
in this video playback.
We've got content that's being
written by us and added by us,
live on the web page, and
this is kind of an example
of what you can do with
this kind of information.
Well, I thought this
was kind of fun,
but we could probably do more.
There was other content
there; there was a notes field
that had information
that called out things
that they might be doing on
screen; and we had information
about the ordered
list of the skaters.
So I thought, "Well, why
don't we make a leaderboard
that showed kind of who is in
what position and maybe call
out any tricks or other
information like that."
So, in my 'cuechange'
event handler, in addition
to the speed indicator,
I want to have a method that
will show the skaters in order
and I want to have a method
that will display the tricks
that are being done on screen.
I'm not going to
go into much detail
about how this works except to
say that, to show the skaters
in order, I basically have
a first, second, third,
fourth CSS class that I
set up in the style sheet.
And I just iterate through that
list of ordered skaters and,
as the skater names come
in, I style the <div>
that has the name of that
skater in first, second,
third or fourth position.
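A rough sketch of that iteration,
assuming the ordered skater names
arrive as an array in the
payload:

    var positions = ['first', 'second', 'third', 'fourth'];
    function showSkaterOrder(skaters) {
        skaters.forEach(function (name, index) {
            // Each skater has a <div> named after him;
            // restyle it to match his current position.
            document.getElementById(name).className = positions[index];
        });
    }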
Likewise, I'm going to have
some code to show the tricks
as they come in, and here
I'm just going to add a <div>
to the page that is styled
with the "trick" style.
With a little bit of CSS
animation, we should be done.
Then let me go ahead
and add some <div>s
to hold all the skaters.
I've given the <div>s
the names of the skaters,
and that makes it easy
to set this all up.
Now let's take a look
at what that looks like.
So now I've got a HUD, I've
got the skaters in the top,
I've freeze framed a few things
and grabbed some
snapshots to show them off.
Let's go ahead and
get it started.
There we go and so now we're
getting live playback on top
of the video with the skaters.
And let me just point out that
these guys look really good,
but imagine the skill it takes
to be skating backwards filming
this, and I think you'll agree
that a lot of the magic
is happening offscreen.
So here, timed metadata
events are firing,
we're seeing speed changes,
we're seeing tricks being called
out as they're moving along,
and this is all happening
live-I'm not baking this
into the video.
I could have done that, but
then we wouldn't be able
to modify this, change
colors and whatever else.
And this is the good part here.
This guy, Alan, he saw
something, he just slid away.
Here we are, he's going
to notice something.
Danger is spotted.
He's going to bail out.
Poor Fred, ah, he hit the
pothole, had to get off.
He's now out of the race.
He's got an X and he falls away.
And our last skater ends,
and we get the positions
of the skaters.
And so that's just a brief
example of what you can do
with these kinds of metadata
events and a little bit of CSS
and JavaScript on top
of the live video.
I hope you guys understand
why we think this is
such an exciting technology,
and I can't wait to see what all
of you will do with
this in the future.
Thank you very much
and, back to you, Jer.
>> Thanks, Brent.
That was some amazing
backwards cinematography there.
So let's sum up what you
guys have heard today.
Video in Safari on iOS and OS
X are now closer in behavior
by supporting the
preload attribute on iOS
and allowing the <video> element
to fully participate
in CSS layering.
You've also seen how to use HLS
to add adaptive streaming
support to your pages,
and you've learned how
to improve your users'
power efficiency
when playing back video
in your pages.
And you've also seen how
to use timed metadata
to coordinate events in your
page with your media's timeline.
So, for more information, please
contact our Evangelism Team
and see the Safari for
Developers documentation
on developer.apple.com.
And don't forget about the
Apple Developer Forums.
Other sessions you
might be interested in:
"Harnessing Metadata in
Audiovisual Media" later today,
"Writing Energy Efficient
Code", parts one and two,
will happen on Wednesday,
and stop by the "Designing
Responsive Web Experiences"
for more information on
responsive web design.
And that's it.
Have a great WWDC.