WWDC2017 Session 514

Transcript

>> Hello, welcome to our session
on error handling best practices
for HTTP live streaming.
My name is Shravya Kunamalla and
I am an AVFoundation engineer.
Let's get started.
There are a huge number of Apple
developers streaming content
using our very popular HTTP live
streaming.
Over the years, the usage has
evolved into multiple complex
delivery scenarios.
The developers are doing live
event broadcasts, prerecorded
[inaudible], and in each of
these there are possibly
multiple different media
selections, variants at
different bit rates, audio and
subtitles of different
languages.
The content itself might be
protected and there could
possibly be millions of
simultaneous viewers subscribing
to your streams.
Given the enormity, the system is
bound to run into errors.
A lot of developers and content
providers have asked us one
question in particular over the
years, what is the right thing
to do when an error happens.
And on very popular demand, we
present to you today the best
practices for handling errors on
both the app and the server side.
Most of you listening to this
talk might already know all
about HLS delivery, but let's
quickly go through the overview.
We have a master playlist, this
consists of alternate versions
of the same presentation.
In this example, there is a 6
megabit and 2 megabit video,
English and French audio,
English and French subtitles.
Each of these is called a media
playlist and has its own
[inaudible] playlist.
The media playlist consists of
segments.
In the case of live, the segment
list is updated at regular
intervals on playlist re-fetch.
The segments may be dropped off
from the beginning and new
segments are added to the end.
In case segments are protected,
the media playlist also contains
keys.
We also have session data, this
can be for example titles or
lyrics.
These are the resources that the
server is expected to deliver
and the HLS client needs for
playback.
So, what should the server do
when it's unable to deliver due
to errors?
What are the best practices for
handling both content and
delivery errors?
There are a number of iOS,
macOS, and tvOS clients
expecting resources from server.
The server should aim to deliver
the resources in time and if it
fails to do so, communicate the
right error code to AVPlayer.
This error code should clearly
convey the cause of error.
Was the request a valid request,
was it authorized, has the
server encountered an error?
Is the server incapable of
performing the request, for
example due to an unsupported
feature request?
Next, let's see the recommended
way to signal these various
errors to AVPlayer.
So, here is the list of failures
and the recommended error codes.
These are in compliance with the
standard HTTP error codes
specified in RFC7231.
If segments are protected and
AVPlayer does not have the
required authentication, send
401.
If the client doesn't have
authorization for the content,
send 403.
For all temporary resource
unavailable cases like
[inaudible], send 404.
For permanent resource
unavailability, send 410.
For all unexpected server
conditions where no other
specific message applies, send
500.
Most of the content providers or
CDNs are caching proxies which
are getting the content itself
from some encoder somewhere.
To notify of invalid response
from gateway, send 502.
If the server is down for
maintenance, overloaded, or
unavailable for any other
reason, send 503.
For gateway timeout, send 504.
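As a rough sketch, the list above
can be written down as a lookup
from failure to status code.
The DeliveryFailure type and its
case names are hypothetical and
only serve as an illustration;
the status codes are the ones
recommended in this talk.

```swift
// Hypothetical failure categories; the case names are illustrative only.
enum DeliveryFailure {
    case missingAuthentication
    case notAuthorized
    case resourceTemporarilyUnavailable
    case resourcePermanentlyUnavailable
    case unexpectedServerCondition
    case invalidGatewayResponse
    case serverUnavailable
    case gatewayTimeout

    // HTTP status code recommended in the talk for each failure.
    var statusCode: Int {
        switch self {
        case .missingAuthentication: return 401
        case .notAuthorized: return 403
        case .resourceTemporarilyUnavailable: return 404
        case .resourcePermanentlyUnavailable: return 410
        case .unexpectedServerCondition: return 500
        case .invalidGatewayResponse: return 502
        case .serverUnavailable: return 503
        case .gatewayTimeout: return 504
        }
    }
}
```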
Now, these error codes aren't
necessarily new; they have been
around for a while.
And if we look closer at these
errors there is a class of
errors that are temporary like
resource and server temporary
unavailability.
Starting with iOS 11, we now have
a way to explicitly communicate
such temporary failures to
AVPlayer by means of the GAP tag.
We mark segments as GAP by the
use of the EXT-X-GAP tag.
This can be applied to one or
more segments.
Put this in your playlist to
indicate GAP and enable AVPlayer
to make an informed decision.
On seeing this tag AVPlayer will
know that this is a temporary
failure and may decide to go to
a backup alternate or switch
down.
If nothing viable is available
in the utmost case AVPlayer will
play the available media until
we recover from the error
condition.
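To make the tag concrete, here is
a minimal sketch of a server-side
helper that writes a live media
playlist and marks unavailable
segments with EXT-X-GAP.
The Segment type, the function
name, and the segment URIs are
assumptions for illustration.
EXT-X-VERSION and other header
tags are left out for brevity and
should be set to whatever your
playlist requires.

```swift
import Foundation

// Hypothetical model of one media segment entry.
struct Segment {
    let uri: String
    let duration: Int   // whole seconds
    let isGap: Bool     // true when the segment is temporarily unavailable
}

// Renders a media playlist; gap segments are preceded by #EXT-X-GAP.
func mediaPlaylist(targetDuration: Int,
                   mediaSequence: Int,
                   segments: [Segment]) -> String {
    var lines = [
        "#EXTM3U",
        "#EXT-X-TARGETDURATION:\(targetDuration)",
        "#EXT-X-MEDIA-SEQUENCE:\(mediaSequence)",
    ]
    for segment in segments {
        if segment.isGap {
            lines.append("#EXT-X-GAP")
        }
        lines.append("#EXTINF:\(segment.duration),")
        lines.append(segment.uri)
    }
    return lines.joined(separator: "\n") + "\n"
}
```

The tag can be applied to one
segment or to several in a row by
setting isGap on each of them.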
So, going back to failures and
error codes.
For which of these errors is the
GAP tag applicable?
For 404 temporary resource
unavailability and 503 server
unavailability, always use the GAP
tag.
Keep in mind, this tag is
applicable to both live and
[inaudible] playback, but the
use case is typically the live
scenario.
Next, let's move on to HLS
specific media error cases.
On live playback, the HLS spec
specifies that the playlist
needs to be updated on regular
intervals.
If the server is unable to
update the playlist in time
according to the published
target duration, we recommend
communicating the stale playlist
to AVPlayer by sending 404.
Now returning stale playlist
itself is fine, but that leaves
the onus of identifying the
stale playlist on the AVPlayer
which it does eventually.
And on identifying that AVPlayer
will try to recover by means of
switching to other available
[inaudible] or retries.
This may be too late in some
cases leading to stalls.
Sending 404 instead will
communicate the stale playlist
to AVPlayer much more quickly.
There is another advantage here,
it would also give immediate
notification of stale playlist
to any new AVPlayer joining the
stream.
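One way a server might apply this
is sketched below.
The function name and the
staleFactor threshold are
assumptions of this sketch; the
talk only says the playlist must
be updated in time according to
the published target duration, so
pick the threshold that matches
your own packaging pipeline.

```swift
import Foundation

// Returns the status code for a live media playlist request.
// `staleFactor` is an assumption of this sketch, not a value from the talk:
// the playlist counts as stale once it has gone longer than
// staleFactor * targetDuration without an update.
func playlistStatusCode(lastUpdated: Date,
                        now: Date,
                        targetDuration: TimeInterval,
                        staleFactor: Double = 1.5) -> Int {
    let age = now.timeIntervalSince(lastUpdated)
    return age > targetDuration * staleFactor ? 404 : 200
}
```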
For unsupported features for
example, BYTE-RANGE not
supported, send 501.
For all authentication failures,
send 401.
Next, an example going through a
typical live playback scenario.
Let's say we have two video
variants, one of 6 megabit and
one of 2 megabit.
We also have the corresponding
encoder/packagers, one providing
6-megabit content and another
providing 2-megabit content to
our server.
And the server is distributing
this content to the HLS client
requesting it.
Let's say the [inaudible]
bandwidth of the app is good
enough to handle the 6-megabit
variant it goes ahead and
fetches the 6 megabit media
playlist.
Gets the response back and moves
on to fetch the first segment,
segment one.
Everything seems to be good
until now.
Then suddenly the 6-megabit
encoder or packager is down with
substantial downtime for
example.
The next time AVPlayer
re-fetches the playlist the
server now has a way to
communicate the failure to it,
GAP tag.
For this re-fetch request, we
recommend that server should now
send 200 OK and the subsequent
segments in the media playlist
should be marked as GAP.
AVPlayer on seeing this GAP tag
switches down to 2-megabit
variant media playlist and moves
on to fetch the next segment,
segment two, from the 2-megabit
variant.
With this we have switched down
smoothly and in time to avoid a
stall.
For backward compatibility, the
server should still send 404 for
any request for a segment marked
as GAP.
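The server behavior in this
example can be sketched as a
small decision function.
HLSRequest, the function name,
and the idea of tracking gap
segments by sequence number are
assumptions for illustration.

```swift
// Hypothetical request kinds seen by the server for one variant.
enum HLSRequest {
    case mediaPlaylist
    case segment(sequenceNumber: Int)
}

// The playlist re-fetch still gets 200 (its body marks the gap segments),
// while a request for a segment that is marked as GAP gets 404.
func statusCode(for request: HLSRequest, gapSegments: Set<Int>) -> Int {
    switch request {
    case .mediaPlaylist:
        return 200
    case .segment(let number):
        return gapSegments.contains(number) ? 404 : 200
    }
}
```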
Next, let's move on to failover.
What is a failover?
It is a method of protecting the
system from failure in which a
standby or backup system takes
over when the main system fails.
So, what failover can our server
have?
One viable approach is to have
redundant variants on backup
servers, have variants on
different servers with same bit
rate and include them in the
master playlist.
This will give the AVPlayer the
ability to smoothly switch over
in case of error.
Backup alternates will be tried
first before switching down.
If the server wants to
explicitly trigger a failover it
should send 404 to the playlist
request.
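As an illustration, a master
playlist with redundant variants
could be generated like this.
The Variant type, the function
name, and the example.com host
names are hypothetical.
Only the BANDWIDTH attribute is
written; a real playlist would
carry more attributes.

```swift
import Foundation

// Hypothetical description of one bit rate served from two servers.
struct Variant {
    let bandwidth: Int       // bits per second
    let primaryURI: String
    let backupURI: String
}

// Lists each bit rate twice: the primary entry first, then the backup
// entry with the same BANDWIDTH on a different server.
func masterPlaylist(variants: [Variant]) -> String {
    var lines = ["#EXTM3U"]
    for variant in variants {
        for uri in [variant.primaryURI, variant.backupURI] {
            lines.append("#EXT-X-STREAM-INF:BANDWIDTH=\(variant.bandwidth)")
            lines.append(uri)
        }
    }
    return lines.joined(separator: "\n") + "\n"
}

let playlist = masterPlaylist(variants: [
    Variant(bandwidth: 6_000_000,
            primaryURI: "https://primary.example.com/6M/prog.m3u8",
            backupURI: "https://backup.example.com/6M/prog.m3u8"),
    Variant(bandwidth: 2_000_000,
            primaryURI: "https://primary.example.com/2M/prog.m3u8",
            backupURI: "https://backup.example.com/2M/prog.m3u8"),
])
```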
To summarize, always notify the
HLS client of error with correct
error code.
Have backup playlists on
different servers to failover in
case of server failures, having
some redundancy is good.
Send 501 for unsupported
features.
And in the case of live, update
the playlist in time as
specified by the HLS spec. Prefer
the GAP tag in case of temporary
failures.
And send 404 to indicate stale
playlist.
Next, let's move on to how to
handle AVFoundation errors.
When an error occurs, the user
viewing the actual stream wants
to know two things.
First, that the error happened
and second, what caused the
error to happen.
And not all errors can be
anticipated on the server.
The AVFoundation client or app
should be written to respond
appropriately to various error
conditions originating from the
AVPlayer.
So, how can we identify the
error?
The error can be identified by
looking at AVPlayer.status and
AVPlayerItem.status.
These will change to
AVPlayerStatusFailed and
AVPlayerItemStatusFailed
respectively on error.
For the exact error that caused
the status to change to failed,
look at AVPlayerItem.error.
This describes what caused the
item to be no longer playable.
Listen to
AVPlayerItemFailedToPlayToEndTimeNotification to get
notified that the item did not
play to end.
The user info dictionary of this
notification contains an error
object that describes the
problem and can be retrieved by
AVPlayerItemFailedToPlayToEndTimeErrorKey.
Dig deeper, look at
AVPlayerItem.errorLog.
This gives the snapshot of all
the error events that happened
during the playback session.
So, what do these errors mean?
They can mean one of these four
things, network errors,
timeouts, format errors, and
live playlist update errors.
Network errors are all the 4xx
and 5xx errors that server sends
and TCP/IP, DNS errors.
After requesting a resource
there are timeouts for each
master playlist, media playlist,
media files, and keys.
And failure to get a response
within this timeout will cause
timeout errors.
Any incorrect format of the
playlist, key, or session data
will result in format errors.
And in case of live, playlist
needs to be updated according to
published target duration and
the failure to do so will cause
live playlist update errors.
What are the corresponding
AVFoundationDomain error codes?
For network errors and timeouts,
it will be
AVErrorContentIsUnavailable or
AVErrorNoLongerPlayable.
AVErrorContentIsUnavailable
indicates that the content was
never playable.
This could mean authentication
failures or authorization
failures.
AVErrorNoLongerPlayable
indicates that the content was
playable, but over the course of
time one or more errors happened
resulting in being no longer
playable.
AVErrorFailedToParse indicates
parsing failures.
AVErrorContentNotUpdated means
the playlist was not updated in
time.
Always look at the user info of
the error to get the underlying
error.
Keep in mind, this can be nested
if more than one error caused
the item to fail.
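A minimal sketch of how an app
might sort these codes and walk
the nested underlying errors is
shown here.
The PlaybackFailure enum and both
function names are hypothetical;
AVFoundationErrorDomain,
AVError.Code, and
NSUnderlyingErrorKey are the
framework's own names.

```swift
import AVFoundation

// Coarse categories based on the talk; the enum itself is hypothetical.
enum PlaybackFailure {
    case neverPlayable      // AVErrorContentIsUnavailable
    case noLongerPlayable   // AVErrorNoLongerPlayable
    case formatError        // AVErrorFailedToParse
    case liveUpdateError    // AVErrorContentNotUpdated
    case other
}

// Maps an error from the AVFoundation error domain to a category.
func classify(_ error: Error) -> PlaybackFailure {
    let nsError = error as NSError
    guard nsError.domain == AVFoundationErrorDomain,
          let code = AVError.Code(rawValue: nsError.code) else {
        return .other
    }
    switch code {
    case .contentIsUnavailable: return .neverPlayable
    case .noLongerPlayable: return .noLongerPlayable
    case .failedToParse: return .formatError
    case .contentNotUpdated: return .liveUpdateError
    default: return .other
    }
}

// Follows NSUnderlyingErrorKey to collect the error and all nested causes.
func errorChain(from error: Error) -> [NSError] {
    var chain: [NSError] = []
    var current: NSError? = error as NSError
    while let next = current {
        chain.append(next)
        current = next.userInfo[NSUnderlyingErrorKey] as? NSError
    }
    return chain
}
```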
When a new error log entry is
added to the error log,
AVPlayerItemNewErrorLogEntryNotification
is sent.
So, listen to this for immediate
notification of error.
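Listening for that notification
might look roughly like this.
The observeErrorLog function name
is hypothetical, and printing
stands in for whatever reporting
your app does.

```swift
import AVFoundation

// Registers for new error log entries on the given item and prints the
// most recent entry. Keep the returned token and pass it to
// NotificationCenter.default.removeObserver(_:) when you are done.
func observeErrorLog(of item: AVPlayerItem) -> NSObjectProtocol {
    return NotificationCenter.default.addObserver(
        forName: .AVPlayerItemNewErrorLogEntry,
        object: item,
        queue: .main
    ) { notification in
        guard let loggedItem = notification.object as? AVPlayerItem,
              let event = loggedItem.errorLog()?.events.last else {
            return
        }
        let comment = event.errorComment ?? "no comment"
        print("Error \(event.errorStatusCode) in \(event.errorDomain): \(comment)")
    }
}
```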
I would like to stress one
point here, AVPlayer will try
its best to continue playback by
means of retries and switching
to different available variants.
The AVPlayerItem.status will
change to failed only when there
is no viable variant to use to
continue playback and we have
played out whatever buffer we
have.
For all temporary errors,
AVPlayer will attempt switching
and/or retry.
If there is nothing to switch to
AVPlayer will retry for a
reasonable amount of time before
giving up.
After a given amount of time it
will attempt to switch back up
to failed variant if the network
conditions are suitable.
For permanent errors like 410 no
retries will be attempted and
AVPlayer only tries switching to
a different variant.
The permanent and temporary
error codes are in compliance
with the standard HTTP error
codes specified in RFC7231.
All session data errors are not
fatal and not ignored.
Next, let's go to a code
snippet.
To view the error, once you have
done the usual things (create
your asset, create your player
item, create a player with that
item), the first thing you should
do is add an observer to track
the status of the player.
Then add an observer to track the
status of the player item.
And here you register to listen
to
AVPlayerItemFailedToPlayToEndTimeNotification.
Once you have that and the
status of the item changes to
failed look at
AVPlayerItem.error to print out
what the error is.
This is the place where you
should add code to display
relevant messages about the
error to the user.
On getting
AVPlayerItemFailedToPlayToEndTimeNotification,
extract the error as the value of
AVPlayerItemFailedToPlayToEndTimeErrorKey
and again,
take appropriate action.
For instance, print the error or
display relevant error messages
to the user.
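The slide code is not part of
this transcript, so what follows
is only a sketch of the steps
just described, written in Swift.
The PlaybackErrorMonitor class
name is hypothetical, and the
print calls mark the places
where an app would show a
message to the user.

```swift
import AVFoundation

final class PlaybackErrorMonitor {
    let player: AVPlayer
    let item: AVPlayerItem
    private var observations: [NSKeyValueObservation] = []
    private var tokens: [NSObjectProtocol] = []

    init(url: URL) {
        // The usual setup: asset, player item, player.
        let asset = AVURLAsset(url: url)
        item = AVPlayerItem(asset: asset)
        player = AVPlayer(playerItem: item)

        // Track the status of the player.
        observations.append(player.observe(\.status, options: [.new]) { player, _ in
            if player.status == .failed {
                print("Player failed: \(String(describing: player.error))")
            }
        })

        // Track the status of the player item.
        observations.append(item.observe(\.status, options: [.new]) { item, _ in
            if item.status == .failed {
                print("Item failed: \(String(describing: item.error))")
            }
        })

        // Get notified when the item did not play to its end.
        tokens.append(NotificationCenter.default.addObserver(
            forName: .AVPlayerItemFailedToPlayToEndTime,
            object: item,
            queue: .main
        ) { notification in
            let error = notification.userInfo?[AVPlayerItemFailedToPlayToEndTimeErrorKey] as? Error
            print("Did not play to end: \(String(describing: error))")
        })
    }

    deinit {
        tokens.forEach { NotificationCenter.default.removeObserver($0) }
    }
}
```

Keep the monitor alive for as
long as playback lasts, since the
observers are removed when it is
deallocated.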
To summarize, always monitor
AVPlayer and
AVPlayerItem.status.
Listen to notifications.
AVPlayerItemFailedToPlayToEndTimeNotification tells
you when the item did not play
to end.
If you want to more actively
monitor the errors, for example
for the purpose of sending debug
info to the server for analytics,
listen to
AVPlayerItemNewErrorLogEntryNotification to know
when a new error log entry is
added.
In conclusion, when an error
occurs always take appropriate
action, don't ignore it.
Notify the user of the error and
always, always display
meaningful messages or pop-ups
when suitable.
For more information, go to the
WWDC site and use the session
number 514.
Thank you and have a great
conference.