WWDC2003 Session 714
Transcript
Kind: captions
Language: en
this is session number 714 large-scale webcasting macworld keynote case study and our goal today is to try and give you guys an idea of what it takes to be able to execute very large-scale internet webcasting live and video on demand and the case study uses the macworld keynote which is arguably one of if not the largest live internet streaming event and as a matter of fact every time we do one it gets bigger and bigger and bigger to show you probably the main reason we do a webcast of the event I'll give you two numbers when we have Steve in a hall he's got five to six thousand people who are sitting there getting two hours of a direct marketing message from Steve probably one of the best marketers around but while there are five to six thousand people there in the room we've got over a hundred thousand unique viewers who are watching via the Internet live and then over the next seven days we have probably more than 500,000 additional unique viewers so it's over 600,000 people who get to see two hours of Steve Jobs delivering a very precise marketing message about Apple's new products that's a very powerful thing
as you can imagine so today to speak
about this we have clark smith and ryan
lynch from the apple quicktime
operations group these guys are the two
that are majorly responsible for doing
the keynote preparation and execution in
addition a little bit later on we'll
have Bill Weihl the CTO from Akamai Technologies here to talk about how Akamai works with us before during and then after the keynote to pull one of these things off so let me turn it over to Ryan and we can start one thing when we have questions it's
really important that you step up to the
microphone we have simultaneous
translation happening in the back of the
room and if we don't have a clear
version of your voice going over the
microphone they can't translate okay
thanks Dennis so we're going to talk today about the process behind the keynote and what we do at Apple to prepare for the impending doom of
going live on the Internet so we're
going to start with what we do behind
the scenes with network requirements and
planning for the whole event all the way
up to the event and afterwards and about three months ahead of time before we actually start we deal with the network and Clark is going to talk a little bit to that point in order to provision a
network for the kind of load that the
keynote would require you have to plan
ahead and decide well what other events
might be happening that day for akamai
Akamai is a content distribution network
that serves many many clients Apple is
an important client obviously but if
there's a large news event or something
that takes place on the same day as our
keynote we need to make sure that
there's going to be enough room for us
to still get our data out on the web so
we try to estimate two months in advance
how much data we think we're
going to need and we try to negotiate
with them to see how much they can give
us so we go into traffic expectations
based on the location the event is taking place if it's in San Francisco Japan is going to be more involved because it's still within their day range if it's the European event Japan is not going to be a player and probably neither is California but if it's in New York there's a good chance that California will come on later in the stream so based on the amount of data available to us the bandwidth available to us the expectation we have of the amount of promotion that's going to happen for that event and so on we start determining our bit rates we normally do four bit rates and we do a 28 k for audio only
users which has to run well below 28 k
because we have to be able to build a
buffer up and then 56k users are a very
strong you know subscriber to our event
so we have to make sure that we actually
hit somewhere around 37 k as our target somewhere around 37 we even like to get below that sometimes and that gives the 56k users enough headroom I mean most 56k users aren't getting much over 42
anyway so we have to make sure that they
have some buffer
and then 100k users are generally from Europe and on dual ISDN and the benefit of 100k also is it gives you a pretty darn good stream and if we do have to roll down we'll talk about how we roll down later but if we have to roll down to 100 k you still get a pretty good stream
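Here is a rough sketch of that bitrate planning, purely illustrative: the nominal connection speeds come from the talk, but the uniform headroom factor and the helper name are assumptions of mine.

```python
# Illustrative only: pick encode targets with headroom below each nominal
# connection speed so the player can build up a buffer (e.g. ~37 kbps for
# 56k modem users, as described in the talk). The 0.66 factor is an assumption.
NOMINAL_KBPS = {"28k audio only": 28, "56k modem": 56, "100k dual ISDN": 100, "300k broadband": 300}

def encode_target(nominal_kbps: int, headroom: float = 0.66) -> int:
    """Encode well below the nominal rate, since real throughput runs lower."""
    return round(nominal_kbps * headroom)

for name, rate in NOMINAL_KBPS.items():
    print(f"{name}: encode at ~{encode_target(rate)} kbps")
```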
so the next step is we provision the Akamai entry points and ports and that's just a process of basically requesting from Akamai the entry points that we're going to begin streaming to specific IP addresses and specific ports and then we
have a back-up plan and the back-up plan
is always to have an additional connection that is also as close to flawless as possible so that if somehow something happens to that initial connection or something happens to that Akamai entry point we have an immediate solution for rolling over to another live stream that will look as close as possible to identical to the Akamai entry point that we serve as the primary so the first thing we do is we start to gather
our hardware and software because we're
working in a colocation facility which
is basically all rack mounted the Xserves are actually wonderful for this solution we have a group of five or six Xserves right next to within feet of the Akamai entry point we have an mpeg-2 recorder which is a digital disk recorder it gives you random access playback so immediately when the event is over and we're given approval we begin a replay of the event which is pretty much undetectable from the original and that continues to loop for a significant period of time until we can get the data posted for video on demand we use processing amplifiers you could probably go to the next graphic here and we'll show we use processing amplifiers in analog because there isn't too much control in DV so it's best for us to maintain analog television in the analog realm the satellite receiver output
is composite ntsc so if we choose that
we want to make some increases in
saturation or increases in brightness
and so on on the NTSC signal before it
goes into the DV converters we really
benefit from that we keep the mpeg-2
recorder on the satellite side of the
processing amplifier because we don't
want to make alterations to the NTSC
signal and record it and then run it
through the processing amplifier and
have a different result this way the
mpeg-2 looks identical to the satellite
receiver and if we make adjustments to the processing amplifier the replay gets those adjustments exactly the way the satellite receiver feed was being served then we run that output from the
processing amplifier to a bunch of
analog distribution amplifiers and what
those distribution amplifiers do is
provide a composite ntsc signal that is
absolutely identical to each one of the DV converters you notice we're using 5 DV converters you certainly could use a firewire network and feed each one of those Xserves from that firewire network but we like complete parallel independence between all those machines we want to make absolutely certain that if one machine goes down all of the other machines are completely unaffected if somehow the firewire network went down it certainly would take everything down and Steve would not be happy and when Steve's not happy alright I think you're up all
right so we're going to talk after we
get that all set up we want to test and
make sure that the hardware actually
functions because on occasion you can
have a little you know somebody can plug
the firewire plug in backwards and fry
the whole thing it's kind of fun and so
we test all those make sure we have
enough disk space available for the
actual event if you're going to record to disk we did a two and a quarter hour program on Monday and that took roughly 250 megabytes to record to disk at 250 k so it was pretty sizable
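As a back-of-the-envelope check on that figure (not Apple's tooling, just the arithmetic), the recording size follows directly from duration and bit rate:

```python
# Rough disk-space estimate for recording a stream: duration x bit rate.
def recording_size_mb(hours: float, kbps: float) -> float:
    kilobits = hours * 3600 * kbps
    return kilobits / 8 / 1024  # kilobits -> kilobytes -> megabytes

# A 2.25 hour program at 250 kbps is roughly 250 MB, matching the number above.
print(round(recording_size_mb(2.25, 250)))  # ~247
```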
and if you have a full event and you don't actually delete anything on your disk you can run into trouble and then
we take those settings that we figured out okay we need to target about 37 k for 56 and all that and set those up and test to make sure that they're optimal for the bit rates you want to target and that they don't swing too much because if you swing then you might max out your bandwidth and somebody on the network will get a really crappy experience and you don't want that to happen then we take these settings and we copy them to every single machine so they have an identical set up on every box just in case we need to pull one out or something like that and like we had on the previous graphic there is a spare machine standing by just in case we have to do something drastic then we
get to the team so we have everything set up we've got our settings we're ready to go we need to get people together about three months ahead of time and start talking about what's happening so that we're ready to go for the day we have Akamai engineering who's provisioning the network and taking care of everything to make sure that we're ready and we have our 16 gigabits or whatever we're going to use for the event and then we have
the quicktime engineering team so
sometimes we'll run into a little funny
bug or something and we wonder what's going on with this we have them kind of do some packet tracing and make
sure that everything's functioning
correctly for us and then we have the PR team which is actually really important
they're the ones who give us the final
go to say we can actually use VOD and be
ready because if we show something we
don't have legal clearance for we don't
want anybody coming back at us and
saying hey no no it wouldn't be too fun
to get into legal over that and then we
go down to the television contractors we
have them actually on site recording the
video so that we have a tape we can go
back and do an encode after the fact
so then we have the preparation step we have everything ready to go we need to export SDP files from QuickTime Broadcaster and these SDP files session description protocol basically tell the QuickTime Player where to go to get the stream how to get the stream and everything about it and since we're using Akamai we have something called ARLs Akamai's resource locators they're essentially URLs that point to that content the SDP files and have a little metadata along with it so it helps Akamai figure out how to get the stream and how to deliver the stream
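For reference, this is the general shape of an SDP file like the ones QuickTime Broadcaster exports; the addresses, ports, and payload details below are made up for illustration, not taken from the keynote setup.

```python
# Illustrative SDP (session description protocol) contents: the player reads
# the connection address, ports and payload formats to know how to pull the
# stream. All values here are invented for the example.
EXAMPLE_SDP = """\
v=0
o=- 0 0 IN IP4 192.0.2.10
s=Keynote Webcast
c=IN IP4 192.0.2.10
t=0 0
m=audio 5432 RTP/AVP 96
a=rtpmap:96 mpeg4-generic/32000/2
m=video 5434 RTP/AVP 97
a=rtpmap:97 MP4V-ES/90000
"""

for line in EXAMPLE_SDP.splitlines():
    field, _, value = line.partition("=")
    print(f"{field}: {value}")
```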
and we have of course reference movies now reference movies are kind of interesting they allow you to have one link and point to as many other movies as you want so basically what we do is we filter based upon bitrate so if you claim that you have a 300k connection and you can get that we say good and we deliver the 300 k to you so that's all based upon the configuration of your machine so what we do is we have a web page that has one link one reference movie for all of the links and that movie points to lo and behold four different bit rates and based upon your settings we deliver that to you so we have the ARL that says here you go and hands you the stream and
you've got that you notice at the bottom
that we have a spliced reference movie and
what that means is we have a graphic
that we actually paste on top of the
stream so that there's a visual
experience for the 28k people because
you've got to provide something you've got to make it somewhat aesthetically pleasing
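Conceptually a reference movie is a small lookup from the viewer's configured connection speed to the right stream. A hypothetical sketch of that selection follows (the thresholds and ARLs below are invented; QuickTime does this inside the player from the reference movie's alternate entries):

```python
# Hypothetical reference-movie selection: one public link resolves to different
# streams based on the viewer's configured connection speed.
ALTERNATES = [
    # (minimum configured speed in kbps, ARL pointing at that stream's SDP)
    (300, "rtsp://stream.example.com/keynote_300k.sdp"),
    (100, "rtsp://stream.example.com/keynote_100k.sdp"),
    (56,  "rtsp://stream.example.com/keynote_56k.sdp"),
    (0,   "rtsp://stream.example.com/keynote_28k_audio.sdp"),
]

def pick_stream(connection_speed_kbps: int) -> str:
    """Return the highest-bitrate alternate the viewer's speed setting allows."""
    for min_speed, url in ALTERNATES:
        if connection_speed_kbps >= min_speed:
            return url
    return ALTERNATES[-1][1]

print(pick_stream(56))  # 56k viewers get the stream encoded around 37 kbps
```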
and then this comes into the fact that we have the velvet rope system which is first point on the Akamai network and what that does is you know we have a hard stop at some bandwidth limit so for a typical keynote we have maybe 16 gigabits and that's all we have and if we go beyond that like I said before we're going to have some really terrible experiences for everybody across the network so when we determine that we're ramping too quickly that we have too much bandwidth being consumed too quickly we want to make sure that we can cut that off and we slip back down to a 100k stream as the top bitrate available or 56k depending upon how many people we have on the network and of course those web pages point to reference movies and again those point to individual sdp files to grab the stream
could you speak into the mic please yes essentially that's what Akamai
with the first point system does is it
swaps out which web page is delivered to
which region okay I was just curious whether it's the reference movie you swap or the web page do you change both well the web page points to a different reference movie because those four web pages were already made yes so you just change the URL or you change it in the back end that's all we're doing is swapping out the web page essentially and each one has a different link on it to a different reference movie
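A rough sketch of the velvet-rope decision described here, not Akamai's First Point implementation: watch each region's consumption against its share of the hard cap and, when a region gets close, swap in the page whose reference movie tops out at a lower bitrate. All thresholds, budgets and page names below are made up.

```python
# Illustrative velvet-rope logic: which page (and therefore top bitrate) a
# region is offered, based on how much of its bandwidth budget is in use.
REGION_BUDGET_MBPS = {"north_america": 8000, "europe": 4000, "asia_pacific": 4000}

PAGE_BY_TOP_BITRATE = {
    300: "keynote_all_rates.html",
    100: "keynote_up_to_100k.html",
    56:  "keynote_up_to_56k.html",
}

def page_for_region(region: str, current_mbps: float) -> str:
    """Pick the page to hand out for a region given its current consumption."""
    utilization = current_mbps / REGION_BUDGET_MBPS[region]
    if utilization < 0.70:
        return PAGE_BY_TOP_BITRATE[300]  # plenty of headroom, offer everything
    if utilization < 0.90:
        return PAGE_BY_TOP_BITRATE[100]  # ramping fast, drop the 300k option
    return PAGE_BY_TOP_BITRATE[56]       # nearly maxed out, lowest rates only

print(page_for_region("asia_pacific", 3700))  # keynote_up_to_56k.html
```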
now if you are like Apple and you have a whole bunch of
zealots on campus and they all want to
watch this stream at the same time you
don't want to consume a ton of bandwidth
on your local network so if you have you
know a 600 k stream and you have a
hundred people wanting to watch that's
60 megabits that they're going to be
pulling down across your pipe so
you're not going to be able to provide
anybody outside of your company anything
and that's going to be really terrible
experience so what we do at Apple is we
have an internal web page again using
the Akamai first point system we say
anybody that's coming from our network
gets routed to a specific web page for
internal traffic and that has a link for
a multicast and in short what a
multicast does is it takes one stream
and sends it out across the network so
if you have a hundred people watching it
you have 600 k of bandwidth being
consumed if you have one person watching
it you have 600 k so it's a great way to
conserve bandwidth on a local network
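The arithmetic behind that comparison, restating the numbers from the talk:

```python
# Unicast vs. multicast load on the local network, using the figures above.
STREAM_KBPS = 600
VIEWERS = 100

unicast_kbps = STREAM_KBPS * VIEWERS  # every viewer pulls their own copy
multicast_kbps = STREAM_KBPS          # one copy serves everyone on the LAN

print(f"unicast: {unicast_kbps / 1000} Mbps")  # 60.0 Mbps across the pipe
print(f"multicast: {multicast_kbps} kbps")     # 600 kbps no matter how many watch
```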
and now I'm going to turn it back to
Clark to talk about testing so we test
in a lot of different levels throughout
the process but there are specific
things that can go wrong if you're
creating all these little files you're
moving things around so many times
something can go wrong an example would
be annotations sometimes the annotations change somehow and those are actually generated in the SDP file so at the very beginning of the chain if that SDP file has the wrong annotation or a wrong copyright or something that can translate throughout the entire group of reference movies that we just outlined we're very attentive to data rates and swings I spend hours staring at command J you know the get info or the movie properties function in quicktime pro because I can watch how full the buffer is whether the buffer is seeing any kind of sawtoothing from some kind of network
interference I can see whether or not
there's any packet loss but I can
actually see where the packet loss is
taking place sometimes you can't tell why there's something wrong with a stream but you can actually visually see why you're dropping audio at certain times or why sometimes the video will come across absolutely perfectly but because there's a sawtooth in the buffer it'll drop actually just a moment of audio and you're kind of wondering where that's coming from a lot of it's very visual the importance of swing is if you're provisioned for a certain number of gigabits and you have you know 10 or 15 or 80 thousand different viewers watching and you have a slight swing from a camera zooming in on Steve or a pan on the stage where almost every pixel is changing it can be a very dramatic swing in Akamai's data rate so the higher the data rate that you're playing with the higher the percentage so it's going to be obviously much more reflected in a 300k versus a 56k but it's extremely important that you pay attention to where that median point is and how far you're allowing it to swing
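To see why a small per-stream swing matters at this scale, a quick back-of-the-envelope sketch; the viewer count and swing percentage are my own illustrative numbers:

```python
# Aggregate effect of a per-stream bitrate swing across many viewers.
def aggregate_swing_gbps(viewers: int, bitrate_kbps: float, swing_fraction: float) -> float:
    """Extra aggregate bandwidth (Gbps) if every stream swings above its target."""
    return viewers * bitrate_kbps * swing_fraction / 1_000_000

# 80,000 viewers on a 300k stream: a 10% swing during a pan adds ~2.4 Gbps
# on top of what was provisioned; the same swing on the 56k stream is far less.
print(aggregate_swing_gbps(80_000, 300, 0.10))  # 2.4
print(aggregate_swing_gbps(80_000, 56, 0.10))   # ~0.45
```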
reference movies all have to point to
the right things you saw just a moment
ago in Ryan's diagram how each reference
movie points to a different data rate
one of the best ways to test that is
actually go into your quicktime
preferences and just keep changing your
data rates down in the user defined connection speed portion of your preferences and just make
sure that the right data rate comes up
for that one so we go through this over
and over again because sometimes we make
reference movies more than once and especially as you're getting closer and
you're hurrying more and more you really
have to be very attentive to what
mistakes you might make the splice
graphic is another potential problem in
that when you first create a splice graphic using one of our tools sometimes it's not layered exactly properly you have to go into the properties and actually set the layering for that image if the image is layered incorrectly you'll actually get a little Q and the splice will actually be kept behind it and being that that's all
tied into all the other reference movies
it's just critical that you test it and
last of all is packet loss different
places will get different results even
though we have such a wonderful content distribution provider there are
certain places that will get different
results and so what we try to do is we
try to call people we try to have some
contact with people in New York and
other places and see how they're doing
all they have to do is open the
properties that I was just talking about
and tell me you know what's the
percentage of packet loss that you're
seeing are you having any kind of dropping issues are you able to get the stream whenever you want it those are all the kinds of tests that we make certain we're very attentive to
the day of the event we establish a conference call with many of the members of the team that Ryan outlined but predominantly Akamai and they'll be standing by and paying attention to their network before we actually push anything live I'm also on the phone with our encoding partner at the satellite site and I sit there and I watch all of his streams and then at about a half an hour before the event we restart all the broadcasters just to make sure there aren't any memory leaks or anything I mean when you're dealing with such a high-profile event even though it probably doesn't make any difference you do a lot of little things because you just don't want to take any risks so we restart the broadcasters and then shortly before the event we roll the mpeg-2 and a backup Beta SP machine
we're glad we did that this last event because there was a disruption in the NTSC signal into the mpeg-2 machine and that disruption was long enough to actually stop the disk recorder and that was catastrophic because we had no backup to go to so what we ended up doing is using the Beta SP through the exact same you know proc amp that we were talking about before and at the same time we recorded to the mpeg-2 machine and no one was the wiser it was just absolutely seamless but had we not had yet another layer of backup we wouldn't have been in a position to make ourselves look okay yes the question is are you using the old Sorenson or is it the new mpeg-4 from apple well the broadcaster product is broadcaster one point zero one quicktime broadcaster you can get it for free on our site right but you're using yours now in the end are we using Sorenson no absolutely we're using ours and we're using mpeg-4 as you mentioned and AAC for audio AAC is just a wonderful audio codec and for streaming mpeg-4 is just a delight to work with really and we've come from a chain of codecs beforehand so believe me we have the scar tissue so
then the actual step of pushing the event live takes place when we start to see pans of the audience that means that our television content provider from the location is likely to stay on programming from that
point on so when we start seeing that
with the lower third I call our web team
and I tell our web team to go ahead and
push the page live and then the
excitement really begins because there
are hundreds of people thousands of
people out there just waiting to get on
they want to be the first ones on
because they think they're going to get
the highest data rate if they're the
first ones and to a degree they're correct
for most events so they jump on and you
just see the ramping and you know we watch we have our own
monitoring tool but so does Akamai back
at their NOC and you can just see the
ramping of people getting on and it's
very exciting really because you have to
know when to pull off you know what
the highest point is and you have to
make sure that everybody is going to be
able to get on it's stressing their
network to death it's a wonderful test
but at the same time it's frightening so
there's a point when we say okay we're
going to roll off you know some region
that might be particularly stressed like
maybe Australia which seems to be first
but there are certain areas that we
sometimes have to roll off and then
we'll come back to them we'll come back
to them after things moderate a little
bit we'll come back to the 300 k so if
you ever have an experience where you
come on and you get 100k
and for some reason leave and come back you may get a 300k if we have the bandwidth to provide it to you so we watch the consumption
and then the last thing is actually probably the most fun for me is we report the numbers and there's usually someone usually Dennis underneath the stage or at the back of the stage and just before Steve goes on someone will ask you know what are our numbers how many people you know and last time I think if any of you watched the Vatican was on and that was just so thrilling to have Steve go out and say yeah and even the Vatican's watching you know you guys are our friends so those kinds of little reports just add a little something they add a little realism to the keynote and we all benefit from that and so I think now we're going to bring on the great and good Bill Weihl of Akamai no post event I'm sorry I've got a post-event slide sorry hold on sorry folks so then we
do the rebroadcast I spoke about that earlier we hit that mpeg-2 machine and we start playing back as soon as we're told that we're good to do so from Apple PR and at the same time a tape is being rushed to the encoding partner who will capture that entire tape and do a pre process on it and probably within 12 hours have content for us to start posting which we do later when that data starts to arrive and we slowly start replacing the links that were originally being posted to on that webpage with the same reference movies so the VOD is posted the pages are changed so we're no longer doing the velvet rope and then the last thing is usually days later we do an analysis and the reporting to determine exactly how well we did and how many people came and whether people had a good experience we're very keen and interested in that so Bill now you're up sorry about that thanks
so I want to tell you a little bit about
what happens once the bits get handed
off to us and I'm going to start by
telling you a little bit about who we
are and what we do in terms of
webcasting both large and small scale
Apple I think these guys said put on
some of the largest live events on the
Internet in terms of the amount of
traffic number of viewers and so on but
we do events from small to large we do
on demand as well as live so I'll say a bit about that I'll talk about the
partnership between Apple and Akamai in
terms of webcasting and other things and
then a little bit more detail on the keynote itself so today what is Akamai what do we let you do what do we do
for our customers fundamentally what we
do is allow our customers to extend
their ebusiness infrastructure web and
streaming out to the edge of the
Internet close to the users this gives
better performance better reliability
better scalability in many cases better
security and greater control
over that infrastructure and over the
delivery of the applications the content
and so on across the Internet today if you're delivering content over the web or via streaming you can control what you do in your data center you can control your first mile but at that point your control ends and from there to the end user there's a collection of networks that are going to take your bits and transport them you have no control over that and if someone out there screws up if the UUNET backbone goes down if the slammer worm hits and there's chaos across the internet there's nothing you can do about it
okay we give a lot of control all the
way out to the edge very very close to
the end user we are the leading delivery
service for streaming web content and
web applications and the real value is improving the end user experience of the people who are watching the streams or accessing your application or your content over the web and often at
lower cost to the provider we've moved
from what we did in the early days which
you could think of as simply static
content delivery we've moved to really
now doing distributed computing
so when I talk about applications I'm
really talking about the ability for one
of our customers people we used to call
content providers but it's not just
content people are doing business on the
net and they're reaching out to their
users with an application be it a configurator for a car or an e-commerce
site or any other kind of interactive
application on a site we're providing
the ability for pieces of those
applications to run on the edge close to
the end user which allows you to give
sub second response time to users around
the world wherever they are regardless
of what's going on in the internet we
have almost a thousand recurring
customers today we've got about 145
million in annual revenue we've survived the dot-com boom and bust and spent the
last two years building a very solid
customer base in terms of large
enterprise customers as well as smaller
companies and I think we're well poised
for growth over the next few years we
also have a lot of intellectual property surrounding the way we deliver content and applications over the net ok so what do customers get well
in contrast to doing things centrally or
from a small number of locations what we
allow our customers to do is to
guarantee that they're always on that
when their users want to access their
webcasts their video on demand their
website that they're going to get one
hundred percent availability that people
will be able to get to it we've got a
presence across the globe we have over
15,000 servers in over 1,100 different
networks so we don't run a network
ourselves we don't lay fiber we don't buy or rent capacity on pipes
rather we put machines in different
networks and then rely on those networks
to provide the actual transmission
capacity we're in 68 countries and our
network folks like to say that we're on
six continents we're working on number
seven but so far the demand seems small
you know sufficiently small that we just can't justify it also i'm not sure our servers are rated to work at you know minus 30 Celsius
we allow our customers to scale whatever
it is they're doing to enormous loads
Ryan and Clark talked about the fact that we've got to worry about for an event like the Steve Jobs keynotes at macworld how much capacity is there how much bandwidth is available how much are they going to need the reason for that is that it's one of if not the largest event that happens on the internet today so we on a regular
basis serve 25 to 30 gigabits of traffic
every day on the website at our peak we've served 23 billion hits in one day that was in the last couple months we typically serve between 15 and 20 billion hits okay reminds me of the old McDonald's signs about how many billions they had served this is every day so most people if they're going
to do an event and they want to use us
we don't even really need to know that
an event is happening because most
people aren't Steve Jobs and so when
they do a webcast maybe they serve a few hundred megabits or maybe it's a gigabit but when Steve's going to do 16 gigabits or in fact if they just allowed everyone who wanted to get the 300 k stream he might do 30 or 50 gigabits well we don't have infinite
capacity but we have enormous capacity
so with somebody like Steve we've got to
worry about what are the limits and make
sure we don't exceed them most people
they do things and they don't even
inform us and we handle it fine the
other major thing that we give to our
customers is a great deal of information
about what is happening both on the
Internet in general and to their content
their streams their web applications
across the internet so information for
example about the number of streams that
are being delivered the number of
streams of different bit rates the total
amount of bandwidth furthermore broken
down by geographical area so you can see
how many people are watching in
Australia how much bandwidth are we
pushing in Australia and so on
just to say a little bit more about our platform as I said we've got over 15,000 servers we're in over 1,100 different networks and this is really a range of networks across the board from hosting and access providers to you know companies that provide data center space colo space and connectivity to companies that are providing sites and streams and so on access providers that are providing dial-up or broadband or other access to end users as well as tier 1 backbones and of course more and more broadband access of one form or another so we have machines in the same network as on the order of seventy eighty percent of the end users on the internet which means that if a user wants to get a stream there's one of our machines very close by that can serve that stream to
that user okay same thing for web
content or for web applications and
that's been one of the basic premises of
our company from the beginning is that
it's vital to be near the end user in
terms of performance and reliability and
in terms of the scalability of the
system that is as there are more users
as there are more eyeballs we'll have more machines near those users and the system
as a whole will scale with the user base
okay so now let me talk a little bit about how live streaming works the stuff on the left Ryan and Clark talked about the actual video signal needs to be captured and it needs to be sent to an encoder which is going to produce then a digital stream in a certain bit rate that stream is then sent so first you get it from the camera through satellite and other mechanisms through the encoder from there it goes to what we call an entry point now we've got a stream of packets entering our network the akamai cloud represented here with the four circles the entry points themselves actually are fault tolerant with mechanisms as they discussed to allow that path to failover to a different one should that entry point for example go down or in fact it might still be up but the path between it and the encoder might be congested might have been fine when you started but at a certain point the quality there will degrade so you want to make sure you're talking to an entry point where you can send packets with very minimal loss from there we send that
stream to a set of what we call
reflectors we have many of these
scattered around the internet typically
in major backbones with very good connectivity so the stream is basically being replicated this is all for now I'm talking about live streaming okay so this is for a keynote you've got one packet stream going from the encoder to the entry point and then from there that same stream is being replicated essentially a separate unicast to each of those reflectors we can't use network level multicast because these aren't on the same network and you can't do multicast really across the Internet in that way so this is an overlay multicast if you will the idea with the reflectors is then from there if the user wants a stream he'll contact an edge server and let's say in this case it's the middle one he'll say I want to get the steve jobs keynote and the ARL that was mentioned when the user hands us that it tells us how to actually get that what the appropriate port is and so on to make sure we get the right stream from the right entry point they'll say I want the steve jobs keynote that edge server then subscribes to that stream from one or more reflectors and packets
will start to flow now you might have a situation where packets start to flow
but then some of them get lost because
the internet while it's amazingly
reliable for such a large and
essentially decentralized system in terms of how it's managed packet loss
happens all the time congestion happens
it appears and then disappears how long
it lasts depends on the length of the
flows that go through a congested link
so in this case we're showing those
packets all grayed out because in fact
those four packets got lost the edge
server if there's congestion when it's pulling from a single reflector will then start to pull from more and it will pull from enough to guarantee that it gets a complete copy of the stream so in this case it was seeing congestion on the initial stream that it got from one reflector so it subscribed to the same stream from another reflector
it manages to get some of the packets
but still not all so it will subscribe
to yet another so we will pull multiple
copies to an edge server as needed to
guarantee that that edge server gets a
complete copy of the stream in the early
days of the system we just sent multiple
copies to every edge server doing sort
of blind forward error correction but as
you can imagine this is expensive and
we've built a system now that is much
more adaptive and responds to the
conditions of the network between the
reflectors and the edge servers to do
that adaptively and only pull as many copies as are needed
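A small sketch of that idea, pulling the same stream from additional reflectors until every packet is covered; this is purely illustrative, not Akamai's actual protocol or data structures:

```python
# Illustrative: an edge server fills gaps in a live stream by subscribing to
# more reflectors until it has every packet (identified by sequence number).
def merge_reflector_copies(expected_seqs, reflector_copies):
    """Take packets from reflectors one at a time until nothing is missing."""
    assembled = {}
    for packets in reflector_copies:          # each dict: packets from one reflector
        missing = [s for s in expected_seqs if s not in assembled]
        if not missing:
            break                             # complete copy, stop subscribing
        for seq in missing:
            if seq in packets:
                assembled[seq] = packets[seq]
    return assembled

# Reflector 1 lost packets 2 and 3; reflector 2 recovers 2; reflector 3 recovers 3.
r1 = {0: b"a", 1: b"b", 4: b"e"}
r2 = {0: b"a", 1: b"b", 2: b"c", 4: b"e"}
r3 = {2: b"c", 3: b"d"}
print(sorted(merge_reflector_copies(range(5), [r1, r2, r3])))  # [0, 1, 2, 3, 4]
```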
the other thing I should say about this process is that the end user gets the stream from that edge server and gets a very high quality stream but one of the key things that isn't shown here is the mapping process that
decides which edge server an end user will talk to to pull the stream and
that is one of the key pieces of
technology that our system is built on
both on the website and the streaming
side to monitor the entire internet on a
real-time basis and then make mapping
decisions every 10 seconds that
determine among other things for a given
end user when he wants some piece of
content be it a stream or web content
which edge server he should talk to ok
and we choose an edge server that is
lightly loaded that's likely to have the
content that's up that's always a good thing and where the path between the end user and that server is uncongested so it can deliver high quality the goal is to deliver that content whatever
sort it is to that user quickly and reliably
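A toy version of that mapping decision; the fields and the scoring are invented for illustration, while the real system measures the internet continuously and remaps every 10 seconds as described:

```python
# Toy edge-server mapping: prefer a server that is up, lightly loaded, likely
# to already have the content, and on an uncongested path to the user.
from dataclasses import dataclass

@dataclass
class EdgeServer:
    name: str
    is_up: bool
    load: float        # 0.0 (idle) .. 1.0 (saturated)
    has_content: bool  # already serving or caching this stream
    path_loss: float   # measured loss toward the user's network, 0.0 .. 1.0

def choose_edge(servers):
    candidates = [s for s in servers if s.is_up]
    # Lower score is better: penalize load and path loss, reward a warm cache.
    return min(candidates, key=lambda s: s.load + 10 * s.path_loss - (0.2 if s.has_content else 0.0))

servers = [
    EdgeServer("tokyo-1", True, 0.35, True, 0.001),
    EdgeServer("tokyo-2", True, 0.90, True, 0.001),
    EdgeServer("sanjose-7", True, 0.20, False, 0.04),
]
print(choose_edge(servers).name)  # tokyo-1: lightly loaded, warm, clean path
```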
a couple other things I want to mention here about what we do to ensure quality and good performance and so
on
we do something that we call pre
bursting so you could imagine that when
a user connects to an edge server and
says I want the steve jobs keynote well
a subscription goes from there to one of
the reflector nodes or perhaps more than
one and if those reflector nodes are not
currently getting the stream from the
entry point they'll subscribe from there
and then the stream will start to flow
but in a normal situation you might simply start to send the stream at the speed at the data rate of that stream okay so if it's 300k you start sending packets at approximately 300 K or whatever the actual bitrate of the stream is which means there's going
to be significant latency until the user
has built up a buffer in the player
before it actually starts playing what
we do across the network which fits well
with the instant on feature of the current quicktime system is we do
pre bursting from the entry point
through the set reflectors all the way
to the edge machine so when an edge
machine or a set reflector subscribes to
a stream we will send the data at eight
times the actual bitrate to build up a
buffer very quickly close to the user
and then the player itself does that from the edge server it will pull as fast as it can to fill up the buffer initially and then from there it'll pull at more or less the normal bitrate
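A tiny sketch of what the 8x burst buys you; the buffer length here is an example figure of mine, not one quoted in the talk:

```python
# Time to fill a player-side buffer at the stream's own rate vs. pre-bursted at 8x.
def seconds_to_fill(buffer_seconds: float, bitrate_kbps: float, delivery_multiple: float) -> float:
    buffer_kilobits = buffer_seconds * bitrate_kbps
    return buffer_kilobits / (bitrate_kbps * delivery_multiple)

# Example: a 5 second buffer on a 300k stream.
print(seconds_to_fill(5, 300, 1.0))  # 5.0 s of waiting at the normal rate
print(seconds_to_fill(5, 300, 8.0))  # 0.625 s when pre-bursted at 8x
```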
the other thing that we do to try to ensure high quality is you might
think that the best thing to do in terms
of mapping a user is simply to map a
user to the nearest edge server that has good connectivity between it
and the user in fact because of the
characteristics of the stream servers
that run on the edge servers and the
dynamics of building the system like
this and running many different streams
many different kinds of content over it
you can get much better quality by being
much more judicious about what streams
you serve from where so we use something
we call block maps which rather than essentially spreading the load for a given stream in
some sense over the whole network we
will restrict the regions that it is
mapped to and when I say region I mean
data center we will restrict it somewhat
so that for example we would rather
serve the same stream out of a small
number of servers and some other stream
out of other servers than just mix them
all over the place okay we can get
better quality that way we then monitor
the whole system on a regular basis and
continuously certainly for an event like
this but in general we monitor the whole
system to watch a number of different
metrics on the performance and quality
of the stream for example what's the
actual bit rate in terms of packets that
are delivered on time because you can
deliver a packet but if it's too late
and the player throws it away because it's arrived too late then it's not
useful the number of packets the
percentage of packets that are delivered
on time to the player in the end that's
an important metric how much thinning
takes place between the server and the
end user is another important metric
because the user might have a 300k
stream but there might be enough
congestion on that path that it starts
thinning and actually delivering a much
much lower bit rate stream how long does
it take to connect and get started is
another metric you want people when they
connect to not sit there for 20 or 30
seconds looking at you know something
and waiting for the stream to start you
want it to start in a few seconds three
or four or five seconds at most if possible and then how much rebuffering and other things like that take place during the playing of the stream
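A minimal sketch of one of those metrics, on-time packet delivery; the deadline model is simplified and just captures the point that a packet arriving after the player needed it is as useless as a lost one:

```python
# Fraction of packets that arrived before the player needed them.
def on_time_delivery(arrivals):
    """arrivals: (arrival_time, playback_deadline) pairs; a lost packet can be
    represented with arrival_time = float('inf')."""
    if not arrivals:
        return 1.0
    on_time = sum(1 for arrived, deadline in arrivals if arrived <= deadline)
    return on_time / len(arrivals)

packets = [(0.10, 0.50), (0.40, 0.55), (0.70, 0.60), (float("inf"), 0.65)]
print(on_time_delivery(packets))  # 0.5: one late packet and one lost packet
```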
all of the different technologies that we put in place in the network are derived really in part from measuring all of those metrics and trying to understand when there are issues when there are problems what's causing those and then developing mechanisms like pre bursting like block maps that will allow us to get much much better quality and continue to deliver a high quality experience to the end user okay so
let me talk about Apple and Akamai Apple
has been one of our major customers since the early days of the company the company actually started in 1998 our
first paying customer was early 1999 and
Apple has been a major customer in fact
it was an investor in the early days as
well Akamai is Apple's platform for the
QuickTime streaming network we've done
since 1999 over a thousand live and
on-demand events and we've done nine
live steve jobs keynote addresses and
those have been every time the biggest
event to date on the internet steve is
the rock star of the internet I think
there's not much question about
that we you know there are movie
trailers Lord of the Rings and many
others Tomb Raider and lots of others
and you can go look if you haven't and see what's on the quicktime streaming network and those streams are coming over us we also do a number
of other things for Apple we are
delivering the itunes music store
both the actual music downloads so when
you download a song that's coming from
our network but also a lot of the data
and control information that is used to
make decisions about playlists and other things are coming through our network movie trailer downloads and software updates also come
through us we're providing web analytics
services so reporting on what's
happening on the site and what's
happening in different parts of the
world to allow the marketing folks and
other people to make decisions about what to do and then geolocation
services so for example for itunes there
are I think contractual obligations in
terms of where in the world you have to be if you want to actually download those songs
and we help ensure that those
contractual obligations are met so let
me talk a little bit about
the keynote itself what happens both
before the event and day of and I'll say
a little bit about afterwards as well we
ourselves for an event of this scale
this is not just a normal event as I
said most people most of our customers
don't do events on anything approaching
this scale and they run events all the
time and don't even let us know in
advance that it's happening large events
a gigabit several gigabits or in the case of Steve's you know 16 gigabits we
need to know about because we don't have
infinite capacity so well in advance of
the event we internally develop a
project plan who are all the people that are going to be involved from
engineering to various support groups to
the network groups to make sure that we
have adequate capacity in all the
different regions of the world that it's
needed do capacity planning understand
what capacity we have on the network for
serving quicktime and where it is and
furthermore what else is going to be
happening on the network some of that is
hard to predict two months in advance
you can't always say when we might be at
war or when some major event might
happen so we obviously have a certain
amount of headroom in terms of capacity
in our network and are prepared to deal with fairly significant bursts but 16 gigabits for Steve and another who knows what for a war and so on can
certainly add some stress and so we want
to do as much planning for that as we
can we then talk with the Apple folks about what is needed in terms of velvet rope and the idea of a velvet rope is basically to limit the total amount of bandwidth that is used think of it as a velvet rope around some fancy event and you know you got to be inside the rope to you know take part now there are many ways to think about velvet rope you could imagine simply saying I'll let people come in grab whatever bit rate stream they can get and then when I hit my limit I turn them
off or you could say well I'll just you
know I won't provide a very high bitrate
stream
because that way I can let as many
people in as possible but if you don't know how many people are going to come and you provision for a certain amount of capacity what you'd like to do is let as many people in as you can this is certainly what Apple wants to do let as many people in as you can and give as
many of them as possible the highest
bitrate they can get so the idea is that
we start out with all of the bit rates
being available to everyone and we watch
the ramp and then there's a decision
point when we hit certain thresholds to
decide to clamp down and you know not
provide access to the higher bit rate
depending on how much of the available capacity is left and that
decision the interesting thing is that
that decision is made not just globally
you don't want to say oh you know we've
got 16 gig globally and gee we're using 12
gigs so nobody should have access to the
300 k stream but rather it's done on a
regional basis and the reason for that
is that we want to serve people to give
them good quality we want to serve them
from reasonably close by so if we just
made a decision globally then it might
be that in fact the servers and the network links in Australia from our servers are maxed out but we have capacity in the u.s. so we could
conceivably serve more people in
Australia with a high bandwidth stream
or any stream from the US because there
is capacity but they're likely to get
pretty crappy service so we want to be
able to serve them from not too far away
and to do that we need to make a
decision when someone asks for the web page to say I need to get the link for the reference movie we want to decide what link to show them well what we show them will depend on where they are if they're in a region that has gotten close enough to the limit for that region say Australia or say North America or Europe then we will show them the page and the link that only allows lower bit rates but other
parts of the world might still have
access to all the bit rates and that's
done with our first point system which
was originally designed to do
essentially global load balancing for
mirrored sites but provides a number of
capabilities including the ability to
look at the IP address and make decisions based on where that address is
and make different decisions for
different regions so that's how first
point fits in and it does the the global
load balancing in general for managing
how the load is distributed across our
servers and for ensuring that for a
given region of the world we only
provide access at a given point in time
to the bit rates that that makes sense
given how much bandwidth we're currently
pushing there so we need to determine
the need for velvet rope and what the
thresholds are going to be we need to
provision first point to do the load
balancing and the load management and
then for an event like this that is so high-profile and so critical and where failure I mean as I said if Steve's unhappy you can imagine what
happens you don't want to fail okay it's
just not acceptable you don't want to
fail even for you know 30 seconds having
things drop out for 30 seconds would be
a disaster okay and not to say that we
want to fail for other customers
obviously we don't but as I said for an
event like this Apple cares enough about
what happens with the event the profile is sufficiently high that they are willing
to invest in a level of testing and a
level of attention that's paid to it to
just make sure that any contingency that
comes up will be covered so there's a
lot of testing that goes on before the
event end-to-end to make sure we can
capture a signal send it to the entry
point failover the entry points send it
through the network to the edge servers
and then get it with high quality to
users around the world the day of the
event we provide first of all automated
network management our entire network
those 15,000 servers the entry points
the reflectors the edge servers that
actually run the quicktime servers is remarkably self managing our NOC on a normal basis has on the order of four
people sitting in it watching the whole
network and that's because we have a
very extensive automated system for
monitoring lots of different aspects of
what's happening on every machine and
what's happening on the network pairs
between machines and between them and
end users and we have automated failover
at a number of different levels of the
system so that when a machine fails or a
region fails a data center goes offline
or connectivity is disrupted there's
automatic failover and remapping so that
very few users will see any impact at all from that kind of event but
there are times when something goes
wrong that needs attention and so we do
have people who are actually actively
watching so our NOC monitors on a daily basis constantly the whole network and for an event like this of this magnitude and this importance then they
are also specifically watching what's
going on with this event and in addition
we set up a situation room for an event
like this where the team that has been
assembled to put together the event and
run it is in that room watching the
network watching what's going on and
making sure that if there are any
anomalies that crop up that they get
fixed very very quickly often before any
end user notices an impact so what
do you get from all this well in January
of this year you know there's a lot of
expertise that we have for delivering
these kinds of events we provided one
hundred percent availability we provided
over 12 gigabits per second peak
delivery to almost 80,000 concurrent
users and I think the total number of
users during the event was on the order
of 100,000 and at the same time maintaining a high level of service and high quality to all our other customers as well and then in addition we provide real-time and after the fact historical reporting on what's going on with the bandwidth and the traffic that's being served