WWDC2003 Session 714

Transcript

This is session number 714, Large-Scale Webcasting: Macworld Keynote Case Study. Our goal today is to give you an idea of what it takes to execute very large-scale Internet webcasting, both live and video on demand. The case study uses the Macworld keynote, which is arguably one of, if not the, largest live Internet streaming events every time we do one. As a matter of fact, every time we do one it gets bigger and bigger. Probably the main reason we do a webcast of the event comes down to two numbers. When we have Steve in a hall, he's got five to six thousand people sitting there getting two hours of direct marketing message from Steve, probably one of the best marketers around. But while there are five to six thousand people in the room, we've got over a hundred thousand unique viewers watching live via the Internet, and over the next seven days we probably have more than 500,000 additional unique viewers. So it's over 600,000 people who get to see two hours of Steve Jobs delivering a very precise marketing message about Apple's new products, and that's a very powerful thing, as you can imagine. To speak about this today we have Clark Smith and Ryan Lynch from the Apple QuickTime operations group; these two are primarily responsible for the keynote preparation and execution. A little later on we'll also have Bill Weihl, the CTO of Akamai Technologies, here to talk about how Akamai works with us before, during, and after the keynote to pull one of these things off. So let me turn it over to Ryan. One thing: when we have questions, it's really important that you step up to the microphone. We have simultaneous translation happening in the back of the room, and if they don't get a clear version of your voice over the microphone, they can't translate. Okay, thanks. Thanks, Dennis. So today we're going to talk
about the process behind the keynote and what we do at Apple to prepare for the impending doom of going live on the Internet. We're going to start with what we do behind the scenes, with network requirements and planning, and follow the whole event through to the day itself and afterwards. About three months ahead of time, before we actually start anything else, we deal with the network, and Clark is going to talk a little bit to that point. In order to provision a network for the kind of load the keynote requires, you have to plan ahead and ask what other events might be happening that day for Akamai. Akamai is a content distribution network that serves many, many clients. Apple is an important client, obviously, but if there's a large news event or something else on the same day as our keynote, we need to make sure there's still going to be enough room for us to get our data out on the web. So we try to estimate, two months in advance, how much data we think we're going to need, and we negotiate with them to see how much they can give us. We base our traffic expectations on the location where the event is taking place. If it's in San Francisco, Japan is going to be more involved, because the event still falls within their day. If it's a European event, Japan is not going to be a player, and probably neither is California; but if it's in New York, there's a good chance California will come on later in the stream. So based on the bandwidth available to us, our expectations, the amount of promotion we expect for that event, and so on, we start determining our bit rates. We normally do four bit rates. We do a 28k stream for audio-only users, which has to run well below 28k because the player has to be able to build up a buffer. Then 56k users are a very strong constituency for our
event, so we have to make sure we hit somewhere around 37k. That's our target, somewhere around 37k, and we sometimes like to get even below that, because most 56k users aren't getting much over 42k anyway, and that leaves them enough room for a buffer. Then 100k users are generally from Europe and on dual ISDN. The benefit of 100k is that it also gives you a pretty darn good stream, so if we have to roll down (we'll talk about how we roll down later), viewers capped at 100k still get a pretty good stream. The next step is to provision the Akamai entry points and ports. That's basically a process of requesting the specific IP addresses and ports on Akamai's network that we're going to begin streaming to. Then we have a backup plan, and the backup plan is always an additional connection that is also as close to flawless as possible, so that if something happens to the initial connection, or to that Akamai entry point, we have an immediate way to roll over to another live stream that looks as close to identical as possible to the one on the primary entry point. The first thing we do is gather our hardware and software. Because we're working in a colocation facility, which is basically all rack-mounted, Xserves are actually wonderful for this solution. We have a group of five or six Xserves within feet of the Akamai entry point. We have an MPEG-2 recorder, which is a digital disk recorder that gives you random-access playback, so immediately when the event is over and we're given approval, we begin a replay of the event that is pretty much indistinguishable from the original. That continues to loop for a significant period of time, until we can get the video-on-demand version posted.
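The 56k numbers above imply a rule of thumb: encode below what the connection really delivers, so the player has room to build a buffer. A minimal sketch of that check follows. Only the 37k target and the roughly 42k real-world throughput come from the session; the 10% minimum headroom and the function names are hypothetical assumptions for illustration.

```python
def buffer_headroom(target_kbps: float, effective_kbps: float) -> float:
    """Fraction of a viewer's real throughput left over for building a buffer."""
    if effective_kbps <= 0:
        raise ValueError("effective throughput must be positive")
    return 1.0 - target_kbps / effective_kbps


def fits_tier(target_kbps: float, effective_kbps: float,
              min_headroom: float = 0.10) -> bool:
    """True if the encoder target leaves at least `min_headroom` spare.

    The 10% default is an assumed threshold, not a figure from the talk.
    """
    return buffer_headroom(target_kbps, effective_kbps) >= min_headroom


# 56k modem tier from the talk: ~37k target against ~42k actually delivered.
print(round(buffer_headroom(37, 42), 3))
print(fits_tier(37, 42))
```

With 37k against 42k the headroom is about 12%, which is why the team likes to go even lower when it can; a target equal to the real throughput leaves no buffer at all.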
We use processing amplifiers (you can go to the next graphic here) and keep the signal analog at that stage, because there isn't much control in DV, so it's best for us to keep analog television in the analog realm. The satellite receiver outputs composite NTSC, so if we want to make some increases in saturation or brightness and so on, doing that on the NTSC signal before it goes into the DV converters really benefits us. We keep the MPEG-2 recorder on the satellite side of the processing amplifier, because we don't want to alter the NTSC signal, record it, then run it through the processing amplifier again and get a different result. This way the MPEG-2 recording is identical to the satellite receiver output, and when we replay it, any adjustments on the processing amplifier are applied exactly the way they were applied to the satellite feed. Then we run the output of the processing amplifier to a bunch of analog distribution amplifiers, which provide an absolutely identical composite NTSC signal to each of the DV converters. You'll notice we're using five DV converters. You certainly could use a FireWire network and feed each of those Xserves from it, but we like complete parallel independence between all those machines. We want to make absolutely certain that if one machine goes down, all the other machines are completely unaffected; if a FireWire network went down, it would take everything down with it, and Steve would not be happy. And when Steve's not happy... All right, I think you're up. All right, so after we get all that set up, we want to test and make sure the hardware actually functions, because on occasion somebody can plug a FireWire plug in backwards and fry the whole thing. It's kind of fun. So we test all of that and make sure we
have enough disk space available for the actual event if you're going to record to disk. We did a two-and-a-quarter-hour program on Monday, and recording it to disk at 250k took roughly 250 megabytes, so it's pretty sizable; if you record a full event without clearing anything off your disk, you can run into trouble. Then we take the settings we've worked out (we need to target about 37k for 56k, and so on), set those up, and test to make sure they're optimal for the bit rates we want to target and that they don't swing too much. If they swing, you might max out your bandwidth, and somebody on the network will get a really bad experience, and you don't want that to happen. Then we take those settings and copy them to every single machine, so every box has an identical setup in case we need to pull one out or something like that. And as you saw on the previous graphic, there's a spare machine standing by just in case we have to do something drastic. Then there's the team. Once we have everything set up and our settings ready to go, we need to get people together about three months ahead of time and start talking about what's happening, so that we're ready on the day. We have Akamai engineering, who are provisioning the network and taking care of everything to make sure we're ready with our 16 gigabits per second, or whatever we're going to use for the event. Then we have the QuickTime engineering team, so if we run into a funny bug and wonder what's going on, we can have them do some packet tracing and make sure everything's functioning correctly for us. Then the PR team is actually really important: they're the ones who give us the final go to say we can actually post the VOD, because we don't want to show anything we don't have legal clearance for.
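As a sanity check on the disk-space figure above, the size of a recording is just its duration times its data rate. A minimal sketch, assuming decimal units (1 kbit = 1,000 bits, 1 MB = 10^6 bytes) and ignoring file-format overhead; the function name is hypothetical.

```python
def recording_size_mb(duration_hours: float, bitrate_kbps: float) -> float:
    """Approximate on-disk size of a constant-bit-rate recording in megabytes.

    Decimal units, no container overhead, so treat the result as a lower bound.
    """
    seconds = duration_hours * 3600
    total_bits = seconds * bitrate_kbps * 1000
    return total_bits / 8 / 1_000_000


# The Monday test from the talk: 2.25 hours recorded at 250k.
print(recording_size_mb(2.25, 250))
```

For the 2.25-hour program at 250k this works out to about 253 MB, consistent with the roughly 250 megabytes quoted.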
Nobody wants anybody coming back at us over that; it wouldn't be fun to get into legal trouble. Then there are the television contractors, who are on site recording the video, so that we have a tape we can go back to and encode after the fact. Next is the preparation step. Once we have everything ready to go, we need to export SDP files from QuickTime Broadcaster. These SDP (Session Description Protocol) files basically tell QuickTime Player where to go to get the stream, how to get it, and everything else about it. Since we're using Akamai, we also have ARLs, Akamai Resource Locators. They're essentially URLs that point to the content, the SDP files, with a little metadata that helps Akamai figure out how to get the stream and how best to deliver it. And of course we have reference movies. Reference movies are kind of interesting: they let you have one link that points to as many other movies as you want. What we do is filter based on bit rate, so if you claim you have a 300k connection and you can get it, we deliver the 300k stream to you; it's all based on the configuration of your machine. We have a web page with one link, to one ref movie, and that movie points to, lo and behold, four different bit rates, and based on your settings we deliver the right one to you: the ARL says here you go and hands you the stream. You'll notice at the bottom that we have a spliced ref movie. That means we have a graphic pasted on top of the stream, so there's a visual experience for the 28k users, because you have to provide something and make it somewhat aesthetically pleasing.
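The bit-rate filtering a reference movie performs can be modelled as picking the highest-rate alternate whose required connection speed does not exceed the viewer's QuickTime connection-speed setting. This is a simplified sketch, not QuickTime's actual matching code; the `Alternate` type, the function name, and the `example.invalid` URLs are hypothetical stand-ins for the real ARLs.

```python
from typing import NamedTuple, Sequence


class Alternate(NamedTuple):
    min_speed_kbps: int  # minimum connection speed this alternate requires
    url: str             # hypothetical location standing in for the real ARL/SDP


def choose_alternate(alternates: Sequence[Alternate], connection_kbps: int) -> Alternate:
    """Pick the highest-rate alternate the viewer's connection setting allows.

    Falls back to the lowest-rate alternate if nothing qualifies, so every
    viewer gets something (the 28k audio-only stream plays that role here).
    """
    eligible = [a for a in alternates if a.min_speed_kbps <= connection_kbps]
    if not eligible:
        return min(alternates, key=lambda a: a.min_speed_kbps)
    return max(eligible, key=lambda a: a.min_speed_kbps)


# The four tiers from the talk; the URLs are placeholders.
KEYNOTE_ALTERNATES = [
    Alternate(28, "rtsp://example.invalid/keynote_28k.sdp"),
    Alternate(56, "rtsp://example.invalid/keynote_56k.sdp"),
    Alternate(100, "rtsp://example.invalid/keynote_100k.sdp"),
    Alternate(300, "rtsp://example.invalid/keynote_300k.sdp"),
]

for speed in (14, 56, 128, 300, 1000):
    print(speed, choose_alternate(KEYNOTE_ALTERNATES, speed).url)
```

Looping over a list of speeds like this mirrors the manual test of changing the connection-speed preference and checking which stream comes up.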
Then there is the velvet rope system, which uses FirstPoint on the Akamai network. What it does is this: we have a hard stop at some bandwidth limit. For a typical keynote we have maybe 16 gigabits per second, and that's all we have; if we go beyond that, like I said before, everybody across the network is going to have a terrible experience. So when we hit a peak, or even if we're ramping too quickly and too much bandwidth is being consumed too fast, we want to be able to cut that off, and we slip back down to 100k as the top bit rate available, or to 56k, depending on how many people we have on the network. Of course those web pages point to ref movies, which in turn point to the individual SDP files to grab the stream. [Audience question, off-mic.] Could you speak into the mic, please? Yes, essentially that's what Akamai's FirstPoint system does: it swaps out which web page is delivered to which region. Q: Okay, I was just curious whether it's the reference movie you swap or the web page. Do you change both? A: Well, the web page points to a different reference movie, because those FirstPoint web pages are already made. Q: So you just change the URL, or you change it in the back end? A: All we're doing is swapping out the web page, essentially, and each page has a different link on it to a different ref movie. Now, if you're like Apple and you have a whole bunch of zealots on campus who all want to watch the stream at the same time, you don't want to consume a ton of bandwidth on your local network. If you have a 600k stream and a hundred people wanting to watch, that's 60 megabits per second they're going to be pulling down across your pipe, so you won't be able to provide anybody outside your company anything, and that's a really terrible experience. So what we do at Apple is have an internal web page, again using the Akamai FirstPoint system: anybody coming from our network gets routed to a dedicated internal web page.
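The internal-multicast argument is simple arithmetic: with unicast, each viewer adds a full stream to the shared pipe, while with multicast one copy serves everyone on the segment. A minimal sketch, assuming a single shared network segment (real multicast sends one copy per link that has subscribers); the function names are hypothetical.

```python
def unicast_kbps(viewers: int, stream_kbps: float) -> float:
    """Every viewer pulls their own copy of the stream across the pipe."""
    return viewers * stream_kbps


def multicast_kbps(viewers: int, stream_kbps: float) -> float:
    """One copy on the (assumed single) segment, no matter how many watch."""
    return stream_kbps if viewers > 0 else 0.0


# A hundred people on campus watching the 600k stream.
print(unicast_kbps(100, 600) / 1000, "Mbit/s unicast")
print(multicast_kbps(100, 600), "kbit/s multicast")
```

That is the 60 megabits per second for a hundred viewers of a 600k stream, versus a single 600k copy with multicast.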
That page, for internal traffic, has a link to a multicast. In short, a multicast takes one stream and sends it out across the network, so if you have a hundred people watching, you have 600k of bandwidth being consumed, and if you have one person watching, you have 600k. It's a great way to conserve bandwidth on a local network. Now I'm going to turn it back to Clark to talk about testing. We test at a lot of different levels throughout the process, but there are specific things that can go wrong when you're creating all these little files and moving things around so many times. One example is annotations: sometimes the annotations change somehow. They're generated in the SDP file, at the very beginning of the chain, so if that SDP file has the wrong name or the wrong copyright or something, the error propagates through the entire group of reference movies we just outlined. We're very attentive to data rates and swings. I spend hours staring at Command-J, the Get Info or Get Properties function in QuickTime Pro, because I can watch how full the buffer is and whether it's seeing any kind of sawtoothing from network interference. I can see whether there's any packet loss, and I can actually see where the packet loss is taking place. Sometimes you can't tell why something is wrong with a stream, but you can see visually why you're dropping audio at certain times: sometimes the video comes across absolutely perfectly, but because there's a sawtooth in the buffer, it drops just a moment of audio, and you're left wondering where that's coming from. A lot of it is very visual. The importance of swing is this: if you're provisioned for a certain number of gigabits and you have 10, 15, or 80 thousand different viewers watching, even a slight swing in the data rate is multiplied across all of them.
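Why a "slight" swing matters: on a live event every viewer sees the same zoom or pan at roughly the same moment, so per-stream swings add up across the whole audience. A back-of-the-envelope sketch; the 50,000-viewer audience and the 10% swing are hypothetical numbers chosen for illustration, not figures from the talk, and the function names are likewise hypothetical.

```python
def aggregate_gbps(viewers: int, per_stream_kbps: float) -> float:
    """Total egress if every viewer pulls one stream at this rate (1 Gbit = 10**6 kbit)."""
    return viewers * per_stream_kbps / 1_000_000


def swing_gbps(viewers: int, target_kbps: float, swing_fraction: float) -> float:
    """Extra aggregate bandwidth when every stream runs `swing_fraction` above target.

    Live viewers watch the same frames together, so the swings are correlated
    and add up instead of averaging out.
    """
    return aggregate_gbps(viewers, target_kbps * swing_fraction)


# Hypothetical: 50,000 viewers, each stream swinging 10% above its target.
print(swing_gbps(50_000, 300, 0.10))  # 300k tier
print(swing_gbps(50_000, 37, 0.10))   # 56k tier (37k target)
```

The same 10% swing costs about 1.5 Gbit/s at 300k but only about 0.19 Gbit/s at the 37k target, which is the point about higher data rates showing swings much more.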
A camera zooming in on Steve, or a pan across the stage where almost every pixel is changing, can cause a very dramatic swing in Akamai's data rate. The higher the data rate you're working with, the bigger the swing, so it's obviously going to show up much more in a 300k stream than in a 56k one, but either way it's extremely important to pay attention to where that median point is and how far you allow the rate to move from it. The reference movies all have to point to the right things. You saw a moment ago in Ryan's diagram how each reference movie points to a different data rate. One of the best ways to test that is to go into your QuickTime preferences, keep changing the user-defined connection speed, and make sure the right data rate comes up for each one. We go through this over and over again, because sometimes we make reference movies more than once, and especially as you get closer to the event and you're hurrying more and more, you really have to be very attentive to the mistakes you might make. The splice graphic is another potential problem: when you first create a splice graphic using one of our tools, sometimes it's not layered properly, and you have to go into the properties and set the layering for that image. If the image is layered incorrectly, you'll get a glitch and the splice graphic will end up hidden behind the stream, and since that's tied into all the other reference movies, it's critical that you test it. Last of all is packet loss. Even though we have such a wonderful content distribution provider, different places will get different results, so we try to keep in contact with people in New York and other places and see how they're doing. All they have to do is open the properties window I was just talking about and read me what they see.
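For reference, a loss percentage like the one read off the properties window can be derived from RTP sequence numbers by comparing how many packets were expected with how many arrived, in the spirit of RFC 3550. This sketch is not QuickTime's implementation; the function names are hypothetical, and it assumes reordering stays small (well under 32,768 packets) and that a capture does not begin with a late packet from before a wrap.

```python
def extend_sequence_numbers(seqs: list[int]) -> list[int]:
    """Unwrap 16-bit RTP sequence numbers into an ever-increasing integer space."""
    cycles = 0
    prev = None
    extended = []
    for s in seqs:
        if prev is not None:
            if prev - s > 0x8000:
                # Large backwards jump: the counter wrapped from 65535 to 0.
                cycles += 1
            elif s - prev > 0x8000:
                # Large forward jump: a late packet from before the last wrap.
                extended.append((cycles - 1) * 0x10000 + s)
                continue
        extended.append(cycles * 0x10000 + s)
        prev = s
    return extended


def loss_percent(seqs: list[int]) -> float:
    """Percentage of packets missing between the lowest and highest seen."""
    extended = extend_sequence_numbers(seqs)
    if not extended:
        return 0.0
    expected = max(extended) - min(extended) + 1
    received = len(set(extended))  # duplicates are counted once
    return 100.0 * (expected - received) / expected


print(loss_percent([1, 2, 3, 5]))              # one packet missing out of five
print(loss_percent([65534, 65535, 1, 2]))      # loss across the 16-bit wrap
print(loss_percent([65535, 0, 65534, 1]))      # reordered around the wrap, no loss
```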
What percentage of packet loss are you seeing? Are you having any dropouts? Are you able to get the stream whenever you want it? Those are the kinds of tests we make certain we're very attentive to. On the day of the event, we establish a conference call with many of the team members Ryan outlined, predominantly Akamai, who will be standing by and watching their network before we actually push anything live. I'm also on the phone with our encoding partner at the satellite site, and I sit there and watch all of his streams. About half an hour before the event, we restart all the Broadcasters, just to make sure there aren't any memory leaks or anything. When you're dealing with such a high-profile event, even though it probably doesn't make any difference, you do a lot of little things because you just don't want to take any risks. So we restart the Broadcasters, and shortly before the event we roll the MPEG-2 recorder and a backup Betacam SP machine. We're glad we did that at this last event, because there was a disruption in the NTSC signal into the MPEG-2 machine, and that disruption was long enough to actually stop the disk recorder. That would have been catastrophic if we'd had no backup to go to. What we ended up doing was playing the Betacam SP tape through the exact same proc amp we talked about before while recording it again to the MPEG-2 machine, and no one was the wiser; it was absolutely seamless. But had we not had yet another layer of backup, we would have been hard-pressed to make ourselves look okay. Q: Are you using the old Sorenson codec, or is it the new MPEG-4 from Apple? A: Well, the broadcaster product is QuickTime Broadcaster 1.0.1. Q: The free one on your site, right? But in the end, you're using Sorenson? A: No.
Absolutely, we're using ours, and we're using MPEG-4, as you mentioned, with AAC for audio. AAC is just a wonderful audio codec, and for streaming, MPEG-4 is a delight to work with. We've come through a chain of codecs before this, so believe me, we have the scar tissue. Then comes actually pushing the event live. When we start to see pans of the audience, that means our television content provider at the location is likely to stay on program from that point on. So when we start seeing that, with the lower third, I call our web team and tell them to go ahead and push the page live. Then the excitement really begins, because there are hundreds, thousands of people out there waiting to get on. They want to be the first ones on because they think they'll get the highest data rate if they are, and for most events they're correct to a degree. So they jump on and you see the ramp. We have our own monitoring tool, but so does Akamai back at their NOC, and you can watch people piling on. It's very exciting, really, because you have to know when to pull back and where the highest point is, and you have to make sure everybody is going to be able to get on. It's stressing their network to death; it's a wonderful test, but at the same time it's frightening. So there's a point when we say, okay, we're going to roll off some region that might be particularly stressed, like Australia, which seems to be the first; there are certain areas we sometimes have to roll off, and then we come back to them after things moderate a little and bring back the 300k. So if you ever come on and get 100k, and then for some reason leave and come back, you may get the 300k if we have the bandwidth to provide it to you. So we watch the consumption.
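Rolling a stressed region down to 100k and later bringing the 300k back can be sketched as a per-region decision with hysteresis. This is a hypothetical model: in the talk the actual switch is FirstPoint serving a different web page (and therefore a different ref movie) to a region, and the per-region budget, the 90%/70% thresholds, and the function name are all assumptions for illustration.

```python
TOP_TIERS_KBPS = (56, 100, 300)  # top bit rates a region's page can offer, lowest first


def next_top_tier(current_kbps: int, usage_gbps: float, budget_gbps: float,
                  high: float = 0.90, low: float = 0.70) -> int:
    """Decide which top bit rate a region's web page should offer next.

    Steps down one tier when usage reaches `high` of the region's budget and back
    up one tier when usage falls to `low` or below. The gap between the two
    thresholds (hysteresis) keeps the page from flapping between tiers.
    """
    if budget_gbps <= 0:
        raise ValueError("budget must be positive")
    i = TOP_TIERS_KBPS.index(current_kbps)
    load = usage_gbps / budget_gbps
    if load >= high and i > 0:
        return TOP_TIERS_KBPS[i - 1]   # roll the region down
    if load <= low and i < len(TOP_TIERS_KBPS) - 1:
        return TOP_TIERS_KBPS[i + 1]   # bring the higher tier back
    return current_kbps                # inside the band: leave the page alone


# Hypothetical region with a 10 Gbit/s share of the overall budget.
print(next_top_tier(300, 9.5, 10))  # ramping hard: roll down
print(next_top_tier(100, 8.0, 10))  # in the band: hold
print(next_top_tier(100, 6.0, 10))  # things moderated: back to 300k
```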
The last thing is actually probably the most fun for me: reporting the numbers. There's usually someone, usually Dennis, underneath the stage or backstage, and just before Steve goes on, someone will ask what our numbers are, how many people are watching. Last time, if any of you watched, the Vatican was on, and it was just so thrilling to have Steve go out and say, yeah, even the Vatican's watching. Those little reports just add a little something, a little realism, to the keynote, and we all benefit from that. And so I think now we're going to bring on the great and good Bill Weihl of Akamai. Oh, no, post-event, I'm sorry. I've got post-event. Hold on, sorry, folks. So then we do the rebroadcast I spoke about earlier: we hit play on that MPEG-2 machine and start playing back as soon as Apple PR tells us we're good to do so. At the same time, a tape is being rushed to the encoding partner, who will capture the entire tape and preprocess it, and probably within 12 hours we have content to start posting. When that data starts to arrive, we gradually replace the links that the web page and the same reference movies originally pointed to, so the VOD is posted, the pages are changed, and we're no longer doing the velvet rope. The last thing, usually days later, is an analysis and report to determine exactly how well we did, how many people came, and whether people had a good experience; we're very keen on that. So, Bill, now you're up. Sorry about that. Thanks. So I want to tell you a little bit about what happens once the bits get handed off to us. I'm going to start by telling you a little about who we are and what we do in terms of webcasting, both large and small scale. Apple, as these guys said,
puts on some of the largest live events on the Internet in terms of the amount of traffic, number of viewers, and so on, but we do events from small to large, and on demand as well as live, so I'll say a bit about that. I'll talk about the partnership between Apple and Akamai in terms of webcasting and other things, and then go into a little more detail on the keynote itself. So today, what is Akamai? What do we let our customers do? Fundamentally, we allow our customers to extend their e-business infrastructure, web and streaming, out to the edge of the Internet, close to the users. This gives better performance, better reliability, better scalability, in many cases better security, and greater control over that infrastructure and over the delivery of applications and content across the Internet. Today, if you're delivering content over the web or via streaming, you can control what you do in your data center and you can control your first mile, but at that point your control ends. From there to the end user, there's a collection of networks that are going to take your bits and transport them, and you have no control over that. If someone out there screws up, if the UUNET backbone goes down, if the Slammer worm hits and there's chaos across the Internet, there's nothing you can do about it. We give you a lot of control all the way out to the edge, very close to the end user. We are the leading delivery service for streaming, web content, and web applications, and the real value is improving the experience of the end users who are watching the streams or accessing your application or content over the web, often at lower cost to the provider. We've moved from what we did in the early days, which you could think of as simply static content delivery, to really doing distributed computing now. When I talk about applications, I'm really talking about the ability for one of our customers, people we
used to call content providers (but it's not just content: people are doing business on the Net and reaching out to their users with an application, whether it's a configurator for a car, an e-commerce site, or any other kind of interactive application on a site), to run pieces of those applications on the edge, close to the end user. That lets you give subsecond response time to users around the world, wherever they are, regardless of what's going on in the Internet. We have almost a thousand recurring customers today and about $145 million in annual revenue. We survived the dot-com boom and bust and spent the last two years building a very solid customer base of large enterprise customers as well as smaller companies, and I think we're well poised for growth over the next few years. We also have a lot of intellectual property surrounding the way we deliver content and applications over the Net. So what do customers get? In contrast to doing things centrally or from a small number of locations, we allow our customers to be always on: when their users want to access their webcasts, their video on demand, or their website, they get 100 percent availability, and people will be able to get to it. We have a presence across the globe, with over 15,000 servers in over 1,100 different networks. We don't run a network ourselves; we don't lay fiber, and we don't buy fiber or rent capacity on pipes. Rather, we put machines in different networks and rely on those networks to provide the actual transmission capacity. We're in 68 countries, and our network folks like to say we're on six continents. We're working on number seven, but so far the demand seems sufficiently small that we just can't justify it, and I'm not sure our servers are rated to work at minus 30 Celsius. We allow our customers to scale whatever it is they're doing to
enormous loads. Ryan and Clark talked about the fact that for an event like the Steve Jobs keynotes at Macworld, we've got to worry about how much capacity there is, how much bandwidth is available, and how much they're going to need. The reason is that it's one of, if not the, largest events that happens on the Internet today. On a regular basis we serve 25 to 30 gigabits per second of traffic every day. At our peak we've served 23 billion hits in one day; that was in the last couple of months, and we typically serve between 15 and 20 billion hits a day. It reminds me of the old McDonald's signs about how many billions they had served, except this is every day. So for most people, if they're going to do an event and they want to use us, we don't even really need to know that the event is happening, because most people aren't Steve Jobs. When they do a webcast, maybe they serve a few hundred megabits per second, or maybe a gigabit. But when Steve's going to do 16 gigabits per second (or in fact, if they just allowed everyone who wanted it to get the 300k stream, he might do 30 or 50 gigabits per second), well, we don't have infinite capacity. We have enormous capacity, but with somebody like Steve we've got to think about what the limits are and make sure we don't exceed them. Most people just do things without even informing us, and we handle it fine. The other major thing we give our customers is a great deal of information about what is happening, both on the Internet in general and to their content, their streams, and their web applications across the Internet: for example, the number of streams being delivered, the number of streams at different bit rates, and the total amount of bandwidth, further broken down by geographical area, so you can see how many people are watching in Australia, how much bandwidth we're pushing in Australia, and so on. To say a little more about our platform: as I said, we've got over 15,000 servers, and we're in over 1,100
different networks. This is really a range of networks across the board: hosting providers and companies that provide data center space, colo space, and connectivity to the companies providing sites and streams; access providers that provide dial-up, broadband, or other access to end users; tier 1 backbones; and of course more and more broadband access of one form or another. We have machines in the same network as on the order of 70 to 80 percent of the end users on the Internet, which means that if a user wants to get a stream, there's one of our machines very close by that can serve that stream to that user. The same goes for web content and web applications. One of the basic premises of our company from the beginning is that it's vital to be near the end user, for performance, for reliability, and for the scalability of the system: as there are more users, more eyeballs, we'll have more machines near those users, and the system as a whole will scale with the user base. So let me talk a little about how live streaming works. On the left is the part Ryan and Clark talked about: the actual video signal needs to be captured and sent to an encoder, which produces a digital stream at a certain bit rate. So first you get the signal from the camera, through satellite and other mechanisms, to the encoder, and from there it goes to what we call an entry point. Now we've got a stream of packets entering our network, the Akamai cloud represented here by the four circles. The entry points themselves are fault-tolerant, with mechanisms, as Ryan and Clark discussed, that allow that path to fail over to a different entry point should this one go down. Or the entry point might still be up, but the path between it and the encoder might be congested, even if it was fine when you started.
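The point above, that an entry point can be up yet still unusable because the path from the encoder is congested, suggests a failover rule that looks at measured loss and not just reachability. A hypothetical sketch: Akamai's actual failover mechanism isn't described in the talk, and the `EntryPoint` type, the function name, and the 1% loss threshold are assumptions.

```python
from dataclasses import dataclass


@dataclass
class EntryPoint:
    name: str          # hypothetical label such as "primary" or "backup"
    reachable: bool    # is the entry point up at all?
    loss_rate: float   # measured fraction of packets lost on the encoder path


def pick_entry_point(points: list[EntryPoint], current: EntryPoint,
                     max_loss: float = 0.01) -> EntryPoint:
    """Stay on the current entry point while it is up and clean; otherwise fail over.

    Among the other reachable entry points, the one with the lowest measured
    loss wins. If none is reachable, keep sending to the current one.
    """
    if current.reachable and current.loss_rate <= max_loss:
        return current
    alternatives = [p for p in points if p is not current and p.reachable]
    if not alternatives:
        return current
    return min(alternatives, key=lambda p: p.loss_rate)


primary = EntryPoint("primary", True, 0.001)
backup = EntryPoint("backup", True, 0.002)
print(pick_entry_point([primary, backup], primary).name)   # healthy: stay put

congested = EntryPoint("primary", True, 0.05)               # up, but lossy path
print(pick_entry_point([congested, backup], congested).name)
```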
but at a certain point the quality may degrade, so you want to make sure you're talking to an entry point where you can send packets with very minimal loss. From there we send that stream to a set of what we call reflectors. We have many of these scattered around the Internet, typically in major backbones with very good connectivity, so the stream is basically being replicated. For now I'm talking about live streaming. So for a keynote, you've got one packet stream going from the encoder to the entry point, and from there that same stream is replicated as essentially a separate unicast to each of those reflectors. We can't use network-level multicast, because these aren't on the same network and you can't really do multicast across the Internet in that way, so this is an overlay multicast, if you will. The idea with the reflectors is that if a user wants a stream, he'll contact an edge server (let's say in this case it's the middle one) and say, I want to get the Steve Jobs keynote. The ARL that was mentioned, when the user hands it to us, tells us how to actually get that stream, what the appropriate port is, and so on, to make sure we get the right stream from the right entry point. That edge server then subscribes to that stream from one or more reflectors, and packets start to flow. Now you might have a situation where packets start to flow but then some of them get lost, because the Internet, while it's amazingly reliable for such a large and essentially decentralized system in terms of how it's managed, has packet loss all the time. Congestion happens: it appears and then disappears, and how long it lasts depends on the length of the flows that go through a congested link. So in this case we're showing those packets grayed out, because in fact those four packets got lost. The edge server, if there's congestion when it's pulling from a single
reflector, will then start to pull from more, and it will pull from enough of them to guarantee that it gets a complete copy of the stream. So in this case it has seen congestion on the initial stream that it got from one reflector, so it subscribes to the same stream from another reflector. It manages to get some of the packets but still not all, so it subscribes to yet another. We will pull multiple copies to an edge server as needed to guarantee that the edge server gets a complete copy of the stream. In the early days of the system we just sent multiple copies to every edge server, doing a sort of blind forward error correction, but as you can imagine this is expensive, and we've now built a system that is much more adaptive: it responds to the conditions of the network between the reflectors and the edge servers and only pulls as many copies as are needed. The end user then gets the stream from that edge server, and gets a very high quality stream. One of the key things that isn't shown here is the mapping process that decides which edge server an end user will talk to in order to pull the stream. That is one of the key pieces of technology our system is built on, on both the web and streaming sides: we monitor the entire Internet on a real-time basis and make mapping decisions every 10 seconds that determine, among other things, which edge server a given end user should talk to when he wants some piece of content, be it a stream or web content. We choose an edge server that is lightly loaded, that's likely to have the content, that's up (that's always a good thing), and where the path between the end user and that server is uncongested so it can deliver high quality, since the goal is to deliver that content, whatever sort it is, to that user quickly and reliably. There are a couple of other things I want to mention here
about what we do to ensure quality and good performance. We do something that we call pre-bursting. When a user connects to an edge server and says I want the Steve Jobs keynote, a subscription goes from there to one of the reflector nodes, or perhaps more than one, and if those reflector nodes are not currently getting the stream from the entry point, they'll subscribe from there, and then the stream will start to flow. In a normal situation you might simply start to send the stream at its data rate: if it's 300k, you start sending packets at approximately 300k, or whatever the actual bit rate of the stream is. That means there's going to be significant latency while the user builds up a buffer in the player before it actually starts playing. What we do across the network, which fits well with the Instant-On feature of the current QuickTime system, is pre-bursting from the entry point through the set reflectors all the way to the edge machine. When an edge machine or a set reflector subscribes to a stream, we send the data at eight times the actual bit rate to build up a buffer very quickly, close to the user. The player itself then does the same from the edge server: it pulls as fast as it can to fill up its buffer initially, and from then on pulls at more or less the normal bit rate. The other thing we do to try to ensure high quality concerns mapping. You might think that the best thing to do is simply to map a user to the nearest edge server that has good connectivity between it and the user. In fact, because of the characteristics of the stream servers that run on the edge servers, and the dynamics of building a system like this and running many different streams and many different kinds of content over it, you can get much better quality by being much more judicious about what streams you serve from where. So we use
something we call block maps, which, rather than spreading the load for a given stream over the whole network, restrict the regions that it is mapped to (and when I say region I mean data center). We restrict it somewhat so that, for example, we would rather serve one stream out of a small number of servers and some other stream out of other servers than just mix them all over the place; we get better quality that way. We then monitor the whole system continuously, certainly for an event like this but in general as well, watching a number of different metrics on the performance and quality of the streams. For example, what's the actual bit rate in terms of packets delivered on time? You can deliver a packet, but if it arrives too late and the player throws it away, it's not useful, so the percentage of packets delivered on time to the player is an important metric. How much thinning takes place between the server and the end user is another important metric, because the user might have asked for a 300k stream, but there might be enough congestion on that path that the stream starts thinning and actually delivers a much lower bit rate. How long it takes to connect and get started is another metric: you don't want people to connect and then sit there for 20 or 30 seconds waiting for the stream to start; you want it to start in a few seconds, three, four, or five seconds at most if possible. And then there's how much rebuffering and other things like that take place while the stream is playing. All of the different technologies that we put in place in the network are derived in part from measuring all of those metrics, trying to understand, when there are issues or problems, what's causing them, and then developing mechanisms
like pre-bursting and block maps that allow us to get much better quality and continue to deliver a high-quality experience to the end user. So let me talk about Apple and Akamai. Apple has been one of our major customers since the early days of the company. The company actually started in 1998, our first paying customer was in early 1999, and Apple has been a major customer; in fact it was an investor in the early days as well. Akamai is Apple's platform for the QuickTime Streaming Network. Since 1999 we've done over a thousand live and on-demand events, and we've done nine live Steve Jobs keynote addresses, each of which has been the biggest event to date on the Internet. Steve is the rock star of the Internet; I think there's not much question about that. There are movie trailers, Lord of the Rings, Tomb Raider, and lots of others, and if you haven't, you can go look at what's on the QuickTime Streaming Network: those streams are coming over us. We also do a number of other things for Apple. We deliver the iTunes Music Store, both the actual music downloads (when you download a song, that's coming from our network) and a lot of the data and control information used to make decisions about playlists and other things. Movie trailer downloads and software updates also come through us. We provide web analytics services, reporting on what's happening on the site and in different parts of the world, to allow the marketing folks and other people to make decisions about what to do. And we provide geolocation services: for example, for iTunes there are, I think, contractual obligations in terms of where in the world you have to be if you want to download those songs, and we help ensure that those contractual obligations are met. So let me talk a little bit about the keynote itself: what happens both
before the event and on the day of, and I'll say a little bit about afterwards as well. For us, an event of this scale is not just a normal event. As I said, most of our customers don't do events on anything approaching this scale; they run events all the time and don't even let us know in advance that they're happening. Large events, a gigabit, several gigabits, or in the case of Steve's, 16 gigabits, we need to know about, because we don't have infinite capacity. So well in advance of the event we internally develop a project plan: who are all the people that are going to be involved, from engineering to various support groups to the network groups, to make sure that we have adequate capacity in all the different regions of the world where it's needed. We do capacity planning: we understand what capacity we have on the network for serving QuickTime, where it is, and furthermore what else is going to be happening on the network. Some of that is hard to predict two months in advance; you can't always say when we might be at war or when some other major event might happen. So we obviously keep a certain amount of headroom in our network and are prepared to deal with fairly significant bursts, but 16 gigabits for Steve plus who knows what for a war can certainly add some stress, so we want to do as much planning for that as we can. We then talk with the Apple folks about what is needed in terms of a velvet rope. The idea of a velvet rope is basically to limit the total amount of bandwidth that is used; think of it as a velvet rope around some fancy event, where you've got to be inside the rope to take part. Now there are many ways to think about a velvet rope. You could imagine simply saying, I'll let people come in and grab whatever bit rate stream they can get, and then when I hit my limit I turn them off. Or you could say, well, I just won't provide a very high bit rate stream, because
that way I can let as many people in as possible. But if you don't know how many people are going to come, and you've provisioned for a certain amount of capacity, what you'd like to do, and this is certainly what Apple wants to do, is let as many people in as you can and give as many of them as possible the highest bit rate they can get. So the idea is that we start out with all of the bit rates available to everyone, we watch the ramp, and then there's a decision point when we hit certain thresholds, where we decide to clamp down and stop providing access to the higher bit rates, depending on how much of the available capacity is left. The interesting thing is that that decision is not made just globally. You don't want to say, oh, we've got 16 gigs globally and gee, we're using 12 gigs, so nobody should have access to the 300k stream. Rather, it's done on a regional basis, and the reason is that to give people good quality we want to serve them from reasonably close by. If we just made the decision globally, it might be that the servers and network links in Australia are maxed out while we still have capacity in the U.S.
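The regional clamp-down can be sketched as a per-region lookup. This is a minimal illustration under stated assumptions, not Akamai's or Apple's actual implementation: apart from the 300k stream, the bit-rate tiers, the utilization cutoffs, the region capacities, and the names `WITHDRAW_AT`, `Region`, and `offered_bitrates` are all hypothetical, chosen only for the example. Each region's utilization is compared against its own cutoffs, so one region can be clamped while another still gets every bit rate.

```python
from dataclasses import dataclass

# Hypothetical tiers (kbit/s) mapped to the regional utilization at which each
# one stops being offered. Only the 300k stream is named in the talk; the other
# rates and every cutoff here are illustrative assumptions.
WITHDRAW_AT = {300: 0.75, 100: 0.90, 56: 1.00}


@dataclass
class Region:
    name: str
    capacity_gbps: float  # capacity provisioned for the event in this region (assumed)
    current_gbps: float   # bandwidth currently being served in this region


def offered_bitrates(region: Region) -> list[int]:
    """Bitrates to link on the page for a viewer mapped to this region.

    An empty list means the region is at its velvet rope: no new viewers.
    """
    utilization = region.current_gbps / region.capacity_gbps
    return sorted(rate for rate, limit in WITHDRAW_AT.items() if utilization < limit)


# Hypothetical snapshot during the ramp.
regions = [
    Region("North America", capacity_gbps=8.0, current_gbps=4.0),
    Region("Europe", capacity_gbps=5.0, current_gbps=4.2),
    Region("Australia", capacity_gbps=1.0, current_gbps=0.95),
]
for r in regions:
    print(r.name, offered_bitrates(r))
```

Keeping a cutoff per tier and per region, rather than a single global threshold, mirrors the point in the talk: a global "12 of 16 gigs" check would withdraw the 300k stream everywhere, whereas this withdraws it only where local capacity is running out.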
So we could conceivably serve more people in Australia, with a high bandwidth stream or any stream, from the U.S., because there is capacity there, but they're likely to get pretty crappy service. We want to serve them from not too far away, and to do that we need to make a decision when someone asks for the web page. When they need to get the link for the reference movie, we decide which link to show them, and what we show them depends on where they are. If they're in a region that has gotten close enough to the limit for that region, say Australia, or North America, or Europe, then we show them the page with the link that only allows lower bit rates, but other parts of the world might still have access to all the bit rates. That's done with our FirstPoint system, which was originally designed to do essentially global load balancing for mirrored sites, but provides a number of capabilities, including the ability to look at an IP address, determine where that address is, and make different decisions for different regions. So that's how FirstPoint fits in: it does global load balancing in general, managing how the load is distributed across our servers, and it ensures that for a given region of the world we only provide access, at a given point in time, to the bit rates that make sense given how much bandwidth we're currently pushing there. So we need to determine the need for a velvet rope and what the thresholds are going to be, and we need to provision FirstPoint to do the load balancing and load management. And then an event like this is so high-profile and so critical. As I said, if Steve's unhappy, you can imagine what happens. You don't want to fail; it's just not acceptable. You don't want to fail even for 30 seconds; having things drop out for 30 seconds would be a disaster. That's not to say that we want to fail for other customers; obviously
we don't. But as I said, for an event like this, Apple cares enough about what happens and the profile is sufficiently high that they are willing to invest in a level of testing and attention that makes sure any contingency that comes up will be covered. So there's a lot of end-to-end testing before the event, to make sure we can capture a signal, send it to the entry point, fail over the entry points, send it through the network to the edge servers, and then get it with high quality to users around the world. On the day of the event we provide, first of all, automated network management. Our entire network, those 15,000 servers, the entry points, the reflectors, and the edge servers that actually run the QuickTime server, is remarkably self-managing. Our NOC normally has on the order of four people sitting in it watching the whole network, and that's because we have a very extensive automated system monitoring lots of different aspects of what's happening on every machine and on the network paths between machines and between them and end users. We have automated failover at a number of different levels of the system, so that when a machine fails, or a region or data center goes offline, or connectivity is disrupted, there's automatic failover and remapping, and very few users will see any impact at all from that kind of event. But there are times when something goes wrong that needs attention, so we do have people actively watching. Our NOC monitors the whole network constantly, on a daily basis, and for an event of this magnitude and importance they are also specifically watching what's going on with this event. In addition, we set up a situation room for an event like this, where the team that has been assembled to put together and run the event sits watching the network and what's going on, and making sure that if there are any
anomalies that crop up, they get fixed very, very quickly, often before any end user notices an impact. So what do you get from all this? Well, there's a lot of expertise that we have for delivering these kinds of events, and in January of this year we provided 100 percent availability and over 12 gigabits per second of peak delivery to almost 80,000 concurrent users. I think the total number of users during the event was on the order of 100,000. At the same time we maintained a high level of service and high quality for all our other customers as well. In addition, we provide real-time and after-the-fact historical reporting on what's going on with the event and on the traffic that's being served.
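As a rough sanity check on the January figures, dividing peak bandwidth by peak concurrent viewers gives the average rate delivered per viewer. This is a minimal sketch, assuming the bandwidth peak and the concurrency peak happened at the same moment (the talk does not say so); the function name `average_kbps_per_viewer` is illustrative. Because 12 Gbit/s is a lower bound ("over") and 80,000 is an upper bound ("almost"), the result is itself a lower bound on the average.

```python
def average_kbps_per_viewer(peak_bits_per_second: float, concurrent_viewers: int) -> float:
    """Average delivered rate per viewer, in kbit/s (1 kbit = 1,000 bits)."""
    return peak_bits_per_second / concurrent_viewers / 1_000


# January keynote figures quoted in the talk: "over 12 gigabits per second" at
# peak and "almost 80,000 concurrent users". Assumes both peaks coincided.
PEAK_BPS = 12e9
CONCURRENT_VIEWERS = 80_000

avg = average_kbps_per_viewer(PEAK_BPS, CONCURRENT_VIEWERS)
print(f"at least ~{avg:.0f} kbit/s per viewer on average")
```

That works out to roughly half the rate of the 300k stream, which is consistent with viewers being spread across several bit rates, though the talk does not give the actual breakdown.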