WWDC2003 Session 702
Transcript
Well, hello, welcome, and thanks for coming. This is Session 702, MPEG-4 Demystified, part of the QuickTime track that we're very happy to have this year at WWDC. I'm Amy Nugent, I work on the QuickTime product marketing team, and I'll be your host for this session. You probably just came from the QuickTime State of the Union presentation, where you learned that standards play a very important part in the strategy of Apple as well as of QuickTime. It's my honor to introduce to you Rob Koenen, who spoke briefly in the State of the Union. He is president of the MPEG-4 Industry Forum, among many other jobs, and he will take you through the MPEG-4 specification. It's a very vast and deep specification that is capable of many things, and I will leave you in the very capable hands of Rob. Okay, have a good session.
Thank you very much, Amy.
So I think we've got to do one of two things: either all of you flip up your iBooks or your iMacs, or we put some light on in the room, because it's really very dark and I can't see anyone. There aren't enough iBooks lit up. Can we have some light in the room? Is that possible? Yes. I think you can still see the screen, right? The screen is much more important than I am. So allow me to take this off.
Good morning, everybody. It is my pleasure and honor to be able to talk to you today and to explain the high-level concepts of MPEG-4. I think I will be able to demystify some of it, maybe not all of it. One thing I have to leave out of this talk is licensing. Some of you have heard about licensing; you should come back on Thursday morning, when someone will be here to explain just that. I will say a few words about it, though.
If during this talk you have any questions, I don't mind being interrupted; just raise your hand. It would be good if you use a mic for your question, because this talk is being translated simultaneously into Japanese and the translators can only hear you if you use the mic. If that's difficult, just shout out your question and I'll repeat it for the translators. I just need this little device. I've been told not to press this button, because then everything will go wrong.
They always give you little devices with buttons that you're not supposed to press.
Thank you for coming. This is what I would like to address today: what is MPEG-4, how does it work, and what are the recent and interesting developments? Why should you use MPEG-4? That is just a bit of business talk, so we'll go over it quickly, because you're all developers. I'll tell you a little bit about the deployments of MPEG-4, and then say a few words about M4IF, the MPEG-4 Industry Forum, which is an advocacy group for MPEG-4.
So let's start with the basics: what is MPEG-4? Today I am not going to give you... (What's happening back here? Okay, I got the message.) I'm not going to give you the gory details of the video codec or the audio codec or any of the systems codecs, but a high-level functional overview of what MPEG-4 is and what it does.
First, it's what we like to think of as the media standard. We call it that because it's a standard that works across all devices, all networks, all carriers, basically everything, and that's why we also call it the interoperable cross-platform ecosystem. While it is interoperable, it's also competitive, and I will tell you why: having a standard doesn't take away the competition. On the contrary, it actually creates a lot of opportunities for competition.
Most people know MPEG-4 as a video codec, and people tend to talk about MPEG-4 video and AAC; AAC, Advanced Audio Coding, is actually part of MPEG-4 as well as MPEG-2. MPEG-4 is also a systems layer, part of which is the file format, which was based on QuickTime, as Frank Casanova explained this morning. And it goes way beyond audio and video. Audio and video are the first elements that will see deployment, but MPEG-4 supports much more than that, and I will tell you a bit about it. It's designed for all multimedia platforms, at least the digital ones.
So where does it come from? Most of you will know. Who knows MPEG-3 here? Anyone know MPEG-3? Some people still do. MPEG-3 doesn't exist, because what you know as MPEG-3, or MP3, is actually MPEG-1 audio. MPEG-1 has three layers of audio. Layer 1 was really simple. Layer 2 is what's used in most of the digital broadcasting systems today, in Europe at least; in America it's Dolby. Layer 3 is the most complicated of the MPEG-1 audio layers; it was way beyond what could be implemented when it was designed, but now it's just the norm.
Then you may know MPEG-2, the standard for digital television and for DVD video, and also for the audio in Europe; again, in America it's more Dolby audio. Then there are MPEG-7 and MPEG-21, which are not successors to MPEG-4 or MPEG-2. MPEG-7 is sort of a metadata standard that allows you to describe content. MPEG-21 is, in a rather fuzzy phrase, the framework for interoperable use and exchange of digital media. It does all sorts of things to do with how you find content, what's in it, unique identification of content, and how you find the rights to content, and it attempts to standardize elements of digital rights management, which, I can tell you, is more of a challenge than standardizing just a video codec, which is hard enough as it is.
So let's talk a little bit about the MPEG-4 vision. This vision has been with us for a long time, actually since the middle of the 1990s, and it is coming closer. I've been working on MPEG-4 for almost 10 years; it will be 10 next year.
Here's the vision. If you remember, back in the early 90s, or halfway through the 90s, there was all this talk about convergence. Everything was going to be the same, we were all going to have glass fiber into our living rooms, and the big discussion was: are we going to consume content on the PC, or on the television? Back then we said that's all a load of bull; there's not going to be convergence. (I'm not sure that can be translated, by the way.) The convergence story wasn't quite right. Rather than convergence, we saw a proliferation of multimedia. Rather than fewer networks, we got more: all sorts of digital mobile networks, digital television networks, digital telephone networks such as ISDN, which has had quite a bit of adoption in Japan and Europe, though not as much here. We had DSL, we got cable, we got content coming to us through satellites, and it was going to be chaos. And rather than one convergent terminal, either the PC or the television (another load of bull), we were going to have a lot of different terminals: handheld devices, phones, more PCs, different PCs, set-top boxes, and they would all want to do digital multimedia.
Back then, the way to do standardization, especially in the communications world, was: new network, new standard, a new stack of communication protocols, new codecs, everything new. We said that doesn't make any sense. We need one layer of content representation that works across all of these different applications, that is agnostic to the network and to the terminal, and that supports all the different paradigms, as you could call them, for using content: broadcast, communication, and retrieval, where retrieval can be online or from packaged media like DVDs.
Basically, a single technology for all these devices. Of course, that doesn't mean you're using the same bitrate on your high-definition television, but you are using the same codecs and the same systems layer. By the way, you can use the same MP4 files, and they can point to different media files with different encoding bitrates. And, as Frank showed this morning, you can easily take one file and transcode it into another that works on a mobile device, enabling what I like to think of as the write-once, play-everywhere paradigm: you can use your content across your devices, on your PCs, on your CE devices and even on your phones. You can even take it with you, and conversely, you can shoot your films while you're on the road, upload them to your computer, and they just play in QuickTime.
That's where we see the applications of MPEG-4 today; I already talked a little bit about this in the State of the Union. We see it in mobile devices. We see it in broadcast, though not as much yet; that will explode, excuse me, once we get the new Advanced Video Codec, about which I will say a few words later in this talk. We do see streaming services, and interestingly, we are getting a little bit of interactivity in these. MPEG-4 allows for great interactivity, which I will definitely talk about, and the BBC is now doing a trial with that sort of thing. And then there's packaged media, which is also waiting for the new codec at the moment.
Let's now go to the heart of this presentation: how does it work? MPEG-4 is an object-based multimedia content representation standard. Those of you who know a little bit about image coding, or have heard about MPEG-4 before, will know that it supports arbitrarily shaped objects in video, so that you can do segmentation and so on. That's true, but you don't need to do it. An object might just as well be a rectangular frame of video; the audio is an object, and the text that scrolls across the video may be another object. That is already a huge difference from MPEG-2, where everything is just pixels.
MPEG-4 has a revolutionary systems layer. It has state-of-the-art codecs that are upgraded responsibly, which means no new codec every half year. That is something you can perfectly well do in the Internet world, but the CE world doesn't work that way: people don't want to buy a new DVD player every half year. And it has something called profiles and levels, which restrict complexity and guarantee interoperability; these are what we call interoperability points. MPEG defines a whole bunch of profiles and levels, and you could actually say there are too many of them. That doesn't really matter, because industry consortia such as the Internet Streaming Media Alliance, a consortium that Apple co-founded, pick their profiles and levels. ISMA says: this profile for video, this profile for audio, and this file format (well, there's only one, so that's simple), and that's how all of us are going to do interoperable streaming media: Philips, Sun, Cisco, Apple, you name it.
Let's take a look at this picture for a second. You can't read the text, and that's okay, because you don't need to. What you see here are all the different content types that MPEG-4 supports: audio, video, graphics, even 3D graphics, textures, animation, and something we call BIFS, which I'll tell you about in a minute. This part represents a multiplex, which you see here as an MP4 file, the basic container that can carry everything. You can then distribute these containers and streams using whatever you like, because MP4 is basically agnostic to all of that: broadcast, broadband delivery, satellite, wireless, phone lines, whatever. And then you can play it on a number of different devices. I talked about the devices before, so let's skip that bit.
What's interesting now is to look at what MPEG-4 actually does differently, so I'm going to go back to the screen. In MPEG-2 you do authoring: you take all of these objects and do your authoring (Apple has a couple of great authoring products), and then you do encoding. With encoding you basically say: okay, now I'm going to convert all these things into pixels, one plane of pixels. Everything is collapsed into a single plane, and that rectangular frame of pixels gets encoded. I'm explaining everything using video concepts; I could do something similar in audio, but it's a little easier for me to explain with visual concepts. So you take all the objects, collapse them into a single plane of pixels, encode that using MPEG-2, and then you just display it. There's nothing more you can do with it.
Now with MPEG-4 you can, if you wish (you don't have to), keep all of these objects separate. You could have multiple videos, or just one; you could have a graphic that's encoded independently, and your streaming text; you could have your voice and your music encoded separately. You keep them separate, as what are called elementary streams, you send these to the decoder, and then you do the composition there. So instead of doing composition before encoding, we now do composition after decoding the objects. If there is one major paradigm shift in MPEG-4, that's it.
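To make composition after decoding concrete, here is a minimal Python sketch. It assumes objects arrive already decoded as tiny pixel grids; DecodedObject and compose are hypothetical names, and real MPEG-4 composition (driven by the scene description) also handles blending, scaling and timing, which are left out here.

```python
from dataclasses import dataclass

# Hypothetical, greatly simplified model: a "decoded object" is a small grid of
# pixel values with a position and a depth. None marks a transparent pixel,
# standing in for an arbitrarily shaped object.
@dataclass
class DecodedObject:
    name: str
    pixels: list  # rows of pixel values
    x: int
    y: int
    depth: int    # objects with lower depth are drawn first (further back)


def compose(objects, width, height, background=0):
    """Composite independently decoded objects into one frame, back to front."""
    frame = [[background] * width for _ in range(height)]
    for obj in sorted(objects, key=lambda o: o.depth):
        for dy, row in enumerate(obj.pixels):
            for dx, value in enumerate(row):
                px, py = obj.x + dx, obj.y + dy
                # Skip transparent pixels and anything outside the frame.
                if value is not None and 0 <= px < width and 0 <= py < height:
                    frame[py][px] = value
    return frame


fish = DecodedObject("fish", [[1, 1], [1, None]], x=0, y=0, depth=1)
logo = DecodedObject("logo", [[9]], x=1, y=1, depth=2)
before = compose([fish, logo], width=3, height=2)  # [[1, 1, 0], [1, 9, 0]]

# Moving the logo changes only its position; nothing is re-encoded, the
# receiver simply composes the same decoded objects again.
logo.x = 2
after = compose([fish, logo], width=3, height=2)   # [[1, 1, 0], [1, 0, 9]]
```

In the MPEG-2 model, by contrast, only a frame like `before` would ever exist at the receiver, so there would be no separate logo left to move.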
Now, in order to do this, you need some sort of language that says: this is where the objects go on the screen, and this is when they appear. That's what we call BIFS, the Binary Format for Scenes. It's an efficient binary language that allows you to describe where the objects are, where they go and when they appear. Once you have BIFS, you can describe the scene not just statically but also dynamically. You can attach behavior to the objects and say: this logo is spinning, it's changing its color, it's moving from the top left of the screen to the bottom right. If I were to do this in MPEG-2 or any traditional codec, I would have to encode all those pixels again and again and again until the logo gets here, which is quite a waste of bits. With MPEG-4, I just give one command: move the logo from there to there, and take a second to do it. That's it. It's a very small binary command, and the decoder takes care of everything.
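Here is the logo example as a sketch: one small command describes the whole movement, and the receiver computes the position for every frame. MoveCommand and position_at are hypothetical names; in BIFS this kind of animation is typically expressed with VRML-style interpolator nodes driven by a timer, but nothing below reproduces the actual BIFS syntax or encoding.

```python
from dataclasses import dataclass


@dataclass
class MoveCommand:
    """Hypothetical compact scene command: move an object from `start` to
    `end` (pixel coordinates) over `duration` seconds."""
    start: tuple
    end: tuple
    duration: float


def position_at(cmd, t):
    """Linearly interpolate the object's position at time t (in seconds)."""
    if cmd.duration <= 0:
        return tuple(float(v) for v in cmd.end)
    progress = min(max(t / cmd.duration, 0.0), 1.0)  # clamp to [0, 1]
    return tuple(s + (e - s) * progress for s, e in zip(cmd.start, cmd.end))


# "Move the logo from the top left to the bottom right and take a second."
cmd = MoveCommand(start=(0, 0), end=(640, 480), duration=1.0)
frame_rate = 30
path = [position_at(cmd, n / frame_rate) for n in range(frame_rate + 1)]
```

The command carries a handful of numbers, while the decoder derives all 31 positions itself; a pixel-based codec would instead have to send the changed picture area for every one of those frames.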
This applies not only to visual objects but to all the other objects as well. You can describe 3D audio scenes with it and have sound sources move around in the scene if you wish. That's quite a bit more advanced, but that's the basic concept of MPEG-4.
So let's look at this in a typical MPEG-4 scene. It is fully free of any copyright, so I won't get into any trouble, which also means it's a bit dull; I made it myself. It's an aquarium with some seaweed. There is an arbitrarily shaped video object. I've been using this for a while: she's four now, and this was when she was one day old. There are some bubbles from the fish, and there's another type of fish, a special sort of fish, which I'll explain in a moment. All of these are different objects. This is an arbitrarily shaped video object, or natural video object; the fish are graphic objects; then there are the bubbles and the background; there's music, and there may be a voiceover. And then there's this one. It looks like a wireframe, and it actually is a wireframe with a picture projected onto it. The neat thing is that if you move the vertices of the wireframe, you can make the fish swim. In real life you wouldn't see all these wires; they would be hidden, but this shows you how it works. These are a couple of the objects that MPEG-4 supports.
This is what the scene tree looks like. All these objects are represented by branches in this tree, and they have sub-objects. (Let me go back a slide. Yes, that works.) All of these objects can have audio and video associated with them. Some of them are static graphics and some are streams; some of those are audio and some are video. This is literally what's represented in the decoder. Now you can go in with BIFS and just do things with the branches: you can take a branch out so that an object disappears, you can change the position of a whole branch, or you can change the color of an object, just by issuing these little BIFS commands.
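A minimal sketch of the scene tree and the kinds of updates just described, using plain Python objects. Node, locate, apply_command and the dictionary command format are hypothetical; they only mirror the idea of BIFS update commands (insert a node, delete a node, change a field), not their binary syntax.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One branch of a (hypothetical) scene tree."""
    name: str
    props: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def locate(node, name, parent=None):
    """Depth-first search; returns (parent, node) for the first node called `name`."""
    if node.name == name:
        return parent, node
    for child in node.children:
        found = locate(child, name, node)
        if found[1] is not None:
            return found
    return None, None


def apply_command(root, cmd):
    """Apply one scene-update command to the tree, in place."""
    op = cmd["op"]
    if op == "insert":    # add a new object under an existing branch
        _, parent = locate(root, cmd["parent"])
        parent.children.append(cmd["node"])
    elif op == "delete":  # take a branch out: the object and its children disappear
        parent, node = locate(root, cmd["target"])
        parent.children = [c for c in parent.children if c is not node]
    elif op == "set":     # change one property, e.g. a color or a position
        _, node = locate(root, cmd["target"])
        node.props[cmd["prop"]] = cmd["value"]
    else:
        raise ValueError(f"unknown command: {op}")


scene = Node("aquarium", children=[
    Node("background"),
    Node("fish_group", {"translation": (0, 0)}, [Node("fish"), Node("bubbles")]),
    Node("logo", {"color": "red"}),
])
apply_command(scene, {"op": "set", "target": "logo", "prop": "color", "value": "blue"})
# Moving a group node moves everything beneath it.
apply_command(scene, {"op": "set", "target": "fish_group", "prop": "translation", "value": (50, 20)})
apply_command(scene, {"op": "delete", "target": "bubbles"})
apply_command(scene, {"op": "insert", "parent": "aquarium", "node": Node("caption", {"text": "Hello"})})
```

Each command names a branch and a small change, which is why such updates stay tiny compared with re-sending the picture they affect.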
So, to recap, we have an audiovisual scene with objects. It could be a very complicated scene, or a very simple scene with one audio object and one video object that just provides interoperable streaming, which is quite a feat in itself. These objects can be of different natures: natural, meaning recorded with a camera or microphone, or synthetic, meaning generated by a computer program. There is a compositor, the new element that puts the objects into the scene, and there is an efficient, real-time, binary scene description language, which is called BIFS.
A couple more words about BIFS. It inherits a lot from VRML, the Virtual Reality Modeling Language, but as you may know, VRML was neither real-time nor binary, and therefore not very efficient for things like streaming over the Internet or to mobile phones. It was perfectly okay for doing computer graphics.
The coding scheme for each type of object is optimized for that object type. You don't try to encode speech with a music encoder, which is not really optimal, and you don't have to encode a graphic with a video encoder that is optimized for moving video rather than still graphics. You can use the optimized coding scheme for each of these objects, and this is completely independent of bitrate. I stress this because many people think MPEG-4 is just about low bitrates. It is also about low bitrates: way back in 1993, MPEG-4 started as a low-bitrate project, but that changed really quickly, in 1994. Some people still think it's only about low bitrates, but MPEG-4 has a studio profile that goes up to over a gigabit per second in video coding.
Let's look at the different objects that are supported in MPEG-4. The ones you know are video and audio, and these are the most widely deployed: MPEG-4 video coding and MPEG-4 Advanced Audio Coding. In addition to video coding, on the visual side we have animated faces and bodies. Some companies have products out there for animated faces, and I think the BBC has been looking at this, because they have a legal requirement to provide talking heads for viewers who are hard of hearing, so that they can read lips, and you could do this with animated faces. There are two-dimensional and three-dimensional animated meshes, the little wireframes: you can project still or even moving video onto these wireframes and then deform the wireframes to get really intricate effects. There's text, both streaming text and still text, and there are graphics; JPEG is also supported as part of the MPEG-4 framework, so you can use it for graphics.
On the audio side we have generic audio, from mono to 5.1 channels, and by combining different audio objects you can actually go up almost indefinitely; you don't need to stop at 5.1. There are specialized speech codecs. There are synthetic sounds, which are very advanced: structured audio is basically a language to program a synthesizer, first to describe instruments and second to play them, so there's also a score language. There's text-to-speech, which is merely an interface with which you can mark up text that can then be rendered as speech. And then there's something called environmental spatialization, which makes sound seem to be in a specific place: you can describe the place.
Let's look at the parts and how they fit together. First there's the visual coding and then there's the audio coding, and this is only decoding. I'll say a few words about this later in my talk, but it's important: MPEG-4 only standardizes decoding, not encoding. That's why there's so much competition between providers, just as with MPEG-2, and as you will see later in my talk, this allows for a lot of improvement in the quality of these codecs. It's also why you have to be very cautious with statements from proprietary vendors about "the quality of MPEG-4". That basically doesn't exist; you can get the best quality with MPEG-4, and there are fair comparisons to be made. I'll say a few more words about that a little later.
Then there's the systems layer in MPEG-4, which does things before decoding, in terms of demultiplexing and buffering, and after decoding, in terms of presentation, which is the composition of the objects. The systems part used to contain the file format, the MP4 file format, which is extremely close to the 3GP file format that you saw Frank talk about this morning. The only real difference is that there is a little flag that says this is a 3GP file, which means it can carry AMR voice coding, something that is not native to MPEG-4. For the rest, it's just the same stuff.
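If, as the garbled transcript suggests, the "little flag" is the brand recorded in the file-type ('ftyp') box at the start of MP4 and 3GP files, a reader can tell the two apart with a few lines of Python. This sketch assumes the file starts with a normally sized ftyp box (a 32-bit big-endian size and a four-character type, then the major brand, the minor version and a list of compatible brands) and ignores the 64-bit size escape; make_ftyp is a hypothetical helper for building test data.

```python
import struct


def read_ftyp(data):
    """Parse the leading 'ftyp' box: (major_brand, minor_version, compatible_brands)."""
    if len(data) < 16:
        raise ValueError("too short for an ftyp box")
    size, box_type = struct.unpack(">I4s", data[:8])
    if box_type != b"ftyp":
        raise ValueError(f"first box is {box_type!r}, not ftyp")
    if size < 16 or size > len(data):
        raise ValueError("unsupported or truncated ftyp size")
    major, minor = struct.unpack(">4sI", data[8:16])
    compatible = [data[i:i + 4].decode("ascii") for i in range(16, size - 3, 4)]
    return major.decode("ascii"), minor, compatible


def looks_like_3gp(data):
    """Treat a '3gp*' major or compatible brand (e.g. '3gp4', '3gp5') as the 3GP marker."""
    major, _, compatible = read_ftyp(data)
    return any(brand.startswith("3gp") for brand in [major, *compatible])


def make_ftyp(major, minor, compatible):
    """Hypothetical test helper: build a minimal ftyp box."""
    payload = major.encode("ascii") + struct.pack(">I", minor)
    payload += b"".join(b.encode("ascii") for b in compatible)
    return struct.pack(">I", 8 + len(payload)) + b"ftyp" + payload


gp = make_ftyp("3gp5", 0, ["3gp5", "isom"])
mp4 = make_ftyp("mp42", 0, ["mp42", "isom"])
print(read_ftyp(gp))  # ('3gp5', 0, ['3gp5', 'isom'])
```

Everything after this box follows the same box structure in both formats, which matches the point that, apart from the flag, it is the same stuff.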
Then there's something called DMIF, which isn't always used. You don't have to use it, but it provides you with an abstract interface to the transport. DMIF has a somewhat grandiose name, Delivery Multimedia Integration Framework, but it's actually quite a compact part of the standard. If you use DMIF, you can write your application to one transport interface; you then only need to write separate interfaces to a disk, to a network, or even to a broadcast channel, and your application can be fully unaware of what it's talking to.
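The point of DMIF is that application code is written once against an abstract delivery interface. The Python sketch below illustrates only that design idea; DeliveryChannel, FileChannel, NetworkChannel and play are hypothetical names and do not reflect the actual DMIF Application Interface primitives.

```python
from abc import ABC, abstractmethod


class DeliveryChannel(ABC):
    """Hypothetical transport abstraction: returns elementary-stream data by ID."""

    @abstractmethod
    def read(self, stream_id):
        ...


class FileChannel(DeliveryChannel):
    def __init__(self, streams):
        self._streams = streams  # dict standing in for data read from a local file

    def read(self, stream_id):
        return self._streams.get(stream_id, b"")


class NetworkChannel(DeliveryChannel):
    def __init__(self, fetch):
        self._fetch = fetch  # callable standing in for a network request

    def read(self, stream_id):
        return self._fetch(stream_id)


def play(channel, stream_ids):
    """Application code: identical whatever transport sits behind `channel`."""
    return {sid: len(channel.read(sid)) for sid in stream_ids}


content = {1: b"video-au", 2: b"aac"}
from_disk = play(FileChannel(content), [1, 2])
from_network = play(NetworkChannel(lambda sid: content[sid]), [1, 2])
```

Adding a broadcast source would mean writing one more DeliveryChannel subclass; `play` itself would not change.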
Then there's the transport layer, which in principle is not part of the standard. This is how content flows through: it comes in from a transport, goes through DMIF if it's present, the systems layer takes care of demultiplexing all the different objects, each object is decoded, and then the decoded objects are composited onto the screen or into the sound space. Composition of audio could very well be: I turn up the volume of the background a little and turn down the volume of the foreground speaker, or I choose the Japanese speaker rather than the English speaker. These are all possibilities offered by MPEG-4 composition.
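In the simplest case, audio composition of this kind amounts to mixing decoded audio objects with per-object gains and choosing which dialogue track to include. In this Python sketch the mix function, the language tags and the gain values are illustrative assumptions; an actual MPEG-4 terminal would take these decisions from the scene description and from user interaction.

```python
def mix(objects, gains, language=None):
    """Mix decoded audio objects into one track.

    objects:  name -> (language tag or None, list of samples)
    gains:    name -> linear gain; objects not listed keep a gain of 1.0
    language: if set, objects tagged with a different language are left out
    """
    length = max(len(samples) for _, samples in objects.values())
    out = [0.0] * length
    for name, (lang, samples) in objects.items():
        if language is not None and lang is not None and lang != language:
            continue  # e.g. pick the Japanese speaker instead of the English one
        gain = gains.get(name, 1.0)
        for i, x in enumerate(samples):
            out[i] += gain * x
    return out


objects = {
    "music": (None, [0.5, 0.5, 0.5]),
    "speech_en": ("en", [0.2, 0.0, 0.2]),
    "speech_ja": ("ja", [0.1, 0.1, 0.1]),
}
# Turn the background music down and the Japanese speaker up.
track = mix(objects, {"music": 0.5, "speech_ja": 2.0}, language="ja")
```

Because the objects stay separate until this step, the same transmitted streams can yield a different mix for every listener.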
There are two somewhat orthogonal parts. Conformance contains a lot of bitstreams: if you have a decoder, you can use the conformance part to check whether your decoder is conformant, which gives you some indication of interoperability. In the MPEG-4 Industry Forum we do much more interoperability work, with exchanges of bitstreams. Then there's the reference software, which is actually free of copyright if you use it to build a compliant implementation. Even though in principle transport is not in the standard, there is something called MPEG-4 over IP, which is a specification of how to use IETF protocols and how to do the mappings. More recently, what's called Advanced Video Coding was added to the MPEG-4 standard, and I'll say a bit more about that later as well. You will see that the part numbers don't quite add up; there are more parts that I don't think are important to talk about right now.
Let's take a look at some of the recent developments. Hey, this slide was supposed to have been hidden. I want to first say a little more about the objects; I'll keep this brief. We have video, which goes from a few kilobits per second to over a gigabit per second: take one set of zeros off and it's megabits, take another set of zeros off and it's gigabits per second. Sony actually has cameras that support this, with the Studio profile. You can have multiple rectangular or arbitrarily shaped objects in the scene. Scalability is supported, including fine-grained scalability, which has some support but not a lot yet. It means that if I have my full bitstream, I can drop layers of it and you can still decode the picture sensibly, or the audio in that case. Then there are video sprites: you can use sprites for backgrounds. You send them once and then warp the background to make the scene change, so you don't need to keep sending them as moving video.
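The layered scalability described above lets a sender or receiver keep only as much of the bitstream as the channel allows. A minimal Python sketch, assuming discrete layers with made-up bitrates and a hypothetical select_layers helper; fine-grained scalability goes further by letting the enhancement layer be cut at almost any point, which this sketch does not model.

```python
def select_layers(layer_kbps, available_kbps):
    """Return (indices of layers to keep, total kbit/s).

    layer_kbps lists the base layer first, then enhancement layers in order.
    Each enhancement layer needs every layer below it, so selection stops at
    the first layer that no longer fits.
    """
    chosen, total = [], 0
    for index, rate in enumerate(layer_kbps):
        if total + rate > available_kbps:
            break
        chosen.append(index)
        total += rate
    if not chosen:
        raise ValueError("not even the base layer fits in the available bandwidth")
    return chosen, total


layers = [64, 32, 32, 128]  # illustrative bitrates in kbit/s, base layer first
print(select_layers(layers, 150))  # ([0, 1, 2], 128)
print(select_layers(layers, 100))  # ([0, 1], 96)
```

Stopping at the first layer that does not fit, rather than skipping it and trying later ones, reflects that an enhancement layer is useless without the layers beneath it.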
Then we have some types of computer-generated visual information: graphics and animated text, face and body animation, which I talked about, and the meshes with still or moving textures.
Now for audio. There's a lot here, and I should say that some of it will be used and some of it will likely not be used. That's quite okay, because we have profiles and people will pick what they need. Again, with audio you can have a number of objects in the scene, so that you can do your own audio composition. I think the most important codec in MPEG-4 is MPEG-4 Advanced Audio Coding, which is very much like MPEG-2 Advanced Audio Coding and adds a couple of new things. There's another audio codec for really low bitrates, although AAC is getting really low as well these days, and then there's one for extremely low bitrates, called HILN. Then there are voice codecs, actually two of them: one, again, for extremely low bitrates, and one for normal bitrates; at 24 kilobits per second you basically have transparent voice quality that you can't distinguish from the real voice.
In audio you again have scalability. Interestingly, you can even build an AAC layer on top of a CELP layer if you wish: you use the CELP layer as what's called the prediction, and then you put an AAC layer on top of that. If you do radio, for instance, the basic quality goes in CELP, because it's mostly speech, and if you want really good quality you do it in MPEG-4 AAC. Something like that is actually done in Digital Radio Mondiale, or DRM, a digital broadcasting standard. Its requirements were that you have to be able to receive it in very poor reception conditions, but that you get a really good quality signal when reception is good, and it uses this type of scalability.
Synthetic audio objects: I talked a little about this before. We have the orchestra language and the score language: with one you describe the orchestra, and with the other you describe the music itself. At one or two kilobits per second you can do really great music. There's a company that's been working on this for a long time, and they were going to build a QuickTime plug-in; I hope they'll come out with it soon, as it was promised for this summer. MIDI is supported, as are a couple of types of synthesis. Then there's the text-to-speech interface, which you can use together with face and body animation.
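The orchestra/score split is what makes such low bitrates possible: only the instrument definitions and the note list travel over the network, and the terminal synthesizes the samples. The Python sketch below is an analogy under that assumption, not SAOL or SASL (the actual MPEG-4 Structured Audio orchestra and score languages): one sine "instrument" stands in for the orchestra, and a list of tuples stands in for the score.

```python
import math

SAMPLE_RATE = 8000  # samples per second, kept low for the example


def sine_instrument(freq_hz, duration_s, amplitude=0.3):
    """'Orchestra' entry: how one instrument turns a note into samples."""
    n = int(SAMPLE_RATE * duration_s)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


ORCHESTRA = {"sine": sine_instrument}

# 'Score': (start time in s, instrument name, frequency in Hz, duration in s)
SCORE = [
    (0.0, "sine", 440.0, 0.5),
    (0.25, "sine", 220.0, 0.5),  # overlaps the first note
    (0.5, "sine", 660.0, 0.5),
]


def render(orchestra, score):
    """Synthesize the score with the orchestra, summing overlapping notes."""
    end = max(start + dur for start, _, _, dur in score)
    out = [0.0] * int(SAMPLE_RATE * end)
    for start, instrument, freq, dur in score:
        offset = int(SAMPLE_RATE * start)
        for i, sample in enumerate(orchestra[instrument](freq, dur)):
            if offset + i < len(out):
                out[offset + i] += sample
    return out


samples = render(ORCHESTRA, SCORE)
# Three short score entries expand into a full second of audio at the receiver.
print(len(SCORE), len(samples))  # 3 8000
```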
Those are some of the more esoteric object types. Support in industry today is for AAC and for plain rectangular video coding, and there are some companies trying to do more interactive things with MPEG-4. They start with the systems layer: they have graphics, arbitrarily shaped content, semi-transparent graphics, and notably they use the binary scene description. As I explained, it's inherited from VRML, but it's much more efficient and it adds real time: it basically marries VRML with MPEG-2 and what we know from the broadcast world. Broadcast people know about synchronizing audio and video, about synchronizing different objects, about buffer models and so on, and the scene description marries these concepts from VRML and from MPEG-2: great broadcast, great synchronization. That's what allows the interaction. It works in 2D and 3D, and there are a couple of three-dimensional players out there already. It allows you to do dynamic scene updates: you can add objects to the scene on the fly, delete them, and change them on the fly, using this scene description language.
To provide an interface to the SMIL world and to make content easier to author, MPEG later added what's called the Extensible MPEG-4 Textual Format, or XMT, which is basically a textual format for BIFS. There are actually two versions of it: one is very close to BIFS and one is more generic; I'll spare you those details. The important part is that there is harmonization with SMIL to the extent possible, because there is a lot of SMIL content out there.
What's very important in MPEG-4 systems is that you get predictable behavior of audio and video, which hasn't always been the case with web and Internet technologies, and that you get predictable buffer management. If I send content, it will play on the player, because the player knows what to expect, and it won't get into trouble with buffer overflows. It's all predictable, and it's all standardized. There's some more SMIL integration here in the timing, with which you can basically do looser timing of your objects.
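Predictable buffer management means that a sender can check, before anything is transmitted, that a conforming player's buffer will neither overflow nor run dry. MPEG-4 Systems specifies this with its system decoder model, built on decoding buffers and time stamps; the Python sketch below is only a simplified illustration of that kind of check, with a hypothetical check_buffer function and byte counts and timing chosen for the example.

```python
def check_buffer(arrivals, removals, buffer_size):
    """Simulate a decoder input buffer and return its peak occupancy in bytes.

    arrivals: (time, bytes) pairs when data enters the buffer
    removals: (time, bytes) pairs when the decoder takes an access unit out
    At equal times, arrivals are processed before removals (an assumption).
    Raises ValueError on overflow or underflow.
    """
    events = [(t, 0, n) for t, n in arrivals] + [(t, 1, -n) for t, n in removals]
    occupancy = peak = 0
    for t, _, delta in sorted(events):
        occupancy += delta
        if occupancy < 0:
            raise ValueError(f"underflow at t={t}: data needed before it arrived")
        if occupancy > buffer_size:
            raise ValueError(f"overflow at t={t}: {occupancy} > {buffer_size} bytes")
        peak = max(peak, occupancy)
    return peak


arrivals = [(0, 400), (1, 400), (2, 400)]
removals = [(1, 400), (2, 400), (3, 400)]
print(check_buffer(arrivals, removals, buffer_size=800))  # 800
```

With a 600-byte buffer the same schedule overflows at t=1, which is exactly the kind of problem an encoder can detect and avoid ahead of time when the buffer model is standardized.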
What's important here is that while MPEG-4 doesn't standardize digital rights management, it has interfaces to proprietary systems. Digital rights management is not going to go away. I think it could actually provide some useful features for end users, even though it's been portrayed as something hostile to end users; that's wrong. But for an ecosystem to support serious content being deployed in it, something needs to be done about rights management, and I think Apple took a great approach with what's being done right now in iTunes. It's very user-friendly, and basically DRM should be invisible: you don't see it as long as you make normal use of your content. That's what we're starting to see these days as a standard interface in MPEG-4, and there's MPEG-21, which will bring more interoperability to DRM, which means it will no longer be, as it is today, basically the monopoly of one big company. And the file format, as I've already said a couple of times, is based on QuickTime in MPEG-4, just like the 3GPP file format, which is very close to MP4.
Quickly wrapping this up: there's MPEG-J, or Java, which you can use for rendering really complicated content and for programmatic content, but also as standard APIs to find out what you're talking to: what the terminal resources are, and so on. And there's some advanced audio rendering, with which you can change how the sound is heard without changing the source of the sound: you describe the environment in which it should be played. You can say this is in a closet, or in a giant hall, or on a football field, and describe that with the audio rendering tools.
I realize I'm giving you a lot of details. We're going to talk about applications soon, but I have to make the case for profiles first. If you have this huge toolbox of all this stuff, then in theory you would have interoperability, but in practice there isn't going to be much interoperability, because everybody is going to implement different things. That is why MPEG defines profiles, which are the conformance points, as we call them, where you can test interoperability. A profile basically defines a tool set: I use these tools to encode my video, and these tools to encode my audio. The level within the profile then limits the complexity, with things like bits per second for video, the screen size for video, or the sample rate for audio. Those things are in the levels.
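A level check is mechanical once the limits are known. In this Python sketch the limit values are placeholders chosen for illustration, not figures taken from the MPEG-4 level tables, and Level and lowest_level are hypothetical names; the check covers video only and simply finds the least demanding level whose limits a stream fits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    name: str
    max_width: int
    max_height: int
    max_kbps: int


# Placeholder limits for illustration only; take real values from the standard.
HYPOTHETICAL_LEVELS = (
    Level("L1", 176, 144, 64),
    Level("L2", 352, 288, 128),
    Level("L3", 352, 288, 384),
)


def lowest_level(width, height, kbps, levels=HYPOTHETICAL_LEVELS):
    """Return the name of the least demanding level the stream fits, or None."""
    for level in levels:
        if width <= level.max_width and height <= level.max_height and kbps <= level.max_kbps:
            return level.name
    return None


print(lowest_level(176, 144, 48))   # L1
print(lowest_level(352, 288, 200))  # L3
print(lowest_level(640, 480, 500))  # None
```

A player built for a given level can then accept any stream that fits that level or a lower one, which is what makes the level a usable interoperability point.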
you take a look at the I estimating
internet streaming media alliance they
have said ok we're going to use what's
called advanced simple profile for for
the video we're going to use low
complexity AAC mpeg-4 AAC for the audio
and use the mp4 file format and that's
what we're all going to do and now i
have a within is ma I have an
interoperable stack and they add some
transport to that which is something
epic for doesn't define and then you
have the interoperability and it's
interesting to see that while mpeg-4 has
many profiles and i would say too many
and I'm partly responsible for them as
chair of the impact requirements group
it's very good to see that industry is
converging on just a few just a few
which means there is this
interoperability and and the ones they
are choosing are hierarchical so there's
the simple profile and advanced simple
profile which are mainly used in video
simple is what is now in quicktime
advanced simple is what it what is in
some of the more advanced and pick for
players and end coders and decoders but
they're they're compatible in the sense
that if you have simple content simple
profile content it will play in an
advanced temple player so that's good
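To make the profile and level idea concrete, here is a toy Python model: a profile is a set of coding tools, a level caps things like bit rate and picture size, and a stream conforms if it uses only the profile's tools and stays within the level. The tool names echo MPEG-4 Visual features, but every level limit below is a hypothetical placeholder, not a value from the specification; the sketch only shows why Simple content plays on an Advanced Simple player when the latter's tool set is a superset.

```python
# Toy model of profiles (tool sets) and levels (limits).
# Tool names are illustrative; all level limits are HYPOTHETICAL,
# not the values in the MPEG-4 Visual specification.
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    max_kbps: int
    max_width: int
    max_height: int


@dataclass(frozen=True)
class Profile:
    name: str
    tools: frozenset  # tools a conforming decoder must support
    levels: dict      # level number -> Level


@dataclass(frozen=True)
class Stream:
    tools_used: frozenset
    kbps: int
    width: int
    height: int


SIMPLE = Profile(
    "Simple",
    frozenset({"I-VOP", "P-VOP"}),
    {1: Level(64, 176, 144), 3: Level(384, 352, 288)},  # hypothetical limits
)
# Advanced Simple is modelled as a strict superset of Simple's tools.
ADVANCED_SIMPLE = Profile(
    "Advanced Simple",
    SIMPLE.tools | {"B-VOP", "quarter-pel", "GMC"},
    {3: Level(768, 352, 288), 5: Level(8000, 720, 576)},  # hypothetical limits
)


def conforms(stream: Stream, profile: Profile, level: int) -> bool:
    """A stream conforms if it uses only the profile's tools and stays
    within every limit of the chosen level."""
    lim = profile.levels[level]
    return (
        stream.tools_used <= profile.tools
        and stream.kbps <= lim.max_kbps
        and stream.width <= lim.max_width
        and stream.height <= lim.max_height
    )


simple_clip = Stream(frozenset({"I-VOP", "P-VOP"}), 300, 352, 288)
asp_clip = Stream(frozenset({"I-VOP", "P-VOP", "B-VOP"}), 700, 352, 288)
```

With these toy numbers, `simple_clip` conforms to Simple at level 3 and also to Advanced Simple at level 3, while `asp_clip` fails Simple at level 3 because it uses B-VOPs, a tool outside the Simple set.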
And that's why you can see that people are exchanging content, and DivX, for instance, is an implementation of Advanced Simple. We have a couple of profile dimensions; I think it gets too technical to go into real detail, but all the elements in MPEG-4 have been profiled. That's basically the point of this message. And actually, I don't think there are handouts at this conference, are there? So we'll make this available on the M4IF website. Can we do this? I think so, yeah. So we will make this presentation available on the M4IF website, and you can download it later.

Let's look at recent developments. This is a very interesting development. MPEG-4 as we know it today was standardized in 1998-1999; some stuff was added about four years ago. Very recently a new codec was added to MPEG-4; it's called Advanced Video Coding. While MPEG-4, as it was until this was added, was very attractive to mobile and the Internet, where there was no MPEG tool yet, it wasn't attractive enough yet to broadcast. In order to replace the MPEG-2 infrastructure, or to add something to the MPEG-2 infrastructure, you really need good advances in coding efficiency, and while MPEG-4 Advanced Simple Profile provides this, it wasn't enough for these major investments in the broadcast industry. It was enough for the Internet and for mobile and stuff. But this new codec, which is called Advanced Video Coding, originally comes from the ITU world, and they've been working on this for a long time. The first I knew of the project was maybe 10 years ago, basically, and
it's the same codec, standardized in ITU and in ISO. Basically, there are two groups in the world that work on video coding standardization: there's the Video Coding Experts Group in the ITU, the International Telecommunication Union, and there's MPEG in ISO. They came together, formed the Joint Video Team, the JVT, and standardized this new codec, which beats everything out there. So forget about what you hear from Microsoft: this is better, and this has been confirmed by independent parties like LSI Logic, for whom I have great respect. And interestingly, and I will say more about that, improvements will continue because of the fierce competition in this market. There's really fierce competition, and we've only standardized the decoder, so people will come up with amazing encoders, basically. This will give you about MPEG-2 broadcast-quality video at about 700 kilobits to one megabit per second. Now that's significant, because that starts to get into the range where you can do streaming over a broadband network, a very good DSL connection, or a good cable modem. It starts to get there. It's also good enough for people to think: okay, I'm investing in a new generation of set-top boxes now, maybe I should take a look at this new codec. It's also good enough for this to be implemented in mobile devices at some point in time. They already support the basic MPEG-4 codec, as I could call it now, and they will also start supporting this stuff.
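The quoted range of roughly 700 kbit/s to 1 Mbit/s for Advanced Video Coding can be turned into storage and link figures with simple arithmetic. This sketch assumes a constant bit rate and decimal megabytes; the 80% headroom factor and the link speeds in the usage note are hypothetical assumptions, not figures from the talk.

```python
def megabytes_per_hour(kbps: float) -> float:
    """Storage for one hour at a constant bit rate (1 MB = 10**6 bytes)."""
    bits = kbps * 1000 * 3600  # kilobits/s -> bits in one hour
    return bits / 8 / 1e6       # bits -> bytes -> megabytes


def fits_link(kbps: float, link_kbps: float, headroom: float = 0.8) -> bool:
    """Hypothetical rule of thumb: keep the stream under 80% of link capacity."""
    return kbps <= link_kbps * headroom


for rate in (700, 1000):
    print(f"{rate} kbit/s -> {megabytes_per_hour(rate):.0f} MB per hour")
# 700 kbit/s -> 315 MB per hour
# 1000 kbit/s -> 450 MB per hour
```

Under the assumed 80% headroom, a 1 Mbit/s stream would fit a hypothetical 1,500 kbit/s DSL line (1,200 kbit/s usable) but not a 1,024 kbit/s one (about 819 kbit/s usable).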
And it's amazing: apart from what Apple is doing with the conferencing stuff, using MPEG-4 for conferencing, there are a lot of people lining up to use Advanced Video Coding, or H.264, for conferencing. There are a lot of industries waiting to start using this codec, and even though Apple never discloses its product plans, not even to me, I'm sure that they're working on this codec and they'll have it ready pretty soon.

Then there's Advanced Audio Coding and High-Efficiency Advanced Audio Coding. Now, High Efficiency is a neat little trick: you split the spectrum in half, and then you predict the upper half of the spectrum, this is the upper half of the spectrum, from the lower half of the spectrum. And if you do near-CD quality, or really good audio, or just basic Internet quality, this gives you a lot, really a lot, of bandwidth savings: about CD quality at 48 kilobits per second, and general Internet quality at 32 kilobits per second. The trick doesn't work for transparent quality. So if you want really transparent quality, which is something that iTunes is trying to achieve, then you would still use normal AAC; you don't get anything from this prediction trick. It is really neat, and it's being used in XM Radio and in Digital Radio Mondiale, DRM, for their broadcasts,
because it works so well. And now AAC, including High-Efficiency AAC, has been tested as the best codec by the European Broadcasting Union, over all the proprietary codecs, and that's what's being shown here. This should actually say AAC, Advanced Audio Coding; this is the original. Then you see High-Efficiency AAC, which was tested in one specific implementation called aacPlus. Whoops. And then you see mp3PRO, which actually is MP3 with the same prediction trick applied. And you see Windows Media here and Real here, and Windows Media 9, as audio coding experts have explained to me, doesn't really differ a lot from Windows Media 8. So this was a test at 48 kilobits per second. Whoops, why does it do this if I don't want it? Well: 48 kilobits per second, done by the EBU, which is really independent, and this was a really professional test. Double-blind, which means people don't know what they're listening to. Audio testing is a lot like voodoo: if the experimenter knows what's being tested, he can make you believe anything, and if you know what you're listening to, you can also be made to believe anything. But if it's double-blind, and neither the experimenter nor the listener knows what's going on, then you get really valid results. That's what happened in this test.

A couple of other developments: there's some work going on on true 3D video coding, which is very advanced; some work going on on truly lossless audio coding, which is also for the high end; and some work going on on a very interesting animation framework, which actually takes a step back and says, okay, let's do this animation right, let's create an integrated framework for animation of all sorts of graphics content. It's not for video content or for natural audio, but for computer-generated content, and we'll see where that goes.
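The high-efficiency AAC trick described earlier, predicting the upper half of the spectrum from the lower half, is known as Spectral Band Replication (SBR). The toy sketch below shows only the core idea on a list of magnitudes: send the lower half exactly, describe the upper half with a few envelope values, and rebuild it by copying the lower half up and rescaling each band. Real SBR works on QMF subband signals over time and carries further side information, none of which is modelled here; the band size and the sample spectrum are arbitrary choices for illustration.

```python
import math


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


def sbr_encode(spectrum, band=4):
    """Keep the lower half exactly; describe the upper half only by a
    coarse envelope (one RMS value per band of `band` bins).
    Assumes an even-length spectrum."""
    half = len(spectrum) // 2
    low, high = spectrum[:half], spectrum[half:2 * half]
    envelope = [rms(high[i:i + band]) for i in range(0, len(high), band)]
    return low, envelope


def sbr_decode(low, envelope, band=4):
    """Rebuild the upper half by copying the lower half up and scaling
    each band so its RMS matches the transmitted envelope."""
    high = []
    for b, target in enumerate(envelope):
        seg = low[b * band:(b + 1) * band]
        level = rms(seg)
        gain = target / level if level > 0 else 0.0  # silence cannot be rescaled
        high.extend(v * gain for v in seg)
    return list(low) + high


# Upper half chosen as an exact 0.5x copy of the lower half, so the toy
# reconstruction is (numerically) exact.
spectrum = [8, 7, 6, 5, 4, 3, 2, 1, 4, 3.5, 3, 2.5, 2, 1.5, 1, 0.5]
low, env = sbr_encode(spectrum)
rebuilt = sbr_decode(low, env)
print(len(low) + len(env), "values sent instead of", len(spectrum))
# 10 values sent instead of 16
```

When the upper half is not a scaled copy of the lower half, the rebuilt bands only match in energy, not in detail, which is consistent with the talk's remark that the trick does not help at transparent quality.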
So why should you use MPEG-4, apart from the technical details? This is going into the business stuff, so I'll quickly go over it; if you're a developer it may interest you just a little bit less. But let's take a look at standards and why they make sense. They fuel a lot of innovation. And actually, for this slide I have to acknowledge Tim Schaaff, who originally made it; thanks for giving me this slide. Standards fuel innovation. GSM is a great example: the European, or actually worldwide, standard for mobile telephony. 802.11, also known under a different name here at Apple: you can connect to your wireless network here, and it just works great. They have a really long life. Look at the TV standards, like PAL in Europe and NTSC in the US, or MP3, which is actually over 10 years old now, but it's still a premium feature: if you buy a car stereo, MP3 comes at a price. It's being built into car stereos and digital devices today, and it will not go away; no matter how great the successors are, it will keep being supported. That means that as a consumer you don't have to throw away formats every year or every other year; you just keep your stuff and it keeps working. VHS has had a long life; the CD has been with us for over 20 years.
Standards create huge markets: the CD, the DVD, and MPEG-2 are really markets of tens, hundreds of billions of dollars. And they provide an interoperable ecosystem of tools and components, where you can just use stuff from different providers and plug it together, and it works. These different providers can work independently of one master, so to speak: you're not dependent on anyone, you're not locked into any single vendor, and that vendor may be competing with you, by the way, if it moves into different spaces. The pricing is controlled by the market and not by a single vendor, and if you don't like your equipment from one vendor, you can go to another one.

So if you use MPEG, MPEG-4, there are a couple of benefits for you. You can author your content once and then use it on many different platforms and players. Encode once; you may have to encode at all the different bit rates, but as was shown this morning, this can be made really easy. Your users can pick their favorite stuff; they don't have to stick to one player. Content providers, on the other hand, can pick their favorite stuff. Everybody can just provide tools in their own niche; there are a lot of different niches, and it isn't one-size-fits-all. And competition drives
the quality up. If you look at MPEG-4, it's both a revolution and an evolution. It's a revolution in what I explained about the design, how it works and how it can expand to synthetic content. It's an evolution in the sense that it doesn't define new transport protocols and stuff; you can just use it on whatever is already in place. And specifically, MPEG-4 as it is today, and MPEG-4 AVC, Advanced Video Coding, as it's coming now, can all be used in an MPEG-2 environment, which is a very big plus for broadcasters. Again, they don't have to replace all their MPEG-2 broadcasting stuff; they just need to plug in a new codec, which is difficult enough, but if the gains are good enough, then the economics are sound. So it saves you money and it makes you money, I believe. It saves money by making more efficient use of bandwidth, because it's efficient, and by letting you repurpose existing content, making it interactive or deploying it on a mobile network, with no need to duplicate work if you go to different networks. You can integrate it into existing MPEG-2 environments, and you can use it on IP networks just as easily. And it makes you money because you can use your content in your networks in new ways and add new dimensions to content. And there's little risk, because it's a standard that's widely supported.
Proprietary technology, on the other hand, does lock you into third-party business and pricing models, makes you dependent on their roadmaps, their plans, and the way they choose to evolve their business, and it can get you into channel conflicts.

So this is just one of the forecasts. This is chips: standalone MPEG-4 chips, and cores embedded in processors. They think it will explode, and it's already happening; I tend to agree. There are many similar forecasts, and one interesting trend is: for the coming few years, competition with Windows Media; after that, the standard will win, because the benefits are just so obvious that the market will choose the standard. And there are already so many people making MPEG-4 stuff; it's amazing.

This is an important point, and I want to dwell on it for a while: because MPEG-4 only standardizes the decoder, there's a lot of room for innovation. I've seen very bad comparisons, notably by Microsoft, that put QuickTime on one hand and their latest codec on the other, and then compare the quality without saying that they're only using Simple Profile for MPEG-4 and that they did the encoding themselves. I mean, there are a lot of tricks you can pull if you do quality comparisons.
But if I look at MPEG-2, and this is really the proof of the pudding, MPEG-2 bit rates have been reduced by over fifty percent over the lifetime of the standard, and this is an underestimate; I'll show you the graphs. This was after the standard was frozen, and without needing to replace the decoders. A great new encoder comes out, and it just gets plugged into the broadcasting system; you don't need to replace the set-top boxes, it just works. People come up with great new tools for encoding DVDs, and DVD players don't need to be replaced. The decoder is the same; the encoder gets better. And that's what's happening with MPEG-4 in the market today, and that's what will happen with Advanced Video Coding; it's already happening today. AVC, Advanced Video Coding, will beat all the proprietary codecs already out there, including Windows Media 9.

And if I look at this, that's interesting: you should disregard the numbers here, because they are wrong. I thought we fixed this stuff. Someone's phone is ringing. This should be six megabits per second when it started in 1994-1995; today you can deliver the same quality in 2 megabits per second. So it's not one megabit, as the slide suggests; it's 2 megabits per second. But still, from 6 megabits per second to 2 megabits per second without changing the decoders: that's quite impressive. And that's from
Harmonic. And if I take a look at what Tandberg says, Tandberg TV being a competitor of Harmonic, they basically tell you the same story, but then the graph should start at eight megabits per second. Whoops: this should have read eight, and here again it should have read 2 megabits per second. Something went wrong in the conversion from PowerPoint to Keynote, but the picture is clear: going from six, or from eight, megabits per second to two megabits per second today is a huge improvement, and it happened because there is competition. So this is open standards, it's interoperability, but there's a lot of room for competition.

So, briefly, let's look at the deployments of MPEG-4. We see PC media player support, and a recent survey on the M4IF tech notes mailing list turned up some 20 different players. Some of them are for facial animation and for 3D content; most of them do basic streaming.
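The point that only the decoder is standardized, so encoders can keep improving without replacing players, can be illustrated with a toy run-length format. The format is a hypothetical stand-in, not MPEG syntax: one fixed decoder, and two encoders that both emit valid streams, one more compactly. The last helper turns the MPEG-2 figures quoted in the talk (six or eight Mbit/s down to two) into percentage savings.

```python
def decode(stream):
    """The fixed 'standard': expand (count, value) pairs into samples.
    This toy run-length format is hypothetical, not an MPEG syntax."""
    out = []
    for count, value in stream:
        out.extend([value] * count)
    return out


def naive_encoder(samples):
    """Valid but wasteful: one pair per sample."""
    return [(1, s) for s in samples]


def better_encoder(samples):
    """Also valid for the same decoder: merges runs of equal samples."""
    stream = []
    for s in samples:
        if stream and stream[-1][1] == s:
            stream[-1] = (stream[-1][0] + 1, s)
        else:
            stream.append((1, s))
    return stream


def percent_reduction(old_mbps, new_mbps):
    """Bit-rate saving in percent."""
    return 100.0 * (old_mbps - new_mbps) / old_mbps


samples = [0, 0, 0, 5, 5, 1]
# Both encoders produce streams the unchanged decoder accepts.
assert decode(naive_encoder(samples)) == decode(better_encoder(samples)) == samples
print(len(naive_encoder(samples)), "pairs vs", len(better_encoder(samples)))  # 6 pairs vs 3
print(round(percent_reduction(6, 2), 1), percent_reduction(8, 2))  # 66.7 75.0
```

Both savings, about 67% and 75%, are consistent with the "over fifty percent" reduction claimed for MPEG-2.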
QuickTime 6 is there, of course. Real has a standard plug-in, which means if you hit MP4 content with a RealPlayer, it goes back to the Real server, downloads the plug-in if it's not already there, and then decodes the content; that's done by Envivio. There are several plug-ins for Windows Media. There's DivX, which is an MPEG-4-compliant implementation with millions of downloads weekly, and of course QuickTime. And MPEG-4 is widely supported in third-generation and 2.5G mobile phone networks. Like Roberto Castaignos said this morning, it really becomes the case that, in spite of all these different mobile networks, which are not really interoperable, you can take content from one of them in Japan, move it to Europe, and the content will play. So you can't take your phone to Japan, but you can send your content using the phones and it will play. MPEG-4 is used for video, AAC is the optional sound codec in addition to the mandatory speech codec, and the file format, like we said, 3GP, is very close to MP4; it's just this top-level atom, and the AMR codec. QuickTime 6.3, recently released, of course supports 3GPP.
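The remark that 3GP is very close to MP4 can be seen in the ISO base media file format both are derived from (itself based on the QuickTime file format): a file normally begins with an `ftyp` box whose major brand names the flavour, for example a `3gp…` brand for 3GPP or `isom`/`mp42` for MP4. The sketch reads that box from raw bytes; the brand list in `family` is a partial, illustrative selection, older QuickTime files may not start with an `ftyp` box at all, and `make_ftyp` exists only to build test input.

```python
import struct


def read_ftyp(data: bytes):
    """Parse a leading 'ftyp' box: 32-bit big-endian size, 4-byte type,
    major brand, 32-bit minor version, then 4-byte compatible brands.
    Assumes a plain 32-bit size (no 64-bit 'largesize')."""
    size, box_type = struct.unpack(">I4s", data[:8])
    if box_type != b"ftyp":
        raise ValueError("data does not start with an ftyp box")
    major, minor = struct.unpack(">4sI", data[8:16])
    compatible = [data[i:i + 4].decode("ascii") for i in range(16, size, 4)]
    return major.decode("ascii"), minor, compatible


def family(major_brand: str) -> str:
    """Rough classification by major brand (not an exhaustive list)."""
    if major_brand.startswith("3gp"):
        return "3GPP"
    if major_brand in ("isom", "mp41", "mp42"):
        return "MP4"
    if major_brand == "qt  ":
        return "QuickTime"
    return "unknown"


def make_ftyp(major: bytes, minor: int, compatible: list) -> bytes:
    """Build a minimal ftyp box, used here only to produce test input."""
    payload = major + struct.pack(">I", minor) + b"".join(compatible)
    return struct.pack(">I", 8 + len(payload)) + b"ftyp" + payload


header = make_ftyp(b"3gp5", 0, [b"3gp5", b"isom"])
brand, version, compat = read_ftyp(header)
print(brand, family(brand), compat)  # 3gp5 3GPP ['3gp5', 'isom']
```

Listing `isom` as a compatible brand is how a 3GP file can signal that generic MP4-style readers may also handle it.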
Then the Internet Streaming Media Alliance, which I said a couple of words about already, made a specification for interoperable MPEG-4 across the Internet. What's maybe more hidden in the background is that MPEG-4 is becoming the de facto standard for security and surveillance. There are a lot of surveillance cameras with hard disk recorders and stuff that just use MPEG-4, almost silently, because we don't get to hear a lot about them. And interestingly, you see a lot of home media centers that do MPEG-4: people use DivX to rip the content, then they put it on a DVD and put it in the DVD player, and the DVD player understands MPEG-4. These are just a couple of recent announcements. I already mentioned mobile, but there are chips, there are video cameras, there are solid-state video cameras. These are cool: it's just this size, basically, and you can record on an SD card or a Memory Stick or something. You can record half an hour of video and audio that's watchable on a TV. I won't say it's DVD quality, but it's perfectly watchable. Just another device this size. There's portable stuff, and more is coming, like these video jukeboxes that use MPEG-4. And of course there are mobile phones. They don't just decode; some of them also stream it, so it's not just recording. Some of them can even play it out while it's being recorded. It's pretty cool. So, lastly, I want to say a
couple of words about the MPEG-4 Industry Forum and, in that context, even though we're not responsible for it, about licensing of MPEG-4, because some of you may have... Who's heard about licensing here, by the way? Yes, right, that's right. Okay, so come back here Thursday morning. I won't be here, unfortunately, because we have our annual M4IF meeting, but someone will be here to explain.

So let me say a couple of words. We're three years old now, with about a hundred members worldwide, across industries, very much in line with MPEG-4's vision, and a nonprofit organization. We have these and many other members. Apple is of course there, and you see a lot of major companies, but there are also smaller ones. They come from IT, they come from the consumer electronics industry, they come from the mobile operators; they come from all across the globe and all across the industry, and they all believe in this single standard that works across everything. Our goal is to get MPEG-4 adopted, and we have done a couple of things that are important. We've discussed licensing a lot; again, I will say we're not responsible for licensing, and I'll clarify that in a later slide. We've done a lot of interoperability work, with a program of over 30 companies exchanging bitstreams between their products. We will have a logo program pretty soon. And if you type MPEG-4 into Google, you come to the M4IF website and get a host of information. And this is the membership, if you're interested: three thousand dollars for full membership and three hundred dollars for not-for-profits. I won't dwell on that too much.

This is important, though, and this is the last thing I'm trying to tell you, and then we'll have
questions: the licensing. A lot has been said about licensing; a lot of it has been true and a lot has been false, by the way. But the responsibilities are as follows. MPEG standardizes: MPEG, the Moving Picture Experts Group, makes the standards, and by ISO rules they can't really deal with licensing. For the new codec, though, there's been a lot of effort to get a royalty-free baseline codec; it's the simplest incarnation, it's a profile, and there's been a lot of effort to try to keep it royalty-free. Then there's the MPEG-4 Industry Forum, which has done a lot of work to get licensing off the ground, but it doesn't see anything of the proceeds and doesn't require anything of its members with respect to licensing. It's just, literally, a catalyst. If you know how a catalyst works: before and after the chemical reaction, the catalyst isn't changed. Well, if the reaction goes really bad, the catalyst may go away; that may still happen. But then there are the licensors, the people that actually have patents, who sit together in some room, decide, and sell licenses. And what M4IF says is that the licenses need to be competitive: it should be possible to build competitive products given the licensing. So that's what we're still working on right now, and we're working really hard right now to get this right for AVC, because we know some things need to be improved.
There, Thursday morning, Larry Horn, I think, of MPEG LA will be here to answer your questions about licensing; he's going to tell you what's called the truth about MPEG-4 licensing. My personal opinion is that it's great for devices, it's great for phones, but it doesn't work yet for content providers. That's what we're working on right now, and there's a lot riding on this, I can tell you. Hey, I only expected this slide, Amy.

As Rob pointed out, there is the session Thursday morning with Larry Horn, which is probably of particular interest. [unclear] Is it MPEG LA, Larry Horn, that's coming? Okay. I wish I could be here, but we have the M4IF annual meeting; we need to elect a new board and all that sort of stuff, which is interesting too, but I wish I could just be here. So, some questions? [unclear] A great session. Questions?