WWDC2016 Session 715

Transcript

[ Music ]
>> Good afternoon, and welcome
to the Accelerate Session.
My name is Eric Bainville,
and I'm with the Core OS,
Vector and Numerics group.
Our group provides
performance libraries for CPU.
Performance libraries means we provide very compute-intensive functions like matrix products, Fourier transforms, these kinds of things.
Most of these functions are inside the Accelerate framework. You will find, for example, vImage, which is an image processing library. It will convert between image types and also do geometric transformations on images.
Next to vImage you will
find vDSP, which is more
for signal processing,
Fourier transforms, and other signal processing functions.
Then we have the BLAS, which is a linear algebra library.
It's very old.
It comes straight from the 70's.
A few years ago we added SparseBLAS, for sparse computation on vectors and matrices.
We also provide LAPACK and LinearAlgebra, which is a higher-level linear algebra library.
Outside of Accelerate, we also
have a few libraries, like simd,
which is a set of headers providing you vector types and direct access to the vector instructions and vector units in the CPUs, without writing intrinsics or assembly code.
We also have compression,
which we introduced last year
for lossless compression,
and everything we
provide is optimized
for all the CPUs we support. So when you get a new phone, we optimize the code for it so you don't have to do it.
Alright. Today, first, we start with a quick refresher on compression. Then we'll introduce two new libraries inside Accelerate:
BNNS, which is a set of
low level compute functions
for neural networks;
and Quadrature,
which is a small
library dedicated
to numerical integration
of functions;
and then my colleague,
Steve, will come on stage
and introduce new
additions to simd.
Well, first, before we start,
let me tell you quickly
how to use Accelerate.
So depending on your language of
choice, you will have to import
or include the declarations of the functions,
and then you need to link
with the Accelerate framework.
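For reference, a minimal sketch of that setup in C or Objective-C (the command-line link flag in the comment is just one way to do it):

    // One include pulls in all of Accelerate: vImage, vDSP, BLAS,
    // LAPACK, and the new BNNS and Quadrature headers.
    #include <Accelerate/Accelerate.h>

    // The target also has to link the framework, for example:
    //   clang main.c -framework Accelerate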
So from Xcode, in your project settings you will navigate to your target, and then you click on Build Phases, and when the window appears, you click on Link Binary With Libraries. You open that. Alright. Then you click the little plus here,
and you will get a list
of all the libraries and
frameworks you can link with,
and Accelerate is the first one.
[ Applause ]
Yes. Alright.
Well, if it's not the first one,
there's a search
bar on top of it.
So if you look for compression,
you type compression,
I knew we'd get compression.
Alright. So that's it; now you're using Accelerate.
OK. Yes. Then compression. Remember last year when Sebastian introduced LZFSE at the Platform State of the Union; we still don't know who these guys are, but today we're announcing that LZFSE is open source. So you will find it on GitHub, and we publish it under a BSD license.
Just let me remind you why
you will want to use that.
So this is a performance comparison between LZFSE and zlib, compared with the same options.
So we are 1.4x faster for encode
and 2.6x faster for decode.
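For reference, a minimal sketch of what an LZFSE encode looks like in C with the compression library (buffer management and error handling are kept to the bare minimum here):

    #include <compression.h>
    #include <stdint.h>

    // Compress src_size bytes of src into dst.
    // Returns the compressed size, or 0 on failure.
    size_t lzfse_compress(uint8_t *dst, size_t dst_capacity,
                          const uint8_t *src, size_t src_size)
    {
        // Passing NULL lets the library allocate its own scratch buffer.
        return compression_encode_buffer(dst, dst_capacity,
                                         src, src_size,
                                         NULL, COMPRESSION_LZFSE);
    }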
Alright. That was
our compression.
Now let's jump to BNNS.
Basic neural network subroutines.
So the name looks
really like BLAS.
BLAS means basic linear
algebra subroutines.
As I said, it comes
straight from the 70's,
and BNNS provides a set of very low-level computation routines. I will get into the details later.
We do only low level
computations
like matrix products but
dedicated to neural networks.
Before I enter into
details of what it does
and what the API is,
let me remind you how a neural network works.
Let's say we have this
network, which is trained
to recognize animals, OK.
So on input you will
have an image,
and then you have
this big orange box.
Inside the orange box, the red box represents the weights. These are the parameters of the network, and on output, this thing will output four values, which are the probabilities of having a cat, a dog, a giraffe, or a snake.
OK. So first you
need to train it.
Let's say you have a cat image.
You pass the cat image through the network, and you get an answer, the one with the highest probability. It says dog. Well, that's why you need to train it. It was a cat, so what you do is you take the correct answer and slightly modify the weights. So the network now will go slightly in the cat direction, and you need to do this millions of times with a lot of images, and at some point, you will have a trained network answering correctly on every request.
So if it was a giraffe, it says giraffe, because we trained it properly.
Alright. So that's
what the network does,
and notice the difference
between training and inference.
During inference, we don't change the weights. So usually, let's say you have this as an app. What will happen is that you will do the training offline: when you build your app, you will do your training, and then when you ship the app, you will ship the weights and the network topology, and what happens on the device is only this inference part.
OK. What's inside this
orange box, though?
So let me show you an example.
If you have been to the What's New in Metal, Part II session yesterday, you have seen a bigger example: that was a scene recognition network, and lots of demos use recognition networks; these are more advanced networks with hundreds of layers. This is state of the art from five years ago, and this describes a digit recognition network.
So on input you have the small
image, which is a capture
of something written, and then
it goes to different layers.
Here you have a 5
x 5 convolution,
and the output will be
a stack of five images.
That's an output of five
different convolutions
with different weights, and
then you add another layer.
You take these five images,
apply a lot of convolutions
to them, and you will
get another bigger stack
of fifty images with
5 x 5 pixels.
So what's happening here is
you go from the image space
into the feature space
one layer at a time.
So the content of these small images becomes more and more abstract, and at the end what you have are features which will tell you that's a zero, that's a one.
This is what you want on output.
OK. And so with a very
small number of layers,
you can really do that,
and it works really well.
So after these convolutions, we take these 5 x 5 x 50 values, take them as a single big vector and apply a fully connected layer, which is just a big matrix product, and it will mix all these values together and produce a set of 100 values. That's called the hidden layer in this specific model, and then you need a last one, which will take these 100 values, mix them together, and produce the ten outputs you want. So at this point you are fully in the feature space: each value here is a probability of being one specific digit.
Alright. So that's what
is inside the network,
and as you have seen here,
we have two different kinds
of layers, and this is
what we do inside of BNNS.
We provide the compute
part of these layers.
OK. Before we start talking about what we really compute and what the API is, let me show you some numbers. So here we compare against Caffe, which is a well-known package for neural network computation, and this is the convolution part of Caffe.
So we have 14 different
convolution sizes.
Here, you can read them.
And this is the time Caffe takes; actually, that's the speed, so higher is better. This is Caffe on these convolutions, and this is what
you get with BNNS.
So on average, it's 2.1x faster,
and if you have even
bigger convolutions,
you can even reach
almost four times faster.
Alright. So that
was all the numbers.
Now let me tell you
what is inside BNNS.
So that's a set of low
level compute functions.
Very similar to BLAS.
That's why we named it BNNS.
It doesn't really know what a neural network is. That's your side of the equation. What it does is provide really only the compute part of it. OK. And it only does inference. Actually, I'm not sure it would make sense to run training on the device. That's too expensive. You need millions of images and millions of computations. That wouldn't fit. So usually training would be done offline, and as I told you, you will just do the inference part in your app.
And we provide three
different types of layers.
Convolution layers,
pooling layers,
and the fully-connected layers.
Why? Well, actually, in a modern network, you spend more than 75 percent of the time computing convolutions, and then the next ones on the list are pooling layers, with something like 15 percent, and fully-connected layers also take a lot of time, but usually you find them only at the end of the network, like we have seen in the example where we had only two fully-connected layers at the end, but these did take a lot of time.
OK. Now that we know what
is inside, I will enter
into the details of the three different kinds of layers
and what we compute and how to
create them through the API.
Let's start with the
convolution layers.
So this is a convolution.
It takes an input image and a block of weights, the orange matrix in the middle, and each pixel in the output image is computed by taking a block of input pixels, multiplying them by the weights, and then you take the sum of that, and you get your output pixel.
And you need to do that
for every output pixel.
So if you count, that's
a four dimensional loop
because you need
to loop on x, y,
and then on the kernel
dimensions.
Actually what we do is slightly more complex than that, because what we have on input is not just one image. We have a stack of images. So we need to duplicate the weights, and now what we compute is this convolution on each layer, and then we take the sum of all of these, and we get our output pixel. So now the loop is five-dimensional. I added the IC index, for input channels, to the equation, and really this is still not what we compute, because we also have a stack on output. So what we really compute in this convolution is this. We do this many times, one time for each output layer, and now we have a six-dimensional loop.
That means that even if the dimensions are very small, like in this example, where nothing is bigger than 264, when you multiply all this stuff together, what you get is billions of operations. That means tens of milliseconds of compute, and in an entire network, which is much bigger, you can have trillions of floating point operations.
That means seconds of CPU time.
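To make those six loops concrete, here is a naive reference convolution in plain C (no padding or stride, dense float arrays; this is only to show the loop structure, not how BNNS implements it):

    // out[oc][y][x] = sum over ic, ky, kx of
    //     in[ic][y + ky][x + kx] * w[oc][ic][ky][kx]
    void naive_convolution(const float *in, int in_channels,
                           const float *w, int k_height, int k_width,
                           float *out, int out_channels,
                           int out_height, int out_width)
    {
        int in_height = out_height + k_height - 1;
        int in_width  = out_width  + k_width  - 1;
        for (int oc = 0; oc < out_channels; oc++)      // 1: output channel
        for (int y = 0; y < out_height; y++)           // 2: output row
        for (int x = 0; x < out_width; x++) {          // 3: output column
            float sum = 0.0f;
            for (int ic = 0; ic < in_channels; ic++)   // 4: input channel
            for (int ky = 0; ky < k_height; ky++)      // 5: kernel row
            for (int kx = 0; kx < k_width; kx++)       // 6: kernel column
                sum += in[(ic * in_height + y + ky) * in_width + x + kx]
                     * w[((oc * in_channels + ic) * k_height + ky) * k_width + kx];
            out[(oc * out_height + y) * out_width + x] = sum;
        }
    }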
OK. That's the compute in the convolution layer; now how do you create that with BNNS?
Well, first thing
you need to do is
to describe your input stacks.
So you need to specify the
dimensions of the image,
the number of channels, and also how they are laid out in memory. So the increment between two rows and the increment between two layers, and also a very important thing, what type is used to store them. For example, we usually work with float, 32-bit float. On neural networks, we don't need all this precision, and usually people will use 16-bit float, and that's good because the storage is half of it. So instead of having 20 megabytes, we have only 10, and if you can go to integer, eight-bit integer, you will have only five megabytes. And you usually don't lose any accuracy in the output. So you can specify the type used to store the input.
You need to do the same for the
output stack, and then you need
to describe the convolution
itself.
That's the kernel dimensions, the padding, which is a band of zeros added around the input, the stride, which is the increment of the loop in x and y, and then, again, you need to repeat the channel counts for input and output and specify the weights. That's the orange block in the middle. And, again, you can have a different storage type for the weights, and usually you want 16- or 8-bit storage for the weights, again because it will lower the memory usage and the storage needed to store them.
Then once you have done that,
you can create a convolution
filter with this function.
You tell it, OK, this is my input stack, that's my output stack, that's my convolution; create a filter, and you will get a filter object which you can then use to apply the convolution on your data, and once you are done with it, you will call the destroy filter function to get rid of it and release the resources it uses.
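Sticking with the digit example, a sketch of that in C follows; the sizes are illustrative, the weight and bias pointers are placeholders for your trained parameters, and the full set of descriptor fields is in the BNNS.h header:

    #include <Accelerate/Accelerate.h>

    // 5 x 5 convolution taking one 28 x 28 float image
    // to 20 maps of 24 x 24 (stride 1, no padding).
    BNNSFilter make_conv_filter(const float *weights, const float *bias)
    {
        BNNSImageStackDescriptor in_desc = {
            .width = 28, .height = 28, .channels = 1,
            .row_stride = 28, .image_stride = 28 * 28,
            .data_type = BNNSDataTypeFloat32
        };
        BNNSImageStackDescriptor out_desc = {
            .width = 24, .height = 24, .channels = 20,
            .row_stride = 24, .image_stride = 24 * 24,
            .data_type = BNNSDataTypeFloat32
        };
        BNNSConvolutionLayerParameters conv_params = {
            .k_width = 5, .k_height = 5,
            .in_channels = 1, .out_channels = 20,
            .x_stride = 1, .y_stride = 1,
            .x_padding = 0, .y_padding = 0,
            .weights = { .data = weights, .data_type = BNNSDataTypeFloat32 },
            .bias    = { .data = bias,    .data_type = BNNSDataTypeFloat32 },
            .activation = { .function = BNNSActivationFunctionIdentity }
        };
        // NULL filter parameters means use the defaults.
        return BNNSFilterCreateConvolutionLayer(&in_desc, &out_desc,
                                                &conv_params, NULL);
        // Later, when you are done: BNNSFilterDestroy(filter);
    }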
This was for the
convolution layers.
Now, let's go on with the pooling layers. Pooling is slightly simpler than convolution. So to compute one output pixel, what you do is you take a block of input pixels and take the max value or the average, and that's your result, and you do this for all pixels in all channels.
That's what the formula says.
Well, again, to create a filter for the pooling layer, you need to describe the input and output stacks the same way as before, and you also need to describe the pooling layer itself. Again, with the kernel dimensions, padding, stride, and this time you don't have weights, but you specify which function to use to compute the output. That will be max or average.
And then once you have done
that, you can create the filter,
and you get a filter
object similar
to the one we had before.
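A sketch in the same style for a 2 x 2 max pooling filter (again, the sizes are just for illustration):

    #include <Accelerate/Accelerate.h>

    // 2 x 2 max pooling with stride 2 over 20 maps of 24 x 24,
    // producing 20 maps of 12 x 12.
    BNNSFilter make_pool_filter(void)
    {
        BNNSImageStackDescriptor in_desc = {
            .width = 24, .height = 24, .channels = 20,
            .row_stride = 24, .image_stride = 24 * 24,
            .data_type = BNNSDataTypeFloat32
        };
        BNNSImageStackDescriptor out_desc = {
            .width = 12, .height = 12, .channels = 20,
            .row_stride = 12, .image_stride = 12 * 12,
            .data_type = BNNSDataTypeFloat32
        };
        BNNSPoolingLayerParameters pool_params = {
            .k_width = 2, .k_height = 2,
            .in_channels = 20, .out_channels = 20,
            .x_stride = 2, .y_stride = 2,
            .pooling_function = BNNSPoolingFunctionMax   // or ...Average
        };
        return BNNSFilterCreatePoolingLayer(&in_desc, &out_desc,
                                            &pool_params, NULL);
    }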
The last kind of layer we support is the fully connected layer. Well, it's called fully connected, but that's a matrix product hidden here. So on input you have a vector, and then you will multiply it by the matrix, add a vector of biases, and then you get the output. So it's just a matrix product.
So this time, you don't have images; your input is a vector. So you need to describe the vector: that's its size, a pointer to the data, and also the type you use to store these values, and, again, you can use 32 or 16 bits, floating point, or integer. And then you need to describe the layer itself through the dimensions of the matrix and the matrix coefficients. And also the bias; it's not on the slide, but you have a bias. And then you can create the fully connected filter, and you get, again, a filter object similar to the others.
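And a sketch for a fully connected filter taking the 5 x 5 x 50 = 1250 values of the example down to 100 (weights and bias are placeholders):

    #include <Accelerate/Accelerate.h>

    BNNSFilter make_fc_filter(const float *weights, const float *bias)
    {
        BNNSVectorDescriptor in_desc  = { .size = 1250,
                                          .data_type = BNNSDataTypeFloat32 };
        BNNSVectorDescriptor out_desc = { .size = 100,
                                          .data_type = BNNSDataTypeFloat32 };
        BNNSFullyConnectedLayerParameters fc_params = {
            .in_size = 1250, .out_size = 100,
            .weights = { .data = weights, .data_type = BNNSDataTypeFloat32 },
            .bias    = { .data = bias,    .data_type = BNNSDataTypeFloat32 },
            .activation = { .function = BNNSActivationFunctionIdentity }
        };
        return BNNSFilterCreateFullyConnectedLayer(&in_desc, &out_desc,
                                                   &fc_params, NULL);
    }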
Now that we have these filters, how do we apply them? So you have your input data, your output data, and your filter, and you have two functions to apply them. It's called filter apply if you have only one pair of input and output, and if you have several of them, you will call filter apply batch, and you tell it the number of pairs you have and how to get from one pair to the next one. That's a stride in memory.
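Putting that together, applying a filter looks roughly like this (pointer names and sizes are placeholders, and the exact units of the batch strides are documented in BNNS.h):

    #include <Accelerate/Accelerate.h>

    // n inputs of 28 x 28 floats, n outputs of 20 x 24 x 24 floats.
    void run_convolution(BNNSFilter conv, size_t n,
                         const float *inputs, float *outputs)
    {
        // One input / output pair at a time:
        int status = BNNSFilterApply(conv, inputs, outputs);

        // Or all n pairs at once; the strides tell BNNS how to get
        // from one input (or output) to the next.
        status = BNNSFilterApplyBatch(conv, n,
                                      inputs,  28 * 28,
                                      outputs, 24 * 24 * 20);
        (void)status;   // a real caller would check that status == 0

        // Release the filter when you no longer need it.
        BNNSFilterDestroy(conv);
    }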
Alright. And that's it for BNNS.
Let me wrap up.
So BNNS is a set of low-level compute functions for neural networks.
Really low level.
We do the compute.
We do it well.
We do it fast, but it doesn't
know what a neural network is.
It just does the compute.
So we optimize it to be fast and energy efficient, and, important thing, it works with multiple data types.
OK, that was it for BNNS.
Now Quadrature.
We have had requests from people asking us for a library to do numerical integration of functions. So here it is. Yeah, remember your school days.
So this is computing
the integral
of a function between a and b.
So that's a green area
between the curve and the axis.
Well, so to do that, you first
need to describe your function.
So you need to provide
a callback.
One thing we did, which is different from the old libraries doing the same thing, is that the callback takes a number of points to evaluate. Because usually when you compute the integral, you will have to evaluate the function at many points, and if you have a vectorized callback doing that faster, it's all good. You can go much, much faster with this multiple-x callback. So it will pass you a number of values in x, and you will have to fill each value of y with f of xi.
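For example, a vectorized callback for f(x) = exp(-x*x) might look like this in C (the callback shape, taking an array of points at once, is the one the Quadrature header asks for):

    #include <math.h>
    #include <stddef.h>

    // Called with n points at a time: fill y[i] with f(x[i]).
    static void gaussian(void *arg, size_t n, const double *x, double *y)
    {
        (void)arg;                      // no user data needed here
        for (size_t i = 0; i < n; i++)
            y[i] = exp(-x[i] * x[i]);
    }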
So that's your function,
and then you need
to tell it how to integrate it.
So we provide a set of three different integration methods with different complexity and time, and also some of them are able to integrate to infinity; you will find the details in the Quadrature header. And you also need to specify what error you want on the output, and the max number of intervals we can subdivide into to compute the output.
And then you pass that to the integrate function, and you also tell it a and b, and you pass a pointer to receive the error. It's called estimated error here, and it will return you the estimated error on the output and also a status, where you receive the status of the computation, because if you ask for a very, very low error, sometimes it's not able to converge, and you will get that in the status.
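Putting it together, a sketch of the whole call in C; the field names here are the ones I recall from the Quadrature header, so check Quadrature.h if anything does not match:

    #include <Accelerate/Accelerate.h>

    // Integrate the gaussian callback above from 0 to 1.
    double integrate_gaussian(void)
    {
        quadrature_integrate_function f = {
            .fun = gaussian,            // the vectorized callback
            .fun_arg = NULL
        };
        quadrature_integrate_options options = {
            .integrator = QUADRATURE_INTEGRATE_QAGS,   // adaptive method
            .abs_tolerance = 1.0e-8,
            .max_intervals = 20
        };
        quadrature_status status;
        double estimated_error;
        double value = quadrature_integrate(&f, 0.0, 1.0, &options,
                                            &status, &estimated_error,
                                            0, NULL);
        // status tells you whether the requested error was reached.
        return value;
    }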
And that's it for Quadrature.
And now let me call
Steve on stage.
He will tell you about
new additions to simd.
>> Thanks very much, Eric.
My name's Steven Canon.
I work in the Vector and Numerics group with Eric.
Eric took you back to
calculus just then.
I'm going to take you
ahead again a little bit
to linear algebra right now.
We have this nice module, simd,
which provides geometric
operations and vector operations
for C, Objective-C, C++, and Swift.
And it really closely mirrors
the Metal shading language.
It ties in closely with
SceneKit and Model I/O
and all those graphics
libraries.
If you find yourself writing
vector code to do, you know,
small, you know, 3 x 3,
4 x 4 kind of linear
algebra operations,
this is the library you probably
want to be using instead
of writing that by hand.
We have most of what
you want available.
If something's not there,
ask us to provide it.
It's all really fast
and there it is.
So what's there now?
We have a bunch of types. There are vectors of floats and of doubles, and we also have signed and unsigned integers, of lengths 2, 3, and 4.
And we have matrices of floats
and doubles of those same sizes.
And this is just what's
available in every language,
in C and C++ and Objective-C.
There's a bunch of other types
as well which are really useful
for writing your own generic
vector code, but I'm going
to focus on sort of the common
subset of what's available
on all the languages on all the
platforms in our talk today.
We have operations on those
types, too, obviously.
There's all the usual
arithmetic operations
for vectors and for matrices.
And we have all the familiar
geometry and shader functions
if you've done any
shader programming before.
Most of what you would want to use is going to be available here.
I'll show you a little
example now.
So this is the same
function written three times
in three different languages.
We have it in Objective-C up top, we have it in C++ in the middle, and we have it in Swift on the bottom,
and you can see that the
boilerplate is a little bit
different between
the languages just
because function
declarations work differently
in these languages,
but if we focus
in on the actual computation
part, it's essentially the same
in all the languages, and it
also really closely mirrors the way you would just write this naturally in math. So you don't have a lot of weird function calls. You don't have to write for loops, all that stuff. You just write your code in this natural, fluent style,
and we translate
all that for you.
So it's nice and easy
to write, and this looks just
like the Metal code you would
write to do this as well.
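Written by hand in C with the simd types, that computation is just a couple of lines (a sketch; as the next point says, you would not actually write this yourself):

    #include <simd/simd.h>

    // Reflect the incident vector x around the unit normal n:
    //   r = x - 2 * dot(x, n) * n
    static vector_float3 my_reflect(vector_float3 x, vector_float3 n)
    {
        return x - 2.0f * vector_dot(x, n) * n;
    }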
Now it happens that the reflect function is already built into the library. So you don't need to write this yourself. As for calling functions between languages, like the whole bunch of stuff in Model I/O that exposes Objective-C APIs that take these types, the situation is pretty good.
The vector types are
compiler extensions in C,
Objective-C, and in C++.
And in Swift, they're
defined as structs,
but the compiler knows how
to map between them for you.
So you don't really
need to do anything.
Here's a simple example.
If I have an Objective-C
function,
I call some function here
that operates on vector types,
and I want to call that function
from Swift using the
Swift vector types,
I can just do that,
and it just works.
I don't need to do
anything fancy.
These types have
the same layout.
So there's no cost to converting
or anything like that.
Similarly, for matrices,
the Swift matrix types are
layout compatible with the C,
Objective-C, and C++ types.
So if I need to work with them, here I'm going to create a Swift type from the C type.
All I need to do is
use the init function.
Works fine.
Doesn't have any
computational cost.
It just sort of changes the
types silently for me, and,
similarly, I can use the C matrix property if I need to get a C type to call a C, Objective-C, or C++ function.
So we have a few things
that are new this year
that I want to show to you.
We have three new functions
- simd orient, simd incircle,
simd insphere - and these are
overloaded to support a bunch
of different types and different
lengths and things like that.
Basically everything in the
simd library works that way.
So even though we have
just three new functions,
it's actually a lot
of new stuff.
So I'm going to start
with orient.
What orient does is it lets us
answer the question is a set
of vectors positively oriented?
And what that means,
if you don't remember
your linear algebra,
is do they obey the
right-hand rule.
You might remember that from physics, or, equivalently, is their determinant positive. If there are any math majors in the audience, you're objecting right now that a set of vectors does not have a determinant. What I mean here is just that you kind of take them, and you slam them together to make a matrix. Take the determinant of that matrix. Is that positive or not?
So why do we care about this?
This is kind of simple
and stupid.
You can use this to answer a lot
of computational geometry
questions that are very useful.
Like, is the triangle facing
towards me or away from me?
If you think about a
tetrahedron, there are two faces
that are pointing towards you,
and there are also two faces
on the backside that
face away from you.
And if you're doing graphics
operations, it's useful to know
which ones are facing
towards you
because those are the ones
you want to operate on.
Similarly, if I have
a line and a point,
and I want to answer the
question is the point
on the line, or if it's not,
which side of the line is it on.
I can use the orient predicate
to answer that question.
Now this seems kind of
simple, and it is simple except
that it can be very hard to
actually answer this question
when the point is close to the
line, and I'm going to talk
about that more later.
So here's a simple
example of that.
I'm going to have a plane over on the right-hand side of the display here, and I'm going to put some points on that plane.
So I have three points,
a, b, and c, and I'm going
to query the orientation
of those with simd orient.
Now because we go
counterclockwise as we go from a
to b to c, there, we say that
these are positively oriented.
That's what it means
to be positively
oriented in the plane.
If we move one of the points so
that that order is clockwise,
now it's negatively oriented,
and if I move the point c
so it's exactly on the
line between a and b,
then they're co-linear
and the orientation is 0,
or you might say
it's degenerate.
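In C that query is a single call (a sketch; this is the new three-point, two-dimensional overload of simd_orient):

    #include <simd/simd.h>

    // Positive: a, b, c go counterclockwise; negative: clockwise;
    // zero: the three points are collinear.
    static float orientation(vector_float2 a, vector_float2 b, vector_float2 c)
    {
        return simd_orient(a, b, c);
    }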
Now it's very hard to tell in
general if a point is exactly
on the line especially with
floating point coordinates.
And that's because orientation
is numerically unstable.
So because of floating point
rounding, if the result of this determinant is very nearly 0,
you can easily get the wrong
sign, and for some algorithms,
that doesn't matter, but
for other algorithms,
you can actually
fail to converge,
or you may get the wrong result
when you're working with this.
For collision detection
and things like that,
it can actually be quite
important to be able
to answer this question.
Also for things like triangular
mesh generation for models
and things, this is
an important question
to be able to answer exactly.
So let's look at an example where this is actually hard to answer.
I'm going to put two vectors
in the plane, u and v,
and these are almost
identical vectors, right.
They differ
by the smallest amount
they could possibly differ.
And I've magnified
them enormously
on the right-hand side.
So you can see that
they're actually different.
If I drew these to
scale, they would overlap entirely.
And if we compute the
orientation in the way
that you would normally compute
this, we get a result of 0
because of floating
point rounding.
Now this is a simple
example where we get 0.
With dimensions bigger
than 2 x 2,
we can actually get the
wrong sign entirely for this.
So it's not just that it could
be 0 when it shouldn't be.
But if we use the
simd orient function,
we get a tiny positive number, which is the right result.
These are positively oriented.
And I should point out here: don't interpret this tiny positive number as meaning anything. It is not the determinant. It is sometimes the determinant, but sometimes it's just going to have the right sign. So what we're really interested in here is the sign of this number.
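The contrast with the textbook formula looks like this in C (a sketch; u and v stand for the nearly parallel vectors on the slide):

    #include <simd/simd.h>

    // Textbook 2 x 2 determinant: a single rounding error can flip
    // the sign, or collapse it to zero, when u and v are nearly parallel.
    static float naive_orient(vector_float2 u, vector_float2 v)
    {
        return u.x * v.y - u.y * v.x;
    }

    // The library's predicate computes the same sign robustly, so only
    // its sign should be interpreted, not its magnitude.
    static float robust_orient(vector_float2 u, vector_float2 v)
    {
        return simd_orient(u, v);
    }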
How are we able to do this?
So the geometric predicates that I'm showing you today use something called adaptive precision.
We go and compute as many
bits as we need to compute
to get the right result,
and this lets us return
the right result very,
very fast in most cases,
but if we need to go off
and do the exact computation
to give you the right answer,
we will do that. So you can just trust that this gives you the right answer in your code, and you don't need to worry about that yourself.
Incircle is very similar.
We take three points
in the plane.
That determines the circle.
You can notice that
they're positively oriented
around the circle here.
That's important.
And if I put a point
in the middle, x,
then simd incircle can
tell me if it's inside.
In that case, I get
a positive result.
If it's on the circle, I get 0.
And if it's outside the
circle, I get a negative result.
And insphere is exactly
the same thing.
Just in three dimensions now.
I need four points to determine the sphere.
I give it my point x,
and I get a result.
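In C the two tests look like this (a sketch; I am assuming the query point x comes first in the argument list, so double-check the declarations in simd/geometry.h):

    #include <simd/simd.h>

    // Positive: x is inside the circle through a, b, c (taken
    // counterclockwise); zero: on the circle; negative: outside.
    static float in_circle(vector_float2 x, vector_float2 a,
                           vector_float2 b, vector_float2 c)
    {
        return simd_incircle(x, a, b, c);
    }

    // The same test against the sphere through a, b, c, d in 3-D.
    static float in_sphere(vector_float3 x, vector_float3 a,
                           vector_float3 b, vector_float3 c,
                           vector_float3 d)
    {
        return simd_insphere(x, a, b, c, d);
    }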
I'll show you an example I talked about earlier: figuring out if a triangle faces towards you or away from you. Here I've got a really simple struct to represent a triangle in Swift. A triangle is determined by three vertices that I keep in a collection here. And I have this predicate, isFacing, to tell me if the triangle is facing towards the camera or not.
So the usual way you
would compute this is
to compute a normal to the
triangle with the cross product,
and then take the dot product
of that with the vector
to the camera, and
if that's positive,
then the triangle
faces the camera.
We can simplify this code
a lot and make it correct
by just using the
simd orient predicate.
So my code is simpler,
it's fast,
and it gives me the
right answer.
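The same idea in C might look like the sketch below; the Triangle struct and the camera parameter are placeholders, and the second version assumes the four-point, three-dimensional overload of simd_orient:

    #include <simd/simd.h>
    #include <stdbool.h>

    typedef struct { vector_float3 v0, v1, v2; } Triangle;

    // Hand-rolled version: a normal from the cross product, then the
    // sign of its dot product with the direction to the camera.
    static bool is_facing_naive(Triangle t, vector_float3 camera)
    {
        vector_float3 n = vector_cross(t.v1 - t.v0, t.v2 - t.v0);
        return vector_dot(n, camera - t.v0) > 0.0f;
    }

    // The same question asked with the robust predicate: are the
    // three vertices and the camera positively oriented?
    static bool is_facing(Triangle t, vector_float3 camera)
    {
        return simd_orient(t.v0, t.v1, t.v2, camera) > 0.0f;
    }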
These are all things I like, and that's what we try to do in Accelerate across the board: give you simple things for complex mathematical calculations.
So we showed you a bunch
of new stuff today.
We have some new
libraries entirely.
We have BNNS for
neural networks.
We have Quadrature, and we
also have some new features,
orientation and incircle
in simd.
Every one of these features and libraries was added in response to a request that we got from developers.
So we really want to hear from
you guys about what you need,
what we can add to make your
computational workloads easier.
We want to give you
simple interfaces
that let you do what you
need to do efficiently.
We've also done a ton of
other stuff this year.
In vImage, which Eric
mentioned briefly in passing,
we have a whole set
of geometry operations
for interleaved chroma planes.
This is absolutely one
of the most requested features
we've had in recent years.
So we're really excited to have that.
If you don't know what that
is, don't worry about it,
but if you know what it is,
you understand why it's useful.
And we also have expanded support for new formats in the vImage conversion routines.
This underlies a lot
of the deep color stuff
that you may have heard about.
So this is really important for that.
We have improved the performance
for interleaved complex
formats in vDSP.
With the FFTs, we support both interleaved and planar layouts, where the real and imaginary parts are either separated or put together. Planar layouts are what we really prefer to operate with, and we recommend you do that, but if you happen to have interleaved data, you can just use the FFTs now, and they'll be really fast.
We've also improved
the performance
of all the Level
II BLAS operations.
Some of that's motivated by
the BNNS stuff you saw,
and some of that's just
opportunity that we saw to go
after that, and there's a
ton of other stuff that's new
in Accelerate, lots
of improved stuff.
Every time new processors
come out, we make sure
that we're tuning for that.
We want to take care
of all those low-level
computational details
so you can focus on writing
high-level algorithms that just sit on top of that,
and you can do the
job you need to do.
In summary, we want to
be your single-stop shop
for computational algorithms
where we give you
implementations
that are correct, they're fast,
and they're energy efficient,
and we're going to keep
tuning them for new hardware
as it comes along so you don't
need to worry about that.
If you want to do that tuning yourself and we ship a new processor, then you're going to need to update for that, but if you let us handle that for you, then you don't need to worry.
Keep the feature requests coming.
We love to get them from you.
We talked to a bunch of you
in the labs earlier today.
We got a bunch of great feature requests already. You can file Radars. We'd love for that to happen.
If you want more information,
the link for this talk is here.
I also encourage you to check
out previous years' talks
where we've gone into more
detail about other aspects
of the library that are useful.
There were two great
Metal sessions
that I really highly
recommend everyone check
out from yesterday,
especially if you're interested
in this kind of stuff.
Thanks very much for
coming out everyone.
[ Applause ]