Transcript
[ Silence ]
>> Good morning and welcome.
Thank you.
[ Applause ]
Thank you.
My name is Quinn Taylor, I'm an
Internal Applications Engineer
at Apple and I'm excited
to be talking today
about Designing Code
for Performance.
It's great to see such a
big crowd here, obviously,
you're all interested
in performance
as well and that's fantastic.
So, to start off, I want
to give you a little bit
of introduction, the motivation
for this talk, why
it came to be.
It's no secret that the raging
success of the App Store has led
to a huge influx of new
developers to the platform.
As you've heard Tim
say on Tues--
on Monday, about two-thirds of
you are here for this conference
for the first time,
that's fantastic.
Many of you have in fact come
to programming quite recently.
You come from a diverse
array of backgrounds
and experience levels
and that's great.
You contribute a lot of different
kinds of wonderful apps
to the App Store
that our customers love.
However, across all these
domains and different types
of applications, one
thing that's constant is
that everyone uses data
structures and it's common
to have performance issues.
No matter what your app does,
you're going to be using arrays
and dictionaries and so on and
it's important to understand how
that can affect your
application.
When you have issues like this,
often they're puzzling unless
you have some knowledge
of what's going on under the
hood and what's happening
so you can evaluate
your app and improve.
So, the goal of this session is
really to teach you how to fish.
I'm not here to give
you a one-off tip
about how you can fix
this specific problem,
but really to look at the
grand scheme of things,
understand, when it comes
to data structures,
how to make your app
perform the best it possibly can.
OK. So the things we're going
to cover today, first off,
when to focus on performance,
how to evaluate what we call
computational complexity,
how to choose and use data
structures in your application,
and last, how to design
your code for performance.
We'll give some concrete
examples.
So to start off, when to focus
on performance, and you think,
is it a trick question, isn't
the answer always, right?
That's true to a certain extent
and we will talk
about that in detail.
To start off with a quote,
I have this on my wall,
this is from Steve Jobs,
"We don't get a chance to do
that many things, and everyone
should be really excellent.
We've all chosen to do
this with our lives.
So it better be damn good.
It better be worth it."
I love this quote
because it motivates me
to make my applications the
best that I possibly can and try
to eke performance
out of every angle.
Notice that he says,
"Everyone should be
really excellent,"
not necessarily perfect.
We as developers know that
our applications have flaws.
We do the best that we can
to make them as polished
as possible before
we hand them over,
but we have a limited
amount of time.
Steve also said, "You
have to pick carefully.
Innovation is saying
no to 1,000 things."
So how do we strike
the balance here
when it comes to performance?
You may have seen a
quote from Donald Knuth,
a famous computer
scientist where he says,
"Premature optimization
is the root of all evil."
People tend to throw this
around on developer forums
and some people will use this
almost as an excuse to say,
"You don't need to worry about
performance at all, right?
It's not going to
be a big deal."
Sometimes that's the case.
But unfortunately, we usually
only see this middle sentence
of the quote.
But he said a lot more,
he said, "We should forget
about small efficiencies
about 97 percent of the time.
Yet we should not pass
up our opportunities
in that critical 3 percent."
So there comes a
time when focusing
on performance is really
important for your application.
I like to summarize this by
saying, "Optimize performance
when it will make a
meaningful difference."
That's what we're going
to be talking about today:
how to see whether something
is going to make a difference
for your application.
So, to talk about that, there's
a principle called Amdahl's Law
which is really about
helping you pick your battles
in performance.
So this law was proposed
by Gene Amdahl,
a very famous computer
architect, and basically,
it has to do with predicting
the maximum improvement
that you can expect by speeding
up some portion of your code.
This depends obviously on
what percentage of the time
that code is taking to
begin with, which determines
what kind of speedup you can achieve.
And the payoff is much larger
for the dominant piece of code,
the thing that's
taking the most time,
which appears fairly obvious.
So the question is,
will the payoff
of improving the performance
of this piece
of code be worth the effort,
the time that it's going
to take me to do that?
And this applies
directly to concurrency.
In fact, as this law was
originally stated, it had to do
with multi-core processing
and breaking your code
up into multiple pieces, and
that has great tie-ins as well
to Grand Central
Dispatch and using blocks.
So let me give you an
example of Amdahl's Law.
Say that you have a process
that has two segments A and B
and one takes 80 seconds and
one takes 20 seconds, all right.
So we have a certain
amount of time.
Now, if you can spend a
bit of time and optimize
and cut the time spent by
process B-- segment B in half,
then you have a great win.
Now you're at 90 percent of
your previous running time.
However, if you can apply
the same effort to speed
up process A and cut
that time in half,
now you've gone to 60 percent.
This is a much bigger win
and this is what we talk
about when we say
identifying bottlenecks,
looking at the actual
problems in your code.
So that's where you really
want to focus on performance.
You know, it's great to have
a performance win in segment B
but everyone has things
that they have to do.
So you got to choose
wisely what you work on.
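To put numbers on that, here is the usual statement of
Amdahl's Law applied to the 80/20 example above; this is a
sketch of the math, not a formula shown in the session:

    // Amdahl's Law: if a fraction p of the total time is sped up by a factor s,
    //   overall speedup = 1 / ((1 - p) + p / s)
    // Segment B: p = 0.20, s = 2  ->  1 / (0.80 + 0.10) ~= 1.11x  (100 s becomes 90 s)
    // Segment A: p = 0.80, s = 2  ->  1 / (0.20 + 0.40) ~= 1.67x  (100 s becomes 60 s)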
So back to Donald Knuth,
he talked about premature
optimization.
Now, premature optimization
generally leads
to unnecessary complexity
in your code.
You take something
that's simple and to try
to get more performance out of
it, you change and tweak things
and make it more
complex and clever.
Sometimes that's great.
But if it ain't broke,
don't fix it.
You don't have to fix something
that's not necessarily a problem
for your code.
You have a ton of features
that you need to implement
for your app, and polishing to do
to make it great for your users.
So I'd like to focus on
something I call informed design
which leads to elegant
and efficient code.
Informed design is all
about considering your
performance early on,
even during the design phase.
You can do that before you
even write a line of code.
And it helps you intelligently
avoid problems
that you will actually
face in the real world,
rather than prematurely
optimizing
and fixing a problem
that may not be
such a big problem after all.
And this is useful because it
can help you avoid designing
slowness into your application
only to fix it later.
So if you can think about
it upfront, it's a big win.
OK. So now let's move
to talking about how
to design for performance.
If you have a performance
issue that you've identified
in your code, there's
three broad ways
that you can resolve that.
The first is, don't do it.
If there's unnecessary work,
then you can completely
cut it out.
The second is do it as rarely as
possible, and the third is do it
as efficiently as possible.
Now, these are generally in
increasing order of difficulty.
It's very easy to just remove
code that you no longer need,
but getting something to
be more efficient can be
really difficult.
In order to answer these
questions and choose
which approach works
for your scenario,
you really have to
have some context.
Is the work necessary?
If not, I may be
able to remove it.
Is redundant work being done?
Doing the same thing
over and over again,
I may be able to
reuse that work.
Or is there a more
efficient way?
And that last question is
really the most tricky.
How do I know if I can do better
than what I'm doing right now?
So, to answer that question,
we're moving to the next portion
of the talk, where we talk
about computational
complexity and cost.
Now, these are some big words
so I'll explain them for you
in kind of broad terms.
So we're talking about the
cost of code, and by this,
I don't mean how much you
pay for an app in the store
or how much it cost to you
to have people develop it.
Every piece of code takes
some amount of time to run
because there's work being
done, and it's obvious
that more work takes more time.
However, it's not
necessarily obvious
that sometimes you can
have really short code
that does a lot of work and it
can hide some complexity, right?
There's a cost associated
with a particular piece of code,
perhaps an API call and so on,
or a library you're using.
Now, the effects of data
growth can vary quite a bit
depending on the algorithm
you choose,
and they can really impact your
performance, particularly
as your data size grows.
The effect may be
disproportionate to the number
of objects that you add.
And consequently, small
tests often won't turn
up these kinds of problems.
We do-- we try to
do as much testing
as we can before we send
our app out into the world,
but you've probably
all experienced
that your customers will use
your app in new and exciting
and sometimes terrifying ways
and throw a lot of data at it
in ways that you
didn't anticipate.
And sometimes these
performance issues will crop
up where you least
want them to.
OK. Fortunately, this kind of
complexity can often be analyzed
without even running the code.
Much like what the Xcode
static analyzer does.
And the key to this is
understanding the complexity
of the code that you're running
and how much work is being done.
Now, Computer Science has entire
semester courses devoted to this
that leave sophomore
students really puzzled.
We don't have time for
an entire semester,
but I'm going to give you
kind of a crash course
and help you have a framework
for understanding complexity
and analyzing your code.
So, in Computer Science, we talk
about something called
"Big O" notation.
So this is a way of ranking
algorithms by efficiency,
generally by their
time efficiency,
or memory efficiency, and so on.
And the letter O stands for
the order, the order of growth,
and we'll talk
about this in detail.
And it really relates to how the
performance changes with the scale
of the work that you're doing,
when you increase the number
of objects you're
working with, for example.
So, with Big O notation,
what we're really seeking
to do is approximate the worst
case behavior for an algorithm,
the most work that you
might ever have to do.
Ideally with an algorithm, you
may be able to skip out early
and so on if you're doing
searches and that type of thing.
But Big O is concerned with
how bad could this possibly be?
So then that's my absolute
worst case of performance.
And in general, we
ignore coefficients
and lower order terms
and logarithmic bases
because what we really
care about is the order.
Whether it's n or n
squared as we'll go
into detail in just a moment.
Now, for any given task,
it's important to realize
that there are inherent limits.
There are some things
that just take time.
We'd like to do everything
instantly
but that's not always possible.
A perfect example of
this is from Fred Brooks,
a famous software engineer:
"Nine women can't make
a baby in one month."
No matter how many women
you assign to it,
you'd like to do
it concurrently,
it would be much more
convenient for them,
but it's simply not possible.
There's a hard limit
on this, right?
So we're going to talk
about Big O complexity.
And I want to identify
for you a few key
and common order functions.
So I have a table here, in
the first column, notation,
you'll see how you
might see it written
and these are pronounced
Big O of 1 or order 1,
order log n, order n and so on.
You might also hear it referred
to by the name constant time,
logarithmic time, linear
time, quadratic time,
and so on 'cause
they're interchangeable.
And then the third column
provides some examples
of common algorithms
or tasks that fall
into these different categories.
I won't go into them in
detail here on this slide
but this is great for
reference to come back and see
where your task actually fits.
And we'll talk a little bit
more in detail about identifying
which of these your
task belongs to.
Now, a few of these
are most common.
You'll see this all over the
place and they're very common
for algorithms that
you'll be working with.
So I'll talk about
these in detail.
So first, before I get
into some specific details,
let me show you a comparison
of these order functions.
So it's easy to see
them in a table
but maybe it doesn't hit
home why this is important.
First off, this is what order n
squared complexity looks like,
like a parabola, order n
log n, order n, order log n,
and order 1 or constant time.
Now, the first thing that
you can see is these diverge
really rapidly.
When you get to a
large number of items
across the bottom scale,
some of these get really steep
and shoot their way
up towards the top.
That means your performance
is going to be bad.
You're doing that
much more work, right?
The other thing to
notice is that down
at the bottom left corner,
they're really close together.
So for small scale, it
may not be necessary
to find a really
optimal algorithm.
This would be a range where
you want to watch
out for premature optimization.
Am I really going to
have an issue here,
when they're all
pretty close together?
So those are some
things to be aware of.
So, I'm going to go through some
examples of a few sample pieces
of code that have
some of these complexities.
And remember, the key for this
is that we want to look
at the growth of
the amount of work as the size
of the input
increases.
So first is a very simple one.
Let's say that we have
an array of integers
and I have a function where
I pass in an integer value
and an index, and I simply want
to know does this value
appear at that index.
Now you could see we're
only doing one operation
and you may know that indexing
into an array is extremely fast.
So, in this case, it's
order 1 regardless
of what index I choose,
it's going to return
nearly instantly.
So if we have a sample array,
you can see I can query one index
and say that that
value is not there.
Here I found it,
and there I haven't.
So I don't have to
look at anything else
in the array, just one spot.
So that makes this order 1.
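Here's a minimal sketch of that order 1 check, assuming a
plain C array of integers and Foundation types; the function
name is illustrative, not the slide code:

    BOOL valueAppearsAtIndex(const NSInteger values[], NSInteger value, NSUInteger index) {
        // One indexed access and one comparison, no matter how big the array is: O(1).
        return values[index] == value;
    }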
If we change this
up just slightly,
now we have an order
n algorithm.
So, now I'm interested in seeing
if a particular value
appears anywhere in an array,
not just at given index.
In order to do that,
because I don't know
where it would appear, if it's
there at all, I have to look
through each index in the array.
So we have a for loop where
we start at the beginning
and we go towards the end
and then we test to
see if it's there.
If at any point I
find it, I return yes.
But the worst case is that it's
not there in the array at all.
And I have to check
through every single index
in order to determine that.
So, if you could see here,
I have to march one by one
down the array until
I find the object
or determine that
it's not there.
So in worst case, order n.
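A sketch of that linear search, under the same illustrative
assumptions as the previous snippet:

    BOOL arrayContainsValue(const NSInteger values[], NSUInteger count, NSInteger value) {
        for (NSUInteger i = 0; i < count; i++) {   // worst case visits every index: O(n)
            if (values[i] == value) {
                return YES;                        // may bail out early, but Big O is the worst case
            }
        }
        return NO;
    }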
For order n squared, let's
say we have a similar array,
I give it a value and I want
to say, does this value appear
at least twice anywhere
in this array?
In order to do that, I have
to compare every two possible
pairings of indexes in the array
to see if they're equal.
So what that means is that
I have to do two for loops.
I have an outer for loop and
an inner for loop and each
of these is an order n algorithm.
And inside this for loop once
I get to comparing two indexes,
you see, I compare each of these
two to see if they're equal
and if at any point I find them,
I can bail out and say yes.
And you can see that we're
doing a lot more work here.
j is moving back and forth like
a madman and i is just trying
to keep up, and finally
here at the end,
we find those two
matching values.
Now in the worst case, there
may be no values at all.
These two values may not
be present in the array
or the value may not be present
twice in the array but we have
to go to the very
end to find that out.
Now, some of you
may look at this
and realize this is a
rather naive implementation.
I'm starting the j loop
at zero every time.
So, for example, when I get
to the very end of the array
and j starts at the beginning,
I've already compared that one,
I was at the beginning
and j was at the end.
So that's kind of silly
they have to do that.
So instead you could choose
to start the inner for loop
at i plus 1 for example
and you eliminate half
of your comparisons.
That's great, that's
a great win.
So now we're at
n squared over 2,
but it's still n squared growth.
So the key to this is,
although it's now much faster
for a given input
size, it's still going
to have the same
growth properties:
as you add one more object or
one more item to the array,
you're going to add
n comparisons.
That's the important thing
from this, so order n squared.
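Here's a sketch of that pair-comparison loop, with the inner
loop already starting at i plus 1; it's illustrative code
checking whether any value appears at least twice:

    BOOL arrayContainsDuplicateValue(const NSInteger values[], NSUInteger count) {
        for (NSUInteger i = 0; i < count; i++) {           // outer loop: order n
            for (NSUInteger j = i + 1; j < count; j++) {   // inner loop: half the comparisons, still order n
                if (values[i] == values[j]) {
                    return YES;
                }
            }
        }
        return NO;                                         // worst case still grows as n squared
    }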
OK. So, calculating complexity.
We've seen that you can
combine these order functions.
For the n squared example I just
showed, we have two for loops
and each of those has order n
and we have nested complexities,
we multiply those together,
n times n, we get n squared.
When you have sequential
complexities,
we add those together and
then we take the largest term.
So for example, this function
reduces to n squared.
We have one n squared
call, two that are order n, and
three that are constant time, but all
we care about is the n squared,
nothing else, because that's
the growth of this function.
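As a sketch of those rules, nested loops multiply, sequential
steps add, and only the largest term survives (illustrative
code, assuming Foundation types):

    void processValues(const NSInteger values[], NSUInteger count) {
        NSInteger total = 0;                                // O(1)
        for (NSUInteger i = 0; i < count; i++) {            // O(n)
            total += values[i];
        }
        for (NSUInteger i = 0; i < count; i++) {            // nested loops multiply: O(n^2)
            for (NSUInteger j = 0; j < count; j++) {
                total += values[i] * values[j];
            }
        }
        NSLog(@"%ld", (long)total);
        // O(1) + O(n) + O(n^2) -> keep the largest term: O(n^2)
    }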
OK. So you see that
was source code.
However, sometimes you may not
have access to the source code
or not be aware of what work
it's doing, and in such cases,
you can often estimate.
You can consider what the
code would have to do,
what kind of work it's
trying to accomplish,
and another complementary
alternative is
to profile with Instruments.
This can be a great way to find
performance issues in your code.
There are many sessions
from previous years
about using Instruments,
and in fact,
Xcode 5, as you saw earlier this
week, introduces new debug gauges
including memory and CPU
that can help you
with identifying
these kinds of issues.
Now, some APIs may
appear to be very similar
but it's important
not to confuse them
when you're estimating, not to
just look at the name and say, "Oh,
I know what that is"
because of the name.
For example, we have
a containsObject API
on both NSArray and NSSet.
But it would be folly to
assume that they're the same.
So let's look at NSArray first.
With containsObject,
we want to see
if a given object appears
anywhere in the array.
What work does it do?
Well, in this case,
the documentation actually
tells us explicitly
that it sends the isEqual:
message to each object
in the array until one
of them returns true
or it reaches the end
of the array.
OK. So it goes to each object.
That certainly sounds
like order n complexity
and that's certainly
a valid assumption.
I think that that's
pretty reasonable.
Now, we may think, well, what
if I can enumerate
these concurrently
and compare the objects?
That would be sort of a win.
And it would, it would reduce
it by a constant factor
but it would still
be a linear operation
with the size of the array.
OK. So let's look at NSSet.
NSSet also has containsObject,
with the same signature.
I give it an object,
it does the same thing.
It tells me whether it's
in the collection or not.
It looks just like the one
that we just saw in NSArray.
Is it also order n?
You might assume
that it would be.
However, it's actually order
1, it's a constant time.
Regardless of the
size of the set,
containsObject is always
really fast, that's great.
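For example, the call sites look identical even though the
cost is very different; a sketch with illustrative data:

    NSArray *names = @[@"Tim", @"Steve", @"Gil"];
    NSSet *nameSet = [NSSet setWithArray:names];

    BOOL inArray = [names containsObject:@"Gil"];    // order n: isEqual: against each object
    BOOL inSet   = [nameSet containsObject:@"Gil"];  // order 1: hash-based lookup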
There's a caveat: please don't
go start replacing NSArray
with NSSet everywhere in your
code just because it's fast.
There's context to
understand about why is this
and if it will work for you.
So, if it's faster,
it must be doing less work.
It's doing the same amount
of work, roughly speaking,
regardless of how
large my collection is.
So let's talk about that.
The reason for this is
something called hashing.
We'll talk about
hash-based organization.
So, NSSet uses a hash table for
storage, as does NSDictionary.
Every object has a
deterministic hash value.
You may have seen
the hash method.
This is defined on NSObject,
which we'll talk about in detail.
And it returns an unsigned
integer value that you could use
to store this object
in a hash table.
And equal objects should
always have the same hash.
We'll talk about this in more
detail and give examples.
In a hash table, objects
are grouped into "buckets",
parts of the table, based on
the hash that they provide.
And the structure has a hash
function that takes a hash
and maps it to a given bucket.
It can decide that this range of
hashes goes in the first bucket,
this one in the next, and so on;
it can divide this up.
And the goal of this
hash function is
to achieve uniform
distribution so that you have
about the same number of objects
in all the buckets
in the hash table.
When you achieve this, lookup in
a hash table really only has
to consider the objects that
are in a particular bucket.
It knows that if an object is
in the collection it will be
in this bucket, and it can check
isEqual: on only a few objects
or none at all if there's
none in the bucket.
This is an obvious win.
If you have an array that's a
thousand objects, you may have
to check up to a thousand items
in order to see if it's there.
But with the set, you may
have to check only a handful.
This is great.
So let me give you a concrete
example with an animation here.
Say we have a hash function and
we have a hash table we're going
to store some objects in.
Let's say that we want
to store the names
of all Apple CEOs
past and present.
So let's start off
with Tim Cook.
We ran this through our hash
function and it determines
that Tim Cook by the hash
belongs in bucket 2, great.
So we set him there.
We have Steve Jobs and
that gets determined to go
in bucket 0, great, no problem.
OK. Now we're going
to add Gil Amelio
and our hash function
decides that it goes
in the same bucket
as Steve Jobs.
We have a collision
in our hash table,
rather unfortunate
and somewhat awkward.
[laughter] In order to see if
we should insert this object,
we have to compare
so we check isEqual.
And in fact, we see that
Steve Jobs is not equal
to Gil Amelio, thank goodness.
And so what happens
is we move one aside
and we chain this together,
they both occupy the
same bucket, great.
Now we have Michael Spindler,
no collision here,
goes into bucket 5.
We have John Sculley, and
once again we get a hash
table collision.
So the hash function decides
it goes in the same place.
So once again, we compare
each of these objects
to see if this is equal.
They are there not equal so we
move them aside and we stick
that in the same bucket.
So, this is all well and
good except you'll notice
that now the majority of my
objects are all in one bucket
in the hash table and half of my
hash table is completely empty.
I'm not using it.
This is obviously not optimal.
If I have some sort of simple
query to see if someone is
in the list for example,
and my hash function happens
to navigate to the same bucket,
I now have to check through each
of those objects to see if
they're equal to Bill Gates
to determine no he's
not in the table.
All right, so this is
definitely not optimal.
Fortunately, something
like NSSet can choose
to optimize the hash
function on the fly.
If it realizes that your
performance is not as good
as you'd expect, it reserves
the right to be able
to move objects around, to
change the hash function
and redistribute them evenly.
So here we're in a
much better situation
and our lookups are restored
to being really fast.
OK, great.
So how can you participate
in this?
Defining identity for your
object is how you take part
in this hash-based lookup.
So NSObject as I mentioned
defines an isEqual method
and a hash method and they're
functionally equivalent
to comparing the pointers,
the object addresses
for these two objects,
and returning just
the address itself.
Now, Apple-provided subclasses
of NSObject will override
this as necessary.
For example, NSString has its
own hash, NSDate, NSNumber
and so on, there's a variety
that do this, and in fact,
arrays, NSSets themselves also
have their own hash and isEqual.
Custom objects that you
create should choose
to override these methods
if pointer equality is not
sufficient for your needs.
For example, say that you
have two person objects
and you want them to
be considered equal
if the Social Security Number
is the same for both of them,
something to that effect.
And you can learn a
lot more about this.
This will be in the slides
for reference afterward
about object comparison and
implementing these methods.
OK. There are a few rules
of the road if you're going
to implement these
on your own object.
If isEqual returns
true for two objects,
then the hash must be the
same for both of these.
This is critical because
if you have two objects
that should be the same but
have two different hashes,
they could get put in
different places in a hash table
and you won't be
able to find them
when you're looking for them.
You'll get incorrect results.
However, the same hash value
does not necessarily imply
that isEqual must be true.
You can have two objects
that have the same hash value
but isEqual will
be the tie breaker.
If you decide to
implement isEqual,
you really should also define
hash for this very reason
so that you can guarantee
that they are the same.
Now, a good hash as you're
implementing one should
minimize collisions.
A poor hash will really cause
your performance to tank.
All right, for example,
in the worst case,
let's say that you're
like, "I don't know
about this hash thing, I'm
just going to return one
for all of my objects."
They all have the same hash
which means they're all going
to go to the same
bucket in the hash table
which means you've now turned
an NSSet into an NSArray
and killed your performance.
That's the opposite
of what we want.
So we want to be careful
about that and we'll talk
about that in just a moment.
One other key important thing is
that when you have hash lookup,
your hashes really should
be stable and predictable.
If an object's hash changes
while it's
in a collection,
it's really bad.
You're not going to
be able to find it.
So there are two options really.
The first is not to modify
an object in a collection.
If you have a mutable string
and you've put it in as the key
or as an object in a set, for
example, and you change
that string, then the hash
will change and you're going
to have all sorts of hilarity
and hair-pulling ensue.
The second option is to not
base your hash on mutable state.
This may be an option for you,
so that's something
to consider as well.
So let me show you
sample implementation.
You all have the WWDC app
on your phones and iPads
and there is a news object;
as you see the news
updates throughout the week,
we have these news
objects to represent them.
This is a simplified
representation.
Here we're just looking as
the title and the time stamp.
So I want to focus on
these two implementations.
Now, for hash, you see that we
need to return some hash value
and we're just returning the
hash that NSString provides us
from this title property.
However, if we have
two news items
that may have the same title
and thus have the same hash,
we can break the tie
by calling isEqual:
and comparing both the
title and the time stamp.
We also check to see
if it's the same object.
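Here's a minimal sketch of that pair of methods; the class
and property names are illustrative, not the actual WWDC
app source:

    @interface NewsItem : NSObject
    @property (nonatomic, copy) NSString *title;
    @property (nonatomic, strong) NSDate *timestamp;
    @end

    @implementation NewsItem
    - (NSUInteger)hash {
        return self.title.hash;   // equal items must return equal hashes
    }
    - (BOOL)isEqual:(id)object {
        if (self == object) return YES;                        // same object, trivially equal
        if (![object isKindOfClass:[NewsItem class]]) return NO;
        NewsItem *other = object;
        return [self.title isEqualToString:other.title] &&     // break hash ties by comparing
               [self.timestamp isEqualToDate:other.timestamp]; // both title and timestamp
    }
    @end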
So, there are a lot of
examples of things like this
in the documentation
that I encourage you
to check out afterward.
OK. So next I want
to talk to you
about data structure
performance,
kind of applying
what we've learned.
We've got the more theoretical
part out of the way,
about Big O complexity
and hashes and so on.
So I'm going to talk
to you a little
bit about choosing
data structures
and the performance
characteristics
of them.
So let's start with an analogy.
Say that you have books
that you need to organize.
There's a lot of different ways
that you could choose
to organize books.
Say you could sort them
by topic or author,
title, even size or color.
You can choose an arbitrary
characteristic and choose
to sort them that way.
Everyone does this with
their CD collections,
though iTunes has
solved that problem now.
But everyone has their
own preferred way
to sort things and
organize them.
However, each of these options
makes some things really easy
and others difficult.
Say that you've chosen to
organize alphabetically
by author and then
later you want
to find all the New York
Times' best sellers,
or you want to find children's
books, those are scattered all
over in your collection and
it's not easy to look them
up when they're sorted
by author.
So this is something
to be aware of.
It's also important to plan
for scale when appropriate
as you're choosing
your data structures.
For example, it's a
completely different thing
to organize a small bookshelf
here where you can say,
"Go find the small green book"
and instantly you've all
found it because it's small
versus organizing a
really large library.
Things that will work at small
scale may not necessarily work
when you get to large
amounts of data.
You have to have a
more intelligent way
to organize things.
So, choosing a strategy,
every data structure
has some tradeoffs.
It has things that
it's best at and things
that it's not as well suited for.
And you want to use the one
that best fits your needs.
When you choose a data structure
that's not the best fit,
it really hurts your
performance, for example,
using a hammer to
drive in a screw,
not the ideal tool
for the situation.
Even if you have the right tool,
you may not be using
it optimally.
And if you have a hammer and
a nail and you just press
down really hard, as hard as
you can, it's not going to be
as effective as whacking
it really hard, right?
So, it's really important
to choose the right thing
for your needs and
use it correctly.
And we encourage
you to always prefer
to use the built-in
API whenever possible.
The collections that
are available
in the Foundation framework
are extensively tested.
They've been around for years.
We use them just as much
as you do and we want them
to be really, really fast
just as much as you do.
This also-- using the built-in
frameworks also guarantees
that in the future when we have
improvements such as you'll see
in OS X Mavericks and in iOS 7,
that you will also get
those improvements for free.
Your apps will suddenly
become faster which is great.
So, getting down to actually
choosing a data structure,
there's a few things you
have to know for context.
The first is knowing
what you need.
Is it important that my
objects have an order?
Will I have duplicates
in my collection?
Do I have to be able to add or
remove objects, have it mutable?
What operations are
most critical
for me to be really fast?
What am I going to be
doing with my structure?
And even where my data comes
from and goes to, for example,
you may choose differently
if your data will come
from user input or from a
property list file or Core Data
or JSON over the web or so on.
There's certain things that
may change your decision.
And the second thing is
to know what to expect
from your collection, which
we'll go into in detail.
There are certain
things that apply broadly
across all of
our collections.
For example, getting
the count of items
in a collection is
always really fast.
It's constant time.
We cache that for you because
that's a common operation.
Enumerating through
all the objects
in a collection is generally
linear time because you have
to visit each of the objects.
Even if you have a set or
dictionary, where lookup is fast,
going through
each one is going
to take the same amount of time.
And other operations will
vary by the collection.
So, before we go over
individual data structures,
I want to share a
note about mutability.
The best guideline with
mutability, we got a lot
of questions about this, is
to use it when you need it,
when it's semantically
correct, when you're adding
and removing objects
from a collection.
And immutable collections
do have benefits.
If you don't need mutability,
it can be to your benefit
to use an immutable
collection instead.
Foremost among these
benefits is thread safety.
If a collection is
immutable such as an NSArray,
you know that no other thread
can add or remove objects
and you don't have to worry
about exceptions or crashes.
Also, there are memory and speed
optimizations that are possible
with immutable collections
because we know they're not
going to change in size.
We can allocate just the
exact right amount of memory
for the objects that
you've provided.
Now, another option is to make a
collection immutable afterwards.
Make a copy of it that you
can use for quick reference
and get those benefits,
but still be able to build
up your collection
mutably first.
And lastly, if you do use
mutability, it's often great
if you can help us to help you.
You know how your
collection is being used.
We design it for a broad
range of different uses
and you know specifically what
you're going to do with it.
And there are times you can
provide hints that help us get you
the best performance.
One example is initializing
a collection
with initWithCapacity.
Now, what this does is
gives us kind of a hint
about how many objects
you're likely to store
in a given collection.
So for example, if you say "I'm
only going to store three items
in this mutable array
and add and remove them",
you can provide that
capacity upfront
and we can optimize
it accordingly.
Now, don't-- It's not wise to
try to outsmart the collections
and ask for a really
large capacity.
The collections are
really finely tuned
and optimized to
take care of that.
And even if you do provide
the capacity hint upfront,
the collection can
dynamically grow and shrink
as objects were added
and removed.
So this just provides us an
upfront way to kind of get
in the ballpark of
what you need.
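A quick sketch of that capacity hint, with illustrative
values:

    NSMutableArray *recentItems = [[NSMutableArray alloc] initWithCapacity:3];
    [recentItems addObject:@"newest"];
    // The capacity is only a hint; the array still grows and shrinks as needed.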
OK. So I'm going to go
on a quick tour of some
of the data structures
in Foundation,
the most valuable players.
I'll give you a brief
overview and talk
about their performance
characteristics.
First, NSArray and NSMutableArray.
You're all familiar with these.
It's a workhorse of
the Cocoa library.
It's an ordered collection.
It allows indexed access and
you can have duplicate objects
in an array.
Operations that are
really fast are anything
that includes indexing:
objectAtIndex,
the firstObject and lastObject.
Adding and removing
at either end
of a mutable array is
actually really fast as well.
This may come as
a surprise; adding
at the end would seem really
fast but you may think
that if you add at the
beginning you'd have
to shift everything over.
However, that tends to
be a common use case
for a lot of our developers.
So we've optimized
that and we even
guarantee that
inserting at the front,
the very front of a mutable
array is also really fast.
That's a great thing
to be aware of as well,
to not be afraid to
insert at the front.
Slower operations include
anything that involves search
and looking through
any of these indexes
as we've already
mentioned before.
Also, adding and removing at
an arbitrary index somewhere
in the middle of the array,
not either of the ends,
would tend to be a
little bit slower.
One specialty operation I'd like
to call out is binary search.
If you're not familiar
with this,
binary search is a quick way
to narrow down your search.
There are a couple
pre-conditions.
You have to have some
sorted range of data.
So if I know that my array is
already sorted in ascending
or descending order, I could use
binary search to quickly narrow
down and cut out half of
the objects each time.
And this is a great win because
this is an order log n operation
as compared to order n.
If you remember from the graph,
order n was a line that goes
up at a 45-degree angle
and order log n stays
really close to the bottom.
It's got great performance
as your objects scale.
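A sketch of that binary search using the built-in NSArray
API; the data is illustrative, and the array must already be
sorted for the result to be valid:

    NSArray *sortedNumbers = @[@1, @3, @5, @8, @13];
    NSUInteger index = [sortedNumbers indexOfObject:@8
                                      inSortedRange:NSMakeRange(0, sortedNumbers.count)
                                            options:NSBinarySearchingFirstEqual
                                    usingComparator:^NSComparisonResult(NSNumber *a, NSNumber *b) {
        return [a compare:b];   // order log n lookups instead of order n
    }];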
Next is NSSet and NSMutableSet.
This is an unordered collection,
it does not allow duplicates,
and it has hash-based lookup
as we discussed previously.
In sets, adding,
removing and searching
for objects are really fast,
precisely because of
the hash-based lookup.
Now, there are some
specialty operations
that are really convenient
on NSSet.
For example, set arithmetic.
If you think of a Venn diagram
where you've got two circles
that overlap, you can find
the intersection between them,
you can see whether one
completely contains the other,
and so on.
And if you have mutable
sets, you can extend this
with intersectSet: to modify
one set
to only contain the objects
that also appear in another set,
or minusSet: or unionSet:
and so on.
This can be really
convenient for merging
and separating different
collections of objects.
Good thing to be aware of.
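A sketch of that set arithmetic with illustrative data:

    NSMutableSet *attendees = [NSMutableSet setWithObjects:@"Anna", @"Ben", @"Cara", nil];
    NSSet *speakers = [NSSet setWithObjects:@"Ben", @"Dave", nil];

    [attendees intersectSet:speakers];   // keep only objects that appear in both sets
    // [attendees unionSet:speakers];    // or merge the two sets
    // [attendees minusSet:speakers];    // or remove any overlap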
Caveats with NSSet, when
you convert from an array
to an NSSet, you can do that
but you do lose your ordering
of the array and you
lose any duplicates
that may appear in the array.
Those will be stripped
out by the set.
If you convert back to an array,
you may get a smaller array
and you almost certainly get
one that's in a different order
than the one you passed in.
Related to this is that you
can't store a set directly
in a property list or
in JSON as we'll talk
about in a little bit.
You can convert to an array and
back as you're reading them out,
but these are caveats
to be aware of.
OK. A brief note
on NSCountedSet.
This is really a handy one that
many people are not aware of.
It's also unordered
just a like a set,
no duplicates and hash lookup.
It's a subclass of NSMutableSet
so it has all the
same operations
and the same characteristics.
Its big trick though is it
tracks the net insertion count
of each object.
So, objects still only appear
once in a counted set, however,
if you insert the
same object again,
the count for that object will
increase to two and so on.
If you remove that
object, the count decreases,
and it's removed when it reaches 0.
This can be really convenient
if you're trying to count
up occurrences of a
particular object.
It may be words in text
or something like that.
It's kind of a histogram
binning type of thing,
so great thing to be aware of.
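A sketch of that word-counting use, with illustrative data:

    NSArray *words = @[@"fast", @"code", @"fast"];
    NSCountedSet *wordCounts = [[NSCountedSet alloc] initWithArray:words];
    NSUInteger occurrences = [wordCounts countForObject:@"fast"];   // 2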
Next is NSDictionary
and NSMutableDictionary.
This you're also
very familiar with.
It's an unordered collection,
it has key-value entries
that store keys and values
associated together,
it has unique keys,
and it uses hash lookup
just a like a set.
So, adding, removing and
searching are also really fast
in the dictionary, anything
where you're looking
for an object by the key, adding
or removing and searching.
Specialty operations, one thing
that's convenient is it has some
handy API for reading
and writing directly
from a property list file.
And also one other thing
that was added in OS X 10.8
and iOS 6 is really handy.
If you have a mutable
dictionary and you know upfront
which keys will ever appear in
it, you can
provide a key set, an array
of those keys, upfront.
And there's a class
method on the NSDictionary.
You give it an array of keys,
it will give you a
sharedKeySet object.
If you provide that when
creating an NSMutableDictionary,
it can create a really
optimal hash algorithm
for organizing those
particular keys.
It knows upfront what's
likely to be there.
And it can be a great
situation if you need mutability
but you want speed and you
know the keys you're going
to have upfront.
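A sketch of that shared key set optimization; the keys here
are illustrative:

    id keySet = [NSDictionary sharedKeySetForKeys:@[@"title", @"room", @"time"]];
    NSMutableDictionary *session = [NSMutableDictionary dictionaryWithSharedKeySet:keySet];
    session[@"title"] = @"Designing Code for Performance";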
Caveats of NSDictionary, you're
probably familiar with the fact
that any keys that you provide
to NSDictionary are
going to be copied in.
They have to conform to
the NSCopying Protocol.
And you should never mutate an
object that is a dictionary key.
The hash will change very likely
and you probably won't be able
to find it and you'll
get incorrect results.
This is a major thing
to be aware of.
NSOrderedSet
and NSMutableOrderedSet
are an ordered collection,
similar to arrays; they don't
allow duplicates, like sets,
and they support both
indexed and hash-based lookup.
This class was introduced in
iOS 5 and OS X 10.7.
And it's effectively a cross
between a set and an array.
It's not a subclass of either
one, although you can get an
array or set representation,
and one convenient thing
about those is they're
live-updating.
So as objects are added or
remove to the mutable set,
those array and set
representations will be updated
as well.
Now, a caveat of this
is that you're going
to have increased memory usage.
You can't get the best of both
worlds as a free lunch.
So because we're
maintaining ordering,
we have an array internally.
And because we're
guaranteeing uniqueness
and no duplicate objects,
we also have a set.
So roughly double memory usage.
And also when you-- If you
want to store an ordered set
in a property list, you'd
have to convert it to an array
and then back as
you bring it out.
NSIndexSet and NSMutableIndexSet
differ
because they don't store
objects, they store indexes,
integer values, so it's a
collection of unique indexes
and these are really handy
for referencing some subset
of objects in an NSArray.
With objectsAtIndexes:,
you can give an array
an index set
and it gives you a new
autoreleased array
with those objects in it.
If you want to avoid that
memory overhead of making
a copy, you can just enumerate
through the objects
at particular indexes
without creating a new array
and it's really handy for that.
Index set has really
efficient storage
and coalescing of indexes.
So if you have a mutable index
set and you add the indexes 1
and 3, it will store
those separately.
If you add 2 as well, it's
intelligent enough to coalesce
and say, "I'm going to
store the range 1 to 3."
And if you add more indexes
that are consecutive,
it will group those altogether.
So it's really efficient
with storage.
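A sketch of that coalescing and of enumerating without
making a copy, with illustrative data:

    NSMutableIndexSet *indexes = [NSMutableIndexSet indexSet];
    [indexes addIndex:1];
    [indexes addIndex:3];
    [indexes addIndex:2];   // coalesced internally into the single range 1..3

    NSArray *sessions = @[@"A", @"B", @"C", @"D", @"E"];
    [sessions enumerateObjectsAtIndexes:indexes
                                options:0
                             usingBlock:^(id session, NSUInteger idx, BOOL *stop) {
        NSLog(@"%lu: %@", (unsigned long)idx, session);   // visits B, C, D; no new array created
    }];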
It also supports set arithmetic
such as intersection, subset,
and difference like we
saw with mutable sets.
One caveat of this is when
you're using index set
and you apply it to
an NSMutableArray,
if that array changes after
you've obtained the index set,
you can have all
sorts of bad results.
For example, you have a mutable
array, you've gotten a group
of indexes, and then you
remove all the objects
or some of them in the array.
If you try to get
objects as indexes,
you can easily get
a range exception
because that index no
longer exists in array.
So this is something
to be aware of.
NSMapTable and NSHashTable
are very interesting.
They are very similar
to NSMutableDictionary
and NSMutableSet.
There are a couple
of key differences.
They offer a lot more
flexibility with some
additional options.
You can use pointer
identity rather than choosing
to use isEqual: as you would
in these other collections.
They can contain any kind of
pointer, not just an object.
It could be a void * or any
other kind
of pointer you can
store in here.
You can optionally say that you
want weak references, so that
if all the strong references
to an object stored as a key
or value in a map table,
for example,
go away, then
that entry would automatically
be removed for you.
And this works under ARC as
well with zeroing references.
And you can optionally say that
you don't want to copy on insert
as you add an object to
a map table, for example,
it won't copy the key.
You can specify that.
Now, you can't convert a
map table or a hash table
that contains these
raw C pointers directly
to their counterpart,
mutable dictionary or so on,
because they don't translate
over well, and there's also a
caution here about premature
optimization.
This is one of those situations
where you think,
this is going to be just what I
need, but you'll sacrifice some
of your compatibility with APIs
that require an NSDictionary.
And for most cases, it's
probably not necessary.
It's a great tool to be aware
of but it's not something
that you're generally going to need.
These were also introduced
in OS X 10.5
and a corresponding iOS release,
so you should be able
to use them broadly
throughout your applications.
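A sketch of a map table that holds its values weakly; the
stored object here is just an illustrative variable:

    NSMapTable *controllersByID = [NSMapTable strongToWeakObjectsMapTable];
    [controllersByID setObject:detailViewController forKey:@"detail"];   // illustrative object
    // When the last strong reference to detailViewController goes away,
    // this entry is removed from the map table automatically.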
And last is NSCache.
This is also similar
to NSMutableDictionary
and it's a great
tool to have as well.
Primarily, there's
a few big benefits.
First and foremost is that
it's thread safe, unlike
NSMutableDictionary.
It doesn't copy your keys,
unlike NSMutableDictionary,
and it supports auto-removal
under memory pressure.
So when the OS needs
to reclaim memory,
it can automatically boot
out entries from this cache.
This is ideal for objects
that you can regenerate
or obtain again easily.
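A sketch of caching something you can regenerate, like a
rendered thumbnail; the variable names are illustrative:

    NSCache *thumbnailCache = [[NSCache alloc] init];
    [thumbnailCache setObject:renderedThumbnail forKey:sessionID];   // illustrative objects
    id cached = [thumbnailCache objectForKey:sessionID];   // nil if evicted under memory pressure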
For more detail on this,
I highly recommend the talk
Building Efficient OS X apps.
A lot of this also
applies to iOS as well.
It focuses a lot on memory usage
and [inaudible] and energy.
And it will go into much
more depth about NSCache
and purgeable data and
what you can do to work well
and be a good citizen when
memory pressure comes to bear.
So, wrapping up these
data structures,
I have just two brief
asides as they relate
to data structures
and storing them.
So first, property lists.
You're all familiar with these;
you can store hierarchies
of data in XML or binary format.
Each of you at least has an
Info.plist in your application
and may use them
in other places.
They support property
list objects.
It's a set of six types in
Foundation: array, data, date,
dictionary, number, and string.
For anything else, you can, for
example, convert it to a string,
or you can use NSCoding
and archive it into an NSData
and store that in the
property list.
So you do have flexibility
there.
It's good to be aware though
that mutability is
not preserved.
If I build up mutable
dictionaries and arrays
in a hierarchy that I like and
then store it to property list
and read it back out,
they're no longer mutable.
I'll get the immutable
variants of those.
That's important to know.
Property lists are generally
not efficient for lots
of binary data, like
encoding things into NSData
and sticking them in there, or for
really large files, particularly
if you're using an XML format.
There may be better
alternatives to look into.
You'll see property lists
commonly with NSUserDefaults
for storing user preferences
for your application;
it's a common way
to interact with them.
NSPropertyListSerialization is
also a really handy way to deal
with this for input and
output of property list.
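A sketch of round-tripping a dictionary with
NSPropertyListSerialization, using illustrative data:

    NSDictionary *prefs = @{@"favoriteSessionIDs": @[@"228", @"712"]};
    NSError *error = nil;
    NSData *plistData = [NSPropertyListSerialization dataWithPropertyList:prefs
                                                                    format:NSPropertyListXMLFormat_v1_0
                                                                   options:0
                                                                     error:&error];
    NSDictionary *restored = [NSPropertyListSerialization propertyListWithData:plistData
                                                                        options:NSPropertyListImmutable
                                                                         format:NULL
                                                                          error:&error];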
And I encourage you to
look at NSKeyedArchiver
and NSKeyedUnarchiver
if you have needs
to store other data
in a property list.
The second aside is on JSON.
This is JavaScript Object
Notation and originated
in JavaScript but it's supported
across a broad variety
of platforms.
It's a lightweight data
interchange format that's human
readable and it's really
handy, and it's commonly used
for things like web
services or sending things
between different platforms.
And it works with a
few Foundation classes.
Some of them are the same
as for property lists.
You'll see array,
dictionary, number and string,
but there's also NSNull.
If you're not familiar with
this, it's a null placeholder.
Collections in Cocoa don't
allow you to store nil inside
of a collection, but JSON
does allow null objects,
and you'll see those come
across as NSNull placeholders.
And there are also
some restrictions
such as the top level object in
JSON needs to be a dictionary
and you can only use strings
for the keys in your dictionary.
You can't use numbers
or other things.
OK. And you should definitely
be aware of NSJSONSerialization.
This was introduced
a few releases ago
and it's a really convenient
way to deal with JSON.
It supports reading and writing.
You can even pass an object and
see if it will form valid JSON.
You can also specify
whether you want mutability
when you read objects
out of JSON
which is different
from property list.
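A sketch of reading JSON with NSJSONSerialization, using
illustrative data:

    NSData *jsonData = [@"{\"sessions\":[\"228\",\"712\"]}" dataUsingEncoding:NSUTF8StringEncoding];
    NSError *error = nil;
    NSDictionary *parsed = [NSJSONSerialization JSONObjectWithData:jsonData
                                                            options:NSJSONReadingMutableContainers
                                                              error:&error];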
And this class has been improved
and iterated a lot over time.
It's fast and it's built-in,
you don't have to link
in a static library,
for example, on iOS.
You can count on the
improvements to efficiency
and performance over time.
And you'll see even
more improvements
in iOS 7 and in Mavericks.
OK. So that wraps up data
structures and performance.
That's a lot of information.
For our final section, I'm going
to kind of tie this all together
and talk about real
world applications.
How can we take all this about
"Big O" complexity
and data structure
performance
and how fast they are, and
apply that to our actual code?
So I'm going to give you
example again from our code.
From the WWDC App, we
refresh the sessions.
You all saw this on Monday.
You're anxiously waiting
for those TBAs to disappear
and see what the
sessions were going to be.
And we have this
functionality to refresh those.
So the data is stored in
Core Data in the application,
on your device, and we
periodically fetch updates
from the server to check
if there's new information.
Now, we did notice some
performance issues
in early versions
of the application.
At first we were
getting great speed,
but as the conference
became more fleshed out
and more sessions were added,
the speed got progressively
slower
and noticeably slower
not just a little bit
but a lot more over time.
And it became kind
of unbearable.
And when we profiled it
we found that there was
non-linear growth.
We obviously had some sort
of complexity problem
doing too much work.
So I'm going to walk you
through a simple example here.
So this is what we did at first.
We have an array of sessions
that we get from the server,
we just fetched, and then we
iterate through each of those,
so we want to handle each
of these refreshed sessions
so we have a for loop
that's order n complexity.
And then we do a Core Data fetch
to see what we have
locally on our device.
We look for a session with that
particular ID, and if we
had a session with
that ID then we're going
to merge them together.
If this is a new session,
we're going to insert
it into Core Data.
So we were seeing
non-linear performance.
And probably the suspicion
is this may be n squared.
Well, the for loop has to
be there, and the part
at the end about merging
sessions looks pretty simple.
So the work must be
somewhere in between,
right here in the
Core Data fetch.
And let's think about
what work it has to do.
Well, we're forming a predicate
to search where we say,
"Find me a WWDC session
object with the session ID
equal to this session ID
I just gave you."
So when I tell Core Data
to do that, we think,
what would it have to do?
Well, because it may not have
ordering on the other side,
probably what it's doing
is it's going through each
of the sessions and seeing
if that session ID matches
which is an order
n algorithm, right?
So there we have
our hidden work.
We have an order
n loop and order n
for finding the session
in the Core Data fetch.
That's where the n squared is.
So, let's just hoist that right
out, let's take the Core Data
fetch out of the loop;
we'll optimize that
a little bit.
Let's say that we now
change it so that instead
of fetching one session
at a time, we take the list
of session IDs that we've
just gotten and we fetch all
of those at the same time.
Now we have them all in
one array and that's great.
We should see order
n performance now,
except we don't.
If you have keen eyes,
you'll notice there's still
some work left inside the loop.
Now, although we know
what session it is
and we've just moved
the problem, the work
of finding that session,
out of Core Data,
we've moved it
into a line
where we're calling
indexOfObject:.
Now we have an array
of all these sessions
but we're still doing
a linear search.
We're stepping through each
of these until we find the one
with the right session ID.
So, we haven't really
eliminated the problem.
But there is a great
way around it.
Since we're using session IDs
that are unique across these,
we do a slight modification,
and after we've gotten all
those sessions upfront,
we create a dictionary where
the objects are the sessions
themselves and we key
them by the session ID.
This is going to be unique.
And now, inside the loop, you
can see that we just have to look
at that dictionary
and find the session that has
that particular session ID,
and lookup in a dictionary is
order 1, it's no longer order n.
So we just solved our problem.
We've taken this algorithm from
order n squared to order n,
and now all of you can get your
session updates really quickly.
There's not 5,000 of
you waiting forever
for all those sessions
to reload.
So that's a real world example
from our own application.
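Here's a sketch of the final shape of that loop; the class,
property, and helper names are illustrative, not the actual
WWDC app source:

    - (void)mergeRefreshedSessions:(NSArray *)refreshedSessions
                 withLocalSessions:(NSArray *)localSessions {
        // Build a lookup table keyed by session ID once: order n.
        NSMutableDictionary *localByID =
            [NSMutableDictionary dictionaryWithCapacity:localSessions.count];
        for (WWDCSession *local in localSessions) {
            localByID[local.sessionID] = local;
        }
        // One pass over the refreshed sessions: order n overall instead of n squared.
        for (WWDCSession *refreshed in refreshedSessions) {
            WWDCSession *local = localByID[refreshed.sessionID];   // order 1 dictionary lookup
            if (local != nil) {
                [local mergeValuesFromSession:refreshed];   // hypothetical merge helper
            } else {
                [self insertNewSession:refreshed];          // hypothetical Core Data insert
            }
        }
    }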
So, I want to also
give a couple of tips
about eliminating extra work.
Really what you want to
do is minimize redundancy
and in particular
expensive code.
So what we just saw in the
previous example was each time
through the for loop
we were searching
through the same array just
for a different session ID.
If there's a way that we can
get that out of a loop and do
that work just once, do it less
frequently, that's a big one.
Also, take advantage of
faster lookup when possible
like we did with the NSDictionary.
Sets can also be useful for
this, when that's appropriate.
Using mutable collections
and strings
when it makes sense can also
be a big performance win.
And I'll show you an example
of that in just a moment.
And also streamline how
you access your data.
Don't make it hard for yourself
to get to the data that you need
as quickly as possible.
You can organize your
data in such a way
that it's structured
intelligently
and your lookup will
be really fast.
And again, try not to
reinvent the wheel.
There's times where you
need to write your own code
but there are situation, for
example, joining an array
of components with strings.
You may have a group of strings
that you want to comma separate.
You could write your own code to
do that and append into a string
and so on but there's already
API where you can give an array
of objects and give it
some string that you want
as the delimiter and it can
join them together for you.
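That joining API is componentsJoinedByString: on NSArray;
here's a sketch with illustrative data:

    NSArray *words = @[@"arrays", @"sets", @"dictionaries"];
    NSString *joined = [words componentsJoinedByString:@", "];   // "arrays, sets, dictionaries"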
So wherever possible,
use those, so you benefit
from optimizations and
eliminate extra work; you know
that we're doing it as fast
as we can think to do it.
So, as an example of
eliminating this extra work,
let's say that you have a method
that takes an array and I want
to scan through that and do
something with these objects
and if the result is not nil,
I want to add it to a new array
and I'm going to
return that back
from this function,
from this method.
The big problem here is really
each time we're creating a new
immutable copy of an array.
This is an obvious no-no
and we can avoid this.
Instead of doing that, you can
create a mutable array upfront
and add the objects to
the array which is going
to be really fast, and at
the end when we're done,
we can call copy which gives
us back an immutable NSArray
and we can return that.
So we take full
advantage of the mutability
when it makes sense for us and
we still fulfill the contract
of our method by saying we'll
return an actual NSArray.
And you can do similar
things with appending to an
NSMutableString or
NSMutableData and so on.
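A sketch of that pattern; the method and helper names are
illustrative:

    - (NSArray *)processedResultsForObjects:(NSArray *)input {
        NSMutableArray *results = [NSMutableArray arrayWithCapacity:input.count];
        for (id object in input) {
            id result = [self processObject:object];   // hypothetical per-object work
            if (result != nil) {
                [results addObject:result];            // cheap append, no per-iteration copies
            }
        }
        return [results copy];                         // hand back an immutable NSArray
    }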
The key takeaway from this
whole section, I'd like to say,
is don't leave performance
on the table.
If there's performance there
just waiting for you to take it
and be faster, don't
leave it there.
Take advantage of it.
You know, if I could be using
an NSDictionary for this rather
than an NSArray, I might
as well do that, you know.
You have to weigh the
benefits, the pros and cons
of choosing either way.
But I really encourage
you to think deeply
about the performance that
you're going to expect to see
with your data structures.
And so, for a lot more
information, Dave DeLong is our
App Frameworks Evangelist
and he'll be a great point
man for any questions you have
on this, and there are also
the Cocoa Labs
and the Developer Forums,
and then I have three
documentation links
that are really useful for
more in-depth understanding
of collections, property lists,
and archives and
serializations and so on.
There are two related
sessions: I mentioned the one
on Building Efficient OS X
Apps and Hidden Gems in Cocoa
and Cocoa Touch is immediately
following in this room.
This is a great overview
of a lot
of things you may not be
familiar with or even aware
of throughout the Cocoa
and Cocoa Touch frameworks,
also in Xcode and
different things like that.
So just to summarize, I
have a few key takeaways
that I want you to remember if
nothing else from the session.
The first is that complexity
kills large-scale performance.
If you have a lot of work and
your complexity grows over time,
when you get to large
amounts of data,
your performance will tank.
You need to be aware of that.
Second, know how much
work your code is doing.
Know what it's actually
trying to accomplish,
that's where a lot of the
hidden complexity lies.
Avoid redundancy and
strive for efficiency,
it's the last two points of
how you can resolve this kind
of performance issues.
If you're doing something
over and over again,
take it out of a loop.
Do it just once.
If you're doing something
in an inefficient way,
try to find a better algorithm.
Focus on the biggest
performance wins first
as we talked about with Amdahl's Law.
Find the bottlenecks and
eliminate them first.
Don't prematurely optimize.
Prefer to use the
built-in collections
and API wherever possible so you
can benefit from optimizations
and improvements over time.
Design according to
your particular needs.
Choose a data structure that
fits how you need to access it.
Something that will make
it really fast to the tasks
that you need to do and
it's going to be convenient.
And last, think about
performance early.
There's a difference between
premature optimization
and being-- having common sense
about the performance
of your application.
When you use some
of this knowledge
about understanding how
complexity can affect
performance, it enables you to
make great decisions upfront
about how to store your data so
that you can access it quickly,
you'll be happy, and your
users will be delighted
with the performance
of your application.
Thank you.
[ Applause ]