WWDC2018 Session 414

Transcript

[ Music ]
>> Oh yeah, good morning, thank
you for coming.
I'm going to be joined by some
very bright people in a minute.
We have some very cool stuff
prepared for you today.
So let me start first by saying
if you never write code that
crashes this session is not for
you.
This is a talk for the rest of
us, for those of us who make
mistakes.
Today we're going to talk about
the tools and techniques that
you can use to stay on top of
your crashes if they're
affecting your users.
In particular I'm going to cover
the fundamentals of crashes, why
do they happen, what do they
look like.
Then I'm going to show you some
of the tools that we have to
access crash logs when they're
occurring in the wild.
Then Greg is going to come up
and give you some more detail on
how to read the content of a
crash log.
And then he's going to go in
depth in how to read a tricky
memory issue with a crash log.
And then Kuba will show you how
to catch threading races early,
these threading races lead to
crashes, and these crashes are
very hard to reproduce.
So first we should define it,
what is a crash?
A crash is a sudden termination
of your app when it attempts to
do something that is not
allowed.
So what's not allowed?
Well, sometimes it's impossible
for the CPU to execute code, the
CPU won't divide by zero.
Or sometimes it's the operating
system that's enforcing some
policy.
The operating system will
preserve the user experience by
killing your app if it's taking
too long to launch or it's using
too much memory.
Sometimes the programming
language you're using is trying
to prevent a failure and will
trigger a crash.
A Swift array and NSArray will
halt your process if you attempt
to go outside of your array
bounds.
Or sometimes it's you, the
developer that's trying to
prevent a failure.
You may have an API where you
assert is a parameter is non nil
and that's perfectly all right.
So stop me if you've seen this
before, this is what it looks
like in Xcode in the debugger
when the debugger is attached to
your app and has paused the
process just before your app is
terminated.
And let's take a closer look at
this backtrace on the left.
You can see here how the app was
started by the operating system
and while we're paused we can
see how the main function was
called and functions are calling
other functions within your app
and eventually we get to this
point where we hit a point where
in your code where the only
option is to crash, something
has gone wrong.
And ultimately the debugger
receives a signal that this app
is about to crash and pauses the
app.
Now sometimes you're not always
conveniently attached with the
debugger like we are here.
When you're not attached with
the debugger the operating
system will capture this
backtrace in plaintext and save
it out to disk in a human
readable crash log.
Now the truth is one a release
build of your app crashes the
log doesn't actually look this
pretty.
What's actually written out is a
list of binary names and
addresses.
This is a snippet from a
unsymbolicated crash log.
Now thankfully Xcode takes care
of symbolicating crash logs so
that what you'll see are those
pretty function names, file
names and line numbers that
you're familiar with.
So there are a number of ways to
access these crash logs.
I'd like first to talk about how
you can access those crash logs
from your beta testers on
TestFlight and your customers on
the App Store.
You can download these crash
logs using a feature in Xcode
called the Crashes Organizer,
this is what it looks like a
beautiful dark mode.
And let's take a tour through
this UI.
On the left, you can see all of
your apps distributed over
TestFlight and the App Store and
we support all of our platforms
including watchOS and app
extensions.
On the right, for a given crash
point you can see the number of
unique devices affected and we
group crash logs by similar
issue, by similar crash point
and we rank them in the source
list by the number of unique
devices affected.
And you can page through a
sampling of the individual logs
down below.
And when you click on this
button you can open the crash
log in your project, in the
debug navigator and see it
alongside your source code.
This is very cool if you haven't
seen it before, we'll see that
in a bit.
And in the detailed view of
course we show you a fully
symbolicated backtrace with the
crash point highlighted.
So now that we have the lay of
the land let's play with it.
So I have Xcode open here and
I'm going to open up the
Organizer window and you can see
that I've selected the Crashes
tab, that's the second tab here
and I've selected this chocolate
chip app that Kuba and I are
working on.
And I've uploaded this build to
TestFlight and you can see I'm
looking at build 5 right now and
a number of testers are
reporting crashes, so that's not
too good.
But I've worked on a couple of
these crashes you can see, but
this first one I haven't worked
on yet so let's try to resolve
that.
So this is affecting 242 devices
and I can see the backtrace that
was captured from the time that
the app crashed and the crash
point highlighted.
Now I'm not quite sure what's
going on yet but I bet if I open
this crash log in our source
code I can reason about what's
going on.
So I'm going to click the Open
in Project button, select the
project that matches build 5 of
my app and what I do, what you
can see is that this crash log
has opened up in the debug
navigator as if this app has
just crashed.
And you can see here that we're
stopped on this fatal error.
So a good question to ask myself
is, is this a valid use of fatal
error, I only want to be
crashing if it's absolutely
necessary.
So I can see that this is an
initializer of an enum from an
int and that enum can only be
zero or one, if it's not I'll
hit this fatal error.
So I think that this makes
sense, this should only be
crashing if there's some kind of
misuse by the programmer.
So I can see the caller of this
initializer if I navigate up the
call stack here and I can see
it's this tableView delegate
method.
This method is asking for a
title for a header in a given
section number.
So it must be that that section
number is not zero or one.
So I think I have a better
understanding of what's going
on, but let's try to reproduce
this issue in the app and see if
we can learn more.
So I've hit play.
And Chocolate Chip is a recipe
app I store all of my favorite
recipes and I've been testing
with this whipped cream recipe
and you can see everything's
fine, I can see my list of
ingredients, my list of steps.
These are the two recipe
sections, that's ingredients
that's section zero and steps
that's section one.
And if I click on another recipe
we've hit a crash and what I can
see is that we're stopped on the
same fatal error and the
backtrace looks very similar to
the crash log that we've been
looking at.
So that's a very good sign that
we're looking at the same issue.
So I'm going to clear this crash
log by hitting delete and let's
look at this debug session.
So in this fatal error I can see
that the message is nicely
printing out that the section
number is eight.
So that's the reason we're
crashing it's not zero or one.
And what's occurring to me now
is that this is my fault, when I
implemented this class I
implemented another delegate
method called numberOfSections.
And numberOfSections is saying
how many headers it should look
for and what I'm returning here
is the number of ingredients and
the ingredients count is eight.
But believe it or not I have a
clever way to solve this.
I know what I want to be
returning is the number of cases
in this recipe section enum and
I know that in Swift 4.2 some
new functionality was added by
the Swift Open Source Community,
thank you very much, a protocol
called CaseIterable.
If I have my recipe section
conformed to CaseIterable I can
reimplement this number of
sections to return the count of
all the cases in that recipe
section enum.
And that way what I'll be
returning is two, I'll be
returning the number of accurate
sections.
This is going to work out just
great.
So now if I check out this
chocolate chip cookies recipe no
crash, I see all of my
ingredients and my steps.
I did a very good job, I'm very
pleased with myself.
And I can go back to the
Organizer and mark this issue as
resolved, step away from the
computer and get back to my
baking.
[ Applause ]
Okay so you've just see how you
can use the Crashes Organizer to
download crash logs from
TestFlight, open the log in your
source code, and resolve an
issue.
So what do you need to do to get
started?
Very simple, your customers if
they opt into sharing with
third-party developers this just
works, their crash logs are
uploaded automatically.
All you need to do is sign into
Xcode with your Apple ID.
When you upload your app you
should include symbols so that
you get server-side
symbolication of your crash
logs.
And open up the Organizer window
to the Crashes tab to start
viewing those crashes.
Okay so we have covered viewing
crashes in the Organizer.
But if you're not distributing
over TestFlight or the App Store
there are a couple of other
options for you.
There is the devices window.
When you have a device connected
you can click this View Logs
button and will show you all the
logs that are saved on that
device and these logs are
symbolicated using the local
symbol information on your Mac.
When you run your [inaudible]
tests with Xcode, Xcode Server
or Xcode Build the test results
bundle will include any of the
crash logs that come from your
app that are written out during
the execution of that test run
and that's very convenient and
these crash logs are also
symbolicated.
You can use the Mac Console app
to view any crash logs from your
Mac or from the Simulator.
And on the device under
Settings, Privacy, Analytics,
Analytics Data you can see all
of the logs that are saved to
disk and your users can share a
log directly from this screen.
Okay so to ensure that
symbolication just works I have
three important best practices
for you.
Number one, if you use the
Crashes Organizer upload your
symbols with your app.
This is the default, this will
ensure that server-side
symbolication works, it's very
easy.
Number two, be sure to save your
app archives.
Your archive contains a copy of
your debug symbols, your .
Xcode uses Spotlight to find
these dSYMs and to perform local
symbolication when it's
necessary automatically.
And if you upload an app that
contains bitcode you should use
the Archives Organizer Download
Debug Symbols button to download
any dSYMs that come from a
store-side bitcode compilation.
Okay we've covered all the tools
that we offer for accessing
crash logs when they occur in
the field.
Now to give you an in-depth
guide into reading the content
of a crash log please give a
warm welcome to the helping,
friendly Greg Parker.
[ Applause ]
>> Thank you, Chris.
So we've just seen how you can
use Xcode to find crashes and to
examine them in the Xcode tools
in the debugger.
But the crash log file contains
more information, it contains
much more information than just
the stack trace.
It is frequently useful to look
at that extra information in
order to debug your problem.
So how can you get the full text
of a crash log?
Here's our Xcode Organizer, if
we bring up the contextual menu
there's a Show In Finder button.
The Show In Finder button will
bring up a text file which we
can open up in the Console app
or your favorite text editor, it
looks something like this.
So what's in this file, let's
take a look.
The top of the file starts with
some summary information, this
contains your app name, the
version number, the operating
system version it was running
on, and the date and time of the
crash.
Below that we have the reason
for the crash.
This was the specific error, the
specific signal that the
operating system sent to kill
the process.
We can also see some logging
information, the
application-specific information
section.
This section will include
console logs in some cases, if
you have an unhandled exception
it may include the exception
backtrace.
This section is not always
available, on iOS devices it is
often hidden for personal
privacy reasons.
But on Simulator to macOS this
section can include information
that is useful.
Below that we have the thread
stacks, these were the
backtraces of all the threads
that were running at the time of
the crash.
One of them is marked as the
crash thread and we also have
whatever other threads were
running at the time that the
process died.
Below that we have some
low-level information, we have
the register state of the thread
that crashed and we have the
binary images that were loaded
into the process.
This is the application
executable and all the other
libraries.
And Xcode uses this for
symbolication in order to look
up the symbols, the files and
line number information for the
stack traces.
So that's the content of a crash
log file.
So how do we debug this, how do
we read this, what do we look
at?
We start with the crash reason,
the exception type.
In this case, the exception type
is an EXC bad instruction
exception, the SIGILL signal was
the illegal instruction signal.
That means the CPU was trying to
execute an instruction that does
not exist or is invalid for some
reason and that's why the
process died.
We can also look at the thread
that crashed, what code was
running at the time of the
crash.
Here we see the fatal error
message function in the Swift
runtime.
No guesses as to what the fatal
error message function does.
The error message in this case
is included in the
application-specific
information, so we can see what
the Swift runtime printed as the
process exited.
So let's take a closer look at
that stack trace.
We saw the fatal error message
function and that was called by
a function in our code.
We have a recipe class, an image
function being called and that
function in turn called fatal
error message as a result of
some error.
Because this is a symbolicated
stack trace with full debug
information we have a file and
line number in our code of what
the crash was.
So we can take a look at that
line, we can open up our
project, this is
RecipeImage.swift, line 26 was
the one that was marked in the
crash.
And those of you who are
experienced Swift programmers
will have a pretty good guess as
to why this line might crash.
We have a force unwrap operator.
We have a function, The UIImage
Constructor which returns an
optional value.
If the optional value is nil the
force unwrap operator will halt
the process, generate a crash
log and exit.
If we remember the
application-specific information
it included the error message
that the Swift runtime prints as
this error check fails and it
says, unexpectedly found nil
while unwrapping an optional
value.
So that's good, that's
consistent with the code.
We have a force unwrap operator
on line 26, we have an error
message in the crash log that
says we were unwrapping an
optional value.
This all makes sense as a
consistent story for what caused
this crash.
So a force unwrap failure is an
example of a precondition or an
assertion in the code.
Preconditions and assertions are
error checks that deliberately
stop the process when an error
occurs.
Some examples of this are of
course the force unwrap of an
optional that we just saw.
The Swift runtime will assert
that the optional is not nil and
crash if it is.
We have out-of-bounds
Swift.Array access is another
example.
If you access an array and your
index is out-of-bounds the Swift
runtime will fail, interim check
it will fail a precondition and
halt the process.
Swift arithmetic overflow also
contains assertions, if you're
adding two numbers together and
the result is too large for an
integer variable there is a
precondition for that, the
process will halt.
Uncaught exceptions are
frequently caused by
preconditions in the code.
There are many error checks
where if the precondition fails
it will throw an exception and
if the exception is not caught
the uncaught exception will
cause a crash log.
And of course you can write
assertions and preconditions in
your own code if you have errors
that you want to crash the
process and generate a crash log
in response.
Another example of a crash log
is a case where the operating
system kills your process from
the outside.
An example of this are watchdog
events, such as timeouts.
If your application takes too
long to do something the
operating system may detect
that, kill the process of
generate a particular crash log
as it does so.
Environmental conditions can
also cause the operating system
to halt the process.
If the device is overheating the
operating system will kill
processes that are using too
much CPU.
If the device is running out of
memory the operating system will
kill processes that are using
lots of memory.
Another case is an invalid code
signature.
The operating system enforces
that code be signed and if a
signature is invalid or the code
is unsigned the operating system
will kill the process and
generate a particular type of
crash log.
These terminations by the
operating system can be found in
the Devices window in Xcode,
they can be found on the macOS
console.
They do not always appear in the
Xcode Organizer, so be careful
of that.
In Apple's developer
documentation we have a technote
that describes many different
signatures and structures of
crash logs like these particular
examples, what they look like,
how you can recognize them, and
it goes into much more detail
than we have time for here.
But let's look at one example,
here's another crash log file
and again in order to understand
the crash log we start with the
crash reason.
In this case the crash reason is
in EXC crash exception with the
SIGKILL signal.
The SIGKILL signal is commonly
used when the operating system
wants to stop your process.
It sends the SIGKILL signal, the
SIGKILL signal cannot be
handled, it cannot be caught by
your process, it will die in
response to this signal.
We can also see the reason the
operating system sent the signal
in the crash log.
In this case we have a
termination reason with a code
ate bad food.
If you look in the developer
technote I mentioned earlier, it
will describe what ate bad food
means.
And we have a textual
description that says exhausted
real clock time allowance of
19.95 seconds.
So if we combine this
information with the information
in the technote it will tell us
our application took too long to
launch.
We had 20 seconds to launch, we
weren't done in that much time,
the operating system killed the
process.
Below we can see the crash logs
at the time the process died.
It's possible that those crash
logs were the code that took too
long, maybe were stuck in an
infinite loop, maybe were stuck
waiting for network I/O and
that's why we took too long to
launch.
Or on the other hand, maybe this
code is innocent and there's
something earlier in the launch
process that was too slow and
that's why the process died.
So launch timeouts, how can you
avoid them.
We want you to avoid them,
launch timeouts are a common
reason for app rejection during
Apple's app review.
So how do you avoid this?
Well test your app of course.
But there's a catch, the launch
timeout watchdog is disabled in
the Simulator and it's disabled
in the debugger so if you're
doing all your testing in the
Simulator and the debugger
you'll never see watchdog
timeouts.
So when you test your app be
sure to test without the
debugger.
If you're a macOS app launch
your app in the Finder.
If you're an iOS app run in
TestFlight or launch your app
using the iOS App Launcher.
All of these will run your app
outside the debugger and the
launch timeouts will be enabled
and enforced.
When you test, test on a real
device of course, test outside
of Simulators.
And test your devices with older
hardware, whatever's the oldest
hardware you want your app to
support.
If you test only on newer
hardware you might find that
your app launches fast enough on
a faster device, but slower
devices might take too much
time.
So let's talk about another
class of errors, let's talk
about memory errors and what
they look like in crash logs.
When I say memory error I mean
cases like reference counting of
an object being over-released or
using an object after it has
been freed or a buffer overflow
where you have a byte array or
another C array and you access
that array out-of-bounds.
So let's look at another crash
log and spoiler alert this one
is a memory error.
We start again with the
exception type.
This is an EXC bad access
exception, the SEG violation
signal.
This is typically caused by a
memory error.
The bad access exception means
one of two things, either we
were writing to memory that is
read-only or we were reading
from memory that does not exist
at all.
Either of those will cause a bad
access exception and the process
will stop.
We see here the address that we
were accessing at the time of
the crash.
We can look at the stack trace,
this is the function that
performed the bad access of
course.
This is the objc release
function, it's part of the
implementation of reference
counting in Objective-C and some
Swift objects.
So that again sounds like it's
likely to have been a memory
error that caused the bug.
So what code caused objc
release?
We can look at the rest of the
stack trace.
We have the object dispose
function, this is a function in
the Objective-C runtime that is
used to deallocate objects.
The object dispose function
called a function called ivar
destroyer on one of our classes,
our LoginViewController class.
The ivar destroyer function is
part of Swift code, this is a
function that cleans up the
properties, that cleans up the
ivar storage of an object as it
is deallocated.
So that gives us part of the
story as to what caused this
crash.
We were deallocating an object
of our login view controller
class.
That class in its d init code
was trying to clean up its
properties and its ivars and
while releasing one of those
properties we crashed.
So that gives us a little bit of
detail as to what went wrong.
Can we do better, is there more
information in the crash log
that will tell us more about
what happened?
We can look at the invalid
address itself, sometimes the
actual bad address value will
contain useful information in
it.
And this particular bad address
I can tell you looks like a use
after free.
Well how do I know that?
Partly it's just long
experience, when you read enough
crash logs you start to learn
patterns of what the bad values
are.
This particular bad value looks
very much like the address range
for the malloc memory allocator
which we happen to have
available in this crash log.
So we have the address range
that was used by the memory
allocator and our invalid
address looks like it's inside
the malloc range, but it's been
shifted by 4 bits, it's been
rotated by 4 bits.
So it looks like it was a valid
malloc address rotated.
That is a clue from the memory
allocator itself.
Let me show you why that is.
Here's what our object looked
like when it was still valid.
An object starts with an isa
field, the isa field points to
the object's class.
This is how Objective-C objects
are laid out, this is how some
swift objects are laid out.
So what does the objc release
function do?
It reads the isa field and then
dreferences the isa field so it
can get to the class object and
perform method lookups.
Ordinarily of course this works,
this is how it's supposed to
work.
What happens if our object has
already been freed.
When the free function deletes
an object it inserts it into a
free list of other dead objects.
And it writes a free list
pointer to the next object in
the list where the isa field
used to be.
With one slight twist, it does
not write a pointer into that
field it writes a rotated
pointer into that field.
It wants to make sure that the
value written there is not a
valid memory address precisely
so that bad use of the object
will crash.
So when objc release goes to
read the isa field it instead
gets a rotated free list
pointer.
When it dreferences the rotated
free list pointer it crashes.
The memory allocator did that
for us, it deliberately rotated
that pointer to make sure we
would crash if we tried to use
it again.
So that is the signature we see
in this crash log.
We had the invalid address field
looks like a pointer in the
malloc region but rotated the
same way that malloc rotates its
free list pointers.
So that's a strong sign that
whatever object we are trying to
release at this point in the
code has already been
deallocated, that's the memory
error that occurred.
So that's additional detail of
what our story is.
We have our object being
deallocated, we're cleaning up
its ivars, one of those ivars
was already a freed object and
that's what caused our crash.
Can we do better?
Can we figure out which object
was being released by objc
release?
Ordinarily, the function calling
objc release would give us a
clue as to what that was.
But the problem with the ivar
destroyer function is it is a
compiler generated function.
We didn't write a function
called ivar destroyer, which
means there's no filename or
line number associated with this
point in the crash, we don't
know which of our properties was
being deallocated at that point.
Here's our class, we have three
properties in this class, we
have a username, a database, and
an array of views.
And at this point in our story
we don't know which of these
objects was the one being
released, it could've been any
of them.
Can we do better, can we work
out which of those objects was
the one being released from the
information in the crash log
because of course if we can't
reproduce it in the debugger the
crash log is all we have to go
on?
In this case we can do better.
We can see a +42 where the file
and line number would've been.
And the +42 is our clue because
the +42 is an offset in the
assembly code of the function.
We can disassemble the ivar
destroyer function, look at the
code and work out which property
was being accessed at offset 42.
So how does that work?
We go into the debugger, into
the debugger console.
We can run lldb at a terminal.
We can run lldb in the Xcode
debug terminal.
The debugger has commands to
import a crash log a little bit
as if it crashed inside the
debugger.
So we run this command to load
the crash log interpretation
commands and then we run another
command to import our crash into
the debugger.
So we want three things to make
this work.
We want a copy of our crash log
on the Mac, we also want a copy
of our app, and a copy of the
dSYM file that went with the
app, all of them matching this
crash log, all the same version
of our app.
So this is why we want you to
keep your app archives.
If we have all these files on
the Mac, we run the crash log
command, lldb uses Spotlight to
find the matching executable,
find the matching symbols, and
load it into the debugger.
So we see the stack trace of our
crash thread, we see the file
and line number information
where available, and now we can
go to work.
Now we can find the address of
the ivar destroyer function and
disassemble it.
This shows us the assembly code
of our function.
Now I don't have time to teach
you how to read assembly code,
but luckily for crash logs you
don't actually have to be
completely fluent in reading
assembly code.
It often is sufficient to be
able to skim the assembly code
and get a general idea of what
is going on, you don't have to
understand every single
instruction to be able to get
useful information out of a
crash log.
If we read through this function
and we know about the call
instruction and the jump
instruction which are how you
call functions, we can separate
this code into three blocks.
We have this top section which
is making a function call into a
reference count releasing
function and this one is
releasing our username property.
The next region is releasing the
database property.
And the next region is releasing
the views property.
So we don't understand what all
these instructions mean, but we
have a general sense of what
each region in the code is
doing.
It's a little bit like having a
line number associated with the
code.
So now we go back to the
information in our crash log.
We had the ivar destroyer
function +42 calling objc
release.
So there is instruction at +42,
but there is one more catch.
The catch is in a stack trace
the assembly level offset of
most stack frames is the return
address, it is the instruction
after the function call.
So the instruction of the called
objc release was the previous
instruction, it's this
instruction.
If we read this it's the call of
objc release which is good,
that's consistent with what we
saw in the stack trace in the
crash log which was a call to
objc release at this offset.
And this release function is
releasing the database property.
So now we have more detail about
what our crash is doing.
We released the username
property and that was
successful.
We have not yet gotten to the
views property, it might be
valid, it might be invalid we
don't know.
What we do know is we tried to
release the database property
and that object looked like it
was already freed object based
on the malloc free list pointer
signature.
So that gives us a pretty good
story of what caused this crash.
We were freeing a login view
controller object and the
database property was invalid.
So we haven't actually found a
bug yet, none of this code is
incorrect, the ivar destroyer
function is not wrong, something
else was buggy in our code.
But from the crash log we have
been able to narrow down where
our testing should go, where our
attempts to reproduce the bug
should go.
We should be exercising this
class, we should be exercising
the database field, we should be
reading the code that uses that
database object and try and find
the bug.
So what have we just done?
We read a crash log from
scratch.
We started with the crash
reason, we read the exception
type, we understand what the
exception type means.
We examined the stack trace of
the thread that crashed,
understand what it was doing,
and what the actual error that
failed was.
And we looked for clues
elsewhere in the crash log, in
this case we used the bad
address of the memory error and
we used the disassembly of the
crashing function.
So memory errors are a wide
variety of crashes, there are
many different signatures in
crash logs that can be caused by
memory errors.
Here are some examples for you.
Crashes in the Objective-C
message send function or in the
reference counting machinery or
deallocation machinery in Swift
and Objective-C are very
commonly caused by memory
errors.
Another common memory error
symptom is an unrecognized
selector exception.
These are often caused when you
have an object of some type, a
code that is using that object,
and then the object gets
deallocated and used again.
But instead of getting the
malloc free list signature that
we saw in that previous crash
log, instead a new object is
allocated at the same address
where the old object used to be.
So when the code tries to use
the old object, call a function
on the old object, we have a
different object of a different
type at that same address and it
doesn't recognize that function
at all and we get an
unrecognized selector exception.
One other common symptom of
memory errors is an abort inside
the memory allocator itself,
inside the malloc and free
functions.
This is an example of a
precondition like we saw
earlier, this is a precondition
inside the memory allocator.
It may be identifying cases
where the heap data structure of
the malloc memory itself has
been corrupted by a memory error
and it halts the process and
response.
Or it may be detecting incorrect
use of the malloc APIs.
For example, if you free an
object twice in a row the malloc
allocator can sometimes
recognize that as a double free
and immediately halt the
process.
So let me give you some final
tips for analyzing crash logs in
general and analyzing memory
errors in particular.
In the crash we just looked at
we spent most of our time
looking at the code that
crashed, the specific lines of
code that crashed, and the
thread that crashed.
It is important to look at the
other code in your process
related to the code that
crashed.
For example, in this crash the
ivar destroyer function is not
wrong, that's not where the bug
is.
The bug is somewhere else, some
other code is incorrect.
You should also look at the
stack traces in the crash log
other than the crashed thread.
The crash log contains all the
stack traces in the process and
that can contain useful
information and clues for
helping you puzzle out what the
process was doing.
Perhaps the other threads show
more details about where in the
application it was running,
maybe it was performing network
code and that's visible on one
of the other stack traces.
Or perhaps there's a
multithreading error and the
other threads may provide clues
as to what the thread race was.
You should also look at more
than one crash log for a
particular crash cause.
The Xcode Organizer helpfully
groups your crashes based on
where in the code they crashed.
And sometimes there will be
multiple crashes at a same crash
point, but some of the logs will
contain more information than
others.
For example, the malloc free
list signature that we just saw
that might appear in some crash
logs but not in others.
So it is useful to browse
through multiple crashes in the
same crash set to see if some of
them have information that is
more useful than others.
In addition, the Organizer as a
group crashes together will
sometimes put crashes with
different causes together in the
same group.
There may be other threads or
the backtrace of a crashed
thread that identify to you, to
a human eye you will recognize
that there are multiple causes
of this group of crashes even
though the Xcode Organizer put
them all together.
If you only looked at one crash
log you might not even know that
that second crash is occurring
until you fix the first one and
ship it and start getting crash
logs from your users.
Once you've done some analysis
of your crash, once you've
narrowed down where in your
process it might be occurring or
which objects it might've been
using you can use tools like the
Address Sanitizer and the
Zombies instrument to try and
reproduce the crash.
Because even though we did a
good job with the malloc free
list crash log of narrowing down
what happened it's a lot easier
to debug crashes when they occur
in a debugger, in a test with
Sanitization error messages
telling you what went on.
So a moment ago I mentioned
looking up multiple stack
traces, multiple thread stacks
in order to diagnose
multithreading errors.
And to talk more about debugging
multithreading errors let me
introduce Kuba.
[ Applause ]
>> Thank you.
Thank you.
As Greg mentioned, some memory
corruptions can be caused by
multithreading issues.
Multithreading bugs are often
one of the hardest types of bugs
to diagnose and reproduce.
They get especially hard to
reproduce because they only
cause crashes once in a while,
so your code might seem to work
fine in 99% of cases.
And these bugs can go unnoticed
for a very long time.
Often monitoring bugs cause
memory corruptions and what
you'll see in crash logs looks
just like memory corruptions as
well and we've seen examples of
those in the previous section.
So when you're dealing with
crashes inside the malloc or
free or retained counting
operations those are typical
symptoms of memory corruptions.
There's specific symptoms of
multithreading bugs as well.
The crashing thread often
contains, sorry the crash log
often contains multiple threads
that are executing related parts
of your code.
So if a particular class or a
method shows up in a crash log
in multiple threads that's an
indication of a potential
multithreading bug.
The memory corruptions that are
caused by multithreading issues
are often very random.
So you might see crashes
happening at slightly different
lines of code or slightly
different addresses.
And as Greg mentioned, you can
see those show up as different
crash points in Xcode even
though it's the same bug.
And also the crashing thread
might not actually be the
culprit of the bug, so it's
important to look at the stack
traces of the other threads in a
crash log.
So now let's take a closer look
at an example of a
multithreading bug and I will
show you how we can diagnose
such a bug using a tool called
Thread Sanitizer which is part
of Xcode.
So let's take another look at
our cookies recipe app that
Chris and I made and let's look
at some more crash logs that we
received from our users.
Let's focus on this second app
crasher of our app, this one.
This crash log indicates that
there's something wrong going on
when we're doing with a class
called LazyImageView, which is a
class that I wrote and we'll
look at it in just a second.
But let's try to understand more
from the crash logs first.
Let's look at the entire stack
of this thread which I can do by
clicking this button right here
that also shows all other
threads.
And if we focus at the top most
frames here we'll see that
what's really going on is that
the free function is calling
abort which indicates a heap
corruption, a type of memory
error.
If we look at the stack traces
of some other threads, like
thread 5 here, we'll see that
it's executing also some code
inside LazyImageView.
Let's look at another crash in
this group of crashes and we'll
see that this a common theme in
all these crash logs.
One thread is reporting a heap
corruption when the free
function is calling abort and a
secondary thread is processing
in a very related part of the
code, actually in the same
class, inside LazyImageView
again.
That's most likely not a
coincidence, I highly suspect
that this is a multithreading
issue.
So let's actually take a look at
our LazyImageView class.
So I'll click this button right
here to jump straight to this
line of code and open it in our
project.
And you can see the source of
LazyImageView right here.
It's a subclass of UIImageView,
but it has an extra feature
because it loads images lazily
and asynchronously.
And we can the logic for this
right here in the initializer.
What we do is we dispatch a job
to a background queue where we
will create the image on a
background thread and once that
is done we will dispatch back to
the main queue to actually
display the image on the screen.
The crash log points us to this
line of code here where we are
accessing an image cache which
we are using to make sure that
we don't unnecessarily create
the same image multiple times.
So maybe there is some bug in
the way my cache is implemented.
Let's try to avoid guessing,
instead I'll run the app in the
Simulator and we will try to
reproduce this crash.
And let me close the crash log
session first.
All right, so this is our
cookies recipe again and you'll
notice that if I try to add a
new recipe by clicking this +
button right here we get asked
to select an image for our new
recipe.
So this controller which is on
the screen right now displays
all these images using
LazyImageView.
So just showing it up on the
screen and scrolling through the
content already exercises all of
the code inside LazyImageView,
but I don't see any crashes.
Unfortunately, it's a very
common problem with
multithreading bugs, they are
notoriously hard to reproduce.
So even when you repeatedly test
the code that has such a bug you
might not actually see a crush.
Let's try actually doing that,
let's try closing and opening
this controller a couple times
and let's see if maybe
eventually we will be lucky and
we will be able to trigger this
crash.
All right here we go, the
debugger has stopped because it
has crashed, but even when you
actually do catch this crash in
the debugger it doesn't really
help us.
All the debugger provides is
that this is some sort of XC bad
excess but it does not what
caused the corruption or why did
it actually happen.
Luckily Xcode has a tool that's
perfect just for this situation
and it's called Thread Sanitizer
and that's what I'm going to use
here.
So let's open up the scheme
editor of our project which I
can do by clicking the name of
our app here and choosing Edit
Scheme.
And if I then switch over to the
tab called Diagnostics you'll
see that we have several runtime
diagnostic tools here, like
Address Sanitizer which is great
at finding buffer overflows.
Let me choose Thread Sanitizer
and I'll also select Pause on
Issues which means that the
debugger will break every time
that Sanitizer detects a bug.
Let's run the app in the
Simulator with Thread Sanitizer
enabled and let's see what
happens now if I try to add a
new recipe.
If I click the + button now
you'll see that he app gets
immediately stopped because
Thread Sanitizer found the bug.
And notice that I didn't have to
try multiple times, Thread
Sanitizer reproduces
multithreading issues extremely
reliably.
So let's look at some details of
this bug.
We see that it's a Swift access
race.
And if we look at the left side
into the debug navigator we get
even some more details about
this bug.
We see that there's two accesses
performed by two different
threads, here is thread 2 and
thread 4, that are both trying
to access the same memory
location at the same time, which
is not allowed.
So if we take a look at these
two lines of code that are
forming the race we see that
they are both accessing the
image cache again.
So since this is a data
structure that is shared between
multiple threads as we see here
it needs to be a thread data
structure.
So let's look at how it's
implemented, let's jump to
storage here which is what we're
using and let's see if it
actually is thread safe.
So here's the source of image
cache, right here at the top of
the file and we can immediately
spot what's wrong.
This is just a plain Swift
dictionary so that's not good.
Swift dictionaries are not
thread safe by default.
So if we want to share a mutable
Swift dictionary between
multiple threads we have to
protect it with synchronization,
which means we have to make sure
that only one thread accesses it
at a time.
So now let's actually work on
fixing this problem and making
the class thread safe and I'll
do that in two steps.
First, I'll refactor this code a
little bit so that we have more
control over storage and then in
a second step I will use a
dispatch queue to actually make
this class thread safe.
So first of all what I don't
like here is that storage is
declared as a public variable.
That means that potentially any
code inside my app can be
accessing it.
And it will be really hard to
make sure that all the code in
my app does it correctly, so
let's change it to private
instead.
And let me introduce a different
way of accessing image cache and
I'll do that by introducing a
subscript.
That means that users of image
cache can use brackets to load
and store data from the cache.
So subscripting needs a getter,
like this one and a setter.
And for now let's just implement
these by directly accessing the
underlying storage.
And to make the rest of this
file actually build I need to
update the users, so instead of
accessing the storage property
we should be using brackets and
index into image cache directly
like this.
So if I hit Build Now you'll see
that the code will compile fine
now, but I didn't really fix any
bugs yet.
But I did achieve something, I
now have direct control over all
the code that can access
storage.
It's either going to be code in
this getter or the setter, no
other code in my app has access
to it.
So I'm in a very good position
to actually fix this Swift
access race.
And let me do that by using a
dispatch queue.
So let's create a new private
variable called queue and let's
assign a new dispatch queue into
it.
Dispatch queues are serial by
default, so this one is also
serial, which means that it will
only allow one piece of code to
execute inside that queue at a
time.
So that's perfect, that's
exactly what we need here.
How do we actually execute code
inside a dispatch queue?
We can use queue.sync and
whatever code I move into
queue.sync will be executed
inside that serial queue and
only one at a time.
And I can return a value here
because I need to return
something from the getter.
And I can fix the same thing in
the setter as well.
And if I move this line of code
into this queue.sync it will be
executed as part of that
dispatch queue.
So this way the code is now
thread safe because every single
line of code that accesses
storage is always executed
inside a serial dispatch queue,
which means it will only be
executed one at a time and is
[inaudible] thread safe.
Now it might be tempting to only
use the synchronization in the
setter where we are modifying
the storage and avoiding it in
the getter like this.
But that's not correct, that can
still cause memory corruptions
and still cause crashes.
Let me actually prove that to
you by running this version of
the code in the Simulator and
let's see if the Sanitizer finds
this little more subtle bug now.
As I expected it does.
We really have to protect both
the getter and the setter with
the synchronization.
So let me run my app one last
time in the Simulator and you
will see it if I try to a new
recipe this time.
The controller loads up just
fine and we don't get any more
warnings because the class is
now correctly thread safe.
So I can actually go back to our
Organizer window and mark this
crash as resolved because we
have found, identified, and
fixed this bug.
[ Applause ]
All right so what we've just
seen is that I've started with a
set of crash logs that all have
symptoms of a multithreading
bug.
And then I've used this tool
called Thread Sanitizer to
identify and eventually fix this
bug.
Thread Sanitizer not only
detects multithreading issues,
but it also makes them reproduce
much more reliably.
Notice that in the demo I didn't
have to invoke the controller
multiple times.
This tool works on macOS and in
the Simulator, but just like all
other runtime diagnostic tools
it only finds bugs in codes that
you exercise by actually running
it.
So you should keep that in mind
and make sure that your
automated or manual testing uses
Thread Sanitizer, especially on
any code that is using threading
or GCD.
And if you'd like to know more I
recommend that you watch a video
from my WWDC 2016 session called
Thread Sanitizer and Static
Analysis where we have
introduced this tool and we also
talked about how it works under
the hood.
As a reminder, Thread Sanitizer
can be found in the Scheme
Editor.
So if you go to Product, Scheme
and Edit Scheme that will bring
up the Scheme Editor.
And then we can switch over to
the Diagnostics tab where you
can find Thread Sanitizer among
the other runtime diagnostic
tools.
I have one more debugging tip
that I would like to share with
you that's useful when dealing
with multithreading.
When you are creating dispatch
queues you can provide a custom
label in the initializer.
You can assign a custom name to
an operation queue and if you're
working with threads you can use
custom names with threads as
well.
These names and labels are
displayed in the debugger but
they also appear in some types
of crash logs and that can help
you often narrow down possible
causes of multithreading bugs.
Okay so to stay on top of
crashes I have three takeaway
points for you.
Number one, always test your
apps on real devices before
submitting them to the App
Store, that will help you avoid
getting rejected in App Review.
Number two, when you do get
crashes from your users you
should always try to reproduce
them.
Look at the crash logs and look
at the stack traces and try to
figure out which parts of your
app do you need to exercise to
trigger a crash or to try to
reproduce the crash.
And finally, for crashes that
are hard to reproduce I
recommend using bug finding
tools, such as Address Sanitizer
or Thread Sanitizer which work
for editing memory corruption
bugs and multithreading
problems.
So now let's recap what we've
learned today.
Chris has showed us how we can
and should use the Organizer
window in Xcode to get
statistics and also details
about crash logs.
Greg showed us how to read and
analyze the text of crash logs.
In many cases, they can be
reproduced like if you are
dealing with app lunch timeouts.
Then we've crash log crashes
that are hard to reproduce
because they happen randomly
like memory corruptions.
And we have mentioned what signs
they leave in crash logs.
And finally, I've shown how bug
finding tools like the
Sanitizers can help you
reproduce issues that are memory
corruptions and threading issues
and I recommend that you use
these tools as well.
For more information please
visit our session's webpage
where we will also include links
to the technotes that we
mentioned and also the other
documents that provides
debugging tips that are helpful
when dealing with crashes.
And I'd like to remind you that
there is a crash logs lab
starting right after this
session at 12 p.m. in Technology
Lab 8, so please stop by if you
have any questions about crashes
and crash logs.
And enjoy the rest of WWDC,
thank you very much.
[ Applause ]