Transcript
[ Music ]
[ Applause ]
>> Hello, everybody.
I'm very excited to be here
today to talk about
understanding ARKit Tracking and
Detection to empower you to
create great augmented reality
experiences.
My name is Marion and I'm from
the ARKit Team.
And what about you?
Are you an experienced ARKit
developer, already, but you are
interested in what's going on
under the hood?
Then, this talk is for you.
Or you may be new to ARKit.
Then, you'll learn about different
kinds of tracking technologies,
as well as some basics and
terminology used in augmented
reality, which will then help
you to create your very own
first augmented reality
experience.
So, let's get started.
What's tracking?
Tracking provides your camera's
viewing position and orientation
in your physical environment,
which then allows you to
augment virtual content into
your camera's view.
In this video, for example, the
front table and the chairs are
virtual content augmented on top
of the real physical terrace.
This, by the way, is the Ikea app.
And the virtual content will
always appear visually correct:
correct placement, correct size,
and correct perspective
appearance.
So, different tracking
technologies are just providing
a different reference system
for the camera.
Meaning the camera with respect
to your world, the camera with
respect to an image, or maybe, a
3D object.
And we'll talk about those
different kinds of tracking
technologies in the next hour,
such that you'll be able to make
the right choice for your
specific use case.
We'll talk about the already
existing AR technologies:
Orientation Tracking, World
Tracking, and Plane Detection.
Then we'll have a close look
at our new tracking and
detection technologies, which
came out with ARKit 2.
Which are saving and loading
maps, image tracking, and object
detection.
But before diving deep into
those technologies, let's start
with a very short recap of ARKit
like on a high level.
This is, specifically,
interesting if you are new to
ARKit.
So, the first thing you'll do is
create an ARSession.
An ARSession is the object that
handles everything from
configuring to running the AR
technologies.
And also, returning the results
of the AR technologies.
You then, have to describe what
kind of technologies you
actually want to run.
Like, what kind of tracking
technologies and what kind of
features should be enabled, like
Plane Detection, for example.
You'll then take this specific
ARConfiguration and call the run
method on your instance of the
ARSession.
Then, the ARSession, internally,
will start configuring an
AVCaptureSession to start
receiving the images, as well as
a Core Motion manager to begin
receiving the motion sensor
data.
So, this is, basically, the
built-in input system from your
device for ARKit.
Now, after processing, the
results are returned in ARFrames
at 60 frames per second.
An ARFrame is a snapshot in time
which gives you everything you
need to render your augmented
reality scene.
Like the captured camera image,
which will be rendered in the
background of your augmented
reality scene.
As well as the tracked camera
motion, which will then be
applied to your virtual camera
to render the virtual content
from the same perspective as the
physical camera.
It also contains information
about the environment.
Like, for example, detected
planes.
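To make this recap concrete, here is a minimal sketch of that setup in Swift; it assumes an ARSCNView, uses a World Tracking configuration as the example, and the delegate method shown is just one way to receive the ARFrames:

```swift
import ARKit
import UIKit

// A minimal sketch: configure and run an ARSession, then read each ARFrame.
class ViewController: UIViewController, ARSessionDelegate {
    let sceneView = ARSCNView()

    override func viewDidLoad() {
        super.viewDidLoad()
        sceneView.session.delegate = self

        // Describe which technologies should run, then start the session.
        let configuration = ARWorldTrackingConfiguration()
        sceneView.session.run(configuration)
    }

    // An ARFrame is a snapshot in time: camera image, tracked camera, anchors.
    func session(_ session: ARSession, didUpdate frame: ARFrame) {
        let image: CVPixelBuffer = frame.capturedImage   // rendered as the background
        let camera: ARCamera = frame.camera              // drives the virtual camera
        let anchors: [ARAnchor] = frame.anchors          // e.g. detected planes
        _ = (image, camera, anchors)
    }
}
```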
So, let's now start with our
first tracking technology and
build up from there.
Orientation Tracking.
Orientation Tracking tracks,
guess what?
Orientation.
Meaning it tracks the rotation,
only.
You can think about it as if you
can only use your head to view
the virtual content, which also
only allows rotation.
Meaning you can experience the
virtual content from the same
positional point of view, but no
change in the position is going
to be tracked.
The rotation data is tracked
around three axes.
That's why it's also, sometimes,
called the three degrees of
freedom tracking.
You can use it, for example, in
a spherical virtual environment.
Like, for example, experience a
360-degree video, in which the
virtual content can be viewed
from the same positional point.
You can also, use it to augment
objects that are very far away.
Orientation Tracking is not
suited for physical world
augmentation, in which you want
to view the content from
different points of views.
So, let's now have a look at
what happens under the hood when
Orientation Tracking is running.
It is, actually, quite simple.
It only uses the rotation data
from Core Motion, which applies
sensor fusion to the motion
sensor data.
As motion data is provided at a
higher frequency than the camera
image, Orientation Tracking
takes the latest motion data
from Core Motion, once the camera
image is available.
And then, returns both results
in an ARFrame.
So, that's it.
Very simple.
So, please note that the camera
feed is not processed in
Orientation Tracking.
Meaning there's no computer
vision under the hood here.
Now, to run Orientation Tracking
you only need to configure your
ARSession with an
AROrientationTrackingConfiguration.
The results will then be
returned in an ARCamera object
provided by the ARFrames.
Now, an ARCamera object always
contains the transform, which in
this case of Orientation
Tracking, only contains the
rotation data of your tracked
physical camera.
Alternatively, the rotation is
also represented in eulerAngles.
You can use whichever fits best
for you.
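As a rough sketch (the class name here is illustrative), running Orientation Tracking and reading the rotation-only pose might look like this:

```swift
import ARKit

// A minimal sketch: run Orientation Tracking and read the rotation-only
// camera pose from each ARFrame.
final class OrientationTrackingExample: NSObject, ARSessionDelegate {
    let session = ARSession()

    func start() {
        session.delegate = self
        session.run(AROrientationTrackingConfiguration())
    }

    func session(_ session: ARSession, didUpdate frame: ARFrame) {
        let camera = frame.camera
        let transform = camera.transform   // 4x4 matrix; only the rotation changes here
        let angles = camera.eulerAngles    // the same rotation as roll, pitch, and yaw
        _ = (transform, angles)
    }
}
```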
Let's now move over to more
advanced tracking technologies.
We'll start with World Tracking.
World Tracking tracks your
camera's viewing orientation, and
also the change in position
in your physical environment,
without any prior information
about your environment.
Here, you can see on the left
side the real camera's view into
the environment, while on the
right side you see the tracked
camera motion while exploring
the world represented in the
coordinate system.
Let's now explain better, what
happens here, when World
Tracking is running.
World Tracking uses the motion
data of your device's
accelerometer and gyroscope to
compute its change in
orientation and translation
at a high frequency.
It also provides its information
in correct scale in meters.
In literature, just this part of
the tracking system is also
called Inertial Odometry.
While this motion data provides
good motion information for
movement across small time
intervals and whenever there's
sudden movement, it does
drift over larger time intervals,
as the data is not perfectly
precise and is subject to
cumulative errors.
That's why it cannot be used
just on its own for tracking.
Now, to compensate for this
drift, World Tracking
additionally applies a computer
vision process in which it uses
the camera frames.
This technology provides a
higher accuracy, but at the cost
of computation time.
Also, this technology is
sensitive to fast camera motions,
as these result in motion blur
in the camera frames.
Now, this vision only part of
the system is also called Visual
Odometry.
Now, by fusing both systems,
computer vision and motion,
ARKit takes the best of both.
From computer vision, it takes
a high accuracy over the larger
time intervals.
And from the motion data it
takes the high update rates and
good precision for the smaller
time intervals, as well as the
metric scale.
Now, by combining both systems,
World Tracking can skip
the computer vision processing
for some of those frames, while
still keeping an efficient and
responsive tracking.
This frees CPU resources, which
you can then, additionally, use
for your apps.
In Literature, this combined
technology is also called Visual
Inertial Odometry.
Let's have a closer look at the
visual part of it.
So, within the computer vision
process, interesting regions in
the camera images are extracted,
like here the blue and the
orange dots.
And they are extracted such that
they can robustly also be
extracted in other images of
the same environment.
Those interesting regions are
also called features.
Now, those features are then
matched between multiple images
over the camera stream based on
their similarity and their
appearance.
And what then happens is pretty
much how you are able to see 3D
with your eyes.
You have two of them, separated
by a small sideways distance.
And this parallax between the
eyes is important, as it
results in slightly different
views into the environment,
which allows you to see in
stereo and perceive depth.
And this is what ARKit now,
also, does with the different
views of the same camera stream
during the process of
triangulation.
And it does it once there's
enough parallax present.
It computes the missing depth
information for those matched
features.
Meaning those 2D features from
the image are now reconstructed
in 3D.
Please note that for this
reconstruction to be successful,
the camera position must have
changed by a translation to
provide enough parallax,
for example with a sideways
movement.
A pure rotation does not give
enough information here.
So, this is your first small map
of your environment.
In ARKit we call this a World
map.
In this same moment, the
cameras' positions and
orientations of your sequence
are also computed, denoted with
a C here.
Meaning, your World Tracking
just initialized.
This is the moment of
initialization of the tracking
system.
Please note that also in this
moment of this initial
reconstruction of the World map,
the world origin was defined.
And it is set to the first
camera's origin of the
triangulated frames.
And it is also set to be gravity
aligned.
It's denoted with a W in the
slides.
So, you now have a small
representation of your real
environment reconstructed as a
World map in its own world
coordinate system.
And you have your current camera
tracked with respect to the same
world coordinate system.
You can now start adding virtual
content to augment them into the
camera's view.
Now, to place virtual content
correctly in an ARSession, you
should use ARAnchors from ARKit,
which are denoted with an A
here.
ARAnchors are reference points
within this World map, within
this world coordinate system.
And you should use them because
the World Tracking might update
them during the tracking.
Meaning that, also, all the
virtual content that is assigned
to it will be updated and
correctly augmented into the
camera's view.
So, now that you've used the
ARAnchors you can add virtual
content to the anchor, which
will then be augmented correctly
into the current camera's view.
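A small sketch of this pattern, assuming an ARSCNView whose delegate is this object; makeVirtualContent() is a hypothetical helper standing in for your own content:

```swift
import ARKit
import SceneKit

// A minimal sketch: anchor virtual content with an ARAnchor so it is updated
// together with the World map when World Tracking refines it.
final class AnchorPlacementExample: NSObject, ARSCNViewDelegate {
    let sceneView = ARSCNView()

    func placeContent(at transform: simd_float4x4) {
        // Add a reference point in the World map; content is attached to it below.
        sceneView.session.add(anchor: ARAnchor(transform: transform))
    }

    // Called when SceneKit creates a node for the newly added anchor.
    func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
        node.addChildNode(makeVirtualContent())   // content follows any anchor updates
    }

    func makeVirtualContent() -> SCNNode {
        return SCNNode(geometry: SCNSphere(radius: 0.05))   // placeholder virtual object
    }
}
```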
From now on, this created 3D
World map of your environment is
your reference system for the
World Tracking.
It is used to reference new
images against.
And features are matched from
image to image and triangulated.
And at the same time, also, new
robust features are extracted,
matched, and triangulated, which
are then extending your World
map.
Meaning ARKit is learning your
environment.
This then allows, again, the
computation of tracking updates
of the current camera's position
and orientation.
And finally, the correct
augmentation into the current
camera's view.
While you continue to explore
the world, World Tracking will
continue to track your physical
camera and continue to learn
your physical environment.
But over time, the augmentation
might drift slightly, which can
be noticed like you can see in
the left image, in a small
offset of the augmentation.
This is because even small
offsets, even small errors will
become noticeable when
accumulated over time.
Now, when the device comes back
to a similar view, which was
already explored before, like
for example, the starting point
where we started the
exploration, ARKit can perform
another optimization step.
And this addition turns the
Visual Inertial Odometry
system that ARKit applies
into a Visual Inertial SLAM
system.
So, let's bring back this first
image where the World Tracking
started the exploration.
So, what happens now is that
World Tracking will check how
well the tracking information
and the World map of the current
view aligns with the past views,
like the one from the beginning.
And will then perform the
optimization step and align the
current information and the
current World map with your real
physical environment.
Have you noticed that during
this step, also the ARAnchor was
updated?
And that is the reason why you
should use ARAnchors when adding
virtual content to your
scenario.
In this video, you can see the
same step again with a real
camera feed.
On the left side you see the
camera's view into the
environment, and also, features
which are tracked in the images.
And on the right side, you see a
bird's-eye view of the
scenario, showing what ARKit
knows about it and showing the
3D reconstruction of the
environment.
The colors of the points are
just encoding the height of the
reconstructed points with blue
being the ground floor and red
being the table and the chairs.
Once the camera returns back to
a similar view it has seen
before, like here the starting
point, ARKit will now apply this
optimization step.
So, just pay attention to the
point cloud and the camera
trajectory.
Have you noticed the update?
Let me show you, once more.
This update aligns the ARKit
knowledge with your real
physical world, and also, the
camera movement and results in
the better augmentation for the
coming camera frames.
By the way, all those
computations of World Tracking,
and also, all this information
about your learned environment,
everything is done on your
device only.
And all this information, also,
stays on your device only.
So, how can you use this complex
technology, now, in your app?
It is actually quite simple.
To run World Tracking you just
configure your ARSession with an
ARWorldTrackingConfiguration.
Again, its results are returned
in an ARCamera object of the
ARFrame.
An ARCamera object, again,
contains the transform, which in
this case of World Tracking
contains, in addition to the
rotation, also the translation
of the tracked camera.
Additionally, the ARCamera also
contains information about the
tracking state and
trackingStateReason.
This will provide some
information about the current
tracking quality.
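A minimal sketch of running World Tracking and reading that full pose (the class name is illustrative):

```swift
import ARKit

// A minimal sketch: run World Tracking and read the 6-degrees-of-freedom pose.
final class WorldTrackingExample: NSObject, ARSessionDelegate {
    let session = ARSession()

    func start() {
        session.delegate = self
        session.run(ARWorldTrackingConfiguration())
    }

    func session(_ session: ARSession, didUpdate frame: ARFrame) {
        let camera = frame.camera
        let pose = camera.transform        // rotation and translation in world coordinates
        let state = camera.trackingState   // current tracking quality, discussed next
        _ = (pose, state)
    }
}
```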
So, tracking quality.
Have you ever experienced
opening an AR app and the
tracking worked very poorly or
maybe it didn't work at all?
How did that feel?
Maybe frustrating?
You might not open the app,
again.
So, how can you get a higher
tracking quality for your app?
For this, we need to understand
the main factors that are
influencing the tracking
quality.
And I want to highlight three of
them here.
First of all, World Tracking
relies on a constant stream of
camera images and sensor data.
If this is interrupted for too
long, tracking will become
limited.
Second, World Tracking also
works best in textured and
well-lit environments because
World Tracking uses those
visually robust points to map
and finally triangulate its
location.
It is important that there is
enough visual complexity in the
environment.
If this is not the case because
it's, for example, too dark or
you're looking against a white
wall, then also, the tracking
will perform poorly.
And third, also, World Tracking
works best in static
environments.
If too much of what your camera
sees is moving, then the visual
data won't correspond with the
motion data, which might result
in potential drift.
Also, the device itself should
not be on a moving platform like
a bus or an elevator.
Because in those moments the
motion sensor would actually
sense a motion like going up or
down in the elevator while,
visually, your environment had
not changed.
So, how can you get notified
about the tracking quality that
the user is currently
experiencing with your app?
ARKit monitors its tracking
performance.
We applied machine learning,
which was trained on thousands
of data sets to which we had the
information how well tracking
performed in those situations.
To train a classifier, which
tells you how tracking performs,
we used annotations like the
number of visible features
tracked in the image and also
the current velocity
of the device.
Now, during runtime, the health
of tracking is determined based
on those parameters.
In this video, we can see how
the health estimate, which is
reported in the lower left,
gets worse when
the camera is covered while we
are still moving and exploring
the environment.
It also shows how it returns
back to normal after the camera
view is uncovered.
Now, ARKit simplifies this
information for you by providing
a tracking state.
And the tracking state can have
three different values.
It can be normal, which is the
healthy state and is the case
most of the time.
And it can also be limited,
which is whenever tracking
performs poorly.
If that's the case, then the
limited state will also come
along with the reason, like
insufficient features or
excessive motion or being
currently in the initialization
phase.
It can also be not available,
which means that tracking did
not start yet.
Now, whenever the tracking state
changes, a delegate is called:
cameraDidChangeTrackingState.
And this gives you the
opportunity to notify the user
when a limited state has been
encountered.
You should then give
informative and actionable
feedback on what the user can do
to improve the tracking
situation, as most of it is
actually in the user's hands.
Like, for example, as we learned
before, a sideways movement
to allow initialization, or
making sure there's
adequate lighting for enough
visual complexity.
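As a sketch, the delegate and that feedback could look like this; showGuidance(_:) is a hypothetical stand-in for your own UI:

```swift
import ARKit

// A minimal sketch: react to tracking-state changes with actionable guidance.
final class TrackingStateObserver: NSObject, ARSessionDelegate {
    func session(_ session: ARSession, cameraDidChangeTrackingState camera: ARCamera) {
        switch camera.trackingState {
        case .normal:
            showGuidance(nil)                                    // healthy: hide any hint
        case .notAvailable:
            showGuidance("Tracking has not started yet.")
        case .limited(.initializing):
            showGuidance("Move the device sideways to start tracking.")
        case .limited(.excessiveMotion):
            showGuidance("Slow down; the camera is moving too fast.")
        case .limited(.insufficientFeatures):
            showGuidance("Point at a well-lit, textured surface.")
        case .limited:
            showGuidance("Tracking is limited.")
        }
    }

    func showGuidance(_ message: String?) {
        // Hypothetical helper: present or hide the hint in your UI.
    }
}
```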
So, let me wrap up the World
Tracking for you.
World Tracking tracks your
camera's 6-degrees-of-freedom
orientation and position with
respect to your surrounding
environment, without any
prior information about your
environment, which then allows
physical world augmentation
in which the content can
actually be viewed from any
point of view.
Also, World Tracking creates a
World map, which becomes the
tracking's reference system to
localize new camera images
against.
To create a great user
experience, the tracking quality
should be monitored and feedback
and guidance should be provided
to your user.
And World Tracking runs on your
device only.
And all results stay on your
device.
If you have not done it already,
try out one of our developer
examples.
For example, the Build Your
First AR Experience sample, and
play around for just 15 minutes
with the tracking quality in
different situations, lighting
conditions, or movements.
And always remember to guide the
user whenever he encounters a
limited tracking situation to
guarantee that he has a great
tracking experience.
So, World Tracking is about the
camera-- where your camera is
with respect to your physical
environment.
Let's now talk about how the
virtual content can interact
with the physical environment.
And this is possible with Plane
Detection.
The following video, again, from
the Ikea app, shows a great use
case for the Plane Detection,
placing virtual objects into
your physical environment, and
then interacting with it.
So first, please note how, also,
in the Ikea app the user is
guided to make some movement.
Then, once a horizontal plane is
detected, the virtual table set
is displayed and is waiting to
be placed by you.
Once you position it, rotate it
as you want it, you can lock the
object in its environment.
And did you notice the
interaction between the detected
ground plane and the table set
in the moment of locking?
It kind of bounces shortly on
the ground plane.
And this is possible because we
know where the ground plane is.
So, let's have a look at what
happened under the hood here.
Plane Detection uses the World
map provided by the World
Tracking I just talked about
a moment ago, which is
represented here by those yellow
points.
And then, it uses them to detect
surfaces that are horizontal or
vertical, like the ground, the
bench, and the small wall.
It does this by accumulating
information over multiple
ARFrames.
So, as the user moves around the
scene, more and more information
about the real surface is
acquired.
It also allows the Plane
Detection to provide an
extent of the surface, like a
convex hull.
If multiple planes belonging to
the same physical surface are
detected, like in this part now,
the green and the purple one,
then they will be merged once
they start overlapping.
If horizontal and vertical
planes intersect they are
clipped at the line of
intersection, which is actually
a new feature in ARKit 2.
Plane Detection is designed to
have very little overhead as it
repurposes the mapped 3D points
from the World Tracking.
And then it fits planes into
those point clouds and over time
continuously aggregates more and
more points and merges the
planes that start to overlap.
Therefore, it takes some time
until the first planes are
detected.
What does that mean for you?
If your app is started, there
might not directly be planes to
place objects on or to interact
with.
If the detection of a plane is
mandatory for your experience,
you should again guide the user
to move the camera with enough
translation to ensure a dense
reconstruction based on the
parallax, and also, enough
visual complexity in the scene.
Again, for the reconstruction, a
rotation only is not enough.
Now, how can you enable the
Plane Detection?
It's, again, very simple.
As the Plane Detection reuses
the 3D map from the World
Tracking, it can be configured
by using the
ARWorldTrackingConfiguration.
Then, the property
planeDetection just needs to be
set to either horizontal,
vertical, or like in this case,
both.
And then, just run your
ARSession with this
configuration, and the detection
of the planes will be started.
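In code, that configuration step might look roughly like this:

```swift
import ARKit

// A minimal sketch: enable horizontal and vertical Plane Detection
// on top of World Tracking and run the session.
let configuration = ARWorldTrackingConfiguration()
configuration.planeDetection = [.horizontal, .vertical]

let session = ARSession()
session.run(configuration)
```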
Now, how are those, the results
of the detected planes returned
to you?
The detected planes are returned
as an ARPlaneAnchor.
An ARPlaneAnchor is a subclass
of an ARAnchor.
Each ARAnchor provides a
transform containing the
information about where the
anchor is in your World map.
Now, a plane anchor,
specifically, also has
information about the geometry
of the surface of the plane,
which is represented in two
alternative ways.
Either as a bounding box with
a center and an extent, or as a
3D mesh describing the shape of
the convex hull of the detected
plane, in its geometry property.
To get notified about detected
planes, delegates are going to
be called whenever planes are
added, updated, or removed.
This will then allow you to use
those planes, as well as react
to any updates.
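A small sketch of handling those callbacks and reading both plane representations, assuming this object is the session delegate:

```swift
import ARKit

// A minimal sketch: receive detected planes and read their geometry.
final class PlaneHandler: NSObject, ARSessionDelegate {
    func session(_ session: ARSession, didAdd anchors: [ARAnchor]) {
        for case let plane as ARPlaneAnchor in anchors {
            let transform = plane.transform   // where the plane sits in the World map
            let center = plane.center         // bounding-box representation:
            let extent = plane.extent         // center and extent in the plane's space
            let mesh = plane.geometry         // convex-hull mesh of the detected surface
            _ = (transform, center, extent, mesh)
        }
    }

    func session(_ session: ARSession, didUpdate anchors: [ARAnchor]) {
        // Planes grow and merge over time; update any attached content here.
    }

    func session(_ session: ARSession, didRemove anchors: [ARAnchor]) {
        // Planes merged into others are removed here.
    }
}
```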
Now, what can you do with
planes?
Like what we've seen before on
the Ikea app, these are great
examples.
Place virtual objects, for
example, with hit testing.
Or you can interact with them
physically; like we've seen,
bouncing is a possibility.
Or you can also use them by
adding an occlusion plane onto
the detected plane, which will
then hide all the virtual
content below or behind the
added occlusion plane.
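For the hit-testing route mentioned above, a hedged sketch (assuming an ARSCNView named sceneView) could look like this:

```swift
import ARKit

// A minimal sketch: hit-test a screen point against detected planes and
// place an anchor at the hit location.
func placeObject(at point: CGPoint, in sceneView: ARSCNView) {
    let results = sceneView.hitTest(point, types: .existingPlaneUsingExtent)
    guard let hit = results.first else { return }
    sceneView.session.add(anchor: ARAnchor(transform: hit.worldTransform))
}
```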
So, let me summarize what we've
already gone through.
We've had a look at the
Orientation Tracking, the World
Tracking, and the Plane
Detection.
Next, Michele will explain, in
depth, our new tracking
technologies, which were
introduced in ARKit 2.
So, welcome Michele.
[ Applause ]
>> Thank you, Marion.
My name is Michele, and it's a
pleasure to continue with the
remaining topics of this
session.
Next up is saving and loading
maps.
This is a feature that allows you
to store all the information
that is acquired in a session,
so that it can literally be
restored in another session at a
later point in time, to create
augmented reality experiences
that persist at a particular
place.
Or it could also be restored
by another device to create
multi-user augmented reality
experiences.
Let's take a look at an example.
What you see here is a guy,
let's name him Andre, who's
walking around the table with
his device, having an augmented
reality experience.
And you can see his device now
is making this scene more
interesting by adding a virtual
vase on the table.
A few minutes later his friend
arrives at the same scene.
And now, they're both looking at
the scene.
You're going to see Andre's
device on the left and his
friend's on the right now.
So, you can see that they're
looking at the same space.
They can see each other.
But most importantly, they see
the same virtual content.
They're having a shared
augmented reality experience.
So, what we have seen in this
example can be described in
three stages.
First, Andre went around the
table and acquired the World
map.
Then, the World map was shared
across devices.
And then, his friend's device
re-localized to the World map.
This means that ARKit was
able to understand on the new
device that this was the same
place as on the other device,
computed the precise position of
the device with respect to the
map, and then started tracking
from there, just as if the new
device had acquired the World
map itself.
We're going to go into more
detail about these three phases.
But first, let's review what's
in the World map.
The World map includes all the
tracking data that is needed
for the system to be localized,
which includes the feature
points, as Marion explained
so well before,
as well as the local appearance
of these points.
It also contains all the
anchors that were added to the
session, either by the system,
like planes, for example,
or by the users, like the vase,
as we have seen in the example.
This data is serializable and
available to you so that you can
create compelling persistent or
multi-user augmented reality
experiences.
So, now let's take a look at the
first stage, which is acquiring
the World map.
Let's play back the first video
where Andre went around the
table; you can see his
device on the left, here.
And on the right, you see the
World map from a top view, as
acquired by the tracking system.
You can recognize that the
circle is the table and the
chairs around it.
There's a few things to pay
attention to during this
acquisition process.
First, everything that Marion
said during tracking also
applies here.
So, we want enough visual
complexity on the scene to get
dense feature points on the map.
And the scene must be static.
Of course, we can deal with
minor changes, such as the
tablecloth you've seen moving
in the wind.
But the scene must be mostly
static.
In addition, when we are
specifically acquiring a World
map for sharing we want to go
around the environment from
multiple points of view.
In particular, we want to cover
all the directions from which we
want to later relocalize.
To make this easy, we also made
available a world mapping status
which gives you information
about the World map.
If you guys have been to the
What's New in ARKit talk,
Arsalan expanded on this in
depth, but to quickly recap:
When you start the session, the
world mapping status will start
as limited.
It will then switch to
extending as more of the scene
is learned by the device.
And then, finally, it goes to
mapped when the system is
confident about the area you're
staying in.
And that's when you want to save
the map, in the mapped state.
So, that's good information.
But this mostly applies to the
user side during the acquisition
session.
So, what does this mean to you
as a developer?
It means that you need to guide
the user.
So, you can indicate the mapping
status and even disable the
saving or sharing of the World
map until the mapping status
goes to the mapped state.
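A small sketch of that gating; saveButton is a hypothetical stand-in for your own control:

```swift
import ARKit
import UIKit

// A minimal sketch: only allow saving or sharing once the World map is mapped.
final class MappingStatusExample: NSObject, ARSessionDelegate {
    let saveButton = UIButton(type: .system)

    func session(_ session: ARSession, didUpdate frame: ARFrame) {
        // .mapped means the system is confident about the visited area;
        // .limited and .extending mean the user should keep exploring.
        saveButton.isEnabled = (frame.worldMappingStatus == .mapped)
    }
}
```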
We can also monitor the
tracking quality during the
acquisition session and report
to the user if the tracking
state has been limited for more
than a few seconds.
And maybe even give an option to
restart the acquisition session.
On the receiving device, we can
also guide the user to a better
localization process.
So, again on the acquisition
device, when we are in the
mapped state we can take a
picture of the scene and then
ship that together with the
World map.
And on the receiving end we can
ask the user to find this view
to start the shared experience.
That was how to acquire the
World map.
Now, let's see how you can share
the World map.
First, you can get the World map
by simply calling the
getCurrentWorldMap method in the
ARSession.
And this will give you the World
map.
The World map is a serializable
class.
So, then we can simply use the
NSKeyedArchiver utility to
serialize it to a binary stream
of data, which you can then
either save to disk, in the case
of a single-user persistent
application, or share it across
devices.
And for that, you can use the
MultipeerConnectivity framework,
which has great features like
automatic nearby-device
discovery, and allows efficient
communication of data between
devices.
We also have an example of how
to use that in ARKit, called
Creating a Multiuser AR
Experience, that you can check
out on our developer website.
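As a sketch, getting and serializing the map might look like this; writing to a URL stands in for either persistence or handing the same data to MultipeerConnectivity:

```swift
import ARKit

// A minimal sketch: grab the current World map, serialize it, and save it
// (or send the same data to nearby devices instead of writing it to disk).
func saveWorldMap(from session: ARSession, to url: URL) {
    session.getCurrentWorldMap { worldMap, error in
        guard let map = worldMap else {
            print("Could not get a world map: \(error?.localizedDescription ?? "unknown error")")
            return
        }
        do {
            // ARWorldMap supports secure coding, so NSKeyedArchiver can serialize it.
            let data = try NSKeyedArchiver.archivedData(withRootObject: map,
                                                        requiringSecureCoding: true)
            try data.write(to: url)
        } catch {
            print("Could not save the world map: \(error)")
        }
    }
}
```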
On the receiving device,
once you've got the
World map, let's see how you can
set up the World Tracking
configuration to use it.
Very simple.
You just set the initial World
map property to that World map.
When you run the session, the
system will try to find that
previous World map.
But it may take some time, for
example because the user may not
be pointing at the same scene as
before.
So, how do we know when
relocalization happens?
That information is available in
the tracking state.
So, as soon as you start the
session with the initial World
map, the tracking state will be
limited with reason
Relocalizing.
Note that you will still get the
tracking data available here,
but the world origin will be the
first camera, just like a new
session.
As soon as the user points the
device to the same scene, the
system will localize.
The tracking state will go to
normal and the world origin will
be the same as the recorded
World map.
At this point, all your previous
anchors are also available in
your session, so you can put
back the virtual content.
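A small sketch of that loading side; the run options shown are one reasonable choice, not the only one:

```swift
import ARKit

// A minimal sketch: unarchive a saved World map and relocalize into it.
func loadWorldMap(from url: URL, into session: ARSession) throws {
    let data = try Data(contentsOf: url)
    guard let worldMap = try NSKeyedUnarchiver.unarchivedObject(ofClass: ARWorldMap.self,
                                                                from: data) else { return }
    let configuration = ARWorldTrackingConfiguration()
    configuration.initialWorldMap = worldMap
    // The tracking state stays .limited(.relocalizing) until the user points
    // the device at the previously mapped scene, then switches to .normal.
    session.run(configuration, options: [.resetTracking, .removeExistingAnchors])
}
```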
Note here that because what's
happening behind the scenes
is that we're
matching those feature points,
there needs to be enough visual
similarity between the scene
where you acquired the World map
and the scene where you want to
relocalize.
So, if you go back to this table
at night, chances are it's not
going to work very well.
And that was how you can create
multi-user experiences or
persistent experiences using
saving and loading maps.
Next, image tracking.
So, augmented reality is all
about adding visual content on
top of the physical world.
And in the physical world,
images are found everywhere.
Think about pieces of art
hanging on the walls, magazine
covers, advertisements.
Image tracking is a tool that
allows you to recognize those
physical images and build
augmented reality experiences
around them.
Let's see an example.
You can see here two images
being tracked simultaneously.
On the left, a beautiful
elephant is put on top of the
physical image of the elephant.
On the right, the physical image
turned into a virtual screen.
Note also that the images can
freely move around the
environment, as tracking runs
at 60 frames per second.
Let's now take a look at
what's happening behind the
scenes.
So, let's say you have an image
like this one of the elephant
and you want to find it in a
scene like this.
We're using grayscale for this.
And the first step is pretty
similar to what we do in
tracking.
So, we'll extract those
interesting points from both the
reference image and the current
scene.
And then, we try to go in the
current scene and match those
features to the one on the
reference image.
By applying some projective
geometry and linear algebra,
this is enough to give an
initial estimation of the
position and orientation of the
image with respect to the
current scene.
But we don't stop here.
In order to give you a really
precise pose and track at 60
frames per second, we then do a
dense tracking stage.
So, with that initial estimate
we take the pixels from the
current scene and warp them back
to a rectangular shape like you
see on the right-- top right
there.
So, that's a reconstructed image
by warping the pixels of the
current image into the
rectangle.
We can then compare the
reconstructed image with a
reference image that we have
available to create an error
image like the one you see
below.
We then optimize the position
and orientation of the image,
such that that error is
minimized.
So, what this means for you is
that the pose will be really
accurate.
Thank you.
And it will still track at 60
frames per second.
So, let's see how we can do all
of this in ARKit.
As usual, the ARKit API is
really simple.
We have three simple steps.
First, we want to collect all
the reference images.
Then, we set up the AR Session
Configuration.
There are two options here.
One is the World Tracking
configuration, which also gives
you the device position.
And this is the one we have
talked about so far.
And in iOS 12, we introduced a
new configuration, which is a
standalone image tracking
configuration.
Once you start the session you
will start receiving the results
in the form of an ARImageAnchor.
We're now going into more
details of these three steps,
starting from the reference
images.
The easiest way to add reference
images to your application is
through the Xcode asset
catalog.
You simply create an AR Resource
Group and drag and drop your
images in there.
Next, you have to set the
physical dimension of the image,
which you can do on the property
window on the top right.
Setting the physical dimension
is a requirement, and there are
a few reasons for that.
First, it allows the pose of the
image to be in physical scale,
which means your content
will also be in physical scale.
In ARKit, everything is in
meters, so your virtual
content will also be in meters.
In addition, it's especially
important to set the correct
physical dimension of the image
in case you combine image
tracking with World
Tracking,
as this will immediately give a
consistent pose between the
image and the world.
Let's see some example of this
reference images.
You can see here, two beautiful
images.
These images will work really
great with image tracking.
They have high texture, high
level of contrast, well
distributed histograms, as well
as they do not contain
repetitive structures.
There are also other kinds of
images that will work less well
with the system.
You can see an example of this
on the right.
And if we take a look at these
top two examples, you can see
that in the good image we have a
lot of those interesting points.
And you can see that the
histogram is well distributed
across the whole range.
While in the snow image,
there are only a few of those
interesting points and the
histogram is all skewed toward
the whites.
You can get an estimation of how
good an image will be directly
in Xcode.
As soon as you drag an image in
there, the image is analyzed and
problems are reported to you in the form
of warnings to give you early
feedback, even before you run
your application.
For example, if you click on
this bottom image that could be
a magazine page, for example, we
can see that the Xcode says that
the histogram is not well
distributed.
In fact, you can see there's a
lot of whites in the image.
And it would also say that this
image contains repetitive
structures, mainly caused by the
text.
Another example, if you have two
images which are too similar and
are at risk of being confused at
detection time, also, Xcode
warns you about that.
You can see an example of these
two images of the same mountain
range, the Sierra.
There are a few things that we
can do to deal with these
warnings.
For example, let's go back to
this image that had repetitive
structures and not well
distributed histograms.
You can try to identify a region
of this image which is
distinctive enough, like in this
case, for example, the actual
image of the page.
And then, you can crop that out
and use this as the reference
image, instead.
This will, of course, remove
all the warnings and
will give you better
tracking quality.
Another thing that we can do is
use multiple AR Resource Groups.
This allows many more images to
be detected.
We recommend having a
maximum of 25 images per group
to keep your experience
efficient and responsive.
But you can have as many groups
as you want.
And then, you can switch between
groups programmatically.
For example, say you want to
create an augmented reality
experience in a museum that may
have hundreds of images.
Usually though, those images are
actually physically located in
different rooms.
So, what you can do is put the
images that physically will be
present in the room into a
group.
And images of another room into
another group.
And then use, for example, core
location to switch between
rooms.
Note also, that you can have
similar images, now, as long as
they are in different groups.
So, that was all about reference
images.
Let's now, see our two
configurations.
The ARImageTrackingConfiguration
is a new standalone image
tracking configuration, which
means it doesn't run the World
Tracking.
Which also, means there is no
world origin.
So, every image will be given to
you with respect to the current
camera view.
You can also combine image
tracking with a World Tracking
configuration.
And in this case, you will have
all the scene understanding
capability available like Plane
Detection, light estimation,
everything else.
So, when is it more appropriate
to use which configuration?
Let's see.
The
ARImageTrackingConfiguration is
really tailored for use cases
which are built around images.
We can see an example on the
left here.
We can have an image that could
be a page of a textbook.
And to make the experience more
engaging, we are overlaying a
dynamic graphic,
in this case on how to construct
an equilateral triangle.
So, you can see that this
experience is really tailored
around an image.
Now let's look at this
other example, where
image tracking is used to
trigger some content that then
goes beyond the extent of the
image.
In this case, you want to use
the ARWorldTrackingConfiguration
as you will need the device
position to keep track of that
content outside the image.
Also, note that the image
tracking doesn't use the motion
data, which means it can also be
used on a bus or an elevator,
where the motion data don't
agree with the visual data.
So, let's see now, how we can do
this in code.
You can easily recognize those
three steps here.
The first one is to gather all
the images.
And there's a convenience
function for that in the
ARReferenceImage class that
gathers all the images that are
in a particular group.
In this case, it's named Room1.
We can then simply set the
trackingImages property to those
images in the
ARImageTrackingConfiguration.
And run the session.
You will then start receiving
the results, for example, in the
session:didUpdate anchors
delegate method, where you can
check if the anchor is of type
ARImageAnchor.
In the anchor, you will find, of
course, the position and
orientation of the image, as
well as the reference image
itself.
Where you can find, for example,
the name of the image as you
named it in the asset catalog so
that you know which image has
been detected.
There's also another Boolean
property, which tells you if
this image is currently being
tracked in the frame.
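Putting those three steps together, a sketch might look like this; the group name Room1 is the one from the talk, and the delegate method shown is one way to receive the anchors:

```swift
import ARKit

// A minimal sketch: load a resource group, run standalone image tracking,
// and handle the resulting image anchors.
final class ImageTrackingExample: NSObject, ARSessionDelegate {
    let session = ARSession()

    func start() {
        guard let images = ARReferenceImage.referenceImages(inGroupNamed: "Room1",
                                                            bundle: nil) else { return }
        let configuration = ARImageTrackingConfiguration()
        configuration.trackingImages = images
        configuration.maximumNumberOfTrackedImages = 2   // how many to track at once
        session.delegate = self
        session.run(configuration)
    }

    func session(_ session: ARSession, didUpdate anchors: [ARAnchor]) {
        for case let imageAnchor as ARImageAnchor in anchors {
            let name = imageAnchor.referenceImage.name   // as named in the asset catalog
            let pose = imageAnchor.transform             // position and orientation of the image
            let visible = imageAnchor.isTracked          // currently tracked in the frame?
            _ = (name, pose, visible)
        }
    }
}
```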
Note here that beyond these
use cases that we have seen so
far, where you build experiences
around images, image detection
and tracking allow a few more
things.
For example, if two devices are
looking at the same physical
image, you can detect this image
from both devices.
And this will give you a shared
coordinate system that you can
then use as an alternative way
to have a shared experience.
Another example: if you happen
to know where an image is
physically located in the world,
like, for example, you know
where the map of this park is in
the physical world.
You can use image tracking to
get the position of the device
with respect to the image and,
therefore, also the position of
the device with respect to the
world, which, you can then use,
for example, to overlay
directions really attached to
the physical world.
So, that concludes the image
tracking.
Let's now go and look at the
Object Detection.
So, with image tracking we have
seen how we can detect images,
which are planar objects in the
physical world.
Object detection extends this
concept to the third dimension,
allowing the detection of more
generic objects.
Note, though, that these objects
will be assumed to be static in
the scene, unlike images, which
can move around.
We can see an example here.
That's the Nefertiti bust.
It's a statue that could be
present in a museum.
And now, you can detect it with
ARKit.
And then, for example, display
some information on top of the
physical object.
Note also that in the object
detection in ARKit, we are
talking about specific instances
of an object.
So, we're not talking about
detecting statues in general,
but this particular instance of
the Nefertiti statue.
So, how do we represent these
objects in ARKit?
You first need to scan the
object.
So, really, there's two steps to
it.
First, you scan the object and
then you can detect it.
Let's talk about the scanning
part, which mostly is going to
be on your side as a developer,
to basically, create that
representation of the object
that can be used for detection.
Internally, an object is
represented in a similar way as
the world map.
You can see an example of the 3D
feature points of the Nefertiti
statue there on the left.
And to scan the object, you can
use the Scanning and Detecting
3D Objects developer sample
that's available on the website.
And note here, that the
detection quality that you will
get at runtime, later, is highly
affected by the quality of the
scan.
So, let's spend a few moments to
see how we can get the best
quality during the scanning.
Once you build and run this
developer sample you will see
something like this on your
device.
The first step is to find the
region of space around your
object.
The application will try to
automatically estimate this
bounding box, exploiting
different feature points.
But you can always adjust this
box by dragging on a side to
shrink it or make it larger.
Note here that what is really
important is that when you go
around the object you make sure
that you don't cut off any of
the interesting parts of the
object.
You can also, rotate the box
with a two-finger gesture from
top.
So, make sure that this box is
around the object and not
cutting any interesting part of
it.
The next part is the actual
scanning.
In this phase what we want to do
is really go around the objects
from all the points of view that
you think your users will want
to detect it later.
In order to make it easy for you
to understand which parts of the
object have already been
acquired, there is this
beautiful tile representation.
And you also can see a
percentage on top which tells
you how many tiles have already
been acquired.
And it's really important in
this phase that you spend time
on the regions of the object
which have a lot of features
that are distinctive enough.
And you go close enough to
capture all the details.
And again, that you really go
around from all the sides.
Like you see here.
Once you're happy with the
coverage of your object, you
can go to the next step, which
allows you to adjust the
origin by simply dragging the
coordinate system.
And this will be the coordinate
system that will be later given
to you at detection time in the
anchor.
So, make sure that you put it in
a place which makes sense for
your virtual content.
So, at this point, you have a
full representation of your
object, which you can use for
detection.
And the application will now
switch to a detection mode.
We encourage you to use this
mode to get early feedback about
the detection quality.
So, you may want to go around
the object from different points
of view and verify that the
object is detected from all
these different points of view.
You can point your device away,
come back from another angle,
and make sure that the scan was
good to detect the object.
You can also move the object
around so that the lighting
conditions are different,
and you want to make sure that
it is still detected.
This is particularly important
for objects like toys that you
don't know where they're
actually going to be physically
located.
We, also, suggest that you take
the object and put it in a
completely different environment
and still make sure that it is
detected.
In case it is not detected, you
may want to go back to the
scanning and make sure that your
environment is well lit.
A well-lit
environment during the scanning
is very important.
If you have a lux meter, about
500 lux will be best.
And if that is still not enough,
you may want to keep different
versions of the scans.
So, at this point, once you're
happy with the detection
quality, you can simply drop the
model onto your Mac and add it
to an AR Resource Group, just
like you did for the images.
Also note that there are some
objects that will work really
great with this system,
like the ones you can see on the
left.
First of all, they are rigid
objects and they are also rich
in texture, distinctive enough.
But there are also certain kinds
of objects that will not work
well with the system.
You can see an example of this
on the right.
For example, metallic or
reflective objects will not
work.
Transparent objects, like those
made of glass, will also
not work, because the appearance
of these objects really
depends on where they are in the
scene.
So, that was how to scan the
objects.
Again, make sure that you have
well-lit environment.
Let's now see how we can detect
this in ARKit.
If this looks familiar to you,
it's because the API is pretty
similar to the one for
images.
We have a convenience method
to gather all the objects in a
group.
This time it's in the
ARReferenceObject class.
And to configure your
ARWorldTrackingConfiguration,
you simply pass these objects to
the detectionObjects property.
Once you run the session, again,
you will find your results.
And in this case, you want to
check for the ARObjectAnchor,
which will give you the position
and orientation of the object
with respect to the world.
And also the name of the object,
as it was given in the asset
catalog.
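A sketch of that detection side; the group name Gallery is hypothetical, standing in for wherever you put your scanned reference objects:

```swift
import ARKit

// A minimal sketch: load scanned reference objects and detect them
// with a World Tracking session.
final class ObjectDetectionExample: NSObject, ARSessionDelegate {
    let session = ARSession()

    func start() {
        guard let objects = ARReferenceObject.referenceObjects(inGroupNamed: "Gallery",
                                                               bundle: nil) else { return }
        let configuration = ARWorldTrackingConfiguration()
        configuration.detectionObjects = objects
        session.delegate = self
        session.run(configuration)
    }

    func session(_ session: ARSession, didAdd anchors: [ARAnchor]) {
        for case let objectAnchor as ARObjectAnchor in anchors {
            let name = objectAnchor.referenceObject.name   // as named in the asset catalog
            let pose = objectAnchor.transform              // object pose in world coordinates
            _ = (name, pose)
        }
    }
}
```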
So, you guys may have noticed
some similarities between the
object detection and the world
map relocalization.
But there are also a few
differences.
So, in the case of the object
detection we are always giving
the object position with respect
to the world,
while in the world map
relocalization it is the camera
itself that adjusts to the
previous World map.
In addition, you can detect
multiple objects.
And object detection works best
for objects which are tabletop
or furniture sized,
while the World map is really
the whole scene that's been
acquired.
With this insight, we conclude the
object detection.
Let's summarize what you have
seen, today.
Orientation Tracking tracks only
the rotation of the device and
can be used to explore spherical
virtual environments.
World Tracking is the fully
featured position and
orientation tracking, which will
give you the device position
with respect to a world origin.
And it enables all the scene
understanding capabilities like
Plane Detection, which lets
you interact with the
physical horizontal and
vertical planes, where you can
then put virtual objects.
We have seen how you can create
persistent or multi-user
experiences with the saving and
loading maps feature in
ARKit 2.
And how you can detect physical
images and track them at 60
frames per second with image
tracking, and how you can detect
more generic objects with
object detection.
And with this, I really hope you
guys now have a better
understanding of all the
different tracking technologies
that are present in ARKit and
how they work behind the scenes.
And we're really looking forward
to seeing what you guys are
going to do with that.
More information can be found at
the session link on the
developer website.
And we have an ARKit Lab
tomorrow at 9 a.m.
Both Marion and I will
be there answering any questions
on ARKit you may have.
And with that, thank you very
much, and enjoy the bash.
[ Applause ]