Transcript
>> Good afternoon.
[ Applause ]
Welcome to our session
introducing ARKit.
My name is Mike, and I'm an engineer on the ARKit team.
And today I'm thrilled to talk
to you about the concepts as
well as the code that go into
creating your very own augmented
reality experience on iOS.
[ Cheering and Applause ]
Thank you.
I know many of you are eager to
get started with augmented
reality.
Let's show you just how easy it
is using ARKit.
But first, what is augmented
reality?
Augmented reality is creating
the illusion that virtual
objects are placed in a physical
world.
It's using your iPhone or your
iPad as a lens into a virtual
world based on what your camera
sees.
Let's take a look at some
examples.
We gave a group of developers
early access to ARKit.
And here's what they made.
This is a sneak peek at some
things you might see in the near
future.
Within, a company focused on
immersive storytelling,
tells the story of Goldilocks
using AR.
Transforming a bedroom into a
virtual storybook, they allow
you to progress through the story by
reciting the text, but even more
importantly, they allow you to
explore the scene from any
angle.
This level of interactivity really helps bring your virtual scene to life.
Next, Ikea used ARKit to let you redesign your living room.
[ Applause ]
By being able to place virtual
content next to physical
objects, you open up a world of
possibilities to your users.
And last, games.
Pokemon Go, an app that you've
probably already heard of, used
ARKit to take catching Pokemon
to the next level.
By being able to anchor your
virtual content in the real
world, you really allow for a
more immersive experience than
previously possible.
But it doesn't stop there.
There are a multitude of ways
that you can use augmented
reality to enhance your user
experience.
So let's see what goes into
that.
There's a large amount of domain
knowledge that goes into
creating augmented reality.
Everything from computer vision,
to sensor fusion, to talking to
hardware in order to get camera
calibrations and camera
intrinsics.
We wanted to make this all
easier for you.
So today we're introducing
ARKit.
[ Applause ]
ARKit is a mobile AR platform
for developing augmented reality
apps on iOS.
It is a high level API providing
a simple interface to a powerful
set of features.
But more importantly, it's rolling out with support for hundreds of millions of existing iOS devices.
In order to get the full set of ARKit features, you're going to want an A9 chip or later.
That's most iOS 11 devices, including the iPhone 6s.
Now let's talk about the
features.
So what does ARKit provide?
ARKit can be broken up into
three distinct layers, the first
of which is tracking.
Tracking is the core
functionality of ARKit.
It is the ability to track your
device in real time.
With world tracking we provide
you the ability to get your
device's relative position in
the physical environment.
We use visual inertial odometry,
which is using camera images, as
well as motion data from your
device in order to get a precise
view of where your device is
located as well as how it is
oriented.
But also, more importantly,
there's no external setup
required, no pre-existing
knowledge about your
environment, as well as no
additional sensors that you
don't already have on your
device.
Next, building upon tracking we
provide scene understanding.
Scene understanding is the
ability to determine attributes
or properties about the
environment around your device.
It's providing things like plane
detection.
Plane detection is the ability
to determine surfaces or planes
in the physical environment.
These are surfaces like the floor, or maybe a table.
In order to place your virtual
objects, we provide hit testing
functionality.
So this is getting an
intersection with the real world
topology so that you can place
your virtual object in the
physical world.
And last, scene understanding
provides light estimation.
So light estimation is used to
render or correctly light your
virtual geometry to match that
of the physical world.
Using all of these together we
can seamlessly integrate virtual
content into your physical
environment.
And so the last layer of ARKit
is rendering.
For rendering we provide easy
integration into any renderer.
We provide a constant stream of camera images, tracking information, as well as scene understanding, that can be fed into any renderer.
For those of you using SceneKit
or SpriteKit, we provide custom
AR views, which implement most
of the rendering for you.
So it's really easy to get
started.
And for those of you doing custom rendering, we provide a Metal template in Xcode, which gets you started integrating ARKit into your custom renderer.
And one more thing: Unity and Unreal will be supporting the full set of features from ARKit.
[ Applause ]
So, are you guys ready?
Let's get started.
How do I use ARKit in my
application?
ARKit is a framework that
handles all of the processing
that goes into creating an
augmented reality experience.
With the renderer of my choice,
I can simply use ARKit to do the
processing.
And it will provide everything
that I need to render my
augmented reality scene.
In addition to processing, ARKit
also handles the capturing that
is done in order to do augmented
reality.
So using AVFoundation and Core
Motion under the hood, we
capture images as well as get
motion data from your device in
order to do tracking and provide
those camera images to your
renderer.
So now how do I use ARKit?
ARKit is a session-based API.
The first thing you need to do
to get started is simply create
an ARSession.
ARSession is the object that
controls all of the processing
that goes into creating your
augmented reality app.
But first I need to determine
what kind of tracking I want to
do for my augmented reality app.
So, to determine this, we're going to create an ARSessionConfiguration.
ARSessionConfiguration and its subclasses determine what tracking you want to run on your session.
By enabling and disabling
properties, you can get
different kinds of scene
understanding and have your
ARSession do different
processing.
In order to run my session, I
simply call the Run method on
ARSession providing the
configuration I want to run.
And with that, processing
immediately starts.
And we also set up the capturing
underneath.
So under the hood you'll see there's an AVCaptureSession and a CMMotionManager that get created for you.
We use these to get image data
as well as the motion data
that's going to be used for
tracking.
Once processing is done,
ARSession will output ARFrames.
So an ARFrame is a snapshot in
time, including all of the state
of your session, everything
needed to render your augmented
reality scene.
In order to access an ARFrame, you can simply poll the currentFrame property on your ARSession.
Or, you can set yourself as the
delegate to receive updates when
new ARFrames are available.
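Pieced together, that flow looks something like this minimal sketch, assuming the iOS 11 class names used in this session:

import ARKit

// The session manages all processing for the AR experience.
let session = ARSession()

// The configuration determines what kind of tracking to run.
let configuration = ARWorldTrackingSessionConfiguration()

// Processing and capturing start immediately.
session.run(configuration)

// Poll the most recent results whenever you need them.
if let frame = session.currentFrame {
    // frame is a snapshot in time with everything needed to render.
}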
So let's take a closer look at
ARSessionConfiguration.
ARSessionConfiguration determines what kind of tracking you want to run on your session.
So it provides different
configuration classes.
The base class,
ARSessionConfiguration, provides
three degrees of freedom
tracking, which is just the
orientation of your device.
Its subclass, ARWorldTrackingSessionConfiguration, provides six degrees of freedom tracking.
So this is using our core
functionality world tracking in
order to get not only your
device's orientation, but also a
relative position of your
device.
With this we also get
information about the scene.
So we provide scene
understanding like feature
points as well as physical
positions in your world.
In order to enable and disable
features, you simply set
properties on your session
configuration classes.
And session configurations also
provide availability.
So if you want to check whether world tracking is supported on your device, you simply call the class property isSupported on ARWorldTrackingSessionConfiguration.
With that you can then use a world tracking session configuration, or fall back to the base class, which will only provide you with three degrees of freedom.
It's important to note that because the base class doesn't have any scene understanding, functionality like hit testing won't be available on such a device.
We're also providing a UIRequiredDeviceCapabilities entry that you can set in your app so that your app only appears in the App Store on devices that support world tracking.
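As a sketch of that fallback, using the configuration names above:

let configuration: ARSessionConfiguration
if ARWorldTrackingSessionConfiguration.isSupported {
    // Six degrees of freedom: orientation plus relative position.
    configuration = ARWorldTrackingSessionConfiguration()
} else {
    // Orientation only; scene understanding such as hit testing
    // won't be available on this device.
    configuration = ARSessionConfiguration()
}
session.run(configuration)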
Next, let's look at ARSession.
ARSession, again, is the class
that manages all of the
processing for your augmented
reality app.
In addition to calling Run with
a configuration, you can also
call Pause.
So Pause allows you to
temporarily stop all processing
happening on your session.
So if your view is no longer visible, you may want to stop processing to save CPU; no tracking will occur during this pause.
In order to resume tracking
after a pause, you can simply
call Run again with the stored
configuration on your session.
And last, you can call Run
multiple times in order to
transition between different
configurations.
So say I wanted to enable plane
detection, I can change my
configuration to enable plane
detection, call Run again on my
session.
My session will automatically
transition seamlessly between
one configuration and another
without dropping any camera
images.
With the Run command we also provide run options that let you reset tracking.
This reinitializes all of the tracking that's going on, and your camera position will start out again at the origin (0, 0, 0).
So this is useful for your
application if you want to reset
it to some starting point.
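Here's how those session controls might look, as a sketch using the names from this session:

let configuration = ARWorldTrackingSessionConfiguration()
session.run(configuration)

// Temporarily stop all processing, e.g. when the view disappears.
session.pause()

// Resume tracking with the stored configuration.
session.run(configuration)

// Transition seamlessly to new settings, e.g. plane detection on.
configuration.planeDetection = .horizontal
session.run(configuration)

// Reset tracking; the camera position starts over at the origin.
session.run(configuration, options: [.resetTracking])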
So how do I make use of ARSession's processing?
Session updates are available by setting yourself as the delegate.
So in order to get the last frame that was processed, I can implement session didUpdate frame, and this will give me the latest frame.
For error handling, you can also implement session didFailWithError.
This is for the case of a fatal error; maybe you're running on a device that doesn't support world tracking.
You'll get an error like this, and your session will be paused.
The other way to make use of ARSession's processing is to poll the currentFrame property.
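Those delegate methods, sketched:

func session(_ session: ARSession, didUpdate frame: ARFrame) {
    // The latest processed frame.
}

func session(_ session: ARSession, didFailWithError error: Error) {
    // A fatal error, e.g. world tracking isn't supported here.
    // The session is paused at this point.
}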
So now, what does an ARFrame
contain?
Each ARFrame contains everything
you need to render your
augmented reality scene.
The first thing it provides is a
camera image.
So this is what you're going to
use to render the background of
your scene.
Next, it provides tracking
information, or my device's
orientation as well as location
and even tracking state.
And last, it provides scene
understanding.
So, information about the scene
like feature points, physical
locations in space as well as
light estimation, or a light
estimate.
The way that ARKit represents physical locations in space is with ARAnchors.
An ARAnchor is a real-world position and orientation in space.
ARAnchors can be added to and removed from your scene.
And they're used to represent virtual content anchored to your physical environment.
So, if you want to add a custom
anchor, you can do that by
adding it to your session.
It'll persist through the
lifetime of your session.
Additionally, if you're running features like plane detection, ARAnchors will be added to your session automatically.
So, in order to respond to this,
you can get them as a full list
in your current ARFrame.
So that'll have all of the
anchors that your session is
currently tracking.
Or you can respond to delegate
methods like add, update, and
remove, which will notify you if
anchors were added, updated, or
removed from your session.
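A sketch of working with anchors, assuming session is your running ARSession; the transform here is a hypothetical position one meter in front of the session origin:

// Add a custom anchor; it persists for the session's lifetime.
var transform = matrix_identity_float4x4
transform.columns.3.z = -1.0
let anchor = ARAnchor(transform: transform)
session.add(anchor: anchor)

// The full list of anchors the session is currently tracking:
let anchors = session.currentFrame?.anchors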
So those are the four main classes that you're going to use to create an augmented reality experience.
Now let's talk about tracking in
particular.
So, tracking is the ability to
determine a physical location in
space in real time.
This isn't easy, but it's essential for augmented reality to find your device's position.
And not just any position: we need the position and the orientation of your device in order to render things correctly.
So let's take a look at an
example.
Here I've placed a virtual chair
and a virtual table in a
physical environment.
You'll notice that if I pan around them or reorient my device, they stay fixed in space.
But more importantly, as I walk
around the scene they also stay
fixed in space.
This is because we're constantly updating the projection transform, or the projection matrix, that we use to render this virtual content, so that it appears correct from any perspective.
So now how do we do this?
ARKit provides world tracking.
This is our technology that uses visual inertial odometry.
It combines your camera images with the motion data of your device, and it provides you a rotation as well as a relative position of your device.
But more importantly, it provides real-world scale.
So all your virtual content is actually going to be rendered to scale in your physical scene.
It also means that motion of
your device correlates to
physical distance traveled
measured in meters.
And all the positions given by
tracking are relative to the
starting position of your
session.
One more piece of how world tracking works: we provide 3-D feature points.
So, here's a representation of
how World Tracking works.
It works by detecting features,
which are unique pieces of
information, in a camera image.
So you'll see the axes
represents my device's position
and orientation.
It's creating a path as I move
about my world.
But you also see all these dots
up here.
These represent 3-D feature
points that I've detected in my
scene.
I've been able to triangulate them by moving about the scene, and then by matching these features, you'll see that I draw a line when I match an existing feature that I've seen before.
And using all of this
information and our motion data,
we're able to precisely provide
a device orientation and
location.
So that might look hard.
Let's look at the code on how we
run World Tracking.
First thing you need to do is
simply create an ARSession.
Because again, it's going to
manage all of the processing
that's going to happen for World
Tracking.
Next, you'll set yourself as the
delegate of the session so that
you can receive updates on when
new frames are available.
By creating a World Tracking
session configuration you're
saying, "I want to use World
Tracking.
I want my session to run this
processing."
Then by simply calling Run,
immediately processing will
happen.
Capturing will begin.
So, under the hood, our session creates an AVCaptureSession as well as a CMMotionManager in order to get image and motion data.
We use the images to detect
features in the scene.
And we use the motion data at a
higher rate in order to
integrate it over time to get
your device's motion.
Using these together we're able
to use sensor fusion in order to
provide a precise pose.
So these are returned in
ARFrames.
Each ARFrame is going to include
an ARCamera.
So an ARCamera is the object
that represents a virtual
camera.
Or you can use it for a virtual
camera.
It represents your device's
orientation as well as location.
So it provides a transform.
The transform is a matrix_float4x4, which provides the orientation, or rotation, as well as the translation of your physical device from the starting point of the session.
In addition to this we provide a
tracking state, which informs
you on how you can use the
transform.
And last, we provide camera
intrinsics.
It's really important that we get the camera intrinsics each frame, because they match those of the physical camera on your device.
They include information like focal length and principal point, which are used to compute a projection matrix.
The projection matrix is also a
convenience method on ARCamera.
So you can easily use that to
render your virtual geometry.
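Reading those ARCamera properties might look like this sketch, assuming session is your ARSession:

if let camera = session.currentFrame?.camera {
    // Rotation and translation from the session's starting point.
    let transform: matrix_float4x4 = camera.transform

    // Informs you whether the transform is usable.
    switch camera.trackingState {
    case .normal: break
    case .notAvailable, .limited: break
    }

    // Per-frame intrinsics (focal length, principal point) and a
    // ready-made projection matrix for rendering.
    let intrinsics = camera.intrinsics
    let projection = camera.projectionMatrix
}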
So with that, that is tracking
that ARKit provides.
Let's go ahead and look at a
demo using World Tracking and
create your first ARKit
application.
[ Applause ]
So, the first thing you notice when you open the new Xcode 9 is that there's a new template available for creating augmented reality apps.
So let's go ahead and select
that.
I'm going to create an augmented
reality app.
Hit Next. After giving my project a name like MyARApp, I can choose the language, with the option between Swift and Objective-C, as well as the content technology.
So the content technology is
what you're going to use to
render your augmented reality
scene.
You have the option of SceneKit, SpriteKit, or Metal.
I'm going to use SceneKit for
this example.
So after hitting Next and
creating my workspace, it looks
something like this.
Here I have a view controller
that I've created.
You'll see that it has an
ARSCNView.
So this ARSCNView is a custom AR subclass that implements most of the rendering for me.
So it'll handle updating my
virtual camera based on the
ARFrames that get returned to
it.
ARSCNView, or my sceneView here, has a session as a property.
You'll see that I set a scene on my sceneView, a ship that's translated a little bit in front of the world origin along the z-axis.
And then the most important part: I'm accessing the session and calling Run with a world tracking session configuration.
So this will run World Tracking.
And automatically the view will
handle updating my virtual
camera for me.
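The relevant part of the template's view controller looks roughly like this sketch (asset names and details are approximate, not the exact template code):

import UIKit
import SceneKit
import ARKit

class ViewController: UIViewController {
    @IBOutlet var sceneView: ARSCNView!

    override func viewDidLoad() {
        super.viewDidLoad()
        // A ship placed slightly in front of the world origin along -z.
        sceneView.scene = SCNScene(named: "art.scnassets/ship.scn")!
    }

    override func viewWillAppear(_ animated: Bool) {
        super.viewWillAppear(animated)
        // Run world tracking; the view updates the virtual camera for us.
        sceneView.session.run(ARWorldTrackingSessionConfiguration())
    }

    override func viewWillDisappear(_ animated: Bool) {
        super.viewWillDisappear(animated)
        sceneView.session.pause()
    }
}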
So let's go ahead and give that
a try.
Maybe I'm going to change our
standard ship to use arship.
So let's run this on the device.
So after installing, the first
thing that you'll notice is that
it's going to ask for camera
permission.
This is required to use tracking, as well as to render the backdrop of your scene.
Next, as you'll see, I get a
camera feed.
And right in front of me there's
a spaceship.
You'll see as I change the
orientation of my device, it
stays fixed in space.
But more importantly, as I move
about the spaceship, you'll see
that it actually is anchored in
the physical world.
So this is using both my
device's orientation as well as
a relative position to update a
virtual camera and look at the
spaceship.
[ Applause ]
Thank you.
[ Applause ]
So, if that's not interesting
enough for you, maybe we want to
add something to the scene every
time we tap the screen.
Let's try that out.
Let's try adding something to
this example.
So as I said, I want to add
geometry to the scene every time
I tap the screen.
First thing I need to do to do
that is add a tap gesture
recognizer.
So after adding that to my scene view, every time I tap the screen, the handleTap method will get called.
So let's implement that.
So, if I want to create some
geometry, let's say I'm going to
create a plane or an image
plane.
So the first thing I do here is
create an SCNPlane with a width
and height.
But then, the tricky part: I'm actually going to set the material's contents to be a snapshot of my view.
So what do you think this is
going to be?
Well, this is actually going to take a snapshot, or a rendering, of my view, including the backdrop camera image as well as the virtual geometry that I've placed in front of it.
I'm setting my lighting model to
constant so that the light
estimate provided by ARKit
doesn't get applied to this
camera image because it's
already going to match the
environment.
Next, I need to add this to the
scene.
So in order to do that, I'm
going to create a plane node.
So, after creating an SCNNode that encapsulates this geometry, I add it to the scene.
So right here, every time I tap the screen, it's going to add an image plane to my scene.
But the problem is it's always going to be at the origin (0, 0, 0).
So how do I make this more
interesting?
Well, we have the current frame provided to us, which contains an ARCamera.
I can use the camera's transform to update the plane node's transform, so that the plane node is placed where my camera is currently located in space.
To do that, I'm going to first
get the current frame from my
SceneView session.
Next, I'm going to update the
plane node's transform
in order to use the transform of
my camera.
So here you'll notice the first thing I do is create a translation matrix.
Because I don't want to put the
image plane right where the
camera's located and obstruct my
view, I want to place it in
front of the camera.
So for this I'm going to use the
negative z-axis as a
translation.
You'll also see that, to get the scale right, everything is in meters.
So I'm going to use 0.1 to represent 10 centimeters in front of my camera.
By multiplying this together
with my camera's transform and
applying this to my plane node,
this will be an image plane
located 10 centimeters in front
of the camera.
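Put together, the tap handler just described might look like this sketch inside the view controller; the plane-size divisor is an assumption chosen to keep the snapshot small, not a value from the session:

// In viewDidLoad:
// sceneView.addGestureRecognizer(
//     UITapGestureRecognizer(target: self, action: #selector(handleTap(_:))))

@objc func handleTap(_ gesture: UITapGestureRecognizer) {
    guard let currentFrame = sceneView.session.currentFrame else { return }

    // An image plane textured with a snapshot of the view
    // (camera backdrop plus virtual geometry).
    let imagePlane = SCNPlane(width: sceneView.bounds.width / 6000,
                              height: sceneView.bounds.height / 6000)
    imagePlane.firstMaterial?.diffuse.contents = sceneView.snapshot()
    imagePlane.firstMaterial?.lightingModel = .constant  // don't re-light the snapshot

    let planeNode = SCNNode(geometry: imagePlane)
    sceneView.scene.rootNode.addChildNode(planeNode)

    // 10 cm in front of the camera along its negative z-axis.
    var translation = matrix_identity_float4x4
    translation.columns.3.z = -0.1
    planeNode.simdTransform = matrix_multiply(currentFrame.camera.transform, translation)
}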
So let's try this out and see
what it looks like.
So, as you see here again, I
have the camera scene running.
And I have my spaceship floating
in space.
Now, if I tap the screen maybe
here, here and here, you'll see
that it leaves a snapshot or an
image floating in space where I
took it.
[ Applause ]
This shows just one of the
possibilities that you can use
ARKit for.
And it really makes for a cool
experience.
Thank you.
And that's using ARKit.
[ Applause ]
So, now that you've seen a demo
using ARKit's tracking, let's
talk about getting the best
quality from your tracking
results.
First thing to note is that
tracking relies on uninterrupted
sensor data.
This just means if camera images
are no longer being provided to
your session, tracking will
stop.
We'll be unable to track.
Next, tracking works best in
well-textured environments.
This means we need enough visual
complexity in order to find
features from your camera
images.
So if I'm facing a white wall or
if there's not enough light in
the room, I will be unable to
find features.
And tracking will be limited.
Next, tracking also works best
in static scenes.
So if too much of what my camera
sees is moving, visual data
won't correspond to motion data,
which may result in drift, which
is also a limited tracking
state.
So to help with these, ARCamera
provides a tracking state
property.
Tracking state has three
possible values: Not Available,
Normal, and Limited.
When you first start your
session, it begins in Not
Available.
This just means that your
camera's transform has not yet
been populated and is the
identity matrix.
Soon after, once we find our
first tracking pose, the state
will change from Not Available
to Normal.
This signifies that you can now
use your camera's transform.
If at any later point tracking becomes limited, the tracking state will change from Normal to Limited, and also provide a reason.
So, the reason in this case,
because I'm facing a white wall
or there's not enough light, is
Insufficient Features.
It's helpful to notify your
users when this happens.
So, to do that, we're providing
a session delegate method that
you can implement:
cameraDidChangeTrackingState.
So when this happens, you can
get the tracking state, if it's
limited, as well as the reason.
And from this you'll notify your
users.
Because they're the only ones
that can actually fix the
tracking situation by either
turning the lights up or not
facing a white wall.
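A sketch of that notification; showTrackingWarning is a hypothetical helper you'd write yourself:

func session(_ session: ARSession, cameraDidChangeTrackingState camera: ARCamera) {
    switch camera.trackingState {
    case .limited(let reason):
        // Notify the user; e.g. insufficient features when facing
        // a white wall or when there's not enough light.
        showTrackingWarning(reason)
    default:
        break
    }
}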
The other case is when sensor data becomes unavailable.
We handle this through session interruptions.
If your camera input is unavailable, the main reasons being that your app gets backgrounded or you're multitasking on an iPad, camera images won't be provided to your session.
In this case tracking will become unavailable, and your session will be interrupted.
So, to deal with this, we also
provide delegate methods to make
it really easy.
Here it's a good idea to present
an overlay or maybe blur your
screen to signify to the user
that your experience is
currently paused and no tracking
is occurring.
During an interruption, it's
also important to note that
because no tracking is
happening, the relative position
of your device won't be
available.
So if you had anchors or
physical locations in the scene,
they may no longer be aligned if
there was movement during this
interruption.
So for this, you may want to
optionally restart your
experience when you come back
from an interruption.
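And the interruption callbacks, as a sketch:

func sessionWasInterrupted(_ session: ARSession) {
    // Camera input stopped (backgrounding, iPad multitasking):
    // overlay or blur the view; no tracking is occurring.
}

func sessionInterruptionEnded(_ session: ARSession) {
    // Anchors may be misaligned if the device moved; optionally
    // restart the experience, e.g. by running with reset options.
}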
And so that's tracking.
Let's go ahead and hand it over
to Stefan to talk about scene
understanding.
Thank you.
[ Applause ]
>> Thank you, Mike.
Good afternoon everyone.
My name is Stefan Misslinger.
I'm an engineer on the ARKit
team.
And next we're going to talk
about scene understanding.
So the goal of scene
understanding is to find out
more about our environment in
order to place virtual objects
into this environment.
This includes information like
the 3-D topology of our
environment as well as the
lighting situation in order to
realistically place an object
there.
Let's look at an example of this
table here.
If you want to place an object,
a virtual object, onto this
table, the first thing we need
to know is that there is a
surface on which we can place
something.
And this is done by using plane
detection.
Second, we need to figure out a
3-D coordinate on which we place
our virtual object.
In order to find this we are
using hit-testing.
This involves sending a ray from
our device and intersecting it
with the real world in order to
find this coordinate.
And third, in order to place
this object in a realistic way
we need a light estimation to
match the lighting of our
environment.
Let's have a look at each one of
those three things starting with
plane detection.
So, plane detection provides you
with horizontal planes with
respect to gravity.
This includes planes like the
ground plane as well as any
parallel planes like tables.
ARKit does this by aggregating information over multiple frames, and it runs in the background.
And as the user moves their
device around the scene, it
learns more about this plane.
This also allows us to retrieve
an aligned extent of this plane,
which means that we're fitting a
rectangle around all detected
parts of this plane and align it
with the major extent.
So this gives you an idea of the
major orientation of a physical
plane.
Furthermore, if there are
multiple virtual planes detected
for the same physical plane,
ARKit will handle merging those
together.
The combined plane will then grow to the extent of both planes, and the newer plane will be removed from the session.
Let's have a look at how it's used in code.
The first thing you want to do is create an ARWorldTrackingSessionConfiguration, since plane detection is a property you set on it.
So, to enable plane detection, you simply set the planeDetection property to Horizontal.
After that, you pass the
configuration back to the
ARSession by calling the Run
method.
And it will start detecting
planes in your environment.
If you want to turn off plane
detection, we simply set the
plane detection property to
None.
And then call the Run method on
ARSession again.
Any previously detected planes in the session will remain; they will still be present in your ARFrame's anchors.
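In code, with the names used in this session, that looks roughly like:

let configuration = ARWorldTrackingSessionConfiguration()

// Turn plane detection on and hand the configuration back to the session.
configuration.planeDetection = .horizontal
session.run(configuration)

// Turn it off again ("None"); previously detected planes remain.
configuration.planeDetection = []
session.run(configuration)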
Whenever new planes are detected, they will be surfaced to you as ARPlaneAnchors.
An ARPlaneAnchor is a subclass
of an ARAnchor, which means it
represents a real-world position
and orientation.
Whenever a new anchor is being
detected you will receive a
delegate call session didAdd
anchor.
And you can use that, for
example, to visualize your
plane.
The extent of the plane will be surfaced to you through the extent property, which is relative to a center property.
So as the user moves the device
around the scene, we'll learn
more about this plane and can
update its extent.
When this happens, you will receive a delegate call session didUpdate anchor.
And you can use that to update
your visualization.
Notice how the center property
actually moved because the plane
grew more into one direction
than another.
Whenever an anchor is being removed from the session, you will receive a delegate call session didRemove anchor.
This can happen if ARKit merges planes together and removes one of them as a result.
You can then update your visualization accordingly.
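A sketch of handling those callbacks on the session delegate:

func session(_ session: ARSession, didAdd anchors: [ARAnchor]) {
    for case let planeAnchor as ARPlaneAnchor in anchors {
        // planeAnchor.center and planeAnchor.extent describe the
        // aligned rectangle fitted to the detected plane.
    }
}

func session(_ session: ARSession, didUpdate anchors: [ARAnchor]) {
    // The plane grew or its center moved; update your visualization.
}

func session(_ session: ARSession, didRemove anchors: [ARAnchor]) {
    // E.g. after ARKit merged two planes into one.
}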
So now that we have an idea of
where there are planes in our
environment, let's have a look
at how to actually place
something into this.
And for this we provide
hit-testing.
So hit-testing involves sending a ray originating from your device, intersecting it with the real world, and finding the intersection point.
ARKit uses all the scene
information available, which
includes any detected planes as
well as the 3-D feature points
that ARWorldTracking is using to
figure out its position.
ARKit will then intersect our
ray with all information that is
available and return all
intersection points as an array
which is sorted by distance.
So the first entry in this array
will be the closest intersection
to the camera.
There are different ways you can perform this intersection, and you define this by providing a hit-test type.
There are four ways to do this.
Let's have a look.
If you are running plane
detection and ARKit has detected
a plane in our environment, we
can make use of that.
And here you have the choice of
using the extent of the plane or
ignoring it.
So if you want your user to be
able to move an object just on a
plane, you can take the extent
into account, which will mean
that if a ray intersects within
its extent, it will provide you
with an intersection.
If the ray hits outside of this,
it will not give you an
intersection.
In the case of, for example,
moving furniture around, or when
you only have detected a small
part of the ground plane, we can
choose to ignore this extent and
treat an existing plane as
infinite plane.
In that case you will always
receive an intersection.
And you can just use a patch of
the real world, but let your
users move an object along this
plane.
If you're not running plane
detection or we have not
detected any planes yet, we can
also estimate a plane based on
the 3-D feature points that we
have available.
In that case, ARKit will look
for coplanar points in our
environment and fit a plane into
that.
And after that it will return
you with the intersection of
this plane.
In case you want to place
something on a very small
surface, which does not form a
plane, or you have a very
irregular environment, you can
also choose to intersect with
the feature points directly.
This means that we will find an
intersection along our ray,
which is closest to an existing
feature point, and return this
as the result.
Let's have a look at how this is
done in code.
So the first thing we need to do is define our ray, which originates at our device.
You provide this as a CGPoint, which is represented in normalized image-space coordinates.
This means the top left of our
image is 0, 0, whereas the
bottom right is 1, 1.
So if we want to send a ray, or find an intersection, at the center of our screen, we would define a CGPoint with 0.5 for x and y.
If you're using SceneKit or SpriteKit, we're providing a custom hit-test method on our views, where you can simply pass a CGPoint in view coordinates.
So you can use the result of a tap or touch gesture as input to define this ray.
So let's pass this point onto
the hit-test method and define
the hit-test types that we want
to use.
In this case we're using existing planes, which means it will intersect with any existing planes that ARKit has already detected, as well as estimated horizontal planes.
So this can be used as a
fallback case in case there are
no planes detected yet.
After that, ARKit will return an
array of results.
And you can access the first
result, which will be the
closest intersection to your
camera.
The intersection point is contained in the worldTransform property of our hit-test result.
And we can create a new ARAnchor
based on this result and pass it
back to the session because we
want to keep track of it.
So if we take this code and apply it to the scene here, pointing our phone at a table, it will return us the intersection point on this table at the center of the screen.
And we can place a virtual cup at this location.
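The hit-test just walked through, as a sketch assuming session is your ARSession:

// Ray through the center of the screen, in normalized image coordinates
// (top left is (0, 0), bottom right is (1, 1)).
let point = CGPoint(x: 0.5, y: 0.5)

if let frame = session.currentFrame {
    // Existing planes, with estimated horizontal planes as a fallback.
    let results = frame.hitTest(point, types: [.existingPlane, .estimatedHorizontalPlane])

    // Results are sorted by distance; first is closest to the camera.
    if let closest = results.first {
        // Keep track of the intersection with an anchor.
        let anchor = ARAnchor(transform: closest.worldTransform)
        session.add(anchor: anchor)
    }
}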
By default, your rendering
engine will assume that your
background image is perfectly
lit.
So your augmentation looks like
it really belongs there.
However, if you're in a darker
environment, then your camera
image is darker, and it means
that your augmentation will look
out of place and it appears to
glow.
In order to fix this, we need to
adjust the relative brightness
of our virtual object.
And for this, we are providing
light estimation.
So light estimation operates on
our camera image.
And it uses its exposure
information to determine the
relative brightness of it.
For a well-lit image, this defaults to 1000 lumens.
For a brighter environment, you
will get a higher value.
For a darker environment, a
lower value.
You can also assign this value directly to an SCNLight's intensity property.
Hence, if you're using
physically-based lighting, it
will automatically take
advantage of this.
Light estimation is enabled by
default.
And you can configure this by
setting the
isLightEstimationEnabled
property on an ARSession
configuration.
The results of light estimation
are provided to you in the Light
Estimate property on the ARFrame
as its ambient intensity value.
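A sketch of using the light estimate; ambientLight here is a hypothetical SCNLight in your scene:

// Light estimation is enabled by default.
let configuration = ARWorldTrackingSessionConfiguration()
configuration.isLightEstimationEnabled = true

if let estimate = session.currentFrame?.lightEstimate {
    // 1000 corresponds to a well-lit scene; lower is darker.
    ambientLight.intensity = estimate.ambientIntensity
}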
So with that, let's dive into a
demo and look how we're using
scene understanding with ARKit.
[ Applause ]
So the application that I'm
going to show you is the ARKit
Sample application.
Which means you can also
download it from our developer
website.
It's used to place objects into
our environment.
And it's using scene
understanding in order to do
that.
So, let's bring it right up
here.
And if I move it around here,
what you see in front of me is
our focus square.
And we're placing this by hit-testing in the center of our scene and placing the focus square at the intersection point.
So if I move this along our
table, you see that it basically
slides along this table.
It's also using plane detection
in parallel.
And we can visualize this to see
what's going on.
So let's bring up our Debug menu
here and activate the second
option here, which is Debug
Visualizations.
Let's close it.
And what you see here is the
plane that it has detected.
To give you a better idea, let's
restart this and see how it
finds new planes.
So if I'm moving it around here,
you see it has detected a new
plane.
Let's quickly point it at
another part of this table, and
it has found another plane.
And if I'm moving this along
this table, it eventually merges
both of them together.
And it figured out that there's
just one plane there.
[ Applause ]
So next, let's place some actual
objects here.
My daughter asked to bring some
flowers to the presentation.
And I don't want to disappoint
her.
So, let's make this more
romantic here and place a nice
vase.
In that case, we again hit-test against the center of our screen and find the intersection point to place the object.
One important aspect here is
that this vase actually appears
in real-world scale.
And this is possible due to two
things.
One is that WorldTracking
provides us with the pose to
scale.
And the second thing is that our
3-D model is actually modeled in
3-D in real-world coordinates.
So if you're creating content for augmented reality, it's really important to take this into account, so that this vase doesn't appear as tall as a building, or too small.
So let's go ahead and place a
more interactive object, which
is my chameleon friend here.
[ Applause ]
Thank you.
And one nice thing is that you always know the position of the user when you're running world tracking.
So you can have your virtual
content interact with the user
in the real world.
[ Applause ]
So, if I move over here, it
might eventually turn to me, if
he's not scared.
Yeah, there we go.
[ Applause ]
And if I get even closer he
might react in even different
ways.
Let's see.
It's a bit -- oh!
There we go.
Another thing that chameleons
can do is change their color.
And if I tap him, he adjusts the
color.
So let's give it a green.
And one nice feature that we put
in here is I can move him along
the table, and he will adapt to
the background color of the
table in order to blend in
nicely.
[ Applause ]
So this is our sample
application.
You can download it from the
website and put in your own
contents and play around with
it, basically.
So next, we're going to have a
look at rendering with ARKit.
Rendering brings tracking and
scene understanding together
with your content.
And in order to render with
ARKit, you need to process all
the information that we provide
you in an ARFrame.
For those of you using SceneKit and SpriteKit, we have already created customized views that take care of rendering ARFrames for you.
If you're using Metal, and want
to create your own rendering
engine or integrate ARKit into
your existing rendering engine,
we're providing a template that
gives you an idea of how to do
this and provides a good
starting point.
Let's have a look at each one of
those, starting with SceneKit.
For SceneKit we're providing an
ARSCNView, which is a subclass
of an SCNView.
It contains an ARSession that it
uses to update its rendering.
So this includes drawing the
camera image in the background,
taking into account the rotation
of the device as well as any
[inaudible] changes.
Next, it updates an SCNCamera
based on the tracking transforms
that we provide in an ARCamera.
So your scene stays intact and
ARKit simply controls an
SCNCamera by moving it around
the scene the way you move
around your device in the real
world.
If you're using light estimation, we automatically place an SCNLight probe into your scene, so if you use objects with physically-based lighting enabled, they automatically take advantage of light estimation.
And one thing that ARSCNView does is map SCNNodes to ARAnchors, so you don't actually need to interface with ARAnchors directly, but can continue to use SCNNodes.
This means whenever a new
ARAnchor is being added to the
session, ARSCNView will create a
node for you.
And every time we update the ARAnchor, like its transform, we update the node's transform automatically.
And this is handled through the
ARSCNView delegate.
So every time we add a new
anchor to the session, ARSCNView
will create a new SCNNode for
you.
If you want to provide your own nodes, you can implement renderer nodeFor anchor and return your custom node for this.
After this, the SCNNode will be
added to the scene graph.
And you will receive another
delegate call renderer didAdd
node for anchor.
The same holds true for whenever
a node is being updated.
So in that case, the SCNNode's transform will be automatically updated with the ARAnchor's transform, and you will receive two callbacks when this happens.
One before we update its
transform, and another one after
we update the transform.
Whenever an ARAnchor is being
removed from the session, we
automatically remove the
corresponding SCNNode from the
scene graph and provide you with
the callback renderer didRemove
node for anchor.
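Those ARSCNViewDelegate callbacks, sketched:

func renderer(_ renderer: SCNSceneRenderer, nodeFor anchor: ARAnchor) -> SCNNode? {
    // Return your own node, or nil to have ARSCNView create one for you.
    return SCNNode()
}

func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
    // The node is now in the scene graph at the anchor's transform.
}

func renderer(_ renderer: SCNSceneRenderer, didUpdate node: SCNNode, for anchor: ARAnchor) {
    // Called after the node's transform was updated from the anchor;
    // a willUpdate variant is called before.
}

func renderer(_ renderer: SCNSceneRenderer, didRemove node: SCNNode, for anchor: ARAnchor) {
    // The node was removed along with its anchor.
}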
So this is SceneKit with ARKit.
Next, let's have a look at
SpriteKit.
For SpriteKit we're providing an ARSKView, which is a subclass of SKView.
It contains an ARSession, which
it uses to update its rendering.
This includes drawing the camera
image in the background, and in
this case, mapping SKNodes to
ARAnchors.
So it provides a very similar set of delegate methods to SceneKit, which you can use.
One major difference is that
SpriteKit is a 2-D rendering
engine.
So that means we cannot simply
update a camera that is being
moved around.
So what ARKit does here is project our ARAnchors' positions into the SpriteKit view, and then render the sprites as billboards at the projected locations.
This means that the Sprites will
always be facing the camera.
If you want to learn more about this, there's a session from the SpriteKit team, "Going Beyond 2D in SpriteKit," which will focus on how to integrate ARKit with SpriteKit.
And next, let's have a look at
custom rendering with ARKit
using Metal.
There are four things that you
need to do in order to render
with ARKit.
The first is draw the camera
image in the background.
You usually create a texture for this and draw it in the background.
The next thing is to update our
virtual camera based on our
ARCamera.
This involves setting the view matrix as well as the projection matrix.
Third item is to update the
lighting situation or the light
in your scene based on our light
estimate.
And finally, if you have placed
geometry based on scene
understanding, then you would
use the ARAnchors in order to
set the transforms correctly.
All this information is
contained in an ARFrame.
And you have two ways of how to
access this ARFrame.
One is by polling the current
frame property on ARSession.
So, if you have your own render loop, you could use this method to access the current frame.
And then you should also take
advantage of the timestamp
property on ARFrame in order to
avoid rendering the same frame
multiple times.
An alternative is to use our
Session Delegate, which provides
you with session didUpdate frame
every time a new frame has been
calculated.
In that case, you can just
simply take it and then update
your rendering.
By default, this is called on the main queue, but you can also provide your own dispatch queue, which we will use to call this method.
So let's look into what Update
Rendering contains.
So the first thing is to draw
the camera image in the
background.
And you can access the capturedImage property on an ARFrame, which is a CVPixelBuffer.
You can generate a Metal texture based on this pixel buffer and then draw a quad in the background.
Note that this is a pixel buffer that is vended to us through AVFoundation, so you should not hold on to too many of those frames for too long; otherwise you will stop receiving updates.
The next item is to update our
virtual camera based on our
ARCamera.
For this we have to determine the view matrix as well as the projection matrix.
The view matrix is simply the
inverse of our camera transform.
And in order to generate the
projection matrix, we are
offering you a convenience
method on the ARCamera, which
provides you with a projection
matrix.
The third step would be to
update the lighting.
So for this, simply access the
Light Estimate property and use
its ambient intensity in order
to update your lighting model.
And the final step is to iterate over the anchors and their 3-D locations in order to update the transforms of your geometries.
Any anchor that you have added manually to the session, or any anchor that has been added by plane detection, will be part of the frame's anchors.
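Those four steps, condensed into a sketch of a per-frame update; the Metal texture creation and draw calls themselves are omitted:

func updateRendering(with frame: ARFrame) {
    // 1. Camera background: capturedImage is a CVPixelBuffer; turn it
    //    into a texture and draw it as a quad behind everything.
    let pixelBuffer = frame.capturedImage

    // 2. Virtual camera: the view matrix is the inverse of the camera
    //    transform; the projection matrix comes from the ARCamera.
    let viewMatrix = simd_inverse(frame.camera.transform)
    let projectionMatrix = frame.camera.projectionMatrix

    // 3. Lighting from the ambient intensity estimate, when available.
    let intensity = frame.lightEstimate?.ambientIntensity ?? 1000

    // 4. Anchors carry the transforms for placed geometry.
    for anchor in frame.anchors {
        let modelMatrix = anchor.transform
        // Update the corresponding geometry's transform here.
    }
}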
There are a few things to note when rendering based on a camera image.
Let's have a look at those.
So one thing is that the
captured image that is contained
in an ARFrame is always provided
in the same orientation.
However, if you rotate your
physical device, it might not
line up with your user interface
orientation.
And a transform needs to be
applied in order to render this
correctly.
Another thing is that the aspect
ratio of the camera image might
not necessarily line up with
your device.
And this means that we have to
take this into account in order
to properly render our camera
image in the screen.
To fix this or to make this
easier for you, we're providing
you with helper methods.
So there's one method on
ARFrame, which is the Display
Transform.
The display transform maps from frame space into view space.
You simply provide it with your viewport size as well as your interface orientation, and you will get a corresponding transform.
In our Metal example, we are
using the inverse of this
transform to adjust the texture
coordinates of our camera
background.
And to go with this, there's a projection matrix variant that takes into account the user interface orientation as well as the viewport size.
You pass those along with clipping-plane limits, and you can use this projection matrix to correctly draw your virtual content on top of the camera image.
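Those helpers, sketched; the exact signatures here may differ slightly from the shipping SDK:

let viewportSize = view.bounds.size
let orientation = UIApplication.shared.statusBarOrientation

// Maps normalized frame space into view space; the Metal template
// applies its inverse to the background texture coordinates.
let displayTransform = frame.displayTransform(for: orientation,
                                              viewportSize: viewportSize)

// A projection matrix matched to the interface orientation, viewport
// size, and your clipping-plane limits.
let projection = frame.camera.projectionMatrix(for: orientation,
                                               viewportSize: viewportSize,
                                               zNear: 0.001,
                                               zFar: 1000)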
So this is ARKit.
To summarize, ARKit is a high
level API designed for creating
augmented reality applications
on iOS.
We provide you with World
Tracking, which gives you the
relative position of your device
to a starting point.
In order to place objects into
the real world, we provide you
with Scene Understanding.
Scene Understanding provides you
with Plane Detection as well as
the ability to hit-test the real
world in order to find 3-D
coordinates and place objects
there.
And in order to improve the
realism of our augmented
content, we're providing you
with a light estimate based on
the camera image.
We provide custom integration
into SceneKit and SpriteKit as
well as a template for Metal if
you want to get started
integrating ARKit into your own
rendering engine.
You can find more information on
the website of our talk here.
And there are a couple of related sessions: one from the SceneKit team, which will also have a look at how to use dynamic shadows with ARKit and SceneKit, as well as a session from the SpriteKit team, which will focus on using ARKit with SpriteKit.
So, we're really excited to bring this out into your hands.
And we're looking forward to seeing the first applications that you're going to build with it.
So please go ahead and download the sample application from our website.
Put your own content into it and
show it around.
And be happy.
Thank you.
[ Applause ]