WWDC2018 Session 236

Transcript

[ Music ]
>> Hello. My name is Chris
Fleizach and let's talk about
making iOS talk using
AVSpeechSynthesis.
So our agenda for today.
What is AVSpeechSynthesis, and
why use it?
Let's talk about some of the
basics.
We'll talk about choosing the
right voice, some properties
that are available like rate,
pitch, and volume, and finally
attributed strings.
AVSpeechSynthesis is an API for
generating computer synthesized
speech on your iOS devices.
It has many uses.
For example, you might want to
post announcements within your
app.
You might be creating an
interface that's not meant to be
looked at.
Or you might be creating an
education app where having
synthesized speech provides a
good reinforcement for learning
materials.
One example: audio updates
during a workout can be
communicated effectively using
synthesized speech.
A note about synthesis and
accessibility.
So speech synthesis is a
powerful tool for helping many
users with many disabilities.
For example, users with
cognitive disabilities can get
reinforcement of the output
they are experiencing.
Users who have trouble
vocalizing speech can use
synthesis to generate speech for
them.
Non-sighted users use speech
synthesis to consume their
interfaces.
However, it's important to note
that it is not a replacement for
VoiceOver or other screen reader
technologies.
For example, speech can overlap
with what VoiceOver is speaking
at the same time.
And the speech won't be
available to connected devices
such as a braille display.
Instead, you should make your
app accessible using the
UIAccessibility API.
Okay. So let's get to the
basics.
First step, create an
AVSpeechSynthesizer.
You can use that with this code
snippet.
One thing to note is that you
want to make sure that it's
retained until speech is done.
If the synthesizer goes out of
scope, any synthesis in-flight
will be canceled.
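As a minimal sketch (the class
name here is illustrative), you
might hold the synthesizer in a
property so it stays retained:

    import AVFoundation

    class SpeechController {
        // Keep a strong reference; if the synthesizer is
        // deallocated, any in-flight synthesis is canceled.
        let synthesizer = AVSpeechSynthesizer()
    }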
Now that we've created a
synthesizer, the next job is to
create an utterance.
We can then dispatch that to the
synthesizer.
So, in this example, we create
an utterance with the string
hello and then we dispatch it
using the speak method.
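In code, that might look like
this (continuing the sketch
above):

    let utterance = AVSpeechUtterance(string: "Hello")
    synthesizer.speak(utterance)  // synthesizer from the sketch above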
A note about audio sessions.
So when AVSpeechSynthesizer is
activated using speak, the
audio session will
automatically be set to active.
If you want to mix with other
audio, you can call setCategory
with options on the shared
AVAudioSession, passing the
mixWithOthers option.
If you want to duck other
audio, so that your speech is
primary and other audio becomes
lower in volume, you can use
the duckOthers option.
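As a sketch, assuming a
playback category fits your
app:

    import AVFoundation

    // Mix speech with other audio; use .duckOthers instead to
    // lower other audio while speech is playing.
    try? AVAudioSession.sharedInstance().setCategory(.playback,
                                                     mode: .default,
                                                     options: [.mixWithOthers])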
Another note is that the audio
session will not be set to
inactive after speech is done.
That's because the session is
shared: if any other audio is
playing at the same time, we
would not want to stop it.
So if you want to set the audio
session to inactive, you would
do that yourself.
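For example, a minimal sketch:

    // Deactivate the shared session yourself once speech is
    // done, letting other apps resume their audio.
    try? AVAudioSession.sharedInstance().setActive(false,
                                                   options: .notifyOthersOnDeactivation)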
There are some callback methods
available in AVSpeechSynthesis
that will help inform you about
the life cycle of an utterance.
These are optional methods in
the synthesizer delegate
protocol.
For example, you can know when
speech starts, speech finishes,
even the character ranges that
will be spoken.
You can also know when speech is
paused or continued.
So, as an example in code
snippet form, you set the
synthesizer's delegate to an
object and then implement these
methods: didStart, which gives
you back the utterance;
didFinish, where you can do the
appropriate thing; and
willSpeakRangeOfSpeechString,
which gives you back an NSRange
that you can convert into a
String range to use with your
string.
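A minimal sketch of such a
delegate (the class name and
print statements are
illustrative):

    import AVFoundation

    class SpeechDelegate: NSObject, AVSpeechSynthesizerDelegate {
        func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                               didStart utterance: AVSpeechUtterance) {
            print("Started: \(utterance.speechString)")
        }

        func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                               didFinish utterance: AVSpeechUtterance) {
            print("Finished: \(utterance.speechString)")
        }

        func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                               willSpeakRangeOfSpeechString characterRange: NSRange,
                               utterance: AVSpeechUtterance) {
            // Convert the NSRange into a String range, e.g. to
            // highlight each word as it is spoken.
            if let range = Range(characterRange, in: utterance.speechString) {
                print("Will speak: \(utterance.speechString[range])")
            }
        }
    }

You would then assign an
instance of this class to the
synthesizer's delegate property
and keep it retained, since
delegates are typically held
weakly.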
Now a little demo.
>> Hello, my friends.
I'm inside the iPhone.
>> Great. Those are the basics
of AVSpeechSynthesis and, as you
can see, it's very simple to get
off the ground.
The next topic is how you
choose the right voice.
There are many built-in voices,
one for each supported language.
Something to note is Siri voices
are not available through this
API.
Users can, however, download
higher-quality voices.
And when those are downloaded,
they will appear in the list of
voices.
So you can select a voice using
either an identifier or a
language.
If you select a voice by
language, it will select the
user's default voice for that
language.
So how do we do that?
First you make your utterance
and then you set the voice
property using
AVSpeechSynthesisVoice.
As an example of using the
identifier initializer, you
could get all the speech
voices, which is an array of
AVSpeechSynthesisVoice objects,
choose the first one, for
example, and pass in its
identifier property.
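Putting that together,
something like this (the
language code is illustrative):

    import AVFoundation

    let utterance = AVSpeechUtterance(string: "Hello")

    // Select by language; this picks the user's default voice
    // for that language.
    utterance.voice = AVSpeechSynthesisVoice(language: "en-US")

    // Or select a specific voice by identifier.
    if let first = AVSpeechSynthesisVoice.speechVoices().first {
        utterance.voice = AVSpeechSynthesisVoice(identifier: first.identifier)
    }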
Here are some of the languages
that are supported on iOS by
default.
Now let's talk about some of
the properties that you can set
on AVSpeechUtterance.
Speech rate can be controlled
using the rate property, which
goes from 0 to 1 and scales
speech to go slower or faster.
Values from 0 to .5 scale
speech from very slow up to the
normal speaking rate.
So that means if you pass in
.5, you will get the default
normal speaking rate.
If you want to go faster, you
can pass in values from .5 to
1, which scales speech from the
normal 1x speaking rate all the
way up to 4x.
To do that, you create a speech
utterance and then set the rate
to a Float value, or you can
use the provided constants:
AVSpeechUtteranceDefaultSpeechRate,
AVSpeechUtteranceMinimumSpeechRate,
and AVSpeechUtteranceMaximumSpeechRate.
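For example, a sketch:

    import AVFoundation

    let utterance = AVSpeechUtterance(string: "Hello")
    utterance.rate = 0.5  // the default, normal 1x speaking rate

    // Or use the constants:
    utterance.rate = AVSpeechUtteranceDefaultSpeechRate  // 0.5
    utterance.rate = AVSpeechUtteranceMaximumSpeechRate  // fastest, about 4x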
Let's talk about pitch and
volume.
These are also properties you
set on the speech utterance.
Pitch controls how high or how
low the voice sounds.
Volume controls the volume of
the voice itself.
So, for example, we can raise
the pitch by setting
pitchMultiplier above its
default value of 1, and we can
lower the volume by setting
volume to .25.
Note that lowering the speech
volume does not affect the
system volume.
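A sketch, with illustrative
values:

    import AVFoundation

    let utterance = AVSpeechUtterance(string: "Hello")
    utterance.pitchMultiplier = 1.5  // above 1 raises the pitch (range 0.5 to 2)
    utterance.volume = 0.25          // quieter, without touching system volume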
Finally, let's talk about
attributed strings.
Attributed strings allow us to
customize how speech sounds
using different attributes.
One available attribute that I'd
like to discuss is IPA notation.
IPA stands for International
Phonetic Alphabet, and it allows
us to customize how specialized
names, nouns, and other things
are pronounced.
It's available in these
languages.
And here is an example of what
it might look like.
To pronounce "iPhone" the way
we are used to hearing it, you
would use this IPA notation.
Now the obvious question is how
do you generate this IPA
notation?
It is not easy to type with a
keyboard.
One way to do that is to use the
pronunciation editor in
accessibility settings.
So in the Settings app, go into
Accessibility and then Speech,
and you will find
Pronunciations, with a screen
like this.
You enter the phrase that you
want to find the correct
pronunciation for, tap the
microphone button, and then
speak it.
After you're done, you'll be
presented with possible
variations.
As you tap each one, it is
spoken out loud.
You can choose the one that
sounds correct and then copy
the value you get into your
code.
So let's see how that's done.
First, we make an attributed
string then we can add the
attribute for the speech IPA
notation using the value that we
got from the settings.
Finally, we can create a speech
utterance with the attributed
string.
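A sketch of those three steps;
the IPA string here is an
illustrative transcription of
"iPhone", not necessarily the
exact value the editor would
give you:

    import UIKit
    import AVFoundation

    let text = NSMutableAttributedString(string: "Hello iPhone")
    // Apply the IPA pronunciation to the word "iPhone".
    text.addAttribute(.accessibilitySpeechIPANotation,
                      value: "ˈa͡ɪ.ˈfo͡ʊn",
                      range: NSRange(location: 6, length: 6))
    let utterance = AVSpeechUtterance(attributedString: text)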
So in summary, AVSpeechSynthesis
is a great way to augment your
app experience if you add speech
at the right time.
Multiple languages and voices
are available, and the list of
voices can change as users
download new ones.
Finally, you can customize
pronunciation for the way words
sound using the IPA notation
attribute.
For more information, visit the
session URL and thanks for
watching.