WWDC2003 Session 404
Transcript
Kind: captions
Language: en
Hi everybody. My name is Xavier Legros, and I'm the Mac OS X evangelist in Developer Relations. I'd like to welcome you to Session 404, Unicode for Japanese, Chinese, and Everything Else. I know the title is kind of long.

Before we start the session, where we have great content and are going to give you an update on what we've been doing with regard to Unicode in Panther, I'd like to take just a moment. Last night, I don't know if you were there, we had the Meet the Evangelists event downstairs. I represent a lot of Mac OS X technologies to developers, and most of the time I get questions like: why should I use Unicode? Why should a developer really focus on using ATSUI, or the Cocoa layout engine, or MLTE? Why do I need to use Unicode in my application? That was very interesting, because it seems a lot of developers still don't understand all the benefits of Unicode. We've been talking about it since the beginning of Mac OS X, and we've had sessions on it over the last three years.

So let me give you a quick rundown on why, as a developer, you should focus on Unicode and use it as much as possible in your application. First, if you're a CJK developer and you're developing an application for China, Japan, or Korea, you just have to do it.
Apple has been investing a lot of money and effort in our Japanese support, for instance, and the only way you can take advantage of some of the features we're going to talk about today, like accessing the 32,000 glyphs that we have in Hiragino, is by using Unicode. Second, it's important for you to understand that Unicode is where Apple is putting all its efforts. We're not doing any more work in WorldScript, and we're not supporting any new languages in WorldScript, so you have to see Unicode as the way of evolving your application. Please take good note of all the content of this session: we have brand-new features for customers in the CJK countries. For that, I'd like to invite on stage Deborah Goldsmith, who's the manager of [inaudible] and is our liaison with the Unicode Consortium.
Thank you, Xavier. Good afternoon, everyone. Today we're going to talk about Unicode in Mac OS X. Here's a quick introduction to what we'll be discussing. The market in Japan and China has changed, and in order for your application to be competitive there, you need to support Unicode. Luckily for all of you, Mac OS X has great Unicode support, and today we'll talk about the tools that are available for your application. Specifically, we'll discuss how governments and customers in Japan and China are asking for new features and new characters, and in particular why only Unicode can meet those new requirements. I'll talk about some great new features in Panther that are only available through Unicode and to Unicode applications. I'll also talk a little about how Unicode differs from WorldScript, which is what you may have been using before, and what you need to do in your application in order to work with Unicode.

There are a lot of reasons to move to Unicode for the Japanese and Chinese markets, but the biggest by far is that customers are demanding more characters. Why is that? If you're not literate in Japanese or Chinese, let me explain by analogy. Suppose your name is Smith, but you spell it S-M-Y-T-H-E. You go to a website, maybe amazon.com, and enter your name to place an order, but the website comes back and says, "I'm sorry, in order to use this website you have to spell your name S-M-I-T-H." That's pretty bogus: why should I have to change the way I spell my name in order to use this website? That's exactly the situation many customers in Japan and China find themselves in. The reason is that the variety of characters people use to write their names is much larger than what has traditionally been in the Japanese and Chinese character sets on Mac OS.

I have an example here. Let's see if I can use the laser pointer; okay, that's kind of tiny. That third line shows five different ways of writing the Japanese family name Watanabe. You may not even think some of them look different, but all of them are different, and there are actually many more ways than that of writing that family name. Not surprisingly, customers want to be able to write their names the way they write them on everything else; they don't want to fall back to some different, standardized form of the character when they're using a computer. And this doesn't just affect people's names. For example, using the Mac traditional Chinese character set, it's not possible to write the names of all the subway stops in Hong Kong, or even the name of the new international airport there. So it's a big problem: there just aren't enough characters for what customers want to do with computers today.
Governments are also specifying more characters. All of the major character set standards in Japan and China have been revised over the last few years. There are new versions of all of these character sets, and they all specify many more characters than those standards did before. Some of them are not just specifications: in particular, the GB 18030 character set in China and the HKSCS character set in Hong Kong are government requirements, and the government requires that software support these character sets. So in order to meet all these new requirements, we have to support more characters. The problem is that the WorldScript system, which has been in Mac OS for a long, long time, can't support any more characters. It has limitations, and it just can't support the number of characters that are needed.
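A minimal sketch of the font-dependent-meaning problem described next. It assumes Python's `euc_kr` codec as a stand-in for the classic Mac OS Korean encoding and `mac_roman` as a stand-in for a Roman font's script. Under WorldScript the same bytes are read according to the script of the font, so Korean text shown in a Roman font turns into garbage, while a Unicode string carries its meaning by itself.

```python
def interpret_legacy(data: bytes, script_encoding: str) -> str:
    """Decode legacy bytes the way a font of the given script would read them."""
    return data.decode(script_encoding)

korean = "한국어"                        # "Korean language", stored as Unicode
legacy_bytes = korean.encode("euc_kr")   # assumption: stand-in for Mac OS Korean text

right_font = interpret_legacy(legacy_bytes, "euc_kr")
# Assumption: mac_roman stands in for a Roman font; every byte decodes to some Roman glyph.
wrong_font = interpret_legacy(legacy_bytes, "mac_roman")

print(right_font == korean)  # True
print(wrong_font == korean)  # False: unreadable Roman "garbage"
```

The Unicode string `korean` never needs a font to tell it which characters it holds; only the legacy bytes do.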
So the answer is Unicode. Unicode is an industry standard: one encoding that handles all the living languages in the world today and a large number of dead ones besides. Because it's a single encoding, a character is a character is a character; its meaning doesn't change depending on what font you have. If you've used Japanese, Chinese, Korean, or what have you on Mac OS 9, you might have been in the situation where you used the wrong font and saw something like this: garbage characters. Anybody like to guess what that really is? It's Korean, and you wouldn't know that unless you chose the right font. With Unicode this doesn't happen, because the meaning of the character doesn't change depending on the font.

Unicode solves the character problem because it has plenty of room for all the characters that customers and governments need. The latest version, Unicode 4.0, which was released just a couple of months ago, has over 96,000 graphic characters. That easily covers all the needs customers have, and it covers all of the new Asian character set standards.

Let's talk a little about what kind of Unicode support we have in Mac OS X. Our main human interface font is Lucida Grande, and in Panther it now covers all of the Roman characters in Unicode and all of the Greek characters, as well as several other scripts. Our other core Roman fonts, like Times and Helvetica, also have a large Roman repertoire, although they don't cover the entire set of Roman characters in Unicode.

Beyond this, we've got lots of great coverage in Mac OS X. Our Japanese support is outstanding. We have six beautiful Japanese desktop publishing fonts; the family name is Hiragino. They're in Type 1 format, and as you can see from the example on the screen, they're really beautiful. These fonts have greater character coverage than any other Japanese fonts on the market today, and they cover all of the major standards you might be interested in for the Japanese market: not just JIS X 0213, but also the Adobe-Japan1-5 characters used for phototypesetting, and even the complete set of shapes recommended by the government's National Language Committee.

But we don't support just Japanese; we also have great Chinese support. Since Jaguar, Mac OS X has supported the Chinese GB 18030 standard, again with beautiful fonts; there's another example up there. The GB 18030 fonts have over 32,000 glyphs and support all of the Chinese characters in plane 0 of Unicode, every single one, as well as minority languages like Yi. For Mac OS X Panther, we're adding support for HKSCS from Hong Kong and Big5-E from Taiwan for traditional Chinese, and these new fonts have over 22,000 characters.

We support even more languages. Also new for Panther, we've extended our Arabic coverage. It doesn't cover all of the Arabic in Unicode, but we cover a bigger chunk than we did before and support more languages. Some of the scripts you see here we also supported in Jaguar. We've added support for native North American languages like Inuktitut and Cherokee, and we've added a new font that we built ourselves to cover a lot of the symbol blocks in Unicode. I won't read them all off, but you can see that we have a lot more symbol coverage than we had in earlier releases of Mac OS X.
Okay. But we haven't just added fonts; we've also made other improvements to our international support. As you may have heard at the State of the Union session yesterday, in Panther Unicode text drawing is much faster, over twice as fast as it was in Jaguar. We've also improved our bidirectional support. You no longer have to tell the system whether a paragraph is left-to-right or right-to-left; it figures that out heuristically, so your users can just type, and the system will determine whether it's a right-to-left or left-to-right paragraph and put the punctuation in the right place.
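The talk doesn't say which heuristic Panther uses. One standard rule comes from the Unicode Bidirectional Algorithm itself (UAX #9, rules P2 and P3): a paragraph takes the direction of its first strong character. Below is a minimal sketch of just that rule using Python's `unicodedata.bidirectional`; the full algorithm has further rules (for example, for explicit embedding controls) that this ignores.

```python
import unicodedata

def paragraph_direction(paragraph: str) -> str:
    """Guess a paragraph's base direction from its first strong character (UAX #9 P2-P3)."""
    for ch in paragraph:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":            # strong left-to-right (Latin, CJK, ...)
            return "ltr"
        if bidi_class in ("R", "AL"):    # Hebrew letters are R, Arabic letters are AL
            return "rtl"
    return "ltr"  # no strong character at all: default to left-to-right

print(paragraph_direction("Hello, שלום"))  # ltr
print(paragraph_direction("123 שלום"))     # rtl: digits are weak, so the Hebrew decides
print(paragraph_direction("(مرحبا)"))      # rtl: the parenthesis is neutral
```

Because digits, spaces and punctuation are weak or neutral, they don't decide the direction; that is why the user can start a Hebrew paragraph with a number and still get right-to-left layout.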
We've updated our bidi algorithm to the Unicode 4.0 standard, and we've also made several bug fixes. I should mention that the seed you folks have received doesn't have all the latest work, so you may not see some of this until you get the GM release of Panther. We've also made some fixes to our support for Indic languages. In 10.2.4 we introduced dictionary-based Thai word break, and with Panther we're adding the ability for users to specify their own dictionary to supplement the built-in one. Users can now put in their own list of Thai words, and that will affect word break in every application in the system.

Finally, Apple supports 16 languages for localization, and we haven't expanded that list in Panther. However, we've always had a longer list of languages than the ones we support ourselves, so that you, the developers, can localize your applications into languages we don't support. Just as an example, I noticed that somebody released a Serbian localization for Safari. We don't support Serbian ourselves, but because of the extra languages that are available, you can do that if you want to, and we've expanded that list for Panther.
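The talk doesn't describe Apple's Thai word-break algorithm, so the following is a hypothetical sketch of how a user dictionary can supplement a built-in one: greedy longest-match segmentation over the union of both word lists, falling back to single characters when nothing matches. Production dictionary-based breakers typically do more, such as choosing among competing segmentations.

```python
from __future__ import annotations

def segment(text: str, dictionary: set[str]) -> list[str]:
    """Greedy longest-match word segmentation (a simplified, hypothetical breaker)."""
    longest = max((len(word) for word in dictionary), default=1)
    words: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest dictionary word that starts at position i.
        for size in range(min(longest, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                words.append(candidate)
                i += size
                break
        else:
            # No dictionary word starts here: emit a single character.
            words.append(text[i])
            i += 1
    return words

builtin_word = "ภาษา"  # "language", assumed to be in the built-in dictionary
user_word = "ไทย"      # "Thai", added by the user
text = builtin_word + user_word

built_in = {builtin_word}
without_user = segment(text, built_in)              # "ไทย" falls apart into characters
with_user = segment(text, built_in | {user_word})   # two whole words
```

Taking the union of the two sets is what "supplement the built-in one" amounts to in this sketch: the user's words become candidates everywhere the built-in ones are.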
Well, Arabic and Hebrew and Japanese and Chinese are fine, but you say, "If I don't have Hanunóo support in my application, my company is toast. What are you going to do?" It's not a problem, because Mac OS X allows you to add support for new languages yourself, in two easy steps. First, add a font; we have a font developer website where you can find out how to build fonts and how to enable them to work with Mac OS X. Second, you need a way for users to input your new language, either via a keyboard layout or via an input method. We have a tech note on how to do keyboard layouts, and we have sample code for input methods. So rest assured, you can add Hanunóo support yourself. Hanunóo is a Philippine writing system, by the way, in case you were wondering.

We've also made improvements at the API level. In Jaguar we introduced the variant glyph access protocol. The reason we have it is that even though Unicode has over 96,000 characters, some Chinese ideographs have different variants: they're considered the same Unicode character, but there are still slightly different ways of writing them. That's actually very similar to what you see in Roman fonts, which can have different ways of writing the same character. Some of the fonts we include in the system, like Zapfino or Apple Chancery, have multiple versions of the same letter that are useful in different situations. The variant glyph access protocol lets you specify exactly which variant of a character you want, or lets the user specify that.
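As a hypothetical model of the idea (not Apple's actual data structures): the stored character stays the same Unicode code point, and the chosen variant travels as an attribute on that run of text. Searching and comparing still work on the characters, while a renderer would consult the attribute. The attribute name `glyph_variant` and the CID value are made up for illustration; Adobe-Japan1 is used only as an example of a glyph collection.

```python
from dataclasses import dataclass, field

@dataclass
class Run:
    """A run of characters plus attributes (a simplified, hypothetical attributed string)."""
    text: str
    attributes: dict = field(default_factory=dict)

def plain_text(runs: list) -> str:
    """Just the characters: what search and comparison should operate on."""
    return "".join(run.text for run in runs)

# The family name Watanabe, 渡辺, with the user's preferred glyph for 辺 recorded
# as a hypothetical attribute. CID 12345 is a placeholder, not a real glyph ID.
name = [
    Run("渡"),
    Run("辺", {"glyph_variant": {"collection": "Adobe-Japan1", "cid": 12345}}),
]

found = "渡辺" in plain_text(name)  # the variant doesn't change the character
```

A search for 渡辺 still matches, even though the text would be drawn with the specific variant the user chose.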
In Panther we're also introducing an extension to the Text Services Manager: a new protocol that gives an input method access to the entire contents of the user's document. That lets the input method do two things: it gives much better conversion accuracy, and it supports some new human interface features, which we'll see a little later on. Our Japanese input method, Kotoeri, takes advantage of this. We've also extended the font panel to allow access to many more of the capabilities fonts have that have been hidden until now. You can now get at those through the font panel, and that information can be passed to your application.

We've also improved the input menu. For those of you who aren't familiar with it, the input menu is the little menu that looks like a flag, or at least it did in Jaguar, which you get when you have more than one keyboard layout or input method enabled. We've greatly streamlined it and improved the human interface. There's no longer a pencil menu. The pencil menu was specific to each input method, but we've taken its contents and merged them into the input menu, so now there's just one menu. Input methods that have been revised to take advantage of this new human interface get a much more streamlined UI and have their modes appear individually, so instead of having to choose the input method and then choose the mode, you can go straight to the mode you want. Of course, older input methods continue to work flawlessly and transparently; there's no need to revise an input method unless you want to.

We've also improved our input methods themselves. As I mentioned, Kotoeri has much better accuracy and a better interface, with lots of new features, which we'll see in a moment. In 10.2.4 we introduced a new traditional Chinese input method, Hanin, which makes input for traditional Chinese much easier. For Panther, we've expanded our simplified Chinese input methods to allow access to all of the characters in GB 18030, and that's a lot of them: as I mentioned, our GB 18030 fonts have over 32,000 glyphs. And finally, we've added more keyboard layouts for more language support.

Now, to show you some of these new features, I'd like to bring up Yasuo Kida and Michael Grady for a demonstration. Michael? Can we switch to demo machine 3, please?

Hello everybody.
I'd like to give you a brief demo of the UI improvements we've been making to the text input menu for Panther. This is not a Panther machine; this is Jaguar. I just wanted to go over some of the problems we tried to solve with the input menu in Jaguar. Here you have the U.S. flag. One problem we found in our focus group study was discoverability: non-Mac users had no idea that this menu had anything to do with input, and many of them could not figure out how to switch from the U.S. or another Roman keyboard layout to Japanese input. So it was clear that in Panther we had to improve the icons we use throughout the system. Another problem with this menu is its location. As you switch between applications with different menu bar lengths, the icons tend to follow along at the end of the menu bar, and that can be distracting to the user. A third problem is that the menu's presence in the menu bar can sometimes interfere with the application's menus, and sometimes even clip portions of them off. So it was clear that we had to reduce the amount of real estate we use in the menu bar. Let's switch over to machine number two, please, and show what things look like on Panther.
Here's the icon; you'll notice there's only one of them. Are we there yet? No. Machine number two, please. There we are. There's the icon I was talking about. It's much more obvious and intuitive to users that there might be access to additional input modes or input sources in there, and it's on the right side of the menu bar, so it won't follow along at the end of the application's menus and distract the user.

Let's see how this works: how can we take what used to be implemented as two menus and make it one? Here's the answer. You'll notice that we have a number of input sources here, and you may wonder what they are. They're not keyboard layouts, and they're not input methods. They're the input modes implemented by a particular input method, in this case Apple's Japanese input method, Kotoeri. They all belong to the same input method, and in the past they would have been reached through the pencil menu. That second menu in the Jaguar menu bar is flattened into this menu right here. What's interesting is that these input modes have become full, first-class input sources, at the same level that input methods and keyboard layouts used to be. They are the input sources the user should see, and in any system UI they will be shown side by side with those other input sources at that level. Another piece of system-provided UI is a new palette, reminiscent of the input mode palettes that input methods provided themselves in the past.

Lastly, we can bring up the International preferences and have a look at the improvements there. Before we get into that, one quick note: for applications that have particularly large menus, or if you just don't like having the input menu around, you can Command-drag it out of the way, and it can simply be reinserted with this checkbox in the preferences panel. You'll notice the hierarchical nature of input methods and how they advertise input modes, letting the user choose the subset of input modes they'd like in the menu. You'll also notice the U.S. layout, which by the way still has the old flag icon; that's actively being changed, but we didn't have it ready for this demo. It can be inserted in the menu and removed, which is not something we could do in the past. Previously, even if you were only using input modes from a single input method, the U.S. layout or the Roman default layout always showed up in the menu; now it can be removed. The input method itself can be disabled, but the modes that would be enabled if you were to reactivate it also show up.
You might be wondering about existing input methods and what compatibility we have with those. You'll notice that they're fully supported, completely transparently. Choose that input method, and you'll notice that its input-method-specific pencil menu shows up here automatically; the input method did not have to change. But of course we want to encourage input method developers to adopt the new input mode protocol, to give users the benefit of a single user interface for choosing input modes. That's what I have for the text input menu. Now I'd like to bring up Yasuo Kida, who will discuss improvements in our Japanese input method. Thank you.
[Applause]
Hello everybody. I'm very glad to be here with you, because I'm very excited about the great improvements we're making for Panther. One of the big changes is the text input menu Michael demonstrated, and now I'll show you Kotoeri for Panther. There are three major new features in Kotoeri. The first is very high conversion accuracy. We've been continuously improving Kotoeri's kana-kanji conversion accuracy since Mac OS X 10.1, and we believe we've achieved a milestone with this version. Not only did we improve the engine itself, we also applied a new technology called latent semantic mapping, or LSM, to resolve a class of ambiguity that no other input method can handle right now: working out the topic of the document you have. Consider the word "hot." If you're talking about summer, "hot" probably refers to temperature, but if you're talking about Thai food, for example, it's probably about spicy hot. It's like that. I'll show you how it works.
Say the document on the left-hand side is talking about a jazz festival: "The Monterey Jazz Festival is the oldest jazz festival in the world," which is true. The document on the right-hand side says, "The Boston Marathon is one of the oldest marathons in the world." In Japanese, "player" and "runner" are pronounced the same, sōsha. When they're entered in different sentences like this, traditional input methods couldn't resolve the ambiguity, but the new Kotoeri for Panther can look at the context and find the correct conversion in each case: sōsha here, and sōsha here. [inaudible] Please look at this first character: it means player. And on the right-hand side, this one means runner. It converts the word correctly depending on the context of the document.

The other improvement we're making for Panther is UI to correct conversion errors and typical mistyping. The first is reconversion, which is saihenkan in Japanese. When you find an incorrectly converted word in the document, you can place the cursor there and get a conversion candidate window for that word. The second is for cases where you type too fast and confirm the text before you meant to. Before, you had to retype all the characters, but now you can get back to the previous state, the active conversion state. The last one: it often happens, at least to me, that I start typing in the wrong mode; for example, I start typing English in Japanese mode, or Japanese in English mode. Kotoeri can correct those cases. And if you have a Japanese keyboard, all of these can be done by double-tapping the Kana key. Say you don't like the last conversion: click, and double-tap the Kana key. Oh yeah, it gets you the candidate window.
[Applause]
Say you confirm the text before you meant to: by double-tapping the Kana key, you can get back to the conversion state. And the last one is... what's that? Okay, yeah. Say you start typing "konnichiwa" and suddenly notice, oh, this is the wrong mode. Look: you tap the Kana key twice, and you can just continue typing. Thank you.

The third feature is an MS-IME compatible mode, for people who are switching from Windows and are comfortable with MS-IME-compatible keystrokes, and also for those who use both environments and want the same keystrokes on the Mac and on Windows. Please note that many of these features require your help. Many of them use the document access protocol, the new API, so in order to provide a consistent user interface for your customers, you need to adopt those APIs. Deborah will cover them in detail. Now I'll return the talk to Deborah. Thank you.
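Kotoeri's LSM engine isn't shown in the talk, so the sketch below swaps in a much simpler technique to illustrate the idea behind it: give the input method the surrounding document (which is what the document content access protocol provides) and score each conversion candidate by how many of its topic words appear there. 奏者 (player) and 走者 (runner) are both read sōsha. The keyword lists are hypothetical; latent semantic mapping instead derives its topic space statistically, using a singular value decomposition, rather than from hand-written lists.

```python
from __future__ import annotations

# Hypothetical topic words for each written form of the reading "sōsha".
CANDIDATES = {
    "奏者": {"jazz", "festival", "music", "concert"},   # player (musician)
    "走者": {"marathon", "race", "runners", "finish"},  # runner
}

def choose_conversion(candidates: dict[str, set[str]], document: str) -> str:
    """Pick the candidate whose topic words overlap most with the document (keyword overlap, not LSM)."""
    context = {word.strip(".,").lower() for word in document.split()}
    # max() returns the first candidate on a tie, i.e. the default ordering.
    return max(candidates, key=lambda c: len(candidates[c] & context))

jazz_doc = "The Monterey Jazz Festival is one of the oldest jazz festivals."
marathon_doc = "The Boston Marathon is one of the oldest marathons."

jazz_choice = choose_conversion(CANDIDATES, jazz_doc)          # the "player" spelling
marathon_choice = choose_conversion(CANDIDATES, marathon_doc)  # the "runner" spelling
```

Without access to the document text, both calls would see the same input (just the reading) and could only return the same default; the surrounding text is what makes the choice possible.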
Okay, so those are great new features, and as Kida-san mentioned, we need your help to make them available in all applications. There are even more improvements at the API level, which I'll go into now. Here's one big one people have been asking for. People who have converted their Carbon applications to Unicode have found a sticky point: until now there has been no support for formatting or parsing dates, times, and numbers in Unicode. If you wanted to do that in a Carbon Unicode application, you had to use the old Script Manager APIs and then convert the text to Unicode. In Panther we're introducing a new set of APIs in Core Foundation that let you format and parse dates, times, and numbers: CFLocale, CFDateFormatter, and CFNumberFormatter. So now you can have a totally Unicode application for formatting and parsing, and also for sorting, where we now support many more locales for Unicode. We're able to do that because we're taking advantage of an open source library called ICU, International Components for Unicode, which is now part of Panther.
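The talk names CFLocale, CFDateFormatter and CFNumberFormatter without showing calls, so rather than guess at their signatures, here is a Python sketch of the underlying problem: the same number is written differently per locale, and parsing has to invert exactly the same rules. The two-entry separator table is a hypothetical stand-in for real locale data such as ICU's; it covers only grouping and decimal separators, not currency symbols, native digits, or grouping sizes that vary by locale.

```python
# Hypothetical locale data: (grouping separator, decimal separator).
SEPARATORS = {
    "en_US": (",", "."),
    "de_DE": (".", ","),
}

def format_number(value: float, locale: str, decimals: int = 2) -> str:
    """Format with the locale's separators (a toy, not CFNumberFormatter)."""
    group, decimal = SEPARATORS[locale]
    text = f"{value:,.{decimals}f}"  # Python's own form, e.g. "1,234,567.50"
    # Swap separators through a placeholder so "," and "." don't overwrite each other.
    return text.replace(",", "\0").replace(".", decimal).replace("\0", group)

def parse_number(text: str, locale: str) -> float:
    """Invert format_number: drop grouping separators, normalize the decimal point."""
    group, decimal = SEPARATORS[locale]
    return float(text.replace(group, "").replace(decimal, "."))

print(format_number(1234567.5, "en_US"))      # 1,234,567.50
print(format_number(1234567.5, "de_DE"))      # 1.234.567,50
print(parse_number("1.234.567,50", "de_DE"))  # 1234567.5
```

Note that parsing "1.234.567,50" with en_US rules would fail outright, which is why formatting and parsing must share one locale definition.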
We're not yet allowing applications to access this library directly, because we want to make sure we can maintain binary compatibility from release to release. But that's something we're looking at, so it may become available to your applications in future releases. Another side benefit of using ICU is that our collation is three times faster than it was in Mac OS X Jaguar, so sorting in applications will get much faster.
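To see why sorting needs real collation rather than code point comparison: a plain sort puts capital letters before lowercase ones and accented letters after z. The key function below is a deliberately crude stand-in for what ICU actually implements (the Unicode Collation Algorithm with per-locale tailoring). It compares letters with accents and case removed first, and breaks ties on the original string.

```python
from __future__ import annotations

import unicodedata

def collation_key(s: str) -> tuple[str, str]:
    """Crude two-level sort key: accent- and case-insensitive first, then exact text."""
    # NFD splits "é" into "e" + U+0301; dropping combining marks leaves "e".
    base = "".join(ch for ch in unicodedata.normalize("NFD", s)
                   if not unicodedata.combining(ch))
    return (base.casefold(), s)

words = ["zebra", "\u00e9clair", "Apple", "eagle"]  # "\u00e9" is a precomposed é

binary_order = sorted(words)                       # Apple, eagle, zebra, éclair
collated_order = sorted(words, key=collation_key)  # Apple, eagle, éclair, zebra
```

Real collation also handles locale-specific rules (for example, letters that sort as separate letters in some languages), which a fixed key like this cannot express.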
Let's talk about what you need to do in your applications; we'll go through several different cases. If you have a Cocoa application and you use the Cocoa text system, you're already in pretty good shape. There are a couple of special things you need to watch out for. Some Cocoa applications have had their own typesetter classes, and they've made those classes subclasses of the AppKit class NSSimpleHorizontalTypesetter. The problem is that this AppKit class doesn't support the advanced layout features. It doesn't support bidirectional text, and it's basically obsolete, so if you subclass it, your own typesetter class winds up with the same problems. In Panther there's a new public typesetter class you can subclass, called NSATSTypesetter, and if you use that class, your application will have all the same features that are built into the Cocoa text system.

Another thing you have to watch out for is saving attributed text. A lot of the information that implements the new features we've been showing, things like variant glyphs, font features, and font capabilities, is saved as attributes on the text. So if you save attributed text yourself and you enumerate what you think is the complete set of attributes, you might lose this information when you save it to a document. When you save attributed text, it's important to save all of the attributes, so that information the user enters, like the particular variant glyph they use to write their name, doesn't get lost when they save and reopen the document.

Things are pretty easy for Carbon applications too, especially if they're using MLTE, the Multilingual Text Engine, or the new HITextView, which is based on MLTE. All of this is supported by MLTE, so you don't have to do very much: use CFString for your Unicode text storage, support the font panel, allow access to advanced font features, and you're set. It's pretty easy.
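The advice about saving attributed text can be sketched with a hypothetical run list: a serializer that copies only the attributes it knows about silently drops newer ones, such as a variant-glyph attribute, while one that writes every attribute round-trips them. JSON, the attribute names, and the placeholder CID are assumptions for illustration, not the Cocoa or MLTE file formats.

```python
import json

# Each run is (text, attributes). Attribute names are hypothetical.
runs = [
    ("Watanabe ", {"size": 12}),
    ("渡辺", {"size": 12, "glyph_variant": {"cid": 12345}}),  # placeholder CID
]

KNOWN_ATTRIBUTES = {"font", "size"}  # what an older app assumed was the full set

def save_known_only(runs):
    """Buggy approach: enumerate a fixed list of attributes."""
    return json.dumps([[text, {k: v for k, v in attrs.items() if k in KNOWN_ATTRIBUTES}]
                       for text, attrs in runs])

def save_all(runs):
    """Safer approach: write every attribute, including ones added later."""
    return json.dumps([[text, attrs] for text, attrs in runs])

def load(data):
    """Read runs back as (text, attributes) tuples."""
    return [(text, attrs) for text, attrs in json.loads(data)]

restored = load(save_all(runs))       # the variant glyph survives
lossy = load(save_known_only(runs))   # the variant glyph is gone
```

The point of the design is that the saving code should not need to know which attributes exist; anything the text system attaches should be written back out.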
A lot of you, for historical or performance or what-have-you reasons, have your own custom text engine that's necessary for your specific application. In those cases things are a little harder, but it's still possible to support all these features, and we'll go through how you can do that. You still need to store your text as Unicode, because many of these new features are only available to Unicode applications. You can use either CFString, the Core Foundation class for Unicode text, or you can just store an array of 16-bit UniChars; either way works. Once you have Unicode text, you need to draw it using a Unicode text drawing API, and for Carbon that means ATSUI. Fortunately, as I mentioned, in Panther ATSUI is over two times faster, so there's really no reason not to use it for Unicode text drawing in your application. For input of Unicode text, you need to use the Text Services Manager, and if you were already supporting Japanese or Chinese input methods, you're probably already using TSM. One thing that's new for Panther is that new features like the document content access protocol are only available via Carbon events, not via the Apple events that we also supported in the past. So if your application uses Apple events to interact with TSM, you'll have to move to Carbon events to take advantage of the latest features.
Once you support TSM, there are basically three categories of interaction you need to worry about. One is supporting the active input area, which has always been the case for input methods. Another is the new document content access protocol, which we'll talk about in more detail in a moment. The final one is supporting input and storage of variant glyphs, which we'll also talk about. And finally, as with the easier approach for Carbon applications, you want to support the font panel so that users have access to all the capabilities fonts have to offer.

Before I go into more detail, I want to give a quick review of what it is about Unicode that makes it a little more challenging to implement in an application; it's quite different from the WorldScript approach you might be used to. The most important concept for Unicode is what's called the character-glyph model, which makes a distinction between characters and glyphs. You can think of characters as the spoken form of the language: the semantic content, the way you would speak the language. Glyphs, on the other hand, are the shapes that show up on the printed page or on a display monitor, and you can think of them as the written form of the language. Usually there's a very direct correspondence between the spoken form and the written form, but not always. It's certainly not the case for complicated writing systems like Arabic or the Indic languages, and there are even cases in English and Japanese where there isn't a one-to-one relationship between characters and glyphs. It's the job of a Unicode text rendering engine like ATSUI or the Cocoa text system to map between characters and glyphs.

Here are a few examples that show why that's a challenging problem. The first line is Hindi. In Hindi, things move around between the characters and the glyphs, and in fact some things that are independent characters wind up, when rendered as glyphs, as decorations on other glyphs. So there's both rearrangement and the formation of clusters and ligatures.
The second line is Arabic. As we all know, Arabic is a right-to-left language, so the characters and the glyphs are in opposite orders. Beyond that, Arabic is a cursive writing system, so the glyphs flow together to form ligatures, and you can't map directly between characters and glyphs; there are ordering and ligature issues you have to deal with. But even for Roman, here's an example: the word résumé, where each e with an acute accent is stored in character space as an e followed by a combining acute accent, and when it's drawn it has to become an accented e.
So that's an example where, even in a Roman language, there isn't a straightforward mapping between characters and glyphs.

What are some of the problems you can run into in an application if you don't keep the character-glyph model in mind? One thing that's particular to Unicode: we all think of Unicode as a 16-bit character set... whoops, okay, I didn't press the bad button, so... hmm, there we go. Okay: we think of Unicode as a 16-bit character set, but as I mentioned earlier, there are over 96,000 characters in the latest version, and a little arithmetic shows that you can't fit that in 16 bits. What we think of as the 16-bit version of Unicode is called plane 0, or the Basic Multilingual Plane, and that's where all the commonly used characters go. But Unicode also supports a lot of rare and less commonly used characters, and those are allocated in planes 1 through 16. To represent those characters in your text, you need to use two 16-bit values, which is called a surrogate pair. Here's an example from our Hiragino font: it looks like any other ideographic character, but it's stored as two 16-bit values because it comes from plane 2 of Unicode. So that's one issue you have to worry about.

As we saw on the previous slide, you can have composing sequences, where multiple characters in the Unicode sense form what the user thinks of as a single character. The base character e with a combining acute accent is one example, and there are lots of other combining marks like that. There are clusters in Indic scripts, there are ligatures in Arabic and in English, and in Korean there are jamos that come together to form Hangul syllables, and so forth and so on.
there's really not a direct one-to-one
relationship between characters and
glyphs in addition unicode also has
multiple ways of doing the same thing so
in the last slide we saw the e with
combining acute accent but unicode also
has a single character that's an E with
an acute accent and there that's mostly
for historical reasons and for
compatibility with earlier character set
standards and there are a lot of cases
like that so there are often multiple
you can think of them as spellings for
the same string of text it can be
represented in Unicode in multiple ways
So here's one example: on the left I have a Korean Hangul syllable, and
on the right I have the three jamo that make up that syllable. They're
both equally valid ways of representing the text, so of course that
makes things like comparison and searching a little more challenging.
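
The surrogate-pair and multiple-spelling issues above can be reproduced
with Python's standard library, used here purely for illustration; the
talk's own advice is to rely on the system comparison and searching
APIs. The specific characters are our assumptions: U+20000 stands in
for the plane-2 Hiragino character on the slide, and the syllable 한
(U+D55C) for the Hangul example. Comparing after canonical
normalization (NFD) treats the different spellings as equal.

```python
import unicodedata


def utf16_units(text):
    """Number of 16-bit code units (UniChars) needed to store text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


def canonically_equal(a, b):
    """True if a and b are the same text once both are put in NFD form."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# Surrogate pair: U+20000, the first ideograph in plane 2, is one character
# but needs two UTF-16 code units (a high and a low surrogate).
ideograph = "\U00020000"
print(utf16_units(ideograph))                       # 2
print(ideograph.encode("utf-16-be").hex())          # d840dc00

# Multiple spellings: precomposed e-acute vs. e + COMBINING ACUTE ACCENT.
precomposed, decomposed = "\u00e9", "e\u0301"
print(precomposed == decomposed)                    # False
print(canonically_equal(precomposed, decomposed))   # True

# Multiple spellings: a Hangul syllable vs. the three jamo that compose it.
syllable, jamo = "\ud55c", "\u1112\u1161\u11ab"
print(canonically_equal(syllable, jamo))            # True
```

Normalizing both strings is only the first step; real searching also
has to deal with case, width, and language-specific rules, which is
what the system APIs handle for you.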
And finally, for more complicated writing systems you have issues of
directionality. Languages like Arabic and Hebrew go right to left, and
you can have them in the same paragraph with text in left-to-right
languages. The whole Indic family of languages has rearrangement, where
characters move around when you write them compared to when you speak
them, so you can't count on the glyphs and the characters being in the
same order at all. And that doesn't just affect the order of glyphs
within a style run; it also affects the order of style runs within a
paragraph. So if you have a paragraph of mixed English and Arabic, or
English and Hebrew, text, whole style runs can move around, and you
really need the system's help to figure out where everything belongs.

Fortunately, so that you can avoid these problems in your application,
we have lots of APIs in the system that you can use to make sure you do
the right thing. In terms of figuring out where characters begin and
end, there are lots of system APIs for finding text boundaries: not
just characters, but also clusters, words, lines, and paragraphs.
I'm not going to go into great detail on this, and all of the
documentation for it is available online, but there are APIs in Cocoa
for finding character and cluster boundaries, and in Carbon for finding
boundaries of all sorts. And if the reason you're looking for a
character boundary is to truncate text, you don't even have to do that
yourself: you can ask ATSUI to truncate your text for you. You just pass
it an option telling it how wide you want the text to be, and it will
find a linguistically correct place to truncate the text and add a
truncation character.
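
ATSUI's truncation option does this for you; the Python sketch below
only illustrates why a safe cut point is not simply a code point
boundary. The cluster rule is a deliberate simplification we chose (a
base character plus any following combining marks), not the full
Unicode text segmentation algorithm, and the cluster count stands in
for the width a real layout engine would measure. The function names
are hypothetical.

```python
import unicodedata


def clusters(text):
    """Split text into approximate user-perceived characters.

    Simplifying assumption: a cluster is one non-mark character followed by
    any combining marks (general categories Mn, Mc, Me). Full segmentation,
    as in Unicode's UAX #29 or the system APIs, covers many more cases.
    """
    result = []
    for ch in text:
        if result and unicodedata.category(ch).startswith("M"):
            result[-1] += ch  # attach the mark to the preceding cluster
        else:
            result.append(ch)
    return result


def truncate(text, max_clusters, ellipsis="\u2026"):
    """Keep at most max_clusters clusters, then append an ellipsis.

    The cut never separates a base character from its combining marks.
    """
    parts = clusters(text)
    if len(parts) <= max_clusters:
        return text
    return "".join(parts[:max_clusters]) + ellipsis


word = "re\u0301sume\u0301"    # "résumé" spelled with combining accents
print(word[:2] + "\u2026")      # naive slice drops the accent
print(truncate(word, 2))        # keeps the accent with its e
```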
Because of the problems with multiple spellings that I talked about
before, there are system APIs that can help you with comparison and
searching of text. As I mentioned, due to directional issues text can
move around within a paragraph, so when you're drawing you need to deal
with an entire paragraph at a time, and there are APIs in Cocoa and
Carbon that will help you do that. For Cocoa, you can use the text
system directly, or use attributed strings and typesetters; for Carbon,
of course, there's ATSUI. As long as you let the system know about an
entire paragraph, it will figure out where everything belongs, and then
you can figure out where the line breaks are and draw the lines
individually. Of course, because there isn't a one-to-one mapping
between characters and glyphs, that's also an issue for moving the
cursor with the arrow keys, clicking with the mouse, or highlighting
text, and there are APIs that can help you do that.

One issue that every Unicode application has to deal with, unless it's
brand-new, is how to handle legacy data that's not in Unicode. Now,
we've had
APIs in the system for a long, long time to convert between Unicode and
other character sets, so I'm not going to go into that. One issue,
though, is figuring out what character set you should assume the text
is in. Well, if the character set is marked in the document somehow,
then you're set: you know what the character set is. But very often
you're dealing with plain text, or other text that doesn't have any
information on what legacy character set it's encoded in, so then you
have to guess, and there are a couple of APIs that can help you do
that. If you think the text is going to match the language that your
application is running in, then you can call GetApplicationTextEncoding,
which returns an encoding that usually matches the localization that's
been selected for your application. It might be more appropriate to
pick an encoding that's associated with the user's most preferred
language, because maybe your application doesn't support that language,
but the user's data is quite likely to be in an encoding that's
associated with it; CFStringGetSystemEncoding will return an encoding
that usually matches that language. Now, why do I say "usually"?
Well, the reason is that there are languages that Mac OS X knows about
that were never supported in WorldScript, that were never supported on
OS 9, and they don't have legacy encodings associated with them. Some
of them, like Vietnamese, do have a WorldScript encoding, but that
doesn't mean you can draw the data with QuickDraw text; it's just
something you can convert using an encoding conversion API. Other
languages, like Hawaiian, have no non-Unicode encoding associated with
them at all. So if your application is running in Hawaiian, or the
user's most preferred language is Hawaiian, you're not going to get a
sensible answer from these two APIs. If you're writing an Internet
application, then you shouldn't be using Mac OS encodings at all; you
should be using the standard encodings that are defined by Internet
standards bodies. You can go to the IETF and IANA websites to find out
about those, and there are APIs that will help you convert those names
into a text encoding that you can use internally.
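
A minimal Python sketch of the policy just described, under our own
assumptions: use the declared IANA charset name when there is one and
it decodes cleanly, otherwise fall back to a caller-supplied list of
guesses. On Mac OS X, the Core Foundation function
CFStringConvertIANACharSetNameToEncoding performs the name-to-encoding
step; the function and parameter names below are hypothetical.

```python
def decode_legacy(data, charset=None, fallbacks=("utf-8",)):
    """Decode legacy bytes into a Unicode string.

    charset: an IANA-registered name such as "Shift_JIS", "Big5", or
    "GB18030", tried first when the document or protocol declares one.
    fallbacks: encodings tried in order afterwards; this list plays the
    role of the educated guess described in the talk.
    """
    candidates = ([charset] if charset else []) + list(fallbacks)
    for name in candidates:
        try:
            # Python's codec registry accepts these names case-insensitively.
            return data.decode(name)
        except (LookupError, UnicodeDecodeError):
            continue  # unknown name, or the bytes are not valid in it
    raise ValueError("no candidate encoding could decode the data")


print(decode_legacy(b"\x82\xa0", "Shift_JIS"))   # Hiragana letter A
print(decode_legacy(b"\xa4\xa4", "Big5"))        # the ideograph for "middle"
print(decode_legacy(b"\x8e", fallbacks=("utf-8", "mac_roman")))
```

Mac Roman maps every byte value, so putting it last in the fallback
list guarantees a result, though not necessarily the right one.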
I'll talk a little bit more about the new APIs for formatting and
parsing dates, times, and numbers. They're in Core Foundation: you can
either get the current locale, or you can get a locale from a standard
ISO locale string, which has a language code followed by a country
code. You can also take information from the WorldScript world, like a
language, region, or script code, and convert that to an ISO string,
which you can then use to get a locale. The new classes in Core
Foundation, as I mentioned, support both formatting and parsing; there's
support for currencies, and you can go back and forth between internal
representations, including Core Foundation types but also standard C
types, and a formatted CFString. There are also lots of customization
options you can take advantage of; for more information, you can look
at the seed release that you all received. The last couple of topics
I'd like to cover are TSM and variant glyph access.
So for those of you who have supported the Text Services Manager in the
past, the thing that's different for Unicode support is that you need
to create a TSM document of type 'udoc'. To take advantage of the
latest features, you need to move to Carbon events instead of Apple
events, but as I'm sure you've been hearing elsewhere at the
conference, there are lots of good reasons to move your application to
Carbon events. Supporting the input method active area is something
that's been around for a long time, but if your application is not a
Unicode application yet, you'll also need to move to supporting Unicode
input. And again, new in Panther is the protocol for accessing the
entire contents of your document. That's critical to provide some of
the user interface and conversion accuracy features you saw
demonstrated earlier: Kotoeri can analyze the contents of your
documents to give great conversion results, but only if it can see what
the content of your document is. So I'm going to go through some of
this
rather quickly, because we don't really have time to dive into it in
detail. For Unicode input, there's a single Carbon event that carries
Unicode text; it can also carry glyph variant information, and we'll
talk about that in a little bit. The input method active area protocol
is pretty much the same as it's always been: there are just a few
Carbon events you have to handle, and there's nothing new here. The big
new thing is the document access protocol. I don't have time to go into
it in great detail, and you might think from this long list of Carbon
events that it's pretty complicated, but it's not; the model is really
simple. The way it works is that it makes your document look like a
CFString to the input method, so the Carbon events that you respond to
are just the same operations that CFString supports. It's a very
straightforward model. If you implement support for these Carbon
events, input methods can access the contents of your document, and you
get the improved conversion accuracy and new UI features like easy
reconversion. I'll
talk a little bit now about variant glyph access. This is optional
information that comes with a Unicode text input event: you get an
array of glyph information records, and each record has this
information in it. First of all, there's a range of text, and that can
be more than one 16-bit UniChar. The reason for that is it could be a
variant version of a surrogate pair, so it could be more than one
UniChar for that reason, or it could be a variant version of something
like a ligature. For example, the Zapfino font that comes with Mac OS X
has different versions of the fi ligature, and the user can pick which
one they want via the variant glyph access protocol. In that case, the
range of text would be the f and the i, so it can be more than one
character.
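
Because the range in a glyph information record is counted in 16-bit
UniChars rather than in user-visible characters, an application that
stores its text some other way has to convert. A small Python sketch of
that bookkeeping, assuming a CFRange-style (location, length) pair;
TextRange and the helper names are hypothetical, and Tech Note 2079
describes the actual record layout.

```python
from dataclasses import dataclass


@dataclass
class TextRange:
    """A location and length counted in UTF-16 code units, like a CFRange."""
    location: int
    length: int


def utf16_len(text):
    """Number of UTF-16 code units (UniChars) in text."""
    return len(text.encode("utf-16-le")) // 2


def utf16_range(text, start, end):
    """Convert the code-point slice text[start:end] into a UTF-16 TextRange."""
    return TextRange(utf16_len(text[:start]), utf16_len(text[start:end]))


# Ligature case: the "fi" in "final" is two characters, so the length is 2.
print(utf16_range("final", 0, 2))         # TextRange(location=0, length=2)

# Surrogate case: one plane-2 ideograph after "A" also spans two UniChars.
print(utf16_range("A\U00020000", 1, 2))   # TextRange(location=1, length=2)
```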
You also have to specify the font that the variant is coming out of.
There are two ways to identify which particular glyph you want. One is
a font-specific glyph ID, which is used, for example, with TrueType
fonts, but it can also be a glyph ID from a published glyph collection
like Adobe-Japan1-5, and the record identifies which of those two
approaches is being used. You don't have to worry about that too much,
because ATSUI provides a style tag: all you have to do is take the
information out of the Carbon event, stuff it in this style tag, and
give it to ATSUI. This is all covered in Tech Note 2079.

I won't talk about this very much at all, because it will be covered in
the session that's coming up right after this one across the hall,
What You Need to Know About Fonts in Mac OS X. This is how to support
advanced font attributes via the new Font panel in Panther. There's
already a Carbon event for font selection via the Font panel, and we've
just added more information to it: there's now a complete dictionary
with all the information that's specified in the Font panel. All you
need to do is extract the data from that and pass it to ATSUI. You
don't have to worry about what it means; you basically just have to
funnel it through your application. For more details on that, you can
go to the font session, which is session 406, coming up right after
this one across the hall. Okay,
I'd like to bring Kita-san back up on stage one more time to talk about
our Chinese input methods and the Character Palette. Kita-san? Okay,
let me get rid of those windows. Let me add one more thing here.

Hello again. I'll show you a few more features before we wrap up. The
first one is for Traditional Chinese. In Mac OS X 10.2.4 we added Hanin,
a very popular Traditional Chinese input method. We were providing this
input method only for localized systems on Mac OS 9, and now we are
offering it to everybody on Mac OS X. This is a word-based input method
for pinyin and bopomofo, and it's much, much easier to use. I'll show
you how it works. [typing pinyin] Okay, it says "Welcome to WWDC in San
Francisco." [more typing] Oh, that's right.

The next one, for Traditional Chinese, is that we added support for
HKSCS and Big5-E. Those are additional character sets on top of what we
have today. This is a clip from a newspaper in Hong Kong, and the
characters marked in red (I don't have a pointer, do I?) were missing
in the previous standard, which is Big5. You might be surprised how
many characters are missing. Just this one, for example: this is the
name of the new airport in Hong Kong, Hong Kong International Airport,
and these two at the bottom are names of subway stations in Hong Kong.
You couldn't even write your airport name or subway station names
without this extension, and if you or your application don't support
Unicode, you can't display these characters. It's quite embarrassing.
Next is our Simplified Chinese input method. We extended it so that it
covers all the characters in GB 18030. Let me pick Simplified Chinese,
and let me pick the mode. [typing] Yes, this is one of the characters
that exist only in GB 18030, and here's another example.

Okay, the last one is the Character Palette. We first introduced the
Character Palette in Jaguar, we found that many customers love it, and
we also got a lot of feedback. One piece of feedback we got is that
sometimes you want to enter a character that looks exactly like what
you see on the screen. In Jaguar, the Character Palette honors the
default font setting, so you usually get different fonts in the
Character Palette and in the application, but we got feedback that you
want exactly the same glyph in the application and the Character
Palette. So here's the Character Palette, which looks like the one we
have in Jaguar, and you have this little disclosure triangle here for
Font Variation. If you open it, here's a list that shows the selected
character in all fonts in the system, so you can browse this character
in all the fonts in the system and pick the one you like. And if the
character you've selected in this list happens to have variant glyphs,
it lists the variants in this variant field. So for example, say you
want a different glyph for this character, say this one here. You have
Insert with Font at the bottom right, and if you press this button, you
insert the character into the document. Let me try a different one, say
this one, and you insert a different glyph. Let me try to find one: you
can drag a character to this area to go to that character. Say I want
this one, and here you are, you have different variants. Say your name
uses the one that has two dots up here: you can insert it, and now you
have the two dots.
[Applause]
Also, you can drag a character onto the palette and find which fonts
have this character, like this. So now I'll hand the stage back for the
wrap-up.

Thank you. Thank you, Kita-san. I'd like to emphasize again how
important it is to support Unicode and the document content access
protocol in your applications, so that your users have access to all
these great features. I didn't have a prop budget for this talk, so I
don't have a coffin to roll out on stage, but WorldScript is dead.
QuickDraw text is dead. They can't begin to cover some of the
requirements that we're seeing in the Japanese and Chinese markets
today. All of our effort and all of our focus is on Unicode: we're not
spending any time on WorldScript, and we're not spending any time on
enhancements to QuickDraw text. So Unicode is it. Unicode will give your
application great competitive advantages in the Japanese and Chinese
markets, so you really should focus on adding it, and if you do, as a
side benefit you get the rest of the world besides, which is not a small
thing. Thank you, everyone.
So I'd like to wrap up now. Here are a couple of other sessions, or
more than a couple, that you might be interested in. Immediately after
this session is What You Need to Know About Fonts in Mac OS X (the whole
name didn't fit on the slide), where you can find out about the Font
panel and the Typography panel and lots of other useful information
about using fonts on Mac OS X; that's in the Mission room starting at
3:30. On Friday at 5:00 in the Presidio room, there's the Cocoa text
session, where you can find out about new features for Panther and all
the other great things going on in the world of Cocoa text.
Unfortunately, at the exact same time, also Friday at 5:00 PM, in Nob
Hill, is a session on our new Ink APIs; if you're interested in
enhancing your application's support for handwriting, you can find out
how to use these new APIs there. And finally, if you want to let us
know what's bugging you, or what you think is going great, our
International Technologies Feedback Forum is Friday at 10:30 in the
North Beach room, and we'd love to have you come and give us feedback on
what we could do better and what we're doing right. If you have further
questions, the first person you should be talking to is Xavier, and his
email address is easy to remember: it's da ba at apple com. If you have
any questions when you're done talking to Xavier, you can also contact
me; my email address is Goldsmith without the h, at apple.com. You don't
need to scribble a lot of this down, because the URL at the bottom of
the screen, developer.apple.com/wwdc2003/urls.html, will have all the
contact information and all the URLs from all the talks at WWDC.
Here are some places you can go for more information. There's our
documentation library, of course. We also have a nice summary page for
international text technologies; that's developer.apple.com/intl. If
you want to develop fonts, we have a font developer web page; that's
developer.apple.com/fonts. There are, of course, references for the
AppKit, for ATSUI, and for Unicode Utilities, which is used for finding
text boundaries and for comparison and searching. There's a specialized
set of topics on Cocoa text handling, and there's documentation on
CFString. Here's a handful of useful tech notes and sample code: Tech
Note 2056, on how to do your own keyboard layouts; Tech Note 2079, on
the variant glyph access protocol, which has much more detail than I
was able to go into; and sample code for ATSUI, for drawing Unicode
text, and for doing your own input method. And some pointers outside of
Apple: the Unicode Consortium has a website, and for more information
about Unicode that's the best place to start. There's a new version of
the Unicode book coming out for the new 4.0 version of the standard,
and much more readable than the standard itself is Unicode Demystified
by Richard Gillam; I highly recommend it as an introduction to Unicode
if you want to learn more about it. And finally, the open source
International Components for Unicode (ICU) library has its own website,
hosted by IBM.