Transcript
>> OK, hello everybody.
I'm Deborah Goldsmith and today we're going to
be talking about making your application ready
for the entire world, or at least a big part of it.
So why should you care about that?
Well recently over half of Apple's revenue has been
coming from customers outside of the United States.
And most of those customers don't use
English as their primary language.
Now some foreign languages like French and German
are pretty close to English in the way they behave.
But many of our biggest markets use languages
that follow very different rules from English.
So it might be counter-intuitive what you need to do
to your application in order to support those languages.
Fortunately you don't need to figure
out what those rules are yourself.
You can make your application ready for those markets
by calling system APIs that do all the work for you.
And somewhat unusually today or at least in
prior years we're not going to be spending a lot
of time talking about API details or code samples.
Mostly we're going to be talking about
concepts that will help you understand
when you should call system APIs
and which ones you should call.
So today we're mostly going to be talking
about Internationalization not Localization.
What's the difference?
Localization is the process of translating
your application's user interface.
So for example the text in the menus, the text in the
buttons, the text in other controls, those kinds of things,
it's the language that your application
uses to talk to the user.
Internationalization is different.
Internationalization is about making
data in all the world's languages work
and there's many different kinds of that data.
For example there's text content, which may come
from the user or it may come from an external source.
Dates, times, numbers, currency amounts, calendars
can vary and we're also going to be talking
about time zones a little bit, because
those behave differently around the world.
So the goal for this talk is to help you
understand how to make your application world ready.
And the goal is to have one version of your application, not
a French version and a Chinese version, a Russian version,
one binary that can support content in any
language and also can run its user interface in any
of the languages that the system supports.
So today again we'll be focusing on the
Internationalization part not how to do Localization.
So content-- content is data that the user provides
or it comes from an external source, maybe a website.
Content can be in multiple languages and the language
doesn't have to match the language of the user interface.
That is the Localization language.
And not only can it be in multiple languages it
can be in multiple languages at the same time
in the same document or even in the same paragraph.
Now this is something-- we're not going to be focusing
on this today but that's something to keep in mind
as you figure out how your application processes text.
OK, language. The language preference controls
the Localization, that is, it controls the language
of the menus and controls of the user interface.
The way it works is the user picks a primary language
in either the Language and Text pref pane in Mac OS X
or in the Language Preference on iOS
and that will pick one lang.lproj
out of your application or out of another kind of bundle.
In addition the primary language
controls a few other things.
It also controls the algorithm that is used to sort
words for presenting an ordered list to the user
and it also controls word breaking behavior.
On the desktop you can actually set those separately.
I don't know if you can see it
peeking out from behind the iPhone,
there's a little pop up there,
which is the order for sorted lists.
Mac OS X lets you set that independent from the UI language,
although by default it's the same as the UI language.
If you change the UI language you must
restart applications for it to take effect.
And in the case of iOS it actually
restarts the device so that all
of the user interface is running in the same language.
OK the thing that we're going to be spending the most time
on today is controlled by the Locale or Region preference.
Again that's in the Language and Text pref pane on the
desktop and it's in International Preferences on iOS.
And this controls how things like dates, times,
numbers, calendars and so on are formatted.
There is a language component to
the Locale or Region and usually
that language component is the same as
the UI language, but it isn't always.
Now one big difference between the Locale or Region
and the UI language is that you can change the Locale
without having to re-launch applications.
So that's what users are going to
expect in terms of correct behavior.
So here's our cast of characters
or cast of classes in any case.
These 6 classes are what we're going to be
spending the most time talking about today.
NSLocale is kind of the controller class for all of this.
It embodies the current region and format
preferences from the user and it has a lot
of different properties that you can set individually.
NSNumberFormatter as you might expect is a class that
you use to format numbers and also to parse them.
NSDateFormatter does the same thing for dates and times
and you may have heard of all of these classes already.
A couple of less familiar classes are
NSCalendar, which handles calendar operations
and NSTimeZone, which encapsulates logic for time zones.
And I'm assuming that you've all heard of NSString and
we're not going to be going over all of NSString today
but just the parts that pertain
to natural language processing.
So let's start with NSLocale.
NSLocale again is set by the Region
Format preference in the pref pane
and all Locales have an identifier associated with them.
That identifier is a string, which kind of sums up
the part of the world that the Locale has to do with.
So one example is the US English Locale and
the identifier for that is just en_US.
Below that I have a more complex
example, which we'll go through in detail,
which shows most of the parts of a Locale identifier.
So every Locale identifier has a language
and in this case the language is Serbian
as represented by the small sr there at the front.
That part of the Locale identifier uses ISO language codes.
Almost always there's also a region in this case
sort of counter-intuitively the RS stands for Serbia.
As I said almost all Locales have
a region but sometimes they don't.
For example there's an Esperanto Locale.
Esperanto is an artificial language.
It doesn't really correspond to any country and so
there is no country or region for the Esperanto Locale.
Also a region doesn't necessarily have to be a country.
There are letter-based codes, alphabetic codes
for countries but there are also numeric codes
for regions that represent parts of the world.
So for example, there is a region for all of Latin America.
There is a region for all of Europe.
Mostly we don't use those but it is
possible to have that in that position.
Now something that you'll sometimes see
in a Locale identifier is the script
and the script is there usually for one of two reasons.
First and most importantly is if
you need it for disambiguation.
So for example, Serbian is written about with equal
frequency in the Cyrillic script and the Latin script.
So you always need to specify which script you're using
when you're specifying a Serbian Locale, so it's required.
Sometimes you want it for overrides.
For example, there are two kinds of Chinese
writing that we support in the system.
There is the simplified Chinese set
and the traditional Chinese set.
And usually you can infer which
one to use based on the region.
So, for example, the Chinese in Hong Kong
region implies traditional Chinese.
However, if you wanted to set your Region preference to
be Hong Kong using Chinese, but you wanted to force it
to use the simplified version in that
case you'd specify the script explicitly.
Sometimes there's a variant, very occasionally
and I'll give an example of that later on.
I don't want to focus on it too much.
Finally Locale identifiers can also have keywords
and in this case there's a keyword for the currency,
which overrides the currency that
comes from the Locale data.
So for example, in this case we're specifying that we want
to use the Euro regardless of what
the default is for Serbia.
You can also override the calendar using a keyword.
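
To make the parts of an identifier concrete, here is a minimal Swift sketch that splits one into its labeled components. The identifier string is an assumption rebuilt from the parts described above (Serbian, a script, the RS region, a Euro currency keyword); the slide's exact string is not in the transcript.

```swift
import Foundation

// Hypothetical identifier assembled from the parts described in the talk:
// language "sr", script "Latn", region "RS", plus a currency keyword.
let identifier = "sr_Latn_RS@currency=EUR"

// Split the identifier into its labeled parts.
let parts = NSLocale.components(fromLocaleIdentifier: identifier)
for (key, value) in parts.sorted(by: { $0.key < $1.key }) {
    print("\(key) = \(value)")
}

// The US English example has only a language and a region.
let usParts = NSLocale.components(fromLocaleIdentifier: "en_US")
print(usParts)
```

The keys for language, script and region are the raw values of `NSLocale.Key`; keywords such as `currency` come back under their own names.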
So where do you get a Locale object from?
Well you can create a Locale from the identifier string if
you want a specific Locale but the usual way to get one is
to call currentLocale and that will give you a Locale object
that corresponds to the user's current Preference Settings.
Now that object won't change after creation and remember
users expect that if they change the Preference Setting
that the behavior of your application
will change almost immediately.
So in order to react to that there's
a notification you can respond to,
which is NSCurrentLocaleDidChangeNotification,
say that 10 times fast.
And so your application can look for that and when you
receive it you can go through and update all your objects.
There is a convenience function, which is the
autoupdatingCurrentLocale class method on NSLocale
and that will give you an NSLocale that
responds to that notification itself
so it will update itself when the
user changes their preference.
In addition, if you set that Locale on a number formatter or
a date formatter or on any other kind of foundation object
that takes a Locale, those objects will in turn
update automatically when the notification comes in.
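
As a rough illustration of the autoupdating approach, the Swift sketch below sets the autoupdating locale on a date formatter and formats on demand. Swift spells these classes `Locale` and `DateFormatter`; the `todayLabel` helper is a hypothetical name used only for this example.

```swift
import Foundation

// A formatter whose locale tracks the user's Region preference.
let formatter = DateFormatter()
formatter.locale = Locale.autoupdatingCurrent
formatter.dateStyle = .long
formatter.timeStyle = .none

// Hypothetical helper: format at the moment the text is needed,
// so each call uses whatever the locale is at that time.
func todayLabel() -> String {
    return formatter.string(from: Date())
}

print(todayLabel())
```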
Now something you have to watch out for if you
are looking for the notification yourself is
that the NSLocale is looking for the same notification
so you can get into a little bit of a race condition
and I'll give you an example where that could occur.
Let's say you have a window and that window shows
today's date at the top and you want that date to change
when the user changes their Preference
so that it uses the proper Region format.
So you might set up that date with an NSDateFormatter
that's set to the autoupdatingCurrentLocale
and then you look for the NSCurrentLocaleDidChangeNotification
to repaint the window.
Well the problem is that you and the locale
are both looking for the same notification
and if you get it first you'll repaint the
window before the Locale has had a chance
to update and you'll still get the old formatting.
So, if you're using the autoupdatingCurrentLocale
and you're also looking for the notification,
it's important to keep in mind that it's
non-deterministic who gets it first.
So let's move on to talk about numbers
and some of the differences you might see
in the way they're formatted between Locales.
So, one important difference is the decimal point character
and the grouping separator character,
but also the size of the groups.
In the United States and in many other countries
groups are 3 digits in length so it's every 1,000
but in some Locales groups are 4 digits in length
and in still other Locales different groups
in the same number can have 4 digits or 3 digits.
So the first group might have 4 digits
but subsequent groups might have 3 digits.
In this case we have a U.S. English formatted number on the
left and we have a French formatted number on the right.
The French number uses a non-breaking
space for the thousands separator,
the grouping separator, and a comma for the decimal point.
Not every Locale uses the ASCII digits for
representing numbers, most do, but not all of them.
In this case on the right we've got a number formatted
according to the Arabic Locale and you can see
that it uses a completely different set of
digits in order to represent the number.
Currency can also vary, not just the
symbol but also where it appears.
So for example, again we've got a French formatted
currency amount on the right and the Euro appears,
the currency appears after the
number and separated by a space.
Another thing to keep in mind is that the currency
symbol can change even if the currency is the same.
So for example, in the United States when we represent
an amount in dollars, we just use the dollar sign.
But if you're representing that same amount in dollars
in say Australia then it would say U.S. dollars
because in Australia a single dollar character
means the Australian dollar not the U.S. dollar.
Percentages can also vary in the way they're formatted.
Not just the digits used but as you can see
Arabic uses a different percent character
than the Roman alphabet and also the different digits.
Also the positioning of the percentage sign either
after or before the number can vary and finally even
for floating point concepts like not a number
or infinity some Locales localize that data.
So, for example, we use NaN for not a number in the U.S.
English Locale but Icelandic uses a different string,
which I don't even know how to
pronounce but that's it on the right.
So if your needs are simple for number formatting it's
very straightforward NSNumberFormatter has a class method
and this is new in Mac OS X 10.6 and in iOS 4.
All you have to do is pass in the NSNumber for your Number
and which Number style you want and you get a string back.
No muss, no fuss.
And there are 4 basic Number Formatting
paradigms that are supported.
Paradigm is probably too big a word for this, styles.
OK, general just formats things
as a general floating point number.
There's the currency style, which uses the currency
symbol, there's percentage and you'll notice
that for percentage the number is multiplied by 100 and
the reason is that you would use the percentage format
to represent a number between 0 and 1 and that would
be formatted as a percentage between 0 and 100.
And finally there's a scientific style
if you want a scientific notation, and
again this can vary between Locales.
There are some more advanced things that you might want
to do with numbers and for those advanced uses you want
to create an NSNumberFormatter object and keep it.
So one example is if you're formatting a lot
of numbers, creating one NSNumberFormatter
and then calling it repeatedly is more
efficient than calling the class method.
There's no class method for parsing numbers.
So if you want to parse a string into a number
you need to create an NSNumberFormatter.
And if you need to tweak the format for example,
controlling the number of significant digits,
whether the fraction is shown, whether the decimal point
is always shown, how the sign is represented, etc., etc., etc.,
there are accessor functions on
NSNumberFormatter that set that up
and if you need to do that you need to create an object.
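
A minimal Swift sketch of both paths, the one-shot class method and a reusable formatter object, might look like this. Swift spells the class `NumberFormatter`, and its `.decimal` style is assumed here to be what the talk calls the general style. The fixed locale identifiers and the `makeFormatter` helper are assumptions chosen only to make the differences visible; a real application would normally leave the formatter on the user's locale.

```swift
import Foundation

let value = NSNumber(value: 1234567.891)

// Simple case: the class method formats with the user's current locale.
print(NumberFormatter.localizedString(from: value, number: .decimal))

// Advanced case: create a formatter once and reuse it.
// Hypothetical helper; the locale is pinned only for demonstration.
func makeFormatter(_ localeID: String, _ style: NumberFormatter.Style) -> NumberFormatter {
    let f = NumberFormatter()
    f.locale = Locale(identifier: localeID)
    f.numberStyle = style
    return f
}

for localeID in ["en_US", "fr_FR", "ar_SA"] {
    let decimal = makeFormatter(localeID, .decimal).string(from: value) ?? "?"
    // The percent style multiplies by 100, so pass a fraction between 0 and 1.
    let percent = makeFormatter(localeID, .percent).string(from: NSNumber(value: 0.25)) ?? "?"
    let currency = makeFormatter(localeID, .currency).string(from: NSNumber(value: 2)) ?? "?"
    print(localeID, decimal, percent, currency)
}

// Parsing needs a formatter object; the decimal separator follows the locale.
let parsedUS = makeFormatter("en_US", .decimal).number(from: "3.5")
let parsedFR = makeFormatter("fr_FR", .decimal).number(from: "3,5")
print(parsedUS ?? "nil", parsedFR ?? "nil")
```

Note that the currency line only changes how the amount 2 is written; no conversion between currencies takes place.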
Well what are some of the things that can
go wrong if you don't call system APIs?
And we'll have several slides like
this and all of these are lifted
from real examples that happened in real applications.
So, one problem is using stringWithFormat or
printf or scanf for formatting or parsing numbers.
The problem is that %e and %f will not handle non-ASCII
digits like the Arabic example that we saw earlier.
So you cannot use these APIs to
format numbers in a localized way.
People sometimes assume that the decimal point or the
grouping separator or the size of groups are the same
as whatever country they are living in whereas
it varies considerably around the world.
The same thing for percents; people
assume that a percent is always formatted
as the ASCII digits 0 through 9 followed by an ASCII % sign.
Sometimes people will create an NSNumberFormatter
and then set the pattern string and that's fine
in some circumstances, but it will erase the
Locale specific formatting that you get for free
and so all of a sudden your number
formatting is not localized any more.
And finally, a problem that people can run into is let's say
you've got a document and it's showing $2.00 and the U.S.,
sorry the user goes and changes their Currency
preference to another currency, say the Euro.
Well now you've got two Euros except
of course $2.00 is not two Euros.
So if it's user supplied data then that's
generally not something you need to worry about.
But if you are writing an application where you could
be converting amounts between currencies it's important
to realize that the system does not do that for you.
All that's changing is the way the number is represented
and any currency conversions you have to handle yourself.
OK, moving right along, let's talk about dates
and times and what differs between Locales.
I mentioned that a Locale has a language associated
with it and that is the language that controls the names
of the months, the days of the week, the AM/PM strings and
also relative terms like today, yesterday, or tomorrow.
Now if I'm running my system in English but I set
my Region Preference to be a French Locale I'm going
to get French month names because that
is what's associated with the Locale.
So there's an example of today's date in the French Locale.
Another thing that can differ between
Locales is the Calendar in use.
So here again is today's date, except in
this case we're using the Japanese Locale
and we have the Japanese Calendar set and
the date is the 22nd year of the Heisei era, June 10th.
Another thing that can be different between
Locales is which day is the first day of the week.
In the U.S. when you're representing days in a calendar
Sunday is on the left and Saturday is on the right.
But in other places Monday is on the left
and Sunday is on the right so that can vary.
Some places use 12-hour time like
the U.S. and other places like Japan
or in Europe use 24-hour time and that also varies.
Even if the language is the same the order
of the date elements can be different.
So, for example, in the U.S. we say June 10, 2010,
but in the UK they write it like this 10 June 2010.
So again there are some predefined styles
for date and time formatting and also parsing
and those have the names Short, Medium, Long and Full.
And as you can see as you go from Short to Full you get more
and more information represented in the resulting string.
Now there are two ways you can use this.
One is to just pick a style and stick with it.
So say I'm always going to use the long date.
But another thing you can do is start at a particular
length and then maybe make it smaller, for example,
if the date is in a column and the user
shrinks the column, and we do this in the Finder.
We start with the longer forms of the date and time
and then as the user shrinks the column that represents
that data we fall back from the Long to the Medium
and the Medium to the Short to take up less room.
Starting in Mac OS X 10.6 and in iOS 4 we have
some new features, both of which are very useful
and I'd like to spend a little
bit of time talking about them.
So you've probably noticed that in different places
in the OS, for example the Finder or in Mail on the phone,
you'll see relative terms like today or yesterday
and it hasn't been very easy to do that in the past
but now we've got the doesRelativeDateFormatting
property on NSDateFormatter and you can set that
and your dates will do the same thing and there's
an example there, except it wasn't yesterday,
it should have been June 9th, but never mind.
Another new facility, which is very
useful is the Date Template Facility.
Now what would you use that for?
That is what you would use if the predefined date
formats or time formats don't meet your needs.
If you need a different subset of the date elements and
what you do is you pass in a template string and a Locale
and some options and this class method will return
a format string, which you can then turn around
and set on your NSDateFormatter and that will
format things according to what you requested.
So, for example, let's say I'm writing a Calendar application
and I want to put an hour view down or say a day view
where I have hours down the left hand side, well, I
guess the left hand side would be over here for you guys,
but I just want the hour, but I don't
know if it's 12- or 24-hour time.
And before this API you had to do a lot of fussing around
with the date format string to try to figure out how
to set up this piece of a date or time format.
In this case though, you can just pass the
template string j, which is a meta-character
that just says give me the hour
whether it's 12 or 24 and depending
on the Locale you'll get back a different format string.
So for English you may get the 12-hour hour followed by an
AM/PM indicator or you'd get the 24-hour hour character
and you get the two different results you see on the right.
Maybe my Calendar application has a month view and the month
view has the month and the year at the top, but I don't know
if the month or the year is supposed to come first.
If the user is using the Japanese
calendar, do I put the era in?
This API takes care of that for you.
So in this case you pass a template where I
say I want the shortest possible representation
of the year but I want the full name of the month.
And the strings you get back are for the U.S. Locale
you get the name of the month, a space and a year.
But in the Japanese Locale using the Japanese
calendar I get the era, I get the year,
I get the month and everything works out fine.
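
The two template examples above could be sketched in Swift roughly as follows. The template `yMMMM` (year plus full month name) is an assumption about what was on the slide, and `preferredFormat` is a hypothetical helper. The exact format strings returned depend on the locale data shipped with the system, so none are shown here.

```swift
import Foundation

// Hypothetical helper: ask for the locale's preferred format string
// for a given set of date fields.
func preferredFormat(_ template: String, localeID: String) -> String {
    let locale = Locale(identifier: localeID)
    return DateFormatter.dateFormat(fromTemplate: template, options: 0, locale: locale) ?? template
}

// "j" requests the hour in whichever cycle the locale prefers.
let hourUS = preferredFormat("j", localeID: "en_US")
let hourFR = preferredFormat("j", localeID: "fr_FR")
print(hourUS, "|", hourFR)

// "yMMMM" requests the year plus the full month name.
let monthYearUS = preferredFormat("yMMMM", localeID: "en_US")
let monthYearJP = preferredFormat("yMMMM", localeID: "ja_JP@calendar=japanese")
print(monthYearUS, "|", monthYearJP)

// Set the returned format string on a formatter for the same locale.
let formatter = DateFormatter()
formatter.locale = Locale(identifier: "en_US")
formatter.dateFormat = monthYearUS
print(formatter.string(from: Date()))
```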
So what are some of the kinds of errors that
we've seen people run into with dates and times?
Well something that seems to happen a
lot is that people use NSDateFormatter
for parsing or formatting non-localized dates.
For example dates that you get off the Internet
or dates that appear in some internal data format
and if you use NSDateFormatter that way without
understanding that it's localized you can get bad results.
Typically dates like that aren't localized.
Now to get an NSDateFormatter that you can use to parse or
format dates and times like that instead of getting one set
to the current Locale, which is the default, just create a
new Locale, set it to this identifier and this is an example
of the variant that we talked about earlier
when we were talking about Locale identifiers.
So in this case this is the POSIX variant of the
U.S. English Locale and this is the Locale identifier
that corresponds to the standard C Locale.
This will always give you stuff back that uses English
names of months and is formatted in a standard way.
So if you set your NSDateFormatter using
that Locale you'll get non-localized dates.
Another option is to just call the BSD layer where there
are APIs for parsing and formatting dates and times
and if you do use those, just pass NULL for the
Locale because that indicates again the C Locale
and that makes perfect sense because the primary purpose
of NSDateFormatter is to handle localized dates and times.
If you're dealing with a non-localized date or time you can
set it up to do that but you don't really need to use it.
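
Here is a small Swift sketch of the fixed-locale setup, using the `en_US_POSIX` identifier for the POSIX variant described above. The timestamp layout is a hypothetical wire format chosen for illustration; use whatever your data source actually specifies.

```swift
import Foundation

// A formatter for machine-readable timestamps, independent of user settings.
let wireFormatter = DateFormatter()
wireFormatter.locale = Locale(identifier: "en_US_POSIX")
wireFormatter.timeZone = TimeZone(secondsFromGMT: 0)
// Hypothetical wire format: a UTC timestamp with a literal trailing Z.
wireFormatter.dateFormat = "yyyy-MM-dd'T'HH:mm:ss'Z'"

let parsed = wireFormatter.date(from: "2010-06-10T17:00:00Z")
if let date = parsed {
    print(date.timeIntervalSince1970)
    print(wireFormatter.string(from: date))
}
```

Pinning the locale and the time zone keeps the result the same whatever Region or calendar the user has chosen.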
Another thing that people sometimes
did is parse format strings.
So, for example, if I were writing a Calendar application
and I wanted to put the months and the year at the top
of the view, there was no way to get NSDateFormatter
to do that prior to these recent releases.
So what people would do is they would set the full
date format, then they would extract the format string
from the NSDateFormatter and then they
would go picking through it to try to figure
out which pieces to use and that's very error prone.
But now that there's a dateFormatFromTemplate
you don't need to do that anymore.
Another thing that people do is use NSCalendarDate
at all, it's deprecated and you shouldn't use it.
You can use NSDate to represent a date and time.
In fact, that's its primary purpose but people have
used the description method on NSDate to format dates
and it will not format a date in a proper localized fashion.
So you should use NSDateFormatter whenever you're
parsing or formatting dates that are localized.
And then another mistake that people have made is
assuming that the calendar is always Gregorian.
OK there we go.
So Calendars, let's talk about some of the things that
can differ from different Calendars and different Locales.
Well one is the year.
This is the year 2010 in the Gregorian calendar.
However, in the Thai Buddhist calendar this is the year 2553
and in various other calendars
the years are all over the place.
Every Calendar has an implicit era.
So for example in the Gregorian calendar we're in the
AD era, but usually we don't bother representing that.
However, for some calendars like the Japanese
calendar the era changes rather more frequently
and it's important to take that into account.
So, for example, this is the 22nd year of the Heisei era.
Another thing that can vary between
calendars is the number of months in a year.
So for example, the Gregorian calendar always has 12
months but some calendars have 12 months or 13 months
or even the number of months can vary from year to year.
The lengths of months can also vary.
You can remember the lengths of the months of
the Gregorian calendar using the nursery rhyme
but other calendars have a different set of
months and those months have different lengths
than the lengths of the Gregorian calendar.
Even the lengths of the months in the Gregorian calendar
can vary depending on whether it's a leap year or not.
And some calendars, for example the Coptic
calendar, have months as short as 5 days.
Another thing that you really wouldn't
expect is that the year can change other
than at the first day of the first month of the year.
So for example, in the Japanese
calendar the year changes when the reign
of a new emperor begins, which
doesn't have to be January 1st.
So for example, the day after January 7th of the 64th year
of the Showa era is January 8th of
the first year of the Heisei era.
Well fortunately an NSCalendar
takes care of all of this for you.
It abstracts all the operations that you might want to
do on calendars or dates, determining how many days are
in a particular month, how many months are in
a year, converting between calendar components
and an absolute date/time, doing operations like what's the
date 3 days after this one, all sorts of things like that.
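
A few of these operations, sketched in Swift under the assumption that everything is evaluated in UTC so the results don't depend on where the code runs. Swift spells the class `Calendar`; the `calendar(_:)` and `gregorianDate` helpers are hypothetical names for this example.

```swift
import Foundation

let utc = TimeZone(secondsFromGMT: 0)!

// Hypothetical helper: a calendar of the given kind, pinned to UTC.
func calendar(_ id: Calendar.Identifier) -> Calendar {
    var cal = Calendar(identifier: id)
    cal.timeZone = utc
    return cal
}

let gregorian = calendar(.gregorian)
let japanese = calendar(.japanese)
let buddhist = calendar(.buddhist)

// Hypothetical helper: an absolute date from Gregorian components (noon UTC).
func gregorianDate(_ y: Int, _ m: Int, _ d: Int) -> Date {
    return gregorian.date(from: DateComponents(year: y, month: m, day: d, hour: 12))!
}

// Same instant, different year numbering.
let talkDay = gregorianDate(2010, 6, 10)
let buddhistYear = buddhist.component(.year, from: talkDay)
let heiseiYear = japanese.component(.year, from: talkDay)
print(buddhistYear, heiseiYear)

// The year can change mid-month: 7 and 8 January 1989.
let before = japanese.dateComponents([.era, .year], from: gregorianDate(1989, 1, 7))
let after = japanese.dateComponents([.era, .year], from: gregorianDate(1989, 1, 8))
print(before, after)

// Ask the calendar instead of assuming month lengths or doing day math by hand.
let daysInFeb2012 = gregorian.range(of: .day, in: .month, for: gregorianDate(2012, 2, 1))!.count
let threeDaysLater = gregorian.date(byAdding: .day, value: 3, to: gregorianDate(2012, 2, 27))!
print(daysInFeb2012, threeDaysLater)
```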
Mac OS X 10.6 has support for a
large set of non-Gregorian calendars.
iOS 4 supports what we call Gregorian-like non-Gregorian
calendars and those are Gregorian calendars where the set
of months is the same but the year and era may be different.
And at some point in the future we plan to expand
support of non-Gregorian calendars on iOS also.
So what are some of the things that can go
wrong when you're doing Calendar operations
and you don't let the system handle it for you?
And again, these are all lifted from
real situations that we've seen.
One is assuming Gregorian calendar, assuming
that there are always 12 months in a year.
This is an interesting one, assuming
that month numbers are sequential.
Remember that I mentioned that some calendars have
years with 12 months and years with 13 months?
An example of that is the Hebrew calendar.
Well in the year that has 12 months
or rather the year that has 13 months
that extra month is not at the end, it's in the middle.
So in a year with 12 months that month is not there.
So you skip over it when you're numbering
months in a year without that month.
The same thing can happen with days,
even in the Gregorian calendar.
For example, October 1582 in the Gregorian
calendar only has 21 days and you go straight
from October 4th to October 15th.
You can't assume that the era is
optional because just, for example,
in the Japanese calendar seeing the
year 22 doesn't tell you anything
if you don't know whether it's
Heisei or Showa or what have you.
Some Apps assume that weeks always start on a Sunday.
People have been tripped up by the fact that the year can
change other than on the first day of the first month.
And something that's really tricky is recurrences.
So again let's assume you're writing a Calendar application
and you want to allow the user to set up a meeting
that happens once a month or somebody's
birthday, which is once a year
or, for example, the last Tuesday of
the month or the second Thursday.
Well what those terms mean changes
when you change the calendar.
So, for example, if I have somebody's birthday
and it's a particular day of a particular month
in a particular calendar that recurrence
relationship is different if I switch calendars.
The day that is the second Tuesday of the month is not--
the second Tuesday of the third month is not
the same in the Gregorian and Arabic calendars.
They are different days.
So if you're defining a recurrence relationship like
this in your calendar or a similar kind of application,
it's important to keep track of the
calendar that was used to define it.
So if the user set their birthday in the
Arabic calendar you should keep track
of that fact that it was set in the Arabic calendar.
OK let's spend a little time talking about Time Zones.
Those can also be a little counter-intuitive.
Every time zone has an offset from what's called
Greenwich Mean Time or universal coordinated time,
although those are not precisely the same thing,
they're close enough for what we're talking about.
There are also rules about whether daylight
time is observed and when it's observed.
Every time zone has a unique identifier.
Time Zone information in Mac OS X and iOS
comes from something called the Olson database
and that's used by a wide variety of computer systems.
Every time zone in the Olson database
is uniquely identified by an ID.
But time zones also have localized names.
I will spend a little bit more time talking about that.
And NSTimeZone represents the abstraction of the Time Zone
and it will tell you the answers to all of those questions.
So what are some of the errors that we've seen people
make in working with Time Zones in applications?
One is assuming they know what the GMT offset is
or what the rules are for daylight savings time.
So, for example, the U.S. as a country changed the dates
where we observe daylight savings time a few years back.
So those can be different based on
what time period you're observing.
So if you're formatting a date that is back in say 2001 then
it's going to be different from formatting a date that's
in 2010 in terms of when daylight savings time kicks in.
In addition, historically whether you're
observing daylight savings time or when it happens
or even what your GMT offset is can vary.
So for example, for a long time Indiana
did not observe daylight savings time
and then when the U.S. changed the rules a few years
back they decided that they would start observing it
but at the same time different counties in Indiana
decided that they would switch their time zones.
So some counties that were previously Central
became Eastern and some that were Eastern became Central
and so NSTimeZone takes care of that for you.
As long as the user sets the Time
Zone preference correctly it will keep track
of all those historical changes.
Another thing that people do is use
the Olson ID which is really more
like a programming identifier to show to the end user.
So, for example, the time zone that we're
in right now is called America/Los_Angeles.
That's not really something that you
want to show to a user and NSTimeZone
and NSDateFormatter will let you get a
localized name that will make more sense.
If you do call NSTimeZone make sure that you're
getting the right version of the Time Zone name,
the generic name versus the daylight
name versus the standard name.
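
The points about historical rules and localized names can be sketched in Swift like this. Swift spells the class `TimeZone`; the two sample dates and the French display locale are assumptions picked for illustration, and `noonUTC` is a hypothetical helper.

```swift
import Foundation

let pacific = TimeZone(identifier: "America/Los_Angeles")!

// A Gregorian calendar pinned to UTC, used only to build test instants.
let utcCalendar: Calendar = {
    var cal = Calendar(identifier: .gregorian)
    cal.timeZone = TimeZone(secondsFromGMT: 0)!
    return cal
}()

// Hypothetical helper: noon UTC on a given Gregorian day.
func noonUTC(_ y: Int, _ m: Int, _ d: Int) -> Date {
    return utcCalendar.date(from: DateComponents(year: y, month: m, day: d, hour: 12))!
}

// Same day of the year, different years: the answer depends on the rules in force then.
let march2001 = noonUTC(2001, 3, 20)
let march2010 = noonUTC(2010, 3, 20)
print(pacific.secondsFromGMT(for: march2001) / 3600, pacific.isDaylightSavingTime(for: march2001))
print(pacific.secondsFromGMT(for: march2010) / 3600, pacific.isDaylightSavingTime(for: march2010))

// Show the user a localized name, not the Olson ID.
let french = Locale(identifier: "fr_FR")
for style in [NSTimeZone.NameStyle.generic, .standard, .daylight] {
    print(pacific.localizedName(for: style, locale: french) ?? pacific.identifier)
}
```

Asking the time zone object for each date, rather than caching one offset, is what picks up rule changes like the U.S. one mentioned above.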
Another assumption that people sometimes make is
that the short IDs for time zones things like PST
for Pacific Standard Time are unique, they are not.
For example, PST is also used in Australia.
So you can't look at something like PST and
assume you know what the full Time Zone is.
OK, lastly we're going to spend some time talking
about Natural Language Processing with NSString.
So there are two operations that
we're going to talk about today.
One is Breaking a string into pieces
and the other is Sorting.
So new in 10.6 and iOS 4 there is an API on NSString
called enumerateSubstringsInRange:options:usingBlock:.
And you can use that to perform lexical
operations on a string of Natural Language text.
You can find word boundaries, line break
opportunities, sentence boundaries and so on.
And this is one of the APIs that's controlled
by the UI language not by the Locale.
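A minimal sketch of word enumeration, written in Swift, where the same API is spelled enumerateSubstrings(in:options:_:). The sample sentence is an assumption for illustration; English is used only because its word boundaries are easy to check by eye.

```swift
import Foundation

let text = "Hello, world! How are you?"
var words: [String] = []

// Let the system find word boundaries instead of splitting on whitespace.
text.enumerateSubstrings(in: text.startIndex..<text.endIndex, options: .byWords) { substring, _, _, _ in
    if let word = substring {
        words.append(word)
    }
}

print(words)
```

Swapping .byWords for .bySentences, .byLines, or .byParagraphs enumerates the other kinds of units in the same way, and the same call is what lets text without spaces between words be segmented.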
Another thing that's controlled by
the UI language is the sort order.
Excuse me.
I'm just going to take a sip of water.
[Sound effects] And that's important because
different languages can have very different sort orders
and the way certain features are
handled varies between those languages.
So, for example, the way diacritics are handled
when sorting English is completely different
from the way it's handled when sorting French.
And the API that you can use to do any kind of comparison
for sorting purposes is localizedStandardCompare.
So here I have two examples of a sorted list just
to show you how different a sort order can be.
On the left we have Hawaiian.
Now that list may not look alphabetized to you, but it is
and the reason for that is that native Hawaiian uses a set
of letters which is a subset of the 26 letters that
are used for English and words with those letters,
that is native Hawaiian words, always
sort before words that use other letters.
So in this case letters like B and C are not used in
native Hawaiian and therefore they sort at the end.
Similarly for French the thing that's
different is the way that accents are handled.
So if you look at the last three items in the
French list, in French the accent at the end
of the word is more significant
than the accent at the beginning.
And so you can see that those last three words are
sorted first according to the accent at the end
and then according to the accent at
the second position and it generalizes.
You go through the accents in French
backwards in order to determine the sort order.
So localizedStandardCompare will take care of
all of this for you as long as you call it.
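As a hedged sketch of what calling it looks like in Swift, the example below sorts a short list for display. The word list is an assumption for illustration, and Swift's plain sorted() stands in here for a fixed, non-localized order. The localized result depends on the user's language settings, so no exact output is claimed.

```swift
import Foundation

// Assumption: sample strings chosen only to show the difference in ordering.
let items = ["Zebra", "apple", "Éclair", "banana"]

// Fixed order based on code point values: uppercase "Z" lands before lowercase "a".
let fixedOrder = items.sorted()

// Order intended for showing to a user, following the user's language settings.
let displayOrder = items.sorted { $0.localizedStandardCompare($1) == .orderedAscending }

print("Fixed order:   \(fixedOrder)")
print("Display order: \(displayOrder)")
```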
So what are the kinds of errors that we've seen people
make in applications by not calling the right APIs?
One is that people often assume that words and lines are
always separated by whitespace: space character, tab,
return, etc. That's not true for many languages, including
some in large markets like Japanese, Chinese and Thai.
A very common mistake is to use NSString's
compare method for sorting a list
that is going to be shown to the end user.
The problem is that compare is not localized.
It will not take any of those language
issues we just discussed into account.
It just uses a fixed binary order.
So compare is great if you're doing something like building
a B-tree index where you need a fixed comparison order
and you don't want it to change when
the user changes their preference.
But if you're sorting a list to show to users
then you need to use a localized comparison.
There is an extended version of compare that can do
localized compares if you need it for advanced use.
That API is
compare:options:range:locale: and you'll get a localized sort
as long as you pass something for the Locale.
An example of where you might want to use this:
localizedStandardCompare turns on the option
that we call numeric sorting and what that does is if
you have numbers that appear as part of the strings
that you're sorting, it will compare
those as the actual numeric value.
And that's what the Finder uses so that, for example,
if you have File 1 through File 9 and File 10 through File 19,
the Finder will sort those in numeric order.
If you don't want that, you can call the advanced
form of compare and turn off that option.
That's the kind of case where you might
want to call this more advanced API.
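A small Swift sketch of that difference, assuming two sample file names. The advanced form is spelled compare(_:options:range:locale:) in Swift; passing an empty option set leaves numeric sorting off, while passing a locale keeps the comparison localized.

```swift
import Foundation

// Assumption: sample file names for illustration only.
let first = "File 10"
let second = "File 2"

// Numeric sorting is on: 10 is compared with 2 as a number, so "File 10" comes later.
let standardResult = first.localizedStandardCompare(second)

// Advanced form with no options: still localized because a locale is passed,
// but digits are compared character by character, so "File 10" comes earlier.
let nonNumericResult = first.compare(second, options: [], range: nil, locale: Locale.current)

print(standardResult == .orderedDescending)
print(nonNumericResult == .orderedAscending)
```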
Another error that we see people make
quite commonly is doing comparisons
for sorting with diacritic- and case-insensitivity.
Now those options are intended for searching
like a Find dialog, not for sorting a list.
If you turn on diacritic-insensitivity that French
example that we saw will not be sorted properly.
Similarly if you turn on case-insensitivity you will
not get the right order because
the order in which upper and lower case versions
are shown differs based on the language.
Some languages put the uppercase version first
and others put the lowercase version first.
Again, if you turn on case-insensitivity when
you're sorting, the order in between upper
and lower case will be essentially random.
It's whatever falls out of your sort algorithm.
So whenever you're doing a sort always make
sure that you are diacritic- and case-sensitive.
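One way to keep the two uses apart is sketched below in Swift: the insensitive options go on the search call, and the sort uses a localized comparison with no insensitivity options. The sample text, the query, and the word list are assumptions for illustration.

```swift
import Foundation

// Searching, as in a Find dialog: insensitivity helps the user find "Résumé" by typing "resume".
let document = "Le Résumé du Café"
let query = "resume"
let matchRange = document.range(of: query, options: [.caseInsensitive, .diacriticInsensitive])
let matchedText = matchRange.map { String(document[$0]) }

// Sorting for display: no insensitivity options, just the localized comparison.
let words = ["cote", "côté", "Cote", "coté"]
let sortedWords = words.sorted { $0.localizedStandardCompare($1) == .orderedAscending }

print(matchedText ?? "no match")
print(sortedWords)
```

The exact order of sortedWords depends on the user's language settings, which is the behavior the sort is supposed to have.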
OK, well, we're pretty much done.
Two sessions that are related to this topic are Advanced
Text Handling for iPhone OS, which was Tuesday at 4:30
and Understanding Foundation, which was
the session immediately before this one.
So all of you get in your time machine and
get back and go and watch those sessions.
But if you don't have a time machine and you didn't attend
the sessions already then you can find all the information
about these sessions on the WWDC website.
And for more information you can go
to http://devforums.apple.com.
I'm sure you all have this URL already.