WWDC2003 Session 404

Transcript

Hi everybody, my name is Xavier Legros and I'm the Mac OS X evangelist in Developer Relations, and I'd like to welcome you to session 404, Unicode for Japanese, Chinese, and everything else. I know the title is kind of long. Before we start with the session, where we have great content and we're going to give you an update on what we've been doing with regard to Unicode in Panther, I'd like to take just a moment. Yesterday night, I don't know if you went, but we had the Meet the Evangelists event downstairs. I represent a lot of technologies on Mac OS X to developers, but most of the time I get a lot of questions like: why should I use Unicode? Why should developers really focus on using ATSUI, or the Cocoa layout engine, or MLTE? Why do I need to use Unicode in my application? That was very interesting, because it seems a lot of developers still don't understand all the benefits of Unicode. We've been talking about it since the beginning of Mac OS X, and we've had sessions over the last three years, but let me give you just a quick rundown on why, as a developer, you should really focus on using Unicode as much as possible in your application.

First, if you're a CJK developer and you're developing an application for China, Japan, or Korea, you just have to do it now. Apple has been investing a lot of money and effort in Japanese support, for instance, and the only way you can take advantage of some of the features we're going to be talking about today, like accessing the 32,000 glyphs that we have in Hiragino, is by using Unicode. Second, it's important for you to understand that Unicode is where Apple is putting all its efforts. We're not doing any more work in WorldScript and we're not supporting any new languages there, and you have to see Unicode as the way of evolving your application. So please take good note of all the content of the session. We have brand new features for customers in the
CJK countries, and for that I'd like to invite on stage Deborah Goldsmith, who's the manager of [inaudible] Unicode [inaudible].

Thank you, Xavier. Good afternoon, everyone. Today we're going to talk about Unicode in Mac OS X. Here's a quick introduction to what we're going to be discussing today. The market in Japan and China has changed, and in order for your application to be competitive there, you need to support Unicode. Luckily for all of you, Mac OS X has great Unicode support, and we'll talk today about the tools that are available for your application. Specifically, we'll be discussing how governments and customers in Japan and China are asking for new features, and new characters in particular. We're going to discuss why only Unicode can meet those new requirements. I'll talk about some great new features in Panther that are only available through Unicode and to Unicode applications. I'll talk a little bit about how Unicode is different from WorldScript, from what you may have been doing before, and what you need to do in your application in order to work with Unicode.

So there are a lot of reasons to move to Unicode for the Japanese and Chinese markets, but here's the biggest problem by far: customers are demanding more characters. Why is that? If you're not literate in Japanese or Chinese, let me explain by analogy. Suppose that your name is Smith, but you spell it S-M-Y-T-H-E. Now you may go to a website, maybe amazon.com, and you want to enter your name to place an order. But when you do that, the website comes back and says, "I'm sorry, in order to use this website you have to spell your name S-M-I-T-H." Well, that's pretty bogus, you think. Why should I have to change the way I spell my name in order to use this website? That's exactly the situation that many customers in Japan and China find themselves in, and the reason is that the variety of characters that people use to write their names is
much larger than what has traditionally been in the Japanese and Chinese character sets on Mac OS. So I have an example here. Let me see if I can use the laser pointer. Okay, that's kind of tiny, but that third line there is five different ways of writing the Japanese family name Watanabe. You may not even think that some of them look different, but all of them are different, and there are actually a lot more ways than that of writing that family name. So customers, not surprisingly, want to be able to write their name the way they write it on everything else. They don't want to have to fall back to some different, standardized form of the character when they're using a computer. And this doesn't just affect people's names. For example, using the Mac traditional Chinese character set, it's not possible to write the names of all the subway stops in Hong Kong, or even the name of the new international airport there. So it's a big problem: there are just not enough characters for what customers want to do with computers today.

Governments are also specifying more characters. All of the major character set standards in Japan and China have been revised over the last few years. There are new versions of all of these character sets, and they all specify many more characters than were in those standards before. And some of them are not just specifications: in particular, the GB 18030 character set in China and the HKSCS character set in Hong Kong are government requirements. The government requires that software support these character sets. So in order to meet all these new requirements, we have to support more characters. The problem is that the WorldScript system, which has been in Mac OS for a long, long time, can't support any more characters. It has limitations, and it just can't support the number of characters that are needed.

So the answer is Unicode. Unicode is an industry standard. It's one encoding that handles all the living languages in the world
today, and a large number of dead ones besides. Because it's a single encoding, a character is a character is a character. The meaning of the character doesn't change depending on what font you have. If you've used Japanese or Chinese or Korean or what have you on Mac OS 9, you might have been in the situation where you use the wrong font and you see something like that: garbage characters. Anybody like to guess what that really is? It's Korean, but you wouldn't know that unless you chose the right font. With Unicode this doesn't happen, because the meaning of the character doesn't change depending on the font.

Unicode solves the character problem because it's got plenty of room for all the characters that customers and governments need. The latest version, Unicode 4.0, which was just released a couple of months ago, has over 96,000 graphic characters. So that easily covers all the needs that customers have, and it covers all of the new Asian character set standards.

Let's talk a little bit about what kind of Unicode support we have in Mac OS X. Our main human interface font is Lucida Grande, and in Panther it now covers all of the Roman characters in Unicode and all of the Greek characters, and it covers several other scripts besides. Our other core Roman fonts, like Times and Helvetica, also have a large Roman repertoire, although they don't cover the entire set of Roman characters in Unicode. But beyond this, we've got lots of great font coverage in Mac OS X. Our Japanese support is outstanding. We have six beautiful Japanese desktop publishing fonts. The family name is Hiragino. They're in OpenType format, and as you can see, they're really beautiful; there's an example up there on the screen. These fonts have greater character coverage than any other Japanese fonts on the market today, and they cover all of the major standards that you might be interested in for the Japanese market: not just JIS X 0213 but also Adobe-Japan1-5, characters that are used for
phototypesetting, and even the complete set of government shape recommendations from the National Language Committee.

But we don't support just Japanese. We also have great Chinese support. Mac OS X since Jaguar has had support for the Chinese GB 18030 standard. These are also beautiful fonts; there's another example up there. The GB 18030 fonts have over 32,000 glyphs and support all of the Chinese characters in plane 0 of Unicode, every single one, as well as minority languages like Yi. For Mac OS X Panther, we're adding support for HKSCS from Hong Kong and Big-5E from Taiwan for traditional Chinese, and these new fonts have over 22,000 characters.

But we support even more languages. Also new for Panther, we've extended our Arabic coverage. It doesn't cover all of the Arabic in Unicode, but we cover a bigger chunk than we did before and support more languages. We support some of the scripts you see here that we also supported in Jaguar. We've also added support for native North American languages like Inuktitut and Cherokee, and we've added a new font that we built ourselves to cover a lot of the symbol blocks in Unicode. I won't read all these off, but you can see that we have a lot more symbol coverage than we had in earlier releases of Mac OS X.

Okay. But we haven't just added fonts; we've also made other improvements to our international support. As you may have heard at the State of the Union session yesterday, Unicode text drawing in Mac OS X is much faster, over twice as fast as it was in Jaguar. We've also improved our bidirectional support. You no longer have to specify to the system whether a paragraph is left-to-right or right-to-left; it will figure it out heuristically. So your users can just type, and it will determine whether it's a right-to-left or left-to-right paragraph and put the punctuation in the right place. We've updated our bidi algorithm to the Unicode 4.0 standard, and we've also made
several bug fixes. Now, I should mention that the seed that you folks have received doesn't have all the latest stuff that we're working on, so you may not see some of this until you get the GM release of Panther. We've also made some fixes to our support for Indic languages. In 10.2.4 we introduced dictionary-based Thai word break, but with Panther we're now adding the ability for users to specify their own dictionary to supplement the built-in one. So users can now put their own list of Thai words in, and that will affect word break in every application in the system.

And finally, Apple supports 16 languages for localization, and we haven't expanded that list in Panther. However, we've always had a longer list of languages than the ones we support ourselves, so that you, the developers, can localize your applications into languages we don't support. Just as an example, I noticed that somebody released a Serbian localization for Safari. We don't support Serbian ourselves, but because of the extra languages that are available, you can do that if you want to. And we've expanded that list for Panther.

Well, Arabic and Hebrew and Japanese and Chinese are fine, but you say, "If I don't have Hanunóo support in my application, my company is toast. What are you going to do?" Well, it's not a problem, because Mac OS X allows you to add support for new languages yourself. There are two easy steps. First, you need to add a font, and we have a font developer website that you can go to to find out how to build fonts and how to enable them to work with Mac OS X. Then you need a way for users to input your new language, and you can do that either via a keyboard layout or via an input method. We have a tech note on how to do keyboards, and we have sample code for input methods. So rest assured, you can add Hanunóo support yourself. Hanunóo is a Philippine writing system, by the way, in case you were wondering.

We've made improvements at the API level. In Jaguar we introduced the variant glyph access
protocol, and the reason that we have that is that even though Unicode has over 96,000 characters, there are some ways of writing Chinese ideographs where you have different variants. Even though they're considered the same Unicode character, there are still slightly different ways of writing it. That's actually very similar to what you see in Roman fonts: you can have Roman fonts that have different ways of writing the same character. Some of the fonts that we include in the system, like Zapfino or Apple Chancery, have multiple versions of the same letter that are useful in different situations. So the variant glyph access protocol lets you specify exactly which variant of the character you want, or lets the user specify that.

In Panther we're also introducing an extension to the Text Services Manager, a new protocol that gives an input method access to the entire contents of a user's document. That lets the input method do two things: it lets it give much better accuracy for conversions, and it also supports some new human interface features, which we'll see a little later on. Our Japanese input method, Kotoeri, takes advantage of this. We've also made extensions to the font panel to allow access to a lot more of the capabilities that fonts have but that have been hidden up until now. You can now get at those through the font panel, and that information can be passed to your application.

We've also improved the input menu. For those of you who are not familiar with it, the input menu is the little menu that looks like a flag, or at least it did in Jaguar, that you get when you have more than one keyboard layout enabled, or a few input methods, and so on and so forth. We've greatly streamlined it and improved the human interface. There's no longer a pencil menu. The pencil menu was specific to each input method, but we've taken the contents of the pencil menu and merged it into the input menu, so now there's just one menu. And input methods that have been revised
to take advantage of this new human interface get a much more streamlined UI and have their modes appear individually, so that instead of having to choose the input method and then choose the mode, you can just go straight to the mode you want. Of course, older input methods continue to work flawlessly and transparently; there's no need to revise an input method unless you want to.

We've also made improvements in our input methods themselves. As I mentioned, Kotoeri has much better accuracy and a better interface, with lots of new features which we'll see in a moment. In 10.2.4 we introduced a new traditional Chinese input method, Hanin, which has much easier input for traditional Chinese. For Panther we've expanded our simplified Chinese input methods to allow access to all of the characters in GB 18030, and that's a lot of them: as I mentioned, our GB 18030 fonts have over 32,000 glyphs. And finally, we've added more plain keyboards for more language support. And now, to show you some of these new features, I'd like to bring up Yasuo Kida and Michael Grady for a demonstration. Michael?

Can we switch to demo machine 3, please? Hello everybody. I'd like to give you a brief demo of the UI improvements we've been making to the text input menu for Panther. This is not a Panther machine; this is Jaguar. I just wanted to go over some of the problems we tried to solve with the input menu in Jaguar. So you have the US flag. One problem we found from our focused user group study was that discoverability was a big problem. Non-Mac users had no idea that this menu contained anything to do with input, and many of them could not figure out how to switch from a US or Roman keyboard layout to Japanese input. So it was clear in Panther that we had to improve the icons we use throughout the system. Another problem with this menu is its location. As you switch between applications with their various menu bar lengths, the icons will tend to follow around,
tagging along at the end of the menu bar, and that can be distracting to the user. And a third problem is that the presence of the menu itself in the menu bar can sometimes interfere with the app's menu list, and sometimes even clip portions of it off. So it's clear that we had to bring down the amount of real estate that we use in the menu bar. So let's switch over to machine number two, please, and show what things look like on Panther. Here is an icon; you'll notice there's only one of them. Are we there yet? No. Machine number two, please. There we are. There's the icon I was speaking about. It's much more obvious and intuitive to users that there might be access to additional input modes or input sources in there, and it's on the right side of the menu bar, so it will not follow along at the end of the app's menu list and cause distraction to the user.

Let's see how this works. How can we take what used to be implemented as two menus and merge it into one? Here's the answer. You'll notice that we have a number of input sources here, and you wonder, what are those? They're not keyboard layouts, and they're not input methods. They are the input modes implemented by a particular input method, in this case Apple's Japanese input method, Kotoeri. They all belong to the same input method, and what would have been referred to in the past as the pencil menu, the second menu in the Jaguar menu bar, is flattened into this menu right here. Now, it's interesting that these input modes have become full, first-class input sources, at the same level as input methods and keyboard layouts used to be. They are the preferred input source that the user should see, and in any system UI they will be shown side by side with those other input sources at that level. Another system-provided UI is a new palette, reminiscent of the input mode palettes provided by the input methods themselves in the past. And lastly, we can bring up the International prefs and have a look at the improvements there. Before
we get into that, one last note: for those applications that have particularly large menu lists, or if you just don't like having the input menu around, you can Command-drag it out of the way, and it can simply be reinserted by this checkbox in the prefs panel. You'll notice the hierarchical nature of input methods and how they advertise input modes, letting the user choose subsets of input modes they would like in the menu. You'll also notice that the U.S. layout, by the way, still has the old flag icon, but this is being changed actively; we didn't have it ready for this demo. It can be inserted in the menu and removed, which is not something we could do in the past. In the past, whether or not you were using input modes specific to a single input method, the U.S. layout, or the Roman default layout, always showed up in the menu, and it can now be removed. The input method itself can be disabled, but the modes that would be enabled if you were to reactivate it also show up.

You might be wondering about the existing input methods and what compatibility we have with those. You'll notice that they are fully supported, completely transparently. Choose that input method, and you'll notice that the input-method-specific pencil menu shows up here automatically. The input method did not have to change. But of course we want to encourage input method developers to adopt this new input mode protocol to give users the benefit of a single user interface for choosing input modes. And that's what I have for the text input menu. I would like to bring up Yasuo Kida, who will discuss improvements in our Japanese input method. Thank you.

[Applause]

Hello everybody. I'm very glad to be here with you, because I'm very excited about the great improvements we are making for Panther. One of the big changes is the text input menu Michael just demonstrated, and I will show you Kotoeri 4. There are three major new features in Kotoeri. One of them is very high conversion accuracy. We've
been continuously improving Kotoeri's conversion accuracy since Mac OS X 10.1, and we believe we achieved a milestone with this release. Not only did we improve the engine itself, we applied a new technology called latent semantic mapping, or LSM, in order to resolve a class of ambiguity which no other input method can right now, which is to find out the topic of the document. Say, consider the word "hot". If you're talking about summer, "hot" probably means temperature, but if you're talking about Thai food, for example, it's probably about spicy hot. It's like that. I'll show you how it works. Say the document on the left-hand side is talking about a jazz festival. It says the Monterey Jazz Festival is the oldest jazz festival in the world, which is true. The document on the right-hand side says the Boston Marathon is one of the oldest marathons in the world. In Japanese, both "player" and "runner" are pronounced the same, sōsha, and when they are entered in different sentences like this, a traditional input method couldn't resolve those ambiguities. But in the case of this new Kotoeri 4, it can look at the context and find the correct conversion for each case. Sōsha here, and sōsha here. Take a look, please. Please look at this first character: this means "play", and on the right-hand side this means "run". It converts the word correctly depending on the context of the document.

The other improvement we are making for Kotoeri 4 is UI to correct conversion errors and typical mistypings. The first one is reconversion, which is saihenkan in Japanese. In the rare cases where you find an incorrectly converted word in the document, you can place the cursor and get a conversion candidate window for that word. The second one is the rare case where you type too fast and you confirm the text before you meant to. Before, you had to delete all the characters and retype everything, but now you can get back to the previous state, like
you get back to the active state. The last one, and it often happens to me: I start typing in an incorrect mode. For example, I start typing English in Japanese mode, or I start typing Japanese in English mode. It can correct those cases. And if you have a Japanese keyboard, all of those can be done by double-typing the Kana key. Say you don't like this last conversion: click and double-type the Kana key. Yeah, it gets you the candidate window.

[Applause]

Say you confirmed the text before you meant to. By double-typing the Kana key, you can get back to the conversion state. And the last one is... what's that? Okay. Say you start typing "konnichiwa" and you suddenly notice that, oh, this is in the incorrect mode. You type the Kana key two times and you can continue typing. Thank you.

The third feature is that we put in an MS-IME compatible mode, for those people who switched from Windows and are comfortable using MS-IME compatible keystrokes, and also those who are using two environments and want the same keystrokes between Mac and Windows. And please note that many of those features require your help. Many of those features utilize the document access protocol, the new API, so in order to provide a consistent user interface for your customers, you need to adopt those APIs. Deborah will cover those in detail now. I'll turn it back over to Deborah. Thank you.

Okay, so those are great new features, and as Kida-san mentioned, we need your help in order to make them available in all applications. There are even more improvements at the API level that I'll go into now. One big one that people have been asking us for: people who have converted their Carbon applications to Unicode have found that there's a sticking point. There has been no support up till now for formatting or parsing dates, times, and numbers in Unicode. If you wanted to do that in a Carbon Unicode application, you had to use
the old Script Manager APIs and then convert the text to Unicode. Well, in Panther we're introducing a new set of APIs in Core Foundation that lets you format and parse dates, times, and numbers: CFLocale, CFDateFormatter, and CFNumberFormatter. So now you can have a totally Unicode application, both for formatting and parsing and also for sorting. We're now supporting many more locales for Unicode, and the reason we're able to do that is that we're taking advantage of an open source library called ICU, or International Components for Unicode, which is now part of Panther. Now, we're not yet allowing applications to access this library directly. The reason is we want to make sure that we're capable of supporting binary compatibility from release to release. But that is something we're looking at, so that may become available to your applications in future releases. Another side benefit of using ICU is that our collation is three times faster than it was in Mac OS X Jaguar, so sorting in applications will get much faster.

So now let's talk about what you need to do in your applications, and we'll go through several different cases. If you have a Cocoa application, you're already in pretty good shape if you use the Cocoa text system. There are a couple of special things you need to watch out for. Some Cocoa applications have had their own typesetter classes, and they've made those classes subclasses of the AppKit class NSSimpleHorizontalTypesetter. Well, the problem is that that AppKit class doesn't support the advanced layout features. It doesn't support bidirectional text, and it's basically obsolete. So if you subclass that class, you'll wind up with your own typesetter class having the same problems. In Panther there's now a new public typesetter class that you can subclass, called NSATSTypesetter, and if you use that class your application will have all the same features that are built into the Cocoa text system. Another thing that you have to watch out for
is when you save attributed text. A lot of the information that implements the new features that we've been showing, things like variant glyphs or font features and font capabilities, is saved as attributes on text. So if you save attributed text yourself, and you enumerate what you think is the complete set of attributes, you might lose this information when you save it to a document. So it's important when you save attributed text to save all of the attributes, so that information that the user enters, like a particular variant glyph that they use to write their name, doesn't get lost when they save and then reopen the document.

Things are pretty easy for Carbon applications too, especially if they're using MLTE, the Multilingual Text Engine, or the new HITextView, which is based on MLTE. That makes things pretty easy, because all of this stuff is supported by MLTE and you don't have to do very much. You use CFString for your Unicode text storage, support the font panel, and allow access to advanced font features, and you're set. It's pretty easy.

A lot of you, for historical or performance or what-have-you reasons, have your own custom text engine that's necessary for your specific application. In those cases things are a little bit harder, but it's still possible to support all these features, and we'll go through how you can do that in your application. You still need to store your text as Unicode, because many of these new features are only available to Unicode applications. You can use either CFString, the Core Foundation class for Unicode text, or you can just store an array of 16-bit UniChars. Either way works. If you have Unicode text, you need to draw your text using a Unicode text drawing API, and for Carbon that means ATSUI. Fortunately, as I mentioned, in Panther ATSUI is over two times faster, so there's really no reason not to use ATSUI for Unicode text drawing in your application. For input of Unicode text, you need to use the Text Services Manager, and if you
were already supporting Japanese or Chinese input methods, you're probably already using TSM. One thing that's new for Panther is that new features like the document content access protocol are only available via Carbon events, not via the Apple events that we also supported in the past. So if your application is using Apple events to interact with TSM, you'll have to move to Carbon events in order to take advantage of the latest features. Once you support TSM, there are basically three categories of interaction that you need to worry about. One is supporting the active input area, which has always been true for input methods. Another is the new document content access protocol, and we'll talk in more detail about that in a moment. The final one is supporting input and storage of variant glyphs, and we'll also talk about that. And finally, as for the easier approach to Carbon applications, you want to support the font panel so that users have access to all the capabilities that fonts have to offer.

So before I go into a little bit more detail, I want to give a quick review of what it is about Unicode that makes it a little bit more challenging to implement in an application. It's quite different from the WorldScript approach that you might be used to. The most important concept for Unicode is what's called the character-glyph model, which makes a distinction between characters and glyphs. You can think of characters as the form of language that's spoken: it's the semantic content, the way you would speak the language. Glyphs, on the other hand, are the shapes that show up on the printed page or that you see on a display monitor, and you can think of them as the written form of the language. Now, usually there's a very direct correspondence between the spoken form and the written form, but that's not always the case. It's certainly not the case for complicated writing systems like Arabic or the Indic languages, but there are even cases in English and Japanese where there is not a
direct one-to-one relationship between characters and glyphs, and it's the job of a Unicode text rendering engine like ATSUI or Cocoa text to map between characters and glyphs. Here are a few examples that show why that's a challenging problem. The first line is Hindi, and in Hindi, between the characters and the glyphs, things move around. In fact, some of the things that are independent characters wind up, when they're rendered as glyphs, as decorations on other glyphs. So there are both rearrangements and formation of clusters and ligatures. The second line is Arabic, and as we all know, Arabic is a right-to-left language, so the characters and the glyphs are in opposite orders. But beyond that, Arabic is also a cursive writing system, so the glyphs flow together to form ligatures, and you can't really map directly between characters and glyphs; there are ordering and ligature issues that you have to deal with. But even for Roman, here's an example where we have the word "résumé", and the e with an acute accent is stored in character space as an e with a combining acute accent. When that's drawn, it has to become an accented e. So there's an example where, in a Roman language, there isn't a straightforward mapping between characters and glyphs.

So what are some of the problems that you can run into in an application if you don't keep the character-glyph model in mind? Well, one thing that's particular to Unicode: we all think of Unicode as a 16-bit character set. Whoops, okay, I didn't press the bad button. So, hmm, there we go. Okay. We think of Unicode as a 16-bit character set, but I mentioned earlier that there are over 96,000 characters in the latest version, and a little arithmetic shows that you can't fit that in 16 bits. So what we think of as the 16-bit version of Unicode is called plane 0, or the Basic Multilingual Plane, and that's where all the commonly used characters go. But Unicode also supports a lot of rare and less commonly used characters, and those are
allocated in planes 1 through 16 and in order to represent those characters in your text you need to use two 16-bit values that's called a surrogate pair and there's an example that's from our Hiragino font it looks like any other ideographic character but it's stored as two 16-bit values because it comes from plane 2 of Unicode so that's one issue you have to worry about as we saw in the previous slide you can have composing sequences where multiple characters in the Unicode sense form what the user thinks of as a single character so the base character e with a combining acute accent is one example there's lots of other combining marks like that there's clusters in Indic there are ligatures in Arabic and in English for Korean there are jamos that come together to form Hangul and so forth and so on so there's really not a direct one-to-one relationship between characters and glyphs in addition Unicode also has multiple ways of doing the same thing so in the last slide we saw the e with combining acute accent but Unicode also has a single character that's an e with an acute accent and that's mostly for historical reasons and for compatibility with earlier character set standards and there are a lot of cases like that so there are often multiple you can think of them as spellings for the same string of text it can be represented in Unicode in multiple ways so here's one example on the left I have Korean Hangul and on the right I have the three jamos that make up that Hangul they're both equally valid ways of representing the text so of course that makes things like comparison and searching a little bit more challenging and finally for more complicated writing systems you have issues of directionality languages like Arabic and Hebrew go right to left you can have them in the same paragraph with text in left-to-right languages the whole Indic family of languages has rearrangement where characters move around when you write them
compared to when you speak them and so the glyphs and the characters you really can't count on them being in the same order at all and that doesn't just affect the order of glyphs within a style run it also affects the order of style runs within a paragraph so if you have a paragraph of mixed English and Arabic or English and Hebrew text whole style runs can move around and you really need the system's help to figure out where everything belongs so fortunately so that you can avoid these problems in your application we have lots of APIs in the system that you can use to make sure you do the right thing in terms of figuring out where characters begin and end there are lots of system APIs for finding text boundaries not just characters but also clusters words lines and paragraphs there are APIs and I'm not going to go into great detail on this all of the documentation for this is available online but there's APIs in cocoa for finding character and cluster boundaries in carbon for finding boundaries of all sorts and if the reason you're looking for a character boundary is in order to truncate text you don't even have to do that yourself you can actually ask ATSUI to truncate your text for you you just pass it an option tell it how wide you want the text to be and it will find a linguistically correct place to truncate the text and add a truncation character because of the problems with multiple spellings that I talked about before there are system APIs that can help you with that that will do comparison or searching of text as I mentioned due to directional issues text can move around within a paragraph and so when you're drawing you need to deal with an entire paragraph at a time and there are APIs in cocoa and carbon that will help you do that for cocoa you can use the text system directly or use attributed strings and typesetters for carbon of course there's ATSUI and as long as you let the system know about an entire paragraph it
will figure out where everything belongs and then you can figure out where the line breaks are and draw the lines individually of course because there isn't a one-to-one mapping between characters and glyphs that's also an issue for moving the cursor with the arrow keys or clicking with the mouse or highlighting text and there are APIs that can help you do that one issue that every Unicode application has to deal with unless it's brand-new is how to handle legacy data that's not in Unicode now we've had APIs in the system for a long long time to convert between Unicode and other character sets so I'm not going to go into that one issue though is to figure out what character set should I use what character set should I assume the text is in well if the character set is marked in the document somehow then you're set you know what the character set is but very often you're dealing with plain text or other text that doesn't have any information on what the old character set was that it's encoded in so then you have to guess and there are a couple of APIs that can help you do that if you think it's going to match the language that your application is running in then you can call GetApplicationTextEncoding that will return an encoding that usually matches the language that's been selected the localization that's been selected for your application it might be more appropriate to pick an encoding that's associated with the user's most preferred language because maybe your application doesn't support that language but the user's data is quite likely to be in an encoding that's associated with it and CFStringGetSystemEncoding will return an encoding that usually matches that language now why do I say usually well the reason is that there are languages that Mac OS X knows about that were never supported in WorldScript that were never supported on OS 9 and they don't have legacy encodings associated with them some of them like Vietnamese do have a WorldScript
encoding but that doesn't mean that you can draw the data with QuickDraw text it's just something that you can convert using an encoding conversion API other languages like Hawaiian have no non-Unicode encoding associated with them at all so if your application is running in Hawaiian or the user's most preferred language is Hawaiian you're not going to get a sensible answer from these two APIs if you're writing an Internet application then you shouldn't be using Mac OS encodings at all you should be using the standard encodings that are defined by internet standards bodies and you can go to the IETF and IANA websites to find out about those and there are APIs that will help you convert those names into a text encoding that you can use internally I'll talk a little bit more about the new APIs for formatting and parsing dates times and numbers it's in core foundation you can either get the current locale or you can get a locale from a standard ISO locale string which has a language code followed by a country code you can also take information from the WorldScript world like language region or script and convert that to an ISO string which you can then use to get a locale the new classes in core foundation as I mentioned support both formatting and parsing there's support for currencies and you can go back and forth between internal representations including core foundation types but also standard C types and a formatted CFString and there's also lots of customization options you can take advantage of and for more information you can look at the seed release that you all received and the last couple of topics I'd like to cover are TSM and variant glyph access so for those of you who supported the text services manager in the past the thing that's different for Unicode support is that you need to create a TSM document of type udoc to take advantage of the latest features you need to move to
carbon events instead of Apple events but as I'm sure you've been hearing elsewhere at the conference there's lots of good reasons to move your application to carbon events supporting the input method active area is something that's been around for a long time but if your application is not a Unicode application yet you'll also need to move to supporting Unicode input and again new in Panther is the protocol for accessing the entire contents of your document and that's critical to provide some of the user interface and conversion accuracy features you saw demonstrated earlier Kotoeri can't analyze the contents of your document to give great conversion results unless it can know what the content of your document is so I'm going to go through some of this rather quickly because we don't really have time to dive into it in detail for Unicode input there's a single carbon event that has Unicode text it can also have glyph variants information and we'll talk about that in a little bit the input method active area support protocol is pretty much the same as it's always been there's just a few carbon events you have to handle and there's nothing new here the big new thing is the document access protocol and I don't have time to go into this in great detail you might think from this long list of carbon events that it's pretty complicated but it's not the model is really simple the way this works is that it makes your document look like a CFString to the input method so the carbon events that you respond to are just the same things that CFString supports it's really a very straightforward model so if you implement support for these carbon events input methods can access the contents of your document and you get the improved conversion accuracy and new UI features like easy reconversion I'll talk a little bit now about variant glyph access this is optional information that comes with a Unicode text input event you get an array of glyph information records and each
record has this information in it first of all there's a range of text and that can be more than one 16-bit UniChar and the reason for that is it could be a variant version of a surrogate pair so it could be more than one UniChar for that reason or it could be a variant version of something like a ligature for example the Zapfino font that comes with Mac OS X has different versions of the fi ligature and to allow the user to pick which one they want they can do that via the variant glyph access protocol in that case the range of text would be the f and the i so it can be more than one character you also have to specify the font that the variant is coming out of there's two ways to identify which particular glyph you want one is a font-specific glyph ID and that's used for example with TrueType fonts but it can also be a glyph ID from a published glyph collection like Adobe-Japan1-5 and the record will identify which of those two approaches is being used you don't have to worry about that too much because ATSUI provides a style tag and all you have to do is take the information out of the carbon event stuff it in this style tag and give it to ATSUI and this is all covered in Technote 2079 I won't talk about this very much at all this will be covered in a session that's coming up right after this one across the hall what you need to know about fonts in Mac OS X this is how to support advanced font attributes via the new font panel in Panther there's already a carbon event for font selection via the font panel and we've just added more information to it there's now a complete dictionary with all the information that's specified in the font panel and all you need to do is extract the data from that and just pass it to ATSUI you don't have to worry about what it means you just basically have to funnel it through your application and for more details on that you can go to the font session which is session 406 and coming
up right after this one across the hall ok I'd like to bring Kida-san back up on stage one more time to talk about our chinese input methods and the character palette Kida-san ok then we get rid of those windows let me add one more thing here hello again I'll show you a few more features before the wrap-up the first one is simplified chinese no traditional chinese in system 10.2.4 we added the Hanin traditional chinese input method which is a very popular input method and we were providing this input method only for localized systems on Mac OS 9 and now we are offering it to everybody on Mac OS X this is a word-based input method pinyin and bopomofo input method and it's much much easier to use I'll show you how is when I can't oh jeje y star lie no lie lie you me yep okay it says welcome to the WWDC in San Francisco one difference one in da da da ly q gene shall be nice oh that's right the next one for traditional chinese is we added support for HKSCS and Big-5E those are additional character sets on top of what we have today this is a clip from a newspaper in Hong Kong and characters marked in red I don't have a pointer those characters marked in red were missing in the previous standard which is Big5 and you might be surprised how many characters are missing and actually existing just this thing this is the name of the new airport in Hong Kong Hong Kong international airport and these two at the bottom are names of subway stations in Hong Kong you couldn't even write your airport name or subway station name without this extension and if you or your application don't support Unicode you can't display these characters it's quite embarrassing next one is our simplified chinese input method we extended the input method the ITABC input method so that it covers all the characters in GB 18030 let me pick simplified and let me pick the mode and by the way you don't need to do this if the
simplified chinese input method our advice and support a flattened mode long 12 q yeah this is one of the characters which is only in GB 18030 and the other example in a smooth smooth five this is too okay the last one is the character palette we introduced the character palette first in Jaguar and we found out many customers love it and we also got a lot of feedback one piece of feedback we got is that sometimes you want to enter a character exactly as you see it on the screen the character palette honors the default setting in the application usually you get different fonts between the character palette and the application but we got feedback that you want exactly the same character in the application and the character palette so here's the character palette which looks like the one we have in Jaguar and you have this little disclosure triangle here for font variation if you open it here's a list that shows the selected character in all fonts in the system so you can browse this character using all fonts in the system and pick one you like and if the character you've selected in this list happens to have variant glyphs it lists those variants in this variant field so for example that this is a portion 3 and you want a different day for this like say here a long head and here you have insert with font at the bottom right and if you press this button you insert the character into the document let me try a different one say this one and you insert the different glyph let me try to find one you can drag the character to this area to go to that character and see say say I want a genome into politics and here you are you have different nabi and say your name is say the one which have our did that they too dot up here you can insert and you now have two dots [Applause] also you can truck a doctor bear chapter and find which one has disc actor like this so now I'll bring Deborah back to the stage for the wrap-up thank you thank you Kida-san and I'd like to emphasize again how important it is
to support Unicode and the document content access protocol in your applications so that your users have access to all these great features so I didn't have a prop budget for this talk so I don't have a coffin to roll out on stage but WorldScript is dead QuickDraw text is dead they can't begin to cover some of the requirements that we're seeing in the Japanese and Chinese markets today we're spending all of our efforts all of our focus is on Unicode we're not spending any time on WorldScript we're not spending any time on making enhancements to QuickDraw text so Unicode is it Unicode will give your application great competitive advantages in the Japanese and Chinese markets so you really should focus on adding that and if you do that as a side benefit you get the rest of the world besides which is not a small thing thank you everyone so I'd like to wrap up now here's a couple of other sessions or more than a couple that you might be interested in immediately after this session is what you need to know about fonts in Mac OS X the whole name didn't fit on the slide you can find out about the font panel and the typography panel and lots of other useful information about using fonts on Mac OS X and that's in the Mission room starting at three-thirty on friday at five o'clock in the Presidio room there's the cocoa text session you can find out about new features for Panther and all the other great things that are going on in the world of cocoa text unfortunately at the exact same time also friday at 5pm in Nob Hill is a session on our new Ink APIs which if you're interested in enhancing your application's support for handwriting you can find out about how to use these new APIs to do that and finally if you want to let us know what's bugging you or what you think is going great our international technologies feedback forum is friday at ten-thirty in the North Beach room and we'd love to have you come and give us feedback on what we could do better and what
we're doing right so if you have further questions the first person you should be talking to is Xavier and his email address is easy to remember it's xavier at apple.com if you have any questions when you're done talking to Xavier you can also contact me and my email address is Goldsmith without the h at apple.com you don't need to scribble a lot of stuff down because the URL that you see at the bottom of the screen developer.apple.com/wwdc2003/urls.html will have all the contact information and all the URLs from all the talks at WWDC here are some places you can go for more information there's our documentation library of course we also have a nice summary page for international text technologies that's developer.apple.com/intl if you want to develop fonts we have a font developer web page that's developer.apple.com/fonts there's of course references for the AppKit and for ATSUI Unicode Utilities which is used for finding text boundaries and comparison and searching there's a specialized set of topics on cocoa text handling there's documentation on CFString here's a handful of useful tech notes and sample code Technote 2056 is on how to do your own keyboard layouts 2079 the variant glyph access protocol it has much more detail than I was able to go into there's a sample app for ATSUI and how to draw Unicode text and how to do your own input method and some pointers outside of Apple the Unicode consortium has a website for more information about Unicode that's the best place to start there's a new version of the Unicode book coming out for the new four-point-oh version of the standard and much more readable than the standard itself is Unicode Demystified by Richard Gillam I highly recommend that there's an introduction to Unicode if you want to learn more about it and finally the open source international components for Unicode library has its own website that's hosted by IBM
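Editor's sketch, not part of the session: two points made earlier in the talk, that characters in planes 1 through 16 are stored as two 16-bit values, and that the same text can have more than one Unicode spelling, can be illustrated in a few lines. The session itself is about the carbon and cocoa APIs; this sketch instead uses Python's standard library as a stand-in, and the function names to_surrogate_pair and same_text are hypothetical, chosen only for illustration.

```python
import unicodedata


def to_surrogate_pair(code_point):
    """Split a code point from planes 1-16 into its two 16-bit values."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("plane 0 code points fit in a single 16-bit value")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


def same_text(a, b):
    """Compare two strings after normalizing both to composed form (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# U+20000 is the first code point in plane 2.
high, low = to_surrogate_pair(0x20000)
print(hex(high), hex(low))

# e + combining acute accent versus the single precomposed character.
decomposed = "re\u0301sume\u0301"
precomposed = "r\u00e9sum\u00e9"
print(decomposed == precomposed)           # raw comparison sees different strings
print(same_text(decomposed, precomposed))  # normalized comparison treats them as equal

# One Hangul syllable versus the three jamos that make it up.
print(same_text("\ud55c", "\u1112\u1161\u11ab"))
```

Normalizing before comparing is only one way to handle multiple spellings; it does not address language-sensitive ordering or searching, which the system comparison APIs mentioned in the talk are meant to cover.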