WWDC2001 Session 502
Transcript
today's session is going to be about describing at a high level what we have to offer in three new Apple Java frameworks the Java spelling framework the Java speech synthesis framework and the Java speech recognition framework these APIs are free and they will be available this week surely by Friday they're in the pipeline and we just need to put them on the web on the Apple developer website and you can download them and use them with Java and your favorite IDE JBuilder that you got in the bag Metrowerks Project Builder what have you and have some fun with them we're not going to go into a lot of detail about how speech synthesis and speech recognition work or about all their capabilities by that I mean the underlying technologies built into OS X that have been around for a while in different versions of OS 9 OS 8 etc but you can find more information about those things on our Apple developer website and we're assuming that you all are familiar with Java now one thing we're going to talk about is JavaBeans one of the things we've designed into these APIs is to allow them to be used in visual builders the premier example is JBuilder and before we go on I just wanted to review exactly what a JavaBean is so it's basically at a high level a method that was designed by Sun and its partners to be able to share discrete components among third party developers and tool vendors so that each tool could interoperate and use visually these discrete components so technically a JavaBean is basically a class that has certain method signatures or another class called a BeanInfo that is its pair that describes properties it can also generate events so for example the classic bean would be a button and you could get an action event on that button for example and the JavaBeans specification also defines methods to provide editors and customizers that is something richer than you'd get by default in a visual builder such as just editing a
string or cranking a number up and down switching true to false etc and JavaBeans of course are part of the Mac OS X J2SE implementation part of all J2SE implementations so the first framework we'll discuss is the Java speech framework and the Java speech framework is basically layered upon the synthesis and recognition technologies built into OS X and as I said before these are from Carbon they exist on Mac OS 9.1 and Carbon as well the native versions and of course we're running on OS X here today so what are some of the differences between what we're talking about today and the underlying technologies well the Java speech APIs are object based whereas the Carbon APIs are C APIs native APIs the Java frameworks provide programmatic as well as visual programming for the frameworks the Carbon APIs are only programmatic the Java speech APIs are higher level the Carbon C API is very very low level it provides every single possible thing you'd ever want to do with speech synthesis or speech recognition the Java speech APIs are only available on Mac OS X the native APIs are available on Carbon 7.5 and beyond so speech synthesis basically it's about converting text to speech and we provide in our Java frameworks about 99% coverage of the functionality you'll find in the Carbon API it's architecturally neutral now what we mean by that is speech synthesis provides the ability to get callbacks instead of just saying speak this text you may want to know when it's done speaking a particular word a sentence when it's done speaking a part of a word you can even stick other cues into the string that it's speaking and we'll go into that more later and get callbacks from the speech technology now those callbacks won't necessarily happen on the UI thread of whatever particular UI toolkit you're using Swing AWT IFC Cocoa etc so what we've done is we've put in a thread task handler and a default implementation for Swing to provide that synchronization for you by default
and an abstraction in the task handler interface that allows you to add your own synchronization between the speech callback thread and the UI thread and we provide JavaBeans and customized property editors for tight integration with the Java IDEs so the main classes there are others but the main two classes of speech synthesis are the synthesizer itself and it basically allows you to speak the text and attach event listeners to the different events it generates and the voice class which embodies all the characteristics of a voice now Mac OS X like Mac OS 9 and previous versions of Mac OS has built-in voices about a half dozen different voices of different characteristics one sounds like a robot one sounds like a young girl one sounds like an older man etc etc and all these voices have names and the voice class allows you to pick one of these voices by name adjust its properties properties such as the rate with which the voice speaks its pitch its volume etc and we have three basic events we generate from the speech synthesizer the first event is the word event and you can listen for that event whenever the synthesizer concludes speaking a particular word you can also listen for the sync event now the sync event is something that you can embed a cue for in the text you're asking the synthesizer to speak and I'll demonstrate that later and a phoneme event which is used to animate for example a face let's say you wanted a computer-generated face with a mouth to really move with the sound of the word you would use these callbacks and again you can find out detailed information about those events in the speech synthesis documentation for Carbon on the Apple developer website so you also have other events since you basically say here synthesizer speak this text and then your thread continues on you're not going to be able to listen for an exception right there that may be generated down the road so if the synthesizer itself has a problem say you embedded
incorrectly some sync event or some other embedded command and we'll go into embedded commands later you can listen for error events you can also listen for exactly when the synthesizer begins speaking and when it finishes speaking whether that is because someone stopped it some other API just called stop stop talking or it concluded itself it read all the text and that would be the done event so let's do a speech synthesis demo and this demo is packaged in the speech synthesis SDK and let's launch it here this is just a normal Java app wrapped with MRJAppBuilder and this is all Java with a Swing Aqua look and feel and it's built to not only be a little fun but be educational because I'm assuming that Java programmers that want to do speech synthesis don't necessarily know all the capabilities of the synthesis system underneath on Mac OS X so it's fun to play with we'll go through and we'll just ask it to speak something we basically have a voice here Agnes and we can see her age about 35 female etc the rate that she speaks at the pitch base and pitch modulation the volume that she speaks at and we'll just ask her to speak isn't it nice to have a computer that will talk to you and maybe we want her to speak a little faster isn't it nice to have a computer that will talk to you so you can play around with these and see the different voices we'll go through some more interesting ones all these strings by the way are built into the system for the particular voices you choose they particularly exemplify the type of voice that you have selected and we'll look at one more let's look at something really alien that looks like a peaceful planet so different voices it's pretty interesting and this again is just the Java framework now another thing that you can do with speech synthesis is embed commands so let's say for example we have many commands that you can embed an embedded command basically is just a
bracketed keyword in the text that's how you send speech synthesis embedded commands that's not part of the Java speech framework that is actually part of the underlying speech synthesis technology on OS X and you can do many things you can change the emphasis of a particular word if you really want to hit it heavily as the computer speaks it you can change the number mode for example and this is interesting so let's go play with this for a minute speech synthesis is pretty intelligent it tries to guess how it should pronounce numbers but sometimes you need it to pronounce a number digit by digit other times you need it to read it as if it was a whole number say five hundred two instead of five oh two but let's type something in let's say please call me at 5 5 5 - 1 2 3 4 at extension 1 2 3 4 5 6 7 8 5 6 I mean okay so first we'll listen to the computer speak please call me at 5 5 5 1 2 3 4 at extension five thousand six hundred seventy-eight so there was a little bit of an interesting hiccup in the highlighting there but it pronounced the first number correctly it recognized it as a phone number so it read it digit by digit but an extension number you wouldn't normally tell someone to call you at extension five thousand six hundred seventy-eight you'd say extension five six seven eight so we'll embed here the number command to ask it to speak this literally and then once we're done at the other end of the number so it returns to the normal number mode we'll insert the normal command and again we'll ask it please call me at five five five one two three four at extension five six seven eight so you can do things like that so it's a little more intelligent about what it's pronouncing another thing that you can do among all these embedded commands is ask for a sync event as I mentioned a few slides earlier let's say you're using speech synthesis to do a presentation or something and after a certain sentence or a certain word certain paragraph you want to present a graphic on the screen and then you want it to continue on with what it's
doing or you want to sync in some other way a UI with the speech so let's type in a sentence such as you can buy product A for five dollars or product B for ten dollars and then let's enter a sync command and the way we enter a sync command this is sort of a legacy of OS 7.5 or so OS 9-ish of how they had many four-letter keywords for different technologies in the OS so in this case since the roots of this are still in Carbon and in earlier OSes even though we're on OS X now we basically need to add a four-letter keyword to our sync events so that when we get a callback we'll know exactly which sync event that callback was for so first I'll embed this here and then and by the way this is just a convenience this whole mechanism I'm using to embed these commands this is just text so I can actually just cut and paste here yes and I'll change this to BCC or something and pay attention to the embedded command feedback area here at the bottom of the window you can buy product A for five dollars or product B for ten dollars so that's a method that you could use to sync something with the speech synthesis and before we go on let me just replay that and pay attention to the phoneme feedback down here the phoneme codes are actually in the speech synthesis documentation for Carbon on the Apple developer website but you'll see it roll through all the different sounds as we speak you can buy product A for five dollars or product B for ten dollars and so again you could use that to animate properly the shape of the mouth of an animated head for example so that is speech synthesis that's about all the capabilities that are in speech synthesis of course there are variations not shown in this demo for example you can pause the speech you can choose to stop the speech at a word break at a sentence break immediately I won't have you insane for pausing but let's continue on now to Java speech recognition and this is the nightmare demo
because we've already tried it in here and I feared this before I even got in this room and the acoustics even with none of you in here really play havoc with this microphone but we'll give it a shot anyways but first let's just talk about Java speech recognition so Java speech recognition recognizes spoken language contained in a language model and we'll talk a little bit more about a language model later but that could be something for example like call a meeting or phone home that's a simple language model itself we provide about 70% coverage for speech recognition when compared to the Carbon speech API again we're architecture neutral as I said before with speech synthesis so that you can coordinate the recognition callbacks with your UI callbacks as a convenience and again we provide JavaBeans and customized property editors for tight integration with visual IDEs such as JBuilder or Metrowerks the two main classes for Java speech recognition are the recognizer and the recognizer does about what you'd expect it allows you to start recognition stop recognition set a particular language model add your listeners for events etc the second important class is the language model there are two different ways to use the language model class one is just hey here's an array of different sentences I want you to listen for tell me when you get one of them and tell me which one you recognized another one is more complex and it really is something that if we have time I'm going to demonstrate a bit but it's still really in an alpha stage for a couple different reasons that'll probably be clearer when we give that demo the events that the recognizer can generate are unrecognized hey you know somebody said something but I don't really know what they said it didn't match up with anything in the language model that you gave me and you would use that to do something like ask the user to speak that phrase again another event
that you can listen for is the detected event that just means someone has started to speak but the recognizer has not yet completed recognizing what they said they are not done saying what they're saying and the other event is the done event this means that the recognizer actually recognized what they said and it can tell you exactly what that was so language model and editor this is an example of a language model the top model top LM is the top language model and it has two possibilities call person or schedule meeting now schedule meeting is literal listen for the words schedule meeting call person actually is a concatenation of two other models the call model third line down and the person model the call model can be call phone or dial the person model could be literally Arlo Brent Matt or my wife and then also you have in the top language model schedule meeting with person or view today's schedule variations of the other two so in this language model you could possibly receive dial Brent which would be the first part of the first language model or schedule meeting or view today's schedule or schedule meeting with Brent for example so let's go to the speech recognition demo keep our fingers crossed and this is the speech recognition mic here I'm going to start up the speech recognizer and this is a very simple demo it's actually something I used to control my web browser but right now I just have it as a simple list that's going to pick up what I'm saying so let's wait for it to start okay and if we can cut this mic for a second so in the acoustics of this room some of the softer sounds I found are hard to pick up between this microphone and the fans and all the computers and the different space in the room but there we see the speech recognition working and by the way we're going to go into sample code and actually write some code take a look at some other code after we're done with these demos let me quit this here and can we have the main
presentation board back thank you so the last framework that I want to take a look at is the Java spelling framework the Java spelling framework is a set of Java APIs that allows you to access the underlying Cocoa spelling service and it's built on the Cocoa spelling services which in turn are built on top of Mac OS X it provides three levels of usage three styles or levels one is just a simple set of APIs you can call to ask it is this particular word spelt correctly give me a list of suggested corrections modify the system dictionary that is I have a new word it should be in the dictionary it's spelled correctly it just isn't there add it to the dictionary and that would be system-wide so that other Cocoa apps such as TextEdit would then pick up that word in their spell-checking sessions another way to integrate is complete integration with the Swing text components now I don't know if you're aware but in Swing all text components derive from JTextComponent and JTextComponent has a classic model-view-controller design and that makes it very easy for us to tie in to all text widgets in Swing whether it's one-line unstylized text multi-line unstylized text or fully stylized multi-line text and it also has an abstraction for integrating with non-Swing text components now its main classes are the spelling checker itself and this just basically has the simple static API that you would call if you wanted to just roll all this yourself another important class is the misspelled word class you'll receive that in callbacks from the spelling panel which we'll go into later that will say what word was misspelled where it was in a group of text how long it is etc so that you can use that to mark up your text etc if you're not using the full-featured second mode of operation we described earlier which is full integration with Java text components and the word class which is the parent class of misspelled word just indicates a correctly spelled word so this is a
snapshot of the classic spelling session these are all Swing Aqua widgets and as we can see we have a classic spelling panel on top of our demo and I'll show that in a second and through this panel you can receive a bunch of different events but let's see how we bring that panel up so in the case of the second usage to get to that session on a standard Java text component it takes these two lines of code and your text component has to be nothing special so you just instantiate the JTextComponentDriver and you ask it to check the spelling of the text widget and you can still if we go back you can still edit the underlying text component if you wish and you can just close the spelling panel and manage all of your dictionaries etc this spelling panel that's on top now will appear as soon as we call driver dot check spelling so another manner of checking the spelling is with the inline pop-up in real time spell checking in real time spell checking you will get cues for misspelled words which are red dashed underlines just as you would in any Cocoa application and by control clicking on the misspelled words you can get a list of suggested corrections and you can add and delete things from the system dictionary the Java spelling framework UI API which is to do with this window on top allows you to integrate the spell checking panel with some other text component there are since Swing didn't come along for a while in Java's life many different text widgets out there and so you can still use for the most part the Java spelling framework as is with the spelling panel with your own custom text widget as long as you do a few things now one is implementing a new driver the JTextComponentDriver is the default Swing implementation and it implements the interface below driver and you would need to implement your own implementation of the driver interface and then once
you did so you would tie in with the spelling panel the class above and you would be able to have this spelling panel tied in with your text component of course you would have to handle the highlighting of the misspelled word you would just get cues and the cues you would get are events generated by the spelling panel one of the cues would be an event called ignore the user wants to ignore the currently suggested misspelled word find next these just basically all apply to the buttons in that spelling panel correct the user wants to correct the misspelling the correct event would have the choice of the new correct spelling and it would be your job if you were implementing this driver yourself to replace that text another event is found and it would be your job to highlight a misspelled word when you receive that event so let's take a look at this spell checking demo and we'll go through three methods of doing this so the first one is the standard spell checking session and as you can see we have a list of possible suggested corrections to the misspelled word and it's picked the top one the top one is always the most likely to be correct it's not always right though and we'll see that so in this case I will choose that word but for the third word or fourth word I really want one that's not the first and I'll correct that then and this is just your typical spelling session but the thing to remember is that this is all Java these are all Java widgets this is all Swing accessing the underlying Cocoa spelling service we'll correct that we'll correct that etc and since this particular integration is that first two line call using full Swing integration with the spelling service when we have a word down below that needs to be highlighted the text panel scrolled for us etc we'll replace that word and we'll replace the last word so that is the first manner with which you can check spelling and let's get back our misspellings and start over the second manner is
just to simply check them all so it checks them all and again puts a red underscore misspelling cue under all the misspelled words and we can right click as we saw in the presentation and select the correct word we'll just select a few here so you can do that as well let's get back a fresh copy and go to the real time spell checker and while that's coming up all of these calls are being made to check the spelling when it eventually has a word that it wants to find out is this spelled correctly it's taken from the world of Java in this particular process and it's sent via DO I don't know if you know anything about DO DO is a distributed objects mechanism in Cocoa so it's packaged up in Cocoa sent over the wire to another process running on Mac OS X the spell server that spell server looks at the word finds it in the dictionary or doesn't and sends back a response whether it's spelled correctly or not so all that happens pretty quickly and to give an example let's turn on the real-time spell checker and let's just type garbage at first let's just type blah you know who knows what this is but as you can see as I type it is checking every single correction I make to the document and all of that is just happening extremely rapidly now let's go back and let's actually look at our classic sentence again so since the real-time spell checker is still on we can delete make our correction manually we know that this is s for example should be is but we can also then do as we did for the check all option which is to ask it to give us a list of suggested corrections and we can correct it in this manner so that is a demo of Java spell checking so where to get the code you'd go to Apple's Java developer website it's sometime this week I'd look by Friday and it'll definitely be there but while on the subject of code let's see how easy it is just to roll some of these things on our own and let's see a bit about what I was talking about when I said
JavaBeans so let's launch JBuilder and let's close this nice little app again okay oh go back okay let's start this guy up again since the demo gods don't like me I think I'll begin to save incrementally okay okay let's save all let's go back to the visual designer and let's add our speech synthesis bean again okay and let's set the layout of this this will be the world's most horrible GUI to null meaning XY let's go back to Swing buttons okay let's add a button to start speech rec or speech synthesis another one to stop it and let's add a Swing scroll pane and inside of that let's add a text widget and let's put some default text in it okay and save again and let's go and you know we won't even bother naming these buttons let's go in and ask the synthesizer to begin speaking so let's say this I love this feature of JBuilder it's just totally awesome oh this is CodeInsight so especially during these demos it's really great synthesizer should be down there somewhere did it name it just synthesizer I think it did okay synthesizer 1 dot speak text and we'll get that text from this dot JTextArea 1 dot get text it can't get any easier than that well it could write it itself no I'd be out of a job okay let's don't add that feature Blake please so let's go back and add stop stop stop speech okay and okay let's run it so obviously this is about the simplest form of synthesizer but you could also add callbacks to it but we'll just take a look at those events on it so we can speak it this one's neat let's add watch the stuff and hope that stop works huh because otherwise it's gonna repeat this okay and ask it to speak this was neat okay that's good this was neat this was neat okay that's good it worked not actually near an orphan thank you so that is speech synthesis so while we're in this demo let's add one more button to the beauty of JBuilder and let's add our spell checking button spell checking bean one more time this J this J text yeah okay
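The Speak and Stop wiring from the demo a moment ago comes down to one line of speech code per button. The sketch below shows that shape only. The `Speaker` interface is a hypothetical stand-in for the synthesizer bean (the method names `speakText` and `stopSpeech` follow what is said in the demo, but the real class and signatures are not shown in this session), and the console implementation exists so the sketch runs without any speech framework installed.

```java
import javax.swing.JButton;
import javax.swing.JTextArea;

/** Hypothetical stand-in for the synthesizer bean; not the real Apple class. */
interface Speaker {
    void speakText(String text);
    void stopSpeech();
}

final class SpeakStopWiring {
    /** Attaches one listener per button, each with a single line of speech code. */
    static void wire(JButton speak, JButton stop, JTextArea source, Speaker synthesizer1) {
        speak.addActionListener(e -> synthesizer1.speakText(source.getText()));
        stop.addActionListener(e -> synthesizer1.stopSpeech());
    }

    public static void main(String[] args) {
        JTextArea text = new JTextArea("this was neat");
        JButton speak = new JButton("Speak");
        JButton stop = new JButton("Stop");
        // Console-only speaker (an assumption for illustration) in place of real speech output.
        wire(speak, stop, text, new Speaker() {
            @Override public void speakText(String t) { System.out.println("speak: " + t); }
            @Override public void stopSpeech() { System.out.println("stop"); }
        });
        // Simulate the two button presses from the demo.
        speak.doClick(0);
        stop.doClick(0);
    }
}
```

Keeping the speech calls behind a small interface like this also makes it easy to swap in a no-op implementation on platforms where the framework is missing.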
compile it again I'll fast it compiles and let's type something more interesting in let's see try to think of something new okay that's right and let's bring up the spell checking panel with that one line of code that we added and let's see if it has a guess for me here and yes it does okay so we'll do that and correct that spelling doesn't have one for there so I'll just change it myself there it does have one for that for the heck of it let's just say ignore that one and let's also say that for whatever reason we want that word to be declared as spelled correctly let's learn it and continue on and then just kill our session here and then we could just continue on or whatever now let's quit that so there are two examples of easily incorporating the speech synthesis and spell checking frameworks into your application if we look again we need to look at the top of this application because JBuilder did some things for us it instantiated the JTextComponentDriver when we added it to the application visually it also did the same thing for the synthesizer class as well and then when we look down at our code below it just took one line of code to ask the synthesizer to speak the text another line of code to ask it to stop one line of code to ask it to check the spelling and for checking the spelling there are also check real-time and check all APIs as well so any way you slice it you have a couple lines of code that only would work on Mac OS X but then again since there are only a couple of lines of code and because of the types of frameworks they are for example spell checking you might want to consider integrating them into your Java applications that you want to run on all platforms there are several ways you could do this but the benefit is that it would basically run the same on all the platforms it's just that when you got to Mac OS X and users ran it there they'd also have for example spell checking so you
might use since it's only a couple lines of code for spell checking reflection to call this you might also wrap it in another API build it on Mac OS X or build it on a platform that has these same APIs stubbed out so you could compile and then run it on all of your systems and simply when it ran on Mac OS X it would have the spell checking capabilities built into it and same for basic speech synthesis although you know that may be more involved in your application when you do speech synthesis so those are the first two easy frameworks to demo now the bleeding-edge guy is speech recognition so let's actually go and open up a new file open up a file that was put on our system when we installed the speech framework okay let's see here dudududu where am i pilot was that okay Wooper and research and java speech by mark samples speech okay let's look at example one this is the sample code to the speech recognition sample we saw and for the most part what it's doing is instantiating the recognizer and creating and instantiating a language model setting the language model on the recognizer and in this case using the simplest manner of setting the data in the language model that is just an array of strings and then it asks the recognizer to just start now the code we don't see here is that there's a utility class built into the Java speech recognition framework called the feedback panel and that was that set of strings I'm not going to go through the whole thing actually we don't need to go to that machine I can launch it up here that was the set of strings that you saw the list of strings in the small sample and we'll just bring it up just to associate again so everything in this panel is the feedback panel and you can use that as well you might want to cue your user into what they can speak and we've just wrapped that in a Java frame and then we have told the feedback frame what recognizer it should attach to and it's gone and attached to all
the listeners especially the done event listener so that it can highlight the correctly spoken phrase now the other thing that I wanted to show you that is the real bleeding edge stuff let me close this file let's go back to JBuilder example one let's go back to the designer and let's go and just add a recognition bean to this little sample application and take a look at the rough early version of the language model customizer so the thing that actually makes this rough is that drag-and-drop probably if you're familiar with Java at all you know is still in a rough state of affairs not only do we have some bugs on our side but there are just some bugs in the shared code that all the VM implementers share with Sun and I basically discovered those specific bugs while doing this editor and we need to get those changes fed back to Sun get them into the pipeline so everyone can benefit from them but I will try to struggle my way through this to give you a bit of an example of how it works so I'm going to try to recreate that first language model we had on the slides when we had call person for example schedule meeting with person so let's say first of all that we just wanted to say schedule meeting with person so we could add a new language model down here oops I need to add a new one first called person and we could say that this is Dan Mickey Vincent mind not in the room and then we could say we could drag this person now this feedback window is one of the bugs that I was referencing before so we could type up here call person and then I should be able to say call Dan or call Nicky now actually we had a bit of a difficulty with setting up the mic so the development environment for this sample is on this machine I'm sitting at here and the mic is actually attached to the other machine so I can't actually show you this working but trust me it
works drag and drop is absolutely a problem but you can go into a test mode and build your language model you could speak it etc go back to edit mode to edit other things once you're done you can save it as a file somewhere and it'll bring up the OS X save dialog box just save it somewhere and say ok you're done and then if we go back and we look at the source because JBuilder adheres to even the nitty-gritty of the bean specification it's gone in and used the Java speech recognition recognizer bean language model BeanInfo and bean to allow our framework to add source code to the file that you're editing that's pretty neat in this case the source code we added was and allow me to just add this to two lines so you can see it better the source code that we added was new language model set data file for the particular file that now contains our language model and then we could listen for those same phrases call Dan and call Nikki out of that complex language model more will be added to that so that you can have easier ways to recognize a more complex model but that's the state of the situation with speech recognition at the moment so speech recognition is provided as one of those frameworks made freely available on our website as of this week the way that I would recommend you use it for the time being is with the simple language model that we saw below which is just setting a simple set of phrases listening for the recognition event the done event and then getting the sentence out of that event to see what the user has spoken so that's a bit of code and taking a look at how easy it is to use the frameworks of course there's a lot more in the frameworks than we've shown today there are all sorts of interesting things and callbacks in synthesis and recognition that you can get at we've actually covered most of the spelling framework as is so again you can get the code from the developer website and send me email if you have any questions
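The reflection approach suggested earlier for applications that must run on every platform can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the framework's documented usage: the class name `com.example.spelling.SpellingChecker` and method name `isSpelledCorrectly` are hypothetical placeholders, and the only thing taken from the session is that the spelling checker exposes a simple static API. The `java.lang.reflect` calls themselves are standard.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

final class OptionalSpelling {
    // Hypothetical names for illustration; substitute the ones from the framework's documentation.
    static final String CHECKER_CLASS = "com.example.spelling.SpellingChecker";
    static final String CHECK_METHOD = "isSpelledCorrectly";

    /** Returns true when the named class can be loaded on this platform. */
    static boolean isAvailable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /**
     * Calls a public static method taking one String and returning a boolean.
     * Returns the fallback when the class or method is missing or unsuitable.
     */
    static boolean check(String className, String methodName, String word, boolean fallback) {
        try {
            Method m = Class.forName(className).getMethod(methodName, String.class);
            if (!Modifier.isStatic(m.getModifiers())) {
                return fallback; // only a static API is handled in this sketch
            }
            Object result = m.invoke(null, word);
            return (result instanceof Boolean) ? ((Boolean) result).booleanValue() : fallback;
        } catch (ReflectiveOperationException | LinkageError e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        // Where the framework is absent, the app keeps running and treats every word as correct.
        System.out.println("spell checking available: " + isAvailable(CHECKER_CLASS));
        System.out.println("word accepted: " + check(CHECKER_CLASS, CHECK_METHOD, "recieve", true));
    }
}
```

Because the framework is referenced only by string, the application compiles and starts anywhere; the alternative mentioned in the session, building against stubbed-out versions of the same APIs, trades this runtime lookup for compile-time type checking.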