The iPhone Gets Nuanced

I got the iPhone 5 because it was free. The loathsome AT&T charged me $300 and I sold the old iPhone 4 on eBay for that amount. I hate AT&T, but their 4G LTE is really fast in the Bay Area, and factory unlocking a phone under contract to enable overseas SIMs and free tethering is trivially easy. The new phone is like the old one but bigger, faster and thinner, reinforcing my view that Apple post-Steve is an incremental innovator, not a disruptive one. Most of the changes are the result of a new operating system, not new hardware. But one feature is blowing me away and totally changing how I use my phone.

The new feature is keyboard dictation, which appears on all iOS 6 keyboards, whether you have the new iPhone or not. By dictation, I emphatically do not mean Siri. Siri is a dog that performs a few well-chosen show tricks and inspired at least one hysterical advertising spoof. Siri is very useful for directions, reminders, OpenTable reservations, and a good laugh. Siri entertains, but dictation delights.

Dictation has been around for a decade and on iOS since the 4S and third generation iPad, but it was always more trouble than it was worth. Suddenly, dictation not only works, it works shockingly well. For text messages, emails, tweets, and even first drafts of longer documents, it is massively faster to dictate than to type. (Unfortunately, I still need to type blog posts the old way. Maybe that explains the 60 day hiatus…) I have a hard time understanding why Apple is not using its ad dollars to promote dictation instead of Siri, unless the processing costs are huge and they are losing money on the feature.

What changed? In a word, Nuance, plus a massive investment in cloud infrastructure. Nuance Communications is the public company behind Dragon Dictate, which has been the market leader in desktop speech recognition for the past 15 years at least (the company was founded in 1992 out of SRI as Visioneer, known mainly for early OCR software).
Neither Apple nor Nuance talks about it, but it looks to many people like Apple has licensed its dictation software, including Siri’s front-end interpreter, from Nuance. One sign: before Apple bought Siri, the Siri app used to carry a “speech recognition by Dragon” label (earlier, Siri had used Vlingo, which apparently did not work as well). Not only that, but Nuance has built several speech recognition apps for the iPhone and iPad that work exactly like the speech recognition built into the iPad and iPhone 5.

This is interesting in part because Apple never licenses critical technology for long. It insists on controlling its core technology from soup to nuts, so many people assume that Apple has considered buying Nuance. The problem is that Nuance holds licenses with many Apple competitors, who would disappear as customers if Apple bought the company. Apple would need to massively overpay for the asset, something they never do. More likely, Apple will hire talented speech recognition people and build its own proprietary competing product, just like it did with maps when it declared independence from Google. In this case, figure that dictation will regress for a year or two, just as maps have done, because real-time, accurate speech recognition makes maps look simple. Plus, Nuance protects its patents aggressively, and these patents are, according to some writers, not easy to avoid.

Google, though, is avoiding them nicely: Android speech recognition is also outstanding. How do they do it? The Google way: throw talent at it. Google hired more PhD linguists than any other company and then they hired Mike Cohen. Cohen is an original co-founder of Nuance, and if anyone can build voice recognition without tripping on the Nuance patents, he can. Apple appears likely to pursue a similar course.
Mobile dictation works by capturing your words, compressing them into a wave file, sending it to a cloud server, processing it using Nuance software, converting it to text, and sending it back to your device, where it appears on your screen. Like all good advanced technology, it passes Arthur C. Clarke’s third law: it is indistinguishable from magic.

The tricky bit is the software processing, which has to have a rich set of rules based on context. The software decides on the meaning of each word based not only on the sound pattern, but on the words it heard before and after the word it is deciding upon. This is highly recursive logic and nontrivial to execute in real time. Try saying “I went to the capital to see the Capitol”, “I picked a flower and bought some flour”, or “I wore new clothes as I closed the door” and you begin to understand the problem that vexes not only software, but English learners everywhere. Apple dictation handles these ambiguities perfectly, meaning that it either gets the answer right, or it realizes that there are multiple possible answers, takes a guess, and offers the alternative so that you can correct it with a quick touch.

It takes a little bit of practice to use dictation well. It helps to enunciate like a fifth grade English teacher and to learn how to embed punctuation. The iPhone User Guide for iOS 6 has a list of available commands. Four are all you need: “comma”, “period”, “question mark”, and “new paragraph” (or “next paragraph”). You can also insert emoticons: “smiley” :-), “frowny” :( and “winky” ;-). For anything else, speaking the punctuation usually works: “exclamation point”, “all caps”, “no caps”, “dash”, “semicolon”, “dollar sign”, “copyright sign”, “quote”, etc.

Overall, the experience of accurate mobile dictation is a magic moment, like the first time you use a word processor or a spreadsheet (for those who recall typewriters and calculators), or the first browser or email (yeah, we didn’t use to have those, either).
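For the curious, here is a toy sketch in Python of the context idea behind the flower/flour problem. To be clear, this is not how Nuance or Apple actually do it; the cue words, the `disambiguate` function, and the two-word window are all made up for illustration. It only shows the general principle of picking between sound-alike words by looking at the words heard before and after.

```python
# Toy illustration only. The cue lists below are invented for this example;
# a real recognizer would rely on statistical models trained on huge corpora.
CUES = {
    "flower": {"picked", "a", "garden"},   # count noun: "a flower"
    "flour": {"bought", "some", "bake"},   # mass noun: "some flour"
}

def disambiguate(slots, window=2):
    """Pick one spelling per slot using nearby words as context.

    `slots` is a list of lists: each inner list holds the candidate
    spellings for one sound the (hypothetical) recognizer heard.
    """
    chosen = []
    for i, candidates in enumerate(slots):
        if len(candidates) == 1:
            chosen.append(candidates[0])
            continue
        # Collect the words heard just before and just after this slot.
        lo = max(0, i - window)
        hi = min(len(slots), i + window + 1)
        neighbors = {w for j in range(lo, hi) if j != i for w in slots[j]}
        # Score each candidate by how many of its cue words appear nearby.
        best = max(candidates, key=lambda c: len(CUES.get(c, set()) & neighbors))
        chosen.append(best)
    return " ".join(chosen)

# "I picked a flower and bought some flour", with both sound-alikes ambiguous.
EXAMPLE_SLOTS = [
    ["i"], ["picked"], ["a"], ["flower", "flour"],
    ["and"], ["bought"], ["some"], ["flower", "flour"],
]

if __name__ == "__main__":
    print(disambiguate(EXAMPLE_SLOTS))
```

Run on the example sentence, the first ambiguous word sits near “picked” and “a” and comes out as “flower”, while the second sits near “bought” and “some” and comes out as “flour”. The real thing has to do this for every word, against a vocabulary of hundreds of thousands, while you are still talking.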
Give it a try. Apple has done something amazing and for once, actually under-hyped it. 
