Talk to the Dragon, NaturallySpeaking


— 10:03 AM on January 4, 2013

While I wouldn't consider myself to be particularly injury-prone, I've managed to damage various bits of my body over the years. Muscles have been pulled, joints have been sprained, skin has been gashed, and bones have been broken. These injuries usually result from me catapulting off a bicycle, and I've earned most of 'em. My latest crash was less than spectacular, though. Instead of careening down a muddy trail dodging roots and rocks in an attempt to hit a drop-off just right, I was riding along a quiet city street at relatively low speed. I misjudged the gap between two cars, squeezed the brakes too late for the rain-slick pavement, and then hit the deck.

As far as spills go, this was a minor one. My bike escaped without a scratch, and for the most part, so did I. However, my right ring finger was in the wrong place at the wrong time; I suspect it was crushed between the road and my handlebar. It sustained a spiral fracture to the proximal phalanx (the first stretch extending from the palm), and it is now being held together by three wires while the bone heals. When looking at the initial X-ray, the ER doctor joked, "You're not a concert pianist, are you?"

"No," I replied, "I write at a keyboard for a living." Doh!

At first, I panicked. But then a wave of calm washed over me, and not just because I'd been dosed with painkillers. It's 2012, I thought. The PC must have speech recognition sorted out by now.

There was good reason for my optimism. I remembered reading about PC-based speech recognition more than a decade ago. At the time, the accuracy was supposed to be around 95%, a figure I recall because the author noted that this meant one in twenty words was still wrong. Surely, after years of software tuning, with the horsepower available in modern hardware, the accuracy would be higher. After all, even the speech recognition on my smartphone is pretty good.

In the months before I munched my finger, I'd been talking to my Galaxy Nexus on an almost daily basis. Using the Siri-style voice commands is a little slower than navigating the UI manually. For me, the real draw is the ability to dictate short text messages, emails, and searches—things that are frustrating to type on the tiny keyboard, especially when holding the phone in a portrait orientation with one hand.

Android's built-in speech recognition engine can't be trained to recognize specific words or to compensate for thick accents, but the default implementation does a decent job of interpreting my mumbling even in noisier environments. Languages can also be cached locally, making voice usable if you don't have a data connection. While mistakes are common, the error rate is no higher than what my clumsy fingers produce with the on-screen keyboard. Even accounting for corrections, voice is still faster for short bursts of text.

Speech recognition may be a viable alternative to my smartphone's touchscreen keyboard, but it faces stiffer competition on my desktop: a full-sized mechanical keyboard on which I can crank out over 100 words per minute. Like Android, Windows actually has a speech recognition engine of its own. However, on the morning after my crash, everything I read on the subject said Nuance's Dragon NaturallySpeaking 12 was much better. The basic version was selling for half price and claimed an accuracy of "up to 99%," so I took the plunge.

The setup process was simple, and the default training routine took only a few minutes to learn my voice. Moments later, I was leaning back in my chair, feet propped up on my desk, dictating sentences with ease. Punctuation was a snap, everything was capitalized properly, and I didn't even mind wearing a headset until I left the house without noticing the tufty mess it had made of my hair.

Initially, I was incredibly impressed. NaturallySpeaking does such a good job of deciphering speech that I don't for a moment doubt the 99% accuracy claim. Most of the mistakes that pop up on my screen are a result of my tendency to mumble and speak quickly, causing words to slur together. Fixing errors is easy, though. Voice commands can be used to move the cursor and select words for correction, enabling hands-free editing with little fuss.

If you want to invest more time tailoring NaturallySpeaking to your voice—and more time practicing clear and even delivery—additional training passages can be dictated. The application can learn words and tendencies by reading your emails and other documents, which comes in handy if you use a lot of specialized terms. New words can be added manually to Dragon's vocabulary, of course, and there are a number of auto-formatting options that govern how numbers, units, abbreviations, and other special cases are handled.

Even if you don't resort to additional training, NaturallySpeaking slowly builds up a profile based on your interactions with the software. The accuracy certainly improved over the two weeks I was using Dragon regularly. However, the moment my massive, post-surgery cast was traded for a lower-profile splint that freed the index and middle fingers on my right hand, I tossed my headset in favor of pseudo hunting and pecking. This slightly hobbled setup produces fewer words per minute for sustained writing sessions, but it feels far less cumbersome for day-to-day work.

You see, Dragon NaturallySpeaking works exceptionally well if you want to dictate big blocks of text into Microsoft Word, with which it is tightly integrated. For most other applications, including my preferred Notepad++ text editor, pop-up dialog boxes accumulate dictated passages before transferring them to the desired location. That's not so bad if you're using only one application at a time, but it's less than ideal when your daily routine involves bouncing between a text editor, multiple IM windows, email, a web browser, and Excel. It probably doesn't help that I've become accustomed to writing TR news posts, articles, and reviews in raw HTML rather than using WYSIWYG editors that would be more amenable to natural language dictation. NaturallySpeaking isn't quite smart enough to figure out HTML markup.

Another problem with NaturallySpeaking is that it doesn't feel particularly fast. I never speak so quickly that it can't keep up, but the app still takes its sweet time displaying words on the screen. That's probably an artifact of interpreting several words together to improve accuracy. Yet even with the accuracy slider pushed all the way toward faster recognition speeds, I'm usually at the end of a sentence before my words have been transformed to visible text. I've been running Dragon on a quad-core system with an SSD and 8GB of RAM, so my rig should be up to the task; the Windows Task Manager tells me NaturallySpeaking is leaving most of those hardware resources idle. I guess I'm just not used to feeling like I'm waiting on a text editor.

Dragon NaturallySpeaking may have limitations, but its phenomenal speech recognition engine makes fewer errors than my healthy hands produce typos. That's an impressive achievement regardless of the less-than-ideal fit for my particular workflow. At least I know I have a competent dictation solution should my mad typing skills be compromised by another injury. As it turns out, typing has been the least of my problems. You don't know how much you miss fast, accurate mouse movement until you're forced to use your opposite hand. All those holiday games I had queued up will have to wait another few weeks for the wires to come out and my splint to be discarded. Perhaps I should have lined up some old-school text adventures to pass the time.

   