Vlad Sejnoha, chief technology officer of Nuance Communications, demonstrated Nuance's TV system, which obeys spoken commands to flip channels.

Matthew Cavanaugh, New York Times

Voice technology: A real conversation-starter

  • Article by: NATASHA SINGER
  • New York Times
  • April 3, 2012 - 4:45 PM

BURLINGTON, MASS. - Vlad Sejnoha is talking to the TV again.

OK, maybe you've done that, too. But here's the weird thing: His TV is listening.

"Dragon TV," Sejnoha says to the screen, "find movies with Meryl Streep." Up pops a list of films like "Out of Africa" and "It's Complicated."

"Dragon TV, change to CNN," he says. Presto -- the channel flips to CNN.

Sejnoha is sitting in what looks like a living room but is, in fact, a sort of laboratory inside Nuance Communications, the leading force in voice technology and the speech-recognition engine behind Siri, the virtual personal assistant on the Apple iPhone 4S.

Here, Sejnoha, the company's chief technology officer, and other executives are plotting a voice-enabled future where human speech brings responses from not only smartphones and televisions, cars and computers, but also coffee makers, refrigerators, thermostats, alarm systems and other smart devices and appliances.

It is a wildly disruptive idea. But such systems are already beginning to change the way we interact with the world and, for better and worse, how we think about technology. Until now, after all, we've talked only to one another. What if we begin talking to all sorts of machines, too -- and, like Siri, those machines respond as if they were human?

Granted, people have been talking into machines and at machines since the days of Edison's phonograph. By the 1980s, commercial speech recognition systems had become sophisticated enough to transcribe spoken words into text. Today, voice technology is a fixture of many companies' customer-service operations, albeit an occasionally maddening one.

But now the race is on to make the voice the sought-after new interface between us and our technology. The results could rival innovations like the computer mouse and the graphic icon and, some experts say, eventually pose challenges for giants like Google by bypassing their traditional search engines.

No player is bigger in voice technology than Nuance, of Burlington, Mass., an industry pioneer that has acquired more than 40 companies in the field and today employs 7,300 people. It is one of the companies that helped make a big technological leap from programs that take dictation to systems that actually extract meaning from words and respond to them. Now it wants to push far beyond that.

"They are the equivalent of Microsoft, Google or Amazon in a very niche technological space," says Andrew Rosenberg, an assistant professor of computer science at Queens College.

Like many new technologies, sophisticated voice systems have potential drawbacks. Some experts worry about privacy invasions, others about our ever-deepening attachment to devices like smartphones.

Humans are wired for speech and tend to respond to talking devices as if they were kindred spirits, says Sherry Turkle, a professor of the social studies of science and technology at the Massachusetts Institute of Technology.

"I'm not saying voice recognition is bad," Turkle says. "I'm saying it's part of a package of attachments to objects where we should tread carefully because we are pushing a lot of Darwinian buttons in our psychology."

Only a decade ago, voice-enabled virtual assistants seemed more science fiction than business fact. But in 2000, Paul Ricci, a former executive at Xerox, concluded that voice software could one day disrupt the marketplace the way the mouse and the icon had in the 1980s.

"We had to decide early on where there were markets where we could successfully deploy the technology," said Ricci, Nuance's chief executive.

Nuance, then known as ScanSoft, went on an aggressive acquisition spree. It bought a desktop dictation system called Dragon NaturallySpeaking, as well as dozens of small companies that had carved niches in medical dictation, automated voice-response systems and speech research. Its most significant acquisition was Nuance, a rival that had been spun off from S.R.I. International of Menlo Park, Calif. The combined company took the Nuance name. (S.R.I. International later developed and spun off Siri, which was acquired by Apple in 2010.)

"They have literally tried to buy every good asset out there, or build it themselves, knit it all together and augment it," Richard Davis, an analyst at Canaccord Genuity, says of Nuance.

Nuance reported revenue of about $1.3 billion for 2011, with $515 million of that coming from its health care technology business.

Not everyone is as enamored of voice technology. Some privacy advocates worry that it adds an audio track to the digital trail that people leave behind when they use the Web or apps, potentially exposing them to more data mining.

Voice recognition software works by sending speech to processors that break the sound waves of spoken words into small units of sound and use algorithms to identify the most likely words formed by those sounds. The system typically records and stores speech so it can teach itself to become more accurate over time.
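To make the "most likely words" step concrete, here is a toy sketch. It assumes the common noisy-channel approach, in which each candidate word gets an acoustic score (how well it matches the sounds heard) and a prior (how plausible it is in context), and the system picks the candidate with the highest combined score. The candidate words, the probabilities, and the function name `most_likely_word` are all invented for illustration. This is not Nuance's actual method.

```python
import math

# Hypothetical acoustic scores, P(sounds | word): how well each candidate
# matches the audio for "Dragon TV, change to ...". Numbers are made up.
ACOUSTIC = {
    "CNN": 0.60,
    "seen in": 0.55,
    "sienna": 0.20,
}

# Hypothetical language-model prior, P(word | "change to"): how likely each
# candidate is in a TV command. Also invented for illustration.
PRIOR = {
    "CNN": 0.50,
    "seen in": 0.05,
    "sienna": 0.01,
}


def most_likely_word(acoustic, prior):
    """Return the candidate maximizing log P(sounds|word) + log P(word).

    Adding logs is equivalent to multiplying the probabilities, but avoids
    numerical underflow when many small probabilities are combined.
    """
    return max(acoustic, key=lambda w: math.log(acoustic[w]) + math.log(prior[w]))


print(most_likely_word(ACOUSTIC, PRIOR))
```

Here "seen in" sounds almost as close as "CNN", but the prior strongly favors a channel name after "change to", so "CNN" wins.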

Nuance says it is impossible to identify consumers from the recordings, because the company's system recognizes people's voices only by unique codes on their devices, rather than by their names. The company's privacy policy says it uses the voice data of consumers only to improve its own internal systems.

© 2018 Star Tribune