Digital assistants such as those built into the Amazon Echo, Google Home and Apple HomePod can listen to you. And they can talk back. But that doesn’t mean that they can carry on a conversation.


As the devices that run these assistants become more commonplace — 39 million Americans now own one, according to a recent study — programmers and artificial intelligence (AI) experts are working on ways to make them more user-friendly. They foresee a day when you’ll be able to chat with Siri or Alexa as you would with a friend.

Because these devices are not yet nimble enough to understand and respond to everything we might say, Amazon, Apple and Google have honed them for particular tasks, such as playing a song or building a shopping list. The devices’ ability to recognize the words we say is improving steadily, but just because they perceive them doesn’t mean that they comprehend them.

That’s the next goal of AI researchers: to get the devices to understand what we say and be able to formulate appropriate responses.

Machine-learning algorithms are helping these devices deal with turn-by-turn exchanges. But so far, each verbal exchange is limited to a simple three- or four-turn “conversation.” In a perfect world, engineers would build one giant neural network that learns to do everything. For now, these devices move toward better conversations in small steps.

It’s like elementary school grammar classes for computers.

“If you keep your language short and simple, you can maybe go as deep as three steps,” said Nancy Fulda, a researcher at Brigham Young University who specializes in conversational systems. “The technology operates by different rules than people do.”

Electronic teamwork

A digital assistant relies on many different technology systems, all working together on the device and inside a network of computer data centers that connect to the assistant over the internet.

When you say something, one system tries to recognize each word and convert it to text. Another system tries to understand the meaning of each word and how it relates to the others. A third system spits out new text that responds to what you’ve said. A fourth converts this response to digital speech. Other systems also may weigh in, but you get the point.
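
As a rough illustration of that hand-off, here is a minimal Python sketch of the four stages chained together. Every function name and rule in it is hypothetical: real assistants use trained models at each stage, and the "audio" here is simply text in bytes so the example can run anywhere.

```python
from dataclasses import dataclass, field


@dataclass
class Intent:
    """What the assistant thinks the user wants (hypothetical structure)."""
    name: str
    slots: dict = field(default_factory=dict)


def recognize_speech(audio: bytes) -> str:
    # Stage 1 stand-in: a real system runs speech recognition on sound.
    # Assumption: the "audio" is just UTF-8 text.
    return audio.decode("utf-8").lower().strip()


def understand(text: str) -> Intent:
    # Stage 2 stand-in: one hand-written rule instead of a learned model.
    words = text.split()
    if words[:1] == ["play"] and len(words) > 1:
        return Intent("play_song", {"title": " ".join(words[1:])})
    return Intent("unknown")


def generate_response(intent: Intent) -> str:
    # Stage 3: produce new text that answers the request.
    if intent.name == "play_song":
        return f"Playing {intent.slots['title']}."
    return "Sorry, I didn't catch that."


def synthesize_speech(text: str) -> bytes:
    # Stage 4 stand-in: a real system would return synthesized audio.
    return text.encode("utf-8")


def handle_utterance(audio: bytes) -> bytes:
    # Each stage feeds the next, as described above.
    text = recognize_speech(audio)
    intent = understand(text)
    reply = generate_response(intent)
    return synthesize_speech(reply)
```

Calling handle_utterance(b"Play Yesterday") passes the request through all four stages in order; a request the second stage does not recognize falls through to the apology.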

Engineers used to build speech recognition systems by writing one small computing rule at a time — a painstaking process. But so-called neural networks are now replacing those handwritten rules, accelerating the progress of speech recognition. Neural networks are complex mathematical systems that can learn particular tasks by pinpointing patterns in large amounts of data.

You probably have encountered such a system if you’ve called a customer support line and been asked by an electronic voice to briefly describe the reason you are calling. You’re prompted to make statements like “check order status” or “pay bill.” This sort of system has learned to route calls to the appropriate department by analyzing recordings of old customer support calls.

Digital assistant algorithms analyze hundreds, even thousands, of requests and learn to identify each type of request. When generating responses, these assistants plug particular information into an existing template.
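
The two halves of that process, identifying a request and filling a template, can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any company's assistant works: the example requests, labels and templates are invented, and simple word counting stands in for a neural network.

```python
from collections import Counter

# Invented training examples: (request, label). A real system would
# learn from many thousands of these.
EXAMPLES = [
    ("what is the weather in boston", "weather"),
    ("weather forecast for tomorrow", "weather"),
    ("add milk to my shopping list", "shopping"),
    ("put eggs on the shopping list", "shopping"),
]

# Invented response templates with blanks to fill in.
TEMPLATES = {
    "weather": "The forecast for {place} is {forecast}.",
    "shopping": "I added {item} to your shopping list.",
}


def train(examples):
    """Count how often each word appears under each label."""
    counts = {}
    for text, label in examples:
        counts.setdefault(label, Counter()).update(text.split())
    return counts


def identify(counts, request):
    """Pick the label whose examples share the most words with the request."""
    words = request.lower().split()
    scores = {label: sum(c[w] for w in words) for label, c in counts.items()}
    return max(scores, key=scores.get)


def respond(label, **facts):
    """Plug the specific information into the label's template."""
    return TEMPLATES[label].format(**facts)


model = train(EXAMPLES)
label = identify(model, "Add bread to the shopping list")
print(respond(label, item="bread"))
```

The template is why responses from these assistants sound so uniform: only the filled-in blank changes from one request to the next.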

More conversational

A neural network drives Google’s new “conversational mode.” In the past, you couldn’t talk to the device without saying “Hey, Google.” Now, after saying this once, you can deliver multiple commands and questions.

In some cases, Google Home can differentiate when you are talking to it and when you’re talking to someone else in the room. It does this with a system that has been “trained” with the interactions other people have had with their devices. Basically, Google’s customers are helping Google build a smarter product.

But the devices and their systems differ. Make the same request on Amazon Echo, Google Home and Apple HomePod (the location of the nearest coffee shop, for instance) and you might get three different responses. That’s probably because each device is using a different “knowledge graph,” a vast database of facts and other information you may ask for. Google, for example, may use data gathered from Google Maps, whereas the other companies may pull from sources such as Yelp.
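
To see why the same question can produce different answers, consider a toy knowledge graph stored as simple (subject, relation, object) facts. The Python sketch below is only an illustration: the two graphs, the shop names and the distances are all invented, and real knowledge graphs are vastly larger and more elaborate.

```python
# Two hypothetical knowledge graphs built from different data sources.
# Every fact here is made up for the example.
GRAPH_A = {
    ("home", "nearest_coffee_shop", "Bean Street Cafe"),
    ("Bean Street Cafe", "distance_km", "0.4"),
}
GRAPH_B = {
    ("home", "nearest_coffee_shop", "Corner Roasters"),
    ("Corner Roasters", "distance_km", "0.6"),
}


def lookup(graph, subject, relation):
    """Return the object of the first matching fact, or None."""
    for s, r, o in graph:
        if s == subject and r == relation:
            return o
    return None


def nearest_coffee_shop(graph):
    """Answer the same question using whichever graph is supplied."""
    shop = lookup(graph, "home", "nearest_coffee_shop")
    if shop is None:
        return "I couldn't find a coffee shop nearby."
    distance = lookup(graph, shop, "distance_km")
    return f"The nearest coffee shop is {shop}, {distance} km away."


print(nearest_coffee_shop(GRAPH_A))
print(nearest_coffee_shop(GRAPH_B))
```

The question and the code that answers it are identical in both calls; only the underlying facts differ, so the responses do too.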

Engineers hope that machine learning will continue to replace handwritten rules and expand what these devices can do, but conversation is a complex task. Researchers have built experimental neural networks that learn to carry on richer conversations by analyzing reams of real (human) dialogue, such as exchanges on Twitter or Facebook Messenger. But these neural networks can veer into nonsense. And they tend to reinforce the flaws of human conversation, including gender bias, rudeness and even racism.

No one is predicting when digital assistant conversation will be perfected; it might be years from now, or it might even be decades. But it’s coming. If you doubt that, just ask your digital assistant.