Imagine a Jarvis-like assistant making phone calls on your behalf to schedule an appointment, reserve a table at a restaurant or book a holiday. Thanks to Google Duplex, this dystopian scenario, so far seen only in sci-fi films, will soon be a part of our lives.
Earlier in May, at the I/O conference, Google CEO Sundar Pichai unveiled Google Duplex, an AI-powered voice technology that sounds human. It is an extension of the Google Assistant app and will mainly be used to carry out real-world tasks over the phone.
How Does It Work?
Duplex is a new technology for conducting natural conversations to carry out easy tasks that involve calling. Basically, you can ask Google Assistant to call a business on your behalf.
Once the request is made, Duplex will make the call and then have a real and live conversation with the person who answers the phone and schedule appointments for you.
Pichai demonstrated two phone calls on stage to give people an overview of what to expect in the near future. In the first phone call, the AI had a conversation with a woman to set up an appointment at a hair salon.
Listen to Duplex scheduling a hair appointment and you won’t be able to tell the difference between the machine and the human. The back-and-forth conversation sounds entirely natural.
In another example, Google’s AI calls a restaurant to reserve a table on the user’s behalf.
But Google has limited the scope of Duplex to service-related conversations so that it doesn’t get confused or lost.
The tech giant stated in a blog post, “The technology is directed towards completing specific tasks, such as scheduling certain types of appointments. One of the key research insights was to constrain Duplex to closed domains, which are narrow enough to explore extensively. Duplex can only carry out natural conversations after being deeply trained in such domains. It cannot carry out general conversations.”
Duplex has a self-monitoring capability, which allows it to recognise the tasks it cannot complete autonomously, like scheduling an unusually complex appointment. In that case, it signals to a human operator who can complete the task.
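Google has not published how this self-monitoring is implemented. The following is only a minimal sketch of the general idea, assuming the system produces a confidence score for each conversational turn and is restricted to a fixed set of trained domains. The names `TurnResult`, `SUPPORTED_DOMAINS`, `next_action` and the threshold value are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical closed domains the system has been trained on.
SUPPORTED_DOMAINS = {"hair_salon", "restaurant"}

# Assumed cut-off; a real system would tune this empirically.
CONFIDENCE_THRESHOLD = 0.7


@dataclass
class TurnResult:
    """Outcome of one conversational turn (illustrative structure)."""
    domain: str        # which task the call belongs to
    reply: str         # what the system intends to say next
    confidence: float  # self-estimated confidence in [0, 1]


def next_action(result: TurnResult,
                threshold: float = CONFIDENCE_THRESHOLD) -> str:
    """Decide whether to keep talking or signal a human operator."""
    if result.domain not in SUPPORTED_DOMAINS:
        # Outside the closed domains: do not attempt the task.
        return "handoff_to_operator"
    if result.confidence < threshold:
        # The system judges it cannot complete the task on its own.
        return "handoff_to_operator"
    return "continue_call"
```

In this sketch, an unusually complex appointment would show up as a low confidence score, and the call would be passed to a human operator instead of continuing autonomously.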
After a backlash from people familiar with AI, Google addressed the concerns about ethics and privacy, saying that it is designing Duplex with disclosure built into the system.
“We understand and value the discussion around Google Duplex. As we’ve said from the beginning, transparency in the technology is important. We are designing this feature with disclosure built-in, and we’ll make sure the system is appropriately identified. What we showed at I/O was an early technology demo, and we look forward to incorporating feedback as we develop this into a product,” said Google.
Tech Behind Google Duplex
At the core of Duplex is a recurrent neural network (RNN) built using TensorFlow Extended (TFX). To make the voice sound human-like, Google’s developers combined a concatenative text-to-speech (TTS) engine with a synthesis TTS engine (using Tacotron and WaveNet) to control intonation depending on the circumstance.
The RNN uses the output of Google’s automatic speech recognition (ASR) technology, as well as features from the audio, the history of the conversation, the parameters of the conversation and more. Developers trained Duplex separately for each task but leveraged the shared corpus across tasks. Speech disfluencies such as ‘um’ and ‘hmm’ have been added to make the system sound even more like a human. To train the system in a new domain, developers used real-time supervised training. This is similar to an instructor supervising a student on the job, providing guidance as needed and making sure that the task is performed at the instructor’s level of quality.
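To make the idea of a recurrent network consuming several inputs per turn more concrete, here is a toy sketch in plain Python. It is not Google’s model: the feature vectors, the hidden size and the randomly initialised weights are all assumptions, and the function names (`build_input`, `rnn_step`, `run_conversation`) are hypothetical. It only shows how ASR features, audio features and conversation parameters could be concatenated into one input, while the hidden state carries the history of the conversation from turn to turn.

```python
import math
import random


def build_input(asr_features, audio_features, task_params):
    """Concatenate the per-turn feature groups into one input vector."""
    return list(asr_features) + list(audio_features) + list(task_params)


def make_weights(rows, cols, rng):
    """Small random weight matrix (untrained, for illustration only)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def rnn_step(x, h_prev, w_x, w_h, bias):
    """One simple recurrent step: h = tanh(W_x x + W_h h_prev + b)."""
    h_new = []
    for i in range(len(h_prev)):
        total = bias[i]
        total += sum(w_x[i][j] * x[j] for j in range(len(x)))
        total += sum(w_h[i][k] * h_prev[k] for k in range(len(h_prev)))
        h_new.append(math.tanh(total))
    return h_new


def run_conversation(turns, hidden_size=4, seed=0):
    """Feed each turn through the cell; the hidden state holds the history."""
    rng = random.Random(seed)
    input_size = len(build_input(*turns[0]))
    w_x = make_weights(hidden_size, input_size, rng)
    w_h = make_weights(hidden_size, hidden_size, rng)
    bias = [0.0] * hidden_size
    h = [0.0] * hidden_size  # empty history before the call starts
    for asr, audio, params in turns:
        h = rnn_step(build_input(asr, audio, params), h, w_x, w_h, bias)
    return h


# Two made-up turns: (ASR features, audio features, task parameters).
example_turns = [
    ([1.0, 0.0], [0.3], [1.0]),
    ([0.0, 1.0], [0.5], [1.0]),
]
final_state = run_conversation(example_turns)
```

A production system would learn the weights from data and use a far richer architecture; the point here is only that each new turn is combined with a running summary of everything said so far.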
How Natural Does It Sound?
“Duplex is built to sound natural, to make the conversation experience comfortable. It’s important to us that users and businesses have a good experience with this service, and transparency is a key part of that. We want to be clear about the intent of the call so businesses understand the context. We’ll be experimenting with the right approach over the coming months,” says Google.
Google has faced several challenges in conducting natural conversations: natural language is hard to understand, natural behaviour is tricky to model, latency expectations require fast processing, and generating natural-sounding speech with the appropriate intonation is difficult.
Will It Be Successful In India?
The success of Google Duplex in India depends on the number of users. As of now, the demographic that makes reservations online, schedules salon appointments over the phone or shops online is a very small segment of tech-savvy people in a country of 1.35 billion. According to a study, active e-commerce penetration in India currently stands at only 28 percent.
In this day and age, when most millennial smartphone users rely on apps such as Zomato, Dineout or Swiggy, do people still call to book tables? According to Shane Mac, CEO and co-founder at Assist, yes, they do. Speaking on a radio show, Mac said, “60 percent of people still call [for reservations]. I think this is where it [Duplex] has a real advantage for small businesses. They don’t have the resources today to have a great scheduling system.” If this is the case in the US, much of it would apply in India as well.
The main hurdle for Google would be the 122 recognised spoken languages in India. Google will have to customise Duplex for at least the 22 official languages and their dialects, because nearly 70 percent of Indians consider local-language digital content more reliable than English. Therefore, to deepen their user base, the tech giants have to integrate more Indian regional languages into their AI-powered voice assistants. A huge amount of vernacular content is also needed for mass adoption of voice as a preferred user interface.
Over 300 million Indians have access to smartphones, but not many know how to make the most of them. How many of us actually use Google Assistant for day-to-day activities such as sending texts, finding routes, scheduling meetings or reminders, or even setting alarms for the next morning? It isn’t exactly high on the list of priorities either.