For us humans, speech is the most natural and effortless way of communicating. Machines have now evolved to understand and interact with us through speech, and the concept no longer belongs in the realm of sci-fi: Amazon Echo, Google Home and Vector are commercial products that people can buy.
XMOS, a semiconductor company specializing in voice processing, has created algorithms capable of processing a spoken voice even in loud or challenging environments. The company was founded in 2005 and built on research from the University of Bristol, and its technology is used in numerous devices, including Amazon’s Echo.
XMOS’ Power-Packed Chip
XMOS speech detection and isolation techniques include:
Beamforming: the ability to track a person’s voice as they move around
Acoustic echo cancellation: the ability to separate the user’s voice from the sound being played by the device itself
Dereverberation: the ability to reduce reverberation, the echoes caused by sound reflecting off the surfaces of a room
Barge-in: the ability to stop audio playback when the device’s wake-word is detected
Fixed or automatic gain control: the ability to control or limit the gain, or volume level, of the voice signal
Noise cancellation/suppression: the ability to cancel out or reduce background noise
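To make the first item on this list more concrete, here is a minimal sketch of delay-and-sum beamforming, one of the simplest textbook forms of the technique. It is not XMOS’ implementation, which is not described here. The function name, the test tone and the per-microphone delays are all assumptions made for illustration, and the delays are taken as already known rather than estimated from the audio.

```python
import math


def delay_and_sum(channels, delays):
    """Align each microphone channel by its integer sample delay and average.

    channels: list of equal-rate sample lists, one per microphone.
    delays:   assumed arrival delay (in samples) of the talker at each mic.
    """
    if len(channels) != len(delays):
        raise ValueError("need one delay per channel")
    # Usable length once each channel has been shifted by its delay.
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    output = []
    for n in range(length):
        # Sample n of the talker shows up at microphone i at index n + delays[i].
        total = sum(ch[n + d] for ch, d in zip(channels, delays))
        output.append(total / len(channels))
    return output


# Hypothetical test signal: a 440 Hz tone sampled at 16 kHz that reaches
# three microphones 0, 2 and 5 samples late.
source = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(200)]
delays = [0, 2, 5]
mics = [[0.0] * d + source for d in delays]

steered = delay_and_sum(mics, delays)       # aimed at the talker
unsteered = delay_and_sum(mics, [0, 0, 0])  # channels averaged without alignment
```

When the delays match the talker’s position, the channels add up in phase and the tone is recovered; without alignment they partly cancel. Tracking a moving person, as described above, would amount to re-estimating those delays continuously, which this sketch does not attempt.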
Esther Connock, director of corporate marketing, told a news portal, “Currently XMOS serves as the only qualified stereo solution partner for Amazon and the only one that can do acoustic cancellation in stereo. At present, XMOS specializes in building cutting-edge voice applications and also has its attention in investigating areas like in-car interfaces.”
XMOS claims to have developed a technology for sound source separation, which extracts the individual voices in a conversation. In future, this would allow a device to focus only on the speaker’s voice in a conversation where there is a lot of surrounding noise. The company also has high hopes for the future of voice and envisions a ‘personal assistant’ in a flexible, wearable device that provides voice recognition services.
The Digital Twin
XMOS believes in creating a digital twin: a voice application that can talk as naturally as a human. The idea is to give speech emotion and to let the application learn to adapt to the way you use it. For example, your voice assistant could learn that you do not wish to be spoken to unless you speak first.
XMOS wants to make communication as organic as possible, like the way humans talk to each other, and believes that frictionless communication between man and machines is the future of voice processing. It also believes that the digital twin should know much more than your music preferences or how you want it to interact with you. As Connock says, “The digital twin will learn not just my music preferences, but my everything preferences. When I want to be disturbed, my friends that I will prioritize talking to – everything.”
Though people are often amused by voice processing technology, like many other new technologies it still meets resistance.
People are comfortable with smartphones whose cameras, voice recorders and other features listen to them 24/7, but they cannot stand having a speaker that listens to them. XMOS believes that trusted content will be the key to the acceptance of voice among people, putting user experience above everything else.
Speech is far more effortless than any other form of communication. Since technology has always been about making people’s lives easier, voice processing is one technology that still feels futuristic yet is already a reality. With so many voice assistant applications and devices already on the market that accept voice commands to perform tasks, we can expect more natural communication between man and machines in the near future.