Hardly a day passes without Mountain View-headquartered search giant Google announcing a new accomplishment in AI. Now, reportedly, researchers from Google-owned, London-based DeepMind and Oxford University have used AI to build lip-reading software that outperforms human lip readers.
DeepMind, Google’s algorithm factory, which is carrying out groundbreaking research in healthcare and energy among other areas, teamed up with Oxford researchers to develop the lip-reading software. The paper explains the motivation: a machine that can lip read opens up a host of applications, including dictating instructions or messages to a phone in a noisy environment, transcribing and re-dubbing archival silent films, resolving multi-talker simultaneous speech, and improving the performance of automated speech recognition in general.
In an earlier interview, DeepMind CEO Demis Hassabis said that the AI company is focusing on smartphone technologies, which he believes have great potential.
The researchers’ noteworthy contributions include: (1) a ‘Watch, Listen, Attend and Spell’ (WLAS) network that learns to transcribe videos of mouth motion to characters; (2) a curriculum learning strategy that accelerates training and reduces overfitting; and (3) a ‘Lip Reading Sentences’ (LRS) dataset for visual speech recognition, consisting of over 100,000 natural sentences from British television.
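For readers curious what a curriculum learning strategy can look like in practice, here is a minimal Python sketch. It assumes the curriculum is ordered by sentence length, so training starts on short examples and longer ones are admitted stage by stage. The function name `curriculum_stages`, the sample sentences and the length thresholds are all hypothetical and are not taken from the paper.

```python
def curriculum_stages(samples, max_lengths):
    """Yield (max_len, subset) pairs, one per training stage.

    Each stage admits every sample whose word count is at most max_len,
    so later stages contain the earlier ones plus longer sentences.
    Ordering by sentence length is an assumption made for illustration.
    """
    for max_len in max_lengths:
        subset = [s for s in samples if len(s.split()) <= max_len]
        yield max_len, subset


# Hypothetical transcripts standing in for a real training set.
samples = [
    "hello",
    "good morning",
    "the weather is fine today",
    "thank you very much",
]

stages = list(curriculum_stages(samples, [1, 2, 4, 8]))

for max_len, subset in stages:
    # A real training loop would run one or more epochs on `subset` here.
    print(f"stage max_len={max_len}: {len(subset)} samples")
```

The idea is that the model sees easy, short sequences first and only later has to cope with full sentences, which is one common way such a schedule is used to speed up convergence.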
Interestingly, the WLAS model trained on the LRS dataset outperforms all previous work on standard lip-reading benchmark datasets, and by a wide margin. The paper also claims that the lip-reading software bested a professional lip reader on videos from BBC television. The model uses a dual attention mechanism that attends over both the visual input and the audio input.
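To give a rough feel for what attending over two input streams means, the following Python sketch applies plain dot-product attention to a video feature sequence and an audio feature sequence separately, then concatenates the two resulting context vectors. This is an illustrative stand-in, not the paper's network: the function names, the toy feature vectors and the choice of dot-product scoring are all assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, memory):
    """Dot-product attention: a weighted average of the memory vectors."""
    scores = [sum(q * k for q, k in zip(query, vec)) for vec in memory]
    weights = softmax(scores)
    dim = len(memory[0])
    return [sum(w * vec[i] for w, vec in zip(weights, memory)) for i in range(dim)]


def dual_attention(query, video_feats, audio_feats):
    """Attend over each modality separately, then concatenate the contexts."""
    return attend(query, video_feats) + attend(query, audio_feats)


# Toy inputs (hypothetical): one decoder query, two video frames, three audio frames.
query = [1.0, 0.0]
video_feats = [[1.0, 0.0], [0.0, 1.0]]
audio_feats = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]

context = dual_attention(query, video_feats, audio_feats)
print(context)
```

In a setup like this, the decoder producing each output character would receive one context vector summarising the lip movements and another summarising the audio, and could draw on both when predicting the next character.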