Alexa, Amazon’s poster child for connected home devices, has recently received yet another update. Researchers have improved the assistant’s natural language processing, cutting its error rate by 8% through a combination of transfer learning and AI-generated ‘embeddings’ of words.
Transfer learning lets a neural network trained on a large dataset of annotated speech samples bootstrap training in a new domain where data is sparse. The technique also taps into millions of unannotated interactions with Alexa, fuelled by data from users of Amazon’s Echo products.
These interactions were used to train a model to generate something known as embeddings: numerical representations of words in which words with similar functions are grouped close together. The grouping is based on each word’s “co-occurrence” with other words, reflecting the way natural language actually works.
Thus, by counting how many co-occurring words two words share, it is possible to measure their distance from each other in the embedding space. This allows Amazon to capture information about words that are semantically similar, without requiring human labour to transcribe the words from speech.
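As a rough illustration of the co-occurrence idea (a toy sketch, not Amazon’s pipeline), the snippet below builds count-based context vectors from a handful of made-up requests and shows that words sharing contexts end up closer together:

```python
from collections import Counter
from math import sqrt

# Toy corpus; in Alexa's case this would be millions of unannotated requests.
corpus = [
    "play some jazz music",
    "play some rock music",
    "turn on the lights",
    "turn off the lights",
]

def cooccurrence_vector(word, sentences, window=2):
    """Count the words appearing within `window` positions of `word`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, tok in enumerate(tokens):
            if tok == word:
                lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
                for j in range(lo, hi):
                    if j != i:
                        counts[tokens[j]] += 1
    return counts

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

jazz = cooccurrence_vector("jazz", corpus)
rock = cooccurrence_vector("rock", corpus)
lights = cooccurrence_vector("lights", corpus)
# "jazz" and "rock" share contexts ("play some ... music"), so they
# sit closer to each other than either does to "lights".
print(cosine(jazz, rock) > cosine(jazz, lights))  # -> True
```

Production systems learn dense vectors with neural objectives rather than raw counts, but the underlying signal, shared context, is the same.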
These embeddings are organised using a scheme known as embeddings from language models, or ELMo for short. Because Alexa must process spoken requests in real time, its researchers created a lighter-weight variant tailored to those requirements, which they call ELMo Light (ELMoL).
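One distinctive feature of the ELMo scheme is that each word’s final vector is a task-specific, softmax-weighted sum of the hidden states from several language-model layers. The sketch below shows just that mixing step; the layer values and dimensions are invented for illustration:

```python
from math import exp

def softmax(xs):
    """Numerically stable softmax over a list of scalars."""
    m = max(xs)
    exps = [exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def elmo_combine(layer_vectors, scalars, gamma=1.0):
    """Weighted sum of per-layer representations: gamma * sum_j s_j * h_j,
    where the s_j are softmax-normalised, task-learned scalars."""
    weights = softmax(scalars)
    dim = len(layer_vectors[0])
    out = [0.0] * dim
    for w, h in zip(weights, layer_vectors):
        for i in range(dim):
            out[i] += gamma * w * h[i]
    return out

# Hypothetical hidden states for one token (3 layers, 4 dimensions each).
layers = [
    [0.1, 0.2, 0.3, 0.4],   # token-level layer
    [0.5, 0.1, 0.0, 0.2],   # first biLSTM layer
    [0.3, 0.3, 0.3, 0.3],   # second biLSTM layer
]
# Equal scalar weights reduce to the elementwise mean of the layers.
token_vec = elmo_combine(layers, scalars=[0.0, 0.0, 0.0])
print(token_vec)
```

In a real model the `scalars` are trained alongside the downstream task, letting each task pick the mix of syntactic and semantic layers that suits it.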
Following this, the researchers created three models: one with ELMo, one with ELMoL, and one without any embedding scheme. The first two networks had their embedding layers trained on upwards of 250 million unannotated requests to the assistant, followed by 4 million annotated requests from existing Alexa services. Finally, in the transfer-learning step, the models were retrained on limited data to perform new tasks.
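The two-stage recipe, pretrain embeddings on plentiful unannotated text, then reuse them for a task with only sparse labels, can be sketched end to end. This is a minimal stand-in (whole-sentence co-occurrence counts plus a nearest-neighbour head), not Amazon’s actual architecture, and the intent labels are hypothetical:

```python
from collections import Counter
from math import sqrt

def train_embeddings(unannotated):
    """Stage 1: learn word vectors from raw text via co-occurrence counts
    (a stand-in for large-scale neural pretraining)."""
    vecs = {}
    for sentence in unannotated:
        toks = sentence.split()
        for i, w in enumerate(toks):
            c = vecs.setdefault(w, Counter())
            for j, ctx in enumerate(toks):
                if j != i:
                    c[ctx] += 1
    return vecs

def embed(sentence, vecs):
    """Represent a sentence as the sum of its words' vectors."""
    total = Counter()
    for w in sentence.split():
        total.update(vecs.get(w, {}))
    return total

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Stage 1: plentiful unannotated requests (millions, in Alexa's case).
unannotated = [
    "play some jazz music", "play some rock music",
    "turn on the lights", "turn off the lights",
]
vecs = train_embeddings(unannotated)

# Stage 2: a sparse annotated set, layered on the frozen embeddings.
annotated = [
    ("play some jazz music", "PlayMusic"),
    ("turn on the lights", "SmartHome"),
]

def classify(sentence):
    """Route a request to the label of its nearest annotated example."""
    query = embed(sentence, vecs)
    return max(annotated, key=lambda ex: cosine(query, embed(ex[0], vecs)))[1]

# "rock" never appears in the annotated data, but its pretrained vector
# is close to "jazz", so the request is still routed correctly.
print(classify("play some rock music"))  # -> PlayMusic
```

The point of the sketch is the division of labour: the expensive representation learning happens once on cheap unannotated data, so the task-specific step can get by with very few labelled examples.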
The results showed that the network using ELMo came in first, with ELMoL following close behind. The researchers also observed that the improvements grew as the amount of data in the final retraining step shrank, evidence that the transfer-learning approach succeeds precisely where data is scarce.