Last month at the AWS re:Invent developer conference, Amazon announced two new services, Amazon Transcribe and Amazon Translate, aimed at expanding the company’s artificial intelligence and machine learning capabilities.
Amazon Transcribe can analyse audio files (.wav, .mp3, .flac) stored in Amazon S3. Amazon Translate, on the other hand, provides fast translation of text-based content to create a multilingual experience on the web. It can also mass-translate documents on command.
“A lot of companies are talking about the potential of machine learning and artificial intelligence, and thinking about how to incorporate these technologies in their applications,” said Swami Sivasubramanian, vice president of Machine Learning at AWS. “AWS changed all this with the introduction of Amazon SageMaker that makes ML accessible to everyday developers by eliminating the heavy lifting of building, training, and deploying models,” he added.
Developers can now integrate Translate and Transcribe into their applications and link them with other Amazon services. The two services extend the language capabilities already available on AWS through Amazon Lex, Amazon Polly and Amazon Comprehend, the last of which processes natural language to discover insights and contextual relationships in text.
How Amazon Transcribe Works
Transcribe recognises speech in audio files and converts it into text. It supports English and Spanish, so developers can build applications that make use of the content of audio files. For instance, you can use it to transcribe the audio track of a video recording to create closed captions for the video.
Developers can also use it with other services like:
- Transcribing audio and sending the transcription to Amazon Comprehend to identify topics, keywords or sentiment.
- Converting voice to text, translating the text with Amazon Translate and then sending it to Amazon Polly to speak the translated text.
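Both pipelines can be sketched with the AWS SDK for Python (boto3). This is a minimal sketch, not AWS reference code: it assumes the transcript text has already been retrieved from a finished Transcribe job, the function names are hypothetical, and the clients are passed in as parameters so the logic can be exercised without AWS. `detect_sentiment`, `detect_key_phrases`, `translate_text` and `synthesize_speech` are real boto3 client methods; the Spanish target and the Polly voice `Conchita` are example choices.

```python
from typing import Any, Dict


def analyse_transcript(text: str, comprehend: Any) -> Dict[str, Any]:
    """Send transcript text to Amazon Comprehend for sentiment and key phrases."""
    sentiment = comprehend.detect_sentiment(Text=text, LanguageCode="en")
    phrases = comprehend.detect_key_phrases(Text=text, LanguageCode="en")
    return {
        "sentiment": sentiment["Sentiment"],
        "key_phrases": [p["Text"] for p in phrases["KeyPhrases"]],
    }


def speak_translation(text: str, translate: Any, polly: Any,
                      target_language: str = "es",
                      voice_id: str = "Conchita") -> bytes:
    """Translate transcript text, then have Amazon Polly read the translation aloud."""
    result = translate.translate_text(
        Text=text, SourceLanguageCode="en", TargetLanguageCode=target_language)
    speech = polly.synthesize_speech(
        Text=result["TranslatedText"], OutputFormat="mp3", VoiceId=voice_id)
    # AudioStream is a streaming body; read() returns the MP3 bytes.
    return speech["AudioStream"].read()
```

In a real application the clients would come from `boto3.client("comprehend")`, `boto3.client("translate")` and `boto3.client("polly")`.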
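Amazon Transcribe uses three operations to transcribe an audio file:
- StartTranscriptionJob: It starts an asynchronous job that transcribes the speech in an audio file to text.
- ListTranscriptionJobs: It returns a list of transcription jobs that have been started. For instance, a developer can get a list of all the pending jobs or a list of completed jobs.
- GetTranscriptionJob: It returns the result of a transcription job, including a link to the file that contains the transcript.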
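A minimal sketch of the job lifecycle with boto3, assuming the audio file is already in S3. `start_transcription_job`, `get_transcription_job` (the GetTranscriptionJob operation, used here to poll for the result) and `list_transcription_jobs` are real boto3 methods; the job name, bucket URI, polling interval and helper names are hypothetical. The client is injected so the polling loop can be tested without AWS.

```python
import time
from typing import Any, Callable, List


def start_job(client: Any, job_name: str, media_uri: str,
              media_format: str = "mp3", language_code: str = "en-US") -> None:
    """Submit an S3 audio file (e.g. s3://example-bucket/call.mp3) for transcription."""
    client.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={"MediaFileUri": media_uri},
        MediaFormat=media_format,
        LanguageCode=language_code,
    )


def wait_for_transcript(client: Any, job_name: str, poll_seconds: float = 10.0,
                        sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the job finishes; return the URI of the transcript file."""
    while True:
        job = client.get_transcription_job(
            TranscriptionJobName=job_name)["TranscriptionJob"]
        status = job["TranscriptionJobStatus"]
        if status == "COMPLETED":
            return job["Transcript"]["TranscriptFileUri"]
        if status == "FAILED":
            raise RuntimeError(job.get("FailureReason", "transcription failed"))
        sleep(poll_seconds)  # still QUEUED or IN_PROGRESS


def completed_job_names(client: Any) -> List[str]:
    """List the names of completed jobs (first page of results only)."""
    response = client.list_transcription_jobs(Status="COMPLETED")
    return [s["TranscriptionJobName"]
            for s in response["TranscriptionJobSummaries"]]
```

Passing `sleep` as a parameter is only a testing convenience; production code would typically keep the default and might add a timeout.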
How Amazon Transcribe Uses Deep Learning
With the help of ML, Amazon Transcribe can identify multiple speakers in an audio file and add timestamps, making it easier for developers to isolate each part of the conversation. The technique is called speaker identification or diarisation.
For example, developers can use diarisation to distinguish the customer from the support representative in a recorded customer support call. It can also distinguish characters for closed captions, or separate the speaker from the questioners in a recorded press conference.
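Once speaker identification is enabled for a job, the JSON transcript carries a `speaker_labels` section alongside the word-level `items`. The sketch below regroups words into speaker turns, for example customer versus support representative. It assumes the transcript layout described in the Transcribe documentation (segments whose items carry a `speaker_label` and a `start_time` matching each word); the function name is hypothetical and the sample in the check is hand-made, not real service output.

```python
from typing import Any, Dict, List, Tuple


def speaker_turns(transcript: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Group words from a speaker-labelled transcript into (speaker, text) turns."""
    results = transcript["results"]
    # Map each word's start time to the speaker label assigned to it.
    speaker_at: Dict[str, str] = {}
    for segment in results["speaker_labels"]["segments"]:
        for item in segment["items"]:
            speaker_at[item["start_time"]] = item["speaker_label"]

    turns: List[Tuple[str, str]] = []
    for item in results["items"]:
        word = item["alternatives"][0]["content"]
        if item["type"] == "punctuation":
            # Punctuation has no timestamp; attach it to the current turn.
            if turns:
                speaker, text = turns[-1]
                turns[-1] = (speaker, text + word)
            continue
        speaker = speaker_at.get(item["start_time"], "unknown")
        if turns and turns[-1][0] == speaker:
            # Same speaker as the previous word: extend the current turn.
            turns[-1] = (speaker, turns[-1][1] + " " + word)
        else:
            turns.append((speaker, word))
    return turns
```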
Moreover, the service can recognise the number of voices in an audio clip and can work with low-bitrate audio such as call recordings. It also lets developers add their own domain-specific vocabulary.
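A custom vocabulary is created once and then referenced by name when a job starts. The sketch below assumes an injected boto3 Transcribe client; `create_vocabulary`, `get_vocabulary` and the `VocabularyName`, `ShowSpeakerLabels` and `MaxSpeakerLabels` job settings are part of the real API, while the helper names, the vocabulary name and the example phrases are hypothetical. The minimum of two speakers is an assumption about the API's validation.

```python
from typing import Any, Dict, List, Optional


def create_domain_vocabulary(client: Any, name: str, phrases: List[str],
                             language_code: str = "en-US") -> str:
    """Register domain-specific terms; return the vocabulary's current state."""
    client.create_vocabulary(VocabularyName=name, LanguageCode=language_code,
                             Phrases=phrases)
    # The vocabulary can be used in jobs only once this state becomes READY.
    return client.get_vocabulary(VocabularyName=name)["VocabularyState"]


def job_settings(vocabulary: Optional[str] = None,
                 max_speakers: Optional[int] = None) -> Dict[str, Any]:
    """Build the Settings argument for start_transcription_job."""
    settings: Dict[str, Any] = {}
    if vocabulary:
        settings["VocabularyName"] = vocabulary
    if max_speakers is not None:
        if max_speakers < 2:  # assumed lower bound
            raise ValueError("speaker identification needs at least two speakers")
        # Speaker labels are requested together with an upper bound on speakers.
        settings["ShowSpeakerLabels"] = True
        settings["MaxSpeakerLabels"] = max_speakers
    return settings
```

The dictionary from `job_settings` would be passed as `Settings=...` to `start_transcription_job`.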
Amazon Transcribe can also help developers create subtitles automatically for online videos. Transcribe uses deep learning to add punctuation and format text automatically so that the output can be used without any editing.
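Turning the word-level timestamps of a transcript into subtitles is mostly formatting. The sketch below builds SubRip (SRT) cues from a list of `(start_seconds, end_seconds, word)` tuples, such as could be extracted from the `items` of a Transcribe JSON transcript with punctuation already attached to the preceding word. The fixed words-per-cue grouping is a simplifying assumption, and the function names are hypothetical.

```python
from typing import List, Tuple

Word = Tuple[float, float, str]  # (start seconds, end seconds, text)


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(words: List[Word], words_per_cue: int = 7) -> str:
    """Group timed words into numbered SRT cues separated by blank lines."""
    cues = []
    for index in range(0, len(words), words_per_cue):
        chunk = words[index:index + words_per_cue]
        start, end = chunk[0][0], chunk[-1][1]
        text = " ".join(w[2] for w in chunk)
        number = index // words_per_cue + 1
        cues.append(
            f"{number}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(cues)
```

A production captioner would more likely break cues at pauses or sentence ends rather than after a fixed number of words.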
How Amazon Translate Works
Translate uses neural networks to translate between language pairs, and it provides real-time translation of text-based content. With Translate, users can engage with web forums, read hotel reviews or access documents even when the content is not in their native language. It allows developers, businesses and other organisations to reach customers around the world in their native languages.
The web service can be used with a number of other language-oriented AI services on AWS, including Polly and Lex. Developers can translate text between English and any of these languages: Arabic, Simplified Chinese, French, German, Portuguese and Spanish. Reportedly, the company also plans to add six more languages in the coming months: Japanese, Russian, Italian, Turkish, Czech and Traditional Chinese.
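A minimal sketch of a `translate_text` call that first checks the request against the English-centred language pairs listed above. `translate_text` and its `SourceLanguageCode`/`TargetLanguageCode` parameters are real; the two-letter codes used here (`ar`, `zh`, `fr`, `de`, `pt`, `es`) are assumed to match the service's codes for the launch languages, and the `SUPPORTED` set reflects this article's launch list rather than the service's current catalogue. The helper names are hypothetical and the client is injected.

```python
from typing import Any

# Launch languages as listed in the article; each is paired with English ("en").
SUPPORTED = {"ar", "zh", "fr", "de", "pt", "es"}


def is_supported_pair(source: str, target: str) -> bool:
    """True when one side is English and the other is a launch language."""
    return (source == "en" and target in SUPPORTED) or (
        target == "en" and source in SUPPORTED)


def translate(client: Any, text: str, source: str, target: str) -> str:
    """Translate text with Amazon Translate after checking the language pair."""
    if not is_supported_pair(source, target):
        raise ValueError(f"unsupported language pair: {source} -> {target}")
    response = client.translate_text(
        Text=text, SourceLanguageCode=source, TargetLanguageCode=target)
    return response["TranslatedText"]
```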
How Amazon Translate Uses Deep Learning
Amazon Translate consists of two components: an encoder and a decoder.
The encoder reads a sentence in the source language and creates a representation that captures the meaning of the text.
The decoder uses this semantic representation to generate a translation of the text from the source language into the target language. It also uses an attention mechanism to build context from each word of the source text.
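The attention step can be illustrated with generic dot-product attention in plain Python. This is a textbook sketch of the technique, not Amazon Translate's actual model: the vectors are tiny hypothetical encoder states (one per source word) and a single decoder state, and the result is a context vector that weights each source word by its relevance to the word currently being generated.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Normalise scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state: Vector,
           encoder_states: List[Vector]) -> Tuple[List[float], Vector]:
    """Dot-product attention: weight encoder states by similarity to the decoder state."""
    scores = [sum(d * e for d, e in zip(decoder_state, enc))
              for enc in encoder_states]
    weights = softmax(scores)
    # Context vector = attention-weighted sum of the encoder states.
    context = [sum(w * enc[i] for w, enc in zip(weights, encoder_states))
               for i in range(len(decoder_state))]
    return weights, context
```

Real neural translation systems learn these vectors during training and usually add learned query, key and value projections; the sketch keeps only the weighting idea.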
According to AWS, Translate produces more accurate and fluent translation than traditional statistical and rule-based translation models.
The neural machine translation system takes into account the entire context of the source sentence, as well as the translation it has generated so far.
When developers have large quantities of pre-existing text to translate, the service can perform batch translation, and it can identify the source language automatically when it is not specified.
Amazon Translate can also perform real-time translation when developers want to deliver on-demand translations of content as a feature of their applications. For example, developers can use it to instantly translate customer service chat conversations.
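One simple approach to translating pre-existing text (a sketch, not the only option) is to loop over documents and let the service detect each source language by passing `SourceLanguageCode="auto"` to `translate_text`; the response's `SourceLanguageCode` then reports the detected language. The helper name and document IDs are hypothetical and the client is injected. Large workloads may be better served by asynchronous batch jobs (`start_text_translation_job`) reading from S3, and very long documents would need splitting to fit request size limits.

```python
from typing import Any, Dict


def translate_documents(client: Any, documents: Dict[str, str],
                        target: str = "en") -> Dict[str, Dict[str, str]]:
    """Translate each document, recording the source language the service detected."""
    results: Dict[str, Dict[str, str]] = {}
    for doc_id, text in documents.items():
        response = client.translate_text(
            Text=text, SourceLanguageCode="auto", TargetLanguageCode=target)
        results[doc_id] = {
            "detected_language": response["SourceLanguageCode"],
            "text": response["TranslatedText"],
        }
    return results
```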
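A two-way support chat is a natural fit for real-time translation: the customer's message is translated into the agent's language, and the agent's reply is sent back in whatever language was detected for the customer. A minimal sketch with an injected Translate client; the class name, the English agent default and the in-memory state are hypothetical, and `SourceLanguageCode="auto"` relies on the service's automatic language detection.

```python
from typing import Any, Optional


class ChatTranslator:
    """Relay messages between a customer and an agent who share no language."""

    def __init__(self, client: Any, agent_language: str = "en") -> None:
        self.client = client
        self.agent_language = agent_language
        self.customer_language: Optional[str] = None  # learned from messages

    def from_customer(self, text: str) -> str:
        """Translate a customer message into the agent's language."""
        response = self.client.translate_text(
            Text=text, SourceLanguageCode="auto",
            TargetLanguageCode=self.agent_language)
        # Remember the detected language so replies can be sent back in it.
        self.customer_language = response["SourceLanguageCode"]
        return response["TranslatedText"]

    def from_agent(self, text: str) -> str:
        """Translate an agent reply into the customer's detected language."""
        if self.customer_language is None:
            raise RuntimeError("no customer message received yet")
        if self.customer_language == self.agent_language:
            return text  # same language: nothing to translate
        response = self.client.translate_text(
            Text=text, SourceLanguageCode=self.agent_language,
            TargetLanguageCode=self.customer_language)
        return response["TranslatedText"]
```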
To Sum Up
The field of machine translation has been exploding with activity in the past two years as researchers and companies have found success by adopting ML. Tech companies like Google, Apple, Facebook and Microsoft are already developing and using AI and ML in their own products. However, these new offerings from AWS could make it easier and more affordable for startups and less tech-savvy companies to implement deep learning technology. The two services could also help AWS diversify its revenue beyond the computing and storage resources that its rivals also provide.