Speech-to-text technology has made our lives much easier. It saves time and effort and delivers the required information in a jiffy. Tech giants Google and Amazon are both exploring this area of speech recognition through their products, Google Speech and Amazon Transcribe respectively. Below is a quick comparison of the two across a set of common features.
Google Speech: It has a very wide range, supporting 119 languages. English alone comes in thirteen regional variants: Australia, Canada, Ghana, the UK, India, Ireland, Kenya, New Zealand, Nigeria, the Philippines, South Africa, Tanzania and the US. It also supports nine Indian languages: Bengali, Hindi, Gujarati, Kannada, Malayalam, Marathi, Tamil, Telugu and Urdu.
Amazon Transcribe: It supports fewer English accents, among them British, Canadian and Australian. In addition, it currently supports six other languages: Arabic, Chinese, French, German, Portuguese and Spanish. Amazon also plans to add six more: Japanese, Russian, Italian, Turkish, Czech and Traditional Chinese.
Audio Size
Google Speech: It has separate systems for long and short speech. Long speech is intended for transcription, whereas short speech is intended for voice interfaces.
Amazon Transcribe: It uses a single input for audio of any length. Both Amazon’s and Google’s platforms limit input audio for transcription to 120 minutes per API call.
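To see what the 120-minute limit means in practice, here is a minimal Python sketch that plans how a long recording could be split into separate API calls. The function name `plan_chunks` and the idea of cutting on fixed time boundaries are assumptions made for illustration. Neither service is called, and a real pipeline would more likely cut at pauses in the speech.

```python
from typing import List, Tuple

# Limit quoted in the comparison above: 120 minutes of audio per API call.
MAX_SECONDS_PER_CALL = 120 * 60


def plan_chunks(total_seconds: float,
                limit_seconds: float = MAX_SECONDS_PER_CALL) -> List[Tuple[float, float]]:
    """Return (start, end) offsets in seconds, each span no longer than the limit.

    Hypothetical helper: it only plans the split, it does not cut any audio.
    """
    if total_seconds <= 0 or limit_seconds <= 0:
        raise ValueError("durations must be positive")
    chunks = []
    start = 0.0
    while start < total_seconds:
        end = min(start + limit_seconds, total_seconds)
        chunks.append((start, end))
        start = end
    return chunks


if __name__ == "__main__":
    # A three-hour recording needs two calls under a 120-minute limit.
    for start, end in plan_chunks(3 * 60 * 60):
        print(f"{start / 60:.0f} min to {end / 60:.0f} min")
```

A recording shorter than the limit comes back as a single span, so the same helper can be used for every file regardless of length.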
Google Speech: It supports Python, Node.js, Java, C++, C#, PHP and Ruby.
Google Speech: It offers a degree of data privacy by making its ‘data logging’ program optional. Under this program, Google uses customer data to improve its speech recognition machine learning models. Users can choose ‘disable data logging’ if they do not want the data from a specific project to be logged.
Amazon Transcribe: Amazon, on the other hand, stores the voice data processed by Transcribe to improve its machine learning models, and the data is available to a select few employees. However, you can request deletion of a voice recording by contacting AWS support.
Google Speech: The input audio for Google Speech can be FLAC, AMR, PCMU or WAV. SDKs are also available for C#, Go, Java, Node.js, PHP, Python and Ruby. Google Speech does not require an additional tool for noise cancellation, because the service is optimized to transcribe noisy audio without one. For best results, however, the microphone has to be as close to the user as possible.
Amazon Transcribe: The input audio for Amazon Transcribe can be FLAC, MP3, MP4 or WAV. The language and format of the input audio file must be specified.
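The format lists above can be turned into a quick pre-flight check. The following is a small Python sketch that looks at a file's extension and reports which of the two services would accept it, going only by the formats named in this article. The function name and the service keys are made up for illustration, and a file extension is only a rough proxy for the actual audio encoding.

```python
from pathlib import Path
from typing import List

# Input formats as listed in this comparison; the dictionary keys are illustrative labels.
SUPPORTED_FORMATS = {
    "google_speech": {"flac", "amr", "pcmu", "wav"},
    "amazon_transcribe": {"flac", "mp3", "mp4", "wav"},
}


def services_accepting(filename: str) -> List[str]:
    """Return the services whose listed formats include this file's extension."""
    extension = Path(filename).suffix.lower().lstrip(".")
    return sorted(
        service
        for service, formats in SUPPORTED_FORMATS.items()
        if extension in formats
    )


if __name__ == "__main__":
    for name in ("interview.wav", "call.MP3", "memo.amr", "clip.ogg"):
        print(name, services_accepting(name))
```

Files in FLAC or WAV work with either service according to the lists above, so those two formats are the safest choice when the target platform has not been decided yet.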
Google Speech: It does not let users create their own custom vocabulary, although it does support a considerably larger set of languages than Amazon Transcribe.
Amazon Transcribe: It lets users customise the vocabulary. For example, if a user frequently uses certain organisation-related terms and wants them to be part of the Transcribe vocabulary, they can add them. However, this feature is not available for Australian-accented and Canadian-accented English.
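As a rough illustration of how a custom vocabulary fits into a transcription request, the Python sketch below assembles the parameters for a job. The key names follow the keyword arguments of the `start_transcription_job` call in the AWS SDK for Python (boto3) as we understand them. The helper `build_transcribe_request`, the job name, the bucket path and the vocabulary name are all hypothetical, and nothing is sent to AWS.

```python
from typing import Any, Dict, Optional


def build_transcribe_request(job_name: str,
                             media_uri: str,
                             media_format: str,
                             language_code: str = "en-US",
                             vocabulary_name: Optional[str] = None) -> Dict[str, Any]:
    """Assemble request parameters for a transcription job (hypothetical helper).

    Language and format are always included, since both must be specified.
    """
    request: Dict[str, Any] = {
        "TranscriptionJobName": job_name,
        "LanguageCode": language_code,
        "MediaFormat": media_format,
        "Media": {"MediaFileUri": media_uri},
    }
    if vocabulary_name:
        # Refer to a custom vocabulary that is assumed to exist already.
        request["Settings"] = {"VocabularyName": vocabulary_name}
    return request


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    print(build_transcribe_request(
        job_name="weekly-standup",
        media_uri="s3://example-bucket/standup.mp3",
        media_format="mp3",
        vocabulary_name="org-terms",
    ))
```

With boto3 installed and AWS credentials configured, a dictionary like this could presumably be passed on as `boto3.client("transcribe").start_transcription_job(**request)`, provided the vocabulary has been created beforehand.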
Google Speech allows users to dictate emoticons as well. To do this, the user has to say the name of the emoticon that he/she wants to be typed. For example, just say “smiling emoji” or “winky face emoji”. This function is only available in the English language. Amazon has no such function to type emoticons by dictating them.
Also, Amazon’s Transcribe can add punctuation automatically, using machine learning, wherever it is required, whereas Google Speech does not do that.
On the whole, the two platforms, Amazon Transcribe and Google Speech, are equally competitive. Each beats the other on specific attributes: Google supports a far greater variety of languages than Amazon, while Amazon Transcribe offers custom vocabulary, unlike Google. In other respects they are quite alike, such as the limit on input audio per API call.
More or less, both platforms present a fair set of advantages and disadvantages in the spectrum of voice-to-text technology. It will be interesting to see how the two of them use machine learning algorithms in the future and come up with new tech to outdo each other.