Amazon Web Services (AWS), the cloud computing arm of the e-commerce giant Amazon, recently launched a machine learning (ML) service for automated text and data extraction. The service, known as Textract, is fully cloud-hosted and managed by AWS, and lets users extract text and structured data from documents.
The service is said to be more than just an optical character recognition algorithm, as it can parse tables, whole pages, forms, scans, PDFs, photos and more. It also identifies form fields and tables, so extracted values keep their context and the resulting datasets are cleaner.
The company states that it can process millions of document pages “accurately” in just a few hours. Results are returned as JSON and can be passed to other ML-based AWS services. What sets this product apart is that there is no code or template to maintain, and no ML experience is required to operate or manage it.
Amazon states that it has trained Textract on “tens of millions of documents from virtually every industry”, which it says makes the service suitable for use in any scenario. It can “automatically detect a document’s layout”, preserving the key elements on the page and collecting data by understanding the relationships between them.
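To make the idea of "relationships between the data" concrete, here is a minimal Python sketch that walks a Textract-style JSON result and pairs form keys with their values. It assumes the response shape of the form analysis API as commonly documented (a flat `Blocks` list in which `KEY_VALUE_SET` blocks point to their value and to their words through `Relationships`). The sample response is hand-written for illustration, and the helper names `block_text` and `extract_key_values` are hypothetical, not part of any AWS SDK.

```python
from typing import Dict, List


def block_text(block: dict, by_id: Dict[str, dict]) -> str:
    """Join the text of the WORD blocks listed as CHILD relations of `block`."""
    words: List[str] = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = by_id[child_id]
            # Other child types (e.g. checkboxes) are ignored in this sketch.
            if child["BlockType"] == "WORD":
                words.append(child["Text"])
    return " ".join(words)


def extract_key_values(response: dict) -> Dict[str, str]:
    """Map each form key to its value by following VALUE relationships."""
    by_id = {b["Id"]: b for b in response["Blocks"]}
    pairs: Dict[str, str] = {}
    for block in response["Blocks"]:
        is_key = (
            block["BlockType"] == "KEY_VALUE_SET"
            and "KEY" in block.get("EntityTypes", [])
        )
        if not is_key:
            continue
        value_parts: List[str] = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                for value_id in rel["Ids"]:
                    value_parts.append(block_text(by_id[value_id], by_id))
        pairs[block_text(block, by_id)] = " ".join(value_parts)
    return pairs


# Hand-written sample (an assumption, not real service output).
# With the AWS SDK, a similar structure would typically come from something like
# boto3.client("textract").analyze_document(..., FeatureTypes=["FORMS"]).
SAMPLE_RESPONSE = {
    "Blocks": [
        {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
         "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                           {"Type": "CHILD", "Ids": ["w1"]}]},
        {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
         "Relationships": [{"Type": "CHILD", "Ids": ["w2", "w3"]}]},
        {"Id": "w1", "BlockType": "WORD", "Text": "Name:"},
        {"Id": "w2", "BlockType": "WORD", "Text": "Jane"},
        {"Id": "w3", "BlockType": "WORD", "Text": "Doe"},
    ]
}

if __name__ == "__main__":
    print(extract_key_values(SAMPLE_RESPONSE))  # {'Name:': 'Jane Doe'}
```

The sketch works on a plain dictionary so it can be run without AWS credentials. A real response would contain many more block types and fields, such as page, line and table cell blocks with geometry and confidence scores, which are left out here.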
Amazon is billing it as a lower-cost, easier-to-use alternative to manual data entry. As with most cloud computing services, it is offered on a pay-as-you-go basis and is accessed through APIs. Swami Sivasubramanian, Vice President, Amazon Machine Learning, stated:
“Amazon Textract makes it possible for customers to gain real meaning from their file collections, operate more efficiently, improve security compliance, automate data entry, and facilitate faster business decisions.”
Currently, the service is available in US East (Ohio), US East (N. Virginia), US West (Oregon) and EU (Ireland), with Amazon stating that it will expand to further regions within the year.
Many prominent organizations have already begun using the service, including The Globe and Mail, a Canadian media outlet; the Met Office, the UK’s national weather service; and PricewaterhouseCoopers, one of the world’s biggest accounting firms. The rise of accessible ML models for document extraction might be the beginning of the end for low-level jobs such as manual data entry.