Data Extraction Just Got Smarter With ML: AWS Announces Textract

Amazon Web Services, the cloud computing arm of the e-commerce giant, recently launched an ML service for automated text and data extraction. The service, known as Textract, is fully cloud-hosted and managed by AWS, and allows users to parse various forms of data easily.

The service is said to be more than just an optical character recognition algorithm, as it can parse data tables, whole pages, forms, scans, PDFs, photos and more. Moreover, it also identifies fields and tables, so as to contextualize the data and allow for the collection of cleaner datasets with deeper insights.

The company states that Textract can process millions of document pages “accurately” in just a few hours. All extracted data is returned in JSON format and can be integrated easily with other ML-based AWS services. What sets this product apart, according to AWS, is that there is no need to maintain any code or templates, and that no ML experience is required to operate or manage it.
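To give a sense of what consuming that JSON output might look like, here is a minimal Python sketch that pulls form fields out of a Textract-style response. It assumes the block structure documented for the AnalyzeDocument API (a `Blocks` list whose entries carry `BlockType`, `EntityTypes` and `Relationships`). The sample response is hand-written for illustration and is not real service output, and the helper names `child_text` and `extract_form_fields` are hypothetical.

```python
from typing import Dict

# Hand-written sample that mimics the documented response shape.
# In practice the response would come from a call such as
# boto3.client("textract").analyze_document(..., FeatureTypes=["FORMS"]).
SAMPLE_RESPONSE = {
    "Blocks": [
        {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
         "Relationships": [{"Type": "CHILD", "Ids": ["w1"]},
                           {"Type": "VALUE", "Ids": ["v1"]}]},
        {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
         "Relationships": [{"Type": "CHILD", "Ids": ["w2", "w3"]}]},
        {"Id": "w1", "BlockType": "WORD", "Text": "Name:"},
        {"Id": "w2", "BlockType": "WORD", "Text": "Jane"},
        {"Id": "w3", "BlockType": "WORD", "Text": "Doe"},
    ]
}


def child_text(block: dict, by_id: Dict[str, dict]) -> str:
    """Join the text of all WORD blocks linked to `block` as CHILD."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = by_id.get(child_id)
            if child and child["BlockType"] == "WORD":
                words.append(child["Text"])
    return " ".join(words)


def extract_form_fields(response: dict) -> Dict[str, str]:
    """Map each detected form key to the text of its linked value block(s)."""
    by_id = {b["Id"]: b for b in response.get("Blocks", [])}
    fields: Dict[str, str] = {}
    for block in by_id.values():
        is_key = (block["BlockType"] == "KEY_VALUE_SET"
                  and "KEY" in block.get("EntityTypes", []))
        if not is_key:
            continue
        values = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                values.extend(child_text(by_id[v], by_id)
                              for v in rel["Ids"] if v in by_id)
        fields[child_text(block, by_id)] = " ".join(values)
    return fields


print(extract_form_fields(SAMPLE_RESPONSE))
```

The key and value blocks only reference their words by ID, so the sketch first indexes every block by `Id` and then follows the relationships, which is how the field-to-value context is carried in the output.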

Amazon states that it has trained Textract on “tens of millions of documents from virtually every industry”, which the company says makes it suitable for a wide range of scenarios. It can “automatically detect a document’s layout”, preserving the key elements on the page and improving data collection by understanding the relationships between the data.

Amazon is billing it as a lower-cost alternative to manual data entry, with ease-of-use benefits. Moreover, as with most cloud computing services, it is provided on a pay-as-you-go basis and is accessible through APIs. Swami Sivasubramanian, Vice President, Amazon Machine Learning, stated:

“Amazon Textract makes it possible for customers to gain real meaning from their file collections, operate more efficiently, improve security compliance, automate data entry, and facilitate faster business decisions.”

Currently, the service is available in US East (Ohio), US East (N. Virginia), US West (Oregon) and EU (Ireland), with Amazon stating that further expansion will happen within the year.

Several prominent organizations have already begun using the service, including The Globe and Mail, a Canadian media outlet; the Met Office, the UK’s national weather service; and PricewaterhouseCoopers, one of the world’s biggest accounting firms. The rise of accessible ML models for data extraction might be the beginning of the end for low-level jobs such as manual data entry.

Anirudh VK

I am an AI enthusiast and love keeping up with the latest events in the space. I love video games and pizza.
