Google launches AutoML Natural Language with improved text classification and model training
Earlier this year, Google took the wraps off AutoML Natural Language, an extension of its Cloud AutoML machine learning platform to the natural language processing domain. After a months-long beta, AutoML Natural Language today launched in general availability for customers globally, with support for tasks like classification, sentiment analysis, and entity extraction, as well as a range of file formats, including native and scanned PDFs.
By way of refresher, AutoML Natural Language taps machine learning to reveal the structure and meaning of text from emails, chat logs, social media posts, and more. It can extract information about people, places, and events from uploaded or pasted text and from documents in Google Cloud Storage, and it allows users to train their own custom AI models to classify, detect, and analyze things like sentiment, entities, content, and syntax. It also offers custom entity extraction, which enables the identification within documents of domain-specific entities that don’t appear in standard language models.
AutoML Natural Language supports over 5,000 classification labels and allows training on up to 1 million documents, each up to 10MB in size, which Google says makes it an excellent fit for “complex” use cases like comprehending legal files or document segmentation for organizations with large content taxonomies. It has been improved in the months since its reveal, specifically in the areas of text and document entity extraction. Google says that AutoML Natural Language now considers additional context (such as the spatial structure and layout information of a document) for model training and prediction to improve the recognition of text in invoices, receipts, resumes, and contracts.
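For teams sizing up a corpus against those limits, a pre-flight check might look something like the following. This is a minimal sketch in plain Python, not part of any Google SDK: the `Document` class and `check_training_set` function are hypothetical names, and reading “10MB” as 10 × 1024 × 1024 bytes is an assumption.

```python
from dataclasses import dataclass

# Limits as reported in the article. Treating "10MB" as 10 * 1024 * 1024
# bytes is an assumption; the exact byte count may differ.
MAX_DOCUMENTS = 1_000_000
MAX_DOCUMENT_BYTES = 10 * 1024 * 1024


@dataclass
class Document:
    """Hypothetical record describing one training document."""
    name: str
    size_bytes: int


def check_training_set(documents):
    """Return a list of problems; an empty list means the set fits the limits."""
    problems = []
    if len(documents) > MAX_DOCUMENTS:
        problems.append(
            f"{len(documents)} documents exceeds the limit of {MAX_DOCUMENTS}"
        )
    for doc in documents:
        if doc.size_bytes > MAX_DOCUMENT_BYTES:
            problems.append(
                f"{doc.name} is {doc.size_bytes} bytes, "
                f"over the {MAX_DOCUMENT_BYTES}-byte limit"
            )
    return problems


if __name__ == "__main__":
    docs = [Document("contract.pdf", 2_500_000), Document("scan.pdf", 12_000_000)]
    for problem in check_training_set(docs):
        print(problem)
```

Running the check locally before uploading would flag oversized files early, though the service itself remains the authority on what it accepts.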
Additionally, Google says that AutoML Natural Language is now FedRAMP-authorized at the Moderate level, meaning it has been vetted according to U.S. government specifications for data where the impact of loss is limited or serious. It says that this, along with newly introduced functionality that lets customers create a data set, train a model, and make predictions while keeping the data and related machine learning processing within a single server region, makes it easier for federal agencies to take advantage of the service.
Already, Hearst is using AutoML Natural Language to help organize content across its domestic and international magazines, and Japanese publisher Nikkei Group is leveraging AutoML Translation to publish articles in different languages. Chicory, a third early adopter, tapped AutoML Natural Language to develop custom digital shopping and marketing solutions for grocery retailers like Kroger, Amazon, and Instacart.
The ultimate goal is to give organizations, researchers, and businesses that require custom machine learning models a simple, no-frills way to train them, explained product manager for natural language Lewis Liu in a blog post. “Natural language processing is a valuable tool used to reveal the structure and meaning of text,” he said. “We’re continuously improving the quality of our models in partnership with Google AI research through better fine-tuning techniques, and larger model search spaces. We’re also introducing more advanced features to help AutoML Natural Language understand documents better.”
Notably, the launch of AutoML Natural Language follows on the heels of Amazon Textract, AWS’ machine learning service for text and data extraction, which debuted in May. Microsoft offers a comparable service in Azure Text Analytics.