Google releases SimCLR, an AI framework that can classify images with limited labeled data

Source: venturebeat.com

A team of Google researchers recently detailed a framework called SimCLR, which improves on previous approaches to self-supervised learning, a family of techniques that convert an unsupervised learning problem (i.e., one in which AI models train on unlabeled data) into a supervised one by creating labels from unlabeled data sets. In a preprint paper and accompanying blog post, they say that SimCLR achieved a new record for image classification with a limited amount of annotated data and that it’s simple enough to be incorporated into existing supervised learning pipelines.

That could spell good news for enterprises applying computer vision to domains with limited labeled data.

SimCLR learns basic image representations on an unlabeled corpus and can be fine-tuned with a small set of labeled images for a classification task. The representations are learned through a method called contrastive learning, where the model simultaneously maximizes agreement between differently transformed views of the same image and minimizes agreement between transformed views of different images.
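
To make the idea of maximizing and minimizing agreement concrete, here is a minimal sketch of one contrastive objective of this kind: a temperature-scaled cross-entropy over cosine similarities. It is an illustration under stated assumptions, not Google's code. The function names, the temperature value of 0.5, and the toy two-dimensional embeddings are all made up for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def contrastive_loss(views_a, views_b, temperature=0.5):
    """Average contrastive loss over a batch (illustrative, not SimCLR's code).

    views_a[i] and views_b[i] are embeddings of two transformed views of
    image i (a positive pair). Every other embedding in the batch is
    treated as a negative. The temperature default is an assumption.
    """
    embeddings = list(views_a) + list(views_b)
    n = len(views_a)
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of the same image
        # Similarity of anchor i to every other embedding, scaled by temperature.
        logits = {j: cosine(embeddings[i], embeddings[j]) / temperature
                  for j in range(2 * n) if j != i}
        log_denominator = math.log(sum(math.exp(s) for s in logits.values()))
        # Cross-entropy with the positive treated as the correct class.
        total += log_denominator - logits[positive]
    return total / (2 * n)

# Toy embeddings: in `aligned`, each pair of views points the same way;
# in `shuffled`, each view is closer to the other image's view.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]])
shuffled = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.1, 0.9], [0.9, 0.1]])
print(aligned < shuffled)
```

The loss is lower when the two views of each image agree with each other more than they agree with views of other images, which is the behavior the training procedure rewards.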

SimCLR first randomly draws examples from the original data set and transforms each one twice, using cropping, color distortion, and blurring, to create two sets of corresponding views. It then computes the image representation using a machine learning model, after which it generates a projection of the image representation using a module that maximizes SimCLR’s ability to identify different transformations of the same image. Finally, following the pretraining stage, SimCLR’s output can be used as the representation of an image or tailored with labeled images to achieve good performance for specific tasks.
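
The steps in that description can be sketched with toy stand-ins. In the sketch below, an "image" is just a short list of brightness values, and every function (`random_crop`, `color_jitter`, `blur`, `encode`, `project`) is a hypothetical simplification chosen for readability. The encoder and projection weights are placeholders, not the models Google trained.

```python
import random

def random_crop(pixels, size, rng):
    """Keep a random contiguous window of `size` values."""
    start = rng.randint(0, len(pixels) - size)
    return pixels[start:start + size]

def color_jitter(pixels, strength, rng):
    """Scale brightness by a random factor and clamp to [0, 1]."""
    factor = 1.0 + rng.uniform(-strength, strength)
    return [min(1.0, max(0.0, p * factor)) for p in pixels]

def blur(pixels):
    """3-tap box blur; edges average over the neighbours that exist."""
    out = []
    for i in range(len(pixels)):
        window = pixels[max(0, i - 1):i + 2]
        out.append(sum(window) / len(window))
    return out

def augment(pixels, rng, crop_size=6):
    """Crop, color-distort, then blur with probability 0.5 (assumed values)."""
    view = color_jitter(random_crop(pixels, crop_size, rng), 0.4, rng)
    return blur(view) if rng.random() < 0.5 else view

def make_views(batch, rng):
    """Return two lists of views; views_a[i] and views_b[i] come from batch[i]."""
    views_a = [augment(x, rng) for x in batch]
    views_b = [augment(x, rng) for x in batch]
    return views_a, views_b

def encode(view):
    """Stand-in encoder: mean and spread of the pixel values."""
    return [sum(view) / len(view), max(view) - min(view)]

def project(representation, weights):
    """Stand-in projection head: a single linear layer."""
    return [sum(w * r for w, r in zip(row, representation)) for row in weights]

rng = random.Random(0)
batch = [[i / 10 for i in range(8)], [1.0 - i / 10 for i in range(8)]]
views_a, views_b = make_views(batch, rng)
weights = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]  # hypothetical 2 -> 3 projection
projections = [project(encode(v), weights) for v in views_a + views_b]
print(len(projections), len(projections[0]))
```

In a real pipeline the contrastive objective is computed on the projections during pretraining, while the encoder output is what gets reused or fine-tuned afterward, as the paragraph above describes.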

Google says that in experiments SimCLR achieved 85.8% top-5 accuracy on the ImageNet benchmark when fine-tuned on only 1% of the labels, compared with the previous best approach’s 77.9%.
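
For readers unfamiliar with the metric, top-5 accuracy counts a prediction as correct when the true label appears among the model's five highest-scoring classes. A minimal sketch of the calculation follows; the function name, scores, and labels are invented for illustration and are unrelated to the reported results.

```python
def top_k_accuracy(scores, labels, k=5):
    """Fraction of examples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest score to lowest.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)

# Three made-up examples over seven classes.
scores = [
    [0.05, 0.10, 0.30, 0.20, 0.15, 0.12, 0.08],  # true class 4 ranks third
    [0.40, 0.25, 0.15, 0.10, 0.05, 0.03, 0.02],  # true class 6 ranks last
    [0.02, 0.03, 0.05, 0.10, 0.15, 0.25, 0.40],  # true class 6 ranks first
]
labels = [4, 6, 6]

top5 = top_k_accuracy(scores, labels, k=5)  # 2 of 3 examples count as hits
top1 = top_k_accuracy(scores, labels, k=1)  # only the third example counts
print(top5, top1)
```

Top-5 is a more forgiving measure than top-1, which is why the two are usually reported separately.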

“[Our results show that] pretraining on large unlabeled image data sets has the potential to improve performance on computer vision tasks,” wrote research scientist Ting Chen and Geoffrey Hinton, a Google Research VP, engineering fellow, and Turing Award winner, in a blog post. “Despite its simplicity, SimCLR greatly advances the state of the art in self-supervised and semi-supervised learning.”

Both the code and pretrained models of SimCLR are available on GitHub.
