Image recognition is one of the most important components of artificial intelligence systems today. Facebook has one of the most capable AI groups in the world and has managed to attract some of the best people in the field to innovate further. The company continues to make progress in AI and computer vision, with recent research focussing on generating audio captions of photos to assist visually impaired users. But however good these machine learning models are, they feed on data and labels, and labels are expensive to produce. To tackle this problem, Facebook came up with an innovative way to push the power of deep learning even further: using the hashtags that users attach to their Instagram posts.
The research on weakly supervised learning was a broad collaboration between Facebook’s Applied Machine Learning (AML) group and Facebook Artificial Intelligence Research (FAIR). The paper, titled Exploring the Limits of Weakly Supervised Pre-Training, was written by Dhruv Mahajan, Ross Girshick, Vignesh Ramanathan, Kaiming He, Manohar Paluri, Yixuan Li, Ashwin Bharambe, and Laurens van der Maaten. Currently, models are trained on data that has been hand-annotated by individuals, and this approach is hard to scale: simply giving the system more images does not work, because every additional image must first be labelled, and hand-labelling is labour-intensive. Supervised learning often yields the best performance, but hand-labelled datasets are already nearing their functional limits in terms of size. Facebook currently trains some models on as many as 50 million images, and scaling up to billions of training images is infeasible when all supervision has to be supplied by hand.
The Hashtag Idea
The researchers and engineers at Facebook tackled the problem of scaling up datasets by building training sets from public images and their hashtags, the largest of which included 3.5 billion images and 17,000 hashtags. At the heart of this approach is the use of existing, user-supplied hashtags as labels, which replaces the process of manually categorising each picture. The researchers say that this approach helped massively in the testing phase. By training their computer vision system on a one-billion-image version of this dataset, they achieved a record-high score of 85.4 percent accuracy on ImageNet, a common benchmarking tool.
This simple yet powerful idea has delivered genuine breakthroughs in image recognition performance and has thrown a spotlight on the path from supervised learning to weakly supervised learning. Facebook also plans to open source the embeddings of these models in the future, so that the research community at large can use and build on these representations for high-level tasks. So remember, whenever you add hashtags on Instagram you are helping researchers build better machine learning models (#respect). The privacy implications of this move are, of course, debatable. Should Facebook ask your permission before using your hashtags? To be fair, Facebook is only picking up hashtags from profiles and photos that are visible to the general public.
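To make the idea of "hashtags as labels" concrete, here is a minimal sketch of how a post's hashtags could be turned into a training target. The tiny vocabulary, the function name `hashtags_to_target`, and the choice to split the target mass evenly across a post's in-vocabulary hashtags are all assumptions made for illustration; the article does not describe the exact encoding the researchers used.

```python
from typing import Dict, List


def hashtags_to_target(hashtags: List[str], vocab: Dict[str, int]) -> List[float]:
    """Build a target vector over the hashtag vocabulary for one image.

    Assumption: each of the k in-vocabulary hashtags gets weight 1/k.
    Hashtags outside the vocabulary (for example #tbt) are ignored.
    """
    normalised = {tag.lower().lstrip("#") for tag in hashtags}
    indices = sorted(vocab[tag] for tag in normalised if tag in vocab)

    target = [0.0] * len(vocab)
    for index in indices:
        target[index] = 1.0 / len(indices)
    return target


# Hypothetical three-tag vocabulary; the real ones had thousands of hashtags.
vocab = {"dog": 0, "beach": 1, "party": 2}
target = hashtags_to_target(["#Dog", "#beach", "#tbt"], vocab)
print(target)
```

No human annotator is involved at any point: the label comes entirely from what the user typed, which is what makes the supervision "weak".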
Using Hashtags At Scale
Hashtags are very popular among Instagram users, and the researchers saw this as an ideal source of training data for their models. It lets them use hashtags as they were intended: to make images more accessible, based on what people assume others will find relevant. But hashtags also often reference non-visual and abstract concepts, such as #tbt for “throwback Thursday”, or #metoo for the international movement against sexual harassment and assault. The researchers also mentioned vague tags such as #party, which could describe an activity, a setting, or both. For image recognition purposes, tags function as weakly supervised data, and vague or irrelevant hashtags appear as incoherent label noise that can confuse deep learning models.
Clearly, there had to be a way to use hashtags as training labels while weeding out the useless ones. Since there are often multiple hashtags per image, the researchers had to sort through hashtag synonyms and balance the influence of frequent hashtags against rare ones. The approach showed excellent transfer learning results, which means the models could be used in other applications of AI. This new work builds on previous research at Facebook, including investigations of image classification based on user comments, hashtags, and videos.
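The two steps mentioned here, merging synonymous hashtags and balancing frequent tags against rare ones, can be sketched as follows. This is only an illustration under stated assumptions: the `SYNONYMS` table is a hypothetical hand-written mapping, and weighting each tag by the inverse square root of its frequency is one common balancing heuristic, not necessarily the scheme the researchers used.

```python
from collections import Counter
from math import sqrt
from typing import Dict, List

# Hypothetical synonym table mapping variant hashtags to one canonical tag.
SYNONYMS = {"doggo": "dog", "puppy": "dog", "seaside": "beach"}


def canonicalize(tag: str) -> str:
    """Lower-case a hashtag, strip the leading '#', and merge known synonyms."""
    tag = tag.lower().lstrip("#")
    return SYNONYMS.get(tag, tag)


def tag_weights(posts: List[List[str]]) -> Dict[str, float]:
    """Return a weight per canonical tag that shrinks as the tag gets more common.

    Assumption: weight = 1 / sqrt(number of posts containing the tag).
    """
    counts = Counter(
        tag
        for post in posts
        for tag in {canonicalize(raw) for raw in post}  # count each tag once per post
    )
    return {tag: 1.0 / sqrt(count) for tag, count in counts.items()}


posts = [["#dog", "#puppy"], ["#doggo"], ["#dog", "#beach"], ["#Dog"]]
weights = tag_weights(posts)
print(weights)
```

In this toy data the merged tag "dog" appears in four posts and "beach" in one, so "dog" ends up with half the weight of "beach". A weight like this could be used to scale the loss or the sampling probability so that a handful of very popular hashtags does not dominate training.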
Results and Future Scope
The researchers hoped to see some performance gains in image recognition, but the results still surprised them. On the ImageNet image recognition benchmark, one of the most common benchmarks in the field, the best model achieved 85.4 percent accuracy by training on one billion images with a vocabulary of 1,500 hashtags. The use of billions of images along with hashtags for deep learning leads to relative improvements of up to 22.5 percent. On another major benchmark, the COCO object-detection challenge, the researchers found that using hashtags for pretraining can boost the average precision of a model by more than 2 percent.
The gains in image recognition are confirmation that training computer vision models on hashtags can work. The researchers used some basic techniques that merge similar hashtags and down-weight others. Beyond that, there was no need for complex “cleaning” procedures to eliminate label noise, because networks trained on billions of images were shown to be robust to it, even though many hashtags were useless as labels. The researchers envision other ways to use hashtags as labels for computer vision. Possible applications include using the technique to understand video footage or to improve image ranking in Facebook feeds. Hashtags can also be used to put images into categories and subcategories.
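A "relative improvement" is measured against the baseline score rather than in absolute percentage points, and the two are easy to confuse. The short sketch below shows the difference; the baseline and new scores in it are made-up numbers chosen for easy arithmetic, not results from the research.

```python
def absolute_gain(baseline: float, new: float) -> float:
    """Improvement in percentage points."""
    return new - baseline


def relative_improvement(baseline: float, new: float) -> float:
    """Improvement as a percentage of the baseline score."""
    return 100.0 * (new - baseline) / baseline


# Hypothetical scores, for illustration only.
baseline_score = 50.0
new_score = 60.0

points = absolute_gain(baseline_score, new_score)
relative = relative_improvement(baseline_score, new_score)
print(points, relative)
```

With these illustrative numbers, a gain of 10 percentage points corresponds to a 20 percent relative improvement, which is why a relative figure can look much larger than the change in the headline accuracy.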