The accuracy of state-of-the-art object detection systems is often under the scanner, and for good reason. From unlocking a phone to steering a self-driving car, object detection is almost everywhere.
As computer vision applications grow in popularity, it has become crucial to keep their flaws in check, or at least to detect them in the first place. These flaws are usually the product of unhealthy data collection strategies and biases — both inductive and engineered.
A wrongly identified image can produce results that spiral into catastrophe.
For example, consider an extreme case of failed recognition, where a self-driving car classifies a group of bystanders as dry twigs, or some other insignificant object, and runs over them. The misclassification is absurd, but who is to blame? After all, machine learning models are called black-box models for a reason.
In an attempt to evaluate the performance of existing object detection systems, a group of researchers at Facebook conducted a study on something unusual yet ingenious: object recognition for household items across different countries and income levels.
Soap In London Is Sandwich In Nepal
In their investigation of current object recognition systems, the authors found that these systems are less effective at recognising household items that are common in non-Western countries or in low-income communities.
When used to recognise such items, the error rate of object-recognition systems for households with a monthly income of less than (USD) $50 is approximately 10% higher than for households making more than $3,500 per month.
As can be seen in the above illustration, the labels returned by Google, Amazon and other top services for the ground truth ‘soap’ range from accurate to absurd.
The absolute difference in accuracy between recognising items in the United States and recognising them in Somalia is around 15-20%. The authors claim that these findings are consistent across a range of commercial cloud services for image recognition.
For the experiments, the Dollar Street image dataset, a collection of photographs of common household items, was used. Subjective classes like “most loved items” were removed, and the analysis was done on the remaining classes. Because each photograph is annotated with the location where it was taken and the monthly consumption income of the photographed family, the dataset is a natural setting for this analysis.
The accuracy of the object recognition systems was measured through cloud services, namely, the systems provided by Microsoft Azure, Clarifai, Google Cloud Vision, Amazon Rekognition, and IBM Watson.
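As a rough illustration of this kind of analysis (not the authors' actual code), per-group accuracy can be computed by bucketing recognition results by household income and checking whether the ground-truth label appears among a system's predicted labels. The group names and sample records below are hypothetical:

```python
from collections import defaultdict

def accuracy_by_group(records):
    """Compute recognition accuracy for each group key.

    Each record is (group, ground_truth, predicted_labels): a prediction
    counts as correct if the ground-truth label appears in the predictions.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, truth, predictions in records:
        totals[group] += 1
        if truth in predictions:
            hits[group] += 1
    return {g: hits[g] / totals[g] for g in totals}

# Hypothetical results for two income buckets (not real study data).
records = [
    ("income<$50", "soap", ["food", "sandwich"]),
    ("income<$50", "soap", ["soap", "toiletry"]),
    ("income>$3500", "soap", ["soap", "dispenser"]),
    ("income>$3500", "soap", ["soap", "bottle"]),
]
print(accuracy_by_group(records))
# {'income<$50': 0.5, 'income>$3500': 1.0}
```

The gap between the two buckets in this toy output is the quantity the study reports: the difference in accuracy between low- and high-income households.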
In addition to the cloud-based systems, the authors also analysed a state-of-the-art object recognition system trained exclusively on publicly available data: a ResNet-101 model trained on the Tencent ML Images dataset, which achieves an ImageNet validation accuracy of 78.8% (top-1 accuracy).
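Top-1 accuracy, the metric quoted above, simply asks whether the model's single highest-scoring class matches the true label. A minimal sketch of how it is computed (with made-up scores, not actual ResNet-101 outputs):

```python
def top1_accuracy(scores, labels):
    """Fraction of examples whose highest-scoring class equals the label.

    `scores` is a list of per-class score lists; `labels` holds the
    index of the correct class for each example.
    """
    correct = 0
    for row, label in zip(scores, labels):
        predicted = max(range(len(row)), key=row.__getitem__)  # argmax
        if predicted == label:
            correct += 1
    return correct / len(labels)

# Toy example: 3 examples over 4 classes (hypothetical scores).
scores = [
    [0.1, 0.7, 0.1, 0.1],  # argmax = 1
    [0.6, 0.2, 0.1, 0.1],  # argmax = 0
    [0.2, 0.2, 0.5, 0.1],  # argmax = 2
]
labels = [1, 0, 3]
print(top1_accuracy(scores, labels))  # 2/3
```

A "78.8% top-1" figure means the same calculation over the 50,000 ImageNet validation images yields 0.788.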
Why Such Inaccuracies?
The authors observe that there are two main reasons for the discrepancies surfacing in image classification tasks:
- The geographical sampling of image datasets is unrepresentative of the world population distribution and
- Most image datasets were gathered using English as the “base language”.
In this choropleth map, red indicates an accuracy of around 60% and green an accuracy of around 90%. High-income, highly developed nations like the US show better classification accuracy, whereas parts of Eastern Africa appear red. The results indicate that the economic status of a location makes a measurable difference.
Pursuit For Fairness
With great ML deployment comes great responsibility. Studies such as this keep pushing for fairness in machine learning applications while exposing the lack of diversity in datasets. If there are notable disparities between recognising something as ordinary as a toothbrush in a well-furnished room and recognising it in a poorer setting, then a prudent evaluation of existing datasets is urgently needed.
You can read the full study here.