With machine learning, the world relies on technology for recommendation and recognition systems. But many of these systems are compromised because they carry certain biases and are therefore not accurate in how they function.
Here are different kinds of biases that have existed in ML algorithms.
Kinds Of Biases
An investigation by ProPublica, a non-profit investigative newsroom, found that COMPAS, a machine learning algorithm used to determine criminal defendants’ likelihood to recommit crimes, was biased in how it made predictions. The algorithm is used by judges in over a dozen states to make decisions on pre-trial conditions and, sometimes, in actual sentencing. ProPublica found that the algorithm was two times more likely to incorrectly predict that black defendants were high risk for recommitting a crime and, conversely, two times more likely to incorrectly predict that white defendants were low risk for recommitting a crime.
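The kind of disparity described above can be checked by comparing error rates per group. The following is only a minimal sketch: the function name `error_rates_by_group`, the group labels "A" and "B", and the toy records are all made up for illustration and are not ProPublica's data or method.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Return {group: (false_positive_rate, false_negative_rate)}.

    Each record is a hypothetical tuple:
    (group, predicted_high_risk, actually_reoffended).
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, predicted, actual in records:
        if predicted and actual:
            counts[group]["tp"] += 1
        elif predicted and not actual:
            counts[group]["fp"] += 1  # wrongly labelled high risk
        elif not predicted and actual:
            counts[group]["fn"] += 1  # wrongly labelled low risk
        else:
            counts[group]["tn"] += 1

    rates = {}
    for group, c in counts.items():
        negatives = c["fp"] + c["tn"]  # people who did not reoffend
        positives = c["fn"] + c["tp"]  # people who did reoffend
        fpr = c["fp"] / negatives if negatives else float("nan")
        fnr = c["fn"] / positives if positives else float("nan")
        rates[group] = (fpr, fnr)
    return rates


# Made-up toy records, chosen only so the two groups differ.
records = (
    [("A", True, False)] * 4 + [("A", False, False)] * 6
    + [("A", True, True)] * 8 + [("A", False, True)] * 2
    + [("B", True, False)] * 2 + [("B", False, False)] * 8
    + [("B", True, True)] * 6 + [("B", False, True)] * 4
)

rates = error_rates_by_group(records)
for group, (fpr, fnr) in sorted(rates.items()):
    print(group, "false positive rate:", fpr, "false negative rate:", fnr)
```

A model can look reasonable on overall accuracy while its false positive and false negative rates differ sharply between groups, which is why the rates are computed separately here.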
In a case that was posted on Twitter, the image recognition in Google Photos tagged a user and his friend as gorillas. The machine learning algorithm in the Google Photos software had learned incorrectly.
The literature has highlighted racial bias and its impact on people's lives when algorithms are programmed into AI systems. In contrast, literature on gender bias is still in its early stages; most of the content written about the topic consists of news articles that haven’t been backed by academic studies.
A study using AdFisher, a tool built by researchers at Carnegie Mellon University, revealed that men were six times more likely than women to see Google ads for high paying jobs. The immediate consequence of this machine bias is that a woman may not see a high paying job and is therefore less likely to know about it and apply. The long-term result could be more ingrained gender discrepancies in high ranking positions. This is an instance of an algorithm being gender biased, and it might be a result of the data containing unequal distributions of occupation types with respect to gender.
One project on word embeddings highlighted that the biases in the embeddings are in fact closely aligned with the social conception of gender stereotypes. Stereotypes have been described as both unconscious and conscious biases that are held among a group of people. A number of research studies have explored how stereotypes contribute to the data being used to train AI.
Human Biases That Can Result In ML Biases
1.Reporting Bias/Sample Bias:
Reporting bias occurs when the frequency of events, properties and results in a data set does not accurately reflect their real-world frequency. This bias can arise because people tend to focus on documenting circumstances that are unusual or especially memorable. To avoid it, large datasets with a representative distribution should be used. There is virtually no situation where an algorithm can be trained on the entire universe of data it could interact with, but there should be a way to choose a subset of this universe that is large enough and also a good representation of it.
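One common way to pick a subset that stays representative is stratified sampling, where each category is sampled in proportion to its size. This is a minimal sketch, not a method the article prescribes; the helper `stratified_sample`, the "common"/"rare" categories and the 10% fraction are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(items, key, fraction, seed=0):
    """Sample the same fraction from every stratum defined by key(item)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    strata = defaultdict(list)
    for item in items:
        strata[key(item)].append(item)

    sample = []
    for members in strata.values():
        # Keep at least one item so small strata are never dropped.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: 900 ordinary events and 100 unusual ones.
population = [("common", i) for i in range(900)] + [("rare", i) for i in range(100)]

sample = stratified_sample(population, key=lambda item: item[0], fraction=0.1)
rare_count = sum(1 for kind, _ in sample if kind == "rare")
common_count = sum(1 for kind, _ in sample if kind == "common")
print("sample size:", len(sample), "common:", common_count, "rare:", rare_count)
```

The subset keeps the 9:1 ratio of the population, so memorable but rare events are neither over-documented nor lost.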
Bias also arises when the training data is influenced by stereotypes, for example cultural ones. The training data should contain balanced numbers of all kinds of data, and the decisions encoded in it should not reflect social stereotypes. This bias can be reduced by not taking into account facts that tie occupations to gender.
Measurement bias tends to skew the data in a particular direction. It results in systematic value distortion and is attributed to a fault in the device used to measure.
Automation bias is a tendency to favour results generated by automated systems over those generated by non-automated systems. It does not take into account the error rates of both the automated and non-automated systems.
5.Group attribution Bias:
This bias attributes a particular trait to an entire group. For example, if the majority of women work in the design industry and the majority of men in hardware, it tends to assume those professions for all women and all men respectively.
In machine learning, bias is also a mathematical property of an algorithm. The counterpart to bias in this context is variance. ML algorithms with high variance can easily fit the training data and accommodate complexity, but they are sensitive to noise. High-bias models are more rigid and are not largely affected by variations in the data.
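The trade-off between bias and variance can be estimated empirically by refitting a model on many noisy training sets. The sketch below is an illustration under stated assumptions: the true function `x * x`, the noise level, and the two toy models (a constant mean predictor as the rigid model, a 1-nearest-neighbour predictor as the flexible one) are all hypothetical choices, not taken from the article.

```python
import random
import statistics


def true_fn(x):
    # Hypothetical ground truth that the models try to learn.
    return x * x


def make_training_set(rng, n=20, noise_sd=0.3):
    xs = [i / (n - 1) for i in range(n)]
    return [(x, true_fn(x) + rng.gauss(0.0, noise_sd)) for x in xs]


def fit_mean(train):
    """Rigid model: always predicts the mean training target."""
    mean_y = statistics.fmean(y for _, y in train)
    return lambda x: mean_y


def fit_nearest(train):
    """Flexible model: copies the target of the nearest training point."""
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]


def bias2_and_variance(fit, test_xs, trials=200, seed=0):
    """Estimate squared bias and variance, averaged over the test points."""
    rng = random.Random(seed)
    preds = {x: [] for x in test_xs}
    for _ in range(trials):
        model = fit(make_training_set(rng))
        for x in test_xs:
            preds[x].append(model(x))
    bias2 = statistics.fmean(
        (statistics.fmean(preds[x]) - true_fn(x)) ** 2 for x in test_xs
    )
    variance = statistics.fmean(statistics.pvariance(preds[x]) for x in test_xs)
    return bias2, variance


test_xs = [0.05, 0.45, 0.95]
rigid_bias2, rigid_var = bias2_and_variance(fit_mean, test_xs)
flexible_bias2, flexible_var = bias2_and_variance(fit_nearest, test_xs)
print("rigid model:    bias^2 =", rigid_bias2, "variance =", rigid_var)
print("flexible model: bias^2 =", flexible_bias2, "variance =", flexible_var)
```

In this setup the rigid model barely changes from one training set to the next but is systematically off, while the flexible model follows every noisy point.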
What Can Be Done To Prevent Biases
Biases in ML algorithms come largely as a result of the training data being biased. Humans naturally tend to be at least slightly biased, and that is often reflected in these ML systems, which may then harm human lives unfairly. Some consequences of these biased algorithms may come across as hypothetical and indirect, while others are direct and immediate.
To have a good, unbiased machine learning system, the model should be trained with as much training data as possible. Additionally, this training data should not be lopsided towards one kind; it should contain plenty of samples of all kinds. If the training data has inherent biases, the model will naturally learn and adapt to them. Not just that, it can also amplify these biases, so the final result would be even more biased.
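When collecting more samples of the under-represented kind is not possible, one common mitigation is to weight each sample inversely to how frequent its class is. This is a swapped-in technique rather than something the article describes; the helper `balanced_weights` and the 90/10 split are assumptions for illustration. The formula is the same "balanced" heuristic that scikit-learn documents for class weights.

```python
from collections import Counter


def balanced_weights(labels):
    """Weight each sample by total / (n_classes * count_of_its_class)."""
    counts = Counter(labels)
    n_classes = len(counts)
    total = len(labels)
    return [total / (n_classes * counts[label]) for label in labels]


# Hypothetical lopsided training labels.
labels = ["majority"] * 90 + ["minority"] * 10
weights = balanced_weights(labels)

# Total weight carried by each class after reweighting.
weight_per_class = Counter()
for label, weight in zip(labels, weights):
    weight_per_class[label] += weight
print(dict(weight_per_class))
```

After reweighting, both classes contribute the same total weight to training, even though one has nine times as many samples.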
Another problem with biased algorithms is that a parameter that appears to have no bias might inherently contain some hidden biases. But removing these variables is not an option. Instead, allow the model to measure the bias correctly and then subtract the effect of that bias on the outcome, for an unbiased approach.
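One way to read "measure it and then subtract it" is to keep the sensitive variable in a linear model so its effect is estimated explicitly, and then remove that term at prediction time. The sketch below is only one possible interpretation: the salary data, the coefficients 30, 5 and -8, and the helpers `solve`, `fit_ols` and `predict` are all hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, targets):
    """Least-squares coefficients from the normal equations."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef, experience, group, neutralize=False):
    """Predict salary; with neutralize=True the group effect is subtracted."""
    intercept, w_exp, w_group = coef
    raw = intercept + w_exp * experience + w_group * group
    return raw - w_group * group if neutralize else raw


# Hypothetical biased history: group 1 was paid 8 less at equal experience.
# Experience is correlated with group, so dropping the group column would
# let experience absorb part of the bias instead of removing it.
rows, targets = [], []
for group, start in ((0, 3), (1, 0)):
    for experience in range(start, start + 10):
        rows.append([1.0, float(experience), float(group)])
        targets.append(30.0 + 5.0 * experience - 8.0 * group)

coef = fit_ols(rows, targets)
biased = predict(coef, experience=4, group=1)
debiased = predict(coef, experience=4, group=1, neutralize=True)
reference = predict(coef, experience=4, group=0, neutralize=True)
print("measured group effect:", coef[2])
print("biased:", biased, "debiased:", debiased, "reference:", reference)
```

With the group term removed, two people with the same experience get the same prediction regardless of group. Real data is noisy and the effect is rarely this cleanly linear, so this should be treated as an illustration of the idea only.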