Not a day goes by without news articles about artificial intelligence capabilities and their effects on our lives. The implications for the economy and the workforce are profound. Companies already use AI for numerous tasks: marketers use it to persuade consumers to buy their products, the financial industry uses it to process credit applications, and newer applications such as medical diagnosis are emerging.
Digital Footprint Is A Big Risk Factor For Data Privacy
Big data is part of the fuel mix propelling these advances: vast datasets, frequently made up of personal data, are constantly and rapidly growing. Beyond basic user information and demographics, companies want to know about users' shopping and personal habits.
They start by collecting the data you explicitly entrust to them, then add what can be gleaned from your online transactions and from the breadcrumbs you leave behind through your daily use of communication networks and as you pass by sensors. Lastly, they buy data to complete the picture, from your ethnicity, education level and job history to the topics you talk about online. As social media and smartphones expand our digital footprint at amazing speed, this task gets easier by the day.
Such use of personal data has obvious privacy implications, especially where personally identifiable information (PII) can single out individuals and may be sensitive. A commonplace, prevalent fear is that individuals must be careful about what personal information makes its way online, because it will be there forever, or because it cannot be rectified. Since 1995, policymakers have begun to act, with the European Union spearheading policy efforts to propose digital rights for citizens.
This week, the new EU General Data Protection Regulation (GDPR) is due to enter into force, strengthening the role of consent for personal data processing, adding digital rights for citizens, and focusing on how organisations should design their data privacy and protection processes.
Data Privacy In AI Applications
I’d like to concentrate on the tension between the need for privacy and the need for usable data in AI applications, a tension that exists because of so-called predicted privacy harms. The individual pieces of the personal data jigsaw may each be satisfactorily covered by the citizen’s consent; but once they are aggregated and fed to AI algorithms, those algorithms can produce new, sensitive PII. The famous NYT article about Target predicting pregnancy shows how purchasing data can be mined for patterns that reveal particularly sensitive information about people. Such inferences may become real harms, whether they are correct or not. A correct inference that someone has a health issue might have an impact on their employment or health insurance; while an incorrect inference about a woman’s pregnancy might lead to discrimination, such as not being granted a job interview.
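To make the aggregation argument concrete, here is a minimal sketch of how individually innocuous purchase records can be mined for a sensitive signal. All the data, items and the "lift" threshold are invented for illustration; real retail analytics operates at vastly larger scale, but the mechanism is the same.

```python
# Toy purchase baskets. Each entry pairs a set of items with a known
# label for a historical subset of shoppers (e.g. whether they later
# signed up for a baby registry). All data here is invented.
baskets = [
    ({"unscented lotion", "zinc supplement", "cotton balls"}, True),
    ({"unscented lotion", "magnesium supplement"}, True),
    ({"scented lotion", "shampoo"}, False),
    ({"cotton balls", "shampoo"}, False),
    ({"unscented lotion", "cotton balls"}, True),
    ({"scented candle", "shampoo"}, False),
]

def lift(item: str) -> float:
    """How much more likely is the sensitive label among buyers of
    `item` than among shoppers overall? A lift well above 1.0 means
    the innocuous item is a strong proxy for the sensitive trait."""
    with_item = [label for items, label in baskets if item in items]
    p_given_item = sum(with_item) / len(with_item)
    p_overall = sum(label for _, label in baskets) / len(baskets)
    return p_given_item / p_overall

print(lift("unscented lotion"))  # 2.0: a strong, sensitive signal
print(lift("shampoo"))           # 0.0: no signal at all
```

No single basket here reveals anything sensitive, and each purchase may be covered by consent; the sensitive inference only appears once the data is aggregated and mined.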
For these reasons, the GDPR has introduced a principle of transparency: before data is processed, privacy notices must be given to citizens to ensure that they have full knowledge of what will happen to their data. In some cases, privacy impact assessments are required to identify and mitigate privacy risks. It has also introduced a principle of fairness, through a right to explanation after processing: citizens have the right to obtain human intervention, to express their point of view, to obtain an explanation of decisions based on automated processing, and to challenge such decisions.
These new rights have spurred a heated debate. A recent report by the Center for Data Innovation argues that, while intended to protect consumer interests, the GDPR’s provisions addressing AI may slow down research and innovation. The report enumerates aspects of the GDPR that could have a negative effect on the development of AI in Europe, of which the top two refer to the right to explanation:
- Requiring companies to manually review significant algorithmic decisions raises the overall cost of AI.
- The right to explanation could reduce AI accuracy.
The report argues that the more variables an AI algorithm represents in its model, and the more complex the links between them, the harder it is for humans to assess how the algorithm arrived at any decision. This suggests a trade-off between accuracy and interpretability, leading the report to conclude that requiring a human explanation of algorithmic decisions necessarily limits the accuracy of the algorithm. Reported experiments illustrate how such algorithms (for instance, spam filters) learn without regard for human comprehension, and how their decisions may be opaque. Many cases of unfair decisions taken by opaque ML algorithms have indeed arisen.
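The interpretability side of that trade-off can be sketched with a toy contrast (invented credit data, plain Python, no ML library): a one-rule model whose decision can be stated in a sentence, versus a nearest-neighbour model whose only "explanation" is which historical case the applicant happened to resemble.

```python
# Toy credit decisions: ((income_in_k, debt_ratio), approved?)
# Invented data for illustration only.
cases = [
    ((55, 0.2), True), ((80, 0.1), True), ((30, 0.7), False),
    ((45, 0.5), False), ((90, 0.3), True), ((25, 0.6), False),
]

def one_rule(applicant):
    """Transparent model: the explanation IS the rule."""
    income, debt = applicant
    return income >= 50  # "Approved because income >= 50k"

def nearest_neighbour(applicant):
    """Opaque model: decides by similarity to stored cases. The only
    available 'explanation' is which past case the applicant resembled,
    which says nothing about why that case was decided as it was."""
    income, debt = applicant
    nearest = min(cases, key=lambda c: (c[0][0] - income) ** 2
                                       + (100 * (c[0][1] - debt)) ** 2)
    return nearest[1]

applicant = (48, 0.25)
print(one_rule(applicant))           # False, with a one-line reason
print(nearest_neighbour(applicant))  # True, with no stateable reason
```

The two models disagree on the same applicant, and only the first can justify itself in terms a citizen could contest; richer models with more variables and more complex interactions push further along the same axis.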
On A Concluding Note
Mitigating predicted privacy harms is a complex, multi-faceted problem that will be with us for many years to come. The GDPR commands most of the attention, as it will regulate the world’s biggest economic area, but other approaches exist that might have an impact, from documenting model-building decisions to creating due-process rights. In the technology realm, explainable algorithms are an area ripe for AI innovation: some ongoing efforts on algorithmic transparency are summarised in the ICO report.
In my view, as these efforts mature, the GDPR could indeed make the requirement for fairness technology-neutral rather than necessarily involving humans. But the GDPR is right to insist on fairness, not only on moral grounds but also because it is good for business: there is a business case for an approach that builds trust on transparency and fairness. An HBR study found that people will accept potentially intrusive uses of their data, like predictions about their behaviour, in return for services like Google Now; and it is not only about value, as trust also matters. On the other hand, a Pew Research study shows that people’s trust in the way corporations handle their data is eroding. As the Target pregnancy case shows, when customers lose trust the result can be a public-relations disaster. Corporations should seize the opportunity to regain people’s trust by handling their data fairly.