Open data is a massive force for the betterment of society. According to a McKinsey report, more than $3 trillion in economic value could be generated globally each year through enhanced use of open data. Because open data today is machine readable, it can easily be put to use, and there are many examples of it driving important change in society. Burkina Faso, a country in West Africa, held elections after three decades without them, and open data repositories and apps gave citizens an advantage: the data empowered the public to choose the right candidate and to get prior knowledge of election results. A country with little experience of democracy leveraged the power of open data to avoid rumours and tensions and to make an informed decision. These data stores and technological innovations helped Burkina Faso conduct its elections fairly.
Problems With The Open Data Movement
As many closed data sources become open, it becomes easier to build community services and products from them. Yet even though open data may seem a good idea in theory, in practice it could widen the digital divide and deepen social inequality. Open data projects must therefore be undertaken under a set of rules that maximises their utility and minimises the risk.
There is also the danger of releasing open datasets in technical formats that make no sense to the general public. Only technical experts can then leverage the power of such data, and people who can hire and employ those experts are best positioned to take advantage of it. These situations make the digital and social divide even bigger, and such advantages can very easily turn into economic ones.
Another great problem looming over the concept of open data is privacy, which has already become a critical issue. When large amounts of data are made public, something known as the Mosaic effect comes into play. According to Elizabeth Gorgue, privacy officer at the e-Government Office at the County of Santa Clara, California, “Data elements that in isolation look relatively innocuous can amount to a privacy breach when combined”.
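To make the Mosaic effect concrete, here is a minimal, entirely hypothetical sketch in Python: an “anonymized” health dataset and a public voter roll each look harmless on its own, but joining them on shared quasi-identifiers (zip code and birth year) re-identifies individuals. All names, fields, and values below are invented for illustration.

```python
# "Anonymized" health records: no names, only quasi-identifiers + a condition.
health_records = [
    {"zip": "95050", "birth_year": 1984, "condition": "diabetes"},
    {"zip": "95051", "birth_year": 1990, "condition": "asthma"},
]

# A separate public dataset (e.g. a voter roll) that does contain names.
voter_roll = [
    {"name": "A. Smith", "zip": "95050", "birth_year": 1984},
    {"name": "B. Jones", "zip": "95051", "birth_year": 1990},
]

# Index the voter roll by the shared quasi-identifiers...
index = {(v["zip"], v["birth_year"]): v["name"] for v in voter_roll}

# ...then join: each health record now links back to a named person.
linked = []
for record in health_records:
    name = index.get((record["zip"], record["birth_year"]))
    if name:
        linked.append((name, record["condition"]))
```

Neither dataset alone contains a privacy breach; the breach emerges from the combination, which is exactly the point of the quote above.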
Data that contains personally identifiable information (PII) should therefore not be released publicly as-is; some preprocessing steps have to take place first. How to achieve openness while preserving privacy remains the most critical question of the day. The Centre for Open Data Enterprise has released a report that raises a slew of questions posed by data and privacy experts:
- What are the potential benefits of using unaggregated data (or microdata) for the public good?
- What are the risks of using these datasets if they contain or could lead to the discovery of personally identifiable information, and how can those risks be minimized?
- What are the best technical, ethical, and policy approaches to ensure strong privacy protections while maximizing the benefits of open data?
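As a concrete illustration of the preprocessing mentioned above, here is a minimal sketch of de-identifying a record before release: dropping direct identifiers, pseudonymizing the email with a salted hash, and generalizing quasi-identifiers. The field names and salt are assumptions, and, as the report's questions suggest, such steps reduce but do not eliminate re-identification risk.

```python
import hashlib

SALT = "example-salt"  # hypothetical; in practice a secret, managed value

def preprocess(record):
    """Strip direct identifiers and coarsen quasi-identifiers before release."""
    out = dict(record)
    out.pop("name", None)  # drop the direct identifier entirely
    # Replace the email with a salted-hash pseudonym so records can still
    # be linked to each other, but not trivially back to the person.
    out["user_id"] = hashlib.sha256(
        (SALT + record["email"]).encode()).hexdigest()[:12]
    del out["email"]
    out["age"] = (record["age"] // 10) * 10   # generalize to a 10-year band
    out["zip"] = record["zip"][:3] + "XX"     # truncate the zip code
    return out

released = preprocess(
    {"name": "Jane Doe", "email": "jane@example.com", "age": 37, "zip": "95050"})
```

Even this is only a safeguard, not a guarantee: the quasi-identifiers that remain are precisely what the Mosaic effect exploits.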
Striking A Balance
There has to be a balance between the openness and effectiveness of open data on one side and privacy protection on the other. Privacy was ignored for a long time, but policymakers and technologists have now made it the centre of open data initiatives. Alongside privacy there is the additional issue of bias in datasets, which researchers are now addressing: making data public does not make it free of bias.
The White House report on big data and privacy placed some limitations on the publication of educational data: “As students begin to share information with educational institutions, they expect that they are doing so in order to develop knowledge and skills, not to have their data used to build extensive profiles about their strengths and weaknesses that could be used to their disadvantage in later years.”
We might also benefit from observing what approaches have been successful in the past. The report suggests some ways to address the Mosaic effect and the misuse of public data. As the US President’s Council of Advisors on Science and Technology puts it, “anonymization remains somewhat useful as an added safeguard, but it is not robust against near-term future re-identification methods.” In other words, de-identification alone cannot be relied on to stop bad actors from misusing data.
The report suggests another approach, “semi-open data”, where data is open for some use cases and closed for others. One example of this approach is to make personal data available only to the individual it belongs to rather than to the whole public.
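A minimal sketch of what such a semi-open policy could look like in code (the field names and the choice of public fields are assumptions, not taken from the report): the full record is returned only to its owner, while everyone else sees a redacted public view.

```python
PUBLIC_FIELDS = {"city", "year"}  # fields assumed safe to expose to anyone

def view(record, requester_id):
    """Return the full record to its owner; a redacted view to everyone else."""
    if requester_id == record["owner"]:
        return record
    return {k: v for k, v in record.items() if k in PUBLIC_FIELDS}

record = {"owner": "u42", "name": "Jane Doe",
          "city": "Santa Clara", "year": 2016}
```

Calling `view(record, "u42")` returns everything, while any other requester gets only the city and year; the same dataset is thus open for some uses and closed for others.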