As companies rush to embrace the open source ecosystem, mainstream enterprises like Google and Microsoft are jumping on to this wave. Dubbed as an excellent open data effort by one of the leading cloud providers, Microsoft is striving hard to gain developer and community trust by embracing the open data movement. Earlier last week, Microsoft’s director of Data Science Outreach, Vani Mandava wrote about launching Microsoft Research Open Data – a cloud data repository. They plan for it to be an excellent collection of free datasets to push state-of-the-art research in areas such as natural language processing, computer vision, and domain-specific sciences. The datasets are available in several categories like:
- Computer science
- Information science
- Social sciences
(To find out more about the datasets, click here.)
Why Did Microsoft Decide To Release Their High-quality Data In Public Domain?
Now, publicly available datasets can be used to solve some of the most pressing big data problems. Through this open-source model for sharing datasets, Microsoft joined the league of big tech firms such as Public Datasets on AWS, Google Public Datasets, Google Custom Datasets and Twitter Datasets that are freely available. According to Mandava, Microsoft Research Open Data is designed to simplify access to these datasets, facilitate collaboration between researchers using cloud-based resources and enable reproducibility of research.
However, for public datasets to be useful for research, they have to be continuously updated and Mandava indicates the company will continue adding to its repository and include features based on feedback from the community. Sam Madden, Professor at Massachusetts Institute of Technology was quoted in the post, “This is a game-changer for the big data community. Initiatives like Microsoft Research Open Data reduce barriers to data sharing and encourage reproducibility by leveraging the power of cloud computing”.
Some of the key features of Microsoft’s repository are that the data meets the highest standards for sharing publicly, is easily accessible, interoperable, reusable and it does not contain any personally identifiable information.
Economic Value In Open Sourcing Datasets
- Open sourcing datasets will ease the burden of those looking for specific types of data set.
- One underlying idea is that the increased transparency will help to create trust in users and developers, as well as offer a way to create new services based on the collected data.
- Open sourcing datasets can also be an effective tool in enabling greater transparency and weeding out gaps in datasets.
- Open sourcing data is the best way of fueling economic growth and innovation, and also useful for building data-driven products.
What Does Microsoft Stand To Gain From This?
Deloitte UK report emphasises that open sourcing data is a revolutionary way to compete and has a massive potential to generate a great ROI. By open sourcing their datasets, Microsoft will be benefiting the academic research community and enable the developer community. Open datasets will mobilise and strengthen the academic exchanges and cooperation. But doesn’t open sourcing endanger the company’s competitive advantage? On the other hand, open sourcing datasets is a great way to unshackle the data monopoly led by tech conglomerates and establish more transparency. It also reinforces the Microsoft’s tech-for-good mantra which they have been working on ever since Satya Nadella took the reins. By positioning themselves as enablers of an open source ecosystem, Microsoft is also driving cloud adoption — open source datasets and related software is an easy route to push the developer base towards their Azure-based data science virtual machine. Interestingly, the Data Science virtual machine comes preloaded with a variety of development tools popular with researchers and practitioners, notes the blog. It is also an excellent way to foster AI talent and collaborate with the wider community.
Open data drives growth and innovation in this age where businesses and startups are at a tipping point and governments are making a serious attempt at building critical mass for AI-led transformation. Open data repositories can foster transparency, cement the position of tech giants as contributing to the open source ecosystem and helps startups and businesses use the data to build ground-breaking applications. According to a Deloitte report, big tech giants can work with governments to establish new paradigms in data governance. Another upside for leading IT bellwethers is that they can reap a lot of economic value from open sourcing proprietary dataset – for example by making it publicly available, data can be combined from other sources and at the same time drive cloud adoption. Besides fostering the academic research community, it will also help leading businesses like Microsoft collaborate effectively with their partners.
Try deep learning using MATLAB