
Microsoft’s Move To Launch ‘Research Open Data’ Is A Revolutionary Way To Compete With Google And AWS


As companies rush to embrace the open source ecosystem, tech giants like Google and Microsoft are riding the same wave. In what has been called an excellent open data effort by one of the leading cloud providers, Microsoft is working hard to gain the trust of developers and the wider community by embracing the open data movement. Last week, Vani Mandava, Microsoft’s director of Data Science Outreach, wrote a post announcing the launch of Microsoft Research Open Data, a cloud-based data repository. The company intends it to be a collection of free datasets that pushes state-of-the-art research in areas such as natural language processing, computer vision and domain-specific sciences. The datasets span several categories, including:

  1. Biology
  2. Computer science
  3. Engineering
  4. Information science
  5. Mathematics
  6. Physics
  7. Social sciences


Why Did Microsoft Decide To Release Its High-Quality Data In The Public Domain?

Publicly available datasets can now be used to tackle some of the most pressing big data problems. With this open model for sharing datasets, Microsoft joins other big tech firms that offer freely available data, such as Public Datasets on AWS, Google Public Datasets, Google Custom Datasets and Twitter Datasets. According to Mandava, Microsoft Research Open Data is designed to simplify access to these datasets, facilitate collaboration between researchers using cloud-based resources and enable reproducibility of research.

However, for public datasets to be useful for research, they have to be continuously updated, and Mandava indicates that the company will keep adding to its repository and introduce features based on feedback from the community. Sam Madden, a professor at the Massachusetts Institute of Technology, was quoted in the post: “This is a game-changer for the big data community. Initiatives like Microsoft Research Open Data reduce barriers to data sharing and encourage reproducibility by leveraging the power of cloud computing.”

Key features of Microsoft’s repository include data that meets the highest standards for public sharing, is easily accessible, interoperable and reusable, and contains no personally identifiable information.

Economic Value In Open Sourcing Datasets

  • Open sourcing datasets eases the burden on those looking for specific types of data.
  • One underlying idea is that increased transparency will help build trust among users and developers, as well as offer a way to create new services based on the collected data.
  • Open sourcing datasets can also be an effective tool for enabling greater transparency and identifying gaps in the data.
  • Open sourcing data can be a powerful driver of economic growth and innovation, and is useful for building data-driven products.

What Does Microsoft Stand To Gain From This?

A Deloitte UK report emphasises that open sourcing data is a revolutionary way to compete and has massive potential to generate a strong ROI. By open sourcing its datasets, Microsoft will benefit the academic research community and empower the developer community. Open datasets will mobilise and strengthen academic exchange and cooperation. But doesn’t open sourcing endanger the company’s competitive advantage? Perhaps, but open sourcing datasets is also a great way to break the data monopoly held by tech conglomerates and establish more transparency. It also reinforces Microsoft’s tech-for-good mantra, which the company has championed ever since Satya Nadella took the reins. By positioning itself as an enabler of the open source ecosystem, Microsoft is also driving cloud adoption: open source datasets and related software are an easy route for steering developers towards its Azure-based Data Science Virtual Machine. Interestingly, the Data Science Virtual Machine comes preloaded with a variety of development tools popular with researchers and practitioners, the blog notes. It is also an excellent way to foster AI talent and collaborate with the wider community.

Outlook

Open data drives growth and innovation at a time when businesses and startups are at a tipping point and governments are making a serious attempt to build critical mass for AI-led transformation. Open data repositories can foster transparency, cement the position of tech giants as contributors to the open source ecosystem and help startups and businesses use the data to build ground-breaking applications. According to a Deloitte report, big tech giants can work with governments to establish new paradigms in data governance. Another upside for leading IT bellwethers is that they can reap considerable economic value from open sourcing proprietary datasets: once a dataset is publicly available, it can be combined with data from other sources, which at the same time drives cloud adoption. Besides fostering the academic research community, open data will also help leading businesses like Microsoft collaborate effectively with their partners.


Richa Bhatia

Richa Bhatia is a seasoned journalist with six years of experience in reportage and news coverage, and has had stints at the Times of India and The Indian Express. She is an avid reader, mum to a feisty two-year-old and loves writing about the next-gen technology that is shaping our world.