The ACM Turing Award is the most prestigious technical award in the computing industry. Michael Stonebraker is the recipient of the 2014 ACM Turing Award for fundamental contributions to the concepts and practices underlying modern database systems and has been an adjunct professor of computer science at MIT since 2001.
Through a series of academic prototypes and commercial startups, Stonebraker has produced research and products that are central to many relational database systems. He has founded many database companies, including Ingres Corporation, Illustra, StreamBase Systems, Vertica and VoltDB, and served as chief technical officer of Informix. He is also an editor of the book Readings in Database Systems.
Here’s our exclusive interview with Dr Stonebraker.
Analytics India Magazine (AIM): There’s a constant fear around AI and many experts suggest that it may lead to automation, risking human jobs. What is your take on it?
Michael Stonebraker (MS): I think that’s absolutely true and well understood. For example, President Trump has been talking a lot about US jobs being taken away by companies offshoring them to other countries. That’s really not what’s happening at all; it’s automation that is taking manufacturing jobs. And robotics is going to continue to do that. In fact, it’s going to accelerate. I think a very good example of that is self-driving cars. In my opinion, they’re at most a decade away, and the early application is going to be long-haul trucking. I think there are more than a million of those jobs and they’re going to go away.
The other area where it’s going to make a huge difference is predictive analytics and machine learning. We’re already seeing financial planners based on predictive modeling, and they will replace low-end human jobs.
AIM: What do you think the future of AI will look like? Will it see a prolific rise, or is an AI winter near?
MS: I think machine learning and robotics are going to prosper for at least the next decade. There is every reason to believe that there will be no AI winter anytime soon. And I think the previous AI winter was basically the AI guys making claims that didn’t turn out to be realistic and didn’t translate into products. I think this time around, there are very obvious markets. As an example – the machine learning course at Massachusetts Institute of Technology is wildly popular, as it is at Stanford University.
AIM: Since having won the Turing Award, what has been your research focus?
MS: I’ve been working on three different things.
Number one: Data science is all the rage these days and everyone predicts that data science is going to replace business intelligence in dealing with large data sets. However, most data scientists spend at least 80 percent of their time on data prep: finding data sets of interest, cleaning them, getting access to them, converting them into common units and de-duplicating them. Most don’t spend more than one day a week doing the job which they were actually hired for. Instead, they spend four days a week on data prep or “data munging.” So I, along with a bunch of others, am involved in a research project called Data Civilizer. The idea is to knock down that 80 percent of work that goes into data munging. And in my opinion, that’s the important problem in data science. It’s not the algorithms that actually do the data science, because researchers spend only one day a week doing that kind of stuff.
Number two: Today, computer networks are getting faster more rapidly than the nodes in them are getting beefier. In the database world, essentially all database systems have been architected on the assumption that networking is the high pole in the tent – that’s the thing you want to worry about the most. But it looks like that is no longer true, and I’ve been thinking about re-architecting database systems, especially data warehouse systems, on the assumption that networking is not the high pole in the tent.
Number three: For the last 35 years or so, the database community has had a standard way of suggesting how people should do physical database design, which included how to decide what set of tables you’re going to put in a database.
However, it turns out that in the real world, database administrators do not use this traditional wisdom at all. So, I’ve been working on figuring out why they don’t use the traditional wisdom and what to do instead.
So those are the three things I’m pretty much working on.
AIM: What trends do you foresee in analytics, data and related technology in the near future?
MS: Analysts have an insatiable desire to correlate more and more data. For example, several years ago, I had to make a sales call on Miller Brewing Company, and they had a traditional data warehouse for sales of beer by brand, zip code, etc., which they used to forecast sales. The year I visited them was a year that El Niño, which is a mid-equatorial upwelling of warm water, was predicted to be especially strong. It’s well understood that in El Niño years, it’s warmer than normal in the Northeast and it’s wetter than normal on the Pacific coast of the United States. So, I asked the Miller Beer guys, “With the El Niño winter coming, is there any correlation between beer sales and temperature or precipitation?” And they said that they would really like to know the answer to that question because, of course, it would impact beer sales in the coming winter. But weather data was not in the warehouse, so they couldn’t ask that question.
So, business analysts just have an insatiable desire to correlate more and more features that could lead to better predictions. And I think that trend will continue and accelerate.
The second trend I expect is that predictive modeling is going to be applied to more and more application areas. Again, for example, I listened to a talk by a startup that was trying to predict what you ought to charge for hotel rooms in major cities. They assembled all kinds of data, including how many people landed at the airport, etc. They predicted occupancy based on a whole bunch of these features and then it was a simple matter to build a pricing model that could change prices dynamically. It would never have occurred to me to say, “let’s apply predictive modeling to hotel occupancy or hotel pricing.” I think analytics and predictive modeling are just going to be applied more and more broadly.
The last thing – I think business intelligence will give way to data science in the data warehouse space. Business intelligence folks are very good at running front ends that issue SQL queries to data warehouses. For example, there were four hurricanes in Florida in the 2007 hurricane season and someone had to stock the Walmart stores during the hurricane season. So, a business intelligence person would find out what sold in the week before the hurricane and what sold in the week after the hurricane, compare that with same-store sales in Georgia, produce a big chart of numbers and then plot all kinds of pictures. That, to put it pejoratively, is what a business intelligence person does. They’re basically SQL jockeys who produce charts and pictures of trends. On the other hand, if you’re a data scientist, you don’t look at hurricanes that way at all. You attempt to build a predictive model to predict what will sell based on a bunch of factors including the strength of the hurricane, etc.
As a business owner, would you rather have a big table of numbers or a predictive model? Everybody will say, “I’ll take the predictive model, thank you very much.”
Predictive modeling data science is going to take over as soon as we can train enough data scientists to fill all of the positions in enterprises. That will be the mega trend. Data science is going to increase in scope and is going to replace business intelligence over time as the way to interact with data warehouses.
AIM: As a pioneer in database technology, what suggestions do you have for students looking to pursue this field and for startups keen on exploring it as a business opportunity?
MS: If you’re in school, get a computer science degree from the best computer science institution that you can get into. Then make sure you understand both database management and data science and become very adept at writing computer programs. Get adept at coding and learn about data management and data science.
As for starting a company, in the US the market is awash in venture capital money for startups. Start with a good idea – one that at least a couple of enterprises are willing to buy. So basically, prove that there is a market for your idea. And secondly, prototype it to an extent that will demonstrate that it works. Prove there’s a market and prove you can build it, and based on that, you can probably get funding for your idea.
In terms of joining other people’s startups, I have basically the same advice. Make sure what’s being proposed is feasible and make sure there’s a market for it.