Data analytics is much more efficient than traditional Business Intelligence solutions, thanks to the quantum jump in computing power in recent years and the pervasiveness of Big Data. Yet applying machine learning algorithms to learn abstract patterns from data and interpreting the results may take up less than 20 percent of the total effort, leaving the bulk of the work to preparing and integrating the data. Why, then, do companies so often fail to integrate data properly and doom a promising analytics project?
Dr Sheng-Chuan Wu, Vice President at Franz Inc., spoke along these lines at Cypher 2017, India’s most exciting analytics summit.
According to Dr Wu, the Resource Description Framework (RDF), which expresses information as subject-predicate-object statements known as triples, can be used to represent any form of information in the world.
“In my wide experience and with all the consulting projects that I have worked upon, I still haven’t found a single case or example that I could not model with the simple RDF representation,” said Dr Wu. RDF is a completely schemaless system: you can keep adding information and model whatever you want, which makes it very flexible, he added.
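To make the triple model concrete, here is a minimal sketch in plain Python rather than a real RDF library: each fact is a (subject, predicate, object) tuple. The `ex:` identifiers, the `match` helper and the sensor facts are all hypothetical, chosen only for illustration. The sketch shows the schemaless point: recording a new kind of fact is just adding another triple, with no schema change.

```python
from typing import Optional

# A triple is (subject, predicate, object); plain strings stand in for IRIs and literals.
Triple = tuple[str, str, str]


def match(store: set[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> list[Triple]:
    """Return the triples matching a pattern, sorted; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Hypothetical facts about one sensor.
store: set[Triple] = {
    ("ex:SensorA", "rdf:type", "ex:TemperatureSensor"),
    ("ex:SensorA", "ex:locatedIn", "ex:Plant1"),
}

# Schemaless: a brand-new predicate needs no table migration, only another triple.
store.add(("ex:SensorA", "ex:installedBy", "ex:VendorX"))

for triple in match(store, s="ex:SensorA"):
    print(triple)
```

A production system would use proper IRIs, a triple store and a query language such as SPARQL. The sketch only illustrates the shape of the data.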
But according to Dr Wu, there are many naïve views of data analytics. One of the most common is grossly underestimating the effort required to prepare data for analysis, such as the extract, transform, load (ETL) work and the integration of data from heterogeneous sources.
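The following sketch assumes two hypothetical sources, a CSV export and a JSON feed, that describe the same machines with different field names, identifier casing and temperature units. It illustrates where the integration effort goes by mapping both into the triple form used by RDF. The `norm_id`, `from_csv` and `from_json` helpers, the `ex:` vocabulary and the data are illustrative inventions, not any product's API.

```python
import csv
import io
import json

# Source 1 (hypothetical): CSV export, its own column names, temperatures in Fahrenheit.
CSV_EXPORT = """machine_id,temp_f
M-01,212
M-02,32
"""

# Source 2 (hypothetical): JSON feed, different keys, lower-case IDs, Celsius.
JSON_FEED = ('[{"asset": "m-01", "site": "Plant1"},'
             ' {"asset": "m-03", "temperatureC": 40.5, "site": "Plant2"}]')


def norm_id(raw: str) -> str:
    """Reconcile identifiers: both sources name the same machine with different casing."""
    return "ex:" + raw.strip().upper()


def from_csv(text: str) -> set:
    """Map CSV rows to triples, converting Fahrenheit to Celsius."""
    triples = set()
    for row in csv.DictReader(io.StringIO(text)):
        subject = norm_id(row["machine_id"])
        celsius = (float(row["temp_f"]) - 32) * 5 / 9  # unit harmonisation
        triples.add((subject, "ex:temperatureC", round(celsius, 1)))
    return triples


def from_json(text: str) -> set:
    """Map JSON records to triples; the temperature field is optional."""
    triples = set()
    for rec in json.loads(text):
        subject = norm_id(rec["asset"])
        triples.add((subject, "ex:locatedIn", "ex:" + rec["site"]))
        if "temperatureC" in rec:
            triples.add((subject, "ex:temperatureC", float(rec["temperatureC"])))
    return triples


# Because identifiers were reconciled, facts about ex:M-01 from both sources now join up.
graph = from_csv(CSV_EXPORT) | from_json(JSON_FEED)
for triple in sorted(graph, key=str):
    print(triple)
```

Even in this toy case, most of the code is mapping logic (reconciling identifiers, renaming fields, converting units) rather than analysis.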
With the great fanfare around AlphaGo beating the world’s number 1 Go player, deep learning (artificial neural networks coupled with massive GPU power) has become the face of AI and machine learning. However, although deep learning can work wonderfully with balanced data sets such as Go games and images, it is not as effective on many other machine learning tasks. There are perhaps more than 10 major machine learning algorithms (with many derivatives), each of which may be good at certain problems but ineffective on others.
“The machines these days have all the sensory information. When you get the sensory information, you must do something about it,” said Dr Wu. Typically, the goal is to prevent problems, improve product quality or reduce energy consumption, and all of these rely on the same basic approach: applying machine learning to that information to make predictions or improve outcomes.
The failure of IBM Watson at the MD Anderson Cancer Center in Texas offers a few takeaways. Machine learning technologies are currently the subject of mega-hype from vendors such as IBM and consulting firms such as PwC, so boards need to ask tough, searching questions right from the start of an AI project.
Dr Wu summarised the talk by saying that for companies and projects to be successful in the real world, they would require an effective data integration approach. “If you don’t have good data integration, that’s what kills you… Not your algorithm, not your fancy natural language processors…none of that. It is the difficult, dirty work of data integration,” he said.