Data science has established itself as an important asset in the technology sector. Businesses are seeing a significant shake-up in how they operate, as most organisations now deal with a surplus of data. Related fields such as data analytics, artificial intelligence and machine learning are growing in parallel with it. Though data science has been widely accepted, these advancements have sparked a question in the tech community: “How much ‘science’ is there in data science?”
The criticism has also led some people to call it a pseudoscience. In this article, we will discuss why there is a big difference between labelling it ‘science’ and labelling it ‘pseudoscience’.
Orienting Data Science Towards Science
Firstly, data science practitioners should be open about the methods they use instead of black-boxing them. Algorithms carry the analytical core of the field, so explaining and improving them leads to a more methodical approach.
Secondly, since this field deals specifically with data, a model or project should be analysed with plenty of useful data around it. This means digging deep for better and richer datasets. In addition, the model under consideration should incorporate a statistical and mathematical viewpoint, which gives it a more scientific character.
Thirdly, the data sources should be authentic. Collating data indiscriminately from sources across various tech domains leads to more confusion and less understanding. Michael Howard, CEO of MariaDB, provides an interesting take on this.
Howard says, “Many software companies try to get around having rich data sets by claiming they use more signals than anybody else. A signal is a single data point, like a government database on education. Some companies engage in signal spin, counting every column of data sources they could potentially use (but usually don’t) to increase their signal count. To distinguish the heavyweights from lightweights, you have to have a data scientist dig deeper and start reviewing confusion matrices and F1 scores.” This suggests that companies should focus on their data sources rather than follow intuition.
All of these factors contribute significantly to establishing a better scientific outlook in data science.
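For readers unfamiliar with the confusion matrices and F1 scores Howard mentions, here is a minimal sketch of how they could be computed for a binary classifier. The function names and the labels in `y_true` and `y_pred` are hypothetical and chosen only for illustration; they do not come from Howard or from any particular product.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Count the four cells of a binary confusion matrix."""
    counts = Counter()
    for truth, guess in zip(y_true, y_pred):
        if guess == positive:
            # Predicted positive: correct (tp) or a false alarm (fp).
            counts["tp" if truth == positive else "fp"] += 1
        else:
            # Predicted negative: a miss (fn) or correct (tn).
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def f1_score(counts):
    """F1 is the harmonic mean of precision and recall."""
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    if tp == 0:
        return 0.0  # No true positives, so precision and recall are both zero.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical ground-truth labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
score = f1_score(counts)
print(dict(counts), round(score, 2))
```

A model fed with many "signals" but little useful data would tend to show up here as a large number of false positives or false negatives, whatever the signal count claimed.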
When Can Data Science Become A Pseudoscience?
Beliefs or practices that are presented under the pretext of science are known as pseudoscience. The term has been a subject of discussion since the 19th century. The philosopher Karl Popper introduced the concept of falsifiability to separate genuinely scientific theories from those that cannot be tested, which helped expose pseudoscience. Many theories, such as string theory and Sigmund Freud’s psychoanalytic theories, are hard to ground in scientific facts. At the same time, it is not right to reject them outright as non-scientific. This conundrum is always present. A discipline can be called ‘science’ only when it rests on solid, provable facts.
Famous physicist Richard Feynman once said that the key to building a scientific approach is to present it in simple language, in layman’s terms and without unnecessary technical jargon, and see if it still makes sense. This is relevant in the context of data science. The more technicalities the field accumulates, the more obscure it becomes. Aligning data science with simplification would make it easier to understand and help dispel the notion that it is a pseudoscience.
Critics argue that data science is less of a science because of how the subject is interpreted within a business framework. Mark Beyer, renowned analyst at Gartner, offers an insightful thought on this. He says, “What business strategists need is ‘real’ data science. Real data science is the practice of building out competing interpretations of data, many multi-layered analytic theorems that intentionally challenge the inferences used by the others.” This approach is exactly what data science professionals need to focus on. Instead, they tend to rely more on current market trends to scrutinise data. Although this is not wrong, it diminishes the scientific context.
Understanding Data Science
Data science can at times be mystifying, because the algorithms involved may be beyond people’s immediate comprehension. This uncertainty can lead nowhere, which may discourage data science enthusiasts and undermine confidence in their methods. On top of this, hardware and computing resources have progressed rapidly, which lets data science tasks perform even better and makes it easier to discover newer techniques, whether useful or not.
Another challenge is the vast amount of freely available information on the subject. There is no dearth of online resources on data science, and the volume can be overwhelming. This information overload can sometimes mislead the practice of data science itself at a scientific level, and scientific integrity may be lost in such cases.
Data science analyses vast amounts of data and draws inferences in hundreds or thousands of ways. These inferences may not always be scientifically true or genuine, and this is the gap the field needs to bridge. It needs to identify which inferences hold and which do not by justifying assumptions with facts. After all, science is about testing assumptions and constructing solid, fact-based theories. The charge of pseudoscience will gradually weaken if data scientists and researchers build richer knowledge on a consistent basis.
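One way to see why hundreds or thousands of inferences cannot all be trusted is a small simulation. The sketch below relies on a standard statistical fact, that p-values are uniformly distributed when there is no real effect, and applies a Bonferroni correction, which is one common safeguard chosen here for illustration rather than a method named in this article. The number of tests, the threshold and the seed are arbitrary assumptions.

```python
import random

random.seed(0)  # Arbitrary seed so the simulation is repeatable.

NUM_TESTS = 1000  # Hypothetical number of inferences drawn from one dataset.
ALPHA = 0.05      # Conventional significance threshold.

# Simulate p-values for tests where no real effect exists.
# Under a true null hypothesis, p-values are uniform on [0, 1).
p_values = [random.random() for _ in range(NUM_TESTS)]

# Naive approach: accept every inference with p < ALPHA.
naive_hits = sum(p < ALPHA for p in p_values)

# Bonferroni correction: divide the threshold by the number of tests.
bonferroni_hits = sum(p < ALPHA / NUM_TESTS for p in p_values)

print("Accepted without correction:", naive_hits)
print("Accepted with Bonferroni correction:", bonferroni_hits)
```

Every inference the naive approach accepts in this simulation is spurious by construction, since none of the simulated tests contains a real effect. The corrected threshold lets far fewer of them through.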