With analytics gaining widespread adoption, businesses are incorporating analytical tools and techniques to seek insights which help them thrive in today’s competitive market. This is the first in a series of interviews for our theme of the month, “Tools And Techniques Used By Analytics Practitioners” which will give you an overview of the methods used in the current analytics industry.
AIM spoke to Rahul Kulhari, head of data science at EdGE Networks. Kulhari started his career as a data scientist and is a postgraduate in advanced computing from C-DAC Hyderabad. In this conversation with Analytics India Magazine, he shares his views and thoughts on trending analytics tools and techniques they use in their organisation.
Analytics India Magazine: What are the most commonly used tools in analytics and data science?
Rahul Kulhari: Some of the tools we use most commonly in analytics and data science are Pandas, scikit-learn and NumPy. For visualization, we use tools like Seaborn, Matplotlib, Plotly, Bokeh and ggplot; most of these are Python libraries. We also use TensorFlow, PyTorch, Keras and Chainer, which are mostly used by AI and deep learning scientists. Among ETL tools, we frequently use Luigi and Airflow. Python and R are the most commonly used programming languages.
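As a concrete illustration of how the first group of libraries fits together, here is a minimal sketch (the data and column names are invented for the example) that generates a small dataset with NumPy, wraps it in a Pandas DataFrame and summarises it:

```python
import numpy as np
import pandas as pd

# Invented example data: ten days of sales for two regions
rng = np.random.default_rng(seed=42)
df = pd.DataFrame({
    "region": ["north", "south"] * 5,
    "sales": rng.integers(100, 200, size=10),
})

# A typical one-liner analysis with Pandas: mean sales per region
summary = df.groupby("region")["sales"].mean()
print(summary)
```

The same DataFrame could then be handed to Seaborn or Matplotlib for plotting, which is why these libraries are so often used together.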
AIM: According to you, which is the most productive tool in analytics?
RK: Pandas, according to me, is definitely the most productive tool among the ones we use. If we classify data science into two groups on the basis of application and use, we have the statistical and analytical part on one hand and AI and deep learning on the other.
For the analytical and statistical part of data science, the most productive tools would be Pandas, NumPy, scikit-learn and Spark, while for AI and deep learning purposes, TensorFlow, PyTorch and Keras are some of the most productive tools.
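A minimal sketch of how the analytical trio of Pandas, NumPy and scikit-learn typically combines in practice (the data here is synthetic and the feature names are invented for illustration):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data, invented for the example
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 3)), columns=["f1", "f2", "f3"])
y = (X["f1"] + X["f2"] > 0).astype(int)  # an easily learnable target

# The usual workflow: split, fit, score
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)
print(f"test accuracy: {model.score(X_test, y_test):.2f}")
```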
AIM: Do researchers prefer tools that are open source or paid?
RK: The AI and data science community is very strong, and it provides all the tools as open source. All the functions, programs and features that we need are easily available and accessible in open source. In fact, some features found in open source tools are not found in paid tools.
Open source tools also provide multiple frameworks which can be easily accessed and utilized based on the problem statement we are working on. Paid tools also have certain limitations which aren't there in open source. Thus, data science researchers would prefer open source tools over paid tools any day.
AIM: How do they select tools for a given task?
RK: Tools are chosen depending on the problem statement, but some of the most common ones, which we can use all the time irrespective of the case, are Pandas, NumPy, scikit-learn and TensorFlow. The tools we choose depend on what kind of problem we are solving, whether it is NLP or analytics; there are different sets of tools for each of these problem statements.
AIM: What are the most user-friendly languages and tools that they have come across?
RK: Some of the most user-friendly languages and tools we prefer include Python, R, Pandas, TensorFlow and Jupyter Notebook.
AIM: Which is the most preferred language used by the team?
RK: We prefer using Python for most of our work. In fact, it is the most used language by data scientists across the globe, closely followed by R. One of the reasons data scientists prefer Python over other languages is that you need to write less code than in other languages, thanks to features that make it interactive, interpreted, modular, dynamic, portable, productive and extensible in C and C++.
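As a small illustration of the "less code" point, a task like counting word frequencies takes only a couple of lines of idiomatic Python using the standard library (the sentence is invented for the example):

```python
from collections import Counter

# Count word occurrences in two lines of idiomatic Python
text = "python is concise and python is readable"
counts = Counter(text.split())
print(counts.most_common(2))
```

Equivalent logic in lower-level languages typically requires explicit loops and map handling, which is part of why Python dominates day-to-day data science work.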
AIM: Do you believe there is an ideal toolkit for data scientists?
RK: For analytical and statistical data scientists, the ideal toolkit would comprise Pandas, Airflow, scikit-learn, NumPy and TensorFlow, while for visualization-focused data scientists, the ideal kit must have Seaborn and Matplotlib. However, these categories are slowly converging, and data scientists increasingly use all of these tools irrespective of the problem statements they address.