Data Science is an emerging field which is now being integrated with industries across all sectors. This year Analytics India Magazine, in association with Great Learning, decided to find out what goes on behind the making of a good Data Scientist. We spent a lot of time finding out the tools and techniques used by these new technology professionals.
From programming languages to GPUs, our comprehensive survey yielded interesting and insightful answers.
About The Study
The samples were collected by asking respondents to fill in a survey created by AIM about the tools and techniques data scientists use at work. This included various sub-topics such as data visualisation tools, preferred operating systems and programming languages, among others. We took opinions from everyone who practises data science, ranging from professionals with less than two years of experience to CXOs, to get a thorough idea of the working environment in this growing field.
Our survey was met with much enthusiasm — and we got some great insights from it. Some of them were expected, and many of them were real eye-openers.
Which Language Do Data Scientists Prefer For Statistical Modelling?
- Python is the favourite language among data scientists today: almost 44% of the professionals use it the most
- A close second is R at 35%, another clear favourite among data scientists due to its versatility
- SAS (7%) and SQL (6%) claim only a minor share of data scientists' attention
Which Data Science Methods Are The Most Popular At Work?
In this section, we asked the data scientists to pick out the most frequently used statistical method.
- 72% of data scientists said that they use Logistic Regression the most at work
- This was followed by Decision Trees at 56% and Neural Networks at 48%
Which Is The Most Popular Python General Purpose Library?
Python has one of the largest programming communities in the world. There are plenty of libraries that a data scientist can use to analyse large amounts of data. But here are our readers’ favourites:
- Pandas emerged as a clear choice for most data scientists at almost 41%
- NumPy was the second favourite at 24%
- Scikit-learn (sklearn) and Matplotlib followed at 17% and 14% respectively
Which Tools Do Data Scientists Prefer?
With a plethora of data analytics tools available online, we asked data scientists if they were willing to use open-source tools at work. The answer was a resounding yes.
- Almost 89% of the data scientists said that they preferred to work with open-source tools
- Only 8% of data scientists said that they liked to work with custom-made tools which are tweaked and personalised for their particular projects
Which Dashboard/Visualisation Tools Do Data Scientists Prefer?
Data visualisation can be a tricky path for many data scientists. Crunching numbers is one thing, but telling a story with numbers is a whole different deal. When we asked our readers about this, there was one clear winner:
- More than half the respondents, 51%, said that they preferred to use Tableau as a dashboard or visualisation tool.
Which Cloud Provider Do Data Scientists Prefer?
Information flow is a part of data science. While data usage and storage are important, security and privacy of the data are also key to the job.
- Amazon Web Services is a clear winner here with over 45% of the votes
- Google Cloud is the second favourite with about 34% of the votes
What Kind Of Learning Resources Do Data Scientists Use To Keep Themselves Updated?
With technology changing constantly, it is vitally important for data scientists to keep themselves updated. And they seem to have found an interesting way to do so!
- 76% of our readers said that they liked watching tutorials and videos on YouTube.
- Almost 54% of the data scientists said that they like learning the old-school way — through books and e-books.
- 46% of respondents also look at MOOCs as a way to upskill themselves.
Where Do Data Scientists Find Open Data?
Finding open data is not that hard, but getting clean open data is often a trying experience. No data scientist wants to waste their time cleaning it. Four options stood out here:
- 27% of respondents use GitHub
- 22% of readers use data that universities upload to their websites for research
- 20% of data scientists also use data publicly uploaded on official government websites
- 15% of the respondents source their data manually
Which OS Do Most Data Scientists Use At Work?
Compatibility with their tools and ease of use are two key factors in choosing an operating system. For this question, the respondents clearly favoured one OS:
- Almost 69% of data scientists use Windows
- 24% prefer Linux
- And only 7% prefer macOS
Preferred Development Environment
An integrated development environment (IDE) is very important for setting up and streamlining data science processes. Among the many options presented, the data scientists who took part in our survey responded as follows:
- Almost 38% prefer using RStudio
- And close to 37% of data scientists like using Notebook
How Is Code Shared At Your Workplace?
As we said earlier, privacy, operational efficiency and security are of paramount importance in any organisation that deals with data. Here’s what we found out:
- Over 45% of the respondents use Git to share code at their workplaces
- 28% of the data scientists said that their organisations use cloud-based programmes to share code
- And 24% of our readers share code over non-cloud-based programmes
What Is The Neural Network Architecture Data Scientists Use Most Frequently?
Neural networks are a crucial part of data science. We got a clear picture that the data scientists, as well as their organisations, use a variety of architectures. According to our study, the convolutional neural network (CNN) was the most frequently used architecture at 33%.
Which Big Data Tool Have You Used The Most?
From open-source tools to paid or customised ones, professionals prefer different tools depending on the projects or the organisations they work for. Data scientists from our survey rated their most-favoured big data tools in the following order:
- 52% of the users said they used Hadoop the most
- Almost 22% of data scientists used NoSQL
Which GPUs Do Data Scientists Use At Work?
Over 19% of our respondents said that they preferred using the NVIDIA GeForce GTX 8 Series for data-intensive work. The GTX 8 series model is a mid-range GPU that is multipurpose and flexible.
Our Respondents’ Profile:
As the Analytics industry grows at a CAGR of 33.5%, more professionals are expected to segue into the Data Science and Analytics sector. We realised that apart from hard work and dedication, tools and skillsets also play a key role in the success of data scientists. One of the eye-opening inferences was that Python is still the favourite programming language in the Analytics and Data Science sector. The most popular data visualisation tool in this industry right now is Tableau. Another interesting finding was that professionals are aware of the importance of upskilling themselves and are willing to do so. Most working professionals like to keep themselves updated by watching videos and reading books. Overall, the study reveals a positive picture of the Indian Analytics and Data Science sector.