Analytics India Magazine got in touch with Anil Arora, principal data scientist at SAS, to get an insight into the various kinds of analytics and data science tools used by analytics practitioners at SAS. With more than 11 years of analytics experience, he has worked across industries such as banking, insurance, telecom, retail, e-commerce, utilities and public services. Below is the complete Q&A with his detailed insights.
Analytics India Magazine: What are the most commonly used tools in analytics, AI and data science?
Anil Arora: As far as commercial software is concerned, SAS is a dominant force in the space of Advanced Analytics and Predictive Analytics, with a market share of more than 30% according to IDC, followed by other major players such as IBM, Microsoft, SAP, Alteryx, Oracle and many more. As for Free and Open Source Software, Python currently appears to be ahead of R in terms of usage by a fair margin.
AIM: What is the most productive tool that you have come across?
AA: SAS provides a cohesive, unified analytics platform in the form of Viya that addresses the complete analytics lifecycle covering data management, data discovery, model building and model deployment. It is the foundation of a suite of offerings, including machine learning and visualisation, to address any analytic challenge. The SAS platform supports diversity, enables scale and promotes trust.
The extensiveness of Open Source Software libraries gives organisations massive opportunities to experiment and innovate; however, there are a few stumbling blocks when it comes to operationalising open source models. Many organisations are seeing a lot of value in a hybrid approach that combines Commercial Software and Open Source Software to shape their analytics initiatives.
SAS Viya supports new analytic methods that can be accessed from SAS and other programming languages, initially Python, Lua and Java, as well as public REST APIs. The Forrester Wave™: Multimodal Predictive Analytics and Machine Learning (PAML) Platforms, Q3 2018 ranked SAS as a Leader while noting that “SAS builds the first truly multimodal PAML solution.”
We want to build upon this openness by creating a community for knowledge sharing. Users will be able to contribute code, procedures, visuals and services, and collaborate on ideas. It is an exciting shift for us.
AIM: Do you prefer tools that are open source or paid? Please elaborate on the benefits, some open source and paid tools that you prefer.
AA: Commercial and open source software both have their own merits and demerits, which any enterprise should evaluate thoroughly before deciding on an analytical platform.
Some of the top factors for choosing commercial analytics software include confidence in the accuracy of results, the ability to solve complex problems, the ability to handle scale and the ability to combine multiple analytical methods.
As far as Free and Open Source Software is concerned, besides being freely available, its quicker releases and the availability of newer ideas and techniques give organisations the freedom and flexibility to experiment. However, these very advantages can also lead to pitfalls, such as an inability to keep up with version releases and a lack of dedicated support when issues are encountered. There are other challenges as well, particularly with respect to operationalising models in production, hidden security vulnerabilities and a lack of skills available internally.
Commercial software, on the other hand, comes at a price, specifically license costs, but product stability, reliability of results, tailored support and lower business risk are key attributes that organisations cannot ignore.
Commercial software vendors provide extensive support through rich documentation, technical support hotlines, newsletters, and professional training courses.
From our perspective, for business-critical systems, enterprises should move forward in their analytics journey with either commercial software or a hybrid approach that combines open source for experimentation with commercial software for operationalisation. SAS embraces open source, and to this end we have built Viya, an open, cloud-ready analytics platform where the benefits of a proprietary platform can be combined with those of open source technology. Viya helps minimise the time between early-stage analytical exploration and the end result of business value.
AIM: Is being open source an important attribute when choosing a tool?
AA: Because it is freely available, open source software lowers the barrier to entry for organisations. It therefore looks like an attractive option up front; however, many organisations fail to account for the true cost of open source, which arises from future deployment challenges, requirements to scale, and the engineering skills and manpower needed to make it work.
AIM: What are the most common issues you face while dealing with data? How is selecting the right tool critical for problem-solving?
AA: The common issues faced while dealing with data are as follows:
- Handling poor-quality data, such as dirty data, missing values and inadequate data size
- Selecting the right data grain
- Supporting diverse data types: structured, semi-structured and unstructured
- Enabling the scale and speed of data needed for real-time decisioning
- Dealing with huge datasets that require distributed approaches
- Lack of understanding and limited diffusion of data handling techniques
- Lack of good literature on important data mining topics and techniques
- Little to no documentation of the parameters taken into consideration for analytics projects
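As a rough illustration of the first issue in the list, the sketch below counts missing values per column and flags a dataset that may be too small to model. It is a minimal, standard-library-only Python example; the function name `profile_records`, the treatment of blank strings as missing, and the default 100-row threshold are hypothetical choices for illustration, not SAS functionality.

```python
def profile_records(records, required_columns, min_rows=100):
    """Summarise basic data-quality problems in a list of dict records.

    Hypothetical helper: 'min_rows' is an illustrative threshold, not a standard.
    """
    missing = {col: 0 for col in required_columns}
    for record in records:
        for col in required_columns:
            value = record.get(col)
            # Treat absent keys, None and blank strings as missing values.
            if value is None or (isinstance(value, str) and value.strip() == ""):
                missing[col] += 1
    n = len(records)
    return {
        "rows": n,
        "missing_counts": missing,
        # Share of rows missing each column (0.0 for an empty dataset).
        "missing_rate": {col: (cnt / n if n else 0.0) for col, cnt in missing.items()},
        # Flag datasets that may be too small for reliable modelling.
        "too_small": n < min_rows,
    }


if __name__ == "__main__":
    sample = [
        {"age": 34, "city": "Pune"},
        {"age": None, "city": " "},
        {"city": "Delhi"},
    ]
    print(profile_records(sample, ["age", "city"], min_rows=100))
```

In practice, a library such as pandas offers the same kind of check through `DataFrame.isna().sum()`; the plain-Python version simply keeps the example self-contained.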
Tools form the bridge between work and working; they link the performer to the task. Tools are not simply implementations of algorithms. Beyond mere implementations, they can also provide capabilities that can be used at any step in the process of working through an analytical problem. Tools with an intuitive interface, which let users build models faster without writing complex code, make analytics approachable and easy to use.
Coming back to the issues of dealing with data, data management is a critical aspect of the analytics lifecycle that cannot be ignored. The ideal analytics platform or tool provides a unified environment with characteristics such as:
- intuitive interface
- approachable analytics
- comprehensive data management
- streamlined model development & deployment and
- tight data & model governance
AIM: What are the most user-friendly languages and tools that you have come across?
AA: SAS outscores other products in terms of user-friendliness. One of the principal themes behind SAS platform innovation is “making analytics easy & approachable” for diverse skill sets and diverse roles within the organisation.
AIM: What is an ideal data scientist toolkit like?
AA: A hybrid approach, as discussed above. A few technologies to consider are the SAS platform suite; Python libraries such as pandas, NumPy, scikit-learn and Matplotlib; interfaces such as RStudio and Jupyter; and open source AI and deep learning libraries such as Google TensorFlow.
AIM: What is the most preferred language used by the team?
AA: Generally speaking, it is SAS, Python and R.
AIM: What is the most preferred cloud provider— AWS, Google or Azure?
AA: As of 2018, AWS and Azure appear to be the most preferred cloud providers.
AIM: What are some of the tools used for scaling data science workloads? For example, is Docker gaining popularity vis-à-vis Spark?
AA: When we talk about scaling data science workloads, there are many aspects to consider: (i) handling big data, (ii) the quantity, complexity and resiliency of analytics workloads, and (iii) streamlined deployment of analytical models into production.
To support data at scale, analytical platforms need to provide features such as multithreading, parallel processing and in-memory processing. To support analytics at scale, they need to support model building, auto-tuning and model management at scale, for example thousands of models for as many subsegments of the target population. Lastly, to support streamlined model deployment, they need to support analytics execution in the database, in-stream and on the edge. Proprietary analytical platforms such as SAS provide these features as a unified platform.
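The "one model per subsegment" pattern can be illustrated with a small, hypothetical Python sketch: rows are grouped by segment, and one model is fitted per segment, with the fits dispatched to a pool of workers. Here the "model" is just an ordinary least squares line, and the names `fit_line` and `fit_all_segments` are illustrative assumptions. This is not how SAS implements model management at scale. A thread pool keeps the example simple; for CPU-bound fitting in CPython, a `ProcessPoolExecutor` would be needed for true parallelism.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def fit_line(points):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("need at least two distinct x values per segment")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def fit_all_segments(rows, max_workers=4):
    """Group (segment, x, y) rows by segment and fit one model per segment concurrently."""
    by_segment = defaultdict(list)
    for segment, x, y in rows:
        by_segment[segment].append((x, y))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit one independent fitting job per segment.
        futures = {seg: pool.submit(fit_line, pts) for seg, pts in by_segment.items()}
        # Collect (intercept, slope) for each segment once its job completes.
        return {seg: fut.result() for seg, fut in futures.items()}


if __name__ == "__main__":
    rows = [
        ("north", 1, 3), ("north", 2, 5), ("north", 3, 7),
        ("south", 1, 10), ("south", 2, 8), ("south", 3, 6),
    ]
    for segment, (intercept, slope) in fit_all_segments(rows).items():
        print(segment, intercept, slope)
```

Because each segment's fit is independent, the same structure scales from two segments to thousands by adding workers or machines, which is the property the answer above is pointing at.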
With tools such as Docker and Kubernetes, an analytics platform or environment can be deployed in containers, which can be clustered to reap the benefits of parallel processing. These tools are gaining attention because they bring greater flexibility in cluster management and auto-scaling. However, this does not undermine the need to enable scale right down at the base analytics platform layer.
AIM: What are some of the proprietary tools developed in-house by the company?
AA: While we have talked at length about SAS Viya, SAS actually has comprehensive solutions tailored for a large number of industries. Our solutions can be broadly classified as follows:
Business Intelligence & Analytics – Solutions such as SAS Visual Analytics that empower even non-technical users to get the right information when they want it and where they want it.
Advanced Analytics – SAS’ advanced analytics software is infused with cutting-edge, innovative algorithms that can help customers solve even their most intractable problems and unearth opportunities they would otherwise miss. Our solutions cater to Data Mining, Statistical Analysis, Predictive Forecasting and Text Analytics.
Customer Intelligence – Solutions that help organisations orchestrate individualised, contextual interactions that their customers will find relevant, satisfying and valuable. Our comprehensive digital marketing hub delivers insights fuelled by data from every touch point and data source, which in turn help marketers create customer experiences that truly matter.
Data Management – Solutions spanning Data Integration, Data Quality, Data Governance, Event Stream Processing and Data Preparation for Hadoop. SAS® Data Management solutions are designed to help organisations transform big data into big opportunity.
Fraud & Security Intelligence – Solutions that take a unified approach to fraud, compliance and security. Specific solutions cover fraud prevention, regulatory compliance, and the prevention of crime and terrorism.
Risk Management – SAS has proven methodologies and best practices to help organisations establish a risk-aware culture, optimize capital and liquidity, and meet regulatory demands. This means on-demand, high-performance risk analytics in the hands of risk professionals to ensure greater efficiency and transparency.
SAS Cloud Analytics – SAS Cloud Analytics provides an easy and cost-effective way to deliver valuable business insight through on-demand access to SAS technology. It helps reduce the costs and administrative headaches of traditional software implementations.
Results-as-a-Service (RaaS): We are excited to bring in our services model for analytics, particularly the Results-as-a-Service model. We see a strong inclination towards this model because the cost of analytics implementation shifts to operating expenditure (opex) rather than capital expenditure (capex), which often makes it easier to justify the value of the project to key stakeholders. With RaaS engagements, customers share their business problem and their data with SAS. SAS then uses its talent, software and infrastructure to provide answers the customers can act on. RaaS offers flexible delivery channels, leveraging SAS talent internationally and a SAS Solutions OnDemand-managed environment, either SAS-hosted or on-site at a customer office, to ensure security and control.
SAS Analytics for IoT – Covers the full IoT analytics life cycle, from data capture and integration to analytics and deployment.
Supply Chain Management – Solutions that help supply chain professionals understand demand patterns, supply networks, operations, quality and customer service requirements like never before.