R is without a doubt one of the most sought-after software tools for today’s data scientist. It is flexible and powerful, and it gives the user easy access to many algorithms and statistical techniques. It also works well with large data sets. Seema Acharya’s Data Analytics Using R (McGraw Hill Education, 2018) is a timely book for R practitioners.
Seema Acharya is a senior lead principal at Infosys, and her experience is clearly reflected in this large textbook on R. The book contains over 300 self-assessment progress questions, more than 10 practical hands-on exercises and 200+ multiple choice questions with which practitioners can test their knowledge. Its best feature is that it starts from the basics: basic syntax and algorithms are covered thoroughly. The book also introduces many R interfaces to various kinds of data sources, such as CSV, JSON, XML, RDBMS and others.
The book is laid out and structured so that it is useful to R practitioners of many kinds and at many levels. It can serve as a guide for aspiring data scientists and data analysts, and professionals at the executive and management level can use it to aid their decision making.
The book is neatly organised in 12 chapters, starting with an introductory chapter that shows users how to install and use R packages. From the first chapter onwards, the book is designed to get the reader comfortable with using and programming in R. R is a tricky subject to write about because of its extensive ecosystem of packages and tools. Acharya nonetheless makes a strong attempt at comprehensive coverage.
The following chapters introduce readers to many basic commands that practitioners can use to analyse content or datasets. These are followed by instructions that introduce the reader to the inner workings of R when it is used to load and process data from many data sources. One of the highlights of the book is that it teaches the reader to work with many databases, such as MySQL, SQLite, PostgreSQL and others.
Along with the book, readers can use additional resources such as the Question Bank and Weblinks for reference material. Readers are advised to read through all the chapters and make use of all the hands-on material to practise programming. The sequence of the chapters is well thought out and is well suited to a person who is just starting a career in data analytics and R.
The fourth chapter of the textbook teaches descriptive statistics with commands such as dim(), summary(), etc. As the book advances, it gradually raises the difficulty of the subjects it handles. The next chapters cover various details of regression analysis, including logistic regression and its variants. The book also carefully explains important terms such as residuals and goodness-of-fit tests.
The rest of the book handles tricky advanced topics such as clustering and the handling and processing of time series data. The breadth of algorithms covered in the time series and clustering sections is very impressive, and the reader is introduced to a wide range of possible approaches. After this, the book eases the reader into advanced data mining techniques such as association rules and text mining. It also delves into topics such as parallel computing, with an introduction to MapReduce algorithms and distributed computing.
All in all, Acharya’s Data Analytics Using R is a near-complete book that offers a rare combination of good instruction in algorithms and tools and a crisp explanation of the R language. The book comes at an opportune time, when demand for R in the market is high and usage of the language is only increasing.