Big data and analytics: terms that frequently pop up in newspapers, magazines, airports and even pub chats to spice up a conversation. These days everybody talks about them, but only a few are actually doing it successfully! One of the reasons is that firms often lack clear insight into the critical success factors for building actionable analytical models. Hence, in this column, we share some recent research insights based on partnerships we initiated with firms worldwide, which are elaborated further in the new E-learning course Advanced Analytics in a Big Data World.
In order to be successful, an analytical model needs to satisfy various requirements. A first key requirement is business relevance. The analytical model should solve the business problem that it was developed for! It makes no sense to have a high-performing analytical model that was sidetracked from the original business problem. In other words, if the business problem is detecting insurance fraud, then the analytical model must actually detect insurance fraud. Obviously, this requires thorough business knowledge and a clear understanding of the problem to be addressed before any analysis can start. Some example kick-off questions are: how do we define, measure and manage fraud?
Another important success factor is statistical performance and validity. In other words, the analytical model should make sense statistically. It should be significant and provide good predictive or descriptive performance. Depending on the type of analytics, various performance metrics can be used. In customer segmentation, statistical evaluation measures will contrast intra-cluster similarity with inter-cluster dissimilarity. Analytical churn prediction models will be evaluated in terms of their ability to assign high churn scores to the most likely churners.
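To make the churn example concrete, here is a minimal sketch of two commonly used ranking measures: the area under the ROC curve (AUC) and the lift in the top-scored fraction of customers. The function names, scores and labels are hypothetical and only serve as an illustration; they are not taken from the course material.

```python
def auc(scores, labels):
    """Probability that a random churner gets a higher score than a
    random non-churner (ties count as half a win)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one churner and one non-churner")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def top_fraction_lift(scores, labels, fraction=0.1):
    """Churn rate among the top-scored fraction of customers,
    divided by the overall churn rate."""
    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        raise ValueError("no churners in the data, lift is undefined")
    # Rank customers from highest to lowest churn score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = max(1, int(round(fraction * len(ranked))))
    top_rate = sum(y for _, y in ranked[:k]) / k
    return top_rate / base_rate


# Hypothetical churn scores and observed outcomes (1 = churned).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

model_auc = auc(scores, labels)
model_lift = top_fraction_lift(scores, labels, fraction=0.2)
print(f"AUC: {model_auc:.3f}")
print(f"Top-20% lift: {model_lift:.2f}")
```

An AUC close to 1 and a lift well above 1 both indicate that the model assigns its highest scores to the customers who actually churn. The pairwise AUC loop is quadratic in the number of customers, so for large data sets a sort-based or library implementation would be the more practical choice.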
Interpretability means that the analytical model should be comprehensible to the decision maker (e.g. marketer, fraud analyst, credit expert). Justifiability indicates that the model is in accordance with the expectations and business knowledge of the expert. Both interpretability and justifiability are subjective and depend on the knowledge and experience of the decision maker. Both often need to be balanced against statistical performance, because complex, non-interpretable models (e.g. neural networks, random forests, …) often perform better in a statistical sense. In settings like credit risk modeling, interpretability and justifiability are very important because of the societal impact of these models. However, in settings like fraud detection and marketing response modeling, they are typically less of an issue.
Operational efficiency relates to the effort that is needed to evaluate, monitor, backtest or rebuild the model. From this perspective, it is quite obvious that a neural network or random forest is less efficient than e.g. a plain vanilla regression model or decision tree. In settings like credit card fraud detection, operational efficiency is very important because a decision should be made within a few seconds after the credit card transaction was initiated.
Economic cost refers to the cost that is needed to gather the model inputs, run the model and process its outcome(s). The cost of external data and/or models should also be taken into account here. This will enable you to calculate the economic return on the analytical model, which is typically not a straightforward exercise.
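As a rough illustration of the arithmetic, the sketch below computes a simple return on investment from the cost components listed above. The function name, the cost breakdown and all amounts are hypothetical. In practice the hard part is estimating the benefit figure, which this sketch simply takes as given.

```python
def model_roi(benefit, data_cost, run_cost, processing_cost, external_cost=0.0):
    """Simple return on investment: net benefit divided by total cost.

    All arguments are amounts in the same currency over the same period.
    """
    total_cost = data_cost + run_cost + processing_cost + external_cost
    if total_cost <= 0:
        raise ValueError("total cost must be positive")
    return (benefit - total_cost) / total_cost


# Hypothetical yearly figures, for illustration only.
roi = model_roi(
    benefit=150_000,         # estimated value generated by the model
    data_cost=20_000,        # gathering the model inputs
    run_cost=10_000,         # running the model
    processing_cost=15_000,  # processing its outcomes
    external_cost=5_000,     # external data and/or models
)
print(f"ROI: {roi:.0%}")
```

With these made-up numbers the total cost is 50,000 and the net benefit 100,000, so every unit of currency spent on the model returns two on top.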
Finally, regulatory compliance is becoming more and more important. This refers to the extent to which the model is compliant with regulation and legislation. In a credit risk modeling setting, it is important that the models are compliant with the Basel II and III regulations. In an analytical insurance setting, the Solvency II directive must be respected.
To conclude, in this blog article we briefly zoomed in on the critical success factors for building analytical models. As already mentioned, the importance of each of them depends on the application field in which you are working. For more information, we are happy to refer you to the new E-learning course Advanced Analytics in a Big Data World.
Self-Paced E-learning course: Advanced Analytics in a Big Data World
The E-learning course starts by refreshing the basic concepts of the analytics process model: data preprocessing, analytics and post-processing. We then discuss decision trees and ensemble methods (bagging, boosting, random forests), neural networks, support vector machines (SVMs), Bayesian networks, survival analysis, social networks, and the monitoring and backtesting of analytical models. Throughout the course, we extensively refer to our industry and research experience. Various business examples (e.g. credit scoring, churn prediction, fraud detection, customer segmentation) and small case studies are also included for further clarification. The E-learning course consists of more than 20 hours of movies, each 5 minutes long on average. Quizzes are included to facilitate the understanding of the material. Upon registration, you will get an access code which gives you unlimited access to all course material (movies, quizzes, scripts, …) for one year. The E-learning course focuses on the concepts and modeling methodologies and not on the SAS software. To access the course material, you only need a laptop, iPad or iPhone with a web browser. No SAS software is needed. See https://support.sas.com/edu/schedules.html?ctry=us&id=2169 for more details.