Decision making and storytelling are two important facets of a data scientist's job. Models can be tweaked and computational power can be scaled up, but the choice of a particular test or method has lasting implications for the product lifecycle. From cost-cutting to life-saving applications, hypothesis testing is pervasive in statistics, and with the rise of statistical machine learning these tests have become more accessible through Python's ever-growing collection of task-specific libraries.

Statistical tests are commonly classified as parametric or non-parametric. Parametric tests are conducted under the assumption that the data follows a Gaussian distribution. If this assumption fails, non-parametric tests are used for hypothesis testing instead.

Here we list a few widely used statistical tests (parametric and non-parametric) available in Python:

### Chi-Squared Test

The chi-squared test is well known even to those just starting out with statistical machine learning. It is used to check whether two categorical variables are related or independent, under the assumption that the observations used to build the contingency table are themselves independent.

Python Code

```python
from scipy.stats import chi2_contingency

table = ...
stat, p, dof, expected = chi2_contingency(table)
```
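Filling in the placeholder with a made-up contingency table gives a runnable sketch; the counts and the 0.05 significance threshold below are illustrative assumptions, not from the original:

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x3 contingency table: counts of two groups across three categories
table = [[10, 20, 30],
         [6, 9, 17]]
stat, p, dof, expected = chi2_contingency(table)
print(f"stat={stat:.3f}, p={p:.3f}, dof={dof}")
if p > 0.05:
    print("Fail to reject H0: the variables appear independent")
else:
    print("Reject H0: the variables appear related")
```

Note that `dof` is `(rows - 1) * (cols - 1)`, here `(2 - 1) * (3 - 1) = 2`.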

### Student’s t-test

Tests whether the means of two independent samples are significantly different.

Observations in each sample are independent and identically distributed (iid). Observations in each sample are normally distributed. Observations in each sample have the same variance.

Python Code

```python
from scipy.stats import ttest_ind

data1, data2 = ...
stat, p = ttest_ind(data1, data2)
```
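For instance, with two small made-up samples whose means clearly differ (the values are illustrative, not real data):

```python
from scipy.stats import ttest_ind

# Illustrative samples; sample 1 centered near 5.5, sample 2 near 4.3
data1 = [5.1, 4.9, 6.2, 5.8, 5.5, 5.0, 5.7]
data2 = [4.2, 4.0, 4.8, 4.1, 4.5, 3.9, 4.4]
stat, p = ttest_ind(data1, data2)
print(f"stat={stat:.3f}, p={p:.4f}")
```

A small p-value here suggests the two sample means differ significantly.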

### Analysis of Variance Test (ANOVA)

ANOVA is another widely used test; it checks whether the means of two or more independent samples differ significantly. Like the t-test, it assumes the observations in each sample are normally distributed with equal variance.

Python Code

```python
from scipy.stats import f_oneway

data1, data2, ... = ...
stat, p = f_oneway(data1, data2, ...)
```
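A runnable sketch with three made-up groups whose means are well separated (all values are illustrative assumptions):

```python
from scipy.stats import f_oneway

# Three illustrative groups with clearly different means
data1 = [6.1, 5.8, 6.4, 6.0, 5.9]
data2 = [5.2, 5.5, 5.0, 5.3, 5.1]
data3 = [4.4, 4.6, 4.2, 4.5, 4.3]
stat, p = f_oneway(data1, data2, data3)
print(f"F={stat:.3f}, p={p:.4f}")
```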

### Shapiro-Wilk Test

This test is used to check whether the sample data has a Gaussian distribution.

Python Code

```python
from scipy.stats import shapiro

data = ...
stat, p = shapiro(data)
```
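With a small made-up sample (values are illustrative), the call runs end to end; a large p-value means the test fails to reject normality:

```python
from scipy.stats import shapiro

# Illustrative, roughly symmetric sample around 5.0
data = [4.8, 5.1, 5.3, 4.9, 5.0, 5.2, 4.7, 5.4, 5.1, 4.95]
stat, p = shapiro(data)
print(f"W={stat:.3f}, p={p:.3f}")
```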

### D’Agostino’s K^2 Test

Similar to the Shapiro-Wilk test, this too checks whether a data sample has a Gaussian distribution, using the sample's skewness and kurtosis.

Python Code

```python
from scipy.stats import normaltest

data = ...
stat, p = normaltest(data)
```
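A runnable sketch with made-up values; note that `normaltest` expects a reasonably sized sample (SciPy warns below 20 observations for the kurtosis component), so 20 illustrative points are used here:

```python
from scipy.stats import normaltest

# 20 illustrative values clustered around 5.0
data = [5.0, 4.8, 5.2, 5.1, 4.9, 5.3, 4.7, 5.0, 5.1, 4.95,
        5.05, 4.85, 5.15, 4.9, 5.1, 5.0, 4.8, 5.2, 5.05, 4.9]
stat, p = normaltest(data)
print(f"K^2={stat:.3f}, p={p:.3f}")
```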

### Pearson’s Correlation Coefficient

A statistical test that checks whether two samples are correlated, i.e. whether they have a linear relationship.

Python Code

```python
from scipy.stats import pearsonr

data1, data2 = ...
corr, p = pearsonr(data1, data2)
```
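With made-up samples where the second is roughly twice the first (an illustrative assumption), the coefficient comes out close to +1:

```python
from scipy.stats import pearsonr

data1 = [1, 2, 3, 4, 5, 6, 7]
data2 = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1, 14.2]  # roughly 2 * data1, with noise
corr, p = pearsonr(data1, data2)
print(f"corr={corr:.3f}, p={p:.4f}")
```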

### Spearman’s Rank Correlation

This test assumes the observations in each sample can be ranked; it checks whether the relationship between the two samples is monotonic.

Python Code

```python
from scipy.stats import spearmanr

data1, data2 = ...
corr, p = spearmanr(data1, data2)
```
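A made-up example that highlights the difference from Pearson: the relationship below is non-linear but perfectly monotonic, so the rank correlation is at its maximum:

```python
from scipy.stats import spearmanr

data1 = [1, 2, 3, 4, 5, 6, 7]
data2 = [1, 4, 9, 16, 25, 36, 49]  # monotonic but non-linear (squares)
corr, p = spearmanr(data1, data2)
print(f"corr={corr:.3f}, p={p:.4f}")
```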

### Mann-Whitney U Test

A non-parametric hypothesis test that checks whether the distributions of two independent samples are equal.

Python Code

```python
from scipy.stats import mannwhitneyu

data1, data2 = ...
stat, p = mannwhitneyu(data1, data2)
```
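A runnable sketch on made-up samples; `alternative="two-sided"` is passed explicitly since the default has varied across SciPy versions:

```python
from scipy.stats import mannwhitneyu

# Illustrative samples with little overlap
data1 = [5.1, 4.9, 6.2, 5.8, 5.5, 5.0, 5.7]
data2 = [4.2, 4.0, 4.8, 4.1, 4.5, 3.9, 4.4]
stat, p = mannwhitneyu(data1, data2, alternative="two-sided")
print(f"U={stat:.1f}, p={p:.4f}")
```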

### Kruskal-Wallis H Test

Like the Mann-Whitney U test, the Kruskal-Wallis test assumes only that the observations in each sample can be ranked and that the samples are independent; it checks whether the distributions of two or more independent samples are equal.

Python Code

```python
from scipy.stats import kruskal

data1, data2, ... = ...
stat, p = kruskal(data1, data2, ...)
```
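With three made-up, well-separated groups (the same illustrative values as in the ANOVA sketch, since Kruskal-Wallis is its non-parametric counterpart):

```python
from scipy.stats import kruskal

# Three illustrative groups with clearly different centers
data1 = [6.1, 5.8, 6.4, 6.0, 5.9]
data2 = [5.2, 5.5, 5.0, 5.3, 5.1]
data3 = [4.4, 4.6, 4.2, 4.5, 4.3]
stat, p = kruskal(data1, data2, data3)
print(f"H={stat:.3f}, p={p:.4f}")
```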

### Friedman Test

Friedman test checks whether the distributions of two or more paired samples are equal or not.

Python Code

```python
from scipy.stats import friedmanchisquare

data1, data2, ... = ...
stat, p = friedmanchisquare(data1, data2, ...)
```
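A runnable sketch with three made-up paired samples (e.g. three repeated measurements on the same six subjects); `friedmanchisquare` requires at least three samples of equal length:

```python
from scipy.stats import friedmanchisquare

# Illustrative paired measurements on the same 6 subjects,
# increasing consistently from data1 to data3
data1 = [4.0, 5.0, 3.5, 4.5, 5.5, 4.2]
data2 = [5.0, 6.1, 4.0, 5.2, 6.0, 5.1]
data3 = [6.2, 7.0, 5.1, 6.3, 7.2, 6.0]
stat, p = friedmanchisquare(data1, data2, data3)
print(f"stat={stat:.3f}, p={p:.4f}")
```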

### Conclusion

The probability of rejecting the null hypothesis is a function of five factors: whether the test is one- or two-tailed, the level of significance, the standard deviation, the amount of deviation from the null hypothesis, and the number of observations. That said, statistical tests are also subject to criticism. For instance, interpreting p-values under multiple comparisons is tricky, because p-values depend both on the data observed and on data that might have been observed but wasn't. A statistician, analyst, or data scientist should therefore remember that statistical significance does not imply practical significance, and correlation does not imply causation. Every test is only a means to an end, and that end is often vague.