Decision making and storytelling are two important facets of a data scientist's job. Models can be tweaked and computational power can be scaled up, but the choice of a particular test or method has lasting implications for the product lifecycle. From cost-cutting to life-saving applications, hypothesis testing is pervasive in statistics, and with the rise of statistical machine learning these tests have become more accessible through Python's ever-growing collection of task-specific libraries.

Statistical tests are commonly classified as parametric or non-parametric. Parametric tests are conducted under the assumption that the data follows a Gaussian distribution. If this assumption fails, non-parametric tests are used for hypothesis testing instead.

Here we list a few widely used statistical tests (parametric and non-parametric) available in Python:

### Chi-Squared Test

The chi-squared test is well known even to those just starting out with statistical machine learning. It is used to check whether two categorical variables are related or independent, under the assumption that the observations used to build the contingency table are themselves independent.

Python Code

```python
from scipy.stats import chi2_contingency

table = ...
stat, p, dof, expected = chi2_contingency(table)
```
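Filling in the placeholder with a made-up contingency table gives a runnable sketch; the counts and the 0.05 significance threshold below are illustrative assumptions, not from the original:

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x3 contingency table: counts of two groups across three categories
table = [[10, 20, 30],
         [6, 9, 17]]
stat, p, dof, expected = chi2_contingency(table)
print(f"stat={stat:.3f}, p={p:.3f}, dof={dof}")
if p > 0.05:
    print("Fail to reject H0: the variables appear independent")
else:
    print("Reject H0: the variables appear related")
```

Note that `dof` is `(rows - 1) * (cols - 1)`, here `(2 - 1) * (3 - 1) = 2`.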

### Student’s t-test

Tests whether the means of two independent samples are significantly different.

Observations in each sample are independent and identically distributed (iid). Observations in each sample are normally distributed. Observations in each sample have the same variance.

Python Code

```python
from scipy.stats import ttest_ind

data1, data2 = ...
stat, p = ttest_ind(data1, data2)
```
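For instance, with two small made-up samples whose means clearly differ (the values are illustrative, not real data):

```python
from scipy.stats import ttest_ind

# Illustrative samples; sample 1 centered near 5.5, sample 2 near 4.3
data1 = [5.1, 4.9, 6.2, 5.8, 5.5, 5.0, 5.7]
data2 = [4.2, 4.0, 4.8, 4.1, 4.5, 3.9, 4.4]
stat, p = ttest_ind(data1, data2)
print(f"stat={stat:.3f}, p={p:.4f}")
```

A small p-value here suggests the two sample means differ significantly.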

### Analysis of Variance Test (ANOVA)

ANOVA is another widely used test; it checks whether the means of two or more independent samples differ significantly. Like the t-test, it assumes the observations in each sample are normally distributed with equal variance.

Python Code

```python
from scipy.stats import f_oneway

data1, data2, ... = ...
stat, p = f_oneway(data1, data2, ...)
```
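A runnable sketch with three made-up groups whose means are well separated (all values are illustrative assumptions):

```python
from scipy.stats import f_oneway

# Three illustrative groups with clearly different means
data1 = [6.1, 5.8, 6.4, 6.0, 5.9]
data2 = [5.2, 5.5, 5.0, 5.3, 5.1]
data3 = [4.4, 4.6, 4.2, 4.5, 4.3]
stat, p = f_oneway(data1, data2, data3)
print(f"F={stat:.3f}, p={p:.4f}")
```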

### Shapiro-Wilk Test

This test is used to check whether the sample data has a Gaussian distribution.

Python Code

```python
from scipy.stats import shapiro

data = ...
stat, p = shapiro(data)
```
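With a small made-up sample (values are illustrative), the call runs end to end; a large p-value means the test fails to reject normality:

```python
from scipy.stats import shapiro

# Illustrative, roughly symmetric sample around 5.0
data = [4.8, 5.1, 5.3, 4.9, 5.0, 5.2, 4.7, 5.4, 5.1, 4.95]
stat, p = shapiro(data)
print(f"W={stat:.3f}, p={p:.3f}")
```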

### D’Agostino’s K^2 Test

Similar to the Shapiro-Wilk test, this too checks whether a data sample has a Gaussian distribution, using the sample's skewness and kurtosis.

Python Code

```python
from scipy.stats import normaltest

data = ...
stat, p = normaltest(data)
```
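A runnable sketch with made-up values; note that `normaltest` expects a reasonably sized sample (SciPy warns below 20 observations for the kurtosis component), so 20 illustrative points are used here:

```python
from scipy.stats import normaltest

# 20 illustrative values clustered around 5.0
data = [5.0, 4.8, 5.2, 5.1, 4.9, 5.3, 4.7, 5.0, 5.1, 4.95,
        5.05, 4.85, 5.15, 4.9, 5.1, 5.0, 4.8, 5.2, 5.05, 4.9]
stat, p = normaltest(data)
print(f"K^2={stat:.3f}, p={p:.3f}")
```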

### Pearson’s Correlation Coefficient

A statistical test that checks whether two samples are correlated, i.e. whether they have a linear relationship.

Python Code

```python
from scipy.stats import pearsonr

data1, data2 = ...
corr, p = pearsonr(data1, data2)
```
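With made-up samples where the second is roughly twice the first (an illustrative assumption), the coefficient comes out close to +1:

```python
from scipy.stats import pearsonr

data1 = [1, 2, 3, 4, 5, 6, 7]
data2 = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1, 14.2]  # roughly 2 * data1, with noise
corr, p = pearsonr(data1, data2)
print(f"corr={corr:.3f}, p={p:.4f}")
```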

### Spearman’s Rank Correlation

This test assumes the observations in each sample can be ranked; it checks whether the relationship between the two samples is monotonic.

Python Code

```python
from scipy.stats import spearmanr

data1, data2 = ...
corr, p = spearmanr(data1, data2)
```
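A made-up example that highlights the difference from Pearson: the relationship below is non-linear but perfectly monotonic, so the rank correlation is at its maximum:

```python
from scipy.stats import spearmanr

data1 = [1, 2, 3, 4, 5, 6, 7]
data2 = [1, 4, 9, 16, 25, 36, 49]  # monotonic but non-linear (squares)
corr, p = spearmanr(data1, data2)
print(f"corr={corr:.3f}, p={p:.4f}")
```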

### Mann-Whitney U Test

A non-parametric hypothesis test that checks whether the distributions of two independent samples are equal.

Python Code

```python
from scipy.stats import mannwhitneyu

data1, data2 = ...
stat, p = mannwhitneyu(data1, data2)
```
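A runnable sketch on made-up samples; `alternative="two-sided"` is passed explicitly since the default has varied across SciPy versions:

```python
from scipy.stats import mannwhitneyu

# Illustrative samples with little overlap
data1 = [5.1, 4.9, 6.2, 5.8, 5.5, 5.0, 5.7]
data2 = [4.2, 4.0, 4.8, 4.1, 4.5, 3.9, 4.4]
stat, p = mannwhitneyu(data1, data2, alternative="two-sided")
print(f"U={stat:.1f}, p={p:.4f}")
```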

### Kruskal-Wallis H Test

Like the Mann-Whitney U test, the Kruskal-Wallis test assumes only that the observations in each sample can be ranked and that the samples are independent; it checks whether the distributions of two or more independent samples are equal.

Python Code

```python
from scipy.stats import kruskal

data1, data2, ... = ...
stat, p = kruskal(data1, data2, ...)
```
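With three made-up, well-separated groups (the same illustrative values as in the ANOVA sketch, since Kruskal-Wallis is its non-parametric counterpart):

```python
from scipy.stats import kruskal

# Three illustrative groups with clearly different centers
data1 = [6.1, 5.8, 6.4, 6.0, 5.9]
data2 = [5.2, 5.5, 5.0, 5.3, 5.1]
data3 = [4.4, 4.6, 4.2, 4.5, 4.3]
stat, p = kruskal(data1, data2, data3)
print(f"H={stat:.3f}, p={p:.4f}")
```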

### Friedman Test

Friedman test checks whether the distributions of two or more paired samples are equal or not.

Python Code

```python
from scipy.stats import friedmanchisquare

data1, data2, ... = ...
stat, p = friedmanchisquare(data1, data2, ...)
```
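A runnable sketch with three made-up paired samples (e.g. three repeated measurements on the same six subjects); `friedmanchisquare` requires at least three samples of equal length:

```python
from scipy.stats import friedmanchisquare

# Illustrative paired measurements on the same 6 subjects,
# increasing consistently from data1 to data3
data1 = [4.0, 5.0, 3.5, 4.5, 5.5, 4.2]
data2 = [5.0, 6.1, 4.0, 5.2, 6.0, 5.1]
data3 = [6.2, 7.0, 5.1, 6.3, 7.2, 6.0]
stat, p = friedmanchisquare(data1, data2, data3)
print(f"stat={stat:.3f}, p={p:.4f}")
```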

### Conclusion

The probability of rejecting the null hypothesis is a function of five factors: whether the test is one- or two-tailed, the level of significance, the standard deviation, the amount of deviation from the null hypothesis, and the number of observations. That said, statistical tests are also subject to criticism. For instance, interpreting p-values under multiple comparisons is tricky, because p-values depend both on the data observed and on data that might have been observed but wasn't. A statistician, analyst, or data scientist should therefore remember that statistical significance does not imply practical significance, and correlation does not imply causation. Every test is only a means to an end, and that end is often vague.