In statistics, one always tries to verify the results. One of the best-known ways of doing this is statistical significance testing. It is widely accepted that a result has statistical significance only if it would be very unlikely to occur given the null hypothesis.
To put it more formally, the significance level α is the probability that the study rejects the null hypothesis when it is in fact true. The p-value of a result, p, on the other hand, is the probability of obtaining a result at least as extreme as the one observed, given that the null hypothesis is true. The result is statistically significant, by the standards of the study, when p < α. By convention, α is usually set to 5%. Before any kind of statistical experiment begins, it is assumed that there is no effect or no difference, and this assumption is called the null hypothesis. This is the assumption that is put to the test by the experiment. Any data scientist should understand that a null hypothesis can never be proved; it can only be rejected.
Let us consider a person who wants to compare the accuracy of two rifles, A and B. He decides to shoot each rifle 10 times. If both rifles score the same number of hits, one would conclude that there is no difference between them. It is more interesting if A scores 5 hits and B scores 6. The statistician then needs to decide whether this is enough evidence to say that B is a better rifle than A.
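One way to make this decision concrete is an exact test on the table of hits and misses. The sketch below uses Fisher's exact test, which is our choice for illustration and not a method named in this article; the function name `fisher_exact_two_sided` is hypothetical. It asks how likely a split of the hits at least as lopsided as the observed one would be if both rifles were equally accurate.

```python
from math import comb


def fisher_exact_two_sided(hits_a, misses_a, hits_b, misses_b):
    """Two-sided Fisher exact p-value for a 2x2 table of hits and misses."""
    shots_a = hits_a + misses_a
    shots_b = hits_b + misses_b
    total_hits = hits_a + hits_b

    def weight(k):
        # Number of ways rifle A could account for k of the total hits
        # (kept as an integer so comparisons are exact).
        return comb(shots_a, k) * comb(shots_b, total_hits - k)

    lowest = max(0, total_hits - shots_b)
    highest = min(shots_a, total_hits)
    observed = weight(hits_a)
    # Add up every split of the hits that is no more likely than the observed one.
    extreme = sum(
        weight(k) for k in range(lowest, highest + 1) if weight(k) <= observed
    )
    return extreme / comb(shots_a + shots_b, total_hits)


# Rifle A: 5 hits, 5 misses. Rifle B: 6 hits, 4 misses.
p = fisher_exact_two_sided(5, 5, 6, 4)
print(f"p = {p:.3f}")  # p = 1.000
print("significant at 5%" if p < 0.05 else "not significant at 5%")
```

With 5 hits against 6, the observed split is the most likely outcome under the no-difference assumption, so the p-value is 1 and the result is nowhere near significant.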
This is where we look for a statistically significant difference between the two rifles. Here, any probability larger than 5% is considered insufficient to reject our assumption that there is no real difference between the two rifles. In statistics, the 5% and 1% probability levels are very important, even though they are arbitrary. This is one way to think about statistical significance testing. It should be carefully noted that in statistics, “significance” means “beyond the likelihood of chance”. There is also the matter of practical significance: an event can be statistically significant and still have no practical significance.
When it comes to statistical significance, the correct question to ask is “What is the probability of finding a man whose height differs from the average by 6 or more inches?” rather than “What is the probability of finding a man 6 or more inches taller than the average?”. The first question counts deviations in both directions, not just one.
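The difference between the two questions can be seen numerically. The sketch below assumes, purely for illustration, that heights are normally distributed with a mean of 69 inches and a standard deviation of 3 inches; these figures are made up and do not come from this article.

```python
from statistics import NormalDist

# Hypothetical figures for illustration only: mean 69 in, standard deviation 3 in.
heights = NormalDist(mu=69.0, sigma=3.0)
gap = 6.0

# One-sided: at least 6 inches taller than the average.
p_taller = 1.0 - heights.cdf(69.0 + gap)
# Two-sided: at least 6 inches away from the average, in either direction.
p_different = heights.cdf(69.0 - gap) + p_taller

print(f"taller:    {p_taller:.4f}")     # taller:    0.0228
print(f"different: {p_different:.4f}")  # different: 0.0455
```

Because the normal curve is symmetric, the two-sided probability is exactly twice the one-sided one.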
Many tests are used for significance testing. Let us look at some of them and see how they work:
The z test for measurements (hereafter called zM) compares a random sample of 1 or more measurements with a large parent group whose mean and standard deviation are known. The data required for this test are:
n = number of measurements in the sample
m = mean of the sample measurements
M = mean of the large parent group
S = standard deviation of the large parent group
Calculate z from the following formula:
z = (√n × |M – m|) / S
The larger the value of z, the less likely it is that our no-difference assumption is correct.
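As a minimal sketch, the zM calculation can be written in a few lines of Python, using the standard form of the statistic with the square root of n, z = √n × |M – m| / S. The function name `z_test` and the sample numbers are made up for illustration, and converting z to a two-sided p-value through the standard normal distribution is our addition.

```python
from math import sqrt
from statistics import NormalDist


def z_test(n, m, M, S):
    """Return (z, two-sided p-value) for a sample mean m against parent mean M."""
    z = sqrt(n) * abs(M - m) / S
    # Probability of a z at least this large in either direction
    # under the no-difference assumption.
    p = 2.0 * (1.0 - NormalDist().cdf(z))
    return z, p


# Made-up numbers: 25 measurements averaging 52, parent group with mean 50 and SD 5.
z, p = z_test(n=25, m=52.0, M=50.0, S=5.0)
print(f"z = {z:.2f}, p = {p:.4f}")  # z = 2.00, p = 0.0455
```

Here p falls just below 5%, so the no-difference assumption would be rejected at that level.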
Student’s t Test
The purpose of this test is to compare a random sample consisting of 3 or more measurements with a larger parent group whose mean is known, but whose standard deviation is not known. This is a modification of the zM test, and the data required for this test are:
n = number of measurements in the sample group
m = mean of the sample measurements
s = standard deviation of the sample measurements, calculated along with the mean
M = mean of the large parent group
Student’s t uses the following formula:
t = (√n × |M – m|) / s
The only difference between the zM formula and the t formula is that the t test uses the standard deviation of the sample group instead of that of the large parent group.
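The t statistic can be sketched the same way, again using the standard form with the square root of n. The function name `t_statistic` and the sample values are made up for illustration. Python's standard library has no t distribution, so this sketch compares t with a tabulated critical value instead of computing a p-value.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(sample, M):
    """Return (t, degrees of freedom) for a sample against parent mean M."""
    n = len(sample)
    m = mean(sample)
    s = stdev(sample)  # sample standard deviation (n - 1 in the denominator)
    return sqrt(n) * abs(M - m) / s, n - 1


# Made-up sample of 8 measurements, compared with a parent mean of 3.
t, df = t_statistic([2, 4, 4, 4, 5, 5, 7, 9], M=3)
# 2.365 is the tabulated two-sided 5% critical value of t for 7 degrees of freedom.
print(f"t = {t:.3f}, df = {df}, significant at 5%: {t > 2.365}")
# t = 2.646, df = 7, significant at 5%: True
```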
Wilcoxon’s Stratified Test
The purpose of Wilcoxon’s Stratified Test is to compare two independent stratified random samples of measurements which have comparable strata, and the same number of measurements in both samples. The data required for this test are:
k = number of strata or groups (I, II, III, etc.) in each sample
n = number of measurements in each stratum of each sample, if this number is the same in each stratum
nI, nII, nIII = number of measurements in strata I, II, III, etc. of each sample, if the strata are of different sizes
Spearman’s Correlation Test
The purpose of Spearman’s Correlation Test is to test for correlation between 2 measurable characteristics that are measured:
- In each individual of a sample group
- At the same time
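The test statistic is Spearman’s rank correlation coefficient, which in the absence of tied ranks is:
r = 1 – (6 × Σd²) / (n × (n² – 1))
Here, d is the difference between the two ranks of each observation, and n is the number of observations.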
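A minimal sketch of the rank-difference calculation follows, using the standard formula r = 1 – 6Σd² / (n(n² – 1)), which assumes there are no tied values. The function names `ranks` and `spearman` and the paired measurements are made up for illustration.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman's rank correlation from the sum of squared rank differences."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Made-up paired measurements for five individuals.
r = spearman([10, 20, 30, 40, 50], [12, 11, 35, 31, 60])
print(f"r = {r:.2f}")  # r = 0.80
```

A value near 1 means the two characteristics rise together, a value near –1 means one falls as the other rises, and a value near 0 means there is little rank correlation.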