An Intuition towards Hypothesis Testing!


Everyone knows at least two things about hypothesis testing:

  • Hypothesis testing allows us to draw conclusions about an entire population based on a representative sample
  • If the p-value is less than or equal to alpha, reject the null hypothesis

But how do you connect the dots between sample and population? How can you know something about the population without ever looking at all of it? Let’s try to build an intuition.

What is hypothesis testing: Hypothesis testing is about checking whether a belief that you formed from a sample holds true for the entire population. Every belief has an exact opposite, a contradictory belief. In hypothesis testing, your belief is called the alternate hypothesis, and the contradictory belief is called the null hypothesis. You don’t try to prove the alternate hypothesis (your belief) directly. Instead, you assume that the null hypothesis (the contradictory belief) is true and then work backwards to disprove it. If you recall high-school mathematics, this is the idea behind proof by contradiction: you assume a statement is true and show that the assumption leads to something impossible. Once you disprove the null hypothesis, you conclude that the alternate hypothesis is true.

How do you disprove the null hypothesis? As mentioned above, you start by assuming that the null hypothesis is true, so it describes the population. You then take a sample and calculate the measure used in the test on that sample. Finally, you work out how likely a value like this would be if the population really were as the null hypothesis describes. Let’s see a quick example.

For example, if the hypothesis is about intelligence, IQ can be the measure. Suppose you believe the average IQ of the population is greater than 100. The alternate hypothesis is then "the average IQ of the population is greater than 100". The null hypothesis is "the average IQ of the population is less than or equal to 100". To prove your point (the alternate hypothesis), you try to disprove the opposing point of view (the null hypothesis).
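In the usual notation, write μ for the (unknown) average IQ of the population. The two hypotheses of this example are then:

```latex
H_0: \mu \le 100 \qquad \text{vs.} \qquad H_1: \mu > 100
```

The alternate hypothesis points in one direction only (greater than 100). So only unusually high sample averages count as evidence against the null. This is called a one-sided, or right-tailed, test.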

You take an unbiased sample from the population and compute the average IQ of the sample. Say it turns out to be 108. You then calculate the probability of getting a sample average of 108 or higher if the population average were 100. That value is the boundary of the null hypothesis, and it is the case most favourable to the null: if 108 is rare when the average is 100, it is even rarer when the average is below 100. If this probability is extremely low, you reject the null hypothesis. Why? Because a sample like yours would be extremely rare in the population described by the null hypothesis, yet you have it in hand. That suggests your assumption (that the null hypothesis describes the population) is wrong, which leads to the conclusion that the alternate hypothesis describes the population.
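Here is a minimal sketch of that calculation in Python, using the standard library’s `statistics.NormalDist`. For illustration only, it assumes:

- IQ in the population has a known standard deviation of 15 (the conventional scaling of many IQ tests).
- The sample has 36 people.
- Their average IQ is 108.

None of these numbers come from real data. Under the null hypothesis, the sample average is treated as normally distributed around 100 with standard error 15/√36 = 2.5. The p-value is the right-tail area beyond the observed average. The helper name `one_sided_z_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_z_test(sample_mean, null_mean, pop_sd, n):
    """Right-tailed z-test: probability of a sample average at least this high
    if the population average equals null_mean (pop_sd assumed known)."""
    standard_error = pop_sd / sqrt(n)          # spread of sample averages under H0
    z = (sample_mean - null_mean) / standard_error
    p_value = 1 - NormalDist().cdf(z)          # area to the right of z
    return z, p_value

# Hypothetical numbers: 36 people, average IQ 108, population SD 15
z, p = one_sided_z_test(sample_mean=108, null_mean=100, pop_sd=15, n=36)
print(f"z = {z:.2f}, p-value = {p:.5f}")
```

This prints `z = 3.20, p-value = 0.00069`. In other words, if the population average really were 100, a sample of 36 people with an average of 108 or more would turn up less than once in a thousand tries.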

But how do you calculate a probability from a single sample? You calculate the value of the measure on the sample (called the sample statistic). You then assume a probability distribution for this statistic, i.e. how it would vary from sample to sample. This distribution can come from domain knowledge or prior research, from the Central Limit Theorem, or from actually drawing multiple samples. But how do you get the parameters of this distribution? Since you assumed that the null hypothesis describes the population, the value stated by the null hypothesis becomes the population parameter. Finally, you use this distribution to calculate the probability of obtaining a sample statistic at least as extreme as the one you observed.
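The "draw multiple samples" route can be mimicked with a simulation. The sketch below is a thought experiment under stated assumptions, not something you can do with a real population. It assumes the null population is normal with mean 100 and standard deviation 15. It then draws many hypothetical samples of 36 people from it and looks at how their averages are distributed. The spread of those averages should come out close to 15/√36 = 2.5. The fraction of averages at or above a hypothetical observed value of 108 approximates the p-value. The function name `simulate_null_sample_means` is hypothetical.

```python
import random
from statistics import mean, stdev

def simulate_null_sample_means(null_mean, pop_sd, n, num_samples, seed=0):
    """Draw num_samples samples of size n from an assumed normal null population
    and return each sample's average (the sampling distribution of the mean)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [
        sum(rng.gauss(null_mean, pop_sd) for _ in range(n)) / n
        for _ in range(num_samples)
    ]

sample_means = simulate_null_sample_means(null_mean=100, pop_sd=15, n=36, num_samples=20_000)

observed_mean = 108  # hypothetical observed sample average
# Fraction of null samples whose average is at least as extreme as the observed one
p_simulated = sum(m >= observed_mean for m in sample_means) / len(sample_means)

print(f"average of sample means: {mean(sample_means):.2f}")
print(f"spread of sample means:  {stdev(sample_means):.2f}  (theory: 15 / sqrt(36) = 2.5)")
print(f"simulated p-value:       {p_simulated:.4f}")
```

The simulated averages center on the null value and spread out by about 2.5. That is the distribution whose parameters come from the null hypothesis. The simulated p-value should land near the exact normal calculation (about 0.0007). It is noisy, because only a handful of the 20,000 simulated averages reach 108.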

Okay! But the chances are rare, not zero: Glad you thought so! Yes, the chances of obtaining your sample are rare, not zero, and you might simply have received one of the rare samples. In that case, you wrongly rejected the null hypothesis. This can happen, and that’s why the significance level, alpha, exists (more on this later). One way to guard against mistakenly rejecting the null hypothesis is to draw multiple samples and run the hypothesis test on each of them. If your sample comes out as rare every time, then it is almost certainly not a rare sample :). But obtaining more data is costly and hence rarely done. Another approach is to use a stricter definition of rarity (notice that we haven’t yet defined what counts as rare).
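The "repeat the test" idea can be quantified. Assume the samples are independent and the test statistic is continuous. Then a true null hypothesis produces a p-value at or below alpha with probability exactly alpha. So the chance that k separate samples all look rare by bad luck alone is:

```latex
P(\text{all } k \text{ tests reject} \mid H_0 \text{ true}) = \alpha^{k},
\qquad \text{e.g. } \alpha = 0.05,\ k = 3:\quad 0.05^{3} = 0.000125
```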

So, how do you define what is rare, i.e. what counts as a small probability? You fix a threshold, the significance level, and compare the p-value against it. Let’s see what these terms mean. Note: strictly speaking, you can’t prove the alternate hypothesis; you either reject or fail to reject the null hypothesis. But leave those distinctions to statisticians: for all practical purposes, if the null is rejected, the alternate hypothesis is taken to be true.

P-value: The p-value is the probability you just calculated, i.e. the probability of obtaining a sample like yours if the null hypothesis is assumed true. It is commonly misinterpreted. The p-value is not the probability that the null hypothesis is true, nor the probability that the alternate hypothesis is false. It is the probability of obtaining a sample statistic at least as extreme as yours if the null hypothesis is true. It indicates how incompatible the data are with a specified statistical model.

Significance level: The significance level is the probability of making the mistake of rejecting the null hypothesis when it is true. Why might you make such a mistake? Recall the rare samples discussed earlier. When you reject the null, you are assuming that your sample closely represents the population. But by sheer bad luck, the sample you received may have been unrepresentative, and you mistakenly rejected the null. The significance level, alpha, is the probability of this error occurring.
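The claim that alpha is the probability of rejecting a true null can be checked by simulation. The sketch below uses a hypothetical setup: a normal population with mean 100 and standard deviation 15, and samples of 36. It generates many samples from that population, so the null "average ≤ 100" is true by construction. It then runs a right-tailed z-test on each sample and counts how often the null is rejected anyway. The function name `type_one_error_rate` is hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist

def type_one_error_rate(alpha, null_mean=100, pop_sd=15, n=36, trials=20_000, seed=1):
    """Fraction of samples, drawn with H0 actually true, in which the test rejects H0."""
    rng = random.Random(seed)
    standard_normal = NormalDist()
    standard_error = pop_sd / sqrt(n)
    false_rejections = 0
    for _ in range(trials):
        # The data really come from a population with mean null_mean, so H0 is true
        sample_mean = sum(rng.gauss(null_mean, pop_sd) for _ in range(n)) / n
        z = (sample_mean - null_mean) / standard_error
        p_value = 1 - standard_normal.cdf(z)  # right-tailed p-value
        if p_value <= alpha:
            false_rejections += 1  # a Type I error: rejecting a true null
    return false_rejections / trials

rates = {alpha: type_one_error_rate(alpha) for alpha in (0.05, 0.01)}
for alpha, rate in rates.items():
    print(f"alpha = {alpha}: null rejected in {rate:.3f} of samples")
```

The rejection rates come out close to 0.05 and 0.01 respectively. Every one of those rejections is a mistake, because the null hypothesis was true in every simulated sample.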

As a researcher, you choose the significance level for your hypothesis test. The smaller the alpha, the more reliable your finding, but the harder it becomes to reject the null (since the p-value must be less than or equal to alpha). On the other hand, a higher alpha makes rejecting the null hypothesis easier, but it also raises the probability that the rejection is false. You choose alpha based on how costly the consequences of mistakenly rejecting the null would be.
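The trade-off boils down to a single comparison. In this small sketch, the p-value of 0.023 is a hypothetical number chosen for illustration. The same evidence leads to a rejection at a lenient alpha and to no rejection at a strict one. The helper `decide` is a hypothetical name.

```python
def decide(p_value, alpha):
    """Reject H0 when the p-value is at or below the chosen significance level."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

p_value = 0.023  # hypothetical p-value from some test
for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: {decide(p_value, alpha)}")
```

This prints "reject H0" for alpha 0.10 and 0.05, and "fail to reject H0" for alpha 0.01.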

But why do we always insist on rejecting the null, rather than directly proving the alternate hypothesis: Because with sample data, it is easier to disprove than to prove. Think about it! You went around and counted a million sea creatures but couldn’t find a mermaid. Does this prove mermaids don’t exist? You don’t know how many sea creatures are still out there. But it takes just one mermaid to disprove the claim that they don’t exist.

Still, why complicate everything by bringing probability distributions into the picture: Because a sample statistic is practically never exactly equal to the population value. First of all, no sample is a perfect, “true representation” of the population. That’s why we have the concept of sampling error. For a sample to have zero sampling error, it would have to include the entire population (which is impossible; it’s a sample after all). Next, how do you measure the sampling error of your sample, or its counterpart, “representation of the population”? You can’t, because you don’t know the population. That’s why “true representation of the population” is an idea rather than a measurement. Without quantifying how well your sample represents the population, there is no way to draw concrete conclusions about the population. That is where probability, the p-value and the significance level come in.
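You can’t measure the sampling error of your particular sample, but probability lets you describe its typical size. This sketch assumes a hypothetical normal population with mean 100 and standard deviation 15. It draws many samples of each size, records each sample’s average, and measures how much those averages scatter. That scatter, the standard error, shrinks roughly like 15/√n. This is also why larger samples make hypothesis tests more reliable. The function name `spread_of_sample_means` is hypothetical.

```python
import random
from math import sqrt
from statistics import stdev

def spread_of_sample_means(n, pop_mean=100, pop_sd=15, num_samples=5_000, seed=2):
    """Standard deviation of the averages of num_samples random samples of size n."""
    rng = random.Random(seed)
    means = [sum(rng.gauss(pop_mean, pop_sd) for _ in range(n)) / n for _ in range(num_samples)]
    return stdev(means)

spreads = {n: spread_of_sample_means(n) for n in (9, 36, 144)}
for n, spread in spreads.items():
    print(f"n = {n:>3}: simulated spread {spread:.2f}, theory 15/sqrt(n) = {15 / sqrt(n):.2f}")
```

Quadrupling the sample size roughly halves the typical sampling error (about 5.0, 2.5 and 1.25 here).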

That was a long article. In a nutshell:

  • Hypothesis testing is a statistical tool to validate your assumptions about a population of interest, based on a sample from that population. Two obvious requirements follow: your sample should be of decent size, and it should be unbiased (representative of the entire population).
  • To prove your point, you try to disprove the opposing point of view. Why? Because, with sample data, disproving the opposing point is easier than proving your own (remember the mermaid example).
  • So, you compute the measure of interest in your sample (called the sample statistic). You then either know beforehand or assume a probability distribution for this measure. You began with the assumption that the null hypothesis (the opposing point of view) about the population is true. So the parameters of this distribution are given by the values stated in the null hypothesis.
  • You then use this probability distribution to compute the probability of obtaining a sample statistic at least as extreme as yours. If this probability is very low, you reject the null hypothesis: you already have this rare sample in hand, so your assumption that the null hypothesis is true is unlikely to be right.
  • But you might have received an odd sample, in which case you are mistakenly rejecting the null hypothesis when it is in fact true. This is called a Type I error. One of the few ways to control this error is to choose the significance level wisely, because the significance level defines what counts as a rare sample. The lower the significance level (for obvious reasons it can’t be zero), the stricter your definition of rarity. Common practice is to use a significance level of 0.05, and sometimes 0.01 if the cost of a Type I error is very high.
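Putting the steps above together, a compact sketch of the whole procedure for the IQ example might look like this. It assumes a right-tailed z-test with a known population standard deviation of 15. With an unknown standard deviation and a small sample, a t-test would normally be used instead. The ten scores are made-up values for illustration, not real measurements. The function name `z_test_from_sample` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_test_from_sample(sample, null_mean, pop_sd, alpha=0.05):
    """Right-tailed z-test of H0: population mean <= null_mean (pop_sd assumed known)."""
    n = len(sample)
    sample_mean = mean(sample)                      # step 1: the sample statistic
    standard_error = pop_sd / sqrt(n)               # step 2: null distribution N(null_mean, SE)
    z = (sample_mean - null_mean) / standard_error
    p_value = 1 - NormalDist().cdf(z)               # step 3: P(statistic at least this high | H0)
    reject = p_value <= alpha                       # step 4: compare with the significance level
    return sample_mean, p_value, reject

# Made-up IQ scores, for illustration only
scores = [112, 98, 105, 121, 101, 109, 95, 117, 108, 104]
sample_mean, p_value, reject = z_test_from_sample(scores, null_mean=100, pop_sd=15)
print(f"sample mean = {sample_mean}, p-value = {p_value:.3f}, reject H0: {reject}")
```

With these made-up scores, the sample average is 107, yet the p-value is about 0.07. So at alpha = 0.05, the null hypothesis is not rejected. With only ten people, an average of 107 is not rare enough under the null, which illustrates the "decent size" requirement.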

I hope this gives some intuition about hypothesis testing. The concept can be even easier to understand through critical values, by drawing critical regions on probability distribution charts. The critical value approach is a related but different way of carrying out a hypothesis test. We focused on the p-value approach in this article since it is more popular in machine learning.
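For comparison, here is a minimal sketch of the critical value approach for a right-tailed IQ test. It uses hypothetical settings: null average 100, population standard deviation 15, samples of 36, alpha = 0.05. Instead of turning each sample average into a p-value, you compute the boundary of the critical region once. Here that boundary is a sample average of about 104.11, and you reject the null whenever the observed average falls at or beyond it. Both rules lead to the same decision. The helper function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

alpha, null_mean, pop_sd, n = 0.05, 100, 15, 36  # hypothetical setup
standard_error = pop_sd / sqrt(n)

# Critical value: the z beyond which only a fraction alpha of the null distribution lies
z_crit = NormalDist().inv_cdf(1 - alpha)
# The same boundary expressed on the sample-average scale
mean_crit = null_mean + z_crit * standard_error

def reject_by_critical_value(sample_mean):
    return sample_mean >= mean_crit

def reject_by_p_value(sample_mean):
    z = (sample_mean - null_mean) / standard_error
    return 1 - NormalDist().cdf(z) <= alpha

for sample_mean in (103, 104, 105, 108):
    print(sample_mean, reject_by_critical_value(sample_mean), reject_by_p_value(sample_mean))
```

The critical z is about 1.645. Sample averages of 103 and 104 fall short of the boundary, while 105 and 108 land in the critical region, under either rule.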

More articles by Arvind Shukla
