
The Never Ending Fascination Of The Gaussian Distribution


Probability distributions are important in machine learning and data analysis. Not only data scientists but also researchers and scientists from many other fields deal with them on a day-to-day basis. Put simply, a probability distribution is a function that tells us how likely each of the possible values of a random variable is.

For example, suppose you walk down the lane where you live and record the height of every building along the way. You are effectively collecting a sample and building an empirical probability distribution of building heights, which can be very useful going forward: it tells you which heights are more likely, how much the heights vary, and much more. Probability distributions themselves can be either discrete or continuous.

To simplify, a discrete probability distribution assigns probabilities to a countable set of values (such as the outcome of a die roll), while a continuous probability distribution describes values that can fall anywhere in a range (such as a building's height). Among all of these, physicists, mathematicians and engineers favour one particular continuous distribution, widely known as the Gaussian distribution, which surfaces both in our day-to-day life and in nature. Its other name is the normal distribution; one informal explanation for the name is that it turns up so often that it came to be regarded as the "normal" case.

The Gaussian Distribution

The normal distribution is also widely known as the bell curve. It is a two-parameter family of curves, with probability density function

$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) $$

Here μ is the mean and σ² is the variance. The parameter μ determines the location of the distribution, while σ (the standard deviation) determines the width of the bell curve. The normal distribution with mean 0 and standard deviation 1 is called the standard normal distribution, and a random variable that follows it is called a standard normal random variable, usually denoted Z.
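As a minimal sketch, the density can be evaluated directly with Python's standard library. The function names `normal_pdf` and `area_under` are our own, and the midpoint-rule integration is only a rough numerical check that the curve encloses an area of 1 for any choice of μ and σ.

```python
import math


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Gaussian density f(x) with mean mu and standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma  # distance from the mean in units of sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def area_under(mu: float, sigma: float, half_width: float = 10.0, steps: int = 100_000) -> float:
    """Midpoint-rule integral of the density over mu ± half_width * sigma."""
    a = mu - half_width * sigma
    h = 2 * half_width * sigma / steps
    return h * sum(normal_pdf(a + (i + 0.5) * h, mu, sigma) for i in range(steps))


print(round(normal_pdf(0.0), 6))       # peak of the standard normal: 0.398942
print(round(area_under(3.0, 2.0), 6))  # total probability: 1.0
```

Changing μ only slides the curve along the x-axis, while doubling σ halves the peak height and doubles the width, so the area stays at 1.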

The Central Limit Theorem

Technically speaking, the Central Limit Theorem states that, for independent samples drawn from a population with finite variance σ², the sampling distribution of the sample mean approaches a normal distribution, centred on the population mean μ with variance σ²/n, as the sample size n grows. This is an astonishing and rather counterintuitive result, because it holds regardless of the shape of the population distribution. As a common rule of thumb, the approximation is already good for sample sizes of 30 or more. So if we repeatedly draw samples of that size from the population and record each sample's mean, a histogram of those means looks more and more like a normal curve. This way the Gaussian (normal) distribution shows up even where the underlying data is not Gaussian, which makes it very special and gives rise to many phenomena.
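The theorem is easy to check by simulation. The sketch below assumes a strongly right-skewed population of our own choosing, an exponential distribution with mean 1 and variance 1, draws many samples of size n = 30, and checks that the sample means are centred on 1, have variance close to 1/n, and fall within ±1.96 standard errors about 95% of the time, as a normal distribution would predict.

```python
import math
import random
import statistics


def sample_means(n: int, reps: int, seed: int = 0) -> list[float]:
    """Means of `reps` independent samples of size n from an Exponential(1) population."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]


def clt_summary(n: int = 30, reps: int = 10_000, seed: int = 0) -> tuple[float, float, float]:
    """Mean and variance of the sample means, plus the share within ±1.96 standard errors."""
    means = sample_means(n, reps, seed)
    se = 1.0 / math.sqrt(n)  # standard error sigma / sqrt(n), with sigma = 1
    coverage = sum(abs(m - 1.0) < 1.96 * se for m in means) / reps
    return statistics.fmean(means), statistics.pvariance(means), coverage


mean, var, coverage = clt_summary()
print(f"mean of sample means:     {mean:.3f}  (population mean 1)")
print(f"variance of sample means: {var:.4f} (1/n = {1 / 30:.4f})")
print(f"share within ±1.96 SE:    {coverage:.3f} (normal theory: 0.950)")
```

The exponential population is an arbitrary choice; the point is that none of the three checks assumes the population itself is normal.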

Let us look at an application of the Central Limit Theorem. Suppose a man driving through the desert runs out of fuel. He dials the emergency number to ask for help, but he is at the edge of the cell range, so his voice is noisy and the executives trying to help him on the other end of the call cannot hear him clearly. It would be great if the executive could clean up the noise using the signals received at 100 nearby towers.

The signals received at the towers can be denoted by $X_1, \ldots, X_{100}$, where $X_i = S + Y_i$.

Here $S$ is the true signal being sent to the towers,

and $Y_i$ is the noise in the signal received at tower $i$.

We can assume that the noises $Y_1, \ldots, Y_{100}$ are independent and identically distributed, with mean 0 and variance $\sigma^2$. If the noise were itself normally distributed, its average would be exactly normal; the Central Limit Theorem tells us that the average is approximately normal even when the noise is not. The executive can clean up the signal by simply averaging the readings:

$$ \bar{X} = \frac{X_1 + \cdots + X_{100}}{100} = S + \frac{Y_1 + \cdots + Y_{100}}{100} $$

By the Central Limit Theorem, the averaged noise is approximately Gaussian:

$$ \frac{Y_1 + \cdots + Y_{100}}{100} \;\overset{\text{approx.}}{\sim}\; N\!\left(0, \frac{\sigma^2}{100}\right) $$

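Hence, by understanding the nature of the noise, we can reduce it considerably: averaging over the 100 towers shrinks the noise variance from $\sigma^2$ to $\sigma^2/100$, that is, its standard deviation from $\sigma$ to $\sigma/10$, while the true signal $S$ is left unchanged.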
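The tower example can be simulated in a few lines. This is a sketch under assumptions of our own: a hypothetical true signal S = 5.0, a per-tower noise standard deviation σ = 2.0, and deliberately non-normal (uniform) noise, so that any Gaussian behaviour comes from the averaging itself. The standard deviation of the error left after averaging should come out close to σ/10 = 0.2.

```python
import math
import random
import statistics


def averaged_reading(signal: float, sigma: float, towers: int, rng: random.Random) -> float:
    """Average of `towers` readings X_i = S + Y_i with uniform noise of variance sigma^2."""
    a = math.sqrt(3) * sigma  # Uniform(-a, a) has mean 0 and variance a^2 / 3 = sigma^2
    readings = [signal + rng.uniform(-a, a) for _ in range(towers)]
    return sum(readings) / towers


def averaged_noise(signal: float = 5.0, sigma: float = 2.0, towers: int = 100,
                   trials: int = 5_000, seed: int = 1) -> tuple[float, float]:
    """Mean and standard deviation of the error left after averaging, over many trials."""
    rng = random.Random(seed)
    errors = [averaged_reading(signal, sigma, towers, rng) - signal for _ in range(trials)]
    return statistics.fmean(errors), statistics.pstdev(errors)


bias, spread = averaged_noise()
print(f"average error: {bias:+.4f}")
print(f"error std:     {spread:.4f}  (single tower: 2.0, theory after averaging: 0.2)")
```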

Gaussian Distributions Can Be Used To Solve Common Problems

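As mentioned earlier, scientists in many fields use Gaussian distributions to solve commonly occurring problems. For example, among all distributions with a given variance, the Gaussian has the maximum entropy. In physics, fixing the average kinetic energy of the particles in a gas (that is, its temperature) fixes the variance of their velocities, so maximising entropy makes each velocity component of a particle in a bottle of gas at a given temperature Gaussian distributed; this is the Maxwell-Boltzmann velocity distribution.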
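The maximum-entropy property can be illustrated with the closed-form differential entropies (in nats) of three familiar distributions, each scaled to the same standard deviation σ. The comparison with the uniform and Laplace distributions is our own choice of examples; it illustrates the result rather than proving it.

```python
import math


def entropy_gaussian(sigma: float) -> float:
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)


def entropy_uniform(sigma: float) -> float:
    width = math.sqrt(12) * sigma  # a uniform interval of width w has variance w^2 / 12
    return math.log(width)


def entropy_laplace(sigma: float) -> float:
    b = sigma / math.sqrt(2)  # a Laplace distribution with scale b has variance 2 b^2
    return 1 + math.log(2 * b)


for name, h in [("Gaussian", entropy_gaussian), ("uniform", entropy_uniform), ("Laplace", entropy_laplace)]:
    print(f"{name:8s} {h(1.0):.4f}")
# Gaussian 1.4189
# uniform  1.2425
# Laplace  1.3466
```

All three entropies grow by the same ln σ when σ changes, so the Gaussian stays on top at every scale.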

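There are many operations on Gaussians that give interesting results, for example:

  1. The Fourier transform of a Gaussian is a Gaussian
  2. The sum of two independent Gaussian random variables is Gaussian
  3. The convolution of a Gaussian with another Gaussian is a Gaussian
  4. The product of two Gaussian functions (for example, two Gaussian densities multiplied pointwise) is, up to a constant factor, a Gaussian; the product of two Gaussian random variables, by contrast, is not Gaussian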
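Items 2 to 4 can be checked numerically with a short, self-contained sketch. The helper names are our own; the closed-form parameters are the standard ones: for independent Gaussians the means and variances add, and multiplying two Gaussian densities pointwise gives an unnormalised Gaussian with variance σ₁²σ₂²/(σ₁²+σ₂²) and a mean equal to the variance-weighted average of the two means.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def sum_params(mu1, s1, mu2, s2):
    """Sum of independent Gaussians (equivalently, convolution of their densities)."""
    return mu1 + mu2, math.sqrt(s1 ** 2 + s2 ** 2)


def product_params(mu1, s1, mu2, s2):
    """Mean and std of the Gaussian shape obtained by multiplying two Gaussian densities."""
    v1, v2 = s1 ** 2, s2 ** 2
    return (mu1 * v2 + mu2 * v1) / (v1 + v2), math.sqrt(v1 * v2 / (v1 + v2))


def convolve_at(x, mu1, s1, mu2, s2, lo=-20.0, hi=20.0, steps=40_000):
    """Midpoint-rule approximation of (f1 * f2)(x), the integral of f1(t) f2(x - t) dt."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        t = lo + (i + 0.5) * h
        total += normal_pdf(t, mu1, s1) * normal_pdf(x - t, mu2, s2)
    return total * h


m1, s1, m2, s2 = 1.0, 1.0, -2.0, 2.0
mu_sum, sd_sum = sum_params(m1, s1, m2, s2)
mu_prod, sd_prod = product_params(m1, s1, m2, s2)

for x in (-3.0, 0.0, 2.5):
    conv = convolve_at(x, m1, s1, m2, s2)
    # Product divided by a Gaussian with the predicted parameters: the same constant for every x
    ratio = normal_pdf(x, m1, s1) * normal_pdf(x, m2, s2) / normal_pdf(x, mu_prod, sd_prod)
    print(f"x={x:+.1f}  convolution={conv:.6f}  predicted={normal_pdf(x, mu_sum, sd_sum):.6f}  ratio={ratio:.6f}")
```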

In Fourier analysis, the Gaussian is an eigenfunction of the Fourier transform: with the convention $\hat{f}(\xi) = \int f(x)\, e^{-2\pi i x \xi}\, dx$, the function $e^{-\pi x^2}$ is mapped to $e^{-\pi \xi^2}$, so its frequency spectrum has exactly the same Gaussian shape. It is also well known that blood pressure measurements across adult populations approximately follow a Gaussian distribution.
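The eigenfunction property can be checked numerically. This sketch assumes the convention $\hat{f}(\xi) = \int f(x)\, e^{-2\pi i x \xi}\, dx$ stated above and approximates the integral with a midpoint rule; `fourier_transform_even` is a hypothetical helper of our own that only handles real, even functions, for which the sine part cancels.

```python
import math


def fourier_transform_even(f, xi: float, half_width: float = 8.0, steps: int = 16_000) -> float:
    """Approximate F(xi) = ∫ f(x) exp(-2πi x xi) dx for a real, even f.

    For even f the imaginary (sine) part integrates to zero, leaving ∫ f(x) cos(2π x xi) dx,
    which is evaluated here with the midpoint rule on [-half_width, half_width].
    """
    h = 2 * half_width / steps
    total = 0.0
    for i in range(steps):
        x = -half_width + (i + 0.5) * h
        total += f(x) * math.cos(2 * math.pi * x * xi)
    return total * h


def gaussian(x: float) -> float:
    return math.exp(-math.pi * x * x)


for xi in (0.0, 0.5, 1.0, 1.5):
    print(f"xi={xi:.1f}  transform={fourier_transform_even(gaussian, xi):.8f}  gaussian(xi)={gaussian(xi):.8f}")
```

Replacing `gaussian` with a narrower or wider bell, such as $e^{-\pi a x^2}$, yields $a^{-1/2} e^{-\pi \xi^2 / a}$: still a Gaussian, but with the reciprocal width.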


Abhijeet Katte

As a thorough data geek, most of Abhijeet's day is spent in building and writing about intelligent systems. He also has deep interests in philosophy, economics and literature.