What is a Probability Distribution? Determine its Type for Your Data

What is a probability distribution

Probability distribution is an important topic that every data scientist should know for analyzing data. A probability distribution describes all the possible outcomes of a random variable and how likely each one is. In this article, you will learn about the main types of probability distribution, which will help you determine the distribution of your dataset. There are two types of distribution.

  1. Discrete Distribution
  2. Continuous Distribution

In a discrete distribution, the sum of the probabilities of all the individual outcomes is equal to one. In a continuous distribution, there is a probability curve, and the area under that curve is equal to one.

Types of Discrete Distribution

A discrete distribution is described by a probability mass function. The following are the types of discrete distribution:

  1. Uniform Distribution
  2. Binomial Distribution
  3. Poisson Distribution

Uniform Distribution

It is a type of discrete distribution in which all the events have the same probability (uniform). For example, if you roll a fair die, the sample space is {1,2,3,4,5,6} and the probability of getting each number is 1/6, that is, about 0.167. The sample space here consists of discrete values. You will also notice that the gap between consecutive numbers is the same and that every number has the same probability. When you add all the probabilities, you get 1.

Figure: Uniform distribution for rolling a fair die
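
As a small illustration, the die example can be sketched in Python using only the standard library. The names `sample_space` and `pmf` are illustrative choices, not part of any library.

```python
from fractions import Fraction

# Sample space of a fair six-sided die
sample_space = [1, 2, 3, 4, 5, 6]

# Uniform distribution: every outcome gets probability 1 / (number of outcomes)
pmf = {face: Fraction(1, len(sample_space)) for face in sample_space}

# The probabilities of all outcomes add up to exactly 1
total = sum(pmf.values())

print(float(pmf[1]))  # about 0.1667
print(total)          # 1
```

Exact fractions are used here so that the sum comes out to exactly 1 rather than a rounded floating-point value.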

Binomial Distribution

In this distribution, each trial has two discrete outcomes that are mutually exclusive. Mutually exclusive means the two outcomes cannot occur at the same time: if one happens, the other does not. Below are examples of binomial outcomes.

  • Head or Tail
  • On or Off
  • Sick or healthy
  • Success or failure

Bernoulli Trial

It is a random experiment in which there are two outcomes. One is a success and the other is a failure.

When you repeat the same trial n times, that is, run a series of n trials, the number of successes follows a binomial distribution, provided that the probability of success (p) is constant and the trials do not depend on one another. Both conditions must be satisfied for the trials to be Bernoulli trials.
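
One way to get a feel for this is a rough simulation. The sketch below assumes example values of n = 10 and p = 0.5, and the helper `count_successes` is a hypothetical name introduced only for illustration.

```python
import random

def count_successes(n, p, rng):
    """Run n independent Bernoulli trials with success probability p
    and return how many of them succeeded."""
    return sum(1 for _ in range(n) if rng.random() < p)

rng = random.Random(42)  # fixed seed so the run is repeatable
n, p = 10, 0.5           # assumed example values

# Repeat the series of n trials many times and record each success count
counts = [count_successes(n, p, rng) for _ in range(10_000)]
average = sum(counts) / len(counts)

print(average)  # should be close to n * p = 5
```

Each entry in `counts` is one draw from a binomial distribution, so the average should settle near n * p as the number of repetitions grows.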

In a binomial distribution, you have to calculate the probability mass function. Suppose you have n trials and the probability of success in each trial is p. The probability mass function for observing x successes in n trials is given below:

$$P(X = x) = \binom{n}{x} p^{x} (1 - p)^{n - x}$$

where x is the number of successes in the n trials (x = 0, 1, ..., n).
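
A minimal sketch of the binomial probability mass function using only Python's standard library is shown here. The function name `binomial_pmf` and the example values (10 trials, p = 0.5, 3 successes) are assumptions made for illustration.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials:
    (n choose x) * p**x * (1 - p)**(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Assumed example: 10 fair coin flips, exactly 3 heads
prob = binomial_pmf(3, 10, 0.5)

# Summing the PMF over every possible x (0..n) should give 1
total = sum(binomial_pmf(x, 10, 0.5) for x in range(11))

print(prob)   # 0.1171875
print(total)
```

The first value should match what `binom.pmf(3, 10, 0.5)` from SciPy returns, up to floating-point rounding.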

How to calculate Binomial Distribution in Excel?

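You can easily calculate the binomial distribution using the following syntax.
=BINOM.DIST(x,n,p,FALSE)

Where,
x, the number of successes
n, the total number of trials
p, the probability of success in each trial
FALSE returns the probability of exactly x successes; TRUE returns the cumulative probability of at most x successes.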

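Calculate Binomial Distribution in Python

from scipy.stats import binom
x, n, p = 3, 10, 0.5  # example values: successes, trials, probability of success
binom.pmf(x, n, p)  # probability of exactly x successes in n trials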

Poisson Distribution

In the binomial distribution, we focus on the number of successes in a fixed number of trials. In the Poisson distribution, we focus on the number of successes per continuous unit, such as the number of successes per unit of time.

Calculation of Poisson Probability Mass Function

Before calculating the Poisson probability mass function, you have to calculate the mean or expected value (mu), which is assigned to lambda, the number of occurrences per interval. Remember that the interval here is the continuous unit.

The formula for the Poisson probability mass function is:

$$P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$$

The mean (expected value) is:

$$\mu = E[X] = \lambda$$

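In a probability distribution, you should also know the term cumulative distribution function (CDF), referred to here as the cumulative mass function (CMF) for a discrete distribution. It is the sum of the discrete probabilities up to and including a given value. For example, in a Poisson distribution the probability of fewer than 4 events is:

$$P(X < 4) = P(X \le 3) = \sum_{k=0}^{3} \frac{\lambda^{k} e^{-\lambda}}{k!}$$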
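
The "fewer than 4 events" case can be sketched from first principles with the standard library. The helper names `poisson_pmf` and `poisson_cdf` and the rate `lam = 2.0` are assumptions chosen for the example, not values from the article.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = lam**x * exp(-lam) / x!"""
    return lam**x * exp(-lam) / factorial(x)

def poisson_cdf(x, lam):
    """P(X <= x): sum of the PMF from 0 up to and including x."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))

lam = 2.0  # assumed mean number of events per interval

# "Fewer than 4 events" means 0, 1, 2 or 3 events, so sum up to x = 3
fewer_than_4 = poisson_cdf(3, lam)

print(round(fewer_than_4, 4))  # 0.8571
```

Note the off-by-one detail: "fewer than 4" stops at 3, whereas "at most 4" would include 4.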

Real World Example of Poisson Distribution

The number of orders in a particular time interval. Many e-commerce companies use it to model the number of orders received during an interval.

How to calculate Poisson distribution in Excel?

You can calculate the Poisson distribution in Excel using the following function.

= POISSON.DIST(x,lambda,FALSE)

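Here, x is the number of successes and lambda is the mean number of occurrences per interval.

The last argument controls whether the cumulative mass function is computed. If it is TRUE, the function returns the cumulative probability of at most x successes; if it is FALSE, it returns the probability of exactly x successes. The cumulative value is what you need for cases such as at most, at least, greater than, and non-zero, where the last three are obtained by subtracting a cumulative probability from 1.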
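
To make the "at least", "greater than" and "non-zero" cases concrete, here is a hedged sketch for an order-count scenario. The rate of 5 orders per hour and the helper `poisson_cdf` are assumptions made for illustration only.

```python
from math import exp, factorial

def poisson_cdf(x, lam):
    """P(X <= x) for a Poisson distribution with mean lam."""
    return sum(lam**k * exp(-lam) / factorial(k) for k in range(x + 1))

lam = 5.0  # assumed average number of orders per hour

at_most_3 = poisson_cdf(3, lam)            # P(X <= 3)
greater_than_3 = 1 - poisson_cdf(3, lam)   # P(X > 3)  = 1 - P(X <= 3)
at_least_3 = 1 - poisson_cdf(2, lam)       # P(X >= 3) = 1 - P(X <= 2)
non_zero = 1 - poisson_cdf(0, lam)         # P(X >= 1) = 1 - P(X = 0)

print(at_most_3, greater_than_3, at_least_3, non_zero)
```

The only subtlety is which cumulative value to subtract: "greater than 3" removes everything up to 3, while "at least 3" removes everything up to 2.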

Calculate Poisson Distribution in Python

from scipy.stats import poisson
x, lam = 3, 2.5  # example values: number of events and mean rate (lambda)
poisson.pmf(x, lam)  # P(X = x): exactly x events
poisson.cdf(x, lam)  # P(X <= x): cumulative probability

Continuous Distribution

A continuous distribution is described by a probability density function. Three common types of continuous probability distribution are:

  1. Normal Distribution
  2. Exponential Distribution
  3. Beta Distribution

Normal Distribution

In the normal distribution, the data points cluster around a central value, the mean, and the curve takes the shape of a bell (the bell curve).

Figure: Bell curve

Keep in mind that in discrete distributions the sum of all the probabilities (probability mass function) is equal to one. In the normal distribution (probability density function), the area under the bell curve is 1.

Using the normal distribution curve, we can only state probabilities over a range of outcomes, because the probability of any single exact value is zero.

The mean, mode and median are all equal in the normal distribution.

Standard Normal Distribution

A normal distribution is a standard normal distribution when the mean (mu) is 0 and sigma is 1. From the graph of the standard normal distribution, about 68.27% of the values lie between -sigma and +sigma. In the same way, about 95.45% of the values lie between -2sigma and +2sigma.

Figure: Standard normal distribution
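
The 68.27% and 95.45% figures can be checked with a short sketch. It uses `statistics.NormalDist` from the Python standard library (available from Python 3.8); the variable names are illustrative.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, sigma 1
std_normal = NormalDist(mu=0, sigma=1)

# Probability of falling within 1 sigma and within 2 sigma of the mean
within_1_sigma = std_normal.cdf(1) - std_normal.cdf(-1)
within_2_sigma = std_normal.cdf(2) - std_normal.cdf(-2)

print(round(within_1_sigma * 100, 2))  # 68.27
print(round(within_2_sigma * 100, 2))  # 95.45
```

Each probability is the area under the curve between the two bounds, computed as a difference of two cumulative values.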

There is also the general case in which the normal distribution is symmetric about a mean other than zero and has sigma not equal to one. The normal distribution is very useful for studying a population. If a population is approximately normally distributed, we can find its mean and standard deviation and use them to make inferences about the population.

In real-life examples, you will mostly model data with a general normal distribution. You can then easily convert the values to the standard normal distribution to calculate a percentile. The formula for the normal distribution is:

$$f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x - \mu)^{2}}{2\sigma^{2}}}$$
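
As a rough numerical check that the area under the bell curve is 1, the density can be written out and summed over a wide range. The function name `normal_pdf`, the integration range of 8 sigma on each side, and the step count are all assumptions chosen for this sketch.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Normal density: exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * sqrt(2 * pi))."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

mu, sigma = 0.0, 1.0  # assumed example parameters

# Midpoint Riemann sum over [mu - 8*sigma, mu + 8*sigma]
steps = 16_000
lower = mu - 8 * sigma
width = 16 * sigma / steps
area = sum(normal_pdf(lower + (i + 0.5) * width, mu, sigma) for i in range(steps)) * width

print(area)  # very close to 1
```

The density itself is not a probability; only areas under it are, which is why the sum of density times width is what approaches 1.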

You know that the standard normal distribution has a mean of 0 and a sigma of 1. Therefore, you have to find the z score to convert a normal distribution to the standard normal distribution.

The formula for the z score is:

$$z = \frac{x - \mu}{\sigma}$$
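
A worked example of the conversion may help. The test-score numbers below (mean 70, sigma 10, score 85) are made up for illustration, and `statistics.NormalDist` from the standard library stands in for a lookup table.

```python
from statistics import NormalDist

mu, sigma = 70.0, 10.0  # assumed population mean and standard deviation
x = 85.0                # assumed observed score

# Standardize: how many sigmas the score lies above the mean
z = (x - mu) / sigma

# Percentile from the standard normal distribution
percentile = NormalDist().cdf(z)

# The same percentile computed directly on the original scale
direct = NormalDist(mu, sigma).cdf(x)

print(z)                     # 1.5
print(round(percentile, 4))  # 0.9332
```

Both routes give the same answer, which is the point of standardizing: one table of the standard normal distribution serves every mean and sigma.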

Real Life Example of the Normal Distribution

  • Heights and weights of people
  • Blood pressure measurements
  • Test scores (percentiles)
  • Measurement errors

How to calculate Normal Distribution in Excel?

The following Excel functions can be used with the standard normal distribution.

If you have the z score, you can find the cumulative probability using the formula:

= NORMSDIST(B2)

If you have the cumulative probability p, you can find the z score using the formula:

= NORMSINV(B2)

Calculate Normal Distribution in Python

from scipy.stats import norm
z, p = 1.96, 0.975  # example values: a z score and a cumulative probability
# If you have the z score, you can find the cumulative probability P(Z <= z).
norm.cdf(z)
# If you have the probability p, you can find the corresponding z score.
norm.ppf(p)

Conclusion

Probability distribution is an important concept for data scientists, especially the normal distribution. You will see distributions used in many real-life examples, and the concepts are useful when sampling data from a population. We hope you have understood the topic.

If you would like to suggest an improvement to this article, please contact us. You can also subscribe to get updates on data science.

Thanks

Data Science Learner Team


Meet Sukesh ( Chief Editor ), a passionate and skilled Python programmer with a deep fascination for data science, NumPy, and Pandas. His journey in the world of coding began as a curious explorer and has evolved into a seasoned data enthusiast.
 