Measurement of Dispersion : With Formula and Excel Functions

Measurement of Dispersion ac

You can say the measurement of dispersion is an up-gradation of the measurement of central tendency. When you have two data sets then the mean can be the same for the two datasets. But both datasets are different. (Different Category). That’s why the measurement of dispersion comes to exist. There are three types of Measurement of Dispersion

  1. Range
  2. Interquartile Range
  3. Standard Deviation

In this method of measurement, you will know how the data is spread out through range variance and Standard Deviation. Let’s know each of them in detail.

Range

It is the difference between the largest and smallest values in the dataset. You can easily calculate. It is affected by outliers and also not consider all the data values in the dataset. It is mostly used to find the minimum and maximum values in the dataset. For example, consider the following values in a set.

{ 10,3,50,45,30,100,75 } 

Here minimum value is 3 and the maximum value is 100. Therefore range will be the difference between the maximum and minimum value. So here range is 100-3 that is 97.

Standard Deviation

Standard Deviation or SD is a commonly used measure of dispersion. It tells how the data values are spread out about the mean of the dataset. Before calculation standard deviation you have to first find the mean, variance of the values. The formulae for the SD is the square root of the sum of squared deviations from the mean divided by the number of observations.

How to calculate the SD?

Step 1: Find the sample mean.

Sample Mean formula
Sample Mean Formula

Step 2: Calculate  Sample Standard Deviation using the result form the Step1.

Standard Deviation Formula
Standard Deviation Formula

The above formulae are for the Sample Standard Deviation. If you want to calculate Population SD then You will use N ( Number of Elements ) in the denominator.

How to calculate Standard Deviation (SD ) in Excel?

In Excel, you can use The following formulae for finding the Standard Deviation using the following syntax.

= STDEV(number1, [number2] ,....)

The number2 is optional. For example when I have put the = STDEV(B2:B) then I get the result 27.81. It means our data values are spread from the mean by 27.81.

SD in excel
Standard Deviation

 

Where you can use Standard Deviation in Real Life?

  1. Stock Market -To measure the Risk of the market. More risk more Standard Deviation
  2. Financial Prediction
  3. Weather Forecasting
  4. Manufacturing Plants – For testing and quality control.
  5. Real Estate – Buying risk of the household.

Interquartile Range

Interquartile Range is the measurement of variability in the dataset. Here you divide the dataset into four parts using quartiles. The value that divides the set into halves is known as quartiles. Q1, Q2, and Q3.

Initially, Q2 will divide the datasets into two half datasets. Now you have two sub-datasets upper sub-dataset and the lower sub-dataset.

Q1 is the median for the upper half of the sub-dataset and Q3 is the median for the lower half of the sub-dataset.

The interquartile range is equal to Q3 -Q1. In this measurement of dispersion, for considering the outlier you have to determine the fence for it. The formulae for the fence is 1.5 times the IQ (Interquartile Range). In addition, If you will take any number outside the fence then it will be an outlier. The fence determines the value that can be maximum and all the values outside it will be an outlier.

IQR = Q3-Q1

Fence = 1.5 IQR

Identifying Outliers in machine learning is an important task. You can find outliers in your data using the IQR techniques. Why we remove outliers? Its because it affects the training process of the dataset that leads to weak model building and less accuracy to predict the inputs. You have to carefully handle the outliers to remove this problem.

 

How to calculate Interquartile Range (IQR) in Excel?

In excel you can use the following functions for finding the Quartile.

=QUARTILE(data, quartile number)
=QUARTILE(data,1), for Q1
=QUARTILE(data,2), for Q2
=QUARTILE(data,3), for Q3

After finding all the quartiles you can use the mathematical calculation for finding the fence and IQR.

IQR
IQR and FENCE

You can see the Q2 is the median of the whole data. The value of the IQR is 23 and Fence is 34.5. When you add the Fence value with Q3 then you will get the value 57.5. All the values outside 57.5 will be Outliers.

You can also draw Box plot like the above data.

Box Plot
Box Plot

Conclusion

Measurement of dispersion tells how each value of the datasets is spread. In fact, the advantage of using the measurement of dispersion is that it considers all the values of the dataset. Now you must have understood the Measurement of Dispersion and how to calculate and use it in real life. If you have any questions then contact us. In the meantime, you can subscribe and like our Data Science Learner Page.

 

Join our list

Subscribe to our mailing list and get interesting stuff and updates to your email inbox.

Thank you for signup. A Confirmation Email has been sent to your Email Address.

Something went wrong.

Meet Sukesh ( Chief Editor ), a passionate and skilled Python programmer with a deep fascination for data science, NumPy, and Pandas. His journey in the world of coding began as a curious explorer and has evolved into a seasoned data enthusiast.
 
Thank you For sharing.We appreciate your support. Don't Forget to LIKE and FOLLOW our SITE to keep UPDATED with Data Science Learner