Matplotlib Boxplot Example in Python

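A Matplotlib boxplot is a way of summarizing a set of data measured on an interval scale. It is not built from the mean and standard deviation. Instead it uses the median and the quartiles: the box spans the first to the third quartile, a line inside the box marks the median, and the whiskers extend to the most extreme data points within 1.5 times the interquartile range (IQR). Boxplots are very useful when preparing data for a machine learning model, because they let you spot outliers in the dataset that you may need to remove before training. In this tutorial you will learn how to plot a boxplot in Matplotlib.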
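To see why a boxplot uses the median and quartiles rather than the mean, here is a small sketch that computes the numbers a boxplot is drawn from with np.percentile. The array is made up purely for illustration: nine ordinary values and one extreme value.

```python
import numpy as np

# Illustrative data: nine ordinary values and one extreme value
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

# The quantities a boxplot is drawn from
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1  # interquartile range (height of the box)

# The mean is pulled up by the single extreme value; the median is not
mean = data.mean()

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")  # Q1=3.25, median=5.5, Q3=7.75, IQR=4.5
print(f"mean={mean}")                                    # mean=14.5
```

The median (5.5) stays with the bulk of the data while the mean (14.5) is dragged toward the extreme value, which is why the boxplot summary is robust to outliers.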

Examples of Matplotlib Boxplot

In this section you will first learn how to draw a boxplot for a simple dataset, and then how to draw one for a real dataset.

Example 1: Simple Boxplot in Matplotlib

Here we will first create a dataset of 200 values drawn from a Gaussian (normal) distribution, and then draw a boxplot of these data points. Execute the below lines of code.

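import matplotlib.pyplot as plt
import numpy as np

# dataset: 200 values from a normal distribution with mean 50 and standard deviation 30
np.random.seed(10)
data = np.random.normal(50, 30, 200)

# create a large figure (width 20, height 14 inches)
fig = plt.figure(figsize=(20, 14))

# draw the boxplot
plt.boxplot(data)

# show plot
plt.show()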

Explanation of the code

Here I first import the matplotlib and numpy Python packages. Then I create normally distributed data points using the np.random.normal() method. It takes three parameters: the first is the mean, the second is the standard deviation, and the last is the number of data points. Calling np.random.seed(10) beforehand makes the random numbers reproducible, so you get the same data on every run. To change the size of the plot, I pass figsize to plt.figure(). Lastly, to draw the plot, pass the data as an argument to the boxplot() method.
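The three positional arguments of np.random.normal() correspond to its keyword parameters loc (mean), scale (standard deviation) and size (number of samples). This sketch writes them out by name and checks the sample statistics, which will be close to, but not exactly, 50 and 30.

```python
import numpy as np

np.random.seed(10)  # same seed as Example 1, so the data is identical
# loc = mean, scale = standard deviation, size = number of data points
data = np.random.normal(loc=50, scale=30, size=200)

print(data.shape)             # (200,)
print(round(data.mean(), 2))  # sample mean, close to 50
print(round(data.std(), 2))   # sample standard deviation, close to 30
```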

When you run the code, you will get the following plot as output.

Output

Simple Boxplot using Matplotlib

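In the plot, the line inside the box (orange in Matplotlib's default style) is the median, which is close to 50 here because the data was drawn from a distribution with mean 50. The box spans the first to the third quartile. The upper and lower horizontal lines (the whisker caps) mark the most extreme data points that lie within 1.5 times the IQR of the box; any points beyond them are drawn individually as outliers.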
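If you want the exact numbers behind the plot rather than reading them off the figure, plt.boxplot() returns a dictionary of the drawn lines, with keys such as "medians", "boxes", "whiskers", "caps" and "fliers". The sketch below rebuilds the Example 1 data and compares the drawn median and whisker ends with values computed directly from the 1.5 times IQR rule. Selecting the non-interactive Agg backend is an assumption added so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no window is opened
import matplotlib.pyplot as plt
import numpy as np

# Same data as Example 1
np.random.seed(10)
data = np.random.normal(50, 30, 200)

result = plt.boxplot(data)

# Values read back from the drawn lines
drawn_median = result["medians"][0].get_ydata()[0]
lower_whisker_end = result["whiskers"][0].get_ydata()[1]  # whisker runs from Q1 down to this point
upper_whisker_end = result["whiskers"][1].get_ydata()[1]  # whisker runs from Q3 up to this point
outliers = result["fliers"][0].get_ydata()

# The same values computed directly with the 1.5 * IQR rule
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
inside = data[(data >= q1 - 1.5 * iqr) & (data <= q3 + 1.5 * iqr)]

print(drawn_median, np.median(data))
print(lower_whisker_end, inside.min())
print(upper_whisker_end, inside.max())
print("points drawn as outliers:", len(outliers))
plt.close()
```

Each printed pair should match, confirming that the whiskers stop at the last data point inside the 1.5 times IQR limits rather than at the minimum and maximum of the data.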

Example 2: Boxplot in Matplotlib for Iris Dataset

In this example I will use the real-life Iris dataset. The dataset contains 3 classes of 50 instances each, where each class refers to a type of iris plant. Here I will plot the boxplot for the first four columns, which hold the numeric measurements. Execute the below lines of code.

import matplotlib.pyplot as plt
import pandas as pd

# read the Iris data file (it has no header row)
iris_data = pd.read_csv("iris.data", header=None, sep=",")
iris_data.columns = ["sepal length", "sepal width", "petal length", "petal width", "species"]
data = iris_data.iloc[:, 0:4].values  # values of the first 4 (numeric) columns

# one box is drawn per column of the array
fig = plt.figure(figsize=(20, 14))
plt.boxplot(data)

# show plot
plt.show()

Explanation of the code

Here I first read the dataset using the pandas.read_csv() method. Because the file has no header row, I then name the columns by assigning a list to iris_data.columns. Next, iris_data.iloc[:, 0:4].values takes all the rows of the first four columns as a NumPy array. Finally, boxplot() draws one box for each of these columns. You will get the following figure as an output.
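Because the Iris data has three classes, it is often more informative to compare one measurement across species than to pool all rows into one box. Below is a minimal sketch, assuming a DataFrame laid out like iris_data above (numeric columns plus a "species" column). The helper name boxplot_by_species is hypothetical, and the small random DataFrame in the usage part is synthetic stand-in data, not real Iris measurements.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line to show a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def boxplot_by_species(df, feature):
    """Hypothetical helper: draw one box of `feature` per value of df["species"]."""
    groups = df.groupby("species")[feature]
    labels = [name for name, _ in groups]
    values = [group.values for _, group in groups]

    fig, ax = plt.subplots(figsize=(8, 5))
    result = ax.boxplot(values)
    ax.set_xticks(range(1, len(labels) + 1))
    ax.set_xticklabels(labels)
    ax.set_ylabel(feature)
    return fig, result


# Synthetic stand-in data (NOT the real Iris values), same column layout as iris_data
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "petal length": rng.normal(4.0, 1.0, 30),
    "species": ["class_a"] * 10 + ["class_b"] * 10 + ["class_c"] * 10,
})

fig, result = boxplot_by_species(demo, "petal length")
print(len(result["boxes"]))  # 3, one box per species
```

With the real file, you would call boxplot_by_species(iris_data, "petal length") after the loading code from Example 2, then plt.show() with the Agg line removed.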

Boxplot using Matplotlib on Iris Dataset

Conclusion

A boxplot is very useful for finding outliers in your dataset, and outlier values can hurt the predictive performance of a machine learning model. These are examples of how to implement a boxplot in Matplotlib. I hope you have liked this tutorial. If you have any doubts, you can contact us for more help.
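A boxplot only shows outliers; removing them is a separate step. Here is a minimal sketch that drops every row with a value outside the same 1.5 times IQR whisker limits that Matplotlib uses by default. The helper name remove_iqr_outliers is hypothetical, and whether dropping outliers is appropriate for your model is a judgment call.

```python
import pandas as pd


def remove_iqr_outliers(df, columns, whis=1.5):
    """Hypothetical helper: keep rows whose values in `columns` all lie
    within [Q1 - whis*IQR, Q3 + whis*IQR], the default boxplot whisker limits."""
    q1 = df[columns].quantile(0.25)
    q3 = df[columns].quantile(0.75)
    iqr = q3 - q1
    lower = q1 - whis * iqr
    upper = q3 + whis * iqr
    # Compare each column with its own limits, then require every column to pass
    inside = ((df[columns] >= lower) & (df[columns] <= upper)).all(axis=1)
    return df[inside]


demo = pd.DataFrame({"value": [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]})
cleaned = remove_iqr_outliers(demo, ["value"])
print(len(demo), "->", len(cleaned))  # 10 -> 9 (the value 100 is dropped)
```

For the Iris data you could pass the four measurement columns, for example remove_iqr_outliers(iris_data, ["sepal length", "sepal width", "petal length", "petal width"]).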

Source:

Iris Dataset

numpy.random.normal

Boxplot Documentation

Meet Sukesh ( Chief Editor ), a passionate and skilled Python programmer with a deep fascination for data science, NumPy, and Pandas. His journey in the world of coding began as a curious explorer and has evolved into a seasoned data enthusiast.
 
Thank you For sharing.We appreciate your support. Don't Forget to LIKE and FOLLOW our SITE to keep UPDATED with Data Science Learner