A Matplotlib boxplot summarizes a set of data measured on an interval scale. It is drawn from the median and the quartiles of the data, not from the mean and standard deviation. Boxplots are very useful when preparing data for a machine learning model, because they let you spot outliers in the dataset that you may want to handle before training. In this tutorial you will learn how to plot a boxplot in Matplotlib.
Examples of Matplotlib Boxplot
In this section you will first learn how to draw a boxplot for a simple dataset, and then how to draw one for a real dataset.
Example 1: Simple Boxplot in Matplotlib
Here we will first create a dataset of 200 values drawn from a Gaussian (normal) distribution. After that I will draw a boxplot of the data points. Execute the lines of code below.
import matplotlib.pyplot as plt
import numpy as np

# dataset
np.random.seed(10)
data = np.random.normal(50, 30, 200)

fig = plt.figure(figsize=(20, 14))
plt.boxplot(data)

# show plot
plt.show()
Explanation of the code
Here I first import the matplotlib and numpy Python packages. After that I create normally distributed data points using the np.random.normal() method. The method takes three parameters: the first is the mean, the second is the standard deviation, and the last is the number of data points. To change the size of the plot I pass figsize to plt.figure(). Lastly, to draw the plot I pass the data as an argument to the boxplot() method.
When you run the code you will get the following plot as output.
In the plot, the orange line inside the box is the median, which here is close to the distribution mean of 50. The box spans the first to the third quartile. The upper and lower horizontal lines mark the highest and lowest values that lie within 1.5 times the interquartile range of the box, and any points beyond them are drawn individually as outliers.
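The values that a boxplot draws can also be computed directly with NumPy, which helps when reading the plot. The following is a minimal sketch that assumes Matplotlib's default whisker setting (whis=1.5) and reuses the dataset from Example 1. The variable names are only illustrative.

```python
import numpy as np

# Same synthetic dataset as in Example 1
np.random.seed(10)
data = np.random.normal(50, 30, 200)

# The box runs from q1 to q3; the line inside the box is the median
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1  # interquartile range

# Limits for the whiskers, assuming the default whis=1.5
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Each whisker ends at the most extreme data point inside its limit
whisker_low = data[data >= lower_limit].min()
whisker_high = data[data <= upper_limit].max()

# Points outside the limits are the ones drawn as individual markers
outliers = data[(data < lower_limit) | (data > upper_limit)]

print("median:", median)
print("box:", q1, "to", q3)
print("whiskers:", whisker_low, "to", whisker_high)
print("number of outliers:", outliers.size)
```

Note that the median printed here is the sample median, so it will be near 50 but not exactly 50.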
Example 2: Boxplot in Matplotlib for Iris Dataset
In this example I will use the real-life Iris dataset. The dataset contains 3 classes of 50 instances each, where each class refers to a type of iris plant. Here I will draw a boxplot for each of the first four columns. Execute the lines of code below.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

iris_data = pd.read_csv("iris.data", header=None, sep=",")
iris_data.columns = ["sepal length", "sepal width", "petal length", "petal width", "species"]

# read the values of the first 4 columns
data = iris_data.iloc[:, 0:4].values

# show plot
fig = plt.figure(figsize=(20, 14))
plt.boxplot(data)
plt.show()
Explanation of the code
Here I first read the dataset using the pandas.read_csv() method. After reading it, I set the column names through iris_data.columns. Then I take the values of all the rows for the first four columns. Finally, I draw the boxplot for the selected columns, which produces one box per column. You will get the following figure as output.
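Because a boxplot marks points beyond the whiskers as outliers, the same rule can be applied in code to find those values in a column. The sketch below is one possible way to do it, not the only one. The helper name iqr_outlier_mask and the small sample array are hypothetical and used only for illustration; whis=1.5 mirrors Matplotlib's default whisker length.

```python
import numpy as np

def iqr_outlier_mask(values, whis=1.5):
    """Return a boolean array that is True where a value lies beyond
    whis * IQR from the first or third quartile (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - whis * iqr) | (values > q3 + whis * iqr)

# Illustrative data: five ordinary values and one extreme value
sample = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 100.0])

mask = iqr_outlier_mask(sample)
cleaned = sample[~mask]  # keep only the values that are not flagged

print("flagged:", sample[mask])
print("kept:", cleaned)
```

For the Iris data, the same function could be called on each of the four numeric columns in turn. Whether flagged rows should actually be dropped depends on the dataset and the model.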
A boxplot is very useful for identifying outliers in your dataset. Outliers can affect the predictive performance of a machine learning model. These are examples of how to implement a boxplot. I hope you have liked this tutorial. If you have any doubts, you can contact us for more help.