How to Calculate Covariance of a Dataframe in Python: Various Examples

Pandas is a widely used Python package for creating and manipulating data. It has many built-in functions that let you quickly compute results on your data, and one of them calculates the covariance of a dataframe. In this tutorial, you will learn how to calculate the covariance of a dataframe using various examples.

What is the covariance?

Covariance measures how two variables vary together. If the covariance is positive, the two variables tend to move in the same direction: when one is above its mean, the other tends to be above its mean too. If the covariance is negative, they tend to move in opposite directions.
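For reference, the cov() method in pandas computes the sample covariance, which by default normalizes by n - 1. In standard notation (the symbols here are introduced only for illustration, with x̄ and ȳ denoting the sample means of the two variables), it can be written as:

```latex
\operatorname{cov}(X, Y) = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
```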

[Image: Covariance between two variables. Source: Wikipedia]
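The sign of the covariance can be checked on a small made-up dataset. The sketch below is only an illustration: the Series names x, y_up and y_down and their values are hypothetical and are not part of the examples that follow.

```python
import pandas as pd

# Hypothetical toy data, chosen only to illustrate the sign of the covariance.
x = pd.Series([1, 2, 3, 4])
y_up = pd.Series([2, 4, 6, 8])     # rises as x rises
y_down = pd.Series([8, 6, 4, 2])   # falls as x rises

cov_up = x.cov(y_up)       # positive: the two Series move together
cov_down = x.cov(y_down)   # negative: the two Series move in opposite directions

print(cov_up, cov_down)
```

Note that the size of the covariance depends on the units of the data, so only its sign is easy to interpret directly.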

Examples to Calculate covariance of dataframe in Python

This section walks through several examples of calculating the covariance of a pandas dataframe in Python. All of the code was run in a Jupyter Notebook, so you may find it easiest to follow along in a notebook as well.

Example 1: Find covariance for entire dataframe

Suppose you want to calculate the covariance for the entire dataframe. You can do so using pandas.DataFrame.cov(). Calling cov() on the dataframe returns a matrix with the pairwise covariance of all its columns.

Execute the below lines of code.

import pandas as pd
data = {"col1":[1,5,4],"col2":[3,7,1],"col3":[6,8,1],"col4":[10,7,2]}
df = pd.DataFrame(data)
print(df.cov())

Output

[Image: Finding covariance for the entire dataframe]
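To read the resulting matrix, it helps to know that it is symmetric and that its diagonal holds the variance of each column. The following sketch reuses the dataframe from Example 1 and, as an optional cross-check that is not part of the original example, compares the result with NumPy's np.cov(). The variable names cov_matrix, variances and numpy_cov are chosen here for illustration.

```python
import numpy as np
import pandas as pd

data = {"col1": [1, 5, 4], "col2": [3, 7, 1], "col3": [6, 8, 1], "col4": [10, 7, 2]}
df = pd.DataFrame(data)

cov_matrix = df.cov()

# The diagonal entries are the sample variances of the columns.
variances = df.var()

# NumPy treats rows as variables by default, so rowvar=False is needed here.
numpy_cov = np.cov(df.to_numpy(), rowvar=False)

# A single entry can be looked up by row and column label.
print(cov_matrix.loc["col1", "col2"])

# True if pandas and NumPy agree on every entry.
print(np.allclose(cov_matrix.to_numpy(), numpy_cov))
```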

Example 2: Covariance for the entire dataframe with NaN value

If the dataframe contains missing values (NaN or None), you can still calculate the covariance. In this case, cov() excludes the missing values: the covariance of each pair of columns is computed from only the rows where both columns have a value. A pair with fewer than two such rows gives NaN.

Run the below lines of code.

import pandas as pd
import numpy as np
data = {"col1":[1,5,np.nan],"col2":[3,np.nan,1],"col3":[6,8,1],"col4":[10,7,np.nan]}
df = pd.DataFrame(data)
print(df.cov())

Output

[Image: Finding covariance for the entire dataframe with NaN values]
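The pairwise handling of NaN values can be seen by inspecting individual entries. The sketch below reuses the dataframe from Example 2 and also tries the min_periods parameter of cov(), which sets the minimum number of complete pairs required for an entry. The variable names pairwise and strict are chosen here for illustration.

```python
import numpy as np
import pandas as pd

data = {"col1": [1, 5, np.nan], "col2": [3, np.nan, 1], "col3": [6, 8, 1], "col4": [10, 7, np.nan]}
df = pd.DataFrame(data)

# Default: each pair of columns uses the rows where both values are present.
pairwise = df.cov()

# Require at least 3 complete pairs per entry; other entries become NaN.
strict = df.cov(min_periods=3)

# col1 and col3 share two complete rows, so a covariance can be computed.
print(pairwise.loc["col1", "col3"])

# col1 and col2 share only one complete row, so the result is NaN.
print(pairwise.loc["col1", "col2"])

print(strict)
```

With min_periods=3, only col3 has enough complete values in this dataframe, so every other entry is NaN.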

Example 3: Covariance of dataframe with Series

Suppose you have a dataframe and a Series, and you want to calculate the covariance between the Series and each column of the dataframe. You can do so using the apply() method.

Execute the below lines of code.

import pandas as pd
import numpy as np
data = {"col1":[1,5,np.nan],"col2":[3,np.nan,1],"col3":[6,8,1],"col4":[10,7,np.nan]}
df = pd.DataFrame(data)
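# Random Series with the same number of rows as the dataframe
s = pd.Series(np.random.rand(3))
# Covariance between the Series and each column of the dataframe
print(df.apply(lambda column: s.cov(column)))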

Output

[Image: Finding covariance for the entire dataframe using Series]

Here I have used the np.random.rand() method to create a Series of three rows, the same number of rows as the dataframe, so the output will differ on each run. The apply() method passes each column of the dataframe to the lambda function, which calculates the covariance between the Series and that column.
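Because the random Series changes on every run, a fixed Series makes the behaviour easier to follow. The sketch below is a variation on Example 3, not the original code: the Series values are hypothetical, and the names via_apply and via_loop are chosen for illustration. It also shows that apply() gives the same result as looping over the columns explicitly.

```python
import numpy as np
import pandas as pd

data = {"col1": [1, 5, np.nan], "col2": [3, np.nan, 1], "col3": [6, 8, 1], "col4": [10, 7, np.nan]}
df = pd.DataFrame(data)

# Fixed, hypothetical values instead of np.random.rand(3), so the result is reproducible.
s = pd.Series([2.0, 4.0, 9.0])

# apply() passes each column of the dataframe to the lambda as a Series.
via_apply = df.apply(lambda column: s.cov(column))

# The same calculation written as an explicit loop over the column names.
via_loop = pd.Series({name: s.cov(df[name]) for name in df.columns})

print(via_apply)
```

Series.cov() matches values by index and skips rows where either value is missing, so the columns containing NaN are still handled.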

Conclusion

Covariance has many practical applications. With it, you can find relationships and patterns between the variables in a dataset. These are the methods to calculate the covariance of a dataframe in Python.

I hope you have liked this tutorial. If you have any queries then you can contact us for more help.