How to Compare two CSV files in Python using Pandas ( Steps )

How to Compare two CSV files in Python using Pandas ( Steps )

Pandas is the best Python library for creating and manipulating dataframe. You can read CSV files, manipulate them and also export the final CSV file after manipulation. Suppose you have two CSV files and want to compare both of them. How you will do? In this entire tutorial, you will learn how to how to compare two CSV files in python using pandas using various methods.

What are CSV Files ?

The full for mos CSV file is Comma-Separated Values. It is a plain text that allows you to store tabular data. Todays CSV files are widely used file format that allows many users to transfer data among various software applications  like Microsoft Excel or Google Sheets. In CSV file each single line represent row of data and it is separated through comma. Thats why it named Comma Separated Value.

Steps to Compare Two CSV Files in Python

In this section, you will know all the steps required for comparing two CSV files in Python using pandas. Just follow all the steps for a better understanding.

Step 1: Create a Sample CSVs file

The first step is to create sample CSV files for the method. Firstly I will create two sample dataframes and then export each dataframe to a CSV file. You will use the pd.DataFrame() function to create a dataframe. These data frames will use to read the CSV file and compare them.

Execute the below lines of code to create two sample CSVs files.

import pandas as pd
data1 = {"country":["India","USA","UK","Germany"],"dial_code":[91,1,44,49]}
df1 = pd.DataFrame(data1)
df1.to_csv("data1.csv",index=None)
data2 = {"country":["India","USA","UK","Germany","Australia","China"],"dial_code":[91,1,44,49,61,86]}
df2 = pd.DataFrame(data2)
df2.to_csv("data2.csv",index=None)
print(df1,"\n")
print(df2)

Output

Sample dataframe creation for comparing between them
Sample dataframe creation for comparing between them

Step 2: Read  the CSV files

The second step is to read the created CSV files. You can read the CSV file using the pandas read_csv() method. Just pass the filename of the CSV file. It will convert the CSV file data to dataframe for manipulation.

Run the below lines of code to read your CSV files.

import pandas as pd
df1 = pd.read_csv("data1.csv")
df2 = pd.read_csv("data2.csv")
print(df1,"\n")
print(df2,"\n")

Output

  country  dial_code
0    India         91
1      USA          1
2       UK         44
3  Germany         49 

     country  dial_code
0      India         91
1        USA          1
2         UK         44
3    Germany         49
4  Australia         61
5      China         86 

 

Step 3: Implement the method to compare two CSV files in Python using pandas

Now the CSV files have been read. Let’s compare both files. You will learn the various method to compare to CSV files in Python.

Method 1: Using the isin() method

The pandas package has a function isin() that allows you to check whether there are records in both CSV files or not. If it finds then returns true else returns false. After that, we can get the values using the df[“boolean_result”].

Execute the below lines of code to compare the two CSV files.

# method 1 isin() metthod
import pandas as pd
df1 = pd.read_csv("data1.csv")
df2 = pd.read_csv("data2.csv")
c_result = df1[df1.apply(tuple,1).isin(df2.apply(tuple,1))]
print(c_result)

Output

Comparing two csv files using the isin() method
Comparing two csv files using the isin() method

You can see I am using the two CSV files with the names data1 and data2. Here I am also using the apply()  method for comparing each row record with each CSV file.

Method 2: Compare CSV files using the merge() method

Pandas also have a function merge() that is useful in comparing the two CSV files. It performs an inner join, outer join or both join on columns. You have to just pass the dataframes you want to compare as a list inside the merge() method. The function will compare and returns the dataframe.

Run the below lines of code to compare the CSV files.

# method 2 merge() method
import pandas as pd
df1 = pd.read_csv("data1.csv")
df2 = pd.read_csv("data2.csv")
c_result_m = pd.merge(df1,df2)
print(c_result_m)

Output

Comparing two csv files using the merge() method
Comparing two csv files using the merge() method

Method 3: Using iteration

You can also compare two CSV files using the for loop or iteration. Here you will read both CSV files and compare each row of one CSV file to another CSV file. This method is best if the size of the CSV file is small. It will return the lines when you compare and use the compare() function. If there are lines that start with “-” dash or ” + ” plus then CSV files are identical and if it is not then it’s not identical.

Method 4: Using the Using csv-diff

Another method to compare the CSV file is the external module and it is csv-diff. Here after reading both CSV files, you will pass them to the function csv_diff.diff() function. It is best suitable if the file size of the CSV file is larget.

Method 5: Using the difflib module

You can also use the difflib module to compare the two CSV Files. Here you first open both the CSV files using the filestram open() and read its content. After that you will compare the CSV file using the difflib.Differ() function.

Conclusion

Pandas is the best Python package for manipulating large datasets. If you have CSV files then you can compare them using the above methods.

I hope you have liked this tutorial. If you have any suggestions or want to include some methods in this tutorial then you can contact us for more information.

Join our list

Subscribe to our mailing list and get interesting stuff and updates to your email inbox.

Thank you for signup. A Confirmation Email has been sent to your Email Address.

Something went wrong.

Meet Sukesh ( Chief Editor ), a passionate and skilled Python programmer with a deep fascination for data science, NumPy, and Pandas. His journey in the world of coding began as a curious explorer and has evolved into a seasoned data enthusiast.
 
Thank you For sharing.We appreciate your support. Don't Forget to LIKE and FOLLOW our SITE to keep UPDATED with Data Science Learner