Pandas is the best Python library for creating and manipulating dataframe. You can read CSV files, manipulate them and also export the final CSV file after manipulation. Suppose you have two CSV files and want to compare both of them. How you will do? In this entire tutorial, you will learn how to how to compare two CSV files in python using pandas using various methods.
What are CSV Files ?
The full for mos CSV file is Comma-Separated Values. It is a plain text that allows you to store tabular data. Todays CSV files are widely used file format that allows many users to transfer data among various software applications like Microsoft Excel or Google Sheets. In CSV file each single line represent row of data and it is separated through comma. Thats why it named Comma Separated Value.
Steps to Compare Two CSV Files in Python
In this section, you will know all the steps required for comparing two CSV files in Python using pandas. Just follow all the steps for a better understanding.
Step 1: Create a Sample CSVs file
The first step is to create sample CSV files for the method. Firstly I will create two sample dataframes and then export each dataframe to a CSV file. You will use the pd.DataFrame() function to create a dataframe. These data frames will use to read the CSV file and compare them.
Execute the below lines of code to create two sample CSVs files.
import pandas as pd
data1 = {"country":["India","USA","UK","Germany"],"dial_code":[91,1,44,49]}
df1 = pd.DataFrame(data1)
df1.to_csv("data1.csv",index=None)
data2 = {"country":["India","USA","UK","Germany","Australia","China"],"dial_code":[91,1,44,49,61,86]}
df2 = pd.DataFrame(data2)
df2.to_csv("data2.csv",index=None)
print(df1,"\n")
print(df2)
Output

Step 2: Read the CSV files
The second step is to read the created CSV files. You can read the CSV file using the pandas read_csv() method. Just pass the filename of the CSV file. It will convert the CSV file data to dataframe for manipulation.
Run the below lines of code to read your CSV files.
import pandas as pd
df1 = pd.read_csv("data1.csv")
df2 = pd.read_csv("data2.csv")
print(df1,"\n")
print(df2,"\n")
Output
country dial_code
0 India 91
1 USA 1
2 UK 44
3 Germany 49
country dial_code
0 India 91
1 USA 1
2 UK 44
3 Germany 49
4 Australia 61
5 China 86
Step 3: Implement the method to compare two CSV files in Python using pandas
Now the CSV files have been read. Let’s compare both files. You will learn the various method to compare to CSV files in Python.
Method 1: Using the isin() method
The pandas package has a function isin() that allows you to check whether there are records in both CSV files or not. If it finds then returns true else returns false. After that, we can get the values using the df[“boolean_result”].
Execute the below lines of code to compare the two CSV files.
# method 1 isin() metthod
import pandas as pd
df1 = pd.read_csv("data1.csv")
df2 = pd.read_csv("data2.csv")
c_result = df1[df1.apply(tuple,1).isin(df2.apply(tuple,1))]
print(c_result)
Output

You can see I am using the two CSV files with the names data1 and data2. Here I am also using the apply() method for comparing each row record with each CSV file.
Method 2: Compare CSV files using the merge() method
Pandas also have a function merge() that is useful in comparing the two CSV files. It performs an inner join, outer join or both join on columns. You have to just pass the dataframes you want to compare as a list inside the merge() method. The function will compare and returns the dataframe.
Run the below lines of code to compare the CSV files.
# method 2 merge() method
import pandas as pd
df1 = pd.read_csv("data1.csv")
df2 = pd.read_csv("data2.csv")
c_result_m = pd.merge(df1,df2)
print(c_result_m)
Output

Method 3: Using iteration
You can also compare two CSV files using the for loop or iteration. Here you will read both CSV files and compare each row of one CSV file to another CSV file. This method is best if the size of the CSV file is small.
Method 4: Using the Using csv-diff
Another method to compare the CSV file is the external module and it is csv-diff. Here after reading both CSV files, you will pass them to the function csv_diff.diff() function. It is best suitable if the file size of the CSV file is larget.
Conclusion
Pandas is the best Python package for manipulating large datasets. If you have CSV files then you can compare them using the above methods.
I hope you have liked this tutorial. If you have any suggestions or want to include some methods in this tutorial then you can contact us for more information.
Join our list
Subscribe to our mailing list and get interesting stuff and updates to your email inbox.