Somethings We have the dataset that is provided not in single CSVs files. These are in separate excel sheets. And you already know that Its better that We should do all the computational or preprocessing tasks on a Single Dataset that more than one datasets. It reduces our time for doing all the preprocessing tasks. If you want to do so then this entire post is for you. In this tutorial, you will Know to Join or Merge Two CSV files using the Popular Python Pandas Library.
Steps By Step to Merge Two CSV Files
Step 1: Import the Necessary Libraries
import pandas as pd
Here all things are done using pandas python library. So I am importing pandas only.
Step 2: Load the Dataset
I have created two CSV datasets on Stocks Data one is a set of stocks and the other is the turnover of the stocks. Read it using the Pandas read_csv() method. I have included all the datasets in the Conclusion Section.
csv1 = pd.read_csv("data/TurnoverList.csv") csv1.head()
csv2 = pd.read_csv("data/EquityList.csv") csv2.head()
Step 3: Merge the Sheets
Now to merge the two CSV files you have to use the dataframe.merge() method and define the column, you want to do merging. If the data is not available for the specific columns in the other sheets then the corresponding rows will be deleted. You can verify using the shape() method. Use the following code.
merged_data = csv1.merge(csv2,on=["Security Code"]) merged_data.head()
Other Things you can Do
Now there is a case when you want to append the rows only of one sheet to another sheet and vice-versa. To this, you have to use concate() method. Suppose I have two sheets of the same dataset and I want to work on a single sheet. Then I have to first add all the rows of one sheet to another. After that I can do anything from that dataset. Below is the code for appending the rows in a Dataframe.
data1 = pd.read_csv("data/sheet1.csv") data1.head()
data2 = pd.read_csv("data/sheet2.csv") data2.head()
concate_data = pd.concat([data1,data2]) concate_data.head()
Most of the Data Scientist do data analysis on the single sheets. When you search online for any Dataset then you will mostly see the dataset in a single sheet. You should also do this as doing analysis on a single sheet increase efficiency and reduce computational task.
I hope you have understood how to Join Two CSV Files in Python Using Pandas. If you have any query please contact us for more information. Below is the dataset for all the examples taken here.
1. You are getting the error ” Columns not found in either dataset: …”
You may get this error while joining two CSV files. You are getting this error as one of two CSV files does not have a columns name on which you are merging. To solve it you have to make sure the columns exist in CSV files before joining.
2. Getting MergeError: No common columns to perform merge on”
If you are getting this error then interpreter is telling you that all the CSV files you want to join do not have any common columns. To solve this issue you have to do merging on different columns or you have to add a common column on the CSV files you want to perform the merge.
Join our list
Subscribe to our mailing list and get interesting stuff and updates to your email inbox.