Pandas is a popular Python package for data manipulation. You can create a dataframe and work with it using the built-in pandas functions. Suppose you have a column that contains special characters and you want to remove them. How can you do that? In this tutorial, you will learn how to remove special characters from rows in pandas, step by step.
Sample Dataframe
Before going through the steps, let’s create a dummy dataframe that will be used in our example. You can also use your own dataset or read a CSV file.
Below are the lines of code to create the sample dataframe.
import pandas as pd
data = {
'Name': ['John', 'Emma', 'Michael', 'Sophia'],
'Age': [25, 28, 32, 29],
'City': ['New York/', 'London/', 'Paris/', 'Tokyo/'],
'Salary': [50000, 60000, 70000, 55000]
}
# Create the DataFrame
df = pd.DataFrame(data)
print(df)
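Output
      Name  Age       City  Salary
0     John   25  New York/   50000
1     Emma   28    London/   60000
2  Michael   32     Paris/   70000
3   Sophia   29     Tokyo/   55000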
Steps to remove special characters from rows in Pandas
Let’s go through the steps you will use to remove special characters from rows in pandas. Follow them in order.
Step 1: Import all the required libraries
The first step is to import the necessary libraries. This example uses only the pandas library, so a single import statement is enough.
import pandas as pd
Step 2: Identify the column that contains special characters
The next step is to find the columns that contain special characters. In our example, every value in the City column ends with the “/” special character, which has to be removed.
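On a larger dataset it may not be practical to spot such columns by eye. The following is a minimal sketch of one way to find them programmatically. It assumes that a "special character" means anything other than a letter, a digit, or a space; the names pattern and columns_with_special are illustrative and not part of the tutorial.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Emma', 'Michael', 'Sophia'],
    'Age': [25, 28, 32, 29],
    'City': ['New York/', 'London/', 'Paris/', 'Tokyo/'],
    'Salary': [50000, 60000, 70000, 55000],
})

# Assumption: anything that is not a letter, digit, or space counts as special.
pattern = r'[^a-zA-Z0-9 ]'

# Only text columns are checked, because numeric columns have no .str accessor.
columns_with_special = [
    col for col in df.columns
    if pd.api.types.is_string_dtype(df[col])
    and df[col].str.contains(pattern, regex=True).any()
]

print(columns_with_special)  # ['City']
```

Adjust the character class if your data legitimately contains other characters, such as hyphens or accented letters.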
Step 3: Remove special characters from rows in Pandas
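To remove the special characters from the column values, use the str.replace() function on the City column. Here it takes three arguments: a regular expression that matches the characters to remove, an empty string as the replacement, and regex=True so that the first argument is treated as a pattern. The pattern [^a-zA-Z0-9 ] matches every character that is not a letter, a digit, or a space. The space inside the brackets matters: without it, the space in “New York” would be removed as well.
Use the line of code below to remove the characters.
df['City'] = df['City'].str.replace('[^a-zA-Z0-9 ]', '', regex=True)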
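When the unwanted character is known in advance, a literal replacement can be a simpler alternative to a character-class pattern. The sketch below shows two possible variants on a standalone Series with the same City values; the variable names are illustrative.

```python
import pandas as pd

city = pd.Series(['New York/', 'London/', 'Paris/', 'Tokyo/'])

# regex=False treats '/' as plain text, so no pattern escaping is needed.
literal = city.str.replace('/', '', regex=False)

# str.rstrip removes the character only from the end of each value.
trailing = city.str.rstrip('/')

print(literal.tolist())
print(trailing.tolist())
```

Both variants leave letters, digits, and spaces untouched. They differ only when “/” also appears in the middle of a value: str.replace removes every occurrence, while str.rstrip removes only the trailing ones.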
Step 4: Display the dataframe
The last step is to display the dataframe and verify the City column. Use the line of code below to display it.
print(df)
You can also export the dataframe to a CSV file using df.to_csv("filename.csv").
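As a rough illustration of the export, the sketch below writes a small cleaned dataframe to an in-memory buffer so that the result can be inspected without creating a file. The two-row dataframe and the buffer are stand-ins for illustration; to write a real file, pass a path of your choice to to_csv() instead.

```python
import io
import pandas as pd

# Small stand-in for the cleaned dataframe.
df = pd.DataFrame({'Name': ['John', 'Emma'], 'City': ['New York', 'London']})

# index=False leaves the row index (0, 1, ...) out of the CSV.
buffer = io.StringIO()
df.to_csv(buffer, index=False)

csv_text = buffer.getvalue()
print(csv_text)
```

Without index=False, pandas writes the row index as an extra unnamed first column, which is often not wanted in an exported file.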
Full Code
import pandas as pd
data = {
'Name': ['John', 'Emma', 'Michael', 'Sophia'],
'Age': [25, 28, 32, 29],
'City': ['New York/', 'London/', 'Paris/', 'Tokyo/'],
'Salary': [50000, 60000, 70000, 55000]
}
# Create the DataFrame
df = pd.DataFrame(data)
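# Remove every character that is not a letter, a digit, or a space
df['City'] = df['City'].str.replace('[^a-zA-Z0-9 ]', '', regex=True)
print(df)
Output
      Name  Age      City  Salary
0     John   25  New York   50000
1     Emma   28    London   60000
2  Michael   32     Paris   70000
3   Sophia   29     Tokyo   55000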
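If more than one column needs cleaning, the same replacement can be applied in a loop. This is only a sketch under stated assumptions: the stray characters in the Name column are made up for illustration, and the pattern keeps letters, digits, and spaces.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John#', 'Emma', 'Michael', 'Sophia!'],  # hypothetical stray characters
    'Age': [25, 28, 32, 29],
    'City': ['New York/', 'London/', 'Paris/', 'Tokyo/'],
})

# Assumption: keep letters, digits, and spaces; remove everything else.
pattern = r'[^a-zA-Z0-9 ]'

# Clean only the text columns; numeric columns are left as they are.
for col in df.columns:
    if pd.api.types.is_string_dtype(df[col]):
        df[col] = df[col].str.replace(pattern, '', regex=True)

print(df)
```

The Age column is skipped because it is numeric, so its values and dtype stay unchanged.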
Conclusion
There may be cases when your datasets are disorganized or contain special characters in the values. To clean the dataset, you have to remove them. The steps above are all you need to remove special characters from a column.
I hope you liked this tutorial. If you have any queries, you can contact us for more help.