Remove special characters from rows using Pandas: 4 Steps Only

Pandas is one of the most popular Python packages for data manipulation. You can create dataframes and transform them with its built-in functions. Suppose you have a column whose values contain special characters that you want to remove. How can you do that? In this tutorial, you will learn how to remove special characters from rows in pandas in four steps.

Sample Dataframe

Before going through the steps, let’s create a dummy dataframe to use in our example. You can also use your own dataset or read one from a CSV file.

Below are the lines of code to create a sample dataframe.

import pandas as pd
data = {
    'Name': ['John', 'Emma', 'Michael', 'Sophia'],
    'Age': [25, 28, 32, 29],
    'City': ['New York/', 'London/', 'Paris/', 'Tokyo/'],
    'Salary': [50000, 60000, 70000, 55000]
}

# Create the DataFrame
df = pd.DataFrame(data)

print(df)

Output

[Image: sample dataframe for removing special characters from rows]
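If your data lives in a CSV file instead, pd.read_csv() gives you the same kind of dataframe. The sketch below reads the same sample rows from an in-memory string through io.StringIO so that it runs without a file on disk; in practice you would pass a file path instead (the name "data.csv" in the comment is only a hypothetical example).

```python
import io

import pandas as pd

# The same sample rows as CSV text. io.StringIO lets read_csv treat the string like a file.
# In your own code, pass a real path instead, e.g. the hypothetical "data.csv".
csv_text = """Name,Age,City,Salary
John,25,New York/,50000
Emma,28,London/,60000
Michael,32,Paris/,70000
Sophia,29,Tokyo/,55000
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df)
```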

Steps to remove special characters from rows in Pandas

Let’s go through the steps you will use to remove special characters from rows in pandas. Follow them in order.

Step 1: Import all the required libraries

The first step is to import all the necessary libraries. This example uses only the pandas library, so a single import statement is enough.

import pandas as pd

Step 2: Identify the column that contains special characters

The next step is to find the columns that contain special characters. In our example, every value in the City column ends with the special character “/”, which has to be removed.
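Spotting the column by eye works for a small sample. For a wider dataframe, a quick check like the sketch below can list the text columns that contain at least one special character. Here "special" is assumed to mean anything other than a letter, digit, or space; adjust the pattern to suit your data.

```python
import pandas as pd

data = {
    'Name': ['John', 'Emma', 'Michael', 'Sophia'],
    'Age': [25, 28, 32, 29],
    'City': ['New York/', 'London/', 'Paris/', 'Tokyo/'],
    'Salary': [50000, 60000, 70000, 55000]
}
df = pd.DataFrame(data)

# Assumption: anything that is not a letter, digit, or space counts as "special".
pattern = r'[^a-zA-Z0-9 ]'

# Check only text columns; na=False treats missing values as "no match".
cols_with_special = [
    col for col in df.columns
    if pd.api.types.is_string_dtype(df[col])
    and df[col].str.contains(pattern, regex=True, na=False).any()
]
print(cols_with_special)  # ['City']
```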

Step 3: Remove special characters from rows in Pandas

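To remove the special characters from the column values, use the str.replace() method on the City column. It takes three arguments: the pattern to search for (here, a regular expression that matches any character that is not a letter, digit, or space), the replacement string (an empty string, so every match is deleted), and regex=True so that the pattern is treated as a regular expression.

Use the line of code below to remove the characters. The space inside the brackets keeps the space in values such as “New York”; without it, the pattern would also delete spaces and turn “New York/” into “NewYork”.

df['City'] = df['City'].str.replace(r'[^a-zA-Z0-9 ]', '', regex=True)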
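If you know exactly which characters to strip, a more targeted replacement avoids surprises from a broad pattern. The sketch below removes only “/” as a plain string with regex=False, and then shows a small character class for a few known characters (the extra “#” and “@” values are made up for illustration).

```python
import pandas as pd

df = pd.DataFrame({'City': ['New York/', 'London/', 'Paris/', 'Tokyo/']})

# regex=False treats '/' as a literal string, so no escaping is needed.
df['City'] = df['City'].str.replace('/', '', regex=False)
print(df['City'].tolist())  # ['New York', 'London', 'Paris', 'Tokyo']

# To drop a small set of known characters, list them inside a character class.
messy = pd.Series(['Par#is/', '@Tokyo'])
cleaned = messy.str.replace(r'[/#@]', '', regex=True)
print(cleaned.tolist())  # ['Paris', 'Tokyo']
```

When the character only ever appears at the end of a value, as in this sample, df['City'].str.rstrip('/') is another option.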

Step 4: Display the dataframe

The last step is to display the dataframe and verify that the City column is clean. Use the line of code below to display it.

print(df)

You can also export the dataframe to a CSV file using df.to_csv("filename.csv").
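A short sketch of the export step follows. It writes to a temporary folder only so the example leaves no files behind, and passes index=False so the row index is not saved as an extra column; reading the file back confirms what was written.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Emma'],
    'City': ['New York', 'London'],
})

# A temporary directory keeps the example self-cleaning;
# in your own code, pass a normal path such as "filename.csv".
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'filename.csv')
    # index=False skips writing the row index as an unnamed first column.
    df.to_csv(path, index=False)
    round_trip = pd.read_csv(path)

print(round_trip.equals(df))  # True
```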

Full Code

import pandas as pd
data = {
    'Name': ['John', 'Emma', 'Michael', 'Sophia'],
    'Age': [25, 28, 32, 29],
    'City': ['New York/', 'London/', 'Paris/', 'Tokyo/'],
    'Salary': [50000, 60000, 70000, 55000]
}

# Create the DataFrame
df = pd.DataFrame(data)
# Remove every character except letters, digits, and spaces
df['City'] = df['City'].str.replace(r'[^a-zA-Z0-9 ]', '', regex=True)

print(df)

Output

Conclusion

Datasets are sometimes messy, and their values may contain special characters. To clean such a dataset, you need to remove them, and the four steps above are all it takes.
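To apply the same cleanup to every text column at once, you can wrap the steps in a small helper. The function name remove_special_characters and its keep parameter are hypothetical names for this sketch, not part of pandas. It keeps letters, digits, and whatever characters you list in keep, and leaves numeric columns untouched.

```python
import re

import pandas as pd


def remove_special_characters(df: pd.DataFrame, keep: str = ' ') -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with special characters removed.

    Letters, digits, and the characters in `keep` are preserved in text columns;
    non-text columns are returned unchanged.
    """
    # re.escape makes characters such as '-' or ']' safe inside the character class.
    pattern = f'[^a-zA-Z0-9{re.escape(keep)}]'
    cleaned = df.copy()
    for col in cleaned.columns:
        if pd.api.types.is_string_dtype(cleaned[col]):
            cleaned[col] = cleaned[col].str.replace(pattern, '', regex=True)
    return cleaned


df = pd.DataFrame({
    'Name': ['John', 'Emma', 'Michael', 'Sophia'],
    'Age': [25, 28, 32, 29],
    'City': ['New York/', 'London/', 'Paris/', 'Tokyo/'],
    'Salary': [50000, 60000, 70000, 55000],
})
result = remove_special_characters(df)
print(result['City'].tolist())  # ['New York', 'London', 'Paris', 'Tokyo']
```

Returning a copy leaves the original dataframe intact, which makes it easy to compare before and after. Note that the a-zA-Z range also removes accented letters such as “é”, so widen the pattern if your data contains them.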

I hope you liked this tutorial. If you have any queries, you can contact us for more help.