Remove special characters from rows using Pandas: 4 Steps Only


Pandas is a widely used package for data manipulation. You can create a dataframe and work on it with the built-in pandas functions. Let’s say you have a column that contains many special characters and you want to remove them. How can you do that? In this tutorial, you will learn how to remove special characters from rows in pandas, step by step.

Sample Dataframe

Before going through the steps, let’s create a dummy dataframe that will be used in our example. You can also use your own dataset or read one from a CSV file.

Below are the lines of code to create a sample dataframe.

import pandas as pd
data = {
    'Name': ['John', 'Emma', 'Michael', 'Sophia'],
    'Age': [25, 28, 32, 29],
    'City': ['New York/', 'London/', 'Paris/', 'Tokyo/'],
    'Salary': [50000, 60000, 70000, 55000]
}

# Create the DataFrame
df = pd.DataFrame(data)

print(df)

Output

[Image: sample dataframe for removing special characters from rows]

Steps to remove special characters from rows in Pandas

Let’s go through the steps you will use to remove special characters from rows in pandas. Follow them in order for a clear understanding.

Step 1: Import all the required libraries

The first step is to import the necessary libraries. This example uses only the pandas library, so a single import statement is enough.

import pandas as pd

Step 2: Identify the column that contains special characters

The next step is to find the columns that contain special characters. In our example, the City column contains the “/” special character, and it has to be removed.
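On a larger dataframe it is not always practical to spot such columns by eye. The following is a minimal sketch of one way to find them in code. It assumes that "special" means anything other than letters, digits, and spaces; the names `pattern`, `text_columns`, and `columns_with_special` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Emma', 'Michael', 'Sophia'],
    'Age': [25, 28, 32, 29],
    'City': ['New York/', 'London/', 'Paris/', 'Tokyo/'],
    'Salary': [50000, 60000, 70000, 55000],
})

# Assumption: letters, digits, and spaces are fine; everything else is "special".
pattern = r'[^a-zA-Z0-9 ]'

# Only text columns can be searched with the .str accessor, so keep those.
text_columns = [col for col in df.columns if pd.api.types.is_string_dtype(df[col])]

# Keep the columns where at least one value matches the pattern.
columns_with_special = [
    col for col in text_columns
    if df[col].str.contains(pattern, regex=True).any()
]

print(columns_with_special)  # ['City']
```

Adjust the character class in `pattern` if your data treats other characters (for example hyphens in names) as valid.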

Step 3: Remove special characters from rows in Pandas

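To remove the special characters from the column values, use the str.replace() function on the City column. It takes the pattern to match as the first argument, the replacement string as the second (an empty string here, so every match is deleted), and regex=True so that the pattern is treated as a regular expression. The pattern [^a-zA-Z0-9] matches every character that is not a letter or a digit. This includes spaces, so “New York/” becomes “NewYork”; add a space inside the brackets if you want to keep spaces.

Use the line of code below to remove the characters.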

df['City'] = df['City'].str.replace('[^a-zA-Z0-9]', '', regex=True)
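To see exactly what this pattern does to the sample values, here is a small sketch that compares it with a variant that has a space added to the character class. The variable names `strict` and `keep_spaces` are only illustrative.

```python
import pandas as pd

cities = pd.Series(['New York/', 'London/', 'Paris/', 'Tokyo/'])

# Pattern from the step above: removes everything except letters and digits,
# which means spaces are removed too.
strict = cities.str.replace('[^a-zA-Z0-9]', '', regex=True)

# Variant with a space inside the brackets, so spaces are kept.
keep_spaces = cities.str.replace('[^a-zA-Z0-9 ]', '', regex=True)

print(strict.tolist())       # ['NewYork', 'London', 'Paris', 'Tokyo']
print(keep_spaces.tolist())  # ['New York', 'London', 'Paris', 'Tokyo']
```

Which variant is right depends on your data: if the values are multi-word names such as cities, you will usually want to keep the spaces.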

Step 4: Display the dataframe

Now the last step is to display the dataframe to verify the City column. Use the below line of code to display the dataframe.

print(df)

You can also export the dataframe to a CSV file using the dataframe's own method, df.to_csv("filename.csv").
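As a rough sketch of the export, the snippet below writes a small cleaned dataframe to a CSV file and reads it back to check the result. The file name `cleaned_cities.csv` is a placeholder, and the temporary folder is used only so the example leaves no file behind; in your own script you would pass the path you actually want.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Emma'], 'City': ['New York', 'London']})

# Write into a temporary folder (an assumption made to keep the sketch self-cleaning).
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'cleaned_cities.csv')
    # index=False leaves the row index out of the file.
    df.to_csv(path, index=False)
    # Read the file back to confirm what was written.
    round_trip = pd.read_csv(path)

print(round_trip)
```

Without index=False, pandas also writes the row index as an extra first column, which then shows up as an unnamed column when the file is read again.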

Full Code

import pandas as pd
data = {
    'Name': ['John', 'Emma', 'Michael', 'Sophia'],
    'Age': [25, 28, 32, 29],
    'City': ['New York/', 'London/', 'Paris/', 'Tokyo/'],
    'Salary': [50000, 60000, 70000, 55000]
}

# Create the DataFrame
df = pd.DataFrame(data)
df['City'] = df['City'].str.replace('[^a-zA-Z0-9]', '', regex=True)

print(df)

Output

[Image: dataframe after removing the special characters from the rows]

Conclusion

There may be cases when your datasets are not well organized or contain special characters in the values. To clean the dataset, you have to remove them. The steps above are all you need to remove special characters from a column.

I hope you have liked this tutorial. If you have any queries then you can contact us for more help.


Meet Sukesh ( Chief Editor ), a passionate and skilled Python programmer with a deep fascination for data science, NumPy, and Pandas. His journey in the world of coding began as a curious explorer and has evolved into a seasoned data enthusiast.
 