Get Dummy Variables for a column in Pandas: pandas.get_dummies()


Do you want to convert a categorical variable into dummy variables? If yes, then this post is for you. Here you will learn how to get dummy variables for a column in pandas using the pandas get_dummies() method.

Syntax of Pandas get_dummies method

Before going to the demonstration part let’s learn the syntax of the method.

pandas.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, 
columns=None, sparse=False, drop_first=False, dtype=None)

The explanation of the most used parameters is below.

data: Your input dataframe or a column of it.

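prefix: String to prepend to the names of the generated dummy columns. When you pass a dataframe, the default is the name of the source column.

prefix_sep: Separator placed between the prefix and the category value. The default value is "_".

dummy_na: Whether to add an extra column that marks NaN values. The default value is False, which means NaN values are ignored.

columns: The columns of the dataframe you want to encode. If it is None, all columns with object, string, or category dtype are encoded, and numeric columns are left as they are.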

sparse: Whether the dummy-encoded columns should be backed by a SparseArray (True) or a regular NumPy array (False).

drop_first: Use it to get k-1 dummies out of k categorical levels by removing the first level.

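dtype: Data type of the new dummy columns. Only a single dtype is allowed. In pandas 2.0 and later the default is bool, so pass dtype=int if you want 0 and 1 instead of False and True.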
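To see how these parameters interact, here is a minimal sketch. The dataframe and its column names (color, size) are made up for illustration only and are not part of the examples later in this post.

```python
import pandas as pd

# Hypothetical frame: one categorical column and one numeric column.
df = pd.DataFrame({"color": ["red", "green", "red"], "size": [1, 2, 3]})

# Encode only "color", name the new columns "c-<value>", and store 0/1 integers.
encoded = pd.get_dummies(
    df, columns=["color"], prefix="c", prefix_sep="-", dtype=int
)

print(encoded.columns.tolist())
```

The column that is not encoded (size) comes first in the result, followed by one dummy column per category value.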

In the next section, you will learn the steps to implement the pandas get_dummies() method.

Steps to implement Pandas get_dummies method

Step 1: Import the necessary libraries.

Here I am using two Python modules. One is pandas, for creating the dataframe, and the other is NumPy, for creating NaN values. So let's import them.

import pandas as pd
import numpy as np

Step 2: Create a Sample Dataframe.

Let’s create a dataframe to implement the pandas get_dummies() function in python. You can use your own dataset but for the sake of simplicity, I am creating a very simple dataframe. Use the below code to create it.

import pandas as pd
import numpy as np
data = {"col1":["A","B","C","D"],"col2":["E","D","F","G"],"col3":[1,2,3,4]}
df = pd.DataFrame(data=data)

Output

Figure: Sample dataframe for implementing the get_dummies method

Step 3: Get Dummy Variables for Dataframe using pandas get_dummies()

Now let's apply the get_dummies() method and convert categorical values into dummy variables. The examples below cover the most common cases.

Example 1: Finding Dummy Variables For Whole Dataframe.

To create dummy variables for the whole dataframe, just pass the dataframe to the method.

pd.get_dummies(df)

Output

Figure: Finding dummy variables for the whole dataframe

You can see that each value in the categorical columns col1 and col2 has been converted to its own dummy column. The numeric column col3 is left unchanged.
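If you cannot see the output image, the following sketch rebuilds the sample dataframe from Step 2 and prints the resulting column names. The variable name dummies is only used here for illustration.

```python
import pandas as pd

# Same sample dataframe as in Step 2.
data = {"col1": ["A", "B", "C", "D"], "col2": ["E", "D", "F", "G"], "col3": [1, 2, 3, 4]}
df = pd.DataFrame(data=data)

# Encode every object/string/category column; numeric columns pass through.
dummies = pd.get_dummies(df)

print(dummies.columns.tolist())
print(dummies["col3"].tolist())
```

Each new column is named after the source column and the category value, joined by the default separator "_".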

Example 2: Finding Dummy Variables For a Single Column

Suppose you want to create dummy variables for a single column. To do so, pass that column as the argument. For example, to create dummy variables for col1, execute the following code.

pd.get_dummies(df.col1)

Output

Figure: Finding dummy variables for a single column
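Passing a single column returns only the dummy columns, named after the category values without a prefix. If you want to encode one column and keep the other columns in the result, one option is the columns parameter. The sketch below compares both approaches on the sample dataframe; the variable names only_dummies and kept are illustrative.

```python
import pandas as pd

data = {"col1": ["A", "B", "C", "D"], "col2": ["E", "D", "F", "G"], "col3": [1, 2, 3, 4]}
df = pd.DataFrame(data=data)

# Passing the column itself: only the dummy columns come back, without a prefix.
only_dummies = pd.get_dummies(df["col1"])

# Passing the dataframe with columns=["col1"]: col2 and col3 are kept untouched.
kept = pd.get_dummies(df, columns=["col1"])

print(only_dummies.columns.tolist())
print(kept.columns.tolist())
```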

Example 3: Dummy variables with NaN value

In this example, I will explain how to either include or ignore NaN values while creating dummy variables. But before that, let's add a NaN value to the dataframe.

Run the code below.

import pandas as pd
import numpy as np
data = {"col1":["A","B","C",np.nan],"col2":["E","D","F","G"],"col3":[1,2,3,4]}
df = pd.DataFrame(data=data)

Output

Figure: Sample dataframe with NaN value

Now let's create the dummy variables for col1 with the additional parameter dummy_na=True. NaN is then treated as a category of its own and gets a separate column.

pd.get_dummies(df.col1,dummy_na=True)

Output

Figure: get_dummies() output for the dataframe with NaN and dummy_na=True

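If you want to ignore NaN, use dummy_na=False, which is the default. No NaN column is created, and the row that contains NaN has no category set in any dummy column.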

pd.get_dummies(df.col1,dummy_na=False)

Output

Figure: get_dummies() output for the dataframe with dummy_na=False
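The difference between the two settings is easiest to check on the row that holds the NaN. The following sketch uses the same col1 values as above and passes dtype=int so the result shows 0 and 1; the variable names with_nan and without_nan are illustrative.

```python
import numpy as np
import pandas as pd

col1 = pd.Series(["A", "B", "C", np.nan], name="col1")

# dummy_na=True adds one extra column that flags the missing value.
with_nan = pd.get_dummies(col1, dummy_na=True, dtype=int)

# dummy_na=False (the default) creates no NaN column.
without_nan = pd.get_dummies(col1, dummy_na=False, dtype=int)

print(with_nan.shape, without_nan.shape)
# Number of categories set per row; the NaN row differs between the two results.
print(with_nan.sum(axis=1).tolist())
print(without_nan.sum(axis=1).tolist())
```

With dummy_na=False the last row sums to 0, because the missing value does not belong to any of the encoded categories.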

Example 4: Dropping the First Categorical Variable

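Suppose you want only k-1 dummy columns for a column with k categories. In that case, pass drop_first=True as an additional argument. It removes the dummy column for the first level of each categorical column and builds the result from the remaining levels. A row that has 0 (or False) in all remaining dummy columns then belongs to the dropped level.

Run the code and see the output.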

pd.get_dummies(df,drop_first=True)

Output

Figure: Dropping the first categorical level
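As a rough illustration of which columns disappear, the sketch below applies drop_first=True to the first sample dataframe (the one without NaN, chosen here to keep the result easy to read) and uses dtype=int for 0/1 output. The variable name reduced is illustrative.

```python
import pandas as pd

data = {"col1": ["A", "B", "C", "D"], "col2": ["E", "D", "F", "G"], "col3": [1, 2, 3, 4]}
df = pd.DataFrame(data=data)

# The first level of each encoded column is dropped:
# "A" for col1 and "D" for col2 (levels are taken in sorted order).
reduced = pd.get_dummies(df, drop_first=True, dtype=int)

print(reduced.columns.tolist())
# Row 0 has col1 == "A", so all remaining col1_* columns are 0 for it.
print(reduced.loc[0, ["col1_B", "col1_C", "col1_D"]].tolist())
```

Note that the dropped level of col2 is "D" and not "E", because the levels are ordered by value rather than by first appearance.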

Conclusion

The pandas get_dummies() method allows you to convert a categorical variable into dummy variables. This is also known as one-hot encoding. It is useful when preparing categorical features for machine learning models, since many of them expect numeric input. These are the examples I have compiled to help you understand the method. If you have any queries, you can contact us.

Source:

Pandas Documentation
