Pandas is a Python package for creating and manipulating dataframes, and it ships with many built-in functions for doing so. Suppose your dataframe has one or more categorical columns: how can you get a list of those columns? In this post, you will learn how to get the categorical columns in pandas, step by step.
Before going through the steps, let’s first create a sample dataframe that will be used in this tutorial. Run the lines of code below to create a dataframe in which the “Gender” column is a categorical column.
import pandas as pd
# Sample data in dictionary format
data = {
'Name': ['John', 'Megan', 'Sarah', 'Jake', 'Amy'],
'Gender': ['Male', 'Female', 'Female', 'Male', 'Female'],
'Age': [25, 30, 27, 32, 28]
}
# Convert data to a DataFrame
df = pd.DataFrame(data)
# Convert 'Gender' column to categorical
df['Gender'] = df['Gender'].astype('category')
# Print the DataFrame
print(df)
Output
Let’s go through the steps you will use to get the categorical columns in pandas.
The first step is to import the required library. This example uses only the pandas library, so import it with the import statement.
import pandas as pd
This tutorial uses the sample dataframe created above. If you already have a CSV file, use the line of code below to read it instead.
data = pd.read_csv('data.csv')
If you are using the sample dataframe, you can skip to step 3.
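One thing to keep in mind when reading from a CSV file: pandas does not infer the category data type on its own, so text columns would not show up as categorical in the later steps. The sketch below assumes a CSV with a Gender column (the inline text stands in for a hypothetical data.csv) and requests the category data type through the dtype parameter of read_csv.

```python
import io
import pandas as pd

# Inline CSV text standing in for a hypothetical data.csv file
csv_text = """Name,Gender,Age
John,Male,25
Megan,Female,30
Sarah,Female,27
"""

# Ask pandas to load 'Gender' as a categorical column while reading
data = pd.read_csv(io.StringIO(csv_text), dtype={'Gender': 'category'})

# 'Gender' is now reported with the category data type
print(data.dtypes)
```

For columns that are already loaded, converting with astype('category'), as done for the sample dataframe, gives the same kind of column.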
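The third step is to find the data type of each column. To do so, use the dtypes attribute of the dataframe, which returns a Series that maps each column name to its data type. Add the line of code below. If you are using the sample dataframe, call it on df instead of data, because data is the dictionary in that example.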
data_types = data.dtypes
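To make the result of dtypes more concrete, here is a small sketch that uses the sample dataframe from this tutorial (so the variable is df rather than data). It collects the name of each column's data type into a dictionary; the variable dtype_names is introduced only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Megan', 'Sarah', 'Jake', 'Amy'],
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Female'],
    'Age': [25, 30, 27, 32, 28],
})
df['Gender'] = df['Gender'].astype('category')

# dtypes is a Series: the index holds column names, the values hold data types
data_types = df.dtypes

# Map each column name to the name of its data type
dtype_names = {column: dtype.name for column, dtype in data_types.items()}
print(dtype_names)
```

Only the Gender entry should be reported as category; the exact names shown for the other columns can vary with the pandas version and platform.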
Once you have the data type of each column, keep only the entries whose data type is “category” and take their index, which holds the column names. Use the line of code below.
categorical_columns = data_types[data_types == 'category'].index
That is all you have to do to get the categorical columns in pandas.
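As an alternative to filtering the dtypes result by hand, pandas also offers the select_dtypes method, which can do the same selection in one call. The sketch below is a variation on the steps above, not a replacement for them, and it reuses the sample dataframe from this tutorial.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Megan', 'Sarah', 'Jake', 'Amy'],
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Female'],
    'Age': [25, 30, 27, 32, 28],
})
df['Gender'] = df['Gender'].astype('category')

# select_dtypes keeps only the columns whose data type is 'category'
categorical_df = df.select_dtypes(include='category')

# Convert the resulting column Index to a plain Python list
categorical_columns = categorical_df.columns.tolist()
print(categorical_columns)  # ['Gender']
```

Calling tolist() is optional; it is handy when you want a regular Python list instead of a pandas Index.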
Full Code
import pandas as pd
# Sample data in dictionary format
data = {
'Name': ['John', 'Megan', 'Sarah', 'Jake', 'Amy'],
'Gender': ['Male', 'Female', 'Female', 'Male', 'Female'],
'Age': [25, 30, 27, 32, 28]
}
# Convert data to a DataFrame
df = pd.DataFrame(data)
# Convert 'Gender' column to categorical
df['Gender'] = df['Gender'].astype('category')
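# Get the data type of each column
data_types = df.dtypes
# Keep only the columns whose data type is 'category'
categorical_columns = data_types[data_types == 'category'].index
# Print the names of the categorical columns
print(categorical_columns)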
Output
Sometimes you need to know which columns of a dataframe are categorical in order to build the correct machine-learning model. The steps above help you identify those columns and get the categorical columns in pandas.
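Once you know which columns are categorical, a natural follow-up before modeling is to look at the distinct categories each one holds. The sketch below assumes the sample dataframe from this tutorial and uses the cat accessor; the name category_values is chosen only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Megan', 'Sarah', 'Jake', 'Amy'],
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Female'],
    'Age': [25, 30, 27, 32, 28],
})
df['Gender'] = df['Gender'].astype('category')

# Names of the categorical columns, found as in the steps above
data_types = df.dtypes
categorical_columns = data_types[data_types == 'category'].index

# Map each categorical column to the list of categories it contains
category_values = {
    column: df[column].cat.categories.tolist()
    for column in categorical_columns
}
print(category_values)  # {'Gender': ['Female', 'Male']}
```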
I hope you have liked this tutorial. If you have any queries then you can contact us for more help.