ValueError: cannot reindex from a duplicate axis ( Solved )


The ValueError: cannot reindex from a duplicate axis error occurs when an operation has to reindex a pandas dataframe whose index contains duplicate labels. A common cause is exploding one existing row into multiple rows, which repeats the original index label on every new row. That is not the only cause: initializing a dataframe with an externally provided index that contains repeats, or merging and appending dataframes, can also produce duplicate indexes. Now let’s see how we can fix this error.
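As a minimal sketch of the explode scenario, the snippet below builds a small dataframe (the column names and values are made up for illustration) and shows that DataFrame.explode() repeats the original label on each new row. Checking index.is_unique is a quick way to confirm whether a dataframe is affected.

```python
import pandas as pd

# Hypothetical data: each row holds a list in the 'tags' column.
df = pd.DataFrame({"name": ["a", "b"], "tags": [["x", "y"], ["z"]]})

# explode() emits one row per list element and repeats the original label.
exploded = df.explode("tags")

print(exploded.index.tolist())   # [0, 0, 1]
print(exploded.index.is_unique)  # False
```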

ValueError: cannot reindex from a duplicate axis ( Replication ) –

Before we proceed with a solution, we need a dummy dataframe with duplicate indexes. Here is the code for this.

import pandas as pd
data = {
    'Col1' : ['A', 'B', 'C', 'D'],
    'Col2' : [23, 21, 22, 21],
}
df = pd.DataFrame(data)
# initializing duplicate indexes ('1' appears twice)
index = pd.Index(['1', '1', '2','3'])
df = df.set_index(index)
df.head()

Now we will try to re-index this dataframe, for example with df.reindex(['1', '2', '3']), and it will throw the above error.
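A self-contained way to reproduce this could look like the sketch below. The target labels passed to reindex() are an assumption chosen for illustration, and the exact wording of the message depends on the pandas version, so the code prints whatever text pandas raises rather than relying on a fixed string.

```python
import pandas as pd

# Same dummy dataframe as above, with the label '1' appearing twice.
df = pd.DataFrame(
    {"Col1": ["A", "B", "C", "D"], "Col2": [23, 21, 22, 21]},
    index=pd.Index(["1", "1", "2", "3"]),
)

message = None
try:
    # Reindexing needs a one-to-one mapping from labels to rows,
    # which a duplicated label cannot provide.
    df.reindex(["1", "2", "3"])
except ValueError as err:
    message = str(err)
    print(message)
```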


ValueError: cannot reindex from a duplicate axis ( Solution ) –

We can fix this error in multiple ways. Let’s explore them one by one.

Solution 1: Dropping Rows with duplicate index –

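We will use the .loc indexer to select only the rows whose index label has not appeared before. df.index.duplicated() returns a boolean mask that is True for every repeated occurrence of a label (by default the first occurrence is marked False). The ‘~’ operator inverts that mask, so the selection keeps the first row for each label and drops the later duplicates.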

df= df.loc[~df.index.duplicated(), :]
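The sketch below applies this to the dummy dataframe from the replication step and also shows the keep parameter of duplicated(), which controls which occurrence survives. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {"Col1": ["A", "B", "C", "D"], "Col2": [23, 21, 22, 21]},
    index=pd.Index(["1", "1", "2", "3"]),
)

# keep="first" (the default) keeps row 'A' for label '1'; keep="last" keeps 'B'.
keep_first = df.loc[~df.index.duplicated(keep="first"), :]
keep_last = df.loc[~df.index.duplicated(keep="last"), :]

print(keep_first["Col1"].tolist())  # ['A', 'C', 'D']
print(keep_last["Col1"].tolist())   # ['B', 'C', 'D']

# With a unique index, reindexing no longer raises.
reindexed = keep_first.reindex(["1", "2", "3"])
```

Note that this approach discards data, so it fits best when the duplicate rows are genuinely redundant.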

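Solution 2 : Setting ignore_index=True –

Here we set ignore_index=True in a concat or similar union operation. pandas then discards the indexes of the input dataframes and assigns a fresh integer index (0 to n-1) to the result. Here is the syntax, where dfs is a list of dataframes.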

df = pd.concat(dfs,axis=0,ignore_index=True)
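To make the difference visible, here is a small sketch with two hypothetical dataframes, left and right, that each carry the default 0..1 index. Without ignore_index the labels collide; with it the result gets a fresh index.

```python
import pandas as pd

# Two illustrative frames, each with the default index [0, 1].
left = pd.DataFrame({"Col1": ["A", "B"]})
right = pd.DataFrame({"Col1": ["C", "D"]})

# Plain concat keeps the original labels, so 0 and 1 each appear twice.
stacked = pd.concat([left, right], axis=0)
print(stacked.index.tolist())  # [0, 1, 0, 1]

# ignore_index=True renumbers the rows of the result.
renumbered = pd.concat([left, right], axis=0, ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```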

Solution 3 : Resetting Index –

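We can reset the index, which replaces the duplicate index with a new default integer index (0, 1, 2, ...). With level=0 the old index values are kept as a regular column; pass drop=True to discard them instead. Here is the code which will reset the pandas dataframe index.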

df.reset_index(level=0, inplace=True)
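As a brief sketch on the dummy dataframe, the two variants below show what happens to the old labels. This version returns new dataframes instead of using inplace=True, which is only a stylistic choice for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {"Col1": ["A", "B", "C", "D"], "Col2": [23, 21, 22, 21]},
    index=pd.Index(["1", "1", "2", "3"]),
)

# Default: the old (unnamed) index becomes a column called 'index'.
kept = df.reset_index()
print(kept.columns.tolist())  # ['index', 'Col1', 'Col2']

# drop=True: the old labels are thrown away entirely.
dropped = df.reset_index(drop=True)
print(dropped.index.tolist())  # [0, 1, 2, 3]
```

Unlike Solution 1, no rows are lost here; only the labels change.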

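Solution 4: Setting allows_duplicate_labels to False –

We can set the below property to guard against duplicate indexes. It does not remove existing duplicates; instead, pandas raises a DuplicateLabelError if the dataframe has duplicate labels, so the problem surfaces early rather than at a later reindex. The flag is available in pandas 1.2 and later. Please refer to the below syntax. We need to insert this assertion-type statement at the start of the pandas code.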

df.flags.allows_duplicate_labels = False
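The following sketch, assuming pandas 1.2 or later, shows the flag acting as a guard: it fails on the dummy dataframe with duplicates and is accepted once the duplicates are removed. The variable names are illustrative.

```python
import pandas as pd
from pandas.errors import DuplicateLabelError

df = pd.DataFrame(
    {"Col1": ["A", "B", "C", "D"], "Col2": [23, 21, 22, 21]},
    index=pd.Index(["1", "1", "2", "3"]),
)

raised = False
try:
    # The index already contains '1' twice, so pandas rejects the flag.
    df.flags.allows_duplicate_labels = False
except DuplicateLabelError:
    raised = True

# After removing duplicates, the flag can be set without error.
clean = df.loc[~df.index.duplicated(), :].set_flags(allows_duplicate_labels=False)
print(raised, clean.flags.allows_duplicate_labels)
```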

Thanks
Data Science Learner Team


Meet Abhishek (Chief Editor), a data scientist with major expertise in NLP and Text Analytics. He has worked on various projects involving text data and has been able to achieve great results. He currently manages Datasciencelearner.com, where he and his team share knowledge and help others learn more about data science.
 