
ValueError: cannot reindex from a duplicate axis (Solved)

The "ValueError: cannot reindex from a duplicate axis" error occurs when pandas has to reindex or align a dataframe whose index contains duplicate labels. One common cause is exploding a single row into multiple rows, which leaves every new row with the same index label. That is not the only cause: initializing a dataframe with an externally provided index that contains repeats, or merging and appending dataframes, can also produce duplicate indexes. Now let’s see how we can fix this error.

ValueError: cannot reindex from a duplicate axis (Replication) –

Before we proceed with a solution we need to create a dummy dataframe with duplicate indexes. Here is the code for this.

import pandas as pd
data = {
    'Col1' : ['A', 'B', 'C', 'D'],
    'Col2' : [23, 21, 22, 21],
}
df = pd.DataFrame(data)
# initializing an index with a duplicate label ('1' appears twice)
index = pd.Index(['1', '1', '2', '3'])
df = df.set_index(index)
df.head()

Now we will try to re-index this dataframe and it will throw the above error.

ValueError: cannot reindex from a duplicate axis
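The reindexing step itself can be sketched as follows. This is a minimal, self-contained reproduction that rebuilds the dummy dataframe; the target labels passed to reindex() (including the extra label '4') are an illustrative assumption, and the exact wording of the message may differ between pandas versions.

```python
import pandas as pd

# Same dummy dataframe as above: the label '1' appears twice in the index.
df = pd.DataFrame(
    {"Col1": ["A", "B", "C", "D"], "Col2": [23, 21, 22, 21]},
    index=pd.Index(["1", "1", "2", "3"]),
)

try:
    # Reindexing needs each existing label to map to exactly one row,
    # which the repeated label '1' makes impossible.
    df.reindex(["1", "2", "3", "4"])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

Checking df.index.is_unique before reindexing is a quick way to tell whether a dataframe is affected.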

ValueError: cannot reindex from a duplicate axis (Solution) –

We can fix this error in multiple ways. Let’s explore them one by one.

Solution 1: Dropping Rows with duplicate index –

We will use the .loc indexer to select only the rows whose index label has not appeared before. df.index.duplicated() returns a boolean mask that is True for every repeat of a label after its first occurrence, and the ‘~’ operator inverts that mask. The result keeps the first row for each label and drops the later duplicates.

df = df.loc[~df.index.duplicated(), :]
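To make the effect of the mask visible, here is a small sketch on the same dummy dataframe. The variable names mask, deduped and result are illustrative, and the reindex target is an assumed example.

```python
import pandas as pd

df = pd.DataFrame(
    {"Col1": ["A", "B", "C", "D"], "Col2": [23, 21, 22, 21]},
    index=pd.Index(["1", "1", "2", "3"]),
)

# True marks a label that has already been seen: [False, True, False, False]
mask = df.index.duplicated()

# Inverting the mask keeps the first row for each label ('A' for label '1').
deduped = df.loc[~mask, :]

# With a unique index, reindexing works; the unknown label '4' is filled with NaN.
result = deduped.reindex(["1", "2", "3", "4"])
print(result)
```

Note that this discards data (row 'B' here), so it only fits cases where the later duplicates are really redundant. Passing keep='last' to duplicated() keeps the last occurrence instead.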

Solution 2: Setting ignore_index=True –

Here we set ignore_index=True in a concat operation, so that pandas discards the original index labels and gives the result a fresh integer index running from 0 to n-1. Here is the syntax, where dfs is a list of dataframes.

df = pd.concat(dfs, axis=0, ignore_index=True)
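As a rough illustration, the sketch below concatenates two small dataframes with and without ignore_index. The frames first and second are made-up examples, not part of the original dataset.

```python
import pandas as pd

# Two hypothetical frames that both carry the default index 0, 1.
first = pd.DataFrame({"Col1": ["A", "B"], "Col2": [23, 21]})
second = pd.DataFrame({"Col1": ["C", "D"], "Col2": [22, 21]})
dfs = [first, second]

# Without ignore_index the original labels are kept: 0, 1, 0, 1.
with_duplicates = pd.concat(dfs, axis=0)

# With ignore_index=True the result gets a fresh index: 0, 1, 2, 3.
combined = pd.concat(dfs, axis=0, ignore_index=True)

print(with_duplicates.index.is_unique, combined.index.is_unique)
```

This prevents duplicates from being created in the first place, which is usually preferable to cleaning them up afterwards, as long as the original labels carry no meaning.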

Solution 3: Resetting Index –

We can reset the index, which replaces the duplicate labels with a new default integer index (0, 1, 2, ...). The old index is not lost; it is moved into a regular column. Here is the code which resets the pandas dataframe index.

df.reset_index(level=0, inplace=True)
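The difference between keeping and discarding the old labels can be sketched like this. The example uses the non-inplace form of reset_index() for clarity, and the names kept and dropped are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"Col1": ["A", "B", "C", "D"], "Col2": [23, 21, 22, 21]},
    index=pd.Index(["1", "1", "2", "3"]),
)

# Default behaviour: the old (unnamed) index becomes a column called 'index'.
kept = df.reset_index()

# drop=True throws the old labels away instead of keeping them as a column.
dropped = df.reset_index(drop=True)

print(kept.columns.tolist())
print(dropped.columns.tolist())
```

Unlike Solution 1, no rows are lost here: all four rows survive and only the labels change.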

 

Solution 4: Setting allows_duplicate_labels as False –

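We can set the flag below so that pandas refuses duplicate index labels. Once it is False, pandas raises a DuplicateLabelError when an operation on this dataframe would introduce duplicate labels, and setting it on a dataframe that already contains duplicates raises the same error immediately. It therefore works like an assertion, so place it at the start of the pandas code, right after the dataframe is created. The flag is available from pandas 1.2 onwards.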

df.flags.allows_duplicate_labels = False
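A minimal sketch of how the flag behaves, assuming pandas 1.2 or newer. It first shows the flag rejecting the dummy dataframe, then applies it to a de-duplicated copy via set_flags(); the names raised and clean are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"Col1": ["A", "B", "C", "D"], "Col2": [23, 21, 22, 21]},
    index=pd.Index(["1", "1", "2", "3"]),
)

try:
    # The index already contains duplicates, so the flag cannot be set.
    df.flags.allows_duplicate_labels = False
    raised = False
except pd.errors.DuplicateLabelError:
    raised = True

# On a frame with unique labels the flag is accepted; set_flags returns a new object.
clean = df.loc[~df.index.duplicated(), :].set_flags(allows_duplicate_labels=False)

print(raised, clean.flags.allows_duplicate_labels)
```

The flag does not repair existing duplicates. It surfaces them early, so it is best combined with one of the earlier solutions.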

 

Thanks
Data Science Learner Team