To remove duplicate rows after merging two DataFrames with an inner join, call the pandas drop_duplicates() method on the merged result.

Here is an example:

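```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3, 4], 'Name': ['John', 'Jane', 'Alice', 'Bob']})
df2 = pd.DataFrame({'ID': [1, 2, 3, 5], 'Age': [30, 25, 40, 35]})

# Inner join: keep only the IDs present in both DataFrames (1, 2 and 3)
df_merge = pd.merge(df1, df2, on='ID', how='inner')
# Drop rows whose 'ID' value repeats, keeping the first occurrence
df_merge = df_merge.drop_duplicates(subset=['ID'], keep='first')
print(df_merge)
```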

In this example, df1 and df2 are merged with an inner join on the 'ID' column, so only the IDs present in both DataFrames (1, 2 and 3) are kept. drop_duplicates() then removes any rows whose 'ID' value repeats. The subset parameter lists the columns compared when identifying duplicates, and the keep parameter decides which occurrence survives: 'first' keeps the first, 'last' keeps the last, and False drops every duplicated row.

The output of this code will be:

```
   ID   Name  Age
0   1   John   30
1   2   Jane   25
2   3  Alice   40
```

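Note that the row with ID=4 (Bob) is missing because the inner join discards IDs that do not appear in both DataFrames, not because it was a duplicate; ID=5 from df2 is dropped for the same reason. With this data the merge produces no repeated IDs, so drop_duplicates() leaves the result unchanged. It only removes rows when a key occurs more than once in one of the inputs, because the join then emits one row for every matching pair.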
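To see drop_duplicates() actually remove something, the sketch below uses a hypothetical variant of df2 in which ID 2 appears twice (with ages 25 and 26; the second row is made up for illustration). The inner join then produces two rows for Jane, and the three keep settings resolve them differently.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3, 4], 'Name': ['John', 'Jane', 'Alice', 'Bob']})
# Hypothetical df2: ID 2 appears twice, so Jane's row is matched twice
df2 = pd.DataFrame({'ID': [1, 2, 2, 3, 5], 'Age': [30, 25, 26, 40, 35]})

df_merge = pd.merge(df1, df2, on='ID', how='inner')
print(len(df_merge))  # 4: one row per matching (df1, df2) pair

# keep='first': retain the first row for each ID (Jane, 25)
first = df_merge.drop_duplicates(subset=['ID'], keep='first')
# keep='last': retain the last row for each ID (Jane, 26)
last = df_merge.drop_duplicates(subset=['ID'], keep='last')
# keep=False: drop every row whose ID is repeated (Jane disappears)
no_dupes = df_merge.drop_duplicates(subset=['ID'], keep=False)

# reset_index(drop=True) renumbers rows 0..n-1; otherwise the merge's row labels are kept
print(first.reset_index(drop=True))
print(last.reset_index(drop=True))
print(no_dupes.reset_index(drop=True))
```

Which keep value fits depends on which of the matched rows you treat as authoritative; keep=False is handy when an ambiguous match should be excluded entirely rather than resolved arbitrarily.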