You can remove duplicate rows after merging two DataFrames with an inner join by calling the drop_duplicates() method in pandas.
Here is an example:
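import pandas as pd
# Two DataFrames that share an 'ID' column
df1 = pd.DataFrame({'ID': [1, 2, 3, 4], 'Name': ['John', 'Jane', 'Alice', 'Bob']})
df2 = pd.DataFrame({'ID': [1, 2, 3, 5], 'Age': [30, 25, 40, 35]})
# Inner join: keep only the IDs that appear in both DataFrames
df_merge = pd.merge(df1, df2, on='ID', how='inner')
# Remove rows with a repeated 'ID', keeping the first occurrence
df_merge = df_merge.drop_duplicates(subset=['ID'], keep='first')
print(df_merge)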
In this example, df1 and df2 are merged with an inner join on the 'ID' column. After the merge, drop_duplicates() removes any row whose 'ID' value already appeared in an earlier row. The subset parameter lists the column(s) compared when looking for duplicates (by default all columns are compared), and the keep parameter chooses which duplicate to retain: 'first' (the default) keeps the first occurrence, 'last' keeps the last, and False drops every duplicated row.
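Because every ID is unique in both df1 and df2, the merge in this example cannot produce duplicate rows, so the effect of keep is easier to see when a key repeats. The sketch below assumes a hypothetical frame, df2_dup, in which ID 2 has two Age values (25 and 26); df2_dup and merged_dup are illustrative names, not part of the original example:

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3, 4], 'Name': ['John', 'Jane', 'Alice', 'Bob']})
# Hypothetical variant of df2 in which ID 2 appears twice
df2_dup = pd.DataFrame({'ID': [1, 2, 2, 3, 5], 'Age': [30, 25, 26, 40, 35]})

# The inner join emits one row per matching pair, so ID 2 appears twice
merged_dup = pd.merge(df1, df2_dup, on='ID', how='inner')
print(len(merged_dup))  # 4

# keep='first' retains the first row for each ID, keep='last' the last one
first = merged_dup.drop_duplicates(subset=['ID'], keep='first')
last = merged_dup.drop_duplicates(subset=['ID'], keep='last')
# keep=False discards every row whose ID occurs more than once
unique_only = merged_dup.drop_duplicates(subset=['ID'], keep=False)

print(first['Age'].tolist())        # [30, 25, 40]
print(last['Age'].tolist())         # [30, 26, 40]
print(unique_only['ID'].tolist())   # [1, 3]
```

Note that keep only decides which of the repeated rows survives; it does not combine their values.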
Merging df1 and df2 as defined above gives:
   ID   Name  Age
0   1   John   30
1   2   Jane   25
2   3  Alice   40
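Note that the rows with ID 4 and ID 5 are missing because the inner join keeps only IDs present in both DataFrames, not because of drop_duplicates(). Since every ID is unique in df1 and df2, the merge produces no duplicate rows here and drop_duplicates() leaves the result unchanged; it only has an effect when a key appears more than once in at least one of the inputs.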
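A quick way to see which IDs the inner join excluded, and whether the merged result contains any repeated IDs, is sketched below. It assumes the same df1 and df2 as the example; df2_dup is a hypothetical input with a repeated ID, used only to show that merge's validate='one_to_one' option raises pandas.errors.MergeError when a key is not unique on either side:

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3, 4], 'Name': ['John', 'Jane', 'Alice', 'Bob']})
df2 = pd.DataFrame({'ID': [1, 2, 3, 5], 'Age': [30, 25, 40, 35]})
df_merge = pd.merge(df1, df2, on='ID', how='inner')

# IDs present in only one of the two frames are dropped by the inner join
unmatched = sorted(set(df1['ID'].tolist()) ^ set(df2['ID'].tolist()))
print(unmatched)  # [4, 5]

# duplicated() flags rows whose ID was already seen; here none are flagged
has_dupes = bool(df_merge.duplicated(subset=['ID']).any())
print(has_dupes)  # False

# validate='one_to_one' makes merge raise MergeError if a key repeats on either side
df2_dup = pd.DataFrame({'ID': [1, 2, 2], 'Age': [30, 25, 26]})  # hypothetical input
raised = False
try:
    pd.merge(df1, df2_dup, on='ID', how='inner', validate='one_to_one')
except pd.errors.MergeError as exc:
    raised = True
    print('MergeError:', exc)
print(raised)  # True
```

Validating during the merge surfaces unexpected duplicate keys early, instead of silently discarding rows afterwards with drop_duplicates().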
Asked: 2023-03-28 11:00:00 +0000
Last updated: Feb 20 '22