How can I eliminate duplicate entries following the merging of two dataframes using an inner join?

asked 2023-03-28 11:00:00 +0000

bukephalos


1 Answer

answered 2022-02-20 07:00:00 +0000

nofretete

Duplicate rows appear after an inner join when a key value occurs more than once in either dataframe. You can remove them from the merged result with the pandas DataFrame method drop_duplicates().

Here is an example:

import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3, 4], 'Name': ['John', 'Jane', 'Alice', 'Bob']})
df2 = pd.DataFrame({'ID': [1, 2, 3, 5], 'Age': [30, 25, 40, 35]})

# Inner join: keep only the IDs that appear in both dataframes
df_merge = pd.merge(df1, df2, on='ID', how='inner')
# Keep the first row for each ID and drop any later repeats
df_merge = df_merge.drop_duplicates(subset=['ID'], keep='first')
print(df_merge)

In this example, we merge two dataframes, df1 and df2, using an inner join on the 'ID' column. After merging, we drop any rows that repeat an 'ID' value using the drop_duplicates() method. The subset parameter specifies which columns are compared when looking for duplicates (if it is omitted, all columns are compared), and the keep parameter specifies which of the duplicate rows to retain: 'first' keeps the first occurrence, 'last' keeps the last, and False drops all of them.
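To make the effect of subset and keep concrete, here is a minimal sketch. The data is hypothetical (not from the question): ID 2 is deliberately repeated in the right-hand frame so that the merge really produces a duplicate key. The variable names left, right, merged, first, last and exact are only illustrative.

```python
import pandas as pd

# Hypothetical data: ID 2 appears twice in the right-hand frame
left = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['John', 'Jane', 'Alice']})
right = pd.DataFrame({'ID': [1, 2, 2, 3], 'Age': [30, 25, 26, 40]})

# The inner join yields one row per matching pair, so ID 2 shows up twice
merged = pd.merge(left, right, on='ID', how='inner')

# Compare only the 'ID' column and keep the first row for each ID
first = merged.drop_duplicates(subset=['ID'], keep='first')

# Same comparison, but keep the last row for each ID
last = merged.drop_duplicates(subset=['ID'], keep='last')

# Without subset, only rows identical in every column count as duplicates,
# so the two ID 2 rows (which differ in Age) both survive
exact = merged.drop_duplicates()

print(len(merged), len(first), len(last), len(exact))
```

Which row to keep depends on the data; if the repeated rows carry different values, as here, it may be safer to decide that explicitly (for example by sorting first) than to rely on row order.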

The output of this code will be:

   ID   Name  Age
0   1   John   30
1   2   Jane   25
2   3  Alice   40

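Note that the row with ID=4 (from df1) and the row with ID=5 (from df2) are missing because the inner join keeps only IDs present in both dataframes, not because they were duplicates. This sample data contains no repeated IDs, so drop_duplicates() leaves the merged result unchanged here; it only removes rows when the same ID appears more than once.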
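An alternative that may be worth considering is to deduplicate the key column before merging, and to let pandas check key uniqueness for you. The sketch below uses hypothetical data in which ID 2 is repeated in the right-hand frame; the names left, right, right_unique and result are illustrative. It relies on the validate parameter of pd.merge, which raises pandas.errors.MergeError when the keys do not satisfy the stated relationship.

```python
import pandas as pd

# Hypothetical data: ID 2 is repeated in the right-hand frame
left = pd.DataFrame({'ID': [1, 2, 3, 4], 'Name': ['John', 'Jane', 'Alice', 'Bob']})
right = pd.DataFrame({'ID': [1, 2, 2, 5], 'Age': [30, 25, 26, 35]})

# Remove repeated keys from the right frame before joining
right_unique = right.drop_duplicates(subset=['ID'], keep='first')

# validate='one_to_one' makes the merge fail loudly if either side
# still contains a repeated key
result = pd.merge(left, right_unique, on='ID', how='inner', validate='one_to_one')
print(result)

# Merging the raw frame with the same check raises MergeError
try:
    pd.merge(left, right, on='ID', how='inner', validate='one_to_one')
except pd.errors.MergeError as exc:
    print('duplicate keys detected:', exc)
```

Deduplicating first avoids building the larger intermediate result, and the validate check documents the assumption that each ID should match at most one row.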


Last updated: Feb 20 '22