Pandas can convert the values in a column into serial numbers (integer codes) with the function "pd.factorize()". It maps each distinct value in a column of categorical data to an integer, numbering the values 0, 1, 2, ... in the order they first appear. Here is an example:
```python
import pandas as pd
# create a dataframe with a categorical column
df = pd.DataFrame({'fruit': ['apple', 'banana', 'apple', 'banana', 'orange']})
# use pd.factorize() to convert the categorical column 'fruit' into numerical values
df['fruit_id'] = pd.factorize(df['fruit'])[0]
# display the new dataframe with serial numbers in 'fruit_id' column
print(df)
```
Output:
```
    fruit  fruit_id
0   apple         0
1  banana         1
2   apple         0
3  banana         1
4  orange         2
```
In this example, the "fruit" column is converted into serial numbers in the "fruit_id" column: "apple", "banana", and "orange" become 0, 1, and 2, respectively, because codes are assigned in order of first appearance. Note that "pd.factorize()" returns a tuple of two items: an array of integer codes and the unique values. We only need the codes, so we take the first element of the tuple with [0].
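If you also want the unique values, you can unpack both parts of the tuple instead of indexing with [0]. The sketch below assumes a reasonably recent pandas; the extra Series `s` and `t` are made-up sample data for illustration. It shows mapping the codes back to the original values, the `sort=True` option (codes follow sorted order instead of first appearance), and the default handling of missing values, which receive the code -1.

```python
import pandas as pd

df = pd.DataFrame({'fruit': ['apple', 'banana', 'apple', 'banana', 'orange']})

# pd.factorize returns a 2-tuple: (integer codes, unique values)
codes, uniques = pd.factorize(df['fruit'])
df['fruit_id'] = codes

print(codes.tolist())   # [0, 1, 0, 1, 2]
print(list(uniques))    # ['apple', 'banana', 'orange']

# Taking uniques at each code position restores the original column
restored = list(uniques.take(codes))
print(restored == df['fruit'].tolist())  # True

# Hypothetical sample data where first-appearance order is not alphabetical
s = pd.Series(['orange', 'apple', 'banana', 'apple'])
default_codes, _ = pd.factorize(s)                   # numbered by first appearance
sorted_codes, sorted_uniques = pd.factorize(s, sort=True)  # numbered by sorted uniques
print(default_codes.tolist())   # [0, 1, 2, 1]
print(sorted_codes.tolist())    # [2, 0, 1, 0]
print(list(sorted_uniques))     # ['apple', 'banana', 'orange']

# Hypothetical data with a missing value: by default it gets code -1
t = pd.Series(['apple', None, 'banana'])
na_codes, na_uniques = pd.factorize(t)
print(na_codes.tolist())   # [0, -1, 1]
print(list(na_uniques))    # ['apple', 'banana']
```

Because missing values are coded as -1 rather than added to the unique values, check for -1 before using the codes as positions into `uniques`.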
Asked: 2023-02-21 11:00:00 +0000
Last updated: Sep 11 '22