pandas can convert the values in a column into integer codes with the function "pd.factorize()". It assigns one integer to each distinct value, so a column of categorical data becomes an array of numerical codes. Here is an example:

import pandas as pd

# create a dataframe with a categorical column
df = pd.DataFrame({'fruit': ['apple', 'banana', 'apple', 'banana', 'orange']})

# use pd.factorize() to convert the categorical column 'fruit' into numerical values
df['fruit_id'] = pd.factorize(df['fruit'])[0]

# display the new dataframe with serial numbers in 'fruit_id' column
print(df)

Output:

    fruit  fruit_id
0   apple         0
1  banana         1
2   apple         0
3  banana         1
4  orange         2

In this example, the "fruit" column is encoded as integers in the "fruit_id" column. The values "apple", "banana", and "orange" become 0, 1, and 2, respectively, because codes are assigned in order of first appearance. Note that "pd.factorize()" returns a tuple of two elements, the array of integer codes and the unique values. We are only interested in the codes here, so we take the first element of the tuple with [0].
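
If you also want the second element of the tuple, the following is a minimal sketch of unpacking both parts. The sample data and the variable names (codes, uniques, decoded, sorted_codes, sorted_uniques) are made up for illustration. It assumes pandas' default handling of missing values, which gives them the code -1, and it uses the sort=True option to number the values in sorted order instead of order of first appearance.

```python
import numpy as np
import pandas as pd

# illustrative data with one missing value
fruit = pd.Series(['apple', 'banana', 'apple', np.nan, 'orange'])

# unpack both elements of the tuple: integer codes and unique values
codes, uniques = pd.factorize(fruit)
print(codes.tolist())   # [0, 1, 0, -1, 2]  (missing value gets -1 by default)
print(list(uniques))    # ['apple', 'banana', 'orange']

# map the codes back to the original labels, keeping missing values as None
decoded = [uniques[c] if c >= 0 else None for c in codes]
print(decoded)          # ['apple', 'banana', 'apple', None, 'orange']

# sort=True assigns codes by sorted order of the unique values
sorted_codes, sorted_uniques = pd.factorize(pd.Series(['pear', 'apple', 'pear']), sort=True)
print(sorted_codes.tolist())   # [1, 0, 1]
print(list(sorted_uniques))    # ['apple', 'pear']
```

Keeping "uniques" around is useful if you later need to translate the ids back into the original labels.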