How can I get the names of the features after applying OneHotEncoder in a ColumnTransformer inside a scikit-learn Pipeline?

asked 2021-06-20 11:00:00 +0000

1 Answer

answered 2023-02-09 17:00:00 +0000

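You can get the names of the features generated by OneHotEncoder inside a ColumnTransformer by calling get_feature_names_out() on the fitted encoder (in scikit-learn releases before 1.0 the method was named get_feature_names(); that older name was removed in 1.2). Here's an example: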

from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Define ColumnTransformer to apply OneHotEncoder to categorical features
preprocessor = ColumnTransformer(transformers=[('cat', OneHotEncoder(), ['gender', 'education'])])

# Define pipeline to apply preprocessor and other steps
pipeline = Pipeline(steps=[('preprocessor', preprocessor), ('classifier', LogisticRegression())])

# Fit pipeline to data (X_train is assumed to be a DataFrame with 'gender' and 'education' columns)
pipeline.fit(X_train, y_train)

# Get the names of the generated features.
# The fitted encoder is looked up by the name it was given in the ColumnTransformer ('cat').
encoder = pipeline.named_steps['preprocessor'].named_transformers_['cat']
feature_names = encoder.get_feature_names_out(['gender', 'education'])

In the example above, the call on the fitted encoder returns the names of the generated features for the columns gender and education. The resulting feature_names holds one name per encoded category, in the form gender_<category> or education_<category>, in the same order as the encoder's output columns.

Last updated: Feb 09