Python One-Hot Encoding Function for Pandas DataFrames
This Python function provides a convenient way to perform one-hot encoding on a specific column within a Pandas DataFrame.
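import pandas as pd
def onehot_encode(df, col_name):
    """Performs one-hot encoding on a specified column in a Pandas DataFrame."""
    # Get all unique values in the column, in order of first appearance
    # (missing values are skipped, matching get_dummies' default behavior)
    unique_values = df[col_name].dropna().unique()
    # Construct new column names
    new_col_names = [f'{col_name}_{value}' for value in unique_values]
    # Create one-hot encoding matrix; get_dummies sorts its columns by value,
    # so let it name them via prefix instead of assigning names by position
    onehot_matrix = pd.get_dummies(df[col_name], prefix=col_name, dtype=int)
    # Reorder the columns to match the order of first appearance
    onehot_matrix = onehot_matrix[new_col_names]
    # Add one-hot encoding matrix to the original DataFrame
    df = pd.concat([df, onehot_matrix], axis=1)
    # Remove the original column (drop returns a new DataFrame)
    df = df.drop(columns=col_name)
    return df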
Usage Example:
# Create sample data
data = {
    'id': [1, 2, 3, 4],
    'color': ['red', 'green', 'blue', 'red']
}
df = pd.DataFrame(data)
# Perform one-hot encoding on the 'color' column
df = onehot_encode(df, 'color')
print(df)
Output:
   id  color_red  color_green  color_blue
0   1          1            0           0
1   2          0            1           0
2   3          0            0           1
3   4          1            0           0
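A detail worth checking whenever one-hot column labels are built by hand: `Series.unique()` returns values in order of first appearance, while `pd.get_dummies` orders its columns by sorted category value, so the two sequences do not necessarily line up. The short sketch below, which reuses the sample colors, shows the difference. Letting `get_dummies` name the columns through its `prefix` argument avoids relying on either order.

```python
import pandas as pd

colors = pd.Series(['red', 'green', 'blue', 'red'], name='color')

# unique() keeps the order of first appearance
appearance_order = list(colors.unique())

# get_dummies() sorts its columns by category value and prefixes the names
dummies = pd.get_dummies(colors, prefix='color', dtype=int)
dummy_order = list(dummies.columns)

print(appearance_order)  # ['red', 'green', 'blue']
print(dummy_order)       # ['color_blue', 'color_green', 'color_red']
```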
This function allows you to easily transform categorical data into a numerical representation suitable for machine learning models.
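For comparison, pandas can perform the same transformation in a single call: `pd.get_dummies` accepts a `columns` argument, encodes the listed columns, prefixes the new names with the column name, and drops the originals. A minimal sketch on the sample data follows; the `dtype=int` argument is a choice made here to get 0/1 integers rather than booleans, and the resulting columns come out in sorted order.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'color': ['red', 'green', 'blue', 'red'],
})

# Encode only the 'color' column; other columns pass through unchanged
encoded = pd.get_dummies(df, columns=['color'], dtype=int)

print(list(encoded.columns))  # ['id', 'color_blue', 'color_green', 'color_red']
```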
Original source: https://www.cveoy.top/t/topic/ohWR. Copyright belongs to the author. Please do not repost or scrape.