This Python function provides a convenient way to perform one-hot encoding on a specific column within a Pandas DataFrame.

import pandas as pd

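def onehot_encode(df, col_name):
    """Performs one-hot encoding on a specified column in a Pandas DataFrame."""
    # Get the unique values in order of first appearance (NaN is skipped,
    # matching the default behavior of pd.get_dummies)
    unique_values = df[col_name].dropna().unique()

    # Construct new column names
    new_col_names = [f'{col_name}_{value}' for value in unique_values]

    # Create one-hot encoding matrix. get_dummies sorts its columns by value,
    # so let it name them via prefix instead of assigning names by position
    onehot_matrix = pd.get_dummies(df[col_name], prefix=col_name, dtype=int)

    # Reorder the encoded columns to follow order of first appearance
    onehot_matrix = onehot_matrix[new_col_names]

    # Add one-hot encoding matrix to the original DataFrame
    df = pd.concat([df, onehot_matrix], axis=1)

    # Remove the original column and return the result
    return df.drop(columns=col_name)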

Usage Example:

# Create sample data
data = {
    'id': [1, 2, 3, 4],
    'color': ['red', 'green', 'blue', 'red']
}
df = pd.DataFrame(data)

# Perform one-hot encoding on the 'color' column
df = onehot_encode(df, 'color')

print(df)

Output:

   id  color_red  color_green  color_blue
0   1          1            0           0
1   2          0            1           0
2   3          0            0           1
3   4          1            0           0
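
For comparison, pandas can encode a column in place with a single call to pd.get_dummies. The sketch below reuses the sample data from the usage example; the variable name encoded is only illustrative. Note that the built-in call orders the new columns by sorted value rather than by first appearance.

```python
import pandas as pd

# Same sample data as in the usage example
df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'color': ['red', 'green', 'blue', 'red']
})

# columns=[...] encodes the listed columns and drops the originals;
# dtype=int yields 0/1 instead of the boolean default in recent pandas
encoded = pd.get_dummies(df, columns=['color'], dtype=int)

print(list(encoded.columns))
# ['id', 'color_blue', 'color_green', 'color_red']
```

A hand-written helper is mainly useful when a specific column order or naming scheme is needed.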

This function allows you to easily transform categorical data into a numerical representation suitable for machine learning models.
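
One practical caveat when feeding encoded data to a model: new data may contain categories that were absent during training, or lack some that were present, so the encoded columns can differ. The following is a minimal sketch of one common way to handle this, assuming the training columns are treated as the reference layout. The frames train and new and their values are made up for illustration.

```python
import pandas as pd

# Hypothetical training data and new data with an unseen category ('purple')
train = pd.DataFrame({'color': ['red', 'green', 'blue']})
new = pd.DataFrame({'color': ['green', 'purple']})

train_enc = pd.get_dummies(train, columns=['color'], dtype=int)

# Encode the new data, then align it to the training layout:
# missing columns are filled with 0 and unseen ones are dropped
new_enc = (
    pd.get_dummies(new, columns=['color'], dtype=int)
    .reindex(columns=train_enc.columns, fill_value=0)
)

print(new_enc)
```

With this approach an unseen category becomes a row of all zeros, which may or may not be the desired behavior for a given model.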

Python One-Hot Encoding Function for Pandas DataFrames

Original source: https://www.cveoy.top/t/topic/ohWR. Copyright belongs to the author. Please do not repost or scrape.
