Python Pandas One-Hot Encoding for CSV Data: Transforming Categorical Features
This guide shows how to use Python's Pandas library to one-hot encode specific columns of a CSV file and save the transformed data to a new file. One-hot encoding is a common preprocessing step because many machine learning models accept only numerical features.
Steps
- Import Pandas: Begin by importing the Pandas library.
    import pandas as pd
- Load CSV: Read your original CSV file into a Pandas DataFrame.
    df = pd.read_csv('original.csv')
- One-Hot Encoding: Apply one-hot encoding to the desired columns ('proto', 'service', 'state', 'attack_cat' in this example).
    df = pd.get_dummies(df, columns=['proto', 'service', 'state', 'attack_cat'])
- Drop Columns: Remove unnecessary columns like 'id' and 'label'.
    df = df.drop(['id', 'label'], axis=1)
- Save to CSV: Save the modified DataFrame to a new CSV file.
    df.to_csv('new.csv', index=False)
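To see what the encoding step does to a DataFrame, here is a minimal sketch on a small in-memory table. The sample rows and the 'dur' column are made up for illustration and are not taken from any real dataset.

```python
import pandas as pd

# Hypothetical sample rows; the values are illustrative only.
sample = pd.DataFrame({
    "id": [1, 2, 3],
    "proto": ["tcp", "udp", "tcp"],
    "service": ["http", "dns", "http"],
    "dur": [0.5, 0.1, 1.2],
})

# Each listed column is replaced by one indicator column per distinct value,
# named <column>_<value>. Columns that are not listed are left unchanged.
encoded = pd.get_dummies(sample, columns=["proto", "service"])

print(list(encoded.columns))
# ['id', 'dur', 'proto_tcp', 'proto_udp', 'service_dns', 'service_http']
```

The untouched columns come first, followed by the new indicator columns for each encoded column in turn.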
Code Example
import pandas as pd
# Load the original CSV file
# Replace 'original.csv' with your actual file path
df = pd.read_csv('original.csv')
# One-hot encode specific columns
df = pd.get_dummies(df, columns=['proto', 'service', 'state', 'attack_cat'])
# Drop unwanted columns
df = df.drop(['id', 'label'], axis=1)
# Save to a new CSV file
# Replace 'new.csv' with your desired output file name
df.to_csv('new.csv', index=False)
Important:
- Replace 'original.csv' and 'new.csv' with the actual paths to your files.
- Ensure that the specified columns ('proto', 'service', 'state', 'attack_cat', 'id', 'label') are present in your CSV file.
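One way to verify the second point before encoding is a small check that reports every missing column at once. This is only a sketch: `check_columns` is a hypothetical helper written for this guide, not part of Pandas, and the sample DataFrame is invented.

```python
import pandas as pd


def check_columns(df, required):
    """Raise a KeyError naming every required column that df lacks."""
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise KeyError(f"Missing columns in CSV: {missing}")


# Hypothetical frame that lacks 'attack_cat', 'id' and 'label'.
df = pd.DataFrame({"proto": ["tcp"], "service": ["http"], "state": ["FIN"]})

required = ["proto", "service", "state", "attack_cat", "id", "label"]
try:
    check_columns(df, required)
except KeyError as err:
    print(err)
```

Calling the check right after `pd.read_csv` surfaces a mistyped or absent column name before any encoding or dropping is attempted.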
Running this code creates a new CSV file ('new.csv') with one-hot encoded features. Each encoded column is replaced by one indicator column per category, named in the form '<column>_<value>', which gives the data a form that most machine learning models can consume.
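One detail worth checking in the output file: in pandas 2.0 and later, `get_dummies` produces boolean indicator columns by default, so the saved CSV contains True/False rather than 1/0. If integer indicators are preferred, passing `dtype=int` is one option. The sketch below assumes a made-up two-row table and writes to an in-memory buffer instead of a real file.

```python
import io

import pandas as pd

# Hypothetical two-row table; the values are illustrative only.
df = pd.DataFrame({"proto": ["tcp", "udp"], "dur": [0.5, 0.1]})

# dtype=int requests 0/1 integer indicators instead of the default dtype.
encoded = pd.get_dummies(df, columns=["proto"], dtype=int)

# Write to an in-memory buffer to keep the sketch self-contained.
buffer = io.StringIO()
encoded.to_csv(buffer, index=False)
print(buffer.getvalue())
```

The same `dtype=int` argument can be added to the `get_dummies` call in the main example without changing anything else.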