Pandas One-Hot Encoding for CSV Data: Removing Columns and Saving Results
This guide demonstrates how to use Pandas in Python to one-hot encode specific columns in a CSV file, remove unnecessary columns, and save the processed data to a new CSV file.
Steps:
- Load the CSV File: Begin by importing the pandas library and reading your CSV file into a Pandas DataFrame using pd.read_csv(). Replace 'input.csv' with the path to your original CSV file.
import pandas as pd
data = pd.read_csv('input.csv')
- Remove Unwanted Columns: Remove columns like 'id' and 'label' using the drop() method. Specify the column names and set axis=1 to indicate you're dropping columns rather than rows.
data = data.drop(['id', 'label'], axis=1)
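- One-Hot Encode Specific Columns: Select the columns you want to one-hot encode (e.g., 'proto', 'service', 'state', 'attack_cat') and store them in a list. Apply one-hot encoding using pd.get_dummies(), passing that list as the columns argument. This replaces each categorical column with one indicator column per unique value, named after the original column and the value (for example, proto_tcp). In pandas 2.0 and later the indicator columns are boolean by default; pass dtype=int if you need 0/1 values in the output.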
one_hot_columns = ['proto', 'service', 'state', 'attack_cat']
data = pd.get_dummies(data, columns=one_hot_columns)
- Save the Processed Data: Finally, save the processed DataFrame to a new CSV file using to_csv(). Replace 'output.csv' with the desired output file path. Set index=False to avoid saving the index as a column.
data.to_csv('output.csv', index=False)
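To see what the drop and encode steps do to the data, here is a minimal sketch on a tiny in-memory table. The column names follow the guide, but the 'dur' column and all values are made up for illustration, and dtype=int is an optional choice to get 0/1 instead of True/False.

```python
import pandas as pd

# Hypothetical three-row sample; the values are invented for illustration.
sample = pd.DataFrame({
    "id": [1, 2, 3],
    "dur": [0.5, 1.2, 0.1],
    "proto": ["tcp", "udp", "tcp"],
    "label": [0, 1, 0],
})

# Drop the columns that should not reach the output.
trimmed = sample.drop(["id", "label"], axis=1)

# Replace 'proto' with one indicator column per unique value.
encoded = pd.get_dummies(trimmed, columns=["proto"], dtype=int)

print(list(encoded.columns))
print(encoded)
```

The 'proto' column disappears and is replaced by 'proto_tcp' and 'proto_udp', while 'dur' passes through unchanged.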
Example:
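import pandas as pd
# Load the original CSV file into a DataFrame
data = pd.read_csv('input.csv')
# Drop the columns that should not appear in the output
data = data.drop(['id', 'label'], axis=1)
# Replace each listed categorical column with one indicator column per unique value
one_hot_columns = ['proto', 'service', 'state', 'attack_cat']
data = pd.get_dummies(data, columns=one_hot_columns)
# Write the result without the DataFrame index
data.to_csv('output.csv', index=False)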
By running this code, you'll generate a new CSV file containing the one-hot encoded data without the 'id' and 'label' columns.
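If some of the listed columns might be missing from a given file, the script above raises a KeyError. One possible way to make it more tolerant is sketched below. The helper name encode_csv, its parameters, and the throwaway sample file are all hypothetical and not part of the original guide.

```python
import os
import tempfile

import pandas as pd


def encode_csv(input_path, output_path, drop_columns, one_hot_columns):
    """Hypothetical helper: drop columns, one-hot encode, write a new CSV."""
    data = pd.read_csv(input_path)
    # errors="ignore" skips names that are not in the file instead of raising KeyError.
    data = data.drop(columns=drop_columns, errors="ignore")
    # Encode only the requested columns that actually exist in this file.
    present = [col for col in one_hot_columns if col in data.columns]
    # dtype=int writes 0/1 rather than True/False.
    data = pd.get_dummies(data, columns=present, dtype=int)
    data.to_csv(output_path, index=False)
    return data


# Usage with a throwaway file that lacks 'service' and 'attack_cat'.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "input.csv")
    dst = os.path.join(tmp, "output.csv")
    with open(src, "w") as f:
        f.write("id,dur,proto,state,label\n1,0.5,tcp,FIN,0\n2,1.2,udp,CON,1\n")
    encode_csv(src, dst, ["id", "label"],
               ["proto", "service", "state", "attack_cat"])
    written = pd.read_csv(dst)
    print(list(written.columns))
```

Silently skipping missing columns is a design choice; if a missing column signals a problem with the input, letting the KeyError surface may be preferable.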
Original source: https://www.cveoy.top/t/topic/i5qJ. Copyright belongs to the author. Do not repost or scrape.