Data Preparation and Decision Tree Modeling in Python
This code snippet demonstrates data preparation and decision tree modeling using Python libraries like Pandas and Scikit-learn. Here's a step-by-step breakdown:
-
Define Feature Names:
feature_col = ['Terrorism', 'Thought-tendency', 'gender', 'Special-behavior-trajectory', 'Tobacco-alcohol']
This line creates a list called feature_col containing the names of the features that will be used to build the machine learning model.
-
Extract Features:
X = my_data[['Terrorism', 'Thought-tendency', 'gender', 'Special-behavior-trajectory', 'Tobacco-alcohol']]
This line extracts the specified features from a DataFrame called my_data and assigns them to the variable X. This creates a new DataFrame containing only the selected features.
-
Create Dummy Variables:
for n, my_str in enumerate(feature_col):
    my_dummy = pd.get_dummies(my_data[[my_str]], prefix=my_str)
    X = pd.concat([X, my_dummy], axis=1)
This loop iterates through each feature in feature_col. For each feature, it uses pd.get_dummies to create dummy variables (one-hot encoding), one column per category within the feature. The prefix argument ensures that the new dummy column names start with the original feature name. The pd.concat function then combines the feature DataFrame X with the newly created dummy DataFrame my_dummy along the columns (axis=1), so X ends up holding both the original columns and their dummy columns. The index n returned by enumerate is not used.
-
Define Original Feature List:
XX_feature = ['Terrorism', 'Thought-tendency', 'gender', 'Special-behavior-trajectory', 'Tobacco-alcohol']
This line creates a new list called XX_feature containing only the names of the original features (without the dummy variables). It holds the same names as feature_col.
-
Extract Original Features:
XX = X[XX_feature]
This line extracts the original features from the DataFrame X using the list XX_feature and assigns them to the variable XX. The dummy columns created by the loop are therefore left out of XX.
-
Extract Target Variable:
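Y = X['Terrorism']
This line extracts the target variable, which is assumed to be 'Terrorism' in this case, from the DataFrame X and assigns it to the variable Y. Note that 'Terrorism' is also listed in XX_feature, so the target is included among the model's input features.
-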
Split Data into Training and Testing Sets:
X_train, X_test, Y_train, Y_test = train_test_split(XX, Y, test_size=0.3, random_state=0)
This line uses the train_test_split function from Scikit-learn to split the data into training and testing sets. The test_size=0.3 argument specifies that 30% of the data will be used for testing, while the remaining 70% will be used for training. Setting random_state=0 makes the split consistent and reproducible.
-
Create Decision Tree Classifier:
my_tree = DecisionTreeClassifier()
This line creates an instance of the DecisionTreeClassifier model from Scikit-learn and assigns it to the variable my_tree.
-
Fit the Model:
my_tree.fit(X_train, Y_train)
This line calls the fit method of my_tree to train the decision tree on the training data X_train and the corresponding target values Y_train.
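The steps above can be put together in one runnable script. The sketch below is only an illustration under stated assumptions: the contents of my_data are made up (the real dataset is not shown), target_col and accuracy are names introduced here, and, unlike the snippet being explained, the target column is kept out of the features and the one-hot encoded columns are what the tree is trained on.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for my_data; the values are invented for illustration.
my_data = pd.DataFrame({
    'Terrorism': ['yes', 'no'] * 10,
    'Thought-tendency': ['high', 'low', 'low', 'high'] * 5,
    'gender': ['m', 'f', 'f', 'm', 'f'] * 4,
    'Special-behavior-trajectory': ['a', 'b', 'c', 'a', 'b'] * 4,
    'Tobacco-alcohol': ['none', 'some', 'some'] * 6 + ['none', 'some'],
})

# Assumption: 'Terrorism' is the target, so it is excluded from the features.
target_col = 'Terrorism'
feature_col = [c for c in my_data.columns if c != target_col]

# One-hot encode all categorical feature columns in a single call;
# the new column names are prefixed with the original column name by default.
X = pd.get_dummies(my_data[feature_col])
Y = my_data[target_col]

# 70% training data, 30% test data, reproducible via random_state.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=0
)

# Train the decision tree on the encoded training features.
my_tree = DecisionTreeClassifier(random_state=0)
my_tree.fit(X_train, Y_train)

# Mean accuracy on the held-out test set.
accuracy = my_tree.score(X_test, Y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Calling pd.get_dummies once on the whole feature DataFrame replaces the loop and pd.concat, and passing random_state to DecisionTreeClassifier is an optional addition that makes the fitted tree repeatable between runs.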