Credit Card Fraud Detection Using Ensemble Learning Techniques
This article explores several ensemble learning techniques for credit card fraud detection. It shows how to build the models in Python using the sklearn library, evaluate their accuracy on a held-out test set, and compare hard and soft voting strategies for the ensemble model.
Data Acquisition and Preparation
The first step involves acquiring the credit card transaction data. The dataset used in this demonstration is available at https://www.kaggle.com/code/pierra/credit-card-dataset-svm-classification/input. Each transaction is labeled through the 'Class' column, where 1 indicates a fraudulent transaction and 0 a genuine one; fraudulent transactions make up only a tiny fraction of the data, so the classes are highly imbalanced.
# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier, VotingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
# Read the dataset
data = pd.read_csv('creditcard.csv')
# Separate features and labels
X = data.drop(['Class'], axis=1)
y = data['Class']
# Split the dataset into training and testing sets, stratifying on the
# label so both splits preserve the rare-fraud class ratio
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)
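Because fraud cases are rare, it is worth confirming the class balance before training. A minimal check, assuming the standard Kaggle 'creditcard.csv' layout with a 0/1 'Class' column:
# Inspect the class distribution; fraud (Class = 1) is typically
# well under 1% of all transactions in this dataset
print(y.value_counts())
print(y.value_counts(normalize=True))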
Building and Evaluating Individual Models
AdaBoost
The AdaBoost algorithm builds weak learners sequentially, reweighting the training data at each iteration so that instances misclassified by earlier learners receive more attention. The weighted combination of these learners aims to improve the model's overall accuracy.
# Create an AdaBoost model
ada = AdaBoostClassifier(n_estimators=50, random_state=42)
# Train the model
ada.fit(X_train, y_train)
# Make predictions on the test set
y_pred_ada = ada.predict(X_test)
# Evaluate the model's accuracy
print('Accuracy of AdaBoost:', accuracy_score(y_test, y_pred_ada))
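On a dataset this imbalanced, plain accuracy is dominated by the genuine class, so precision and recall on the fraud class are more informative. A minimal sketch of a fuller evaluation, reusing the predictions above:
# Per-class precision, recall, and F1; the fraud class (1) is the one that matters
from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test, y_pred_ada))
print(classification_report(y_test, y_pred_ada, digits=4))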
Random Forest
Random Forest trains many decision trees, each on a bootstrap sample of the data, with a random subset of features considered at each split. The final prediction is the majority vote among the trees.
# Create a Random Forest model
rf = RandomForestClassifier(n_estimators=100, random_state=42)
# Train the model
rf.fit(X_train, y_train)
# Make predictions on the test set
y_pred_rf = rf.predict(X_test)
# Evaluate the model's accuracy
print('Accuracy of Random Forest:', accuracy_score(y_test, y_pred_rf))
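A convenient side effect of Random Forest is that the fitted model exposes impurity-based feature importances, which can hint at which inputs drive the fraud predictions. A minimal sketch using the model trained above:
# Rank features by impurity-based importance (highest first)
importances = pd.Series(rf.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).head(10))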
Building an Ensemble Model
Ensemble learning combines multiple base learners into a model that is often more robust and accurate than any single one. In this example, we build a Voting Classifier from several base learners: a Decision Tree, Support Vector Machines with linear and RBF kernels, and K-Nearest Neighbors with k=1 and k=3.
# Create base learners
dt = DecisionTreeClassifier(random_state=42)
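# probability=True enables predict_proba, which soft voting needs below;
# note that kernel SVMs train slowly on large datasets such as this one,
# so subsampling the training set may be necessary in practice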
svm_linear = SVC(kernel='linear', probability=True, random_state=42)
svm_rbf = SVC(kernel='rbf', probability=True, random_state=42)
knn1 = KNeighborsClassifier(n_neighbors=1)
knn3 = KNeighborsClassifier(n_neighbors=3)
# Create Voting Classifier with majority voting
voting_clf = VotingClassifier(estimators=[('dt', dt), ('svm_linear', svm_linear), ('svm_rbf', svm_rbf), ('knn1', knn1), ('knn3', knn3)], voting='hard')
# Train the model
voting_clf.fit(X_train, y_train)
# Make predictions on the test set
y_pred_voting = voting_clf.predict(X_test)
# Evaluate the model's accuracy
print('Accuracy of Voting Classifier with majority voting:', accuracy_score(y_test, y_pred_voting))
# Create Voting Classifier with soft voting, which averages predicted class probabilities
voting_clf_soft = VotingClassifier(estimators=[('dt', dt), ('svm_linear', svm_linear), ('svm_rbf', svm_rbf), ('knn1', knn1), ('knn3', knn3)], voting='soft')
# Train the model
voting_clf_soft.fit(X_train, y_train)
# Make predictions on the test set
y_pred_voting_soft = voting_clf_soft.predict(X_test)
# Evaluate the model's accuracy
print('Accuracy of Voting Classifier with soft voting:', accuracy_score(y_test, y_pred_voting_soft))
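As with the individual models, accuracy alone says little here. A minimal sketch comparing all the fitted models on fraud-class precision and recall, reusing the predictions computed above:
# Compare fraud-class precision/recall across all models trained above
from sklearn.metrics import precision_score, recall_score
for name, y_pred in [('AdaBoost', y_pred_ada), ('Random Forest', y_pred_rf),
                     ('Voting (hard)', y_pred_voting), ('Voting (soft)', y_pred_voting_soft)]:
    print(name,
          'precision:', precision_score(y_test, y_pred),
          'recall:', recall_score(y_test, y_pred))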
Conclusion
This demonstration shows how ensemble learning techniques can be applied to credit card fraud detection. By combining the strengths of different base learners, a voting ensemble can match or exceed the accuracy of its individual members. Because fraudulent transactions are rare, metrics such as precision, recall, and area under the precision-recall curve are more informative than raw accuracy, and further exploration of base learners, voting strategies, and hyperparameter tuning may yield better results.