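Classifying Linearly Separable Data: Logistic Regression vs. the Perceptron Algorithm

This experiment applies two classic linear classifiers, logistic regression and the perceptron algorithm, to linearly separable data and analyzes the results with several metrics.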

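Dataset

Training set train.txt: each line is one sample whose feature values lie between -100 and +100; the last element on each line is the label (+1 or -1). The training data is guaranteed to be linearly separable.
Test set test.txt: each line is one sample, drawn independently from the same distribution as the samples in train.txt.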

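Objective

Classify the test set with logistic regression and with the perceptron algorithm, and output the predicted label for each sample. The output file is named result.txt and contains one value per line, each in {1, -1}.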
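The output step can be sketched as follows. This is a minimal illustration, not the assignment's reference solution: the helper `write_labels` is a hypothetical name, the data is a synthetic stand-in for train.txt and test.txt (two features are assumed), and the file is written to a temporary directory rather than the working directory.

```python
import os
import tempfile

import numpy as np
from sklearn.linear_model import Perceptron


def write_labels(path, labels):
    """Write one predicted label per line, as 1 or -1 (hypothetical helper)."""
    with open(path, "w") as f:
        for label in labels:
            f.write(f"{int(label)}\n")


# Synthetic, linearly separable stand-in for the real files (assumption).
rng = np.random.default_rng(0)
X_train = rng.uniform(-100, 100, size=(200, 2))
y_train = np.where(X_train[:, 0] + X_train[:, 1] > 0, 1, -1)
X_test = rng.uniform(-100, 100, size=(50, 2))

model = Perceptron().fit(X_train, y_train)

# One label per test sample, in the same order as the test rows.
result_path = os.path.join(tempfile.mkdtemp(), "result.txt")
write_labels(result_path, model.predict(X_test))
```

The same call works unchanged for a fitted LogisticRegression model, since both estimators return labels from `predict`.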

Analysis of Results

Because the data files are not included here, the analysis is described in general terms. The usual methods are:

Accuracy Analysis

Compute accuracy, precision, recall, and the F1 score from the confusion matrix.
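As a small worked illustration, the four metrics can be derived directly from the confusion-matrix counts. The label arrays below are made up for the example; +1 is assumed to be the positive class.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration only.
y_true = np.array([1, 1, 1, -1, -1, -1, -1, 1])
y_pred = np.array([1, 1, -1, -1, -1, 1, -1, 1])

# labels=[-1, 1] fixes the row/column order, so ravel() yields tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[-1, 1]).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)   # fraction of correct predictions
precision = tp / (tp + fp)                   # predicted positives that are right
recall = tp / (tp + fn)                      # actual positives that were found
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, f1)
```

sklearn also offers `precision_score`, `recall_score`, and `f1_score`, which should agree with these hand-computed values when `pos_label=1`.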

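ROC Curve Analysis

Use sklearn's 'roc_curve' function to compute the points of the ROC curve from continuous decision scores (for example, the output of 'decision_function') rather than from hard labels, then plot the curve to assess classifier performance.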
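A minimal sketch of the ROC computation, assuming synthetic separable data in place of the real files. Both estimators expose `decision_function`, which gives a signed score per sample; passing those scores to `roc_curve` produces a full curve, whereas hard labels would give only a single operating point.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.metrics import auc, roc_curve

# Synthetic, linearly separable data (assumption, stands in for train/test files).
rng = np.random.default_rng(0)
X = rng.uniform(-100, 100, size=(300, 2))
y = np.where(2 * X[:, 0] - X[:, 1] > 0, 1, -1)
X_train, y_train = X[:200], y[:200]
X_test, y_test = X[200:], y[200:]

aucs = {}
for model in (LogisticRegression(), Perceptron()):
    model.fit(X_train, y_train)
    scores = model.decision_function(X_test)        # continuous score per sample
    fpr, tpr, _ = roc_curve(y_test, scores, pos_label=1)
    aucs[type(model).__name__] = auc(fpr, tpr)

print(aucs)
```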

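Feature Importance Analysis

For logistic regression, the 'coef_' attribute of the fitted sklearn model holds the feature weights, which show how each feature influences the classifier.
The perceptron is also a linear model, so its weights are read from 'coef_' in the same way; sklearn's Perceptron does not provide a 'feature_importances_' attribute.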
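One way to compare the two weight vectors is sketched below. Raw magnitudes are not comparable across the two models, so the sketch normalizes each vector to unit length and compares directions. The data, the vector `true_w`, and the helper `unit_weights` are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, Perceptron

# Synthetic data: the third feature is deliberately irrelevant (assumption).
rng = np.random.default_rng(0)
X = rng.uniform(-100, 100, size=(300, 3))
true_w = np.array([3.0, -1.0, 0.0])
y = np.where(X @ true_w > 0, 1, -1)


def unit_weights(model):
    """Return the model's weight vector scaled to unit length (hypothetical helper)."""
    w = model.coef_.ravel()
    return w / np.linalg.norm(w)


w_log = unit_weights(LogisticRegression().fit(X, y))
w_per = unit_weights(Perceptron().fit(X, y))

# Cosine similarity near 1 means both models found a similar separating direction.
cosine = float(w_log @ w_per)
print(w_log, w_per, cosine)
```

Reading weights as importance is only reasonable when the features share a comparable scale, as they do here with every feature drawn from the same range.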

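Learning Curve Analysis

Use sklearn's 'learning_curve' function to compute training and cross-validation scores at increasing training-set sizes, then plot them to diagnose underfitting or overfitting.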

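Hyperparameter Tuning

Use grid search, random search, or a similar method to find the best model parameters and improve classifier performance.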
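A grid search might look like the sketch below. The candidate values for `C` (the inverse regularization strength of LogisticRegression) are arbitrary choices for illustration, and the data is synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic, linearly separable data (assumption).
rng = np.random.default_rng(0)
X = rng.uniform(-100, 100, size=(200, 2))
y = np.where(X[:, 0] - X[:, 1] > 0, 1, -1)

# Candidate values are illustrative, not recommendations.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    cv=5,                 # 5-fold cross-validation for each candidate
    scoring="accuracy",
)
search.fit(X, y)

print(search.best_params_, search.best_score_)
```

`RandomizedSearchCV` from the same module follows the same pattern when the grid is too large to enumerate.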

Code Example

The following example uses sklearn to run logistic regression and perceptron classification:

from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score, confusion_matrix, roc_curve, auc
from sklearn.model_selection import learning_curve
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

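# Load the training and test sets.
# Assumes comma-separated values; pass sep=r'\s+' if the files are whitespace-delimited.
train_data = pd.read_csv('train.txt', header=None)
X_train = train_data.iloc[:, :-1]
y_train = train_data.iloc[:, -1]
test_data = pd.read_csv('test.txt', header=None)
# Assumes test.txt also carries a label in its last column, which the evaluation below needs.
X_test = test_data.iloc[:, :-1]
y_test = test_data.iloc[:, -1]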

# Logistic regression
logistic_model = LogisticRegression()
logistic_model.fit(X_train, y_train)
y_pred_logistic = logistic_model.predict(X_test)

# Perceptron
perceptron_model = Perceptron()
perceptron_model.fit(X_train, y_train)
y_pred_perceptron = perceptron_model.predict(X_test)

# Accuracy
print('Logistic Regression Accuracy:', accuracy_score(y_test, y_pred_logistic))
print('Perceptron Accuracy:', accuracy_score(y_test, y_pred_perceptron))

# Confusion matrix
print('Logistic Regression Confusion Matrix:')
print(confusion_matrix(y_test, y_pred_logistic))
print('Perceptron Confusion Matrix:')
print(confusion_matrix(y_test, y_pred_perceptron))

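# ROC curve: use continuous decision_function scores, not hard predicted labels
scores_logistic = logistic_model.decision_function(X_test)
fpr_logistic, tpr_logistic, thresholds_logistic = roc_curve(y_test, scores_logistic)
roc_auc_logistic = auc(fpr_logistic, tpr_logistic)

scores_perceptron = perceptron_model.decision_function(X_test)
fpr_perceptron, tpr_perceptron, thresholds_perceptron = roc_curve(y_test, scores_perceptron)
roc_auc_perceptron = auc(fpr_perceptron, tpr_perceptron)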

plt.figure()
plt.plot(fpr_logistic, tpr_logistic, label='Logistic Regression (AUC = %0.2f)' % roc_auc_logistic)
plt.plot(fpr_perceptron, tpr_perceptron, label='Perceptron (AUC = %0.2f)' % roc_auc_perceptron)
plt.plot([0, 1], [0, 1], 'k--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curve')
plt.legend(loc='lower right')
plt.show()

# Feature weights: both linear models expose coef_ (Perceptron has no feature_importances_)
print('Logistic Regression Feature Weights:', logistic_model.coef_)
print('Perceptron Feature Weights:', perceptron_model.coef_)

# Learning curve
train_sizes, train_scores, test_scores = learning_curve(LogisticRegression(), X_train, y_train, cv=5, train_sizes=np.linspace(0.1, 1.0, 5))

plt.figure()
plt.plot(train_sizes, np.mean(train_scores, axis=1), 'o-', label='Training Score')
plt.plot(train_sizes, np.mean(test_scores, axis=1), 'o-', label='Cross-Validation Score')
plt.xlabel('Training Examples')
plt.ylabel('Score')
plt.title('Learning Curve')
plt.legend(loc='best')
plt.show()

# Hyperparameter tuning
# ...

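This example shows how to use sklearn to classify the data, plot the ROC curve and the learning curve, and inspect the feature weights. Adapt the code to the specific dataset and experimental requirements for a deeper analysis.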