Predicting Credit Card Usage Intent: Applying and Evaluating a KNN Model

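This code uses a k-nearest neighbors (KNN) model to predict credit card usage intent. KNN is an instance-based learning method: it computes the distance between a new sample and the known samples, finds the nearest neighbors, and assigns the new sample to the most common class among them. Here KNN is used for binary classification, where 0 means the customer does not use a credit card and 1 means the customer does.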
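The voting rule described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how scikit-learn implements it; the function name knn_predict and the toy data are made up for this example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=5):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours (sorted by distance only).
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the neighbours' labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data, invented for illustration: two features per sample.
toy_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
toy_labels = [0, 0, 0, 1, 1, 1]

pred_low = knn_predict(toy_points, toy_labels, (1.1, 1.0), k=3)
pred_high = knn_predict(toy_points, toy_labels, (3.0, 3.0), k=3)
print(pred_low, pred_high)
```

Because every prediction scans all training points, this brute-force version is slow on large data sets. Library implementations typically use tree-based or other indexed searches instead.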

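The code prints the ROC score, the classification report, and the confusion matrix for both the training set and the test set. The ROC score (the area under the ROC curve, computed with roc_auc_score) measures how well the classifier ranks positive samples above negative ones: 0.5 corresponds to random ranking and 1.0 to perfect ranking. The classification report gives per-class precision, recall, and F1 score. The confusion matrix counts the correct and incorrect predictions for each class.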
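To make the ranking interpretation of the ROC score concrete, here is a minimal sketch that computes it as the fraction of (positive, negative) pairs in which the positive sample gets the higher score, with ties counted as half. The helper name roc_auc_pairwise and the toy scores are assumptions for illustration; roc_auc_score does not use this quadratic loop internally, though it should give the same value.

```python
def roc_auc_pairwise(y_true, y_score):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie: half credit
    return wins / (len(pos) * len(neg))


# Toy labels and scores, invented for illustration.
toy_true = [0, 0, 1, 1]
toy_score = [0.2, 0.6, 0.4, 0.8]
toy_auc = roc_auc_pairwise(toy_true, toy_score)
print(toy_auc)
```

Ties matter for KNN in particular: with n_neighbors=5 and uniform weights, predict_proba can only return multiples of 0.2, so many pairs end up tied.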

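The output shows that KNN does reasonably well on the training set: the ROC score is 0.88, and class 0 has high precision and recall (0.85 and 0.94), while class 1 is weaker (precision 0.72, recall 0.47). On the test set the ROC score drops to 0.71. Class 0 stays strong (0.81 and 0.90), but class 1 falls further (precision 0.52, recall 0.33). Part of the gap is expected, since each training sample counts itself among its own five neighbors, but the size of the drop suggests the model generalizes poorly. KNN as configured here therefore has clear limitations for predicting credit card usage intent and needs further tuning.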

The code is as follows:

from sklearn import neighbors
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score

print('****************************************************')
print('Results for model :  KNN')

# Fit a 5-nearest-neighbour classifier on the training data.
KNN = neighbors.KNeighborsClassifier(n_neighbors=5)
KNN.fit(x_train, y_train)

# Training set: hard labels for the report, class 1 probabilities for the ROC score.
y_train_pred = KNN.predict(x_train)
y_train_prob = KNN.predict_proba(x_train)[:, 1]
print('ROC score for train is :', roc_auc_score(y_train, y_train_prob))
print('Classification report for train:\n')
print(classification_report(y_train, y_train_pred))
print(confusion_matrix(y_train, y_train_pred))

# Test set: the same metrics on held-out data.
y_test_pred = KNN.predict(x_test)
y_test_prob = KNN.predict_proba(x_test)[:, 1]
print('ROC score for test is :', roc_auc_score(y_test, y_test_prob))
print('Classification report for test :\n')
print(classification_report(y_test, y_test_pred))
print(confusion_matrix(y_test, y_test_pred))

Output:

Results for model :  KNN
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\neighbors\_classification.py:228: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode` typically preserves the axis it acts along. In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning.
  mode, _ = stats.mode(_y[neigh_ind, k], axis=1)
ROC score for train is : 0.8777148653911166
Classification report for train:

              precision    recall  f1-score   support

           0       0.85      0.94      0.90    105018
           1       0.72      0.47      0.57     32588

    accuracy                           0.83    137606
   macro avg       0.79      0.71      0.73    137606
weighted avg       0.82      0.83      0.82    137606

[[99070  5948]
 [17220 15368]]
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\neighbors\_classification.py:228: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode` typically preserves the axis it acts along. In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning.
  mode, _ = stats.mode(_y[neigh_ind, k], axis=1)
ROC score for test is : 0.7099540705989693
Classification report for test :

              precision    recall  f1-score   support

           0       0.81      0.90      0.86     44985
           1       0.52      0.33      0.40     13989

    accuracy                           0.77     58974
   macro avg       0.66      0.62      0.63     58974
weighted avg       0.74      0.77      0.75     58974

[[40628  4357]
 [ 9350  4639]]

In this example, x_train, y_train, x_test, and y_test must be replaced with your actual training and test data.
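If the original data set is not at hand, one way to try the code is with synthetic stand-in data. The sketch below is an assumption-laden placeholder, not the article's data: it uses make_classification to generate features with roughly the class balance seen in the reports (about 24% class 1) and a 70/30 split, which is what the support counts (137606 and 58974) suggest. The sample size, feature count, and random seed are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the credit card data set):
# roughly 76% class 0 and 24% class 1, ten numeric features.
X, y = make_classification(
    n_samples=2000,
    n_features=10,
    n_informative=5,
    weights=[0.76, 0.24],
    random_state=42,
)

# Stratified 70/30 split so both sets keep the same class balance.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)
print(x_train.shape, x_test.shape)
```

Scores obtained on synthetic data say nothing about the real problem. The point is only to have arrays of the right shape so the KNN code runs end to end.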

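The results show that KNN predicts class 0 (does not use a credit card) well but class 1 (uses a credit card) poorly. One likely cause is class imbalance: class 1 makes up only about 24% of the samples (32588 of 137606 in the training set), so most neighborhoods are dominated by class 0 and the model struggles to learn the characteristics of class 1.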
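The class 1 figures in the test report can be recomputed directly from the test confusion matrix in the output above, which makes the weakness easier to see. The sketch below only redoes that arithmetic; the variable names are chosen for this example.

```python
# Test-set confusion matrix copied from the output above.
# Rows are true labels, columns are predicted labels.
tn, fp = 40628, 4357   # true class 0: predicted 0, predicted 1
fn, tp = 9350, 4639    # true class 1: predicted 0, predicted 1
total = tn + fp + fn + tp

precision_1 = tp / (tp + fp)   # of the predicted 1s, how many are really 1
recall_1 = tp / (tp + fn)      # of the real 1s, how many were found
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

accuracy = (tn + tp) / total
share_1 = (fn + tp) / total            # fraction of test samples in class 1
always_0_accuracy = (tn + fp) / total  # accuracy of always predicting class 0

print(round(precision_1, 2), round(recall_1, 2), round(f1_1, 2))
print(round(accuracy, 2), round(share_1, 2), round(always_0_accuracy, 2))
```

The rounded values match the report (0.52, 0.33, 0.40, and accuracy 0.77). Always predicting class 0 would already reach about 0.76 accuracy, so accuracy alone hides how few class 1 samples the model recovers.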

To improve the KNN model, the following approaches are worth trying:

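Increase the number of class 1 samples, for example by oversampling or by generating additional class 1 samples with data augmentation.
Choose a more suitable distance metric, such as Manhattan distance or cosine distance.
Tune K to find the best number of neighbors.
Try other machine learning models, such as logistic regression or a support vector machine.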
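The distance metric and K suggestions can be explored together with a cross-validated grid search. The following is a minimal sketch under several assumptions: the data is synthetic (generated with make_classification, not the article's data set), the candidate values in param_grid are arbitrary, and the StandardScaler step is an addition of this example, since the source does not say whether the features were scaled.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with roughly 24% class 1 (an assumption for this sketch).
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5,
    weights=[0.76, 0.24], random_state=0,
)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Scale features before KNN so no single feature dominates the distance.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Candidate settings (illustrative values, not recommendations).
param_grid = {
    "knn__n_neighbors": [5, 15, 31],
    "knn__metric": ["euclidean", "manhattan"],
    "knn__weights": ["uniform", "distance"],
}

# 3-fold cross-validation on the training set, scored by ROC AUC.
search = GridSearchCV(pipeline, param_grid, scoring="roc_auc", cv=3)
search.fit(x_train, y_train)

best_params = search.best_params_
test_auc = search.score(x_test, y_test)  # ROC AUC of the refit best model
print(best_params, round(test_auc, 3))
```

Selecting K by cross-validation rather than by training-set score matters here, because KNN scores on its own training data are optimistic. On a data set as large as the one in the output, a smaller grid or a subsample may be needed to keep the search time reasonable.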

In short, KNN has some practical value for predicting credit card usage intent, but it needs to be tuned and optimized for the data at hand.