Analysis of Performance Metrics on the Iris Dataset under Different Evaluation Methods

Sure. Using the Iris dataset as an example, I'll compare the performance metrics obtained under different evaluation methods and plot the results.

First, import the required libraries and load the dataset:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import precision_score, recall_score, f1_score

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target
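
Before training, it can help to confirm what was actually loaded. The following is a minimal sketch that inspects the shape and class balance of the data; the variable name `counts` is only illustrative and not part of the original walkthrough.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

# Feature matrix: one row per flower, one column per measurement
print('Feature matrix shape:', X.shape)
print('Feature names:', iris.feature_names)

# Number of samples in each class (labels are the integers 0, 1, 2)
counts = np.bincount(y)
for name, n in zip(iris.target_names, counts):
    print(f'{name}: {n} samples')
```

The three classes are equally represented in the full dataset, although a random 80/20 split will not necessarily preserve that balance exactly.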

Next, we classify the Iris dataset with a decision tree classifier and compute the performance metrics under each evaluation method.

Hold-out split (train_test_split):

# Split the data into training and test sets with train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

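# Build the decision tree classifier (fixed random_state for reproducible results)
clf = DecisionTreeClassifier(random_state=42)

# Train the model on the training set
clf.fit(X_train, y_train)

# Predict on the test set
y_pred = clf.predict(X_test)

# Compute the metrics, macro-averaged so they match the *_macro scorers used for K-fold
precision = precision_score(y_test, y_pred, average='macro')
recall = recall_score(y_test, y_pred, average='macro')
f1 = f1_score(y_test, y_pred, average='macro')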

print('Test split results:')
print('Precision:', precision)
print('Recall:', recall)
print('F-score:', f1)
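
The choice of `average` matters when these numbers are compared with the cross-validation scorers, which are macro-averaged in this walkthrough (`precision_macro`, `recall_macro`, `f1_macro`). As a rough sketch: for single-label multiclass problems, micro-averaged precision, recall, and F-score all reduce to plain accuracy, while macro averaging takes the unweighted mean of the per-class scores. The fixed `random_state` on the classifier is an assumption added here for repeatability.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# random_state on the classifier is an assumption, added so reruns give the same tree
clf = DecisionTreeClassifier(random_state=42)
y_pred = clf.fit(X_train, y_train).predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print('Accuracy:', accuracy)

# Compare the two averaging modes side by side
results = {}
for average in ('micro', 'macro'):
    results[average] = (
        precision_score(y_test, y_pred, average=average),
        recall_score(y_test, y_pred, average=average),
        f1_score(y_test, y_pred, average=average),
    )
    print(average, 'precision/recall/F-score:', results[average])
```

With `average='micro'` the three printed values are identical to the accuracy, so a bar chart of them would show three equal bars.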

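K-fold cross-validation:

# Build the decision tree classifier (fixed random_state for reproducible results)
clf = DecisionTreeClassifier(random_state=42)

# Compute the metrics with 5-fold cross-validation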
precision_scores = cross_val_score(clf, X, y, cv=5, scoring='precision_macro')
recall_scores = cross_val_score(clf, X, y, cv=5, scoring='recall_macro')
f1_scores = cross_val_score(clf, X, y, cv=5, scoring='f1_macro')

print('K-fold cross-validation results:')
print('Precision scores:', precision_scores)
print('Recall scores:', recall_scores)
print('F-score scores:', f1_scores)
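
Calling `cross_val_score` once per metric refits the model on every fold three times. One possible alternative, sketched below, is scikit-learn's `cross_validate`, which accepts a list of scorers and evaluates them all in a single pass over the folds. The names `scoring` and `cv_results` and the fixed `random_state` are illustrative choices, not part of the original code.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# random_state is an assumption, added so reruns give the same scores
clf = DecisionTreeClassifier(random_state=42)

# Evaluate all three metrics on the same 5 folds in one call
scoring = ['precision_macro', 'recall_macro', 'f1_macro']
cv_results = cross_validate(clf, X, y, cv=5, scoring=scoring)

# Per-fold scores are stored under the key 'test_<scorer name>'
for name in scoring:
    fold_scores = cv_results[f'test_{name}']
    print(f'{name}: mean={fold_scores.mean():.3f}, std={fold_scores.std():.3f}')
```

Reporting the standard deviation next to the mean gives a sense of how much the scores vary from fold to fold.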

Next, we draw a bar chart to compare the metrics obtained with the two methods.

# Draw the bar chart
methods = ['Test split', 'K-fold']
# Use new names so the per-fold score arrays from cross-validation are not overwritten
precision_means = [precision, np.mean(precision_scores)]
recall_means = [recall, np.mean(recall_scores)]
f1_means = [f1, np.mean(f1_scores)]

x = np.arange(len(methods))
width = 0.2

fig, ax = plt.subplots()
rects1 = ax.bar(x - width, precision_means, width, label='Precision')
rects2 = ax.bar(x, recall_means, width, label='Recall')
rects3 = ax.bar(x + width, f1_means, width, label='F-score')

# Add axis labels, title, and legend
ax.set_ylabel('Scores')
ax.set_title('Performance evaluation')
ax.set_xticks(x)
ax.set_xticklabels(methods)
ax.legend()

# Add a value label above each bar
def autolabel(rects):
    for rect in rects:
        height = rect.get_height()
        ax.annotate(f'{height:.2f}', xy=(rect.get_x() + rect.get_width() / 2, height),
                    xytext=(0, 3), textcoords='offset points', ha='center', va='bottom')

autolabel(rects1)
autolabel(rects2)
autolabel(rects3)

plt.tight_layout()
plt.show()

The code above produces a grouped bar chart comparing precision, recall, and F-score under the two evaluation methods.

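From this chart, we can see how the metrics differ between the evaluation methods and draw conclusions. The chart cannot be displayed in this text-only environment, but you can copy the code into a local Python environment and run it to view the result.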
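
When drawing conclusions, keep in mind that a single hold-out split is one random draw, so its score can shift noticeably with a different `random_state`. A minimal sketch of how one might gauge that variability is to repeat the split with several seeds; the 20 repetitions and the name `holdout_f1` are arbitrary, illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

holdout_f1 = []
for seed in range(20):  # 20 repetitions is an arbitrary illustrative choice
    # A different seed gives a different 80/20 partition of the same data
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    clf = DecisionTreeClassifier(random_state=0)
    clf.fit(X_train, y_train)
    holdout_f1.append(f1_score(y_test, clf.predict(X_test), average='macro'))

holdout_f1 = np.array(holdout_f1)
print('Hold-out macro F-score over 20 splits:')
print(f'  min={holdout_f1.min():.3f}, max={holdout_f1.max():.3f}')
print(f'  mean={holdout_f1.mean():.3f}, std={holdout_f1.std():.3f}')
```

If the spread across seeds is comparable to the gap between the hold-out and K-fold bars, that gap is probably not meaningful on its own.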