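Analyzing Performance Metrics on the Iris Dataset Under Different Evaluation Methods
Using the Iris dataset as an example, this article compares performance metrics under different evaluation methods and plots the results.
First, import the required libraries and load the dataset: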
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import precision_score, recall_score, f1_score
# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target
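Iris is small and perfectly balanced, which is worth confirming before choosing how to split it. The sketch below checks the shape and class counts. As an assumption not made in the original code, it also passes `stratify=y` to `train_test_split` so that the 20% test set keeps the 1:1:1 class ratio.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X, y = iris.data, iris.target

# Iris has 150 samples, 4 features and three equally sized classes
print(X.shape)          # (150, 4)
print(np.bincount(y))   # [50 50 50]

# Assumption: stratify=y keeps the class proportions equal in train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
print(np.bincount(y_test))  # [10 10 10]
```

With 30 test samples and three classes of 50, stratification gives exactly 10 test samples per class.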
Next, we classify the Iris data with a decision tree classifier and compute precision, recall, and F-score under each evaluation method.
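For a multiclass problem, the averaging mode decides what "precision", "recall" and "F-score" mean. A test split scored with `average='micro'` and cross-validation scored with the `*_macro` scorers do not measure the same thing: for single-label multiclass data, micro-averaged precision, recall and F1 all equal plain accuracy, while macro averaging takes the unweighted mean of the per-class scores. The sketch below uses small hypothetical label arrays (not Iris predictions) to show the difference.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical 3-class labels, chosen only to illustrate the averaging modes
y_true = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 0, 0, 1, 1, 0, 2, 2, 0, 2])

metrics = (precision_score, recall_score, f1_score)
acc = accuracy_score(y_true, y_pred)
micro = [m(y_true, y_pred, average='micro') for m in metrics]
macro = [m(y_true, y_pred, average='macro') for m in metrics]

print('accuracy:', acc)            # 8 of 10 correct
print('micro P/R/F1:', micro)      # all three equal the accuracy
print('macro P/R/F1:', macro)      # unweighted per-class means, which differ here
```

Here class 0 absorbs both errors, so its precision drops to 0.6 while the recall of classes 1 and 2 drops; macro averaging exposes that asymmetry, micro averaging hides it.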
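- Hold-out split (train_test_split):
# Split the data into a training set (80%) and a test set (20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Build the decision tree classifier (fixed seed for reproducible results)
clf = DecisionTreeClassifier(random_state=42)
# Train the model on the training set
clf.fit(X_train, y_train)
# Predict on the test set
y_pred = clf.predict(X_test)
# Compute the metrics; macro averaging matches the *_macro scorers used for K-fold
# (with average='micro', precision, recall and F1 would all reduce to accuracy)
precision = precision_score(y_test, y_pred, average='macro')
recall = recall_score(y_test, y_pred, average='macro')
f1 = f1_score(y_test, y_pred, average='macro')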
print('Test split results:')
print('Precision:', precision)
print('Recall:', recall)
print('F-score:', f1)
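- K-fold cross-validation:
# Build the decision tree classifier (fixed seed for reproducible results)
clf = DecisionTreeClassifier(random_state=42)
# Compute the metrics with 5-fold cross-validation (stratified folds for classifiers)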
precision_scores = cross_val_score(clf, X, y, cv=5, scoring='precision_macro')
recall_scores = cross_val_score(clf, X, y, cv=5, scoring='recall_macro')
f1_scores = cross_val_score(clf, X, y, cv=5, scoring='f1_macro')
print('K-fold cross-validation results:')
print('Precision scores:', precision_scores)
print('Recall scores:', recall_scores)
print('F-scores:', f1_scores)
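Calling `cross_val_score` three times refits the model on every fold three times (15 fits for 5 folds). scikit-learn's `cross_validate` accepts a list of scorers and computes all of them from one set of fits. The sketch below assumes a shuffled `StratifiedKFold` with a fixed seed; with `cv=5`, `cross_val_score` uses stratified folds without shuffling, so the numbers will not match the output above exactly.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Assumption: shuffled, stratified 5-fold split with a fixed seed
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
clf = DecisionTreeClassifier(random_state=42)

# One pass over the folds yields all three macro-averaged metrics
scoring = ['precision_macro', 'recall_macro', 'f1_macro']
results = cross_validate(clf, X, y, cv=cv, scoring=scoring)

for name in scoring:
    scores = results['test_' + name]  # one score per fold
    print(f'{name}: mean={scores.mean():.3f}, std={scores.std():.3f}')
```

Reporting the standard deviation next to the mean shows how much the fold scores vary, which a single hold-out number cannot show.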
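Next, we draw a grouped bar chart to compare the metrics of the two methods.
# Draw the grouped bar chart
methods = ['Test split', 'K-fold']
# One value per method: the hold-out score and the mean of the 5 fold scores
# (new names, so the per-fold arrays above are not overwritten)
precision_vals = [precision, np.mean(precision_scores)]
recall_vals = [recall, np.mean(recall_scores)]
f1_vals = [f1, np.mean(f1_scores)]
x = np.arange(len(methods))
width = 0.2  # bar width; each group's bars sit at x - width, x, x + width
fig, ax = plt.subplots()
rects1 = ax.bar(x - width, precision_vals, width, label='Precision')
rects2 = ax.bar(x, recall_vals, width, label='Recall')
rects3 = ax.bar(x + width, f1_vals, width, label='F-score')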
# Add the axis label, title, tick labels, and legend
ax.set_ylabel('Scores')
ax.set_title('Performance evaluation')
ax.set_xticks(x)
ax.set_xticklabels(methods)
ax.legend()
# Annotate each bar with its value
def autolabel(rects):
    """Write the height of each bar just above it."""
    for rect in rects:
        height = rect.get_height()
        ax.annotate(f'{height:.2f}', xy=(rect.get_x() + rect.get_width() / 2, height),
                    xytext=(0, 3), textcoords='offset points', ha='center', va='bottom')
autolabel(rects1)
autolabel(rects2)
autolabel(rects3)
plt.tight_layout()
plt.show()
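The code above produces a grouped bar chart comparing precision, recall, and F-score under the two evaluation methods.
The chart shows how the scores from a single hold-out split compare with the averages over the five cross-validation folds. It cannot be displayed here; copy the code into a local Python environment and run it to view the chart.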
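Since Matplotlib 3.4, `Axes.bar_label` can replace the hand-written `autolabel` helper. The sketch below wraps the whole chart in a hypothetical function, `plot_grouped_scores` (not part of the original code), which computes the bar offsets from the number of metrics instead of hard-coding `x - width`, `x`, `x + width`.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_grouped_scores(methods, scores, width=0.2):
    """Draw one group of bars per evaluation method, one bar per metric.

    methods: list of method names (one x-axis group each)
    scores:  dict mapping metric name -> list of values, one per method
    """
    x = np.arange(len(methods))
    fig, ax = plt.subplots()
    n = len(scores)
    for i, (metric, values) in enumerate(scores.items()):
        # Centre the n bars of each group on its tick; n = 3 gives -width, 0, +width
        offset = (i - (n - 1) / 2) * width
        bars = ax.bar(x + offset, values, width, label=metric)
        # bar_label (Matplotlib >= 3.4) writes each bar's height above it
        ax.bar_label(bars, fmt='%.2f', padding=3)
    ax.set_ylabel('Scores')
    ax.set_title('Performance evaluation')
    ax.set_xticks(x)
    ax.set_xticklabels(methods)
    ax.legend()
    fig.tight_layout()
    return fig, ax
```

To reproduce the original chart, pass the method names and a dict mapping 'Precision', 'Recall' and 'F-score' to the per-method value lists built for the bar chart, then call `plt.show()`.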
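When reading the chart, keep in mind that the hold-out bar comes from only 30 test samples, so a single misclassification moves accuracy by 1/30, about 3.3 percentage points. The sketch below repeats the 80/20 split with 20 different seeds (an arbitrary choice) and reports the spread of the macro F1 score, which indicates how far one hold-out number can drift from the cross-validation mean.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Repeat the 80/20 hold-out split with 20 seeds (an arbitrary number of repeats)
holdout_f1 = []
for seed in range(20):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed)
    clf = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
    holdout_f1.append(f1_score(y_te, clf.predict(X_te), average='macro'))

holdout_f1 = np.array(holdout_f1)
print(f'hold-out macro F1 over 20 splits: min={holdout_f1.min():.3f}, '
      f'max={holdout_f1.max():.3f}, std={holdout_f1.std():.3f}')
```

If the min-to-max range is wide relative to the gap between the two methods in the chart, that gap says little about which method gives the better estimate.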
Original article: http://www.cveoy.top/t/topic/TOa. Copyright belongs to the author; do not reproduce or scrape.