Random Forest in Python, Hands-On: Choosing the Best Estimator and Comparing Results

This article walks through a Python random forest example on the Iris dataset: building a model, choosing a best estimator, comparing results, and saving the model.

import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the Iris dataset
iris = load_iris()

# Split the data into training and test sets (80% / 20%)
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)

# Create the baseline random forest model
rf = RandomForestClassifier(n_estimators=100, random_state=42)

# Train the baseline model
rf.fit(X_train, y_train)

# Predict on the test set
y_pred = rf.predict(X_test)

# Compute the accuracy
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy of the random forest model:', accuracy)

# Create the "best" model (hyperparameters chosen by hand: more trees, shallower depth)
best_rf = RandomForestClassifier(n_estimators=500, max_depth=3, random_state=42)

# Train the best model
best_rf.fit(X_train, y_train)

# Predict on the test set
y_pred_best = best_rf.predict(X_test)

# Compute the accuracy
accuracy_best = accuracy_score(y_test, y_pred_best)
print('Accuracy of the best random forest model:', accuracy_best)

# Save the best model to disk
joblib.dump(best_rf, 'best_random_forest_model.pkl')
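
The script above sets the hyperparameters of its "best" model by hand. One common way to choose them from data instead is scikit-learn's GridSearchCV, which exposes the winning model as best_estimator_. The following is a minimal sketch under stated assumptions: the candidate values in param_grid and the names search, searched_rf, and searched_accuracy are illustrative and do not come from the original script.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Hypothetical search space: these candidate values are illustrative only
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [2, 3, None],
}

# 5-fold cross-validation on the training set only; the test set stays untouched
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# With the default refit=True, best_estimator_ is refit on the whole training set
searched_rf = search.best_estimator_
searched_accuracy = accuracy_score(y_test, searched_rf.predict(X_test))

print("Selected parameters:", search.best_params_)
print("Mean cross-validated accuracy:", search.best_score_)
print("Test accuracy:", searched_accuracy)
```

Because the search only sees the training data, the test accuracy printed at the end remains an independent check on the selected model.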

Code walkthrough

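Import the required libraries:
- sklearn.ensemble: provides the random forest classifier (RandomForestClassifier).
- sklearn.metrics: provides the accuracy metric (accuracy_score).
- sklearn.model_selection: provides the dataset splitting utility (train_test_split).
- sklearn.datasets: provides the Iris dataset loader (load_iris).
- joblib: used to save the model.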
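Load the dataset:
- Call load_iris() to load the Iris dataset.
Split the dataset:
- Call train_test_split() to split the data into training and test sets at an 8:2 ratio, with random_state=42 so that the split is reproducible.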
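Create the random forest model:
- Instantiate RandomForestClassifier with n_estimators=100 and random_state=42.
Train the model:
- Call fit() on the training set.
Predict on the test set:
- Call predict() to obtain predictions for the test set.
Evaluate the model:
- Call accuracy_score() to compute the model's accuracy on the test set.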
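Create the best model:
- Adjust the model parameters, here n_estimators=500 and max_depth=3, to create the best model. These values are set by hand rather than selected by a search, so whether this model actually beats the first one is decided by the accuracy comparison.
Train the best model:
- Call fit() on the training set.
Predict on the test set:
- Call predict() to obtain predictions for the test set.
Evaluate the best model:
- Call accuracy_score() to compute the best model's accuracy, and compare it with the accuracy of the first model.
Save the best model:
- Call joblib.dump() to save the best model to the file best_random_forest_model.pkl.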
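
A saved model is only useful if it can be loaded again. The sketch below shows the round trip with joblib.load and checks that the reloaded model reproduces the original predictions. It is an illustration rather than part of the original script: the temporary directory and the names model_path, loaded_rf, and same_predictions are assumptions made so that the example leaves no file behind.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Same hand-picked configuration as the "best" model in the article
best_rf = RandomForestClassifier(n_estimators=500, max_depth=3, random_state=42)
best_rf.fit(X_train, y_train)

# Assumption: a temporary directory stands in for the working directory
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "best_random_forest_model.pkl")
    joblib.dump(best_rf, model_path)    # serialize the fitted model
    loaded_rf = joblib.load(model_path) # restore it as a new object

# The restored model should give exactly the same predictions
same_predictions = np.array_equal(
    best_rf.predict(X_test), loaded_rf.predict(X_test)
)
print("Reloaded model reproduces predictions:", same_predictions)
```

Files written by joblib are pickle-based, so they should only be loaded from trusted sources, preferably with the same scikit-learn version that saved them.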

Summary

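This article walked through a random forest implementation in Python: training a baseline model, building a tuned "best" estimator, comparing the two by test accuracy, and saving the result. You can adjust the parameters in the code to suit your needs and apply the same workflow to other machine learning tasks.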
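
When adjusting parameters, keep in mind that the 20% test split of Iris holds only 30 samples, so a single accuracy number can hide or exaggerate the difference between two configurations. One option is to compare them with k-fold cross-validation. The sketch below assumes 5 folds and uses the illustrative names candidates and cv_results, none of which appear in the original code.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

iris = load_iris()

# The two configurations compared in the article
candidates = {
    "baseline (n_estimators=100)": RandomForestClassifier(
        n_estimators=100, random_state=42
    ),
    "tuned (n_estimators=500, max_depth=3)": RandomForestClassifier(
        n_estimators=500, max_depth=3, random_state=42
    ),
}

# Assumption: 5-fold cross-validation over the full dataset
cv_results = {}
for name, model in candidates.items():
    scores = cross_val_score(model, iris.data, iris.target, cv=5, scoring="accuracy")
    cv_results[name] = (scores.mean(), scores.std())
    print(f"{name}: mean accuracy = {scores.mean():.3f}, std = {scores.std():.3f}")
```

Reporting the standard deviation next to the mean makes it easier to judge whether a gap between the two models is larger than the fold-to-fold variation.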

I hope you find this article helpful.