The effect of the random_state parameter in Python code: a linear regression training example

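The only difference between these two snippets is the 'random_state' argument passed to 'train_test_split': 0 in one, 10 in the other. This parameter seeds the random shuffle that is applied before the data is split, so the same value yields the same train/test split on every run. Both snippets therefore do the same thing: read the data, clean it, visualize it, fit a linear regression model, and evaluate its performance. The only effect is that different seeds put different rows into the training and test sets, so the scores and predictions may differ slightly.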

First snippet:

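import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Load the data and preview the first 10 rows
df_ads = pd.read_csv('易速鲜花微信软文.csv')
df_ads.head(10)

# Count missing values per column, then drop rows that contain any
df_ads.isna().sum()
df_ads = df_ads.dropna()

# Scatter plot: likes (点赞数) vs. page views (浏览量)
plt.plot(df_ads['点赞数'], df_ads['浏览量'], 'r.', label='Training data')
plt.xlabel('点赞数')
plt.ylabel('浏览量')
plt.legend()
plt.show()

# Box plot: page views grouped by popularity index (热度指数)
data = pd.concat([df_ads['浏览量'], df_ads['热度指数']], axis=1)
ax = sns.boxplot(x='热度指数', y='浏览量', data=data)
ax.set_ylim(0, 800000)

# Features: every column except 浏览量; target: 浏览量
X = df_ads.drop(['浏览量'], axis=1)
y = df_ads['浏览量']
X.head()
y.head()

# Hold out 20% of the rows as the test set; random_state=0 seeds the shuffle
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Fit the linear regression model and predict on the test set
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Put the true and predicted page views next to the test features
df_ads_pred = X_test.copy()
df_ads_pred['浏览量真值'] = y_test
df_ads_pred['浏览量预测值'] = y_pred
df_ads_pred

# R^2 scores on the test set and the training set
print('Linear regression test set score:', model.score(X_test, y_test))
print('Linear regression training set score:', model.score(X_train, y_train))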

Second snippet:

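import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Load the data and preview the first 10 rows
df_ads = pd.read_csv('易速鲜花微信软文.csv')
df_ads.head(10)

# Count missing values per column, then drop rows that contain any
df_ads.isna().sum()
df_ads = df_ads.dropna()

# Scatter plot: likes (点赞数) vs. page views (浏览量)
plt.plot(df_ads['点赞数'], df_ads['浏览量'], 'r.', label='Training data')
plt.xlabel('点赞数')
plt.ylabel('浏览量')
plt.legend()
plt.show()

# Box plot: page views grouped by popularity index (热度指数)
data = pd.concat([df_ads['浏览量'], df_ads['热度指数']], axis=1)
ax = sns.boxplot(x='热度指数', y='浏览量', data=data)
ax.set_ylim(0, 800000)

# Features: every column except 浏览量; target: 浏览量
X = df_ads.drop(['浏览量'], axis=1)
y = df_ads['浏览量']
X.head()
y.head()

# Hold out 20% of the rows as the test set; random_state=10 seeds the shuffle
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=10)

# Fit the linear regression model and predict on the test set
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Put the true and predicted page views next to the test features
df_ads_pred = X_test.copy()
df_ads_pred['浏览量真值'] = y_test
df_ads_pred['浏览量预测值'] = y_pred
df_ads_pred

# R^2 scores on the test set and the training set
print('Linear regression test set score:', model.score(X_test, y_test))
print('Linear regression training set score:', model.score(X_train, y_train))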

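The purpose of 'random_state' is to keep the train/test split identical across runs. In practice, it is usually set to a fixed integer so that results are reproducible. If it is omitted (the default is None), 'train_test_split' draws a new random split on each run, so the scores change from run to run.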

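Note that 'random_state' here only affects how the data is split; it does not change how the model itself is trained. LinearRegression has no randomness of its own, so given the same training rows it always produces the same fit. A different split, however, feeds the model different training rows, so the fitted coefficients and the scores depend on the split as well as on the characteristics of the data.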
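
To make this distinction concrete, here is a rough sketch on synthetic data. The data, the noise level, and the helper name fit_and_score are assumptions made for illustration, not part of the original snippets. It fits the same model under seeds 0 and 10 and compares the results.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data: a noisy linear relationship with 200 samples
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X.ravel() + 5.0 + rng.normal(0, 2.0, size=200)


def fit_and_score(seed):
    """Split with the given seed, fit, and return (coefficients, test R^2)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = LinearRegression().fit(X_train, y_train)
    return model.coef_, model.score(X_test, y_test)


# Same seed -> same training rows -> same fitted model and same score
coef_a, score_a = fit_and_score(0)
coef_b, score_b = fit_and_score(0)
same_seed_same_model = bool(np.allclose(coef_a, coef_b) and np.isclose(score_a, score_b))
print("seed 0 fitted twice gives identical results:", same_seed_same_model)

# Different seeds -> different training and test rows -> slightly different numbers
for seed in (0, 10):
    coef, score = fit_and_score(seed)
    print(f"random_state={seed}: coef={coef[0]:.4f}, test R^2={score:.4f}")
```

If the scores differ widely between seeds on real data, that usually points to a small or uneven dataset rather than to a problem with the model.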

I hope this explanation helps you understand what the 'random_state' parameter does.