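Linear Regression Time Series Forecasting Lab - Using Kaggle and Corporación Favorita Data
In this lab we will use the online service Kaggle. You need to submit your file and your name before 7 p.m. today.
The purpose of this lab is to explore the use of linear regression with time series. The scope of the lab is given at this link: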
https://www.kaggle.com/code/ryanholbrook/linear-regression-with-time-series
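For orientation, here is a minimal sketch of the two feature types that the setup code further down builds for book_sales (Time and Lag_1). It runs on a synthetic series rather than the Kaggle files, so the data, the names sales, trend_model and lag_model, and the chosen trend of 2.0 per day are all illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic daily series (hypothetical data, not the Kaggle files):
# a linear trend of 2.0 per day plus Gaussian noise.
rng = np.random.default_rng(0)
dates = pd.period_range("2020-01-01", periods=60, freq="D")
sales = pd.Series(
    100 + 2.0 * np.arange(60) + rng.normal(0, 5, 60),
    index=dates,
    name="sales",
)

df = sales.to_frame()
df["Time"] = np.arange(len(df))     # time-step feature: 0, 1, 2, ...
df["Lag_1"] = df["sales"].shift(1)  # lag feature: the previous day's value

# Time-step model: sales regressed on Time.
trend_model = LinearRegression().fit(df[["Time"]], df["sales"])

# Lag model: sales regressed on Lag_1. The first row has no lag, so drop it.
lagged = df.dropna(subset=["Lag_1"])
lag_model = LinearRegression().fit(lagged[["Lag_1"]], lagged["sales"])

print("trend slope:", trend_model.coef_[0])
print("lag coefficient:", lag_model.coef_[0])
```

The time-step model captures dependence on time itself (trend), while the lag model captures dependence on the series' own recent values.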
You are then asked to implement sales forecasting across nearly 1,800 product categories for Corporación Favorita (a large grocery retailer based in Ecuador):
https://www.kaggle.com/code/scratchpad/notebook240dd34254/ed
Setup feedback system
from learntools.core import binder
binder.bind(globals())
from learntools.time_series.ex1 import *
Setup notebook
from pathlib import Path
from learntools.time_series.style import *  # plot style settings

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from sklearn.linear_model import LinearRegression

data_dir = Path('../input/ts-course-data/')
comp_dir = Path('../input/store-sales-time-series-forecasting')

book_sales = pd.read_csv(
    data_dir / 'book_sales.csv',
    index_col='Date',
    parse_dates=['Date'],
).drop('Paperback', axis=1)
book_sales['Time'] = np.arange(len(book_sales.index))
book_sales['Lag_1'] = book_sales['Hardcover'].shift(1)
book_sales = book_sales.reindex(columns=['Hardcover', 'Time', 'Lag_1'])
ar = pd.read_csv(data_dir / 'ar.csv')
dtype = {
    'store_nbr': 'category',
    'family': 'category',
    'sales': 'float32',
    'onpromotion': 'uint64',
}
store_sales = pd.read_csv(
    comp_dir / 'train.csv',
    dtype=dtype,
    parse_dates=['date'],
    infer_datetime_format=True,
)
store_sales = store_sales.set_index('date').to_period('D')
store_sales = store_sales.set_index(['store_nbr', 'family'], append=True)
average_sales = store_sales.groupby('date').mean()['sales']

Provide the corresponding code. The lab code is as follows:
# Import the required libraries
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
# Read the data
data_dir = '../input/store-sales-time-series-forecasting/'
train = pd.read_csv(data_dir + 'train.csv')
# Preprocess: derive calendar features from the date column
train['date'] = pd.to_datetime(train['date'])
train['year'] = train['date'].dt.year
train['month'] = train['date'].dt.month
train['day'] = train['date'].dt.day
train['day_of_week'] = train['date'].dt.dayofweek
train['weekend'] = (train['day_of_week'] >= 5).astype(int)
# Build the feature matrix and the target variable
X = train[['year', 'month', 'day', 'day_of_week', 'weekend']]
y = train['sales']
# Build and fit the linear regression model
model = LinearRegression()
model.fit(X, y)
# Predict sales (on the same rows that were used for fitting)
predictions = model.predict(X)
# Print the predictions
print(predictions)
Note that this is only example code and may need to be modified and adjusted to suit your actual situation.
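One adjustment worth considering: the example above fits one row per store, family and day, and it predicts on the same rows it was trained on. The sketch below shows one possible alternative, not the lab's reference solution. It averages sales per day, uses a time-step feature plus a weekend flag (a swapped-in feature set, not the one above), and scores the model on the last 30 days, which are held out in chronological order. The data is a synthetic stand-in shaped like train.csv, and the store numbers, family names, effect sizes and the 30-day cutoff are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Hypothetical stand-in for train.csv: 2 stores x 2 families x 120 days,
# with an upward trend and a weekend uplift baked in.
rng = np.random.default_rng(42)
dates = pd.date_range("2017-01-01", periods=120, freq="D")
rows = [
    {
        "date": d,
        "store_nbr": s,
        "family": f,
        "sales": 50 + 0.5 * i + 20 * (d.dayofweek >= 5) + rng.normal(0, 3),
    }
    for i, d in enumerate(dates)
    for s in (1, 2)
    for f in ("GROCERY I", "BEVERAGES")
]
train = pd.DataFrame(rows)

# Collapse to one value per day: the average across stores and families.
daily = train.groupby("date")["sales"].mean().to_frame()
daily["time"] = np.arange(len(daily))                        # time-step feature
daily["weekend"] = (daily.index.dayofweek >= 5).astype(int)  # weekend flag

# Chronological split: hold out the final 30 days (assumed cutoff).
cutoff = len(daily) - 30
fit_part, holdout = daily.iloc[:cutoff], daily.iloc[cutoff:]

features = ["time", "weekend"]
model = LinearRegression().fit(fit_part[features], fit_part["sales"])

# Score on days the model has not seen.
holdout_pred = model.predict(holdout[features])
mae = mean_absolute_error(holdout["sales"], holdout_pred)
print(f"hold-out MAE: {mae:.2f}")
```

Splitting by time rather than at random keeps future observations out of the training set, so the hold-out error is closer to what a real forecast would face.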
Original source: https://www.cveoy.top/t/topic/bM1o Copyright belongs to the author. Do not repost or scrape.