Time Series Forecasting with Linear Regression: A Kaggle Lab Guide
In this lab we will use the online service Kaggle. You must submit your file, together with your name, before 7 p.m. today.
The purpose of this lab is to explore the use of linear regression with time series. The scope of the lab is given at this link: https://www.kaggle.com/code/ryanholbrook/linear-regression-with-time-series
You are then asked to implement sales forecasting across nearly 1,800 product categories for Corporación Favorita, a large grocery retailer based in Ecuador: https://www.kaggle.com/code/scratchpad/notebook240dd34254/ed
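As a rough illustration of the two feature types the lab revolves around, a time-step feature and a lag feature, here is a minimal sketch on a synthetic series. The data, the `toy` frame, and the model names are made up for this example and are not part of the Kaggle notebook.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic daily series (made up for illustration): linear trend plus noise.
rng = np.random.default_rng(0)
dates = pd.period_range("2020-01-01", periods=60, freq="D")
toy = pd.DataFrame(
    {"sales": 10 + 0.5 * np.arange(60) + rng.normal(0, 1, 60)},
    index=dates,
)

# Time-step feature: a counter 0, 1, 2, ... over the ordered index.
toy["Time"] = np.arange(len(toy.index))
# Lag feature: the previous observation; the first row has no predecessor.
toy["Lag_1"] = toy["sales"].shift(1)

# Regress sales on the time step to estimate the trend.
trend_model = LinearRegression().fit(toy[["Time"]], toy["sales"])
slope = trend_model.coef_[0]

# Regress sales on the lag, using only the rows where the lag exists.
lagged = toy.dropna()
lag_model = LinearRegression().fit(lagged[["Lag_1"]], lagged["sales"])

print(f"estimated trend per day: {slope:.3f}")
```

The time-step model captures dependence on time itself (the trend), while the lag model captures dependence on the series' own recent past.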
# Setup feedback system
from learntools.core import binder
binder.bind(globals())
from learntools.time_series.ex1 import *

# Setup notebook
from pathlib import Path
from learntools.time_series.style import *  # plot style settings

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from sklearn.linear_model import LinearRegression

data_dir = Path('../input/ts-course-data/')
comp_dir = Path('../input/store-sales-time-series-forecasting')

book_sales = pd.read_csv(
    data_dir / 'book_sales.csv',
    index_col='Date',
    parse_dates=['Date'],
).drop('Paperback', axis=1)
book_sales['Time'] = np.arange(len(book_sales.index))
book_sales['Lag_1'] = book_sales['Hardcover'].shift(1)
book_sales = book_sales.reindex(columns=['Hardcover', 'Time', 'Lag_1'])

ar = pd.read_csv(data_dir / 'ar.csv')

dtype = {
    'store_nbr': 'category',
    'family': 'category',
    'sales': 'float32',
    'onpromotion': 'uint64',
}
store_sales = pd.read_csv(
    comp_dir / 'train.csv',
    dtype=dtype,
    parse_dates=['date'],
    infer_datetime_format=True,
)
store_sales = store_sales.set_index('date').to_period('D')
store_sales = store_sales.set_index(['store_nbr', 'family'], append=True)
average_sales = store_sales.groupby('date').mean()['sales']

Provide the corresponding code:

import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
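
# Read the data (train.csv is assumed to be in the working directory;
# adjust the path if the file lives elsewhere)
store_sales = pd.read_csv('train.csv')
store_sales['date'] = pd.to_datetime(store_sales['date'])
store_sales = store_sales.set_index('date')

# Compute average sales by date and product family
average_sales = store_sales.groupby(['date', 'family'])['sales'].mean().reset_index()

# Create the time feature: days elapsed since the first date, so that every
# family gets the same value on the same day (a plain row counter would mix
# the date index with the family index)
average_sales['Time'] = (average_sales['date'] - average_sales['date'].min()).dt.days

# Create the lag feature: the previous observation's sales within each family
average_sales['Lag_1'] = average_sales.groupby('family')['sales'].shift(1)

# Drop rows with missing values (the first date of each family has no lag)
average_sales = average_sales.dropna()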
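
# Create the linear regression model
model = LinearRegression()

# Fit the model
X = average_sales[['Time', 'Lag_1']]
y = average_sales['sales']
model.fit(X, y)

# Predict sales (in-sample fitted values for the training rows)
predictions = model.predict(X)

# Print the predictions
print(predictions)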
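
The script above predicts on the same rows it was fitted on, so its output says little about forecasting accuracy. One common way to check this is a chronological hold-out: fit on the earlier dates and score on the most recent ones. The sketch below shows the idea on synthetic data; the `panel` frame, the family levels, and the 20-day hold-out window are assumptions made for illustration, not values taken from the competition data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic stand-in for per-family daily average sales (values are made up).
rng = np.random.default_rng(42)
dates = pd.date_range("2017-01-01", periods=100, freq="D")
frames = []
for family, level in [("GROCERY", 50.0), ("DAIRY", 20.0)]:
    sales = level + 0.2 * np.arange(100) + rng.normal(0, 2, 100)
    frames.append(pd.DataFrame({"date": dates, "family": family, "sales": sales}))
panel = pd.concat(frames, ignore_index=True).sort_values(["date", "family"])

# A time feature (days since the first date) and a per-family lag feature.
panel["Time"] = (panel["date"] - panel["date"].min()).dt.days
panel["Lag_1"] = panel.groupby("family")["sales"].shift(1)
panel = panel.dropna()

# Chronological split: train on the earlier dates, test on the last 20 days.
cutoff = panel["date"].max() - pd.Timedelta(days=20)
train = panel[panel["date"] <= cutoff]
test = panel[panel["date"] > cutoff]

features = ["Time", "Lag_1"]
holdout_model = LinearRegression().fit(train[features], train["sales"])
test_pred = holdout_model.predict(test[features])

# Root mean squared error on dates the model never saw during fitting.
rmse = np.sqrt(mean_squared_error(test["sales"], test_pred))
print(f"hold-out RMSE: {rmse:.2f}")
```

This is a one-step-ahead evaluation: each test row uses the actual sales of the previous day as its lag. Forecasting several days ahead would need lags that are still known at prediction time, or predictions fed back in as lags.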