In this lab, we will use the online service Kaggle. You need to submit your file and your name before 7 p.m. today.

The goal of this lab is to explore linear regression with time series. The scope of the lab is given at this link: https://www.kaggle.com/code/ryanholbrook/linear-regression-with-time-series
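The linked lesson builds two kinds of features for a linear model. The first is a time-step feature, which is the index of each observation and models trend. The second is a lag feature, which is the target shifted by one step and models serial dependence. The sketch below applies both to a small synthetic series, using only NumPy, pandas and scikit-learn. The series, its trend of 2 per day, and the column name `y` are illustrative assumptions, not the lab's data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical daily series: linear trend 2*t + 10 plus small Gaussian noise
rng = np.random.default_rng(0)
n = 30
df = pd.DataFrame(
    {'y': 2.0 * np.arange(n) + 10.0 + rng.normal(0, 0.5, n)},
    index=pd.date_range('2017-01-01', periods=n, freq='D'),
)

# Time-step feature: 0, 1, 2, ... (captures time dependence such as a trend)
df['Time'] = np.arange(len(df.index))
# Lag feature: the previous observation (captures serial dependence)
df['Lag_1'] = df['y'].shift(1)

# Model 1: y ~ Time
time_model = LinearRegression().fit(df[['Time']], df['y'])
print('slope on Time:', round(time_model.coef_[0], 2))

# Model 2: y ~ Lag_1 (the first row has no previous value, so drop it)
lagged = df.dropna()
lag_model = LinearRegression().fit(lagged[['Lag_1']], lagged['y'])
print('coefficient on Lag_1:', round(lag_model.coef_[0], 2))
```

The slope on `Time` should come out close to the trend of 2 that was built into the series. The coefficient on `Lag_1` should be close to 1, because each value is roughly the previous value plus the trend step.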

Then, you are asked to forecast sales for nearly 1,800 product categories for Corporación Favorita, a large grocery retailer based in Ecuador: https://www.kaggle.com/code/scratchpad/notebook240dd34254/ed
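In the Store Sales data, each of the roughly 1,800 series is one combination of `store_nbr` and `family`. A lag or time-step feature therefore has to be computed within each such series rather than over the whole table. Otherwise, the first day of one series would borrow the last value of another. The sketch below uses a tiny hypothetical panel of two stores, two families and three days in place of `train.csv`. The sales values are placeholders.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature version of train.csv: 2 stores x 2 families x 3 days
dates = pd.date_range('2017-08-01', periods=3, freq='D')
rows = [
    (d, store, fam)
    for store in [1, 2]
    for fam in ['BEVERAGES', 'DAIRY']
    for d in dates
]
panel = pd.DataFrame(rows, columns=['date', 'store_nbr', 'family'])
panel['sales'] = np.arange(len(panel), dtype=float)  # placeholder values 0..11

# Make sure each series is in date order before shifting or counting
panel = panel.sort_values(['store_nbr', 'family', 'date'])

keys = ['store_nbr', 'family']
# Lag within each (store_nbr, family) series, not across the whole table
panel['Lag_1'] = panel.groupby(keys)['sales'].shift(1)
# Time step within each series: 0, 1, 2, ...
panel['Time'] = panel.groupby(keys).cumcount()

print(panel.groupby(keys).ngroups)  # → 4 separate series
print(panel.head(4))
```

On the real data, the same grouping keys would produce one series per store and family pair.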

```python
# Setup feedback system
from learntools.core import binder
binder.bind(globals())
from learntools.time_series.ex1 import *

# Setup notebook
from pathlib import Path
from learntools.time_series.style import *  # plot style settings

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from sklearn.linear_model import LinearRegression

data_dir = Path('../input/ts-course-data/')
comp_dir = Path('../input/store-sales-time-series-forecasting')

book_sales = pd.read_csv(
    data_dir / 'book_sales.csv',
    index_col='Date',
    parse_dates=['Date'],
).drop('Paperback', axis=1)
book_sales['Time'] = np.arange(len(book_sales.index))
book_sales['Lag_1'] = book_sales['Hardcover'].shift(1)
book_sales = book_sales.reindex(columns=['Hardcover', 'Time', 'Lag_1'])

ar = pd.read_csv(data_dir / 'ar.csv')

dtype = {
    'store_nbr': 'category',
    'family': 'category',
    'sales': 'float32',
    'onpromotion': 'uint64',
}
store_sales = pd.read_csv(
    comp_dir / 'train.csv',
    dtype=dtype,
    parse_dates=['date'],
    infer_datetime_format=True,
)
store_sales = store_sales.set_index('date').to_period('D')
store_sales = store_sales.set_index(['store_nbr', 'family'], append=True)
average_sales = store_sales.groupby('date').mean()['sales']
```

The corresponding code is as follows:

```python
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression

# Read the data (assumes train.csv is in the working directory;
# in the Kaggle notebook it is at comp_dir / 'train.csv')
store_sales = pd.read_csv('train.csv')
store_sales['date'] = pd.to_datetime(store_sales['date'])
store_sales = store_sales.set_index('date')

# Compute the average sales per date and product family (averaged over stores).
# groupby sorts by date, then family, so each family's rows stay in date order.
average_sales = store_sales.groupby(['date', 'family'])['sales'].mean().reset_index()

# Create the time-step feature: one step per observed day within each family,
# so all families share the same Time value on a given date
average_sales['Time'] = average_sales.groupby('family').cumcount()

# Create the lag feature: the same family's sales on the previous observed day
average_sales['Lag_1'] = average_sales.groupby('family')['sales'].shift(1)

# Drop rows with missing values (the first day of each family has no Lag_1)
average_sales = average_sales.dropna()

# Create the linear regression model
model = LinearRegression()

# Fit the model
X = average_sales[['Time', 'Lag_1']]
y = average_sales['sales']
model.fit(X, y)

# Predict sales for the training rows (in-sample fitted values, not future dates)
predictions = model.predict(X)

# Print the predictions
print(predictions)
```
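Calling `model.predict(X)` on the training rows only reproduces fitted values, so it says little about forecasting accuracy. The sketch below shows a minimal time-based holdout. It assumes a frame shaped like `average_sales`, with `date`, `family`, `sales`, `Time` and `Lag_1` columns, and builds a small synthetic frame to stand in for it. The two family levels, the noise, and the 7-day validation window are arbitrary assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Synthetic stand-in for average_sales: 2 families x 60 days
rng = np.random.default_rng(42)
dates = pd.date_range('2017-06-01', periods=60, freq='D')
frames = []
for fam, level in [('DAIRY', 100.0), ('BEVERAGES', 300.0)]:
    sales = level + 0.5 * np.arange(60) + rng.normal(0, 5, 60)
    frames.append(pd.DataFrame({'date': dates, 'family': fam, 'sales': sales}))
average_sales = pd.concat(frames, ignore_index=True)

# Same features as in the lab code, computed within each family
average_sales['Time'] = average_sales.groupby('family').cumcount()
average_sales['Lag_1'] = average_sales.groupby('family')['sales'].shift(1)
average_sales = average_sales.dropna()

# Hold out the last 7 days; train only on earlier dates
cutoff = average_sales['date'].max() - pd.Timedelta(days=7)
train = average_sales[average_sales['date'] <= cutoff]
valid = average_sales[average_sales['date'] > cutoff]

features = ['Time', 'Lag_1']
model = LinearRegression().fit(train[features], train['sales'])
valid_pred = model.predict(valid[features])

print('validation MAE:', mean_absolute_error(valid['sales'], valid_pred))
```

In the validation rows, `Lag_1` still uses the observed previous day, so this measures one-step-ahead accuracy. Forecasting several days ahead would require feeding each prediction back in as the next day's lag.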

Linear Regression Time Series Forecasting: A Kaggle Lab Guide

Original source: https://www.cveoy.top/t/topic/bM1G. Copyright belongs to the author. Please do not repost or scrape.
