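train.csv - training set file: data for one stock at 5-minute intervals on every trading day over roughly the last 5 months, from 9:35 to 15:00, i.e. 48 data points per day. Each row includes the date, the time, and six features (open, high, low, close, volume, and turnover amount); dates are replaced by numbers. test.csv - test set file: at 5-minute intervals, 24 time points form one period. All data for 21 of the time points is given, and participants must predict the closing price at the 3 blank time points. Note: the given test data is not in chronological order; it was obtained by random sampling. Using Python's transfo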
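The figure of 48 bars per day can be checked with a quick sketch. It assumes a midday break, with sessions running 09:35 to 11:30 and 13:05 to 15:00; the break is not stated in the task description, and the helper name `session_bars` is hypothetical.

```python
from datetime import datetime, timedelta


def session_bars(start, end, step_minutes=5):
    """Return the bar timestamps (HH:MM) from start to end, inclusive."""
    fmt = "%H:%M"
    t = datetime.strptime(start, fmt)
    stop = datetime.strptime(end, fmt)
    bars = []
    while t <= stop:
        bars.append(t.strftime(fmt))
        t += timedelta(minutes=step_minutes)
    return bars


# Assumed sessions (not given in the task description)
morning = session_bars("09:35", "11:30")
afternoon = session_bars("13:05", "15:00")
bars_per_day = len(morning) + len(afternoon)
print(bars_per_day)  # 48
```

Under these assumed sessions, each half-day holds 24 bars, which would match the 24-point period used in the test set.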
First, import the required libraries and load the data:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
# tensorflow.keras.layers has no Transformer layer; MultiHeadAttention is the built-in attention layer
from tensorflow.keras.layers import Dense, LSTM, Dropout, MultiHeadAttention
# Load the training and test sets
train_df = pd.read_csv('train.csv')
test_df = pd.read_csv('test.csv')
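Next, preprocess the data: parse the dates, fill missing values, and engineer features.
# Concatenate the training and test sets so both are preprocessed the same way
df = pd.concat([train_df, test_df], axis=0).reset_index(drop=True)
# Parse the date column (assumes YYYYMMDD values; if the dates are anonymised
# integers, as the task description suggests, skip this step)
df['date'] = pd.to_datetime(df['date'], format='%Y%m%d')
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['day'] = df['date'].dt.day
df['day_of_week'] = df['date'].dt.dayofweek
# Forward-fill missing values (fillna(method='ffill') is deprecated in pandas 2.x)
df = df.ffill()
# Feature engineering: bar-to-bar changes and relative changes.
# Note: close_diff is derived from the target itself, so it is not genuinely
# available for rows whose close is blank.
df['close_diff'] = df['close'].diff()
df['close_diff_ratio'] = df['close_diff'] / df['close'].shift(1)
df['volume_delta'] = df['volume'].diff()
df['volume_delta_ratio'] = df['volume_delta'] / df['volume'].shift(1)
df['amount_delta'] = df['amount'].diff()
df['amount_delta_ratio'] = df['amount_delta'] / df['amount'].shift(1)
# diff() leaves NaN in the first row and a zero denominator yields inf; zero both out
df = df.replace([np.inf, -np.inf], np.nan).fillna(0)
# Drop the raw date and time columns, and move 'close' to the last column
# so that the split below picks it up as the target
df = df.drop(['date', 'time'], axis=1)
df = df[[c for c in df.columns if c != 'close'] + ['close']]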
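Difference and ratio features can introduce NaN and inf values: `diff()` has no previous row for the first bar, and a zero in the denominator makes the ratio infinite. The sketch below shows this on made-up data (the `toy` frame and its values are purely illustrative) together with one possible cleanup, replacing non-finite values with 0.

```python
import numpy as np
import pandas as pd

# Made-up data: the zero volume in row 1 causes a division by zero in row 2
toy = pd.DataFrame({
    "close": [10.0, 10.5, 10.2, 10.2],
    "volume": [100.0, 0.0, 50.0, 80.0],
})

toy["close_diff"] = toy["close"].diff()
toy["volume_delta"] = toy["volume"].diff()
toy["volume_delta_ratio"] = toy["volume_delta"] / toy["volume"].shift(1)

# Count the non-finite entries (NaN or inf) before cleaning
n_bad = int((~np.isfinite(toy.to_numpy())).sum())

# One possible cleanup: treat inf as missing, then fill every gap with 0
clean = toy.replace([np.inf, -np.inf], np.nan).fillna(0.0)
print(n_bad, bool(np.isfinite(clean.to_numpy()).all()))
```

Filling with 0 is only one choice; dropping the affected rows is an alternative when losing the first bar is acceptable.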
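Before training, scale the features to a common range so that differences in magnitude between features do not dominate the model.
# Min-max scale every column to [0, 1]
scaler = MinMaxScaler()
scaled_df = scaler.fit_transform(df)
# Split back into the training and test rows
train_size = len(train_df)
train_data = scaled_df[:train_size, :]
test_data = scaled_df[train_size:, :]
# The last column is the target; all other columns are features
X_train, y_train = train_data[:, :-1], train_data[:, -1]
X_test, y_test = test_data[:, :-1], test_data[:, -1]
# Reshape to (samples, timesteps, features) with a single timestep
X_train = X_train.reshape((X_train.shape[0], 1, X_train.shape[1]))
X_test = X_test.reshape((X_test.shape[0], 1, X_test.shape[1]))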
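Reshaping to a single timestep gives a sequence model only one position to look at. One alternative, sketched here on synthetic data, stacks several consecutive rows into each sample and uses the target of the following row as the label. The helper `make_windows` and the window length of 3 are illustrative assumptions, and the approach presumes the rows are in chronological order.

```python
import numpy as np


def make_windows(features, target, window):
    """Build samples of `window` consecutive rows; the label is the next row's target."""
    n_samples = len(features) - window
    X = np.stack([features[i:i + window] for i in range(n_samples)])
    y = target[window:]
    return X, y


# Synthetic data: 10 rows, 2 features
features = np.arange(20, dtype=float).reshape(10, 2)
target = np.arange(10, dtype=float)

X, y = make_windows(features, target, window=3)
print(X.shape, y.shape)  # (7, 3, 2) (7,)
```

The resulting array already has the (samples, timesteps, features) layout that Keras sequence layers expect.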
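Next, build the Transformer model and train it.
# tensorflow.keras.layers has no Transformer layer, so this encoder-style block
# is assembled from MultiHeadAttention using the functional API
from tensorflow.keras import Input, Model, layers
inputs = Input(shape=(1, X_train.shape[2]))
x = layers.Dense(8)(inputs)  # project the features to d_model=8
attn = layers.MultiHeadAttention(num_heads=2, key_dim=4)(x, x)  # self-attention
x = layers.LayerNormalization()(layers.Add()([x, attn]))  # residual connection + normalization
x = layers.LSTM(32, activation='relu')(x)
x = layers.Dropout(0.1)(x)
outputs = layers.Dense(1)(x)
model = Model(inputs, outputs)
# Compile the model
model.compile(optimizer='adam', loss='mse')
# Train the model
model.fit(X_train, y_train, epochs=50, batch_size=64, verbose=2, validation_split=0.2)
# Evaluate the model (MSE on the scaled target; blank test targets were
# forward-filled, so the test score may not reflect the true error)
train_score = model.evaluate(X_train, y_train, verbose=0)
test_score = model.evaluate(X_test, y_test, verbose=0)
print('Train Score: %.6f MSE' % (train_score))
print('Test Score: %.6f MSE' % (test_score))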
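To make the attention step concrete, here is a minimal NumPy sketch of single-head scaled dot-product self-attention. It is a simplification rather than what Keras computes internally: it omits the learned query, key, and value projections and uses random data. It also shows that with a single timestep, as in the input shape above, the attention weights collapse to 1 and the output equals the input.

```python
import numpy as np


def scaled_dot_product_attention(q, k, v):
    """Single-head attention: softmax(q k^T / sqrt(d_k)) v."""
    d_k = q.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d_k)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))  # 5 timesteps, 8 features (random illustration data)

# Self-attention over 5 timesteps: each row of `weights` sums to 1
out, weights = scaled_dot_product_attention(x, x, x)

# With one timestep there is nothing else to attend to
single_out, single_w = scaled_dot_product_attention(x[:1], x[:1], x[:1])
print(out.shape, single_w)
```

This suggests that attention only contributes when each sample carries more than one timestep.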
Finally, use the model to predict on the test set and save the results to a CSV file.
# Predict on the test set
y_pred = model.predict(X_test)
# Undo the min-max scaling for the target (last) column only
y_pred = y_pred.ravel() * scaler.data_range_[-1] + scaler.data_min_[-1]
# Save the results to a CSV file
submission = pd.DataFrame({'close': y_pred})
submission.to_csv('submission.csv', index=False)
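Because `MinMaxScaler` with its default range computes `(x - data_min_) / data_range_` per column, a single column can be mapped back without rebuilding the full feature matrix. The sketch below checks this on made-up numbers; `toy_scaler` and the data are illustrative, and the last column stands in for the target.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up data: the last column plays the role of the target
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 40.0]])

toy_scaler = MinMaxScaler().fit(data)
scaled = toy_scaler.transform(data)

# Stand-in for model output: the scaled target column
pred_scaled = scaled[:, -1]

# Invert the scaling for the last column only
restored = pred_scaled * toy_scaler.data_range_[-1] + toy_scaler.data_min_[-1]

# Cross-check against the full inverse transform
full = toy_scaler.inverse_transform(scaled)[:, -1]
print(np.allclose(restored, full))
```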
Original source: https://www.cveoy.top/t/topic/fsQ6. Copyright belongs to the author. Please do not repost or scrape.