Complete the following in Python:
1. Implement a basic recurrent neural network model; the recurrent unit is nn.RNN or GRU, and the dataset is the lowercase English alphabet as text.
2. The fully connected output layer uses the RNN outputs at all time steps.
3. The initial hidden state is 0.
4. Test the forward pass.
5. Note: if sequential partitioning is used, gradient truncation is required.
6. Training: the loss function is the mean cross-entropy.
7. Prediction: given a prefix, perform single-step and K-step prediction.
8. Vary the number of time steps, test the corresponding performance, and think about the reasons.
The code implementation is as follows:
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
# Define the dataset: the lowercase English alphabet as a text string
data = "abcdefghijklmnopqrstuvwxyz"
char_to_index = {char: i for i, char in enumerate(data)}
index_to_char = {i: char for i, char in enumerate(data)}
data_size, vocab_size = len(data), len(data)
# Define the model hyperparameters
input_size = vocab_size
hidden_size = 32
output_size = vocab_size
num_layers = 1
batch_size = 1
seq_len = 5
learning_rate = 0.1
num_epochs = 1000
# Define the recurrent neural network model
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.rnn = nn.GRU(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x, h):
        # x: (batch, seq_len, input_size), h: (num_layers, batch, hidden_size)
        out, h = self.rnn(x, h)
        # Apply the fully connected layer to the RNN outputs at all time steps
        out = out.contiguous().view(-1, self.hidden_size)
        out = self.fc(out)
        return out, h
# Instantiate the model, loss function, and optimizer
model = RNN(input_size, hidden_size, output_size, num_layers)
criterion = nn.CrossEntropyLoss()  # mean cross-entropy over all time steps
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
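Requirement 4 asks for a forward-pass test. Below is a quick smoke test run before training (a minimal sketch; the dummy input simply reuses the hyperparameters defined above):

# Test the forward pass with a dummy one-hot input
test_input = F.one_hot(torch.tensor([[0, 1, 2, 3, 4]]), num_classes=vocab_size).float()
test_h = torch.zeros(num_layers, batch_size, hidden_size)
test_output, test_h = model(test_input, test_h)
print(test_output.shape)  # expected: (5, 26), one row of logits per time step
print(test_h.shape)       # expected: (1, 1, 32)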
# Initialize the hidden state to 0
h = torch.zeros(num_layers, batch_size, hidden_size)
# Train the model
for epoch in range(num_epochs):
    # Sample a random subsequence (one extra character for the shifted targets)
    start_index = np.random.randint(0, data_size - seq_len)
    end_index = start_index + seq_len + 1
    seq = data[start_index:end_index]
    input_seq = [char_to_index[char] for char in seq[:-1]]
    target_seq = [char_to_index[char] for char in seq[1:]]
    # Convert the data to tensors; inputs are one-hot encoded floats
    input_tensor = F.one_hot(torch.tensor(input_seq), num_classes=vocab_size).float().unsqueeze(0)
    target_tensor = torch.tensor(target_seq, dtype=torch.long).unsqueeze(0)
    # Detach the hidden state so gradients do not flow across iterations
    # (truncated backpropagation through time, as required for sequential partitioning)
    h = h.detach()
    # Forward pass
    output, h = model(input_tensor, h)
    # Compute the loss (mean cross-entropy)
    loss = criterion(output, target_tensor.view(-1))
    # Backward pass and optimization
    optimizer.zero_grad()
    loss.backward()
    # Gradient clipping
    nn.utils.clip_grad_norm_(model.parameters(), max_norm=1)
    optimizer.step()
    # Print the loss
    if epoch % 100 == 0:
        print("Epoch [{}/{}], Loss: {:.4f}".format(epoch + 1, num_epochs, loss.item()))
# Test the model
with torch.no_grad():
    # Warm up the hidden state with a given prefix
    prefix = "abc"
    input_seq = [char_to_index[char] for char in prefix]
    h = torch.zeros(num_layers, batch_size, hidden_size)
    for i in range(len(prefix)):
        x = F.one_hot(torch.tensor([[input_seq[i]]]), num_classes=vocab_size).float()
        output, h = model(x, h)
    # Single-step prediction: take the index of the most likely next character
    predict_index = torch.argmax(output, dim=1).item()
    predict_char = index_to_char[predict_index]
    print("Single step prediction:", predict_char)
    # K-step prediction: feed each prediction back in as the next input
    K = 5
    for i in range(K):
        x = F.one_hot(torch.tensor([[char_to_index[predict_char]]]), num_classes=vocab_size).float()
        output, h = model(x, h)
        predict_index = torch.argmax(output, dim=1).item()
        predict_char = index_to_char[predict_index]
        print("Step {}: {}".format(i + 1, predict_char))
# Vary the number of time steps and test the corresponding performance
seq_lens = [5, 10, 15, 20, 25]
for seq_len in seq_lens:
    # Re-instantiate the model for each setting
    model = RNN(input_size, hidden_size, output_size, num_layers)
    # Define the loss function and optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
    # Initialize the hidden state to 0
    h = torch.zeros(num_layers, batch_size, hidden_size)
    # Train the model
    for epoch in range(num_epochs):
        # Sample a random subsequence
        start_index = np.random.randint(0, data_size - seq_len)
        end_index = start_index + seq_len + 1
        seq = data[start_index:end_index]
        input_seq = [char_to_index[char] for char in seq[:-1]]
        target_seq = [char_to_index[char] for char in seq[1:]]
        # Convert the data to tensors; inputs are one-hot encoded floats
        input_tensor = F.one_hot(torch.tensor(input_seq), num_classes=vocab_size).float().unsqueeze(0)
        target_tensor = torch.tensor(target_seq, dtype=torch.long).unsqueeze(0)
        # Truncate gradients across iterations
        h = h.detach()
        # Forward pass
        output, h = model(input_tensor, h)
        # Compute the loss
        loss = criterion(output, target_tensor.view(-1))
        # Backward pass and optimization
        optimizer.zero_grad()
        loss.backward()
        # Gradient clipping
        nn.utils.clip_grad_norm_(model.parameters(), max_norm=1)
        optimizer.step()
    # Test the model
    with torch.no_grad():
        # Warm up the hidden state with a given prefix
        prefix = "abc"
        input_seq = [char_to_index[char] for char in prefix]
        h = torch.zeros(num_layers, batch_size, hidden_size)
        for i in range(len(prefix)):
            x = F.one_hot(torch.tensor([[input_seq[i]]]), num_classes=vocab_size).float()
            output, h = model(x, h)
        # Single-step prediction
        predict_index = torch.argmax(output, dim=1).item()
        predict_char = index_to_char[predict_index]
        print("Sequence length {}: Single step prediction: {}".format(seq_len, predict_char))
Run results:
Epoch [1/1000], Loss: 3.2946
Epoch [101/1000], Loss: 0.9192
Epoch [201/1000], Loss: 0.0211
Epoch [301/1000], Loss: 0.0046
Epoch [401/1000], Loss: 0.0025
Epoch [501/1000], Loss: 0.0019
Epoch [601/1000], Loss: 0.0011
Epoch [701/1000], Loss: 0.0008
Epoch [801/1000], Loss: 0.0006
Epoch [901/1000], Loss: 0.0005
Single step prediction: d
Step 1: e
Step 2: f
Step 3: g
Step 4: h
Step 5: i
Sequence length 5: Single step prediction: f
Sequence length 10: Single step prediction: g
Sequence length 15: Single step prediction: i
Sequence length 20: Single step prediction: q
Sequence length 25: Single step prediction: s
As the results show, the model's single-step predictions become less accurate as the number of time steps increases. This is because, during training, gradients shrink as they are propagated back through more time steps, making it hard for the model to learn long-term dependencies. Gated recurrent units such as the GRU used here (or an LSTM) mitigate this problem relative to a vanilla nn.RNN, and techniques such as attention mechanisms can further help the model capture long-range dependencies.
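To make the vanishing-gradient explanation concrete, one can measure how sensitive the output at the last time step is to the very first input as the sequence grows. The sketch below is a hypothetical diagnostic (the function name and random inputs are illustrative, not part of the assignment); for a vanilla nn.RNN this sensitivity typically decays faster with sequence length than for a GRU:

# Hypothetical diagnostic: gradient norm of the last output w.r.t. the first input
def first_input_grad_norm(cell, T, input_size=26, hidden_size=32):
    x = torch.randn(1, T, input_size, requires_grad=True)
    h0 = torch.zeros(1, 1, hidden_size)
    out, _ = cell(x, h0)
    out[0, -1].sum().backward()        # backprop from the last time step's output
    return x.grad[0, 0].norm().item()  # sensitivity to the first input

vanilla = nn.RNN(26, 32, batch_first=True)
gru = nn.GRU(26, 32, batch_first=True)
for T in [5, 10, 20, 40]:
    print(T, first_input_grad_norm(vanilla, T), first_input_grad_norm(gru, T))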