Complete the following in Python:
1. Implement a basic recurrent neural network; the recurrent unit is nn.RNN or GRU, and the dataset is lowercase English letters in text form (/kaggle/input/pinyin-data/pinyin.txt).
2. The fully connected output layer uses the RNN outputs from all time steps.
3. The initial hidden state is 0.
4. Test the forward pass.
5. Note: with sequential partitioning, gradient clipping is needed.
6. Training: the loss function is mean cross-entropy.
7. Prediction: given a prefix, perform single-step prediction and K-step prediction.
8. Vary the number of time steps, test the corresponding performance, and think about the reason.
The code is as follows:
import torch
import torch.nn as nn
import numpy as np
# Load the dataset
with open('/kaggle/input/pinyin-data/pinyin.txt', 'r', encoding='utf-8') as f:
    text = f.read()
# Build the vocabulary
chars = list(set(text))
char2int = {c: i for i, c in enumerate(chars)}
int2char = {i: c for i, c in enumerate(chars)}
# Encode the text as a sequence of integers
encoded = np.array([char2int[c] for c in text])
# Define the model
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers, cell_type):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.cell_type = cell_type
        if cell_type == 'rnn':
            self.cell = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)
        elif cell_type == 'gru':
            self.cell = nn.GRU(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x, h0=None):
        # nn.RNN and nn.GRU share the same call signature; with h0=None the
        # initial hidden state defaults to zeros (requirement 3).
        out, hn = self.cell(x, h0)
        # Apply the fully connected layer to the outputs of ALL time steps.
        out = self.fc(out)
        return out, hn
# Model hyperparameters
input_size = len(chars)
hidden_size = 128
output_size = len(chars)
num_layers = 2
cell_type = 'rnn'
learning_rate = 0.01
batch_size = 128
num_epochs = 10
seq_len = 64
clip = 5
# Instantiate the model
model = RNN(input_size, hidden_size, output_size, num_layers, cell_type)
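Before training, the forward pass can be smoke-tested (requirement 4). The sketch below is self-contained and uses a small stand-in nn.RNN with made-up sizes rather than the model above; it checks that omitting `h0` is equivalent to passing an explicit zero hidden state (requirement 3), and that the network emits one output per time step.

```python
import torch
import torch.nn as nn

# Minimal forward-pass smoke test with a toy nn.RNN (sizes are arbitrary).
# With h0=None, nn.RNN/nn.GRU default the initial hidden state to zeros,
# so passing an explicit zero tensor must give identical outputs.
vocab, hidden, layers, batch, steps = 26, 32, 2, 4, 8
rnn = nn.RNN(vocab, hidden, layers, batch_first=True)
x = torch.randn(batch, steps, vocab)            # (batch, time, features)
out_default, _ = rnn(x)                         # h0 omitted -> zeros
out_zero, hn = rnn(x, torch.zeros(layers, batch, hidden))
print(torch.allclose(out_default, out_zero))    # True: same initial state
print(out_zero.shape)                           # one output per time step
```

The same check works for the `RNN` wrapper class, since its `forward` just forwards `h0` to the underlying cell.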
# Loss function: mean cross-entropy
criterion = nn.CrossEntropyLoss()
# Optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
# Data generator: sequential partition of the corpus into mini-batches
def get_batches(arr, batch_size, seq_len):
    batch_size_total = batch_size * seq_len
    n_batches = len(arr) // batch_size_total
    arr = arr[:n_batches * batch_size_total]
    arr = arr.reshape((batch_size, -1))
    for n in range(0, arr.shape[1], seq_len):
        x = arr[:, n:n+seq_len]
        # Targets are the inputs shifted left by one step; the last target
        # wraps around to the first character of the chunk.
        y = np.zeros_like(x)
        y[:, :-1], y[:, -1] = x[:, 1:], x[:, 0]
        yield torch.from_numpy(x), torch.from_numpy(y)
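To see what this generator produces, here is a quick self-contained sanity check. It repeats the function with a toy integer corpus of 100 tokens in place of the encoded pinyin text (the corpus values are an assumption for illustration only):

```python
import numpy as np
import torch

def get_batches(arr, batch_size, seq_len):
    # Sequential partition: each of the batch_size rows is a contiguous
    # slice of the corpus, and consecutive batches continue each row.
    batch_size_total = batch_size * seq_len
    n_batches = len(arr) // batch_size_total
    arr = arr[:n_batches * batch_size_total].reshape((batch_size, -1))
    for n in range(0, arr.shape[1], seq_len):
        x = arr[:, n:n + seq_len]
        y = np.zeros_like(x)
        y[:, :-1], y[:, -1] = x[:, 1:], x[:, 0]
        yield torch.from_numpy(x), torch.from_numpy(y)

encoded_toy = np.arange(100)       # stand-in for the encoded pinyin corpus
batches = list(get_batches(encoded_toy, batch_size=2, seq_len=10))
x0, y0 = batches[0]
print(len(batches), x0.shape)      # 5 batches, each of shape (2, 10)
# y is x shifted left by one step: the model predicts the next character.
print(bool((y0[:, :-1] == x0[:, 1:]).all()))
```

Because consecutive batches continue the same rows, the hidden state can legitimately be carried from one batch to the next during training.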
# Train the model
for epoch in range(num_epochs):
    h0 = None  # None is initialized to zeros by the recurrent unit
    loss_total = 0
    n_batches = 0
    for x, y in get_batches(encoded, batch_size, seq_len):
        n_batches += 1
        x = nn.functional.one_hot(x, num_classes=input_size).float()
        y = y.view(-1)
        optimizer.zero_grad()
        output, h0 = model(x, h0)
        # Sequential partition: carry the hidden state across batches, but
        # detach it so gradients do not flow through previous batches.
        h0 = h0.detach()
        loss = criterion(output.view(-1, output_size), y)  # mean cross-entropy
        loss.backward()
        # Gradient clipping to avoid exploding gradients
        nn.utils.clip_grad_norm_(model.parameters(), clip)
        optimizer.step()
        loss_total += loss.item()
    print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch + 1, num_epochs, loss_total / n_batches))
# Single-step prediction
def predict(model, char, h0=None, k=1):
    """Feed one character and return the top-k next characters with probabilities."""
    x = torch.tensor([[char2int[char]]])
    x = nn.functional.one_hot(x, num_classes=input_size).float()
    with torch.no_grad():
        output, h0 = model(x, h0)
    output = nn.functional.softmax(output.squeeze(), dim=0)
    topk_probs, topk_indices = output.topk(k)
    # Renormalize so the k probabilities sum to 1 (np.random.choice requires this)
    topk_probs = topk_probs / topk_probs.sum()
    topk_chars = [int2char[idx] for idx in topk_indices.tolist()]
    return topk_probs.tolist(), topk_chars, h0

# K-step prediction
def predict_k(model, prefix, h0=None, k=1, topk=5):
    """Warm up the hidden state on the prefix, then generate k characters,
    sampling each from the model's topk candidates."""
    for char in prefix:
        _, _, h0 = predict(model, char, h0=h0)
    chars = list(prefix)
    for _ in range(k):
        probs, pred_chars, h0 = predict(model, chars[-1], h0=h0, k=topk)
        pred_char = np.random.choice(pred_chars, p=probs)
        chars.append(pred_char)
    return ''.join(chars)
# Test the model: warm up on each prefix, then generate 10 characters
print(predict_k(model, 'wo', k=10))
print(predict_k(model, 'ni', k=10))
print(predict_k(model, 'zh', k=10))
print(predict_k(model, 'ai', k=10))
As the problem requires, the recurrent unit is set to nn.RNN or nn.GRU, and the fully connected output layer is applied to the RNN outputs at every time step, not just the last one.
During training, gradient clipping must be applied: the norm of all gradients is capped at a fixed value to avoid the exploding-gradient problem that sequential partitioning with a carried hidden state can aggravate.
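The behavior of clipping can be illustrated in isolation. The sketch below (a toy linear layer, not the model above) shows that `torch.nn.utils.clip_grad_norm_` rescales all gradients in place so their global L2 norm is at most `max_norm`, and returns the norm measured before clipping:

```python
import torch
import torch.nn as nn

# Make the gradients deliberately large, then clip them.
layer = nn.Linear(10, 10)
loss = (layer(torch.randn(8, 10)) * 100).sum()   # inflated loss -> big grads
loss.backward()
total_norm = nn.utils.clip_grad_norm_(layer.parameters(), max_norm=5.0)
# Recompute the global norm after clipping: it is now at most 5.0.
clipped = torch.sqrt(sum(p.grad.norm() ** 2 for p in layer.parameters()))
print(float(total_norm), float(clipped))
```

Note that clipping rescales the whole gradient vector uniformly, so the update direction is preserved; only its magnitude is bounded.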
At prediction time, given a prefix, single-step and K-step prediction can be used to generate new text.
To test the model's performance, we can vary the number of time steps and see whether the generated output changes. Longer time steps let the model capture longer-range context in the text, but they also increase model complexity and training time.
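A toy version of this experiment can be run quickly. The sketch below uses a synthetic periodic sequence in place of the pinyin corpus (the data, model sizes, and step counts are all assumptions chosen so it trains in seconds), training the same GRU plus linear head with two different time-step counts and reporting the final training loss for each:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab = 8
data = torch.arange(4000) % vocab            # periodic "text": next char is (c + 1) % 8

def train_with_seq_len(seq_len, steps=200):
    gru = nn.GRU(vocab, 32, batch_first=True)
    fc = nn.Linear(32, vocab)
    opt = torch.optim.Adam(list(gru.parameters()) + list(fc.parameters()), lr=0.01)
    crit = nn.CrossEntropyLoss()
    for step in range(steps):
        # Walk sequentially through the data, seq_len tokens at a time.
        i = (step * seq_len) % (len(data) - seq_len - 1)
        x = nn.functional.one_hot(data[i:i + seq_len], vocab).float().unsqueeze(0)
        y = data[i + 1:i + seq_len + 1]      # targets: inputs shifted by one
        out, _ = gru(x)
        loss = crit(fc(out).view(-1, vocab), y)  # mean cross-entropy
        opt.zero_grad()
        loss.backward()
        opt.step()
    return loss.item()

for sl in (4, 32):
    print('seq_len=%d final loss=%.3f' % (sl, train_with_seq_len(sl)))
```

On this data the next character depends only on the current one, so even short time steps suffice; on real text with longer-range structure, larger `seq_len` values can help, at higher cost per update.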
Original source: https://www.cveoy.top/t/topic/guqL. Copyright belongs to the author. Please do not repost or scrape!