Optimizing an LSTM Text Classification Model with PyTorch and the Bees Algorithm: A Data Processing Example
Load the dataset
def load_dataset():
    # Sample data
    data = [
        'I love this movie!',
        'This film is great',
        'Amazing movie',
        'I do not like this movie',
        'This movie is terrible',
        'I hate this film',
    ]
    # Corresponding labels: 1 = positive review, 0 = negative review
    labels = [1, 1, 1, 0, 0, 0]
    return data, labels
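Preprocess the data, converting text to numbers
import torch

def preprocess(data, labels):
    # Build a vocabulary of all words; sorting keeps the indices stable across runs
    vocab = sorted(set(word for sentence in data for word in sentence.split()))
    # Map each word in the vocabulary to an integer index
    word_to_idx = {word: i for i, word in enumerate(vocab)}
    # Convert each sentence to a sequence of indices
    data = [[word_to_idx[word] for word in sentence.split()] for sentence in data]
    # Convert the labels to a tensor
    labels = torch.tensor(labels)
    # Return the numericalized data and labels
    return data, labels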
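Splitting on whitespace alone treats 'movie!' and 'movie', or 'This' and 'this', as different words, and it leaves no index free for padding or for words unseen at training time. The following is a minimal sketch of one way to handle this, not part of the original code: the helper names `tokenize`, `build_vocab`, and `encode`, and the special tokens `<pad>` and `<unk>`, are hypothetical choices made for illustration.

```python
import string

PAD, UNK = "<pad>", "<unk>"  # hypothetical special tokens, not in the original


def tokenize(sentence):
    # Lowercase and strip surrounding punctuation so "movie!" and "movie" match.
    tokens = (word.strip(string.punctuation) for word in sentence.lower().split())
    return [token for token in tokens if token]


def build_vocab(sentences):
    # Index 0 is reserved for padding and index 1 for unknown words.
    words = sorted({token for s in sentences for token in tokenize(s)})
    return {word: i for i, word in enumerate([PAD, UNK] + words)}


def encode(sentence, word_to_idx):
    # Words missing from the vocabulary map to the UNK index.
    return [word_to_idx.get(token, word_to_idx[UNK]) for token in tokenize(sentence)]


sentences = ['I love this movie!', 'This film is great']
word_to_idx = build_vocab(sentences)
print(encode('I love this film', word_to_idx))  # [4, 6, 8, 2]
```

Reserving fixed indices for the special tokens is a common convention rather than a requirement; any index works as long as the embedding layer and the padding step agree on it.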
Load the dataset and preprocess it
data, labels = load_dataset()
data, labels = preprocess(data, labels)
Print one sample and its label
print('Sample data:', data[0])
print('Label:', labels[0])
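The index sequences produced above have different lengths, so they cannot be stacked into a single tensor as they are. A minimal sketch of one common approach is to pad them with `torch.nn.utils.rnn.pad_sequence`. The `sequences` list here is hypothetical stand-in data, and the sketch assumes index 0 is reserved for padding, which the vocabulary built by `preprocess` does not do on its own.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Hypothetical index sequences of different lengths, standing in for `data`.
sequences = [[4, 6, 8, 7], [8, 2, 5, 3], [9, 7]]

# Record the true lengths before padding so later steps can ignore the padding.
lengths = torch.tensor([len(seq) for seq in sequences])

# Pad every sequence to the length of the longest one.
# padding_value=0 assumes index 0 is not used by any real word.
batch = pad_sequence(
    [torch.tensor(seq, dtype=torch.long) for seq in sequences],
    batch_first=True,
    padding_value=0,
)

print(batch.shape)  # torch.Size([3, 4])
print(lengths)
```

With `batch_first=True` the result has shape (batch, sequence length), which matches an `nn.LSTM` constructed with `batch_first=True`.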
Original source: https://www.cveoy.top/t/topic/lLel. Copyright belongs to the author. Do not repost or scrape.