PyTorch RNN Sentiment Analysis: Implementation and Evaluation

This assignment uses PyTorch to implement recurrent neural networks (RNNs) for sentiment analysis. The task is to classify an input sentence into one of three sentiment labels: 'positive', 'negative', or 'neutral'.

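We use the benchmark SST (Stanford Sentiment Treebank) dataset, loaded through the torchtext package. The preprocessing code below loads the training, validation, and test splits, builds the vocabulary from the training split, and creates batch iterators. This snippet is provided and does not need modification. Note that it relies on the legacy torchtext API (`data.Field`, `data.BucketIterator`), which newer torchtext releases moved to `torchtext.legacy` and later removed, so an older torchtext version is required.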

import copy
import torch
from torch import nn
from torch import optim
import torchtext
from torchtext import data
from torchtext import datasets

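# include_lengths=True makes batch.text a (token_ids, lengths) pair,
# which the train and evaluate functions unpack
TEXT = data.Field(sequential=True, batch_first=True, lower=True, include_lengths=True)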
LABEL = data.LabelField()

# load data splits
train_data, val_data, test_data = datasets.SST.splits(TEXT, LABEL)

# build dictionary
TEXT.build_vocab(train_data)
LABEL.build_vocab(train_data)

# hyperparameters
vocab_size = len(TEXT.vocab)
label_size = len(LABEL.vocab)
padding_idx = TEXT.vocab.stoi['<pad>']
embedding_dim = 128
hidden_dim = 128

# build iterators
train_iter, val_iter, test_iter = data.BucketIterator.splits(
    (train_data, val_data, test_data), 
    batch_size=32)
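
To make the preprocessing step concrete, here is a simplified pure-Python sketch of what building a vocabulary and padding a batch amount to. It is not torchtext's implementation: the helper names `build_vocab` and `numericalize` are hypothetical, and the placement of the special tokens (`<unk>` at index 0, `<pad>` at index 1) is an assumption made for illustration.

```python
from collections import Counter


def build_vocab(tokenized_sentences, specials=("<unk>", "<pad>")):
    """Map each token to an integer id (hypothetical helper).

    Special tokens come first; the remaining tokens are ordered by
    descending frequency, with ties broken alphabetically.
    """
    counts = Counter(tok for sent in tokenized_sentences for tok in sent)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    itos = list(specials) + [tok for tok, _ in ordered]
    stoi = {tok: i for i, tok in enumerate(itos)}
    return stoi, itos


def numericalize(batch, stoi):
    """Convert tokens to ids and pad every sentence to the batch maximum."""
    unk, pad = stoi["<unk>"], stoi["<pad>"]
    max_len = max(len(sent) for sent in batch)
    ids = [
        [stoi.get(tok, unk) for tok in sent] + [pad] * (max_len - len(sent))
        for sent in batch
    ]
    lengths = [len(sent) for sent in batch]  # true lengths, before padding
    return ids, lengths


train_sentences = [["a", "good", "movie"], ["a", "bad", "movie", "indeed"]]
stoi, itos = build_vocab(train_sentences)

# "great" never appeared in the training sentences, so it maps to <unk>
ids, lengths = numericalize([["a", "great", "movie"], ["bad"]], stoi)
print(ids)      # [[2, 0, 3], [4, 1, 1]]
print(lengths)  # [3, 1]
```

The vocabulary is built from the training sentences only, so words that occur only in the validation or test data map to `<unk>`. The padded id matrix together with the true lengths is the kind of pair a model needs in order to ignore padding positions.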

Defining Training and Evaluation Functions:

The following code defines the training and evaluation functions for our sentiment analysis model.

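def train(model, iterator, optimizer, criterion):
    """Run one training epoch; return mean per-batch loss and accuracy."""
    model.train()  # enable training-mode behavior such as dropout
    epoch_loss = 0
    epoch_acc = 0

    for batch in iterator:
        optimizer.zero_grad()  # clear gradients from the previous step
        # batch.text is a (token_ids, lengths) pair; this requires the
        # TEXT field to be created with include_lengths=True
        text, text_lengths = batch.text
        # squeeze(1) is a no-op unless dimension 1 of the output has size 1
        predictions = model(text, text_lengths).squeeze(1)
        loss = criterion(predictions, batch.label)
        acc = accuracy(predictions, batch.label)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
        epoch_acc += acc.item()

    # averages are taken over batches, not over individual examples
    return epoch_loss / len(iterator), epoch_acc / len(iterator)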


def evaluate(model, iterator, criterion):
    """Evaluate without updating weights; return mean per-batch loss and accuracy."""
    model.eval()  # disable training-only behavior such as dropout
    epoch_loss = 0
    epoch_acc = 0

    with torch.no_grad():  # no gradients are needed for evaluation
        for batch in iterator:
            text, text_lengths = batch.text
            predictions = model(text, text_lengths).squeeze(1)
            loss = criterion(predictions, batch.label)
            acc = accuracy(predictions, batch.label)
            epoch_loss += loss.item()
            epoch_acc += acc.item()

    return epoch_loss / len(iterator), epoch_acc / len(iterator)


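def accuracy(preds, y):
    """Fraction of examples whose highest-scoring class equals the label."""
    preds = torch.argmax(preds, dim=1)  # (batch, label_size) -> (batch,)
    correct = (preds == y).float()
    acc = correct.sum() / len(correct)
    return acc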
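
As a quick sanity check of `accuracy`, the sketch below applies the same computation to a hand-made batch of four examples and three classes. The logits and labels are made-up values for illustration only, not SST data.

```python
import torch


def accuracy(preds, y):
    # same computation as the accuracy function defined for this assignment
    preds = torch.argmax(preds, dim=1)
    correct = (preds == y).float()
    return correct.sum() / len(correct)


# one row of class scores per example (illustrative values)
logits = torch.tensor([[2.0, 0.1, 0.3],   # argmax 0
                       [0.2, 1.5, 0.1],   # argmax 1
                       [0.9, 0.2, 0.4],   # argmax 0
                       [0.1, 0.2, 3.0]])  # argmax 2
labels = torch.tensor([0, 1, 2, 2])

acc = accuracy(logits, labels)
print(acc.item())  # 0.75 (three of the four predictions match)
```

Because `train` and `evaluate` average this per-batch value over the number of batches, a smaller final batch carries the same weight as a full one, so the reported epoch accuracy can differ slightly from the exact per-example accuracy.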

These functions train and evaluate the RNN model. Each call processes one epoch and returns the loss and accuracy averaged over that epoch's batches.
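
Both functions call the model as `model(text, text_lengths)` and expect one row of class scores per sentence. The following is a minimal sketch of a model with that interface, not the assignment's reference solution. The class name `SentimentRNN`, the choice of a single-layer LSTM, and the use of packed sequences are all assumptions, and the small sizes in the usage lines are illustrative.

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pack_padded_sequence


class SentimentRNN(nn.Module):  # hypothetical name, not from the assignment
    def __init__(self, vocab_size, embedding_dim, hidden_dim, label_size, padding_idx):
        super().__init__()
        # padding_idx keeps the <pad> embedding fixed at zero
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=padding_idx)
        self.rnn = nn.LSTM(embedding_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, label_size)

    def forward(self, text, text_lengths):
        embedded = self.embedding(text)  # (batch, seq_len, embedding_dim)
        # packing lets the LSTM stop at each sentence's true length;
        # enforce_sorted=False accepts batches in any length order
        packed = pack_padded_sequence(
            embedded, text_lengths.cpu(), batch_first=True, enforce_sorted=False)
        _, (hidden, _) = self.rnn(packed)  # hidden: (1, batch, hidden_dim)
        return self.fc(hidden[-1])         # (batch, label_size)


# toy batch: two sentences padded with index 1 to length 4
model = SentimentRNN(vocab_size=50, embedding_dim=8, hidden_dim=16,
                     label_size=3, padding_idx=1)
text = torch.tensor([[4, 5, 6, 1],
                     [7, 8, 1, 1]])
text_lengths = torch.tensor([3, 2])

logits = model(text, text_lengths)
print(logits.shape)  # torch.Size([2, 3])

# the logits can be passed to a multi-class loss such as CrossEntropyLoss
criterion = nn.CrossEntropyLoss()
loss = criterion(logits, torch.tensor([0, 2]))
```

With a model of this shape, `train(model, train_iter, optimizer, criterion)` and `evaluate(model, val_iter, criterion)` could be called once per epoch, for example with `optim.Adam(model.parameters())` as the optimizer. That pairing is one reasonable option rather than a requirement of the assignment.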
