RNN Classifier with Regularization for Overfitting Prevention
This article explores a recurrent neural network (RNN) classifier architecture and delves into the issue of overfitting. Learn how to add regularization techniques like L2 and dropout to prevent overfitting and improve model performance on unseen data.
Model Architecture
The provided model, RNNClassifier, implements a classic RNN architecture. It consists of:
- Embedding Layer: Converts words into vector representations.
- LSTM Layer: Captures sequential dependencies in the input text.
- Fully Connected Layer: Maps the hidden state to the output label probabilities.
Here's the Python code defining the model:
import torch
import torch.nn as nn

class RNNClassifier(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, label_size, padding_idx):
        super(RNNClassifier, self).__init__()
        self.vocab_size = vocab_size
        self.embedding_dim = embedding_dim
        self.hidden_dim = hidden_dim
        self.label_size = label_size
        self.num_layers = 2  # change the number of layers here

        # Embedding layer with dropout
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=padding_idx)
        self.embedding_dropout = nn.Dropout(0.1)

        # LSTM layer with dropout applied to its output
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, num_layers=self.num_layers, batch_first=True)
        self.lstm_dropout = nn.Dropout(0.1)

        # Fully connected layer with dropout
        self.fc = nn.Linear(hidden_dim, label_size)
        self.fc_dropout = nn.Dropout(0.1)

    def zero_state(self, batch_size):
        # LSTM state must be 3-D: [num_layers, batch_size, hidden_dim]
        hidden = torch.zeros(self.num_layers, batch_size, self.hidden_dim)
        cell = torch.zeros(self.num_layers, batch_size, self.hidden_dim)
        return hidden, cell

    def forward(self, text):
        # text shape = [batch_size, seq_len]
        emb = self.embedding(text)              # [batch_size, seq_len, embedding_dim]
        emb = self.embedding_dropout(emb)       # apply dropout on embeddings

        # Run the LSTM over the full sequence so it can capture sequential
        # dependencies (mean-pooling the embeddings first would collapse the
        # sequence to a single step and defeat the purpose of the LSTM)
        h0, c0 = self.zero_state(text.size(0))  # [num_layers, batch_size, hidden_dim]
        output, (hn, cn) = self.lstm(emb, (h0, c0))  # [batch_size, seq_len, hidden_dim]
        output = self.lstm_dropout(output)      # apply dropout on LSTM output

        # Mean pooling over time steps, then classify
        output = torch.mean(output, dim=1)      # [batch_size, hidden_dim]
        output = self.fc(output)                # [batch_size, label_size]
        output = self.fc_dropout(output)        # apply dropout on logits
        return output
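To make the tensor shapes in the forward pass concrete, here is a self-contained walk-through using the same layer types. All dimensions below (vocabulary size, batch size, sequence length, and so on) are illustrative assumptions, not values from the article:

```python
import torch
import torch.nn as nn

# Illustrative dimensions only
batch_size, seq_len, vocab_size = 8, 20, 1000
embedding_dim, hidden_dim, num_layers, label_size = 64, 128, 2, 3

embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=0)
lstm = nn.LSTM(embedding_dim, hidden_dim, num_layers=num_layers, batch_first=True)
fc = nn.Linear(hidden_dim, label_size)

text = torch.randint(1, vocab_size, (batch_size, seq_len))  # token ids
emb = embedding(text)                        # [8, 20, 64]
h0 = torch.zeros(num_layers, batch_size, hidden_dim)
c0 = torch.zeros(num_layers, batch_size, hidden_dim)
output, (hn, cn) = lstm(emb, (h0, c0))       # output: [8, 20, 128]
pooled = output.mean(dim=1)                  # [8, 128]
logits = fc(pooled)                          # [8, 3]
print(logits.shape)  # torch.Size([8, 3])
```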
Overfitting Problem
The provided training results show a concerning trend: the validation loss (val loss) increases over training epochs. This indicates overfitting, where the model has learned the training data too well and struggles to generalize to unseen data.
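This pattern can be made concrete with a small sketch; the loss values below are invented purely to illustrate the shape of the curves:

```python
# Hypothetical per-epoch losses showing the overfitting pattern described
# above: training loss keeps falling while validation loss turns upward.
train_loss = [0.92, 0.61, 0.44, 0.31, 0.22, 0.15]
val_loss   = [0.95, 0.70, 0.62, 0.64, 0.71, 0.80]

# The epoch where validation loss bottoms out is where generalization peaks.
best_epoch = min(range(len(val_loss)), key=lambda i: val_loss[i])
print(best_epoch)                            # 2
print(val_loss[-1] > val_loss[best_epoch])   # True: val loss has risen since
```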
Addressing Overfitting with Regularization
To mitigate overfitting, we can employ regularization techniques. These techniques add constraints to the model's learning process, preventing it from becoming too complex and fitting noise in the training data.
1. L2 Regularization
L2 regularization penalizes large weights in the model, encouraging it to distribute weight magnitudes more evenly so that no individual weight becomes overly influential. In PyTorch, L2 regularization is not a layer argument; it is applied through the optimizer's weight_decay parameter:

optimizer = torch.optim.Adam(model.parameters(), weight_decay=0.01)

The weight_decay value controls the strength of the L2 penalty. A higher value corresponds to stronger regularization.
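Here is a minimal sketch of how weight_decay plugs into a training step, using a stand-in linear model and random data rather than the article's classifier:

```python
import torch
import torch.nn as nn

# Stand-in model and data -- not the article's RNNClassifier
model = nn.Linear(10, 2)
optimizer = torch.optim.Adam(model.parameters(), weight_decay=0.01)
criterion = nn.CrossEntropyLoss()

x = torch.randn(4, 10)
y = torch.tensor([0, 1, 0, 1])

optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()  # the weight_decay term nudges weights toward zero each step
print(loss.item() > 0)  # True
```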
2. Dropout Regularization
Dropout randomly zeroes units (neurons) during training. This prevents co-adaptation between units and forces the model to learn more robust representations. Dropout is already built into the model; you can experiment with raising the dropout probabilities to strengthen the effect:

self.embedding_dropout = nn.Dropout(0.5)
self.lstm_dropout = nn.Dropout(0.5)
self.fc_dropout = nn.Dropout(0.5)
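One practical caveat worth a sketch: PyTorch dropout layers are only active in training mode, so call model.eval() before computing validation loss, otherwise the dropout noise will inflate the reported numbers:

```python
import torch
import torch.nn as nn

drop = nn.Dropout(0.5)
x = torch.ones(1000)

drop.train()
y_train = drop(x)   # roughly half the entries zeroed, survivors scaled by 2

drop.eval()
y_eval = drop(x)    # identity: no entries are dropped

print(torch.equal(y_eval, x))                 # True
print((y_train == 0).float().mean().item())   # close to 0.5
```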
Conclusion
Regularization techniques are crucial for training robust and generalizable models. By incorporating L2 regularization and adjusting dropout probabilities, you can effectively combat overfitting and improve the performance of your RNN classifier on unseen data. Experiment with different values of these hyperparameters to find the optimal settings for your specific problem.