RNN Classifier with Regularization for Overfitting Prevention
This article explores a recurrent neural network (RNN) classifier architecture and delves into the issue of overfitting. Learn how to add regularization techniques like L2 and dropout to prevent overfitting and improve model performance on unseen data.
Model Architecture
The provided model, RNNClassifier, implements a classic RNN architecture. It consists of:
- Embedding Layer: Converts words into vector representations.
- LSTM Layer: Captures sequential dependencies in the input text.
- Fully Connected Layer: Maps the hidden state to the output label probabilities.
Here's the Python code defining the model:
import torch
import torch.nn as nn

class RNNClassifier(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, label_size, padding_idx):
        super(RNNClassifier, self).__init__()
        self.vocab_size = vocab_size
        self.embedding_dim = embedding_dim
        self.hidden_dim = hidden_dim
        self.label_size = label_size
        self.num_layers = 2  # change the number of layers here

        # Embedding layer with dropout
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=padding_idx)
        self.embedding_dropout = nn.Dropout(0.1)

        # LSTM layer with dropout applied to its output
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, num_layers=self.num_layers, batch_first=True)
        self.lstm_dropout = nn.Dropout(0.1)

        # Fully connected layer with dropout
        self.fc = nn.Linear(hidden_dim, label_size)
        self.fc_dropout = nn.Dropout(0.1)

    def zero_state(self, batch_size):
        # LSTM state must be 3-D: [num_layers, batch_size, hidden_dim]
        hidden = torch.zeros(self.num_layers, batch_size, self.hidden_dim)
        cell = torch.zeros(self.num_layers, batch_size, self.hidden_dim)
        return hidden, cell

    def forward(self, text):
        # text shape = [batch_size, seq_len]
        emb = self.embedding(text)              # [batch_size, seq_len, embedding_dim]
        emb = self.embedding_dropout(emb)       # apply dropout on embeddings

        # Run the LSTM over the full sequence so it can capture sequential
        # dependencies (mean-pooling the embeddings first would collapse the
        # sequence to a single step and defeat the purpose of the LSTM)
        h0, c0 = self.zero_state(text.size(0))  # [num_layers, batch_size, hidden_dim]
        output, (hn, cn) = self.lstm(emb, (h0, c0))  # [batch_size, seq_len, hidden_dim]
        output = self.lstm_dropout(output)      # apply dropout on LSTM output

        # Mean pooling over time steps, then classify
        output = torch.mean(output, dim=1)      # [batch_size, hidden_dim]
        output = self.fc(output)                # [batch_size, label_size]
        output = self.fc_dropout(output)        # apply dropout on logits
        return output
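To make the tensor shapes in the forward pass concrete, here is a self-contained walk-through using the same layer types. All dimensions below (vocabulary size, batch size, sequence length, and so on) are illustrative assumptions, not values from the article:

```python
import torch
import torch.nn as nn

# Illustrative dimensions only
batch_size, seq_len, vocab_size = 8, 20, 1000
embedding_dim, hidden_dim, num_layers, label_size = 64, 128, 2, 3

embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=0)
lstm = nn.LSTM(embedding_dim, hidden_dim, num_layers=num_layers, batch_first=True)
fc = nn.Linear(hidden_dim, label_size)

text = torch.randint(1, vocab_size, (batch_size, seq_len))  # token ids
emb = embedding(text)                        # [8, 20, 64]
h0 = torch.zeros(num_layers, batch_size, hidden_dim)
c0 = torch.zeros(num_layers, batch_size, hidden_dim)
output, (hn, cn) = lstm(emb, (h0, c0))       # output: [8, 20, 128]
pooled = output.mean(dim=1)                  # [8, 128]
logits = fc(pooled)                          # [8, 3]
print(logits.shape)  # torch.Size([8, 3])
```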
Overfitting Problem
The provided training results show a concerning trend: the validation loss (val loss) increases over training epochs. This indicates overfitting, where the model has learned the training data too well and struggles to generalize to unseen data.
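This pattern can be made concrete with a small sketch; the loss values below are invented purely to illustrate the shape of the curves:

```python
# Hypothetical per-epoch losses showing the overfitting pattern described
# above: training loss keeps falling while validation loss turns upward.
train_loss = [0.92, 0.61, 0.44, 0.31, 0.22, 0.15]
val_loss   = [0.95, 0.70, 0.62, 0.64, 0.71, 0.80]

# The epoch where validation loss bottoms out is where generalization peaks.
best_epoch = min(range(len(val_loss)), key=lambda i: val_loss[i])
print(best_epoch)                            # 2
print(val_loss[-1] > val_loss[best_epoch])   # True: val loss has risen since
```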
Addressing Overfitting with Regularization
To mitigate overfitting, we can employ regularization techniques. These techniques add constraints to the model's learning process, preventing it from becoming too complex and fitting noise in the training data.
1. L2 Regularization
L2 regularization penalizes large weights in the model, encouraging it to distribute weight magnitudes more evenly so that no individual weight becomes overly influential. In PyTorch, L2 regularization is not a layer argument; it is applied through the optimizer's weight_decay parameter:

optimizer = torch.optim.Adam(model.parameters(), weight_decay=0.01)

The weight_decay value controls the strength of the L2 penalty. A higher value corresponds to stronger regularization.
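Here is a minimal sketch of how weight_decay plugs into a training step, using a stand-in linear model and random data rather than the article's classifier:

```python
import torch
import torch.nn as nn

# Stand-in model and data -- not the article's RNNClassifier
model = nn.Linear(10, 2)
optimizer = torch.optim.Adam(model.parameters(), weight_decay=0.01)
criterion = nn.CrossEntropyLoss()

x = torch.randn(4, 10)
y = torch.tensor([0, 1, 0, 1])

optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()  # the weight_decay term nudges weights toward zero each step
print(loss.item() > 0)  # True
```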
2. Dropout Regularization
Dropout randomly zeroes units (neurons) during training. This prevents co-adaptation between units and forces the model to learn more robust representations. Dropout is already built into the model; you can experiment with raising the dropout probabilities to strengthen the effect:

self.embedding_dropout = nn.Dropout(0.5)
self.lstm_dropout = nn.Dropout(0.5)
self.fc_dropout = nn.Dropout(0.5)
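One practical caveat worth a sketch: PyTorch dropout layers are only active in training mode, so call model.eval() before computing validation loss, otherwise the dropout noise will inflate the reported numbers:

```python
import torch
import torch.nn as nn

drop = nn.Dropout(0.5)
x = torch.ones(1000)

drop.train()
y_train = drop(x)   # roughly half the entries zeroed, survivors scaled by 2

drop.eval()
y_eval = drop(x)    # identity: no entries are dropped

print(torch.equal(y_eval, x))                 # True
print((y_train == 0).float().mean().item())   # close to 0.5
```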
Conclusion
Regularization techniques are crucial for training robust and generalizable models. By incorporating L2 regularization and adjusting dropout probabilities, you can effectively combat overfitting and improve the performance of your RNN classifier on unseen data. Experiment with different values of these hyperparameters to find the optimal settings for your specific problem.