Gated Residual Network (GRN) for Time Series Data with TemporalLayer
import torch.nn as nn


class GatedResidualNetwork(nn.Module):
    '''
    The Gated Residual Network gives the model flexibility to apply non-linear
    processing only when needed. It is difficult to know beforehand which
    variables are relevant and in some cases simpler models can be beneficial.

    GRN(a, c) = LayerNorm(a + GLU(eta_1)) # Dropout is applied to eta_1
    eta_1 = W_1*eta_2 + b_1
    eta_2 = ELU(W_2*a + W_3*c + b_2)

    Args:
        input_size (int): Size of the input
        hidden_size (int): Size of the hidden layer
        output_size (int): Size of the output layer
        dropout (float): Fraction between 0 and 1 corresponding to the degree of dropout used
        context_size (int): Size of the static context vector
        is_temporal (bool): Flag to decide if TemporalLayer has to be used or not
    '''
    def __init__(self, input_size, hidden_size, output_size, dropout, context_size=None, is_temporal=True):
        super().__init__()
        self.input_size = input_size
        self.output_size = output_size
        self.context_size = context_size
        self.hidden_size = hidden_size
        self.dropout = dropout
        self.is_temporal = is_temporal

        # GLU is assumed to be a custom gated linear unit module defined elsewhere (not shown).
        if self.is_temporal:
            if self.input_size != self.output_size:
                self.skip_layer = TemporalLayer(nn.Linear(self.input_size, self.output_size))
            # Context vector c
            if self.context_size is not None:
                self.c = TemporalLayer(nn.Linear(self.context_size, self.hidden_size, bias=False))
            # Dense & ELU
            self.dense1 = TemporalLayer(nn.Linear(self.input_size, self.hidden_size))
            self.elu = nn.ELU()
            # Dense & Dropout
            self.dense2 = TemporalLayer(nn.Linear(self.hidden_size, self.output_size))
            self.dropout = nn.Dropout(self.dropout)
            # Gate, Add & Norm
            self.gate = TemporalLayer(GLU(self.output_size))
            # Note: despite the attribute name, this applies batch normalization
            self.layer_norm = TemporalLayer(nn.BatchNorm1d(self.output_size))
        else:
            if self.input_size != self.output_size:
                self.skip_layer = nn.Linear(self.input_size, self.output_size)
            # Context vector c
            if self.context_size is not None:
                self.c = nn.Linear(self.context_size, self.hidden_size, bias=False)
            # Dense & ELU
            self.dense1 = nn.Linear(self.input_size, self.hidden_size)
            self.elu = nn.ELU()
            # Dense & Dropout
            self.dense2 = nn.Linear(self.hidden_size, self.output_size)
            self.dropout = nn.Dropout(self.dropout)
            # Gate, Add & Norm
            self.gate = GLU(self.output_size)
            # Note: despite the attribute name, this applies batch normalization
            self.layer_norm = nn.BatchNorm1d(self.output_size)

    def forward(self, x, c=None):
        '''
        Args:
            x (torch.Tensor): tensor that passes through the GRN
            c (torch.Tensor): Optional static context vector
        '''
        if self.input_size != self.output_size:
            a = self.skip_layer(x)
        else:
            a = x
        x = self.dense1(x)
        if c is not None:
            c = self.c(c.unsqueeze(1))
            x += c
        eta_2 = self.elu(x)
        eta_1 = self.dense2(eta_2)
        eta_1 = self.dropout(eta_1)
        gate = self.gate(eta_1)
        gate += a
        x = self.layer_norm(gate)
        return x

class TemporalLayer(nn.Module):
    '''
    Collapses input of dim T*N*H to (T*N)*H, and applies the wrapped module to it.
    Allows handling of variable sequence lengths and minibatch sizes.
    Similar to TimeDistributed in Keras, it is a wrapper that makes it possible
    to apply a layer to every temporal slice of an input.
    '''
    def __init__(self, module):
        super().__init__()
        self.module = module

    def forward(self, x):
        '''
        Args:
            x (torch.Tensor): tensor with time steps to pass through the same layer.
        '''
        t, n = x.size(0), x.size(1)
        x = x.reshape(t * n, -1)
        x = self.module(x)  # Error occurs here
        x = x.reshape(t, n, x.size(-1))
        return x

# Explanation of the Error and How to Fix It
The error message 'two matrices cannot be multiplied' raised at `x = self.module(x)` in the TemporalLayer class (PyTorch typically words it as `mat1 and mat2 shapes cannot be multiplied`) means that the input passed to the wrapped module has the wrong shape for the module's weight matrix. After `x.reshape(t * n, -1)`, the last dimension of `x` must equal the `in_features` of the wrapped `nn.Linear`. The multiplication fails if the incoming tensor has a different feature size, or if it has more than three dimensions, because the reshape then folds the extra dimensions into the last one.
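
To make the failure concrete, here is a minimal reproduction. It is a sketch under stated assumptions: the sizes (8 input features, 4 output features, `T=5`, `N=3`) are arbitrary illustration values, not taken from the original model, and the `TemporalLayer` shown is a compact copy of the class above.

```python
import torch
import torch.nn as nn


class TemporalLayer(nn.Module):
    """Compact copy of the TemporalLayer above (same reshaping logic)."""

    def __init__(self, module):
        super().__init__()
        self.module = module

    def forward(self, x):
        t, n = x.size(0), x.size(1)
        x = x.reshape(t * n, -1)  # (T*N, H)
        x = self.module(x)
        return x.reshape(t, n, x.size(-1))


# Illustration values only: the wrapped layer expects H == 8.
layer = TemporalLayer(nn.Linear(8, 4))

# Matching feature size: works and returns (T, N, 4).
ok = layer(torch.randn(5, 3, 8))
print(ok.shape)

# Mismatched feature size (H == 6): the matrix multiplication inside nn.Linear fails.
try:
    layer(torch.randn(5, 3, 6))
    error_message = None
except RuntimeError as err:
    error_message = str(err)
print(error_message)
```

The second call fails inside `self.module(x)`, which is the same line flagged in the original code.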
To resolve the error, follow these steps:
1. **Inspect the `self.module`:**
- Check if the wrapped `self.module` is designed to work with batched input (e.g., `nn.Linear` is suitable, but `nn.Conv1d` might not be if used directly without additional modifications).
- If the module itself requires a specific input shape, ensure that the reshaping in `TemporalLayer` aligns with those requirements.
2. **Verify Input Shape:**
- Ensure that the `x` tensor coming into `TemporalLayer` has the expected shape of `(T, N, H)`, where `T` is the number of time steps, `N` is the batch size, and `H` is the feature dimension.
3. **Debug Output Shapes:**
- Insert print statements within `TemporalLayer` to examine the shapes of `x` before and after reshaping, as well as the output shape of `self.module(x)`. This will help you pinpoint the mismatch.
4. **Adapt Module or Reshaping:**
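- If the `self.module` is incompatible with the reshaped input, modify it or wrap it differently. `TemporalLayer` already plays the role of a TimeDistributed wrapper, so a module that needs the time axis intact (such as a convolution) needs a wrapper that permutes dimensions rather than flattening them.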
- If the input shape is incorrect, modify the code that generates `x` to provide the expected `(T, N, H)` shape.
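
The checks in the steps above can be sketched as a debugging variant of the wrapper. `CheckedTemporalLayer` is a hypothetical name introduced here for illustration. It assumes the wrapped module exposes `in_features` (as `nn.Linear` does) and simply skips the feature check otherwise.

```python
import torch
import torch.nn as nn


class CheckedTemporalLayer(nn.Module):
    """Hypothetical debugging variant of TemporalLayer with explicit shape checks."""

    def __init__(self, module):
        super().__init__()
        self.module = module

    def forward(self, x):
        # Step 2: the input must be (T, N, H).
        if x.dim() != 3:
            raise ValueError(f"expected a (T, N, H) tensor, got shape {tuple(x.shape)}")
        t, n, h = x.shape
        # Step 1: compare H with what the wrapped module expects (nn.Linear sets in_features).
        expected = getattr(self.module, "in_features", None)
        if expected is not None and h != expected:
            raise ValueError(
                f"feature size H={h} does not match in_features={expected} "
                f"of {self.module.__class__.__name__}"
            )
        out = self.module(x.reshape(t * n, h))
        return out.reshape(t, n, out.size(-1))


layer = CheckedTemporalLayer(nn.Linear(8, 4))
out = layer(torch.randn(5, 3, 8))  # passes both checks

try:
    layer(torch.randn(5, 3, 6))  # wrong feature size
    message = None
except ValueError as err:
    message = str(err)
print(message)
```

Swapping this class in temporarily for `TemporalLayer` turns the generic matrix multiplication error into a message that names the mismatched sizes.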
**Example Modification:**
Let's say the error arises because `self.module` is a `nn.Conv1d` layer, which expects a 3D input with the time dimension as the last one. You could fix it by modifying `TemporalLayer` to transpose the input before passing it to the convolution:
```python
class TemporalLayer(nn.Module):
    # ... (__init__ remains the same)
    def forward(self, x):
        # x: (T, N, H). Conv1d expects (N, C_in, L), so use H as channels and T as length.
        x = x.permute(1, 2, 0)  # (N, H, T)
        x = self.module(x)      # (N, C_out, T_out)
        # Move time back to the first dimension: (T_out, N, C_out)
        x = x.permute(2, 0, 1)
        return x
```
By carefully analyzing the shapes and addressing the mismatch, you can resolve the error and ensure that the TemporalLayer class functions correctly for your time series data.
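
As a quick end-to-end shape check for the convolution case, the following self-contained sketch may help. `TemporalConv` is a hypothetical name, and the layer sizes are illustration values. It assumes the input is laid out as `(T, N, H)` and that the features `H` act as the convolution's input channels.

```python
import torch
import torch.nn as nn


class TemporalConv(nn.Module):
    """Hypothetical wrapper that applies a Conv1d along the time axis of a (T, N, H) tensor."""

    def __init__(self, conv):
        super().__init__()
        self.conv = conv

    def forward(self, x):
        x = x.permute(1, 2, 0)     # (T, N, H) -> (N, H, T)
        x = self.conv(x)           # (N, C_out, T_out)
        return x.permute(2, 0, 1)  # (T_out, N, C_out)


# kernel_size=3 with padding=1 keeps the sequence length unchanged.
conv = TemporalConv(nn.Conv1d(in_channels=8, out_channels=4, kernel_size=3, padding=1))
out = conv(torch.randn(5, 3, 8))
print(out.shape)  # torch.Size([5, 3, 4])
```

Unlike the flattening approach, this keeps the time axis intact, so the convolution can actually see neighbouring time steps.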