Enhancing ResNet with 3x3 Convolutions and Average Pooling

This article describes ResNet-D, a variant of the ResNet architecture that makes two modifications to the original design:

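Replacing the 7x7 Convolution: The initial 7x7 convolutional layer of ResNet is replaced with three consecutive 3x3 convolutional layers. The stack covers a receptive field at least as large as that of the 7x7 layer (exactly 7x7 when all three layers use stride 1), and each 3x3 kernel needs 9 weights per input-output channel pair instead of 49. Whether the stem as a whole becomes cheaper depends on the channel widths chosen for the three layers. The stem channel count is set to 64, the same width as the output of the original 7x7 stem.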
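Introducing Average Pooling: In the transition blocks that downsample the feature map, the original shortcut is a 1x1 convolution with stride 2, which reads only one of every four input positions. ResNet-D inserts a 2x2 average pooling layer with stride 2 into the shortcut before the 1x1 convolution, and the 1x1 convolution's stride becomes 1. Downsampling is then done by the pooling layer, so every input activation contributes to the shortcut output instead of three quarters of them being skipped.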
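The receptive-field and cost trade-off of the first modification can be checked with a few lines of arithmetic. The following Python sketch assumes that the three 3x3 layers use strides 2, 1, 1 and output widths 32, 32 and 64; the article specifies only the 64-channel stem, so these values, like the helper names `receptive_field` and `macs_per_position`, are illustrative assumptions.

```python
# Sketch: receptive field and per-position cost of the 7x7 stem vs. a stack
# of three 3x3 convolutions. Strides (2, 1, 1) and widths (3->32->32->64) are
# ASSUMPTIONS for illustration, not values taken from the article.

def receptive_field(layers):
    """Receptive field, in input pixels, of a stack of (kernel, stride) conv layers."""
    rf, jump = 1, 1  # start from one input pixel; jump = input distance between adjacent outputs
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


def macs_per_position(layers):
    """Multiply-accumulates per output position for (kernel, c_in, c_out) layers.

    Valid for comparing totals only when every layer runs at the same spatial
    resolution, which holds here because only the first layer downsamples.
    """
    return sum(k * k * c_in * c_out for k, c_in, c_out in layers)


# Original ResNet stem: one 7x7 conv, stride 2, 3 -> 64 channels.
orig_rf = receptive_field([(7, 2)])
orig_macs = macs_per_position([(7, 3, 64)])

# Deep stem: three 3x3 convs, only the first with stride 2 (assumed widths).
deep_rf = receptive_field([(3, 2), (3, 1), (3, 1)])
deep_macs = macs_per_position([(3, 3, 32), (3, 32, 32), (3, 32, 64)])

# All-stride-1 stack: the classic "three 3x3 layers see a 7x7 window" case.
stride1_rf = receptive_field([(3, 1), (3, 1), (3, 1)])

print(orig_rf, deep_rf, stride1_rf)  # 7 11 7
print(orig_macs, deep_macs)          # 9408 28512
```

With these assumed widths the stacked stem sees a larger window (11 vs. 7 input pixels) but costs about three times as many multiply-accumulates per output position, even though each individual 3x3 kernel is cheaper than a 7x7 one. Narrower intermediate layers would shift that balance.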
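The shortcut change in the second modification can be illustrated without a deep learning framework. The pure-Python sketch below stores feature maps as channel-first nested lists and uses a made-up single-channel 4x4 input with an identity 1x1 weight, so the outputs show exactly which inputs each shortcut reads. The function names `conv1x1`, `avg_pool2x2`, `shortcut_original` and `shortcut_resnet_d` are hypothetical.

```python
# Sketch of the downsampling shortcut on nested lists laid out as
# [channel][row][col]. Input values and weights are illustrative only.

def conv1x1(x, weights, stride=1):
    """1x1 convolution with weights[c_out][c_in]; stride skips rows and columns."""
    height, width = len(x[0]), len(x[0][0])
    return [
        [
            [sum(wc * x[ci][i][j] for ci, wc in enumerate(w_out)) for j in range(0, width, stride)]
            for i in range(0, height, stride)
        ]
        for w_out in weights
    ]


def avg_pool2x2(x):
    """2x2 average pooling with stride 2 (assumes even height and width)."""
    height, width = len(x[0]), len(x[0][0])
    return [
        [
            [(c[i][j] + c[i][j + 1] + c[i + 1][j] + c[i + 1][j + 1]) / 4 for j in range(0, width, 2)]
            for i in range(0, height, 2)
        ]
        for c in x
    ]


def shortcut_original(x, weights):
    # Original ResNet: a strided 1x1 conv reads only 1 of every 4 positions.
    return conv1x1(x, weights, stride=2)


def shortcut_resnet_d(x, weights):
    # ResNet-D: 2x2 average pool with stride 2, then a 1x1 conv with stride 1.
    return conv1x1(avg_pool2x2(x), weights, stride=1)


x = [[[4 * i + j for j in range(4)] for i in range(4)]]  # one channel, values 0..15
identity = [[1.0]]  # identity 1x1 weight so outputs reveal which inputs were used

print(shortcut_original(x, identity))  # [[[0.0, 2.0], [8.0, 10.0]]]
print(shortcut_resnet_d(x, identity))  # [[[2.5, 4.5], [10.5, 12.5]]]
```

Both shortcuts halve the spatial size, but the strided version depends only on the inputs at even rows and columns (0, 2, 8, 10), whereas each ResNet-D output averages a full 2x2 window, so all 16 input activations contribute.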

These modifications are adopted from ResNet-D [13], where they are reported to improve image classification accuracy.
