ResNet50 with CBAM: Enhancing Feature Extraction by Spatial Attention

Here, MS is the spatial attention module, F represents the operation of the convolutional layer, and F' represents the feature obtained after passing through the spatial attention module. The CBAM module is integrated into each residual block of ResNet50. When downsampling is needed, the classical ResNet model uses a 1×1 convolution with a stride of 2, which inevitably loses information: a 1×1 kernel moved with a stride of 2 reads only one of every four spatial positions of its input. This paper therefore avoids 1×1 convolutions with a stride of 2. Figure 3 shows the original residual block structure (a) and the modified structure (b). In addition to introducing the CBAM module, the modified block moves the downsampling in the residual component from the 1×1 convolution to the 3×3 convolution. In the direct mapping (shortcut) component, when downsampling is needed, the convolution kernel is enlarged from 1×1 to 2×2.
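The information-loss argument can be checked with a small calculation. The sketch below is illustrative only and is not taken from the paper's implementation: the helper names `output_size` and `input_coverage`, the 56×56 feature-map size, and the padding of 1 for the 3×3 convolution are assumptions chosen for the example. It counts which input positions are read by at least one kernel window for each of the three downsampling settings discussed above.

```python
def output_size(n, kernel, stride, padding=0):
    """Output length along one axis for a convolution (floor convention)."""
    return (n + 2 * padding - kernel) // stride + 1


def input_coverage(n, kernel, stride, padding=0):
    """Fraction of an n x n input read by at least one kernel window.

    The kernel is square, so the set of touched rows equals the set of
    touched columns, and the 2-D coverage is the 1-D coverage squared.
    """
    touched = set()
    for o in range(output_size(n, kernel, stride, padding)):
        start = o * stride - padding  # first input index under this window
        for k in range(kernel):
            i = start + k
            if 0 <= i < n:  # ignore positions that fall in the zero padding
                touched.add(i)
    return (len(touched) / n) ** 2


if __name__ == "__main__":
    n = 56  # assumed feature-map size, for illustration only
    settings = [
        ("1x1, stride 2 (classical)", 1, 2, 0),
        ("2x2, stride 2 (modified direct mapping)", 2, 2, 0),
        ("3x3, stride 2, padding 1 (modified residual)", 3, 2, 1),
    ]
    for name, kernel, stride, padding in settings:
        out = output_size(n, kernel, stride, padding)
        cov = input_coverage(n, kernel, stride, padding)
        print(f"{name}: output {out}x{out}, input coverage {cov:.0%}")
```

Under these assumptions, all three settings halve a 56×56 map to 28×28, but the 1×1 kernel with a stride of 2 reads only 25% of the input positions, whereas the 2×2 and 3×3 kernels read every position. This is the sense in which moving the stride to larger kernels avoids discarding input.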