ResNet50 with CBAM: Enhancing Feature Extraction and Reducing Information Loss

Here, M_S denotes the spatial attention module, F represents the convolutional layer operation, and F' represents the features obtained after the spatial attention module. The CBAM module is integrated into each residual block of ResNet50. When downsampling is needed, the classic ResNet model applies a 1×1 convolution with a stride of 2, which inevitably loses information: a 1×1 kernel moved in steps of 2 reads only one of every four spatial positions and discards the rest. This paper therefore avoids 1×1 convolutions with a stride of 2. Figure 3 shows the original residual block (a) and the modified residual block (b). Besides introducing the CBAM module, the modified block moves the downsampling in the residual component from the 1×1 convolution to the 3×3 convolution. In the additional mapping component, the convolution kernel size is changed to 2×2 when downsampling is needed.
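The information-loss argument can be checked with a small counting sketch. It is not the paper's implementation. It only measures which input positions each downsampling convolution reads. The 56×56 input size, the padding of 1 for the 3×3 convolution, the stride of 2 for the 2×2 mapping convolution, and the helper name `coverage` are all assumptions made for illustration.

```python
def coverage(size, kernel, stride, padding=0):
    """Return (output_size, fraction of input positions read by some window).

    Models a square convolution over a size x size input. Only the
    sampling pattern is modelled, not the weights or channel counts.
    """
    out = (size + 2 * padding - kernel) // stride + 1
    seen = set()
    for i in range(out):
        for j in range(out):
            for di in range(kernel):
                for dj in range(kernel):
                    r = i * stride - padding + di
                    c = j * stride - padding + dj
                    # Positions outside the input are padding, not data.
                    if 0 <= r < size and 0 <= c < size:
                        seen.add((r, c))
    return out, len(seen) / (size * size)


if __name__ == "__main__":
    size = 56  # assumed feature-map size, for illustration only
    cases = [
        ("1x1, stride 2 (classic downsampling)", 1, 2, 0),
        ("3x3, stride 2, padding 1 (assumed residual component)", 3, 2, 1),
        ("2x2, stride 2 (assumed mapping component)", 2, 2, 0),
    ]
    for name, kernel, stride, padding in cases:
        out, frac = coverage(size, kernel, stride, padding)
        print(f"{name}: output {out}x{out}, input coverage {frac:.0%}")
```

Under these assumptions all three convolutions halve the feature map to 28×28, but the 1×1 stride-2 kernel reads only 25% of the input positions, while the 3×3 and 2×2 kernels read every position.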