ResNet50 with CBAM for Image Classification: An Enhanced Residual Block Design

Here, M_S is the spatial attention module, F denotes the operation of the convolutional layer, and F' denotes the feature obtained after passing through the spatial attention module. The CBAM module is integrated into each residual block of ResNet50. When downsampling is required, the classic ResNet model applies a convolution with a 1×1 kernel and a stride of 2. This inevitably loses information, because the kernel samples only one of every four spatial positions. This paper therefore avoids 1×1 convolutions with a stride of 2. Figure 3 shows the original residual block (a) and the modified residual block (b). Besides introducing the CBAM module, the modified block replaces the 1×1 convolution used for downsampling in the residual branch with a 3×3 convolution. In the direct mapping (shortcut) branch, the convolution kernel is enlarged from 1×1 to 2×2 when downsampling is required.
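
The information-loss argument can be checked with a short calculation. The following is a minimal sketch, not the implementation used in this paper: the helper `covered_fraction`, the 56×56 feature-map size, and the padding of 1 for the 3×3 convolution are illustrative assumptions. It counts what fraction of the input positions is read by at least one kernel window for each stride-2 configuration discussed above.

```python
def covered_fraction(size, kernel, stride, padding=0):
    """Return (output_size, fraction of input positions read by some window).

    Hypothetical helper for illustration. It assumes a square input of
    size x size and a square kernel, so 2-D coverage is the square of
    the 1-D coverage.
    """
    # Standard convolution output-size formula (floor division).
    out = (size + 2 * padding - kernel) // stride + 1
    covered = set()
    for i in range(out):
        for k in range(kernel):
            pos = i * stride + k - padding  # input index read by this tap
            if 0 <= pos < size:             # ignore taps that fall in the padding
                covered.add(pos)
    frac_1d = len(covered) / size
    return out, frac_1d ** 2


# Assumed 56x56 feature map; all three variants halve the resolution.
print(covered_fraction(56, kernel=1, stride=2))             # classic 1x1 shortcut
print(covered_fraction(56, kernel=2, stride=2))             # modified 2x2 shortcut
print(covered_fraction(56, kernel=3, stride=2, padding=1))  # 3x3 residual branch
```

Under these assumptions the 1×1, stride-2 convolution reads a quarter of the input positions, while the 2×2 and 3×3 variants read every position and produce the same 28×28 output.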