Convolutional Block Attention Module (CBAM): An Adaptive Attention Mechanism for Deep Neural Networks
Woo et al. proposed CBAM [18], a novel convolutional attention module that guides the neural network to focus adaptively on relevant information. While extensive research has improved deep neural networks by increasing their depth and width, CBAM instead improves representational power through attention. As shown in Figure 2, CBAM is a typical attention mechanism module consisting of two sequential sub-modules, the channel attention module (CAM) and the spatial attention module (SAM), which adaptively refine the input features along the channel and spatial dimensions, respectively. The CAM receives the input features first. To compute channel attention efficiently, the spatial dimensions of the feature map are aggregated with global max pooling and global average pooling, producing two channel descriptors. Both descriptors are then passed through a shared multi-layer perceptron to learn the optimized weights, and the two outputs are summed element-wise. A sigmoid activation compresses the result to the range (0, 1), and the resulting channel weights are multiplied with the input features, as shown in Equations (1) and (2).
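The channel attention computation described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function name `channel_attention`, the weight matrices `w1`/`w2` standing in for the shared MLP, and the reduction ratio implied by their shapes are all assumptions made for clarity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """CBAM-style channel attention (illustrative sketch).

    x  : feature map of shape (C, H, W)
    w1 : first MLP layer weights, shape (C // r, C), r = reduction ratio
    w2 : second MLP layer weights, shape (C, C // r)
    """
    # Aggregate spatial information with global average and max pooling,
    # yielding two channel descriptors of shape (C,).
    avg_desc = x.mean(axis=(1, 2))
    max_desc = x.max(axis=(1, 2))

    # Shared two-layer MLP with a ReLU hidden layer, applied to both
    # descriptors with the same weights.
    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)

    # Sum the two outputs and squash to (0, 1) with a sigmoid,
    # giving one attention weight per channel.
    attn = sigmoid(mlp(avg_desc) + mlp(max_desc))

    # Rescale the input features channel-wise.
    return x * attn[:, None, None]
```

Because the sigmoid bounds each channel weight strictly between 0 and 1, the refined features are a per-channel scaling of the input; in a full CBAM block this output would then be passed to the spatial attention module.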