Multi-Scale Feature Fusion: Weight Calculation and its Impact on Encoder Performance

After the multi-scale feature fusion operation, the weight WF is calculated as follows:

Here ψ denotes the ReLU function, ⊕ denotes the concatenation operation, and δ denotes the sigmoid function. The ReLU function and the concatenation operation transform the feature information from different levels into weights, while the sigmoid function ensures that the values of WF lie within the range [0, 1]. The fused feature is then obtained from the following equation:
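
Under one reading consistent with these definitions, the weight and the fused feature could be written as below. The symbols F_1, ..., F_4 for the four stage features and ⊗ for element-wise multiplication are notation assumed here, and any convolution or projection layers that may sit between the operations are omitted because the text does not specify them.

```latex
\begin{align}
W_F &= \delta\bigl(\psi(F_1 \oplus F_2 \oplus F_3 \oplus F_4)\bigr), \\
F   &= W_F \otimes (F_1 + F_2 + F_3 + F_4).
\end{align}
```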

F is obtained by multiplying the sum of the features from the four stages by the weight WF. The motivation is that, in multi-scale features, each stage emphasizes different semantic and positional information. Weighting the summed features lets the fusion draw on every stage and make effective use of information at different scales, in particular retaining more low-level information. As a result, the encoder can produce better outputs.
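
The weighting described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the four stage features are assumed to be already resampled to a common shape (C, H, W), and the function name `fuse_stages` and the projection matrix `proj` (a stand-in for a 1x1 convolution that maps the concatenated 4C channels back to C so that the weight matches the shape of the summed features) are hypothetical.

```python
import numpy as np


def relu(x):
    """psi in the text: element-wise ReLU."""
    return np.maximum(x, 0.0)


def sigmoid(x):
    """delta in the text: element-wise sigmoid, output between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-x))


def fuse_stages(stages, proj):
    """Weight the sum of the stage features by W_F.

    stages: list of arrays of shape (C, H, W), one per encoder stage
            (assumed already resampled to a common shape).
    proj:   array of shape (C, len(stages) * C); hypothetical channel
            projection, not described in the text.
    Returns (fused, w_f), both of shape (C, H, W).
    """
    concat = np.concatenate(stages, axis=0)            # concatenation: (4C, H, W)
    hidden = relu(concat)                              # ReLU on the concatenated features
    logits = np.einsum("oc,chw->ohw", proj, hidden)    # assumed projection back to C channels
    w_f = sigmoid(logits)                              # weight W_F in [0, 1]
    fused = w_f * np.sum(stages, axis=0)               # F = W_F * (F1 + F2 + F3 + F4)
    return fused, w_f


# Usage with random stand-in features for four stages.
rng = np.random.default_rng(0)
stages = [rng.standard_normal((8, 4, 4)) for _ in range(4)]
proj = 0.1 * rng.standard_normal((8, 32))
fused, w_f = fuse_stages(stages, proj)
print(fused.shape, float(w_f.min()), float(w_f.max()))
```

Because W_F is computed per channel and per position, the same summed feature can be attenuated in some locations and passed through almost unchanged in others, which is one way to read the claim that the weighting selects useful information across scales.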
