Understanding ResNest Blocks: Feature Map Splitting and Convolutional Transformations for Enhanced Efficiency

The ResNest block enhances efficiency in deep learning models by strategically dividing the input feature map. This process involves two key hyperparameters: K (number of groups) and R (number of splits per group).

In a ResNest block, the input feature map is divided into G = K × R segments. Through ablation experiments, this study identified K = 2 and R = 2 (so G = 4) as the most efficient configuration. Each segment then undergoes its own series of convolutional transformations, denoted {F_1, F_2, …, F_G}.
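The split into G = K × R segments can be sketched in a few lines of Python. This is only an illustration of the bookkeeping: the function name split_channels, the 64-channel input, and the choice of contiguous, equal-width segments are assumptions made for the example, not details given in the text.

```python
def split_channels(num_channels, K=2, R=2):
    """Partition channel indices 0..num_channels-1 into G = K * R segments.

    Returns a nested list where groups[k][r] holds the channel indices
    of split r inside group k. Contiguous, equal-width segments are an
    assumption of this sketch.
    """
    G = K * R
    if num_channels % G != 0:
        raise ValueError("num_channels must be divisible by G = K * R")
    width = num_channels // G  # channels per segment
    groups = []
    for k in range(K):
        splits = []
        for r in range(R):
            start = (k * R + r) * width
            splits.append(list(range(start, start + width)))
        groups.append(splits)
    return groups


# Hypothetical 64-channel input with the K = 2, R = 2 configuration.
groups = split_channels(64, K=2, R=2)
segment_sizes = [len(seg) for group in groups for seg in group]
```

With K = 2 and R = 2, the 64 channels fall into four segments of 16 channels each, and each segment would then be handed to its own transformation F_1 … F_4.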

As depicted in Figure 2, the process begins with a 1x1 convolution, after which the feature maps are divided into subsets denoted x_i, where i ∈ {1, 2, …, s}. Each subset x_i has the same spatial size as the input feature map but only 1/s of its channels. The first subset, x_1, is the exception: it has no corresponding 3x3 convolution.

Each remaining subset x_i (i ≥ 2) passes through its own 3x3 convolution, denoted K_i(), which produces the output y_i. To let information flow between subsets, x_i is first added to the output of the preceding convolution K_(i-1)(), and the sum is then fed into K_i(). These transformed features are collectively represented as U_i() and can be expressed mathematically as shown in equation (1).
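The data flow described above can be traced with a small Python sketch. It is a toy illustration under stated assumptions: hierarchical_outputs is a hypothetical helper, plain numbers stand in for feature subsets, and a doubling function stands in for each 3x3 convolution K_i(). It also assumes that x_2 goes into K_2() unchanged, since x_1 has no convolution whose output could be added to it.

```python
def hierarchical_outputs(xs, convs):
    """Compute y_1..y_s from subsets x_1..x_s.

    xs    : list of s feature subsets (any objects supporting `+`)
    convs : list of s - 1 callables standing in for K_2 .. K_s
    """
    if len(convs) != len(xs) - 1:
        raise ValueError("need one transformation per subset except x_1")
    ys = [xs[0]]  # y_1 = x_1: the first subset skips the 3x3 convolution
    prev = None   # output of the preceding convolution K_(i-1), if any
    for x, conv in zip(xs[1:], convs):
        # Assumption: x_2 has no preceding convolution output to add.
        inp = x if prev is None else x + prev
        prev = conv(inp)  # y_i = K_i(x_i + y_(i-1)) for i > 2
        ys.append(prev)
    return ys


# Toy stand-ins: scalars for feature maps, doubling for each 3x3 convolution.
xs = [1.0, 2.0, 3.0, 4.0]
convs = [lambda v: 2 * v] * 3
ys = hierarchical_outputs(xs, convs)
# ys == [1.0, 4.0, 14.0, 36.0]
```

Each output after y_2 depends on every earlier subset that has passed through a convolution, which is how the chained additions let later subsets draw on a progressively larger receptive field.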
