The ResNest block improves efficiency by dividing the input feature map into segments that are processed separately. Two hyperparameters control this division: K (the number of groups) and R (the number of splits per group).

In a ResNest block, the input feature map is divided into G = KR segments. The ablation experiments in this study identified K = 2 and R = 2 (so G = 4) as the most efficient configuration. Each segment then undergoes a series of convolutional transformations, denoted {F_1, F_2, …, F_G}.
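The segment count can be made concrete with a small sketch. It assumes the segments are contiguous, equal-width channel ranges and uses a hypothetical 64-channel input; neither detail is stated in the text, and real implementations may order the channels differently. The function name `segment_ranges` is illustrative only.

```python
def segment_ranges(num_channels, K, R):
    """Return (start, stop) channel ranges for G = K * R equal segments.

    The result is a list of K groups, each holding R splits.
    Contiguous, equal-width ranges are an assumption of this sketch.
    """
    G = K * R
    if num_channels % G != 0:
        raise ValueError("num_channels must be divisible by K * R")
    width = num_channels // G
    groups = []
    for k in range(K):
        splits = []
        for r in range(R):
            start = (k * R + r) * width
            splits.append((start, start + width))
        groups.append(splits)
    return groups


# K = 2, R = 2 is the configuration reported in the text; 64 channels is hypothetical.
groups = segment_ranges(num_channels=64, K=2, R=2)
for k, splits in enumerate(groups, start=1):
    print(f"group {k}: {splits}")
```

With K = 2 and R = 2, the 64 hypothetical channels fall into four segments of 16 channels each, two per group.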

As depicted in Figure 2, the process begins with a 1x1 convolution, after which the feature maps are divided into s subsets denoted x_i, where i ∈ {1, 2, …, s}. Each subset x_i has the same spatial size as the input feature map but only 1/s of its channels. The first subset, x_1, is an exception: it does not undergo a corresponding 3x3 convolution.

Each remaining subset x_i is passed through its own 3x3 convolution, denoted K_i(), and the output is denoted y_i. To facilitate feature interaction and information flow between subsets, x_i is added to the output of K_(i-1)() before being fed into K_i(). The transformed features are collectively denoted U_i() and are expressed mathematically in equation (1).
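The data flow described above can be sketched in a few lines of Python. This is a toy illustration, not the block's actual implementation. Each subset is a flat list of floats rather than a real feature map, and the 3x3 convolutions are replaced by hypothetical stand-in functions that simply scale their input. Two details are assumptions, because the text does not state them explicitly: x_1 is passed through unchanged as y_1, and K_2 receives x_2 alone because there is no K_1 output to add. Under those assumptions the recurrence is y_1 = x_1, y_2 = K_2(x_2), and y_i = K_i(x_i + y_(i-1)) for i > 2.

```python
def hierarchical_transform(subsets, convs):
    """Compute y_1..y_s from x_1..x_s.

    subsets: [x_1, ..., x_s], each a flat list of floats (toy features).
    convs:   [K_2, ..., K_s], callables standing in for 3x3 convolutions.
    """
    if len(convs) != len(subsets) - 1:
        raise ValueError("need one transform per subset except x_1")
    outputs = [subsets[0]]  # assumption: x_1 passes through without a convolution
    prev = None             # output of K_(i-1); nothing to add before K_2 has run
    for x_i, K_i in zip(subsets[1:], convs):
        if prev is None:
            inp = x_i  # assumption: K_2 sees x_2 alone
        else:
            inp = [a + b for a, b in zip(x_i, prev)]  # x_i + output of K_(i-1)
        prev = K_i(inp)
        outputs.append(prev)
    return outputs


def make_scale(factor):
    """Hypothetical stand-in for a 3x3 convolution: scales every value."""
    return lambda features: [factor * v for v in features]


# s = 4 toy subsets with two values each.
subsets = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
convs = [make_scale(0.5) for _ in range(3)]
outputs = hierarchical_transform(subsets, convs)
for i, y_i in enumerate(outputs, start=1):
    print(f"y_{i} = {y_i}")
```

Because each y_i depends on y_(i-1), later subsets accumulate the effect of every earlier transformation, which is the feature interaction the paragraph describes.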

Understanding ResNest Blocks: Feature Map Splitting and Convolutional Transformations for Enhanced Efficiency

