Hence, the CNN backbone used in this study is Contextual Transformer Networks (CoTNet-50), an optimized network built on ResNet-50. The network structure and parameters of CoTNet-50 are presented in Table 1. CoTNet-50 uses a Transformer-style architecture that captures both global context and the context among adjacent positions in the image. This strengthens self-attention learning in a resource-efficient manner and thereby improves the expressive capacity of the output features. For a detailed explanation of CoTNet-50, please refer to [40].
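To make the relationship to ResNet-50 more concrete, the following is a minimal sketch that enumerates a bottleneck layout and marks where a CoT block would stand in for the 3x3 convolution. It assumes the standard ResNet-50 stage depths (3, 4, 6, 3) and channel widths, and it assumes that CoTNet-50 swaps the 3x3 convolution of every bottleneck for a CoT block. The names `Stage`, `RESNET50_STAGES`, `bottleneck_layers`, `describe_backbone`, and `count_spatial_ops` are hypothetical and used only for illustration. Table 1 and [40] remain the authoritative sources for the actual configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    blocks: int        # number of bottleneck blocks in the stage
    mid_channels: int  # width of the middle (spatial) layer
    out_channels: int  # width after the final 1x1 expansion


# Assumption: standard ResNet-50 layout; values in Table 1 take precedence.
RESNET50_STAGES = [
    Stage("res2", 3, 64, 256),
    Stage("res3", 4, 128, 512),
    Stage("res4", 6, 256, 1024),
    Stage("res5", 3, 512, 2048),
]


def bottleneck_layers(stage, spatial_op="CoT 3x3"):
    """Describe the three layers of one bottleneck block.

    Assumption: the middle layer, a 3x3 convolution in ResNet-50,
    is the one replaced by a CoT block in CoTNet-50.
    """
    return [
        f"1x1 conv, {stage.mid_channels}",
        f"{spatial_op}, {stage.mid_channels}",
        f"1x1 conv, {stage.out_channels}",
    ]


def describe_backbone(stages, spatial_op="CoT 3x3"):
    """Map each stage name to (block count, per-block layer list)."""
    return {s.name: (s.blocks, bottleneck_layers(s, spatial_op)) for s in stages}


def count_spatial_ops(stages):
    """Count the spatial layers, one per bottleneck block."""
    return sum(s.blocks for s in stages)


if __name__ == "__main__":
    for name, (n, layers) in describe_backbone(RESNET50_STAGES).items():
        print(f"{name}: [{' -> '.join(layers)}] x {n}")
```

Calling `describe_backbone(RESNET50_STAGES, spatial_op="3x3 conv")` gives the plain ResNet-50 layout for comparison, so the two backbones differ only in the middle layer of each bottleneck under the stated assumptions.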

Polished: Therefore, the CNN backbone network used in this paper is Contextual Transformer Networks (CoTNet-50), an optimized network based on ResNet-50, and its network structure and parameters are shown in

