U-Net with Transformers: Advancing Medical Image Segmentation

The U-shaped encoder-decoder architecture built on convolutional neural networks (CNNs) is widely used in medical image analysis. However, as mentioned earlier, CNN-based encoder-decoder networks struggle to capture the global information of medical images, because each convolution only sees a local neighborhood of its input. This limits their ability to handle complex medical image segmentation tasks. In recent years, several works have combined transformers with U-shaped networks. TransUNet adds transformer layers to the encoder of the U-shaped network, SwinUNet incorporates 12 layers of Swin Transformer blocks into the U-shaped network, and U-Transform adds multi-head self-attention (MHSA) and multi-head cross-attention (MHCA) modules to the classic UNet network.
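To make the idea of attention inside a U-shaped network more concrete, the following is a minimal NumPy sketch, not the implementation of any of the models named above. It flattens a feature map into tokens and applies generic multi-head attention, once as self-attention (queries, keys, and values all come from the same feature map) and once as cross-attention (queries come from a higher-resolution skip feature map, keys and values from the bottleneck). The function name `multi_head_attention`, the feature-map sizes, the head count, and the random projection weights standing in for learned parameters are all assumptions chosen for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention(queries, keys_values, num_heads, rng):
    """Generic multi-head attention (illustrative, untrained).

    queries:      (n_q, channels) tokens that ask for context
    keys_values:  (n_k, channels) tokens that provide context
    Returns the attended tokens (n_q, channels) and the attention
    weights (num_heads, n_q, n_k).
    """
    n_q, channels = queries.shape
    n_k = keys_values.shape[0]
    assert channels % num_heads == 0, "channels must divide evenly into heads"
    head_dim = channels // num_heads

    # Random projections stand in for learned weights (assumption).
    w_q, w_k, w_v, w_o = (
        rng.standard_normal((channels, channels)) / np.sqrt(channels)
        for _ in range(4)
    )

    def split_heads(x, n):
        # (n, channels) -> (num_heads, n, head_dim)
        return x.reshape(n, num_heads, head_dim).transpose(1, 0, 2)

    q = split_heads(queries @ w_q, n_q)
    k = split_heads(keys_values @ w_k, n_k)
    v = split_heads(keys_values @ w_v, n_k)

    # Every query token is scored against every key token, so each
    # output position can draw on the whole feature map.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    weights = softmax(scores, axis=-1)
    context = weights @ v  # (num_heads, n_q, head_dim)

    merged = context.transpose(1, 0, 2).reshape(n_q, channels)
    return merged @ w_o, weights


rng = np.random.default_rng(0)

# Hypothetical bottleneck feature map: 64 channels on an 8x8 grid.
bottleneck = rng.standard_normal((64, 8, 8))
channels, height, width = bottleneck.shape
tokens = bottleneck.reshape(channels, height * width).T  # (64 tokens, 64 channels)

# Self-attention: tokens attend to all other tokens of the same map.
sa_out, sa_weights = multi_head_attention(tokens, tokens, num_heads=4, rng=rng)
sa_map = sa_out.T.reshape(channels, height, width)  # back to a feature map

# Cross-attention: a hypothetical 16x16 skip feature map queries the bottleneck.
skip = rng.standard_normal((64, 16, 16))
skip_tokens = skip.reshape(64, -1).T  # (256 tokens, 64 channels)
ca_out, ca_weights = multi_head_attention(skip_tokens, tokens, num_heads=4, rng=rng)
```

In this sketch each row of the attention weights spans all 64 bottleneck positions, which is the global context that a single convolution lacks. A real model would add learned weights, positional encodings, normalization, and residual connections, and the published architectures differ in where and how they place these modules.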