
We use the aforementioned datasets to train and test several classical and widely used segmentation networks, as well as the segmentation networks proposed in this paper. U-Net is an image segmentation network built on the classic CNN encoder-decoder structure and has served as the foundation for many later image segmentation methods. The DeepLab series is another classical family of encoder-decoder segmentation networks; its atrous (dilated) convolution, which enlarges the receptive field without adding parameters, is regarded as one of its most important contributions [43]. PSPNet is an FCN-based network that introduces a pyramid pooling module, which aggregates features at different scales to capture more global context [44]. We therefore first compared RTMOSeg with U-Net, DeepLab v3+, and PSPNet.
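To make the two mechanisms named above concrete, here is a minimal NumPy sketch; it is not taken from the DeepLab or PSPNet code. `dilated_conv2d` samples the input every `rate` pixels, so a 3×3 kernel with `rate=2` covers a 5×5 region using the same nine weights. `pyramid_pool` average-pools a (C, H, W) feature map into 1×1, 2×2, 3×3, and 6×6 grids (the bin sizes used in PSPNet), upsamples each back to H×W, and concatenates the results with the input. Both function names are hypothetical, and the sketch assumes a single-channel kernel, no padding, no 1×1 channel-reduction convolutions, and nearest-neighbour rather than bilinear upsampling.

```python
import numpy as np

def dilated_conv2d(x, w, rate=1):
    """Single-channel atrous (dilated) 2D convolution, stride 1, no padding.
    Computed as cross-correlation, as deep-learning frameworks do."""
    kh, kw = w.shape
    eh, ew = (kh - 1) * rate + 1, (kw - 1) * rate + 1  # effective kernel extent
    H, W = x.shape
    out = np.zeros((H - eh + 1, W - ew + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + eh:rate, j:j + ew:rate]  # every `rate`-th pixel
            out[i, j] = np.sum(patch * w)            # same kh*kw weights at any rate
    return out

def pyramid_pool(feat, bins=(1, 2, 3, 6)):
    """PSPNet-style pyramid pooling on a (C, H, W) map (assumes H, W >= max(bins)):
    average-pool into b x b grids, upsample each back to H x W with nearest
    neighbour, and concatenate with the input along the channel axis."""
    C, H, W = feat.shape
    outs = [feat]
    for b in bins:
        pooled = np.zeros((C, b, b))
        for r in range(b):
            for c in range(b):
                r0, r1 = r * H // b, (r + 1) * H // b
                c0, c1 = c * W // b, (c + 1) * W // b
                pooled[:, r, c] = feat[:, r0:r1, c0:c1].mean(axis=(1, 2))
        rows = np.arange(H) * b // H  # grid cell index for each output row
        cols = np.arange(W) * b // W  # grid cell index for each output column
        outs.append(pooled[:, rows][:, :, cols])
    return np.concatenate(outs, axis=0)

x = np.arange(49, dtype=float).reshape(7, 7)
w = np.ones((3, 3))
y1 = dilated_conv2d(x, w, rate=1)  # 3x3 receptive field
y2 = dilated_conv2d(x, w, rate=2)  # 5x5 receptive field, still 9 weights
feat = np.random.default_rng(0).random((4, 12, 12))
pooled = pyramid_pool(feat)        # 4 input channels + 4 per pyramid level
```

On the 7×7 input, the same 3×3 kernel yields a 5×5 output at `rate=1` but a 3×3 output at `rate=2`, because the effective kernel extent grows to 5×5 while the weight count stays at nine. The 1×1 pyramid level is simply the global average of each channel broadcast over the whole map, which is where the "global information" comes from.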

With the adoption of Transformers in image processing, several high-performing Transformer-based segmentation networks have emerged, such as Swin-Unet and TransUNet. Swin-Unet is a Transformer-based architecture designed specifically for medical image segmentation. It follows a U-shaped structure consisting of an encoder, a bottleneck, a decoder, and skip connections, which together allow it to segment medical images accurately. TransUNet also adopts a U-shaped encoder-decoder structure. It incorporates a Transformer into the encoder, combining the Transformer with U-Net so that the network extracts abstract global information through the Transformer while using a CNN to capture local details. We therefore also compared RTMOSeg with Swin-Unet and TransUNet. We trained and tested all of these models separately on dataset 1 and dataset 2; the experimental results are presented in Table II and Table III.
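The claim that a Transformer encoder supplies global information rests on self-attention: once a CNN feature map is flattened into tokens, every output position is a weighted mix of all positions. The following is a minimal single-head NumPy sketch of that step; it is not TransUNet's implementation and omits multi-head projections, positional embeddings, layer normalization, the MLP block, and residual connections. Swin-Unet instead restricts attention to (shifted) local windows, which this sketch does not model. The function name `tokens_self_attention` and the random projection matrices `wq`, `wk`, `wv` are hypothetical.

```python
import numpy as np

def tokens_self_attention(feat, wq, wk, wv):
    """Single-head scaled dot-product self-attention over the H*W spatial
    positions of a (C, H, W) feature map; returns (output as (C, H, W), attn)."""
    C, H, W = feat.shape
    tokens = feat.reshape(C, H * W).T                 # (N, C): one token per pixel
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv   # projections, each (N, C)
    scores = q @ k.T / np.sqrt(C)                     # (N, N): all pairs of positions
    scores = scores - scores.max(axis=1, keepdims=True)  # stabilise the softmax
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=1, keepdims=True)     # each row sums to 1
    out = attn @ v                                    # each token mixes all tokens
    return out.T.reshape(C, H, W), attn

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))                 # toy CNN feature map
wq, wk, wv = (rng.standard_normal((8, 8)) for _ in range(3))  # hypothetical weights
out, attn = tokens_self_attention(feat, wq, wk, wv)
```

The attention matrix is N×N with N = H·W, so its cost grows quadratically with the number of spatial positions; this is why TransUNet applies the Transformer to downsampled CNN features and why Swin-Unet limits attention to windows. As an extreme case, setting `wq` and `wk` to zero makes the attention uniform, so every output pixel equals the channel-wise mean of the whole feature map.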