Depending on the application requirement, the main objective of surface classification is the division of regions or pixels in remotely sensed imagery.18 According to Reference 84, deep learning models for analyzing land cover usually rely on the sparse autoencoder (SAE),84 the deep belief network (DBN),71 or the convolutional neural network (CNN),84 as seen in Figure 4A-C; among these techniques, the CNN is the most prominent.18 Table 2 summarizes the deep learning-based image classification techniques used for various purposes. Traditionally, AlexNet and VGG Net have been used as deep convolutional neural networks. Classification with AlexNet and VGG Net transforms an image into a corresponding eigenvector through convolution, pooling, and fully connected layers; the values in this eigenvector represent the output of the image classification. The biggest problem with AlexNet and VGG Net is their difficulty in classifying integrated images.18 AlexNet was introduced in 2012.86 It can solve an image classification problem in which the input image belongs to one of 1000 classes and the output is a 1000-dimensional vector of numbers;86 the ith element of the output can be interpreted as the score for the ith class of the input image. Land cover classification falls under image segmentation: the main problem to be addressed is multi-class classification after linking each pixel of a single image to a class label.86 The issues of pixel-level image classification and multi-class classification are addressed by Reference 91, which implements the fully convolutional network (FCN) for semantic segmentation. The FCN builds on the CNN but replaces its fully connected layers with convolutional layers. A transposed convolution layer at the end of the FCN upsamples the image features so that the size of the output matches the input image, after which classification is performed.
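The FCN idea above can be sketched in a few lines: a 1x1 convolution acts as a per-pixel classifier in place of a fully connected layer, and a (here simplified, non-learned) upsampling step stands in for the transposed convolution that restores the spatial size lost to sub-sampling. This is a minimal NumPy illustration of the data flow, not the actual architecture of Reference 91; the channel counts and the five land-cover classes are hypothetical.

```python
import numpy as np

def conv2d_1x1(x, w):
    # x: (C_in, H, W) feature map; w: (C_out, C_in) 1x1 kernel.
    # A 1x1 convolution is a per-pixel linear map -- this is how an FCN
    # replaces the fully connected classifier of a CNN.
    return np.tensordot(w, x, axes=([1], [0]))  # -> (C_out, H, W)

def upsample(x, stride):
    # Nearest-neighbour stand-in for the learned transposed convolution:
    # it restores the spatial resolution lost to pooling/sub-sampling.
    return x.repeat(stride, axis=1).repeat(stride, axis=2)

# Toy feature map: 64 channels on an 8x8 grid, as if a 32x32 image
# had been sub-sampled by a factor of 4 in the encoder.
rng = np.random.default_rng(0)
feat = rng.standard_normal((64, 8, 8))
w = rng.standard_normal((5, 64))          # 5 hypothetical land-cover classes

scores = conv2d_1x1(feat, w)              # (5, 8, 8) per-pixel class scores
scores = upsample(scores, 4)              # (5, 32, 32), back to input size
label_map = scores.argmax(axis=0)         # one class label per pixel
print(label_map.shape)                    # (32, 32)
```

The key point is that no flattening occurs, so every pixel of the output carries its own class label, which is exactly what pixel-level multi-class segmentation requires.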
The FCN achieves end-to-end semantic segmentation, but its performance is not very good when handling edges and classification certainty. Later, SegNet, a deep convolutional encoder-decoder architecture, was proposed in Reference 18. The SegNet encoder is based on the first 13 layers of the VGG-16 architecture, with an enhancement in the upsampling stage: each decoder has its own corresponding encoder, so the same segmentation accuracy is maintained with fewer trainable parameters and lower memory overhead. To resolve the problem of reduced resolution due to sub-sampling, DeepLab91 adopts atrous convolution to expand the receptive field and obtain detailed contextual information.18 The latest version is DeepLab V3+,92,93 which improves the atrous algorithm; to integrate multiscale information, DeepLab V3+92,93 proposes an encoder-decoder architecture and adopts the Xception model. The dominant feature-extraction network is ResNet,94 which is pre-trained on ImageNet. In each convolution step of ResNet, atrous convolutions with different dilation rates are used to capture multi-scale contextual knowledge.94 At present, various deep learning methods are available for surface classification. The encoding step involves sub-sampling, convolution, and pooling to acquire segmentation features. In the decoding stage, transposed convolution, pooling, and upsampling label the image with the same features, resulting in surface classification by semantic segmentation. In parallel, to improve classification accuracy, some deep learning methods use a post-enhancement step to remove noise and optimize the boundary.
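The atrous (dilated) convolution used by DeepLab can be illustrated in one dimension: the kernel taps are spaced `dilation` samples apart, so a kernel of size k covers an effective receptive field of d*(k-1)+1 samples without adding any parameters. The sketch below, with an assumed 3-tap kernel, shows how raising the dilation rate from 1 to 4 widens the receptive field from 3 to 9.

```python
import numpy as np

def dilated_conv1d(signal, kernel, dilation):
    # Atrous convolution: kernel taps are `dilation` samples apart,
    # widening the receptive field without extra parameters.
    k = len(kernel)
    span = dilation * (k - 1) + 1          # effective receptive field
    out = np.array([
        sum(kernel[j] * signal[i + j * dilation] for j in range(k))
        for i in range(len(signal) - span + 1)
    ])
    return out, span

x = np.arange(16, dtype=float)
kern = np.array([1.0, 1.0, 1.0])

_, rf1 = dilated_conv1d(x, kern, dilation=1)   # ordinary conv: field = 3
_, rf2 = dilated_conv1d(x, kern, dilation=2)   # field = 5
_, rf4 = dilated_conv1d(x, kern, dilation=4)   # field = 9
print(rf1, rf2, rf4)                           # 3 5 9
```

Using several dilation rates in parallel, as ResNet-based DeepLab variants do per convolution step, lets the same kernel gather context at multiple scales at once.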

