Vision Transformer Models for BigEarthNet Land Cover Classification: A Comparative Study
Abstract:
The increasing availability of satellite imagery has made land cover and land use classification a crucial task in remote sensing. BigEarthNet, a publicly accessible dataset of Sentinel-2 imagery, comprises 590,326 patches, each annotated with multiple land cover labels drawn from the CORINE Land Cover nomenclature (43 classes, commonly condensed to 19). Deep learning models have emerged as a prominent solution for BigEarthNet classification, although their performance can be hindered by the patch size and the multi-label complexity of the dataset.
This paper explores the application of Vision Transformer (ViT) models to BigEarthNet classification. ViTs, which apply the transformer architecture directly to sequences of image patches, have demonstrated promising results in image classification. We survey existing uses of ViTs for BigEarthNet and conduct a comparative analysis against benchmark results achieved with conventional deep learning models such as Convolutional Neural Networks (CNNs).
Our experimental findings show that ViT models surpass traditional CNNs in both accuracy and speed, and that effectiveness varies across ViT variants. These results advocate for ViT models as a potent tool for classifying BigEarthNet imagery.
This paper thus offers a comprehensive assessment of ViT models for classifying BigEarthNet imagery. We discuss the potential advantages of ViT models in remote sensing applications and report on the efficiency of the ViT variants examined. Our findings underline the value of considering ViT models for classifying Sentinel-2 imagery.
Keywords: BigEarthNet, Vision Transformer, ViT, classification, remote sensing.
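For readers who want a concrete picture of the experimental setup described above, the following is a minimal sketch of fine-tuning a pretrained ViT for multi-label BigEarthNet classification in PyTorch with the timm library. The model name, the 12-band input, and the 19-label head are common BigEarthNet conventions assumed here for illustration; they are not necessarily the exact configuration evaluated in this paper.

# Minimal sketch (assumed setup): multi-label BigEarthNet fine-tuning
# with a timm ViT. BigEarthNet is multi-label, so we use a sigmoid
# head via BCEWithLogitsLoss rather than softmax cross-entropy.
import torch
import timm

NUM_LABELS = 19   # BigEarthNet-19 nomenclature
NUM_BANDS = 12    # Sentinel-2 bands typically used (B10 excluded)

# timm adapts the patch-embedding weights when in_chans != 3,
# so the pretrained backbone accepts 12-band input directly.
model = timm.create_model(
    "vit_base_patch16_224",
    pretrained=True,
    in_chans=NUM_BANDS,
    num_classes=NUM_LABELS,
)

criterion = torch.nn.BCEWithLogitsLoss()  # multi-label objective
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# One training step on a dummy batch; in practice, replace this with a
# BigEarthNet data loader that resizes the 120x120 patches to 224x224.
images = torch.randn(8, NUM_BANDS, 224, 224)
targets = torch.randint(0, 2, (8, NUM_LABELS)).float()

optimizer.zero_grad()
logits = model(images)          # shape: (8, NUM_LABELS)
loss = criterion(logits, targets)
loss.backward()
optimizer.step()

Performance in this setting is usually reported with multi-label metrics such as mean average precision or micro/macro F1, thresholding the sigmoid outputs (e.g., at 0.5) to obtain label predictions.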