Evaluating Vision Transformers for Land Use Classification: A BigEarthNet Benchmark
Abstract: Land use classification is a fundamental task in remote sensing, playing a crucial role in environmental monitoring, natural resource management, and urban planning. Deep learning methods, especially convolutional neural networks (CNNs), have achieved remarkable results on land use classification tasks. However, the local receptive fields of convolutions limit the ability of CNNs to capture global context in large, complex scenes.
To address this limitation, the Vision Transformer (ViT) has been proposed: a transformer-based architecture that models global image context through self-attention and achieves state-of-the-art performance on a range of computer vision tasks. This paper investigates the effectiveness of ViT models for land use classification using the BigEarthNet dataset, a large-scale remote sensing archive of multispectral Sentinel-2 satellite image patches.
The study compares ViT models against a conventional CNN baseline and evaluates several ViT variants to identify the most effective configuration for land use classification.
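As a minimal illustration of the kind of setup such a comparison involves, the sketch below fine-tunes a pretrained ViT-B/16 for multi-label classification on BigEarthNet-style patches. It assumes PyTorch and the timm library; the label count, band selection, and hyperparameters are illustrative assumptions, not the paper's actual configuration.

# Hypothetical sketch (not the paper's code): fine-tuning ViT-B/16 for
# multi-label land-use classification on BigEarthNet-style patches.
import torch
import torch.nn as nn
import timm

NUM_CLASSES = 19   # assumed BigEarthNet-19 label set
IN_CHANNELS = 3    # RGB bands only for this sketch; Sentinel-2 offers more

# Pretrained ViT-B/16 with its head replaced for multi-label output.
model = timm.create_model("vit_base_patch16_224", pretrained=True,
                          num_classes=NUM_CLASSES, in_chans=IN_CHANNELS)

criterion = nn.BCEWithLogitsLoss()  # one sigmoid per class (multi-label)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.05)

def train_step(images, targets):
    # images: (B, IN_CHANNELS, 224, 224); targets: (B, NUM_CLASSES) of 0/1 floats
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(images), targets)
    loss.backward()
    optimizer.step()
    return loss.item()

# Smoke test with random data in place of real BigEarthNet patches.
images = torch.randn(8, IN_CHANNELS, 224, 224)
targets = torch.randint(0, 2, (8, NUM_CLASSES)).float()
print(train_step(images, targets))

Swapping "vit_base_patch16_224" for a CNN backbone such as "resnet50" in the same script would yield the kind of baseline comparison described above.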
Experimental results demonstrate that ViT models outperform the baseline CNN in accuracy, precision, and recall, and that certain variants, such as ViT-B/16 and ViT-L/32, reach higher accuracy than the others. The paper concludes by discussing current state-of-the-art techniques and future directions for land use classification with ViT models.
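Because BigEarthNet is a multi-label benchmark, accuracy, precision, and recall are typically computed over per-class sigmoid outputs. The following sketch, assuming scikit-learn and an illustrative 0.5 decision threshold, shows one common way such metrics are obtained; it is not taken from the paper.

# Hypothetical sketch: multi-label evaluation with micro-averaged metrics.
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

def evaluate(logits, targets, threshold=0.5):
    # logits, targets: arrays of shape (num_samples, num_classes)
    probs = 1.0 / (1.0 + np.exp(-logits))          # sigmoid per class
    preds = (probs >= threshold).astype(int)
    return {
        "subset_accuracy": accuracy_score(targets, preds),
        "precision_micro": precision_score(targets, preds, average="micro",
                                           zero_division=0),
        "recall_micro": recall_score(targets, preds, average="micro",
                                     zero_division=0),
    }

# Toy example: 4 samples, 5 labels.
rng = np.random.default_rng(0)
print(evaluate(rng.normal(size=(4, 5)), rng.integers(0, 2, size=(4, 5))))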
In conclusion, this study demonstrates that ViT models can effectively extract global image features and achieve superior performance in land use classification tasks. The results suggest that ViT models can be a promising alternative to CNNs for remote sensing applications.
Keywords: Vision Transformer, ViT, land use classification, remote sensing, BigEarthNet dataset