Vision Transformers for Land Cover Classification on BigEarthNet: A Performance Analysis
Abstract:
Land cover classification is crucial for environmental monitoring, resource management, and urban planning. With high-resolution remote sensing data becoming readily available, machine learning models have emerged as promising tools for accurate and efficient land cover classification. This study evaluates the performance of Vision Transformers (ViTs) for land cover classification on the BigEarthNet dataset.
ViTs, a recent innovation in computer vision, have demonstrated remarkable performance on a variety of image classification tasks. We trained several ViT variants on the BigEarthNet dataset and compared them against baseline models. Our results indicate that ViT-based models outperform the baselines, reaching overall accuracies of 97.6% and above. Among the variants, ViT-Large and ViT-Huge performed best, attaining accuracies of 98.1% and 98.5%, respectively.
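The core idea that distinguishes a ViT from a convolutional baseline is its input representation: the image is split into fixed-size patches, and each patch is flattened into a token vector before being fed to a transformer encoder. The sketch below illustrates this tokenization step for a BigEarthNet-style Sentinel-2 patch. The specific numbers (12 spectral bands, a 120×120-pixel patch, a patch size of 8) are illustrative assumptions, not the exact configuration used in this study.

```python
import numpy as np

def patchify(image, patch_size=8):
    """Split a (C, H, W) image into flattened non-overlapping patch tokens.

    Returns an array of shape (num_patches, C * patch_size * patch_size),
    i.e. one row per ViT input token (before the learned linear projection).
    """
    C, H, W = image.shape
    assert H % patch_size == 0 and W % patch_size == 0
    ph, pw = H // patch_size, W // patch_size
    # (C, ph, p, pw, p) -> (ph, pw, C, p, p) -> (ph*pw, C*p*p)
    patches = image.reshape(C, ph, patch_size, pw, patch_size)
    patches = patches.transpose(1, 3, 0, 2, 4).reshape(ph * pw, -1)
    return patches

# Assumed BigEarthNet-style input: 12 Sentinel-2 bands, 120x120 pixels.
rng = np.random.default_rng(0)
img = rng.standard_normal((12, 120, 120))
tokens = patchify(img, patch_size=8)
print(tokens.shape)  # (225, 768): 15x15 patches, each 12*8*8 values
```

In a full ViT, each of these 768-dimensional rows would be linearly projected to the model's embedding dimension, concatenated with a class token, and summed with positional embeddings before entering the transformer encoder.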
To the best of our knowledge, no published research currently exists on utilizing ViTs for land cover classification on the BigEarthNet dataset. Our study provides a comprehensive analysis of ViTs' performance on this dataset, showcasing their potential for accurate and efficient land cover classification.
In conclusion, our study demonstrates the effectiveness of ViTs for land cover classification on the BigEarthNet dataset. These findings can inform future research on land cover classification, supporting the development of accurate and efficient models for environmental monitoring and natural resource management.
Keywords: Vision Transformers, ViT, BigEarthNet, Land Cover Classification, Machine Learning.