Active learning sample selection strategies can be broadly categorized into three main types: diversity-based sampling, uncertainty-based sampling, and hybrid sampling.

'Diversity-based sampling' aims to select the unlabeled samples that best represent the overall data distribution as labeling queries. Mahmudul et al. proposed a novel active learning technique that enhances the informativeness of individual activity instances while also capturing contextual information during query selection. Ozan et al. constructed a core-set over latent features to identify a diverse set of samples. Samarth et al. introduced a method that learns a latent space with a variational autoencoder (VAE), paired with an adversarial network trained to distinguish labeled from unlabeled data. Diversity-based strategies typically rely on unsupervised methodologies such as clustering.
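To make the core-set idea concrete, here is a minimal sketch of greedy k-center selection over a feature matrix: each step picks the pool point farthest from everything selected so far, so the chosen batch covers the feature space. This is an illustrative simplification, not the cited method; the function name and the deterministic seeding heuristic are our own.

```python
import numpy as np

def coreset_sample(features: np.ndarray, budget: int) -> np.ndarray:
    """Greedy k-center (core-set style) selection.

    features: (n_samples, n_features) latent representations of the pool.
    budget:   number of points to select for labeling.
    Returns indices of a batch spread across the feature space.
    """
    # Deterministic seed: the point farthest from the pool mean.
    seed = int(np.argmax(np.linalg.norm(features - features.mean(axis=0), axis=1)))
    selected = [seed]
    # Distance from every pool point to its nearest selected point.
    min_dist = np.linalg.norm(features - features[seed], axis=1)
    while len(selected) < budget:
        nxt = int(np.argmax(min_dist))  # farthest point from current selection
        selected.append(nxt)
        min_dist = np.minimum(
            min_dist, np.linalg.norm(features - features[nxt], axis=1)
        )
    return np.array(selected)
```

Because already-selected points have zero distance to the selection, they are never re-picked, and each iteration maximally extends coverage of the pool.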

'Uncertainty-based sampling' improves performance quickly by learning from the samples the model is least certain about. Anant Raj and Francis Bach presented an uncertainty-sampling active learning method that converges to the optimal predictor of a linear model under sampling schemes for both binary and multi-class classification. Ajay et al. proposed an uncertainty measure that extends margin-based uncertainty to multi-class scenarios and is easy to compute. Shen et al. introduced a straightforward uncertainty-based heuristic for active learning on sequence tagging. Wang et al. developed a new uncertainty-based active labeling method, AL-DL. Yoo et al. attached a small parametric 'loss prediction module' to a target network and trained it to predict the target losses of unlabeled inputs. However, these methods rely solely on the predicted class probabilities, neglecting the intrinsic value of the feature representation itself.
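As an illustration of the margin-based uncertainty mentioned above, the sketch below ranks pool samples by the gap between their top two predicted class probabilities and queries the smallest gaps first. It is a generic textbook heuristic under our own naming, not a reimplementation of any specific cited paper.

```python
import numpy as np

def margin_uncertainty_sample(probs: np.ndarray, budget: int) -> np.ndarray:
    """Select the `budget` most uncertain samples by margin.

    probs: (n_samples, n_classes) predicted class probabilities.
    A small margin (top-1 minus top-2 probability) means the model
    is torn between two classes, i.e. the sample is informative.
    """
    sorted_p = np.sort(probs, axis=1)          # ascending per row
    margins = sorted_p[:, -1] - sorted_p[:, -2]
    return np.argsort(margins)[:budget]        # smallest margins first
```

For binary classification this reduces to querying points nearest the decision boundary, which is why margin sampling is often the default uncertainty criterion.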

'Hybrid AL methods' leverage both diversity and uncertainty in their sample selection. Gui et al. combined the two concepts into an efficient active learning algorithm that is particularly suited to large-batch settings. Jordan et al. designed a strategy named BADGE that incorporates predictive uncertainty and sample diversity into every selected batch. Their approach, however, may not scale to larger tasks because of its computational cost.
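The flavor of such hybrid criteria can be sketched in a few lines: score each candidate by the product of its margin-based uncertainty and its distance to the batch selected so far, then pick greedily. This is a simplified stand-in for methods like BADGE (which uses gradient embeddings and k-means++ seeding), with hypothetical naming.

```python
import numpy as np

def hybrid_sample(features: np.ndarray, probs: np.ndarray, budget: int) -> np.ndarray:
    """Greedy hybrid selection: uncertainty weighted by diversity.

    features: (n, d) representations; probs: (n, n_classes) predictions.
    Each step picks the point maximizing
        uncertainty(i) * distance(i, nearest already-selected point).
    """
    sorted_p = np.sort(probs, axis=1)
    # Small top-2 margin => high uncertainty.
    uncertainty = 1.0 - (sorted_p[:, -1] - sorted_p[:, -2])
    selected = [int(np.argmax(uncertainty))]  # seed with the most uncertain point
    min_dist = np.linalg.norm(features - features[selected[0]], axis=1)
    while len(selected) < budget:
        scores = uncertainty * min_dist
        scores[selected] = -np.inf            # never re-pick a selected point
        nxt = int(np.argmax(scores))
        selected.append(nxt)
        min_dist = np.minimum(
            min_dist, np.linalg.norm(features - features[nxt], axis=1)
        )
    return np.array(selected)
```

The multiplicative score keeps the batch both informative (high uncertainty) and non-redundant (far from earlier picks), which is the essence of the hybrid family.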

This passage highlights the key aspects of different sample selection strategies in active learning, providing a foundation for understanding their respective strengths and weaknesses.

