Entropy Sampling: Active Learning Strategy for Data Selection
Entropy Sampling is an active learning strategy for choosing the next batch of samples to label. Given the currently labeled data, it computes the entropy of the model's predicted class distribution for each unlabeled sample and selects the n samples with the highest entropy. Higher entropy means the model is more uncertain about a sample's class, so labeling these samples tends to be the most informative for improving accuracy. The strategy relies only on the model's predicted probabilities (and their logarithms) over the unlabeled pool, which must be computed after the model is trained in each round.
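To make the uncertainty measure concrete, here is a minimal sketch (not part of the original code) that computes the entropy H(p) = -sum_i p_i * log(p_i) for two toy probability vectors; the near-uniform prediction has the higher entropy and would be preferred by this strategy:

    import torch

    # Toy example: two predicted distributions over 3 classes.
    confident = torch.tensor([0.90, 0.05, 0.05])
    uncertain = torch.tensor([0.40, 0.30, 0.30])

    def entropy(p):
        # H(p) = -sum_i p_i * log(p_i)
        return -(p * torch.log(p)).sum()

    print(entropy(confident))  # ~0.39
    print(entropy(uncertain))  # ~1.09  (higher entropy -> more uncertain)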
    import numpy as np
    import torch
    from .strategy import Strategy  # base class location is an assumption; adjust to your project layout

    class EntropySampling(Strategy):
        def __init__(self, X, Y, idxs_lb, net, handler, args):
            super(EntropySampling, self).__init__(X, Y, idxs_lb, net, handler, args)

        def query(self, n):
            # Indices of pool samples that are not yet labeled.
            idxs_unlabeled = np.arange(self.n_pool)[~self.idxs_lb]
            # Predicted class probabilities for the unlabeled samples.
            probs = self.predict_prob(self.X[idxs_unlabeled], self.Y.numpy()[idxs_unlabeled])
            log_probs = torch.log(probs)
            # U = sum_c p_c * log(p_c) = -entropy, so smaller U means higher entropy.
            U = (probs * log_probs).sum(1)
            # Ascending sort on U puts the highest-entropy samples first; take the top n.
            return idxs_unlabeled[U.sort()[1][:n]]
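Below is a minimal sketch of how the strategy might be driven in an active-learning loop. The data tensors (X_train, Y_train), net, handler, args, and the train()/update() methods are assumptions inferred from the constructor signature above, not part of the original snippet:

    import numpy as np

    # Assumed setup: X_train, Y_train, net, handler, args are defined elsewhere.
    n_pool = len(Y_train)                     # total number of samples in the pool
    idxs_lb = np.zeros(n_pool, dtype=bool)    # boolean mask of labeled samples
    idxs_lb[np.random.choice(n_pool, 100, replace=False)] = True  # initial seed set

    strategy = EntropySampling(X_train, Y_train, idxs_lb, net, handler, args)

    for rd in range(10):                      # 10 active-learning rounds
        strategy.train()                      # fit the model on the current labeled set (assumed method)
        query_idxs = strategy.query(1000)     # pick the 1000 highest-entropy samples
        idxs_lb[query_idxs] = True            # mark them as labeled
        strategy.update(idxs_lb)              # refresh the strategy's labeled mask (assumed method)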