Python Entropy Calculation: Measuring Uncertainty in SST and QNLI Datasets
This problem involves computing the entropy of a random variable from its probability distribution. Entropy quantifies the uncertainty associated with the variable: the less predictable its outcomes, the higher the entropy.
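For a discrete distribution with probabilities p_1, ..., p_n, the Shannon entropy is H = -Σ_i p_i * log2(p_i), with the convention that any term with p_i = 0 contributes zero.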
We'll first create a Python function called 'compute_entropy' that takes a probability distribution as input and returns the entropy value. Here's the implementation:
import math

def compute_entropy(prob_dist):
    """Return the Shannon entropy (in bits) of a probability distribution."""
    entropy = 0
    for p in prob_dist:
        # Skip zero probabilities: by convention 0 * log2(0) = 0.
        if p != 0:
            entropy -= p * math.log2(p)
    return entropy
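As a quick sanity check (not part of the original problem), a uniform distribution over four outcomes should give exactly log2(4) = 2 bits:

print(compute_entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0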
Next, we'll calculate the word-level entropy for the SST and QNLI datasets. You'll need the probability distributions for these datasets (created in Problem 2).
# Placeholder distributions -- substitute the actual ones from Problem 2.
sst_prob_dist = [0.2, 0.3, 0.1, 0.4]
qnli_prob_dist = [0.1, 0.2, 0.3, 0.4]

sst_entropy = compute_entropy(sst_prob_dist)
qnli_entropy = compute_entropy(qnli_prob_dist)

print('Word-level entropy of SST:', sst_entropy)
print('Word-level entropy of QNLI:', qnli_entropy)
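Note that with these placeholder values both calls print the same number (about 1.8464 bits), because the two lists contain the same probabilities in a different order, and entropy is invariant to reordering.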
Replace sst_prob_dist and qnli_prob_dist with the actual probability distributions from Problem 2. Running this code will display the word-level entropy for SST and QNLI.
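If you need to rebuild those distributions, here is a minimal sketch, assuming Problem 2 derived them as relative word frequencies over a tokenized corpus; the token lists sst_tokens and qnli_tokens are hypothetical placeholders, not names from the original problem:

from collections import Counter

def word_prob_dist(tokens):
    # Relative frequency of each word type; the returned values sum to 1.
    counts = Counter(tokens)
    total = sum(counts.values())
    return [count / total for count in counts.values()]

# Hypothetical usage:
# sst_prob_dist = word_prob_dist(sst_tokens)
# qnli_prob_dist = word_prob_dist(qnli_tokens)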
Note: because the logarithm is base 2, entropy is measured in bits. A higher entropy value signifies greater uncertainty or randomness in the distribution.