Computing Information Gain in Python - The Core of Decision-Tree Algorithms

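The following Python example computes the entropy of a dataset and the information gain of splitting that dataset on a given feature: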

import math

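def entropy(data):
    """Return the entropy (in bits) of the class labels in data.

    Each sample is a list whose last element is its class label.
    """
    n = len(data)
    if n == 0:
        return 0
    # Count how many samples belong to each class label
    labels = {}
    for sample in data:
        label = sample[-1]
        if label not in labels:
            labels[label] = 0
        labels[label] += 1
    # H(D) = -sum(p * log2(p)) over all class labels
    ent = 0
    for label in labels:
        prob = labels[label] / n
        ent -= prob * math.log(prob, 2)
    return ent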

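def information_gain(data, feature):
    """Return the information gain of splitting data on the column index feature."""
    n = len(data)
    if n == 0:
        return 0
    # Group the samples by their value of the chosen feature
    values = {}
    for sample in data:
        value = sample[feature]
        if value not in values:
            values[value] = []
        values[value].append(sample)
    ent = entropy(data)
    # Conditional entropy: each subset's entropy, weighted by its share of the data
    cond_ent = 0
    for value in values:
        prob = len(values[value]) / n
        cond_ent += prob * entropy(values[value])
    # Gain = H(D) - H(D | feature)
    return ent - cond_ent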
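For reference, the two functions follow the standard definitions. `entropy` computes $H(D)$, the `cond_ent` loop computes the conditional entropy $H(D \mid A)$, and `information_gain` returns their difference $g(D, A)$. In these formulas $p_k$ is the fraction of samples in $D$ with class label $k$, and $D_v$ is the subset of $D$ whose feature $A$ takes the value $v$:

```latex
\begin{aligned}
H(D) &= -\sum_{k} p_k \log_2 p_k \\
H(D \mid A) &= \sum_{v} \frac{|D_v|}{|D|}\, H(D_v) \\
g(D, A) &= H(D) - H(D \mid A)
\end{aligned}
```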

Usage example:

# Columns: feature 0 (1/2/3), feature 1 (S/M/L), class label (N/Y, last column)
data = [
    [1, 'S', 'N'],
    [1, 'M', 'N'],
    [1, 'M', 'Y'],
    [1, 'S', 'Y'],
    [1, 'S', 'N'],
    [2, 'S', 'N'],
    [2, 'M', 'N'],
    [2, 'M', 'Y'],
    [2, 'L', 'Y'],
    [2, 'L', 'Y'],
    [3, 'L', 'Y'],
    [3, 'M', 'Y'],
    [3, 'M', 'Y'],
    [3, 'L', 'Y'],
    [3, 'L', 'N'],
]

print(entropy(data))
# Output: 0.9709505944546686

print(information_gain(data, 0))
# Output: 0.08300749985576883

print(information_gain(data, 1))
# Output: approximately 0.1466

print(information_gain(data, 2))
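# Output: 0.9709505944546686
# Column 2 is the class label itself. Every subset is therefore pure, so the gain equals entropy(data).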
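As a hand check of `information_gain(data, 1)`, split the 15 samples on column 1 (written $A_1$ here). The full set contains 9 Y and 6 N. The subsets are S (4 samples: 1 Y, 3 N), M (6 samples: 4 Y, 2 N) and L (5 samples: 4 Y, 1 N). With $H$ as entropy in bits and $g$ as information gain (values rounded to five decimals):

```latex
\begin{aligned}
H(D) &= -\tfrac{9}{15}\log_2\tfrac{9}{15} - \tfrac{6}{15}\log_2\tfrac{6}{15} \approx 0.97095 \\
H(D_S) &= -\tfrac{1}{4}\log_2\tfrac{1}{4} - \tfrac{3}{4}\log_2\tfrac{3}{4} \approx 0.81128 \\
H(D_M) &= -\tfrac{4}{6}\log_2\tfrac{4}{6} - \tfrac{2}{6}\log_2\tfrac{2}{6} \approx 0.91830 \\
H(D_L) &= -\tfrac{4}{5}\log_2\tfrac{4}{5} - \tfrac{1}{5}\log_2\tfrac{1}{5} \approx 0.72193 \\
H(D \mid A_1) &= \tfrac{4}{15}(0.81128) + \tfrac{6}{15}(0.91830) + \tfrac{5}{15}(0.72193) \approx 0.82430 \\
g(D, A_1) &= 0.97095 - 0.82430 \approx 0.14665
\end{aligned}
```

This gain is larger than that of column 0 (about 0.0830), so column 1 is the better split for this data.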

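This code shows how to compute information gain in Python. Information gain is a central concept in decision-tree algorithms such as ID3. At each node, the algorithm evaluates every candidate feature and splits the dataset on the feature with the largest information gain.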
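The feature-selection step can be sketched as follows. This is a minimal sketch that assumes, as in the example data, that the last column is the class label and every other column is a discrete feature. `best_feature` is a hypothetical helper name, not part of any library. The two helper functions restate the ones above more compactly, using `collections.Counter` and `defaultdict`.

```python
import math
from collections import Counter, defaultdict


def entropy(data):
    """Entropy (bits) of the class labels stored in the last column."""
    n = len(data)
    if n == 0:
        return 0.0
    counts = Counter(sample[-1] for sample in data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(data, feature):
    """H(D) - H(D | feature) for the column index `feature`."""
    n = len(data)
    if n == 0:
        return 0.0
    # Group samples by their value in the chosen column
    groups = defaultdict(list)
    for sample in data:
        groups[sample[feature]].append(sample)
    cond_ent = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(data) - cond_ent


def best_feature(data):
    """Hypothetical helper: index of the feature column with the largest gain.

    Assumes every column except the last is a feature and the last is the label.
    Ties go to the lowest column index (max() returns the first maximum).
    """
    n_features = len(data[0]) - 1
    return max(range(n_features), key=lambda i: information_gain(data, i))
```

With the example `data`, `best_feature(data)` returns `1`, because column 1 has a gain of about 0.1466 and column 0 has a gain of about 0.0830.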

This example shows how information gain is computed. You can reuse it as the split criterion when building your own decision-tree model.
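As one way to apply the criterion, here is a minimal ID3-style sketch. ID3 is the classic decision-tree algorithm that always splits on the feature with the highest information gain. The sketch makes several assumptions: feature values are discrete, the class label is in the last column, the input data is non-empty, and labels are not tuples. The names `build_tree` and `predict`, the `min_gain` threshold, and the tuple-based tree representation are illustrative choices, not a standard API. The sketch does no pruning.

```python
import math
from collections import Counter, defaultdict


def entropy(data):
    """Entropy (bits) of the class labels in the last column; data must be non-empty."""
    n = len(data)
    counts = Counter(sample[-1] for sample in data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def split(data, feature):
    """Group samples by their value in column `feature`."""
    groups = defaultdict(list)
    for sample in data:
        groups[sample[feature]].append(sample)
    return groups


def information_gain(data, feature):
    n = len(data)
    groups = split(data, feature)
    return entropy(data) - sum(len(g) / n * entropy(g) for g in groups.values())


def build_tree(data, features, min_gain=1e-12):
    """Hypothetical ID3-style builder.

    Returns either a class label (leaf) or a tuple
    (feature_index, {feature_value: subtree}).
    """
    labels = [sample[-1] for sample in data]
    # Majority label; ties go to the label seen first (Counter ordering)
    majority = Counter(labels).most_common(1)[0][0]
    # Stop when the node is pure or no features remain
    if len(set(labels)) == 1 or not features:
        return majority
    best = max(features, key=lambda f: information_gain(data, f))
    # Stop when no remaining feature reduces entropy
    if information_gain(data, best) <= min_gain:
        return majority
    remaining = [f for f in features if f != best]
    branches = {value: build_tree(subset, remaining, min_gain)
                for value, subset in split(data, best).items()}
    return (best, branches)


def predict(tree, sample, default=None):
    """Walk the tree; return `default` for a feature value never seen in training."""
    while isinstance(tree, tuple):
        feature, branches = tree
        if sample[feature] not in branches:
            return default
        tree = branches[sample[feature]]
    return tree
```

On the example data, `build_tree(data, [0, 1])` splits first on column 1, the feature with the larger gain, and then on column 0 inside each branch. Some samples share the same feature values but have different labels, for example `[1, 'S', 'N']` and `[1, 'S', 'Y']`. Those leaves therefore fall back to the majority label.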