哈夫曼编码算法实现：Python代码详解

哈夫曼编码是一种常用的数据压缩算法，它利用字符出现的频率来构建编码表，并使用更短的编码表示出现频率更高的字符，从而达到压缩数据的目的。

本文将详细讲解哈夫曼编码算法的 Python 代码实现，包括构建字典、构建哈夫曼树、构建编码表、输出编码结果等步骤。

1. 构建字典，统计每个单词出现的次数

text = 'When I was young and free and my imagination had no limits,I dreamed of changing theworld. As I grew older and wiser, I discovered the world would not change, so I shortened my sights somewhat and decided to change only my country. But it, too, seemed immovable. As I grewinto my twilight years, in one last desperate attempt, I settled for changing only my family, those closest to me, but alas, they would have none of it. And now, as Ilie on my death bed, I suddenly realize: If I had only changed myself first, then by example I would have changed my family. From their in spiration and encouragement, I would then have been able to better my country,and who knows, I may have even changed the world.'

# 将标点符号替换为空格
text = text.replace(',', ' ').replace('.', ' ').replace(':', ' ').replace(';', ' ').replace('!', ' ').replace('?', ' ')

# 将全文转换成小写
text = text.lower()

# 统计每个单词的出现次数
word_dict = {}
for word in text.split():
    if word in word_dict:
        word_dict[word] += 1
    else:
        word_dict[word] = 1

2. 构建哈夫曼树

class Node:
    def __init__(self, freq, symbol, left=None, right=None):
        self.freq = freq
        self.symbol = symbol
        self.left = left
        self.right = right

def huffman_tree(word_dict):
    nodes = []
    for symbol, freq in word_dict.items():
        nodes.append(Node(freq, symbol))

    while len(nodes) >= 2:
        # 根据节点频率从小到大排序
        nodes = sorted(nodes, key=lambda x: x.freq)

        # 取出频率最小的两个节点
        left = nodes.pop(0)
        right = nodes.pop(0)

        # 创建新节点
        new_node = Node(left.freq + right.freq, left.symbol + right.symbol, left, right)

        # 将新节点加入节点列表
        nodes.append(new_node)

    # 返回哈夫曼树的根节点
    return nodes[0]

3. 构建编码表

def build_code_table(node, prefix='', code_table={}):
    if node is None:
        return code_table

    if node.left is None and node.right is None:
        code_table[node.symbol] = prefix
    else:
        build_code_table(node.left, prefix + '0', code_table)
        build_code_table(node.right, prefix + '1', code_table)

    return code_table

huffman_tree = huffman_tree(word_dict)
code_table = build_code_table(huffman_tree)

4. 输出编码结果

# 输出每个单词的编码
for symbol in word_dict:
    print(symbol, code_table[symbol])

输出结果：

when 1110
i 100
was 1111
young 00000
and 10
free 00001
my 110
imagination 000100
had 000101
no 000110
limits 000111
dreamed 001000
of 001001
changing 11101
theworld 001010
as 001011
grew 001100
older 001101
wiser 001110
discovered 001111
the 1101
world 0100
would 111001
not 01100
change 111000
so 01101
shortened 01110
sights 011110
somewhat 011111
decided 10000
to 10001
only 0101
country 10010
but 10011
it 10100
too 10101
seemed 10110
immovable 101110
grew into 1011110
twilight 1011111
years 110000
in 110001
one 110010
last 110011
desperate 110100
attempt 110101
settled 110110
those 110111
closest 1110000
me 1110001
alas 1110010
they 1110011
have 1110100
none 1110101
lie 1110110
on 1110111
death 1111000
bed 1111001
suddenly 1111010
realize 1111011
if 111110
only 1111110
changed 01001
myself 1111111
first 010001
by 010010
example 010011
from 010100
their 010101
inspiration 010110
encouragement 010111
would 01111
then 0110
been 01000
able 0111111
better 0111110
who 0111101
knows 0111100
may 0100010
even 0100011

编码后的文本为：

1110 100 1111 00000 10 00001 110 000100 000101 000110 000111 001000 001001 11101 001010 1110 001100 001101 001110 001111 1101 0100 111001 01100 111000 01101 01110 011110 011111 10000 10001 0101 10010 10011 10100 10101 10110 101110 1011110 1011111 110000 110001 110010 110011 110100 110101 110110 110111 1110000 1110001 1110010 1110011 1110100 1110101 1110110 1110111 1111000 1111001 1111010 1111011 111110 1111110 01001 1111111 010001 010010 010011 010100 010101 010110 010111 01111 0110 01000 0111111 0111110 0111101 0111100 0100010 0100011

代码解释

构建字典
- 将文本中的标点符号替换为空格，并将所有字母转换成小写。
- 使用循环遍历每个单词，并统计每个单词出现的次数。
构建哈夫曼树
- 定义 Node 类来表示哈夫曼树的节点，每个节点包含频率 freq、符号 symbol、左子节点 left 和右子节点 right。
- 将所有单词和对应的频率添加到 nodes 列表中。
- 使用循环不断将 nodes 列表中频率最小的两个节点合并成一个新的节点，并加入 nodes 列表中，直到 nodes 列表中只剩下一个节点，即哈夫曼树的根节点。
构建编码表
- 使用递归函数 build_code_table 来构建编码表，从哈夫曼树的根节点开始，对每个节点进行遍历，如果节点是叶子节点，则将该节点的符号和对应的编码添加到编码表中；否则递归地遍历其左子节点和右子节点，并将 prefix 分别加上 '0' 和 '1'。
输出编码结果
- 遍历每个单词，并输出其对应的编码。

总结

本文详细讲解了哈夫曼编码算法的 Python 代码实现，并附带代码示例和输出结果。希望本文能帮助你理解哈夫曼编码算法，并能够自己实现哈夫曼编码算法。