Recognizing Image CAPTCHAs in Python with the pytesseract Library

This article shows how to recognize an image CAPTCHA with Python's pytesseract library and walks through a code example.

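import pytesseract
from PIL import Image, ImageFilter

# Open the image and convert it to grayscale.
# The raw string (r'...') keeps the backslashes from being read as escape sequences.
img = Image.open(r'D:\excel\yzm.png').convert('L')

# Reduce noise and thin interference lines with a median filter
img = img.filter(ImageFilter.MedianFilter())

# Binarize: pixels darker than the threshold become black, all others white
threshold = 150
table = [0 if i < threshold else 255 for i in range(256)]
img = img.point(table, '1')

# Recognize the CAPTCHA; --psm 6 treats the image as a single uniform block of text
code = pytesseract.image_to_string(img, config='--psm 6')
print(code)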

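Code walkthrough:

Import the libraries: Import pytesseract, plus Image and ImageFilter from PIL (provided by the Pillow package).
Read the image: Image.open() opens the image file, and convert('L') converts it to a grayscale image.
Remove interference lines: img.filter(ImageFilter.MedianFilter()) applies a median filter, which suppresses isolated noise and thin interference lines. Thick lines may survive this step.
Binarize the image: threshold and table define a 256-entry lookup table that maps every gray level below the threshold to black and every other level to white. img.point(table, '1') applies the table and produces a pure black-and-white image.
Recognize the CAPTCHA: pytesseract.image_to_string() runs OCR on the image, and the recognized text is stored in the code variable.
Print the result: print() outputs the recognized text.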
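
The binarization step can be isolated into small helpers, which makes it easier to try different thresholds. The following is a minimal sketch; the function names build_threshold_table and binarize are illustrative and not part of Pillow or pytesseract, and binarize assumes it receives a grayscale ('L' mode) Pillow image.

```python
def build_threshold_table(threshold=150):
    """Return a 256-entry lookup table: 0 below the threshold, 255 otherwise."""
    if not 0 <= threshold <= 256:
        raise ValueError("threshold must be between 0 and 256")
    return [0 if level < threshold else 255 for level in range(256)]


def binarize(img, threshold=150):
    """Binarize a grayscale ('L' mode) Pillow image (hypothetical helper).

    point() maps every pixel through the lookup table, and the '1' mode
    stores the result as a 1-bit black-and-white image.
    """
    return img.point(build_threshold_table(threshold), '1')
```

A lower threshold keeps only the darkest strokes, while a higher one keeps more of the gray pixels. The best value depends on the CAPTCHA, so 150 is only a starting point.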

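Common errors:

The code may fail for reasons such as:

pytesseract is not installed. Install it first: pip install pytesseract
The path 'D:\excel\yzm.png' may be wrong. Check that it points to your image, and note that backslashes in an ordinary string literal can be interpreted as escape sequences, so a raw string such as r'D:\excel\yzm.png' is safer.
A required import statement is missing, for example: from PIL import Image, ImageFilter. The PIL module is provided by the Pillow package (pip install pillow).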
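
The path problem is easy to reproduce. The sketch below shows three ways to write the same Windows path safely. It uses PureWindowsPath from the standard library only so that the example can be run on any operating system; the path itself is the one from the article.

```python
from pathlib import PureWindowsPath

# In an ordinary string literal "\t" is a tab and "\n" is a newline, so a
# path such as 'D:\temp\new.png' would be silently corrupted.
raw_path = r'D:\excel\yzm.png'        # raw string: backslashes kept as-is
escaped_path = 'D:\\excel\\yzm.png'   # doubled backslashes
forward_path = 'D:/excel/yzm.png'     # forward slashes also work on Windows

# The raw and escaped spellings are the same string
assert raw_path == escaped_path

# PureWindowsPath parses Windows paths without touching the file system
path = PureWindowsPath(raw_path)
print(path.name)    # yzm.png
print(path.suffix)  # .png
```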

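Solutions:

Check that the code is indented correctly.
Check for missing parentheses, quotation marks, or other symbols.
Check that the image path is correct.
Install the required libraries.
If the problem persists, provide the full error message and your code so that the cause can be narrowed down.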
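
Before digging into a traceback, it can help to confirm that each part of the toolchain is actually present. The following is a minimal diagnostic sketch that uses only the standard library; check_ocr_environment is a hypothetical helper name, and it assumes the Tesseract executable is called tesseract.

```python
import importlib.util
import shutil


def check_ocr_environment():
    """Report which pieces of the OCR toolchain can be found."""
    return {
        # find_spec returns None when a top-level package is not importable
        "pytesseract": importlib.util.find_spec("pytesseract") is not None,
        "PIL": importlib.util.find_spec("PIL") is not None,
        # shutil.which searches the PATH for the tesseract executable
        "tesseract": shutil.which("tesseract") is not None,
    }


for name, found in check_ocr_environment().items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

If the tesseract entry is reported as missing even though the engine is installed, its directory is probably not on the PATH. In that case pytesseract also lets you set pytesseract.pytesseract.tesseract_cmd to the full path of the executable.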

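Notes:

pytesseract is only a Python wrapper. The Tesseract OCR engine must be installed separately, and its executable must be reachable through the PATH environment variable.
Recognition accuracy depends on image quality, font, background, and similar factors.
To improve accuracy, preprocess the image before recognition, for example by removing noise and increasing contrast.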
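
As one example of contrast enhancement, a linear contrast stretch can be expressed as a lookup table and applied with img.point(), in the same way as the binarization table. This is a sketch under stated assumptions: build_contrast_table is an illustrative name, and the bounds 60 and 200 are made-up values that would need tuning for a real CAPTCHA.

```python
def build_contrast_table(low, high):
    """Lookup table that linearly stretches gray levels [low, high] to [0, 255]."""
    if not 0 <= low < high <= 255:
        raise ValueError("expected 0 <= low < high <= 255")
    scale = 255 / (high - low)
    table = []
    for level in range(256):
        stretched = round((level - low) * scale)
        # Clamp so levels outside [low, high] saturate at black or white
        table.append(min(255, max(0, stretched)))
    return table


# Hypothetical usage on a grayscale ('L' mode) Pillow image, before binarization:
#   img = img.point(build_contrast_table(60, 200))
```

Pillow also ships ImageOps.autocontrast(), which picks the bounds from the image histogram and may be a simpler choice when the lighting varies between images.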