Installing the Chinese language pack for Tesseract in Python

To use Tesseract's Chinese language pack from Python, follow these steps:

Install the Tesseract OCR engine

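Before installing the Chinese language pack, you need to install the Tesseract OCR engine itself. On Debian/Ubuntu, run the following command in a terminal: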

sudo apt-get install tesseract-ocr

On Windows, download and run an installer instead. Prebuilt Windows installers are provided by UB Mannheim: https://github.com/UB-Mannheim/tesseract/wiki
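To confirm the engine is installed, and to find the path you may later assign to tesseract_cmd, a small check like the following can help. It is a sketch: find_tesseract is a hypothetical helper, and C:\Program Files\Tesseract-OCR\tesseract.exe is assumed to be the default location used by the UB Mannheim installer, so adjust it to your actual install directory.

```python
import shutil
from pathlib import Path
from typing import Iterable, Optional

# Assumed default install location of the UB Mannheim Windows installer;
# change this if Tesseract was installed somewhere else.
DEFAULT_CANDIDATES = [r"C:\Program Files\Tesseract-OCR\tesseract.exe"]


def find_tesseract(candidates: Iterable[str] = DEFAULT_CANDIDATES) -> Optional[str]:
    """Return the path of the tesseract executable, or None if it is not found.

    The PATH is searched first (the usual case on Linux after apt-get install);
    otherwise each candidate path is checked for an existing file.
    """
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    for candidate in candidates:
        if Path(candidate).is_file():
            return candidate
    return None
```

Calling find_tesseract() returns a path string when the engine is available and None when it still needs to be installed.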

Download the Chinese language pack

The Simplified Chinese language data for Tesseract (chi_sim) can be downloaded from:

https://github.com/tesseract-ocr/tessdata/blob/master/chi_sim.traineddata

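On that GitHub page, use the download (raw file) button to get the binary file itself, and save chi_sim.traineddata to any directory you like; it is moved into Tesseract's language data directory in a later step.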
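The download can also be scripted. Because a github.com/.../blob/... address serves GitHub's HTML preview page rather than the file, the sketch below rewrites it to the raw-file address, assuming GitHub's usual raw.githubusercontent.com/<owner>/<repo>/<branch>/<path> scheme (the branch name is kept exactly as it appears in the URL). blob_to_raw and download_traineddata are hypothetical helper names.

```python
import urllib.request
from pathlib import Path

BLOB_URL = "https://github.com/tesseract-ocr/tessdata/blob/master/chi_sim.traineddata"


def blob_to_raw(url: str) -> str:
    """Turn https://github.com/<owner>/<repo>/blob/<branch>/<path> into its raw-file URL."""
    prefix = "https://github.com/"
    if not url.startswith(prefix) or "/blob/" not in url:
        raise ValueError(f"not a GitHub blob URL: {url}")
    owner_repo, rest = url[len(prefix):].split("/blob/", 1)
    return f"https://raw.githubusercontent.com/{owner_repo}/{rest}"


def download_traineddata(url: str, dest_dir: str = ".") -> Path:
    """Download the .traineddata file into dest_dir and return its local path."""
    dest = Path(dest_dir) / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(blob_to_raw(url), dest)
    return dest
```

For example, download_traineddata(BLOB_URL) would save chi_sim.traineddata in the current directory (this needs network access).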

Install pytesseract

pytesseract is a Python wrapper around Tesseract (it runs the tesseract executable for you). Install it with:

pip install pytesseract

Move the Chinese language pack into Tesseract's language data directory

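Move the downloaded chi_sim.traineddata file into Tesseract's language data (tessdata) directory. On Ubuntu with Tesseract 4.x this is /usr/share/tesseract-ocr/4.00/tessdata/; Tesseract 5 packages use /usr/share/tesseract-ocr/5/tessdata/ instead. On Ubuntu you can also skip the manual download and run sudo apt-get install tesseract-ocr-chi-sim, which places the file there for you. On Windows, the directory is the tessdata folder inside the Tesseract installation directory. Afterwards, running tesseract --list-langs should list chi_sim.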
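This step can be scripted as well. The sketch below copies (rather than moves) the file into the first candidate tessdata directory that exists. The candidate list is an assumption based on common install locations, so check it against your installation; install_traineddata is a hypothetical helper, and writing under /usr/share or C:\Program Files usually requires administrator rights.

```python
import shutil
from pathlib import Path
from typing import Iterable

# Assumed locations; the version directory differs between Tesseract releases.
CANDIDATE_TESSDATA_DIRS = [
    "/usr/share/tesseract-ocr/4.00/tessdata",
    "/usr/share/tesseract-ocr/5/tessdata",
    r"C:\Program Files\Tesseract-OCR\tessdata",
]


def install_traineddata(src: str, candidates: Iterable[str] = CANDIDATE_TESSDATA_DIRS) -> Path:
    """Copy a .traineddata file into the first existing tessdata directory."""
    src_path = Path(src)
    if not src_path.is_file():
        raise FileNotFoundError(src_path)
    for directory in candidates:
        target_dir = Path(directory)
        if target_dir.is_dir():
            # copy2 also preserves file metadata; the downloaded original stays in place
            return Path(shutil.copy2(src_path, target_dir / src_path.name))
    raise FileNotFoundError("no tessdata directory found among the candidates")
```

Copying instead of moving keeps the downloaded file available in case the wrong directory was picked.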

Use the Chinese language pack in Python

You can now use the Chinese language pack from Python. Pass lang='chi_sim' to pytesseract's OCR function:

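import pytesseract
from PIL import Image

# Path to the tesseract executable; only needed if it is not on PATH.
# On Windows this is typically r'C:\Program Files\Tesseract-OCR\tesseract.exe'
pytesseract.pytesseract.tesseract_cmd = '/usr/bin/tesseract'

# Open the image and run OCR with the Simplified Chinese language data
text = pytesseract.image_to_string(Image.open('image.png'), lang='chi_sim')
print(text)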

This recognizes the text in image.png using Tesseract's Chinese language data and prints the result to the console.
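If chi_sim.traineddata is not in the tessdata directory, Tesseract fails with a missing-language error, so it can be worth checking first. Tesseract also accepts several languages joined with +, for example chi_sim+eng for images that mix Chinese and English. The sketch below assumes a pytesseract version that provides get_languages (recent releases do) and that Pillow is installed; build_lang and ocr_with_check are hypothetical helpers.

```python
from typing import Iterable, Sequence


def build_lang(requested: Sequence[str], available: Iterable[str]) -> str:
    """Join language codes with '+' (Tesseract's multi-language syntax),
    raising if any requested language data is not installed."""
    installed = set(available)
    missing = [code for code in requested if code not in installed]
    if missing:
        raise RuntimeError(f"missing traineddata for: {', '.join(missing)}")
    return "+".join(requested)


def ocr_with_check(image_path: str, requested: Sequence[str] = ("chi_sim", "eng")) -> str:
    # Imported here so build_lang can be used without these packages installed
    import pytesseract
    from PIL import Image

    # get_languages lists the language codes Tesseract finds in its tessdata directory
    lang = build_lang(requested, pytesseract.get_languages(config=""))
    return pytesseract.image_to_string(Image.open(image_path), lang=lang)
```

For example, ocr_with_check('image.png') runs OCR with chi_sim+eng, or raises a RuntimeError naming the language files that are still missing.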