Python jieba word segmentation error: 'empty vocabulary; perhaps the documents only contain stop words' and how to fix it

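When you segment Chinese text with the jieba library and then vectorize the result (for example with scikit-learn's CountVectorizer or TfidfVectorizer, which raise this message), you may run into the following error: 'empty vocabulary; perhaps the documents only contain stop words'.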

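The error means that no usable words are left in the text: everything that was there has been treated as a stop word. A likely cause is that your stop-word list covers every word that occurs in the text, so all tokens are filtered out.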
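To make the cause concrete, here is a minimal sketch of a stop-word filter that leaves nothing behind. The token list and the stop-word set are made up for illustration (in practice the tokens would come from jieba.lcut); they are not taken from the original program.

```python
# Hypothetical tokens, standing in for the output of jieba.lcut(text).
tokens = ['这是', '一段', '测试', '文本']

# An overly broad, made-up stop-word set that happens to cover every token.
stop_words = {'这是', '一段', '测试', '文本', '的', '了'}

# The usual filter: keep only tokens that are not in the stop-word set.
remaining = [w for w in tokens if w not in stop_words]

print(remaining)  # an empty list: nothing is left to build a vocabulary from
```

If every document ends up like this, there are no words left to form a vocabulary, which is the situation the error message describes.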

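Solutions:

Check the stop-word list: Go through your stop-word list carefully. Does it cover every word that appears in your text? Does it contain entries that should not be there?
Test the program: Run the program on some other texts. If it works on them, the problem most likely lies in the combination of your stop-word list and the original text.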
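The two checks above can be sketched as a small diagnostic that counts how many tokens survive the stop-word filter in each document. The function names stop_word_report and empty_documents and the sample data are hypothetical; the token lists are assumed to come from something like jieba.lcut.

```python
def stop_word_report(token_lists, stop_words):
    """Return one (index, kept, total) tuple per document.

    token_lists: a list of token lists, e.g. [jieba.lcut(t) for t in texts].
    stop_words:  any iterable of stop words.
    """
    stop_words = set(stop_words)
    report = []
    for i, tokens in enumerate(token_lists):
        # Ignore whitespace-only tokens and anything in the stop-word set.
        kept = [w for w in tokens if w.strip() and w not in stop_words]
        report.append((i, len(kept), len(tokens)))
    return report


def empty_documents(report):
    """Indices of documents in which no token survived the filter."""
    return [i for i, kept, _total in report if kept == 0]


# Made-up example data: two already segmented documents.
docs = [['这是', '一段', '测试', '文本'], ['的', '了']]
stops = {'的', '了', '这是'}

report = stop_word_report(docs, stops)
print(report)                   # [(0, 3, 4), (1, 0, 2)]
print(empty_documents(report))  # [1]
```

If every document shows up as empty, the stop-word list is the first thing to inspect. If only some do, those texts may simply contain nothing but stop words.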

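Example code:

import jieba
import pandas as pd

# Load the stop-word list once (one word per line) and keep it as a set for fast lookup.
# sep='aaa' is a separator that should never occur, so each line is read as a single field.
stop_path = r'C:\Users\Lenovo\Desktop\习题和作业\文本分析\停用词.txt'
stoplist = set(pd.read_csv(stop_path, names=['w'], sep='aaa',
                           encoding='utf-8', engine='python').w)

def m_cut(intxt):
    # Segment the text with jieba and drop every token found in the stop-word list.
    return [w for w in jieba.lcut(intxt) if w not in stoplist]

df = pd.DataFrame({'txt': ['这是一段测试文本']})
# Iterate over the documents in the column. Iterating over df['txt'][0] would walk
# through the first string one character at a time instead.
reschap = [', '.join(m_cut(w)) for w in df['txt']]
print(reschap)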
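As an alternative to reading the stop-word file through pandas, the file can be read with plain Python. The sketch below assumes one stop word per line; the function name load_stop_words is hypothetical, and the temporary file only stands in for your real stop-word file so that the example runs anywhere.

```python
import os
import tempfile


def load_stop_words(path):
    """Read one stop word per line, skipping blank lines.

    'utf-8-sig' decodes UTF-8 and drops a leading byte-order mark if present,
    so the first entry does not silently become '\ufeff' + word.
    """
    with open(path, encoding='utf-8-sig') as f:
        return {line.strip() for line in f if line.strip()}


# Demonstration file (made up): a BOM, a blank line, and padded whitespace.
with tempfile.NamedTemporaryFile('w', suffix='.txt', encoding='utf-8',
                                 delete=False) as tmp:
    tmp.write('\ufeff的\n了\n\n 是 \n')
    demo_path = tmp.name

loaded = load_stop_words(demo_path)
os.remove(demo_path)

print(sorted(loaded))
print(len(loaded))  # 3
```

Printing the size of the loaded set and a few of its entries is a quick way to spot a stop-word file that is far larger or broader than intended.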

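Notes:

Make sure your stop-word list is correct and does not cover every word in your text.
Test the program with different texts to confirm that it works.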
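When testing with different texts, it can also help to look at what the vectorizer would actually see. This message is raised by scikit-learn's CountVectorizer and TfidfVectorizer, whose documented default token_pattern is r"(?u)\b\w\w+\b" and therefore only keeps tokens of two or more characters. The sketch below assumes those default settings and imitates the pattern with the standard re module; the helper name and sample strings are made up.

```python
import re

# Default token_pattern documented for scikit-learn's text vectorizers.
DEFAULT_TOKEN_PATTERN = r"(?u)\b\w\w+\b"


def tokens_seen_by_default_pattern(text):
    """Return the tokens that the default pattern would extract from text."""
    return re.findall(DEFAULT_TOKEN_PATTERN, text)


# Made-up inputs: one string of single-character tokens, one of two-character words.
single_chars = '这, 是, 一, 段'
words = '测试, 文本'

print(tokens_seen_by_default_pattern(single_chars))  # []
print(tokens_seen_by_default_pattern(words))         # ['测试', '文本']
```

If your segmented text consists only of single characters, the vocabulary can end up empty even when the stop-word list is fine. In that case, one option to consider is passing a custom token_pattern such as r"(?u)\b\w+\b" to the vectorizer.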
