Python Email Classification Model: Spam Detection with Word-Frequency Vectors

Date: 2028-12-22 00:33:21

Tags: General

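import os

import numpy as np

# Assumes getWordFromFile, topWords, and model (a fitted classifier) are defined earlier.
def predict(txtFile):
    # Call getWordFromFile to get every word in the email.
    # Assumption: email_ds is a directory, so os.path.join supplies the
    # separator that the original string concatenation left out.
    words = getWordFromFile(os.path.join(r'C:/Users/cloud/Desktop/5/email_ds', txtFile))

    # Word-frequency vector: how often each of topWords occurs in the test email
    currentVector = np.array([words.count(x) for x in topWords]).reshape(1, -1)

    # Run the trained model; predict returns an array with one label per sample
    result = model.predict(currentVector)

    # predict_proba returns the probability of each class for the sample
    print(model.predict_proba(currentVector))

    return 'spam' if result[0] == 1 else 'normal email'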

# Classify the test emails 151.txt through 155.txt
for mail in ('%d.txt' % i for i in range(151, 156)):
    print(mail, predict(mail), sep=':')
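
The snippet above relies on getWordFromFile and topWords, which are not shown. As a rough illustration of the word-frequency idea only, the sketch below builds a vocabulary and a count vector using the standard library. The names words_from_text, top_words, and frequency_vector, the tokenization rule, and the sample texts are all hypothetical and are not taken from the original code.

```python
import re
from collections import Counter


def words_from_text(text):
    """Hypothetical stand-in for getWordFromFile: lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_words(documents, n):
    """Return the n most frequent words across all training documents."""
    counts = Counter()
    for text in documents:
        counts.update(words_from_text(text))
    return [word for word, _ in counts.most_common(n)]


def frequency_vector(text, vocabulary):
    """Count occurrences of each vocabulary word in text, in vocabulary order."""
    counts = Counter(words_from_text(text))
    return [counts[word] for word in vocabulary]


# Made-up training texts, used only to demonstrate the mechanics
training_texts = [
    "Win a free prize now, free entry",
    "Meeting moved to Monday, see agenda",
    "Free prize waiting, claim it today",
]
vocabulary = top_words(training_texts, 2)
vector = frequency_vector("Claim your free prize, free today", vocabulary)
print(vocabulary, vector)
```

Counting each email once with Counter avoids rescanning the whole word list for every vocabulary entry, which is what words.count(x) does inside the list comprehension. The resulting list can be reshaped with reshape(1, -1) and passed to the model in the same way as currentVector.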

Original source: https://www.cveoy.top/t/topic/fmUm Copyright belongs to the author. Do not repost or scrape!