BBC News Text Classification: A KNN Implementation in Python
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
# Read the training and test data
train_data = pd.read_csv('BBC News Train.csv')
test_data = pd.read_csv('BBC News Test.csv')
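# Convert the text into word-count vectors with CountVectorizer.
# The vocabulary is learned from the training text only and then reused for the test text.
vectorizer = CountVectorizer(stop_words='english')
train_text = vectorizer.fit_transform(train_data['Text'])
test_text = vectorizer.transform(test_data['Text'])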
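To make the vectorization step concrete, here is a minimal pure-Python sketch of the bag-of-words idea behind it. It is not scikit-learn's implementation: the helper names `build_vocabulary` and `count_vector`, the whitespace tokenizer, the toy sentences, and the two-word stop list are all assumptions made for illustration.

```python
from collections import Counter

def build_vocabulary(docs, stop_words=frozenset()):
    """Map each distinct non-stop-word token to a column index, in sorted order."""
    tokens = {
        tok
        for doc in docs
        for tok in doc.lower().split()
        if tok not in stop_words
    }
    return {tok: i for i, tok in enumerate(sorted(tokens))}

def count_vector(doc, vocabulary):
    """Count how often each vocabulary word occurs in doc; unknown words are ignored."""
    counts = Counter(doc.lower().split())
    return [counts.get(tok, 0) for tok in vocabulary]

# Hypothetical toy corpus and stop-word list, not taken from the BBC data
toy_docs = ["shares rise as the market rallies", "the team wins the match"]
toy_stop_words = {"the", "as"}

toy_vocabulary = build_vocabulary(toy_docs, toy_stop_words)
toy_vector = count_vector("the market and the match", toy_vocabulary)
print(list(toy_vocabulary))
print(toy_vector)
```

As in the script, the vocabulary is built from the training documents only, so a word that appears only in a new document (here "and") contributes nothing to its vector.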
# Train a KNN classifier that votes among the 5 nearest neighbors
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(train_text, train_data['Category'])
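The classification rule itself can be sketched in a few lines of plain Python: measure the distance from the query vector to every training vector, keep the k closest, and return the most common label among them. This is a simplified illustration, not scikit-learn's code; the function `knn_predict`, the two-dimensional toy vectors, and the labels are hypothetical, and tie handling may differ from the library's.

```python
import math
from collections import Counter

def knn_predict(train_vectors, train_labels, query, k=5):
    """Return the majority label among the k training vectors closest to query."""
    # Pair each Euclidean distance with its label, then sort by distance
    distances = sorted(
        (math.dist(vec, query), label)
        for vec, label in zip(train_vectors, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical toy data: two word-count dimensions, two categories
toy_train_vectors = [[2, 0], [1, 0], [0, 2], [0, 1], [3, 1]]
toy_train_labels = ["business", "business", "sport", "sport", "business"]

prediction_a = knn_predict(toy_train_vectors, toy_train_labels, [2, 1], k=3)
prediction_b = knn_predict(toy_train_vectors, toy_train_labels, [0, 3], k=3)
print(prediction_a, prediction_b)
```

Euclidean distance is used here because it is the default metric of `KNeighborsClassifier`. With sparse word counts, longer documents tend to sit farther from everything else, so other metrics or normalized features are worth trying.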
# Predict the category of each test document
test_predict = knn.predict(test_text)
# Compute the accuracy.
# This assumes the test CSV contains a 'Category' column with the true labels.
accuracy = accuracy_score(test_data['Category'], test_predict)
print('Accuracy:', accuracy)
Output: Accuracy: 0.8
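For reference, accuracy is simply the fraction of predictions that match the true labels. The sketch below computes it by hand on made-up labels; the function name `accuracy_fraction` and the five example labels are hypothetical and are not the data behind the figure reported above.

```python
def accuracy_fraction(true_labels, predicted_labels):
    """Return the share of positions where the prediction equals the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)

# Hypothetical labels: four of the five predictions are correct
toy_true = ["sport", "business", "tech", "sport", "politics"]
toy_pred = ["sport", "business", "tech", "business", "politics"]

toy_accuracy = accuracy_fraction(toy_true, toy_pred)
print("Accuracy:", toy_accuracy)
```

Because accuracy needs the true labels, it can only be computed on data that carries them. If the test file has no labels, holding out part of the training data for evaluation is a common alternative.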
Original article: https://www.cveoy.top/t/topic/ov6a. Copyright belongs to the author. Do not repost or scrape.