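The following example performs sentiment analysis on IMDB movie reviews with a naive Bayes classifier. NLTK supplies the corpus (`movie_reviews`, with each review labeled `pos` or `neg`) and the text preprocessing; the classifier itself is scikit-learn's `MultinomialNB`:

```python
import nltk
from nltk.corpus import movie_reviews
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Download the required NLTK data
nltk.download('movie_reviews')
nltk.download('stopwords')
nltk.download('wordnet')
# Tokenizer models used by word_tokenize; recent NLTK releases may ask for 'punkt_tab' instead
nltk.download('punkt')

# Load the movie review dataset as (word list, category) pairs
reviews = [(list(movie_reviews.words(fileid)), category)
           for category in movie_reviews.categories()
           for fileid in movie_reviews.fileids(category)]

# Stop word list and lemmatizer used during preprocessing
stop_words = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()

def preprocess_text(text):
    # Lowercase and tokenize, keep alphabetic tokens only (drops punctuation and numbers)
    tokens = word_tokenize(text.lower())
    tokens = [lemmatizer.lemmatize(token) for token in tokens if token.isalpha()]
    # Remove stop words after lemmatization
    tokens = [token for token in tokens if token not in stop_words]
    return ' '.join(tokens)

# Preprocess the dataset
preprocessed_reviews = [(preprocess_text(' '.join(review)), category) for review, category in reviews]

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split([review for review, _ in preprocessed_reviews],
                                                    [category for _, category in preprocessed_reviews],
                                                    test_size=0.2, random_state=42)

# Feature extraction: fit the vocabulary on the training set only
vectorizer = CountVectorizer()
X_train_vectorized = vectorizer.fit_transform(X_train)
X_test_vectorized = vectorizer.transform(X_test)

# Create and train the naive Bayes classifier
classifier = MultinomialNB()
classifier.fit(X_train_vectorized, y_train)

# Predict on the test set
y_pred = classifier.predict(X_test_vectorized)

# Print the prediction for each test review
for review, category, predicted_category in zip(X_test, y_test, y_pred):
    print(f"Review: {review}")
    print(f"True Category: {category}")
    print(f"Predicted Category: {predicted_category}")
    print()

# Print the classifier's accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")
```

The code loads the movie review dataset and preprocesses each review by lowercasing it, dropping punctuation and other non-alphabetic tokens, lemmatizing, and removing stop words. `CountVectorizer` then converts the cleaned text into word-count vectors. Finally, a `MultinomialNB` naive Bayes classifier is trained on the training split and used to predict the test split, and the script prints each prediction followed by the overall accuracy.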
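
To make the count-then-classify step more concrete, here is a minimal pure-Python sketch of what a bag-of-words representation combined with multinomial naive Bayes (with Laplace smoothing) computes. It is only an illustration, not scikit-learn's implementation: the function names `train_nb` and `predict_nb`, the `alpha` parameter name, and the four-document toy corpus are all made up for this example.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial naive Bayes model on whitespace-tokenized documents."""
    class_doc_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # per-class bag-of-words counts
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.split()
        word_counts[label].update(tokens)
        vocab.update(tokens)

    model = {}
    for label, n_docs in class_doc_counts.items():
        total = sum(word_counts[label].values())
        # Laplace smoothing: add alpha to every word count in the vocabulary
        denom = total + alpha * len(vocab)
        model[label] = {
            "log_prior": math.log(n_docs / len(docs)),
            "log_likelihood": {
                w: math.log((word_counts[label][w] + alpha) / denom) for w in vocab
            },
        }
    return model

def predict_nb(model, doc):
    """Return the label with the highest log posterior score."""
    scores = {}
    for label, params in model.items():
        score = params["log_prior"]
        for token in doc.split():
            # Words never seen in training are skipped, much like a fitted
            # vocabulary drops unknown words at transform time.
            if token in params["log_likelihood"]:
                score += params["log_likelihood"][token]
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical toy corpus, for illustration only
train_docs = [
    "great moving film",
    "wonderful acting great story",
    "boring slow film",
    "terrible plot boring acting",
]
train_labels = ["pos", "pos", "neg", "neg"]

model = train_nb(train_docs, train_labels)
print(predict_nb(model, "great story"))  # pos
print(predict_nb(model, "boring plot"))  # neg
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied, and the smoothing term keeps a word that is absent from one class from forcing that class's probability to zero.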


Original source: https://www.cveoy.top/t/topic/pvXY. Copyright belongs to the author. Do not repost or scrape.
