The code you provided has a few errors. Here is the corrected version:

import re
import jieba
import pandas as pd
import numpy as np

commentList = []

# Fetch the first 10 pages of comments
for i in range(10):
    num = i + 1
    commentList_temp = getCommentsById(num)
    commentList.append(commentList_temp)

# Join the list of comments into a single string
comments = ''
for k in range(len(commentList)):
    comments = comments + (str(commentList[k])).strip()

# Keep only Chinese characters (CJK Unified Ideographs range)
pattern = re.compile(r'[\u4e00-\u9fa5]+')
filterdata = re.findall(pattern, comments)
cleaned_comments = ''.join(filterdata)

# Segment the cleaned text into words with jieba
segment = jieba.lcut(cleaned_comments)
words_df = pd.DataFrame({'segment': segment})

# Load the stopword list and drop any segment that appears in it
stopwords = pd.read_csv("chineseStopWords.txt", index_col=False, sep="\t", names=['stopword'], encoding='utf-8')
words_df = words_df[~words_df.segment.isin(stopwords.stopword)]

# Count occurrences of each word and sort by frequency, descending
words_stat = words_df.groupby(by=['segment'])['segment'].agg([('计数', np.size)])
words_stat = words_stat.reset_index().sort_values(by=["计数"], ascending=False)

print(words_stat)
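As a design note, the groupby/agg pair above can be expressed more concisely with pandas' built-in value_counts, which counts and sorts in one call. A minimal sketch with sample segmented words (the word list here is made up for illustration):

```python
import pandas as pd

# Sample segmented words, as jieba.lcut would produce them
segment = ['电影', '好看', '电影', '剧情', '好看', '电影']
words_df = pd.DataFrame({'segment': segment})

# value_counts counts each distinct word, sorted descending by default
words_stat = words_df['segment'].value_counts().reset_index()
words_stat.columns = ['segment', '计数']
print(words_stat)
```

Both approaches yield the same frequency table; value_counts simply avoids the explicit groupby and sort_values steps.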

Please note that you need to replace getCommentsById with the actual function or method that retrieves comments by ID. Also, make sure that the file chineseStopWords.txt is saved in UTF-8 encoding and uses a tab ("\t") as its separator.
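If you just want to exercise the cleaning pipeline before wiring up a real comment source, a stub can stand in for getCommentsById. The stub below (its name matches the script; its return value is invented for illustration) returns fixed text, and the same regex step then strips everything except Chinese characters:

```python
import re

def getCommentsById(num):
    # Hypothetical stub: a real implementation would fetch page `num`
    # of comments from an API or by scraping. Fixed sample text here.
    return ['这部电影非常好看! Great movie 2023', '剧情紧凑, 演员演技在线']

comments = ''
for i in range(10):
    comments += str(getCommentsById(i + 1)).strip()

# Keep only characters in the \u4e00-\u9fa5 range, dropping the list
# punctuation from str(), digits, Latin letters, and whitespace
pattern = re.compile(r'[\u4e00-\u9fa5]+')
cleaned_comments = ''.join(re.findall(pattern, comments))
print(cleaned_comments[:20])
```

This confirms the regex removes everything the later jieba/stopword steps should not see; swap the stub for your real fetch function once it is available.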


Original post: https://www.cveoy.top/t/topic/iYPs — copyright belongs to the author. Please do not repost or scrape!
