To change the number of topics to a specific value, pass that value as n_components when creating the LDA model. The modified code is below:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

n_topics = 5  # set the number of topics to 5

tf_vectorizer = CountVectorizer(max_df=0.95, min_df=2, stop_words='english')
tf = tf_vectorizer.fit_transform(data)

lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
lda.fit(tf)

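# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() is available from 1.0
tf_feature_names = tf_vectorizer.get_feature_names_out()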

# Normalize each topic's word weights so that every row sums to 1
topic_word_prob = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
for i, topic_prob in enumerate(topic_word_prob):
    # Indices of the 20 highest-probability words, in descending order
    top_words_idx = topic_prob.argsort()[:-21:-1]
    top_words = [tf_feature_names[idx] for idx in top_words_idx]
    print(f"Topic {i}: {', '.join(top_words)}")
    print(f"Word Prob: {', '.join([str(prob) for prob in topic_prob[top_words_idx]])}\n")


Original article: http://www.cveoy.top/t/topic/bwlq Copyright belongs to the author. Do not reproduce or scrape.
