Stop Word Removal and TF-IDF Statistics for English Text: An Example with the Top 10 Keywords
This article shows how to remove stop words from English text and compute TF-IDF statistics, then lists the 10 words with the highest TF-IDF values.
1. Stop word list:
- 'a'
- 'an'
- 'and'
- 'are'
- 'as'
- 'at'
- 'be'
- 'by'
- 'for'
- 'from'
- 'has'
- 'he'
- 'in'
- 'is'
- 'it'
- 'its'
- 'of'
- 'on'
- 'that'
- 'the'
- 'to'
- 'was'
- 'were'
- 'with'
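Stop words can be removed by checking each whitespace-separated token against this list. The sketch below makes a few assumptions of its own. Matching ignores case, so "The" is dropped. Punctuation attached to a kept word stays in place. The names `STOP_WORDS`, `TEXT` and `remove_stop_words` are illustrative, not taken from any library.

```python
import re

# Stop word list from step 1; the lookup below is case-insensitive.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from",
    "has", "he", "in", "is", "it", "its", "of", "on", "that", "the",
    "to", "was", "were", "with",
}

# Sample text from step 2.
TEXT = ("The quick brown fox jumped over the lazy dog. The dog didn't seem "
        "to care, and continued to lie there. The fox ran off into the woods.")


def remove_stop_words(text: str) -> str:
    """Drop stop words, keeping the other tokens and their punctuation."""
    kept = []
    for token in text.split():
        # Strip punctuation (keeping apostrophes) and lowercase before the
        # lookup, so "The" matches "the" and "dog." is looked up as "dog".
        word = re.sub(r"[^\w']", "", token).lower()
        if word not in STOP_WORDS:
            kept.append(token)
    return " ".join(kept)


if __name__ == "__main__":
    print(remove_stop_words(TEXT))
```

Words that are not on the list, such as "over", "didn't" and "into", stay in the output.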
2. Sample text:
'The quick brown fox jumped over the lazy dog. The dog didn't seem to care, and continued to lie there. The fox ran off into the woods.'
3. Text after stop word removal:
'quick brown fox jumped over lazy dog. dog didn't seem care, continued lie there. fox ran off into woods.'
4. TF-IDF results:
| Term | TF-IDF |
|------|--------|
| 'fox' | 0.6931 |
| 'jumped' | 0.6931 |
| 'lazy' | 0.6931 |
| 'ran' | 0.6931 |
| 'woods' | 0.6931 |
| 'brown' | 0.2877 |
| 'care' | 0.2877 |
| 'dog' | 0.2877 |
| 'lie' | 0.2877 |
| 'off' | 0.2877 |
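TF-IDF needs a collection of documents. The article does not say which collection or which IDF variant produced the table above. With only one document, the common definition idf(t) = ln(N / df(t)) is 0 for every term, so the text has to be split somehow.

The sketch below hypothetically treats each sentence of the filtered text as one document for IDF. TF is the term's share of all tokens in the whole text, and each score is tf × idf. Because these are assumptions, the resulting numbers do not match the table; the code only illustrates the computation. `tokenize`, `tf_idf_scores` and `FILTERED` are illustrative names.

```python
import math
import re
from collections import Counter

# Filtered text after stop word removal ("into" is kept: it is not a stop word).
FILTERED = ("quick brown fox jumped over lazy dog. dog didn't seem care, "
            "continued lie there. fox ran off into woods.")


def tokenize(text):
    """Lowercase the text and return runs of letters/apostrophes as terms."""
    return re.findall(r"[a-z']+", text.lower())


def tf_idf_scores(text):
    """Score each term as tf(term, whole text) * idf(term, sentences)."""
    # Assumption: every non-empty sentence counts as one document for IDF.
    sentences = [s for s in re.split(r"[.!?]", text) if s.strip()]
    docs = [set(tokenize(s)) for s in sentences]
    n_docs = len(docs)

    terms = tokenize(text)
    counts = Counter(terms)
    scores = {}
    for term, count in counts.items():
        tf = count / len(terms)                     # relative term frequency
        df = sum(1 for doc in docs if term in doc)  # sentences containing term
        idf = math.log(n_docs / df)                 # natural log, no smoothing
        scores[term] = tf * idf
    return scores


if __name__ == "__main__":
    for term, score in sorted(tf_idf_scores(FILTERED).items(),
                              key=lambda kv: -kv[1]):
        print(f"{term:10s} {score:.4f}")
```

Under this setup "fox" and "dog" each occur twice but appear in two of the three sentences, so their lower IDF puts them below terms that occur only once. Different corpus splits or IDF variants (for example smoothed IDF) give different rankings.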
5. Top 10 words by TF-IDF value:
'fox', 'jumped', 'lazy', 'ran', 'woods', 'brown', 'care', 'dog', 'lie', 'off'.
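Picking the top 10 comes down to sorting terms by score. Many scores in the table tie, so the order within a tie needs a rule. The sketch below assumes alphabetical order, which happens to match the order listed above. `top_k_terms` and `TABLE_SCORES` are illustrative names, and the scores are copied from the table in step 4.

```python
import heapq

# Scores copied from the TF-IDF table in step 4.
TABLE_SCORES = {
    "fox": 0.6931, "jumped": 0.6931, "lazy": 0.6931, "ran": 0.6931,
    "woods": 0.6931, "brown": 0.2877, "care": 0.2877, "dog": 0.2877,
    "lie": 0.2877, "off": 0.2877,
}


def top_k_terms(scores, k=10):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    # The smallest (-score, term) key means highest score first, then A-Z.
    return heapq.nsmallest(k, scores, key=lambda term: (-scores[term], term))


if __name__ == "__main__":
    print(top_k_terms(TABLE_SCORES))
```

`heapq.nsmallest` gives the same result as `sorted(...)[:k]` with the same key. It only pays off when the vocabulary is much larger than k.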
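Using a worked example, this article has shown how to remove stop words from English text, compute TF-IDF statistics, and extract the 10 words with the highest TF-IDF values. These steps are widely used in natural language processing, for example in keyword extraction and text analysis.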
Original article: https://www.cveoy.top/t/topic/oMyl. Copyright belongs to the author. Do not reproduce or scrape.