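Stop Word Removal and TF-IDF Statistics for English Text: An Example with the Top 10 Keywords

This article shows how to remove stop words from English text, compute TF-IDF statistics, and list the 10 words with the highest TF-IDF values.

1. Stop word list: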

  • 'a'
  • 'an'
  • 'and'
  • 'are'
  • 'as'
  • 'at'
  • 'be'
  • 'by'
  • 'for'
  • 'from'
  • 'has'
  • 'he'
  • 'in'
  • 'is'
  • 'it'
  • 'its'
  • 'of'
  • 'on'
  • 'that'
  • 'the'
  • 'to'
  • 'was'
  • 'were'
  • 'with'

2. Sample text:

'The quick brown fox jumped over the lazy dog. The dog didn't seem to care, and continued to lie there. The fox ran off into the woods.'

3. Text after stop word removal:

'quick brown fox jumped over lazy dog. dog didn't seem care, continued lie there. fox ran off into woods.'
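The removal step can be sketched in a few lines of Python. This is a minimal illustration, not necessarily the procedure used to produce the text above: the names `STOP_WORDS` and `remove_stop_words` are hypothetical, and splitting on whitespace, lowercasing, and ignoring surrounding punctuation during the lookup are assumptions made for the sketch.

```python
import re

# Stop word list copied from section 1 of this article.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from",
    "has", "he", "in", "is", "it", "its", "of", "on", "that", "the",
    "to", "was", "were", "with",
}

def remove_stop_words(text):
    """Drop stop words (case-insensitive) and lowercase the remaining tokens."""
    kept = []
    for token in text.split():
        # Strip surrounding punctuation for the lookup only,
        # so that "The" and "and," are still recognized as stop words.
        core = re.sub(r"^\W+|\W+$", "", token).lower()
        if core not in STOP_WORDS:
            kept.append(token.lower())
    return " ".join(kept)

text = ("The quick brown fox jumped over the lazy dog. "
        "The dog didn't seem to care, and continued to lie there. "
        "The fox ran off into the woods.")
print(remove_stop_words(text))
```

Because 'into' is not in the stop word list, it stays in the output. A stop word directly followed by punctuation (for example "it.") would be dropped together with that punctuation, which may or may not be what a real pipeline wants.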

4. TF-IDF results:

| Term | TF-IDF |
|------|--------|
| 'fox' | 0.6931 |
| 'jumped' | 0.6931 |
| 'lazy' | 0.6931 |
| 'ran' | 0.6931 |
| 'woods' | 0.6931 |
| 'brown' | 0.2877 |
| 'care' | 0.2877 |
| 'dog' | 0.2877 |
| 'lie' | 0.2877 |
| 'off' | 0.2877 |
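The article does not say which TF-IDF variant or which document split produced the table, so the following is only a sketch under stated assumptions: each sentence of the sample text is treated as one document, tf is the term count divided by the document length, and idf is ln(N / df). The function names `tokenize` and `tf_idf` are hypothetical. With these choices the output should not be expected to match the figures in the table.

```python
import math
import re
from collections import Counter

# Stop word list copied from section 1 of this article.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from",
    "has", "he", "in", "is", "it", "its", "of", "on", "that", "the",
    "to", "was", "were", "with",
}

def tokenize(sentence):
    """Lowercase, keep runs of letters and apostrophes, drop stop words."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]

def tf_idf(documents):
    """Return one {term: score} dict per document.

    Assumed definitions (one common variant among several):
      tf  = occurrences of the term in the document / terms in the document
      idf = ln(N / df), df = number of documents containing the term
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # count each term once per document
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

# Assumption: every sentence of the sample text is its own document.
sentences = [
    "The quick brown fox jumped over the lazy dog.",
    "The dog didn't seem to care, and continued to lie there.",
    "The fox ran off into the woods.",
]
for doc_scores in tf_idf(sentences):
    print({term: round(score, 4) for term, score in doc_scores.items()})
```

Under this definition, a word that occurs in several sentences (such as 'fox' or 'dog') receives a lower idf than a word that occurs in only one. Libraries often apply smoothing or normalization on top of this, which changes the absolute numbers.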

5. Top 10 words by TF-IDF value:

'fox', 'jumped', 'lazy', 'ran', 'woods', 'brown', 'care', 'dog', 'lie', 'off'.
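Selecting the top entries from a set of scores is a simple sort. The sketch below takes the values from the table in section 4 as given. The function name `top_terms` and the alphabetical tie-breaking rule are assumptions made for illustration.

```python
# Scores copied from the table in section 4.
scores = {
    "fox": 0.6931, "jumped": 0.6931, "lazy": 0.6931, "ran": 0.6931,
    "woods": 0.6931, "brown": 0.2877, "care": 0.2877, "dog": 0.2877,
    "lie": 0.2877, "off": 0.2877,
}

def top_terms(term_scores, n=10):
    """Return the n highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(term_scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:n]]

print(top_terms(scores))
```

Sorting on the pair (negative score, term) keeps the result deterministic when many terms share the same score, as they do here.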

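This article used an example to show how to remove stop words from English text, compute TF-IDF statistics, and extract the 10 words with the highest TF-IDF values. These techniques are widely used in natural language processing, for example in keyword extraction and text analysis.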
