Counting Words in English Text with C++: Basic and Advanced Versions

This tutorial implements word counting for English text in C++ and provides code for both a basic and an advanced version.

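Requirements:

  • Input: several lines of English text, terminated by a line containing only !!!!!.
  • Output:
    • The word count. As the sample outputs show, each distinct word is counted once.
    • The 10 most frequent words with their counts, printed as word=count in descending order of count; words with equal counts are listed in ascending alphabetical order.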

Notes:

  • Words are separated by one or more spaces.
  • Blank lines and lines containing only spaces are ignored.

Basic version:

  • Case-sensitive; the specified punctuation characters are not removed.

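Advanced version:

  • Before counting, remove the specified punctuation characters !.,:*? from the text by replacing each one with a single space.
  • Ignore case when counting words.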

Code examples

Basic version:

#include <algorithm>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using namespace std;

int main() {
    string line, word;
    map<string, int> wordCount;

    // Read lines until the terminator line "!!!!!" (a string literal, not a char literal)
    while (getline(cin, line) && line != "!!!!!") {
        // operator>> skips runs of whitespace, so repeated spaces, blank lines
        // and space-only lines never produce empty words
        istringstream iss(line);
        while (iss >> word) {
            wordCount[word]++;
        }
    }

    // The sample outputs report the number of distinct words (46 for sample 1)
    cout << wordCount.size() << endl;

    // Rank the words: count descending, ties broken in ascending alphabetical order
    vector<pair<string, int>> topWords(wordCount.begin(), wordCount.end());
    sort(topWords.begin(), topWords.end(), [](const pair<string, int>& a, const pair<string, int>& b) {
        if (a.second == b.second) return a.first < b.first; // equal counts: alphabetical
        return a.second > b.second; // otherwise: higher count first
    });

    // Print at most the first 10 entries
    for (size_t i = 0; i < 10 && i < topWords.size(); i++) {
        cout << topWords[i].first << '=' << topWords[i].second << endl;
    }

    return 0;
}

Advanced version:

#include <algorithm>
#include <cctype>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using namespace std;

int main() {
    string line, word;
    map<string, int> wordCount;
    const string punctuation = "!.,:*?"; // characters to replace with a space

    // Read lines until the terminator line "!!!!!"; the check runs before
    // punctuation is replaced, so the terminator is still recognized
    while (getline(cin, line) && line != "!!!!!") {
        for (char& c : line) {
            if (punctuation.find(c) != string::npos) {
                c = ' '; // replace punctuation with a space
            } else {
                // Ignore case; the unsigned char cast avoids undefined behavior in tolower
                c = static_cast<char>(tolower(static_cast<unsigned char>(c)));
            }
        }

        // operator>> skips runs of whitespace, so no empty words are counted
        istringstream iss(line);
        while (iss >> word) {
            wordCount[word]++;
        }
    }

    // The sample outputs report the number of distinct words (54 for sample 2)
    cout << wordCount.size() << endl;

    // Rank the words: count descending, ties broken in ascending alphabetical order
    vector<pair<string, int>> topWords(wordCount.begin(), wordCount.end());
    sort(topWords.begin(), topWords.end(), [](const pair<string, int>& a, const pair<string, int>& b) {
        if (a.second == b.second) return a.first < b.first; // equal counts: alphabetical
        return a.second > b.second; // otherwise: higher count first
    });

    // Print at most the first 10 entries
    for (size_t i = 0; i < 10 && i < topWords.size(); i++) {
        cout << topWords[i].first << '=' << topWords[i].second << endl;
    }

    return 0;
}

Sample input and output

Sample input 1:

failure is probably the fortification in your pole

it is like a peek your wallet as the thief when you
are thinking how to spend several hard-won lepta
          
when you are wondering whether new money it has laid
background because of you then at the heart of the
     
most lax alert and most low awareness and left it

godsend failed
!!!!!

Sample output 1:

46
the=4
it=3
you=3
and=2
are=2
is=2
most=2
of=2
when=2
your=2

Sample input 2:

Failure is probably The fortification in your pole!

It is like a peek your wallet as the thief when You
are thinking how to. spend several hard-won lepta.

when yoU are? wondering whether new money it has laid
background Because of: yOu?, then at the heart of the
Tom say: Who is the best? No one dare to say yes.
most lax alert and! most low awareness and* left it

godsend failed
!!!!!

Sample output 2:

54
the=5
is=3
it=3
you=3
and=2
are=2
most=2
of=2
say=2
to=2

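Summary

This tutorial implemented word counting for English text in C++, with code for a basic and an advanced version. Both versions can be modified and extended to suit your own needs.

We hope you find this tutorial helpful!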

Original source: https://www.cveoy.top/t/topic/oelM. Copyright belongs to the author. Do not reproduce or scrape.