English article word checker - detecting spelling errors with a word index - General

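The program uses a given word index to check the words of an English article. The index is stored in the file index.txt in the current directory; its words are all lowercase, sorted in ascending dictionary order, one word per line. The article is stored in another file, in.txt, in the current directory. A word is a run of consecutive letters and nothing else. If a word in the article does not appear in the index (the comparison is case-insensitive), the word is converted to lowercase and written to a third file, error.txt, in the current directory. Each word goes on its own line, and the words are written in ascending dictionary order.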

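Assumptions:

The article in in.txt may be unformatted, its layout may be messy, and it may be incomplete.
index.txt contains at most 1000 words, and each word is at most 50 letters long.
If a misspelled word occurs several times, it is output once for each occurrence.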

Input format:

The word index file index.txt and the article file in.txt are both located in the current directory.

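Output format:

Write the misspelled words to the file error.txt in the current directory in ascending dictionary order, one word per line. A word that is misspelled several times is written once for each occurrence. If there are no misspelled words, write nothing.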
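
The ordering and duplicate rules can be illustrated with a short C++ sketch. It assumes the misspelled words have already been collected in lowercase, and the helper name sorted_errors is hypothetical. A std::multiset keeps its elements sorted and, unlike std::set, retains duplicates.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: returns the misspelled words in ascending dictionary
// order, keeping one entry per occurrence.
std::vector<std::string> sorted_errors(const std::vector<std::string>& found) {
    // std::multiset orders its elements and retains duplicates
    // (std::set would keep only one copy of a repeated word).
    std::multiset<std::string> ordered(found.begin(), found.end());
    return std::vector<std::string>(ordered.begin(), ordered.end());
}
```

For the input thee, ans, verrsions, ans the function returns ans, ans, thee, verrsions.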

Sample input 1:

Suppose in.txt contains:

There are two verrsions of the international standards for C.
Thee first version was ratified in 1989 by the American National
Standards Institue (ANS1) C standard committee.It is often
referred as ANS1 C or C89. The secand C standard was completed
in 1999. This standard is comonly referred to as C99. C99 is a
milestone in C's evolution into a viable programing languga
for numerical and scientific computing.

The word index in index.txt is:

a
american
and
ansi
are
as
by
c
committee
commonly
completed
computing
evolution
first
for
in
institue
international
into
is
it
language
milestone
national
numerical
of
often
or
programming
ratified
referred
s
scientific
secand
standard
standards
the
there
this
to
two
version
versions
viable
was

Sample output 1:

error.txt should contain the following misspelled words:

ans
ans
comonly
languga
programing
thee
verrsions

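Explanation of sample 1:

Every word in in.txt is checked against the index in index.txt. The check is case-insensitive, so the first word, There, is found in the index and is not an error. The word verrsions is not in the index, so it is output as a misspelled word. ANSI is misspelled as ANS1; because a word consists only of consecutive letters, the digit 1 ends the word and leaves ANS, which is lowercased to ans and output once for each of its two occurrences. The other misspelled words are handled the same way. The misspelled words are written to error.txt in ascending dictionary order.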
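
The word-splitting rule used in this explanation can be sketched as follows. This is a minimal illustration, not the reference solution, and the function name extract_words is hypothetical. It treats every non-letter character as a separator and lowercases each letter as it is read.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: splits text into lowercase words, where a word is a
// maximal run of letters and every other character is a separator.
std::vector<std::string> extract_words(const std::string& text) {
    std::vector<std::string> words;
    std::string current;
    for (unsigned char ch : text) {  // unsigned char keeps <cctype> calls well defined
        if (std::isalpha(ch)) {
            current += static_cast<char>(std::tolower(ch));
        } else if (!current.empty()) {
            words.push_back(current);  // a non-letter ends the current word
            current.clear();
        }
    }
    if (!current.empty()) {
        words.push_back(current);  // the text ended in the middle of a word
    }
    return words;
}
```

Applied to the fragment (ANS1) C's committee.It, the function yields ans, c, s, committee, it. This is why the sample index contains the one-letter entries c and s.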

Sample input 2:

Suppose in.txt contains:

There are two versions of the international standard fo

The word index in index.txt is:

are
for
international
of
standards
the
there
two
versions

Sample output 2:

error.txt should contain the following misspelled words:

fo
standard

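Explanation of sample 2:

The word standard in in.txt does not appear in the index file index.txt, so it is output as a misspelled word. Note that in.txt in sample 2 is incomplete: no character follows the last word, fo. Since fo is not in the index either, it is also output as a misspelled word.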
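
The end-of-input case in sample 2 is easy to miss, so here is a hedged sketch of one way to handle it. The names find_errors and find_errors_in_text are hypothetical, and the index is assumed to be already loaded into a std::set. The key point is the final flush after the read loop.

```cpp
#include <cctype>
#include <istream>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the words of `in` that are missing from
// `index`, lowercased, sorted, with one entry per occurrence.
std::vector<std::string> find_errors(std::istream& in, const std::set<std::string>& index) {
    std::multiset<std::string> errors;
    std::string current;
    // Records the finished word if it is missing from the index.
    auto flush = [&]() {
        if (!current.empty() && index.count(current) == 0) {
            errors.insert(current);
        }
        current.clear();
    };
    char ch;
    while (in.get(ch)) {
        unsigned char u = static_cast<unsigned char>(ch);
        if (std::isalpha(u)) {
            current += static_cast<char>(std::tolower(u));
        } else {
            flush();
        }
    }
    flush();  // the input may end right after a word, as with "fo"
    return std::vector<std::string>(errors.begin(), errors.end());
}

// Convenience wrapper for trying the function on an in-memory string.
std::vector<std::string> find_errors_in_text(const std::string& text,
                                             const std::set<std::string>& index) {
    std::istringstream in(text);
    return find_errors(in, index);
}
```

With the index and article of sample 2, find_errors_in_text returns fo followed by standard. Without the final flush, fo would be lost.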

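C++ implementation. Approach:

Read the word index and store it in a set.
Read the article and check each word against the set. If a word is missing, store its lowercase form in a second container, set2. This container must keep duplicates (for example, a multiset), because a word that is misspelled several times has to be output several times.
Write the words in set2 to error.txt in dictionary order.

Note that word splitting needs care: any non-letter character inside a word, such as a hyphen, a digit, or an apostrophe, ends the current word.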
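
Because index.txt is already sorted in ascending dictionary order, a sorted std::vector searched with std::binary_search is a possible alternative to a set for the lookup step. This is a sketch of that alternative, not the approach taken by the code that follows. It assumes the index has been read into a vector in file order, and the name in_index is hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: reports whether `word` occurs in `sorted_index`.
// Assumption carried over from the problem statement: `sorted_index` is in
// ascending dictionary order, exactly as the lines of index.txt are.
bool in_index(const std::vector<std::string>& sorted_index, const std::string& word) {
    return std::binary_search(sorted_index.begin(), sorted_index.end(), word);
}
```

With at most 1000 index words, each lookup needs roughly ten string comparisons, and no extra sorting or tree construction is required when the file is loaded.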

Code:

#include <stdio.h>
#include <ctype.h>
#include <set>
#include <string>

using namespace std;

int main() {
    // Open the word index file.
    FILE *fp_index = fopen("index.txt", "r");
    if (fp_index == NULL) {
        perror("failed to open index.txt");
        return 1;
    }

    // Store the index words in a set; %50s keeps fscanf within the buffer.
    set<string> index_set;
    char word[51];
    while (fscanf(fp_index, "%50s", word) == 1) {
        index_set.insert(word);
    }
    fclose(fp_index);

    // Open the article file.
    FILE *fp_in = fopen("in.txt", "r");
    if (fp_in == NULL) {
        perror("failed to open in.txt");
        return 1;
    }

    // A multiset keeps its elements sorted and retains duplicates, so a word
    // that is misspelled several times is written several times.
    multiset<string> error_set;
    int ch;  // int rather than char, so EOF can be told apart from real bytes
    string current_word;
    while ((ch = fgetc(fp_in)) != EOF) {
        if (isalpha(ch)) {
            current_word += static_cast<char>(tolower(ch));
        } else if (!current_word.empty()) {
            // A non-letter character ends the current word.
            if (index_set.find(current_word) == index_set.end()) {
                error_set.insert(current_word);
            }
            current_word.clear();
        }
    }
    // The file may end right after a word, with no trailing separator.
    if (!current_word.empty() &&
        index_set.find(current_word) == index_set.end()) {
        error_set.insert(current_word);
    }
    fclose(fp_in);

    // Open the output file.
    FILE *fp_error = fopen("error.txt", "w");
    if (fp_error == NULL) {
        perror("failed to open error.txt");
        return 1;
    }

    // The multiset iterates in dictionary order, so no extra sort is needed.
    for (auto it = error_set.begin(); it != error_set.end(); ++it) {
        fprintf(fp_error, "%s\n", it->c_str());
    }
    fclose(fp_error);

    return 0;
}