C++: Checking whether a truncated Unicode string is garbled

To check whether a Unicode string was truncated without producing garbled text, you can use the following steps:

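Work on the UTF-8 bytes. A character can only be split when the cut is made on the encoded byte sequence (for example, when copying into a fixed-size byte buffer), so if you start from a std::wstring, convert it to UTF-8 first and truncate afterwards. A string that has just been converted is always well-formed, so checking it before it is cut will never report a problem.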
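Scan the UTF-8 bytes from the beginning. A byte whose highest bit is 0 (0xxxxxxx) is a single-byte ASCII character. Any other byte must start a multi-byte character, and its high bits give the length: 110xxxxx begins a 2-byte sequence, 1110xxxx a 3-byte sequence, and 11110xxx a 4-byte sequence. Exactly that many continuation bytes of the form 10xxxxxx must follow it.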
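If the scan hits an error, such as a continuation byte where a new character should start, a new lead byte before the previous sequence is complete, an invalid byte (11111xxx), or the end of the string in the middle of a sequence, then the string was cut inside a character and will display as garbled text.
If the whole string scans cleanly, the cut fell on a character boundary and the string is not garbled. (This checks byte structure only; a full validator would also reject overlong forms and encoded surrogates, but those cannot arise from simply truncating valid UTF-8.)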

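Example code (save the file as UTF-8; with MSVC, compile with /utf-8 so the wide string literals are read correctly):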

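#include <codecvt>   // std::codecvt_utf8 (deprecated since C++17, still provided by major compilers)
#include <iostream>
#include <locale>    // std::wstring_convert (same deprecation)
#include <string>
#include <vector>

// Returns true if the UTF-8 byte string is malformed or ends in the middle of a
// multi-byte character, i.e. it was cut at a position that is not a character boundary.
bool is_truncated(const std::string &utf8_str)
{
    int pending = 0;  // continuation bytes still expected for the current character
    for (unsigned char c : utf8_str) {
        if (pending > 0) {
            if ((c & 0xC0) != 0x80) {
                return true;  // a new character started before the previous one was complete
            }
            --pending;
        } else if ((c & 0x80) == 0) {
            // 0xxxxxxx: single-byte ASCII character
        } else if ((c & 0xE0) == 0xC0) {
            pending = 1;  // 110xxxxx: 2-byte sequence
        } else if ((c & 0xF0) == 0xE0) {
            pending = 2;  // 1110xxxx: 3-byte sequence
        } else if ((c & 0xF8) == 0xF0) {
            pending = 3;  // 11110xxx: 4-byte sequence
        } else {
            return true;  // stray continuation byte (10xxxxxx) or invalid byte (11111xxx)
        }
    }

    return pending != 0;  // the string ended inside a multi-byte character
}

int main()
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;

    std::wstring wstr1 = L"Hello, world!";
    std::wstring wstr2 = L"你好，世界！";
    std::wstring wstr3 = L"Hello, 世界！";
    std::wstring wstr4 = L"你好，world!";

    std::vector<std::wstring> wstrs = {wstr1, wstr2, wstr3, wstr4};

    for (const auto &wstr : wstrs) {
        // Convert first, then cut the bytes: a freshly converted string is always valid UTF-8
        std::string utf8_str = conv.to_bytes(wstr);
        // Simulate truncation, e.g. copying into a 10-byte buffer
        std::string cut = utf8_str.substr(0, 10);
        bool truncated = is_truncated(cut);
        std::cout << utf8_str << " cut to 10 bytes is " << (truncated ? "garbled" : "not garbled") << std::endl;
    }

    return 0;
}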

Output (on a UTF-8 terminal):

Hello, world! cut to 10 bytes is not garbled
你好，世界！ cut to 10 bytes is garbled
Hello, 世界！ cut to 10 bytes is not garbled
你好，world! cut to 10 bytes is not garbled