C++: How to read file streams in UTF-8

c++ windows unicode utf-8

By redirecting input and output on the terminal and then using wcin and wcout, I was able to read UTF-8 text files successfully:

_setmode(_fileno(stdout), _O_U8TEXT);
_setmode(_fileno(stdin), _O_U8TEXT);

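Now I would like to read UTF-8 text with file streams, but I don't know how to set the mode of a file stream so that it reads these characters the way stdin and stdout do. I have tried wifstream and wofstream, and on their own they still read and write garbage.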

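The C++ stream library has no support for converting from one text encoding to another. If you need to convert the input text from UTF-8 to another format (for example, the underlying code points of the encoding), you have to write the conversion by hand: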

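#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

std::string data;
//Open in binary mode so the size reported by tellg() matches the bytes actually read
std::ifstream in("utf8.txt", std::ios::binary);
in.seekg(0, std::ios::end);
auto size = in.tellg();
in.seekg(0, std::ios::beg);
data.resize(static_cast<std::size_t>(size));
in.read(data.data(), size);
//data now contains the entire contents of the file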

uint32_t partial_codepoint = 0;
unsigned num_of_bytes = 0;
std::vector<uint32_t> codepoints;
for(char c : data) {
    uint8_t byte = uint8_t(c);
    if(byte < 128) {
        //Character is just a basic ascii character, so we'll just set that as the codepoint value
        codepoints.push_back(byte);
        if(num_of_bytes > 0) {
            //Data was malformed: error handling?
            //Codepoint abruptly ended
        }
    } else {
        //Character is part of multi-byte encoding
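        if(num_of_bytes > 0) {
            //We've already begun storing the codepoint.
            //Test the remaining byte count rather than partial_codepoint, because a
            //lead byte such as 0xE0 contributes payload bits that are all zero
            if((byte >> 6) != 0b10) {
                //Data was malformed: error handling?
                //Codepoint abruptly ended
            }
            partial_codepoint = (partial_codepoint << 6) | (0b0011'1111 & byte);
            num_of_bytes--;
            if(num_of_bytes == 0) {
                codepoints.emplace_back(partial_codepoint);
                partial_codepoint = 0;
            }
        } else {
            //Beginning of new codepoint
            if((byte >> 6) == 0b10) {
                //Data was malformed: error handling?
                //Codepoint did not have proper beginning
            }
            //Count the leading 1 bits, which give the total length of the sequence
            while(byte & 0b1000'0000) {
                num_of_bytes++;
                byte = byte << 1;
            }
            partial_codepoint = byte >> num_of_bytes;
            //The lead byte itself is now consumed, so only the continuation bytes remain
            num_of_bytes--;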
        }
    }
}

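This code reliably converts [correctly encoded] UTF-8 to UTF-32, which is usually the easiest form to turn directly into glyphs and characters, but keep in mind that it relies on the input being well formed.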
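As a rough sketch, the same decoding idea can be packaged as a function that reports structurally malformed input instead of leaving the error handling open. The name decode_utf8 and the choice to throw std::runtime_error are assumptions made for illustration, not part of the original answer, and the sketch does not reject overlong encodings or surrogate code points.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not from the original answer): decodes UTF-8 bytes into
// code points and throws std::runtime_error on structurally malformed input.
// Overlong encodings and surrogate code points are NOT rejected here.
std::vector<std::uint32_t> decode_utf8(const std::string& data) {
    std::vector<std::uint32_t> codepoints;
    std::uint32_t partial = 0;  // bits collected so far for the current code point
    unsigned remaining = 0;     // continuation bytes still expected
    for (char c : data) {
        const std::uint8_t byte = static_cast<std::uint8_t>(c);
        if (remaining > 0) {
            // Inside a multi-byte sequence: the byte must look like 10xxxxxx
            if ((byte >> 6) != 0b10) {
                throw std::runtime_error("sequence ended early");
            }
            partial = (partial << 6) | (byte & 0x3Fu);
            if (--remaining == 0) {
                codepoints.push_back(partial);
            }
        } else if (byte < 0x80) {
            codepoints.push_back(byte);  // 0xxxxxxx: plain ASCII
        } else if ((byte >> 5) == 0b110) {
            partial = byte & 0x1Fu;      // 110xxxxx: 2-byte sequence
            remaining = 1;
        } else if ((byte >> 4) == 0b1110) {
            partial = byte & 0x0Fu;      // 1110xxxx: 3-byte sequence
            remaining = 2;
        } else if ((byte >> 3) == 0b11110) {
            partial = byte & 0x07u;      // 11110xxx: 4-byte sequence
            remaining = 3;
        } else {
            throw std::runtime_error("invalid lead byte");
        }
    }
    if (remaining > 0) {
        throw std::runtime_error("input ended mid-sequence");
    }
    return codepoints;
}
```

Calling decode_utf8 on the bytes of the euro sign (0xE2 0x82 0xAC) should give the single code point 0x20AC. Throwing is only one option; replacing bad sequences with U+FFFD would be an equally reasonable design.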

To keep the code consistent, I recommend storing UTF-8 encoded text in your program as std::string and UTF-32 encoded text as std::vector<uint32_t>.
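If text is kept as code points in a std::vector<uint32_t>, it eventually has to go back into a std::string as UTF-8 for output. The following is a minimal sketch of that reverse direction; the name encode_utf8 is hypothetical, and the function assumes every input value is a valid Unicode scalar value (at most 0x10FFFF and not a surrogate), since it performs no validation.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: encodes code points as UTF-8 bytes.
// Assumption: every value is a valid Unicode scalar value; nothing is checked.
std::string encode_utf8(const std::vector<std::uint32_t>& codepoints) {
    std::string out;
    for (std::uint32_t cp : codepoints) {
        if (cp < 0x80) {
            // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```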

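You can use std::ifstream (or std::cin) to read UTF-8. If you want to use UTF-8 inside your program, no adjustment is needed. The problem arises when you want to use a different encoding inside your program; then some conversion is required. So, what is the target encoding you want to convert the UTF-8 to?

Into numbers, to look at gematria, i.e. "research" that looks for patterns and messages in the Torah, which apparently contains the answers to all questions.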
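To illustrate the point that std::ifstream needs no adjustment when the program keeps its text as UTF-8, here is a small sketch that reads a file into a std::string without touching the bytes. The helper name read_utf8_file is hypothetical, and opening in binary mode is an assumption made so that Windows does not translate line endings.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: returns the raw bytes of a file as a std::string.
// The bytes are not interpreted, so UTF-8 content arrives unchanged.
// Returns an empty string if the file cannot be opened.
std::string read_utf8_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::ostringstream buffer;
    buffer << in.rdbuf();  // copy everything from the file into the buffer
    return buffer.str();
}
```

The returned string holds UTF-8 code units, so its size() counts bytes rather than characters.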