Unicode literals in Visual C++


Consider the following code:

#include <string>
#include <fstream>
#include <iomanip>

int main() {
    std::string s = "\xe2\x82\xac\u20ac";
    std::ofstream out("test.txt");
    out << s.length() << ":" << s << std::endl;
    out << std::endl;
    out.close();
}

With GCC, test.txt contains:

6:€€

With Visual C++ 2013 on Windows, it contains:

4:€\x80

(By "\x80" I mean the single 8-bit character 0x80.)

I can't get either compiler to output the € character at all when using `std::wstring`.

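Two questions:

1. What does the Microsoft compiler think it is doing with the `char*` literal? It is obviously applying some encoding, but which one is unclear.
2. What is the correct way to rewrite the above code using `std::wstring` and `std::wofstream` so that it outputs two `€` characters?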

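This is because you are using a Unicode character literal, `\u20ac`, inside a narrow (ASCII) string.

MSVC encodes `"\xe2\x82\xac\u20ac"` as `0xe2, 0x82, 0xac, 0x80`, i.e. 4 narrow chars. It encodes `\u20ac` as 0x80 because the Unicode character U+20AC is at position 0x80 in code page 1252, the ANSI code page used for narrow strings here.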

GCC converts the Unicode literal `\u20ac` to the 3-byte UTF-8 sequence `0xe2, 0x82, 0xac`, so the resulting string ends up as `0xe2, 0x82, 0xac, 0xe2, 0x82, 0xac` (6 bytes).

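If you use `std::wstring s = L"\xe2\x82\xac\u20ac";`, MSVC encodes it as `0xe2, 0x00, 0x82, 0x00, 0xac, 0x00, 0xac, 0x20` (bytes in memory order; `wchar_t` is a 16-bit little-endian unit on Windows). That is 4 wide chars, but because you are mixing hand-written UTF-8 with UTF-16, the resulting string doesn't make much sense. If you use `std::wstring s = L"\u20ac\u20ac";`, you get 2 Unicode characters in the wide string, as expected.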

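The next problem is that MSVC's `ofstream` and `wofstream` write ANSI/ASCII by default. To write UTF-8, imbue the stream with a locale that uses `std::codecvt_utf8` from `<codecvt>` (VS 2010 or later):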

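#include <codecvt>   // std::codecvt_utf8 (deprecated since C++17, still provided)
#include <fstream>
#include <locale>
#include <string>

int main()
{
    std::wstring s = L"\u20ac\u20ac";

    std::wofstream out("test.txt");
    // Replace the stream's conversion facet so each wchar_t is written as UTF-8.
    // The locale takes ownership of the facet, so the `new` does not leak.
    // On Windows wchar_t holds UTF-16; characters outside the BMP would need
    // std::codecvt_utf8_utf16<wchar_t> instead.
    std::locale loc(std::locale::classic(), new std::codecvt_utf8<wchar_t>);
    out.imbue(loc);

    out << s.length() << L":" << s << std::endl;   // "2:€€"
    out << std::endl;
    out.close();
}

To write UTF-16 (little-endian) instead, open the file in binary mode, so the runtime's text-mode newline translation cannot corrupt the encoded bytes, and write the CRLF line ending yourself:

#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

int main()
{
    std::wstring s = L"\u20ac\u20ac";

    // Binary mode: no newline translation on the UTF-16 byte stream.
    std::wofstream out("test.txt", std::ios::binary);
    // Encode each wchar_t as UTF-16LE; no byte order mark is written.
    std::locale loc(std::locale::classic(),
                    new std::codecvt_utf16<wchar_t, 0x10ffff, std::little_endian>);
    out.imbue(loc);

    out << s.length() << L":" << s << L"\r\n";   // explicit Windows line ending
    out << L"\r\n";
    out.close();
}

Comments:

On Windows, the encoding of 8-bit strings is the ambient 8-bit code page, which is 1252 in the US. You are using UTF-8. (You are also interpreting the output file as UTF-8 rather than 1252.)

Fair point: that is what the file contains on Windows, according to Notepad++ with its encoding set to UTF-8.

Hmm, systeminfo reports both the system locale and the input locale as "en-gb; English (United Kingdom)". I wondered whether that is a UTF-8 locale, but it doesn't say.

There is no UTF-8 locale. Code page 65001 (UTF-8) cannot be the active code page.

So what is "en_gb.utf8"?

Thanks for the answer. Am I right that GCC doesn't support std::codecvt_utf8?

Small correction: "It encodes \u20ac as 0x80 because the Unicode character U+20AC is at position 0x80 in code page 1252."

@Raymond - Very good. Thanks for the clarification! I'll fix it.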