Unable to extract Unicode symbols from a C++ string
Tags: c++, c++11, unicode, icu, icu4c

I want to read a C++ std::string, pass that std::string to a function that analyzes it, and then extract the Unicode symbols and the plain ASCII symbols from it. I have searched through many tutorials online, but they all say that standard C++ does not fully support Unicode. Many of them suggest using ICU for C++.

Actual output:
Hello?
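Please explain what I am doing wrong. Also, please suggest any alternative or simpler approach.
Thanks.
Update 1 (old): the working code is as follows: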
#include <clocale>   // std::setlocale
#include <iostream>
#include <string>
#include <locale>
#include "unicode/unistr.h"

void f(const std::string & s)
{
    std::wcout << "Inside called function" << std::endl;
    // Same locale setup as in main(); repeated here as in the original post.
    constexpr char locale_name[] = "";
    std::setlocale( LC_ALL, locale_name );
    std::locale::global(std::locale(locale_name));
    std::ios_base::sync_with_stdio(false);
    std::wcin.imbue(std::locale());
    std::wcout.imbue(std::locale());
    // at this point s contains a line of text which may be ANSI or UTF-8 encoded
    // convert std::string to ICU's UnicodeString
    icu::UnicodeString ucs = icu::UnicodeString::fromUTF8(icu::StringPiece(s.c_str()));
    // convert UnicodeString to std::wstring
    // Note: this copies UTF-16 code units one at a time. Where wchar_t is
    // 32 bits wide, that is only correct for characters in the BMP.
    std::wstring ws;
    for (int i = 0; i < ucs.length(); ++i)
        ws += static_cast<wchar_t>(ucs[i]);
    std::wcout << ws << std::endl;
}

int main()
{
    // Use the user's default locale for the C library and for the wide streams.
    constexpr char locale_name[] = "";
    std::setlocale( LC_ALL, locale_name );
    std::locale::global(std::locale(locale_name));
    std::ios_base::sync_with_stdio(false);
    std::wcin.imbue(std::locale());
    std::wcout.imbue(std::locale());
    std::wcout << "Inside main function" << std::endl;
    // Compiles as C++11/14/17; in C++20 a u8 literal has type const char8_t[].
    std::string s=u8"hello☺";
    // at this point s contains a line of text which may be ANSI or UTF-8 encoded
    // convert std::string to ICU's UnicodeString
    icu::UnicodeString ucs = icu::UnicodeString::fromUTF8(icu::StringPiece(s.c_str()));
    // convert UnicodeString to std::wstring (same BMP-only caveat as in f)
    std::wstring ws;
    for (int i = 0; i < ucs.length(); ++i)
        ws += static_cast<wchar_t>(ucs[i]);
    std::wcout << ws << std::endl;
    std::wcout << "--------------------------------" << std::endl;
    f(s);
    return 0;
}
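Update 2 (latest): the code in Update 1 does not work for UTF-32 symbols (characters that do not fit in a single UTF-16 code unit).

There are a number of hurdles to getting this to work properly:
- First, your source file (and the smiley in it) must be encoded as UTF-8. The smiley should consist of the literal bytes 0xE2 0x98 0xBA.
- You should mark the string as containing UTF-8 data by using the u8 prefix: u8"Hello☺".
- Next, the documentation of icu::UnicodeString says that it stores Unicode as UTF-16. You got lucky in this case, because U+263A fits in a single UTF-16 code unit. Other emoji may not. You should either convert the string to UTF-32, or be very careful and use the char32At function.
- Finally, the encoding used by wcout should be configured with imbue so that it matches the encoding your environment expects.

wchar_t is completely unsuited to this task. Use it only when you are forced to, for example when interfacing with WinAPI. Everywhere else, use actual Unicode types and UTF-8 encoded strings.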
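To make the UTF-16 point concrete, here is a minimal sketch in standard C++ (no ICU) of how a single code point maps to UTF-16 code units. The helper names to_utf16 and from_surrogates are hypothetical, not ICU or standard library functions, and the sketch assumes its input is a valid code point (at most U+10FFFF and not itself a surrogate).

```cpp
#include <vector>

// Hypothetical helper: encode one Unicode code point as UTF-16.
// Assumes cp is a valid scalar value (<= 0x10FFFF, not a surrogate).
std::vector<char16_t> to_utf16(char32_t cp)
{
    if (cp < 0x10000) {
        // BMP character: exactly one 16-bit code unit.
        return { static_cast<char16_t>(cp) };
    }
    // Supplementary character: split the 20-bit offset into a surrogate pair.
    const char32_t v = cp - 0x10000;
    return { static_cast<char16_t>(0xD800 + (v >> 10)),     // high surrogate
             static_cast<char16_t>(0xDC00 + (v & 0x3FF)) }; // low surrogate
}

// Hypothetical helper: recombine a surrogate pair into one code point.
// Assumes hi is in [0xD800, 0xDBFF] and lo is in [0xDC00, 0xDFFF].
char32_t from_surrogates(char16_t hi, char16_t lo)
{
    return 0x10000
         + ((static_cast<char32_t>(hi) - 0xD800) << 10)
         + (static_cast<char32_t>(lo) - 0xDC00);
}
```

With these helpers, to_utf16(0x263A) yields one unit (the smiley from the question), while to_utf16(0x1F600) yields the pair 0xD83D 0xDE00. A loop that casts each UTF-16 unit to a 32-bit wchar_t would emit those two surrogates as separate, invalid characters, which is why the per-unit copy only appears to work for U+263A.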
Output of the Update 1 code:
Inside main function
hello☺
--------------------------------
Inside called function
hello☺
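As one possible simpler alternative for the original goal (separating plain ASCII from other Unicode symbols), the following is a rough sketch that keeps the text as a UTF-8 std::string and decodes it by hand, without ICU or wchar_t. The names decode_utf8, Split and split_ascii are hypothetical. The decoder replaces malformed input with U+FFFD but does not reject overlong encodings or encoded surrogates, so a library such as ICU remains the safer choice for untrusted input.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decode a UTF-8 byte string into code points.
// Invalid lead bytes, bad continuation bytes and truncated sequences
// each produce U+FFFD. Overlong forms are NOT rejected (sketch only).
std::vector<char32_t> decode_utf8(const std::string& s)
{
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        char32_t cp = 0;
        std::size_t len = 0;
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else { out.push_back(0xFFFD); ++i; continue; }  // invalid lead byte

        if (i + len > s.size()) { out.push_back(0xFFFD); break; }  // truncated

        bool ok = true;
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (ok) { out.push_back(cp); i += len; }
        else    { out.push_back(0xFFFD); ++i; }
    }
    return out;
}

// Hypothetical result type: ASCII characters and all other code points.
struct Split {
    std::string ascii;
    std::vector<char32_t> non_ascii;
};

// Separate the ASCII characters of a UTF-8 string from everything else.
Split split_ascii(const std::string& s)
{
    Split result;
    for (char32_t cp : decode_utf8(s)) {
        if (cp < 0x80) result.ascii += static_cast<char>(cp);
        else           result.non_ascii.push_back(cp);
    }
    return result;
}
```

For example, split_ascii("hello\xE2\x98\xBA") puts "hello" in ascii and the single code point U+263A in non_ascii. The smiley is spelled as explicit bytes here so the example does not depend on the source file encoding or on how the compiler treats u8 literals.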