C++: Converting a UTF-8 string to ICU's StringPiece

c++ string utf-8

First time posting here, so apologies in advance if my title, formatting, or tags are not up to standard.

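I'm trying to write a function in a C++ Windows console application that removes diacritics from std::wstring user input. To do that, I'm using code put together with help from elsewhere, and I convert my wstring to a UTF-8 string as follows: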

std::string test= wstring_to_utf8 (input);

std::string wstring_to_utf8 (const std::wstring& str){
 std::wstring_convert<std::codecvt_utf8<wchar_t>> myconv;
 return myconv.to_bytes(str);
}

std::string output= desaxUTF8(test);

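@NuSkooler (hi!) is right, of course. In any case, you can try converting between UnicodeString and std::wstring directly, iff std::wstring holds 16-bit UTF-16 code units, i.e. wchar_t is the same size as UChar, as on Windows. (Untested; see the doSomething function at the end of this page.)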

Comments:

What are you trying to achieve with StringPiece directly in the mix? UnicodeString u = UnicodeString::fromUTF8(str), assuming str is a std::string containing valid UTF-8, should work fine.

I tried the approach you recommended, but it produces the same faulty behavior. UnicodeString u = UnicodeString::fromUTF8("abcš") does work, though, so StringPiece really does seem unnecessary. It doesn't solve my problem, however, because the correct value of the string str still isn't used when constructing the UnicodeString.

I think at this point we know that the data coming out of wstring_to_utf8() must be bad. What's in your std::wstring input?

codecvt_utf8 converts between UTF-8 and UCS-2 or UTF-32, not UTF-16. Since you're on Windows, I guess you have UTF-16 data in your std::wstring, and you need codecvt_utf8_utf16.

Even if I skip the whole wstring part and just assign a value to a plain std::string, the value still isn't forwarded to fromUTF8() or StringPiece. For example, std::string test("abc"); UnicodeString source = UnicodeString::fromUTF8(test) doesn't work.

I answered separately. On Windows, wchar_t should be a 16-bit UTF-16 code unit.

Thank you very much! After some minor tweaks this worked for me, with the added bonus of staying in wstring!
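To make the UTF-16 point from the comments concrete, here is a minimal sketch of what a UTF-16 to UTF-8 conversion has to do, written with the standard library only. The helper name utf16_to_utf8 is hypothetical and not from the thread, and replacing unpaired surrogates with U+FFFD is an assumption of this sketch. The surrogate-pair branch is the part that a UCS-2 conversion does not handle.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not from the thread): encode UTF-16 code units as UTF-8.
// Assumption of this sketch: unpaired surrogates become U+FFFD.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with the following low surrogate.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD; // stray low surrogate
        }
        // Emit 1 to 4 UTF-8 bytes depending on the code point's magnitude.
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

On Windows, where wchar_t is a 16-bit UTF-16 code unit, the same logic applies to the contents of a std::wstring; the sketch uses std::u16string so that it does not depend on the platform's wchar_t size.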

#include <unicode/utypes.h>
#include <unicode/unistr.h>
#include <unicode/ustream.h>
#include <unicode/translit.h>
#include <unicode/stringpiece.h>

std::string desaxUTF8(const std::string& str) {
    StringPiece s(str);
    UnicodeString source = UnicodeString::fromUTF8(s);
    //...
    return result;
}

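std::wstring Menu::removeDiacritis(const std::wstring &input) {
    // Read-only alias of the input; only valid where wchar_t is a 16-bit UTF-16 code unit (Windows)
    UnicodeString source(false, input.data(), static_cast<int32_t>(input.length()));

    UErrorCode status = U_ZERO_ERROR;
    Transliterator *accentsConverter = Transliterator::createInstance(
        "NFD; [:M:] Remove; NFC", UTRANS_FORWARD, status);
    if (U_FAILURE(status) || accentsConverter == nullptr) {
        return input; // transliterator could not be created; return the input unchanged
    }
    accentsConverter->transliterate(source); // the aliased buffer is copied before it is modified
    delete accentsConverter;                 // createInstance() hands ownership to the caller

    // UChar is char16_t from ICU 59 on, hence the cast back to wchar_t
    std::wstring output(reinterpret_cast<const wchar_t *>(source.getBuffer()), source.length());
    return output;
}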
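The transliterator ID "NFD; [:M:] Remove; NFC" decomposes the text, removes the mark characters, and recomposes what is left. As a rough illustration of the middle step only, here is a standard-library sketch that drops combining marks from text that is already decomposed. The helper name strip_combining_marks is hypothetical, and the sketch only covers the Combining Diacritical Marks block (U+0300 to U+036F), which is far narrower than ICU's [:M:] set; it also performs no normalization, so precomposed characters such as š (U+0161) pass through unchanged.

```cpp
#include <string>

// Hypothetical helper (not from the thread): drop code points in the
// Combining Diacritical Marks block (U+0300..U+036F).
// Assumption: the input is already decomposed (NFD); no normalization is done here.
std::u32string strip_combining_marks(const std::u32string& decomposed) {
    std::u32string out;
    for (char32_t cp : decomposed) {
        if (cp < 0x0300 || cp > 0x036F) {
            out += cp; // keep everything that is not a combining mark in this block
        }
    }
    return out;
}
```

For real input, ICU's normalization is still needed to get from š to s followed by U+030C in the first place, which is why the function above relies on the transliterator.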

std::wstring doSomething(const std::wstring &input) {

    // sizeof cannot be evaluated in an #if directive, so check at compile time instead
    static_assert(sizeof(wchar_t) == sizeof(UChar),
                  "no idea what (typically underspecified) wchar_t actually is.");

    // source is a read-only alias to the input data
    const UnicodeString source(false, input.data(), static_cast<int32_t>(input.length()));

    // DO SOMETHING with the data
    UnicodeString target = SOME_ACTUAL_FUNCTION(source); // <<<< Put your actual code here

    // construct an output wstring (UChar is char16_t from ICU 59 on, hence the cast)
    std::wstring output(reinterpret_cast<const wchar_t *>(target.getBuffer()), target.length());

    // return it
    return output;
}