Converting a C++ string with \x escape sequences to UTF-8

c++ utf-8

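I have a string

Test\xc2\xae

whose hexadecimal representation is

0x54 0x65 0x73 0x74 0x5c 0x78 0x63 0x32 0x5c 0x78 0x61 0x65

The character sequence

\xc2\xae

in this string is simply the UTF-8 encoding of ® (the registered trademark sign).

I want to write a C++ function that converts the four-character sequence \xc2 (hex bytes 0x5c 0x78 0x63 0x32) into the single byte 0xC2.

In other words, a C++ function that converts 0x54 0x65 0x73 0x74 0x5c 0x78 0x63 0x32 0x5c 0x78 0x61 0x65 into Test® (0x54 0x65 0x73 0x74 0xc2 0xae).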

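As far as I understand your question, you are trying to convert each \x?? sequence (four characters), where ?? is a sequence of two hexadecimal digits, into a single char whose value is the one expressed by those hexadecimal digits.

If you don't need to pull in a large library dedicated to this, maybe this simple algorithm can do the job.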

/**
  g++ -std=c++17 -o prog_cpp prog_cpp.cpp \
      -pedantic -Wall -Wextra -Wconversion -Wno-sign-conversion \
      -g -O0 -UNDEBUG -fsanitize=address,undefined
**/

#include <iostream>
#include <string>
#include <cctype>

std::string
convert_backslash_x(const std::string &str)
{
  auto result=std::string{};
  for(auto start=std::string::size_type{0};;)
  {
    const auto pos=str.find("\\x", start);
    if((pos==str.npos)||  // not found
       (pos+4>size(str))) // too close to the end for a full \x?? sequence
    {
      // keep the rest of the string
      result.append(str, start);
      break;
    }
    // keep everything until this position
    result.append(str, start, pos-start);
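    // cast to unsigned char first: passing a negative char value to the
    // <cctype> functions is undefined behaviour
    const auto c1=std::tolower(static_cast<unsigned char>(str[pos+2]));
    const auto c2=std::tolower(static_cast<unsigned char>(str[pos+3]));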
    if(std::isxdigit(c1)&&std::isxdigit(c2))
    {
      // convert two hex digits to a char with this value
      const auto h1=std::isalpha(c1) ? 10+(c1-'a') : (c1-'0');
      const auto h2=std::isalpha(c2) ? 10+(c2-'a') : (c2-'0');
      result+=char(h1*16+h2);
      // go on after this \x?? sequence
      start=pos+4; 
    }
    else
    {
      // keep this incomplete \x sequence as is
      result+="\\x";
      // go on after this \x sequence
      start=pos+2;
    }
  }
  return result;
}

int
main()
{
  for(const auto &s: {"Test\\xc2\\xae",
                      "Test\\xc2\\xae Test\\xc2\\xae",
                      "Test\\xc2\\xa",
                      "Test\\x\\xc2\\xa"})
  {
    std::cout << '(' << s << ") --> (" << convert_backslash_x(s) << ")\n";
  }
  return 0;
}
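
To compare a result with the byte values listed in the question, it can help to dump a string in hexadecimal. The following is only a minimal sketch: to_hex() is a hypothetical helper name, not part of the answer above, and it simply formats each byte as 0x followed by two lowercase hex digits.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the original answer): returns the bytes of
// the given string formatted as "0x54 0x65 ...", separated by single spaces.
std::string to_hex(const std::string &bytes)
{
  static const char digits[] = "0123456789abcdef";
  std::string out;
  for (std::size_t i = 0; i < bytes.size(); ++i)
  {
    // work on the unsigned value so that bytes >= 0x80 are handled correctly
    const auto b = static_cast<unsigned char>(bytes[i]);
    if (i != 0)
    {
      out += ' ';
    }
    out += "0x";
    out += digits[b >> 4];   // high nibble
    out += digits[b & 0x0f]; // low nibble
  }
  return out;
}
```

Passing the escaped input "Test\\xc2\\xae" to this helper should list the twelve bytes from the question, while passing the output of convert_backslash_x() for that input should list six bytes ending in 0xc2 0xae.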

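Just to be sure: does the target font contain the required glyph?

It's not clear to me whether you want to work with the binary representation of the initial string, or start from a text string.

I don't know how to rewrite the following Python code in C++: 'Test\\xc2\\xae'.encode('utf-8').decode('unicode_escape').encode('latin1').decode('utf-8') (it returns 'Test®').

@beparas It amounts to start=0, with the type of start stated explicitly. We could have written it otherwise; {} should be the same.

@beparas This loop has no condition because the condition could not be expressed concisely (see break;). It has no increment statement because there are two possibilities (pos+4 or pos+2), depending on what happens in the loop body. start is local to the loop because we don't need it outside.

Thanks again for the quick response. When compiling, I get this error: error: no matching function for call to 'std::__cxx11::basic_string::append(const string&, long unsigned int&)' result.append(str, start);

@beparas Which compiler (and version) are you using?

Probably g++ v7, let me check. I want to run this program on an iMX CPU. The error above appears when cross-compiling the program. Note: the error is std::__cxx11::…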
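
The compile error quoted above is what one would expect from a standard library in which append(str, pos) has no default value for the count parameter; that default was only added in C++14. The thread does not confirm that this is the cause, so the following is just a sketch under that assumption. append_tail() is a hypothetical helper name; it passes the count explicitly, which older standard libraries accept as well.

```cpp
#include <string>

// Hypothetical helper (not from the original answer): appends the part of
// str that starts at index start to result.
// Throws std::out_of_range if start > str.size().
void append_tail(std::string &result,
                 const std::string &str,
                 std::string::size_type start)
{
  // Three-argument form with an explicit count: this overload does not
  // rely on the default argument introduced in C++14.
  result.append(str, start, std::string::npos);
  // Alternatives with the same effect:
  //   result += str.substr(start);
  //   result.append(str.begin() + start, str.end());
}
```

In convert_backslash_x(), the call result.append(str, start) could then be replaced with result.append(str, start, std::string::npos), or with a call to such a helper.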