C# 将表情符号转换为特定的字符串表示形式_C#_.net_Unicode

C# 将表情符号转换为特定的字符串表示形式

c# .net unicode

C# 将表情符号转换为特定的字符串表示形式,c#,.net,unicode,C#,.net,Unicode,目前，我正在使用一个名为Emoji的元组哈希集将Emoji替换为字符串表示，例如，bomb的Emoji变成U0001F4A3。转换是通过 Emoji.Aggregate(input, (current, pair) => current.Replace(pair.Item1, pair.Item2)); 一切正常然而，我试图在不使用2600多个项目的预定义列表的情况下实现同样的目标。是否有人已经实现了这样一个目标：字符串中的表情符号被替换为对应的表情符号，而不使用前导\ 例如： "Th

目前，我正在使用一个名为Emoji的元组哈希集将Emoji替换为字符串表示，例如，bomb的Emoji变成

U0001F4A3

。转换是通过

Emoji.Aggregate(input, (current, pair) => current.Replace(pair.Item1, pair.Item2));

一切正常

然而，我试图在不使用2600多个项目的预定义列表的情况下实现同样的目标。是否有人已经实现了这样一个目标：字符串中的表情符号被替换为对应的表情符号，而不使用前导\

例如：

"This string contains the unicode character bomb (It sounds like you're happy to replace any character not in the basic multi-lingual plane with its hex representation. The code to do that is slightly longwinded, but it's pretty simple:

using System;
using System.Text;

class Test
{
    static void Main()
    {
        string text = "This string contains the unicode character bomb (\U0001F4A3)";
        Console.WriteLine(ReplaceNonBmpWithHex(text));
    }

    static string ReplaceNonBmpWithHex(string input)
    {
        // TODO: If most string don't have any non-BMP characters, consider
        // an optimization of checking for high/low surrogate characters first,
        // and return input if there aren't any.
        StringBuilder builder = new StringBuilder(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            // A surrogate pair is a high surrogate followed by a low surrogate
            if (char.IsHighSurrogate(c))
            {
                if (i == input.Length -1)
                {
                    throw new ArgumentException($"High surrogate at end of string");
                }
                // Fetch the low surrogate, advancing our counter
                i++;
                char d = input[i];
                if (!char.IsLowSurrogate(d))
                {
                    throw new ArgumentException($"Unmatched low surrogate at index {i-1}");
                }
                uint highTranslated = (uint) ((c - 0xd800) * 0x400);
                uint lowTranslated = (uint) (d - 0xdc00);
                uint utf32 = (uint) (highTranslated + lowTranslated + 0x10000);
                builder.AppendFormat("U{0:X8}", utf32);
            }
            // We should never see a low surrogate on its own
            else if (char.IsLowSurrogate(c))
            {
                throw new ArgumentException($"Unmatched low surrogate at index {i}");
            }
            // Most common case: BMP character; just append it.
            else
            {
                builder.Append(c);
            }
        }
        return builder.ToString();
    }
}

“此字符串包含unicode字符炸弹（听起来您很乐意用十六进制表示法替换任何不在中的字符。执行此操作的代码有点冗长，但非常简单：
使用系统；
使用系统文本；
课堂测试
{
静态void Main（）
{
string text=“此字符串包含unicode字符炸弹（\U0001F4A3）”；
Console.WriteLine（replaceNonPWithHex（text））；
}
静态字符串替换NonPWithHex（字符串输入）
{
//todo:如果大多数字符串没有任何非BMP字符，请考虑
//首先检查高/低代理字符的优化，
//如果没有输入，则返回输入。
StringBuilder=新的StringBuilder（输入.Length）；
for（int i=0；i

请注意，根据Yury的回答，这不会试图处理多个字符一起使用的情况。它会将每个修饰符/emoji/secondary char替换为一个单独的UXXXXXXXX部分。
听起来您很乐意用十六进制表示法替换任何不在中的字符。这样做的代码有点冗长d、 但这很简单：
使用系统；
使用系统文本；
课堂测试
{
静态void Main（）
{
string text=“此字符串包含unicode字符炸弹（\U0001F4A3）”；
Console.WriteLine（replaceNonPWithHex（text））；
}
静态字符串替换NonPWithHex（字符串输入）
{
//todo:如果大多数字符串没有任何非BMP字符，请考虑
//首先检查高/低代理字符的优化，
//如果没有输入，则返回输入。
StringBuilder=新的StringBuilder（输入.Length）；
for（int i=0；i

请注意，根据Yury的回答，这不会试图处理多个字符一起使用的情况。它会将每个修饰符/emoji/次要字符替换为单独的UXXXXXXXX部分。
恐怕您在这里有一个错误的假设。emoji不仅仅是一个“特殊的Unicode字符”“。特定表情符号的实际长度可以是一行4个或更多字符。例如：

表情符号本身
零宽度接头
次要字符（如刻度盖或麦克风）
性别修饰语（男性或女性）
肤色调节器（菲茨帕特里克比例）

所以，你应该考虑到可变长度
示例：



恐怕你有一个错误的假设。表情符号不仅仅是一个“特殊的Unicode字符”。特定表情符号的实际长度可以是一行4个或更多字符。例如：

表情符号本身
零宽度接头
次字符（如gr