C# encoding returns more than one byte for European characters

c# encoding utf-8


If I encode the following string as UTF-8:

café

it comes back as 5 bytes instead of 4. If possible, I would like it to return 4 bytes.

Encoding encoding = Encoding.UTF8;
string testString = "café";
Byte[] bytes = encoding.GetBytes(testString);

Returns:

[0] 99
[1] 97
[2] 102
[3] 195
[4] 169

whereas "cafe" returns only 4 bytes.

You can't get this from the standard Unicode encoding schemes.

You need an encoding for the specific code page you want, like this:

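// On .NET Core / .NET 5+, code page 437 is only available after calling
// Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).
Encoding encoding = Encoding.GetEncoding(437);
byte[] bytes = encoding.GetBytes("café");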

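Output:

{ 99, 97, 102, 130 }

Here 130 is é in code page 437.

If you later decode these bytes, you must decode them with the same encoding. Otherwise you will get strange results.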
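The decoding caveat can be sketched as follows. This is only an illustration under a few assumptions: it uses ISO-8859-1 rather than code page 437 so that it runs without registering extra code pages, it relies on C# 9 top-level statements, and the variable names are made up for the example.

```csharp
using System;
using System.Text;

// ISO-8859-1 stands in for code page 437 here (an assumption of this sketch).
Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
byte[] bytes = latin1.GetBytes("café");   // 4 bytes; é becomes the single byte 233

// Decoding with the encoding that produced the bytes restores the text.
string sameEncoding = latin1.GetString(bytes);

// Decoding the same bytes as UTF-8 does not: a lone 233 is not a valid UTF-8
// sequence, so the decoder substitutes the replacement character U+FFFD.
string wrongEncoding = Encoding.UTF8.GetString(bytes);

Console.WriteLine(sameEncoding == "café");
Console.WriteLine(wrongEncoding.Contains("\uFFFD"));
```

Booleans are printed instead of the strings themselves because what the console shows for é depends on the console's own encoding.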

é is Unicode U+00E9. Unicode characters in the range U+0080 to U+07FF take two bytes in UTF-8.
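To see where the two bytes 195 and 169 come from, the two-byte UTF-8 layout can be applied by hand. This is a minimal sketch rather than a general encoder: it handles only the U+0080 to U+07FF range, and the variable names are illustrative.

```csharp
using System;
using System.Text;

int codePoint = 0x00E9; // é

// Two-byte UTF-8 layout: 110xxxxx 10xxxxxx (11 payload bits in total).
byte lead = (byte)(0xC0 | (codePoint >> 6));            // upper 5 payload bits
byte continuation = (byte)(0x80 | (codePoint & 0x3F));  // lower 6 payload bits

// Compare against what the framework produces for the same character.
byte[] fromFramework = Encoding.UTF8.GetBytes("\u00E9");

Console.WriteLine($"{lead} {continuation}");                  // 195 169
Console.WriteLine($"{fromFramework[0]} {fromFramework[1]}");  // 195 169
```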

If you need exactly 4 bytes, you can't use UTF-8. In theory, you could use a single-byte character encoding instead.
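A single-byte encoding only works when every character in the text exists in it. One way to check this up front, sketched here with ISO-8859-1 as the assumed target and a hypothetical helper name `FitsInLatin1`, is to ask for an encoding that throws instead of silently substituting characters:

```csharp
using System;
using System.Text;

// Hypothetical helper: true if every character of text can be encoded in ISO-8859-1.
static bool FitsInLatin1(string text)
{
    // The exception fallbacks make unencodable characters raise an error
    // rather than being replaced.
    Encoding strict = Encoding.GetEncoding(
        "ISO-8859-1",
        EncoderFallback.ExceptionFallback,
        DecoderFallback.ExceptionFallback);
    try
    {
        strict.GetBytes(text);
        return true;
    }
    catch (EncoderFallbackException)
    {
        return false;
    }
}

Console.WriteLine(FitsInLatin1("café"));       // é (U+00E9) exists in ISO-8859-1
Console.WriteLine(FitsInLatin1("caf\u20AC"));  // the euro sign (U+20AC) does not
```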

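A character in UTF-8 takes 1 to 4 bytes (the original design allowed up to 6, but RFC 3629 limits it to 4). So in your example, "é" needs 2 bytes.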
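As a quick illustration of the variable length, the byte count can be queried for one sample character from each size class. The sample characters are chosen for this sketch and are not from the original question, apart from é.

```csharp
using System;
using System.Text;

// One sample per UTF-8 length: e (U+0065), é (U+00E9),
// the euro sign (U+20AC), and U+1F600 (a character outside the BMP).
string[] samples = { "e", "\u00E9", "\u20AC", "\U0001F600" };

foreach (string s in samples)
{
    // Prints 1, 2, 3 and 4 on successive lines.
    Console.WriteLine(Encoding.UTF8.GetByteCount(s));
}
```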


I ended up converting from UTF8 to ISO-8859-1, and it now returns 4 bytes instead of 5:

Encoding utf8 = Encoding.UTF8;
string testString = "café";
byte[] utfBytes = utf8.GetBytes(testString); // 5 bytes

Encoding iso = Encoding.GetEncoding("ISO-8859-1");
byte[] isoBytes = iso.GetBytes(testString); // 4 bytes
byte[] convertedUtf8Bytes = Encoding.Convert(utf8, iso, utfBytes); // 4 bytes

string msg = iso.GetString(isoBytes);
string msgConverted = iso.GetString(convertedUtf8Bytes);

Console.WriteLine(msg);
Console.WriteLine(msgConverted);

Output:

café
café

café? What did you expect? Did you want UTF-8 to convert "é" to "e"?