How do I get a consistent byte representation of strings in C# without manually specifying an encoding?

c# .net string character-encoding

How do I convert a string to a byte[] in .NET (C#) without manually specifying a specific encoding?

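I'm going to encrypt the string. I can encrypt it without converting, but I'd still like to know why encoding comes into play here.

Also, why should encoding be taken into consideration at all? Can't I simply get the bytes the string is stored in? Why is there a dependency on character encodings?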

It depends on the encoding of your string.

byte[] strToByteArray(string str)
{
    System.Text.ASCIIEncoding enc = new System.Text.ASCIIEncoding();
    return enc.GetBytes(str);
}

For example:

byte[] b1 = System.Text.Encoding.UTF8.GetBytes (myString);
byte[] b2 = System.Text.Encoding.ASCII.GetBytes (myString);

A small sample of why encoding matters:

string pi = "\u03a0";
byte[] ascii = System.Text.Encoding.ASCII.GetBytes (pi);
byte[] utf8 = System.Text.Encoding.UTF8.GetBytes (pi);

Console.WriteLine (ascii.Length); //Will print 1
Console.WriteLine (utf8.Length); //Will print 2
Console.WriteLine (System.Text.Encoding.ASCII.GetString (ascii)); //Will print '?'

ASCII simply isn't equipped to deal with special characters.

Internally, the .NET framework uses UTF-16 to represent strings, so if you simply want the exact bytes that .NET uses, use

System.Text.Encoding.Unicode.GetBytes(...)

See MSDN for more information.
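As a rough illustration of the point above, the following sketch checks that the bytes returned by Encoding.Unicode line up with the string's UTF-16 code units, low byte first. The class name Utf16BytesDemo and the helper CodeUnitAt are made up for this example, and the string is assumed to contain no unpaired surrogates.

```csharp
using System;
using System.Text;

static class Utf16BytesDemo
{
    // Rebuilds the UTF-16 code unit at the given index from two little-endian bytes.
    public static char CodeUnitAt(byte[] utf16Bytes, int index)
    {
        return (char)(utf16Bytes[2 * index] | (utf16Bytes[2 * index + 1] << 8));
    }

    static void Main()
    {
        string text = "A\u03a0";                         // 'A' followed by Greek capital pi
        byte[] bytes = Encoding.Unicode.GetBytes(text);  // UTF-16, little-endian, no BOM
        Console.WriteLine(BitConverter.ToString(bytes)); // 41-00-A0-03

        for (int i = 0; i < text.Length; i++)
        {
            // Each pair of bytes is exactly one char of the original string.
            Console.WriteLine(CodeUnitAt(bytes, i) == text[i]); // True
        }
    }
}
```

So "the bytes .NET uses" amounts to two bytes per char, which is what Encoding.Unicode hands back.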

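You need to take the encoding into account because one character can be represented by one or more bytes (up to 4 in UTF-8 as standardized today; older definitions allowed up to 6), and different encodings treat these bytes differently.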
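A small sketch of how the byte count per character varies between encodings. The class name ByteCountDemo and its two helpers are hypothetical; only Encoding.UTF8, Encoding.Unicode, and GetByteCount come from the framework.

```csharp
using System;
using System.Text;

static class ByteCountDemo
{
    // Number of bytes UTF-8 needs for the given text.
    public static int Utf8Count(string text)
    {
        return Encoding.UTF8.GetByteCount(text);
    }

    // Number of bytes UTF-16 (Encoding.Unicode) needs for the given text.
    public static int Utf16Count(string text)
    {
        return Encoding.Unicode.GetByteCount(text);
    }

    static void Main()
    {
        // U+0041 'A', U+00E9 'é', U+20AC '€', and U+1F600 (a surrogate pair in UTF-16)
        string[] samples = { "A", "\u00e9", "\u20ac", "\U0001F600" };
        foreach (string s in samples)
        {
            Console.WriteLine("UTF-8: {0} byte(s), UTF-16: {1} byte(s)",
                Utf8Count(s), Utf16Count(s));
        }
        // UTF-8 needs 1, 2, 3, and 4 bytes; UTF-16 needs 2, 2, 2, and 4.
    }
}
```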
Joel has a post on this.

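The key issue is that a glyph in a string takes 32 bits (16 bits for a character code), but a byte has only 8 bits to spare. A one-to-one mapping doesn't exist unless you restrict yourself to strings that contain only ASCII characters. System.Text.Encoding offers many ways to map a string to a byte[]; you need to pick one that avoids loss of information and that is easy for your client to use when they need to map the byte[] back to a string.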
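One way to check whether an encoding loses information for a given string is to encode it and decode it again. The sketch below does that; RoundTripDemo and RoundTrip are names invented for the example.

```csharp
using System;
using System.Text;

static class RoundTripDemo
{
    // Encodes the text with the given encoding, then decodes the bytes again.
    public static string RoundTrip(string text, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes);
    }

    static void Main()
    {
        string original = "na\u00efve caf\u00e9"; // contains two non-ASCII letters

        // UTF-8 can represent every character, so the text survives.
        Console.WriteLine(RoundTrip(original, Encoding.UTF8) == original);  // True

        // ASCII replaces the two accented letters with '?', so it does not.
        Console.WriteLine(RoundTrip(original, Encoding.ASCII) == original); // False
    }
}
```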

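UTF-8 is a popular encoding; it is compact and not lossy.
I'm not sure, but I think the string stores its contents as an array of Chars, which is inefficient in terms of bytes. Specifically, the definition of a Char is "Represents a Unicode character".
Take this example: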

String str = "asdf éß";
String str2 = "asdf gh";
EncodingInfo[] info = Encoding.GetEncodings();
foreach (EncodingInfo enc in info)
{
    System.Console.WriteLine(enc.Name + " - "
        + enc.GetEncoding().GetByteCount(str)
        + enc.GetEncoding().GetByteCount(str2));
}

Note that the Unicode answer is 14 bytes in both cases, whereas the UTF-8 answer is only 9 bytes for the first string and only 7 for the second.

So if you just want the bytes used by the string, simply use Encoding.Unicode, but it will be less efficient in storage space.
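Others have already answered the first part of your question (how to get the bytes): look at the System.Text.Encoding class.
I will address your follow-up question: why do you need to pick an encoding? Why can't you get that from the string class itself?
The answer is in two parts.
First of all, the bytes used internally by the string class don't matter, and whenever you assume they do, you are likely introducing a bug.
If your program is entirely within the .Net world, then you don't need to worry about getting byte arrays for strings at all, even if you're sending data across a network. Instead, use .Net Serialization to handle transmitting the data. You no longer worry about the actual bytes: the Serialization formatter does it for you.
On the other hand, what if you are sending these bytes somewhere that you can't guarantee will read the data from a .Net serialized stream? In that case you definitely do need to worry about encoding, because this external system obviously cares. So again, the internal bytes used by the string don't matter: you need to pick an encoding so that you can be explicit about it on the receiving end, even if it happens to be the same encoding .Net uses internally.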
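To illustrate why both ends have to agree on an explicit encoding, here is a minimal sketch. The names WireDemo, Send, and Receive are hypothetical, and the byte array stands in for whatever transport is actually used.

```csharp
using System;
using System.Text;

static class WireDemo
{
    // Sender side: the encoding is chosen explicitly (UTF-8 in this sketch).
    public static byte[] Send(string message)
    {
        return Encoding.UTF8.GetBytes(message);
    }

    // Receiver side: decodes with whatever encoding it believes was used.
    public static string Receive(byte[] payload, Encoding assumed)
    {
        return assumed.GetString(payload);
    }

    static void Main()
    {
        byte[] payload = Send("caf\u00e9");

        // Same encoding on both ends: the text arrives intact.
        Console.WriteLine(Receive(payload, Encoding.UTF8) == "caf\u00e9"); // True

        // Receiver wrongly assumes Latin-1: the two UTF-8 bytes of the
        // accented letter are decoded as two separate characters.
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
        Console.WriteLine(Receive(payload, latin1) == "caf\u00e9");        // False
    }
}
```

The bytes on the wire are identical in both calls; only the receiver's assumption differs, which is why the encoding needs to be part of the contract.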
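I understand that in this case you might prefer to use the actual bytes stored by the string variable in memory where possible, with the idea that it might save some work creating your byte stream. However, I put it to you that this is just not important compared to making sure your output is understood at the other end, and to guarantee that, you must be explicit about your encoding. Additionally, if you really want to match the internal bytes, you can already just choose the Unicode encoding and get that performance saving.
Which brings me to the second part: picking the Unicode encoding is telling .Net to use the underlying bytes. You do need to pick this encoding, because when some new-fangled Unicode-Plus comes out, the .Net runtime needs to be free to use this newer, better encoding model without breaking your program. But for the moment (and the foreseeable future), just choosing the Unicode encoding gives you what you want.
It's also important to understand that your string has to be re-written to the wire, and that involves at least some translation of the bit pattern even when you use a matching encoding. The computer needs to account for things like big vs. little endian, network byte order, packetization, session information, and so on.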
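Byte order is one concrete case where "the same encoding" can still produce different bytes. The sketch below compares the framework's little-endian and big-endian UTF-16 encodings; EndianDemo and Hex are names made up for the example.

```csharp
using System;
using System.Text;

static class EndianDemo
{
    // Hex dump of the text encoded with the given encoding, e.g. "41-00".
    public static string Hex(string text, Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    static void Main()
    {
        Console.WriteLine(Hex("A", Encoding.Unicode));          // 41-00 (little-endian UTF-16)
        Console.WriteLine(Hex("A", Encoding.BigEndianUnicode)); // 00-41 (big-endian UTF-16)
    }
}
```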

BinaryFormatter bf = new BinaryFormatter();
byte[] bytes;
MemoryStream ms = new MemoryStream();

string orig = "喂 Hello 谢谢 Thank You";
bf.Serialize(ms, orig);
ms.Seek(0, 0);
bytes = ms.ToArray();

MessageBox.Show("Original bytes Length: " + bytes.Length.ToString());

MessageBox.Show("Original string Length: " + orig.Length.ToString());

for (int i = 0; i < bytes.Length; ++i) bytes[i] ^= 168; // pseudo encrypt
for (int i = 0; i < bytes.Length; ++i) bytes[i] ^= 168; // pseudo decrypt

BinaryFormatter bfx = new BinaryFormatter();
MemoryStream msx = new MemoryStream();
msx.Write(bytes, 0, bytes.Length);
msx.Seek(0, 0);
string sx = (string)bfx.Deserialize(msx);

MessageBox.Show("Still intact :" + sx);

MessageBox.Show("Deserialize string Length(still intact): " + sx.Length.ToString());

BinaryFormatter bfy = new BinaryFormatter();
MemoryStream msy = new MemoryStream();
bfy.Serialize(msy, sx);
msy.Seek(0, 0);
byte[] bytesy = msy.ToArray();

MessageBox.Show("Deserialize bytes Length(still intact): " + bytesy.Length.ToString());

Also, please explain why encoding should be taken into consideration.

public static byte[] GetBytes(string text)
{
    return System.Text.ASCIIEncoding.UTF8.GetBytes(text);
}

Encoding.UTF8.GetBytes(text)

public static byte[] StringToBytes(string str)
{
    byte[] data = new byte[str.Length * 2];
    for (int i = 0; i < str.Length; ++i)
    {
        char ch = str[i];
        data[i * 2] = (byte)(ch & 0xFF);
        data[i * 2 + 1] = (byte)((ch & 0xFF00) >> 8);
    }
    return data;
}

public static string StringFromBytes(byte[] arr)
{
    char[] ch = new char[arr.Length / 2];
    for (int i = 0; i < ch.Length; ++i)
    {
        ch[i] = (char)((int)arr[i * 2] + (((int)arr[i * 2 + 1]) << 8));
    }
    return new String(ch);
}

System.Text.Encoding.UTF8.GetBytes("TEST String");

byte[] utf8Buffer = UnicodeEncoding.UTF8.GetBytes(something); // convert to UTF-8, then get its bytes
byte[] asciiBuffer = ASCIIEncoding.ASCII.GetBytes(something); // convert to ASCII, then get its bytes

const string data = "A string with international characters: Norwegian: ÆØÅæøå, Chinese: 喂谢谢";
var bytes = System.Text.Encoding.UTF8.GetBytes(data);
var decoded = System.Text.Encoding.UTF8.GetString(bytes);

static byte[] GetBytes(string str)
{
    byte[] bytes = new byte[str.Length * sizeof(char)];
    System.Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);
    return bytes;
}

// Do NOT use on arbitrary bytes; only use on GetBytes's output on the SAME system
static string GetString(byte[] bytes)
{
    char[] chars = new char[bytes.Length / sizeof(char)];
    System.Buffer.BlockCopy(bytes, 0, chars, 0, bytes.Length);
    return new string(chars);
}

using System;

class Program
{
    static void Main(string[] args)
    {
        string t = "爱虫";
        string s = "Test\ud800Test";

        byte[] dumpToBytes = GetBytes(s);
        string getItBack = GetString(dumpToBytes);

        foreach (char item in getItBack)
        {
            Console.WriteLine("{0} {1}", item, ((ushort)item).ToString("x"));
        }
    }

    static byte[] GetBytes(string str)
    {
        byte[] bytes = new byte[str.Length * sizeof(char)];
        System.Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);
        return bytes;
    }

    static string GetString(byte[] bytes)
    {
        char[] chars = new char[bytes.Length / sizeof(char)];
        System.Buffer.BlockCopy(bytes, 0, chars, 0, bytes.Length);
        return new string(chars);
    }
}

T 54
e 65
s 73
t 74
? d800
T 54
e 65
s 73
t 74

public static unsafe Byte[] GetBytes(String s)
{
    Int32 length = s.Length * sizeof(Char);
    Byte[] bytes = new Byte[length];

    fixed (Char* pInput = s)
    fixed (Byte* pBytes = bytes)
    {
        Byte* source = (Byte*)pInput;
        Byte* destination = pBytes;

        if (length >= 16)
        {
            do
            {
                *((Int64*)destination) = *((Int64*)source);
                *((Int64*)(destination + 8)) = *((Int64*)(source + 8));

                source += 16;
                destination += 16;
            }
            while ((length -= 16) >= 16);
        }

        if (length > 0)
        {
            if ((length & 8) != 0)
            {
                *((Int64*)destination) = *((Int64*)source);

                source += 8;
                destination += 8;
            }

            if ((length & 4) != 0)
            {
                *((Int32*)destination) = *((Int32*)source);

                source += 4;
                destination += 4;
            }

            if ((length & 2) != 0)
            {
                *((Int16*)destination) = *((Int16*)source);

                source += 2;
                destination += 2;
            }

            // Never taken for strings: length is always a multiple of sizeof(Char).
            if ((length & 1) != 0)
            {
                destination[0] = source[0];

                ++source;
                ++destination;
            }
        }
    }

    return bytes;
}

[Second String: Length 20]
Buffer.BlockCopy: 746ms
Unsafe: 557ms

[Second String: Length 50]
Buffer.BlockCopy: 861ms
Unsafe: 753ms

[Third String: Length 100]
Buffer.BlockCopy: 1250ms
Unsafe: 1063ms

public static unsafe class StringExtensions
{
    public static Byte[] ToByteArray(this String s)
    {
        // Method Code
    }
}

public static byte[] StrToByteArray(string str)
{
    System.Text.UTF8Encoding encoding = new System.Text.UTF8Encoding();
    return encoding.GetBytes(str);
}

string s_unicode = "abcéabc";
byte[] utf8Bytes = System.Text.Encoding.UTF8.GetBytes(s_unicode);

string text = "string";
byte[] array = System.Text.Encoding.UTF8.GetBytes(text);

[0] = 115
[1] = 116
[2] = 114
[3] = 105
[4] = 110
[5] = 103

// using System.Text;
Encoding.UTF8.GetBytes(".NET String to byte array")

// using System.Text;
var text = Encoding.ASCII.GetString(Encoding.ASCII.GetBytes("You win €100"));
// -> "You win ?100"

Encoding.Unicode.GetBytes(".NET String to byte array")

".NET String to byte array".ToCharArray()

// using System.Runtime.InteropServices
unsafe byte[] GetRawBytes(String s)
{
    if (s == null) return null;
    var codeunitCount = s.Length;
    /* We know that String is a sequence of UTF-16 codeunits
       and such codeunits are 2 bytes */
    var byteCount = codeunitCount * 2;
    var bytes = new byte[byteCount];
    fixed (void* pRaw = s)
    {
        Marshal.Copy((IntPtr)pRaw, bytes, 0, byteCount);
    }
    return bytes;
}

string myString = //... some string

System.Text.Encoding encoding = System.Text.Encoding.UTF8; // or some other, but prefer some UTF if Unicode is used
byte[] bytes = encoding.GetBytes(myString);

// next lines are written in response to follow-up questions:

myString = new string(encoding.GetChars(bytes));
bytes = encoding.GetBytes(myString);
myString = new string(encoding.GetChars(bytes));
bytes = encoding.GetBytes(myString);

// how many times shall I repeat it to show there is a round-trip? :-)

string s = "Hello World";

// String to Byte[]
byte[] byte1 = System.Text.Encoding.Default.GetBytes(s);
// OR
byte[] byte2 = System.Text.ASCIIEncoding.Default.GetBytes(s);

// Byte[] to string
string str = System.Text.Encoding.UTF8.GetString(byte1);

[DllImport(
    "msvcrt.dll",
    EntryPoint = "memcpy",
    CallingConvention = CallingConvention.Cdecl,
    SetLastError = false)]
private static extern unsafe void* UnsafeMemoryCopy(
    void* destination,
    void* source,
    uint count);

public static byte[] GetUnderlyingBytes(string source)
{
    var length = source.Length * sizeof(char);
    var result = new byte[length];
    unsafe
    {
        fixed (char* firstSourceChar = source)
        fixed (byte* firstDestination = result)
        {
            var firstSource = (byte*)firstSourceChar;
            UnsafeMemoryCopy(
                firstDestination,
                firstSource,
                (uint)length);
        }
    }

    return result;
}

System.Text.Encoding.Unicode.GetBytes()
System.Text.Encoding.Unicode.GetString()

byte[] myByte = System.Text.ASCIIEncoding.Default.GetBytes(myString);

"Can't I simply get what bytes the string has been stored in?"
"Why is there a dependency on character encodings?"

byte[] bytes = "rubbish_\u9999_string".AsSpan().AsBytes().ToArray();

string s;
unsafe
{
    fixed (char* f = &bytes.AsSpan().NonPortableCast<byte, char>().DangerousGetPinnableReference())
    {
        s = new string(f);
    }
}

using System.Text;

// ...

Encoding.Unicode.GetBytes("abc"); // UTF-16 little endian
Encoding.UTF8.GetBytes("abc")