C#: Need a fast way to deserialize one million strings and Guids


c# performance


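I want to deserialize a list of one million (string, Guid) pairs for a performance-critical application. The format can be anything I choose, and serialization does not have the same performance requirements.

What kind of approach is best? Text or binary? Write each (string, Guid) pair consecutively, or write all the strings followed by all the Guids?

I started experimenting in LinqPad (a simple example that only deserializes strings) and found that using TextReader and ReadLine() is slightly faster than using BinaryReader and ReadString(). (Is the file system cache playing tricks on me?)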

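public string[] DeserializeBinary()
{
    var tmr = System.Diagnostics.Stopwatch.StartNew();
    long ms = 0;
    string[] arr = null;
    using (var rdr = new BinaryReader(new FileStream(file, FileMode.Open, FileAccess.Read)))
    {
        var num = rdr.ReadInt32();
        arr = new string[num];
        // The original listing is cut off at the loop header; the rest of
        // this method is reconstructed from context.
        for (int i = 0; i < num; i++)
        {
            arr[i] = rdr.ReadString();
        }
    }
    ms = tmr.ElapsedMilliseconds;
    return arr;
}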
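For comparison, the text-based variant mentioned in the question might look roughly like this. It is a minimal sketch, not the asker's actual code: the class name TextDeserializer, the path parameter, and the file layout (record count on the first line, then one string per line) are all assumptions.

```csharp
using System.Globalization;
using System.IO;

public static class TextDeserializer
{
    // Assumed layout: the first line holds the record count, followed by
    // one string per line. Strings therefore must not contain line breaks.
    public static string[] DeserializeText(string path)
    {
        using (var rdr = new StreamReader(path))
        {
            int num = int.Parse(rdr.ReadLine(), CultureInfo.InvariantCulture);
            var arr = new string[num];
            for (int i = 0; i < num; i++)
            {
                arr[i] = rdr.ReadLine();
            }
            return arr;
        }
    }
}
```

Timing this method and DeserializeBinary with the same Stopwatch pattern, on a cold file system cache, is what makes the two numbers comparable.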



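Some edits:

I used RamMap to clear the file system cache, and it turns out there is only a small difference between the text and binary readers for strings alone.
I have a fairly simple class that holds the string and the Guid. It also holds an int index that corresponds to its position in the list. Obviously, there is no need to include that in the serialization.
In a test that alternately deserializes (binary) strings and Guids, I get about 500 ms.
The ideal time would be 50 ms, or as close to that as possible. However, a simple experiment showed that reading the (compressed) file into memory from a fairly fast SSD takes at least 120 ms with no parsing at all. So 50 ms seems unlikely.
Our strings have no theoretical length limit. However, we can assume the performance target applies only when all of them are 20 characters or fewer.
The timing includes opening the file.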
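The raw-read baseline described above can be measured separately from any parsing. The following is a minimal sketch, assuming the whole file fits in memory; RawReadTimer and path are illustrative names, and the number is only meaningful once the file system cache has been cleared (for example with RamMap).

```csharp
using System.Diagnostics;
using System.IO;

public static class RawReadTimer
{
    // Reads the entire file into memory without parsing anything and
    // reports the elapsed time in milliseconds, including opening the file.
    public static byte[] ReadAll(string path, out long elapsedMs)
    {
        var sw = Stopwatch.StartNew();
        byte[] data = File.ReadAllBytes(path);
        elapsedMs = sw.ElapsedMilliseconds;
        return data;
    }
}
```

Whatever this reports is roughly the floor for a deserializer that reads the same file from the same drive, which is why the 50 ms target looks out of reach when the raw read alone takes 120 ms.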



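Reading the strings is now the obvious bottleneck (hence my experiment that serializes only strings). Before I preallocated a 16-byte array for reading the Guids, JIT_NewFast accounted for 30%.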
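It is not surprising that using StreamReader is faster than using BinaryReader to read a bunch of strings. StreamReader reads blocks from the underlying stream and parses the strings out of that buffer. BinaryReader has no such buffer. It reads the string length from the underlying stream and then reads that many characters. As a result, BinaryReader makes more calls to the base stream's Read method.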
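One way to keep those extra Read calls away from the file is to load the file in a single call and run BinaryReader over a MemoryStream. This is a sketch under assumptions, not a measured improvement: it assumes the file fits in memory and uses the layout "Int32 count, then that many strings written with BinaryWriter.Write(string)". The class name BufferedBinaryDeserializer is hypothetical.

```csharp
using System.IO;

public static class BufferedBinaryDeserializer
{
    // Assumed layout: Int32 count, followed by that many strings written
    // with BinaryWriter.Write(string) using the default UTF-8 encoding.
    public static string[] Deserialize(string path)
    {
        // One large read from disk; every later Read call hits memory only.
        byte[] data = File.ReadAllBytes(path);
        using (var rdr = new BinaryReader(new MemoryStream(data, false)))
        {
            int num = rdr.ReadInt32();
            var arr = new string[num];
            for (int i = 0; i < num; i++)
            {
                arr[i] = rdr.ReadString();
            }
            return arr;
        }
    }
}
```

FileStream already buffers internally, so whether this helps in practice has to be confirmed by timing it against the plain FileStream version.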
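But deserializing (string, Guid) pairs involves more than just reading. You also have to parse the Guid. If you write the file in binary, the Guid is written in binary, which makes creating the Guid structure easier and faster. If it is stored as text, then after splitting the line into its two fields you have to call new Guid(string) to parse the text and create the Guid.

It is hard to say which one will be faster.

I can't imagine we are talking about much time here. Surely reading a file with a million lines takes about a second, unless the strings are really long. A GUID is only 36 characters if you count the separators, right?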
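The text-format parsing step described above could be sketched as follows. The line layout (string, a tab, then the GUID in its 36-character hyphenated form) and the names TextPairParser and ParseLine are assumptions made for illustration; the source does not specify a delimiter.

```csharp
using System;
using System.Collections.Generic;

public static class TextPairParser
{
    // Assumed line layout: <string> TAB <guid>, e.g.
    // "hello\t0f8fad5b-d9cb-469f-a165-70867728950e".
    public static KeyValuePair<string, Guid> ParseLine(string line)
    {
        // Split at the last tab so that the string itself may contain tabs.
        int tab = line.LastIndexOf('\t');
        if (tab < 0)
        {
            throw new FormatException("Expected a tab between the string and the GUID.");
        }
        string s = line.Substring(0, tab);
        // new Guid(string) parses the textual form; this is the extra work
        // that the binary format avoids.
        var g = new Guid(line.Substring(tab + 1));
        return new KeyValuePair<string, Guid>(s, g);
    }
}
```

Each call allocates two substrings in addition to parsing the GUID, which is part of the cost to weigh against the faster buffered reads of StreamReader.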
With BinaryWriter, you can write the file like this:
writer.Write(count); // integer number of records
foreach (var pair in pairs)
{
    writer.Write(pair.theString);
    writer.Write(pair.theGuid.ToByteArray());
}

To read it back:
count = reader.ReadInt32();
byte[] guidBytes = new byte[16];
for (int i = 0; i < count; ++i)
{
    string s = reader.ReadString();
    reader.Read(guidBytes, 0, guidBytes.Length);
    pairs.Add(new Pair(s, new Guid(guidBytes)));
}

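Reading a file that stores all the GUIDs first, then all the strings concatenated into one, then an index of string offsets, is only slightly more complicated: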
private Dictionary<string, Guid> ReadGuidFile(string filename)
{
    using (var fs = File.OpenRead(filename))
    {
        using (var reader = new BinaryReader(fs, Encoding.UTF8))
        {
            // read the count
            int count = reader.ReadInt32();

            // The guids are in a huge byte array sized 16*count
            // ReadBytes keeps reading until all 16*count bytes arrive (or EOF)
            byte[] guidsBuffer = reader.ReadBytes(16*count);

            // Strings are all concatenated into one
            var bigString = reader.ReadString();

            // Index is an array of int. We can read it as an array of
            // ((count+1) * 4) bytes.
            // ReadBytes keeps reading until the whole index arrives (or EOF)
            byte[] indexBuffer = reader.ReadBytes(4*(count+1));

            var guids = new Dictionary<string, Guid>(count);
            byte[] guidBytes = new byte[16];
            int startix = 0;
            int endix = 0;
            for (int i = 0; i < count; ++i)
            {
                endix = BitConverter.ToInt32(indexBuffer, 4*(i+1));
                string key = bigString.Substring(startix, endix - startix);
                Buffer.BlockCopy(guidsBuffer, (i*16),
                                    guidBytes, 0, 16);
                guids.Add(key, new Guid(guidBytes));
                startix = endix;
            }
            return guids;
        }
    }
}


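A few notes. First, I use BitConverter to convert the data in the byte array to integers. Using unsafe code and indexing into the array with an int* would be faster.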
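If unsafe code is not an option, one alternative to calling BitConverter for every entry is to copy the index bytes into an int[] with a single Buffer.BlockCopy call. Note that this is a swapped-in technique, not the int* approach mentioned above, and it has not been benchmarked here. The class name IndexDecoder is hypothetical.

```csharp
using System;

public static class IndexDecoder
{
    // Reinterprets a byte buffer holding 32-bit integers as an int[].
    // Buffer.BlockCopy copies raw bytes, so this matches BinaryWriter's
    // little-endian output only on little-endian machines.
    public static int[] ToInt32Array(byte[] indexBuffer)
    {
        if (indexBuffer.Length % sizeof(int) != 0)
        {
            throw new ArgumentException("Buffer length must be a multiple of 4.");
        }
        var index = new int[indexBuffer.Length / sizeof(int)];
        Buffer.BlockCopy(indexBuffer, 0, index, 0, indexBuffer.Length);
        return index;
    }
}
```

With the index decoded this way, the loop can use index[i + 1] in place of BitConverter.ToInt32(indexBuffer, 4*(i+1)).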
You can gain some speed by usi…

The writer that produces this file layout:
private void WriteGuidFile(string filename, Dictionary<string, Guid>guids)
{
    using (var fs = File.Create(filename))
    {
        using (var writer = new BinaryWriter(fs, Encoding.UTF8))
        {
            List<int> stringIndex = new List<int>(guids.Count);
            StringBuilder bigString = new StringBuilder();

            // write count
            writer.Write(guids.Count);

            // Write the GUIDs and build the string index
            foreach (var pair in guids)
            {
                writer.Write(pair.Value.ToByteArray(), 0, 16);
                stringIndex.Add(bigString.Length);
                bigString.Append(pair.Key);
            }
            // Add one more entry to the string index.
            // makes deserializing easier
            stringIndex.Add(bigString.Length);

            // Write the string that contains all of the strings, combined
            writer.Write(bigString.ToString());

            // write the index
            foreach (var ix in stringIndex)
            {
                writer.Write(ix);
            }
        }
    }
}
