Is there a better way to handle incomplete data in a buffer when reading in C++?


c++ file-io


I'm working with a binary file made up of events. Each event can have a variable length. Since my read buffer has a fixed size, I handle it as follows:

const int bufferSize = 0x500000;
const int readSize = 0x400000;
const int eventLengthMask = 0x7FFE0000;
const int eventLengthShift = 17;
const int headerLengthMask = 0x1F000;
const int headerLengthShift = 12;
const int slotMask = 0xF0;
const int slotShift = 4;
const int channelMask = 0xF;
...
// Allocate a 5 MB buffer even though we read in 4 MB chunks, leaving room
// for unprocessed data carried over from the end of the previous read
char* allocBuff = new char[bufferSize]; //inFile reads data into here
unsigned int* buff = reinterpret_cast<unsigned int*>(allocBuff); //data is interpreted from here
inFile.open(fileName.c_str(),ios_base::in | ios_base::binary);
int startPos = 0;
while(!inFile.eof())
{
    int index = 0;
    inFile.read(&(allocBuff[startPos]), readSize);
    int size = ((readSize + startPos)>>2);
    //loop to process the buffer
    while (index<size)
    {
        unsigned int data = buff[index];
        int eventLength = ((data&eventLengthMask)>>eventLengthShift);
        int headerLength = ((data&headerLengthMask)>>headerLengthShift);
        int slot = ((data&slotMask)>>slotShift);
        int channel = data&channelMask;
        //now check if the full event is in the buffer
        if( (index+eventLength) > size )
        {//the full event is not in the buffer
            break;
        }
        ++index;
        //further processing of the event
    }

    //move the data at the end of the buffer to the beginning and set start position
    //for the next read
    for(int i = index; i<size; ++i)
    {
        buff[i-index] = buff[i];
    }
    startPos = ((size-index)<<2);
}
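The masks in the question pack four fields into the first 32-bit word of each event. A minimal sketch of that decoding as a constexpr function, assuming (as the bounds check `index + eventLength > size` implies) that eventLength is measured in 32-bit words; `EventHeader` and `decodeHeader` are hypothetical names, not part of the question.

```cpp
#include <cstdint>

// Fields of an event's first word, laid out as the question's masks describe:
// bits 17-30 event length, 12-16 header length, 4-7 slot, 0-3 channel.
struct EventHeader {
    std::uint32_t eventLength;   // assumed to count 32-bit words
    std::uint32_t headerLength;
    std::uint32_t slot;
    std::uint32_t channel;
};

constexpr std::uint32_t kEventLengthMask   = 0x7FFE0000u;
constexpr unsigned      kEventLengthShift  = 17;
constexpr std::uint32_t kHeaderLengthMask  = 0x0001F000u;
constexpr unsigned      kHeaderLengthShift = 12;
constexpr std::uint32_t kSlotMask          = 0x000000F0u;
constexpr unsigned      kSlotShift         = 4;
constexpr std::uint32_t kChannelMask       = 0x0000000Fu;

// Mask first, then shift; unsigned arithmetic keeps this well defined for
// every possible input word.
constexpr EventHeader decodeHeader(std::uint32_t word) {
    return EventHeader{
        (word & kEventLengthMask) >> kEventLengthShift,
        (word & kHeaderLengthMask) >> kHeaderLengthShift,
        (word & kSlotMask) >> kSlotShift,
        word & kChannelMask,
    };
}
```

Because the function is constexpr, the field layout can be checked at compile time with `static_assert`.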

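You can improve this by using a circular buffer instead of a plain array, or by using a circular iterator over the array. That way you don't need to do all the copying: the "start" of the array moves instead.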
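A minimal sketch of the circular-buffer idea, assuming a byte-oriented ring; `ByteRing` and its members are hypothetical names. To address the question in the comments about getting fstream to fill such a buffer, it exposes only the contiguous free region, so `istream::read` can still write straight into the storage; when the free space wraps past the end, filling it takes two reads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-capacity ring buffer of bytes. Instead of copying the
// unprocessed tail to the front, only the logical start index (head_) moves.
class ByteRing {
public:
    explicit ByteRing(std::size_t capacity) : buf_(capacity) { assert(capacity > 0); }

    std::size_t size() const { return size_; }
    std::size_t freeSpace() const { return buf_.size() - size_; }

    // Contiguous free region at the write position, so a stream can fill it
    // directly, e.g. in.read(r.writePtr(), r.writableContiguous()).
    char* writePtr() { return &buf_[(head_ + size_) % buf_.size()]; }
    std::size_t writableContiguous() const {
        const std::size_t tail = (head_ + size_) % buf_.size();
        const std::size_t untilEnd = buf_.size() - tail;
        return untilEnd < freeSpace() ? untilEnd : freeSpace();
    }
    // Record n bytes written at writePtr(), e.g. n = in.gcount().
    void commit(std::size_t n) { assert(n <= writableContiguous()); size_ += n; }

    // Logical byte i, where 0 is the oldest unconsumed byte; wraps around.
    unsigned char at(std::size_t i) const {
        assert(i < size_);
        return static_cast<unsigned char>(buf_[(head_ + i) % buf_.size()]);
    }
    // Discard n processed bytes: nothing is copied, the start just advances.
    void consume(std::size_t n) {
        assert(n <= size_);
        head_ = (head_ + n) % buf_.size();
        size_ -= n;
    }

private:
    std::vector<char> buf_;
    std::size_t head_ = 0;  // storage index of the oldest unconsumed byte
    std::size_t size_ = 0;  // number of unconsumed bytes
};
```

The trade-off is that a 32-bit word straddling the end of the storage can no longer be read through a single pointer: it has to be assembled byte by byte via `at()`, and every access pays for the modulo. That is the overhead the comments argue about.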
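Other than that, no, not really. When I've run into this problem in the past, I've just copied the unprocessed data down and then read in after it. This works fine if the individual elements are fairly small and the buffer is large. (On a modern machine, "fairly small" can be anything up to several […].) Of course, you have to keep track of how much you've copied down, in order to adjust the pointer and the length for the next read.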
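A minimal sketch of the copy-down step on its own, assuming the buffer is a `std::vector<char>` and the caller tracks how many bytes were filled and how many were consumed; `carryOver` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Copy-down step: buf holds `filled` valid bytes, the first `consumed` of
// which have been processed. Moves the leftover tail to the front and
// returns its length, i.e. the offset where the next read should start.
std::size_t carryOver(std::vector<char>& buf, std::size_t filled, std::size_t consumed) {
    assert(consumed <= filled && filled <= buf.size());
    const std::size_t leftover = filled - consumed;
    // memmove, not memcpy: source and destination may overlap.
    if (leftover > 0) std::memmove(buf.data(), buf.data() + consumed, leftover);
    return leftover;
}
```

The next read then goes to `buf.data() + leftover` and asks for at most `buf.size() - leftover` bytes. That is the pointer and length adjustment described above.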
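Beyond that:

- It's better to use std::vector for the buffer.
- You can't convert four bytes read from disk into an unsigned int just by casting their address; you have to insert each byte into the position in the unsigned int where it belongs.
- Finally, you don't check whether the read succeeded before processing the data. Using read() with an istream is a bit tricky; your loop should be something like:

      while (inFile.read(addr, len) || inFile.gcount() != 0)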
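Putting these suggestions together, here is a sketch of the whole read loop. It uses a `std::vector<char>` buffer, the `read()`/`gcount()` loop condition, byte-by-byte assembly of little-endian words, and a copy-down of any partial event. It assumes, as the question's bounds check implies, that bits 17-30 of an event's first word give the event's total length in 32-bit words, including that word. `countEvents` and `loadLE32` are hypothetical names. The buffer holds one chunk plus the largest encodable event (0x3FFF words), mirroring the question's 5 MB buffer with 4 MB reads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <istream>
#include <sstream>    // only for feeding the sketch from memory when testing
#include <stdexcept>
#include <string>
#include <vector>

// Build a 32-bit value from 4 bytes stored least significant first
// (the file's byte order), regardless of the host's byte order.
inline std::uint32_t loadLE32(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Count complete events, reading the stream chunkBytes at a time.
std::size_t countEvents(std::istream& in, std::size_t chunkBytes) {
    if (chunkBytes == 0) throw std::invalid_argument("chunkBytes must be positive");
    const std::size_t maxEventBytes = 0x3FFFu * 4;     // largest length the mask allows
    std::vector<char> buf(chunkBytes + maxEventBytes); // room for a carried partial event
    std::size_t carried = 0;                           // unprocessed bytes at the front
    std::size_t events = 0;

    // A short final read sets failbit, but gcount() still reports the bytes it
    // stored, so keep going while either says there is data.
    while (in.read(buf.data() + carried, static_cast<std::streamsize>(chunkBytes))
           || in.gcount() != 0) {
        const std::size_t filled = carried + static_cast<std::size_t>(in.gcount());
        const unsigned char* bytes = reinterpret_cast<const unsigned char*>(buf.data());
        std::size_t pos = 0;
        while (filled - pos >= 4) {
            const std::uint32_t word = loadLE32(bytes + pos);
            const std::size_t eventBytes = ((word & 0x7FFE0000u) >> 17) * 4;
            if (eventBytes == 0) throw std::runtime_error("zero-length event");
            if (filled - pos < eventBytes) break;  // event extends past the bytes read so far
            // ... process the event occupying bytes[pos, pos + eventBytes) here ...
            ++events;
            pos += eventBytes;
        }
        // Copy the unprocessed tail down; memmove because the ranges may overlap.
        carried = filled - pos;
        std::memmove(buf.data(), buf.data() + pos, carried);
        assert(carried + chunkBytes <= buf.size());
    }
    return events;  // bytes still carried here belong to a truncated final event
}
```

With a `std::ifstream` opened in binary mode, `countEvents(inFile, 0x400000)` reads 4 MB at a time like the question does. The chunk size need not be a multiple of 4, because partial words are carried over just like partial events.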
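Magic numbers. Amazing! Wow!

The magic numbers come from the file format, which won't change since it's burned into the data acquisition hardware. I suppose I could make them constants or something.

Yes, that's what I meant.

I had thought of that, but I don't know how to get fstream or fread to work with it. Any suggestions?

Iterators, dude. Iterators!

I haven't touched iterators in a long time, thanks for the reminder. Will fstream.read(char* s, streamsize n) work well with one? (I ask because I think it depends on the function's internals.)

"The start of the array moves": at a very high cost at runtime. In my experience, just copying is much simpler, and usually a bit faster as well.

@JamesKanze: So what is this huge runtime cost? If you mean the cache hit rate, that only suffers at the "seam". I suppose there's some overhead in tracking where the iterator really is in the underlying storage, plus an extra conditional on each iteration. Meh.

How do I pass a vector to fstream.read(char* s, streamsize n)? Also, I think you've misunderstood the cast. I allocate one contiguous block of memory, 5*2^20 units of 1 byte each. The reinterpret_cast then lets me use that same contiguous memory as 5*2^18 units of 4 bytes each, which is what I'm doing. The byte order on disk is correct, so I use fstream.read to fetch everything byte for byte, and then I process the data through the cast array.

@JamesMatta inFile.read(v.data(), v.size()), or if you don't have C++11, inFile.read(&v[0], v.size()). As for byte order: you know the byte order on disk, because it's defined by the format. You don't really know the byte order in memory, because it can change (and in one case actually did) from one version of the compiler to the next; it can also change if you upgrade your machine.

Of course I know the byte order in memory: bytes are read into memory exactly as they are on disk. Otherwise, if I read 4 MB of the file into a char array, I couldn't go to charArray[0] and get the first byte of the file, and so on. If that really were the case, there would be no way to know how to insert the bytes into the right part of an int either. The file's byte order is little-endian, and since there are few examples of non-x86 hardware still in use, the in-memory order will be little-endian too. Also, if for some reason memory were big-endian, you could apply the bswapl assembly instruction to each element of the int array as needed. The bitwise-OR (or addition) and shift approach is much more complicated: my approach needs one movl and one bswapl, while yours needs 4 movb and 3 shll if optimized in assembly (each next byte goes straight into the bottom of the register), or 4 movb, 3 shll, and 3 orl if not.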
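The disagreement above is about how to turn four file bytes into an unsigned int. The asker's plan can be written portably with `std::memcpy` instead of dereferencing a `reinterpret_cast` pointer: use the bytes as they lie in memory, and byte-swap only on a big-endian host. The memcpy avoids the strict-aliasing problem of reading a char array through an unsigned int lvalue. This is a swapped-in technique, not something either commenter proposed, and `hostIsLittleEndian`, `byteSwap32` and `loadLE32ViaMemcpy` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// True when the host stores the least significant byte first.
inline bool hostIsLittleEndian() {
    const std::uint32_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Portable byte reversal (the operation the x86 bswap instruction performs).
inline std::uint32_t byteSwap32(std::uint32_t v) {
    return (v >> 24)
         | ((v >> 8) & 0x0000FF00u)
         | ((v << 8) & 0x00FF0000u)
         | (v << 24);
}

// Load a little-endian word from a byte buffer: copy the bytes as they lie
// in memory, then swap only if the host is big-endian.
inline std::uint32_t loadLE32ViaMemcpy(const unsigned char* p) {
    assert(p != nullptr);
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return hostIsLittleEndian() ? v : byteSwap32(v);
}
```

On either kind of host this yields the same value as assembling the bytes with shifts. With C++20 the endianness probe can be replaced by `std::endian::native`, and with C++23 the hand-written swap by `std::byteswap`.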