Reading a huge text file line by line in C++ with buffering

I need to read a huge 35 GB text file line by line in C++. Currently I do it like this: `ifstream infile("myfile.txt"); string line; while (true) { if (!getline(infile, line)) break; long linepos = infile.tellg(); process(line, linepos); }`

c++ performance


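But it gives me only about 2 MB/s, although a file manager copies the same file at 100 MB/s. I guess that `getline()` is not buffering correctly. Please suggest some kind of buffered line-by-line reading approach.

UPD: `process()` is not the bottleneck; the code without `process()` runs at the same speed.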

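With the standard IO streams you won't get anywhere close to line speed. Buffered or not, practically any parsing will cut your speed by orders of magnitude. I ran experiments on data files made up of two ints and a double per line (Ivy Bridge chip, SSD):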

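- IO streams in various combinations: ~10 MB/s. Pure parsing (`f >> i1 >> i2 >> d`) is faster than a `getline` into a string followed by a `stringstream` parse.
- C file operations like `fscanf`: about 40 MB/s.
- `getline` with no parsing: 180 MB/s.
- `fread`: 500-800 MB/s (depending on whether the OS had the file cached).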
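To make the `fread` figure more concrete, here is a minimal sketch of reading in large blocks and locating line ends with `memchr`. It is not the code used for the measurements above; the helper name `forEachLine`, the callback shape, and the `carry` buffer for lines that straddle two blocks are assumptions made for illustration.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): reads `path` in blocks of
// `chunkSize` bytes with fread and calls onLine(data, length, offset) for every
// line, where `offset` is the byte position of the line start in the file.
// Returns the number of lines seen, or -1 if the file cannot be opened.
template <typename LineFn>
long long forEachLine(const char* path, std::size_t chunkSize, LineFn onLine) {
    std::FILE* f = std::fopen(path, "rb");  // binary mode keeps byte offsets exact
    if (!f) return -1;

    std::vector<char> chunk(chunkSize);
    std::string carry;           // unfinished line left over from earlier chunks
    std::uint64_t carryPos = 0;  // file offset where `carry` starts
    std::uint64_t chunkPos = 0;  // file offset of chunk[0]
    long long lines = 0;

    std::size_t got;
    while ((got = std::fread(chunk.data(), 1, chunk.size(), f)) > 0) {
        std::size_t start = 0;
        while (start < got) {
            const void* nl = std::memchr(chunk.data() + start, '\n', got - start);
            if (!nl) break;  // no complete line left in this chunk
            std::size_t end = static_cast<const char*>(nl) - chunk.data();
            if (!carry.empty()) {
                // Finish the line that began in a previous chunk.
                carry.append(chunk.data() + start, end - start);
                onLine(carry.data(), carry.size(), carryPos);
                carry.clear();
            } else {
                onLine(chunk.data() + start, end - start, chunkPos + start);
            }
            ++lines;
            start = end + 1;
        }
        if (start < got) {  // keep the partial line for the next round
            if (carry.empty()) carryPos = chunkPos + start;
            carry.append(chunk.data() + start, got - start);
        }
        chunkPos += got;
    }
    if (!carry.empty()) {  // last line without a trailing '\n'
        onLine(carry.data(), carry.size(), carryPos);
        ++lines;
    }
    std::fclose(f);
    return lines;
}
```

A call such as `forEachLine("myfile.txt", 1 << 20, handler)` would read 1 MiB at a time; the callback receives the line without its terminating `'\n'`. Whether this reaches the throughput quoted above depends on the disk, the OS cache, and what the callback does.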

I/O is not the bottleneck; parsing is. In other words, your `process` is probably your slow point.
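So I wrote a parallel parser. It is composed of tasks (using a TBB pipeline):
1. `fread` large chunks (one such task at a time)
2. Re-arrange the chunks so that a line is not split between chunks (one such task at a time)
3. Parse a chunk (many such tasks)
I can have an unlimited number of parsing tasks because my data is unordered. If yours isn't, this may not be worth it to you.
This approach gets me about 100 MB/s on a 4-core Ivy Bridge chip.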
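Steps 2 and 3 above might look roughly like the following sketch. It is not the TBB pipeline from this answer: the parallel dispatch is left out, and the names `Record`, `splitAtLineBoundaries`, and `parseRange` are hypothetical. It only shows how a buffer can be cut at line boundaries so that each range can be parsed independently, assuming well-formed lines of two integers and a double.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record type for the "two ints and a double per line" format.
struct Record { long a; long b; double d; };

// Step 2: cut the buffer into ranges of roughly `target` bytes, moving each cut
// forward to just past the next '\n' so that no line straddles two ranges.
std::vector<std::pair<std::size_t, std::size_t>>
splitAtLineBoundaries(const std::string& data, std::size_t target) {
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    std::size_t begin = 0;
    while (begin < data.size()) {
        std::size_t end = begin + target;
        if (end >= data.size()) {
            end = data.size();
        } else {
            std::size_t nl = data.find('\n', end);
            end = (nl == std::string::npos) ? data.size() : nl + 1;
        }
        ranges.emplace_back(begin, end);
        begin = end;
    }
    return ranges;
}

// Step 3: parse one range. It only reads its own slice of `data`, so several
// ranges could be parsed at the same time by separate tasks.
// Assumes every line in the range is well formed.
std::vector<Record> parseRange(const std::string& data, std::size_t begin, std::size_t end) {
    std::vector<Record> out;
    const char* p = data.c_str() + begin;
    const char* last = data.c_str() + end;
    while (p < last) {
        char* next = nullptr;
        Record r;
        r.a = std::strtol(p, &next, 10);
        if (next == p) break;  // nothing numeric here: stop parsing this range
        p = next;
        r.b = std::strtol(p, &next, 10);
        p = next;
        r.d = std::strtod(p, &next);
        p = next;
        out.push_back(r);
        // Skip the line terminator and any stray spaces before the next record.
        while (p < last && (*p == '\n' || *p == '\r' || *p == ' ')) ++p;
    }
    return out;
}
```

In a real pipeline each `(begin, end)` pair would be handed to a parsing task; because the ranges never share a line, the tasks need no coordination beyond collecting their results.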
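I have translated my own buffering code from a Java project, and it does what I need. I had to use defines to work around a problem with the M$ VC2010 compiler's `tellg`, which always gives wrong negative values on large files. The algorithm delivers the desired speed of ~100 MB/s, although it performs some extra `new[]` allocations.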
void readFileFast(ifstream &file, void(*lineHandler)(char*str, int length, __int64 absPos)){
        int BUF_SIZE = 40000;
        file.seekg(0,ios::end);
        ifstream::pos_type p = file.tellg();
#ifdef WIN32
        __int64 fileSize = *(__int64*)(((char*)&p) +8);
#else
        __int64 fileSize = p;
#endif
        file.seekg(0,ios::beg);
        BUF_SIZE = (int)min((__int64)BUF_SIZE, fileSize); // same type on both sides so min compiles
        char* buf = new char[BUF_SIZE];
        int bufLength = BUF_SIZE;
        file.read(buf, bufLength);

        int strEnd = -1;
        int strStart;
        __int64 bufPosInFile = 0;
        while (bufLength > 0) {
            int i = strEnd + 1;
            strStart = strEnd;
            strEnd = -1;
            for (; i < bufLength && i + bufPosInFile < fileSize; i++) {
                if (buf[i] == '\n') {
                    strEnd = i;
                    break;
                }
            }

            if (strEnd == -1) { // scroll buffer
                if (strStart == -1) {
                    lineHandler(buf + strStart + 1, bufLength, bufPosInFile + strStart + 1);
                    bufPosInFile += bufLength;
                    bufLength = (int)min((__int64)bufLength, fileSize - bufPosInFile);
                    delete[]buf;
                    buf = new char[bufLength];
                    file.read(buf, bufLength);
                } else {
                    int movedLength = bufLength - strStart - 1;
                    memmove(buf,buf+strStart+1,movedLength);
                    bufPosInFile += strStart + 1;
                    int readSize = (int)min((__int64)(bufLength - movedLength), fileSize - bufPosInFile - movedLength);

                    if (readSize != 0)
                        file.read(buf + movedLength, readSize);
                    if (movedLength + readSize < bufLength) {
                        char *tmpbuf = new char[movedLength + readSize];
                        memmove(tmpbuf,buf,movedLength+readSize);
                        delete[]buf;
                        buf = tmpbuf;
                        bufLength = movedLength + readSize;
                    }
                    strEnd = -1;
                }
            } else {
                lineHandler(buf+ strStart + 1, strEnd - strStart, bufPosInFile + strStart + 1);
            }
        }
        delete[] buf; // release the last buffer
        lineHandler(0, 0, 0);//eof
}

void lineHandler(char*buf, int l, __int64 pos){
    if(buf==0) return;
    string s = string(buf, l);
    printf("%s", s.c_str()); // never pass file data as the format string
}

void loadFile(){
    ifstream infile("file");
    readFileFast(infile,lineHandler);
}

Use a line parser or write one yourself. There is an example on SourceForge, and you can put a buffer in front of it if necessary.
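`getline` doesn't do any buffering; the `istream` does.

Why are you reading line by line? Why not read a few million lines at a time?

What makes you believe the bottleneck is `getline` rather than `process`?

Did you compile with optimizations? Use -O2 or -O3 for g++ and clang, and a Release build for Visual C++.

@IgorTandetnik, see my update.

Try it without `tellg`. A plain `getline` should be faster.

`ifstream::read` should have the same performance as `fread` (it doesn't do any parsing either); using it may require fewer changes to the existing code.

Was the file cached for the `getline` test? I see 680-800 MB/s with the asker's code (with an empty `process()`), and 1 GB/s for the code without `tellg`. (gcc-4.6.3, `-O0`)

@user2313838 Yes, it was cached. My code looks very much like the asker's too.

MacOS/SSD/Core i7 (2x physical): `ifstream iff("out.mp"); while (iff.good()) { getline(iff, line); // get line from file }` 16727 ms, file size: 371M

@ArthurKushman That is only 22 MB/s; you should be able to get more.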