Parsing: read a file in chunks and append the incomplete line to the next read

I am trying to read from the following file:

abcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxyz
12345abcdefghijklmnopqrstu
abcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxyz
The code is as follows:

#include <iostream>
#include <fstream>
#include <sstream>
#include <thread>
#include <mutex>
#include <vector>
#include <array>
#include <algorithm>
#include <iterator>

#define CHUNK_SIZE 55

std::mutex queueDumpMutex;

void getLinesFromChunk(std::vector<char>& chunk, std::vector<std::string>& container)
{
    static std::string str;
    unsigned int i = 0;
    while(i < chunk.size())
    {   
        str.clear();
        size_t chunk_sz = chunk.size();

        while(chunk[i] != '\n' && i < chunk_sz )
        {
            str.push_back(chunk[i++]);
        }
        std::cout<<"\nStr = "<<str;

        if (i < chunk_sz)
        {
            std::lock_guard<std::mutex> lock(queueDumpMutex);
            container.push_back(str);
        }
        ++i;
    }
    chunk.clear();
    std::copy(str.begin(), str.end(), std::back_inserter(chunk));
    std::cout << "\nPrinting the chunk out ....." << std::endl;
    std::copy(chunk.begin(), chunk.end(), std::ostream_iterator<char>(std::cout, " "));
}

void ReadFileAndPopulateDump(std::ifstream& in)
{
    std::vector<char> chunk;
    chunk.reserve(CHUNK_SIZE*2);
    std::vector<std::string> queueDump; 
    in.unsetf(std::ios::skipws);
    std::cout << "Chunk capacity: " << chunk.capacity() << std::endl;

    do{
        in.read(&chunk[chunk.size()], CHUNK_SIZE);
        std::cout << "Chunk size before getLines: " << chunk.size() << std::endl;
        getLinesFromChunk(chunk, queueDump);
        std::cout << "Chunk size after getLines: " << chunk.size() << std::endl;
    }while(!in.eof());
}

int main()
{
    std::ifstream in("/home/ankit/codes/more_practice/sample.txt", std::ifstream::binary);
    ReadFileAndPopulateDump(in);
    return 0;
}
The container should look like this:

abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz
rather than:

abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz12
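I now understand that chunk.reserve(CHUNK_SIZE) only reserves memory and does not actually change the vector's size, which is why reading into the chunk with in.read() does not work.

If I use chunk.resize(CHUNK_SIZE) instead, the new data has to be appended at the end of the chunk, because I want the leftover characters "12" to be joined with the rest of their line.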
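
To make the reserve/resize difference concrete, here is a minimal sketch. The helper names sizeAfterReserve and sizeAfterResize are hypothetical and not part of the program above; they only report what size() returns after each call.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: size() of an empty vector after reserve(n).
// reserve() only grows the capacity; no elements are created.
std::size_t sizeAfterReserve(std::size_t n)
{
    std::vector<char> v;
    v.reserve(n);
    return v.size();   // still 0, so &v[0] is not a valid target for in.read()
}

// Hypothetical helper: size() of an empty vector after resize(n).
// resize() creates n value-initialized elements that can be written to.
std::size_t sizeAfterResize(std::size_t n)
{
    std::vector<char> v;
    v.resize(n);
    return v.size();   // n
}
```

With reserve(), chunk.size() stays 0 even after in.read() has written into the reserved storage, so the inner loop in getLinesFromChunk never sees any data.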

The problem now is that the loop runs more times than it should. As far as I can tell, the loop condition looks fine.


Any help is much appreciated.

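Sorry, but I don't understand why you:

  • read the file in binary mode instead of text mode
  • don't use getline()
  • use a vector instead of a string

As far as I understand your question, I would do it like this: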

#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>

int main()
 {
   std::ifstream  f("sample.txt");  // text mode!

   std::size_t const  chunkSizeMax = 55U;

   std::string  str;
   std::string  chunk;

   while ( std::getline(f, str) )
    {
      if ( chunkSizeMax <= (chunk.size() + str.size()) )
       {
         std::cout << "chunk: [" << chunk << "]\n";

         chunk.clear();
       }

      chunk += str;
    }

   std::cout << "last chunk: [" << chunk << "]\n";

   return EXIT_SUCCESS;
 }
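
If the file really has to be read in fixed-size blocks, the carry-over idea from the question could be sketched roughly as follows. This is only a sketch under assumptions, not the code above: readLinesInChunks and demoSample are hypothetical names, the buffer is sized through its constructor instead of reserve(), and the number of bytes actually read is taken from in.gcount().

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads `in` in blocks of at most chunkSize bytes and
// returns the complete lines. An unfinished line at the end of a block is
// kept in `carry` and completed by the next block.
std::vector<std::string> readLinesInChunks(std::istream& in, std::size_t chunkSize)
{
    std::vector<std::string> lines;
    if (chunkSize == 0)
        return lines;                       // nothing could ever be read

    std::vector<char> buffer(chunkSize);    // sized, not merely reserved
    std::string carry;                      // unfinished line from the previous block

    // The last block is usually short: read() then fails, but gcount() still
    // reports how many bytes arrived, so that block is processed as well.
    while (in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()))
           || in.gcount() > 0)
    {
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        std::size_t start = 0;

        for (std::size_t i = 0; i < got; ++i)
        {
            if (buffer[i] == '\n')
            {
                carry.append(buffer.data() + start, i - start);
                lines.push_back(carry);
                carry.clear();
                start = i + 1;
            }
        }
        // Whatever follows the last newline belongs to the next block.
        carry.append(buffer.data() + start, got - start);
    }

    if (!carry.empty())
        lines.push_back(carry);             // final line without a trailing newline

    return lines;
}

// Hypothetical demo: the five-line sample from the question, read in
// 55-byte blocks from an in-memory stream instead of a file.
std::vector<std::string> demoSample()
{
    std::istringstream in(
        "abcdefghijklmnopqrstuvwxyz\n"
        "abcdefghijklmnopqrstuvwxyz\n"
        "12345abcdefghijklmnopqrstu\n"
        "abcdefghijklmnopqrstuvwxyz\n"
        "abcdefghijklmnopqrstuvwxyz\n");
    return readLinesInChunks(in, 55);
}
```

Because only in.gcount() bytes of each block are examined, the loop stops after the final short read instead of depending on in.eof(), and a line split across two blocks reaches the container only once it is complete.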