Reading a file in chunks and appending the incomplete line to the next read
I am trying to read from the following file:
abcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxyz
12345abcdefghijklmnopqrstu
abcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxyz
The code is as follows:
#include <iostream>
#include <fstream>
#include <sstream>
#include <thread>
#include <mutex>
#include <vector>
#include <array>
#include <algorithm>
#include <iterator>

#define CHUNK_SIZE 55

std::mutex queueDumpMutex;

void getLinesFromChunk(std::vector<char>& chunk, std::vector<std::string>& container)
{
    static std::string str;
    unsigned int i = 0;
    while(i < chunk.size())
    {
        str.clear();
        size_t chunk_sz = chunk.size();
        while(chunk[i] != '\n' && i < chunk_sz )
        {
            str.push_back(chunk[i++]);
        }
        std::cout<<"\nStr = "<<str;
        if (i < chunk_sz)
        {
            std::lock_guard<std::mutex> lock(queueDumpMutex);
            container.push_back(str);
        }
        ++i;
    }
    chunk.clear();
    std::copy(str.begin(), str.end(), std::back_inserter(chunk));
    std::cout << "\nPrinting the chunk out ....." << std::endl;
    std::copy(chunk.begin(), chunk.end(), std::ostream_iterator<char>(std::cout, " "));
}

void ReadFileAndPopulateDump(std::ifstream& in)
{
    std::vector<char> chunk;
    chunk.reserve(CHUNK_SIZE*2);
    std::vector<std::string> queueDump;
    in.unsetf(std::ios::skipws);
    std::cout << "Chunk capacity: " << chunk.capacity() << std::endl;
    do{
        in.read(&chunk[chunk.size()], CHUNK_SIZE);
        std::cout << "Chunk size before getLines: " << chunk.size() << std::endl;
        getLinesFromChunk(chunk, queueDump);
        std::cout << "Chunk size after getLines: " << chunk.size() << std::endl;
    }while(!in.eof());
}

int main()
{
    std::ifstream in("/home/ankit/codes/more_practice/sample.txt", std::ifstream::binary);
    ReadFileAndPopulateDump(in);
    return 0;
}
The container should look like:
abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz
instead of:
abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz12
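I now understand that chunk.reserve(CHUNK_SIZE) only reserves the requested memory and does not actually change the vector's size, which is why I cannot read into it with in.read().
The alternative is to use chunk.resize(CHUNK_SIZE) and append to the end, since I want the leftover characters "12" to be completed with the rest of their line.
The problem now is that the loop runs more times than it should, even though the condition looks fine to me.
Any help is much appreciated.

Sorry, but I do not understand why you: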
- read the file in binary mode rather than text mode
- do not use getline()
- use std::vector<char> instead of std::string
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>

int main()
{
    std::ifstream f("sample.txt"); // text mode!
    std::size_t const chunkSizeMax = 55U;
    std::string str;   // current line, without its trailing newline
    std::string chunk; // complete lines collected so far
    while ( std::getline(f, str) )
    {
        // Flush the chunk before this line would push it to the limit.
        if ( chunkSizeMax <= (chunk.size() + str.size()) )
        {
            std::cout << "chunk: [" << chunk << "]\n";
            chunk.clear();
        }
        chunk += str;
    }
    std::cout << "last chunk: [" << chunk << "]\n";
    return EXIT_SUCCESS;
}
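
As a side note on the reserve()/resize() point raised in the question: reserve() only changes the capacity, so the vector is still empty afterwards. Reading through &chunk[chunk.size()] therefore writes into storage the vector does not consider part of its contents (undefined behavior), and chunk.size() stays 0. The two helper functions below are hypothetical names used only to illustrate the difference:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, for illustration only: size of a vector after reserve().
std::size_t sizeAfterReserve(std::size_t n)
{
    std::vector<char> v;
    v.reserve(n);   // capacity() >= n, but size() is still 0
    return v.size();
}

// Hypothetical helper, for illustration only: size of a vector after resize().
std::size_t sizeAfterResize(std::size_t n)
{
    std::vector<char> v;
    v.resize(n);    // size() == n, new elements are value-initialized ('\0')
    return v.size();
}
```

Only after resize() is it valid to pass v.data() (or &v[0]) together with v.size() to in.read().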
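
If the fixed-size block reads from the question really are required, a minimal sketch of one way to carry the unfinished line over to the next read might look like this. It is not the original poster's code: readLinesInChunks and linesFromText are hypothetical names, and the sketch assumes '\n' line endings. It sizes the buffer up front, uses gcount() to learn how many bytes each read() actually delivered, and lets the stream state end the loop instead of testing eof() after the fact.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads `in` in blocks of at most `chunkSize` bytes and
// returns the complete lines. The unfinished tail of each block stays in
// `carry` and is completed by the next block.
std::vector<std::string> readLinesInChunks(std::istream& in, std::size_t chunkSize)
{
    assert(chunkSize > 0);
    std::vector<std::string> lines;
    std::string carry;                    // incomplete line from earlier blocks
    std::vector<char> buffer(chunkSize);  // sized, not merely reserved

    while (in) {
        in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        std::streamsize const got = in.gcount();  // bytes actually read
        if (got <= 0) {
            break;
        }
        carry.append(buffer.data(), static_cast<std::size_t>(got));

        // Move every complete line out of `carry`.
        std::size_t start = 0;
        std::size_t pos = 0;
        while ((pos = carry.find('\n', start)) != std::string::npos) {
            lines.push_back(carry.substr(start, pos - start));
            start = pos + 1;
        }
        carry.erase(0, start);  // keep only the unfinished tail
    }

    if (!carry.empty()) {
        lines.push_back(carry);  // final line without a trailing newline
    }
    return lines;
}

// Hypothetical convenience wrapper for trying the function without a file.
std::vector<std::string> linesFromText(const std::string& text, std::size_t chunkSize)
{
    std::istringstream in(text);
    return readLinesInChunks(in, chunkSize);
}
```

With the sample file and a chunk size of 55, the first block ends after "12"; those two characters stay in carry and are completed by the second block, so the third line arrives whole as 12345abcdefghijklmnopqrstu. For a real file, the same function can be called with a std::ifstream opened in binary mode.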