Parsing a CSV file in C++

c++ csv parsing

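C++14

In general, the staff at my university recommend that we use Boost to parse files, but I installed it and did not manage to implement anything with it.

So I have to parse a CSV file line by line, where each line consists of two columns, separated by a comma of course. Each of these two columns is a digit. I have to take the integer values of these two digits and use them to construct my Fractal objects at the end.

The first problem is: the file can look like this, for example: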

1,1
<HERE WE HAVE A NEWLINE>
<HERE WE HAVE A NEWLINE>

1,1

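This file format is fine. But my solution outputs "Invalid input" for it, whereas the correct solution should print the corresponding fractal, 1,1, only once.

The second problem is: the file can look like this: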

1,1
<HERE WE HAVE A NEWLINE>
1,1

1,1
1,1

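This should be invalid input, but my solution treats it as correct input and just skips over the middle newline.

Maybe you can guide me on how to fix these issues. It would really help me, as I have been struggling with this exercise from morning to evening every day.

This is my current parser: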

#include <cstdlib>
#include <fstream>
#include <iostream>
#include <sstream>
#include <stack>
#include <string>
#include "Fractal.h"
const char *usgErr = "Usage: FractalDrawer <file path>\n";
const char *invalidErr = "Invalid input\n";
const char *VALIDEXT = "csv";
const char EXTDOT = '.';
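const char COMMA = ',';
const char WHITESPACE = ' ';  // assumed definition: used below but missing from the posted code
const char NEWLINE = '\n';    // assumed definition: used below but missing from the posted code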
const char MINTYPE = 1;
const char MAXTYPE = 3;
const int MINDIM = 1;
const int MAXDIM = 6;
const int NUBEROFARGS = 2;
int main(int argc, char *argv[])
{
    if (argc != NUBEROFARGS)
    {
        std::cerr << usgErr;
        std::exit(EXIT_FAILURE);
    }
    std::stack<Fractal *> resToPrint;
    std::string filepath = argv[1]; // Can be a relative/absolute path
    if (filepath.substr(filepath.find_last_of(EXTDOT) + 1) != VALIDEXT)
    {
        std::cerr << invalidErr;
        exit(EXIT_FAILURE);
    }
    std::stringstream ss; // Treat it as a buffer to parse each line
    std::string s; // Use it with 'ss' to convert char digit to int
    std::ifstream myFile; // Declare on a pointer to file
    myFile.open(filepath); // Open CSV file
    if (!myFile) // If failed to open the file
    {
        std::cerr << invalidErr;
        exit(EXIT_FAILURE);
    }
    int type = 0;
    int dim = 0;
    while (myFile.peek() != EOF)
    {
        getline(myFile, s, COMMA); // Read up to the comma - the type of the fractal, store it in s
        ss << s << WHITESPACE; // Save the number in ss, delimited by ' ', so both values can be extracted together
        s.clear(); // Empty s so the old value cannot be assigned anywhere else
        getline(myFile, s, NEWLINE); // Read up to NEWLINE - the dim of the fractal
        ss << s;
        ss >> type >> dim; // Extract both values
        s.clear(); // Empty s so the old value cannot be assigned anywhere else

        if (ss.peek() != EOF || type < MINTYPE || type > MAXTYPE || dim < MINDIM || dim > MAXDIM) 
        {
            std::cerr << invalidErr;
            std::exit(EXIT_FAILURE);
        }

        resToPrint.push(FractalFactory::factoryMethod(type, dim));
        ss.clear(); // Clear the buffer to update new values of the next line at the next iteration
    }

    while (!resToPrint.empty())
    {
        std::cout << *(resToPrint.top()) << std::endl;
        resToPrint.pop();
    }

    myFile.close();

    return 0;
}
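
A side note on the `ss.clear()` call in the parser above: for a `std::stringstream`, `clear()` only resets the error state flags (eofbit, failbit); it does not discard the characters already written to the stream, so the buffer keeps growing with every line. The sketch below shows one way to reuse a stream with fresh contents. `parsePair` is a hypothetical helper, not part of the posted code, and it assumes the two numbers are separated by whitespace, as they are after the `ss << s << WHITESPACE` step.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: parse "type dim" from a reused stream.
// Returns true only if two integers were read and nothing else follows.
bool parsePair(std::stringstream& ss, const std::string& text, int& type, int& dim)
{
    ss.str(text);  // replace the buffered characters with the new text
    ss.clear();    // reset eofbit/failbit left over from the previous use
    if (!(ss >> type >> dim))
        return false;          // fewer than two integers
    char extra;
    return !(ss >> extra);     // any further non-whitespace character is an error
}
```

Calling `ss.str(text)` followed by `ss.clear()` at the start of each use means every line is parsed in isolation, so a leftover fragment from one line cannot leak into the next.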

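I will not update your code. I looked at your title, "Parsing a CSV file in C++", and would like to show you how to read CSV files in a more modern way. Unfortunately, you are still on C++14. With C++20 or the ranges library, it would be very simple using getlines and split.

In C++17 we could use CTAD, if with initializer, and so on.

But what we do not need is Boost. The C++ standard library is sufficient. And we never use scanf and similar old stuff.

In my very humble opinion, links to 10-year-old questions should no longer be given. It is 2020 now. More modern, now-available language elements should be used. But, as said, everybody is free to do what they want.

In C++ we can use the std::sregex_token_iterator. Its use will not slow the program down significantly. A double std::getline would also be OK, although then the number of columns must be known. The std::sregex_token_iterator does not care how many columns there are.

Please see the example code below. In this example we create a tiny proxy class and overload its extractor operator. Then we use the std::istream_iterator to read and parse the whole CSV file in a small one-liner.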
#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <regex>
#include <string>
#include <vector>

// Define Alias for easier Reading
// using Columns = std::vector<std::string>;
using Columns = std::vector<int>;

// The delimiter
const std::regex re(",");

// Proxy for the input Iterator
struct ColumnProxy {
    // Overload extractor. Read a complete line
    friend std::istream& operator>>(std::istream& is, ColumnProxy& cp) {
        // Read a line
        std::string line;
        cp.columns.clear();
        if(std::getline(is, line) && !line.empty()) {
            // Split values and copy into resulting vector
            std::transform(
                std::sregex_token_iterator(line.begin(), line.end(), re, -1), {},
                std::back_inserter(cp.columns),
                [](const std::string& s) { return std::stoi(s); });
        }
        return is;
    }
    // Type cast operator overload. Cast the proxy to the type 'Columns'
    // (here std::vector<int>)
    operator Columns() const { return columns; }

protected:
    // Temporary to hold the read vector
    Columns columns{};
};

int main() {
    std::ifstream myFile("r:\\log.txt");
    if(myFile) {
        // Read the complete file, parse everything and store the result in a vector
        std::vector<Columns> values(std::istream_iterator<ColumnProxy>(myFile), {});

        // Show complete csv data
        std::for_each(values.begin(), values.end(), [](const Columns& c) {
            std::copy(c.begin(), c.end(),
                      std::ostream_iterator<int>(std::cout, " "));
            std::cout << "\n";
        });
    }
    return 0;
}
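
The "double std::getline" alternative mentioned above can be sketched as follows, for the case where the number of columns is known to be two. This is only an illustration: `readTwoColumns` is a hypothetical name, empty lines are skipped in the same way as in the proxy example, and `std::stoi` throws if a field is not numeric.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: read "a,b" rows from any input stream.
std::vector<std::pair<int, int>> readTwoColumns(std::istream& in)
{
    std::vector<std::pair<int, int>> rows;
    std::string line;
    while (std::getline(in, line)) {        // outer getline: split on '\n'
        if (line.empty())
            continue;                       // skip empty lines
        std::istringstream fields(line);
        std::string first, second;
        // inner getline calls: text before the ',' and the rest of the line
        if (std::getline(fields, first, ',') && std::getline(fields, second))
            rows.emplace_back(std::stoi(first), std::stoi(second));
        // a line without a second field is silently skipped in this sketch
    }
    return rows;
}
```

Because the two column reads are written out explicitly, this version has to change whenever the column count changes, which is the trade-off against the std::sregex_token_iterator version.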

So, now you may have got the idea. You may like it or not. Whatever. Feel free to do whatever you want.
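You do not need anything special to parse a .csv file. The STL containers from C++11 on provide all the tools needed to parse virtually any .csv file. You do not need to know the number of values per line before you start parsing, though you do need to know the type of the values you are reading from the .csv so you can apply the proper value conversion. You do not need any third-party library such as Boost either.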
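There are many ways to store the values parsed from a .csv file. The basic "handle any type" approach is to store the values in a std::vector<std::vector<type>> (which essentially provides a vector of vectors holding the values parsed from each line). You can specialize the storage as needed, depending on the type you are reading and how you need to convert and store the values. Your base storage can be a struct/class, a std::pair, a std::set, or just a basic type like int. Whatever fits your data.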
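
As an illustration of struct-based storage, here is a minimal sketch for a two-column file like the one in the question. The names `FractalSpec`, `parseSpec` and `collectSpecs` are hypothetical, and skipping lines that do not parse is a choice made for this example only.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type for one "type,dim" row.
struct FractalSpec {
    int type;
    int dim;
};

// Parse one "type,dim" line; returns false if the line is malformed.
bool parseSpec(const std::string& line, FractalSpec& out)
{
    std::istringstream ss(line);
    char comma = 0;
    if (!(ss >> out.type >> comma >> out.dim) || comma != ',')
        return false;
    char extra;
    return !(ss >> extra);   // anything after the second number is an error
}

// Collect every line that parses into a vector of records.
std::vector<FractalSpec> collectSpecs(std::istream& in)
{
    std::vector<FractalSpec> specs;
    std::string line;
    while (std::getline(in, line)) {
        FractalSpec spec{};
        if (parseSpec(line, spec))
            specs.push_back(spec);   // lines that do not parse are skipped here
    }
    return specs;
}
```

Compared with a vector of vectors, named members make later code such as range checks on `type` and `dim` easier to read, at the cost of fixing the number of columns.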
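In your case, you have basic int values in your file. The only caveat to a basic .csv parse is that there may be blank lines between the lines of values. That is easily handled by any number of tests. For instance, you can check whether the .length() of the line read is zero, or, for a bit more flexibility (in handling lines that contain multiple whitespace or other non-value characters), you can use .find_first_of() to find the first wanted value in the line to determine whether it is a line to parse.

For example, in your case, the read loop for your lines of values can simply read each line and check whether the line contains a digit: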
    ...
    std::string line;       /* string to hold each line read from file  */
    std::vector<std::vector<int>> values {};    /* vector vector of int */
    std::ifstream f (argv[1]);                  /* file stream to read  */

    while (getline (f, line)) { /* read each line into line */
        /* if no digits in line - get next */
        if (line.find_first_of("0123456789") == std::string::npos)
            continue;
        ...
    }
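
To make the difference between the two tests concrete, here is a small sketch. The helper names `isEmptyLine` and `hasDigit` are made up for this example. A line holding only spaces or a stray '\r' (as left behind by Windows line endings) has a non-zero length but contains no digit, so only the second test skips it.

```cpp
#include <cassert>
#include <string>

// True if the line has length zero.
bool isEmptyLine(const std::string& line)
{
    return line.length() == 0;
}

// True if the line contains at least one decimal digit.
bool hasDigit(const std::string& line)
{
    return line.find_first_of("0123456789") != std::string::npos;
}
```

Note that `hasDigit` only decides whether a line is worth parsing; a line such as "x1" passes the test and still has to be validated by the parse step itself.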

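There is no limit to the number of values read per line, or to the number of lines of values read per file (up to the limit of your virtual memory for storage).

Putting the above together in a short example, you could do something similar to the following, which simply reads your input file and then outputs the collected integers when done: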
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

int main (int argc, char **argv) {

    if (argc < 2) { /* validate at least 1 argument given for filename */
        std::cerr << "error: insufficient input.\nusage: ./prog <filename>\n";
        return 1;
    }

    std::string line;       /* string to hold each line read from file  */
    std::vector<std::vector<int>> values {};    /* vector vector of int */
    std::ifstream f (argv[1]);                  /* file stream to read  */

    while (getline (f, line)) { /* read each line into line */
        /* if no digits in line - get next */
        if (line.find_first_of("0123456789") == std::string::npos)
            continue;
        int itmp;                               /* temporary int */
        std::vector<int> tmp;                   /* temporary vector<int> */
        std::stringstream ss (line);            /* stringstream from line */
        while (ss >> itmp) {                    /* read int from stringstream */
            std::string tmpstr;                 /* temporary string to ',' */
            tmp.push_back(itmp);                /* add int to tmp */
            if (!getline (ss, tmpstr, ','))     /* read to ',' w/tmpstr */
                break;                          /* done if no more ',' */
        } 
        values.push_back (tmp);     /* add tmp vector to values */
    }

    for (auto row : values) {       /* output collected values */
        for (auto col : row)
            std::cout << "  " << col;
        std::cout << '\n';
    }
}

Example Use/Output
Resulting parse:
$ ./bin/parsecsv dat/csvspaces.csv
  1  1
  2  2
  3  3
  4  4
  5  5
  6  6
  7  7
  8  8
  9  9

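Example Input: Unknown/Uneven Number of Columns
You do not need to know the number of values per line in the .csv or the number of lines of values in the file. The STL containers handle the memory allocation needs automatically, allowing you to parse whatever you need. Now, you may want to enforce some fixed number of values per line, or lines per file, but that is entirely up to you: simply add counters and checks to your read/parse routine to limit the values stored as needed.
Without any changes, the code above also handles rows with differing numbers of values, as the dat/csvspaces2.csv run at the end shows.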
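
The kind of checks mentioned above might look like the following sketch. It assumes the rules implied by the question: every data row is exactly two comma-separated integers, blank lines are accepted only after the last data row, and a blank line followed by more data makes the input invalid. `parseStrict` is a hypothetical name, and range checks on the values are left out.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Returns true and fills `rows` only if the whole stream is valid under the
// assumed rules: each data row is "int,int", blank lines only at the end.
bool parseStrict(std::istream& in, std::vector<std::pair<int, int>>& rows)
{
    rows.clear();
    std::string line;
    bool seenBlank = false;
    while (std::getline(in, line)) {
        if (line.empty()) {          // remember the blank line, decide later
            seenBlank = true;
            continue;
        }
        if (seenBlank)               // data after a blank line: reject
            return false;
        std::istringstream ss(line);
        int a = 0, b = 0;
        char comma = 0, extra = 0;
        if (!(ss >> a >> comma >> b) || comma != ',' || (ss >> extra))
            return false;            // not exactly two comma-separated ints
        rows.emplace_back(a, b);
    }
    return true;
}
```

Under these assumptions, "1,1" followed by two empty lines is accepted with a single row, while "1,1", an empty line, and another "1,1" is rejected. Whitespace around the numbers is tolerated because `operator>>` skips it; a stricter check would have to inspect the characters directly.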
$ cat dat/csvspaces.csv
1,1


2,2
3,3

4,4



5,5
6,6

7,7

8,8


9,9

$ ./bin/parsecsv dat/csvspaces.csv
  1  1
  2  2
  3  3
  4  4
  5  5
  6  6
  7  7
  8  8
  9  9

$ cat dat/csvspaces2.csv
1


2,2
3,3,3

4,4,4,4



5,5,5,5,5
6,6,6,6,6,6

7,7,7,7,7,7,7

8,8,8,8,8,8,8,8


9,9,9,9,9,9,9,9,9

$ ./bin/parsecsv dat/csvspaces2.csv
  1
  2  2
  3  3  3
  4  4  4  4
  5  5  5  5  5
  6  6  6  6  6  6
  7  7  7  7  7  7  7
  8  8  8  8  8  8  8  8
  9  9  9  9  9  9  9  9  9