C++: reading from a .tsv file


I am trying to read information from a tab-separated values file in the following format:

<string>    <int>    <string>

This is how I am doing it now:

string americanName;
int pokedexNumber;
string japaneseName;

inFile >> americanName;
inFile >> pokedexNumber;
inFile >> japaneseName;

My problem comes from the space in "Mr. Mime", because the strings can contain spaces.

I would like to know how to read the file correctly.

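The standard library uses locales to determine the categories of different characters, and other locale-dependent things, depending on your system locale. Because of various Unicode issues, the standard streams use the locale to decide what counts as whitespace.

You can use this fact to control the meaning of ' ' in your case: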

#include <iostream>
#include <locale>
#include <algorithm>
#include <string>

struct tsv_ws : std::ctype<char>
{
    mask t[table_size]; // classification table, stores category for each character

    tsv_ws() : ctype(t) // ctype will use our table to check character type
    {
        // copy all default values to our table;
        std::copy_n(classic_table(), table_size, t);
        // here we tell, that ' ' is a punctuation, but not a space :)
        t[' '] = punct; 
    }
};

int main() {
    std::string s;
    std::cin.imbue(std::locale(std::cin.getloc(), new tsv_ws)); // using our locale, will work for any stream
    while (std::cin >> s) {
        std::cout << "read: '" << s << "'\n";
    }
}

You can use std::getline to extract strings that contain non-tab whitespace:
std::getline(inFile, americanName, '\t'); // read up to first tab
inFile >> pokedexNumber >> std::ws; // read number then second tab
std::getline(inFile, japaneseName); // read up to first newline


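It seems you want to read CSV data or, in your case, TSV data. Let us stick to the common term "CSV". This is a standard task, and I will explain it in detail. In the end, all the reading is done in a one-liner.
I recommend a "modern" C++ approach.
When searching for "reading CSV data", people still link to questions that date from 2009 and are now more than 10 years old. Most of the answers are equally old and quite complicated. So maybe it is time for a change.
Modern C++ has algorithms that iterate over ranges. You will often see something like someAlgorithm(container.begin(), container.end(), someLambda). The idea is to iterate over a sequence of similar elements.
In your case we iterate over the tokens in the input string and create substrings. This is called tokenizing.
For exactly this purpose we have std::sregex_token_iterator. And because it was defined for exactly this purpose, we should use it.
It is an iterator for iterating over a string, hence the "s" in sregex. Its begin form takes the range of the input to operate on, then a std::regex describing what should or should not be matched in the input string, and finally a parameter that selects the matching strategy: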

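0 --> give me the text that the regex matched (a positive number n selects the n-th capture group instead)
-1 --> give me the text that the regex did NOT match, that is, the parts between the matches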
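
As a small illustration of this last parameter, the sketch below collects the tokens for a given selector value: 0 yields every piece of text the regex matched, and -1 yields the pieces between the matches. The helper name collectTokens is an assumption made for this example.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects the tokens selected by `submatch`:
//    0 -> every piece of text the regex matched
//   -1 -> every piece of text between the matches
std::vector<std::string> collectTokens(const std::string& text,
                                       const std::regex& re,
                                       int submatch) {
    // The vector's range constructor consumes the iterator pair directly.
    return std::vector<std::string>(
        std::sregex_token_iterator(text.begin(), text.end(), re, submatch),
        std::sregex_token_iterator());
}
```

For the line "Mr. Mime", tab, "122", tab, "Barrierd" and a regex that matches a single tab, the selector -1 returns the three fields and the selector 0 returns the two tabs.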

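Now that we understand the iterator, we can std::copy the tokens from the iterator to our target, a std::vector of std::string. Since we do not know how many columns we have, we use std::back_inserter as the target. It appends every token we get from the std::sregex_token_iterator to our std::vector<std::string>. It does not matter how many columns there are.
Good. Such a statement could look like this: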
std::copy(                          // We want to copy something
    std::sregex_token_iterator      // The iterator begin, the sregex_token_iterator. Give back first token
    (
        line.begin(),               // Evaluate the input string from the beginning
        line.end(),                 // to the end
        re,                         // Match the delimiter (a tab for TSV data)
        -1                          // But give me back not the delimiter but everything else
    ),
    std::sregex_token_iterator(),   // iterator end for sregex_token_iterator, last token + 1
    std::back_inserter(cp.columns)  // Append everything to the target container
);

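Now we can understand how this copy operation works.
Next step: we want to read from a file. The file also consists of a sequence of similar data items: the lines.
As above, we can iterate over similar data, whether it comes from a file or from somewhere else. For this purpose C++ has std::istream_iterator. It is a template: the template parameter is the type of data to read, and the constructor parameter is a reference to an input stream. It does not matter whether the input stream is std::cin, a std::ifstream, or a std::istringstream. All kinds of streams behave the same way.
Since we have no files here on SO, I use (in the example below) a std::istringstream to hold the input CSV file. Of course you can open a file instead by defining std::ifstream testCsv(filename). No problem.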
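With std::istream_iterator we iterate over the input and read similar data items. In our case, one problem is that we want to iterate over special data, not over a built-in data type.
To solve this, we define a proxy class that does the internal work for us (we do not need to know how; it is encapsulated in the proxy). In the proxy we overload the extraction operator, which std::istream_iterator calls to read one line, and the type cast operator, which converts the proxy to the type we actually want to store.
One last important step: std::vector has a range constructor. It also has many other constructors that we can use when defining a variable of type std::vector, but for our purpose this one fits best.
So we define a variable csv and use its range constructor, giving it the begin and the end of a range. In our specific example these are the begin and end iterators of std::istream_iterator.
If we combine all of the above, reading the complete CSV file is a one-liner: the definition of a variable whose constructor is called.
See the resulting code: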
#include <iostream>
#include <sstream>
#include <fstream>
#include <string>
#include <vector>
#include <iterator>
#include <regex>
#include <algorithm>

// Tabs are written as \t escapes so the field separators stay visible
std::istringstream testCsv{
    "Seaking\t119\tAzumao\n"
    "Mr. Mime\t122\tBarrierd\n"
    "Weedle\t13\tBeedle" };


// Define Alias for easier Reading
using Columns = std::vector<std::string>;
using CSV = std::vector<Columns>;


// Proxy for the input Iterator
struct ColumnProxy {
    // Overload extractor. Read a complete line
    friend std::istream& operator>>(std::istream& is, ColumnProxy& cp) {

        // Read a line
        std::string line; cp.columns.clear();
        if (std::getline(is, line)) {

            // The delimiter
            const std::regex re("\t");

            // Split values and copy into resulting vector
            std::copy(std::sregex_token_iterator(line.begin(), line.end(), re, -1),
                std::sregex_token_iterator(),
                std::back_inserter(cp.columns));
        }
        return is;
    }

    // Type cast operator overload.  Cast the type 'Columns' to std::vector<std::string>
    operator std::vector<std::string>() const { return columns; }
protected:
    // Temporary to hold the read vector
    Columns columns{};
};


int main()
{
    // Define variable CSV with its range constructor. Read complete CSV in this statement, So, one liner
    CSV csv{ std::istream_iterator<ColumnProxy>(testCsv), std::istream_iterator<ColumnProxy>() };

    // Print result. Go through all lines and then copy line elements to std::cout
    std::for_each(csv.begin(), csv.end(), [](Columns & c) {
        std::copy(c.begin(), c.end(), std::ostream_iterator<std::string>(std::cout, " ")); std::cout << "\n";   });
}
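
The code above keeps every column as a std::string, while the question expects an int in the second column. One possible follow-up step, sketched here with the hypothetical names PokemonEntry and toEntry, converts one row into a typed record and rejects rows that do not fit the format.

```cpp
#include <cstddef>
#include <exception>
#include <string>
#include <vector>

// Hypothetical record type matching the question's three columns.
struct PokemonEntry {
    std::string americanName;
    int pokedexNumber;
    std::string japaneseName;
};

// Converts one row of string columns into a typed record.
// Returns false if the row does not have exactly three columns
// or if the second column is not a complete integer.
bool toEntry(const std::vector<std::string>& columns, PokemonEntry& out) {
    if (columns.size() != 3) return false;
    try {
        std::size_t used = 0;
        const int number = std::stoi(columns[1], &used);
        if (used != columns[1].size()) return false;  // trailing junk such as "13x"
        out = PokemonEntry{ columns[0], number, columns[2] };
        return true;
    } catch (const std::exception&) {  // std::invalid_argument or std::out_of_range
        return false;
    }
}
```

Calling toEntry on each element of csv then yields records with the number already parsed.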
