Iterating over a text file in C++ using a Fortran-like format


c++ parsing fortran


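I am making an application that processes data from txt files.

The idea is that the txt files may come in different formats, and they should be read into C++. One example might be

3I2, 3X, I3

which should be read as: "first we have 3 integers of length 2, then we have 3 empty positions, then we have 1 integer of length 3".

Is it best to iterate over the file, producing lines, and then iterate over each line as a string? What would be an efficient way of iterating that neatly skips the 3 positions to be ignored?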

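For example

to:

You can translate 3I2, 3X, I3 into a scanf format.

The link given by Kyle Kanos is a good one; the *scanf/*printf format strings map quite well onto Fortran format strings. It is actually easier to do this using C-style IO, but using C++-style streams is workable as well: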

#include <cstdio>
#include <iostream>
#include <fstream>
#include <string>

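int main() {
    std::ifstream fortranfile("input.txt");

    if (!fortranfile.is_open()) {
        std::cerr << "Could not open input.txt\n";
        return 1;
    }

    std::string line;

    // Reading in the loop condition also handles a last line that has no
    // trailing newline.
    while (std::getline(fortranfile, line)) {
        char dummy[4];  // 3 filler characters plus the terminating '\0'
        int i1, i2, i3, i4;

        // 3I2 -> three %2d, 3X -> %3s (read into the unused buffer), I3 -> %3d.
        // sscanf reports how many items it assigned; 5 means the line matched.
        if (std::sscanf(line.c_str(), "%2d%2d%2d%3s%3d",
                        &i1, &i2, &i3, dummy, &i4) != 5) {
            std::cerr << "Could not parse line: '" << line << "'\n";
            continue;
        }

        std::cout << "Line: '" << line << "' -> " << i1 << " " << i2 << " "
                  << i3 << " " << i4 << std::endl;
    }

    return 0;
}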

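The format string we use here is

%2d%2d%2d%3s%3d

that is, three copies of %2d (a decimal integer of width 2), followed by %3s (a string of width 3, which we read into a variable that is never used), followed by %3d (a decimal integer of width 3).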
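One detail worth knowing about the format above: %3s skips leading whitespace and stops at whitespace, so it fits filler such as "---" but not a 3X gap made of blanks. A minimal sketch of an alternative, assuming the same "3I2, 3X, I3" layout, is to discard exactly three characters with %*3c. The helper name parse_3i2_3x_i3 is made up for this illustration. Note that %2d also skips leading whitespace, so this still does not enforce strict column positions.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not from the original answer): parses one line laid
// out as "3I2, 3X, I3" and returns true only if all four integers were read.
bool parse_3i2_3x_i3(const std::string& line, int out[4]) {
    // %*3c reads exactly three characters, blanks included, and discards
    // them, so no dummy buffer is needed for the 3X part.
    int assigned = std::sscanf(line.c_str(), "%2d%2d%2d%*3c%3d",
                               &out[0], &out[1], &out[2], &out[3]);
    return assigned == 4;  // suppressed conversions are not counted
}
```

Called with a line such as "102030   123" or "102030---123", the helper fills out with the four integers; a line that ends early makes it return false.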

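Given that Fortran is easily callable from C, you could write a small Fortran function to do this "natively". After all, the Fortran READ statement takes a format string just like the one you describe.

If you want this to work, you will need to brush up on Fortran a little, and then learn how to link Fortran and C++ with your compiler.

- Fortran symbols may be implicitly suffixed with an underscore, so MYFUNC may be callable from C as myfunc_() (the exact name mangling depends on the compiler).
- Multi-dimensional arrays have the opposite ordering of dimensions.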
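To illustrate the point about dimension ordering, here is a small sketch of how C++ code might index a buffer that a Fortran routine filled as A(rows, cols). The FortranMatrixView type is hypothetical and exists only for this example; it assumes the data arrives as a flat array of double in Fortran's column-major layout.

```cpp
#include <cstddef>

// Hypothetical view over a buffer filled by Fortran as A(rows, cols).
// Fortran stores arrays column by column and counts from 1; C++ counts
// from 0 and, for built-in 2D arrays, stores row by row.
struct FortranMatrixView {
    const double* data;
    std::size_t rows;
    std::size_t cols;

    // Zero-based (i, j) here corresponds to A(i+1, j+1) in Fortran.
    double at(std::size_t i, std::size_t j) const {
        return data[i + j * rows];
    }
};
```

With a 2 x 3 matrix stored as {1, 2, 3, 4, 5, 6}, at(0, 1) yields 3 and at(1, 2) yields 6, because consecutive elements in memory walk down a column rather than along a row.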

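If your users are actually supposed to enter data in Fortran format, or if you are quick at adapting or writing Fortran code to do this, I would do as John Zwinck and M.S.B. suggest. Just write a short Fortran routine that reads the data into an array, and use "bind(c)" and the ISO_C_BINDING types to set up the interface. Remember that array indexing changes between Fortran and C++. Otherwise, I would recommend using scanf, as above: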

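If you don't know the number of items you need to read per line, you might be able to use vscanf:

However, although it looks convenient, I have never used it, so YMMV.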
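As a rough sketch of what that could look like, the wrapper below forwards a caller-supplied format and a variable number of output pointers to std::vsscanf, the string-reading sibling of vscanf (available since C++11). The name scan_line is invented for this example, and the caller still has to pass one pointer per conversion in the format.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical wrapper: forwards the format and the trailing pointer
// arguments to std::vsscanf and returns the number of fields converted.
int scan_line(const std::string& line, const char* fmt, ...) {
    std::va_list args;
    va_start(args, fmt);
    int converted = std::vsscanf(line.c_str(), fmt, args);
    va_end(args);
    return converted;
}
```

The return value tells the caller how many items were actually filled in, which is the piece of information needed when the item count per line is not known in advance.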

Given what you want to do, you should note that you have immediately entered the realm of parsers.

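In addition to the other ways of parsing such input that others have mentioned here:

- Using Fortran and C/C++ bindings to do the parsing for you.
- Using pure C++ to do the parsing for you, with a combination of:
  - sscanf
  - streams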

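My suggestion is that, if Boost is available to you, you can use a combination of regexes and STL containers to implement a simple parser that works on the fly.

From what you have described, and from what is shown in various places, you can build a naive implementation of the grammar you wish to support using regex captures:

(\\d{0,8})([[:alpha:]])(\\d{0,8})

where:

- the first group is the number of occurrences of that variable type,
- the second is the type of the variable,
- the third is the length of the variable type.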
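To make the three capture groups concrete, here is a minimal sketch that applies the same pattern to a single token such as "3I2". It uses std::regex from C++11 rather than Boost, purely so the example has no external dependency. The Descriptor struct and parse_descriptor function are names made up for this illustration, and treating an omitted number as 1 is an assumption of the sketch.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical record for one edit descriptor such as "3I2" or "3X".
struct Descriptor {
    std::size_t count;  // repeat count (group 1); 1 when omitted
    char type;          // descriptor letter (group 2), e.g. 'I', 'X', 'A'
    std::size_t width;  // field width (group 3); 1 when omitted
};

// Returns true and fills `out` if the whole token matches the grammar.
bool parse_descriptor(const std::string& token, Descriptor& out) {
    static const std::regex re("(\\d{0,8})([[:alpha:]])(\\d{0,8})");
    std::smatch m;
    if (!std::regex_match(token, m, re)) {
        return false;
    }
    out.count = m[1].length() ? std::stoul(m[1].str()) : 1;
    out.type = m[2].str()[0];
    out.width = m[3].length() ? std::stoul(m[3].str()) : 1;
    return true;
}
```

For "3I2" this gives count 3, type 'I', width 2; for "3X" it gives count 3, type 'X', width 1, so three columns are skipped in total.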
Using Boost, you can implement a simple solution as follows:

#include <iostream>
#include <string>
#include <vector>
#include <fstream>
#include <cstdlib>
#include <boost/regex.hpp>
#include <boost/tokenizer.hpp>
#include <boost/algorithm/string.hpp>
#include <boost/lexical_cast.hpp>

//A POD Data Structure used for storing Fortran Format Tokens into their relative forms
typedef struct FortranFormatSpecifier {
    char type;//the type of the variable
    size_t number;//the number of times the variable is repeated
    size_t length;//the length of the variable type
} FFlag;

//This class implements a rudimentary parser to parse Fortran Format
//Specifier Flags using Boost regexes.
class FormatParser {
public:
    //typedefs for further use with the class and class methods
    typedef boost::tokenizer<boost::char_separator<char> > bst_tokenizer;
    typedef std::vector<std::vector<std::string> > vvstr;
    typedef std::vector<std::string> vstr;
    typedef std::vector<std::vector<int> > vvint;
    typedef std::vector<int> vint;

    FormatParser();
    FormatParser(const std::string& fmt, const std::string& fname);

    void parse();
    void printIntData();
    void printCharData();

private:
    bool validateFmtString();
    size_t determineOccurence(const std::string& numStr);
    FFlag setFortranFmtArgs(const boost::smatch& matches);
    void parseAndStore(const std::string& line);
    void storeData();

    std::string mFmtStr;                //this holds the format string
    std::string mFilename;              //the name of the file

    FFlag mFmt;                         //a temporary FFlag variable
    std::vector<FFlag> mFortranVars;    //this holds all the flags and details of them
    std::vector<std::string> mRawData;  //this holds the raw tokens

    //this is where you will hold all the types of data you wish to support
    vvint mIntData;                     //this holds all the int data
    vvstr mCharData;                    //this holds all the character data (stored as strings for convenience)
};

FormatParser::FormatParser() : mFmtStr(), mFilename(), mFmt(), mFortranVars(), mRawData(), mIntData(), mCharData() {}

FormatParser::FormatParser(const std::string& fmt, const std::string& fname) : mFmtStr(fmt), mFilename(fname), mFmt(), mFortranVars(), mRawData(), mIntData(), mCharData() {}

//this function determines the number of times that a variable occurs
//by parsing a numeric string and returning the associated output
//based on the grammar
size_t FormatParser::determineOccurence(const std::string& numStr) {
    size_t num = 0;
    //this case means that no number was supplied in front of the type
    if (numStr.empty()) {
        num = 1;//hence, the default is 1
    } else {
        //attempt to parse the numeric string and find its equivalent
        //integer value (since all occurrences are whole numbers)
        size_t n = atoi(numStr.c_str());

        //this case covers if the numeric string is explicitly 0
        //hence, logically, it doesn't occur, set the value accordingly
        if (n == 0) {
            num = 0;
        } else {
            //set the value to its converted representation
            num = n;
        }
    }
    return num;
}

//from the boost::smatches, determine the set flags, store them
//and return it
FFlag FormatParser::setFortranFmtArgs(const boost::smatch& matches) {
    FFlag ffs = {0};

    std::string fmt_number, fmt_type, fmt_length;

    fmt_number = matches[1];
    fmt_type = matches[2];
    fmt_length = matches[3];

    ffs.type = fmt_type.c_str()[0];

    ffs.number = determineOccurence(fmt_number);
    ffs.length = determineOccurence(fmt_length);

    return ffs;
}

//since the format string is CSV, split the string into tokens
//and then, validate the tokens by attempting to match them
//to the grammar (implemented as a simple regex). If the number of
//validations match, everything went well: return true. Otherwise:
//return false.
bool FormatParser::validateFmtString() {
    boost::char_separator<char> sep(",");
    bst_tokenizer tokens(mFmtStr, sep);
    mFmt = FFlag();

    size_t n_tokens = 0;
    std::string token;

    for (bst_tokenizer::const_iterator it = tokens.begin(); it != tokens.end(); ++it) {
        token = *it;
        boost::trim(token);

        //this "grammar" is based on the Fortran Format Flag Specification
        std::string rgx = "(\\d{0,8})([[:alpha:]])(\\d{0,8})";
        boost::regex re(rgx);
        boost::smatch matches;

        if (boost::regex_match(token, matches, re, boost::match_extra)) {
            mFmt = setFortranFmtArgs(matches);
            mFortranVars.push_back(mFmt);
        }
        ++n_tokens;
    }

    return mFortranVars.size() == n_tokens;
}

//Now, parse each input line from a file and try to parse and store
//those variables into their associated containers.
void FormatParser::parseAndStore(const std::string& line) {
    size_t offset = 0;
    int integer = 0;
    std::string varData;
    std::vector<int> intData;
    std::vector<std::string> charData;

    for (std::vector<FFlag>::const_iterator begin = mFortranVars.begin(); begin != mFortranVars.end(); ++begin) {
        mFmt = *begin;

        for (size_t i = 0; i < mFmt.number; offset += mFmt.length, ++i) {
            //a line shorter than the format would make substr throw
            //std::out_of_range, so stop reading this field instead
            if (offset > line.size()) {
                break;
            }
            varData = line.substr(offset, mFmt.length);

            //now store the data, based on type:
            switch (mFmt.type) {
                case 'X':
                    break;

                case 'A':
                    charData.push_back(varData);
                    break;

                case 'I':
                    integer = atoi(varData.c_str());
                    intData.push_back(integer);
                    break;

                default:
                    std::cerr << "Invalid type!\n";
            }
        }
    }
    mIntData.push_back(intData);
    mCharData.push_back(charData);
}

//Open the input file, and attempt to parse the input file line-by-line.
void FormatParser::storeData() {
    mFmt = FFlag();
    std::ifstream ifile(mFilename.c_str(), std::ios::in);
    std::string line;

    if (ifile.is_open()) {
        while (std::getline(ifile, line)) {
            parseAndStore(line);
        }
    } else {
        std::cerr << "Error opening input file!\n";
        exit(3);
    }
}

//If character flags are set, this function will print the character data
//found, line-by-line
void FormatParser::printCharData() {
    vvstr::const_iterator it = mCharData.begin();
    vstr::const_iterator jt;
    size_t linenum = 1;

    std::cout << "\nCHARACTER DATA:\n";

    for (; it != mCharData.end(); ++it) {
        std::cout << "LINE " << linenum << " : ";
        for (jt = it->begin(); jt != it->end(); ++jt) {
            std::cout << *jt << " ";
        }
        ++linenum;
        std::cout << "\n";
    }
}

//If integer flags are set, this function will print all the integer data
//found, line-by-line
void FormatParser::printIntData() {
    vvint::const_iterator it = mIntData.begin();
    vint::const_iterator jt;
    size_t linenum = 1;

    std::cout << "\nINT DATA:\n";

    for (; it != mIntData.end(); ++it) {
        std::cout << "LINE " << linenum << " : ";
        for (jt = it->begin(); jt != it->end(); ++jt) {
            std::cout << *jt << " ";
        }
        ++linenum;
        std::cout << "\n";
    }
}

//Attempt to parse the input file, by first validating the format string
//and then, storing the data accordingly
void FormatParser::parse() {
    if (!validateFmtString()) {
        std::cerr << "Error parsing the input format string!\n";
        exit(2);
    } else {
        storeData();
    }
}

int main(int argc, char **argv) {
    if (argc != 3) {
        std::cerr << "Usage: " << argv[0] << "\t<Fortran Format Specifier(s)>\t<Filename>\n";
        exit(1);
    } else {
        //parse and print stuff here
        FormatParser parser(argv[1], argv[2]);
        parser.parse();

        //print the data parsed (if any)
        parser.printIntData();
        parser.printCharData();
    }
    return 0;
}

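Bonus
This rudimentary parser also works on characters (Fortran format flag 'A', for up to 8 characters). You can extend it to support whatever flags you like by editing the regex and checking the length of the captured strings in tandem with the type.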
Possible improvements
If C++11 is available to you, you can use lambdas in some places and substitute auto for the iterators.
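As a hedged illustration of that suggestion, the free function below mirrors what printIntData does, written with a range-based for loop, auto, and a lambda. The name print_int_data and the choice to take an output stream parameter are assumptions made for this sketch; they are not part of the class above.

```cpp
#include <algorithm>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical C++11 variant of the printing loop: writes one
// "LINE n : ..." row per parsed input line to `os`.
void print_int_data(const std::vector<std::vector<int>>& rows, std::ostream& os) {
    std::size_t linenum = 1;
    for (const auto& row : rows) {  // auto replaces the const_iterator
        os << "LINE " << linenum++ << " : ";
        // the lambda replaces the hand-written inner loop body
        std::for_each(row.begin(), row.end(),
                      [&os](int value) { os << value << " "; });
        os << "\n";
    }
}
```

Passing std::cout as the stream reproduces the behaviour of the member function, while passing a std::ostringstream makes the output easy to inspect.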
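If this is running in a limited memory space and you have to parse a large file, vectors may fail to allocate because of the way they manage memory internally (contiguous storage that is reallocated as it grows). It would be better to use deques. For more on that, see the discussion here: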

Also, if the input file is large and file I/O is a bottleneck, you can improve performance by changing the size of the ifstream buffer:
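One way this might be done is through pubsetbuf on the stream's underlying buffer. The sketch below is only an illustration: count_lines_buffered is a made-up helper, the effect of pubsetbuf on a file buffer is implementation-defined, and on common implementations the call has to happen before the file is opened to have any effect.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: counts the lines in `path` while asking the stream
// to use a caller-sized buffer. Returns 0 if the file cannot be opened.
std::size_t count_lines_buffered(const std::string& path, std::size_t buffer_bytes) {
    // Declared before the stream so that it outlives it.
    std::vector<char> buffer(buffer_bytes);
    std::ifstream ifile;
    // Request the custom buffer before opening the file.
    ifile.rdbuf()->pubsetbuf(buffer.data(),
                             static_cast<std::streamsize>(buffer.size()));
    ifile.open(path.c_str());

    std::size_t lines = 0;
    std::string line;
    while (std::getline(ifile, line)) {
        ++lines;
    }
    return lines;
}
```

Whether a larger buffer helps at all depends on the platform and the file, so it is worth measuring before and after.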

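Discussion
You will notice that the types you are parsing must be known at runtime, and any associated storage containers must be supported in the class declaration and definition.
As you can imagine, supporting all types in one main class is not efficient. However, since this is a naive solution, an improved full solution can be specialized to support these cases.
Another suggestion is to use Boost.Spirit. However, since Spirit uses a lot of templates, debugging such an application is not for the faint of heart when errors can and do occur.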
Performance
Compared with Jonathan Dursi's solution, this solution is slower.
For 10,000,000 lines of randomly generated output (a 124 MiB file) using the same line format ("3I2, 3X, I3"):

This solution:

0m13.082s 0m13.107s 0m12.793s 0m12.851s 0m12.801s 0m12.968s 0m12.952s 0m12.886s 0m13.138s 0m12.882s

Average wall time: 12.946s

Jonathan Dursi's solution:

0m4.698s 0m4.650s 0m4.690s 0m4.675s 0m4.682s 0m4.681s 0m4.698s 0m4.675s 0m4.695s 0m4.696s

Average wall time: 4.684s

His is roughly 2.76 times as fast as mine (12.946s versus 4.684s), with both using O2. However, since you don't have to modify the source code every time you want to parse an additional format flag, this solution is preferable in that respect.

The compile command given for the parser is:

g++ -Wall -std=c++98 -pedantic fortran_format_parser.cpp -lboost_regex

The test file generation code:

#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <fstream>

using namespace std;

int main(int argc, char **argv) {
    srand(time(NULL));
    if (argc != 2) {
        printf("Invalid usage! Use as follows:\t<Program>\t<Output Filename>\n");
        exit(1);
    }
    ofstream ofile(argv[1], ios::out);
    if (ofile.is_open()) {
        // Each line: three 2-digit integers, "---" as filler, one 3-digit integer
        for (int i = 0; i < 10000000; ++i) {
            ofile << (rand() % (99-10+1) + 10) << (rand() % (99-10+1) + 10) << (rand() % (99-10+1)+10) << "---" << (rand() % (999-100+1) + 100) << endl;
        }
    }
    ofile.close();
    return 0;
}

Note: you could also implement something involving

cin >> f77format("3I2, 3X, I3") >> a >> b >> c >> d;
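
The thread only shows the call syntax for f77format, so the following is a sketch of one way such an interface could be built, not the implementation the poster had in mind. Every class and member here (F77Field, f77format, f77reader) is an assumed design; it handles only the I and X descriptors, reads one line per format, and extracts fields by column position.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Assumed design: one entry per field after repeat counts are expanded.
struct F77Field {
    char type;          // 'I' (integer) or 'X' (skip)
    std::size_t width;  // number of columns; 1 when omitted
};

class f77format {
public:
    explicit f77format(const std::string& fmt) {
        std::istringstream tokens(fmt);
        std::string tok;
        while (std::getline(tokens, tok, ',')) {
            std::size_t p = 0;
            while (p < tok.size() && std::isspace(static_cast<unsigned char>(tok[p]))) ++p;
            std::size_t count = readNumber(tok, p);
            if (p >= tok.size()) continue;  // no descriptor letter: ignore token
            char type = static_cast<char>(
                std::toupper(static_cast<unsigned char>(tok[p++])));
            F77Field field = {type, readNumber(tok, p)};
            fields_.insert(fields_.end(), count, field);
        }
    }
    const std::vector<F77Field>& fields() const { return fields_; }

private:
    // Reads an optional run of digits starting at p; returns 1 if none.
    static std::size_t readNumber(const std::string& s, std::size_t& p) {
        if (p >= s.size() || !std::isdigit(static_cast<unsigned char>(s[p]))) return 1;
        std::size_t n = 0;
        while (p < s.size() && std::isdigit(static_cast<unsigned char>(s[p])))
            n = n * 10 + static_cast<std::size_t>(s[p++] - '0');
        return n;
    }
    std::vector<F77Field> fields_;
};

// Proxy returned by `stream >> f77format(...)`: it reads one line and then
// hands out the integer fields in order, skipping X fields.
class f77reader {
public:
    f77reader(std::istream& is, const f77format& fmt)
        : is_(is), fields_(fmt.fields()), next_(0), offset_(0) {
        std::getline(is_, line_);
    }
    f77reader& operator>>(int& value) {
        while (next_ < fields_.size() && fields_[next_].type == 'X') {
            offset_ += fields_[next_].width;
            ++next_;
        }
        if (next_ >= fields_.size() || fields_[next_].type != 'I' ||
            offset_ >= line_.size()) {
            is_.setstate(std::ios_base::failbit);  // nothing left to extract
            return *this;
        }
        value = std::atoi(line_.substr(offset_, fields_[next_].width).c_str());
        offset_ += fields_[next_].width;
        ++next_;
        return *this;
    }

private:
    std::istream& is_;
    std::vector<F77Field> fields_;
    std::string line_;
    std::size_t next_;
    std::size_t offset_;
};

inline f77reader operator>>(std::istream& is, const f77format& fmt) {
    return f77reader(is, fmt);
}
```

With these definitions, the expression from the note compiles as written for int variables a, b, c and d, and a failed extraction sets failbit on the underlying stream so the caller can test it afterwards. Because fields are cut out by column, a blank-padded value such as " 5" in an I2 field is read as 5 without disturbing its neighbours.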