C++: Efficient parsing of Python array strings

python c++ arrays regex

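I have a file that was created from a series of Python arrays, and I am loading it through an ifstream. The file is plain text and contains nothing but the arrays, in the following form: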

[[1 22 333 ... 9
  2 2 2    ... 2]
 ...    
 [5 6 2 ... 222
  5 5 5 ... 240]]

[[2 3 444 ... 9]
 ...    
 [5 6 2 ... 222
  5 5 5 ... 240]]

[[ etc...

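Each row of each array begins with [ and ends with ], but a row may be split across several lines in the file (i.e. there can be carriage returns or line feeds between the opening [ and the closing ]). Each whole array is also enclosed in brackets [].

The numbers will always be integers. The number of entries in each row (i.e. the number of columns) is the same for every row of a given array, but may differ between arrays. The number of rows in an array is not known in advance and may also differ between arrays. The total number of arrays in each file is likewise unknown before the file is opened.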

The arrays can be stored in any format. For this example, let's put them in a vector of vectors of vectors, i.e.

typedef vector<vector<int>> myArray;  //Index [row][col]
typedef vector<myArray> myArrays;

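I would like to parse this efficiently (the files may be very large, and there will most likely be many of them). My boss is very keen on using std::regex to parse it, and I am happy with that as long as it is efficient.

So my questions are: how can I parse this efficiently with regex? And is there a way to parse it more efficiently without regex?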

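std::from_chars is efficient because it parses a portion of a string and tells you exactly where parsing stopped, so you can carry on from that position without extracting substrings. In addition, the documentation says:

"Unlike other parsing functions in C++ and C libraries, std::from_chars is locale-independent, non-allocating, and non-throwing. Only a small subset of parsing policies used by other libraries (such as std::sscanf) is provided. This is intended to allow the fastest possible implementation that is useful in common high-throughput contexts such as text-based interchange (JSON or XML)."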
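To make the "tells you where parsing stopped" point concrete, here is a minimal sketch, separate from the full parser. The helper name extract_ints is hypothetical, and it simply skips anything that is not an integer instead of validating brackets.

```cpp
#include <charconv>
#include <string_view>
#include <system_error>
#include <vector>

// Hypothetical helper (not part of the parser in this answer):
// collects every integer found in `text` and skips everything else.
std::vector<int> extract_ints(std::string_view text)
{
  std::vector<int> values;
  const char *first = text.data();
  const char *const last = first + text.size();
  while (first != last)
  {
    int value = 0;
    // ptr points one past the last character consumed by the conversion
    const auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec == std::errc{})
    {
      values.push_back(value);
      first = ptr; // resume exactly where parsing stopped, no substring
    }
    else if (ptr != first)
    {
      first = ptr; // digits were read but the value does not fit in an int
    }
    else
    {
      ++first; // no integer starts here (bracket, space, ...): skip it
    }
  }
  return values;
}
```

Calling extract_ints on the text of one array yields its numbers in file order. The only state carried between conversions is the first pointer, which is what lets the parser walk a line without copying any part of it.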

Here is an attempt at parsing the data:

/**
  g++ -std=c++17 -o prog_cpp prog_cpp.cpp \
      -pedantic -Wall -Wextra -Wconversion -Wno-sign-conversion \
      -g -O0 -UNDEBUG -fsanitize=address,undefined
**/

#include <iostream>
#include <sstream>
#include <charconv>
#include <cctype>
#include <string>
#include <vector>
#include <stdexcept>

using MyRow = std::vector<int>;
using MyArray = std::vector<MyRow>;

std::vector<MyArray>
parse_arrays(std::istream &input_stream)
{
  auto arrays=std::vector<MyArray>{};
  auto line=std::string{};
  for(auto depth=0, line_count=1;
      std::getline(input_stream, line);
      ++line_count)
  {
    for(const auto *first=data(line), *last=first+size(line);
        first!=last;)
    {
      // try first to consume all well known characters
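      // (the cast matters: std::isspace is undefined for negative char values)
      for(auto c=*first;
          std::isspace(static_cast<unsigned char>(c))||(c=='[')||(c==']');
          c=*(++first))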
      {
        switch(c)
        {
          case '[': // opening a row or an array
          {
            switch(++depth)
            {
              case 1:
              {
                arrays.emplace_back(MyArray{});
                break;
              }
              case 2:
              {
                arrays.back().emplace_back(MyRow{});
                break;
              }
              default:
              {
                const auto pfx="line "+std::to_string(line_count);
                throw std::runtime_error{pfx+": too deep"};
              }
            }
            break;
          }
          case ']': // closing a row or an array
          {
            switch(--depth)
            {
              case 0:
              {
                // nothing more to be done
                break;
              }
              case 1:
              {
                const auto &a=arrays.back();
                const auto sz=size(a);
                if((sz>1)&&(size(a[sz-1])!=size(a[sz-2])))
                {
                  const auto pfx="line "+std::to_string(line_count);
                  throw std::runtime_error{pfx+": row length mismatch"};
                }
                break;
              }
              default:
              {
                const auto pfx="line "+std::to_string(line_count);
                throw std::runtime_error{pfx+": ] mismatch"};
              }
            }
            break;
          }
          default: // a separator
          {
            // nothing more to be done
          }
        }
      }
      // the other characters probably represent an integer
      auto value=int{};
      if(auto [p, ec]=std::from_chars(first, last, value); ec==std::errc())
      {
        if(depth!=2)
        {
          const auto pfx="line "+std::to_string(line_count);
          throw std::runtime_error{pfx+": depth mismatch"};
        }
        arrays.back().back().emplace_back(value);
        first=p;
      }
      else
      {
        if(p!=first)
        {
          const auto pfx="line "+std::to_string(line_count);
          throw std::runtime_error{pfx+": integer out of range"};
        }
        else if(first!=last)
        {
          const auto pfx="line "+std::to_string(line_count);
          throw std::runtime_error{pfx+": unexpected char <"+*first+'>'};
        }
      }
    }
  }
  return arrays;
}

int
main()
{
  auto input=std::istringstream{R"(
[[1 22 333  9
  2 2 2     2]
     
 [5 6 2  222
  5 5 5  240]]

[[2 3 444  9]
     
 [5 6 2  222]]
)"};
  const auto arrays=parse_arrays(input);
  for(const auto &a: arrays)
  {
    for(const auto &r: a)
    {
      for(const auto &c: r)
      {
        std::cout << c << ' ';
      }
      std::cout << '\n';
    }
    std::cout << "~~~~~~~~~~~~~~~~\n";
  }
  return 0;
}

/**
1 22 333 9 2 2 2 2 
5 6 2 222 5 5 5 240 
~~~~~~~~~~~~~~~~
2 3 444 9 
5 6 2 222 
~~~~~~~~~~~~~~~~
**/
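
Since the question asks specifically about std::regex, here is a hedged sketch of what a regex-based counterpart could look like, for comparison only. The function name parse_arrays_regex and the token pattern are assumptions of this sketch. It expects the whole file to be in one std::string already, and it does not check row lengths.

```cpp
#include <regex>
#include <stdexcept>
#include <string>
#include <vector>

using Row = std::vector<int>;
using Array = std::vector<Row>;

// Hypothetical regex-based variant: tokenizes the text into '[', ']'
// and optionally signed integers, tracking the bracket depth.
std::vector<Array> parse_arrays_regex(const std::string &text)
{
  static const std::regex token(R"(\[|\]|-?[0-9]+)",
                                std::regex::ECMAScript | std::regex::optimize);
  std::vector<Array> arrays;
  int depth = 0;
  for (auto it = std::sregex_iterator(text.begin(), text.end(), token);
       it != std::sregex_iterator(); ++it)
  {
    const std::string tok = it->str(); // note: allocates one string per token
    if (tok == "[")
    {
      ++depth;
      if (depth == 1) arrays.emplace_back();             // new array
      else if (depth == 2) arrays.back().emplace_back(); // new row
      else throw std::runtime_error{"too deep"};
    }
    else if (tok == "]")
    {
      if (--depth < 0) throw std::runtime_error{"] mismatch"};
    }
    else
    {
      if (depth != 2) throw std::runtime_error{"depth mismatch"};
      // std::stoi throws std::out_of_range if the value does not fit
      arrays.back().back().push_back(std::stoi(tok));
    }
  }
  if (depth != 0) throw std::runtime_error{"unbalanced brackets"};
  return arrays;
}
```

Two differences from the std::from_chars version are worth keeping in mind. The regex iterator silently skips characters that match no token, so stray text is not reported as an error. Every token also goes through the regex engine and a temporary std::string, so it seems reasonable to expect this variant to be slower, but that should be measured on real files rather than assumed.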

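If you have access to the Python source, the easiest approach would be to change the output on the Python side to use a structured format that is well supported in C++.

That is why I asked this question. It was my boss's idea, and I am trying to evaluate it. There is no option to change the files or the way they are written out from Python.

I really don't understand the downvotes on this post. How could I have written the question better, or what details could I add?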