Reduction (sum) along an arbitrary axis of a multidimensional array in C++

c++ algorithm

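I would like to perform a sum reduction along an arbitrary axis of a multidimensional matrix that may have an arbitrary number of dimensions (e.g. axis 5 of a 10-dimensional array). The matrix is stored in row-major format, i.e. as a std::vector together with the strides along each axis.

I know how to perform this reduction using nested loops (see the example below), but doing so hard-codes both the axis (the reduction below is along axis 1) and the number of dimensions (4 below). How can I generalize this without using nested loops?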

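#include <iostream>
#include <vector>

int main()
{
  // shape, strides & data of the matrix
  size_t shape  [] = { 2, 3, 4, 5};
  size_t strides[] = {60, 20, 5, 1};
  std::vector<double> data(2 * 3 * 4 * 5);
  for (size_t i = 0 ; i < data.size() ; ++i) data[i] = 1;
  // ...
}

I think this should work: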
#include <iostream>
#include <vector>

int main()
{
  // shape, stride & data of the matrix
  size_t shape  [] = {  2, 3, 4, 5};
  size_t strides[] = {60, 20, 5, 1};
  std::vector<double> data(2 * 3 * 4 * 5);

  size_t rshape  [] = { 2, 4, 5};
  // row-major strides of the reduced shape {2, 4, 5}
  size_t rstrides[] = {20, 5, 1};
  std::vector<double> rdata(2 * 4 * 5, 0.0);

  const unsigned int NDIM = 4;
  unsigned int axis = 1;

  for (size_t i = 0 ; i < data.size() ; ++i) data[i] = 1;

  // How many elements to advance after each reduction
  size_t step_axis = strides[NDIM - 1];
  if (axis == NDIM - 1)
  {
      step_axis = strides[NDIM - 2];
  }
  // Position of the first element of the current reduction
  size_t offset_base = 0;
  size_t offset = 0;
  size_t s = 0;
  for (auto &v : rdata)
  {
      // Current reduced element
      size_t offset_i = offset;
      for (unsigned int i = 0; i < shape[axis]; i++)
      {
          // Reduce
          v += *(data.data() + offset_i);
          // Advance to next element
          offset_i += strides[axis];
      }
      s = (s + 1) % strides[axis];
      if (s == 0)
      {
          // Guard: for axis == 0 there is no previous axis to step along
          if (axis > 0) offset_base += strides[axis - 1];
          offset = offset_base;
      }
      else
      {
          offset += step_axis;
      }
  }

  // Print
  for ( size_t a = 0 ; a < rshape[0] ; ++a )
    for ( size_t b = 0 ; b < rshape[1] ; ++b )
      for ( size_t c = 0 ; c < rshape[2] ; ++c )
        std::cout << "(" << a << "," << b << "," << c << ") " << \
        rdata[ a*rstrides[0] + b*rstrides[1] + c*rstrides[2] ] << std::endl;

  return 0;
}

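Setting axis = 3 produces the output below (every sum is 5, since shape[3] = 5). Note that rshape, rstrides and the size of rdata are hard-coded for axis = 1 in the listing above and need to be updated for any other axis.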
(0,0,0) 5
(0,0,1) 5
(0,0,2) 5
(0,0,3) 5
(0,0,4) 5
(0,1,0) 5
(0,1,1) 5
(0,1,2) 5
(0,1,3) 5
(0,1,4) 5
(0,2,0) 5
(0,2,1) 5
(0,2,2) 5
(0,2,3) 5
// ...
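
As a point of comparison, here is a minimal sketch of a different, commonly used formulation, not the method of the answer above. It assumes a contiguous row-major array and splits the shape into an "outer" block (all axes before the reduced axis), the reduced axis itself, and an "inner" block (all axes after it). The loop depth is then always three, whatever the number of dimensions. The function name reduce_sum_outer_inner is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original thread): sum-reduce a
// contiguous row-major array along `axis`.
std::vector<double> reduce_sum_outer_inner(const std::vector<double>& data,
                                           const std::vector<std::size_t>& shape,
                                           std::size_t axis)
{
    assert(axis < shape.size());

    // Product of the extents before and after the reduced axis.
    std::size_t outer = 1, inner = 1;
    for (std::size_t d = 0; d < axis; ++d) outer *= shape[d];
    for (std::size_t d = axis + 1; d < shape.size(); ++d) inner *= shape[d];
    const std::size_t n = shape[axis];
    assert(data.size() == outer * n * inner);

    std::vector<double> out(outer * inner, 0.0);
    for (std::size_t o = 0; o < outer; ++o)
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t in = 0; in < inner; ++in)
                // flat input index: (o * n + j) * inner + in
                out[o * inner + in] += data[(o * n + j) * inner + in];
    return out;
}
```

The innermost loop walks both arrays contiguously, which is the reason this layout is often preferred when the reduced axis is not the last one.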

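I don't know how efficient this code is, but in my opinion it is certainly precise.

What is happening?

A little about adjusted_strides:

For axis_count = 4, adjusted_strides has size 5, where: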
 adjusted_strides[0] = shape[0]*shape[1]*shape[2]*shape[3];
 adjusted_strides[1] = shape[1]*shape[2]*shape[3];
 adjusted_strides[2] = shape[2]*shape[3];
 adjusted_strides[3] = shape[3];
 adjusted_strides[4] = 1;
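
The same table can be built for any number of axes as a running product taken from the last axis backwards. The following is a small sketch under the assumption of a contiguous row-major layout with positive extents; the helper name make_adjusted_strides is hypothetical and does not appear in the original answer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: adjusted[d] is the product of shape[d..end],
// and the extra last entry is 1.
std::vector<std::size_t> make_adjusted_strides(const std::vector<std::size_t>& shape)
{
    // Assumption: every extent is positive, so no entry becomes 0.
    for (std::size_t n : shape) assert(n > 0);

    std::vector<std::size_t> adjusted(shape.size() + 1, 1);
    // Walk from the last axis to the first, accumulating the product.
    for (std::size_t d = shape.size(); d-- > 0; )
        adjusted[d] = adjusted[d + 1] * shape[d];
    return adjusted;
}
```

For shape {2, 3, 4, 5} this yields {120, 60, 20, 5, 1}: the total element count followed by the ordinary row-major strides.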

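Let's take an example where the number of dimensions is 4 and the multidimensional array (A) has shape n0, n1, n2, n3.

When we need to convert this array into another multidimensional array (B) of shape n0, n2, n3 (compressing axis = 1, 0-based), we proceed as follows:

For each index of A, we try to find its position in B. Suppose A[i][j][k][l] is any element of A. Its position in flat_A is flat_A[i*n1*n2*n3 + j*n2*n3 + k*n3 + l]:

idx = i*n1*n2*n3 + j*n2*n3 + k*n3 + l;

In the compressed array B, this element becomes part of (is added to) B[i][k][l]. In flat_B, the index is:

new_idx = i*n2*n3 + k*n3 + l;

How do we form new_idx from idx?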
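All axes before the compressed axis carry the extent of the compressed axis as a factor of their term. In our example we remove axis 1, so every axis before axis 1 (here only one: axis 0, indexed by i) has n1 as a factor of its term (i*n1*n2*n3).

All axes after the compressed axis are unaffected.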
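So, in the end, we need to do two things:

i. Isolate the indices of the axes before the axis to be compressed and remove that axis's extent, using integer division:

   idx / (n1*n2*n3)   (== idx / adjusted_strides[1])

   This leaves just i, which is rescaled to the new shape (multiplied by n2*n3), giving:

   i*n2*n3   (== i * adjusted_strides[2])

ii. Isolate the axes after the compressed axis, which are unaffected by its extent:

   idx % (n2*n3)   (== idx % adjusted_strides[2])

   This gives k*n3 + l.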
Adding the results of steps i. and ii. gives:

computed_idx = i*n2*n3 + k*n3 + l;

which is the same as new_idx. So our transformation is correct :)
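
The identity can also be checked by brute force. The sketch below enumerates every (i, j, k, l) of a 4-dimensional array and compares the index computed with division and modulo against new_idx. The function name check_mapping is hypothetical; the nested loops are used here only to enumerate test cases, not as part of the reduction.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical check: verifies idx / (n1*n2*n3) * (n2*n3) + idx % (n2*n3)
// equals i*n2*n3 + k*n3 + l for every element, and returns how many
// elements were checked.
std::size_t check_mapping(std::size_t n0, std::size_t n1,
                          std::size_t n2, std::size_t n3)
{
    std::size_t checked = 0;
    for (std::size_t i = 0; i < n0; ++i)
        for (std::size_t j = 0; j < n1; ++j)
            for (std::size_t k = 0; k < n2; ++k)
                for (std::size_t l = 0; l < n3; ++l) {
                    const std::size_t idx = i*n1*n2*n3 + j*n2*n3 + k*n3 + l;
                    const std::size_t new_idx = i*n2*n3 + k*n3 + l;
                    // step i. (divide, then rescale) plus step ii. (modulo)
                    const std::size_t computed_idx =
                        idx / (n1*n2*n3) * (n2*n3) + idx % (n2*n3);
                    assert(computed_idx == new_idx);
                    ++checked;
                }
    return checked;
}
```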

Code:
(Note: ni refers to new_idx.)

  size_t cmp_axis = 1, axis_count = sizeof shape/ sizeof *shape;
  std::vector<size_t> adjusted_strides;
  //adjusted strides is basically same as strides
  //only difference being that the first element is the 
  //total number of elements in the n dim array.

  //The only reason to introduce this array was
  //so that I don't have to write any if-elses
  adjusted_strides.push_back(shape[0]*strides[0]);
  adjusted_strides.insert(adjusted_strides.end(), strides, strides + axis_count);
  for(size_t i = 0; i < data.size(); ++i) {
    size_t ni = i/adjusted_strides[cmp_axis]*adjusted_strides[cmp_axis+1] + i%adjusted_strides[cmp_axis+1];
    rdata[ni] += data[i];
  }
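
For reuse, the snippet above can be wrapped in a function that takes the axis as a parameter. This is a sketch only: it assumes a contiguous row-major array whose strides match its shape and whose extents are positive, and the name reduce_sum_flat is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical wrapper around the flat-index mapping shown above.
std::vector<double> reduce_sum_flat(const std::vector<double>& data,
                                    const std::vector<std::size_t>& shape,
                                    const std::vector<std::size_t>& strides,
                                    std::size_t cmp_axis)
{
    assert(!shape.empty() && shape.size() == strides.size());
    assert(cmp_axis < shape.size() && shape[cmp_axis] > 0);

    // adjusted_strides = {total element count, strides...}
    std::vector<std::size_t> adjusted;
    adjusted.push_back(shape[0] * strides[0]);
    adjusted.insert(adjusted.end(), strides.begin(), strides.end());
    assert(data.size() == adjusted[0]);

    const std::size_t before = adjusted[cmp_axis];     // span of one block that contains the axis
    const std::size_t after  = adjusted[cmp_axis + 1]; // span of everything after the axis

    std::vector<double> rdata(data.size() / shape[cmp_axis], 0.0);
    for (std::size_t i = 0; i < data.size(); ++i) {
        // ni is the flat index in the reduced array (new_idx in the text)
        const std::size_t ni = i / before * after + i % after;
        rdata[ni] += data[i];
    }
    return rdata;
}
```

Called with shape {2, 3, 4, 5}, strides {60, 20, 5, 1}, cmp_axis = 1 and all input values equal to 1, every entry of the 40-element result is 3.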

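What does the (de)compress counter look like?

@Useless I added pseudocode that shows what I mean by the (de)compress counter.

@TomdeGeus Why is the stride value 60 removed rather than 20? Is this a typo, or am I misunderstanding the algorithm?

Oh, so you don't want to do a linear scan of the source, converting the flat index to n-dimensional coordinates, right? But isn't that the most efficient?

@Darhuuk rstrides contains the strides of the reduced array. Since its shape is {2,4,5}, its strides should be {rshape[1]*rshape[2], rshape[2], 1} == {20,5,1}. I added some comments to emphasize this.

This is very clever, and exactly what I was looking for. I think I understand what is being done, but for future reference I do think this post would benefit from an explanation of ni.

@TomdeGeus Added the explanation. :)

I found a way to break it ;) However, I don't understand why it breaks…

@TomdeGeus Interestingly, there was a small error on your side (using MAX_DIM instead of m_ndim) and a more serious one on mine (the offset computation). I fixed the answer. Anyway, even though the complexity is equivalent (a single pass over the full input tensor), the other answer should be better cache-wise.

Thanks for the fix! Indeed, I had expected MAX_DIM and m_ndim to be equivalent in my off-site implementation, since I iterate all the way up to MAX_DIM.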
With axis = 1, every reduced entry is 3 (shape[1] = 3 and all input values are 1):
(0,0,0) 3
(0,0,1) 3
(0,0,2) 3
(0,0,3) 3
(0,0,4) 3
(0,1,0) 3
(0,1,1) 3
(0,1,2) 3
(0,1,3) 3
(0,1,4) 3
(0,2,0) 3
(0,2,1) 3
(0,2,2) 3
(0,2,3) 3
(0,2,4) 3
(0,3,0) 3
(0,3,1) 3
(0,3,2) 3
...