C++11: generating functions at compile time

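I have an image. Each pixel contains RGB intensity information. I want to sum the intensities of these channels, but I also want to choose which channels are included in the sum. A straightforward implementation looks like this:

int intensity(const unsigned char* pixel, bool red, bool green, bool blue){
    return 0 + (red ? pixel[0] : 0) + (green ? pixel[1] : 0) + (blue ? pixel[2] : 0);
}

Because I will call this function for every pixel in the image, I would like to get rid of all the conditionals if I can. So I suppose I need a separate function for each case: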

std::function<int(const unsigned char* pixel)> generateIntensityAccumulator(
    const bool& accumulateRChannel,
    const bool& accumulateGChannel,
    const bool& accumulateBChannel)
    {
    if (accumulateRChannel && accumulateGChannel && accumulateBChannel){
            return [](const unsigned char* pixel){
                return static_cast<int>(pixel[0]) + static_cast<int>(pixel[1]) + static_cast<int>(pixel[2]);
            };
        }

        if (!accumulateRChannel && accumulateGChannel && accumulateBChannel){
            return [](const unsigned char* pixel){
                return static_cast<int>(pixel[1]) + static_cast<int>(pixel[2]);
            };
        }

        if (!accumulateRChannel && !accumulateGChannel && accumulateBChannel){
            return [](const unsigned char* pixel){
                return static_cast<int>(pixel[2]);
            };
        }

        if (!accumulateRChannel && !accumulateGChannel && !accumulateBChannel){
            return [](const unsigned char* pixel){
                return 0;
            };
        }

        if (accumulateRChannel && !accumulateGChannel && !accumulateBChannel){
            return [](const unsigned char* pixel){
                return static_cast<int>(pixel[0]);
            };
        }

        if (!accumulateRChannel && accumulateGChannel && !accumulateBChannel){
            return [](const unsigned char* pixel){
                return static_cast<int>(pixel[1]);
            };
        }

        if (accumulateRChannel && !accumulateGChannel && accumulateBChannel){
            return [](const unsigned char* pixel){
                return static_cast<int>(pixel[0]) + static_cast<int>(pixel[2]);
            };
        }

        if (accumulateRChannel && accumulateGChannel && !accumulateBChannel){
            return [](const unsigned char* pixel){
                return static_cast<int>(pixel[0]) + static_cast<int>(pixel[1]);
            };
        }
    }

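But that is a lot of typing for such a simple task, and I feel there must be a better way: for example, letting the compiler do the dirty work for me and generate all of the cases above. Can someone point me in the right direction?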

Using std::function like this will cost you dearly, because you give the compiler no chance to optimize the call by inlining it.
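One way to keep the call inlinable is to pass the per-pixel operation as a template parameter instead of a std::function. The following is a minimal sketch, assuming tightly packed 3-byte RGB pixels; `sum_pixels` and `GreenOnly` are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical functor: returns only the green channel of one RGB pixel.
struct GreenOnly {
  int operator()(const unsigned char* pixel) const { return pixel[1]; }
};

// Op is a template parameter, so op(...) is a direct call to a function
// body the compiler can see, which leaves it free to inline the call.
// A std::function parameter would go through a type-erased indirect call.
template <typename Op>
long long sum_pixels(const unsigned char* data, std::size_t pixel_count, Op op) {
  assert(data != nullptr || pixel_count == 0);
  long long total = 0;
  for (std::size_t i = 0; i < pixel_count; ++i)
    total += op(data + 3 * i);  // assumption: 3 bytes per pixel
  return total;
}
```

A call such as `sum_pixels(buffer, n, GreenOnly{})` works, and so does passing a lambda, because each lambda has its own distinct type that `Op` is deduced to.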

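What you are trying to do is handled well by templates. Because you are working with integers and the enable flags are compile-time constants, the optimizer can simplify the expression itself, so there is no need to write a specialization for each version. Take a look at this example: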

#include <array>
#include <chrono>
#include <iostream>
#include <random>
#include <vector>

template <bool AccumulateR, bool AccumulateG, bool AccumulateB>
inline int accumulate(const unsigned char *pixel) {
  static constexpr int enableR = static_cast<int>(AccumulateR);
  static constexpr int enableG = static_cast<int>(AccumulateG);
  static constexpr int enableB = static_cast<int>(AccumulateB);
  return enableR * static_cast<int>(pixel[0]) +
         enableG * static_cast<int>(pixel[1]) +
         enableB * static_cast<int>(pixel[2]);
}

int main(void) {
  std::vector<std::array<unsigned char, 3>> pixels(
      1e7, std::array<unsigned char, 3>{0, 0, 0});

  // Fill up with randomness
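  std::random_device rd;
  // uniform_int_distribution is not specified for char types, so draw
  // ints in [0, 255] and narrow them explicitly.
  std::uniform_int_distribution<int> dist(0, 255);
  for (auto &pixel : pixels) {
    pixel[0] = static_cast<unsigned char>(dist(rd));
    pixel[1] = static_cast<unsigned char>(dist(rd));
    pixel[2] = static_cast<unsigned char>(dist(rd));
  }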

  // Measure perf
  using namespace std::chrono;

  auto t1 = high_resolution_clock::now();
  int sum1 = 0;
  for (auto const &pixel : pixels)
    sum1 += accumulate<true, true, true>(pixel.data());
  auto t2 = high_resolution_clock::now();
  int sum2 = 0;
  for (auto const &pixel : pixels)
    sum2 += accumulate<false, true, false>(pixel.data());
  auto t3 = high_resolution_clock::now();

  std::cout << "Sum 1 " << sum1 << " in "
            << duration_cast<milliseconds>(t2 - t1).count() << "ms\n";
  std::cout << "Sum 2 " << sum2 << " in "
            << duration_cast<milliseconds>(t3 - t2).count() << "ms\n";
}

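Note that the sum overflows here, so you may want to use something larger than int. A uint64_t would do. If you inspect the generated assembly, you will see that the two versions of the function are inlined and optimized differently.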
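To make the overflow remark concrete: with 10^7 pixels and at most 3 * 255 = 765 per pixel, the worst-case total is 7.65 * 10^9, which does not fit in a 32-bit int. A minimal sketch of accumulating into std::uint64_t follows; `channel_sum` and `total_intensity` are hypothetical names, and packed 3-byte RGB pixels are assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Per-pixel sum; R, G and B are compile-time constants, so the
// unselected channels can be removed by the optimizer.
template <bool R, bool G, bool B>
inline int channel_sum(const unsigned char* pixel) {
  return (R ? pixel[0] : 0) + (G ? pixel[1] : 0) + (B ? pixel[2] : 0);
}

// Accumulates into a 64-bit unsigned total so that large images
// cannot overflow the running sum.
template <bool R, bool G, bool B>
std::uint64_t total_intensity(const unsigned char* data, std::size_t pixel_count) {
  assert(data != nullptr || pixel_count == 0);
  std::uint64_t total = 0;
  for (std::size_t i = 0; i < pixel_count; ++i)
    total += static_cast<std::uint64_t>(channel_sum<R, G, B>(data + 3 * i));
  return total;
}
```

The per-pixel result still fits comfortably in an int (at most 765); only the running total needs the wider type.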

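First things first. Don't write a std::function that takes a single pixel; write one that takes a contiguous range of pixels (a scanline of pixels).
Second, you want to write a template version of intensity:
template<bool red, bool green, bool blue>
int intensity(const unsigned char* pixel){
  return (red ? pixel[0] : 0) + (green ? pixel[1] : 0) + (blue ? pixel[2] : 0);
}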

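We can now generate our scanline intensity calculator:
// Pointer to a function that sums a scanline of `pixels` pixels.
using scanline_fn = int(*)( const unsigned char* pel, std::size_t pixels );

scanline_fn scanline_intensity(bool red, bool green, bool blue) {
  // Index bit 0 selects red, bit 1 green, bit 2 blue.
  // Decimal indices are used because 0b literals are not standard C++11.
  static const scanline_fn table[] = {
    sum_intensity<0>, sum_intensity<1>,
              sum_intensity<2>, sum_intensity<3>,
    sum_intensity<4>, sum_intensity<5>,
              sum_intensity<6>, sum_intensity<7>,
  };
  std::size_t index = red + green*2 + blue*4;
  return table[index];
}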
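The pieces of this approach can be combined into one compilable unit. The sketch below is written for C++11 and makes the pixel stride an explicit template parameter; `pixel_intensity`, `scanline_sum`, `scanline_fn` and `select_scanline_sum` are hypothetical names chosen for this illustration, and the total is kept in a long long rather than an int.

```cpp
#include <cassert>
#include <cstddef>

// Per-pixel sum; the flags are compile-time constants.
template <bool Red, bool Green, bool Blue>
int pixel_intensity(const unsigned char* pixel) {
  return (Red ? pixel[0] : 0) + (Green ? pixel[1] : 0) + (Blue ? pixel[2] : 0);
}

// Sums one scanline. Bit 0 of Index selects red, bit 1 green, bit 2 blue.
// Stride is the number of bytes from one pixel to the next.
template <std::size_t Index, std::size_t Stride>
long long scanline_sum(const unsigned char* pixel, std::size_t count) {
  assert(pixel != nullptr || count == 0);
  long long value = 0;
  while (count--) {
    value += pixel_intensity<(Index & 1) != 0, (Index & 2) != 0,
                             (Index & 4) != 0>(pixel);
    pixel += Stride;
  }
  return value;
}

using scanline_fn = long long (*)(const unsigned char*, std::size_t);

// The runtime flags are inspected once here, not once per pixel.
template <std::size_t Stride>
scanline_fn select_scanline_sum(bool red, bool green, bool blue) {
  static const scanline_fn table[] = {
      scanline_sum<0, Stride>, scanline_sum<1, Stride>,
      scanline_sum<2, Stride>, scanline_sum<3, Stride>,
      scanline_sum<4, Stride>, scanline_sum<5, Stride>,
      scanline_sum<6, Stride>, scanline_sum<7, Stride>,
  };
  return table[(red ? 1 : 0) + (green ? 2 : 0) + (blue ? 4 : 0)];
}
```

A caller would pick the function once per image, for example `auto sum = select_scanline_sum<3>(true, false, true);`, and then invoke `sum(row, width)` for each scanline. For RGBA data the same code can be used with `select_scanline_sum<4>`.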

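Done.
These techniques can be made generic, but you don't need the generic versions here.
If your pixel stride is not 3 (say there is an alpha channel), sum_intensity needs to be passed it (ideally as a template parameter).

Have you actually tested the performance of the above? I'm surprised that moving simple boolean tests out of the loop would matter so much, since processors usually optimize branches by assuming "same result as last time"...

I admit I haven't. I just assumed that discarding the conditions would give better performance. I have read about branch prediction, and I suppose it would apply in my case. Thanks.

I mean, I could be wrong... I would consider running a performance test before doing anything complicated.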
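A performance test along the lines suggested in the comments could start from something like the sketch below. It is only a starting point under stated assumptions: `sum_runtime`, `sum_static` and `elapsed_ms` are hypothetical names, pixels are assumed to be packed 3-byte RGB, and a single timed run is far too noisy to draw conclusions from.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>

// Runtime-flag version: the flags are ordinary function arguments, so
// the tests sit inside the loop unless the optimizer hoists them.
std::uint64_t sum_runtime(const unsigned char* data, std::size_t count,
                          bool red, bool green, bool blue) {
  assert(data != nullptr || count == 0);
  std::uint64_t total = 0;
  for (std::size_t i = 0; i < count; ++i) {
    const unsigned char* p = data + 3 * i;
    total += (red ? p[0] : 0) + (green ? p[1] : 0) + (blue ? p[2] : 0);
  }
  return total;
}

// Compile-time version: the flags are template parameters.
template <bool Red, bool Green, bool Blue>
std::uint64_t sum_static(const unsigned char* data, std::size_t count) {
  assert(data != nullptr || count == 0);
  std::uint64_t total = 0;
  for (std::size_t i = 0; i < count; ++i) {
    const unsigned char* p = data + 3 * i;
    total += (Red ? p[0] : 0) + (Green ? p[1] : 0) + (Blue ? p[2] : 0);
  }
  return total;
}

// Runs f once and returns the elapsed wall-clock time in milliseconds.
template <typename F>
double elapsed_ms(F f) {
  auto start = std::chrono::steady_clock::now();
  f();
  auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

To compare the two, one would fill a large buffer, time each variant through `elapsed_ms` several times, and print the resulting sums so that the compiler cannot discard the work. The flags passed to `sum_runtime` should come from real runtime input; if they are literals, the compiler may specialize the call and hide any difference.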
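// Maps the bits of index onto the <red, green, blue> flags:
// bit 0 is red, bit 1 is green, bit 2 is blue.
template<std::size_t index>
int intensity(const unsigned char* pixel){
  // Explicit comparisons avoid a narrowing size_t-to-bool conversion
  // in the template arguments.
  return intensity< (index&1) != 0, (index&2) != 0, (index&4) != 0 >(pixel);
}

// Sums intensity<index> over count pixels spaced pixel_stride bytes apart.
template<std::size_t index, std::size_t pixel_stride=3>
int sum_intensity(const unsigned char* pixel, std::size_t count){
  int value = 0;
  while(count--) {
    value += intensity<index>(pixel);
    pixel += pixel_stride;
  }
  return value;
}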
