C++: loop fusion optimization for standard algorithms (C++, Compiler Optimization)


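Occasionally in my code I have a data structure from which I want to get two or more values, each of which can be extracted with a standard algorithm. The problem is that using the standard algorithms means looping over the data several times. Take the following example, where I have a vector and want the sum, the value of the first element above some threshold, and the total number of elements above that threshold: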

constexpr auto GetValuesSTL(const std::vector<int>& testdata)
{
    constexpr auto value_above_threshold = [](const auto value) constexpr { return value > THRESHOLD; };
    constexpr auto optional_from_iterator = [](const auto it, const auto end) constexpr { return it != end ? std::make_optional(*it) : std::nullopt; };

    return std::make_tuple(
        std::accumulate(testdata.begin(), testdata.end(), 0L),
        optional_from_iterator(std::find_if(testdata.begin(), testdata.end(), value_above_threshold), testdata.end()),
        std::count_if(testdata.begin(), testdata.end(), value_above_threshold) );
}


But I can write this more efficiently as a raw loop:

constexpr auto GetValuesRawLoop(const std::vector<int>& testdata)
{
    auto sum = 0L;
    std::optional<int> first_above_threshold = std::nullopt;
    auto num_above_threshold = 0;

    auto it = testdata.begin();
    for (; it != testdata.end(); ++it)
    {
        const auto value = *it;

        if (value > THRESHOLD)
        {
            first_above_threshold = value;
            break;
        }

        sum += value;
    }
    for (; it != testdata.end(); ++it)
    {
        const auto value = *it;

        sum += value;

        if (value > THRESHOLD)
        {
            ++num_above_threshold;
        }
    }

    return std::make_tuple( sum, first_above_threshold, num_above_threshold );
}


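I would expect the compiler to be able to fuse the algorithm calls into a single loop, since it has enough information to know that the vector is not modified. However, profiling the two functions on vectors of randomly generated ints of various lengths (compiled with g++-9 -O3) shows that the STL version consistently takes about 2-2.5 times as long as the raw loop, which is exactly what you would expect if no loop fusion takes place.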
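The post does not show how the timings were taken. As a rough sketch only, a comparison like this could be set up with `std::chrono` and a seeded random generator. The names `MakeTestData`, `Measure` and `Timing` are hypothetical, and the callable is assumed to reduce its result to a single `long` so that the optimizer cannot discard the work:

```cpp
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// Builds `size` pseudo-random ints in [0, max_value].
// The fixed default seed keeps runs repeatable (an illustrative choice).
std::vector<int> MakeTestData(std::size_t size, int max_value, unsigned seed = 42)
{
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(0, max_value);
    std::vector<int> data(size);
    for (auto& v : data)
    {
        v = dist(gen);
    }
    return data;
}

struct Timing
{
    double mean_us;  // mean wall-clock time per call, in microseconds
    long checksum;   // sum of all returned values; print it to keep the calls observable
};

// Calls fn(data) `repeats` times (assumed > 0); fn must return a value convertible to long.
template <typename Fn>
Timing Measure(Fn&& fn, const std::vector<int>& data, int repeats)
{
    using Clock = std::chrono::steady_clock;
    long checksum = 0;
    const auto start = Clock::now();
    for (int i = 0; i < repeats; ++i)
    {
        checksum += fn(data);
    }
    const std::chrono::duration<double, std::micro> elapsed = Clock::now() - start;
    return Timing{ elapsed.count() / repeats, checksum };
}
```

A caller might wrap each candidate in a lambda that collapses the returned tuple into one number, for example `[](const auto& d) { auto [sum, first, count] = GetValuesSTL(d); return sum + first.value_or(0) + count; }`, and compare the `mean_us` values across several vector lengths.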

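Is there a good reason why the compiler cannot or does not apply this optimization? Would fusing the loops require some assumption that the compiler is not allowed to make? Or is this simply something that is fundamentally hard to detect and apply? And is there an alternative that is as efficient as the raw loop yet as expressive as the algorithm version?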
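One possible direction for the last question, sketched here under assumptions rather than measured: a small helper that applies several per-element visitors in one traversal, so each computation stays a separate, named piece while the data is walked once. `VisitAll`, `Results` and `GetValuesFused` are hypothetical names, and `THRESHOLD` is given an arbitrary value because the post does not show its definition.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical value; the original post does not show how THRESHOLD is defined.
constexpr int THRESHOLD = 100;

// Calls every visitor on every element, in argument order, during one traversal.
template <typename Range, typename... Visitors>
void VisitAll(const Range& range, Visitors&&... visitors)
{
    for (const auto& value : range)
    {
        (visitors(value), ...);  // C++17 fold expression over the comma operator
    }
}

struct Results
{
    long sum = 0;
    std::optional<int> first_above;  // first element greater than THRESHOLD, if any
    std::ptrdiff_t num_above = 0;    // number of elements greater than THRESHOLD
};

inline Results GetValuesFused(const std::vector<int>& testdata)
{
    Results r;
    VisitAll(testdata,
        [&](int v) { r.sum += v; },
        [&](int v) { if (!r.first_above && v > THRESHOLD) { r.first_above = v; } },
        [&](int v) { if (v > THRESHOLD) { ++r.num_above; } });
    return r;
}
```

Unlike `std::find_if`, the second visitor cannot stop early, so it still tests every element. Whether this comes close to the hand-written double loop would have to be profiled.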

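I only want to answer the last part of your question:

"Is there an alternative that is as efficient as the raw loop yet as expressive as the algorithm version?"

Depending on what you consider "expressive", a single std::accumulate might look like this: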

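// Needs <numeric>, <optional>, <tuple> and <vector>; THRESHOLD is defined elsewhere.
constexpr auto GetValuesACC(const std::vector<int>& testdata)
{
    // State carried through the fold: (sum, first value above THRESHOLD, count above THRESHOLD).
    using State = std::tuple<int, std::optional<int>, int>;

    auto accumulator = [](State init, int val) {
        return std::make_tuple(
            std::get<0>(init) + val,
            std::get<1>(init).has_value() ? std::get<1>(init) :
                (val > THRESHOLD ? std::make_optional(val) : std::nullopt),
            std::get<2>(init) + int(val > THRESHOLD));
    };

    // The initial value must be a State: std::make_tuple(0, std::nullopt, 0)
    // would deduce std::nullopt_t for the middle element and fail to compile.
    return std::accumulate(testdata.begin(), testdata.end(),
        State(0, std::nullopt, 0), accumulator);
}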

New results:

GetValuesRawLoop2: 0.3, 0.3, 0.6
GetValuesSTL2:     4.4, 4.4, 4.5
GetValuesACC2:     1.7, 1.8, 1.9

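New summary:

- In my version of LLVM, the implementation of optional is slow in this case, although that doesn't really affect GetValuesSTL.
- A single accumulator is faster than the 3 loops but slower than the raw loop.

As always, "your mileage may vary!" These results underline why it is important to profile real-world usage before attempting to optimize.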

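I actually already had something very similar that I had written for comparison. It performs worse than both of the other options, which I think is because of the many redundant calls to optional::has_value (that is why my raw loop is a double loop: if I write it as a single loop, those redundant branches have a huge impact on performance).

@JohnIlacqua I tried some benchmarks and added a version without std::optional. So far your theory holds that the two manual loops are more efficient. If I can get gprof/pprof working, I'll try to see why GetValuesACC2 is still slower than GetValuesRawLoop2.

Interesting that std::optional is that slow; I assumed it would just be the extra branch. How does using random integers instead of sequential ones change the benchmark?

The sum element of the tuple should probably be long, to avoid overflow.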
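The version without std::optional mentioned in the comments is not shown. A minimal sketch of what such a fold could look like follows, assuming a plain struct with a `found` flag in place of the optional and a `long` sum as suggested in the last comment. `FoldState` and `GetValuesPlainState` are hypothetical names, and the `THRESHOLD` value is arbitrary.

```cpp
#include <numeric>
#include <vector>

// Hypothetical value; the original post does not show how THRESHOLD is defined.
constexpr int THRESHOLD = 100;

// Fold state without std::optional: `found` says whether `first_above` is meaningful.
struct FoldState
{
    long sum = 0;         // long rather than int, to reduce the risk of overflow
    int first_above = 0;  // valid only when found == true
    bool found = false;
    int num_above = 0;
};

inline FoldState GetValuesPlainState(const std::vector<int>& testdata)
{
    return std::accumulate(testdata.begin(), testdata.end(), FoldState{},
        [](FoldState s, int val) {
            const bool above = val > THRESHOLD;
            s.sum += val;
            if (above && !s.found)
            {
                s.first_above = val;
                s.found = true;
            }
            s.num_above += above;  // bool converts to 0 or 1
            return s;
        });
}
```

This still checks `found` on every element above the threshold, so it does not remove the branch that the double loop avoids. It only replaces the optional with a flag, which makes it a candidate to benchmark rather than a known improvement.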
