C++: templated array operator[] using ints

Tags: c++, templates, operator-overloading, swizzling

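I'm trying to manipulate a special struct and I need some sort of swizzle operator. For this it makes sense to have an overloaded array [] operator, but I don't want any branching, since the particular specification of the struct allows for a theoretical workaround.

Currently, the struct looks like this: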

struct f32x4
{
    float fLow[2];
    float fHigh[2];

    f32x4(float a, float b, float c, float d)
    {
        fLow[0] = a; 
        fLow[1] = b;
        fHigh[0] = c;
        fHigh[1] = d;
    }

    // template with an int here?
    inline float& operator[] (int x) {
        if (x < 2)
            return fLow[x];
        else
            return fHigh[x - 2];
    }
};

The corresponding C++ code should be this:

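inline const float& operator[](const unsigned& idx) const
{
    if (idx == 0)  return xy[0];
    if (idx == 1)  return xy[1];
    if (idx == 2)  return zw[0];
    if (idx == 3)  return zw[1];
    // Out-of-range index. Returning the literal 0.f here would bind the
    // reference to a temporary and leave it dangling, so use a static instead.
    static const float zero = 0.f;
    return zero;
}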

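Create an array (or vector) holding all 4 elements, with the fLow values in the first two positions and the fHigh values in the last two. Then just index into it.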

inline float& operator[] (int x) {
    return newFancyArray[x]; //But do some bounds checking above.
}
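A minimal sketch of this suggestion, assuming the layout of f32x4 is allowed to change (the comments further down indicate that for the asker it is not). The member name v and the assert-based bounds check are illustrative choices, not part of the original code.

```cpp
#include <cassert>

// Sketch: one contiguous array instead of fLow/fHigh, so indexing needs no branch.
struct f32x4
{
    float v[4]; // v[0], v[1] hold the former fLow; v[2], v[3] the former fHigh

    f32x4(float a, float b, float c, float d)
    {
        v[0] = a;
        v[1] = b;
        v[2] = c;
        v[3] = d;
    }

    float& operator[](int x)
    {
        assert(x >= 0 && x < 4); // debug-only bounds check; compiled out with NDEBUG
        return v[x];
    }

    const float& operator[](int x) const
    {
        assert(x >= 0 && x < 4);
        return v[x];
    }
};
```

With this layout, f32x4 f(1.f, 2.f, 50.f, 51.f); f[3] reads 51 through a plain array access.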

Seriously, don't do this!! Just combine the arrays. But since you asked, here is an answer:

#include <iostream>

float fLow [2] = {1.0,2.0};
float fHigh [2] = {50.0,51.0};

float * fArrays[2] = {fLow, fHigh};

float getFloat (int i)
{
    return fArrays[i>=2][i%2];
}

int main()
{
    for (int i = 0; i < 4; ++i)
        std::cout << getFloat(i) << '\n';
    return 0;
}
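The same table lookup can be folded into the struct from the question. This is only a sketch, not the answerer's code: it rebuilds the two-entry pointer table on each call, so that nothing extra is stored in the object, and it uses a shift and a mask in place of >= and %, which assumes the index is in the range 0 to 3.

```cpp
#include <cassert>

struct f32x4
{
    float fLow[2];
    float fHigh[2];

    f32x4(float a, float b, float c, float d)
    {
        fLow[0] = a;  fLow[1] = b;
        fHigh[0] = c; fHigh[1] = d;
    }

    float& operator[](int x)
    {
        assert(x >= 0 && x < 4); // debug-only; the lookup below relies on this range
        float* const halves[2] = { fLow, fHigh }; // local table, not a data member
        return halves[x >> 1][x & 1]; // x >> 1 picks the half, x & 1 the element
    }
};
```

This trades the branch for an extra indirection; whether that is actually faster depends on the compiler and target, so it is worth measuring.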

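The index is either a runtime variable or a compile-time constant.

If it is a compile-time constant, there is a good chance the optimizer will prune the dead branch when inlining operator[].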

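If it is a runtime variable, as in

for (int i=0; i<4; ++i) { dosomething(f[i]); }

then the branch generally has to stay, unless the optimizer unrolls the loop so that each index becomes a constant.

Output of g++ -O3 -S for a function foo(f32x4&) (mangled symbol _Z3fooR5f32x4); it is four straight-line loads and adds, with no branches: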

.globl _Z3fooR5f32x4
        .type       _Z3fooR5f32x4, @function
_Z3fooR5f32x4:
.LFB4:
        .cfi_startproc
        movss       (%rdi), %xmm0
        addss       4(%rdi), %xmm0
        addss       8(%rdi), %xmm0
        addss       12(%rdi), %xmm0
        ret
        .cfi_endproc
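The source for this listing is not shown. Judging from the mangled symbol (foo taking an f32x4&) and the four loads and adds, it was probably something like the following. The body of foo is a guess, and the struct is repeated from the question so the snippet compiles on its own.

```cpp
#include <cassert> // only for the sanity check in the note below

struct f32x4
{
    float fLow[2];
    float fHigh[2];

    f32x4(float a, float b, float c, float d)
    {
        fLow[0] = a;  fLow[1] = b;
        fHigh[0] = c; fHigh[1] = d;
    }

    // Branching operator[] exactly as in the question.
    inline float& operator[](int x)
    {
        if (x < 2)
            return fLow[x];
        else
            return fHigh[x - 2];
    }
};

// Guessed test function: every index is a literal, so once operator[] is
// inlined the optimizer can evaluate each comparison at compile time.
float foo(f32x4& f)
{
    return f[0] + f[1] + f[2] + f[3];
}
```

A quick sanity check is f32x4 f(0.f, 1.f, 2.f, 3.f); assert(foo(f) == 6.f);. Whether a given compiler actually removes the branches still has to be confirmed in its own assembly output.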

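Since you said in the comments that the index is always a template parameter, you can indeed do the branching at compile time instead of at runtime. Here is a possible solution using std::enable_if: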

#include <iostream>
#include <type_traits>

struct f32x4
{
    float fLow[2];
    float fHigh[2];

    f32x4(float a, float b, float c, float d)
    {
        fLow[0] = a; 
        fLow[1] = b;
        fHigh[0] = c;
        fHigh[1] = d;
    }

    template <int x>
    float& get(typename std::enable_if<(x >= 0 && x < 2)>::type* = 0)
    {
        return fLow[x];
    }

    template <int x>
    float& get(typename std::enable_if<(x >= 2 && x < 4)>::type* = 0)
    {
        return fHigh[x-2];
    }
};

int main()
{
    f32x4 f(0.f, 1.f, 2.f, 3.f);

    std::cout << f.get<0>() << " " << f.get<1>() << " "
              << f.get<2>() << " " << f.get<3>(); // prints 0 1 2 3
}
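If a C++17 compiler is available (an assumption; the answer above is written for C++11), the two enable_if overloads can collapse into a single function using if constexpr, with a static_assert rejecting out-of-range indices at compile time. A sketch of that variant:

```cpp
#include <cassert> // only for the checks in the note below

struct f32x4
{
    float fLow[2];
    float fHigh[2];

    f32x4(float a, float b, float c, float d)
    {
        fLow[0] = a;  fLow[1] = b;
        fHigh[0] = c; fHigh[1] = d;
    }

    template <int x>
    float& get()
    {
        static_assert(x >= 0 && x < 4, "f32x4 index must be in [0, 4)");
        if constexpr (x < 2)
            return fLow[x];      // the only branch instantiated for x = 0, 1
        else
            return fHigh[x - 2]; // the only branch instantiated for x = 2, 3
    }
};
```

Usage is the same as before, for example f.get<2>() = 8.f;, while f.get<4>() fails to compile with the static_assert message.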

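Based on Luc Touraille's answer, but without type traits because my compiler does not support them, I found that the following achieves the goal of the question. Since operator[] cannot be templated on an int parameter and still be called with the usual subscript syntax, I introduced an at method. The result: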
struct f32x4
{
    float fLow[2];
    float fHigh[2];

    f32x4(float a, float b, float c, float d)
    {
        fLow[0] = a; 
        fLow[1] = b;
        fHigh[0] = c;
        fHigh[1] = d;
    }


    template <unsigned T>
    const float& at() const;

};
template<>
const float& f32x4::at<0>() const { return fLow[0]; }
template<>
const float& f32x4::at<1>() const { return fLow[1]; }
template<>
const float& f32x4::at<2>() const { return fHigh[0]; }
template<>
const float& f32x4::at<3>() const { return fHigh[1]; }
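With at<N>() in place, a swizzle (the asker's stated goal) can be written so that every index is resolved at compile time. The sketch below is an assumption about how the pieces might fit together: the free function swizzle is hypothetical, and the specializations are marked inline so that the code can live in a header included from several translation units.

```cpp
#include <cassert> // only for the checks in the note below

struct f32x4
{
    float fLow[2];
    float fHigh[2];

    f32x4(float a, float b, float c, float d)
    {
        fLow[0] = a;  fLow[1] = b;
        fHigh[0] = c; fHigh[1] = d;
    }

    template <unsigned N>
    const float& at() const; // only the four specializations below are defined
};

// Explicit specializations are ordinary functions, hence the explicit inline.
template <> inline const float& f32x4::at<0>() const { return fLow[0]; }
template <> inline const float& f32x4::at<1>() const { return fLow[1]; }
template <> inline const float& f32x4::at<2>() const { return fHigh[0]; }
template <> inline const float& f32x4::at<3>() const { return fHigh[1]; }

// Hypothetical helper: builds a new vector from the components named by A..D.
template <unsigned A, unsigned B, unsigned C, unsigned D>
f32x4 swizzle(const f32x4& v)
{
    return f32x4(v.at<A>(), v.at<B>(), v.at<C>(), v.at<D>());
}
```

For example, swizzle<3, 2, 1, 0>(f) returns the components of f in reverse order. An index outside 0 to 3 has no definition of at, so it shows up as a link error rather than a runtime fault.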


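Can you elaborate on "but I don't want any branching, since the particular specification of the struct allows for a theoretical workaround"?

@MarkB oops, yes, fixed that mistake. And of course there is no assert(x<4), for brevity.

@piokuc, namely doing it at compile time, since there are only 4 possible values of x that work with instances of this class.

This question feels very localized.

@ahenderson - the setup is localized, but it seems like a reasonable question about an optimization technique to me.

The OP has already said that this is not an option (in a comment on Mark B's answer).

We said it at the same time ;) It seems unfair that only one of you gets the comment ;)

I haven't profiled it, but for some odd reason the assembly output seems to contain branches (only template int parameters are used as input to the [] operator, which should make it a good candidate for compiler optimization)... I'll see what I can do with it in the end.

I just checked: -O3 got me complete inlining and constant folding [using GCC 4.5.1 on x86]. What does your call site look like?

I updated the code. The assembly output was produced after calling a simple printf: printf("%f %f %f %f", v0[0], v0[1], v0[2], v0[3]). The compiler's strategy is set to optimize for speed with maximum inlining. If I'm not badly mistaken, there are still branches that may cost something, right?

I see branches in the compiler output, but I can't tell whether that is an out-of-line instantiation... could you also show the call site and the optimization level?

OK, no problem. If your optimizer can't manage the constant folding, I agree that Luc's answer is probably the best option.

I'm not sure that replacing a branch with an indirection is exactly what the OP needs (assuming the motivation is speed).

As far as I'm concerned, I'd mark this as a real gem and give the answer at least +5. Although it works really well on x86 platforms with fairly decent compilers (g++, VS cl), it does not work with my particular compiler/platform (it seems not to support type traits). Still, all the other answers provided essential hints too. In the end I'll accept this one on the strength of its content. Thanks a lot.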
