C++: horizontal maximum or minimum of floats using AVX?


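Is there a faster way to find the horizontal minimum or maximum of a vector of 32-bit floats with AVX? Currently my code, shown at the end of this post, is a modification of an existing routine.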

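Personally, I prefer the more readable version, regardless of performance.

If you want a scalar float result, the first step should be to narrow to a 128-bit vector. The reduction pattern is the same as for a horizontal sum, using max instead of add. Also, pointing a float* at an __m256 object violates strict aliasing: float* is not a "may_alias" type the way char* and __m256* are.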

#include <immintrin.h>

static inline float fast_hMax_ps(__m256 a){
    // Swap the two 128-bit halves so each float can be compared with its counterpart in the other half.
    const __m256 permHalves = _mm256_permute2f128_ps(a, a, 1);
    // Compare 4 values with 4 other values ("old half against the new half"); both halves of m0 are now identical.
    const __m256 m0 = _mm256_max_ps(permHalves, a);

    //now we need to find the largest of 4 values in the half:
    const __m256 perm0 = _mm256_permute_ps(m0, 0b01001110);
    const __m256 m1 = _mm256_max_ps(m0, perm0);

    const __m256 perm1 = _mm256_permute_ps(m1, 0b10110001);
    const __m256 m2 = _mm256_max_ps(perm1, m1);
    // All 8 lanes now hold the maximum. Read lane 0 with an intrinsic instead of
    // casting &m2 to float*, which would violate strict aliasing.
    return _mm_cvtss_f32(_mm256_castps256_ps128(m2));
}