C++: optimizing a constraint algorithm

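I can't seem to find a way to optimize my constraint algorithm any further. So far I have switched to SIMD SSE4 vector operations, which nearly doubled the performance. I'm hoping there is more performance to be had that I'm simply not seeing.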

Here is the algorithm:

FVector4 rest(REST_DISTANCE); // Set the constant simd vector outside the loop
FVector4 stiffness(STIFFNESS);

for (int i = 0; i < TEST_SIZE; i++)
{
    const auto& constraint = Constraints[i];
    auto& position1 = Particle[constraint.first].Position;
    auto& position2 = Particle[constraint.second].Position;

    for (int j = 0; j < ITERATION_COUNT; j++)
    {
        auto delta = position2 - position1;
        auto distance = delta.norm();

        auto correctionDistance = (distance - rest) / distance;
        auto pCorrection = correctionDistance * delta * stiffness;

        position1 += pCorrection;
        position2 -= pCorrection;
    }
}
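
For checking the SIMD version against something simple, the body of the inner loop can be written out in scalar form. This is only a sketch: `Vec3` and `solveConstraintScalar` are made-up names, not part of the original code, and the zero-distance guard is an addition. Since the `operator/` in the post is built on `_mm_rcp_ps`, which is an approximate reciprocal, the SIMD results should be expected to differ slightly from this reference.

```cpp
#include <cmath>

// Hypothetical plain-float stand-in for a particle position
// (not the FVector4 type from the question).
struct Vec3 { float x, y, z; };

// One relaxation step for a single distance constraint, without SIMD.
// Mirrors the inner loop body: both points are moved along the line
// between them so their distance approaches restDistance.
inline void solveConstraintScalar(Vec3& p1, Vec3& p2,
                                  float restDistance, float stiffness)
{
    const float dx = p2.x - p1.x;
    const float dy = p2.y - p1.y;
    const float dz = p2.z - p1.z;

    const float distance = std::sqrt(dx * dx + dy * dy + dz * dz);
    if (distance == 0.0f)
        return; // coincident points: direction undefined (guard added here, not in the original)

    // (distance - rest) / distance * stiffness, applied to delta
    const float scale = (distance - restDistance) / distance * stiffness;

    p1.x += scale * dx;  p1.y += scale * dy;  p1.z += scale * dz;
    p2.x -= scale * dx;  p2.y -= scale * dy;  p2.z -= scale * dz;
}
```

With a stiffness of 0.5, each point takes half of the error, so two otherwise unconstrained points reach the rest distance in a single step.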

The operators used in the algorithm:

__forceinline FVector4 operator-(const FVector4& a, const FVector4& b)
{
    return FVector4(_mm_sub_ps(a.Data, b.Data));
}

__forceinline FVector4 operator*(const FVector4& a, const FVector4& b)
{
    return FVector4(_mm_mul_ps(a.Data, b.Data));
}

// Note: _mm_rcp_ps is an approximate reciprocal, so this is not an exact division.
__forceinline FVector4 operator/(const FVector4& a, const FVector4& b)
{
    return FVector4(_mm_mul_ps(a.Data, _mm_rcp_ps(b.Data)));
}

__forceinline FVector4& operator+=(FVector4& a, const FVector4& b)
{
    a.Data = _mm_add_ps(a.Data, b.Data);
    return a;
}

__forceinline FVector4& operator-=(FVector4& a, const FVector4& b)
{
    a.Data = _mm_sub_ps(a.Data, b.Data);
    return a;
}
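
The post does not show `norm()`. Because the loop computes `distance - rest` with an `FVector4` on both sides, it presumably returns the length broadcast to all four lanes. One way that could look is sketched below, using only baseline SSE intrinsics. `Vec4Sketch` and `normBroadcast` are hypothetical names, not the original implementation, and the sketch assumes the w lane of a position is 0.

```cpp
#include <xmmintrin.h>

// Hypothetical minimal wrapper, standing in for the FVector4 of the question.
struct Vec4Sketch
{
    __m128 Data;
    explicit Vec4Sketch(__m128 d) : Data(d) {}
};

// Returns the Euclidean length of v in every lane.
// Assumes the w lane is 0 so that it does not contribute to the sum.
inline Vec4Sketch normBroadcast(const Vec4Sketch& v)
{
    // Square each lane: (x*x, y*y, z*z, w*w)
    const __m128 sq = _mm_mul_ps(v.Data, v.Data);

    // Swap neighbours within each pair and add: (x2+y2, x2+y2, z2+w2, z2+w2)
    __m128 shuf = _mm_shuffle_ps(sq, sq, _MM_SHUFFLE(2, 3, 0, 1));
    __m128 sums = _mm_add_ps(sq, shuf);

    // Swap the two halves and add: every lane now holds x2+y2+z2+w2
    shuf = _mm_shuffle_ps(sums, sums, _MM_SHUFFLE(1, 0, 3, 2));
    sums = _mm_add_ps(sums, shuf);

    return Vec4Sketch(_mm_sqrt_ps(sums));
}
```

Having the length already replicated across lanes is what lets the subsequent subtraction, division and multiplication stay in SIMD registers without a scalar round trip.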

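Maybe you should post this on Code Review?

@PaulEvans Yes, this would be fine for CodeReview, though adding some of the relevant functions would probably make the question better.

@PaulEvans Ah, you're right! Is there a way to move this over there, or do I need to repost it?

It's on topic here too. Just leave it. But do add the relevant code, at least norm().

@harold Added the code for the relevant FVector4 bits.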