C: Improving RGB to YUV conversion with ARM NEON


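I'm trying to speed up a function that converts RGB to YUV411 on ARM devices. It is a bit tricky because I have to average across the lanes of a vector, and the result then has to end up in a vector again.

The code below shows how the function currently converts RGB to YUV: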

uint8_t Y[4];
uint8_t Cb[4];
uint8_t Cr[4];
uint8_t R[4];
uint8_t G[4];
uint8_t B[4];

//basically getting R,G and B can be done in vld3_u8
R[0] = input[0];
R[1] = input[3];
R[2] = input[6];
R[3] = input[9];

G[0] = input[1];
G[1] = input[4];
G[2] = input[7];
G[3] = input[10];

B[0] = input[2];
B[1] = input[5];
B[2] = input[8];
B[3] = input[11];

// this calculation can be done in float32x4_t
Y[0]  =  0.3*(double)R[0] +  0.6*(double)G[0] +  0.1*(double)B[0];
Cb[0] = -0.2*(double)R[0] + -0.3*(double)G[0] +  0.5*(double)B[0] + 128.0;
Cr[0] =  0.5*(double)R[0] + -0.4*(double)G[0] + -0.1*(double)B[0] + 128.0;

Y[1]  =  0.3*(double)R[1] +  0.6*(double)G[1] +  0.1*(double)B[1];
Cb[1] = -0.2*(double)R[1] + -0.3*(double)G[1] +  0.5*(double)B[1] + 128.0;
Cr[1] =  0.5*(double)R[1] + -0.4*(double)G[1] + -0.1*(double)B[1] + 128.0;

Y[2]  =  0.3*(double)R[2] +  0.6*(double)G[2] +  0.1*(double)B[2];
Cb[2] = -0.2*(double)R[2] + -0.3*(double)G[2] +  0.5*(double)B[2] + 128.0;
Cr[2] =  0.5*(double)R[2] + -0.4*(double)G[2] + -0.1*(double)B[2] + 128.0;

Y[3]  =  0.3*(double)R[3] +  0.6*(double)G[3] +  0.1*(double)B[3];
Cb[3] = -0.2*(double)R[3] + -0.3*(double)G[3] +  0.5*(double)B[3] + 128.0;
Cr[3] =  0.5*(double)R[3] + -0.4*(double)G[3] + -0.1*(double)B[3] + 128.0;

// the problem is here: Cb is held in a vector; without storing it to memory via vst, how do I sum and average its lanes?
uint32_t CbAvg = ((double)(Cb[0] + Cb[1] + Cb[2] + Cb[3])) / 4.0;
uint32_t CrAvg = ((double)(Cr[0] + Cr[1] + Cr[2] + Cr[3])) / 4.0;

// in addition to the problem above, storing in this interleaved order is a little tricky.
output[0] = CbAvg;
output[1] = Y[0];
output[2] = Y[1];
output[3] = CrAvg;
output[4] = Y[2];
output[5] = Y[3];

If you have any suggestions on how to use NEON intrinsics effectively to speed up this function, please let me know.
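For the averaging step flagged in the code comments, one possible approach is a pair of widening pairwise adds, which sums neighbouring lanes without leaving the NEON registers. The sketch below is only an illustration under assumptions: the function name average_groups_of_four is hypothetical, the eight chroma bytes are assumed to sit in one 64-bit register, and the truncating shift is chosen to mirror the cast in the code above. A scalar fallback is included so the snippet also builds on targets without NEON.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: averages in[0..3] into out[0] and in[4..7] into out[1],
 * truncating like the integer cast in the question's code. */
void average_groups_of_four(const uint8_t in[8], uint8_t out[2])
{
    assert(in != NULL && out != NULL);
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    uint8x8_t  v     = vld1_u8(in);         /* in practice this would be the computed Cb or Cr vector */
    uint16x4_t pairs = vpaddl_u8(v);        /* v0+v1, v2+v3, v4+v5, v6+v7 */
    uint32x2_t quads = vpaddl_u16(pairs);   /* v0+v1+v2+v3, v4+v5+v6+v7 */
    uint32x2_t avg   = vshr_n_u32(quads, 2);/* divide each sum by 4, truncating */
    /* Lanes are extracted here only to return the result from this sketch. */
    out[0] = (uint8_t)vget_lane_u32(avg, 0);
    out[1] = (uint8_t)vget_lane_u32(avg, 1);
#else
    /* Portable fallback with the same results as the NEON path. */
    for (int g = 0; g < 2; ++g) {
        unsigned sum = in[4 * g] + in[4 * g + 1] + in[4 * g + 2] + in[4 * g + 3];
        out[g] = (uint8_t)(sum >> 2);
    }
#endif
}
```

If rounding is preferred over truncation, vrshr_n_u32 could replace vshr_n_u32. Processing eight pixels per iteration is an assumption of this sketch; it yields two chroma averages, that is, two 6-byte output groups.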

As far as I know, NEON intrinsics use 64-bit vectors, which can only hold a single double. Perhaps you would consider using float? Or possibly converting your code to fixed-point integer arithmetic.

@paddy NEON has no double, only float, and without denormals (so the compiler won't use it unless explicitly told to ignore IEEE 754 conformance).
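The fixed-point suggestion from the comments could look roughly like the scalar sketch below. It is not the asker's code: the function name rgb4_to_yuv411_q8 is hypothetical, and the integer coefficients are an assumption obtained by scaling the question's 0.3/0.6/0.1 style weights by 256 and rounding so that each row sums to 256 (for Y) or 0 (for Cb and Cr). The +128 offset is added before the shift so every intermediate stays non-negative and fits in 16 bits, which is what would allow the same arithmetic to map onto widening 8-bit multiplies later.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Question's coefficients in Q8 (scaled by 256); the rounding is an assumption. */
enum {
    Y_R  = 77,  Y_G  = 153, Y_B  = 26,   /* 0.3, 0.6, 0.1        */
    CB_R = 51,  CB_G = 77,  CB_B = 128,  /* -0.2, -0.3, +0.5     */
    CR_R = 128, CR_G = 102, CR_B = 26    /* +0.5, -0.4, -0.1     */
};

/* Hypothetical helper: converts four RGB pixels (12 bytes) into one 6-byte
 * group laid out as Cb, Y0, Y1, Cr, Y2, Y3, the order used in the question. */
void rgb4_to_yuv411_q8(const uint8_t *input, uint8_t *output)
{
    assert(input != NULL && output != NULL);

    uint32_t y[4];
    uint32_t cb_sum = 0, cr_sum = 0;

    for (int i = 0; i < 4; ++i) {
        uint32_t r = input[3 * i];
        uint32_t g = input[3 * i + 1];
        uint32_t b = input[3 * i + 2];

        /* Maximum is 256 * 255, so the shifted result is at most 255. */
        y[i] = (Y_R * r + Y_G * g + Y_B * b) >> 8;

        /* 128 << 8 is the +128 offset in Q8; the sums stay in [128, 65408]. */
        cb_sum += ((128u << 8) + CB_B * b - CB_R * r - CB_G * g) >> 8;
        cr_sum += ((128u << 8) + CR_R * r - CR_G * g - CR_B * b) >> 8;
    }

    output[0] = (uint8_t)(cb_sum >> 2);  /* average of four Cb values, truncated */
    output[1] = (uint8_t)y[0];
    output[2] = (uint8_t)y[1];
    output[3] = (uint8_t)(cr_sum >> 2);  /* average of four Cr values, truncated */
    output[4] = (uint8_t)y[2];
    output[5] = (uint8_t)y[3];
}
```

Results can differ from the floating-point version by one count for some inputs because the coefficients are rounded to 8 fractional bits. Whether that is acceptable depends on the application.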