Image processing: NEON acceleration of 12-bit to 8-bit conversion

image-processing


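I have a buffer of 12-bit data (stored in 16-bit words) and need to convert it to 8-bit (shift right by 4).

How can NEON accelerate this processing?

Thank you for your help.

Brahim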

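I took the liberty of assuming a few things, explained below, but this kind of code (untested, may require a few modifications) should give a good speedup over a naive non-NEON version: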

#include <arm_neon.h>
#include <stdint.h>

void convert(const uint16_t *restrict input, // the buffer to convert
             uint8_t *restrict output,       // the buffer in which to store the result
             int sz) {                       // their (common) size, in elements

  /* Assuming the buffer size is a multiple of 8 */
  for (int i = 0; i < sz; i += 8) {
    // Load a vector of 8 16-bit values:
    uint16x8_t v = vld1q_u16(input + i);
    // Shift it right by 4, narrowing it to 8-bit values.
    uint8x8_t shifted = vshrn_n_u16(v, 4);
    // Store it in the output buffer
    vst1_u8(output + i, shifted);
  }

}

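#include <arm_neon.h> brings in the vector types (uint16x8_t, uint8x8_t) and the intrinsics used above.

Hope this helps.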
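The loop in this answer assumes the buffer size is a multiple of 8. As a minimal sketch of one way to lift that assumption, the version below runs the same NEON loop on full groups of 8 and finishes the remaining samples with a scalar loop, which is also the "naive non-NEON version" the answer compares against. The function name convert_any_size and the size_t parameter are assumptions made for illustration and are not part of the answer; the NEON part is guarded by a preprocessor check so the file still builds on targets without NEON.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper (name and signature are illustrative only). */
void convert_any_size(const uint16_t *restrict input,
                      uint8_t *restrict output,
                      size_t sz)
{
    size_t i = 0;

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Vector part: 8 samples per iteration, same intrinsics as the answer. */
    for (; i + 8 <= sz; i += 8) {
        uint16x8_t v = vld1q_u16(input + i);
        vst1_u8(output + i, vshrn_n_u16(v, 4));
    }
#endif

    /* Scalar part: the remaining 0..7 samples, or every sample when the
       build has no NEON support. */
    for (; i < sz; ++i) {
        output[i] = (uint8_t)(input[i] >> 4);
    }
}
```

Calling convert_any_size(src, dst, n) with any n should give the same bytes as the scalar loop alone, since the plain narrowing shift keeps the low 8 bits of each shifted value just as the cast does.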
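Prototype: void dataConvert(void * pDst, void * pSrc, unsigned int count);
    1:
    vld1.16 {q8-q9}, [r1]!      @ load 16 samples from pSrc, post-increment
    vld1.16 {q10-q11}, [r1]!    @ load 16 more samples
    vqrshrn.u16 d16, q8, #4     @ shift right by 4 (rounding, saturating), narrow to 8 bits
    vqrshrn.u16 d17, q9, #4
    vqrshrn.u16 d18, q10, #4
    vqrshrn.u16 d19, q11, #4
    vst1.16 {q8-q9}, [r0]!      @ store the 32 result bytes to pDst, post-increment
    subs r2, #32                @ 32 samples are consumed per iteration
    bgt 1b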

q flag: saturation
r flag: rounding
For signed data, change u16 to s16.
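To make the two flags concrete, here is a small scalar sketch of what each variant computes per lane for a shift of 4. It is only a model for reasoning about results, not a replacement for the instructions; the function names shift_truncate and shift_round_saturate are made up for this example. Rounding adds half of the discarded range (8) before shifting, which is why saturation then matters: a 12-bit input near the top of the range would otherwise round up to 256.

```c
#include <stdint.h>

/* Models a plain narrowing shift (no q, no r): keep the top 8 of 12 bits. */
uint8_t shift_truncate(uint16_t x)
{
    return (uint8_t)(x >> 4);
}

/* Models a rounding, saturating narrowing shift (q and r both set):
   add half an output step, shift, then clamp to the 8-bit range. */
uint8_t shift_round_saturate(uint16_t x)
{
    uint32_t r = ((uint32_t)x + 8u) >> 4;
    return (uint8_t)(r > 255u ? 255u : r);
}
```

For example, an input of 0x008 truncates to 0 but rounds to 1, and an input of 0xFFF gives 255 in both cases, the rounded one only because of the clamp.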
Comments:

It's a bit tricky, but it can be done. What have you tried so far?

You don't need to set the q flag unless you also set the r flag. There is nothing to saturate here.

Right, a bad copy-paste from the NEON documentation! Fixed, thanks.

I will run some tests. I'm trying to include this code in a Qt 4.8 application and needed to remove "const restrict". buf+i was changed to input+i. The data is now converted from 12-bit unsigned (stored in 16 bits) to 8-bit unsigned.
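On the last comment: restrict is a C99 keyword and not a C++ one, which is one likely reason it had to go when the code was pulled into a Qt (C++) project. A possible workaround, sketched below under that assumption, is a small macro that maps to the __restrict extension accepted by GCC, Clang and MSVC in C++ and to restrict in C. The macro name CONVERT_RESTRICT and the function convert_scalar are made up for this example, and the body is a plain scalar loop to keep the sketch independent of NEON.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macro: choose a spelling of "restrict" the compiler accepts. */
#if defined(__cplusplus)
#define CONVERT_RESTRICT __restrict   /* compiler extension in C++ */
#else
#define CONVERT_RESTRICT restrict     /* standard keyword since C99 */
#endif

/* Illustrative scalar conversion using the macro in its declaration. */
void convert_scalar(const uint16_t *CONVERT_RESTRICT input,
                    uint8_t *CONVERT_RESTRICT output,
                    size_t sz)
{
    for (size_t i = 0; i < sz; ++i) {
        output[i] = (uint8_t)(input[i] >> 4);  /* 12-bit to 8-bit */
    }
}
```

Dropping the qualifier entirely, as the commenter did, is also fine for correctness; restrict only gives the compiler permission to assume the two buffers do not overlap.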