Warning: file_get_contents(/data/phpspider/zhask/data//catemap/4/c/56.json): failed to open stream: No such file or directory in /data/phpspider/zhask/libs/function.php on line 167

Warning: Invalid argument supplied for foreach() in /data/phpspider/zhask/libs/tag.function.php on line 1116

Notice: Undefined index: in /data/phpspider/zhask/libs/function.php on line 180

Warning: array_chunk() expects parameter 1 to be array, null given in /data/phpspider/zhask/libs/function.php on line 181
C 查找位数组中设置的最高有效位(最左侧)_C_Bit Manipulation_32 Bit - Fatal编程技术网

C 查找位数组中设置的最高有效位(最左侧)

C 查找位数组中设置的最高有效位(最左侧),c,bit-manipulation,32-bit,C,Bit Manipulation,32 Bit,我有一个位数组实现,其中第0个索引是数组中第一个字节的MSB,第8个索引是第二个字节的MSB,以此类推 查找此位数组中设置的第一个位的快速方法是什么?我查找的所有相关解决方案都会找到第一个最不重要的位,但我需要第一个最重要的位。因此,给定0x00A1,我想要8(因为它是从左边算起的第9位)。查找BSR(位扫描反转)x86 asm指令,以获得执行此操作的最快方法。从英特尔的文档: 在源操作数(第二个操作数)中搜索最高有效集位(1位)。 如果找到最高有效位1,则其位索引存储在目标操作数中 (第一个操

我有一个位数组实现,其中第0个索引是数组中第一个字节的MSB,第8个索引是第二个字节的MSB,以此类推

查找此位数组中设置的第一个位的快速方法是什么?我查找的所有相关解决方案都会找到第一个最不重要的位,但我需要第一个最重要的位。因此,给定0x00A1,我想要8(因为它是从左边算起的第9位)。

查找BSR(位扫描反转)x86 asm指令,以获得执行此操作的最快方法。从英特尔的文档:
在源操作数(第二个操作数)中搜索最高有效集位(1位)。
如果找到最高有效位1,则其位索引存储在目标操作数中

(第一个操作数)。

有多种方法可以做到这一点,不同实现的相对性能在某种程度上取决于机器(我碰巧为了类似的目的对此进行了一定程度的基准测试)。在一些机器上,甚至有一个内置的指令来实现这一点(如果可用,可以使用一个指令,并且可以处理可移植性)

查看一些实现(在“integer log base 2”下)。如果您使用的是GCC,请查看函数
\uuuuu builtin\u clz
\uuuuuu builtin\u clzl
(它们分别对非零无符号整数和无符号长整数执行此操作)。“clz”代表“计数前导零”,这是描述相同问题的另一种方式


当然,如果您的位数组不适合一个合适的机器字,您需要迭代数组中的字以找到第一个非零字,然后仅对该字执行此计算。

以下是一个简单的暴力算法,用于任意大小的字节数组:

int msb( unsigned char x);  // prototype for function that returns 
                            //  most significant bit set

unsigned char* p;

for (p = arr + num_elements; p != arr;) {
    --p;
    if (*p != 0) break;
}

// p is with pointing to the last byte that has a bit set, or
//  it's pointing to the first byte in the array

if (*p) {
    return ((p - arr) * 8) + msb( *p);
}

// what do you want to return if no bits are set?
return -1;

我将把它作为一个练习留给读者,让读者想出一个合适的
msb()
函数,以及对
int
long
大小的数据缺口进行优化。

嗯,您的标记指示32位,但看起来您使用的值是16位。如果您的意思是32位,那么我认为0x00a1的答案应该是24,而不是8

假设您正在从左侧查找MSB位索引,并且您知道您将只处理uint32_t,下面是显而易见的简单算法:

#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>

int main()
{
    uint32_t test_value = 0x00a1;
    int i;

    for (i=0; i<32; ++i)
    {
        if (test_value & (0x80000000 >> i))
        {
            printf("i = %d\n", i);
            exit(0);
        }
    }

    return 0;
}
#包括
#包括
#包括
int main()
{
uint32测试值=0x00a1;
int i;
对于(i=0;i>i))
{
printf(“i=%d\n”,i);
出口(0);
}
}
返回0;
}
GCC将其转换为x86/x64上的BSR、ARM上的CLZ等,并在硬件未实现时模拟该指令。

Visual C++ 2005和UP有.</P> < P> >我在纯C中知道的两个最好的方法:

首先线性搜索字节/字数组以找到第一个非零的字节/字,然后对找到的字节/字执行展开二进制搜索

if (b>=0x10)
  if (b>=0x40)
    if (b>=0x80) return 0;
    else return 1;
  else
    if (b>=0x20) return 2;
    else return 3;
else
  if (b>=0x4)
    if (b>=0x8) return 4;
    else return 5;
  else
    if (b>=0x2) return 6;
    else return 7;
3(顺便说一句,这是log2(8))条件跳转以获得答案。在现代x86机器上,最后一个将优化为有条件mov

或者,使用查找表将字节映射到设置的第一位的索引

您可能需要查找的一个相关主题是integer log2函数。如果我记得的话,ffmpeg有一个很好的实现


编辑:您实际上可以将上面的二进制搜索转换为无分支的二进制搜索,但我不确定在这种情况下是否更有效…

不是最快的,但它可以工作

//// C program
#include <math.h>

#define POS_OF_HIGHESTBIT(a) /* 0th position is the Least-Signif-Bit */    \
((unsigned) log2(a))         /* thus: do not use if a <= 0 */  

#define NUM_OF_HIGHESTBIT(a) ((!(a))          \
        ? 0 /* no msb set*/                   \
        : (1 << POS_OF_HIGHESTBIT(a) ))
// could be changed and optimized, if it is known that the following NEVER holds: a <= 0



int main()
{
  unsigned a = 5; // 0b101
  unsigned b = NUM_OF_HIGHESTBIT(a); // 4 since 4 = 0b100
  return 0; 
}
///C程序
#包括
#定义最高位(a)的位置/*0位是最低符号位*/\

((unsigned)log2(a))/*因此:不要使用if a这里有一段代码片段解释uu builtin_clz()

///go.c////////
#包括

unsigned NUM_BITS_=((sizeof(unsigned)如果您使用的是x86,那么您几乎可以使用SSE2操作和find first bit指令击败任何逐字节或逐字的解决方案,这些指令(在gcc世界中)最低位的发音为“ffs”,最高位的发音为“fls”。 请原谅我在回答中格式化“C”代码时遇到问题(!@$^);请检查:

作为一个性能迷,我尝试了大量的MSB集变体,以下是我遇到的最快的一个

unsigned int msb32(unsigned int x)
{
    static const unsigned int bval[] =
    {0,1,2,2,3,3,3,3,4,4,4,4,4,4,4,4};

    unsigned int r = 0;
    if (x & 0xFFFF0000) { r += 16/1; x >>= 16/1; }
    if (x & 0x0000FF00) { r += 16/2; x >>= 16/2; }
    if (x & 0x000000F0) { r += 16/4; x >>= 16/4; }
    return r + bval[x];
}
定义自由流速度(t)\ ({ \ 寄存器int n=0\ \ 如果(!(0xffff&t))\ n+=16\ \
如果(((0xff我使用了许多函数来获取最高有效位,但问题通常会出现在32位和64位之间的移动,或者在x86\u 64和x86框之间的移动。函数
\uuu builtin\u clz
\uuu builtin\u clzl
适用于32/64位数字以及x86\u 64和x86机器。但是,需要三个函数。我发现了一个简单的MSB,它依赖于右移,可以处理正数的所有情况。至少就我使用它而言,它在其他函数失败的情况下成功了:

int
getmsb (unsigned long long x)
{
    int r = 0;
    if (x < 1) return 0;
    while (x >>= 1) r++;
    return r;
}
输出:

             0  MSB : 0
           216  MSB : 7
          1021  MSB : 9
         32767  MSB : 14
         32768  MSB : 15
    3297381253  MSB : 31
  323543844043  MSB : 38

注意:出于速度考虑,使用单个函数完成以
\uuuuu builtin\uclzll
为中心的相同任务,速度更快约6倍。

tl:dr;对于32位,请使用。

这是一种可移植算法,它比该线程中的所有其他可移植32位MSB算法都要更快、更正确

当输入为零时,de Bruijn算法也会返回正确的结果。当输入为零时,uu builtin_clz和_BitScanReverse指令

在Windows x86-64上,de Bruijn乘法的运行速度与等效(有缺陷)Windows函数相当,性能差异仅为3%左右

这是密码

u32 msbDeBruijn32( u32 v )
{
    static const int MultiplyDeBruijnBitPosition[32] =
    {
        0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30,
        8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31
    };

    v |= v >> 1; // first round down to one less than a power of 2
    v |= v >> 2;
    v |= v >> 4;
    v |= v >> 8;
    v |= v >> 16;

    return MultiplyDeBruijnBitPosition[( u32 )( v * 0x07C4ACDDU ) >> 27];
}
该线程中的所有其他答案要么运行得比作者建议的差得多,要么计算结果不正确,要么两者兼而有之。让我们对它们进行基准测试,并验证它们是否做了它们声称做的事情

int
main (int argc, char *argv[]) {

    unsigned char c0 = 0;
    unsigned char c = 216;
    unsigned short s = 1021;
    unsigned int ui = 32768;
    unsigned long ul = 3297381253;
    unsigned long long ull = 323543844043;

    int i = 32767;

    printf ("  %16u  MSB : %d\n", c0, getmsb (c0));
    printf ("  %16u  MSB : %d\n", c, getmsb (c));
    printf ("  %16u  MSB : %d\n", s, getmsb (s));
    printf ("  %16u  MSB : %d\n", i, getmsb (i));
    printf ("  %16u  MSB : %d\n", ui, getmsb (ui));
    printf ("  %16lu  MSB : %d\n", ul, getmsb (ul));
    printf ("  %16llu  MSB : %d\n", ull, getmsb (ull));

    return 0;
}
             0  MSB : 0
           216  MSB : 7
          1021  MSB : 9
         32767  MSB : 14
         32768  MSB : 15
    3297381253  MSB : 31
  323543844043  MSB : 38
u32 msbDeBruijn32( u32 v )
{
    static const int MultiplyDeBruijnBitPosition[32] =
    {
        0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30,
        8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31
    };

    v |= v >> 1; // first round down to one less than a power of 2
    v |= v >> 2;
    v |= v >> 4;
    v |= v >> 8;
    v |= v >> 16;

    return MultiplyDeBruijnBitPosition[( u32 )( v * 0x07C4ACDDU ) >> 27];
}
Verification failed for msbNative64: input was 0; output was 818af060; expected 0
Verification failed for msbFfs: input was 22df; output was 0; expected d
Verification failed for msbPerformanceJunkie32: input was 0; output was ffffffff; expected 0
Verification failed for msbNative32: input was 0; output was 9ab07060; expected 0
msbLoop64 took 2.56751 seconds               
msbNative64 took 0.222197 seconds            

msbLoop32 took 1.43456 seconds               
msbFfs took 0.525097 seconds                 
msbPerformanceJunkie32 took 1.07939 seconds  
msbDeBruijn32 took 0.224947 seconds          
msbNative32 took 0.218275 seconds            
#include <iostream>
#include <chrono>
#include <random>
#include <cassert>
#include <string>
#include <limits>

#ifdef _MSC_VER
#define MICROSOFT_COMPILER 1
#include <intrin.h>
#endif // _MSC_VER

const int iterations = 100000000;
bool bVerifyResults = false;
std::random_device rd;
std::default_random_engine re(rd());
typedef unsigned int u32;
typedef unsigned long long u64;

class Timer
{
public:
    Timer() : beg_(clock_::now()) {}
    void reset() {
        beg_ = clock_::now();
    }
    double elapsed() const {
        return std::chrono::duration_cast<second_>
            (clock_::now() - beg_).count();
    }

private:
    typedef std::chrono::high_resolution_clock clock_;
    typedef std::chrono::duration<double, std::ratio<1> > second_;
    std::chrono::time_point<clock_> beg_;
};

unsigned int msbPerformanceJunkie32(u32 x)
{
    static const unsigned int bval[] =
    { 0,1,2,2,3,3,3,3,4,4,4,4,4,4,4,4 };
    unsigned int r = 0;
    if (x & 0xFFFF0000) {
        r += 16 / 1;
        x >>= 16 / 1;
    }
    if (x & 0x0000FF00) {
        r += 16 / 2;
        x >>= 16 / 2;
    }
    if (x & 0x000000F0) {
        r += 16 / 4;
        x >>= 16 / 4;
    }
    return r + bval[x];
}

#define FFS(t)  \
{ \
register int n = 0; \
if (!(0xffff & t)) \
n += 16; \
if (!((0xff << n) & t)) \
n += 8; \
if (!((0xf << n) & t)) \
n += 4; \
if (!((0x3 << n) & t)) \
n += 2; \
if (!((0x1 << n) & t)) \
n += 1; \
return n; \
}

unsigned int msbFfs32(u32 x)
{
    FFS(x);
}

unsigned int msbLoop32(u32 x)
{
    int r = 0;
    if (x < 1) return 0;
    while (x >>= 1) r++;
    return r;
}

unsigned int msbLoop64(u64 x)
{
    int r = 0;
    if (x < 1) return 0;
    while (x >>= 1) r++;
    return r;
}

u32 msbDeBruijn32(u32 v)
{
    static const int MultiplyDeBruijnBitPosition[32] =
    {
        0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30,
        8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31
    };

    v |= v >> 1; // first round down to one less than a power of 2
    v |= v >> 2;
    v |= v >> 4;
    v |= v >> 8;
    v |= v >> 16;

    return MultiplyDeBruijnBitPosition[(u32)(v * 0x07C4ACDDU) >> 27];
}

#ifdef MICROSOFT_COMPILER
u32 msbNative32(u32 val)
{
    unsigned long result;
    _BitScanReverse(&result, val);
    return result;
}
u32 msbNative64(u64 val)
{
    unsigned long result;
    _BitScanReverse64(&result, val);
    return result;
}
#endif // MICROSOFT_COMPILER

template <typename InputType>
void test(unsigned int msbFunc(InputType),
    const std::string &name,
    const std::vector< InputType > &inputs,
    std::vector< unsigned int > &results,
    bool bIsReference = false
)
{
    if (bIsReference)
    {
        int i = 0;
        for (int i = 0; i < iterations; i++)
            results[i] = msbFunc(inputs[i]);
    }
    InputType result;
    if (bVerifyResults)
    {
        bool bNotified = false;
        for (int i = 0; i < iterations; i++)
        {
            result = msbFunc(inputs[i]);
            if ((result != results[i]) && !bNotified)
            {
                std::cout << "Verification failed for " << name << ": "
                    << "input was " << std::hex << inputs[i]
                    << "; output was " << result
                    << "; expected " << results[i]
                    << std::endl;
                bNotified = true;
            }
        }
    }
    else
    {
        Timer t;
        for (int i = 0; i < iterations; i++)
        {
            result = msbFunc(inputs[i]);
        }
        double elapsed = t.elapsed();
        if ( !bIsReference )
            std::cout << name << " took " << elapsed << " seconds" << std::endl;
        if (result == -1.0f)
            std::cout << "this comparison only exists to keep the compiler from " <<
            "optimizing out the benchmark; this branch will never be called";
    }
}

void main()
{
    std::uniform_int_distribution <u64> dist64(0,
        std::numeric_limits< u64 >::max());
    std::uniform_int_distribution <u32> shift64(0, 63);
    std::vector< u64 > inputs64;
    for (int i = 0; i < iterations; i++)
    {
        inputs64.push_back(dist64(re) >> shift64(re));
    }
    std::vector< u32 > results64;
    results64.resize(iterations);

    test< u64 >(msbLoop64, "msbLoop64", inputs64, results64, true);
    test< u64 >(msbLoop64, "msbLoop64", inputs64, results64, false);
#ifdef MICROSOFT_COMPILER
    test< u64 >(msbNative64, "msbNative64", inputs64, results64, false);
#endif // MICROSOFT_COMPILER
    std::cout << std::endl;

    std::uniform_int_distribution <u32> dist32(0,
        std::numeric_limits< u32 >::max());
    std::uniform_int_distribution <u32> shift32(0, 31);
    std::vector< u32 > inputs32;
    for (int i = 0; i < iterations; i++)
        inputs32.push_back(dist32(re) >> shift32(re));
    std::vector< u32 > results32;
    results32.resize(iterations);


    test< u32 >(msbLoop32, "msbLoop32", inputs32, results32, true);

    test< u32 >(msbLoop32, "msbLoop32", inputs32, results32, false);
    test< u32 >(msbFfs32, "msbFfs", inputs32, results32, false);
    test< u32 >(msbPerformanceJunkie32, "msbPerformanceJunkie32",
        inputs32, results32, false);
    test< u32 >(msbDeBruijn32, "msbDeBruijn32", inputs32, results32, false);
#ifdef MICROSOFT_COMPILER
    test< u32 >(msbNative32, "msbNative32", inputs32, results32, false);
#endif // MICROSOFT_COMPILER
}
typedef unsigned long long u64;
typedef unsigned int       u32;
typedef unsigned char      u8;


u8 findMostSignificantBit (u64 u64Val)
{
  u8 u8Shift;
  u8 u8Bit = 0;

  assert (u64Val != 0ULL);

  for (u8Shift = 32 ; u8Shift != 0 ; u8Shift >>= 1)
  {
    u64 u64Temp = u64Val >> u8Shift;
    if (u64Temp)
    {
      u8Bit |= u8Shift; // notice not using +=
      u64Val = u64Temp;
    }
  }

  return u8Bit;
}
u8 findMostSignificantBit2 (u64 u64Val)
{
  assert (u64Val != 0ULL);

  return (u8) (__builtin_ctzll(u64Val));
}
#include <stdint.h>

// define BSR32() and BSR64()
#if defined(_MSC_VER) || defined(__INTEL_COMPILER)
    #ifdef __INTEL_COMPILER
        typedef unsigned int bsr_idx_t;
    #else
        #include <intrin.h>   // MSVC
        typedef unsigned long bsr_idx_t;
    #endif

    static inline
    unsigned BSR32(unsigned long x){
        bsr_idx_t idx;
        _BitScanReverse(&idx, x); // ignore bool retval
        return idx;
    }
    static inline
    unsigned BSR64(uint64_t x) {
        bsr_idx_t idx;
        _BitScanReverse64(&idx, x); // ignore bool retval
        return idx;
    }
#elif defined(__GNUC__)

  #ifdef __clang__
    static inline unsigned BSR64(uint64_t x) {
        return 63-__builtin_clzll(x);
      // gcc/ICC can't optimize this back to just BSR, but clang can and doesn't provide alternate intrinsics
    }
  #else
    #define BSR64 __builtin_ia32_bsrdi
  #endif

    #include <x86intrin.h>
    #define BSR32(x) _bit_scan_reverse(x)

#endif
;; x64 MSVC 19.16 -O2
unsigned int test32(unsigned int) PROC                                    ; test32, COMDAT
        bsr     eax, ecx
        ret     0
unsigned int test32(unsigned int) ENDP                                    ; test32
# clang -O3 -march=haswell   is too "smart?" for its own good:
test32(unsigned int):
        lzcnt   eax, edi
        xor     eax, 31
        ret
# gcc8.2 -O3 -march=haswell
test32(unsigned int):
        bsr     eax, edi
        ret
# ICC19 -O3 -march=haswell
test32(unsigned int):
        bsr       eax, edi                                      #15.9
        ret                                                     #41.12
#ifdef __GNUC__
unsigned badgcc(uint64_t x) {
    return 63 - __builtin_clzll(x);
}
#endif
# gcc8.2 -O3
badgcc(unsigned long):
        bsr     rdi, rdi
        mov     eax, 63
        xor     rdi, 63
        sub     eax, edi
        ret
# ICC19.0.1 -O3
badgcc(unsigned long):
        mov       rax, -1                                       #46.17
        bsr       rdx, rdi                                      #46.17
        cmove     rdx, rax                                      #46.17
        neg       rdx                                           #46.17
        add       rdx, 63                                       #46.17
        neg       edx                                           #46.17
        add       edx, 63                                       #46.17
        mov       eax, edx                                      #46.17
        ret                                                     #46.17
static public final int msb(int n) {
    n |= n >>> 1;  
    n |= n >>> 2; 
    n |= n >>> 4; 
    n |= n >>> 8; 
    n |= n >>> 16; 
    n >>>= 1;
    n += 1; 
    return n;
}
static public final int msb_index(int n) {

    final int[] multiply_de_bruijn_bit_position = {
        0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8, 
        31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
    };
    return multiply_de_bruijn_bit_position[(msb(n) * 0x077CB531) >>> 27];
}