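C++: fastest data structure for many searches

I am coding in C++. I need a data structure for a set of sorted strings. I will insert all the strings once and never update the structure afterwards, but I will search it very often. All I need to know is whether a given string is present in the structure. I expect about 100 strings in the list. Which structure would be faster? At first I was thinking of a hash map, but I read somewhere that for such a small number of elements a binary search over a vector would work better (since the strings are sorted).

c++ vector data-structures hashmap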


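Unless you are doing hundreds of millions of searches per second, you will not be able to tell the difference. If you are doing hundreds of millions of searches per second, try a radix tree. It is very expensive in memory, but with a data set this small that does not matter.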
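To make the memory/speed trade-off concrete, here is a minimal sketch of a byte-wise trie, which is the uncompressed cousin of a radix tree (a real radix tree would additionally merge single-child chains). The class name `ByteTrie` and its interface are made up for illustration, and C++17 is assumed.

```cpp
#include <array>
#include <memory>
#include <string>
#include <string_view>
#include <vector>

// Byte-wise trie: every node owns a 256-entry child table.
// Lookup cost depends only on the length of the query, not on the key count,
// but each node costs 256 pointers, which is why it is memory-hungry.
class ByteTrie {
public:
    explicit ByteTrie(const std::vector<std::string>& keys) {
        for (const auto& k : keys) insert(k);
    }

    bool contains(std::string_view key) const {
        const Node* node = &root_;
        for (unsigned char c : key) {
            node = node->child[c].get();
            if (node == nullptr) return false;  // no stored key has this prefix
        }
        return node->terminal;  // true only if a key ends exactly here
    }

private:
    struct Node {
        std::array<std::unique_ptr<Node>, 256> child{};
        bool terminal = false;
    };

    void insert(std::string_view key) {
        Node* node = &root_;
        for (unsigned char c : key) {
            auto& next = node->child[c];
            if (!next) next = std::make_unique<Node>();
            node = next.get();
        }
        node->terminal = true;
    }

    Node root_;
};
```

A lookup touches one node per input character and never compares whole strings, at the price of roughly 2 KB per node on a 64-bit platform.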

Once it is written, profile it.

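The best (and only) way to tell which structure is fastest in a particular situation is to actually benchmark/measure it with the different data structures. Then pick the fastest.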

Or in other words: measuring your code gives you an advantage over people who think they are too smart to measure.
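A rough sketch of what such a measurement could look like, assuming C++14 or later. The helper names `ns_per_lookup`, `Timings` and `compare` are invented for this example; a serious benchmark would also need warm-up runs, realistic query data and a dedicated benchmarking library.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Runs `lookup` over all queries `rounds` times and returns the average
// nanoseconds per call. `hits` is accumulated and handed back to the caller
// so the compiler cannot discard the loop as dead code.
template <typename Lookup>
double ns_per_lookup(Lookup lookup, const std::vector<std::string>& queries,
                     int rounds, std::size_t& hits) {
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < rounds; ++r)
        for (const auto& q : queries) hits += lookup(q) ? 1 : 0;
    const std::chrono::duration<double, std::nano> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / (static_cast<double>(rounds) * queries.size());
}

struct Timings {
    double vector_ns = 0.0;
    double set_ns = 0.0;
    std::size_t vector_hits = 0;
    std::size_t set_hits = 0;
};

// Compares binary search on a sorted vector with std::unordered_set.
// Precondition: sorted_keys is sorted and queries is not empty.
Timings compare(const std::vector<std::string>& sorted_keys,
                const std::vector<std::string>& queries, int rounds) {
    const std::unordered_set<std::string> set(sorted_keys.begin(),
                                              sorted_keys.end());
    Timings t;
    t.vector_ns = ns_per_lookup(
        [&](const std::string& q) {
            return std::binary_search(sorted_keys.begin(), sorted_keys.end(), q);
        },
        queries, rounds, t.vector_hits);
    t.set_ns = ns_per_lookup(
        [&](const std::string& q) { return set.count(q) != 0; },
        queries, rounds, t.set_hits);
    return t;
}
```

The two hit counts should always agree; if they do not, one of the candidates is wrong and its timing is meaningless.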

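For a list as small as the 100 elements you mention in the question, it makes little difference which structure/algorithm you use, because the time gained is probably negligible, unless your program performs the search very often.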

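Use std::unordered_set, which fits your case well. If you also need to iterate in order, you can use std::set instead.

If profiling then shows that you spend all your time querying the data structure, that is the time to ask another question (with the exact code you will be using).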
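A minimal sketch of that suggestion, with a hypothetical wrapper class `StringSet` that is filled once and then only queried:

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Built once from the full list of strings, never modified afterwards.
class StringSet {
public:
    explicit StringSet(const std::vector<std::string>& keys)
        : set_(keys.begin(), keys.end()) {}

    // Average O(1): one hash computation plus, usually, one string comparison.
    bool contains(const std::string& s) const {
        return set_.find(s) != set_.end();
    }

private:
    std::unordered_set<std::string> set_;
};
```

Swapping the member for `std::set<std::string>` (and including `<set>`) keeps the same interface and adds ordered iteration, at the cost of O(log n) comparisons per lookup.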

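Assuming you are talking about "full-size" CPUs [1], a binary search over strings, even with only 100 elements, is likely to be quite slow, at least relative to other solutions. You may suffer several branch mispredictions per search, and you will probably end up examining each character of the input string several times (because you have to repeat a strcmp at every node of the binary search).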
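For reference, this is roughly what the binary-search candidate looks like with `std::vector<std::string>`, together with a small helper that counts how many full string comparisons a single lookup performs. Both function names are made up for this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Precondition: sorted_keys is sorted with operator< (what std::sort produces).
bool contains_sorted(const std::vector<std::string>& sorted_keys,
                     const std::string& needle) {
    return std::binary_search(sorted_keys.begin(), sorted_keys.end(), needle);
}

// Same search, but counts the string comparisons made for one lookup.
// Each comparison restarts at the first character of both strings.
std::size_t comparisons_for(const std::vector<std::string>& sorted_keys,
                            const std::string& needle) {
    std::size_t count = 0;
    (void)std::binary_search(sorted_keys.begin(), sorted_keys.end(), needle,
                             [&count](const std::string& a, const std::string& b) {
                                 ++count;
                                 return a < b;
                             });
    return count;
}
```

For 100 keys the count is on the order of log2(100), so about seven comparisons per lookup, each of which is a data-dependent branch and re-reads the common prefix of the two strings.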

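As others have pointed out, the only way to really know is to measure, but to do that you still need to be able to work out what the candidates are in the first place! Besides, it is not always possible to measure in a realistic scenario, because such a scenario may not even be known (imagine, for example, designing a library function that will be used widely across many different situations).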

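Finally, understanding what is likely to be fast lets you rule out candidates you know will perform badly, and lets you double-check your test results against your intuition: if something is much slower than you expected, it is worth checking why (did the compiler do something stupid?), and if something is much faster, then maybe it is time to update your intuition.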

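So I will take a stab at what is actually going to be fast, assuming speed really matters here and you can spend some time validating a complex solution. As a baseline, a simple implementation might take 100 ns and a really optimized one perhaps 10 ns. So if you spend 10 hours of engineering time on this, you will have to call the function 400 billion times just to earn your 10 hours back [5]. Once you factor in the risk of bugs, maintenance complexity and other overheads, you will want to be sure you are calling the function a great many times before trying to optimize it. Such functions are rare, but they certainly exist.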
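The break-even figure follows from dividing the engineering time by the time saved per call. A small compile-time check of that arithmetic, assuming C++14 (the function name `break_even_calls` is invented here):

```cpp
#include <cstdint>

// Number of calls needed before the time saved per call pays back the
// engineering time invested. Requires slow_ns > fast_ns.
constexpr std::int64_t break_even_calls(std::int64_t hours_spent,
                                        std::int64_t slow_ns,
                                        std::int64_t fast_ns) {
    constexpr std::int64_t ns_per_hour = 3600LL * 1000 * 1000 * 1000;
    return hours_spent * ns_per_hour / (slow_ns - fast_ns);
}

// 10 h = 3.6e13 ns; saving 90 ns per call gives 4e11 calls.
static_assert(break_even_calls(10, 100, 10) == 400'000'000'000LL,
              "10 hours at 90 ns saved per call is 400 billion calls");
```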

That said, you have left out a lot of information that would be needed to help design a fast solution, such as:

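1. Is the input to your search function a std::string, a const char*, or something else?

2. What are the average and maximum string lengths?

3. Will most of your searches be successful or unsuccessful?

4. Can you accept some false positives?

5. Is the set of strings known at compile time, or can you live with a long initialization phase?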

The answers to the questions above help you partition the design space as described below.

Bloom Filter

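If, per (4), you can accept a (controllable) false positive rate [2], or, per (3), most of your searches will be unsuccessful, then you should consider a Bloom filter. For example, you could use a 1024-bit (128-byte) filter and take a 60-bit hash of the string, split into six 10-bit values that each index into the filter. This gives a false positive rate of < 1%.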
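The < 1% figure can be checked against the usual approximation for a Bloom filter with m bits, n inserted keys and k hash functions, (1 - e^(-kn/m))^k. The helper below is only a sketch of that formula; its name is invented, and the formula assumes independent, uniformly distributed hash values.

```cpp
#include <cmath>

// Approximate false positive rate of a Bloom filter:
// (1 - exp(-k * n / m))^k for m bits, n keys and k hash functions.
double bloom_false_positive_rate(double bits, double keys, double hashes) {
    return std::pow(1.0 - std::exp(-hashes * keys / bits), hashes);
}
```

For the numbers in this answer, `bloom_false_positive_rate(1024, 100, 6)` comes out at roughly 0.8%.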

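The advantage is that, outside of the hash calculation, it is independent of the length of the strings, and it does not depend on the matching behavior (for example, a search that relies on repeated string comparisons will be slower if the strings tend to share long common prefixes).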

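If you can accept false positives, you are done. If you need the answer to always be correct but expect most searches to be unsuccessful, use it as a filter: if the Bloom filter returns false (the usual case) you are done, but if it returns true you need to double-check against one of the always-correct structures discussed below. So the common case is fast, yet the right answer is always returned.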
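A minimal sketch of that two-stage lookup, using the 1024-bit / six 10-bit-index layout described above. The class name `FilteredSet`, the choice of 64-bit FNV-1a as the hash and the use of `std::unordered_set` as the always-correct fallback are all assumptions made for this example; any well-mixed hash with at least 60 bits would serve.

```cpp
#include <bitset>
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

// 64-bit FNV-1a, picked only because it is short (an assumption of this sketch).
inline std::uint64_t fnv1a64(const std::string& s) {
    std::uint64_t h = 14695981039346656037ULL;
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;
    }
    return h;
}

class FilteredSet {
public:
    explicit FilteredSet(const std::vector<std::string>& keys)
        : exact_(keys.begin(), keys.end()) {
        for (const auto& k : keys) {
            std::uint64_t h = fnv1a64(k);
            // Use the low 60 bits as six 10-bit indices into the 1024-bit filter.
            for (int i = 0; i < 6; ++i, h >>= 10) bits_.set(h & 1023u);
        }
    }

    // Never false for an inserted key, but may be true for a key that was
    // never inserted (a false positive).
    bool maybe_contains(const std::string& s) const {
        std::uint64_t h = fnv1a64(s);
        for (int i = 0; i < 6; ++i, h >>= 10)
            if (!bits_.test(h & 1023u)) return false;
        return true;
    }

    // Always correct: the filter rejects most misses cheaply, and the exact
    // set is consulted only when the filter says "maybe".
    bool contains(const std::string& s) const {
        return maybe_contains(s) && exact_.count(s) != 0;
    }

private:
    std::bitset<1024> bits_;
    std::unordered_set<std::string> exact_;
};
```

If false positives are acceptable, `maybe_contains` alone is enough and the exact set can be dropped entirely.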

Perfect Hashing

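If the set of 100 strings is known at compile time, or you can afford some one-time heavy work to preprocess the strings, then you could consider a perfect hash. If you have a compile-time known search set, you can just feed the strings into a perfect hash generator such as gperf and it will output a hash function and a lookup table.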
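When the set is only known at start-up, the "one-time heavy work" can be as simple as trying seeds until a seeded hash places every key in its own slot. The sketch below does exactly that; it is a naive stand-in and not the algorithm gperf uses. The class name `PerfectHashSet`, the seeded FNV-1a variant and the default seed limit are all assumptions, and C++17 is required.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

class PerfectHashSet {
public:
    // Tries seeds 0..max_seeds-1 until no two keys share a slot.
    // Returns std::nullopt if none works (always the case for duplicate keys).
    static std::optional<PerfectHashSet> build(std::vector<std::string> keys,
                                               std::size_t table_size,
                                               std::uint64_t max_seeds = 100000) {
        if (table_size == 0 || table_size < keys.size()) return std::nullopt;
        for (std::uint64_t seed = 0; seed < max_seeds; ++seed) {
            std::vector<int> slots(table_size, -1);  // -1 marks an empty slot
            bool ok = true;
            for (std::size_t i = 0; i < keys.size() && ok; ++i) {
                int& slot = slots[hash(keys[i], seed) % table_size];
                ok = (slot == -1);
                slot = static_cast<int>(i);
            }
            if (ok) return PerfectHashSet(std::move(keys), std::move(slots), seed);
        }
        return std::nullopt;
    }

    // One hash, one table read and at most one string comparison.
    bool contains(std::string_view s) const {
        const int i = slots_[hash(s, seed_) % slots_.size()];
        return i >= 0 && keys_[static_cast<std::size_t>(i)] == s;
    }

private:
    PerfectHashSet(std::vector<std::string> keys, std::vector<int> slots,
                   std::uint64_t seed)
        : keys_(std::move(keys)), slots_(std::move(slots)), seed_(seed) {}

    // FNV-1a with the seed folded into the initial state (illustrative only).
    static std::uint64_t hash(std::string_view s, std::uint64_t seed) {
        std::uint64_t h = 14695981039346656037ULL ^ seed;
        for (unsigned char c : s) {
            h ^= c;
            h *= 1099511628211ULL;
        }
        return h ^ (h >> 32);
    }

    std::vector<std::string> keys_;
    std::vector<int> slots_;
    std::uint64_t seed_;
};
```

With this naive search the table has to be much larger than the key count (for 100 keys, something like 1024 slots) or a collision-free seed becomes very unlikely; dedicated generators produce far more compact tables.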

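For example, I just fed 100 random English words [3] into gperf, and it generated a hash function that only needs to look at two characters to uniquely distinguish each word, like this: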

static unsigned int hash (const char *str, unsigned int len)
{
  static unsigned char asso_values[] =
    {
      115, 115, 115, 115, 115,  81,  48,   1,  77,  72,
      115,  38,  81, 115, 115,   0,  73,  40,  44, 115,
       32, 115,  41,  14,   3, 115, 115,  30, 115, 115,
      115, 115, 115, 115, 115, 115, 115,  16,  18,   4,
       31,  55,  13,  74,  51,  44,  32,  20,   4,  28,
       45,   4,  19,  64,  34,   0,  21,   9,  40,  70,
       16,   0, 115, 115, 115, 115, 115, 115, 115, 115,
      /* most of the table omitted */
    };
  register int hval = len;

  switch (hval)
    {
      default:
        hval += asso_values[(unsigned char)str[3]+1];
      /*FALLTHROUGH*/
      case 3:
      case 2:
      case 1:
        hval += asso_values[(unsigned char)str[0]];
        break;
    }
  return hval;
}

in_word_set:                            # @in_word_set
        push    rbx
        lea     eax, [rsi - 3]
        xor     ebx, ebx
        cmp     eax, 19
        ja      .LBB0_7
        lea     ecx, [rsi - 1]
        mov     eax, 3
        cmp     ecx, 3
        jb      .LBB0_3
        movzx   eax, byte ptr [rdi + 3]
        movzx   eax, byte ptr [rax + hash.asso_values+1]
        add     eax, esi
.LBB0_3:
        movzx   ecx, byte ptr [rdi]
        movzx   edx, byte ptr [rcx + hash.asso_values]
        cdqe
        add     rax, rdx
        cmp     eax, 114
        ja      .LBB0_6
        mov     rbx, qword ptr [8*rax + in_word_set.wordlist]
        cmp     cl, byte ptr [rbx]
        jne     .LBB0_6
        add     rdi, 1
        lea     rsi, [rbx + 1]
        call    strcmp
        test    eax, eax
        je      .LBB0_7
.LBB0_6:
        xor     ebx, ebx
.LBB0_7:
        mov     rax, rbx
        pop     rbx
        ret

Now your hash function is fast and will most likely be well predicted (if you do not ...)

/** 
 * Returns a canonical representation for the string object. 
 * <p> 
 * A pool of strings, initially empty, is maintained privately by the 
 * class <code>String</code>. 
 * <p> 
 * When the intern method is invoked, if the pool already contains a 
 * string equal to this <code>String</code> object as determined by 
 * the {@link #equals(Object)} method, then the string from the pool is 
 * returned. Otherwise, this <code>String</code> object is added to the 
 * pool and a reference to this <code>String</code> object is returned. 
 * <p> 
 * It follows that for any two strings <code>s</code> and <code>t</code>, 
 * <code>s.intern()&nbsp;==&nbsp;t.intern()</code> is <code>true</code> 
 * if and only if <code>s.equals(t)</code> is <code>true</code>. 
 * <p> 
 * All literal strings and string-valued constant expressions are 
 * interned. String literals are defined in section 3.10.5 of the 
 * <cite>The Java&trade; Language Specification</cite>. 
 * 
 * @return  a string that has the same contents as this string, but is 
 *          guaranteed to be from a pool of unique strings. 
 */  
public native String intern();

JNIEXPORT jobject JNICALL
Java_java_lang_String_intern(JNIEnv *env, jobject this)  
{  
    return JVM_InternString(env, this);  
}

/* 
* java.lang.String 
*/  
JNIEXPORT jstring JNICALL  
JVM_InternString(JNIEnv *env, jstring str);

// String support ///////////////////////////////////////////////////////////////////////////  
JVM_ENTRY(jstring, JVM_InternString(JNIEnv *env, jstring str))  
  JVMWrapper("JVM_InternString");  
  JvmtiVMObjectAllocEventCollector oam;  
  if (str == NULL) return NULL;  
  oop string = JNIHandles::resolve_non_null(str);  
  oop result = StringTable::intern(string, CHECK_NULL);
  return (jstring) JNIHandles::make_local(env, result);  
JVM_END

oop StringTable::intern(Handle string_or_null, jchar* name,  
                        int len, TRAPS) {  
  unsigned int hashValue = java_lang_String::hash_string(name, len);  
  int index = the_table()->hash_to_index(hashValue);  
  oop string = the_table()->lookup(index, name, len, hashValue);  
  // Found  
  if (string != NULL) return string;  
  // Otherwise, add to symbol to table  
  return the_table()->basic_add(index, string_or_null, name, len,  
                                hashValue, CHECK_NULL);  
}

oop StringTable::lookup(int index, jchar* name,  
                        int len, unsigned int hash) {  
  for (HashtableEntry<oop>* l = bucket(index); l != NULL; l = l->next()) {  
    if (l->hash() == hash) {  
      if (java_lang_String::equals(l->literal(), name, len)) {  
        return l->literal();  
      }  
    }  
  }  
  return NULL;  
}

String* myStrings[256];

String* myStrings[256][256][256];

char charToSlot[256]; 
String* myStrings[3];

char charToSlot[256]; 
String* myStrings[26];

char charToSlot[256]; 
String* myStrings[26][26][26];

char charToSlot[256]; 
String**** myStrings;

String* myStrings[30][256][256]...

String* myStrings[8];

String* myStrings[8][8][8][8]...

typedef struct {
  char book_name[30];
  char book_description[61];
  char book_categories[9];
  int book_code;  
} my_book_t;

// 160000 size, 10 index field slot
bin_array_t *all_books = bin_array_create(160000, 10);

if (bin_add_index(all_books, my_book_t, book_name, __def_cstr_sorted_cmp_func__)
&& bin_add_index(all_books, my_book_t, book_categories, __def_cstr_sorted_cmp_func__)
&& bin_add_index(all_books, my_book_t, book_code, __def_int_sorted_cmp_func__)
   ) {

    my_book_t *bk = malloc(sizeof(my_book_t));
    strcpy(bk->book_name, "The Duck Story");
    ....
    ...
    bin_array_push(all_books, bk );
}

int data_search = 100;
bin_array_rs *bk_rs = (bin_array_rs*) ba_search_eq(all_books, my_book_t,
book_code, &data_search);
my_book_t **bks = (my_book_t**)bk_rs->ptrs; // Convert to pointer array
// Loop over every matching book
for (int i = 0; i < bk_rs->size; i++) {
   my_book_t *bk = bks[i];
    ....
}

 // Join Solution
bin_array_rs *bk_rs=bin_intersect_rs(
    bin_intersect_rs(ba_search_gt(...), ba_search_lt(...), true),
    bin_intersect_rs(ba_search_gt(...), ba_search_lt(....), true),
                             true);

 // Union Solution
bin_array_rs *bk_rs= bin_union_rs(
    bin_union_rs(ba_search_gt(...), ba_search_lt(...), true),
    bin_union_rs(ba_search_gt(...), ba_search_lt(....), true),
                             true);