What is the default hash function used in C++ std::unordered_map?

c++ c++11 hash

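I am using unordered_map<string, int> and an unordered_map with int keys. What hash function is used in each case, and what is the chance of collision in each case? I will be inserting unique strings and unique ints as keys, respectively.

I am interested in the hash function algorithm used for string keys and for int keys, and in their collision statistics.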

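The function object std::hash<> is used.

Standard specializations exist for all built-in types and for some other standard library types, such as std::string and std::thread::id. See the std::hash documentation for the full list.

For other types to be used in a std::unordered_map, you have to specialize std::hash<> or create your own function object.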
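To make the two options concrete, here is a minimal sketch. The Employee type, the EmployeeIdHash functor and the combine formula are all made up for illustration; none of them come from the standard or from any library.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical key type, used only for illustration.
struct Employee {
    std::string name;
    int id;
    bool operator==(const Employee& other) const {
        return name == other.name && id == other.id;
    }
};

// Option 1: specialize std::hash for the user-defined type.
namespace std {
template <>
struct hash<Employee> {
    size_t operator()(const Employee& e) const noexcept {
        size_t h1 = hash<string>{}(e.name);
        size_t h2 = hash<int>{}(e.id);
        // Ad hoc combine step (an assumption, not a standard-mandated formula).
        return h1 ^ (h2 + 0x9e3779b9u + (h1 << 6) + (h1 >> 2));
    }
};
}  // namespace std

// Option 2: a standalone function object, passed as the third template argument.
struct EmployeeIdHash {
    std::size_t operator()(const Employee& e) const noexcept {
        return std::hash<int>{}(e.id);
    }
};

// Uses both maps and returns the sum of the two stored values.
inline int lookup_demo() {
    std::unordered_map<Employee, int> by_default;  // picks up std::hash<Employee>
    by_default[Employee{"Ann", 1}] = 10;

    std::unordered_map<Employee, int, EmployeeIdHash> by_id;  // explicit hasher
    by_id[Employee{"Bob", 2}] = 20;

    return by_default.at(Employee{"Ann", 1}) + by_id.at(Employee{"Bob", 2});
}
```

Either way the key type also needs operator== (or a custom equality predicate), because the container compares keys that land in the same bucket.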

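The chance of collision is entirely implementation-dependent, but since integers are confined to a fixed range while strings can in theory be arbitrarily long, I would expect collisions to be considerably more likely with strings.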
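Since the numbers depend on the implementation, the practical route is to measure them with your own keys. The sketch below uses the standard bucket interface (bucket_count and bucket_size); the helper names are my own. Note that it counts bucket collisions, which also depend on the current bucket count, not only on the raw hash values.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Number of elements that share a bucket with an earlier element,
// i.e. the sum over all buckets of (bucket size - 1) for non-empty buckets.
template <typename Map>
std::size_t count_extra_elements(const Map& m) {
    std::size_t extra = 0;
    for (std::size_t b = 0; b < m.bucket_count(); ++b) {
        std::size_t n = m.bucket_size(b);
        if (n > 1) extra += n - 1;
    }
    return extra;
}

// Number of buckets that hold at least one element.
template <typename Map>
std::size_t count_nonempty_buckets(const Map& m) {
    std::size_t used = 0;
    for (std::size_t b = 0; b < m.bucket_count(); ++b) {
        if (m.bucket_size(b) > 0) ++used;
    }
    return used;
}
```

Fill a map with a representative sample of your real keys and call count_extra_elements on it; the result will differ between standard library implementations and between load factors.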

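As for the implementation in GCC, the specializations for the built-in integer and pointer types simply return the bit pattern. Here is how they are defined in bits/functional_hash.h: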

  /// Partial specializations for pointer types.
  template<typename _Tp>
    struct hash<_Tp*> : public __hash_base<size_t, _Tp*>
    {
      size_t
      operator()(_Tp* __p) const noexcept
      { return reinterpret_cast<size_t>(__p); }
    };

  // Explicit specializations for integer types.
#define _Cxx_hashtable_define_trivial_hash(_Tp)     \
  template<>                        \
    struct hash<_Tp> : public __hash_base<size_t, _Tp>  \
    {                                                   \
      size_t                                            \
      operator()(_Tp __val) const noexcept              \
      { return static_cast<size_t>(__val); }            \
    };

  /// Explicit specialization for bool.
  _Cxx_hashtable_define_trivial_hash(bool)

  /// Explicit specialization for char.
  _Cxx_hashtable_define_trivial_hash(char)

  /// ...
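A quick way to check whether your own standard library behaves like the listing above is to compare std::hash<int> against a plain cast. This is only a probe; the function name is mine, and the result is implementation-specific, so other libraries are free to return false here.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// True when std::hash<int> maps v to static_cast<size_t>(v), which is what
// the trivial specializations shown above do. Not guaranteed by the standard.
inline bool hash_is_identity(int v) {
    return std::hash<int>{}(v) == static_cast<std::size_t>(v);
}
```

If this returns true for your keys, two distinct int keys can never produce the same hash value, and any collisions you observe come purely from reducing the hash to a bucket index.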

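The specialization for std::string (reproduced in the listings at the end of this page) forwards to std::_Hash_impl::hash. Some further searching leads us to: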

struct _Hash_impl
{
  static size_t
  hash(const void* __ptr, size_t __clength,
       size_t __seed = static_cast<size_t>(0xc70f6907UL))
  { return _Hash_bytes(__ptr, __clength, __seed); }
  ...
};
...
// Hash function implementation for the nontrivial specialization.
// All of them are based on a primitive that hashes a pointer to a
// byte array. The actual hash algorithm is not guaranteed to stay
// the same from release to release -- it may be updated or tuned to
// improve hash quality or speed.
size_t
_Hash_bytes(const void* __ptr, size_t __len, size_t __seed);

Thus the default hashing algorithm GCC uses for strings is MurmurHashUnaligned2.

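GCC C++11 uses "MurmurHashUnaligned2", by Austin Appleby

Although the hashing algorithms are compiler-dependent, I will present the one for GCC C++11. The GCC hash algorithm for strings is "MurmurHashUnaligned2", by Austin Appleby. I did some searching and found a mirrored copy of GCC on GitHub. Therefore: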

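The GCC C++11 hashing functions used for std::unordered_map (a hash table template) and std::unordered_set (a hash set template) appear to be as follows.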

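The background research in the answer above, on the question of which hash functions GCC C++11 uses, states that GCC uses an implementation of "MurmurHashUnaligned2", by Austin Appleby.
In the file "gcc/libstdc++-v3/libsupc++/hash_bytes.cc" I found the implementations. Here is the one for the "32-bit size_t" return value, for example (pulled 11 Aug 2017).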

Code (see the _Hash_bytes listing at the end of this page):

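So, if you want to use MurmurHash3 in open source software, personal projects, or proprietary software, including for implementing your own hash table in C, go for it!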

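If you would like build instructions for building and testing his MurmurHash3 code, I have written some in my fork. Hopefully they get accepted and end up in his main repo. Until then, refer to the build instructions in my fork.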

For additional hash functions, including djb2 and the two versions of the K&R hash function (one apparently terrible, one pretty good), see my other answer.
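For reference, djb2 is short enough to show in full. This is a sketch of the commonly cited multiply-by-33 variant, using a 32-bit accumulator as an assumption; it is not taken from the linked answer.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// djb2: start at 5381, then hash = hash * 33 + byte for every input byte.
// Unsigned arithmetic wraps modulo 2^32 by definition.
inline std::uint32_t djb2(const std::string& s) {
    std::uint32_t hash = 5381u;
    for (unsigned char c : s) {
        hash = hash * 33u + c;
    }
    return hash;
}
```

It can be plugged into std::unordered_map through a small function object that calls djb2 and returns the result as std::size_t.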

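I think it conforms to the standard. Not sure what unordered_map is.

unordered_map is like a hash table...

Did the default hash function change between C++98 and C++11?

You tagged this C++11 but asked about TR1. Which is it?

Sorry @John Dibling, I tagged it C++11. I also edited the title, because I think the question makes more sense this way; now the answers can refer to the official standard. Feel free to change it back; I know you have more experience on this site than I do.

Then why do you mention the tr1 namespace?

I am interested in the hash function algorithm for string and int keys, and in their collision statistics.

@Medicine: It is not specified by the standard; it is up to the library implementation to decide the best way to do this. You will have to look at your local implementation. For example, this answer now includes GCC's specific choice.

@Medicine: The default hash algorithm in Visual Studio (since VS2012) is FNV-1a.

Thanks @Avidanborisov. My strings are all unique, between 14 and 21 characters long, and consist of English letters and digits.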
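For readers who want to see what FNV-1a looks like, here is a minimal 32-bit sketch using the published FNV offset basis and prime. It illustrates the algorithm only; it is not Visual Studio's actual source, and the width used by that library may differ by platform.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// 32-bit FNV-1a: xor each byte into the hash, then multiply by the FNV prime.
inline std::uint32_t fnv1a_32(const std::string& s) {
    std::uint32_t hash = 2166136261u;  // FNV offset basis (0x811c9dc5)
    for (unsigned char c : s) {
        hash ^= c;
        hash *= 16777619u;             // FNV prime (0x01000193)
    }
    return hash;
}
```

The only difference from plain FNV-1 is the order of the two steps: FNV-1a xors first and multiplies second.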

#ifndef _GLIBCXX_COMPATIBILITY_CXX0X
  /// std::hash specialization for string.
  template<>
    struct hash<string>
    : public __hash_base<size_t, string>
    {
      size_t
      operator()(const string& __s) const noexcept
      { return std::_Hash_impl::hash(__s.data(), __s.length()); }
    };

// This file defines Hash_bytes, a primitive used for defining hash
// functions. Based on public domain MurmurHashUnaligned2, by Austin
// Appleby.  http://murmurhash.googlepages.com/

// Implementation of Murmur hash for 32-bit size_t.
size_t _Hash_bytes(const void* ptr, size_t len, size_t seed)
{
  const size_t m = 0x5bd1e995;
  size_t hash = seed ^ len;
  const char* buf = static_cast<const char*>(ptr);

  // Mix 4 bytes at a time into the hash.
  while (len >= 4)
  {
    size_t k = unaligned_load(buf);
    k *= m;
    k ^= k >> 24;
    k *= m;
    hash *= m;
    hash ^= k;
    buf += 4;
    len -= 4;
  }

  // Handle the last few bytes of the input array.
  switch (len)
  {
    case 3:
      hash ^= static_cast<unsigned char>(buf[2]) << 16;
      [[gnu::fallthrough]];
    case 2:
      hash ^= static_cast<unsigned char>(buf[1]) << 8;
      [[gnu::fallthrough]];
    case 1:
      hash ^= static_cast<unsigned char>(buf[0]);
      hash *= m;
  };

  // Do a few final mixes of the hash.
  hash ^= hash >> 13;
  hash *= m;
  hash ^= hash >> 15;
  return hash;
}
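The listing above depends on an unaligned_load helper and on size_t being 32 bits, so it does not compile on its own. Below is a standalone sketch of the same mixing steps with a fixed 32-bit type and a memcpy-based load. The names murmur2_32 and load_u32 are my own, the tail switch is rewritten as if statements with the same effect, and the result is only expected to match libstdc++ on targets where size_t is 32 bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Reads 4 bytes without assuming alignment (stand-in for unaligned_load).
inline std::uint32_t load_u32(const char* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

inline std::uint32_t murmur2_32(const void* ptr, std::size_t len,
                                std::uint32_t seed) {
    const std::uint32_t m = 0x5bd1e995u;
    std::uint32_t hash = seed ^ static_cast<std::uint32_t>(len);
    const char* buf = static_cast<const char*>(ptr);

    // Mix 4 bytes at a time into the hash.
    while (len >= 4) {
        std::uint32_t k = load_u32(buf);
        k *= m;
        k ^= k >> 24;
        k *= m;
        hash *= m;
        hash ^= k;
        buf += 4;
        len -= 4;
    }

    // Handle the last 0..3 bytes (same effect as the fall-through switch).
    if (len >= 3) hash ^= static_cast<std::uint32_t>(static_cast<unsigned char>(buf[2])) << 16;
    if (len >= 2) hash ^= static_cast<std::uint32_t>(static_cast<unsigned char>(buf[1])) << 8;
    if (len >= 1) {
        hash ^= static_cast<std::uint32_t>(static_cast<unsigned char>(buf[0]));
        hash *= m;
    }

    // Final mixes.
    hash ^= hash >> 13;
    hash *= m;
    hash ^= hash >> 15;
    return hash;
}

// Convenience overload using the default seed shown in _Hash_impl above.
inline std::uint32_t murmur2_32(const std::string& s) {
    return murmur2_32(s.data(), s.size(), 0xc70f6907u);
}
```

On 64-bit targets libstdc++ uses a different 64-bit variant from the same file, so do not expect this function to reproduce std::hash<std::string> there.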

// MurmurHash3 was written by Austin Appleby, and is placed in the public
// domain. The author hereby disclaims copyright to this source code.