C ++ std :: unordered_map中使用的默认哈希函数是什么？

我在用

unordered_map<string, int>

和

 unordered_map<int, int>

每种情况下使用什么散列函数，每种情况下碰撞的机会是多less？我将分别插入唯一的string和唯一的int作为键。

我有兴趣知道散列函数在string和整型键以及它们的碰撞统计量的情况下的algorithm。

使用函数对象std::hash<> 。

所有内置types都有标准专业化，还有一些其他标准库types，如std::string和std::thread 。请参阅完整列表的链接。

对于在std::unordered_map使用的其他types，您将不得不专门化std::hash<>或创build您自己的函数对象。

碰撞的机会是完全依赖于实现的，但是考虑到整数限制在一个确定的范围内的事实，而理论上string是无限长的，我认为与string碰撞的机会更大。

至于GCC的实现，内置types的专业化只是返回位模式。以下是他们在bits/functional_hash.h中的定义：

  /// Partial specializations for pointer types. template<typename _Tp> struct hash<_Tp*> : public __hash_base<size_t, _Tp*> { size_t operator()(_Tp* __p) const noexcept { return reinterpret_cast<size_t>(__p); } }; // Explicit specializations for integer types. #define _Cxx_hashtable_define_trivial_hash(_Tp) \ template<> \ struct hash<_Tp> : public __hash_base<size_t, _Tp> \ { \ size_t \ operator()(_Tp __val) const noexcept \ { return static_cast<size_t>(__val); } \ }; /// Explicit specialization for bool. _Cxx_hashtable_define_trivial_hash(bool) /// Explicit specialization for char. _Cxx_hashtable_define_trivial_hash(char) /// ...

std::string的特化定义如下：

 #ifndef _GLIBCXX_COMPATIBILITY_CXX0X /// std::hash specialization for string. template<> struct hash<string> : public __hash_base<size_t, string> { size_t operator()(const string& __s) const noexcept { return std::_Hash_impl::hash(__s.data(), __s.length()); } };

一些进一步的search引导我们：

 struct _Hash_impl { static size_t hash(const void* __ptr, size_t __clength, size_t __seed = static_cast<size_t>(0xc70f6907UL)) { return _Hash_bytes(__ptr, __clength, __seed); } ... }; ... // Hash function implementation for the nontrivial specialization. // All of them are based on a primitive that hashes a pointer to a // byte array. The actual hash algorithm is not guaranteed to stay // the same from release to release -- it may be updated or tuned to // improve hash quality or speed. size_t _Hash_bytes(const void* __ptr, size_t __len, size_t __seed);

_Hash_bytes是来自libstdc++的外部函数。更多的search引导我到这个文件，其中指出：

 // This file defines Hash_bytes, a primitive used for defining hash // functions. Based on public domain MurmurHashUnaligned2, by Austin // Appleby. http://murmurhash.googlepages.com/

所以默认的哈希algorithmGCC使用的string是MurmurHashUnaligned2。

尽pipe散列algorithm是依赖于编译器的，但是我将介绍它用于GCC C ++ 11。 @Avidan Borisov敏锐地发现，用于string的GCC哈希algorithm是Austin Appleby的“MurmurHashUnaligned2”。我做了一些search，并在Github上find了GCC的镜像副本。因此：

用于unordered_map （哈希表模板）和unordered_set （哈希集模板）的GCC C ++ 11哈希函数如下所示。

感谢Avidan Borisov对于使用GCC C ++ 11哈希函数的问题进行的背景研究，指出GCC使用Austin Appleby的“MurmurHashUnaligned2”实现（ http://murmurhash.googlepages.com/ ）。
在文件“gcc / libstdc ++ – v3 / libsupc ++ / hash_bytes.cc”中，这里（ https://github.com/gcc-mirror/gcc/blob/master/libstdc++-v3/libsupc++/hash_bytes.cc ），我发现实现。例如，下面是“32位size_t”返回值的值（2017年8月11日提取）

码：

 // Implementation of Murmur hash for 32-bit size_t. size_t _Hash_bytes(const void* ptr, size_t len, size_t seed) { const size_t m = 0x5bd1e995; size_t hash = seed ^ len; const char* buf = static_cast<const char*>(ptr); // Mix 4 bytes at a time into the hash. while (len >= 4) { size_t k = unaligned_load(buf); k *= m; k ^= k >> 24; k *= m; hash *= m; hash ^= k; buf += 4; len -= 4; } // Handle the last few bytes of the input array. switch (len) { case 3: hash ^= static_cast<unsigned char>(buf[2]) << 16; [[gnu::fallthrough]]; case 2: hash ^= static_cast<unsigned char>(buf[1]) << 8; [[gnu::fallthrough]]; case 1: hash ^= static_cast<unsigned char>(buf[0]); hash *= m; }; // Do a few final mixes of the hash. hash ^= hash >> 13; hash *= m; hash ^= hash >> 15; return hash; }

对于其他散列函数，包括djb2和K＆R散列函数的两个版本 （一个显然很糟糕，一个相当不错），请参阅我的其他答案在这里： https : //stackoverflow.com/a/45641002/4561887 。

C ++ std :: unordered_map中使用的默认哈希函数是什么？

C ++：在多行string常量中是否有行尾的标准定义？

这个C ++代码在技术上会发生什么？

为什么我们不能自动推导出返回types？

std :: function vs模板

为什么有些人使用交换分配移动？

C ++最令人烦恼的parsing

primefacesvariables是否是无锁的？

uint8_t，uint16_t，…的格式说明符？

将最小可能的浮点数添加到浮点数

为什么一个const成员函数可以修改一个静态数据成员？