blog.ilvokhin.com

How is libstdc++ `std::unordered_map` implemented?

Introduction

We all love maps. We love hash maps even more. They should be fast and help solve a large number of problems. Have you ever wondered how they work under the hood? In this post I am going to explore the implementation details of the unordered associative containers from the C++ Standard Library (specifically GCC's libstdc++ implementation; I'll use revision b9b7981f3d for code examples).

Currently there are four types of unordered associative containers:

  1. std::unordered_set
  2. std::unordered_map
  3. std::unordered_multiset
  4. std::unordered_multimap

Usually, they are implemented on top of some kind of Hashtable container. I am going to jump directly into the implementation of this Hashtable container, because that's where all the interesting stuff is hidden.

I'll focus on the key/value containers with a unique set of keys (std::unordered_map); std::unordered_set has mostly similar logic. Multi-key containers are different beasts. They share a large portion of code with the unique-key containers, but their insertion and find logic is quite different, because the set of stored keys is actually a multiset (it allows multiple instances of the same key).

GCC's libstdc++ implementation can be found in the hashtable.h header; the class of interest is named _Hashtable. There will be a lot of names with leading underscores. Not everyone is used to such code, but Standard Library implementers have no other way to avoid collisions with user-defined names (macros, for example).

Policy-Based Design

The _Hashtable class is implemented by decomposition into policies and closely resembles GCC's Policy-Based Data Structures design. Some of these policies can be redefined by the user, for example _Alloc, _Equal and _Hash. Some depend on the container type, like _ExtractKey and _Value. Others, like _RangeHash and _RehashPolicy, can be defined only by library implementers.

template<typename _Key, typename _Value, typename _Alloc,
   typename _ExtractKey, typename _Equal,
   typename _Hash, typename _RangeHash, typename _Unused,
   typename _RehashPolicy, typename _Traits>
class _Hashtable
<...>

Data Layout

If you don't want to dive into the details, you can use this comment from the hashtable.h header to grasp the basics of _Hashtable data layout.

Nodes

One of the basic building blocks of the _Hashtable is a node. Each node is allocated from the heap and stores container data along with the metadata needed to maintain the hash table data structure. The actual content of the node depends on the stored data (more about it later).

The node itself is a compound entity consisting of several parts, some of them optional. The design of the node structs brings to mind matryoshka dolls, because they are nested into each other. Each more complex node type (with more data) inherits from a simpler node type (with a little less data). Let's walk through the components bottom up (from simpler to more complex).

_Hash_node_base is defined in the following way. It has only an _M_nxt field, which is a pointer to the next node of the hash table.

struct _Hash_node_base
{ 
  _Hash_node_base* _M_nxt;

<...>
};

The next one, _Hash_node_value_base, is a little more interesting (see the code here). _Hash_node_value_base has a _Value template parameter, which is the actual data stored in the container.

template<typename _Value>
  struct _Hash_node_value_base
  {
    typedef _Value value_type;

    __gnu_cxx::__aligned_buffer<_Value> _M_storage;

    <...>
  };

The _Value type is wrapped into __gnu_cxx::__aligned_buffer (a thin wrapper around std::aligned_storage) to decouple memory allocation from actual object creation.

The next struct is _Hash_node_code_cache, and it implements the hash value caching logic.

template<bool _Cache_hash_code>
  struct _Hash_node_code_cache
  { };

template<>
  struct _Hash_node_code_cache<true>
  { std::size_t  _M_hash_code; };

_Hash_node_code_cache uses template specialization to extend the struct with an additional _M_hash_code field. This optimization, I believe, is one of the reasons why the «matryoshka dolls»-like design for the node structs is used. This way, Empty Base Optimization (EBO) can be leveraged when _Hash_node_code_cache is extended through inheritance. And that's exactly what _Hash_node_value is doing:

template<typename _Value, bool _Cache_hash_code>
  struct _Hash_node_value
  : _Hash_node_value_base<_Value>
  , _Hash_node_code_cache<_Cache_hash_code>
  { };

The size of _Hash_node_value will be the same as the size of _Hash_node_value_base<_Value> when the template argument _Cache_hash_code is false, as _Hash_node_code_cache will be an empty struct.

The final piece of the puzzle is the _Hash_node combining everything above together:

template<typename _Value, bool _Cache_hash_code>
  struct _Hash_node
  : _Hash_node_base
  , _Hash_node_value<_Value, _Cache_hash_code>
  {
    <...>
  };

Below is a picture (original) of the _Hash_node struct data layout to better visualize what's going on.

Summarizing, _Hash_node (directly or through its base structs) contains the following data.

  1. _Hash_node_base* _M_nxt — a pointer to the next element in the linked list of hash table elements.
  2. __gnu_cxx::__aligned_buffer<_Value> _M_storage — the node data itself. For example, for a std::unordered_map<std::string, int> container the _Value template argument is std::pair<const std::string, int>.
  3. std::size_t _M_hash_code — an optional cached value of the key's hash.

Hash table

The _Hashtable class is defined in the following way (I replaced type aliases with the actual types being used to simplify reading):

template<<...>>
  class _Hashtable
  <...>
  {
    private:
      _Hash_node_base**     _M_buckets          = &_M_single_bucket;
      std::size_t           _M_bucket_count     = 1;
      _Hash_node_base       _M_before_begin;
      std::size_t           _M_element_count    = 0;
      _RehashPolicy         _M_rehash_policy;

      <...>                   
       _Hash_node_base*     _M_single_bucket    = nullptr;
  };

The _Hashtable class itself is a combination of std::forward_list<_Hash_node> containing the elements and std::vector<std::forward_list<_Hash_node>::iterator> representing the buckets (code comment).

_Hash_node_base** _M_buckets is an array of pointers to hash table nodes. You can think of it as _Hash_node_base* _M_buckets[] instead of pointer to a pointer.

_Hash_node_base _M_before_begin is a special node without any user data. It stores a pointer to the first hash table element (if there is one) in _M_before_begin._M_nxt.

An interesting detail is that _M_buckets contains _Hash_node_base* instead of _Hash_node*. The reason is that _M_buckets is a kind of storage for two types of objects: actual hash table nodes and the special «before begin» node (_M_before_begin). The invariant is that each bucket stores a pointer to the node before the first node of that bucket. This means the bucket containing the first element of the table actually stores the address of the _M_before_begin element. I hope it becomes clearer with an example.

Suppose we have the following code: we create a std::unordered_map and insert four keys in this order: 14, 25, 36, 19.

std::unordered_map<int, int> map;

map[14] = 14;
map[25] = 25;
map[36] = 36;
map[19] = 19;

Then the internal _Hashtable linked list will look like the one in the picture below (original). The key order in the hash table is the reverse of the insertion order, so the keys' iteration order will be: 19, 36, 25, 14.

Let's turn the linked list into a real hash table by adding buckets (original).

There are 11 buckets in the picture (the vertical stack of rectangles); only two buckets are non-empty: bucket #3 (keys 36, 25 and 14) and bucket #8 (key 19). The thin arrows are pointers from the _Hashtable internal linked list from the previous picture, this time slightly rearranged and grouped together by bucket. Now you can probably understand better what I meant by the phrase «each bucket stores a pointer to the node before the first node from the bucket». Bucket #3 has keys 36, 25 and 14, but its _Hash_node_base* in the _M_buckets array points to the element with key 19, which is the previous element in the hash table iteration order. The same logic holds for bucket #8: for this bucket, the _M_buckets array has a pointer to the _M_before_begin node.

«Fast» and «slow» hash functions

GCC's libstdc++ hash table implementation distinguishes between «fast» and «slow» hash functions. If the hash function is slow, the hash code is cached inside the hash table node.

Currently, std::hash is considered fast by default (including for user-defined types), except for long double and string-like types (std::string, std::string_view and all other variations of character types).

template<typename _Hash>
struct __is_fast_hash : public std::true_type
{ };

template<>
struct __is_fast_hash<hash<long double>> : public std::false_type
{ };

I am not completely sure what the reasoning is behind marking std::hash<long double> as slow (and the commit message didn't shed more light on the topic), but for string-like types it makes total sense.

Comparing two strings of lengths n and m has O(min(n, m)) time complexity, but if we cache the hash value in the hash table node, we can implement a faster negative comparison: if the hash codes of two strings do not match, we can instantly conclude the strings are not equal. Moreover, a rehash operation can avoid recalculating the hash for every key in the hash table as well. All of that in exchange for 8 more bytes of memory per node to store the hash value (std::size_t). Seems like a reasonable trade-off to me.

As a side note, I want to mention that for integer types std::hash is defined as an identity function, which is indeed fast.

#define _Cxx_hashtable_define_trivial_hash(_Tp)     \
  template<>                        \
    struct hash<_Tp> : public __hash_base<size_t, _Tp>  \
    {                                                   \
      size_t                                            \
      operator()(_Tp __val) const noexcept              \
      { return static_cast<size_t>(__val); }            \
    };

Insert

Now that we know how the hash table data structure is organized internally, let's look into the implementation of insert.

High level steps are the following.

  1. Check there is no such key in the hash table.
  2. Create a hash table node.
  3. Insert a new node to a hash table data structure.

The implementation is in the _M_insert_unique method. And there is a surprise from the very first line of the code.

if (size() <= __small_size_threshold())
  for (auto __it = begin(); __it != end(); ++__it)
    if (this->_M_key_equals_tr(__k, *__it._M_cur))
      return { __it, false };

If the hash table is «small», we just iterate from beginning to end and compare all the keys already in the hash table with the key we are trying to insert. The thresholds differ for «fast» and «slow» hash functions.

template<typename _Hash>
struct _Hashtable_hash_traits
{ 
  static constexpr std::size_t
  __small_size_threshold() noexcept
  { return std::__is_fast_hash<_Hash>::value ? 0 : 20; }
};

Then we compute a hash code from the search key and look for an existing node in the hash table, but only for «big» hash tables, as we have already done a linear search for «small» ones.

__hash_code __code = this->_M_hash_code_tr(__k);
size_type __bkt = _M_bucket_index(__code);

if (size() > __small_size_threshold())
  if (__node_ptr __node = _M_find_node_tr(__bkt, __k, __code))
    return { iterator(__node), false };

And finally, if there is no such key in the hash table, we create a new node and insert it into the data structure.

_Scoped_node __node {
  __node_builder_t::_S_build(std::forward<_Kt>(__k),
                 std::forward<_Arg>(__v),
                 __node_gen),
  this
};
auto __pos
  = _M_insert_unique_node(__bkt, __code, __node._M_node);
__node._M_node = nullptr;
return { __pos, true };

The actual insertion logic is hidden inside the _M_insert_unique_node method. There we decide whether the hash table requires a rehash and insert the node at the beginning of the bucket.

const __rehash_state& __saved_state = _M_rehash_policy._M_state();
std::pair<bool, std::size_t> __do_rehash
  = _M_rehash_policy._M_need_rehash(_M_bucket_count, _M_element_count,
                                    __n_elt);

if (__do_rehash.first)
  { 
    _M_rehash(__do_rehash.second, __saved_state);
    __bkt = _M_bucket_index(__code);
  }

this->_M_store_code(*__node, __code);

// Always insert at the beginning of the bucket.
_M_insert_bucket_begin(__bkt, __node);
++_M_element_count;
return iterator(__node);

There are a lot of interesting things inside _M_rehash_policy._M_need_rehash, but I don't want to bore you with too many details; I'll only mention the fact that the number of buckets in the hash table is a prime number.

Find

_Hashtable::find has the same optimization for small hash tables as insert: for «small» hash tables with a «slow» hash function we do a linear search first; otherwise we do the usual bucket search.

__hash_code __code = this->_M_hash_code(__k);
std::size_t __bkt = _M_bucket_index(__code);
return const_iterator(_M_find_node(__bkt, __k, __code));

_M_find_node is implemented through the call to _M_find_before_node.

__node_ptr
_M_find_node(size_type __bkt, const key_type& __key,
             __hash_code __c) const
{    
  __node_base_ptr __before_n = _M_find_before_node(__bkt, __key, __c);
  if (__before_n)
    return static_cast<__node_ptr>(__before_n->_M_nxt);
  return nullptr;
}

_M_find_before_node is a good building block to have if you think about the erase implementation: we need to remove an element from a singly linked list, so having a pointer to the previous node comes in handy.

_M_find_before_node does mostly what we expect it to do, but has a couple of interesting things up its sleeve.

We locate a pointer to the element before the first bucket element.

__node_base_ptr __prev_p = _M_buckets[__bkt];
if (!__prev_p)
  return nullptr;

And iterate through elements in the bucket.

for (__node_ptr __p = static_cast<__node_ptr>(__prev_p->_M_nxt);;
     __p = __p->_M_next())
  { 
    if (this->_M_equals(__k, __code, *__p))
      return __prev_p;

    if (!__p->_M_nxt || _M_bucket_index(*__p->_M_next()) != __bkt)
      break;
    __prev_p = __p;
  }

There is no stop condition in the for loop statement itself, only in the loop body. We stop when there is no next element in the hash table linked list or when we are done with the current bucket.

if (!__p->_M_nxt || _M_bucket_index(*__p->_M_next()) != __bkt)
  break;

The only way to tell we have crossed the bucket's boundary is to explicitly compute the bucket number of the element. To do so, we either read the cached hash value from the node itself, or recalculate the hash code from the key by calling std::hash. Now you probably have a better understanding of the rationale behind the __is_fast_hash logic, as we need to know the hash value of each element in the bucket chain until we find the right one. If std::hash calls are expensive and the hash code wasn't cached in the node, find performance might degrade severely.

Back to _M_find_before_node: to compare the search key with a key in a node, we call _M_equals, which compares hash values first and, if they are equal, then compares the keys.

bool
_M_equals(const _Key& __k, __hash_code __c,
const _Hash_node_value<_Value, __hash_cached::value>& __n) const
{ return _S_equals(__c, __n) && _M_key_equals(__k, __n); }

At the same time, _S_equals has two overloads: one for nodes with a cached hash value and one for nodes without it. When a node has a cached value, we compare the key's hash with the hash in the node. If there is no cached hash value, we do nothing. Think about integer keys, for example: we know std::hash<int>{}(x) == x, so there is no point in comparing hashes first.

static bool
_S_equals(__hash_code __c, const _Hash_node_code_cache<true>& __n)
{ return __c == __n._M_hash_code; }

static bool
_S_equals(__hash_code, const _Hash_node_code_cache<false>&)
{ return true; }

Erase

The last operation we need to cover to get a complete understanding of the basic hash table operations is erase. As I mentioned above, from the internal representation point of view the nodes are connected in a singly linked list, so the erase operation is similar to removing an element from a linked list.

As a first step, we need to make sure the key we want to erase is present in the container. To simplify the code, we actually find the node before the node we are going to remove. Again there are two possible paths, for «small» and «big» hash tables.

For «small» tables we just do a linear search in the linked list by calling _M_find_before_node.

if (size() <= __small_size_threshold())
  {
   __prev_n = _M_find_before_node(__k);
   if (!__prev_n)
     return 0;

   // We found a matching node, erase it.
   __n = static_cast<__node_ptr>(__prev_n->_M_nxt);
   __bkt = _M_bucket_index(*__n);
  }

For bigger ones, we find the bucket for the key and then search within the bucket.

else
  { 
   __hash_code __code = this->_M_hash_code(__k);
   __bkt = _M_bucket_index(__code);

   // Look for the node before the first matching node.
   __prev_n = _M_find_before_node(__bkt, __k, __code);
   if (!__prev_n)
     return 0;

   // We found a matching node, erase it.
   __n = static_cast<__node_ptr>(__prev_n->_M_nxt);
  }

If we found something, the actual manipulation of the linked list pointers is done in the three-argument overload of the _M_erase method: __bkt (the bucket), __prev_n (the previous node in the hash table linked list) and __n (the node we want to erase). There we update the _M_nxt pointer of __prev_n and the _M_buckets value if necessary, then destroy the node itself.

if (__prev_n == _M_buckets[__bkt])
  _M_remove_bucket_begin(__bkt, __n->_M_next(),
     __n->_M_nxt ? _M_bucket_index(*__n->_M_next()) : 0);
else if (__n->_M_nxt)
  {
   size_type __next_bkt = _M_bucket_index(*__n->_M_next());
   if (__next_bkt != __bkt)
     _M_buckets[__next_bkt] = __prev_n;
  }

__prev_n->_M_nxt = __n->_M_nxt;
iterator __result(__n->_M_next());
this->_M_deallocate_node(__n);
--_M_element_count;

return __result;

Closing thoughts

There are a lot of cool tricks implemented to improve the performance of libstdc++'s std::unordered_map. These tricks help for sure, but the main issue of std::unordered_map (not only the libstdc++ implementation, but all standard-compliant ones) is cache unfriendliness.

Almost every operation on the container is practically a textbook example of pointer chasing, so for real-world use cases performance will not be as great as it could be in an «ideal» implementation from the data locality point of view. The main problem is that a standard-compliant implementation must provide reference and iterator stability, and this requirement forces hash tables to be implemented using chaining. I hope we'll find a solution for this in the future and bring faster hash tables to the C++ standard library, but we are not there yet.

In any case, the libstdc++ implementation of std::unordered_map is really awesome. I really enjoyed reading the code and learned a lot; I hope you did too.