# How is libstdc++'s `std::unordered_map` implemented?

## Introduction

We all love maps. We love hash maps even more. They should be fast and they help solve a large number of problems. Have you ever wondered how they work under the hood? In this post I am going to explore the implementation details of the unordered associative containers from the C++ Standard Library (specifically GCC's libstdc++ implementation; I'll use revision b9b7981f3d for the code examples).

Currently there are four types of unordered associative containers:

- `std::unordered_map`
- `std::unordered_set`
- `std::unordered_multimap`
- `std::unordered_multiset`

Usually, they are implemented on top of some kind of `Hashtable` container. I am going to jump straight into the implementation of this `Hashtable` container, because that's where all the interesting stuff is hidden.

I'll focus on the key/value containers with a unique set of keys (`std::unordered_map`); `std::unordered_set` has mostly similar logic. Multi-key containers are different beasts. They share a large portion of code with unique-key containers, but their insertion and find logic is quite different, because the set of stored keys is actually a multiset (it allows multiple instances of the same key).

GCC's libstdc++ implementation can be found in the hashtable.h header; the class of interest is named `_Hashtable`. There will be a lot of names with leading underscores. Not everyone is used to such code, but Standard Library implementers have no other choice if they want to avoid collisions with user-defined names (macros, for example).

## Policy-Based Design

The `_Hashtable` class is implemented by decomposition into policies and closely recalls the design of GCC's Policy-Based Data Structures. Some of these policies are available for redefinition by the user, for example `_Alloc`, `_Equal` and `_Hash`. Some depend on the container type, like `_ExtractKey` and `_Value`. And others, like `_RangeHash` and `_RehashPolicy`, can be defined by library implementers only.

```
template<typename _Key, typename _Value, typename _Alloc,
         typename _ExtractKey, typename _Equal,
         typename _Hash, typename _RangeHash, typename _Unused,
         typename _RehashPolicy, typename _Traits>
  class _Hashtable
  <...>
```

## Data Layout

If you don't want to dive into the details, you can use this comment from the hashtable.h header to grasp the basics of the `_Hashtable` data layout.

### Nodes

One of the basic building blocks of the `_Hashtable` is a node. Each node is allocated from the heap and stores container data along with the metadata needed to maintain the hash table data structure. The actual content of the node depends on the stored data (more about it later).

The node itself is a compound entity and contains several parts, some of them optional. The design of the node structs brings to mind matryoshka dolls, because they are nested inside each other. A more complex node type (with more data) inherits from a simpler node type (with a little bit less data). Let's walk through the components bottom up (from simpler to more complex).

`_Hash_node_base` is defined in the following way. It has only the `_M_nxt` field, which is a pointer to the next node of the hash table.

```
struct _Hash_node_base
{
  _Hash_node_base* _M_nxt;
  <...>
};
```

The next one, `_Hash_node_value_base`, is a little bit more interesting. `_Hash_node_value_base` has a `_Value` template parameter, which is the actual data stored in the container.

```
template<typename _Value>
  struct _Hash_node_value_base
  {
    typedef _Value value_type;

    __gnu_cxx::__aligned_buffer<_Value> _M_storage;
    <...>
  };
```

The `_Value` type is wrapped into `__gnu_cxx::__aligned_buffer` (a thin wrapper around `std::aligned_storage`) to decouple memory allocation from actual object creation.

The next struct is `_Hash_node_code_cache`, and it implements the hash value caching logic.

```
template<bool _Cache_hash_code>
  struct _Hash_node_code_cache
  { };

template<>
  struct _Hash_node_code_cache<true>
  { std::size_t _M_hash_code; };
```

`_Hash_node_code_cache` uses the template specialization mechanism to extend the struct with an additional `_M_hash_code` field. This optimization, I believe, is one of the reasons why the «matryoshka dolls»-like design for the node structs is used. This way Empty Base Optimization (EBO) can be leveraged when `_Hash_node_code_cache` is extended by inheritance. And that's exactly what `_Hash_node_value` is doing:

```
template<typename _Value, bool _Cache_hash_code>
  struct _Hash_node_value
  : _Hash_node_value_base<_Value>
  , _Hash_node_code_cache<_Cache_hash_code>
  { };
```

The size of `_Hash_node_value` will be the same as the size of `_Hash_node_value_base<_Value>` when the template argument `_Cache_hash_code` is false, as `_Hash_node_code_cache` will then be an empty struct.

The final piece of the puzzle is `_Hash_node`, combining everything above together:

```
template<typename _Value, bool _Cache_hash_code>
  struct _Hash_node
  : _Hash_node_base
  , _Hash_node_value<_Value, _Cache_hash_code>
  {
    <...>
  };
```

Below is a picture of the `_Hash_node` struct data layout to better visualize what's going on.

Summarizing, `_Hash_node` (directly or through its base structs) contains the following data:

- `_Hash_node_base* _M_nxt` is a pointer to the next element in the linked list of hash table elements.
- `__gnu_cxx::__aligned_buffer<_Value> _M_storage` is the node data itself. For example, for a `std::unordered_map<std::string, int>` container the `_Value` template argument is `std::pair<const std::string, int>`.
- `std::size_t _M_hash_code` is an optional cached value of the key's hash.

### Hash table

The `_Hashtable` class is defined in the following way (I replaced type aliases with the actual types being used to simplify reading the code):

- `__buckets_ptr` -> `_Hash_node_base**`
- `size_type` -> `std::size_t`
- `__node_base` -> `_Hash_node_base`
- `__node_base_ptr` -> `_Hash_node_base*`

```
template<<...>>
  class _Hashtable
  <...>
  {
  private:
    _Hash_node_base** _M_buckets = &_M_single_bucket;
    std::size_t _M_bucket_count = 1;
    _Hash_node_base _M_before_begin;
    std::size_t _M_element_count = 0;
    _RehashPolicy _M_rehash_policy;
    <...>
    _Hash_node_base* _M_single_bucket = nullptr;
  };
```

The `_Hashtable` class itself is a combination of a `std::forward_list<_Hash_node>` containing the elements and a `std::vector<std::forward_list<_Hash_node>::iterator>` representing the buckets (as a comment in the code puts it).

`_Hash_node_base** _M_buckets` is an array of pointers to hash table nodes. You can think of it as `_Hash_node_base* _M_buckets[]` instead of a pointer to a pointer.

`_Hash_node_base _M_before_begin` is a special node without any user data. This node stores a pointer to the first hash table element (if there is any) in `_M_before_begin._M_nxt`.

An interesting thing is that `_M_buckets` contains `_Hash_node_base*` instead of `_Hash_node*`. The reason is that `_M_buckets` is a kind of storage for two types of objects: actual hash table nodes and the special «before begin» node (`_M_before_begin`). The invariant is that each bucket stores a pointer to the node **before** the first node of that bucket. This means the bucket containing the first element of the table actually stores the address of the `_M_before_begin` element. I hope it becomes clearer with an example.

Suppose we have the following code. We create a `std::unordered_map` and insert four keys in this order: 14, 25, 36, 19.

```
std::unordered_map<int, int> map;
map[14] = 14;
map[25] = 25;
map[36] = 36;
map[19] = 19;
```

Then the internal `_Hashtable` linked list will look like the one in the picture below. The key order in the hash table is the reverse of the insertion order, so the keys' iteration order will be: 19, 36, 25, 14.

Let's make a real hash table from the linked list by adding buckets.

There are 11 buckets in the picture (the vertical stack of rectangles), and only two buckets are not empty: bucket #3 (keys 36, 25 and 14) and bucket #8 (key 19). Thin arrows are pointers from the `_Hashtable` internal linked list from the previous picture, this time slightly rearranged and grouped together by buckets. Now you can probably understand better what I meant by the phrase «each bucket stores a pointer to the node before the first node from the bucket». Bucket #3 has keys 36, 25 and 14, but the `_Hash_node_base*` from the `_M_buckets` array points to the element with key 19, which is the **previous** element in the hash table iteration order. The same logic holds for bucket #8: for that bucket the `_M_buckets` array has a pointer to the `_M_before_begin` node.

## «Fast» and «slow» hash functions

GCC's libstdc++ hash table implementation distinguishes between fast and slow hash functions. If the hash function is slow, then the hash code is cached inside the hash table node.

Currently `std::hash` is fast by default (including for user-defined types), except for `long double` and string-like types (`std::string`, `std::string_view` and all other variations of character types).

```
template<typename _Hash>
  struct __is_fast_hash : public std::true_type
  { };

template<>
  struct __is_fast_hash<hash<long double>> : public std::false_type
  { };
```

I am not completely sure what the reasoning is behind marking `std::hash<long double>` as slow (and the commit message didn't shed more light on the topic), but for string-like types it totally makes sense.

Comparison of two strings with lengths `n` and `m` has `O(min(n, m))` time complexity, but if we cache the hash value in the hash table node, we can implement a faster negative comparison: if the hash codes of two strings do not match, we can instantly conclude they are not equal. Moreover, for a `rehash` operation we can avoid hash recalculation for every key in the hash table as well. All of that in exchange for 8 more bytes of memory per node to store the hash value (a `std::size_t`). Seems like a reasonable trade-off to me.

As a side note, I want to mention that for integer types `std::hash` is defined as an identity function, which is indeed fast.

```
#define _Cxx_hashtable_define_trivial_hash(_Tp) \
  template<> \
    struct hash<_Tp> : public __hash_base<size_t, _Tp> \
    { \
      size_t \
      operator()(_Tp __val) const noexcept \
      { return static_cast<size_t>(__val); } \
    };
```

## Insert

Now that we know how the hash table data structure is organized internally, let's look into the implementation of `insert`.

High level steps are the following.

- Check there is no such key in the hash table.
- Create a hash table node.
- Insert a new node to a hash table data structure.

The implementation is in the `_M_insert_unique` method. And there is a surprise from the very first line of the code.

```
if (size() <= __small_size_threshold())
  for (auto __it = begin(); __it != end(); ++__it)
    if (this->_M_key_equals_tr(__k, *__it._M_cur))
      return { __it, false };
```

If the hash table is «small», we just iterate from the beginning to the end and compare all the keys already in the hash table with the key we are trying to insert. The thresholds for «fast» and «slow» hash functions are different.

```
template<typename _Hash>
  struct _Hashtable_hash_traits
  {
    static constexpr std::size_t
    __small_size_threshold() noexcept
    { return std::__is_fast_hash<_Hash>::value ? 0 : 20; }
  };
```

Then we generate a hash code from the search key and try to find an existing node in the hash table, but only for «big» hash tables, as we have already done a linear search for «small» ones.

```
__hash_code __code = this->_M_hash_code_tr(__k);
size_type __bkt = _M_bucket_index(__code);

if (size() > __small_size_threshold())
  if (__node_ptr __node = _M_find_node_tr(__bkt, __k, __code))
    return { iterator(__node), false };
```

And finally, if there is no such key in the hash table, we create a new node and insert it into the data structure.

```
_Scoped_node __node {
  __node_builder_t::_S_build(std::forward<_Kt>(__k),
                             std::forward<_Arg>(__v),
                             __node_gen),
  this
};

auto __pos
  = _M_insert_unique_node(__bkt, __code, __node._M_node);
__node._M_node = nullptr;
return { __pos, true };
```

The actual insertion logic is hidden inside the `_M_insert_unique_node` method. There we decide whether the hash table requires a rehash and insert the node at the beginning of the bucket.

```
const __rehash_state& __saved_state = _M_rehash_policy._M_state();
std::pair<bool, std::size_t> __do_rehash
  = _M_rehash_policy._M_need_rehash(_M_bucket_count, _M_element_count,
                                    __n_elt);

if (__do_rehash.first)
  {
    _M_rehash(__do_rehash.second, __saved_state);
    __bkt = _M_bucket_index(__code);
  }

this->_M_store_code(*__node, __code);

// Always insert at the beginning of the bucket.
_M_insert_bucket_begin(__bkt, __node);
++_M_element_count;
return iterator(__node);
```

There are a lot of interesting things inside `_M_rehash_policy._M_need_rehash`, but I don't want to bore you with too many details; I'll only mention the fact that the number of buckets in the hash table is a prime number.

## Find

`_Hashtable::find` has the same optimization for small hash tables as `insert`: for «small» hash tables with a «slow» hash function we do a linear search first; otherwise we do the usual bucket search.

```
__hash_code __code = this->_M_hash_code(__k);
std::size_t __bkt = _M_bucket_index(__code);
return const_iterator(_M_find_node(__bkt, __k, __code));
```

`_M_find_node` is implemented through a call to `_M_find_before_node`.

```
__node_ptr
_M_find_node(size_type __bkt, const key_type& __key,
             __hash_code __c) const
{
  __node_base_ptr __before_n = _M_find_before_node(__bkt, __key, __c);
  if (__before_n)
    return static_cast<__node_ptr>(__before_n->_M_nxt);
  return nullptr;
}
```

`_M_find_before_node` is a good building block to have if you think about the `erase` implementation: there we need to remove an element from a singly linked list, so having a pointer to the previous node comes in handy.

`_M_find_before_node` does mostly what we expect it to do, but it has a couple of interesting things up its sleeve.

First, we locate the pointer to the element before the first element of the bucket.

```
__node_base_ptr __prev_p = _M_buckets[__bkt];
if (!__prev_p)
  return nullptr;
```

Then we iterate through the elements in the bucket.

```
for (__node_ptr __p = static_cast<__node_ptr>(__prev_p->_M_nxt);;
     __p = __p->_M_next())
  {
    if (this->_M_equals(__k, __code, *__p))
      return __prev_p;

    if (!__p->_M_nxt || _M_bucket_index(*__p->_M_next()) != __bkt)
      break;

    __prev_p = __p;
  }
```

There is no stop condition in the `for` loop statement itself, only in the loop body. We stop when there is no next element in the hash table linked list or when we are done with the current bucket.

```
if (!__p->_M_nxt || _M_bucket_index(*__p->_M_next()) != __bkt)
  break;
```

The only way for us to tell we have crossed the bucket's boundary is to explicitly get the bucket number for each element. To do so, we either read the cached hash value from the node itself, or recalculate the hash code from the key by calling `std::hash`. Now you probably get a better understanding of the rationale behind the `__is_fast_hash` logic, as we need to know the hash value of each element in the bucket chain until we find the right one. And if the `std::hash` calls are expensive and the hash code wasn't cached in the node, then `find` performance might severely degrade.

Well, back to `_M_find_before_node`. To compare the search key with a key in the node we call `_M_equals`, where we compare hash values first and, only if they are equal, compare the keys.

```
bool
_M_equals(const _Key& __k, __hash_code __c,
          const _Hash_node_value<_Value, __hash_cached::value>& __n) const
{ return _S_equals(__c, __n) && _M_key_equals(__k, __n); }
```

At the same time, `_S_equals` has two overloads: one for nodes with a cached hash value and one for nodes without it. When a node has a cached value, we compare the key's hash with the hash stored in the node. If there is no cached hash value, we do nothing. Think about integer keys, for example: we know that `std::hash<int>{}(x) == x`, so there is no point in comparing hashes first.

```
static bool
_S_equals(__hash_code __c, const _Hash_node_code_cache<true>& __n)
{ return __c == __n._M_hash_code; }

static bool
_S_equals(__hash_code, const _Hash_node_code_cache<false>&)
{ return true; }
```

## Erase

The last operation we need to cover to get a complete understanding of basic hash table operations is `erase`. As I mentioned above, from the internal representation point of view, nodes are connected together in a singly linked list, so the erase operation is similar to element removal from a linked list.

As a first step, we need to make sure the key we want to erase from the container is present. To simplify the code, we will actually find the node **before** the node we are going to remove. There are again two possible paths, for «small» and «big» hash tables.

For «small» tables we just do a linear search in the linked list by calling `_M_find_before_node`.

```
if (size() <= __small_size_threshold())
  {
    __prev_n = _M_find_before_node(__k);
    if (!__prev_n)
      return 0;

    // We found a matching node, erase it.
    __n = static_cast<__node_ptr>(__prev_n->_M_nxt);
    __bkt = _M_bucket_index(*__n);
  }
```

And for bigger ones, we try to find the correct bucket for the key and then do a search within the bucket.

```
else
  {
    __hash_code __code = this->_M_hash_code(__k);
    __bkt = _M_bucket_index(__code);

    // Look for the node before the first matching node.
    __prev_n = _M_find_before_node(__bkt, __k, __code);
    if (!__prev_n)
      return 0;

    // We found a matching node, erase it.
    __n = static_cast<__node_ptr>(__prev_n->_M_nxt);
  }
```

If we found something, the actual manipulations with the linked list pointers are done in the three-argument overload of the `_M_erase` method, taking `__bkt` (the bucket), `__prev_n` (the previous node in the hash table linked list) and `__n` (the node we want to erase). There we update the `_M_nxt` pointer of `__prev_n` and the `_M_buckets` value if necessary, then destroy the node itself.

```
if (__prev_n == _M_buckets[__bkt])
  _M_remove_bucket_begin(__bkt, __n->_M_next(),
    __n->_M_nxt ? _M_bucket_index(*__n->_M_next()) : 0);
else if (__n->_M_nxt)
  {
    size_type __next_bkt = _M_bucket_index(*__n->_M_next());
    if (__next_bkt != __bkt)
      _M_buckets[__next_bkt] = __prev_n;
  }

__prev_n->_M_nxt = __n->_M_nxt;
iterator __result(__n->_M_next());
this->_M_deallocate_node(__n);
--_M_element_count;
return __result;
```

## Closing thoughts

There are a lot of cool tricks implemented to improve the performance of the libstdc++ `std::unordered_map` implementation. These tricks help for sure, but the main issue with `std::unordered_map` (not only the libstdc++ implementation, but all standard-compliant ones) is cache unfriendliness.

Almost every operation on the container is practically a textbook example of pointer chasing, so for real-world use cases performance will not be as great as it could be in an «ideal» implementation from the data locality point of view. The main problem is that a standard-compliant implementation must provide reference and iterator stability, and this requirement forces hash tables to be implemented using chaining. I hope we'll find a solution for this in the future and bring faster hash tables into the C++ standard library, but we are not there yet.

In any case, the libstdc++ implementation of `std::unordered_map` is really awesome. I really enjoyed reading the code and learnt a lot; I hope you did too.