Data Structures: Hash Tables

├─ What is a HASH TABLE
│ ├─ How can it implement ASSOCIATIVE ARRAY abstract data type
│ └─ How can it implement SET abstract data type
└─ What are its strengths and weaknesses

What is a hash table?

A HASH TABLE is a non-sequential data structure that uses a HASHER to evenly distribute entries inside into buckets for amortized O(1) insertion/search/deletion performance. To understand rationale behind, let's start with its logical coordinates:

ENTRY: entry that takes part of data structure
BUCKET LIST: data structure itself as a Native Array of CAPACITY size holding BUCKET entries. C++ prototype:
template<typename T, int (*comparator)(const T&, const T&), std::size_t (*hasher)(const T&)> class HashTable { ... private: LinkedList<T>* bucketList; // preallocated at CAPACITY size std::size_t capacity; // size of bucketList std::size_t size; // number of entries in data structure };
BUCKET: entry inside BUCKET LIST, itself a Linked List holding ideally one ENTRY item unless it has "collisions".
We shouldn't use Dynamic Array here for safety reasons: if a bad HASHER is used and collisions amount, deleting entry would become itself a problem
CAPACITY: number of entries BUCKET LIST expects from the beginning
HASHER: calculates a positive integer representation of ENTRY, to be used in identifying corresponding BUCKET in BUCKET LIST
ALGORITHM: hash(ENTRY)%CAPACITY => position @ BUCKET_LIST => BUCKET
COMPARATOR: compares two ENTRY items to insure uniqueness
ALGORITHM: foreach EXISTING_ENTRY in BUCKET, if isEqual(EXISTING_ENTRY, NEW_ENTRY) then aborts operation
SIZE: number of ENTRY items in data structure

Its most common usages are implementations of abstract data types:

HashMap: hash table that implements Associative Array
HashSet: hash table that implements Set

How are entries inserted in HASH TABLE?

How are ENTRY items inserted into a hash table:

program: instances HashTable with HASH/COMPARATORs, optionally signaling expected CAPACITY
HashTable: a BUCKET LIST is allocated with CAPACITY
program: uses above object to write ENTRY
HashTable: HASHER is applied to ENTRY to get BUCKET to write to
HashTable: COMPARATOR is applied to every ENTRY in BUCKET and writes only if none match

How does HASH TABLE grow?

If insertion is attempted but SIZE (number of items) will exceed CAPACITY, BUCKET LIST needs to grow by doubling its CAPACITY, same as a dynamic array. However, unlike latter, growth will be much more costly as it requires hash table to:

allocate a new BUCKET LIST
iterate each BUCKET in old BUCKET LIST then iterate each ENTRY found inside
for each ENTRY, apply HASHER to get position in new BUCKET LIST
uses BUCKET found at position to store existing ENTRY
deallocate old BUCKET LIST

How can it implement ASSOCIATIVE ARRAY abstract data type?

A HASH MAP implements Associative Array abstract data type using HASH TABLE above. In order to do so, it needs to derive hash table coordinates specification:

ENTRY: becomes a KEY-VALUE pair where:
- KEY: uniquely identifies entry in data structure
- VALUE: identifies entry payload (data)
Its C++ prototype will be:
template<typename K, typename V> struct MapEntry { K key; V value; };
BUCKET LIST: its C++ prototype changes to:
template<typename K, typename V, int (*comparator)(const K&, const K&), std::size_t (*hasher)(const K&)> class HashMap { ... private: LinkedList<MapEntry<K, V>>>* bucketList; // preallocated at CAPACITY size std::size_t capacity; // size of bucketList std::size_t size; // number of entries in data structure };
HASHER: calculates a positive integer representation of ENTRY KEY, to be used in identifying corresponding BUCKET in BUCKET LIST
ALGORITHM: hash(NEW_ENTRY.KEY)%CAPACITY => position @ BUCKET_LIST => BUCKET
COMPARATOR: compares two ENTRY items by KEY, preventing duplication in hashtable
ALGORITHM: foreach EXISTING_ENTRY in BUCKET, if isEqual(EXISTING_ENTRY.KEY, NEW_ENTRY.KEY) then UPDATE. If none matches then INSERT

Insertion steps specification derives into:

program instances HashTable with HASH/COMPARATORs, optionally signaling expected CAPACITY
HashTable: a BUCKET LIST is allocated with CAPACITY
program uses above object to set VALUE by KEY
HashTable: an ENTRY is created based on KEY and VALUE
HashTable: HASHER is applied to ENTRY KEY to get BUCKET to write to
HashTable: COMPARATOR is applied to every ENTRY in BUCKET to see if KEY exists already:
1. if exists: updates existing ENTRY with new VALUE
2. otherwise, if none match: adds new ENTRY

How are ASSOCIATIVE ARRAY operations implemented?

Operation	Description
clear	Deallocates BUCKETS LIST, reallocates new BUCKETS LIST and resets SIZE to zero.
containsKey	Applies HASHER to ENTRY KEY to detect BUCKET then loops through all ENTRIES inside and applies COMPARATOR to see if any KEY matches
containsValue	For each BUCKET in BUCKET LIST, loops through all ENTRIES inside and applies VALUE COMPARATOR to see if any VALUE matches
get	Applies HASHER to ENTRY KEY to detect BUCKET then loops through all ENTRIES inside and applies COMPARATOR to see if any KEY matches. If it does, returns VALUE of ENTRY. If not, ERROR thrown
set	Applies HASHER to ENTRY KEY to detect BUCKET then loops through all ENTRIES inside and applies COMPARATOR to see if any KEY matches. If it does, updates VALUE of existing ENTRY. If not, writes ENTRY and increments SIZE
removeKey	Applies HASHER to ENTRY KEY to detect BUCKET then loops through all ENTRIES inside and applies COMPARATOR to see if any KEY matches. If it does, removes ENTRY from BUCKET and decrements SIZE. If not, ERROR thrown
removeValue	For each BUCKET in BUCKET LIST, loops through all ENTRIES inside and applies VALUE COMPARATOR to see if any VALUE matches. If it does, removes ENTRY from BUCKET and decrements SIZE. If none match, ERROR thrown
isEmpty	Checks if SIZE is zero
size	Gets value of SIZE

As you can see above, two more coordinates are put forward:

VALUE COMPARATOR: compares two ENTRY items by VALUE
ERROR: out-of-range exception thrown if user attempts to get or remove non-existent ENTRY

For a complete implementation example, check HashMap class documentation or its GitHub code.

How can it implement SET abstract data type?

A HASH SET implements Set abstract data type using HASH TABLE above. Since hash table by itself fits SET requirements, no derivation is needed!

How are SET operations implemented?

Operation	Description
clear	Deallocates each BUCKET, then BUCKETS LIST, reallocates new BUCKETS LIST and resets SIZE to zero.
contains	Applies HASHER to ENTRY to detect BUCKET then loops through all ENTRIES inside and applies COMPARATOR to see if any ENTRY matches
add	Applies HASHER to ENTRY to detect BUCKET then loops through all ENTRIES inside and applies COMPARATOR to see if any ENTRY matches. If none does, writes ENTRY and increments SIZE
remove	Applies HASHER to ENTRY to detect BUCKET then loops through all ENTRIES inside and applies COMPARATOR to see if any ENTRY matches. If it does, removes ENTRY from BUCKET and decrements SIZE. If not, ERROR thrown
isEmpty	Checks if SIZE is zero
size	Gets value of SIZE

As you can see above, one more coordinates was put forward:

ERROR: out-of-range exception thrown if user attempts to remove non-existent ENTRY

For a complete implementation example, check HashSet class documentation or its GitHub code.

What are its strengths and weaknesses?

Overall, hash table presents itself with a very favorable speed and memory efficiency profile, which is why it should be solution of choice for Associative Array and Set implementation. It can also be used for many other implementations (storing nodes/vertexes/edges in a tree/graph, etc). That being said, it also comes with disadvantages along with advantages:

it severely depends on a powerful HASHER for ENTRY data type that guarantees near-even spread of ITEMS throughout BUCKETS
very fast O(1) random insertions/deletions/retrievals (as long as there are no collisions)
elements are saved "randomly" in buckets, so they cannot be predictably iterated by insertion/value order
it allocates more space than it uses because of Native Array that stores BUCKET LIST
growth is very costly, for reasons already mentioned