Data Structures: Hash Tables

What is a hash table?

A HASH TABLE is a non-sequential data structure that uses a HASHER to evenly distribute entries inside into buckets for amortized O(1) insertion/search/deletion performance. To understand rationale behind, let's start with its logical coordinates:

Its most common usages are implementations of abstract data types:

How are entries inserted in HASH TABLE?

How are ENTRY items inserted into a hash table:

  1. program: instances HashTable with HASH/COMPARATORs, optionally signaling expected CAPACITY
  2. HashTable: a BUCKET LIST is allocated with CAPACITY
  3. program: uses above object to write ENTRY
  4. HashTable: HASHER is applied to ENTRY to get BUCKET to write to
  5. HashTable: COMPARATOR is applied to every ENTRY in BUCKET and writes only if none match

How does HASH TABLE grow?

If insertion is attempted but SIZE (number of items) will exceed CAPACITY, BUCKET LIST needs to grow by doubling its CAPACITY, same as a dynamic array. However, unlike latter, growth will be much more costly as it requires hash table to:

  1. allocate a new BUCKET LIST
  2. iterate each BUCKET in old BUCKET LIST then iterate each ENTRY found inside
  3. for each ENTRY, apply HASHER to get position in new BUCKET LIST
  4. uses BUCKET found at position to store existing ENTRY
  5. deallocate old BUCKET LIST

How can it implement ASSOCIATIVE ARRAY abstract data type?

A HASH MAP implements Associative Array abstract data type using HASH TABLE above. In order to do so, it needs to derive hash table coordinates specification:

Insertion steps specification derives into:

  1. program instances HashTable with HASH/COMPARATORs, optionally signaling expected CAPACITY
  2. HashTable: a BUCKET LIST is allocated with CAPACITY
  3. program uses above object to set VALUE by KEY
  4. HashTable: an ENTRY is created based on KEY and VALUE
  5. HashTable: HASHER is applied to ENTRY KEY to get BUCKET to write to
  6. HashTable: COMPARATOR is applied to every ENTRY in BUCKET to see if KEY exists already:
    1. if exists: updates existing ENTRY with new VALUE
    2. otherwise, if none match: adds new ENTRY

How are ASSOCIATIVE ARRAY operations implemented?

Operation Description
clear Deallocates BUCKETS LIST, reallocates new BUCKETS LIST and resets SIZE to zero.
containsKey Applies HASHER to ENTRY KEY to detect BUCKET then loops through all ENTRIES inside and applies COMPARATOR to see if any KEY matches
containsValue For each BUCKET in BUCKET LIST, loops through all ENTRIES inside and applies VALUE COMPARATOR to see if any VALUE matches
get Applies HASHER to ENTRY KEY to detect BUCKET then loops through all ENTRIES inside and applies COMPARATOR to see if any KEY matches. If it does, returns VALUE of ENTRY. If not, ERROR thrown
set Applies HASHER to ENTRY KEY to detect BUCKET then loops through all ENTRIES inside and applies COMPARATOR to see if any KEY matches. If it does, updates VALUE of existing ENTRY. If not, writes ENTRY and increments SIZE
removeKey Applies HASHER to ENTRY KEY to detect BUCKET then loops through all ENTRIES inside and applies COMPARATOR to see if any KEY matches. If it does, removes ENTRY from BUCKET and decrements SIZE. If not, ERROR thrown
removeValue For each BUCKET in BUCKET LIST, loops through all ENTRIES inside and applies VALUE COMPARATOR to see if any VALUE matches. If it does, removes ENTRY from BUCKET and decrements SIZE. If none match, ERROR thrown
isEmpty Checks if SIZE is zero
size Gets value of SIZE

As you can see above, two more coordinates are put forward:

  1. VALUE COMPARATOR: compares two ENTRY items by VALUE
  2. ERROR: out-of-range exception thrown if user attempts to get or remove non-existent ENTRY

For a complete implementation example, check HashMap class documentation or its GitHub code.

How can it implement SET abstract data type?

A HASH SET implements Set abstract data type using HASH TABLE above. Since hash table by itself fits SET requirements, no derivation is needed!

How are SET operations implemented?

Operation Description
clear Deallocates each BUCKET, then BUCKETS LIST, reallocates new BUCKETS LIST and resets SIZE to zero.
contains Applies HASHER to ENTRY to detect BUCKET then loops through all ENTRIES inside and applies COMPARATOR to see if any ENTRY matches
add Applies HASHER to ENTRY to detect BUCKET then loops through all ENTRIES inside and applies COMPARATOR to see if any ENTRY matches. If none does, writes ENTRY and increments SIZE
remove Applies HASHER to ENTRY to detect BUCKET then loops through all ENTRIES inside and applies COMPARATOR to see if any ENTRY matches. If it does, removes ENTRY from BUCKET and decrements SIZE. If not, ERROR thrown
isEmpty Checks if SIZE is zero
size Gets value of SIZE

As you can see above, one more coordinates was put forward:

  1. ERROR: out-of-range exception thrown if user attempts to remove non-existent ENTRY

For a complete implementation example, check HashSet class documentation or its GitHub code.

What are its strengths and weaknesses?

Overall, hash table presents itself with a very favorable speed and memory efficiency profile, which is why it should be solution of choice for Associative Array and Set implementation. It can also be used for many other implementations (storing nodes/vertexes/edges in a tree/graph, etc). That being said, it also comes with disadvantages along with advantages: