Abstract Data Types: Sets

├─ What is a SET
├─ What are the operations a SET requires
└─ What are the data structures that implement SET

What is a set?

A set is a sequential abstract data type where following rules have to be obeyed:

any entry must be unique

How values are added to set:

What are the operations a set requires?

Although, as a abstract data type, set comes with no implementation, it requires from data structures that implement it to provide support for a minimal set of operations:

Operation	Arguments	Returns	Description
add	VALUE	void	Adds value to set.
clear		void	Clears set of all values.
contains	VALUE	bool	Checks if value exists in set.
remove	VALUE	void	Removes element by value.
isEmpty		bool	Checks if set is empty
size		ULONG	Gets number of entries in set

Where:

VALUE: value of a set entry

What are the data structures that implement set?

Most common implementations of set are:

Hash Table
- Strengths:
  - By far the fastest in insertion/lookup/deletion of entries: because hash table only needs to compute a hash of entry and look up in bucket (List at position @ List) that corresponds to hash to add/delete/retrieve that entry. Because bucket ideally contains one entry only, this operation is considered to occur in constant O(1) time.
  - By far the smallest in memory consumption: only a List of List.
- Weaknesses:
  - Unpredictable iteration, which makes it hard to test.
  - It depends on:
    - a fast hash function that guarantees even distribution throughout buckets, so that the List inside ideally contains only one entry.
    - a reliable comparator function that guarantees there won't be duplicates (which would violate above-mentioned set principles)
Linked Hash Table
- Strengths:
  - Almost as fast, just a bit slower (5-10%) than Hash Table on insertion/lookup/deletion of entries: because it needs to maintain the doubly linked list that preserves insertion order as well.
  - Predictable iteration, which makes testing easy
- Weaknesses:
  - Greater memory consumption (33%) than Hash Table, because it needs to store the doubly linked list that maintains insertion order.
  - Same dependency on a good hash/comparator function.
Red Black Tree
- Strengths:
  - Stays ordered.
  - Predictable iteration, which makes testing easy
- Weaknesses:
  - Much slower (6-7 times) than Hash Table in insertion/deletion of entries: because the red black tree always has to rebalance itself, which is a VERY costly operation that may involve the whole data set.
  - Greater memory consumption (33%) than Hash Table, because the struct that wraps each value requires more information.
  - Depends on a reliable comparator function.

This time complexity table using Big O Notation shows advantages and disadvantages of each data structure chosen for each set operation:

Operation	Hash Table	Linked Hash Table	Red Black Tree
add	O(1)	O(1)	O(log(N))
clear	O(N)	O(N)	O(N)
contains	O(1)	O(1)	O(log(N))
find	O(1)	O(1)	O(log(N))
remove	O(1)	O(1)	O(log(N))
isEmpty	O(1)	O(1)	O(1)
size	O(1)	O(1)	O(1)

Conclusion: Hash Table, with its speed/memory efficiency advantages, should be solution of choice unless one specifically desires structure to be predictably iterated or self-ordered!