Abstract Data Types: Sets
Home >
Set Abstract Data Type
What is a set?
A set is a sequential abstract data type where following rules have to be obeyed:
How values are added to set:
What are the operations a set requires?
Although, as a abstract data type, set comes with no implementation, it requires from data structures that implement it to provide support
for a minimal set of operations:
| Operation |
Arguments |
Returns |
Description |
| add |
VALUE |
void |
Adds value to set. |
| clear |
|
void |
Clears set of all values. |
| contains |
VALUE |
bool |
Checks if value exists in set. |
| remove |
VALUE |
void |
Removes element by value. |
| isEmpty |
|
bool |
Checks if set is empty |
| size |
|
ULONG |
Gets number of entries in set |
Where:
- VALUE: value of a set entry
What are the data structures that implement set?
Most common implementations of set are:
- Hash Table
- Strengths:
- By far the fastest in insertion/lookup/deletion of entries: because hash table only needs to compute a hash of entry and look up in bucket (List at position @ List) that corresponds to hash to add/delete/retrieve that entry. Because bucket ideally contains one entry only, this operation is considered to occur in constant O(1) time.
- By far the smallest in memory consumption: only a List of List.
- Weaknesses:
- Unpredictable iteration, which makes it hard to test.
- It depends on:
- a fast hash function that guarantees even distribution throughout buckets, so that the List inside ideally contains only one entry.
- a reliable comparator function that guarantees there won't be duplicates (which would violate above-mentioned set principles)
- Linked Hash Table
- Strengths:
- Almost as fast, just a bit slower (5-10%) than Hash Table on insertion/lookup/deletion of entries: because it needs to maintain the doubly linked list that preserves insertion order as well.
- Predictable iteration, which makes testing easy
- Weaknesses:
- Greater memory consumption (33%) than Hash Table, because it needs to store the doubly linked list that maintains insertion order.
- Same dependency on a good hash/comparator function.
- Red Black Tree
- Strengths:
- Stays ordered.
- Predictable iteration, which makes testing easy
- Weaknesses:
- Much slower (6-7 times) than Hash Table in insertion/deletion of entries: because the red black tree always has to rebalance itself, which is a VERY costly operation that may involve the whole data set.
- Greater memory consumption (33%) than Hash Table, because the struct that wraps each value requires more information.
- Depends on a reliable comparator function.
This time complexity table using Big O Notation
shows advantages and disadvantages of each data structure chosen for each set operation:
| Operation |
Hash Table |
Linked Hash Table |
Red Black Tree |
| add |
O(1) |
O(1) |
O(log(N)) |
| clear |
O(N) |
O(N) |
O(N) |
| contains |
O(1) |
O(1) |
O(log(N)) |
| find |
O(1) |
O(1) |
O(log(N)) |
| remove |
O(1) |
O(1) |
O(log(N)) |
| isEmpty |
O(1) |
O(1) |
O(1) |
| size |
O(1) |
O(1) |
O(1) |
Conclusion: Hash Table, with its speed/memory efficiency advantages, should be solution of choice unless one specifically desires structure to be predictably iterated or self-ordered!