Hash Tables
Store key/values based on hashing the key into another value, ideally evenly distributed across the size of the hash table.
Basic implementation (Java)
Notes:
- Hash function is very weak
- Uses open addressing technique for collision resolution
- Does not handle resizing
- Does not handle manipulations on full hash table
class HashTable<T> {
private T[] values;
private int[] keys;
private int size;
public HashTable(int size) {
values = new T[size];
keys = new Integer[size];
for(int i = 0; i < size; i++){
values[i] = null;
}
}
public void put(int key, T value){
int spot = findSpot(key);
values[spot] = value;
keys[spot] = key;
}
public T find(int key) {
int spot = findSpot(key);
return values[spot]; // will be null if not set
}
private int findSpot(int key){
int hashIndex = hash(key);
while(values[hashIndex] != null && key[hashIndex] != key){
hashIndex = hash(hashIndex + 1);
}
return hashIndex;
}
private int hash(int key) {
return key % size;
}
}
Choosing a Hash Function
TODO: fill me out
Collision Resolution
Separate Chaining
Separate chaining keeps each bucket independent, and maintains a list of the elements that have been hashed to the bucket. Oftentimes, a linked list is used. A variation of this includes the first value for the linked list in the bucket along with a pointer to the rest of the list.
Open Addressing
With open addressing, each bucket holds only one element. If keys match, then the hash table generates a new index to try, until an empty bucket is found.
- Linear probing: interval between probes is fixed (e.g. increase index by 1)
- Quadratic probing: intervals between probes is increased by successive values of quadratic polynomial (e.g. attempt x increases x-1 by 4x^3 + 2x^2 + 5x + 5)
- Double hashing: interval between probes is computed by another hash function
Which is better?
Open addressing is better when records are small, and load factor is low. Separate chaining is better when records are large (as space is only allocated for pointers to elements), or if load factor is higher (since less attempts to find an empty bucket)