Monday, June 16, 2014

Back To The Basics: The Hash Table CRUD Operations

In my previous post Back To The Basics: The Hash Table I described the Hash Table data structure, provided a basic template for the class in Java, as well as walked you through the hashing algorithm. In this post we’ll take a look at the CRUD operations. Actually we'll really only talk about addition, retrieval, and deletion because adding and updating in a Hash Table are the same operation.

Let's take a look at the addget, and remove operations first in their simplest form.

private void add(String key, Object value) {
    int index = this.getKeyIndex(key);
    values[index] = value;
}

private Object get(String key) {
    int index = this.getKeyIndex(key);
    return values[index];
}

private Object remove(String key) {
    int index = this.getKeyIndex(key);
    Object value = values[index];
    values[index] = null;
    return value;
}

Looking at these three operations we can now see why a Hash Table in practice provides near constant time additional and retrieval operations. The only non-constant time operation that is performed is the generation of the hash code. But this operation can be optimized and it's resulting value cached (if necessary) effectively making the add and get operations approach O(1).

What you may notice by looking at these operations are that they don't handle collisions. The add operation is simply overwriting the existing value associated with the hash regardless of whether it's actually associated with the key. The get operation is returning whatever value is stored at the index generated by the getKeyIndex method. And, like the add operation, the remove operation is removing whatever is associated with the hash.

We can try to optimize our hashing algorithms as much as we want but we'll never truly be able to avoid collisions 100% of the time. The reason is that we'd have to know all the possible item's that can be hashed in order to create a truly perfect hashing algorithm. Since this is not really possible for a general purpose Hash Table, it's important that we add some code that can handle collisions when hashing the key.

The most straight forward way to handle hash collisions is to store the values at each index in a Linked List. Instead of setting the value associated with a particular key as the value in the backing array the value will be set to a Linked List which will contain all the key/value pairs that map to a particular hash. This will require us to add a little complexity to our Hash Table but will allow us to avoid putting our Hash Table into an unexpected state when a collision occurs.

The first thing we need to do to avoid collisions is to define an object that can represent a key/value pair item that gets stored in our Linked List for each value. Since we're using Java we're going to create a class that implements Map.Entry<String, Object>.

final class HashEntry<K,V> implements Map.Entry<K,V> {
private final K key;
private V value;

public HashEntry(K key, V value) {
this.key = key;
this.value = value;
}

@Override
public K getKey() { return this.key; }

@Override
public V getValue() { return this.value; }

@Override
public V setValue(V newValue) {
V oldValue = this.value;
this.value = newValue;
return oldValue;
}
}

The next thing we need to do is update our Hash Table class definition to reflect the fact that our backing array contains a Linked List of our Hash Entry key/value pairs instead of an array of objects. Our new class definition looks like:

public class HashTable {
private LinkedList<HashEntry<String,Object>>[] values;
private final int tableSize = 1543;

public HashTable() {
this.values = (LinkedList<HashEntry<String,Object>>[])new LinkedList[tableSize];
}
}

Now we can update our add, get, and remove methods to support using a Linked List as the backing array. We're going to start with the get method because we'll define a helper method that the addition operation can also take advantage of.

Our new get method is fundamentally the same as our previous get method with the only exception being that we've pushed the actual retrieval logic into it's own method named findEntry. First it tries to find the item. If it does it returns it's value, otherwise it returns null. The new get method looks like:

private Object get(String key) {
    int index = this.getKeyIndex(key);
    HashEntry<String, Object> entry = this.findEntry(values[index], key);
    if(entry != null)
        return entry.getValue();
    return null;
}

The findEntry method simply iterates through the linked list attempting to find an existing item with the given key. If it finds one it returns it, otherwise it returns null:

private HashEntry<String, Object> findEntry(LinkedList<HashEntry<String, Object>> list, String key) {
    if(list == null)
        return null;

    Iterator<HashEntry<String, Object>> iterator = list.iterator();
    while(iterator.hasNext()) {
        HashEntry<String, Object> entry = iterator.next();
        if(entry.getKey().equals(key))
            return entry;
    }
    return null;
}

Our new add method is also the same as our previous add method with the exception of two things. First we've added logic to create the Linked List if it doesn't already exist.  The second change is that we've pushed the actual addition logic into it's own addToList method. This allows the add method to maintain a single responsibility. The new add method looks like:

private void add(String key, Object value) {
    int index = this.getKeyIndex(key);
    LinkedList<HashEntry<String,Object>> list = this.values[index];
    if(list == null) {
        list = new LinkedList<HashEntry<String, Object>>();
        this.values[index] = list;
    }
    this.addValueToList(list, key, value);
}

The addToList method is not very complicated either. It tries to get a handle to the existing object using the findEntry method we've already defined. If it does not find an entry it adds the new key/value pair to the list otherwise it updates the existing entry value. Here's what the addToList method looks like:

private void addValueToList(LinkedList<HashEntry<String, Object>> list, String key, Object value) {
    HashEntry<String, Object> entry = this.findEntry(list, key);
    if(entry == null) {
        list.add(new HashEntry<String, Object>(key, value));
    } else {
        entry.setValue(value);
    }
}

Finally, our new remove method is also able to benefit from our findEntry method. It first tries to find the existing value. If it finds it then it removes it from the list and returns the value of the entry. Otherwise it returns null. The new remove method looks like:

private Object remove(String key) {
    int index = this.getKeyIndex(key);
    LinkedList<HashEntry<String, Object>> list = values[index];
    HashEntry<String, Object> entry = this.findEntry(list, key);

    if(list != null && entry != null) {
        list.remove(entry);
        return entry.getValue();
    }

    return null;
}

Wrap Up

There are a few additional things to consider when creating or using Hash Tables. Concurrency can become an issue when the hash table is used in a multi-threaded environment. If you use a hash table in a multi-threaded environment you'll need to make sure that the implementation you use is thread safe.

Another thing to consider when creating or using Hash Tables is that because the Hash Table relies on a fixed size array as it's backing object store, it may eventually need to grow as more items are added. The cost of performing a resize operation is typically O(N) as each item in the existing array needs to be copied to a new, larger, array. This can be done all at once or incrementally.

Lastly, you'll also need to consider how you are going to handle null keys and/or null values. Some implementations don't allow null keys and/or null values.

No comments:

Post a Comment