Monthly Archives: July 2016

How to Implement a Simple HashSet in Java

In the first part of this post we discussed the design of hash-based collections and why classes must obey the general contract stipulated by the equals() and hashCode() methods of the java.lang.Object class. Now we will put in practice what we have discussed by writing a simple implementation of a HashSet in Java.

Our simple HashSet will implement the following interface:

import java.util.Iterator;

public interface SimpleSet {

    // Adds the specified element to this set if it is not already present
    boolean add(Object element);
    // Removes the specified element from this set if it is present
    boolean remove(Object element);
    // Returns true if this set contains the specified element
    boolean contains(Object element);
    // Returns an iterator over the elements in this set
    Iterator iterator();
    // Returns the number of elements in this set
    int size();
}

 
Now let’s move to the implementation details.

Since we will be adopting chaining to resolve collisions, we need an inner class that we can use to build a linked list of collided objects. We will call the inner class Entry.

public class SimpleHashSet implements SimpleSet {

    private static class Entry {
        Object key;
        Entry next;
    }

    // ...

 
The bucket array is an array of Entry instances. We also need a private property to store the size of the Set.

    private Entry[] buckets;

    private int size;

    // ...

 
Our simple HashSet constructor takes a single int argument which defines the capacity of the bucket array:

    public SimpleHashSet(int capacity) {

        buckets = new Entry[capacity];
        size = 0;
    }
    
    // ...

 
Next we need to define a hash function that computes the index into the bucket array giving the hash value of an object:

     private int hashFunction(int hashCode) {

        int index = hashCode;
        if (index < 0) { index = -index; }
        return index % buckets.length;
    }

    // ...

 
We can now start implementing the interface methods.

The algorithm for the add method first computes the index into the bucket array using our hash function.

If there is already one or more entries in that bucket location they are compared for equality with the object to add. If any of the comparisons is positive then the method returns false as a Set cannot store duplicates.

If no duplicates are found than a new Entry is created and assigned to the computed bucket location and linked to the chain if one exists. The size property is incremented and the method returns true to indicate that the object has been added to the Set.

    @Override
    public boolean add(Object element) {

        int index = hashFunction(element.hashCode());
        Entry current = buckets[index];

        while (current != null) {
            // element is already in set
            if (current.key.equals(element)) { return false; }
            // otherwise visit next entry in the bucket
            current = current.next;
        }
        // no element found so add new entry
        Entry entry = new Entry();
        entry.key = element;
        // current Entry is null if bucket is empty
        // if it is not null it becomes next Entry
        entry.next  = buckets[index];
        buckets[index] = entry;
        size++;
        return true;
    }

    // ...

 
The algorithm for the remove method also first computes the index into the bucket array using our hash function.

If there is already one or more entries in that bucket location they are compared for equality with the object to remove. If a match is found the object is removed and the chain, if present, adjusted. The size property is decremented and the method returns true to indicate that the object has been removed.

If no match is found then the method returns false, as there is nothing to remove.

    @Override
    public boolean remove(Object element) {

        int index = hashFunction(element.hashCode());
        Entry current = buckets[index];
        Entry previous = null;

        while (current != null) {
            // element found so remove it
            if (current.key.equals(element)) {

                if (previous == null) {
                    buckets[index] = current.next;
                } else {
                    previous.next = current.next;
                }
                size--;
                return true;
            }

            previous = current;
            current = current.next;
        }
        // no element found nothing to remove
        return false;
    }

    // ...

 
The algorithm for the contains method again first computes the index into the bucket array using our hash function.

If there is already one or more entries in that bucket location they are compared for equality with the object passed as argument. If a match is found the method returns true, otherwise it returns false.

    @Override
    public boolean contains(Object element) {

        int index = hashFunction(element.hashCode());
        Entry current = buckets[index];

        while (current != null) {
            // check if node contains element
            if (current.key.equals(element)) { return true; }
            // otherwise visit next node in the bucket
            current = current.next;
        }
        // no element found
        return false;
    }

    // ...

 
The iterator method simply returns an instance of an inner class that implements the java.util.Iterator interface.

    @Override
    public Iterator iterator() {
        return new SimpleHashSetIterator();
    }

    // ...

 
The code for the Iterator implementation follows:

    class SimpleHashSetIterator implements Iterator {

        private int currentBucket;
        private int previousBucket;
        private Entry currentEntry;
        private Entry previousEntry;

        public SimpleHashSetIterator() {
            currentEntry = null;
            previousEntry = null;
            currentBucket = -1;
            previousBucket = -1;
        }

        @Override
        public boolean hasNext() {

            // currentEntry node has next
            if (currentEntry != null && currentEntry.next != null) { return true; }

            // there are still nodes
            for (int index = currentBucket+1; index < buckets.length; index++) {
                if (buckets[index] != null) { return true; }
            }

            // nothing left
            return false;
        }

        @Override
        public Object next() {

            previousEntry = currentEntry;
            previousBucket = currentBucket;

            // if either the current or next node are null
            if (currentEntry == null || currentEntry.next == null) {

                // go to next bucket
                currentBucket++;

                // keep going until you find a bucket with a node
                while (currentBucket < buckets.length &&
                        buckets[currentBucket] == null) {
                    // go to next bucket
                    currentBucket++;
                }

                // if bucket array index still in bounds
                // make it the current node
                if (currentBucket < buckets.length) {
                    currentEntry = buckets[currentBucket];
                }
                // otherwise there are no more elements
                else {
                    throw new NoSuchElementException();
                }
            }
            // go to the next element in bucket
            else {

                currentEntry = currentEntry.next;
            }

            // return the element in the current node
            return currentEntry.key;

        }

    }

    // ...

 

The implementation of the size method simply returns the value of the size property:

    public int size() {
        return size;
    }

    // ...

 
Finally we override the java.lang.Object toString method to produce a visual representation of our HashSet.

    @Override
    public String toString() {

        Entry currentEntry = null;
        StringBuffer sb = new StringBuffer();

        // loop through the array
        for (int index=0; index < buckets.length; index++) {
            // we have an entry
            if (buckets[index] != null) {
                currentEntry = buckets[index];
                sb.append("[" + index + "]");
                sb.append(" " + currentEntry.key.toString());
                while (currentEntry.next != null) {
                    currentEntry = currentEntry.next;
                    sb.append(" -> " + currentEntry.key.toString());
                }
                sb.append('\n');
            }
        }

        return sb.toString();
    }

    // ...

 
The full source code can be found on GitHub at the following URLs:

SimpleSet (interface)
SimpleHashSet (implementation)
SimpleHashSetTest (unit tests)

This is of course a very basic implementation of a HashSet in Java. It is useful to study and understand the main principles behind hash-based collections, or to code on interviews to demonstrate your knowledge of a hash-based data structure and its algorithms.

If you want to study a production quality implementation with all the required optimizations, I suggest that you take a look at the source code of java.util.HashSet. To find out how to look at the Java library source code in Eclipse please read this post.

Thank you for reading, feel free to post any questions or comments.