In the first part of this post we discussed the design of hash-based collections and why classes must obey the general contract stipulated by the equals() and hashCode() methods of the java.lang.Object class. Now we will put in practice what we have discussed by writing a simple implementation of a HashSet in Java.
Our simple HashSet will implement the following interface:
import java.util.Iterator;
public interface SimpleSet {
// Adds the specified element to this set if it is not already present
boolean add(Object element);
// Removes the specified element from this set if it is present
boolean remove(Object element);
// Returns true if this set contains the specified element
boolean contains(Object element);
// Returns an iterator over the elements in this set
Iterator iterator();
// Returns the number of elements in this set
int size();
}
Now let’s move to the implementation details.
Since we will be adopting chaining to resolve collisions, we need an inner class that we can use to build a linked list of collided objects. We will call the inner class Entry.
public class SimpleHashSet implements SimpleSet {
private static class Entry {
Object key;
Entry next;
}
// ...
The bucket array is an array of Entry instances. We also need a private property to store the size of the Set.
private Entry[] buckets;
private int size;
// ...
Our simple HashSet constructor takes a single int argument which defines the capacity of the bucket array:
public SimpleHashSet(int capacity) {
buckets = new Entry[capacity];
size = 0;
}
// ...
Next we need to define a hash function that computes the index into the bucket array giving the hash value of an object:
private int hashFunction(int hashCode) {
int index = hashCode;
if (index < 0) { index = -index; }
return index % buckets.length;
}
// ...
We can now start implementing the interface methods.
The algorithm for the add method first computes the index into the bucket array using our hash function.
If there is already one or more entries in that bucket location they are compared for equality with the object to add. If any of the comparisons is positive then the method returns false as a Set cannot store duplicates.
If no duplicates are found than a new Entry is created and assigned to the computed bucket location and linked to the chain if one exists. The size property is incremented and the method returns true to indicate that the object has been added to the Set.
@Override
public boolean add(Object element) {
int index = hashFunction(element.hashCode());
Entry current = buckets[index];
while (current != null) {
// element is already in set
if (current.key.equals(element)) { return false; }
// otherwise visit next entry in the bucket
current = current.next;
}
// no element found so add new entry
Entry entry = new Entry();
entry.key = element;
// current Entry is null if bucket is empty
// if it is not null it becomes next Entry
entry.next = buckets[index];
buckets[index] = entry;
size++;
return true;
}
// ...
The algorithm for the remove method also first computes the index into the bucket array using our hash function.
If there is already one or more entries in that bucket location they are compared for equality with the object to remove. If a match is found the object is removed and the chain, if present, adjusted. The size property is decremented and the method returns true to indicate that the object has been removed.
If no match is found then the method returns false, as there is nothing to remove.
@Override
public boolean remove(Object element) {
int index = hashFunction(element.hashCode());
Entry current = buckets[index];
Entry previous = null;
while (current != null) {
// element found so remove it
if (current.key.equals(element)) {
if (previous == null) {
buckets[index] = current.next;
} else {
previous.next = current.next;
}
size--;
return true;
}
previous = current;
current = current.next;
}
// no element found nothing to remove
return false;
}
// ...
The algorithm for the contains method again first computes the index into the bucket array using our hash function.
If there is already one or more entries in that bucket location they are compared for equality with the object passed as argument. If a match is found the method returns true, otherwise it returns false.
@Override
public boolean contains(Object element) {
int index = hashFunction(element.hashCode());
Entry current = buckets[index];
while (current != null) {
// check if node contains element
if (current.key.equals(element)) { return true; }
// otherwise visit next node in the bucket
current = current.next;
}
// no element found
return false;
}
// ...
The iterator method simply returns an instance of an inner class that implements the java.util.Iterator interface.
@Override
public Iterator iterator() {
return new SimpleHashSetIterator();
}
// ...
The code for the Iterator implementation follows:
class SimpleHashSetIterator implements Iterator {
private int currentBucket;
private int previousBucket;
private Entry currentEntry;
private Entry previousEntry;
public SimpleHashSetIterator() {
currentEntry = null;
previousEntry = null;
currentBucket = -1;
previousBucket = -1;
}
@Override
public boolean hasNext() {
// currentEntry node has next
if (currentEntry != null && currentEntry.next != null) { return true; }
// there are still nodes
for (int index = currentBucket+1; index < buckets.length; index++) {
if (buckets[index] != null) { return true; }
}
// nothing left
return false;
}
@Override
public Object next() {
previousEntry = currentEntry;
previousBucket = currentBucket;
// if either the current or next node are null
if (currentEntry == null || currentEntry.next == null) {
// go to next bucket
currentBucket++;
// keep going until you find a bucket with a node
while (currentBucket < buckets.length &&
buckets[currentBucket] == null) {
// go to next bucket
currentBucket++;
}
// if bucket array index still in bounds
// make it the current node
if (currentBucket < buckets.length) {
currentEntry = buckets[currentBucket];
}
// otherwise there are no more elements
else {
throw new NoSuchElementException();
}
}
// go to the next element in bucket
else {
currentEntry = currentEntry.next;
}
// return the element in the current node
return currentEntry.key;
}
}
// ...
The implementation of the size method simply returns the value of the size property:
public int size() {
return size;
}
// ...
Finally we override the java.lang.Object toString method to produce a visual representation of our HashSet.
@Override
public String toString() {
Entry currentEntry = null;
StringBuffer sb = new StringBuffer();
// loop through the array
for (int index=0; index < buckets.length; index++) {
// we have an entry
if (buckets[index] != null) {
currentEntry = buckets[index];
sb.append("[" + index + "]");
sb.append(" " + currentEntry.key.toString());
while (currentEntry.next != null) {
currentEntry = currentEntry.next;
sb.append(" -> " + currentEntry.key.toString());
}
sb.append('\n');
}
}
return sb.toString();
}
// ...
The full source code can be found on GitHub at the following URLs:
SimpleSet (interface)
SimpleHashSet (implementation)
SimpleHashSetTest (unit tests)
This is of course a very basic implementation of a HashSet in Java. It is useful to study and understand the main principles behind hash-based collections, or to code on interviews to demonstrate your knowledge of a hash-based data structure and its algorithms.
If you want to study a production quality implementation with all the required optimizations, I suggest that you take a look at the source code of java.util.HashSet. To find out how to look at the Java library source code in Eclipse please read this post.
Thank you for reading, feel free to post any questions or comments.