Java/HashCodes

Notes

Hash Functions are an important part of computer science. These are generally divided into two categories: cryptographic hash functions, and non-cryptographic hash functions. This page focuses on the implementation of non-cryptographic hash functions in Java.

When Java creates objects that encapsulate complex, nested, or hierarchical data, it needs a fast, O(1) way to check if two objects are equal. The way it does this is by computing a hash code, which is a way of summarizing and fingerprinting all of the data in an object in such a way that two objects that are equal are guaranteed to have the same hash, and two objects with the same hash have a high likelihood of being equal.

It is possible to modify how hash codes for objects are computed in Java. This page shows a simple example. However, it is important to note that if you are overriding the hashCode function, you must also override the equals() function, and if you are overriding these two functions, you must use the @Overrides decorator to let the compiler know you're overriding these methods.

The Code

Note that all of this code is available on git.charlesreid1.com: https://git.charlesreid1.com/cs/java/src/master/hash

Word class

Here, we start with a simple class that wraps a String word, and that implements a custom hash function.

/** Word class.
 * We modify the hash function of the Word class.
 */
class Word {
	private String data;
	public Word(String s) { 
		data = s;
	}
	public String getData() { return data; }
	public String toString() { return data; }

	@Override
	public boolean equals(Object w) { 
		// See https://stackoverflow.com/questions/15722485/hashset-storing-equal-objects#15722565
		//
		// Verify this is a Word object
		if(!(w instanceof Word)) { 
			return false;
		}
		// Cast as Word object
		Word ww = (Word) w; 
		// Access Word attributes
		if(data.equals(ww.data)) {
			return true;
		} else { 
			return false;
		}
	}

	@Override
	public int hashCode() { 
		int h = 1;
		int shift = 5;
		for(int i=0; i<data.length(); i++) { 
			h ^= (int)(data.charAt(i)); // bitwise xor with h
			h = (h << shift) | (h >>> (32-shift)); // n-bit cyclic shift of h 
		}
		return h;
	}
}

The Hash Code Calculation Class

Next, we'll implement a static method that will load a bunch of words from a file, and will test how many hash collisions we have for the Word objects versus for String objects.

public class ObjectHashCode {

	/** Run the driver method. */
	public static void main(String[] args) throws FileNotFoundException { 
		populateWords();
	}

   	/** Populate words, then check for hash code collisions. */
   	public static void populateWords() throws FileNotFoundException { 
   		Scanner s = new Scanner(new BufferedReader(new FileReader("14oxenofthesun.dat")));

   		// create list of unique words as a string set
   		Set<String> strings = new HashSet<String>();
   		Set<Word> words = new HashSet<Word>();
   		while(s.hasNext()) { 
   			String word = s.next();
   			strings.add(word);
   			words.add(new Word(word));
   		}

So far so good - we now have two Set objects, one containing String objects and the other containing Word objects. Note that if we don't override the correct equals() method implementation, the Set object will not know how to check if two Word objects are equal, so it will add a new Word object for each new word - not for each new unique word!

The Hash Code Calculations

Now we create containers to store the hash codes for each string and word, and run through each one and calculate them.

   		// count number of hash collisions for:
   		// strings
   		List<Integer> stringHashes = new LinkedList<Integer>();
   		Set<Integer> uniqStringHashes = new HashSet<Integer>();
   		// words
   		List<Integer> wordHashes = new LinkedList<Integer>();
   		Set<Integer> uniqWordHashes = new HashSet<Integer>();

   		for(String st : strings) { 
   			stringHashes.add( st.hashCode() );
   			uniqStringHashes.add( st.hashCode() );
		}

   		for(Word w : words) {
   			wordHashes.add( w.hashCode() );
   			uniqWordHashes.add( w.hashCode() );
   		}

   		int stringCollisions = stringHashes.size() - uniqStringHashes.size();
   		int wordCollisions = wordHashes.size() - uniqWordHashes.size();

   		System.out.printf("String class hash code had %d strings, %d collisions.\n", stringHashes.size(), stringCollisions);
   		System.out.printf("Word class hash code had %d strings, %d collisions.\n",   wordHashes.size(),   wordCollisions);
   	}