Hash Codes

The hash tables are used to associate data with integer keys consisting of ww bits. In many cases, we have keys that are not integers. They may be strings, objects, arrays, or other compound structures. To use hash tables for these types of data, we must map these data types to wbitw-\text{bit} hash codes. Hash code mappings should have the following properties:

  1. If x and y are equal, then x.hashCode() and y.hashCode() are equal.
  2. If x and y are not equal, then the probability that x.hashCode() = y.hashCode() should be small (close to 1/2w)1/2^{w} ).

The first property ensures that if we store x in a hash table and later look up a value y equal to x, then we will find x—as we should. The second property minimizes the loss from converting our objects to integers. It ensures that unequal objects usually have different hash codes and so are likely to be stored at different locations in our hash table.

Hash codes for primitive data types

Small primitive data types like char, byte, int, and float are usually easy to find hash codes for. These data types always have a binary representation and this binary representation usually consists of ww or fewer bits. (For example, in C++, byte is an 8-bit type and float is a 32-bit type.) In these cases, we just treat these bits as the representation of an integer in the range {0,...,2w1}\{0,...,2^{w} − 1\}. If two values are different, they get different hash codes. If they are the same, they get the same hash code.

A few primitive data types are made up of more than ww bits, usually cwcw bits for some constant integer cc. (Java’s long and double types are examples of this with c=2c = 2.) These data types can be treated as compound objects made of cc parts, as described in the next section.

Hash codes for compound objects

For a compound object, we want to create a hash code by combining the individual hash codes of the object’s constituent parts. This is not as easy as it sounds. Although one can find many hacks for this (for example, combining the hash codes with bitwise exclusive-or operations), many of these hacks are easy to foil. However, if one is willing to do arithmetic with 2w2w bits of precision, then there are simple and robust methods available. Suppose we have an object made up of several parts P0,...,Pr1P_{0},...,P_{r−1} whose hash codes are x0,...,xr1{x_{0},...,x_{r−1}}. Then we can choose mutually independent random ww-bit integers z0,...,zr1z_{0},...,z_{r−1} and a random 2w2w-bit odd integer zz and compute a hash code for our object with

h(x0,,xr1)=[[zi=0r1zixi]mod 22w]div 2wh\left ( x_{0},\ldots,x_{r-1} \right ) = \left [ \left [ z \sum_{i=0}^{r-1}z_{i}x_{i} \right] \text{mod}\ 2^{2w} \right ] \text{div}\ 2^w

Note that this hash code has a final step (multiplying by zz and dividing by 2w)2^{w} ) that uses the multiplicative hash function to take the 2w2w-bit intermediate result and reduce it to a ww-bit final result. Here is an example of this method applied to a simple compound object with three parts x0, x1, and x2:

Create a free account to access the full course.

By signing up, you agree to Educative's Terms of Service and Privacy Policy