# Hash Codes

Break down the components of hash codes.

## We'll cover the following

The hash tables are used to associate data with integer keys consisting of $w$ bits. In many cases, we have keys that are not integers. They may be strings, objects, arrays, or other compound structures. To use hash tables for these types of data, we must map these data types to $w-\text{bit}$ hash codes. Hash code mappings should have the following properties:

- If
`x`

and`y`

are equal, then`x.hashCode()`

and`y.hashCode()`

are equal. - If
`x`

and`y`

are not equal, then the probability that`x.hashCode() = y.hashCode()`

should be small (close to $1/2^{w} )$.

The first property ensures that if we store `x`

in a hash table and later
look up a value `y`

equal to `x`

, then we will find `x`

—as we should. The second property minimizes the loss from converting our objects to integers.
It ensures that unequal objects usually have different hash codes and so
are likely to be stored at different locations in our hash table.

## Hash codes for primitive data types

Small primitive data types like `char`

, `byte`

, `int`

, and `float`

are usually
easy to find hash codes for. These data types always have a binary representation and this binary representation usually consists of $w$ or fewer
bits. (For example, in C++, `byte`

is an 8-bit type and `float`

is a 32-bit
type.) In these cases, we just treat these bits as the representation of an
integer in the range $\{0,...,2^{w} − 1\}$. If two values are different, they get different hash codes. If they are the same, they get the same hash code.

A few primitive data types are made up of more than $w$ bits, usually $cw$ bits for some constant integer $c$. (Java’s `long`

and `double`

types are examples of this with $c = 2$.) These data types can be treated as compound objects made of $c$ parts, as described in the next section.

## Hash codes for compound objects

For a compound object, we want to create a hash code by combining the individual hash codes of the object’s constituent parts. This is not as easy as it sounds. Although one can find many hacks for this (for example, combining the hash codes with bitwise exclusive-or operations), many of these hacks are easy to foil. However, if one is willing to do arithmetic with $2w$ bits of precision, then there are simple and robust methods available. Suppose we have an object made up of several parts $P_{0},...,P_{r−1}$ whose hash codes are ${x_{0},...,x_{r−1}}$. Then we can choose mutually independent random $w-$bit integers $z_{0},...,z_{r−1}$ and a random $2w-$bit odd integer $z$ and compute a hash code for our object with

$h\left ( x_{0},\ldots,x_{r-1} \right ) = \left [ \left [ z \sum_{i=0}^{r-1}z_{i}x_{i} \right] \text{mod}\ 2^{2w} \right ] \text{div}\ 2^w$

Note that this hash code has a final step (multiplying by $z$ and dividing by
$2^{w}
)$ that uses the multiplicative hash function to take the $2w-$bit intermediate result and reduce it to a $w$-bit final result. Here is an example of this method applied to a simple compound object with three parts `x0`

, `x1`

, and `x2`

:

Create a free account to access the full course.

By signing up, you agree to Educative's Terms of Service and Privacy Policy