# The Security of Hash Functions and the Birthday Attack

Let's learn about the security goals of a hash function and different attacks on hash function in this lesson.

## We'll cover the following

We defined the security requirements of hash functions in this definition

As a result, a brute force attack can find a preimage or second preimage for a general $n$-bit hash function in approximately $2^{n}$ hash computations, and collisions in approximately $2^{n / 2}$ computations, respectively. The $2^{n / 2}$ hash computations in case the collision is possible due to the birthday attack are shown in this theorem. Therefore, Kelsey and Schneier (2004) give the following definition of the security goal of a hash function:

## The security goal of a hash function

A hash function $H$ with a message digest of bit-length $n$ is expected to have the following minimal security properties:

- An adversary should not be able to find a preimage or second preimage in less than $2^{n}$ hash computations.
- An adversary should not be able to find a collision in less than $2^{n / 2}$ hash computations.

Furthermore, Kelsey and Schneier state that “a collision attack on an $n$ bit hash function with less than $2^{n / 2}$ work, or a preimage or second preimage attack with less than $2^{n}$ work, is formally a break of the hash function.” (

## A formally broken hash function

A hash function is said to be formally broken if a preimage, second preimage, or a collision can be generated faster than via brute force.

We now want to prove that the birthday attack yields an upper bound of $\mathcal{O}\left(2^{n / 2}\right)$ hash computations in order to find a collision.

## Lemma 1

The exponential function $e^{-x}$ is defined by the power series

$e^{-x}=\sum_{k=1}^{\infty} \frac{(-x)^{k}}{k !}=1-x+\frac{x^{2}}{2 !}-\frac{x^{3}}{3 !}+\ldots$

## Theorem 1

A collision of a hash function $H$ with message digest bit length $n$ can be found with probability $\mathbb{P}($ collision $)>0.5 \space in \space\mathcal{O}\left(2^{n / 2}\right)$ hash computations.

**Proof**

Let $H:\{0,1\}^{*} \rightarrow\{0,1\}^{n}$ a hash function and the set $\left\{x_{1}, x_{2}, \ldots, x_{k}\right\}$ arbitrary input values. A collision is found if we’re able to determine two different $x_{i} \neq x_{j}$ such that $y_{i}=H\left(x_{i}\right)=H\left(x_{j}\right)=y_{j} .$ When we pick two values $x_{1} \neq x_{2}$, the probability of not getting a collision (i.e., to get two different message digests) is

$\mathbb{P}\left(H\left(x_{1}\right) \neq H\left(x_{2}\right)\right)=\frac{2^{n}-1}{2^{n}}=1-\frac{1}{2^{n}}.$

If this holds, the probability to find a third value $x_{3}$ such that $H\left(x_{3}\right) \neq H\left(x_{1}\right)$ and $H\left(x_{3}\right) \neq H\left(x_{2}\right)$ is given by

$\mathbb{P}\left(y_{3} \neq y_{1} \wedge y_{3} \neq y_{2}\right)=\frac{2^{n}-2}{2^{n}}=1-\frac{2}{2^{n}}.$

As a consequence, the probability that $y_{1}, y_{2}$, and $y_{3}$ are pairwise different is given by

$\mathbb{P}\left(y_{1} \neq y_{2} \wedge y_{1} \neq y_{3} \wedge y_{2} \neq y_{3}\right)=\left(1-\frac{1}{2^{n}}\right)\left(1-\frac{2}{2^{n}}\right).$

This argument can be continued, and thus the probability for no collisions among $k$ message digests is

$\mathbb{P}(\text { no collision })=\left(1-\frac{1}{2^{n}}\right)\left(1-\frac{2}{2^{n}}\right) \cdots\left(1-\frac{k-1}{2^{n}}\right)=\prod_{i=1}^{k-1}\left(1-\frac{i}{2^{n}}\right).$

As discussed earlier, it follows that the approximation

$e^{-x} \approx 1-x$

holds for $x \ll 1$, and therefore

$\left(1-\frac{i}{2^{n}}\right) \approx e^{-\frac{i}{2^{n}}}$

since $\frac{i}{2^{n}} \ll 1$. Thus, we can approximate the probability as

$\mathbb{P}(\text { no collision }) \approx \prod_{i=1}^{k-1} e^{-\frac{i}{2^{n}}}=e^{-\frac{1+2+3+\ldots+k-1}{2^{n}}}.$

The exponent contains the arithmetic series

$1+2+3+\ldots+k-1=\frac{k(k-1)}{2}$

that yields a probability

$\mathbb{P}(\text { no collision }) \approx e^{-\frac{k(k-1)}{2 \cdot 2 n}}$

that no collision occurs. Since the probability $P$ of at least one collision is given by $P=\mathbb{P}($ collision $)=1-\mathbb{P}($ no collision $)$, we achieve an approximated probability of

$P \approx 1-e^{-\frac{k(k-1)}{2 \cdot 2^{n}}}$

It follows that

$\begin{aligned} 1-P & \approx e^{-\frac{k(k-1)}{2 \cdot 2^{n}}} \\ \ln (1-P) & \approx-\frac{k(k-1)}{2 \cdot 2^{n}} \\ k(k-1) & \approx-2^{n} \cdot 2 \ln (1-P) \\ k(k-1) & \approx 2^{n} \cdot 2 \ln \left(\frac{1}{1-P}\right). \end{aligned}$

We assume that $k \gg 1$, therefore it holds that $k(k-1)=k^{2}-k \approx k^{2}$ and thus

$\begin{aligned} &k \approx \sqrt{2^{n} \cdot 2 \ln \left(\frac{1}{1-P}\right)} \\ &k \approx 2^{n / 2} \sqrt{2 \ln \left(\frac{1}{1-P}\right)} \end{aligned}$

For a success probability of $P=0.5$, we get

$k \approx 2^{n / 2} \sqrt{2 \ln \left(\frac{1}{1-0.5}\right)} \approx 1.177 \cdot 2^{n / 2}.$

Get hands-on with 1200+ tech skills courses.