The Security of Hash Functions and the Birthday Attack

Let's learn about the security goals of a hash function and different attacks on hash function in this lesson.

We defined the security requirements of hash functions in this definition :Security_Requirements_of_Hash_Functions , where we formulated three properties that should be computationally infeasible to attack. In this section, we outline what this means in view of complexity. We’ll see that the size of a hash function’s message digest is a crucial security parameter as a hash function whose message-digest range is constrained by nn bits contains 2n2^{n} distinct hash values.

As a result, a brute force attack can find a preimage or second preimage for a general nn-bit hash function in approximately 2n2^{n} hash computations, and collisions in approximately 2n/22^{n / 2} computations, respectively. The 2n/22^{n / 2} hash computations in case the collision is possible due to the birthday attack are shown in this theorem. Therefore, Kelsey and Schneier (2004) give the following definition of the security goal of a hash function:

The security goal of a hash function

A hash function HH with a message digest of bit-length nn is expected to have the following minimal security properties:

  1. An adversary should not be able to find a preimage or second preimage in less than 2n2^{n} hash computations.
  2. An adversary should not be able to find a collision in less than 2n/22^{n / 2} hash computations.

Furthermore, Kelsey and Schneier state that “a collision attack on an nn bit hash function with less than 2n/22^{n / 2} work, or a preimage or second preimage attack with less than 2n2^{n} work, is formally a break of the hash function.” (John Kelsey et al. (2005)John Kelsey and Bruce Schneier. Second preimages on n-bit hash functions for much less than 2Power{n} work. In Advances in Cryptology - EUROCRYPT 2005, 24th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Aarhus, Denmark, May 22-26, 2005, Proceedings, pages 474-90, 2005.).

A formally broken hash function

A hash function is said to be formally broken if a preimage, second preimage, or a collision can be generated faster than via brute force.

We now want to prove that the birthday attack yields an upper bound of O(2n/2)\mathcal{O}\left(2^{n / 2}\right) hash computations in order to find a collision.

Lemma 1

The exponential function exe^{-x} is defined by the power series

ex=k=1(x)kk!=1x+x22!x33!+e^{-x}=\sum_{k=1}^{\infty} \frac{(-x)^{k}}{k !}=1-x+\frac{x^{2}}{2 !}-\frac{x^{3}}{3 !}+\ldots

Theorem 1

A collision of a hash function HH with message digest bit length nn can be found with probability P(\mathbb{P}( collision )>0.5 in O(2n/2))>0.5 \space in \space\mathcal{O}\left(2^{n / 2}\right) hash computations.


Let H:{0,1}{0,1}nH:\{0,1\}^{*} \rightarrow\{0,1\}^{n} a hash function and the set {x1,x2,,xk}\left\{x_{1}, x_{2}, \ldots, x_{k}\right\} arbitrary input values. A collision is found if we’re able to determine two different xixjx_{i} \neq x_{j} such that yi=H(xi)=H(xj)=yj.y_{i}=H\left(x_{i}\right)=H\left(x_{j}\right)=y_{j} . When we pick two values x1x2x_{1} \neq x_{2}, the probability of not getting a collision (i.e., to get two different message digests) is

P(H(x1)H(x2))=2n12n=112n.\mathbb{P}\left(H\left(x_{1}\right) \neq H\left(x_{2}\right)\right)=\frac{2^{n}-1}{2^{n}}=1-\frac{1}{2^{n}}.

If this holds, the probability to find a third value x3x_{3} such that H(x3)H(x1)H\left(x_{3}\right) \neq H\left(x_{1}\right) and H(x3)H(x2)H\left(x_{3}\right) \neq H\left(x_{2}\right) is given by

P(y3y1y3y2)=2n22n=122n.\mathbb{P}\left(y_{3} \neq y_{1} \wedge y_{3} \neq y_{2}\right)=\frac{2^{n}-2}{2^{n}}=1-\frac{2}{2^{n}}.

As a consequence, the probability that y1,y2y_{1}, y_{2}, and y3y_{3} are pairwise different is given by

P(y1y2y1y3y2y3)=(112n)(122n).\mathbb{P}\left(y_{1} \neq y_{2} \wedge y_{1} \neq y_{3} \wedge y_{2} \neq y_{3}\right)=\left(1-\frac{1}{2^{n}}\right)\left(1-\frac{2}{2^{n}}\right).

This argument can be continued, and thus the probability for no collisions among kk message digests is

P( no collision )=(112n)(122n)(1k12n)=i=1k1(1i2n).\mathbb{P}(\text { no collision })=\left(1-\frac{1}{2^{n}}\right)\left(1-\frac{2}{2^{n}}\right) \cdots\left(1-\frac{k-1}{2^{n}}\right)=\prod_{i=1}^{k-1}\left(1-\frac{i}{2^{n}}\right).

As discussed earlier, it follows that the approximation

ex1xe^{-x} \approx 1-x

holds for x1x \ll 1, and therefore

(1i2n)ei2n\left(1-\frac{i}{2^{n}}\right) \approx e^{-\frac{i}{2^{n}}}

since i2n1\frac{i}{2^{n}} \ll 1. Thus, we can approximate the probability as

P( no collision )i=1k1ei2n=e1+2+3++k12n.\mathbb{P}(\text { no collision }) \approx \prod_{i=1}^{k-1} e^{-\frac{i}{2^{n}}}=e^{-\frac{1+2+3+\ldots+k-1}{2^{n}}}.

The exponent contains the arithmetic series


that yields a probability

P( no collision )ek(k1)22n\mathbb{P}(\text { no collision }) \approx e^{-\frac{k(k-1)}{2 \cdot 2 n}}

that no collision occurs. Since the probability PP of at least one collision is given by P=P(P=\mathbb{P}( collision )=1P()=1-\mathbb{P}( no collision )), we achieve an approximated probability of

P1ek(k1)22nP \approx 1-e^{-\frac{k(k-1)}{2 \cdot 2^{n}}}

It follows that

1Pek(k1)22nln(1P)k(k1)22nk(k1)2n2ln(1P)k(k1)2n2ln(11P).\begin{aligned} 1-P & \approx e^{-\frac{k(k-1)}{2 \cdot 2^{n}}} \\ \ln (1-P) & \approx-\frac{k(k-1)}{2 \cdot 2^{n}} \\ k(k-1) & \approx-2^{n} \cdot 2 \ln (1-P) \\ k(k-1) & \approx 2^{n} \cdot 2 \ln \left(\frac{1}{1-P}\right). \end{aligned}

We assume that k1k \gg 1, therefore it holds that k(k1)=k2kk2k(k-1)=k^{2}-k \approx k^{2} and thus

k2n2ln(11P)k2n/22ln(11P)\begin{aligned} &k \approx \sqrt{2^{n} \cdot 2 \ln \left(\frac{1}{1-P}\right)} \\ &k \approx 2^{n / 2} \sqrt{2 \ln \left(\frac{1}{1-P}\right)} \end{aligned}

For a success probability of P=0.5P=0.5, we get

k2n/22ln(110.5)1.1772n/2.k \approx 2^{n / 2} \sqrt{2 \ln \left(\frac{1}{1-0.5}\right)} \approx 1.177 \cdot 2^{n / 2}.

Get hands-on with 1200+ tech skills courses.