# Real Hash Functions: Structural Weakness

Let's learn about real-time hash functions and their security in this lesson.

We'll cover the following

## Structural weakness

The main problem of the hash functions based on Merkle-Damgärd construction is that they’re all vulnerable to length extension attacks. As the message $x$ is split into $k$ blocks $x=x_{1}, x_{2}, \ldots, x_{k}$ (whereas $x_{k}$ contains the padding) and then hashed to the value $h(x)=H_{k}$, we can choose a message $x^{\prime}=x, x_{k+1}=x_{1}, x_{2}, \ldots, x_{k}, x_{k+1}$. Since the first $k$ blocks of $x^{\prime}$ and $x$ are identical, the final hash value of $x$, i.e., $H(x)=H_{k}$, is identical to the internal hash value of $H\left(x^{\prime}\right)$ after $k$ blocks. Because of the iterated formula $H_{i}=g\left(H_{i-1}, x_{i}\right)$, we can conclude that $H\left(x^{\prime}\right)=g\left(H_{k} \| x_{k+1}\right)=g\left(H(x) \| x_{k+1}\right)$.

The fact that $H(x)$ provides direct information about the inner state of $H\left(x^{\prime}\right)$ after $k$ blocks are highly problematic. Suppose Alice and Bob communicate with each other, whereas Alice sends the message $x$ to Bob and wants to authenticate it by sending $H(K|| x)$, where $K$ is a secret key that is only known by Alice and Bob, and $x$ is a known message. If $H$ was an ideal hash function, $H(K|| x)$ would be a proper authentication system. But since $H$ is prone to length extensions, Eve is now able to extend the message to $x \| x^{\prime}$ and forge the authentication code to $H\left(K\|x\| x^{\prime}\right)$ without knowing the secret $K$.

In order to make length extension attacks impossible, Ferguson and Schneier (2003) proposed the following fix for the hash function:

### Iterative hash function

Let $H$ be an iterative hash function. The hash function $H_{d}$ is defined by $H_{d}=H(H(x))$ and has the security level of $\min (k, n / 2)$, where $k$ is the security level of $H$ and $n$ is the size of the hash result.

Bitcoin uses this fix as they use double-SHA256 as their Proof-of-Work function.

## Security of real-life dedicated hash functions

Today’s practical implementations of hash functions are based on dedicated hash functions, which are widely in use, such as MD5 or SHA-2. In this section, we’ll give a brief overview of the most common hash functions and their security (Mohammad A. AlAhmad et al. (2013)Mohammad A. AlAhmad and Imad Fakhri Alshaikhli. Broad view of cryptographic hash functions, 2013.).

The MD4 hash function was designed by Rivest in 1990Ronald L. Rivest. The MD4 message-digest algorithm. In Proceedings of the 10th Annual International Cryptology Conference on Advances in Cryptology, CRYPTO’90, pages 303-11, London, UK, 1991. Springer-Verlag.. The first collision was found by Dobbertin (1996)Hans Dobbertin. Cryptanalysis of MD4. In Fast Software Encryption, Third International Workshop, Cambridge, UK, February 21-23, 1996, Proceedings, pages 53-69, 1996. with a complexity of $2^{20}$ MD4 hash operations. This result was further improved by Wang et al. (2005)Xiaoyun Wang and Hongbo Yu. How to break MD5 and other hash functions. In Proceedings of the 24th Annual International Conference on Theory and Applications of Cryptographic Techniques, EUROCRYPT’05, pages 19-35, Berlin, Heidelberg,2005. Springer-Verlag., who reduced the complexity to $2^{8}$ hash computations such that collisions can be found in a few microseconds. Because of these attacks, the MD4 is no longer collision-resistant. Leurent (2008)Gaëtan Leurent. Fast software encryption. Chapter MD4 is Not One-Way, pages 41228. Berlin, Heidelberg, 2008. Springer-Verlag. demonstrated the first preimage attack on MD4 with a complexity of $2^{102}$, which is far below the expected $2^{128}$ computations (see this definition :Security_Goal_of_Hash_Function ).

MD5 is a further development of MD4 and was presented by Rivest in 1992Ronald L. Rivest. The MD5 message-digest algorithm, 1992. Available at https://www. rfc-editor.org/rfc/rfcl321.txt.. Den Boer and Bosselaers (1993)Bert den Boer and Antoon Bosselaers. Collisions for the compression function of MD5. In Tor Helleseth, editor, Advances in Cryptology - EUROCRYPT '93, pages 293-304, Berlin, Heidelberg, 1994. Springer. have shown collisions in the underlying compression function, whereas Wang et al. (2005)Xiaoyun Wang and Hongbo Yu. How to break MD5 and other hash functions. In Proceedings of the 24th Annual International Conference on Theory and Applications of Cryptographic Techniques, EUROCRYPT’05, pages 19-35, Berlin, Heidelberg, 2005. Springer-Verlag. introduced the first efficient practical collision attack on the full MD5 hash function. Further improvements by Klíma (2005)Vlastimil Klíma. Finding MD5 collisions - a toy for a notebook. IACR Cryptology ePrint Archive, 2005:75, 2005. have shortened the search time for full collisions significantly, i.e., down to less than 4 hours using an ordinary notebook PC (Intel Pentium $1.6 \mathrm{GHz}$ ). This attack was developed by Stevens (2006)Marc Stevens. Fast collision attack on MD5. IACR Cryptology ePrint Archive, 2006:104, 2006., such that collisions could be found in only minutes using a $3 \mathrm{GHz}$ Intel Pentium 4. A further improvement by Stevens (2007)Marc Stevens. On collisions for MD5. Master’s thesis, Eindhoven University of Technology, 62007 . allowed them to find collisions in about $2^{24}$ compressions such that an attack takes approximately six seconds on a $2.6 \mathrm{GHz}$ Intel Pentium 4. Furthermore, Sasaki and Aoki (2009) proposed a preimage attack with a computational complexity $\mathcal{O}\left(2^{123.4}\right)$.

The most popular dedicated hash functions from the Secure Hash Algorithm (SHA) family were published by the National Institute of Standard and Technology (NIST) as a U.S. Federal Information Processing Standard FIPS PUB $180-X$, including SHA-0, SHA-1, SHA-2, and SHA-3. SHA-0, SHA-1, and SHA-2 rely on the Merkle-Damgård construction and were designed by the United States National Security Agency (NSA). SHA-0 was released in 1993 but was later replaced by SHA-1 in 1995 because of a significant weakness due to a design flaw, which is why SHA-0 plays no role in practice.

SHA-1 was released in 1995 as a 160-bit message digest function. However, many standards recommend it to be replaced after 2010 by SHA-2 or SHA-3, as Wang et al. (2005)Xiaoyun Wang and Hongbo Yu. How to break MD5 and other hash functions. In Proceedings of the 24th Annual International Conference on Theory and Applications of Cryptographic Techniques, EUROCRYPT’05, pages 19-35, Berlin, Heidelberg,2005. Springer-Verlag. proposed the first theoretical collision attack on SHA-1 that required $2^{69}$, and thus less than the expected $2^{80}$ hash computations.

In 2017, Google and the Cryptology Group at Centrum Wiskunde & Informatica (CWI) Amsterdam announced the first practical collision by generating a hash collision for two different PDF documents, as shown in the figure given below. The attack requires roughly $2^{63.1}$ hash evaluations, which equals approximately 6500 single-CPU years. The corresponding PDF files can be found on the official project homepage.

### Figure 1

The two PDF documents yield a collision for SHA-1, but not for SHA-256

Get hands-on with 1200+ tech skills courses.