# Real Hash Functions: Structural Weakness

Let's learn about real-time hash functions and their security in this lesson.

## We'll cover the following

## Structural weakness

The main problem of the hash functions based on Merkle-Damgärd construction is that they’re all vulnerable to length extension attacks. As the message $x$ is split into $k$ blocks $x=x_{1}, x_{2}, \ldots, x_{k}$ (whereas $x_{k}$ contains the padding) and then hashed to the value $h(x)=H_{k}$, we can choose a message $x^{\prime}=x, x_{k+1}=x_{1}, x_{2}, \ldots, x_{k}, x_{k+1}$. Since the first $k$ blocks of $x^{\prime}$ and $x$ are identical, the final hash value of $x$, i.e., $H(x)=H_{k}$, is identical to the internal hash value of $H\left(x^{\prime}\right)$ after $k$ blocks. Because of the iterated formula $H_{i}=g\left(H_{i-1}, x_{i}\right)$, we can conclude that $H\left(x^{\prime}\right)=g\left(H_{k} \| x_{k+1}\right)=g\left(H(x) \| x_{k+1}\right)$.

The fact that $H(x)$ provides direct information about the inner state of $H\left(x^{\prime}\right)$ after $k$ blocks are highly problematic. Suppose Alice and Bob communicate with each other, whereas Alice sends the message $x$ to Bob and wants to authenticate it by sending $H(K|| x)$, where $K$ is a secret key that is only known by Alice and Bob, and $x$ is a known message. If $H$ was an ideal hash function, $H(K|| x)$ would be a proper authentication system. But since $H$ is prone to length extensions, Eve is now able to extend the message to $x \| x^{\prime}$ and forge the authentication code to $H\left(K\|x\| x^{\prime}\right)$ without knowing the secret $K$.

In order to make length extension attacks impossible, Ferguson and Schneier (2003) proposed the following fix for the hash function:

### Iterative hash function

Let $H$ be an iterative hash function. The hash function $H_{d}$ is defined by $H_{d}=H(H(x))$ and has the security level of $\min (k, n / 2)$, where $k$ is the security level of $H$ and $n$ is the size of the hash result.

Bitcoin uses this fix as they use `double-SHA256`

as their Proof-of-Work function.

## Security of real-life dedicated hash functions

Today’s practical implementations of hash functions are based on dedicated hash functions, which are widely in use, such as `MD5`

or `SHA-2`

. In this section, we’ll give a brief overview of the most common hash functions and their security (

The `MD4`

hash function was designed by `MD4`

hash operations. This result was further improved by `MD4`

is no longer collision-resistant.

`MD5`

is a further development of `MD4`

and was presented by `MD5`

hash function. Further improvements by

The most popular dedicated hash functions from the Secure Hash Algorithm (SHA) family were published by the National Institute of Standard and Technology (NIST) as a U.S. Federal Information Processing Standard FIPS PUB $180-X$, including `SHA-0`

, `SHA-1`

, `SHA-2`

, and `SHA-3`

. `SHA-0`

, `SHA-1`

, and `SHA-2`

rely on the Merkle-Damgård construction and were designed by the United States National Security Agency (NSA). `SHA-0`

was released in 1993 but was later replaced by SHA-1 in 1995 because of a significant weakness due to a design flaw, which is why SHA-0 plays no role in practice.

`SHA-1`

was released in 1995 as a 160-bit message digest function. However, many standards recommend it to be replaced after 2010 by `SHA-2`

or `SHA-3`

, as `SHA-1`

that required $2^{69}$, and thus less than the expected $2^{80}$ hash computations.

In 2017, Google and the Cryptology Group at Centrum Wiskunde & Informatica (CWI) Amsterdam announced the first practical collision by generating a hash collision for two different PDF documents, as shown in the figure given below. The attack requires roughly $2^{63.1}$ hash evaluations, which equals approximately 6500 single-CPU years. The corresponding PDF files can be found on the official project homepage.

### Figure 1

The two PDF documents yield a collision for `SHA-1`

, but not for `SHA-256`

Get hands-on with 1200+ tech skills courses.