Practical Security: Simple Practices for Defending Your Systems/

Password Storage

In this lesson, we will explore techniques for safe password storage.

Our users’ passwords are extremely valuable to attackers. An attacker with access to user passwords pretty much has full control of our application. Such an attacker will probably be able to gain access to many other systems as well: unfortunately, people frequently use the same passwords on multiple websites.

We’ll obviously do everything we can to prevent compromise. But if we fail in that, we can take an important precaution to lower the value of our users’ passwords—we won’t store the user passwords at all. Instead, we’ll use a cryptographic hashing algorithm so that we store a value that’s derived from the password.

We’re going to start with a short discussion of hashing and then walk through a brief history of password-storage techniques.

On storing (and not storing) your users’ passwords

Instead of storing a user password, we can store a value that’s derived from the password itself. If this derivation can only be performed in one direction (that is, it’s easy to calculate the derived value given a password, but it’s hard to go from a derived value back to the original password), then we’ll have a great defense. When a user creates an account, they’ll enter a password, and we won’t store the password itself; we’ll only store the derived value. The next time the user logs in, they’ll type in their password, and we’ll perform that same derivation and compare the result to the previously stored derived value. If the two derived values match, we know the user typed in the right password. If they don’t match, we know the user typed in the wrong password.
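The register-and-login flow described above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: SHA-256 from the standard library stands in for the derivation purely to show the flow, and the in-memory `stored` dictionary and the function names `derive`, `register`, and `login` are hypothetical.

```python
import hashlib
import hmac

# Hypothetical in-memory "database": username -> derived value (never the password).
stored = {}

def derive(password: str) -> bytes:
    """One-way derivation. SHA-256 is used here only for illustration."""
    return hashlib.sha256(password.encode("utf-8")).digest()

def register(username: str, password: str) -> None:
    # Keep only the derived value; the password itself is never stored.
    stored[username] = derive(password)

def login(username: str, password: str) -> bool:
    expected = stored.get(username)
    if expected is None:
        return False  # unknown user
    # Repeat the same derivation and compare it with the stored value.
    return hmac.compare_digest(derive(password), expected)

register("alice", "correct horse battery staple")
print(login("alice", "correct horse battery staple"))  # True
print(login("alice", "wrong guess"))                   # False
```

The comparison uses `hmac.compare_digest` rather than `==` so that the time taken does not depend on how many leading bytes match; that is a choice made for this sketch, not something the lesson prescribes.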

At no time do we ever store the user password itself, only the derived value. This helps us in the unfortunate event that an attacker gets access to our database. Knowing the derived value of a password doesn’t help an attacker log in. If they type in the derived value when they attempt to log in, the system will derive a value from the derived value, and that won’t match the derived value of the real password, so the login fails. These derived values are much less valuable to an attacker than the passwords themselves.
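To see why a stolen derived value is not a usable password, consider this small Python sketch. It assumes SHA-256 as the derivation and an example password, `hunter2`, both chosen only for illustration.

```python
import hashlib

def derive(password: str) -> str:
    # Illustrative derivation: hex-encoded SHA-256 of the password.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# What the database holds for the user (the password itself is not kept).
stored_value = derive("hunter2")

# An attacker who stole stored_value types it into the password field.
# The system derives a value from what was typed, as it does for any login.
attempt = derive(stored_value)

print(attempt == stored_value)            # False: the login is rejected
print(derive("hunter2") == stored_value)  # True: only the real password works
```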

Real-life derivations

This talk of derived values is a little magical and hand-wavy, so let’s talk about real-life derivations. A hashing function has the properties that we’re looking for. Hashing functions have been thoroughly researched in academia and widely used in industry. Hashing algorithms have a long history of use in preventing and detecting accidental corruption of data from unreliable networks and file systems. More recently, they’ve been cleverly applied to the problem of password storage.

A hashing algorithm takes an arbitrary-length input and maps it into a fixed-length bucket called a hash code. The mapping of arbitrary-length input to hash code is said to be “one-way” only. That is, it’s easy to calculate a hash code for any input, but it’s “hard” to calculate the reverse of this and find an input given only a hash code. In this case, by “hard” we’re using the academic modesty of computer science: there are no known ways to reverse any of the widely used hash algorithms other than to try all of the possible inputs and see which one matches the given hash code.

So if we pick a hashing algorithm that produces hash codes that are uniformly distributed over a very large number of possible outputs, we have the beginning of a secure way to store our user passwords.

Let’s take a look at a concrete example: the popular SHA-256 hashing algorithm. SHA-256 takes an input of any size and maps it into an output of 256 bits (32 bytes). It’s kind of crazy that you could take a really large input, like all the data on a 50-GB Blu-ray disc, and map it into just 32 bytes. The thing to remember is that this operation is not compression because it is not reversible. Given a 256-bit hash, you can’t tell whether the input that generated it was small or large, or whether it was a Blu-ray, text, a GIF, or something else, and you can’t work backward to find the input that generated it.
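To make the fixed-length property concrete, the following sketch hashes a one-byte input and a one-megabyte input with Python’s standard `hashlib` module. The inputs are arbitrary examples chosen for illustration.

```python
import hashlib

# Two inputs of very different sizes (arbitrary example data).
small_input = b"a"
large_input = b"a" * 1_000_000

small_hash = hashlib.sha256(small_input).digest()
large_hash = hashlib.sha256(large_input).digest()

# Both hash codes are exactly 32 bytes (256 bits), whatever the input size.
print(len(small_hash), len(large_hash))  # 32 32

# The hash codes differ, and neither reveals how large its input was.
print(small_hash == large_hash)          # False
```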

Recall that back in “What’s So Great About a Deck of Playing Cards?” we saw that the number of different 256-bit strings is really big. Large numbers are necessary, but not sufficient, to keep passwords safe. To see why, let’s take a look at how attacks and defenses have leapfrogged each other over the years.
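For a rough sense of that scale, the arithmetic can be checked directly in Python. The guessing rate below (one trillion guesses per second) is an assumption picked only to make the calculation concrete; it does not describe any real attacker.

```python
# Number of distinct 256-bit strings.
count = 2 ** 256
print(len(str(count)))  # 78 (the number has 78 decimal digits)

# Assumed guessing rate, for illustration only.
guesses_per_second = 10 ** 12
seconds_per_year = 60 * 60 * 24 * 365

# Years needed to enumerate every 256-bit value at that rate.
years = count // (guesses_per_second * seconds_per_year)
print(f"{years:.2e}")   # on the order of 10^57 years
```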

Store passwords in the clear

Initially, passwords were just stored in the clear on the server. The thinking was that an attacker who got as far as being able to read from the database had already “won,” so why bother doing anything else? You can see evidence of this having been a trend if you think back to websites in the 1990s and early 2000s. It was common practice at the time for websites to email passwords back to users who clicked on the “I forgot my password” button.

With passwords stored in the clear in a database, an attacker who gets database access, say, through SQL injection, can exfiltrate the passwords of every user of the system. This is a big problem, especially since people tend to reuse passwords across websites. So a breach at one site could impact many other sites that have no vulnerabilities at all.

Reversibly encrypt everyone’s passwords

One weak response to this threat is to encrypt all passwords before storing them in the database. But it’s weak because the master key that allows the system to decrypt all of the user passwords has to be known to the system. If it weren’t, the system would not be able to log anyone in. An attacker who compromises the system thoroughly enough to read the database may well be able to read that key too, and with it recover every password. So this defense only helps in a very narrow set of circumstances.

Store hash of passwords

The important insight for secure password storage is to realize that you never have to store the password itself. Instead, you can store a value that’s derived from the password. You need a derivation that’s extremely unlikely to generate the same derived value for two different inputs. Initially, people used this insight by storing the output of generic hashing algorithms like MD5 and SHA-1 instead of the password itself. Then, when a user logged in, the hash output of the user-supplied input was compared to the hash output stored when the user’s account was created. If they were the same, the user was logged in. This was a step forward because the passwords themselves no longer had to be stored. So if the database were compromised, the passwords weren’t given up to the attacker directly.

Rainbow tables

For a time, the best an attacker could hope to do with password hashes was to look them up via rainbow tables. Rainbow tables are generated by hashing each ASCII input up to a certain length and storing the input and the hash so that one can look up an ASCII input given a hash. Rainbow tables take up a lot of space, and ...
