The Art of Password Cracking: Rainbow Tables

Table of Contents

Introduction #

This is not a deep explanation of rainbow tables but a brief explanation linking to resources that have helped me better understand what they are and how they work.

What is a Rainbow Table? #

A rainbow table is a precomputed table used to crack passwords. A rainbow table is composed of rows of plain text passwords and hashes. The difference from dictionary attacks is that the hash is not the hash of the plain text password; the hash is the end of the “chain”. What is a chain?

Creating a Rainbow Table #

A rainbow table is composed by multiple chains, every row in a rainbow table is a “chain”. A chain is formed using a hash function and a reduction function multiple times.

Hash function (H): A mathematical one-way function that takes an input and produces a fixed-size string (e.g., md5, sha-1, sha-256).
Reduction function (R): A function that takes a hash as input and produces a shorter fixed-size output. One simple example could be taking the first n characters of a hash. Given the same input, it always produces the same output (i.e., it is a deterministic function).

As seen in the following table, we start with a random password. Let’s imagine that we are trying to crack an md5 hash where the plain text is formed by 5 alphanumeric characters. We start with a random password “a6ns9” and hash it with md5 “8c19f075b6208d096601a1d21c9aae65”. Now, we will apply a reduction function to generate p2. The previous example of a reduction function, taking the last n characters of the hash, cannot be used here; we need a reduction function that, given a hash, generates an alphanumeric password with a length of 5 characters. After generating p2, we hash again and repeat the process multiple times. The length of the chain is the number of times we apply the H and R functions.

	Chain 1
Link1	a6ns9
Link1	8c19f075b6208d096601a1d21c9aae65
Link2	sl8cc
Link2	59a8ee18ea1cc6d76777289a7d81c36a
Link3	wrou2
Link3	4352afdd74f9d9a8d4f2d297f849a80a
Link4	7sc2a
Link4	8e8b7a07b418649cbda2773bbba9d73b

Finally, we write in the file only the first plain text password and the last hash. This means that with the previous table, we are covering 4 passwords and storing 1 password and a hash. In a dictionary attack, we would be storing 4 passwords and 4 hashes.

password	hash
a6ns9	8c19f075b6208d096601a1d21c9aae65
sl8cc	59a8ee18ea1cc6d76777289a7d81c36a
wrou2	4352afdd74f9d9a8d4f2d297f849a80a
7sc2a	8e8b7a07b418649cbda2773bbba9d73b

Now it’s clear how a rainbow table is a space-time tradeoff. We can cover the same number of passwords with less space compared to a dictionary. On the other hand, the time to find a password in a rainbow table will be higher compared to a dictionary because we need a higher computational effort.

Lookup #

Explaining how the lookup in a rainbow table works will clarify why we can cover more passwords with less space. Let’s imagine that you want to crack the following md5 hash “59a8ee18ea1cc6d76777289a7d81c36a”, and you have the following rainbow table where the number of chains is 2 and the chain length is 4. See that the second chain of this rainbow table is the example that used before.

p1	h4
d1kb0	f0f253fa39f7b6253260f530af9d1dda
a6ns9	8e8b7a07b418649cbda2773bbba9d73b

We will iterate over the whole table and compare our value with h4. In this case, there are no matches between our hash and h4. Now we would apply to our hash the R and H functions and will get a new hash. Why? If we did not find our hash in h4, this means that maybe our password is in between one of the chains and not at the end. To check in the middle of the chains, we apply the R and H functions. If the hash we are trying to find was in h3, after applying the reduction and hash functions, we will have h4, and now we will find a match in between our hash and h4 in the rainbow table. In this case, the hash that I am using is h2 from the previous section, so after applying the reduction and hash functions 2 times, we will have p4 = 8e8b7a07b418649cbda2773bbba9d73b, and we will find a match. Okay, now we have a match in a chain. How do we find the password? The match is in chain number 2, so we take p1 “a6ns9” and:

Execute the H function, check if the hash “8c19f075b6208d096601a1d21c9aae65” matches ours “59a8ee18ea1cc6d76777289a7d81c36a”.
We execute the R and H functions and check hashes again. In this case they match, we have found the password! The password is “sl8cc”.

Some Conclusions #

A dictionary will have a set of common passwords, while a rainbow table covers random passwords. This means that a rainbow table is useful when we know that the passwords are random, for example, when they have been created by a password generator.
The tradeoff is clear. With a rainbow table we can cover a large number of passwords while using less space compared to a dictionary. The more hashes we generate, the better coverage we have. Given a fixed number of hashes, we can:
- Increase the chain length and reduce the chain number. This will imply slower lookup times but less space used.
- Have a smaller chain and a bigger number of chains. We would use more space, but lookup time will be faster.

Defense #

There is an easy and widely implemented defense for rainbow table attacks: salted passwords. Using a salt means that in the database, we are not storing the hash of the password but the hash of the password+salt, with a salt being a set of randomly generated characters appended to every password. Why is this a good defense for rainbow tables?

In a brute-force attack, you can generate the password, add the salt, hash everything, and compare it with your hash. But if you have a rainbow table for passwords with a length of 5 and alphanumeric characters, and then you find that the passwords are salted, you will need to create a new table considering the password and the salt. In addition, the salt will increase the number of possible combinations, so the rainbow table will need to be much bigger.

Selecting the Parameters #

How do you decide how many chains to create and how long they are? We know that the larger they are the better coverage, that is a higher success rate. However, it will take much longer to create the rainbow table. The same goes for the number of chains; it’s possible that we may not have enough space to store that many chains. Generally, the approach is to choose based on the available size ( What’s the perfect chain length : number of chains ratio for rainbow tables?). If we want a rainbow table of a maximum of 32TB, we choose the chain number accordingly. Then, we can choose the chain length based on one of two parameters:

How much time we want to spend creating the rainbow table.
How much success rate we want to achieve.

Considering the following formula extracted from Making a Faster Cryptanalytic Time-Memory Trade-Off.

$$ P_{\text {table }}=1-\prod_{i=1}^t\left(1-\frac{m_i}{N}\right) $$ With $m_i$ recursively defined as $m_1=m$ and $m_{n+1}=N\left(1-e^{\frac{-m_n}{N}}\right)$.

We can calculate the probability of success given a certain chain length, number of chains, and the keyspace (i.e., the amount of possible passwords). This means that we can choose a chain length that fits the desired success probability.

Chain Flaws #

Collisions #

A collision exists when two different input values produce the same output. Hash functions are designed not to produce collisions, as this is a really important feature. But the reduction function needs to correctly cover the likely plaintexts, so it cannot be collision-resistant. As we are generating a vast amount of hashes and using reduction functions with them, it is probable that collisions will appear. Detecting them is difficult. A collision could appear between the second value of chain 3 and the 23,345,567th value of chain number 7,492,794. In the following table, we show the plaintexts of two chains that collide.

bxppzm	shuyf0	yborr6	eiyyxi	kb48m7	dqg8w5	8y6z6l
cbfswx	chj1g3	a47i63	shuyf0	yborr6	eiyyxi	kb48m7

The R function is colliding when applied to md5(bxppzm) and md5(a47i63). After that, we will have some values merging in both chains. Having collisions will make us not cover as many passwords despite using that space and computational effort.

Improving Collisions #

Instead of using one R function, we can use multiple R functions for every column in the chain. We do not avoid collisions completely, but the paths will not merge as shown before. In the following table, every column is colored with the reduction function used to calculate the next plain-text. We are using three reduction functions. R1(md5(bxppzm)) can collide with R3(md5(a47i63)), but after that, the paths will not be merged because we are applying different R functions to the collision (shuyf0).

bxppzm	shuyf0	yborr6	eiyyxi	kb48m7	dqg8w5	8y6z6l
cbfswx	chj1g3	a47i63	shuyf0	lsL5wx	hg0E6Y	q92Qpf

Lookup with Multiple R Functions #

The lookup changes slightly. Let’s imagine that we have a chain length of 9, and 3 R functions. That is the following table:

rand() → p1	H(p1)→h1
R1(h1) → p2	H(p2)→h2
R2(h2) → p3	H(p3)→h3
R3(h3) → p3	H(p4)→h4
R1(h4) → p3	H(p5)→h5
R2(h5) → p3	H(p6)→h6
R3(h6) → p3	H(p7)→h7
R1(h7) → p3	H(p8)→h8
R2(h8) → p3	H(p9)→h9

We first look for the hash in the last column (h9) as we do with a rainbow table with just one reduction function. If it is not found, we assume the hash value is in the second-to-last position, so we apply R2 and H functions and compare with (h9) again. If not, we apply R1 and H, R2 and H, and compare with the last column hashes again (h9). Until we find a match, then we start from p1 applying the different R and H functions.

Useful Resources:

You can find more information about improving collisions here: wiki/Rainbow-table#Rainbow-tables.
This post also gives some clarity about how using multiple rainbow tables increases the success probability: Multiple Rainbow Tables and their Success Probability.

Designing the Reduction Function #

The perfect R function will produce plaintexts that align with the idea an attacker has of likely passwords. Designing R to meet these constraints will improve performance compared to covering the entire space of possible passwords. However, the drawback is that R will not generate every possible plaintext, making it impossible to guarantee finding the password in the created rainbow table. Additionally, designing R functions can be a challenging task.