Efficiently Generating Python Hash Collisions
In 2003, Crosby and Wallach wrote an excellent paper demonstrating a new form of Denial of Service vulnerability against applications by abusing algorithmic complexity in the data structures that they depend on. For example, some data structures operate quite efficiently when given everyday input, but the performance degrades precipitously in certain edge cases.
If an application provides user-controlled data to one of these data structures, attackers can intentionally provide this worst case form of input to generate a Denial of Service attack against the application.
In 2011, Alexander Klink and Julian demonstrated this form of attack quite affectively against web servers that take attacker-provided HTTP headers and represent them internally as hashtables.
Hashtables work extremely quickly for most input by distributing items into different buckets based on the hash value of the item’s key. There is a degenerate case, however, when hash collisions cause the algorithm to distribute items into the same bucket. Once hundreds or thousands of items start being placed into the same bucket due to having the same hash value, the performance of a hashtable degrades exponentially. A web server that used to be able to handle 1,000 requests per second may now only be able to handle 10. Or 1. Or worse.
Many programming languages and frameworks now account for this type of attack. Python fixed this by default in Python 3.3+, which was released over 7 years ago. However, despite the fix being released over 7 years ago, many applications still build on Python 2.7 – including an application I was testing. Alexander and Julian didn’t ever release the code they used to DOS Python, so I embarked on an interesting journey to find the most efficient way possible to generate hashtable collisions to DOS Python.
Before this research, the most efficient mechanism I could find was from Robert Grosse. He focused on 64-bit hashes, and his approach took about 80 hours to generate 50,000 hash collisions. The result of my research (against 32-bit Python) generates billions of collisions essentially instantaneously (as fast as your computer can print them to the screen, write them to a file, etc.).
Background: Understanding Python’s hash implementation
Python (remember: 3.2 and below) uses a FNV-style algorithm to calculate the hash values of strings. The hash value is a single 32-bit number that is the result of running a calculation over all of the characters in the input string. Here’s a simple re-implementation in C that takes the string to hash as its first argument:
The variable ‘x’ represents the resulting hash value. The variable ‘a’ represents the string to be hashed. The variable ‘p’ represents the value of the current character being processed.
Let’s walk through an example of this algorithm.
On line 21, the hash value is initialized to zero. The PowerShell console on the right-hand side shows the state of the hash value throughout the algorithm, and the highlighted line shows the bits of the current value of the hash. They are all zero, of course, since we just initialized it to zero.
Line 22 takes the value of the first letter (‘L’ is 1001100 in binary), shifts it left by 7 places, and then blends it into the hash value with the XOR operator.
Line 25 is where we start the character-by-character processing. Python takes the current value of the hash, and multiplies it by 1000003. This is a large prime number, which essentially largely and unpredictably churns the bits in the leftmost digits (the red square – I call that area the “bit blender”). Then, it uses the XOR operator to bring this character into the hash value for the rightmost digits.
Then, we rinse and repeat until the end of the string.
For the final stage, Python takes the current hash value and uses the XOR operation to incorporate the length of the string, which is 3 in our case. This step is done to reduce the amount of collisions for short strings.
When we convert this final value to an integer, we get the hash value that you know and love from Python 3.2 and below.
Meet in the Middle Attacks
Now that we have an understanding of how Python implements its hash algorithm, we can explore the previous state-of-the-art technique used to attack the algorithm.
Pretend that you’re trying to generate hundreds or thousands of strings that all hash to the value ‘0’ as a way to generate a hashtable denial of service. One way to do this is to brute force your way through billions and billions of strings and hope to find hundreds or thousands that have the hash value of ‘0’. This is incredibly inefficient and will take you many hundreds of years of computation.
Another way is called a “Meet in the Middle Attack”. This is the method used by Alexander Klink and Julian in their CCC presentation, as well as the technique explored more deeply for 64-bit Python by Robert Grosse.
As Python iterates through a string, it keeps modifying and altering a running hash value. Once it gets to the end of the string, it outputs the hash value that it’s calculated. For example, imagine that the running hash value of the first part of a string – QCMWaIOFFgH – was 3B847A2A. Well, it turns out that Python’s FNV hash algorithm can partially be reversed as well. If you try to work backwards from a desired hash value (i.e.: 0), you can figure out a set of intermediate hash values such that “if the intermediate hash value is and the rest of the string is , then the resulting hash value will be 0”. This isn’t fully deterministic: for a given hash value or internal state, there are many characters and previous internal states that can lead to it.
You can combine both of these tricks with a lot of brute force to massively speed up a naïve brute force approach.
As the first step, we calculate the hash values of a bunch of strings. We record the internal hash value as Python processes each character. For each internal state value, we save the characters in the string that led up to it.
As the second step, we take a desired hash value (i.e.: 0) and reverse the hashing process. We iterate through a bunch of hypothetical preceding characters and figure out the internal state hash value that could have led to it.
Both of these steps require a lot of computational power and a lot of memory.
The example above demonstrates a situation where we found:
- A string, QCMWaIO, where processing it up until that point generated an internal hash value of 3B847A2A
- Two strings, vpl and wQl, where if the internal hash value was 3B847A2A then it would result in a hash value of 0
If you combine these two paths together, you get a hash collision: two strings (QCMWaIOvpl and QCMWaIOwQl) that both hash to the value of 0. Apply enough time, memory, and computational power and you have thousands of hash collisions! This is many thousands or millions of times faster than the basic brute force approach.
Hash Stretching: Making Meet-in-the-Middle attacks 250 times more efficient
One thing that became clear after analyzing Python’s FNV-style algorithm is that it has a critical weakness: intermediate hash values of zero.
If the intermediate hash state (the variable x in this sample code) ever gets to 0, then multiplying it by 1000003 doesn’t change the intermediate hash value at all and you don’t get the unpredictable mixing that used to previously happen with that step. Then, you blend in an attacker-controlled character (the variable p in this sample code) and attackers have fine-grained control over the internal hash state.
It only takes a few seconds of brute forcing to find strings that at some point have an internal hash state of zero.
This introduces two major attacks:
Control of the final hash value
If the internal hash state ever reaches zero, the next character of the string completely defines the new internal hash state. If there are no more characters in the string, then the final hash value (as in the first example above) will simply be the length of the string due to Python’s final XOR with the length of the string. Additionally, since you know that Python will do a final XOR by the length of the string, you can use your next character after the state hits ‘0’ to make this final XOR generate a final hash value of anything between 0 and 255. For example, if you supply the next character as the decimal value ‘5’, an XOR by the new string length of ‘5’ will result in a hash value of zero.
Hash collision stretching
If the internal hash state ever hits zero, processing a character with the value of zero doesn’t change anything except the string length. Because of this, you can stretch a single hash collision into more hash collisions by simply adding new NULL characters when the internal hash value reaches zero. This changes the string length, so you also need to use technique #1 (controlling the final hash value) to remove the string length from the final calculation. This works for strings of length up to 255 (the maximum value of a byte you control), which lets you generate about 250 more collisions for every collision you find where the internal state reaches zero.
Using this approach, this technique was able to generate 10,000+ collisions with only a few hours of brute force. A fun and huge improvement to the state of the art, but we’re not done yet :)
Generating Billions of Collisions: The Null Points method
As mentioned, Python’s FNV implementation has a critical weakness whenever the internal hash state reaches zero. But finding those strings with an internal hash state of zero required a couple of hours of brute force. This leads to a question: Since the algorithm starts the hash state at zero and we find a part of a string (i.e. f8ds3k3kk) that leads to an internal hash state of zero, couldn’t we just repeat that substring to keep on finding new strings with internal hash states of zero? Like f8dsf8dsf8dsf8ds3k3kk?
Initially, the answer appeared to be ‘No’.
The big issue is Python’s processing of the initial character. If you recall the very first step, Python shifts the initial character to the left by 7 bits, thoroughly placing it into the unpredictable Bit Blender. If we tried to place that character again after we hit the internal state of zero, we wouldn’t be able to replicate the left shift by 7 places. So it seems we’re out of luck.
Except for one initial character!
When we shift left by seven bits, there is exactly one byte that doesn’t shift anything into the bit blender. The value ‘1’. When the algorithm shifts ‘1’ left by 7 bits, you get the value ‘128’ (0x80) with nothing in the bit blender – leaving an intermediate hash value that we can still completely influence with further characters under our control. So if we ever find a string that starts with the byte ‘1’ and eventually gets to an internal state of ‘0’, we can absolutely use these little islands of characters to get back to an internal hash state of ‘0’.
Here’s an example. After a few seconds of brute force, we find the string \x01\x02\xBC\x6F\xA4\xB6 that gets to an internal hash state of ‘0’.
As our next character, we use 0x80 (the initial ‘1’ shifted left 7 bits), and then we can clone our attack string again (“\x01\x02\xBC\x6F\xA4\xB6”) to get back to an internal hash state of zero. We use our final byte to influence the string length XOR to produce a final hash result of our choice, and we now have a repeatable way to influence the hashing operation from beginning to end.
More importantly, our little attack meta-pattern (“\x01\x02\xBC\x6F\xA4\xB6”) is only one example of a character sequence vulnerable to the null points method. With a bit more brute forcing, it’s possible to find these meta-patterns pretty quickly. With a few hours of computation, I easily found 16 patterns that fit the definition of criticality:
- They started with the byte ‘1’
- They caused Python’s internal hash state to reach ‘0’ at some point
The true beauty of this attack is that the meta-patterns used in this attack are interchangeable. The example above demonstrates that we can generate a lot of collisions from strings in the form of <Pattern1>0x80<Pattern1>0x80<Pattern1>
. But since we’re just abusing the fact that we got to an internal state of ‘0’, we can also generate them in the form of <Pattern1>0x80<Pattern2>0x80<Pattern1>
. Or <Pattern1>0x80<Pattern1>0x80<Pattern2>
. Start combining these meta-patterns systematically, and you have nearly infinite collisions at your disposal.
With these 16 meta-patterns of only 6 bytes each, you can generate 4 billion hash collisions against Python’s hash algorithm using only 48 bytes each. Due to the magic of combinatorics, the sky really is the limit: this technique will let you generate hash collisions against Python 3.2 and below as fast as your hard drive or network can process them. Have a 256-byte limit on key length? That’s 3.74144419156711E+50 (374144419156711000000000000000000000000000000000000) collisions! And if you want to experience a true embarrassment of riches, we can make this attack 250x more efficient with the hash collision stretching approach discussed earlier. But with this many zeros, who’s counting :)
Impact on Python Dictionary Insertion Performance
In raw numbers, Python’s dictionary insertion performance is pretty fast – even with collisions. But at only 50,000 colliding entries in a hashtable, Python starts to peg a CPU for 80 or 90 seconds. This represents about 2mb of raw data. So with a second or two of data transfer, you can keep a CPU busy for a minute or two.
The critical problem is that this performance degrades exponentially. Submit 4mb of data (10,000 collisions), and you may never see that CPU again. Submit 4 billion collisions? 3.74144419156711E+50 collisions? It’s safe to say that if you are exposing Python 3.2 (or below) hash tables to untrusted input, you have a significant security vulnerability on your hands. And by the way – json.loads() generates a dictionary backed by a hashtable.
Coordinated Vulnerability Disclosure
While this research demonstrates a fundamental break in the Python 3.2 (or below) hash algorithm, Python fixed this issue 7 years ago. It’s time to upgrade.