A hash flooding attack is a low bandwidth denial of service that turns a data structure your server relies on against itself. A hash table promises constant time lookups, but that promise only holds when keys scatter evenly across buckets. If an attacker knows the hash function, they can craft hundreds of keys that all land in the same bucket, collapsing the table into a single long chain. Average case O(1) becomes worst case O(n) per operation, and building the table from n such keys costs O(n^2). A few hundred kilobytes of carefully chosen form fields or JSON keys, parsed automatically by the framework before any of your code runs, can pin a CPU core for seconds. This post walks the mechanism one step at a time: how a hash table actually stores keys, how an attacker engineers the collisions, why request parsing amplifies the damage, the 2011 wave that hit every major web platform at once, and the keyed hashing fix that closed the door.
How a hash table earns its O(1) reputation
A hash table is the workhorse behind every dictionary, map, and associative array you have ever used. The idea is simple. You have some keys, say the names of form fields, and you want to find the value for any key fast. Instead of scanning a list, the table keeps an array of buckets and runs each key through a hash function, a small piece of math that turns a string of bytes into a number. That number, taken modulo the number of buckets, tells the table which bucket the key belongs in.
When two keys land in the same bucket, that is a collision, and it is normal. Real hash functions cannot map an unlimited set of strings to a fixed array without overlaps. The standard way to handle a collision is chaining: each bucket holds a small linked list, and colliding keys are appended to it. To look up a key, the table hashes it to find the bucket, then walks that bucket’s chain comparing keys until it finds a match.
The reason this is fast is entirely about distribution. If a good hash function spreads n keys roughly evenly across n buckets, every chain is one or two entries long. Finding a key means hashing once and comparing once or twice, regardless of how many keys are in the table. That is the constant time, O(1), behavior everyone counts on. Inserting n keys one after another costs O(n) total, because each insert is O(1). The whole edifice of fast lookups rests on that even spread.
The catch is that O(1) is an average, not a guarantee. It assumes the keys arriving at the table are not chosen by someone who wants them to collide. Drop that assumption and the same data structure behaves very differently.
How a hash flooding attack engineers collisions on purpose
Now suppose every key you insert hashes to the exact same bucket. The table never spreads anything. One chain grows longer with every insert while every other bucket sits empty. Looking up a key now means walking a chain of length n, comparing against every key already there. A single lookup is O(n) instead of O(1).
Inserting is worse, because insertion has to check whether the key is already present before adding it. When you insert the kth colliding key, the table walks the existing chain of length k minus one to confirm the key is new. So the first insert does zero comparisons, the second does one, the third does two, and the nth does n minus one. The total work is 0 plus 1 plus 2 and so on up to n minus 1, which is n times n minus one over two. That is the quadratic blowup: building a table from n colliding keys costs on the order of n^2 comparisons rather than n.
The math is what makes the attack so cheap for the attacker and so expensive for the server. Double the number of colliding keys and you quadruple the work. Ten thousand colliding keys is not ten thousand units of work, it is on the order of fifty million. A hundred thousand colliding keys is on the order of five billion comparisons, all to insert a payload that fits comfortably in a single request body. The attacker spends a few kilobytes of upload; the server spends seconds of one core grinding through a linked list.
It helps to walk the asymmetry concretely. A parameter name like field0000 is about ten bytes on the wire. Ten thousand such names, separated by ampersands, is roughly a hundred kilobytes, a request body smaller than many images on a typical web page. An honest hundred kilobyte form with ten thousand distinct fields would insert into the table in about ten thousand operations, finishing in microseconds, because each key lands in its own bucket and the chains stay short. The same hundred kilobytes of colliding keys forces about fifty million comparisons, because every key has to be checked against the entire growing chain before it is added. The wire cost is identical. The CPU cost differs by a factor of five thousand. That ratio is the entire point of the attack: the work the server does is not proportional to the work the attacker does, and the gap between them widens with every key.
The same quadratic curve also explains why the cap based mitigation discussed later actually works. The pain lives in the n^2 term, and n^2 is gentle for small n and brutal for large n. A thousand colliding keys is only about half a million comparisons, finished in a blink. Ten thousand is fifty million. A hundred thousand is five billion. Cutting the maximum n the parser will accept does not slow the attack by a constant factor, it moves you back down the steep part of the curve, where even adversarial input is cheap to process.
Why knowing the hash function is the whole game
None of this works if the attacker cannot predict where keys land. The collisions have to be engineered, and to engineer them you need to know the function. Many of the platforms hit in 2011 used a well known, fixed, non keyed hash. PHP arrays and a number of Java systems used Daniel Bernstein’s DJBX33A and DJBX33X functions, which are short, fast, and completely public. The function multiplies a running value by 33 and adds the next byte, so its behavior is easy to reason about and easy to reverse.
With a fixed function and no secret, an attacker can compute, offline and ahead of time, large sets of distinct strings that all produce the same hash value. For DJBX33A there are well known short building blocks, pairs of two character strings that collide, and you can concatenate them to manufacture as many colliding keys as you like. The strings look like ordinary parameter names. There is nothing malformed about them. They simply happen to be chosen so the cheap public hash maps every one of them to the same number. The attacker does the hard combinatorial work once and reuses the result against every server running that function.
The construction is worth understanding because it shows how little effort the attack takes once the function is known. Suppose you find two short strings, call them Aa and BB, that the hash maps to the same value. Because the hash processes a string one byte at a time, building the running value as it goes, any longer string built by gluing these blocks together in any order produces the same final value as long as the blocks are interchangeable at each position. Two interchangeable two byte blocks give you four colliding strings of four bytes, eight of six bytes, sixteen of eight bytes, and in general 2 to the power of the number of slots. A handful of base collisions, concatenated, yields an effectively unlimited supply of distinct keys that all hash to one bucket. The attacker never has to brute force the full set. They find a few small collisions and let concatenation multiply them. This is why the payload is cheap to generate and why every server running the same unseeded function is vulnerable to the same precomputed list.
The request parsing amplifier
An attacker still needs a way to get a server to insert thousands of attacker chosen keys into a hash table without writing any code on the server. Web frameworks hand them exactly that, for free, on every request.
When a browser or a client sends a POST request with a form body or a JSON document, the framework parses it before your handler ever sees it. A body like a=1&b=2&c=3 is split on the ampersands and equals signs, and each name is inserted as a key into a dictionary so your code can read request.params["a"]. The same happens for JSON objects, for query string parameters, for multipart form fields, and in many stacks for HTTP headers and cookies. Parsing untrusted request data into a hash table is not an edge case. It is the single most common thing a web framework does, and it happens automatically, on the parsing path, with the keys taken verbatim from the request.
That is the amplifier. The attacker does not need authentication, a vulnerable endpoint, or any application logic at all. They send one POST request whose body is nothing but colliding parameter names, a few hundred kilobytes of key1=&key2=&key3= where every key name is one of the precomputed collisions. The framework dutifully parses each one and inserts it into a single overloaded bucket, paying the quadratic cost on the way. One ordinary looking request, well under a megabyte, pins a core while the parser grinds. At demonstration bandwidth on the order of a slow home connection, a steady trickle of these requests was enough to keep a modern CPU core fully busy. The bandwidth to attack is trivial; the bandwidth to absorb the attack is the server’s entire core.
The attacker never overwhelms the network or floods the server with volume. They send one small, well formed request and let the server’s own data structure do the expensive work, turning a few kilobytes of input into seconds of CPU.
Consider our invented app, Acme Notes, which exposes a JSON API. A client posts a note as a JSON object, and the framework parses that object into a dictionary keyed by field name before validation. An attacker posts a single note whose JSON body has a hundred thousand keys, all engineered to collide. Acme Notes never gets to reject the note for being malformed, because the denial of service happens during parsing, inside the framework, before a line of Acme Notes code runs. The application looks blameless. The vulnerability lives one layer down, in the assumption that request keys are not adversarial.
The 28C3 wave of 2011
This stopped being theoretical at the end of 2011. At the 28th Chaos Communication Congress, Alexander Klink and Julian Walde presented Efficient Denial of Service Attacks on Web Application Platforms, and the impact was that it hit nearly every major web stack at the same time. PHP, Java based servers, Python, Ruby, and ASP.NET all parsed request parameters into hash tables built on predictable, non keyed hash functions. One technique, slightly retargeted per language, took them all down.
The coordinated disclosure was tracked as oCERT-2011-003, which assigned a row of CVE identifiers across the affected platforms. PHP before 5.3.9 was CVE-2011-4885: it computed hash values for form parameters without restricting predictable collisions, letting a remote attacker burn CPU with many crafted parameters. Python was assigned CVE-2012-1150 under the same oCERT advisory. Ruby was CVE-2011-4815. On the Java side, Apache Tomcat was CVE-2011-4858, with sibling identifiers for Jetty, Glassfish, Geronimo, and the Rack middleware, among others. The point of the wave was not any single language. It was that an entire industry had independently reached for the same cheap public hash and the same automatic parameter parsing, and so shared the same flaw.
It was not a new idea
The class of attack was already eight years old in 2011. In 2003, Scott Crosby and Dan Wallach published Denial of Service via Algorithmic Complexity Attacks at the USENIX Security Symposium. They named the general category, algorithmic complexity attacks, where an attacker feeds an input crafted to drive a data structure or algorithm into its worst case rather than its average case. They demonstrated it against the hash tables in Perl and against the Bro intrusion detection system and the Squid proxy, knocking a Bro server over with less bandwidth than a dialup modem. They also pointed to the fix: universal hashing, where the hash function is parameterized by a secret the attacker does not know. The 2011 wave was the same attack the 2003 paper had described, finally cashed in against the web at scale.
SipHash and the real fix
The patches came in two flavors, and only one of them addresses the root cause.
Capping the number of parameters
The immediate, pragmatic mitigation was to limit how many parameters a request is allowed to carry. PHP’s fix for CVE-2011-4885 added a configuration directive, max_input_vars, defaulting to 1000, that caps the number of input variables parsed from a single request. If the quadratic cost only becomes painful past tens of thousands of keys, refusing to parse more than a thousand keeps any single request cheap. Other stacks added equivalent caps on parameter counts, header counts, and body sizes.
This works, but it treats the symptom. The hash function is still predictable, so an attacker who finds any path that inserts more than the cap of attacker chosen keys, or any code that builds a large dictionary from untrusted input outside the parameter parser, can still trigger the blowup. A cap narrows the attack surface. It does not remove the property the attack depends on.
Keyed hashing with SipHash
The real fix is to make the hash function unpredictable, so the attacker can no longer compute colliding keys ahead of time. You introduce a secret key, chosen randomly at process startup, and mix it into the hash. The function still spreads keys evenly and runs fast, but its exact mapping is different in every process and unknown to anyone outside it. An attacker cannot precompute collisions for a function whose seed they cannot see. This is the universal hashing idea from the 2003 paper, made practical.
The algorithm the ecosystem settled on is SipHash, a keyed hash function designed in 2012 by Jean Philippe Aumasson and Daniel Bernstein specifically in response to the hash flooding wave. SipHash is fast on the short strings that hash tables actually use as keys, and it takes a 128 bit secret key, so without that key you cannot find collisions. It was adopted as the keyed hash behind the default hash table implementations in Perl, Python, Ruby, and Rust, among others. Python exposed the seed through the PYTHONHASHSEED environment variable while it stabilized the change. Rust ships SipHash as the default hasher for its standard library hash map out of the box.
The thing to hold onto is why a non keyed hash was the actual bug. A fixed public hash function is a contract the attacker can read. Once they know the function, the set of colliding keys is just a calculation, and the data structure has no defense because it cannot tell an adversarial key from an honest one. Adding a secret key changes the function from something public into something private, and the entire attack rests on the function being public. Cap the parameters if you like, but the function being predictable is the root, and a keyed hash is what pulls it out.
If you want to see where this sits among related classes of bug, it shares DNA with the rest of the most common web vulnerabilities: a server trusting attacker controlled input to behave the way honest input does. Here the betrayed assumption is not about the content of a value but about the statistical shape of a set of keys. The closest relative is regular expression denial of service, where crafted input drives a backtracking regex into its worst case time instead of its average case, the same algorithmic complexity class seen from the regex engine; our free ReDoS regex analyzer checks a pattern for the runaway backtracking that makes that possible.
The assumption that breaks
Step back from the buckets and the chains and one assumption is holding the whole thing up. Every hash table assumes its keys are not chosen by an adversary who knows the hash function. That assumption is invisible in the textbook, where O(1) is stated as a fact rather than as an average over honest inputs. It is invisible in the framework, where parsing a request body into a dictionary looks like plumbing rather than a security boundary. And it was invisible across an entire industry that reached for the same fast public hash, until one conference talk made the cost visible everywhere at once.
The bug was never a slow hash table or a careless parser. The bug was a data structure whose performance guarantee quietly depended on the goodwill of whoever supplied its keys, deployed on the one path where the keys come straight from an attacker. The fix was not to make the table faster. It was to remove the attacker’s ability to predict it, by making the function secret. That gap, between what a system assumes about its inputs and what an adversary can actually arrange, is the kind of flaw you find by asking what each component takes for granted and whether that thing can be chosen against it, rather than by scanning for a known bad string. It is exactly the kind of assumption an autonomous researcher built to test assumptions is meant to catch. Question the average case, key your hashes, and cap what you parse. Learn more about that approach on our about page.
Frequently asked questions
What is a hash flooding attack?
It is a denial of service that abuses how hash tables store keys. A hash table gives O(1) average lookups only when keys spread across buckets. If an attacker knows the hash function, they can craft many keys that all collide into one bucket, so each operation degrades to O(n) and inserting n keys costs O(n^2). A small request of colliding keys then burns a CPU core. The class was first named by Scott Crosby and Dan Wallach in Denial of Service via Algorithmic Complexity Attacks at USENIX 2003.
How does a small request cause so much load?
Web frameworks parse request bodies, query strings, JSON keys, and headers into a hash table before your code runs. An attacker sends one POST request whose parameter names are all engineered to collide, often a few hundred kilobytes of key1=&key2= pairs. The framework inserts each name into a single overloaded bucket, paying the quadratic cost during parsing. The talk that demonstrated this across platforms was Efficient Denial of Service Attacks on Web Application Platforms at 28C3 in 2011.
Which platforms were affected and what were the CVEs?
The 2011 disclosure hit PHP, Java based servers, Python, Ruby, and ASP.NET at once, coordinated as oCERT-2011-003. PHP before 5.3.9 was CVE-2011-4885, Python was CVE-2012-1150, Ruby was CVE-2011-4815, and Apache Tomcat on the Java side was CVE-2011-4858, alongside identifiers for Jetty, Glassfish, Geronimo, and Rack. The shared cause was a predictable, non keyed hash function applied to attacker controlled request keys.
How do you fix hash flooding?
The real fix is keyed hashing: mix a secret seed chosen at startup into the hash so an attacker cannot precompute collisions. The ecosystem adopted SipHash, designed in 2012 by Jean Philippe Aumasson and Daniel Bernstein, as the default keyed hash in Perl, Python, Ruby, and Rust. Capping the number of parameters per request, such as PHP’s max_input_vars directive defaulting to 1000, helps as a mitigation, but a non keyed hash is the root problem because its collisions can be computed in advance.