What Is Web Cache Deception and How a Crafted URL Leaks Private Pages

Web cache deception is an attack where a CDN or caching proxy is tricked into storing a victim’s authenticated, private response under a URL the attacker can fetch for themselves. The attacker lures a logged-in victim to a crafted link like https://app.acmenotes.com/account/settings/nonexistent.css. The origin server ignores the extra suffix and serves the victim’s real account page, full of personal data and tokens. The cache, looking at the same URL, sees a .css ending and decides this must be a harmless stylesheet worth saving. It stores the private page under a cache key derived from that URL. The attacker then requests the very same URL, the cache serves its stored copy, and the victim’s private response lands in the attacker’s browser. This post walks the mechanism one step at a time: why the origin and the cache read the same URL differently, how a cache decides what to store, what the attacker actually walks away with, how this differs from web cache poisoning, and how to close the gap.

The disagreement at the heart of web cache deception

A cache sits between your users and your origin server to make things fast. When many people ask for the same stylesheet, script, or image, there is no reason to bother the origin every time. The cache keeps a copy of the response and hands it out to everyone who asks for that URL. This works beautifully for content that is the same for every visitor and stays the same for a while. Static assets are the textbook case.

The whole arrangement rests on one quiet assumption: that a given URL means the same thing to the cache as it does to the origin. Web cache deception is what happens when that assumption is false. The cache and the origin both look at /account/settings/nonexistent.css and reach different conclusions about what it is. The origin routes by path prefix and decides this is the account settings page. The cache classifies by file extension and decides this is a CSS file. One of them is serving private, per-user content. The other is treating that content as a public asset safe to store and replay. The attack lives entirely in that gap.

How the origin reads the path

Most application servers do not match a request against a literal file on disk. They route. A framework looks at the leading part of the path, matches it to a handler, and treats whatever trails behind as a parameter, a path variable, or simply noise it can ignore. A request to /account/settings hits the settings handler. A request to /account/settings/nonexistent.css very often hits the exact same handler, because the router matched on /account/settings and never cared about the /nonexistent.css tacked on the end. The origin happily renders the logged-in user’s settings page and returns it with a 200 OK. Omer Gil’s original 2017 research lists this as the first condition for the attack: the server must return the content of the real page for the decorated URL rather than a 404. The suffix is invisible to the application but very visible to everything downstream.
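To make that routing behavior concrete, here is a toy prefix-matching router. It is a sketch under stated assumptions, not any particular framework’s API: ROUTES and route are hypothetical names, and the handler returns a string standing in for the rendered page.

```python
from typing import Callable, Dict, Tuple

# Hypothetical route table: path prefix -> handler that renders a page for a user.
ROUTES: Dict[str, Callable[[str], str]] = {
    "/account/settings": lambda user: f"<html>Settings for {user}</html>",
}


def route(path: str, user: str) -> Tuple[int, str]:
    """Return (status, body) for the first route whose prefix matches the path."""
    for prefix, handler in ROUTES.items():
        # Loose matching: any trailing segments after the prefix are ignored.
        if path == prefix or path.startswith(prefix + "/"):
            return 200, handler(user)
    return 404, "Not Found"
```

Asking it for /account/settings/nonexistent.css returns the same 200 and the same private body as /account/settings, which is exactly the response the cache is about to misread.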

How the cache reads the same path

The cache makes its decision on different grounds. A common and reasonable cache rule says: anything ending in a known static extension is cacheable. CSS, JS, PNG, GIF, ICO, WOFF, and a long tail of others. The logic is that files with those extensions are assets, assets do not contain secrets, and caching them is pure speed with no downside. So when the response to /account/settings/nonexistent.css comes back, the cache looks at the URL, sees .css, and stores the response. It often does this even when the origin’s own caching headers said not to, because an extension-based rule can be configured to override or ignore Cache-Control. The cache is not reading the body. It does not know it just filed a page full of one specific user’s data under a public key. Omer Gil’s PayPal report listed more than forty extensions that PayPal’s cache would store this way, from css and js down to ico and swf.
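A rule of this kind can be sketched as a function that looks only at the path. The extension set below is a small hypothetical subset, not PayPal’s actual list, and cache_by_extension is an illustrative name. The response headers are accepted but deliberately ignored, to mirror a rule configured to override Cache-Control.

```python
from pathlib import PurePosixPath
from typing import Mapping

# Hypothetical subset of a "static extension" cache rule.
STATIC_EXTENSIONS = {".css", ".js", ".png", ".gif", ".ico", ".woff", ".swf"}


def cache_by_extension(path: str, response_headers: Mapping[str, str]) -> bool:
    """Decide cacheability from the URL path alone."""
    # response_headers is intentionally unused: this rule overrides Cache-Control,
    # so even "Cache-Control: no-store" from the origin does not stop storage.
    return PurePosixPath(path).suffix.lower() in STATIC_EXTENSIONS
```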

Two independent, defensible decisions have now combined into a vulnerability. The origin decided the suffix was meaningless. The cache decided the suffix was authoritative. Neither component is broken on its own. The bug is the disagreement between them.

It helps to see why each side made the choice it did. The origin’s router is built for flexibility. Modern frameworks encourage clean, expressive routes, and matching on a leading prefix while ignoring trailing junk is a feature, not an oversight. It lets developers write one handler for /account/settings and not worry about every odd thing a browser or proxy might append. The cache, for its part, was tuned for a world where the URL is an honest signal of content type. For most of the web’s history, a path ending in .css really was a stylesheet, and trusting the extension was a cheap, reliable shortcut. Each component optimized for its own job under a reasonable assumption about the other. The attacker simply found the one input where those two reasonable assumptions point in opposite directions.

The cache key is not the whole URL

To see why the attacker can retrieve what the victim triggered, you have to look at the cache key. A cache does not index its stored responses by the full request. It builds a key, usually from the host, the URL path, and the query string or a chosen subset of its parameters, and crucially that key does not include the victim’s session cookie. Cookies are exactly the thing that makes a response personal, and they are normally left out of the key so that the cache can serve one stored copy to many users.

That omission is the engine of the attack. The victim’s request to /account/settings/nonexistent.css carried their session cookie, so the origin rendered their private page. But the cache filed that private response under a key built only from the URL. When the attacker later requests the identical URL, with no cookie or with their own, they produce the same cache key. The cache matches the key, sees a stored response, and serves it without ever consulting the origin. The session that authorized the content is long gone from the picture. The attacker never needed the victim’s cookie: the cache ignored it when building the key and kept the response it produced anyway.

The victim’s credentials fetch the private page once. The cache then serves that page to anyone who knows the URL, because the thing that made it private was never part of the key.
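The whole sequence fits in a short toy simulation. Everything here is hypothetical: origin stands in for the application, ToyCache for the CDN, SESSIONS for the session store, and the key is assumed to be host plus path with no cookie. It is meant to show why the second, cookieless request gets the first request’s private body, not to model any real cache product.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional, Tuple

STATIC_EXTENSIONS = {".css", ".js", ".png"}
SESSIONS = {"victim-session": "victim"}  # hypothetical session store


def origin(path: str, cookie: Optional[str]) -> str:
    """Toy origin: prefix routing, with private content chosen by the session cookie."""
    user = SESSIONS.get(cookie or "")
    if path.startswith("/account/settings"):
        return f"private settings for {user}" if user else "please log in"
    return "not found"


class ToyCache:
    def __init__(self) -> None:
        self.store: Dict[Tuple[str, str], str] = {}

    def key(self, host: str, path: str) -> Tuple[str, str]:
        # Host + path only: the Cookie header never reaches the key.
        return (host, path)

    def fetch(self, host: str, path: str, cookie: Optional[str] = None) -> str:
        k = self.key(host, path)
        if k in self.store:
            return self.store[k]  # hit: served without consulting the origin
        body = origin(path, cookie)
        if PurePosixPath(path).suffix in STATIC_EXTENSIONS:
            self.store[k] = body  # the extension rule says "static", so store it
        return body


cache = ToyCache()
url = "/account/settings/nonexistent.css"
cache.fetch("app.acmenotes.com", url, cookie="victim-session")  # victim clicks the link
leaked = cache.fetch("app.acmenotes.com", url)                  # attacker, no cookie
```

The attacker’s request never reaches origin at all, so leaked holds the victim’s private body. The cookie was consulted once, on the victim’s request, and played no part in finding the stored copy.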

Beyond the file extension trick

The clean .css suffix is the original and most intuitive form, but the same disagreement shows up in subtler shapes. The PortSwigger Web Security Academy catalogs several, and they all reduce to the cache and the origin parsing the URL by different rules.

Static directory rules

Caches are often told to store anything under a particular directory prefix, like /static, /assets, or /resources. The intent is to cache the asset folder wholesale. If the origin’s router is loose about where that prefix appears, an attacker can craft a path that the cache sees as living under /assets while the origin still routes it to a dynamic, authenticated handler. No file extension is needed at all. The cacheable signal is the directory, and the path confusion smuggles private content into it.
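One concrete way the directory variant can arise is when the cache matches the raw path while the origin decodes and normalizes it before routing. The sketch below assumes exactly that split; cache_rule and origin_route are hypothetical, and real products differ in whether and when they decode.

```python
import posixpath
from urllib.parse import unquote


def cache_rule(raw_path: str) -> bool:
    """Hypothetical cache rule: store anything whose raw path starts with /assets/."""
    # The cache compares the path exactly as received, without decoding %2f.
    return raw_path.startswith("/assets/")


def origin_route(raw_path: str) -> str:
    """Hypothetical origin: decode, then resolve dot segments, then route."""
    resolved = posixpath.normpath(unquote(raw_path))
    if resolved == "/account/settings":
        return "private settings handler"
    if resolved.startswith("/assets/"):
        return "static file handler"
    return "404"
```

For /assets/..%2faccount/settings, the cache sees the /assets/ prefix and stores the response, while the origin decodes %2f, resolves the .. segment, and routes to the private settings page.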

Delimiter and path parameter discrepancies

Different stacks disagree about which characters end a path and which are just data. A semicolon is a meaningful path parameter delimiter in some Java servers and inert punctuation elsewhere. An encoded character like %2f may be decoded to a slash by one component and left literal by another. When the cache truncates the URL at a delimiter the origin ignores, or matches an extension the origin treats as part of an earlier path segment, the two views split apart again. The attacker’s job is to find a single character or encoding that the origin reads one way and the cache reads another, then build the gap from that seam. OWASP files this whole family under path confusion, and its testing guide points testers at exactly these decorated URLs.
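The semicolon case comes down to two functions that disagree about a single character. This sketch assumes a Java-style origin that drops everything after ';' as a path parameter and a cache that treats ';' as ordinary text; both functions are hypothetical simplifications.

```python
from pathlib import PurePosixPath

STATIC_EXTENSIONS = {".css", ".js"}


def origin_path(raw_path: str) -> str:
    """Toy origin that treats ';' as the start of a path parameter and drops it."""
    return raw_path.split(";", 1)[0]


def cache_sees_static(raw_path: str) -> bool:
    """Toy cache that treats ';' as plain text and checks the final extension."""
    return PurePosixPath(raw_path).suffix.lower() in STATIC_EXTENSIONS
```

With /account/settings;foo.css, the origin routes /account/settings while the cache sees a path ending in .css.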

Normalization gaps

Caches and origins also resolve path traversal and normalize sequences differently. If a cache collapses ..%2f before keying but the origin resolves it after routing, or the reverse, an attacker can present a path that appears to sit under a static prefix to one and under a dynamic route to the other. Same root cause, different mechanical lever: the two parsers do not agree on what the bytes in the path mean.
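The reverse direction, where the cache normalizes and the origin does not, can be sketched the same way. Here the toy cache decodes and resolves dot segments, while the toy origin cuts at ';' and routes the remainder untouched. The path and the /assets/ rule are illustrative assumptions, not taken from any specific product.

```python
import posixpath
from urllib.parse import unquote


def cache_view(raw_path: str) -> str:
    """Toy cache: decode percent-escapes and resolve dot segments before matching rules."""
    return posixpath.normpath(unquote(raw_path))


def origin_view(raw_path: str) -> str:
    """Toy origin: cut at ';' and route the remainder without decoding it."""
    return raw_path.split(";", 1)[0]
```

The cache resolves /account/settings;%2f%2e%2e%2f%2e%2e%2fassets/x to /assets/x and applies its static-directory rule, while the origin routes /account/settings and renders the private page.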

What the attacker actually walks away with

The stored response is whatever the victim would have seen on that authenticated page, and that is rarely just cosmetic. Account pages routinely embed the things that matter most. Personally identifiable information sits in the page body: names, email addresses, postal addresses, phone numbers, partial card numbers, balances. Gil’s PayPal disclosure noted the leak could expose exactly this class of data: names, account balances, card digits, transaction history, and more. That alone is a serious data breach with no further work.

It frequently gets worse, because authenticated pages also carry security tokens in their markup. A CSRF token printed into a hidden form field is meant to prove that a request came from the real user. If the page holding that token gets cached and handed to an attacker, the token leaks, and a defense against forged requests becomes a gift to the forger. Some pages expose session identifiers, API keys, or single-use links in the same way. Once any of those land in the cached copy, the attacker can escalate from reading the victim’s data to acting as the victim, which is the path to full account takeover.

The delivery is also low-effort for the attacker. There is no malware and no exploit chain to detonate; there is only a link. The attacker sends the victim a crafted URL through email, a chat message, or any page the victim will click, exactly the way a phishing link travels. The victim does not have to type anything, log in again, or approve a prompt. They are already authenticated, and clicking the link silently fires off the request that primes the cache. From the victim’s point of view nothing dramatic happens; the page they land on may even look normal or show a missing stylesheet for a fraction of a second. The damage is invisible until the attacker fetches the stored copy. This is part of why the attack is so durable: the visible footprint on the victim’s side is close to nothing.

The reach of the attack is not narrow. Gil reported that when he tested high-profile sites, a meaningful fraction were exploitable. Later academic work has kept confirming the prevalence at scale. A 2024 study, Hidden Web Caches Discovery by Matteo Golinelli and Bruno Crispo, used a timing-based method to find caches that do not even announce themselves through response headers. It measured roughly 5.8 percent of the Tranco top 50,000 sites running such hidden caches, and over a thousand of those were susceptible to web cache deception. A cache you cannot see in the headers is still a cache that can store and leak a private page.

That last point is worth dwelling on, because it undercuts a common defensive instinct. Teams often reason about caching by reading response headers, assuming that if they do not see a cache status header they are not being cached. A hidden cache breaks that assumption outright. The infrastructure may cache silently, and the only way to know is to probe its behavior rather than trust what it advertises.

What separates this from cache poisoning

Web cache deception is constantly confused with web cache poisoning, and they are genuinely different attacks pointed in opposite directions. PortSwigger draws the line cleanly: poisoning manipulates the cache key to inject malicious content into a cached response that is then served to other users, while deception exploits cache rules to trick the cache into storing sensitive content that the attacker then retrieves for themselves.

Read that again by the direction of harm. In cache poisoning, the attacker is the source of bad content and the victims are everyone else. The attacker finds an unkeyed input, some header or parameter the origin reflects into the response but the cache leaves out of the key, and they use it to plant a malicious payload under a popular URL. The next thousand visitors who request that URL get the attacker’s poisoned response. The flow runs from attacker, into the cache, out to the crowd.

In web cache deception, the direction reverses. The victim is the source of the sensitive content and the attacker is the single beneficiary. The attacker lures one logged-in victim to fetch their own private page, the cache stores it, and the attacker pulls that one stored copy back out. Nothing malicious is injected. The response is completely legitimate; it is simply the wrong person’s response, served to the wrong person. Poisoning is about controlling what a shared cache serves. Deception is about reading what a shared cache should never have stored. The mechanics rhyme, because both exploit a mismatch between what the cache keys and what actually varies the response, but the payload, the victim, and the goal are inverted.

This origin versus intermediary disagreement is a recurring pattern rather than a one-off. It is the same shape as HTTP request smuggling, where the front end and the back end disagree about where one request ends and the next begins. In both cases there is no single broken component, only two components that parse the same bytes by different rules, and an attacker who lives in the gap between their interpretations.

Closing the gap

Because the root cause is a disagreement, the fixes all work by removing the disagreement or refusing to act on it.

The strongest move is to make caching decisions on what the origin actually returns, not on what the URL looks like. A cache that respects the origin’s Cache-Control: no-store and private directives will not store an authenticated page no matter what extension is glued to the path, because the page that produced it asked not to be stored. Send those headers on every response that contains per-user data, and configure the cache to honor them rather than override them with a blanket extension rule.
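A shared cache that honors the origin can be sketched as a check on Cache-Control alone, with no reference to the URL at all. This is a simplified reading of the directives rather than a full parser: shared_cache_may_store is a hypothetical name, and a qualified form such as private="Set-Cookie" is treated conservatively as fully private.

```python
def shared_cache_may_store(cache_control: str) -> bool:
    """Refuse storage when the origin says no-store or private, whatever the URL."""
    directives = {
        # Keep only the directive name, e.g. "max-age=60" -> "max-age".
        part.strip().split("=", 1)[0].lower()
        for part in cache_control.split(",")
        if part.strip()
    }
    # no-store forbids any cache from storing; private forbids shared caches.
    # A qualified private="..." is treated as fully private in this sketch.
    return not (directives & {"no-store", "private"})
```

The design choice is that the path never appears in the function’s signature, so no extension or directory trick can change its answer.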

Next, close the parsing gap at the origin. If a request decorated with /nonexistent.css or a stray delimiter is not a real route, the application should return a 404 or a redirect, not silently serve the underlying page. A router that rejects the decorated path denies the cache anything worth storing. Verify that the response Content-Type matches the extension the cache thinks it is caching; a page served as text/html under a .css URL is the exact contradiction the cache should refuse.
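The two checks in that paragraph, one at the origin and one at the cache, can each be sketched briefly. strict_route and content_type_matches_extension are hypothetical helpers: the first uses exact-match routing so a decorated path falls through to 404, and the second uses Python’s standard mimetypes table to compare the response Content-Type with what the extension implies.

```python
import mimetypes

# Hypothetical route table for exact-match routing.
ROUTES = {"/account/settings": "text/html"}


def strict_route(path: str) -> int:
    """Exact-match routing: a decorated path is not a real route, so it gets a 404."""
    return 200 if path in ROUTES else 404


def content_type_matches_extension(path: str, content_type: str) -> bool:
    """Cache-side check: False when the response type contradicts the URL extension."""
    expected, _encoding = mimetypes.guess_type(path)
    if expected is None:
        return True  # no recognizable extension, so nothing to contradict
    # Compare only the media type, ignoring parameters such as charset.
    return content_type.split(";", 1)[0].strip().lower() == expected
```

A text/html response under a .css URL fails the second check, which is the contradiction a cache should refuse to store.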

Finally, narrow the cache rules and align the two parsers. Prefer caching by explicit, known-safe paths over broad extension or directory rules that match anything ending a certain way. Where the cache and the origin must both parse a URL, make sure they normalize delimiters, encodings, and traversal sequences identically, so there is no seam for an attacker to pry open. Each of these turns the two disagreeing views back into one.
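Aligning the two parsers is easiest when there is literally one normalization routine. The sketch assumes a hypothetical canonical_path function that both the cache configuration and the origin router are built to share, plus an explicit CACHEABLE_PATHS allowlist in place of an extension rule. The order of steps (cut path parameters, decode once, resolve dot segments) is a design choice for illustration; what matters is that both sides apply the same one.

```python
import posixpath
from urllib.parse import unquote

# Hypothetical explicit allowlist of cacheable asset paths.
CACHEABLE_PATHS = {"/assets/app.css", "/assets/app.js", "/favicon.ico"}


def canonical_path(raw_path: str) -> str:
    """One normalization routine assumed to be shared by cache and origin."""
    path = raw_path.split(";", 1)[0]  # drop path parameters the same way everywhere
    path = unquote(path)              # decode percent-escapes exactly once
    return posixpath.normpath(path)   # resolve "." and ".." segments


def should_cache(raw_path: str) -> bool:
    """Cache only paths that canonicalize to an explicitly listed asset."""
    return canonical_path(raw_path) in CACHEABLE_PATHS
```

The decorated and traversal paths from earlier no longer match anything cacheable, because they either canonicalize to a non-listed path or to the same path the origin will route.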

It is worth testing for this directly rather than assuming you are safe. Pick an authenticated page, request it with a static suffix like /nonexistent.css appended, and watch what comes back. If the origin still returns the private page with a 200 OK, request the same decorated URL a second time without any session cookie and see whether the private content comes back from the cache. If it does, you have reproduced the attack against your own application, and you know precisely which of the fixes above is missing. Run the same probe against the delimiter and directory variants, because a site can be hardened against the plain extension trick while still leaking through a semicolon or a static directory prefix. This kind of hands-on probing is what OWASP’s path confusion guidance asks testers to do, and it surfaces the disagreement far more reliably than reading configuration files and hoping the cache and the origin agree.
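The two-request probe can be scripted with Python’s standard library. This is a sketch, assuming you supply a page URL on your own application, a session cookie for a test account you control, and a marker string that only appears on that account’s private page (its email address works well); decorate, looks_vulnerable, and probe are hypothetical helpers. Each run appends a fresh random filename so the first request cannot be answered by an older cached copy.

```python
import urllib.error
import urllib.request
import uuid
from typing import Optional, Tuple


def fetch(url: str, cookie: Optional[str] = None) -> Tuple[int, str]:
    """GET a URL, optionally sending a Cookie header; return (status, body)."""
    req = urllib.request.Request(url)
    if cookie:
        req.add_header("Cookie", cookie)
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8", "replace")


def decorate(page_url: str, ext: str = "css") -> str:
    """Append a unique, nonexistent static-looking segment to the page URL."""
    return f"{page_url.rstrip('/')}/wcd-probe-{uuid.uuid4().hex}.{ext}"


def looks_vulnerable(primed: Tuple[int, str], anonymous: Tuple[int, str], marker: str) -> bool:
    """True if the cookieless request got back the private page the cookied one produced."""
    origin_served_private = primed[0] == 200 and marker in primed[1]
    cache_replayed_it = anonymous[0] == 200 and marker in anonymous[1]
    return origin_served_private and cache_replayed_it


def probe(page_url: str, session_cookie: str, marker: str) -> bool:
    url = decorate(page_url)
    primed = fetch(url, session_cookie)  # step 1: authenticated request may prime the cache
    anonymous = fetch(url)               # step 2: identical URL, no session cookie
    return looks_vulnerable(primed, anonymous, marker)
```

If the first response is already a 404, the origin is rejecting the decorated path and the cache has nothing private to store. To cover the delimiter and directory variants, swap in a decorate function that builds those path shapes instead.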

The assumption that breaks

Strip away the extensions and the delimiters and one assumption is holding the whole thing up. The cache assumes that a URL means the same thing to it as it does to the origin, and that anything dressed as a static asset is safe to store and replay to anyone. The origin assumes the cache will only keep what is genuinely public. Neither side ever checks the other, and the request itself never carries a signal that says this response was personal. So the two views drift apart on a single crafted path, and the gap between them is exactly wide enough to slip one user’s private page into a public slot.

The bug is not a broken cache or a careless framework. The bug is two correct components trusting that they agree on what a URL means when they do not, and a trust boundary everyone assumed sat at the response when it actually sat in the disagreement over the path. That kind of flaw does not show up by scanning for a known bad string. You find it by asking what each component assumes about the request, and whether the component on the other side shares that assumption. It is exactly the kind of question an autonomous researcher built to test assumptions is meant to ask. Honor the origin’s caching headers, make your router reject the decorated path, and keep the cache and the origin reading the same URL the same way. Learn more about that approach on our about page.

Frequently asked questions

What causes a web cache deception vulnerability?

It is caused by the cache and the origin server disagreeing about what a URL means. The origin routes by path prefix and serves a private page for a decorated URL like /account/settings/nonexistent.css, ignoring the suffix, while the cache classifies the same URL by its .css extension and stores the response as a public asset. Neither component is broken alone; the bug is the mismatch. Omer Gil first described this condition in his 2017 Web Cache Deception Attack research.

How does the attacker retrieve the victim’s private data?

Through the cache key. A cache indexes stored responses by the URL path, not by the victim’s session cookie, since cookies are normally excluded so one copy can be served to many users. The victim’s authenticated request stores their private page under a cookieless key, and the attacker then requests the identical URL, produces the same key, and the cache serves the stored copy without consulting the origin. The PortSwigger Web Security Academy explains how cache keys and cache rules combine to enable this.

How is web cache deception different from web cache poisoning?

They point in opposite directions. Web cache poisoning manipulates the cache key to inject malicious content into a cached response that is then served to many other users, so the attacker is the source and the crowd is the victim. Web cache deception tricks the cache into storing one victim’s sensitive response, which the attacker alone retrieves, so the victim is the source and nothing malicious is injected. PortSwigger draws this exact distinction in its web cache deception writeup.

How do you prevent web cache deception?

Make caching decisions on what the origin returns, not on what the URL looks like. Send Cache-Control: no-store and private on every authenticated response and configure the cache to honor them rather than override them with an extension rule. Have the router return a 404 for decorated paths like /account/settings/nonexistent.css, verify the Content-Type matches the extension, and align how the cache and origin normalize delimiters and encodings. OWASP covers testing for this under Test for Path Confusion.