What is path traversal?

A path traversal bug lets an attacker step out of the folder your app meant to serve and read files it never intended to share. It shows up when an app takes a file name from the URL, like ?file=invoice.pdf, and hands it straight to the file system. Change that value to ../../etc/passwd and the same code that served an invoice now reads the system's user account file. This post explains from zero how the bug, also called directory traversal, works and how to shut it down.

What is path traversal in plain terms

Most web apps store files on disk and let users fetch them by name. A download endpoint might map a request to a folder like /var/www/files/ and tack the requested name onto the end. That is fine until the name contains the sequence ../, which means “go up one directory.” Each ../ climbs one level toward the root of the disk. Stack enough of them and you escape the intended folder.

Here is the shape of the request. A normal one looks like this:

GET /download?file=invoice.pdf HTTP/1.1
Host: acme-notes.example

The server reads /var/www/files/invoice.pdf and returns it. Now the attacker sends this instead:

GET /download?file=../../../../etc/passwd HTTP/1.1
Host: acme-notes.example

The server joins the path and ends up reading /etc/passwd from the disk root. Nothing about the request looks malformed. It is the same parameter, the same code path, just a different value. That is what makes path traversal easy to miss in a quick test.
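To make the join concrete, here is a minimal sketch of what happens to the two values above. The helper name naiveJoin is hypothetical, and path.posix is used only so the output is the same on any host:

```javascript
const path = require('path');

// Hypothetical helper: mimics an endpoint that tacks the requested
// name onto a base folder, then shows where the OS would end up.
function naiveJoin(base, name) {
  // path.posix keeps the result identical on Windows and Unix hosts.
  return path.posix.normalize(base + '/' + name);
}

const normal = naiveJoin('/var/www/files', 'invoice.pdf');
const attack = naiveJoin('/var/www/files', '../../../../etc/passwd');

console.log(normal); // /var/www/files/invoice.pdf
console.log(attack); // /etc/passwd
```

Extra ../ segments beyond the root are simply dropped, which is why attackers tend to stack more of them than they need.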

The file name in a URL is user input. The moment it touches a file API without being checked against a fixed base directory, the whole disk is in scope.

What an attacker can actually read

The damage depends on what the app can reach on the host. Common targets include:

Source code. Reading the application's own files, like ../config/database.yml or ../../app/settings.py, exposes logic and secrets in one shot.
Config and credentials. Files such as .env, cloud credential files, and database config often sit a few folders above the served directory.
Secrets and keys. Private keys, API tokens, and session signing keys turn a read bug into account takeover or full server access.
System files. On Unix, /etc/passwd confirms the bug and lists user accounts. On Windows, files like C:\Windows\win.ini serve the same proof.

A read primitive sounds limited. In practice, reading the right config file once is enough to pivot into the database or the cloud account.

Encoding tricks at a high level

Apps that try to block traversal with a simple text filter often check for the literal string ../ and stop there. Attackers get around that by encoding the same characters so the filter does not recognize them, while a later layer of the stack (the web server, the framework, or the app's own decode call) still turns them back into ../ before the path reaches the file system.

Percent encoding. A dot can be written as %2e, so ../ becomes %2e%2e%2f. A naive filter scanning for dots and slashes sees nothing.
Double encoding. Encode the percent sign itself and you get %252e. If one layer of the stack decodes once and passes it on, a second decode step turns it back into a dot.
Null bytes, historically. Older platforms truncated a string at a null byte (%00), so secret.key%00.pdf could pass a .pdf check and still open secret.key. Modern runtimes mostly closed this, but legacy code and native libraries can still be exposed.

The lesson is not to memorize each trick. Filtering for bad strings is the wrong model. You cannot list every encoding of ../. You have to validate the resolved path instead, which I cover below.
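A small sketch of the filter bypass described above. The function naiveFilterAllows is a hypothetical stand-in for a string filter, and the decode calls stand in for whatever layers of a real stack decode the value:

```javascript
// Hypothetical naive filter: rejects only the literal "../" sequence.
function naiveFilterAllows(value) {
  return !value.includes('../');
}

const single = '%2e%2e%2fetc%2fpasswd';          // percent-encoded ../etc/passwd
const double = '%252e%252e%252fetc%252fpasswd';  // the same, encoded twice

// One decode step is enough to recover the single-encoded form.
const singleDecoded = decodeURIComponent(single);

// The double-encoded form survives one decode still looking harmless.
const doubleDecodedOnce = decodeURIComponent(double);
const doubleDecodedTwice = decodeURIComponent(doubleDecodedOnce);

console.log(naiveFilterAllows(single));            // true
console.log(singleDecoded);                        // ../etc/passwd
console.log(naiveFilterAllows(doubleDecodedOnce)); // true
console.log(doubleDecodedTwice);                   // ../etc/passwd
```

The filter approves both inputs, yet each ends up as ../etc/passwd once enough decoding has happened downstream.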

Windows versus Unix paths

Path separators differ by platform, and that matters for both attack and defense. Unix uses the forward slash /. Windows accepts both the backslash \ and the forward slash, so ..\..\..\windows\win.ini and ../../../windows/win.ini can both work. A filter that only looks for ../ misses the backslash form on a Windows host. Windows also has drive letters and UNC paths, which give attackers more ways to name an absolute location. If your defense assumes one separator, it is already incomplete on the other platform.
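Node ships both sets of path rules as path.win32 and path.posix, so the difference can be sketched on any machine. The base folders here (C:\inetpub\files and /var/www/files) are assumptions chosen for illustration:

```javascript
const path = require('path');

const backslashes = '..\\..\\windows\\win.ini';
const slashes = '../../windows/win.ini';

// Windows rules: both separators climb out of the base folder.
const winBack = path.win32.resolve('C:\\inetpub\\files', backslashes);
const winFwd = path.win32.resolve('C:\\inetpub\\files', slashes);

// Unix rules: a backslash is an ordinary file name character.
const posixBack = path.posix.resolve('/var/www/files', backslashes);

console.log(winBack);   // C:\windows\win.ini
console.log(winFwd);    // C:\windows\win.ini
console.log(posixBack); // /var/www/files/..\..\windows\win.ini

// Drive letters and UNC paths are absolute only under Windows rules.
console.log(path.win32.isAbsolute('C:\\Windows\\win.ini'));           // true
console.log(path.win32.isAbsolute('\\\\server\\share\\file.txt'));    // true
console.log(path.posix.isAbsolute('C:\\Windows\\win.ini'));           // false
```

A check written and tested only under one set of rules can therefore behave differently once the same code is deployed on the other platform.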

The link to local file inclusion

Path traversal is about reading a file off disk. Local file inclusion, or LFI, goes a step further: the app does not just read the file, it executes or interprets it. In a templating or scripting setup, a traversal that points at a file the engine will run can turn a read bug into code execution. The same untrusted file name reaches a more dangerous sink. So when you find a traversal, ask what the app does with the file after reading it. If it ever interprets the contents, the impact jumps from disclosure to execution.
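The difference between the two sinks can be sketched in Node, where fs.readFileSync only reads a file while require() runs it. The temporary file payload.js is a hypothetical stand-in for whatever file a traversal managed to reach, and require() is just one example of an interpreting sink:

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical stand-in for a file the attacker steered the app toward.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'lfi-demo-'));
const target = path.join(dir, 'payload.js');
fs.writeFileSync(target, 'module.exports = 6 * 7;');

// Read sink: the contents come back as inert text (disclosure only).
const asText = fs.readFileSync(target, 'utf8');

// Interpreting sink: require() runs the file as code (execution).
const asCode = require(target);

console.log(asText); // module.exports = 6 * 7;
console.log(asCode); // 42

// Clean up the temporary folder.
fs.rmSync(dir, { recursive: true, force: true });
```

The file name and the bytes on disk are identical in both cases; only the sink changes the impact.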

A vulnerable endpoint and a fixed version

Here is a download handler written the wrong way, then the same handler with the holes closed. The vulnerable version joins user input straight onto a base path:

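// VULNERABLE: user input reaches the file API directly
const express = require('express');
const fs = require('fs');

const app = express();

app.get('/download', (req, res) => {
  const base = '/var/www/files';
  const filePath = base + '/' + req.query.file;   // ../../etc/passwd escapes base
  // fs opens whatever path it is given; nothing here checks where it points
  fs.createReadStream(filePath)
    .on('error', () => res.status(404).send('Not found'))
    .pipe(res);
});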

The fixed version resolves the full path, confirms it still sits inside the base directory, and only serves names from a known set:

// FIXED: canonicalize, verify containment, allowlist the name
const express = require('express');
const path = require('path');

const app = express();

const BASE = path.resolve('/var/www/files');
const ALLOWED = new Set(['invoice.pdf', 'receipt.pdf', 'terms.pdf']);

app.get('/download', (req, res) => {
  // ?file=a&file=b arrives as an array; accept a single string only
  if (typeof req.query.file !== 'string') {
    return res.status(400).send('Bad request');
  }

  const requested = path.basename(req.query.file);  // strip any directory parts

  if (!ALLOWED.has(requested)) {
    return res.status(404).send('Not found');
  }

  const resolved = path.resolve(BASE, requested);

  // Containment check: resolved path must stay inside BASE
  if (resolved !== BASE && !resolved.startsWith(BASE + path.sep)) {
    return res.status(400).send('Bad request');
  }

  res.sendFile(resolved);
});

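Three things make the fixed version safe. It normalizes the path with path.resolve, which collapses every ../ segment into one absolute location, so stacked traversals reduce to a single concrete path you can check. Keep in mind that path.resolve works on the string only: it does not decode percent-encoded input, so the check has to run on the already decoded value, and it does not follow symlinks, so if links can exist under the base directory, resolve them with fs.realpath before checking. It then verifies containment, confirming before any read happens that the resolved path is the base directory itself or starts with the base directory plus a separator. A bare prefix check would also accept a sibling folder such as /var/www/files-private. And it uses an allowlist of known names, so anything outside that set is refused before a path is ever built. The rule underneath all three: never pass raw user input to a file API.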
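The containment step is worth isolating, because the trailing separator is easy to forget. This is a minimal sketch with a hypothetical helper named isInside; it uses path.posix so the demo behaves the same on any host, and it is a lexical check only (it does not resolve symlinks):

```javascript
const path = require('path');

// Hypothetical helper: true when target, resolved against base,
// is base itself or sits somewhere underneath it.
function isInside(base, target) {
  const root = path.posix.resolve(base);
  const resolved = path.posix.resolve(root, target);
  // Compare against root + '/' so "/var/www/files-private" is not
  // mistaken for something inside "/var/www/files".
  return resolved === root || resolved.startsWith(root + '/');
}

const BASE_DIR = '/var/www/files';

console.log(isInside(BASE_DIR, 'invoice.pdf'));               // true
console.log(isInside(BASE_DIR, 'reports/../invoice.pdf'));    // true
console.log(isInside(BASE_DIR, '../../etc/passwd'));          // false
console.log(isInside(BASE_DIR, '/etc/passwd'));               // false
console.log(isInside(BASE_DIR, '../files-private/key.pem'));  // false

// The same sibling path slips past a bare prefix comparison.
const sibling = path.posix.resolve(BASE_DIR, '../files-private/key.pem');
console.log(sibling.startsWith(BASE_DIR));                    // true
```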

How to detect and prevent it

Detection starts with finding every place a request value reaches the file system. Look for download, export, preview, avatar, and report endpoints, and any code that builds a path by joining strings. Then test those values with traversal sequences and their encoded forms, watching for a system file in the response or an error that leaks a path.
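When testing your own endpoints, it helps to generate the variants rather than type them by hand. The sketch below is one possible way to do that; traversalPayloads is a hypothetical helper, and the four forms it emits are only the ones discussed in this post, not a complete list:

```javascript
// Hypothetical helper: builds test values for a file parameter.
// target is the path to reach, depth is how many levels to climb.
function traversalPayloads(target, depth) {
  const plain = '../'.repeat(depth) + target;

  // Percent-encode every dot and slash.
  const encoded = plain.replace(/\./g, '%2e').replace(/\//g, '%2f');

  return [
    plain,                          // ../../etc/passwd
    plain.replace(/\//g, '\\'),     // backslash form for Windows hosts
    encoded,                        // single percent encoding
    encoded.replace(/%/g, '%25'),   // double percent encoding
  ];
}

const payloads = traversalPayloads('etc/passwd', 2);
payloads.forEach((p) => console.log(p));
```

Send each value to the parameter under test and compare the responses against a known-good request. A different status code, body length, or error message is a lead worth following.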

Prevention checklist

Resolve, then verify. Canonicalize the full path and confirm it stays inside the intended base directory. Reject anything that does not.
Prefer an allowlist. Map requests to a fixed set of known names or IDs rather than accepting arbitrary file names.
Strip directory parts. Reduce input to a bare file name with a function like basename so separators cannot survive.
Decode fully before checking. Validate after all decoding is done, so %2e%2e and double encoded forms cannot slip past a string filter.
Handle both separators. Account for / and \, drive letters, and absolute paths, especially on Windows hosts.
Least privilege on disk. Run the app as a user that cannot read secrets or system files, so a bug that slips through still reads little.
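The allowlist item is often easiest to apply by never accepting a file name at all. Here is a minimal sketch, assuming a hypothetical table FILES_BY_ID that maps opaque IDs to names the server already trusts; the IDs and the base folder are made up for illustration:

```javascript
const path = require('path');

const FILES_BASE = '/var/www/files';

// Hypothetical lookup table: the client sends an ID, never a path.
const FILES_BY_ID = new Map([
  ['1001', 'invoice.pdf'],
  ['1002', 'receipt.pdf'],
  ['1003', 'terms.pdf'],
]);

// Returns the absolute path for a known ID, or null for anything else.
function pathForId(id) {
  if (typeof id !== 'string') return null;  // reject arrays and objects
  const name = FILES_BY_ID.get(id);
  return name === undefined ? null : path.posix.join(FILES_BASE, name);
}

console.log(pathForId('1001'));              // /var/www/files/invoice.pdf
console.log(pathForId('../../etc/passwd'));  // null
console.log(pathForId(['1001']));            // null
```

Because the user-supplied value is only ever used as a map key, no encoding or separator trick can turn it into a path.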

The reason this bug survives is that the app’s assumption, “the file name only ever points inside this folder,” is never enforced at the line where the file is read. Find your file endpoints, then resolve and check every path at that line, and this class of bug has nowhere left to hide.

Frequently asked questions

Is path traversal the same as local file inclusion?

They are related but not identical. Path traversal lets an attacker read a file off disk that should be out of reach, while local file inclusion goes further and makes the app execute or interpret that file. If you find a traversal, check what the app does with the file after reading it, because a read bug becomes code execution when the contents are later run.

Why does filtering for ../ fail to stop path traversal?

Because there are too many ways to write the same sequence. Attackers use percent encoding like %2e%2e%2f, double encoding like %252e, and on Windows the backslash form ..\, all of which a literal string filter misses. The reliable fix is to canonicalize the full path and confirm it still sits inside your intended base directory before any read happens.

How is path traversal different on Windows versus Unix?

Unix uses the forward slash, but Windows accepts both the backslash and the forward slash, plus drive letters and UNC paths, so it offers more ways to name an absolute location. A filter that only looks for ../ misses the ..\ form on a Windows host, so any defense that assumes one separator is already incomplete on the other platform.

What is the most reliable way to prevent path traversal?

Resolve the full path first, then verify it is still inside your base directory (the base itself, or the base plus a path separator as a prefix), and prefer mapping requests to an allowlist of known names or IDs over accepting arbitrary file names. Running the app with least privilege on disk limits what a missed case can reach. MITRE tracks this weakness as CWE-22.