Author: UnboundCompute

  • Web and API Security Glossary: Vulnerabilities and Terms Explained

    This glossary explains the most common web application vulnerabilities and the security terms that go with them, in plain language. It is built for people starting from zero, so each entry is short and concrete. Where we have a deeper post, the term links to it. Skim it, bookmark it, or read it top to bottom.

    The entries are grouped into core concepts, access control, injection and input, logic and API flaws, and the ways people test for all of these. If you only read one section, read common web application vulnerabilities first.

    Core concepts

    • Vulnerability. A weakness in an application that lets someone do something they should not be able to do, like read another user’s data or run their own code on the server.
    • Exploit. The specific steps or request that turns a vulnerability into real impact. A vulnerability is the open window; the exploit is climbing through it.
    • Threat. Anyone or anything that could act against the app, from a bored attacker to an automated bot scanning the whole internet.
    • Attack surface. Every place an attacker can touch the app: pages, forms, API endpoints, file uploads, headers, and parameters. The bigger the surface, the more there is to get wrong.
    • Payload. The piece of input that triggers the bug, for example a snippet of script or a crafted value in a URL parameter.
    • Proof of concept. A safe demonstration that a bug is real, without causing damage. It is the difference between “this might be exploitable” and “here is the proof.”
    • False positive. A finding a tool reports that is not actually exploitable. Too many of these waste a team’s time and train them to ignore alerts.
    • CVE. A public identifier for a known vulnerability in a specific piece of software, written in the form CVE-2024-12345. It lets everyone refer to the same issue.
    • CVSS. A scoring system from 0 to 10 that rates how severe a vulnerability is. Higher means worse, but context still matters more than the number.
    • Zero day. A vulnerability that is being exploited before the vendor has a fix available. Defenders have zero days of warning.
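
    The CVSS entry can be made concrete with the qualitative ratings that the CVSS v3.x specification attaches to score ranges. The sketch below is only an illustration; the function name cvss_severity is our own and not part of any library.

```python
def cvss_severity(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS scores range from 0.0 to 10.0")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"       # 0.1 to 3.9
    if score < 7.0:
        return "Medium"    # 4.0 to 6.9
    if score < 9.0:
        return "High"      # 7.0 to 8.9
    return "Critical"      # 9.0 to 10.0
```

    The label is a starting point for triage. As the entry says, context still decides what gets fixed first.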

    Access control

    Access control bugs are about who is allowed to do what. They are some of the highest impact issues because they often expose other users’ data directly. See access control vulnerabilities for the full picture.

    • Authentication. Proving who you are, usually with a password or a login token. See authentication vs authorization.
    • Authorization. Deciding what you are allowed to do once you are logged in. Many breaches come from getting this step wrong.
    • Broken access control. When the app fails to check that a user is allowed to perform an action, so a normal user can reach admin pages or other people’s records.
    • IDOR. Insecure direct object reference. Changing an id in a URL like /invoice/123 to /invoice/124 and seeing data that is not yours. See the IDOR and BOLA entry.
    • BOLA. Broken object level authorization. The API version of IDOR, and the first entry in the OWASP API Security Top 10. The endpoint returns an object without checking that it belongs to the caller.
    • Privilege escalation. Gaining rights you should not have. See privilege escalation.
    • Horizontal escalation. Acting as another user at the same level, for example reading a peer’s messages.
    • Vertical escalation. Jumping to a higher level, for example a normal user gaining admin powers.
    • Session. The server’s memory that you are logged in, tracked by a cookie or token. Steal the session and you become that user.
    • JWT. JSON web token. A signed token that carries login claims. Weak signing or trusting unverified claims turns it into an access control bug.
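
    To make the IDOR and BOLA entries concrete, here is a minimal sketch of the missing check. Everything in it is hypothetical: the Invoice type, the in-memory INVOICES store, and both function names are invented for illustration. A real app would load the object from its database and take the caller's identity from a verified session.

```python
from dataclasses import dataclass


@dataclass
class Invoice:
    id: int
    owner_id: int
    total: int


# Hypothetical in-memory store standing in for a database.
INVOICES = {
    123: Invoice(id=123, owner_id=1, total=500),
    124: Invoice(id=124, owner_id=2, total=900),
}


class Forbidden(Exception):
    """Raised when the caller is not allowed to access the object."""


def get_invoice_unsafe(invoice_id: int, caller_id: int) -> Invoice:
    # IDOR/BOLA: the id from the URL is trusted and the caller is ignored.
    return INVOICES[invoice_id]


def get_invoice(invoice_id: int, caller_id: int) -> Invoice:
    invoice = INVOICES[invoice_id]
    # Authorization: confirm the object belongs to the caller before returning it.
    if invoice.owner_id != caller_id:
        raise Forbidden(f"user {caller_id} may not read invoice {invoice_id}")
    return invoice
```

    With the unsafe version, user 1 can read invoice 124 just by changing the id. With the checked version, the same request raises Forbidden while the user's own invoice still loads.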

    Injection and input

    Injection happens when input is treated as a command instead of plain data. The app mixes attacker text into a query, a page, or a shell, and the attacker’s text takes over.

    • Injection. The general class where untrusted input changes the meaning of a command the app runs.
    • SQL injection. Injecting database query syntax to read or change data the app never meant to expose. See SQL injection.
    • Cross site scripting. XSS. Injecting script that runs in another user’s browser, often to steal sessions. See cross site scripting.
    • Stored XSS. The script is saved by the app, for example in a comment, and runs for every visitor who views it.
    • Reflected XSS. The script comes back in the response to a single crafted request, usually delivered through a link.
    • DOM XSS. The bug lives in client side JavaScript that writes attacker input into the page without cleaning it.
    • Command injection. Getting the server to run your operating system commands. See command injection.
    • SSTI. Server side template injection. Input is rendered as a template expression, which can lead to running code on the server.
    • Path traversal. Using sequences like ../ to read files outside the intended folder, such as configuration or password files.
    • SSRF. Server side request forgery. Tricking the server into making requests for you, often to reach internal systems a user cannot touch directly.
    • XXE. XML external entity. Abusing XML parsing to read local files or make the server send requests.
    • CSRF. Cross site request forgery. Tricking a logged in user’s browser into sending an action they did not intend, like changing their email.
    • Open redirect. A redirect that sends users to any URL an attacker supplies, useful for convincing phishing links.
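
    The opening idea of this section, input treated as a command instead of data, is easiest to see with SQL injection. The sketch below uses Python's built in sqlite3 module and an invented users table. The table, the function names, and the payload are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "alice-secret"), ("bob", "bob-secret")],
)


def find_user_unsafe(name: str) -> list:
    # Vulnerable: the input is spliced into the query text, so it can change the query.
    query = f"SELECT name, secret FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user(name: str) -> list:
    # Safer: the placeholder keeps the input as plain data.
    return conn.execute(
        "SELECT name, secret FROM users WHERE name = ?", (name,)
    ).fetchall()


# A classic payload: closes the string and adds a condition that is always true.
payload = "nobody' OR '1'='1"
```

    Passing payload to find_user_unsafe returns every row in the table, while find_user looks for a user literally named that string and returns nothing.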

    Logic and API flaws

    These bugs are not about malformed input. The request is valid; the app’s rules are wrong. They are hard for scanners to catch because nothing looks broken on the surface.

    • Business logic vulnerability. A flaw in the app’s rules, like applying a discount twice or skipping a payment step. See business logic vulnerabilities.
    • Mass assignment. Sending extra fields in a request, like role=admin, that the app blindly saves because it trusts the whole object.
    • Broken function level authorization. An admin only action that is reachable by anyone who knows the endpoint, because the function itself never checks the role.
    • Rate limiting. A control that caps how often an action can run. Missing it enables brute force, scraping, and abuse.
    • Race condition. Sending requests at the same moment to slip between two steps, for example redeeming one gift code twice before the balance updates.
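
    Mass assignment is a good example of a valid request meeting a wrong rule. The sketch below contrasts a handler that saves the whole request body with one that copies only an allow list of fields. The field names and both functions are hypothetical and not taken from any framework.

```python
# Fields a client is allowed to set on its own profile (an assumption for this sketch).
ALLOWED_PROFILE_FIELDS = {"display_name", "email"}


def update_profile_unsafe(user: dict, body: dict) -> dict:
    # Mass assignment: every field in the request body is saved, including role.
    user.update(body)
    return user


def update_profile(user: dict, body: dict) -> dict:
    # Copy only the allowed fields and ignore everything else.
    for field in ALLOWED_PROFILE_FIELDS:
        if field in body:
            user[field] = body[field]
    return user
```

    Sending role=admin to the unsafe version changes the stored role. The allow list version applies the display name change and leaves the role alone.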

    How these get found and tested

    Finding web application vulnerabilities is its own discipline. Different methods catch different things, and none catches everything. For the bigger picture see how hackers find vulnerabilities and web application security.

    • Penetration testing. A skilled person tries to break the app on purpose and reports what worked.
    • Automated penetration testing. Software that does much of that work continuously. See automated penetration testing.
    • Vulnerability scanner. A tool that checks an app against a list of known issues and patterns. Fast and broad, but prone to false positives and blind to logic bugs.
    • SAST. Static analysis. Reads the source code without running it. See SAST vs DAST vs IAST.
    • DAST. Dynamic analysis. Tests the running app from the outside, the way an attacker sees it.
    • IAST. Interactive analysis. Watches the app from the inside while it runs to spot issues with more context.
    • Fuzzing. Throwing large amounts of malformed or random input at the app to see what crashes or misbehaves.
    • Verification. Proving a finding is real with concrete evidence before reporting it, so the output is signal and not a pile of maybes.

    See the ideas in action

    Definitions only go so far. Two teardowns walk through how these pieces combine in a realistic app: an IDOR that exposes user data and chaining small bugs into a real breach.

    The highest impact bugs rarely come from one exotic payload. They come from understanding how an app is meant to work, then noticing where that logic quietly breaks.

    That is exactly the kind of reasoning an autonomous researcher that tests an app’s assumptions is built for. It learns how the app works, forms ideas about where it breaks, runs experiments, and proves a finding before reporting it. If that approach interests you, read more about UnboundCompute.

  • Why we only report proven vulnerabilities

    Most security tools hand you a list of maybes. We do the opposite. Our rule is simple: we only report a bug after vulnerability verification, which means we have shown concrete evidence that the bug is real and exploitable. If we cannot prove it, we hold it back.

    This sounds obvious, but it goes against how most scanning works. A scanner sees a pattern it recognizes and raises an alert. It does not check whether that pattern actually breaks anything in your app. So the team on the other end inherits the hard part: deciding which alerts are real.

    A finding is not the same as a proven finding

    A finding is a guess. A tool noticed something that looks like a known weakness. Maybe a parameter name matches a SQL injection signature. Maybe a response header is missing. Maybe a login response lacks the rate limit headers the tool expected to see. These are reasons to look closer. They are not proof.

    A proven finding is different. It comes with evidence that the bug works. Not a signature match, but a sequence you can replay: this request, sent in this way, produced this result that should not have been possible.

    Consider an invented app called Acme Notes. A scanner flags this endpoint because the URL has a numeric id:

    GET /api/notes/1042
    Authorization: Bearer <user A token>

    The flag says “possible insecure direct object reference”. That is a finding. It is a hint, nothing more. To turn it into a proven finding, you have to do the thing the scanner did not: log in as user A, request a note that belongs to user B, and show that user A can read user B’s private data.

    GET /api/notes/2099
    Authorization: Bearer <user A token>
    
    200 OK
    { "id": 2099, "owner": "userB", "body": "userB private note" }

    Now you have something real. User A read user B’s data. That response body is the evidence. The bug is no longer a guess about a numeric id. It is a confirmed access control failure with a request and a response that prove it.
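
    The step from hint to proof can be written down as a small check on the response. This is a sketch under assumptions: it expects a JSON body with an owner field, as in the invented Acme Notes response above, and the function name is ours.

```python
import json


def proves_cross_user_read(caller: str, status: int, body: str) -> bool:
    """Return True only if the response shows the caller reading someone else's note."""
    if status != 200:
        return False  # a denial or an error is not evidence of a read
    try:
        note = json.loads(body)
    except json.JSONDecodeError:
        return False  # no parseable note came back
    if not isinstance(note, dict):
        return False
    owner = note.get("owner")
    # Evidence requires a named owner that is not the caller.
    return owner is not None and owner != caller
```

    Run against the response for note 2099 with caller userA, the check returns True. The same body requested by userB, or a 403 for userA, returns False and stays a hint.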

    Why unproven alerts waste a security team’s time

    Every unproven alert is work pushed downstream. Someone has to triage it. They read the alert, open the app, try to reproduce it, and most of the time discover the alert was wrong. The parameter was safe. The missing header did not matter behind the gateway. The flagged id was scoped to the user all along.

    That triage cost is real and it repeats. A queue full of maybes does three bad things:

    • It buries the real bugs. When most alerts are noise, the few that matter get the same tired glance as the rest.
    • It trains people to ignore alerts. After the tenth false alarm, the eleventh gets closed without a real look. That is how a true positive slips through.
    • It moves the proof work onto humans. The tool guessed. Now an engineer spends an afternoon confirming or dismissing the guess, which is the expensive part the tool skipped.

    If a tool cannot prove the bug, it has not finished the job. It has only handed you a longer to-do list.

    The difference between scanners and research is exactly this gap. We wrote more about that split in scanners vs research. A scanner matches patterns at scale and stops there. A researcher keeps going until the bug is shown to be real or shown to be nothing.

    What vulnerability verification actually means

    Vulnerability verification is the step where a candidate bug earns the word “vulnerability”. It means producing evidence that the issue is both real and exploitable in the running app, under realistic conditions.

    Real, not theoretical

    A pattern match says “this looks like a bug”. Verification says “I made the bug happen”. For the Acme Notes case, that is the second request above and the response body that should never have reached user A. For an injection bug, it is not a payload that matches a regex. It is a request that changes the query and returns data the query was never meant to return.

    Exploitable under real conditions

    Some flagged issues are real in theory but dead in practice. A parameter looks injectable, but a parser upstream strips the input before it reaches the database. Verification accounts for that. You test against the live behavior, not against a guess about the code. If the input never lands, there is no bug to report.

    Repeatable, not a one time fluke

    Evidence has to hold up when someone runs it again. A proven finding includes the steps to reproduce it, so the person who fixes it can watch the bug happen and then watch it stop. No reproduction means no proof.

    How UnboundCompute holds back what it cannot prove

    UnboundCompute is an autonomous security researcher. It learns how an app is meant to work, forms ideas about where that logic could break, designs experiments to test those ideas, and then tries to prove a finding with hard evidence. Understand, assume, experiment, verify, chain.

    The verify step is a gate, not a formality. If an experiment does not produce evidence that a bug is real and exploitable, the idea stays an idea. It does not become an alert. We would rather report fewer things and have every one of them be true than flood a queue and let people sort it out.

    This is an honest description of an early stage product. We are still building it. We are not claiming customers, benchmarks, or a finished tool. As an early and encouraging signal, a frontier model drove the full methodology on its own and identified and verified real access control and injection issues in test applications it had not seen before. We treat that as a reason to keep going, not as a number to wave around.

    A proven finding becomes a check that keeps watching

    Here is the part we like most. Once a bug is proven, the evidence is already a recipe. The request that broke Acme Notes, and the response that proved it, describe a test you can run again.

    So a confirmed finding can become a repeatable check. After the fix ships, the same steps run again and should now fail to reproduce the bug. If a later change brings the bug back, that check catches it. The access control gap that let user A read user B’s notes turns into a standing test:

    • Authenticate as user A.
    • Request a note owned by user B.
    • Expect a denial, not a 200 OK with private data.

    The proof you gathered once keeps paying off. A maybe cannot do that. You cannot build a regression test out of a guess, because you never knew what the real bug was. Proof gives you a fixed target, and a fixed target is something you can watch forever.
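
    The three steps of the standing test can be sketched as a function that takes any way of fetching a note and reports whether the cross user read is denied. The NOTES data and the get_note handler below are hypothetical stand-ins for the fixed Acme Notes app. A real check would send the HTTP request with user A's token instead of calling a function.

```python
# Hypothetical data for the invented Acme Notes app.
NOTES = {
    1042: {"owner": "userA", "body": "userA note"},
    2099: {"owner": "userB", "body": "userB private note"},
}


def get_note(note_id: int, caller: str):
    """Stand-in for the fixed handler. Returns (status, body)."""
    note = NOTES.get(note_id)
    if note is None:
        return 404, None
    if note["owner"] != caller:
        return 403, None  # the ownership check the original bug was missing
    return 200, note


def check_cross_user_read_denied(fetch) -> bool:
    # Steps 1 and 2: act as user A and request a note owned by user B.
    status, body = fetch(2099, "userA")
    # Step 3: expect a denial and no private data in the response.
    return status in (401, 403, 404) and body is None
```

    The check passes against get_note. If a later change drops the ownership comparison and the handler returns 200 with the note, the same check fails and flags the regression.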

    This is the kind of bug an autonomous researcher that tests assumptions is built to find, prove, and keep watching. If that approach is interesting to you, read more about who we are on our about page.

  • How UnboundCompute differs from a vulnerability scanner

    If you search for an AI vulnerability scanner, you will find a lot of tools that promise to find every bug in your app. Most of them work the same way underneath: they match your app against a list of known patterns and hand you back a long report of maybes. UnboundCompute is a different kind of tool, and this post is an honest look at how it differs and where it stands today.

    We are early. The product is being built. So this is not a sales pitch. It is a comparison of two ways of looking for bugs, and an explanation of why we chose the harder one.

    What a traditional vulnerability scanner actually does

    A classic scanner crawls your app, collects every URL, form, and parameter it can reach, then fires a fixed set of test payloads at each one. It watches the response for signs that something went wrong. A reflected string here, a database error message there, a slow response that hints at a sleep command.

    This works for whole classes of well known bugs. If a field echoes back <script>alert(1)</script> without encoding, a scanner will catch it. If a search box passes ' OR '1'='1 straight into a query, it will often catch that too. That is real value, and pattern matching is good at finding the obvious mistakes quickly.

    The trouble starts past the obvious. A scanner does not know what your app is for. It does not know that a user on a free plan should never reach /api/v1/exports/full, or that order id=1043 belongs to a different account. It sees a request that returns 200 OK and moves on. To the scanner, a working feature and a broken access control check look identical.

    Why the report is full of maybes

    Because a scanner guesses from surface signals, it has to play it safe. If a payload causes any change at all, it tends to flag it so it does not miss a real bug. The result is a report with many items marked “possible” or “medium confidence,” and a real chance that most of them are false positives. Someone on your team then spends a day or two checking each one by hand to find the few that are real.

    That is the core problem. The scanner did the easy part and left the hard part, proving the bug, to you.

    A scanner tells you where something might be wrong. The expensive work, proving whether it really is, still lands on a human.

    How an AI vulnerability scanner that reasons is different

    UnboundCompute is built around a different loop. Instead of matching payloads against a list, it tries to understand the app first, then form ideas about where the logic could break, then run experiments to test those ideas, and only report a finding once it has proof. Understand, assume, experiment, verify, chain.

    Here is what that looks like in practice on an invented example. Say a typical SaaS app called Acme Notes lets users share a note by id:

    GET /api/notes/4471
    Authorization: Bearer <user A token>

    A pattern matcher checks that the response is valid and moves on. A researcher that reasons about the app notices the id is a plain number and forms an assumption: the server might be trusting the id in the URL without checking who owns the note. So it designs an experiment. It logs in as a second user, takes that user’s token, and asks for a note id that belongs to user A:

    GET /api/notes/4471
    Authorization: Bearer <user B token>

    If user B gets back user A’s private note, that assumption was correct. The tool does not stop at a hunch. It confirms the note content belongs to a different account, records the exact request and response as evidence, and only then reports it. That is an access control bug a payload list would never spot, because nothing in the request looks malicious. The request is perfectly well formed. The problem is what the app assumed.

    Proof before report

    The rule that changes the output is simple: a finding is only reported when it is proven with concrete evidence. No proof, no report. This flips the work. Instead of handing you candidates to verify, the tool does the verification itself and hands you the ones that survived. The output is signal rather than a stack of maybes.

    A confirmed finding can also be turned into a repeatable check, so the same test keeps running and tells you if the bug ever comes back after a fix or a refactor.

    A short comparison

    • How it finds bugs. A scanner matches known patterns. UnboundCompute forms an idea about the app’s logic and tests it.
    • What it understands. A scanner sees URLs and parameters. The researcher tries to learn what the app is meant to do and where that intent could break.
    • What it reports. A scanner reports candidates, many of them false positives. UnboundCompute reports findings it has already proven.
    • Who proves the bug. With a scanner, a human triages the list. Here, the tool runs the experiment and keeps the evidence.
    • The kind of bug it catches. Scanners are strong on known payload bugs. The researcher reaches logic and access control flaws that have no fixed payload.
    • After the fix. A proven finding becomes a repeatable check that watches for the bug returning.

    We go deeper on this split in scanners vs research, since it is the line that matters most when you are choosing a tool.

    Where we are honest about the limits

    None of this means scanners are useless. They are fast, cheap, and good at sweeping for the common, known issues. If you have never run one, run one. The point is that pattern matching has a ceiling, and the highest impact bugs usually live above it, in the assumptions an app makes about who you are and what you are allowed to do.

    It also does not mean UnboundCompute is finished. It is not. We are building it, and we are not going to dress that up with customer counts or benchmark charts we do not have. What we can say is an early, encouraging signal: a frontier model drove the full methodology on its own and identified and verified real access control and injection issues in test applications it had not seen before. That is a hint that the approach works, not a promise of a result on your app.

    Which one should you use

    Think of them as different jobs. A vulnerability scanner is a smoke detector for the known stuff, cheap to run and worth keeping on. An autonomous researcher is closer to a person who reads your app, asks “what if the server trusts this id,” and goes and checks. They answer different questions.

    If you take one thing from this, take the difference between a maybe and a proof. A maybe costs you time. A proof saves it. That gap is exactly what an autonomous researcher that tests assumptions is built to close. You can read more about who we are and where we are headed on our about page.

  • How UnboundCompute works, from understanding an app to proving a bug

    This post is a deeper look at how UnboundCompute does AI penetration testing, walked through one step at a time with a concrete example. UnboundCompute is an autonomous security researcher for web apps and APIs. Instead of running a fixed list of payloads, it learns how an app is meant to work, forms an idea about where that logic could break, and proves a finding before it ever reports one.

    To make the steps real, we will use an invented app called Acme Notes. It is a simple note taking SaaS where users sign up, create notes, and share them with teammates. No real system is being attacked here. Acme Notes exists only so we can show the method on something you can picture.

    Why AI penetration testing starts with understanding, not payloads

    Most scanners begin from a catalog of known attacks and fire them at every input. That finds the bugs everyone already knows to look for. It misses the bugs that come from a specific app making a specific assumption.

    UnboundCompute starts somewhere else. Before it tests anything, it reads Acme Notes the way a careful new engineer would. It maps the routes, the request shapes, and the rules the app seems to enforce. For Acme Notes, that means noticing things like this:

    • A note is fetched with GET /api/notes/{id}.
    • Sharing a note is a POST /api/notes/{id}/share with a teammate email in the body.
    • The app appears to assume that only a note owner can share that note.

    That last line is the interesting one. It is not a payload. It is an assumption the app is making. The whole method points at assumptions like that, because the bugs with the most impact usually live there.

    The highest impact bugs come from understanding the app, not from matching patterns. So the first job is to learn the app, then ask where its own rules might not hold.

    Form an assumption about where it could break

    Once the app is understood, the next step is a clear guess. Not a vague worry. A testable claim about one rule that might not be enforced everywhere.

    For Acme Notes, here is the assumption to challenge:

    • The app checks ownership when you read a note, but it may not recheck ownership when you share one.

    This is a guess about how access control can quietly fail. The read path and the share path were probably written at different times by different people. It is common for one path to enforce a rule that the other forgot. The guess is specific, so we can design a test that either confirms it or kills it.

    Design an experiment

    A good experiment isolates one variable. We want to know whether a user who does not own a note can still act on it through the share endpoint.

    So we set up two accounts in the test app, Alice and Bob. Alice owns a note. Bob does not. Bob has a valid session because he is a normal signed in user. The experiment is simple. Bob asks the share endpoint to operate on Alice’s note id.

    The point is control. If Bob’s request uses Alice’s note id and Bob’s own token, and nothing else changes, then any result we see is caused by the one thing we are testing.

    Verify with hard evidence

    This is the step that separates a real finding from a maybe. We do not report a guess. We run the experiment and look at what the app actually does.

    Here is the kind of request the experiment sends, using Bob’s session against Alice’s note:

    POST /api/notes/9d2f/share HTTP/1.1
    Host: acmenotes.test
    Authorization: Bearer <bob_session_token>
    Content-Type: application/json
    
    { "email": "bob@evil.test", "role": "editor" }

    Note 9d2f belongs to Alice. The token belongs to Bob. If Acme Notes were enforcing ownership on this path, the right answer would be 403 Forbidden and no change to the note.

    Proof is what the response shows. If the app instead returns this:

    HTTP/1.1 200 OK
    Content-Type: application/json
    
    { "note_id": "9d2f", "shared_with": "bob@evil.test", "role": "editor" }

    then the assumption was right and the bug is real. Bob, who never owned the note, just gave himself editor access to it. The evidence is concrete: a 200, the response naming Bob as an editor, and a follow up GET /api/notes/9d2f with Bob’s token now returning the note body. That follow up read is the part that turns a suspicious response into a proven one. We can see Bob holding access he should never have had.

    What counts as proof

    Proof is not a status code on its own. It is a short chain that any engineer can replay:

    • The exact request that should have been denied.
    • The response showing it was allowed.
    • A second request that confirms the new access is real, not just an echo.

    If any link is missing, the finding stays unproven and is not reported. No bug is reported until it is proven. That is the rule that keeps the output as signal instead of a stack of guesses someone else has to triage.
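
    The three link chain can be sketched as a small data structure with a gate on top. This is an illustration only: the Exchange and Evidence types, the is_proven function, and the marker argument (a string that appears only in the protected content) are names we made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Exchange:
    request: str  # the exact request, so anyone can replay it
    status: int   # the status code the app returned
    body: str     # the response body


@dataclass
class Evidence:
    action: Optional[Exchange] = None        # links 1 and 2: the request and its response
    confirmation: Optional[Exchange] = None  # link 3: the follow up request


def is_proven(evidence: Evidence, marker: str) -> bool:
    action, confirm = evidence.action, evidence.confirmation
    if action is None or confirm is None:
        return False  # a missing link means the finding stays unproven
    allowed = 200 <= action.status < 300
    # The follow up must succeed and actually return the protected content.
    confirmed = 200 <= confirm.status < 300 and marker in confirm.body
    return allowed and confirmed
```

    For the share bug, the action is Bob's POST with its 200 response and the confirmation is Bob's follow up GET returning Alice's note. If the follow up is missing, or comes back 403, is_proven returns False and nothing is reported.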

    Chain a confirmed finding into the next

    A proven finding is not the end. It is a new fact about the app, and facts open doors.

    Now that Bob can grant himself editor access to a note he does not own, and very likely to any note id, the next question writes itself. What can an editor reach that a stranger cannot? If editors can read attachments, and attachments are served from a shared store, then the access control gap on sharing may lead to reading files that belong to other teams. So the next experiment targets that, using the access Bob just proved he can get.

    This is the chaining step. Each confirmed finding becomes the starting point for the next assumption, so a single broken rule gets followed as far as it really goes, with evidence at every step.

    A finding can become a repeatable check

    Once the share endpoint bug is proven and fixed, the proof does not get thrown away. The exact request and the expected 403 become a check that runs again later. If a future change reintroduces the gap, the check catches it. A confirmed finding turns into a small guard that keeps watching for the bug coming back.

    Where this stands today

    We are early and honest about it. The product is being built. We are not claiming customers, benchmarks, or finished results.

    What we can say is encouraging. A frontier model drove this full method on its own and identified and verified real access control and injection issues in test applications it had not seen before. We treat that as an early signal that the approach works, not as a final score.

    The Acme Notes walkthrough is the whole idea in one example. Understand the app, assume where it could break, design a clean experiment, verify with evidence you can replay, then chain the result into the next finding. This is exactly the kind of logic bug an autonomous researcher that tests assumptions is built to find. If you want the fuller picture of who we are and where we are headed, read more on our about page.

  • Why we are building UnboundCompute

    We started UnboundCompute because of a gap we kept running into. Most automated security testing checks a fixed list of known bugs and stops there. That misses the flaws that hurt most, the ones that need a real understanding of how an app works, like broken access control and business logic abuse. This post explains the gap, why we think it matters, and what we are betting on.

    We are early. The product is being built. We would rather tell you what we believe and why than sell you on results we have not earned yet. So this is a point of view, written plainly.

    What most automated security testing actually checks

    A normal scanner works from a catalogue. It knows what a reflected script looks like, what a classic injection string returns, what an outdated library version means. It sends those patterns at every endpoint it can find and reports the matches. This is genuinely useful. It catches the well understood bugs fast, on a schedule no human could keep, and it never gets tired.

    But notice what that approach assumes. It assumes the dangerous bugs all look like something the tool has seen before. Many do not. Consider a request like this:

    GET /api/orders/8841
    Authorization: Bearer trial-user-token
    
    HTTP/1.1 200 OK
    { "id": 8841, "owner": "another-account", "total": 1290 }

    There is no malformed payload here. No quote to break a query, no script tag, no signature to match. Yet the trial user just read an order that belongs to someone else. That is broken access control, and a pattern matcher has nothing to match against, because the request looks perfectly ordinary. The bug lives in the rule the app forgot to enforce, not in the shape of the input.
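
    A toy comparison shows why the pattern matcher stays silent here. The signature list below is a tiny invented sample and not any real scanner's rule set. The account name trial-account is also an assumption, standing in for whoever owns the trial token.

```python
import re

# A few illustrative signatures, not a real scanner's rule set.
SIGNATURES = [
    re.compile(r"<script", re.IGNORECASE),                # reflected script tag
    re.compile(r"'\s*or\s*'1'\s*=\s*'1", re.IGNORECASE),  # classic SQL injection string
    re.compile(r"\.\./"),                                 # path traversal sequence
]


def signature_hits(raw_request: str) -> list:
    """Return the patterns that match the raw request text."""
    return [s.pattern for s in SIGNATURES if s.search(raw_request)]


def reads_foreign_object(caller_account: str, response: dict) -> bool:
    """The question a payload list cannot ask: does this object belong to the caller?"""
    return response.get("owner") != caller_account


request = "GET /api/orders/8841\nAuthorization: Bearer trial-user-token"
response = {"id": 8841, "owner": "another-account", "total": 1290}
```

    signature_hits(request) comes back empty because nothing in the request looks like an attack. reads_foreign_object("trial-account", response) returns True, but answering it requires knowing who the caller is and who should own the order.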

    Why automated security testing misses the bugs that matter

    The highest impact flaws come from understanding what an app is trying to do, then asking what happens when you bend one of its rules. Two examples make the point.

    Broken access control

    An app decides who is allowed to see or change what. When a check is missing, one user can reach another user’s data by changing an id in a URL, or reach an admin route that was never linked from the menu. To find this, you have to know who the current user is supposed to be and what they should not be able to touch. A fixed payload list does not carry that idea.

    Business logic abuse

    Logic bugs are harder to catch with automation, because the app is behaving exactly as written. The code is just wrong about its own rules. Picture a checkout that takes a discount code. A tool sending known strings will never think to apply the same code three times, or set the quantity to a negative number so the total drops below zero:

    POST /api/cart/apply
    { "code": "SAVE20", "quantity": -4 }

    Nothing about that request is malformed. It is a valid call that exploits a rule the app assumed no one would break. You only find it by understanding the flow first, then probing the assumption underneath it.
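
    The two missing rules can be written down as server-side checks. The sketch below is a minimal illustration, not code from any real checkout: the Cart class, the apply_code function, prices in cents, and the rule that a code applies once per cart are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical discount table, in whole percent. SAVE20 comes from the example above.
DISCOUNT_PERCENT = {"SAVE20": 20}


@dataclass
class Cart:
    unit_price_cents: int
    quantity: int = 1
    applied_codes: set = field(default_factory=set)


def cart_total_cents(cart: Cart) -> int:
    """Recompute the total from the cart state using integer arithmetic."""
    subtotal = cart.unit_price_cents * cart.quantity
    percent_off = sum(DISCOUNT_PERCENT[code] for code in cart.applied_codes)
    return subtotal * (100 - percent_off) // 100


def apply_code(cart: Cart, code: str, quantity: int) -> int:
    """Apply a discount code and return the new total in cents."""
    # Rule 1: quantity must be a positive whole number (bool is rejected too).
    if isinstance(quantity, bool) or not isinstance(quantity, int) or quantity < 1:
        raise ValueError("quantity must be a positive integer")
    # Rule 2: the code must exist.
    if code not in DISCOUNT_PERCENT:
        raise ValueError("unknown discount code")
    # Rule 3: each code applies at most once per cart.
    if code in cart.applied_codes:
        raise ValueError("code already applied")
    cart.quantity = quantity
    cart.applied_codes.add(code)
    return cart_total_cents(cart)
```

    With checks like these, the request above fails on the quantity rule, and a second SAVE20 on the same cart fails on the one-use rule.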

    The bugs that hurt most are not strange inputs to known holes. They are ordinary requests that break a rule the app forgot to enforce.

    Why skilled humans cannot cover the gap alone

    Human testers find these bugs. A good one reads the screen, guesses the business rules, and chases behavior no rulebook predicted. That is exactly the kind of judgment a payload catalogue lacks. The problem is supply.

    • They are scarce. The people who are genuinely good at this work are few, and demand far outruns them.
    • They are expensive. A deep manual test is a serious cost, so most teams can only afford it once or twice a year.
    • They cannot keep up with shipping. Teams deploy many times a week. A test run once a year cannot see the code that shipped last Tuesday.

    So you end up with two options that each fall short. Scanners run constantly but miss the bugs that need understanding. Humans understand but cannot run constantly. The deeper version of this comparison lives in our scanners vs research category, which goes through where each one earns its keep and where it does not.

    Our bet: an autonomous researcher that tests assumptions

    Here is what we are building toward. Instead of a tool that matches known payloads, we want an autonomous researcher that works the way a thoughtful human tester does. It learns how the application is meant to behave. It forms ideas about where that logic could break. It designs experiments to test those ideas. It proves a finding before it ever reports it. Then it asks what that finding opens up next. Understand, assume, experiment, verify, chain.

    The order of those words matters. Understanding comes first, because the bugs we care about only appear once you know what the app expects. The verify step matters just as much. A finding is only reported when it is backed by concrete evidence, so the output is signal, not a pile of maybes. Take the order example above. The researcher would not flag a “possible” issue. It would replay the request, show the other account’s data coming back, and hand you a result you can reproduce.

    That last step changes the cost of reading a report. Every false alarm costs someone an hour of triage, and after enough of them people stop reading. Proof cuts the noise. And once a finding is confirmed, it can become a repeatable check that keeps watching for the same bug coming back after a future deploy.

    Where we are, honestly

    We will not pretend we are finished. We are early, and the product is being built. We have no customers to name, no benchmark to wave around, and we are not going to invent one.

    What we can share is an early signal that keeps us going. A frontier model drove the full methodology on its own and identified and verified real access control and injection issues in test applications it had not seen before. We frame that as encouraging, not as proof. It is enough to tell us the bet is worth making, and not enough to claim the work is done.

    Why this is worth building

    The pattern is hard to ignore. Software ships faster every year. The bugs that cause the worst days, the leaked records and the abused workflows, are the ones that need understanding, not pattern matching. Scanners cannot supply that understanding, and skilled humans cannot supply enough of their time. Something has to test the assumptions an app makes, at the speed teams now ship, and prove what it finds before it interrupts anyone.

    That is the thing we are trying to build. An autonomous researcher that tests the assumptions your app makes and reports a bug only once it is proven. We are early and we know it, but this is the gap worth closing, and it is why UnboundCompute exists. If you want to follow along or tell us where we are wrong, read more on our about page.

  • Meet UnboundCompute, an autonomous security researcher for web apps and APIs

    UnboundCompute is an autonomous security researcher for web apps and APIs. It reads an application the way a careful person would, builds a picture of how the app is meant to behave, then goes looking for the places where that intent quietly falls apart. This post explains who it is, what it does, and where it fits in the wider story of autonomous penetration testing.

    What UnboundCompute actually is

    Think of it as a researcher that never gets bored and never stops reading. You point it at a web app or an API. It studies the routes, the parameters, the responses, and the rules the app seems to enforce. From that it forms a working model of the system: who is supposed to do what, which actions need permission, and which inputs the app trusts.

    That model is the whole point. Most bugs that matter are not missing patches. They are gaps between what the app intends and what it allows. A user who can read another user’s invoice by changing one number in a URL. An endpoint that checks your login but forgets to check whether the record belongs to you. These are logic gaps, and you only see them once you understand the logic.

    The loop: understand, assume, experiment, verify, chain

    UnboundCompute works in a loop. Each step feeds the next, and the loop keeps tightening until there is either a proven finding or nothing left to test.

    Understand

    First it learns how the app is meant to work. It maps the surface and reads the behavior. If GET /api/orders/1042 returns your order, the researcher notes that orders are addressed by a simple number and asks the obvious follow up: what enforces that 1042 is yours?

    Assume

    Next it forms ideas about where the logic could break. This is the part a fixed checklist cannot do. The researcher reasons about the app in front of it, not a generic template. For an orders endpoint it might assume that ownership is checked at login but not at the record level. For a password reset flow it might assume the token is predictable or reusable.

    Experiment

    Then it designs a test for each idea and runs it. One assumption, one experiment. For the ownership idea, it requests an order while signed in as a user who should not own it:

    GET /api/orders/1043
    Authorization: Bearer <a different user's session>

    If that returns someone else’s order, the assumption held and there is a real access control bug to confirm.
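
    As a rough sketch, that experiment is one request plus one comparison. Everything here is a stand-in: ORDERS and vulnerable_get_order imitate an app that checks login but not ownership, and probe_cross_user_read is a hypothetical helper, not part of any real tool.

```python
from typing import Callable, Optional, Tuple

# Toy data: order id -> owner. Purely illustrative.
ORDERS = {1042: "alice", 1043: "bob"}

Response = Tuple[int, Optional[dict]]


def vulnerable_get_order(order_id: int, session_user: Optional[str]) -> Response:
    """Imitates the flawed endpoint: it checks login, never ownership."""
    if session_user is None:
        return 401, None
    if order_id not in ORDERS:
        return 404, None
    return 200, {"id": order_id, "owner": ORDERS[order_id]}


def probe_cross_user_read(
    get_order: Callable[[int, Optional[str]], Response],
    order_id: int,
    session_user: str,
) -> dict:
    """One assumption, one experiment: can session_user read a foreign order?"""
    status, body = get_order(order_id, session_user)
    leaked = status == 200 and body is not None and body.get("owner") != session_user
    # Keep the request and the response together so the result can be replayed.
    return {
        "request": f"GET /api/orders/{order_id} as {session_user}",
        "status": status,
        "body": body,
        "assumption_held": leaked,
    }
```

    The returned record carries the request and the response side by side, which is the raw material the verify step needs.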

    Verify

    This is the step that separates a researcher from a noise machine. A guess is not a finding. UnboundCompute only reports something when it can prove it with concrete evidence, the request that triggered the behavior and the response that shows the impact. The output is signal, not a pile of maybes you have to sort through by hand.

    A finding is only worth reporting when you can show the exact request that proves it. Everything else is a guess wearing a confident face.

    Chain

    Single bugs are useful. Chained bugs are how real damage happens. Once a finding is verified, the researcher asks what it opens up. A leaked email here, a guessable identifier there, an endpoint that trusts a value it should not. On their own each looks minor. Together they can add up to a full account takeover. Because UnboundCompute carries its model of the app through the whole loop, it can connect one verified result to the next instead of treating every test as a fresh start.

    Why this beats a scanner that checks a known list

    A traditional scanner is a list reader. It carries a set of known signatures and fires them at every input it finds. That has real value for catching the obvious and the already known. It also has a hard ceiling. A scanner that only checks a known list cannot find a bug that is not on the list, and the bugs that hurt most are almost never on any list.

    Here is the difference in one example. A scanner sends a SQL injection string at /search?q= and checks whether the response looks like a database error. Useful. But it will happily pass an endpoint like this:

    POST /api/account/transfer
    { "from": "acct_self", "to": "acct_other", "amount": 500 }

    There is no payload to match here. The bug, if there is one, is that the server never checks whether you own acct_self. No signature catches that. You catch it by understanding what the endpoint is for and testing the assumption it makes about who is calling it. We write more about this split between checking and researching in our scanners versus research category.

    • A scanner asks: does this input match a known bad pattern?
    • A researcher asks: what does this app assume, and what happens when that assumption is false?

    Both questions are fair. The second one is where the high impact findings live, and it is the question UnboundCompute is built around.
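
    For the transfer endpoint above, the check a signature cannot see is a single comparison on the server. This is a minimal sketch under stated assumptions: the ACCOUNTS table, the Forbidden exception, and the transfer function are invented for illustration, and amounts are treated as whole units.

```python
# Hypothetical account table: account id -> owning user and balance.
ACCOUNTS = {
    "acct_self": {"owner": "alice", "balance": 1000},
    "acct_other": {"owner": "bob", "balance": 0},
}


class Forbidden(Exception):
    """Raised when the caller does not own the source account."""


def transfer(caller: str, from_id: str, to_id: str, amount: int) -> None:
    source = ACCOUNTS.get(from_id)
    target = ACCOUNTS.get(to_id)
    if source is None or target is None:
        raise KeyError("unknown account")
    # The ownership check: being logged in is not enough to move money.
    if source["owner"] != caller:
        raise Forbidden("caller does not own the source account")
    if isinstance(amount, bool) or not isinstance(amount, int) or amount <= 0:
        raise ValueError("amount must be a positive integer")
    if amount > source["balance"]:
        raise ValueError("insufficient funds")
    source["balance"] -= amount
    target["balance"] += amount
```

    A researcher's experiment is the mirror image of this check: call the endpoint as a user who does not own the source account and see whether the server refuses.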

    Where this sits in autonomous penetration testing

    Autonomous penetration testing is the idea that a system can plan and run its own security tests, not just replay a script. UnboundCompute fits there, but with a specific stance: the value is not in running more checks faster. It is in reasoning about the target, testing assumptions, and proving impact before saying a word.

    Verification also pays off after the first run. Once a finding is confirmed, it can become a repeatable check that keeps watching for the same bug coming back. So the work is not throwaway. A proven issue today becomes a guard against regressions tomorrow.

    Where we are right now: honest version

    We are early. The product is being built, and we are not going to dress that up. We have no customer numbers to share, no benchmark to wave around, and we are not promising results we cannot back.

    What we will say is this. In our own testing, a frontier model drove the full methodology on its own and identified and verified real access control and injection issues in test applications it had not seen before. We read that as an early, encouraging signal that the approach holds, not as a benchmark and not as proof. There is a long way to go from a good signal to a tool you can rely on every day, and that gap is the work in front of us.

    The short version

    UnboundCompute is a security researcher that runs on its own. It learns how an app is meant to work, guesses where the logic breaks, tests those guesses, and only reports what it can prove. That is a different job from a scanner reading a list of known payloads, and it is the job we think matters most. If you want to know who is building this and why, read more on our about page.

  • Teardown: chaining small bugs into a real breach

    Most reports score a bug on its own, then move on. That habit hides the real danger, because exploit chaining is how three small issues that each look harmless turn into one account takeover. In this teardown we walk through an invented app called Acme Notes and follow a chain from a leaky endpoint to a full password reset, link by link, proving each step before we connect it to the next.

    What exploit chaining means

    A chain is a sequence of findings where the output of one becomes the input of the next. Alone, each link earns a low severity rating. Read in order, they hand an attacker something they should never reach. What an exploit chain means is simple to state and easy to miss: severity is not a property of one bug. It is a property of the path.

    Acme Notes is a small notes app. Users sign up, write notes, and reset a forgotten password by email. We found three issues. A public endpoint that lists user ids. An access control gap that returns a reset token for any id you ask for. A reset flow that accepts that token without a second check. Each was filed by a different reviewer as low. Together they are critical.

    Severity is not a property of one bug. It is a property of the path an attacker can walk end to end.

    Link one: a public endpoint leaks user ids

    Acme Notes has a directory feature so teammates can find each other. The endpoint needs no auth and returns a tidy list.

    GET /api/v1/directory?team=acme HTTP/1.1
    Host: app.acmenotes.example
    
    200 OK
    [
      { "id": 4821, "name": "Dana Lee" },
      { "id": 4822, "name": "Sam Ortiz" }
    ]

    On its own this reads as minor. Names are semi public anyway, and the team field is guessable. The reviewer who filed it wrote “info disclosure, low” and they were right about the impact in isolation. What matters for a chain is not the names. It is the id field. We now have a clean list of valid internal user ids, the exact input the next link wants.

    Why prove it first

    Before treating this as link one, we confirmed the endpoint really needs no session. We sent the request with no cookie and with a logged out client. Same 200, same ids. That is the evidence. We also did not assume the ids were real or stable. We tested that the same id mapped to the same user across requests, and it did. Now the link is verified and we can build on it.

    Link two: an IDOR exposes a reset token tied to an id

    Acme Notes lets a signed in user view their own pending reset status, so the support team can tell people whether a reset email is still valid. The route takes a user id.

    GET /api/v1/users/4821/reset_status HTTP/1.1
    Host: app.acmenotes.example
    Authorization: Bearer <any valid user token>
    
    200 OK
    { "pending": true, "token": "f3a9c1e8b2d47..." }

    This is an insecure direct object reference. The server checks that you are logged in. It never checks that the id you asked for is your own. So any authenticated user, even a brand new free account, can read the reset status of any other id, and the response includes the live reset token.

    Filed alone, this looks like a leak of a value that should be secret but that an attacker cannot target, because how would they know which ids exist or matter? That assumption is the weak point. Link one already answered it. We have the id list, so we are not guessing.

    Verify, then connect

    We confirmed the IDOR with two accounts we controlled. From account A we requested the reset status of account B by its id and read back B’s token. We did not stop at “the field is present.” We checked that the token value actually belonged to B’s account and not a placeholder. Only after that evidence did we treat link one and link two as joined.

    Link three: a weak reset flow accepts the token

    The final link is the reset endpoint itself. A well built flow ties the token to a session, an email confirmation, or a short expiry plus a one time use guard. Acme Notes does none of that. It accepts any token that matches a pending reset and sets the new password.

    POST /api/v1/password/reset HTTP/1.1
    Host: app.acmenotes.example
    Content-Type: application/json
    
    { "token": "f3a9c1e8b2d47...", "new_password": "attacker_chosen" }
    
    200 OK
    { "status": "password_updated" }

    Reviewed on its own, this was rated low, with a note that the token “is hard to obtain.” True in a vacuum. Links one and two removed that condition. The token is no longer hard to obtain. It is a field in a JSON response any user can read.

    Reading the chain end to end

    Put the three verified links in order and the picture changes:

    • Step one. Pull the user id for a target from the public directory.
    • Step two. Use any logged in account to read that id’s reset status and copy the live token.
    • Step three. Submit the token to the reset endpoint and set a new password.

    The result is account takeover of any user, starting from a free signup. None of the three findings would have triggered a page on their own. The chain is the bug. This is the gap between scanning for known payloads and understanding what an app assumes about its own data, a theme we cover across our attack teardowns.

    The defensive lesson

    The fix is not only to patch each link, though you should. It is to stop trusting that a low severity finding stays low. Three habits help.

    • Treat identifiers as reachable. Once an id appears in any unauthenticated response, plan as if every attacker holds the full list. Sequential integer ids make this worse, so prefer unguessable values, but do not rely on secrecy of ids as a control.
    • Check ownership on every object route. The IDOR existed because the server confirmed authentication but never authorization. “Is this caller allowed to see this specific record” is a separate question from “is this caller logged in.” Ask both.
    • Bind reset tokens to context. A reset token should be single use, short lived, and tied to the email that requested it or the session that follows the link. A token that any holder can redeem is a password waiting to be changed.
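
    The third habit is the easiest to show in code. The sketch below is one possible shape, not the Acme Notes implementation: the in-memory _pending store, the 15 minute lifetime, and both function names are assumptions. It stores only a hash of the token, ties the token to the requesting email, expires it, and removes it on the first redeem attempt.

```python
import hashlib
import hmac
import secrets
import time

TOKEN_TTL_SECONDS = 15 * 60  # assumed lifetime; choose what suits your app

# Token hash -> record. An in-memory stand-in for a database table.
_pending = {}


def _digest(token: str) -> str:
    return hashlib.sha256(token.encode()).hexdigest()


def issue_reset_token(email: str, now=None) -> str:
    """Create an unguessable token bound to the email that asked for it."""
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(32)
    _pending[_digest(token)] = {"email": email, "expires_at": now + TOKEN_TTL_SECONDS}
    return token


def redeem_reset_token(token: str, email: str, now=None) -> bool:
    """Return True only for a live token presented for the email it was issued to."""
    now = time.time() if now is None else now
    # pop() makes the token single use: any second attempt finds nothing.
    record = _pending.pop(_digest(token), None)
    if record is None:
        return False
    if now > record["expires_at"]:
        return False
    return hmac.compare_digest(record["email"].encode(), email.encode())
```

    Note one design choice: a failed attempt also burns the token. That trades a little convenience for the guarantee that a leaked token cannot be retried.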

    The wider lesson is about how you review. When you file a finding, write down what the next attacker would need to make it worse, and whether your own app already provides that. The reset bug looked safe only because the reviewer assumed tokens were hard to reach. A second reviewer looking one step ahead would have asked where reset tokens are exposed, and found link two.

    How to verify a chain honestly

    Do not claim a chain you have not walked. Reproduce each link with evidence: the raw request, the raw response, and the accounts you used. Confirm that the value carried between links is the real value, not a lookalike. Then walk the whole path once, from public directory to changed password, on accounts you own in a test environment. If any link fails to reproduce, the chain is a theory, not a finding.
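
    One way to hold yourself to that rule is to make the walk itself stop at the first link that fails. The sketch below is a hypothetical harness: Evidence, walk_chain, and the idea of each link as a function that takes the carried value and returns whether it reproduced are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

# A link takes the value carried from the previous link and returns
# (reproduced, value_to_carry_forward).
Link = Tuple[str, Callable[[Any], Tuple[bool, Any]]]


@dataclass
class Evidence:
    link: str
    reproduced: bool
    carried_value: Any


def walk_chain(links: List[Link], start: Any = None) -> Tuple[bool, List[Evidence]]:
    """Run every link in order and stop at the first one that does not reproduce."""
    evidence: List[Evidence] = []
    value = start
    for name, step in links:
        reproduced, value = step(value)
        evidence.append(Evidence(name, reproduced, value))
        if not reproduced:
            # A broken link means the chain is a theory, not a finding.
            return False, evidence
    return True, evidence
```

    In practice each step would also save the raw request and response. Here the carried value stands in for that record.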

    Closing

    Small bugs are not small when they line up. The way to catch a chain is to understand the app, question each assumption, and prove every link before you trust it. This is exactly the kind of problem an autonomous researcher that tests assumptions, rather than matching a fixed list of payloads, is built to find. You can read more about that approach on our about page.

  • Teardown: how an IDOR quietly exposes another user’s data

    This is an IDOR example built from scratch so you can watch how the bug quietly exposes another user’s data. We will use an invented app called Acme Notes, map how it works, form an assumption about a weak spot, then test it with real requests. Nothing here touches a live system. The goal is to teach how the bug works and how to spot it before an attacker does.

    What an IDOR actually is

    IDOR stands for insecure direct object reference. It happens when an app uses an id from the request to look up a record, but never checks that the person asking is allowed to see that record. The id is the direct object reference. When ownership is not verified, the reference becomes insecure. That gap is the whole bug.

    This bug is common for one reason. Developers think about authentication, who you are, far more than authorization, what you are allowed to touch. Acme Notes asks you to log in. It forgets to ask whether the note you requested is yours.

    An IDOR is rarely about a clever payload. It is the server trusting a number it should have checked.

    Step one: map the app like a researcher

    Before testing anything, understand how the app is meant to work. Acme Notes is a small notes tool. You sign in, you see a list of your notes, you click one to read it. Open the browser network tab and watch what the page sends. When you click a note, the front end makes this request:

    GET /api/notes/4012 HTTP/1.1
    Host: app.acmenotes.example
    Authorization: Bearer eyJhbGciOiJI (your token)
    Accept: application/json

    The server answers with the note as JSON:

    HTTP/1.1 200 OK
    Content-Type: application/json
    
    {
      "id": 4012,
      "owner_id": 88,
      "title": "Q3 launch checklist",
      "body": "Ship the billing page before Friday."
    }

    Two facts stand out. The note id, 4012, is a plain sequential number that sits right in the URL. The response also carries an owner_id. Your account is owner 88. So the app knows who owns the note. The question is whether it checks that ownership on every read.

    Step two: form the assumption

    Good testing starts with a guess you can prove or disprove. Here the assumption is direct: the server may load a note by id without confirming the requester owns it. Sequential ids make this worth testing, because note 4011 and note 4013 almost certainly belong to other users. If the server only checks your token and then trusts the id, you can read notes that are not yours.

    An attacker would form the same assumption. The difference is that a researcher tests it on an app they control or have permission to test, and reports it so it gets fixed.

    Step three: test by requesting a neighbouring id

    Keep your own valid login. Change only the id in the URL. Ask for the note next door:

    GET /api/notes/4011 HTTP/1.1
    Host: app.acmenotes.example
    Authorization: Bearer eyJhbGciOiJI (your token)
    Accept: application/json

    If the app is safe, you should get a refusal. Something like this:

    HTTP/1.1 403 Forbidden
    Content-Type: application/json
    
    { "error": "You do not have access to this note." }

    But Acme Notes is not safe. It returns the note in full:

    HTTP/1.1 200 OK
    Content-Type: application/json
    
    {
      "id": 4011,
      "owner_id": 73,
      "title": "Investor call notes",
      "body": "Runway is tight. Do not share outside the board."
    }

    Look at owner_id. It is 73, not 88. You are logged in as 88, yet the server handed you another user’s note. That is the bug, visible in a single request.

    Step four: confirm it is real, not a guess

    One odd response is not proof. Before you call this a finding, rule out the boring explanations. A careful check answers a few questions.

    • Is the data really someone else’s? The owner_id in the response differs from your account id. Log in as a second test user, note their real id, and confirm the leaked note belongs to a third party, not to you under another label.
    • Does it repeat? Request 4010, 4009, 4008. If a range of ids you do not own all return 200 with full bodies, this is a pattern, not a fluke.
    • Is the token doing anything? Send the same request with no Authorization header. If that returns 401 but a valid token for the wrong user returns 200, the app checks login but not ownership. That is the exact shape of an IDOR.
    • Can you see the write side too? Try a read only method first. Only test edits or deletes on data you are allowed to change, so you never damage real records while confirming the issue.

    When the leaked owner id is consistently not yours, the behaviour repeats across a range, and a valid login is the only thing the server checks, you have evidence rather than a hunch. That is the line between a real finding and noise. For more on how access control bugs are grouped and tested, see access control.
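
    The token question in particular reduces to comparing three status codes for the same object id. The helper below is a hypothetical sketch of that comparison. The labels it returns are our own wording, and a 200 for the wrong user still needs the body check from the first bullet before it counts as a finding.

```python
def classify_object_route(no_auth_status: int, owner_status: int, other_user_status: int) -> str:
    """Interpret three responses for one object id.

    no_auth_status:    request sent with no Authorization header
    owner_status:      request sent by the user who owns the object
    other_user_status: request sent by a logged in user who does not own it
    """
    if no_auth_status == 200:
        return "no login required"
    if owner_status != 200:
        # Without a working baseline, the other results mean little.
        return "inconclusive"
    if other_user_status == 200:
        return "login checked, ownership not checked"
    if other_user_status in (403, 404):
        return "ownership enforced"
    return "inconclusive"
```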

    Step five: assess the impact

    Impact is about what an attacker can reach and how easily. In Acme Notes, ids are sequential and the endpoint returns full note bodies. A script can count from 1 upward and pull every note in the system in minutes. That turns one weak check into a full data exposure.

    Now widen the lens. The same pattern often appears on more than one route. If /api/notes/{id} is broken, test the siblings the same way:

    • /api/invoices/{id} for billing records
    • /api/users/{id}/profile for personal details
    • /api/files/{id}/download for attachments

    One missing ownership check is bad. The same check missing across several endpoints is how a small bug becomes a breach. This is why one confirmed finding is worth turning into a repeatable test, so the same gap cannot return on a new route later.

    How to fix an insecure direct object reference

    The fix is not to hide the id or scramble it. Hiding the reference only slows an attacker down. The real fix is to check ownership on the server, on every request, every time.

    Check ownership at the data layer

    Bind the lookup to the logged in user. Instead of fetching a note by id alone, fetch it by id and owner together:

    -- weak: looks the note up by the requested id alone
    SELECT * FROM notes WHERE id = :note_id;
    
    -- safe: ties the note to the caller
    SELECT * FROM notes
    WHERE id = :note_id AND owner_id = :current_user_id;

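    If the second query returns no rows, the app should answer with the same refusal, typically a 404, whether the note does not exist or belongs to someone else. Because both cases look identical, the caller never learns whether a given note exists, so they cannot map your id space by probing.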
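
    Here is how the safe query might look from application code, sketched with Python's built in sqlite3 module and an in-memory database. The table layout, the make_demo_db helper, and the function name are assumptions for the example. The note ids and owner ids reuse the values from the requests above.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database with two notes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE notes (id INTEGER PRIMARY KEY, owner_id INTEGER, title TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO notes VALUES (?, ?, ?, ?)",
        [
            (4011, 73, "Investor call notes", "Runway is tight."),
            (4012, 88, "Q3 launch checklist", "Ship the billing page."),
        ],
    )
    return conn


def get_note_for_user(conn: sqlite3.Connection, note_id: int, current_user_id: int):
    """Return the note row, or None if it is missing or owned by someone else."""
    return conn.execute(
        "SELECT id, owner_id, title, body FROM notes "
        "WHERE id = :note_id AND owner_id = :current_user_id",
        {"note_id": note_id, "current_user_id": current_user_id},
    ).fetchone()
```

    A missing note and a note owned by someone else both come back as None, so the route can give the same answer in both cases.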

    Centralise the rule and add a regression test

    Put the ownership check in one place that every route calls, not copied into each handler where one can be forgotten. Then write a test that logs in as user A, requests user B’s note, and fails the build if the response is anything but a refusal. That test is what keeps the bug from coming back during the next refactor.
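
    A minimal sketch of both ideas together follows. The NOTES store, load_owned, get_note_route, and the test function are hypothetical names, and the 404 refusal is one reasonable choice rather than the only one.

```python
# Hypothetical in-memory store; ids and owners reuse the values from this teardown.
NOTES = {
    4011: {"owner_id": 73, "title": "Investor call notes"},
    4012: {"owner_id": 88, "title": "Q3 launch checklist"},
}


class NotFound(Exception):
    """Raised for a missing object and for an object the caller does not own."""


def load_owned(store: dict, object_id: int, current_user_id: int) -> dict:
    """The single ownership check that every object route is meant to call."""
    record = store.get(object_id)
    if record is None or record["owner_id"] != current_user_id:
        raise NotFound(object_id)
    return record


def get_note_route(note_id: int, current_user_id: int):
    """A route handler that delegates the access decision to load_owned."""
    try:
        return 200, load_owned(NOTES, note_id, current_user_id)
    except NotFound:
        return 404, {"error": "not found"}


def test_user_cannot_read_another_users_note():
    """Regression test: user 88 asks for a note owned by user 73."""
    status, body = get_note_route(4011, current_user_id=88)
    assert status == 404
    assert "title" not in body
```

    Because every route goes through load_owned, a new endpoint gets the check by default, and the test fails loudly if a refactor removes it.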

    What to take away

    An IDOR is a trust mistake, not a complex exploit. The app trusts an id it should have checked against the logged in user. You find it by mapping the app, noticing a guessable reference like a sequential note id, assuming ownership might not be verified, and proving it with a single request that returns someone else’s data. You fix it by checking ownership on the server for every object, every time.

    This is exactly the kind of bug an autonomous researcher that tests an app’s assumptions is built to find, because it comes from understanding how the app should behave, not from matching a known payload. If that approach is useful to you, read more about UnboundCompute.

  • How do hackers find vulnerabilities?

    Ask most people how hackers find vulnerabilities and they picture a tool that scans an app and spits out a list of holes. That happens, but it is the weak version. The strongest findings come from a person sitting with an app, working out how it is meant to behave, then probing the spot where that intent quietly breaks.

    How hackers find vulnerabilities by reasoning, not just scanning

    A scanner fires a fixed set of payloads at every field it can see and waits for a known pattern in the response. It is fast and it catches old, well documented bugs. It is also blind to the logic of the app. It does not know that an account ID in a URL was never supposed to be editable, or that a coupon code should only apply once. A researcher does know, because the researcher first learns the rules.

    So the real process is closer to detective work than to button pushing. You map the app. You learn what it promises. You guess where those promises are enforced by hope instead of by code. Then you test that exact guess.

    The best bugs are not hidden. They sit in plain sight, in the gap between what the app assumes and what it actually checks.

    Step one: map the application

    Before any testing, you build a picture of the app. What pages exist, what actions they offer, what data they touch. You watch the network traffic while you click around as a normal user. Every request and response is a clue about how the backend is wired.

    Take an invented example, a notes app called Acme Notes. As you use it, you notice a request like this when you open one of your own notes:

    GET /api/notes/4812 HTTP/1.1
    Host: app.acmenotes.example
    Authorization: Bearer your_token_here

    That single request tells you a lot. Notes are addressed by a plain number. Your note is 4812. The obvious question follows on its own. What happens if you ask for note 4811?

    What you are looking for while mapping

    • Identifiers you can change. Numbers and slugs in URLs and request bodies, like user_id, order=1099, or file=report.pdf.
    • Hidden actions. Buttons that only admins see, but that may still call an endpoint anyone can reach.
    • State the app tracks. Cart totals, account balances, draft versus published flags, anything the app expects to control.
    • Trust boundaries. The line between what the browser sends and what the server is willing to believe.

    Step two: understand how it is meant to work

    This is the part scanners skip. You read the app the way its designers read it. A note belongs to one user. A user should see only their own notes. An order total should equal the sum of its items. A password reset link should work once and then die.

    Each of those sentences is a rule. Each rule is a promise the app makes. The interesting question is always the same. Is this promise enforced on the server, or only suggested by the screen?

    Step three: form ideas about where assumptions break

    Now you turn rules into guesses. A good guess is specific and testable. Vague suspicion gets you nowhere. Concrete bets get you findings.

    • The server checks that you are logged in, but maybe it never checks that note 4811 is yours.
    • The price comes from a hidden form field, so maybe the server trusts whatever price the browser sends.
    • The reset token is a short number, so maybe you can guess another user’s token.
    • The admin panel link is hidden in the menu, but maybe POST /api/admin/users answers anyone who calls it.

    Notice the shape of every guess. The app assumes something. You bet that the assumption is checked in the wrong place, or not at all.

    Step four: test inputs and access

    With a guess in hand, you design the smallest experiment that would prove it. For the Acme Notes guess, you keep your own valid login but change one number:

    GET /api/notes/4811 HTTP/1.1
    Host: app.acmenotes.example
    Authorization: Bearer your_token_here

    If the response is 403 Forbidden or 404 Not Found, the promise held. The app checked ownership. You move on. If the response is 200 OK and you are reading a stranger’s private note, you have found a broken access control bug, the kind often called an insecure direct object reference.

    The same habit applies to input. If a search box builds a database query, you send a value that would break out of the intended query and watch how the app reacts. If a file name is echoed into a page, you send a value that would run as script and see whether the app cleans it. You are always asking one thing. Does the server defend this, or did it assume nobody would try?

    Step five: confirm impact

    A surprising response is not yet a finding. A guess is not evidence. You confirm. You read another account’s data on purpose, then read a second one to show it was not a fluke. You change a price to 0 in a test checkout and complete the order to show the server really accepted it. You prove the bug does what you claim, with a clear request and response that anyone can repeat.

    This is where honest work separates itself from noise. A confirmed bug with a reproduction is something a team can fix today. A list of maybes from a scanner is something a team has to triage, often only to find that most entries are false alarms.

    Blind scanning versus reasoning about the app

    Both approaches exist, and they fail in different ways. The difference is worth keeping straight, which is why we wrote a whole piece on scanners versus research.

    • Blind scanning throws known payloads at everything and matches known patterns. It finds the bug everyone already knows about. It misses logic flaws because it never learns the logic.
    • Reasoning about the app learns the rules first, then targets the exact place a rule is likely unenforced. It finds the access control and business logic bugs that scanners walk straight past.

    You can sum up the whole method in five words. Understand, assume, experiment, verify, chain. Learn the app. Bet on a broken assumption. Run a small test. Prove the impact. Then see whether one bug opens the door to the next.

    Why this matters for defenders

    If you build software, the lesson points straight at your code. Attackers will model your app’s rules and then check, one by one, whether each rule is enforced on the server. So enforce them on the server. Check ownership on every object lookup, not just login. Recompute prices and totals from trusted data, never from the request. Treat every value from a browser as a claim to verify, not a fact to trust.
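
    Two of those rules can be sketched in a few lines. This is a minimal sketch, assuming an in-memory store in place of a real database; the names Note, NOTES, PRICES_CENTS, get_note_for_user, and order_total_cents are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass
class Note:
    id: int
    owner_id: int
    body: str


# Hypothetical stand-ins for a database table and a trusted price list.
NOTES = {
    1: Note(id=1, owner_id=10, body="alice's note"),
    2: Note(id=2, owner_id=20, body="bob's note"),
}
PRICES_CENTS = {"widget": 1500, "gadget": 2500}


class NotFound(Exception):
    """Raised for missing notes and for notes owned by someone else."""


def get_note_for_user(note_id: int, current_user_id: int) -> Note:
    # Ownership is checked on every lookup, not only at login.
    note = NOTES.get(note_id)
    if note is None or note.owner_id != current_user_id:
        raise NotFound(note_id)
    return note


def order_total_cents(items: list[tuple[str, int]]) -> int:
    # items holds (sku, quantity) pairs from the request. Any price the
    # client sent is ignored; prices come only from PRICES_CENTS.
    total = 0
    for sku, quantity in items:
        if sku not in PRICES_CENTS or not isinstance(quantity, int) or quantity < 1:
            raise ValueError(f"invalid order line: {sku!r} x {quantity!r}")
        total += PRICES_CENTS[sku] * quantity
    return total
```

    Raising the same error for a missing note and for someone else's note is a design choice in this sketch: it avoids telling the caller which IDs exist.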

    Finding vulnerabilities, done well, is just disciplined curiosity about where an app’s assumptions and its checks part ways. This is exactly the kind of bug an autonomous researcher that tests assumptions is built to find, working through understand, assume, experiment, and verify on its own. You can read more about that approach on our about page.

  • What is command injection? Examples explained

    What is command injection? Examples explained

    Command injection is one of the oldest and most dangerous web bugs, and it is also one of the easiest to understand once you see it in action. It happens when an app takes input from a user, drops that input into a system command, and runs the whole thing in a shell. If the app trusts the input too much, the user can append their own commands and make the server run them.

    What command injection means

    The short version of what command injection means is this: your app wanted to run one command, but the attacker tricked it into running two. The first is the command you intended. The second is whatever the attacker tacked on. The shell happily runs both because, to the shell, it is just text.

    The root cause is mixing two things that should stay apart: data (the value a user typed) and code (the command the server runs). When user data flows straight into a command string, the data can change what command runs. That is the whole bug in one sentence.

    The app meant to run one command. The attacker made it run two. The shell cannot tell your intent from their input, so it runs both.

    A simple command injection example

    Let us invent a small app called Acme Netcheck. It is a network tool with one feature: you give it a hostname, and it pings that host so you can see if the host is reachable. The form has one field named host, and the backend runs a ping for you.

    Here is the kind of code that causes the problem. This is written to show the mistake, not to copy:

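    # DANGEROUS: user input goes straight into a shell command
    import os

    host = request.form["host"]          # attacker-controlled value
    command = "ping -c 1 " + host        # data is concatenated into code
    output = os.popen(command).read()    # os.popen runs the string through a shell
    return output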
    

    If a normal user types example.com, the server builds and runs this:

    ping -c 1 example.com
    

    That works as intended. The trouble starts when someone types something that is not just a hostname. On a typical shell, a semicolon ends one command and starts another. So an attacker types this into the same field:

    example.com; whoami
    

    Now the server builds and runs this:

    ping -c 1 example.com; whoami
    

    The shell runs the ping, then runs whoami, and the app returns the output of both. The attacker just learned which user the web server runs as. They did not break into anything clever. They only added a semicolon and a second command to a field that was supposed to hold a hostname.

    Other command injection examples that work the same way

    The semicolon is one of several shell characters that chain or redirect commands. These all let an attacker smuggle a second command into a single input field:

    • example.com && whoami runs whoami only if the ping succeeds.
    • example.com | whoami pipes the output of ping into whoami, which ignores that input and runs anyway.
    • $(whoami) or `whoami` runs the inner command and pastes its result back in.

    These are command injection examples you will see again and again because the cause is identical every time: input was treated as part of a command instead of as plain text.
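
    One way to see the problem without running anything is to split the finished command string the way a shell roughly would and look for control operators. The sketch below uses Python's shlex module, which only approximates real shell parsing; the helper names build_ping_command and shell_operators are made up for this example, and command substitution such as $(whoami) or backticks is not detected by it.

```python
import shlex

# Operators that end one command and start or connect another.
CONTROL_OPERATORS = {";", "&", "&&", "|", "||"}


def build_ping_command(host: str) -> str:
    # Same unsafe string concatenation as the Acme Netcheck example.
    return "ping -c 1 " + host


def shell_operators(command: str) -> list[str]:
    # punctuation_chars=True makes shlex return runs of ();<>|& as their
    # own tokens, which is close to how a shell splits the line.
    lexer = shlex.shlex(command, punctuation_chars=True)
    return [token for token in lexer if token in CONTROL_OPERATORS]
```

    For a plain hostname the built command contains no operators. For each of the chained payloads it contains one, which means the shell would see two commands where the app meant to run one.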

    Why command injection is so serious

    With SQL injection, an attacker reaches your database. With command injection, the attacker reaches the operating system itself, running as whatever user your app runs as. That is a wider blast radius. Once they can run shell commands on your server, they can:

    • Read files the app can read, including config files and secrets like API keys and database passwords.
    • Reach other machines on the internal network that the server can talk to but you cannot reach from outside.
    • Install a backdoor or a reverse shell so they can come back later.

    A field meant to hold a hostname turned into control of a server. That is why injection has a standing place in the OWASP Top 10, and why OS command injection (CWE-78) keeps appearing in the CWE Top 25.

    How to fix command injection

    The strongest fix is to stop building shell command strings out of user input. Most of the time you do not need a shell at all.

    Do not shell out when an API exists

    If you only need to read a file, use the file API in your language. If you need to make an HTTP request, use an HTTP library. Reaching for a shell command to do a job your language already does is the start of most of these bugs. No shell means no shell injection.
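
    As a small illustration, here is a line counter that uses the file API instead of shelling out to a tool like wc. The function name count_lines is hypothetical, and the sketch assumes a UTF-8 text file. Deciding which paths a user is allowed to read is a separate check that this example does not cover.

```python
from pathlib import Path


def count_lines(path: str) -> int:
    # The path is only ever treated as a file name. No shell is involved,
    # so characters like ; or | in it have no special meaning.
    with Path(path).open("r", encoding="utf-8") as handle:
        return sum(1 for _ in handle)
```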

    If you must run a program, pass arguments as a list

    When you genuinely need to run an external program, call it directly and pass each argument as a separate list item instead of as one big string. Most languages support this. In Python it looks like this:

    # Safer: no shell, arguments passed as a list
    import subprocess
    host = request.form["host"]
    output = subprocess.run(
        ["ping", "-c", "1", host],
        capture_output=True, text=True
    ).stdout
    

    Here host is handed to ping as a single argument. There is no shell to interpret the semicolon, so example.com; whoami is passed to ping as one odd hostname, which fails to resolve. The second command never runs.
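
    You can watch this happen with a harmless stand-in for ping. The sketch below starts a second Python process that simply prints the arguments it received; the helper name argv_seen_by_child is invented for this example.

```python
import ast
import subprocess
import sys


def argv_seen_by_child(user_value: str) -> list[str]:
    # The child prints the arguments it was given after the -c script.
    child_code = "import sys; print(sys.argv[1:])"
    result = subprocess.run(
        [sys.executable, "-c", child_code, user_value],  # list form, no shell
        capture_output=True,
        text=True,
        check=True,
    )
    # Turn the printed list back into a Python list for easy comparison.
    return ast.literal_eval(result.stdout.strip())
```

    Calling argv_seen_by_child("example.com; whoami") returns a list with one element, the whole string including the semicolon. The child program receives it as data, and nothing interprets it as a second command.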

    Validate input with an allowlist

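    Defense in depth helps too. Decide exactly what valid input looks like and reject everything else. For a hostname, you can allow only letters, digits, dots, and hyphens, require the first character to be a letter or digit so the value can never be read as a command line option such as -f, and reject anything else before the value goes near a command:

    import re
    host = request.form["host"]
    # First character must be a letter or digit; the rest may also include dots and hyphens
    if not re.fullmatch(r"[A-Za-z0-9][A-Za-z0-9.-]*", host):
        return "Invalid host", 400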
    

    An allowlist describes what you accept. A blocklist tries to list every bad character and always misses some. Prefer the allowlist.
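
    A somewhat stricter allowlist can check each dot-separated label on its own. This is a sketch, not a complete hostname parser: the function name is_valid_hostname is hypothetical, the length limits follow the usual DNS rules (253 characters overall, 63 per label), and rejecting a trailing dot is an assumption made to keep the example short.

```python
import re

# One label: starts and ends with a letter or digit, hyphens only inside,
# at most 63 characters.
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_valid_hostname(value: str) -> bool:
    if not value or len(value) > 253:
        return False
    # Every label between the dots must match; empty labels fail.
    return all(_LABEL.fullmatch(label) for label in value.split("."))
```

    Because each label must start with a letter or digit, values that begin with a hyphen are rejected along with anything containing spaces or shell characters.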

    Lower the impact when things go wrong

    Run the app as a low privilege user, not as root. Limit what that user can read and which machines it can reach. None of this fixes the bug, but it shrinks the damage if one slips through. You can read more patterns like this in our guide to injection and input bugs.

    How to spot it in your own code

    Search your codebase for the places where commands get run. Look for os.system, os.popen, subprocess calls with shell=True, backticks, exec, and eval. For each one, ask a single question: does any part of this command come from a request, a form, a URL, a header, or a file an outside user can influence? If yes, treat it as suspect and fix it with the steps above.

    Command injection survives because the dangerous code reads as harmless. Joining a string and running it looks fine in review. The bug only shows when someone tries the input you did not expect. This is exactly the kind of assumption an autonomous researcher that tests how an app really behaves is built to find. To see how we think about bugs like this, read more about UnboundCompute.