Meet UnboundCompute, an autonomous security researcher for web apps and APIs

UnboundCompute reads an application the way a careful person would, builds a picture of how the app is meant to behave, then goes looking for the places where that intent quietly falls apart. This post explains what it is, what it does, and where it fits in the wider story of autonomous penetration testing.

What UnboundCompute actually is

Think of it as a researcher that never gets bored and never stops reading. You point it at a web app or an API. It studies the routes, the parameters, the responses, and the rules the app seems to enforce. From that it forms a working model of the system: who is supposed to do what, which actions need permission, and which inputs the app trusts.

That model is the whole point. Most bugs that matter are not missing patches. They are gaps between what the app intends and what it allows. A user who can read another user’s invoice by changing one number in a URL. An endpoint that checks your login but forgets to check whether the record belongs to you. These are logic gaps, and you only see them once you understand the logic.

The loop: understand, assume, experiment, verify, chain

UnboundCompute works in a loop. Each step feeds the next, and the loop keeps tightening until there is either a proven finding or nothing left to test.

Understand

First it learns how the app is meant to work. It maps the surface and reads the behavior. If GET /api/orders/1042 returns your order, the researcher notes that orders are addressed by a simple number and asks the obvious follow-up: what enforces that 1042 is yours?

Assume

Next it forms ideas about where the logic could break. This is the part a fixed checklist cannot do. The researcher reasons about the app in front of it, not a generic template. For an orders endpoint it might assume that the app checks you are logged in but never checks that the record is yours. For a password reset flow it might assume the token is predictable or reusable.
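
One way to picture this step is as a function from observations to testable claims. The sketch below is purely illustrative: the Observation and Hypothesis types and the two heuristics are assumptions made up for this post, not a description of how UnboundCompute works inside. Real reasoning is far looser than two hard-coded rules; the point is only the shape of the output, one target paired with one claim to test.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    method: str
    path: str


@dataclass(frozen=True)
class Hypothesis:
    target: str
    claim: str


def propose(observations):
    """Turn observed routes into assumptions worth testing (hypothetical heuristics)."""
    hypotheses = []
    for obs in observations:
        target = f"{obs.method} {obs.path}"
        # A trailing numeric segment suggests records addressed by a guessable ID.
        if re.search(r"/\d+$", obs.path):
            hypotheses.append(Hypothesis(target, "login is checked, record ownership is not"))
        # Reset flows depend on a token; its strength is an assumption to test.
        if "reset" in obs.path:
            hypotheses.append(Hypothesis(target, "reset token is predictable or reusable"))
    return hypotheses


seen = [Observation("GET", "/api/orders/1042"), Observation("POST", "/api/password/reset")]
for h in propose(seen):
    print(f"{h.target}: {h.claim}")
```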

Experiment

Then it designs a test for each idea and runs it. One assumption, one experiment. For the ownership idea, it requests a record it should not own:

GET /api/orders/1043
Authorization: Bearer <session of a user who does not own order 1043>

If that returns someone else’s order, the assumption held and there is a real access control bug to confirm.
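
To make the experiment concrete, here is a minimal sketch that runs it against an in-memory stand-in for the orders API. Everything in it is hypothetical: the ORDERS and SESSIONS tables, the deliberately flawed get_order handler, and the ownership_experiment helper exist only for illustration, and nothing touches a network.

```python
# Hypothetical in-memory stand-in for the orders API.
ORDERS = {1042: {"owner": "alice", "total": 30}, 1043: {"owner": "bob", "total": 75}}
SESSIONS = {"token-alice": "alice", "token-bob": "bob"}


def get_order(order_id, token):
    """Flawed handler: authenticates the caller but never compares owner to caller."""
    user = SESSIONS.get(token)
    if user is None:
        return 401, None
    order = ORDERS.get(order_id)
    if order is None:
        return 404, None
    return 200, order  # missing check: order["owner"] == user


def ownership_experiment(order_id, token):
    """One assumption, one experiment: can this session read a record it does not own?"""
    caller = SESSIONS[token]
    status, body = get_order(order_id, token)
    leaked = status == 200 and body["owner"] != caller
    return {
        "request": f"GET /api/orders/{order_id} as {caller}",
        "status": status,
        "leaked": leaked,
    }


print(ownership_experiment(1043, "token-alice"))
```

The same helper run against order 1042 with the same token reports no leak, which is the control: the experiment only counts when the caller and the owner differ.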

Verify

This is the step that separates a researcher from a noise machine. A guess is not a finding. UnboundCompute only reports something when it can prove it with concrete evidence: the request that triggered the behavior and the response that shows the impact. The output is signal, not a pile of maybes you have to sort through by hand.

A finding is only worth reporting when you can show the exact request that proves it. Everything else is a guess wearing a confident face.
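
That rule is simple enough to write down as a gate. The sketch below assumes a hypothetical Finding record with request and response fields; it shows only the idea that a candidate without both pieces of evidence never reaches the report, not the actual data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Finding:
    title: str
    request: Optional[str] = None   # exact request that triggered the behavior
    response: Optional[str] = None  # response that shows the impact

    def is_reportable(self) -> bool:
        # A finding without both pieces of evidence is still only a guess.
        return bool(self.request) and bool(self.response)


def report(findings):
    """Keep only findings backed by a request and a response."""
    return [f for f in findings if f.is_reportable()]


candidates = [
    Finding(
        "Orders readable across users",
        request="GET /api/orders/1043",
        response='200 {"owner": "bob"}',
    ),
    Finding("Reset token might be predictable"),  # no evidence yet
]
print([f.title for f in report(candidates)])
```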

Chain

Single bugs are useful. Chained bugs are how real damage happens. Once a finding is verified, the researcher asks what it opens up. A leaked email here, a guessable identifier there, an endpoint that trusts a value it should not. On their own each looks minor. Together they can add up to a full account takeover. Because UnboundCompute carries its model of the app through the whole loop, it can connect one verified result to the next instead of treating every test as a fresh start.
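
Chaining can be pictured as a search over capabilities: each verified finding needs something the attacker already has and grants something new. The sketch below is an assumption-heavy toy, with made-up finding names and capability labels, that keeps applying findings until it reaches a goal or runs out of moves.

```python
# Each verified finding needs some capabilities and grants new ones.
# The findings and capability names are hypothetical examples.
FINDINGS = [
    {"name": "profile endpoint leaks email",
     "needs": {"any_session"}, "grants": {"victim_email"}},
    {"name": "reset token is guessable",
     "needs": {"victim_email"}, "grants": {"reset_token"}},
    {"name": "reset accepts token without old password",
     "needs": {"reset_token"}, "grants": {"account_takeover"}},
]


def chain(start, goal, findings):
    """Apply findings until the goal is reached or nothing new unlocks."""
    have, path = set(start), []
    progressed = True
    while goal not in have and progressed:
        progressed = False
        for f in findings:
            # Use a finding once, and only when its prerequisites are met.
            if f["name"] not in path and f["needs"] <= have:
                have |= f["grants"]
                path.append(f["name"])
                progressed = True
    return path if goal in have else None


print(chain({"any_session"}, "account_takeover", FINDINGS))
```

Starting with no session at all, the same call returns None: none of the three findings is reachable, so there is no chain to report.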

Why this beats a scanner that checks a known list

A traditional scanner is a list reader. It carries a set of known signatures and fires them at every input it finds. That has real value for catching the obvious and the already known. It also has a hard ceiling. A scanner that only checks a known list cannot find a bug that is not on the list, and the bugs that hurt most are almost never on any list.

Here is the difference in one example. A scanner sends a SQL injection string at /search?q= and checks whether the response looks like a database error. Useful. But it will happily pass an endpoint like this:

POST /api/account/transfer
{ "from": "acct_self", "to": "acct_other", "amount": 500 }

There is no payload to match here. The bug, if there is one, is that the server never checks whether the account named in from belongs to the caller, so swapping acct_self for someone else’s account would move their money. No signature catches that. You catch it by understanding what the endpoint is for and testing the assumption it makes about who is calling it. We write more about this split between checking and researching in our scanners versus research category.

A scanner asks: does this input match a known bad pattern?
A researcher asks: what does this app assume, and what happens when that assumption is false?

Both questions are fair. The second one is where the high-impact findings live, and it is the question UnboundCompute is built around.
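
The two questions can be put side by side in a few lines. This is a toy under stated assumptions: the signature list, the ACCOUNT_OWNER table, and the flawed transfer handler are all invented for illustration, and the payload names someone else’s account in from to probe the ownership assumption.

```python
import re

# A toy signature list; real scanners carry far more, but the shape is the same.
SIGNATURES = [re.compile(r"'\s*or\s+1=1", re.I), re.compile(r"<script", re.I)]


def scanner_flags(body: str) -> bool:
    """Scanner question: does this input match a known bad pattern?"""
    return any(sig.search(body) for sig in SIGNATURES)


# Hypothetical server state and a transfer handler that trusts the "from" field.
ACCOUNT_OWNER = {"acct_self": "alice", "acct_other": "bob"}


def transfer(caller: str, payload: dict) -> int:
    if payload["from"] not in ACCOUNT_OWNER or payload["to"] not in ACCOUNT_OWNER:
        return 404
    return 200  # missing check: ACCOUNT_OWNER[payload["from"]] == caller


def researcher_flags(caller: str, payload: dict) -> bool:
    """Researcher question: does the call succeed when the caller does not own "from"?"""
    return ACCOUNT_OWNER[payload["from"]] != caller and transfer(caller, payload) == 200


payload = {"from": "acct_other", "to": "acct_self", "amount": 500}
print("scanner:", scanner_flags(str(payload)))
print("researcher:", researcher_flags("alice", payload))
```

The payload contains nothing a signature could match, so the scanner stays quiet, while the assumption test flags the request because it succeeds for a caller who does not own the source account.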

Where this sits in autonomous penetration testing

Autonomous penetration testing is the idea that a system can plan and run its own security tests, not just replay a script. UnboundCompute fits there, but with a specific stance: the value is not in running more checks faster. It is in reasoning about the target, testing assumptions, and proving impact before saying a word.

Verification also pays off after the first run. Once a finding is confirmed, it can become a repeatable check that keeps watching for the same bug coming back. So the work is not throwaway. A proven issue today becomes a guard against regressions tomorrow.
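
As a rough sketch of that idea, a verified finding’s evidence can be wrapped into a check that is replayed later. The make_regression_check helper, the send callable, and the two stand-in apps below are hypothetical names chosen for this example; a real check would send the request over HTTP.

```python
def make_regression_check(request, forbidden_marker):
    """Build a repeatable check from a verified finding's evidence.

    `request` is the proven request; `forbidden_marker` is text from the
    original response that should never appear once the bug is fixed.
    """
    def check(send):
        status, body = send(request)
        # The bug is back if the request succeeds and still exposes the marker.
        return not (status == 200 and forbidden_marker in body)
    return check


# Hypothetical transports standing in for the app before and after a fix.
def vulnerable_app(request):
    return 200, '{"id": 1043, "owner": "bob"}'


def fixed_app(request):
    return 403, '{"error": "forbidden"}'


check = make_regression_check("GET /api/orders/1043 as alice", '"owner": "bob"')
print("vulnerable passes:", check(vulnerable_app))
print("fixed passes:", check(fixed_app))
```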

Where we are right now: the honest version

We are early. The product is being built, and we are not going to dress that up. We have no customer numbers to share, no benchmark to wave around, and we are not promising results we cannot back.

What we will say is this. In our own testing, a frontier model drove the full methodology on its own and identified and verified real access control and injection issues in test applications it had not seen before. We read that as an early, encouraging signal that the approach holds, not as a benchmark and not as proof. There is a long way to go from a good signal to a tool you can rely on every day, and that gap is the work in front of us.

The short version

UnboundCompute is a security researcher that runs on its own. It learns how an app is meant to work, guesses where the logic breaks, tests those guesses, and only reports what it can prove. That is a different job from a scanner reading a list of known payloads, and it is the job we think matters most. If you want to know who is building this and why, read more on our about page.