We started UnboundCompute because of a gap we kept running into. Most automated security testing checks for a fixed list of known bugs and stops there. That misses the flaws that hurt most, the ones that need a real understanding of how an app works, like broken access control and business logic abuse. This post explains the gap, why we think it matters, and what we are betting on.
We are early. The product is being built. We would rather tell you what we believe and why than sell you on results we have not earned yet. So this is a point of view, written plainly.
What most automated security testing actually checks
A normal scanner works from a catalogue. It knows what a reflected script looks like, what a classic injection string returns, what an outdated library version means. It sends those patterns at every endpoint it can find and reports the matches. This is genuinely useful. It catches the well-understood bugs fast, on a schedule no human could keep, and it never gets tired.
But notice what that approach assumes. It assumes the dangerous bugs all look like something the tool has seen before. Many do not. Consider a request like this:
GET /api/orders/8841
Authorization: Bearer trial-user-token
HTTP/1.1 200 OK
{ "id": 8841, "owner": "another-account", "total": 1290 }
There is no malformed payload here. No quote to break a query, no script tag, no signature to match. Yet the trial user just read an order that belongs to someone else. That is broken access control, and a pattern matcher has nothing to match against, because the request looks perfectly ordinary. The bug lives in the rule the app forgot to enforce, not in the shape of the input.
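To make "the rule the app forgot to enforce" concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the in-memory `ORDERS` store, the `Forbidden` exception, and both handler names are hypothetical stand-ins, not code from any real application. The two handlers receive exactly the same ordinary input. The only difference is one ownership comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    id: int
    owner: str
    total: int


# Hypothetical in-memory store standing in for an orders table.
ORDERS = {8841: Order(id=8841, owner="another-account", total=1290)}


class Forbidden(Exception):
    """Raised when the caller is not allowed to read the resource."""


def get_order_unchecked(caller: str, order_id: int) -> Order:
    # Looks the order up by id only, so any authenticated caller
    # can read any order. The caller argument is never consulted.
    return ORDERS[order_id]


def get_order_checked(caller: str, order_id: int) -> Order:
    order = ORDERS[order_id]
    # The rule the first handler forgot: the caller must own the order.
    if order.owner != caller:
        raise Forbidden(f"{caller} may not read order {order_id}")
    return order
```

Calling `get_order_unchecked("trial-user", 8841)` returns the other account's order without complaint, while `get_order_checked` raises `Forbidden` for the same call. Nothing in the input distinguishes the two cases, which is why a signature has nothing to catch.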
Why automated security testing misses the bugs that matter
Finding the highest-impact flaws takes understanding what an app is trying to do, then asking what happens when you bend one of its rules. Two examples make the point.
Broken access control
An app decides who is allowed to see or change what. When a check is missing, one user can reach another user’s data by changing an id in a URL, or reach an admin route that was never linked from the menu. To find this, you have to know who the current user is supposed to be and what they should not be able to touch. A fixed payload list does not carry that idea.
Business logic abuse
Logic bugs are harder to find with automation, because the app is behaving exactly as written. The code is just wrong about its own rules. Picture a checkout that takes a discount code. A tool sending known strings will never think to apply the same code three times, or set the quantity to a negative number so the total drops below zero:
POST /api/cart/apply
{ "code": "SAVE20", "quantity": -4 }
Nothing about that request is malformed. It is a valid call that exploits a rule the app assumed no one would break. You only find it by understanding the flow first, then probing the assumption underneath it.
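A short Python sketch shows how a checkout can be "wrong about its own rules". The unit price, the `DISCOUNTS` table, and both function names are assumptions made up for this example, and the arithmetic uses whole currency units with integer division to keep it simple. The naive version does exactly what it was written to do. The guarded version writes down the two rules the naive one only assumed.

```python
UNIT_PRICE = 50               # assumed price per item, in whole currency units
DISCOUNTS = {"SAVE20": 20}    # assumed mapping: code -> percent off


def cart_total_naive(quantity: int, codes: list) -> int:
    # Trusts the quantity as sent and applies every code it is given,
    # including the same code more than once.
    total = UNIT_PRICE * quantity
    for code in codes:
        total = total * (100 - DISCOUNTS[code]) // 100
    return total


def cart_total_guarded(quantity: int, codes: list) -> int:
    # Rule 1: you cannot buy fewer than one item.
    if quantity < 1:
        raise ValueError("quantity must be at least 1")
    # Rule 2: each discount code applies at most once.
    if len(set(codes)) != len(codes):
        raise ValueError("a discount code can be applied only once")
    return cart_total_naive(quantity, codes)


# The two probes from the text, run against the naive version.
negative_total = cart_total_naive(-4, ["SAVE20"])
stacked_total = cart_total_naive(1, ["SAVE20", "SAVE20", "SAVE20"])
single_total = cart_total_naive(1, ["SAVE20"])
```

Against the naive function, the negative quantity produces a total below zero and the stacked code produces a lower total than a single application. Both calls are well formed. The guarded function rejects them only because someone thought to state the rule.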
The bugs that hurt most are not strange inputs to known holes. They are ordinary requests that break a rule the app forgot to enforce.
Why skilled humans cannot cover the gap alone
Human testers find these bugs. A good one reads the screen, guesses the business rules, and chases behavior no rulebook predicted. That is exactly the kind of judgment a payload catalogue lacks. The problem is supply.
- They are scarce. The people who are genuinely good at this work are few, and demand far outruns them.
- They are expensive. A deep manual test is a serious cost, so most teams can only afford it once or twice a year.
- They cannot keep up with shipping. Teams deploy many times a week. A test run once a year cannot see the code that shipped last Tuesday.
So you end up with two options that each fall short. Scanners run constantly but miss the bugs that need understanding. Humans understand but cannot run constantly. The deeper version of this comparison lives in our scanners vs research category, which goes through where each one earns its keep and where it does not.
Our bet: an autonomous researcher that tests assumptions
Here is what we are building toward. Instead of a tool that matches known payloads, an autonomous researcher that works the way a thoughtful human tester does. It learns how the application is meant to behave. It forms ideas about where that logic could break. It designs experiments to test those ideas. Then it proves a finding before it ever reports it. Understand, assume, experiment, verify, chain.
The order of those words matters. Understanding comes first, because the bugs we care about only appear once you know what the app expects. The verify step matters just as much. A finding is only reported when it is backed by concrete evidence, so the output is signal, not a pile of maybes. Take the order example above. The researcher would not flag a “possible” issue. It would replay the request, show the other account’s data coming back, and hand you a result you can reproduce.
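As a rough illustration of what "verify before reporting" could look like for the order example, here is a minimal Python sketch. It is not our implementation. The `Finding` shape, the `Transport` signature, the function names, and the two fake apps are all assumptions invented for this post. The idea it shows is narrow: replay the request, and return a finding only when every attempt comes back with another owner's data attached.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# Assumed transport: takes (path, token), returns (status code, JSON body).
Transport = Callable[[str, str], Tuple[int, Dict]]


@dataclass(frozen=True)
class Finding:
    path: str
    caller: str
    leaked_owner: str
    evidence: Dict  # the response body that proves the leak


def verify_cross_account_read(
    send: Transport, path: str, token: str, caller: str, attempts: int = 2
) -> Optional[Finding]:
    """Replay the request and report only if every attempt leaks."""
    last_body: Dict = {}
    for _ in range(attempts):
        status, body = send(path, token)
        owner = body.get("owner")
        # Any attempt without clear proof means no report at all.
        if status != 200 or owner is None or owner == caller:
            return None
        last_body = body
    return Finding(path, caller, last_body["owner"], last_body)


# Two hypothetical apps standing in for real HTTP targets.
def vulnerable_app(path: str, token: str) -> Tuple[int, Dict]:
    return 200, {"id": 8841, "owner": "another-account", "total": 1290}


def fixed_app(path: str, token: str) -> Tuple[int, Dict]:
    return 403, {"error": "forbidden"}
```

With `vulnerable_app` as the transport, the function returns a `Finding` that carries the leaked response as evidence. With `fixed_app` it returns `None`, so nothing reaches a report. Passing the transport in as a callable is a choice made for the sketch, so the logic can be exercised without a network.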
That last step changes the cost of reading a report. Every false alarm costs someone an hour of triage, and after enough of them people stop reading. Proof cuts the noise. And once a finding is confirmed, it can become a repeatable check that keeps watching for the same bug coming back after a future deploy.
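One way to picture a confirmed finding turning into a repeatable check is the small Python sketch below. The function names and the two fake deploys are hypothetical, and a real check would send requests over the network instead of calling a local function. The check records the request that once proved the bug and answers a single question after each deploy: does that request still leak?

```python
from typing import Callable, Dict, Tuple

# Assumed transport: takes (path, token), returns (status code, JSON body).
Transport = Callable[[str, str], Tuple[int, Dict]]


def make_regression_check(
    path: str, token: str, caller: str
) -> Callable[[Transport], bool]:
    """Freeze a proven request into a check that can run after any deploy."""

    def check(send: Transport) -> bool:
        status, body = send(path, token)
        leaked = status == 200 and body.get("owner") not in (None, caller)
        return not leaked  # True means the bug is still fixed

    return check


# Two hypothetical deploys: one with the fix, one where the bug came back.
def deploy_with_fix(path: str, token: str) -> Tuple[int, Dict]:
    return 403, {"error": "forbidden"}


def deploy_with_regression(path: str, token: str) -> Tuple[int, Dict]:
    return 200, {"id": 8841, "owner": "another-account", "total": 1290}


check = make_regression_check("/api/orders/8841", "trial-user-token", "trial-user")
results = [check(deploy_with_fix), check(deploy_with_regression)]
```

Here `results` records a pass for the fixed deploy and a failure for the one where the bug returned, which is the signal you would want before that deploy reaches anyone.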
Where we are, honestly
We will not pretend we are finished. We are early, and the product is being built. We have no customers to name, no benchmark to wave around, and we are not going to invent one.
What we can share is an early signal that keeps us going. A frontier model drove the full methodology on its own and identified and verified real access control and injection issues in test applications it had not seen before. We frame that as encouraging, not as proof. It is enough to tell us the bet is worth making, and not enough to claim the work is done.
Why this is worth building
The pattern is hard to ignore. Software ships faster every year. The bugs that cause the worst days, the leaked records and the abused workflows, are the ones that need understanding, not pattern matching. Scanners cannot supply that understanding, and skilled humans cannot supply enough of their time. Something has to test the assumptions an app makes, at the speed teams now ship, and prove what it finds before it interrupts anyone.
That is the thing we are trying to build. An autonomous researcher that tests the assumptions your app makes and reports a bug only once it is proven. We are early and we know it, but this is the gap worth closing, and it is why UnboundCompute exists. If you want to follow along or tell us where we are wrong, read more on our about page.
