Why we only report proven vulnerabilities

Most security tools hand you a list of maybes. We do the opposite. Our rule is simple: we only report a bug after vulnerability verification, which means we have shown concrete evidence that the bug is real and exploitable. If we cannot prove it, we hold it back.

This sounds obvious, but it goes against how most scanning works. A scanner sees a pattern it recognizes and raises an alert. It does not check whether that pattern actually breaks anything in your app. So the team on the other end inherits the hard part: deciding which alerts are real.

A finding is not the same as a proven finding

A finding is a guess. A tool noticed something that looks like a known weakness. Maybe a parameter name matches a SQL injection signature. Maybe a response header is missing. Maybe a login form lacks a rate limit field the tool expected to see. These are reasons to look closer. They are not proof.

A proven finding is different. It comes with evidence that the bug works. Not a signature match, but a sequence you can replay: this request, sent in this way, produced this result that should not have been possible.

Consider an invented app called Acme Notes. A scanner flags this endpoint because the URL has a numeric id:

GET /api/notes/1042
Authorization: Bearer <user A token>

The flag says “possible insecure direct object reference”. That is a finding. It is a hint, nothing more. To turn it into a proven finding, you have to do the thing the scanner did not: log in as user A, request a note that belongs to user B, and show that user A can read user B's private data.

GET /api/notes/2099
Authorization: Bearer <user A token>

200 OK
{ "id": 2099, "owner": "userB", "body": "userB private note" }

Now you have something real. User A read user B's data. That response body is the evidence. The bug is no longer a guess about a numeric id. It is a confirmed access control failure with a request and a response that prove it.
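The check a verifier runs here can be sketched in a few lines of Python. This is a minimal illustration, not UnboundCompute's implementation: the in-memory `TOKENS` and `NOTES` tables, the `get_note` handler, and the `verify_idor` helper are hypothetical stand-ins for the invented Acme Notes API. The point is the condition: a finding only counts when the attacker's request comes back 200 with a note owned by someone else.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# Hypothetical stand-in for the invented Acme Notes API. No real HTTP is involved.
TOKENS = {"token-a": "userA", "token-b": "userB"}
NOTES = {
    1042: {"id": 1042, "owner": "userA", "body": "userA private note"},
    2099: {"id": 2099, "owner": "userB", "body": "userB private note"},
}

Handler = Callable[[str, int], Tuple[int, Dict]]


def get_note(token: str, note_id: int) -> Tuple[int, Dict]:
    """Deliberately vulnerable: validates the token but never checks ownership."""
    if token not in TOKENS:
        return 401, {}
    note = NOTES.get(note_id)
    if note is None:
        return 404, {}
    return 200, note


@dataclass
class Evidence:
    request: str  # what was sent
    status: int   # what came back
    body: Dict    # the data that should not have been returned


def verify_idor(handler: Handler, attacker_token: str, note_id: int) -> Optional[Evidence]:
    """Return evidence only if the attacker reads a note owned by someone else."""
    attacker = TOKENS[attacker_token]
    status, body = handler(attacker_token, note_id)
    owner = body.get("owner")
    if status == 200 and owner is not None and owner != attacker:
        return Evidence(f"GET /api/notes/{note_id}", status, body)
    return None  # no cross-user read, so nothing is proven
```

Against this deliberately vulnerable handler, `verify_idor(get_note, "token-a", 2099)` returns evidence, while requesting note 1042, which user A owns, returns `None`, because reading your own note proves nothing.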

Why unproven alerts waste a security team’s time

Every unproven alert is work pushed downstream. Someone has to triage it. They read the alert, open the app, try to reproduce it, and most of the time discover the alert was wrong. The parameter was safe. The missing header did not matter behind the gateway. The flagged id was scoped to the user all along.

That triage cost is real and it repeats. A queue full of maybes does three bad things:

  • It buries the real bugs. When most alerts are noise, the few that matter get the same tired glance as the rest.
  • It trains people to ignore alerts. After the tenth false alarm, the eleventh gets closed without a real look. That is how a true positive slips through.
  • It moves the proof work onto humans. The tool guessed. Now an engineer spends an afternoon confirming or dismissing the guess, which is the expensive part the tool skipped.

If a tool cannot prove the bug, it has not finished the job. It has only handed you a longer to-do list.

The difference between scanners and research is exactly this gap. We wrote more about that split in our post on scanners vs research. A scanner matches patterns at scale and stops there. A researcher keeps going until the bug is shown to be real or shown to be nothing.

What vulnerability verification actually means

Vulnerability verification is the step where a candidate bug earns the word “vulnerability”. It means producing evidence that the issue is both real and exploitable in the running app, under realistic conditions.

Real, not theoretical

A pattern match says “this looks like a bug”. Verification says “I made the bug happen”. For the Acme Notes case, that is the second request above and the response body that should never have reached user A. For an injection bug, it is not a payload that matches a regex. It is a request that changes the query and returns data the query was never meant to return.
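For injection, the same idea becomes a differential test: run the query with a normal value, run it again with a payload, and only call it a bug if the second run returns rows the caller should never see. The sketch below uses Python's built-in `sqlite3` with an in-memory database; the `orders` table, `search_orders`, `safe_search_orders`, and `verify_injection` are hypothetical names chosen for illustration.

```python
import sqlite3
from typing import Callable, List, Tuple

Row = Tuple[int, str, str]


def make_db() -> sqlite3.Connection:
    # Hypothetical toy schema: each user should only ever see their own orders.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (id INTEGER, owner TEXT, item TEXT)")
    db.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "alice", "book"), (2, "bob", "lamp"), (3, "carol", "desk")],
    )
    return db


def search_orders(db: sqlite3.Connection, owner: str) -> List[Row]:
    # Deliberately vulnerable: the input is pasted straight into the SQL text.
    sql = f"SELECT id, owner, item FROM orders WHERE owner = '{owner}'"
    return db.execute(sql).fetchall()


def safe_search_orders(db: sqlite3.Connection, owner: str) -> List[Row]:
    # Parameterized: the driver passes the input as a value, never as SQL.
    return db.execute(
        "SELECT id, owner, item FROM orders WHERE owner = ?", (owner,)
    ).fetchall()


def verify_injection(db: sqlite3.Connection,
                     handler: Callable[[sqlite3.Connection, str], List[Row]],
                     owner: str, payload: str) -> bool:
    """Proven only if the payload returns another user's rows the baseline did not."""
    baseline = set(handler(db, owner))
    injected = set(handler(db, owner + payload))
    leaked = injected - baseline
    return any(row[1] != owner for row in leaked)
```

With the payload `' OR '1'='1`, the string-built query returns every user's orders and the check reports a proven leak. The parameterized version treats the payload as a literal name, returns nothing extra, and the check stays silent.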

Exploitable under real conditions

Some flagged issues are real in theory but dead in practice. A parameter looks injectable, but a parser upstream strips the input before it reaches the database. Verification accounts for that. You test against the live behavior, not against a guess about the code. If the input never lands, there is no bug to report.
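Testing the live path rather than the code in isolation can be sketched the same way. Here a hypothetical gateway rule, `upstream_filter`, keeps only letters and digits before the value reaches a query that would be injectable on its own. All names and the filter itself are assumptions made for the example.

```python
import sqlite3
from typing import Callable, List, Tuple


def make_db() -> sqlite3.Connection:
    # Hypothetical toy data for the example.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (id INTEGER, owner TEXT)")
    db.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "alice"), (2, "bob")])
    return db


def vulnerable_query(db: sqlite3.Connection, owner: str) -> List[Tuple[int, str]]:
    # Read in isolation, this looks injectable: the SQL is built from a string.
    return db.execute(f"SELECT id, owner FROM orders WHERE owner = '{owner}'").fetchall()


def upstream_filter(raw: str) -> str:
    # Hypothetical gateway rule: keep only letters and digits.
    return "".join(ch for ch in raw if ch.isalnum())


def live_endpoint(db: sqlite3.Connection, raw_owner: str) -> List[Tuple[int, str]]:
    # The path a real request takes: filter first, then the query.
    return vulnerable_query(db, upstream_filter(raw_owner))


def payload_lands(db: sqlite3.Connection,
                  endpoint: Callable[[sqlite3.Connection, str], List[Tuple[int, str]]],
                  owner: str, payload: str) -> bool:
    """True only if sending the payload returns rows owned by someone else."""
    rows = endpoint(db, owner + payload)
    return any(row[1] != owner for row in rows)
```

Reading `vulnerable_query` alone suggests a bug, and calling it directly confirms one. Through `live_endpoint`, the quotes never arrive, the payload becomes the harmless string `aliceOR11`, and there is nothing to report.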

Repeatable, not a one-time fluke

Evidence has to hold up when someone runs it again. A proven finding includes the steps to reproduce it, so the person who fixes it can watch the bug happen and then watch it stop. No reproduction means no proof.
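A minimal sketch of the replay rule, assuming reproduction steps can be expressed as a function that returns `True` when the bug shows up. `ToyNotesApp` and its `fixed` switch are hypothetical; they exist only so the same steps can be run before and after a fix.

```python
from typing import Callable, Iterator


def reproduces(steps: Callable[[], bool], runs: int = 5) -> bool:
    """Treat a bug as proven only if every replay of the steps shows it."""
    return all(steps() for _ in range(runs))


class ToyNotesApp:
    """Hypothetical app; `fixed` toggles the ownership check on."""

    def __init__(self) -> None:
        self.owners = {2099: "userB"}
        self.fixed = False

    def read(self, user: str, note_id: int) -> int:
        if self.fixed and self.owners[note_id] != user:
            return 403
        return 200


app = ToyNotesApp()


def bug_steps() -> bool:
    # The bug reproduces if user A gets a 200 for user B's note.
    return app.read("userA", 2099) == 200


# A one-time fluke: it showed up on the first try and never again.
_fluke_outcomes: Iterator[bool] = iter([True, False, False, False, False])


def fluke_steps() -> bool:
    return next(_fluke_outcomes)
```

`reproduces(bug_steps)` is `True` while the app is unfixed and `False` once `app.fixed` is set, which is the "watch it happen, then watch it stop" loop. `reproduces(fluke_steps)` is `False` because the second replay fails.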

How UnboundCompute holds back what it cannot prove

UnboundCompute is an autonomous security researcher. It learns how an app is meant to work, forms ideas about where that logic could break, designs experiments to test those ideas, and then tries to prove a finding with hard evidence. Understand, assume, experiment, verify, chain.

The verify step is a gate, not a formality. If an experiment does not produce evidence that a bug is real and exploitable, the idea stays an idea. It does not become an alert. We would rather report fewer things and have every one of them be true than flood a queue and let people sort it out.
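The gate can be pictured as a filter at the end of the loop. This is a hypothetical sketch, not UnboundCompute's code: `Candidate`, its fields, and `gate` are names invented for illustration. A candidate becomes a report only if an experiment attached evidence and a replay confirmed it; everything else stays on the ideas list.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    title: str
    hypothesis: str                  # the assumption the experiment tested
    evidence: Optional[dict] = None  # set only when an experiment produced proof
    reproduced: bool = False         # set only when a replay showed it again


def gate(candidates: List[Candidate]) -> Tuple[List[Candidate], List[Candidate]]:
    """Split candidates into reports (proven) and ideas (everything else)."""
    reports: List[Candidate] = []
    ideas: List[Candidate] = []
    for c in candidates:
        if c.evidence is not None and c.reproduced:
            reports.append(c)
        else:
            ideas.append(c)
    return reports, ideas
```

A "missing header" hunch with no evidence lands in `ideas`, however plausible it sounds, and never reaches a queue.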

This is an honest description of an early stage product. We are still building it. We are not claiming customers, benchmarks, or a finished tool. As an early and encouraging signal, a frontier model drove the full methodology on its own and identified and verified real access control and injection issues in test applications it had not seen before. We treat that as a reason to keep going, not as a number to wave around.

A proven finding becomes a check that keeps watching

Here is the part we like most. Once a bug is proven, the evidence is already a recipe. The request that broke Acme Notes, and the response that proved it, describe a test you can run again.

So a confirmed finding can become a repeatable check. After the fix ships, the same steps run again and should now fail to reproduce the bug. If a later change brings the bug back, that check catches it. The access control gap that let user A read user B's notes turns into a standing test:

  • Authenticate as user A.
  • Request a note owned by user B.
  • Expect a denial, not a 200 OK with private data.
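Those three steps translate directly into a regression test. The sketch below uses Python's standard `unittest` module against a hypothetical fixed version of the Acme Notes handler; `TOKENS`, `NOTES`, and `get_note` are illustrative stand-ins, not a real API.

```python
import unittest

# Hypothetical fixed Acme Notes data and handler, for illustration only.
TOKENS = {"token-a": "userA", "token-b": "userB"}
NOTES = {2099: {"id": 2099, "owner": "userB", "body": "userB private note"}}


def get_note(token: str, note_id: int):
    """Fixed handler: a note is returned only to its owner."""
    user = TOKENS.get(token)
    if user is None:
        return 401, {}
    note = NOTES.get(note_id)
    if note is None or note["owner"] != user:
        return 404, {}  # deny without confirming the note exists
    return 200, note


class NotesAccessRegression(unittest.TestCase):
    def test_user_a_cannot_read_user_b_note(self):
        # Authenticate as user A and request a note owned by user B.
        status, body = get_note("token-a", 2099)
        self.assertNotEqual(status, 200)  # expect a denial
        self.assertNotIn("body", body)    # and no private data

    def test_owner_still_reads_own_note(self):
        # The fix must not break legitimate access.
        status, body = get_note("token-b", 2099)
        self.assertEqual(status, 200)
        self.assertEqual(body["owner"], "userB")
```

Run it with any standard `unittest` runner after each change. Returning 404 rather than 403 for someone else's note is one common choice, since it avoids confirming the note exists; the test only requires that the answer is not a 200 with private data, so either denial passes.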

The proof you gathered once keeps paying off. A maybe cannot do that. You cannot build a regression test out of a guess, because you never knew what the real bug was. Proof gives you a fixed target, and a fixed target is something you can watch forever.

This is the kind of bug an autonomous researcher that tests assumptions is built to find, prove, and keep watching. If that approach is interesting to you, read more about who we are on our about page.