Category: Inside UnboundCompute

Who we are, what we are building, and how the autonomous security researcher works.

  • Why we only report proven vulnerabilities

    Most security tools hand you a list of maybes. We do the opposite. Our rule is simple: we only report a bug after vulnerability verification, which means we have shown concrete evidence that the bug is real and exploitable. If we cannot prove it, we hold it back.

    This sounds obvious, but it goes against how most scanning works. A scanner sees a pattern it recognizes and raises an alert. It does not check whether that pattern actually breaks anything in your app. So the team on the other end inherits the hard part: deciding which alerts are real.

    A finding is not the same as a proven finding

    A finding is a guess. A tool noticed something that looks like a known weakness. Maybe a parameter name matches a SQL injection signature. Maybe a response header is missing. Maybe a login endpoint does not show the rate limiting the tool expected to see. These are reasons to look closer. They are not proof.

    A proven finding is different. It comes with evidence that the bug works. Not a signature match, but a sequence you can replay: this request, sent in this way, produced this result that should not have been possible.

    Consider an invented app called Acme Notes. A scanner flags this endpoint because the URL has a numeric id:

    GET /api/notes/1042
    Authorization: Bearer <user A token>

    The flag says “possible insecure direct object reference”. That is a finding. It is a hint, nothing more. To turn it into a proven finding, you have to do the thing the scanner did not: log in as user A, request a note that belongs to user B, and show that user A reads private data.

    GET /api/notes/2099
    Authorization: Bearer <user A token>
    
    200 OK
    { "id": 2099, "owner": "userB", "body": "userB private note" }

    Now you have something real. User A read user B's data. That response body is the evidence. The bug is no longer a guess about a numeric id. It is a confirmed access control failure with a request and a response that prove it.
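
    The gap between the flag and the proof can be sketched in a few lines of Python. This is a minimal sketch, not how any real tool works: it uses an in-memory stand-in for Acme Notes instead of HTTP, and the names NOTES, get_note, and prove_idor are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical in-memory stand-in for Acme Notes, not a real API.
NOTES = {
    1042: {"owner": "userA", "body": "userA private note"},
    2099: {"owner": "userB", "body": "userB private note"},
}


@dataclass
class Response:
    status: int
    body: dict


def get_note(note_id: int, user: str) -> Response:
    """Vulnerable on purpose: checks that the note exists, never who owns it."""
    note = NOTES.get(note_id)
    if note is None:
        return Response(404, {})
    return Response(200, {"id": note_id, **note})


def prove_idor(note_id: int, attacker: str) -> Optional[dict]:
    """Return replayable evidence only if the attacker reads someone else's note."""
    resp = get_note(note_id, attacker)
    if resp.status == 200 and resp.body["owner"] != attacker:
        return {"request": f"GET /api/notes/{note_id} as {attacker}", "response": resp.body}
    return None  # no proof, so nothing to report


finding = prove_idor(2099, "userA")      # evidence: userA read userB's note
non_finding = prove_idor(1042, "userA")  # None: reading your own note is not a bug
```

    The numeric id alone never produces a finding here. Only a response that carries another user's data does.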

    Why unproven alerts waste a security team’s time

    Every unproven alert is work pushed downstream. Someone has to triage it. They read the alert, open the app, try to reproduce it, and most of the time discover the alert was wrong. The parameter was safe. The missing header did not matter behind the gateway. The flagged id was scoped to the user all along.

    That triage cost is real and it repeats. A queue full of maybes does three bad things:

    • It buries the real bugs. When most alerts are noise, the few that matter get the same tired glance as the rest.
    • It trains people to ignore alerts. After the tenth false alarm, the eleventh gets closed without a real look. That is how a true positive slips through.
    • It moves the proof work onto humans. The tool guessed. Now an engineer spends an afternoon confirming or dismissing the guess, which is the expensive part the tool skipped.

    If a tool cannot prove the bug, it has not finished the job. It has only handed you a longer to-do list.

    The difference between scanners and research is exactly this gap. We wrote more about that split in scanners vs research. A scanner matches patterns at scale and stops there. A researcher keeps going until the bug is shown to be real or shown to be nothing.

    What vulnerability verification actually means

    Vulnerability verification is the step where a candidate bug earns the word “vulnerability”. It means producing evidence that the issue is both real and exploitable in the running app, under realistic conditions.

    Real, not theoretical

    A pattern match says “this looks like a bug”. Verification says “I made the bug happen”. For the Acme Notes case, that is the second request above and the response body that should never have reached user A. For an injection bug, it is not a payload that matches a regex. It is a request that changes the query and returns data the query was never meant to return.

    Exploitable under real conditions

    Some flagged issues are real in theory but dead in practice. A parameter looks injectable, but a parser upstream strips the input before it reaches the database. Verification accounts for that. You test against the live behavior, not against a guess about the code. If the input never lands, there is no bug to report.
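
    Here is a small Python sketch of that idea, using an in-memory SQLite database. The query is unsafe on purpose, and the table, the strip_quotes filter, and the payload are all assumptions made for the example. The same payload is sent twice: once straight through, and once behind an upstream filter that removes quotes.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE notes (owner TEXT, body TEXT)")
    db.executemany(
        "INSERT INTO notes VALUES (?, ?)",
        [("userA", "a note"), ("userB", "b secret")],
    )
    return db


def strip_quotes(term: str) -> str:
    """Hypothetical upstream parser that removes single quotes."""
    return term.replace("'", "")


def passthrough(term: str) -> str:
    return term


def search(db, owner, term, upstream):
    """Deliberately unsafe: builds the query by string formatting."""
    term = upstream(term)
    sql = f"SELECT owner, body FROM notes WHERE owner = '{owner}' AND body LIKE '%{term}%'"
    return db.execute(sql).fetchall()


def injection_lands(upstream) -> bool:
    """True only if the payload returns rows the query was never meant to return."""
    rows = search(make_db(), "userA", "' OR 1=1 --", upstream)
    return any(owner != "userA" for owner, _body in rows)


lands_without_filter = injection_lands(passthrough)   # payload reaches the query
lands_behind_filter = injection_lands(strip_quotes)   # payload is defused upstream
```

    The code path looks equally injectable in both cases. Only the observed behavior tells you which one is a bug worth reporting.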

    Repeatable, not a one time fluke

    Evidence has to hold up when someone runs it again. A proven finding includes the steps to reproduce it, so the person who fixes it can watch the bug happen and then watch it stop. No reproduction means no proof.

    How UnboundCompute holds back what it cannot prove

    UnboundCompute is an autonomous security researcher. It learns how an app is meant to work, forms ideas about where that logic could break, designs experiments to test those ideas, and then tries to prove a finding with hard evidence. Understand, assume, experiment, verify, chain.

    The verify step is a gate, not a formality. If an experiment does not produce evidence that a bug is real and exploitable, the idea stays an idea. It does not become an alert. We would rather report fewer things and have every one of them be true than flood a queue and let people sort it out.

    This is an honest description of an early stage product. We are still building it. We are not claiming customers, benchmarks, or a finished tool. As an early and encouraging signal, a frontier model drove the full methodology on its own and identified and verified real access control and injection issues in test applications it had not seen before. We treat that as a reason to keep going, not as a number to wave around.

    A proven finding becomes a check that keeps watching

    Here is the part we like most. Once a bug is proven, the evidence is already a recipe. The request that broke Acme Notes, and the response that proved it, describe a test you can run again.

    So a confirmed finding can become a repeatable check. After the fix ships, the same steps run again and should now fail to reproduce the bug. If a later change brings the bug back, that check catches it. The access control gap that let user A read user B's notes turns into a standing test:

    • Authenticate as user A.
    • Request a note owned by user B.
    • Expect a denial, not a 200 OK with private data.

    The proof you gathered once keeps paying off. A maybe cannot do that. You cannot build a regression test out of a guess, because you never knew what the real bug was. Proof gives you a fixed target, and a fixed target is something you can watch forever.
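
    As a rough Python sketch, the standing test could look like the code below. The two handlers are invented stand-ins for the app before and after a fix, and treating 403 or 404 as a denial is an assumption about how the fixed app responds.

```python
# Hypothetical data and handlers; a real check would send the recorded request.
NOTES = {2099: {"owner": "userB", "body": "userB private note"}}


def get_note_vulnerable(note_id, user):
    """Before the fix: returns any note that exists."""
    note = NOTES.get(note_id)
    return (200, note) if note else (404, None)


def get_note_fixed(note_id, user):
    """After the fix: the note is returned only to its owner."""
    note = NOTES.get(note_id)
    if note is None:
        return (404, None)
    if note["owner"] != user:
        return (403, None)
    return (200, note)


def check_user_a_cannot_read_user_b_note(handler) -> bool:
    """Standing check: passes only when the cross-user read is denied."""
    status, body = handler(2099, "userA")
    return status in (403, 404) and body is None


passes_after_fix = check_user_a_cannot_read_user_b_note(get_note_fixed)
passes_before_fix = check_user_a_cannot_read_user_b_note(get_note_vulnerable)
```

    The check fails against the vulnerable handler and passes against the fixed one, which is what lets it catch the bug if it ever returns.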

    This is the kind of bug an autonomous researcher that tests assumptions is built to find, prove, and keep watching. If that approach is interesting to you, read more about who we are on our about page.

  • How UnboundCompute differs from a vulnerability scanner

    If you search for an AI vulnerability scanner, you will find a lot of tools that promise to find every bug in your app. Most of them work the same way underneath: they match your app against a list of known patterns and hand you back a long report of maybes. UnboundCompute is a different kind of tool, and this post is an honest look at how it differs and where it stands today.

    We are early. The product is being built. So this is not a sales pitch. It is a comparison of two ways of looking for bugs, and an explanation of why we chose the harder one.

    What a traditional vulnerability scanner actually does

    A classic scanner crawls your app, collects every URL, form, and parameter it can reach, then fires a fixed set of test payloads at each one. It watches the response for signs that something went wrong. A reflected string here, a database error message there, a slow response that hints at a sleep command.

    This works for whole classes of well known bugs. If a field echoes back <script>alert(1)</script> without encoding, a scanner will catch it. If a search box passes ' OR '1'='1 straight into a query, it will often catch that too. That is real value, and pattern matching is good at finding the obvious mistakes quickly.

    The trouble starts past the obvious. A scanner does not know what your app is for. It does not know that a user on a free plan should never reach /api/v1/exports/full, or that order id=1043 belongs to a different account. It sees a request that returns 200 OK and moves on. To the scanner, a working feature and a broken access control check look identical.

    Why the report is full of maybes

    Because a scanner guesses from surface signals, it has to play it safe. If a payload causes any change at all, it tends to flag it so it does not miss a real bug. The result is a report with many items marked “possible” or “medium confidence,” and a real chance that most of them are false positives. Someone on your team then spends a day or two checking each one by hand to find the few that are real.

    That is the core problem. The scanner did the easy part and left the hard part, proving the bug, to you.

    A scanner tells you where something might be wrong. The expensive work, proving whether it really is, still lands on a human.

    How an AI vulnerability scanner that reasons is different

    UnboundCompute is built around a different loop. Instead of matching payloads against a list, it tries to understand the app first, then form ideas about where the logic could break, then run experiments to test those ideas, and only report a finding once it has proof. Understand, assume, experiment, verify, chain.

    Here is what that looks like in practice on an invented example. Say a typical SaaS app called Acme Notes lets users share a note by id:

    GET /api/notes/4471
    Authorization: Bearer <user A token>

    A pattern matcher checks that the response is valid and moves on. A researcher that reasons about the app notices the id is a plain number and forms an assumption: the server might be trusting the id in the URL without checking who owns the note. So it designs an experiment. It logs in as a second user, takes that user’s token, and asks for a note id that belongs to user A:

    GET /api/notes/4471
    Authorization: Bearer <user B token>

    If user B gets back user A’s private note, that assumption was correct. The tool does not stop at a hunch. It confirms the note content belongs to a different account, records the exact request and response as evidence, and only then reports it. That is an access control bug a payload list would never spot, because nothing in the request looks malicious. The request is perfectly well formed. The problem is what the app assumed.

    Proof before report

    The rule that changes the output is simple: a finding is only reported when it is proven with concrete evidence. No proof, no report. This flips the work. Instead of handing you candidates to verify, the tool does the verification itself and hands you the ones that survived. The output is signal rather than a stack of maybes.

    A confirmed finding can also be turned into a repeatable check, so the same test keeps running and tells you if the bug ever comes back after a fix or a refactor.

    A short comparison

    • How it finds bugs. A scanner matches known patterns. UnboundCompute forms an idea about the app’s logic and tests it.
    • What it understands. A scanner sees URLs and parameters. The researcher tries to learn what the app is meant to do and where that intent could break.
    • What it reports. A scanner reports candidates, many of them false positives. UnboundCompute reports findings it has already proven.
    • Who proves the bug. With a scanner, a human triages the list. Here, the tool runs the experiment and keeps the evidence.
    • The kind of bug it catches. Scanners are strong on known payload bugs. The researcher reaches logic and access control flaws that have no fixed payload.
    • After the fix. A proven finding becomes a repeatable check that watches for the bug returning.

    We go deeper on this split in scanners vs research, since it is the line that matters most when you are choosing a tool.

    Where we are honest about the limits

    None of this means scanners are useless. They are fast, cheap, and good at sweeping for the common, known issues. If you have never run one, run one. The point is that pattern matching has a ceiling, and the highest impact bugs usually live above it, in the assumptions an app makes about who you are and what you are allowed to do.

    It also does not mean UnboundCompute is finished. It is not. We are building it, and we are not going to dress that up with customer counts or benchmark charts we do not have. What we can say is an early, encouraging signal: a frontier model drove the full methodology on its own and identified and verified real access control and injection issues in test applications it had not seen before. That is a hint that the approach works, not a promise of a result on your app.

    Which one should you use

    Think of them as different jobs. A vulnerability scanner is a smoke detector for the known stuff, cheap to run and worth keeping on. An autonomous researcher is closer to a person who reads your app, asks “what if the server trusts this id,” and goes and checks. They answer different questions.

    If you take one thing from this, take the difference between a maybe and a proof. A maybe costs you time. A proof saves it. That gap is exactly what an autonomous researcher that tests assumptions is built to close. You can read more about who we are and where we are headed on our about page.

  • How UnboundCompute works, from understanding an app to proving a bug

    This post is a deeper look at how UnboundCompute does AI penetration testing, walked through one step at a time with a concrete example. UnboundCompute is an autonomous security researcher for web apps and APIs. Instead of running a fixed list of payloads, it learns how an app is meant to work, forms an idea about where that logic could break, and proves a finding before it ever reports one.

    To make the steps real, we will use an invented app called Acme Notes. It is a simple note taking SaaS where users sign up, create notes, and share them with teammates. No real system is being attacked here. Acme Notes exists only so we can show the method on something you can picture.

    Why AI penetration testing starts with understanding, not payloads

    Most scanners begin from a catalog of known attacks and fire them at every input. That finds the bugs everyone already knows to look for. It misses the bugs that come from a specific app making a specific assumption.

    UnboundCompute starts somewhere else. Before it tests anything, it reads Acme Notes the way a careful new engineer would. It maps the routes, the request shapes, and the rules the app seems to enforce. For Acme Notes, that means noticing things like this:

    • A note is fetched with GET /api/notes/{id}.
    • Sharing a note is a POST /api/notes/{id}/share with a teammate email in the body.
    • The app appears to assume that only a note owner can share that note.

    That last line is the interesting one. It is not a payload. It is an assumption the app is making. The whole method points at assumptions like that, because the bugs with the most impact usually live there.

    The highest impact bugs come from understanding the app, not from matching patterns. So the first job is to learn the app, then ask where its own rules might not hold.

    Form an assumption about where it could break

    Once the app is understood, the next step is a clear guess. Not a vague worry. A testable claim about one rule that might not be enforced everywhere.

    For Acme Notes, here is the assumption to challenge:

    • The app checks ownership when you read a note, but it may not recheck ownership when you share one.

    This is a guess about how access control can quietly fail. The read path and the share path were probably written at different times by different people. It is common for one path to enforce a rule that the other forgot. The guess is specific, so we can design a test that either confirms it or kills it.

    Design an experiment

    A good experiment isolates one variable. We want to know whether a user who does not own a note can still act on it through the share endpoint.

    So we set up two accounts in the test app, Alice and Bob. Alice owns a note. Bob does not. Bob has a valid session because he is a normal signed in user. The experiment is simple. Bob asks the share endpoint to operate on Alice’s note id.

    The point is control. Bob's request uses Alice's note id and Bob's own token, and nothing else changes, so any result we see is caused by the one thing we are testing.
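
    One way to keep an experiment honest is to check that the test request differs from a known good request in exactly one field. The Python sketch below assumes a control request in which Alice shares her own note. That control, the dictionary shape, and the <alice_session_token> placeholder are our assumptions for illustration.

```python
def changed_fields(control: dict, probe: dict) -> set:
    """Names of the top-level request fields that differ between two requests."""
    return {k for k in control.keys() | probe.keys() if control.get(k) != probe.get(k)}


# Control: Alice shares her own note (assumed baseline, expected to succeed).
control = {
    "method": "POST",
    "path": "/api/notes/9d2f/share",
    "token": "<alice_session_token>",
    "body": {"email": "bob@evil.test", "role": "editor"},
}

# Probe: the same request with only the caller swapped to Bob.
probe = {**control, "token": "<bob_session_token>"}

differences = changed_fields(control, probe)
is_controlled = differences == {"token"}
```

    If differences ever holds more than the token, the experiment is testing two things at once and its result cannot be trusted.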

    Verify with hard evidence

    This is the step that separates a real finding from a maybe. We do not report a guess. We run the experiment and look at what the app actually does.

    Here is the kind of request the experiment sends, using Bob’s session against Alice’s note:

    POST /api/notes/9d2f/share HTTP/1.1
    Host: acmenotes.test
    Authorization: Bearer <bob_session_token>
    Content-Type: application/json
    
    { "email": "bob@evil.test", "role": "editor" }

    Note 9d2f belongs to Alice. The token belongs to Bob. If Acme Notes were enforcing ownership on this path, the right answer would be 403 Forbidden and no change to the note.

    Proof is what the response shows. If the app instead returns this:

    HTTP/1.1 200 OK
    Content-Type: application/json
    
    { "note_id": "9d2f", "shared_with": "bob@evil.test", "role": "editor" }

    then the assumption was right and the bug is real. Bob, who never owned the note, just gave himself editor access to it. The evidence is concrete: a 200, the response naming Bob as an editor, and a follow up GET /api/notes/9d2f with Bob’s token now returning the note body. That follow up read is the part that turns a suspicious response into a proven one. We can see Bob holding access he should never have had.

    What counts as proof

    Proof is not a status code on its own. It is a short chain that any engineer can replay:

    • The exact request that should have been denied.
    • The response showing it was allowed.
    • A second request that confirms the new access is real, not just an echo.

    If any link is missing, the finding stays unproven and is not reported. No bug is reported until it is proven. That is the rule that keeps the output as signal instead of a stack of guesses someone else has to triage.
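
    The three links above can be sketched as a single Python function that returns evidence only when all of them hold. The in-memory model below is invented for the example: the share handler is vulnerable on purpose, users are identified by email for simplicity, and alice@acmenotes.test is a made up address.

```python
from typing import Optional

# Hypothetical in-memory model of Acme Notes; names and shapes are assumptions.
SESSIONS = {"alice_token": "alice@acmenotes.test", "bob_token": "bob@evil.test"}
NOTES = {"9d2f": {"owner": "alice@acmenotes.test", "body": "alice private note", "shared": {}}}


def read_note(note_id: str, token: str):
    user, note = SESSIONS[token], NOTES[note_id]
    if user == note["owner"] or user in note["shared"]:
        return 200, {"note_id": note_id, "body": note["body"]}
    return 403, {}


def share_note(note_id: str, token: str, email: str, role: str):
    # Vulnerable on purpose: the caller's ownership is never checked.
    NOTES[note_id]["shared"][email] = role
    return 200, {"note_id": note_id, "shared_with": email, "role": role}


def prove_share_bypass(note_id: str, token: str) -> Optional[list]:
    """Return the three-link chain, or None if any link is missing."""
    user = SESSIONS[token]
    if read_note(note_id, token)[0] != 403:
        return None  # caller already had access, so the experiment is invalid
    share = share_note(note_id, token, user, "editor")
    if share[0] != 200:
        return None  # the request was denied, as it should be
    follow_up = read_note(note_id, token)
    if follow_up[0] != 200:
        return None  # the 200 was only an echo; no real access was granted
    return [
        ("request", f"POST /api/notes/{note_id}/share as {user}"),
        ("response", share),
        ("follow_up", follow_up),
    ]


evidence = prove_share_bypass("9d2f", "bob_token")
```

    The first read matters as much as the last one. It shows Bob had no access before the share call, so the access he holds afterwards can only have come from the request under test.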

    Chain a confirmed finding into the next

    A proven finding is not the end. It is a new fact about the app, and facts open doors.

    Now that Bob can grant himself editor access to a note he does not own, the next question writes itself. What can an editor reach that a stranger cannot? If editors can read attachments, and attachments are served from a shared store, then the access control gap on sharing may lead to reading files that belong to other teams. So the next experiment targets that, using the access Bob just proved he can get.

    This is the chaining step. Each confirmed finding becomes the starting point for the next assumption, so a single broken rule gets followed as far as it really goes, with evidence at every step.
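
    A minimal way to picture chaining is a work queue in which only proven assumptions unlock their follow ups. The Python sketch below is an illustration under our own assumptions, not a description of the product's internals. The experiments return canned outcomes instead of sending requests, and the "editor can delete the note" assumption is invented to show a dead end.

```python
from collections import deque
from typing import Callable, Dict, List, Optional, Tuple


def run_chain(
    start: List[str],
    experiments: Dict[str, Callable[[], Optional[str]]],
    follow_ups: Dict[str, List[str]],
) -> List[Tuple[str, str]]:
    """Test assumptions in order; only proven ones unlock their follow-ups."""
    queue, seen, proven = deque(start), set(start), []
    while queue:
        assumption = queue.popleft()
        evidence = experiments[assumption]()  # None means "not proven"
        if evidence is None:
            continue
        proven.append((assumption, evidence))
        for nxt in follow_ups.get(assumption, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return proven


# Toy experiments with canned outcomes; real ones would send requests.
experiments = {
    "share skips ownership check": lambda: "200 on POST /share with Bob's token",
    "editor can read attachments": lambda: "200 on attachment read as Bob",
    "editor can delete the note": lambda: None,
}
follow_ups = {
    "share skips ownership check": [
        "editor can read attachments",
        "editor can delete the note",
    ],
}

proven = run_chain(["share skips ownership check"], experiments, follow_ups)
```

    An assumption that fails its experiment adds nothing to the queue, so the chain only ever grows from evidence.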

    A finding can become a repeatable check

    Once the share endpoint bug is proven and fixed, the proof does not get thrown away. The exact request and the expected 403 become a check that runs again later. If a future change reintroduces the gap, the check catches it. A confirmed finding turns into a small guard that keeps watching for the bug coming back.

    Where this stands today

    We are early and honest about it. The product is being built. We are not claiming customers, benchmarks, or finished results.

    What we can say is encouraging. A frontier model drove this full method on its own and identified and verified real access control and injection issues in test applications it had not seen before. We treat that as an early signal that the approach works, not as a final score.

    The Acme Notes walkthrough is the whole idea in one example. Understand the app, assume where it could break, design a clean experiment, verify with evidence you can replay, then chain the result into the next finding. This is exactly the kind of logic bug an autonomous researcher that tests assumptions is built to find. If you want the fuller picture of who we are and where we are headed, read more on our about page.

  • Why we are building UnboundCompute

    We started UnboundCompute because of a gap we kept running into. Most automated security testing checks a fixed list of known bugs and stops there. That misses the flaws that hurt most, the ones that need a real understanding of how an app works, like broken access control and business logic abuse. This post explains the gap, why we think it matters, and what we are betting on.

    We are early. The product is being built. We would rather tell you what we believe and why than sell you on results we have not earned yet. So this is a point of view, written plainly.

    What most automated security testing actually checks

    A normal scanner works from a catalogue. It knows what a reflected script looks like, what a classic injection string returns, what an outdated library version means. It sends those patterns at every endpoint it can find and reports the matches. This is genuinely useful. It catches the well understood bugs fast, on a schedule no human could keep, and it never gets tired.

    But notice what that approach assumes. It assumes the dangerous bugs all look like something the tool has seen before. Many do not. Consider a request like this:

    GET /api/orders/8841
    Authorization: Bearer trial-user-token
    
    HTTP/1.1 200 OK
    { "id": 8841, "owner": "another-account", "total": 1290 }

    There is no malformed payload here. No quote to break a query, no script tag, no signature to match. Yet the trial user just read an order that belongs to someone else. That is broken access control, and a pattern matcher has nothing to match against, because the request looks perfectly ordinary. The bug lives in the rule the app forgot to enforce, not in the shape of the input.

    Why automated security testing misses the bugs that matter

    The highest impact flaws come from understanding what an app is trying to do, then asking what happens when you bend one of its rules. Two examples make the point.

    Broken access control

    An app decides who is allowed to see or change what. When a check is missing, one user can reach another user’s data by changing an id in a URL, or reach an admin route that was never linked from the menu. To find this, you have to know who the current user is supposed to be and what they should not be able to touch. A fixed payload list does not carry that idea.

    Business logic abuse

    Logic bugs are harder to catch automatically, because the app is behaving exactly as written. The code is just wrong about its own rules. Picture a checkout that takes a discount code. A tool sending known strings will never think to apply the same code three times, or set the quantity to a negative number so the total drops below zero:

    POST /api/cart/apply
    { "code": "SAVE20", "quantity": -4 }

    Nothing about that request is malformed. It is a valid call that exploits a rule the app assumed no one would break. You only find it by understanding the flow first, then probing the assumption underneath it.
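
    To make the arithmetic concrete, here is a small Python sketch of a checkout that trusts the quantity it is given. The unit price, the meaning of SAVE20 as 20 percent off, and the handler names are all assumptions for the example.

```python
# Hypothetical pricing model; the unit price and the SAVE20 rate are assumptions.
UNIT_PRICE_CENTS = 2500
DISCOUNTS = {"SAVE20": 20}


def apply_vulnerable(payload: dict):
    """Trusts the quantity as sent, so a negative value flips the total."""
    pct = DISCOUNTS.get(payload["code"], 0)
    total = UNIT_PRICE_CENTS * payload["quantity"] * (100 - pct) // 100
    return 200, {"total_cents": total}


def apply_fixed(payload: dict):
    """Enforces the rule the app assumed: quantity must be a positive integer."""
    qty = payload["quantity"]
    if not isinstance(qty, int) or qty < 1:
        return 400, {"error": "quantity must be a positive integer"}
    return apply_vulnerable(payload)


def breaks_total_rule(handler, payload: dict) -> bool:
    """Proof condition: the call is accepted and the total drops below zero."""
    status, body = handler(payload)
    return status == 200 and body.get("total_cents", 0) < 0


probe = {"code": "SAVE20", "quantity": -4}
vulnerable_result = apply_vulnerable(probe)  # (200, {"total_cents": -8000})
```

    The probe is valid JSON with a valid code. What makes it a finding is the rule it breaks: an accepted order whose total is negative.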

    The bugs that hurt most are not strange inputs to known holes. They are ordinary requests that break a rule the app forgot to enforce.

    Why skilled humans cannot cover the gap alone

    Human testers find these bugs. A good one reads the screen, guesses the business rules, and chases behavior no rulebook predicted. That is exactly the kind of judgment a payload catalogue lacks. The problem is supply.

    • They are scarce. The people who are genuinely good at this work are few, and demand far outruns them.
    • They are expensive. A deep manual test is a serious cost, so most teams can only afford it once or twice a year.
    • They cannot keep up with shipping. Teams deploy many times a week. A test run once a year cannot see the code that shipped last Tuesday.

    So you end up with two options that each fall short. Scanners run constantly but miss the bugs that need understanding. Humans understand but cannot run constantly. The deeper version of this comparison lives in our scanners vs research category, which goes through where each one earns its keep and where it does not.

    Our bet: an autonomous researcher that tests assumptions

    Here is what we are building toward. Instead of a tool that matches known payloads, an autonomous researcher that works the way a thoughtful human tester does. It learns how the application is meant to behave. It forms ideas about where that logic could break. It designs experiments to test those ideas. Then it proves a finding before it ever reports it. Understand, assume, experiment, verify, chain.

    The order of those words matters. Understanding comes first, because the bugs we care about only appear once you know what the app expects. The verify step matters just as much. A finding is only reported when it is backed by concrete evidence, so the output is signal, not a pile of maybes. Take the order example above. The researcher would not flag a “possible” issue. It would replay the request, show the other account’s data coming back, and hand you a result you can reproduce.

    That last step changes the cost of reading a report. Every false alarm costs someone an hour of triage, and after enough of them people stop reading. Proof cuts the noise. And once a finding is confirmed, it can become a repeatable check that keeps watching for the same bug coming back after a future deploy.

    Where we are, honestly

    We will not pretend we are finished. We are early, and the product is being built. We have no customers to name, no benchmark to wave around, and we are not going to invent one.

    What we can share is an early signal that keeps us going. A frontier model drove the full methodology on its own and identified and verified real access control and injection issues in test applications it had not seen before. We frame that as encouraging, not as proof. It is enough to tell us the bet is worth making, and not enough to claim the work is done.

    Why this is worth building

    The pattern is hard to ignore. Software ships faster every year. The bugs that cause the worst days, the leaked records and the abused workflows, are the ones that need understanding, not pattern matching. Scanners cannot supply that understanding, and skilled humans cannot supply enough of their time. Something has to test the assumptions an app makes, at the speed teams now ship, and prove what it finds before it interrupts anyone.

    That is the thing we are trying to build. An autonomous researcher that tests the assumptions your app makes and reports a bug only once it is proven. We are early and we know it, but this is the gap worth closing, and it is why UnboundCompute exists. If you want to follow along or tell us where we are wrong, read more on our about page.

  • Meet UnboundCompute, an autonomous security researcher for web apps and APIs

    UnboundCompute is an autonomous security researcher for web apps and APIs. It reads an application the way a careful person would, builds a picture of how the app is meant to behave, then goes looking for the places where that intent quietly falls apart. This post explains who it is, what it does, and where it fits in the wider story of autonomous penetration testing.

    What UnboundCompute actually is

    Think of it as a researcher that never gets bored and never stops reading. You point it at a web app or an API. It studies the routes, the parameters, the responses, and the rules the app seems to enforce. From that it forms a working model of the system: who is supposed to do what, which actions need permission, and which inputs the app trusts.

    That model is the whole point. Most bugs that matter are not missing patches. They are gaps between what the app intends and what it allows. A user who can read another user’s invoice by changing one number in a URL. An endpoint that checks your login but forgets to check whether the record belongs to you. These are logic gaps, and you only see them once you understand the logic.

    The loop: understand, assume, experiment, verify, chain

    UnboundCompute works in a loop. Each step feeds the next, and the loop keeps tightening until there is either a proven finding or nothing left to test.

    Understand

    First it learns how the app is meant to work. It maps the surface and reads the behavior. If GET /api/orders/1042 returns your order, the researcher notes that orders are addressed by a simple number and asks the obvious follow up: what enforces that 1042 is yours?

    Assume

    Next it forms ideas about where the logic could break. This is the part a fixed checklist cannot do. The researcher reasons about the app in front of it, not a generic template. For an orders endpoint it might assume that the app checks who you are at login but not whether each record belongs to you. For a password reset flow it might assume the token is predictable or reusable.

    Experiment

    Then it designs a test for each idea and runs it. One assumption, one experiment. For the ownership idea, it requests a record that belongs to someone else:

    GET /api/orders/1043
    Authorization: Bearer <a different user's session>

    If that returns someone else’s order, the assumption held and there is a real access control bug to confirm.

    Verify

    This is the step that separates a researcher from a noise machine. A guess is not a finding. UnboundCompute only reports something when it can prove it with concrete evidence, the request that triggered the behavior and the response that shows the impact. The output is signal, not a pile of maybes you have to sort through by hand.

    A finding is only worth reporting when you can show the exact request that proves it. Everything else is a guess wearing a confident face.

    Chain

    Single bugs are useful. Chained bugs are how real damage happens. Once a finding is verified, the researcher asks what it opens up. A leaked email here, a guessable identifier there, an endpoint that trusts a value it should not. On their own each looks minor. Together they can add up to a full account takeover. Because UnboundCompute carries its model of the app through the whole loop, it can connect one verified result to the next instead of treating every test as a fresh start.

    Why this beats a scanner that checks a known list

    A traditional scanner is a list reader. It carries a set of known signatures and fires them at every input it finds. That has real value for catching the obvious and the already known. It also has a hard ceiling. A scanner that only checks a known list cannot find a bug that is not on the list, and the bugs that hurt most are almost never on any list.

    Here is the difference in one example. A scanner sends a SQL injection string at /search?q= and checks whether the response looks like a database error. Useful. But it will happily pass an endpoint like this:

    POST /api/account/transfer
    { "from": "acct_self", "to": "acct_other", "amount": 500 }

    There is no payload to match here. The bug, if there is one, is that the server never checks whether the account named in from actually belongs to the caller. No signature catches that. You catch it by understanding what the endpoint is for and testing the assumption it makes about who is calling it. We write more about this split between checking and researching in our scanners vs research category.

    • A scanner asks: does this input match a known bad pattern?
    • A researcher asks: what does this app assume, and what happens when that assumption is false?

    Both questions are fair. The second one is where the high impact findings live, and it is the question UnboundCompute is built around.

    Where this sits in autonomous penetration testing

    Autonomous penetration testing is the idea that a system can plan and run its own security tests, not just replay a script. UnboundCompute fits there, but with a specific stance: the value is not in running more checks faster. It is in reasoning about the target, testing assumptions, and proving impact before saying a word.

    Verification also pays off after the first run. Once a finding is confirmed, it can become a repeatable check that keeps watching for the same bug coming back. So the work is not throwaway. A proven issue today becomes a guard against regressions tomorrow.

    Where we are right now: honest version

    We are early. The product is being built, and we are not going to dress that up. We have no customer numbers to share, no benchmark to wave around, and we are not promising results we cannot back.

    What we will say is this. In our own testing, a frontier model drove the full methodology on its own and identified and verified real access control and injection issues in test applications it had not seen before. We read that as an early, encouraging signal that the approach holds, not as a benchmark and not as proof. There is a long way to go from a good signal to a tool you can rely on every day, and that gap is the work in front of us.

    The short version

    UnboundCompute is a security researcher that runs on its own. It learns how an app is meant to work, guesses where the logic breaks, tests those guesses, and only reports what it can prove. That is a different job from a scanner reading a list of known payloads, and it is the job we think matters most. If you want to know who is building this and why, read more on our about page.