When AI writes the code, who validates its security?

This post walks through what the survey data says, shares insights from the Office Hours webinar with Dragos Sandu (Product Manager, Pentest-Tools.com) and Radu Popovici (Head of Engineering, Pentest-Tools.com), and explains what the findings imply for compliance teams who keep asking the same question: can we prove this is safe to ship?
The headline numbers
Three findings stood out, and each one connects to the next:
About 76% of developers use AI coding tools constantly, and their organizations are actively pushing adoption. Cautious or limited use is now the minority position.
20% report finding vulnerabilities in AI-assisted code after deployment, either always or often. Another 31% see it sometimes.
Only 9% say vulnerability testing keeps pace with development completely. 45% say it sometimes or frequently falls behind.

The story isn't that AI generates bad code. It's that the volume and pace of code reaching production has gone up, and the validation window has not stretched to match.
More code, more endpoints, more places to check
Three-quarters of respondents said they use AI coding tools every day. That changes the shape of a codebase. More boilerplate, more suggested logic, more endpoints, more integrations. Each one is a path that needs review.
One developer put it this way:
“AI has increased productivity and reduced basic and common errors, but depending on the model, it can introduce subtle, harder-to-detect vulnerabilities that make things tricky.”
The risk isn't that the code is worse. It's that there's more of it, and a larger system footprint means more surface area for an attacker. Security teams already knew this problem. AI just made the inputs arrive faster.
“If we leave it as is and just keep pushing code, we're going to end up in a snowball effect where we leverage AI heavily to write code, and then we're going to leverage AI heavily to read the code and try to figure out what's wrong, and then do a massive refactor.” - Radu Popovici, Head of Engineering, Pentest-Tools.com
The bugs are getting quieter
Practitioners told us the kinds of vulnerabilities they're seeing have changed. Fewer obvious mistakes - the kind a linter or a basic scanner would catch - and more of what one respondent called "gaps":
“They seem to be more about inconsistencies between what really is set up and what the AI thinks it's set up. They're more 'gaps' than outright vulnerabilities.”
That includes logic errors, insecure configurations, and integration bugs that only surface when systems interact. These don't show up in static analysis. They show up in production, when a real request hits a real API in a real sequence.
“Fewer basic implementation mistakes, but more copied patterns with weak auth checks, unsafe input handling, insecure defaults, and risky dependencies. It’s shifted vulnerabilities from obvious bugs to harder-to-spot review failures.” - Dragos Sandu, Product Manager, Pentest-Tools.com
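To make the idea of a "gap" concrete, here is a minimal sketch of one shape it can take: a handler that checks whether a user is logged in but not whether the record belongs to them. Everything in it (the `InvoiceStore` class, the user names, the invoice ID) is hypothetical and invented for illustration; it is not taken from the survey or from any real codebase. Each function looks reasonable on its own, and the problem only appears when two users interact with the same resource in sequence.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class InvoiceStore:
    """Hypothetical in-memory stand-in for a real invoices API."""

    invoices: dict = field(default_factory=dict)

    def create(self, user: str, invoice_id: str, amount: int) -> None:
        self.invoices[invoice_id] = {"owner": user, "amount": amount}

    def read_weak(self, user: Optional[str], invoice_id: str) -> dict:
        # Authentication only: any logged-in user passes this check.
        if user is None:
            raise PermissionError("login required")
        return self.invoices[invoice_id]

    def read_strict(self, user: Optional[str], invoice_id: str) -> dict:
        # Authentication plus ownership: the check the weak version skips.
        if user is None:
            raise PermissionError("login required")
        invoice = self.invoices[invoice_id]
        if invoice["owner"] != user:
            raise PermissionError("not your invoice")
        return invoice


def cross_user_read_allowed(read: Callable, store: InvoiceStore) -> bool:
    """Return True if 'bob' can read an invoice that 'alice' created."""
    store.create("alice", "inv-1", 120)
    try:
        read("bob", "inv-1")
        return True
    except PermissionError:
        return False
```

Calling `cross_user_read_allowed(store.read_weak, store)` returns `True`, while the same call with `store.read_strict` returns `False`. A test that only exercises one user against their own invoice would pass for both versions, which is why this kind of issue tends to surface when the running system is tested with realistic request sequences.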
The recent compromise of the axios npm package is a useful reference point. Trusted dependencies, embedded deep in the supply chain, can introduce issues that standard review processes struggle to catch. AI-assisted code adds another layer to that problem: the suggestion looked reasonable, the review didn't flag it, the test passed, and the issue only shows up later, in context.

The validation window keeps shrinking
Here's where the data lines up uncomfortably. 31% of developers said they don't have enough time to review AI-generated code before deployment. Only 49% said they do. The rest were neutral.
Reviews still happen. But not always to the depth required to catch context-dependent issues, especially when expectations have shifted. Roughly 37% of developers reported increased pressure to deliver more code since adopting AI tools.

The result is predictable. When validation can't keep pace with code generation, vulnerabilities get found later. Later means more complex to fix, more expensive to remediate, and more likely to create real exposure before anyone notices.
“We're going to leverage AI heavily to write code, and then we're going to leverage AI heavily to read the code and try to figure out what's wrong, and then do a massive refactor. ... It is a shiny new hammer, and once you hold a hammer, everything looks like a nail.” - Radu Popovici, Head of Engineering, Pentest-Tools.com
What this means for compliance teams
The compliance angle isn't separate from the engineering one - it's downstream of it.
Auditors care about three things, regardless of how the code got written: proof a vulnerability existed, proof it was remediated, and proof the testing was repeatable. Raw scan output rarely meets that bar. Neither does a passing pull-request review on AI-suggested code that ships anyway.
The teams we work with are converging on a few practical responses:
Treat AI-assisted code the same as any third-party dependency. Assume it might introduce something the original developer didn't fully understand. Validate accordingly.
Move validation closer to the merge, not after deploy. Pre-release scans, scheduled retests, and diff-based monitoring all help close the window.
Capture evidence as a normal part of testing, not as audit prep. If you can't reproduce a finding, you can't defend the fix. If you can't defend the fix, the audit conversation gets harder.
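The last two responses, diff-based monitoring and evidence capture, can be sketched in a few lines. This is an illustration under stated assumptions, not how any particular scanner works: the finding fields (`target`, `check`, `proof`) and the function names are hypothetical, and a real pipeline would need a more robust way to match findings across scans.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Optional


def finding_key(finding: dict) -> tuple:
    """Identify a finding by target and check name (assumed fields)."""
    return (finding["target"], finding["check"])


def diff_scans(previous: list, current: list) -> dict:
    """Classify findings as new, fixed, or persisting between two scans."""
    prev = {finding_key(f): f for f in previous}
    curr = {finding_key(f): f for f in current}
    return {
        "new": [curr[k] for k in sorted(curr.keys() - prev.keys())],
        # "fixed" only means "absent from the latest scan"; a retest
        # should confirm it before it is reported as remediated.
        "fixed": [prev[k] for k in sorted(prev.keys() - curr.keys())],
        "persisting": [curr[k] for k in sorted(curr.keys() & prev.keys())],
    }


def evidence_record(
    finding: dict, status: str, scanned_at: Optional[datetime] = None
) -> dict:
    """Build a record with a digest so later edits to it are detectable."""
    when = scanned_at or datetime.now(timezone.utc)
    record = {
        "target": finding["target"],
        "check": finding["check"],
        "status": status,
        "scanned_at": when.isoformat(),
        "proof": finding.get("proof", ""),
    }
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    record["sha256"] = hashlib.sha256(payload).hexdigest()
    return record
```

Running `diff_scans` on every scheduled retest and storing an `evidence_record` for each new or fixed finding produces a timeline of when an issue appeared, what proved it, and when it stopped reproducing. That timeline is created as a side effect of normal testing rather than assembled before an audit.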
For more context, we cover the operational side of this on our compliance use case page.
A few honest limitations
It's worth saying what this survey doesn't tell us.
It doesn't tell us whether AI-assisted code is more vulnerable on average than human-written code. The data points to where vulnerabilities are found, not where they originate. It also doesn't account for organization size, regulatory environment, or maturity of the existing testing pipeline - any of which can change how the same numbers feel in practice.
“Now if you have this amazing tool that's a hammer, but also a wrench, but also a screwdriver, and is good for everything, you kind of lose familiarity with that code base.” - Radu Popovici, Head of Engineering, Pentest-Tools.com
What the data does tell us: the validation gap is real, the practitioners closest to the work are aware of it, and the response so far is partial. Most teams are using more AI than their security review processes were designed for, and they know it.
What we're doing about it
AI-assisted coding doesn't just produce more code. It produces more deployed attack surface, faster than anyone can validate it - that's the layer we work on. We won't claim it's a complete answer, but it's part of one.
Our focus is closing the time between an application reaching production and someone confirming whether it has an exploitable weakness. That means testing running systems like deployed applications, exposed services, networks, and the dependencies they bring in. It's the layer attackers actually reach, and the layer where validation evidence actually comes from.
That's why our Sniper Auto-Exploiter generates proof for confirmed findings, why the Machine Learning Classifier cuts false positives, and why our vulnerability monitoring is built around scheduled retests and diffing rather than one-time scans. This is exactly the gap Adversarial Exposure Validation (AEV) addresses: turning 'we detected something' into 'we confirmed it can be exploited.' And this is where Pentest-Tools.com shines.
Human reviews aren't perfect; we should know. Our tools act as a backup so that if a mistake slips past one review, there's a better chance of catching it before an attacker does.
Get the full story
Download the survey results for the full picture. If you'd rather see the conversation we had about this internally, our webinar with Dragos Sandu (Product Manager) and Radu Popovici (Head of Engineering) goes deeper on the engineering and compliance angles. Same data, more context!







