The code review that nobody wrote
Picture this: You’re reviewing a pull request. The code is clean: it’s properly formatted and sensibly structured, and tests are passing. But you can’t explain why it’s shaped this way.
You check the commit history: there’s one author, one commit, and minimal messages. You check the ticket: there are high-level acceptance criteria, but nothing about approach. So you ping the author to ask about the reasoning.
Their response: “AI generated most of it. I tweaked a few things, but honestly, I’m not sure why it went this direction. It works though.”
Even though it works, the tests pass, the build is green, and the logic appears sound, you still can’t answer why this particular approach was taken, what alternatives were considered, or what constraints shaped the design. And maybe you don’t need to know every decision that was made, but you do need to trust that it could be explained. Understanding of intent is missing: not just what the code does, but why it exists in this form.
And this is where things get uncomfortable. Just because we can generate something quickly doesn’t mean we should accept it as-is. Questions of reusability, scalability, and alignment with existing patterns don’t disappear; they just get easier to skip. The code works, but does it belong?
This scenario is showing up more often. If you’ve felt this unease, that things are moving faster but your ability to understand, explain, and stand behind what ships isn’t keeping pace, you’re noticing something real. The gap between what was intended, what is understood, and what is running in production is widening in many teams, and we haven’t yet adapted how we maintain alignment across that gap.
The problem isn’t building anymore; it’s knowing
Modern tooling, including AI, has changed the economics of software delivery. What used to take days and require deep domain knowledge now takes hours and can be bootstrapped with a well-crafted prompt.
The time and effort to produce working code has been compressed. But where is the thinking time? Where is the pause to ask, “Why this?” or even, “Should we do this at all?”
This creates a problem that’s easy to miss if you’re measuring success purely by delivery velocity. The challenge isn’t getting something to exist anymore. It’s staying oriented once it does.
The constraint shifts from “Can we build this?” to “Can we keep up with what we’ve built, understand it, and decide whether we should continue shaping it this way?”
In the past, the difficulty of creation forced understanding. You couldn’t ship something without building up mental models along the way. You learned the shape of the problem and the rationale behind the solution from the struggle. Understanding was a byproduct of effort.
AI can short-circuit that process, offering you a solution without requiring you to work through the problem. This is powerful, but it means that understanding is no longer guaranteed. It’s now something you have to deliberately create and maintain, rather than something you accumulate.
And this shift is happening under real delivery pressure: sprint commitments, demo deadlines, roadmap expectations, etc. Teams are making trade-offs, like accepting technical debt they know will show up later, because the organisational context demands speed.
Faster creation, unchanged responsibility
A concept that lived in a slide deck or Figma can now be a functioning demo in an afternoon. That’s extraordinary. But the harder jump, from pitch to scalable, maintainable, and operable software that teams can confidently evolve, is where integrity either exists or quietly erodes.
Responsibility hasn’t moved. It hasn’t been delegated.
But accountability has become blurred.
We’re now operating in a space where work can be delegated, generated, or accelerated, but accountability still lands with the people closest to the outcome. The question is no longer just “Who built this?” but “Who stands behind it?”
You’re still responsible for understanding what the system does, explaining why it’s structured the way it is, diagnosing failures when production goes down, evolving it safely as requirements change, and onboarding new team members who need to make sense of it.
AI doesn’t assume those responsibilities. AI amplifies the ability to create, but it doesn’t maintain context. We still need to manage the distance between what was intended, what is currently understood, and what is running in production. And as the pace of creation accelerates, that gap widens unless we actively work to close it.
Application integrity isn’t something you declare. It’s something you earn.
You might hear application integrity described as assurance that software works as intended. That’s the outcome. The harder question is how that assurance is actually earned.
It’s the confidence that outcomes match what was intended, that the team understands the system’s behaviour, and that structure reflects intent. It’s a property of the whole system: code, decisions, knowledge, and the relationships between them, not just the correctness of individual components.
It shows up in operational terms:
- Confidence to change the system without unexpected breakage
- Confidence to answer “Why does this work this way?”
- Confidence to recover when things go wrong because you understand how they’re connected
When speed outpaces situational awareness
This gap shows up in very real moments:
- During incident response: Production is down and you’re digging through the codebase trying to understand why a particular module behaves the way it does. The logic is there, but the reasoning isn’t, so you can’t reconstruct what assumptions were made or what constraints drove the design. The time pressure is intense, and without context, debugging becomes archaeology.
- During code review: You’re looking at a change that’s technically correct but conceptually unclear. The author is competent, the tests pass, but you can’t articulate what problem the code solves or why this approach was chosen. So, what do you do: approve it anyway, ask for context that might not even exist, or hold it while you reverse-engineer the intent?
- During handoff or onboarding: A new team member asks: “Why did we build this feature this way? What were we optimising for?” The answers either don’t exist or are scattered across Slack threads, commit messages, and conversations that are already fading.
These aren’t failures of skill, but symptoms of a mismatch between creation pace and the mechanisms used to maintain shared understanding. When you’re pushing to hit a sprint deadline, that mismatch becomes acute.
If we apply AI to systems that already lack shared understanding, we don’t fix the problem. We amplify it.
The core discomfort is about losing the confidence that you know what you know, that you can point to sources of truth, and that you can give an answer when asked “How does this work?” or “Why does this exist?”
When that grounding erodes, several operational consequences follow:
- Decision-making becomes harder: You can’t confidently say, “Yes, this is the right change” or “No, this will break something” because you don’t have the full picture.
- Risk tolerance shifts: Either you become overly cautious (because you don’t trust your understanding) or overly reckless (because you’ve normalised not understanding). Both are problematic.
- Team cohesion suffers: Shared understanding allows teams to move quickly. Without it, everything requires more meetings, clarification, and rework.
The question showing up in retrospectives and quiet moments: “Have I been taken off the work but remain on the hook for it?”
Sometimes the answer is yes. Not because AI is replacing judgement, but because generation speed can outpace comprehension speed, leaving us shepherding systems we don’t fully understand.
Collaboration is no longer just human and machine
We talk about “human-AI collaboration” as if it’s straightforward: you provide intent, AI provides implementation, everyone wins. The reality in practice is more nuanced.
AI introduces a mediation layer that changes how we relate to the software we build. Three specific shifts are showing up:
1. The code has an author, but not always an owner
In traditional development, authorship is clear. If I write a function, I know and can explain why it exists, what it’s meant to do, and what edge cases I considered.
With AI-generated code, authorship is distributed. The human provided the prompt and AI generated the implementation. The human then reviewed and merged it, but who owns the decisions embedded in the code?
This shows up in code review: “Who decided to use this algorithm?” “Who chose this error-handling approach?” If the AI made the choice and the human didn’t interrogate it, does anyone actually own the decision?
Ownership has become uncertain, which then affects maintenance, refactoring, and debugging, all the moments when you need to understand why something is the way it is.
2. Intent gets encoded implicitly
Traditional code carries intent through structure, naming conventions, comments, and accumulated patterns. When you read well-crafted code, you can infer reasoning even without explicit documentation.
AI-generated code can be syntactically clean but semantically orphaned. It solves the stated problem, but may not reflect deeper intent, organisational context, or constraints that should have shaped it. It’s code that works but doesn’t clearly belong.
This is subtle. The code isn’t wrong, it’s just disconnected. Design decision traceability breaks down. Six months later, when someone asks why the code was built this way, the answer may be: “Because AI suggested it and nobody questioned it.”
3. Trust shifts from “I built it” to “Claude built it”
The confidence you have in something you’ve built from scratch comes from direct experience, whereas the confidence in something you’ve directed someone else to build is achieved through verification.
When you delegate to AI, you’re relying on a different kind of trust. You didn’t trace the logic or write the tests that validate it. You prompted, reviewed, and approved, but you didn’t construct.
That’s workable. Except verification is harder when you don’t know what to verify. If you don’t fully understand the problem space, how do you assess whether the solution is appropriate? You can check for correctness and performance, but can you check for alignment with intent? For maintainability? For resilience under conditions you haven’t thought to test?
Generated test coverage presents its own version of this problem (marking your own homework). When AI writes both code and tests, you have circularity. The tests validate that the implementation matches the generated behaviour, but not necessarily that the behaviour matches actual requirements or handles real-world edge cases.
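To make the circularity concrete, here’s a contrived Python sketch. The discount rule, function name, and bug are all invented for illustration; the point is only that a test derived from the implementation’s actual behaviour can encode a bug as “correct.”

```python
# Stated requirement: apply a 10% discount to orders of 100 or more.
def discounted_total(amount: float) -> float:
    # Bug relative to intent: a strict comparison means an order of
    # exactly 100 gets no discount.
    if amount > 100:
        return amount * 0.9
    return amount

# A test written from the implementation's observed behaviour passes,
# because it validates what the code does, not what it should do.
def test_discounted_total():
    assert discounted_total(150) == 135.0
    assert discounted_total(100) == 100  # encodes the bug as "correct"

test_discounted_total()

# A test written from the stated requirement would fail:
# assert discounted_total(100) == 90.0
```

Breaking the circle requires an independent source of truth for the assertions: the ticket, the requirement, or a human who understands the intent, rather than the generated code itself.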
This is the collaboration dynamic. It’s not just human and machine. It’s intention ↔ outcome, with AI as the mediator. And mediation introduces distance that has to be actively managed.
Integrity depends on signals, not just output
The pace of AI-assisted development makes it easier to slip into the faulty assumption that if something works, it’s good. If tests pass, builds are clean, deployment succeeds, and everything looks green, confidence is established, right?
Not quite.
Traditional signals of confidence are getting noisier. They still matter, but they don’t tell the whole story when generation is fast and understanding is shallow.
- Passing tests: Many tests validate behaviour, but they don’t necessarily validate intent. A test suite can be comprehensive and green while the underlying implementation is fragile, overly complex, or misaligned with actual use cases. If the AI generated both code and tests, you’ve validated that the code does what it says it does, but not that it does what it should do.
- Clean builds: Syntactic correctness is baseline. The code compiles. The linter is happy. This tells you nothing about conceptual coherence, maintainability, or whether the design is sustainable as the system grows.
- Successful deployments: This confirms the artifact can run, not that it’s running the right thing in the right way.
These signals used to be reliable proxies for quality because they correlated with understanding. If you wrote the code, wrote the tests, and saw it deploy successfully, you had earned confidence. When those steps are mediated by AI, the correlation weakens.
Green is not the same as safe. Passing is not the same as understood. Coverage is not the same as confidence.
So, what signals give us confidence in the integrity of our application?
- Explainability: Can someone on the team explain why the code is structured the way it is? Can they walk through reasoning, trade-offs, and alternatives that were considered?
- Traceability: Can you trace from intention (the ticket, the requirement) through implementation to outcome? Is there a clear line of reasoning, or did things “just happen” in the middle?
- Shared understanding: Does the team agree on what the system is, how it behaves, and why it’s shaped that way? Or has knowledge fragmented into silos where different people have different mental models?
- Resilience under change: When requirements shift or new edge cases emerge, can the system adapt? Or does every change feel like invasive surgery because nobody’s confident about what might break?
These aren’t new concepts. They’ve always been part of sound engineering. But they’re harder to maintain when creation is fast and understanding is deferred.
This is where quality thinking naturally belongs: not as a testing function, not as a gate, but as a practice of continuously asking, “Do we know what we’ve built, and are we confident it does what we intended?”
One way to think about quality is as the removal of unnecessary friction: between intention and outcome, between what we think the system does and what it actually does, and between what we can explain and what we’re responsible for.
Shared responsibility, contextual balance
Maintaining application integrity is a shared responsibility across multiple layers, and the balance will look different depending on organisational context, maturity, and risk tolerance.
Practitioners: Maintaining grounding
As individual contributors, we’re on the front lines reviewing and prompting AI-generated code, integrating it, debugging it, and evolving it. We have agency here, even when the system incentivises speed.
What we can do:
- Ask “why?” more often: Even if the code works, ask why it’s shaped this way. If you can’t answer, flag it.
- Slow down at decision points: When you’re about to merge something you don’t understand, take a pause.
- Treat understanding as a deliverable: Documentation, comments, and design notes aren’t “nice to haves.” They’re part of the work.
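As one lightweight way to treat understanding as a deliverable, reasoning can live next to the code it explains. This is a hypothetical sketch; the function, the rejected alternatives, and the product decision are invented for illustration.

```python
def merge_user_profiles(primary: dict, secondary: dict) -> dict:
    """Merge two user profiles, preferring non-empty values from `primary`.

    Why this shape (decision notes, not just behaviour):
    - Last-write-wins was rejected: users expect the account they
      log in with (`primary`) to take precedence.
    - Deep merging was considered and deferred; nested settings are
      replaced wholesale to keep behaviour predictable.
    """
    merged = dict(secondary)
    # Only non-None values from `primary` override `secondary`.
    merged.update({k: v for k, v in primary.items() if v is not None})
    return merged
```

Six months later, the docstring answers “why this approach?” without an archaeology dig through Slack threads.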
We’re not helpless. But we also can’t shoulder this alone.
Teams: Creating space for understanding
Teams set cultural norms, so if the implicit message is “ship fast, understand later,” that’s what happens. If the message is “ship thoughtfully, understand continuously,” behaviour changes.
What teams can do:
- Normalise questions about intent: Make it safe to say, “I don’t understand why we’re doing this,” without it being seen as a blocker or weakness.
- Allocate time for context-building: Sprint planning, retrospectives, and design reviews are opportunities to build shared understanding.
- Reward clarity, not just velocity: Celebrate the PR that’s easy to understand, not just the one that closes the most tickets.
Leadership: Resourcing for integrity, not just velocity
Leaders control the constraints: what gets measured, rewarded, and funded. Those choices, across tooling, incentive structures, and metrics, shape behaviour, which shortcuts get taken, and how much risk teams accept.
If leadership signals that velocity is the only metric that matters, integrity suffers. This shows up as incidents, rework, turnover, and burnout. The cost compounds.
What leadership can do:
- Recognise the cost of speed without understanding: It may show up later, but it does show up.
- Resource for sustainability: Give teams time to build understanding, refactor, document, and onboard.
- Model the behaviour: Ask questions about intent. Express uncertainty when you don’t understand something.
Tooling: Transparency in generation
AI tools themselves have a role. The more opaque the generation process, the harder it is to achieve application integrity.
What tooling can provide:
- Contextual explanations: Why this approach and what alternatives were considered?
- Traceability: Link generated code back to the prompts, requirements, and constraints that shaped it.
- Confidence indicators: Not just “here’s the code,” but “here’s the code, and here’s how confident you should be in various aspects of it.”
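One low-tech version of traceability is possible today without waiting for tooling: stamp AI-assisted commits with trailers that link the code back to the ticket and the prompt that shaped it. The field names below are made up for illustration; no real tool or convention is assumed.

```python
def generation_trailer(ticket: str, prompt_ref: str, model: str) -> str:
    """Format git-style commit trailers recording generation context,
    so a reviewer can trace implementation back to intent."""
    return "\n".join([
        f"Generated-With: {model}",
        f"Prompt-Ref: {prompt_ref}",
        f"Intent-Ticket: {ticket}",
    ])

print(generation_trailer("PROJ-1234", "prompts/retry-policy.md", "example-model"))
```

Trailers like these are machine-readable, so they can later feed dashboards or review checklists that ask, “Where did this code come from?”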
Some tools do pieces of this, but this is the direction the ecosystem is likely to move.
The balance of responsibility will look different depending on context. A startup moving fast in greenfield territory may accept more uncertainty than an enterprise maintaining critical infrastructure. A highly regulated industry may demand more rigour than a consumer app.
There’s no universal prescription, but there is a principle: application integrity is a collective property, not an individual burden.
Moving forward with confidence
If you’ve read this far and you’re feeling uneasy, that’s not necessarily a problem. Unease means you’re paying attention. You’re noticing the gap between what’s possible and what’s sustainable.
We’re not going to slow down. AI-assisted development is here, and it’s going to get more capable, not less. The expectation of velocity will increase, even as the relationship between code output and actual delivery value becomes less direct.
We’re not going to pretend the old model still fits. The mental models and practices that worked when creation was slow don’t map cleanly onto a world where creation is fast. We can’t just “do testing better” or “write more documentation” and expect that to be enough.
We need new practices for staying oriented at speed. This isn’t about replacing humans with AI. It’s not about rejecting automation. It’s about building practices, norms, tooling, and cultures that maintain alignment between intention, understanding, and production, even when generation is fast.
Application integrity becomes visible through the questions we ask, such as:
- Do we understand what we’ve built?
- Can we explain why it’s shaped this way?
- Are we confident it does what we intended?
These questions don’t have binary answers. They have degrees, and the degree matters.
Achieving application integrity is an evolving practice, not a solved problem. It’s something we’ll continue to refine as tools improve, as understanding deepens, and as we learn what actually works at the intersection of speed and sustainability.
We don’t have to choose between velocity and understanding. The path forward is about building technical, cultural, and organisational systems that allow both to coexist. That’s always been the work. It just looks different now.