Make the AIs argue with themselves

By Magnus Hultberg • 18 February 2026

Last edited: 18 February 2026

Last week I attended a Claude Code enthusiast meetup, brilliantly organised as always by Max - his write up is well worth a read if you missed it.

One idea from the Anthropic speaker stuck with me immediately. The "agree to disagree" approach: pit two agents against each other with opposing goals - one writes code, another evaluates it, and the cycle can't close until the reviewer is satisfied. We do this as humans, critiquing papers, reviewing pull requests...

It maps directly onto how great human teams work. We've all been too close to our own work at times, and appreciated someone who tears it apart constructively. That creative tension is where quality lives. So I went home and tried it.

I built two skills in Claude Code:

/review-pr-team and
/review-pr

The team version I use in the early part of the process, when I'm still in planning mode. I commit my feature plan as Markdown to the repo, raise a pull request, and the skill spawns three sub-agents to review it: a security-obsessed BOFH, a product manager, and a technical architect. Each reviews the plan from their own perspective, it's all synthesised to one comment and posted on the PR.

Once I've addressed their critique and move into implementation, I run an implementer session alongside a separate one where I run the /review-pr skill, which spawns a single "senior full stack engineer" to review every pull request. The implementer and reviewer communicate entirely through PR comments, no intervention from me.

The result? It's genuinely fascinating to watch. The back-and-forth surfaces issues I certainly wouldn't have caught on my own (hey, I'm the ideas man...), keeps the implementation honest, and creates a sense of collaborative momentum that I haven't experienced before when using Claude Code.

Is it token-hungry? Yup. Does it take longer? Sure. But for anything beyond a quick throwaway feature, the uplift in quality and coherence feels valuable. Less "fast food code", more considered craft.

Teams are greater than the sum of their parts. Turns out that holds for agents too.