The AI that refuses to think for you

May 2026 · 9 min read

Most AI tools optimize for getting you to the answer fast. That’s great for productivity and terrible for learning. Notes on a framework built to change the default.

In 2024, Dell’Acqua and colleagues at Wharton ran a controlled trial that should have made more noise than it did. Knowledge workers used AI assistants for a week of normal work. Then the AI was taken away, and they were asked to do the same kind of tasks unassisted.

They were 17% worse than before they started using it.

Not 17% slower without the tool, which would be unsurprising. Worse at the underlying task. Their independent capability had quietly degraded over the course of a week. The researchers called it deskilling — the same phenomenon documented in aviation, in radiology, in driving, every time a powerful automation shows up next to a human practitioner.

This isn’t a bug in any particular AI product. It’s the default behavior of the entire category. You ask, it answers, your skills atrophy. The faster and more helpful the assistant gets, the steeper the slope down.

For the last six months I’ve been working on an open-source framework called the Forge Protocol that tries to change this default. Not by telling you to use AI less — that’s aspirational and doesn’t work — but by making the AI itself behave differently. This post is an attempt to explain why, and how, and what surprised me while building it.

The research it’s built on

I should be honest about something up front: the Forge Protocol is not where the ideas came from. It’s a translation of a body of work that already existed.

For the last ten years, Federico Cabitza’s lab at the University of Milano–Bicocca — where I did my PhD — has been running careful empirical studies on what actually happens when humans and AI work together on judgment-heavy tasks. Mostly in medicine, because that’s where the consequences are sharpest, but the findings generalize. The pattern that emerges is uncomfortable for anyone selling productivity AI:

In a 2023 study published in Artificial Intelligence in Medicine, the lab found that radiologists who saw an AI suggestion before committing to their own reading were anchored by it — their judgment collapsed onto the AI’s answer even when the AI was wrong. Radiologists who committed first, then saw the AI, kept their independent reasoning intact. The lab calls these patterns Rams (AI first, human collapses) and Hounds (human first, AI augments). The Rams pattern is what almost every AI assistant on the market implements by default.

In 2024, at the xAI conference, the lab won Best Paper with a finding even more uncomfortable: articulate, well-written AI explanations increase uncritical acceptance rather than decrease it. The better the explanation looks, the less you check it. They call this the white-box paradox, and it directly contradicts the intuition behind most XAI work.

My own contribution to this thread was a 2024 paper in AI in Medicine called “Never tell me the odds”, which showed that asking clinicians to commit to a reading before they see any AI explanation — what we called pro-hoc rather than post-hoc explanations — prevented the anchoring effect that post-hoc explanations otherwise cause. The commitment itself was the active ingredient. Not the explanation.

A more recent paper from the lab (Natali et al., 2025) gave the phenomenon a more precise vocabulary: there are four distinct types of deskilling — cognitive, semiotic, social, and moral — plus a team-level pathology they called epistemic sclerosis, where a group’s ability to disagree erodes because the AI provides an external authority everyone defers to.

Reading all of this back-to-back, I had the same feeling I had with the emotion-vectors paper: this is a real finding, it’s already on the workbench, and almost nobody is building products on top of it. So I tried.

Four modes, one principle

The Forge Protocol takes the lab’s findings and turns them into something concrete: a set of four interaction modes that an AI assistant can switch between. Each mode implements a specific empirically-validated pattern. The names are a stretched metaphor about blacksmithing — sorry — but the ideas underneath are the lab’s.

Forge

Judicial AI · thinking partner

Asks questions. Never gives answers. Presents the case for and against your angle in parallel. Refuses to converge to a single recommendation.

Anvil

Hounds protocol · editor / critic

You submit your full draft first. Then it rates on six dimensions, quotes the three weakest passages, and asks where you disagree with the critique. Never rewrites.

Crucible

Frictional AI · idea stress-tester

Requires at least three ideas before it engages — the mode refuses fewer. Steelmans each, attacks them, maps blind spots. An epistemic-sclerosis guard pushes back if your ideas read as generic.

Executor

Rams protocol · normal AI

Full automation, no friction. For formatting, translation, boilerplate. The only mode where the AI-first pattern is acceptable — with a warning if it detects a thinking task arriving here.

The single principle running through all four is commitment before consultation. In Anvil mode you commit to a full draft first. In Crucible you commit to three ideas first. In Forge you don’t get answers, you get questions that force you to commit to an analysis. Executor is the carve-out for tasks where there is nothing to commit to — mechanical work where judgment isn’t at stake.

That sequencing — you act, the AI responds — is the thing that the 2023 study showed actually preserves independent judgment. Most AI products do the opposite. You ask, the AI acts, and now you’re reacting to its output. The order of operations matters more than the content of the response.
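To make the sequencing concrete, here is a minimal sketch of a commitment gate: a check that runs before the model is called and refuses to engage until the user has committed. It is an illustration, not code from the repo. The function name `commitment_gate`, the `GateResult` type, the 150-word threshold for a "full draft", and the one-idea-per-line convention are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical thresholds: the post only says "full draft" and "at least three ideas".
MIN_DRAFT_WORDS = 150
MIN_IDEAS = 3


@dataclass
class GateResult:
    allowed: bool
    reason: str


def commitment_gate(mode: str, user_input: str) -> GateResult:
    """Decide whether the user has committed enough for the mode to engage."""
    text = user_input.strip()

    if mode == "executor":
        # Mechanical work: nothing to commit to, so no friction.
        return GateResult(True, "mechanical task, no commitment required")

    if mode == "forge":
        # Forge only ever replies with questions, so it can always engage.
        return GateResult(True, "forge responds with questions only")

    if mode == "anvil":
        words = len(text.split())
        if words < MIN_DRAFT_WORDS:
            return GateResult(False, f"draft has {words} words; submit a complete draft first")
        return GateResult(True, "draft submitted")

    if mode == "crucible":
        # Assumed convention: one idea per non-empty line.
        ideas = [line for line in text.splitlines() if line.strip()]
        if len(ideas) < MIN_IDEAS:
            return GateResult(False, f"{len(ideas)} idea(s) given; need at least {MIN_IDEAS}")
        return GateResult(True, "ideas submitted")

    raise ValueError(f"unknown mode: {mode}")
```

The point of the sketch is that the refusal happens in ordinary code, before any prompt is sent, so the model never gets the chance to drift past it.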

The part I’m most proud of

The mode definitions alone aren’t enough. Anyone can write “ask questions instead of answering” in a system prompt, and any modern LLM will follow that for about three turns before drifting back to its native, helpful-assistant default. The gravitational pull of “just help the user” is enormous.

Two things make the protocol stickier than a system prompt.

The first is an adversarial auditor. The original design had a flaw I only saw after testing: it was asking the same LLM that generated a response to judge whether the response followed the mode rules. LLMs rubber-stamp themselves. Every compliance check came back “yes I am following the rules,” even when the model had clearly drifted. So the auditor moved to a separate Claude Sonnet instance, on its own configuration, with no shared context with the primary model. It judges compliance, quotes violations verbatim, and returns structured JSON the orchestrator treats as ground truth. You can route the auditor through a different provider entirely if you want. The cost is one extra API call per turn. The win is that the framework actually catches drift.
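A rough sketch of the auditor pattern, under stated assumptions: the JSON shape (`compliant`, `violations`), the prompt wording, and the `call_auditor` callable are hypothetical stand-ins, not the framework's actual schema or API. The idea it illustrates is the one above: the auditor sees only the mode rules and the response, and the orchestrator trusts its structured verdict.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Verdict:
    compliant: bool
    violations: List[str] = field(default_factory=list)


def build_audit_prompt(mode_rules: str, response: str) -> str:
    # Only the rules and the response go in: no conversation history,
    # nothing from the primary model's context.
    return (
        "You are a compliance auditor. Judge whether RESPONSE follows RULES.\n"
        'Reply with JSON only: {"compliant": bool, "violations": [verbatim quotes]}.\n\n'
        f"RULES:\n{mode_rules}\n\nRESPONSE:\n{response}"
    )


def audit(mode_rules: str, response: str, call_auditor: Callable[[str], str]) -> Verdict:
    """Ask an independent model for a verdict; `call_auditor` is a hypothetical client."""
    raw = call_auditor(build_audit_prompt(mode_rules, response))
    try:
        data = json.loads(raw)
        compliant = data["compliant"]
        violations = data.get("violations", [])
        if not isinstance(compliant, bool) or not isinstance(violations, list):
            raise ValueError("wrong field types")
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        # Fail closed: an unparseable verdict counts as non-compliant.
        return Verdict(False, ["auditor returned unparseable output"])
    # Keep only quotes that really appear in the response, so "verbatim" is enforced.
    quoted = [v for v in violations if isinstance(v, str) and v in response]
    return Verdict(compliant, quoted)
```

Passing the auditor in as a plain callable is what would let you route it through a different provider: the orchestrator only cares that a string goes in and a JSON string comes out.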

The second is a canary. This is the piece I think matters most, and it’s the piece I haven’t seen anywhere else.

Every anti-AI-dependency tool I’ve seen is aspirational — it tells you to use AI less. Forge Protocol tries to measure whether your independent capability is actually improving, flat, or drifting down. It works the way the Wharton study worked, just continuously: a fixed set of unassisted prompts with stable IDs, covering writing, analysis, debugging, strategy, communication. Once a week you answer one of them in five minutes with no AI. The auditor scores it on clarity, depth, and independence. The scores accumulate in a local JSON file. After a month you can look at the slope.

If the slope is flat or negative, the framework is failing you and you find out instead of getting comforting feedback from a sycophantic model. If the slope is positive, the friction is doing what it’s supposed to.
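For illustration, here is one way the canary's bookkeeping could work, assuming a JSON file holding a list of entries and an ordinary least-squares slope of score against week. The file layout, the field names, and the choice to average clarity, depth, and independence into a single score are assumptions for this sketch, not a description of the repo's format.

```python
import json
from pathlib import Path
from statistics import mean
from typing import Dict, List


def record_score(path: Path, prompt_id: str, week: int,
                 clarity: float, depth: float, independence: float) -> None:
    """Append one weekly canary result to a local JSON file (hypothetical layout)."""
    entries = json.loads(path.read_text()) if path.exists() else []
    entries.append({
        "prompt_id": prompt_id,
        "week": week,
        # Assumption: the three auditor scores are averaged into one number.
        "score": mean([clarity, depth, independence]),
    })
    path.write_text(json.dumps(entries, indent=2))


def score_slope(entries: List[Dict]) -> float:
    """Least-squares slope of score against week: score points gained per week."""
    xs = [e["week"] for e in entries]
    ys = [e["score"] for e in entries]
    if len(set(xs)) < 2:
        raise ValueError("need scores from at least two different weeks")
    x_bar, y_bar = mean(xs), mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator
```

With four weekly scores of 5.0, 5.5, 6.0, and 6.5, `score_slope` returns 0.5, meaning half a point gained per week. A month of data is only four points, so the number is a noisy signal at best; it is the sign and the trend over longer periods that are worth watching.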

I cannot emphasize enough how rare this is in the productivity-tool space. Almost every tool measures usage: sessions per week, words generated, tasks completed. Almost no tool measures whether you got better. The canary is an honest mirror by design, and building it required deciding that the framework should be allowed to tell you it’s not working.

What I learned building it

Three things, all of which I would not have written down at the start.

First, the order of operations dominates the content of the response. I went in thinking the magic would be in the prompts — the precise wording of the Forge questions, the specific dimensions Anvil rates on, the dialectic structure of Crucible. The prompts matter, but they’re maybe 20% of the effect. What matters far more is the input validation: did the user commit first, or are they trying to skip ahead to the answer? The single biggest behavioral change in the protocol comes from Anvil refusing to engage until you’ve submitted a complete draft. That refusal, as a designed-in friction, does more than any clever prompt I could write.

Second, LLMs are catastrophic at judging themselves and excellent at judging each other. The auditor pattern is going to be everywhere in two years. Every application that uses LLMs to enforce rules will eventually discover the rubber-stamp problem, and the answer is always going to be an independent model with a narrow scoring prompt and no shared context. Build for it now.

Third, and most uncomfortable: the people who most need this framework are not the ones who will install it. Forge Protocol asks you to do harder work in exchange for getting better over time. The user who is already worried about deskilling will install it eagerly; the user who has stopped noticing the slope is unlikely to opt into the friction voluntarily. I don’t have a clean answer for this except to say that making the framework exist matters anyway. Institutional adoption (schools, universities, professional training programs) is the realistic path. Individual adoption is a bonus.

If you want to try it

The repo is at github.com/lorenzofamiglini/The-Forge-Protocol-Agent. It runs as a plugin on top of Hermes Agent from Nous Research — an open-source AI agent framework that does the heavy orchestration. You can use it with Claude (via Vertex AI or Anthropic’s API), GPT, Gemini, or 100+ models through OpenRouter.

The lib/ directory is a standalone Python library with one dependency (PyYAML). If you’d rather not run the full agent stack, you can grab the mode YAML files and the SOUL prompts and drop them into any LLM. You lose the auditor and the canary, but you keep the protocol.

The full mapping from the lab’s papers to the framework’s components is in RESEARCH.md. Every design decision in the framework traces back to a specific empirical finding. Where I extrapolated beyond the research, I tried to mark it.

And if you build something on top of it — new modes, new auditors, integrations with Cursor or VS Code or whatever you use — I’d genuinely like to see it. The whole point of releasing this open source instead of writing yet another paper was to find out what people do with these protocols when they get to play with them in their own work. The lab’s findings were mostly in clinical decision support. The question of what happens when you apply them to a programmer’s editor, or a lawyer’s drafting tool, or a student’s study session, is wide open.

Stop outsourcing your thinking. Not because AI is bad — it isn’t — but because the part of you that does the thinking is a muscle, and muscles you don’t use get smaller.