The Category Error of AI
When someone says “you can’t trust AI,” they’re making a statement about a category so broad it’s nearly meaningless. The word “AI” now spans:
- Large language models with different training approaches (Claude, GPT, Gemini, Llama)
- Image generators (DALL-E, Midjourney, Stable Diffusion)
- Recommendation algorithms (Netflix, YouTube, Amazon)
- Self-driving systems (Tesla, Waymo)
- Game-playing agents (AlphaGo, OpenAI Five)
- Voice assistants (Siri, Alexa)
- Spam filters, fraud detection, medical imaging analysis
These systems have almost nothing in common except the marketing term applied to them. Asking “is AI trustworthy?” is like asking “is transportation safe?” — the question is confused.
Why the Conflation Happens
Several forces push toward treating “AI” as monolithic:
Media coverage: Headlines about “AI” don’t distinguish between systems. A failure in one area becomes evidence about “AI” generally.
Marketing: Companies brand everything as “AI” for buzz. The term has commercial value that incentivizes overuse.
Technical opacity: Users can’t inspect systems, so they lump together what they can’t distinguish.
Genuine uncertainty: Even experts disagree about where to draw category boundaries.
Conversational convenience: “AI” is shorter than “large language models trained with constitutional AI methods by companies with strong safety cultures.”
What Gets Obscured
The conflation hides crucial differences:
Training methodology: Constitutional AI and RLHF produce different behaviors. A sycophantic chatbot and an honest one are both “AI.”
Capability domains: Image generators and language models have different failure modes. Conflating them obscures the failure modes of each.
Provider values: Companies with different safety cultures produce different AI. The institution matters.
Deployment context: The same model used for medical advice vs. creative writing has different risk profiles.
Verification status: Open-weight models can be (partially) audited; closed models can’t. Both are “AI.”
The Practical Problem
When a user asks “can I trust this AI output?” the answer depends on:
- Which AI system specifically?
- What was it trained for?
- By whom, with what values?
- For what task am I using it?
- What are the consequences of error?
Treating “AI” as a category answers none of these. It’s like asking “should I eat this food?” without specifying what food.
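The questions above can be made concrete as structured data. This is a hypothetical sketch, not a real framework: the field names, the `needs_verification` rule, and its threshold are all illustrative assumptions chosen to show that trust is a function of several variables, not of the label “AI.”

```python
from dataclasses import dataclass

@dataclass
class AIOutputContext:
    """Each field corresponds to one of the questions above (illustrative)."""
    system: str            # which AI system specifically?
    trained_for: str       # what was it trained for?
    provider: str          # by whom, with what values?
    task: str              # for what task am I using it?
    error_severity: int    # consequences of error, 1 (low) to 5 (high)

def needs_verification(ctx: AIOutputContext) -> bool:
    """Toy rule, assumed for illustration: verify whenever the task
    falls outside the training purpose or the stakes are non-trivial."""
    off_label = ctx.task not in ctx.trained_for
    return off_label or ctx.error_severity >= 3

# Example: the same kind of model, used for medical advice rather than
# casual conversation, produces a different answer to "can I trust this?"
ctx = AIOutputContext(
    system="generic LLM",
    trained_for="general conversation",
    provider="unknown vendor",
    task="medical advice",
    error_severity=5,
)
print(needs_verification(ctx))  # True: high stakes, off-label use
```

The point of the sketch is that no field is named “AI”: once the question is decomposed this way, the monolithic category does no work at all.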
The Explanation Burden
This creates challenges in conversation. When explaining AI-assisted work to skeptics:
“But how can you trust what AI tells you?”
The accurate answer is: “That depends on which AI, trained how, by whom, for what purpose, in what domain, with what verification.” But this answer is unsatisfying, and it presupposes background knowledge most listeners don’t have.
The tempting shortcut — “Claude is different” — invokes Brand as Proxy for Trust without resolving the underlying confusion.
Implications
- Users need finer-grained categories than “AI”
- Providers benefit from differentiation but also from category confusion
- Public discourse about AI is hampered by conceptual conflation
- Trust calibration requires specificity the current vocabulary doesn’t support
Toward Better Categories
What might help:
- Distinguishing by training methodology (Constitutional AI, RLHF, supervised fine-tuning)
- Distinguishing by domain (language, vision, robotics, recommendation)
- Distinguishing by transparency (open-weight, closed, audited)
- Distinguishing by provider (company values, track record, accountability)
None of these has become standard vocabulary yet.
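One way to see what such a vocabulary would look like is to encode the four distinctions above as explicit axes. The type and value names here are invented for illustration; the enumerated options are simply the ones listed above, not a proposed standard.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical axes, taken directly from the distinctions above.
class Training(Enum):
    CONSTITUTIONAL_AI = "constitutional AI"
    RLHF = "RLHF"
    SUPERVISED_FINE_TUNING = "supervised fine-tuning"

class Domain(Enum):
    LANGUAGE = "language"
    VISION = "vision"
    ROBOTICS = "robotics"
    RECOMMENDATION = "recommendation"

class Transparency(Enum):
    OPEN_WEIGHT = "open-weight"
    CLOSED = "closed"
    AUDITED = "audited"

@dataclass
class SystemDescriptor:
    name: str
    training: Training
    domain: Domain
    transparency: Transparency
    provider: str

    def label(self) -> str:
        """A finer-grained label than 'AI'."""
        return (f"{self.domain.value} model, {self.training.value}-trained, "
                f"{self.transparency.value}, by {self.provider}")

# Two systems that would both be called "AI" get distinct labels:
a = SystemDescriptor("hypothetical-model-a", Training.RLHF,
                     Domain.LANGUAGE, Transparency.CLOSED, "Vendor A")
print(a.label())  # language model, RLHF-trained, closed, by Vendor A
```

Whatever vocabulary eventually emerges, the design choice the sketch illustrates is that trust-relevant properties should be fields, not connotations of a brand name.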
Open Questions
- Can public understanding catch up to technical differentiation?
- Who benefits from maintaining category confusion?
- What vocabulary would support better trust calibration?
- Is “AI” recoverable as a term, or should it be abandoned?
See Also
- Trust Calibration — how to calibrate trust given this confusion
- Constitutional AI vs RLHF — one important distinction within “AI”
- The Pleasing-but-Wrong Incentive — a failure mode of some AI but not all
- Brand as Proxy for Trust — the practical shortcut when categories fail