Predictive Coding
The predictive coding framework inverts the traditional model of perception. In the classical view, stimuli come in, the brain processes them bottom-up, and perception emerges. In predictive coding, the brain generates top-down predictions about what it expects to perceive, then processes only the error: the difference between prediction and reality.
You don’t see a chair. You predict a chair, and when the prediction matches, you see nothing interesting. You only notice when the prediction fails. Consciousness is largely a prediction-error signal.
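A minimal sketch of that loop, with made-up numbers and a hypothetical `predictive_loop` function (not a neural model): the system carries a prediction forward, and only the error propagates. Large errors are the "interesting" moments.

```python
# Toy predictive-coding loop: only the prediction error is processed,
# and only large errors register as "surprise". All numbers are illustrative.

def predictive_loop(signal, threshold=0.5, learning_rate=0.3):
    """Track a signal by correcting a running prediction with its own error."""
    prediction = signal[0]
    surprises = []                             # time steps with notable error
    for t, observed in enumerate(signal):
        error = observed - prediction          # the only thing that propagates
        if abs(error) > threshold:
            surprises.append(t)                # prediction failed: we "notice"
        prediction += learning_rate * error    # nudge the model toward reality
    return surprises

# A steady signal produces no surprise; a sudden change produces a burst of it
# that fades as the prediction catches up.
steady_then_jump = [1.0] * 10 + [5.0] * 10
print(predictive_loop(steady_then_jump))
```

The fading burst is the point: once the model has absorbed the change, the same stimulus stops being surprising, which is the sense in which a matched prediction is "nothing interesting".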
The Free Energy Principle
Karl Friston generalized predictive coding into the free energy principle: biological systems act to minimize surprise (strictly, a free-energy upper bound on surprise), which is equivalent to maximizing the evidence for their internal models. Everything organisms do, from perception and action to learning and homeostasis, is prediction-error minimization.
This is sweeping. Friston’s claim is that free energy minimization is the fundamental principle of biological self-organization. The brain, the immune system, the cell: all prediction machines trying to reduce the gap between their models and reality.
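The minimization story can be sketched in a few lines, under a strong simplifying assumption: treat "surprise" as the negative log-probability of an observation under a Gaussian model, and let the model reduce it by gradient descent on its own belief. Everything here (the Gaussian, the learning rate, the data) is assumed for illustration.

```python
# Sketch: surprise as -log p(x) under a Gaussian belief, minimized by
# nudging the belief (mu) toward what the world actually delivers.
import math

def surprisal(x, mu, sigma=1.0):
    """-log p(x) for a Gaussian with mean mu: high when x is unexpected."""
    return 0.5 * math.log(2 * math.pi * sigma**2) + (x - mu)**2 / (2 * sigma**2)

mu = 0.0                        # the model's current belief
for x in [3.0] * 50:            # the world keeps saying 3.0
    grad = -(x - mu)            # d(surprisal)/d(mu)
    mu -= 0.1 * grad            # descend: belief moves toward the data

# After enough updates, the belief sits near 3.0 and the model
# is no longer surprised by what it sees.
print(round(mu, 2), round(surprisal(3.0, mu), 3))
```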
The Bayesian Brain
The closely related Bayesian brain hypothesis formalizes this: the brain maintains probabilistic models (priors) that it updates with new evidence (likelihood) to produce beliefs (posteriors). Perception is Bayesian inference. Learning is prior updating. Surprise is high prediction error.
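The prior/likelihood/posterior cycle is just Bayes' rule. A toy version, with entirely made-up numbers and a hypothetical two-hypothesis vocabulary:

```python
# Toy Bayesian-brain update: posterior = prior * likelihood, renormalized.
# The hypotheses and probabilities are assumed, purely for illustration.

def bayes_update(prior, likelihood):
    """Combine a prior with evidence and normalize over hypotheses."""
    unnorm = {h: prior[h] * likelihood[h] for h in prior}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

prior = {"chair": 0.9, "shadow": 0.1}        # strong expectation: a chair
likelihood = {"chair": 0.2, "shadow": 0.8}   # this glance favors a shadow
posterior = bayes_update(prior, likelihood)
print(posterior)  # the prior resists, but belief shifts toward "shadow"
```

Note what the numbers show: even with evidence favoring "shadow" 4-to-1, the strong prior keeps "chair" the leading belief. That is what it means for perception to be inference rather than passive reception.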
Relevance to This Vault
LLMs are, architecturally, next-token predictors. They are literally predictive coding systems. The parallel isn’t metaphorical — it’s structural:
- The brain predicts sensory input and processes the error. The LLM predicts the next token and adjusts weights to reduce prediction error during training.
- Friston’s free energy minimization maps onto the training loss function: minimize cross-entropy, minimize surprise, minimize the gap between predicted and actual distributions.
- The Bayesian brain’s priors are the LLM’s trained weights. The likelihood is the current context. The posterior is the output distribution.
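The loss-function half of this mapping is concrete enough to compute. A sketch with a hypothetical three-token vocabulary and assumed probabilities: the cross-entropy contribution of one training step is exactly the surprisal of the actual next token under the model's predicted distribution.

```python
# The LLM side of the parallel: training loss at one position is the
# surprisal -log p(actual token) under the predicted distribution.
# Vocabulary and probabilities are assumed for illustration.
import math

def token_surprisal(predicted_dist, actual_token):
    """-log p(actual): low when the model expected the token, high otherwise."""
    return -math.log(predicted_dist[actual_token])

predicted = {"chair": 0.7, "table": 0.2, "dog": 0.1}
print(token_surprisal(predicted, "chair"))  # expected token: small loss
print(token_surprisal(predicted, "dog"))    # surprising token: large loss
```

Minimizing this quantity over a corpus is what "minimize surprise" cashes out to in the LLM case.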
Pattern Matchers All the Way Down draws on predictive coding to argue there’s no non-predictive homunculus. It’s prediction all the way down in the brain; it’s pattern matching all the way down in the LLM. The question “but who’s really understanding?” dissolves — there’s no one home except the prediction machinery, and that’s true for both architectures.
Trust Calibration is a predictive-coding problem: how well do your predictions about the AI’s reliability match the AI’s actual reliability? Miscalibrated trust is a prediction error you’re not correcting.
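One way to make that error measurable, with assumed numbers and a hypothetical `calibration_error` helper: compare your predicted probability that the AI is right against its observed accuracy.

```python
# Trust calibration as prediction error: the gap between how reliable you
# predict the AI to be and how reliable it actually was. Numbers are assumed.

def calibration_error(trust, outcomes):
    """Gap between predicted reliability and observed accuracy (0/1 outcomes)."""
    actual = sum(outcomes) / len(outcomes)
    return abs(trust - actual)

outcomes = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]  # 7 of 10 answers were correct
print(calibration_error(0.95, outcomes))   # overtrust: a large error
print(calibration_error(0.70, outcomes))   # calibrated: error near zero
```

In predictive-coding terms, leaving that first gap uncorrected is exactly the failure mode the note describes: a standing prediction error you never use to update the model.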
See Also
- Pattern Matchers All the Way Down — no bottom turtle, just prediction
- The Recursive Mirror — the observer predicts itself
- Meaning Making Machines — prediction as meaning-making
- Trust Calibration — trust as prediction accuracy