Semantic Gravity
A prompt engineer on Reddit was processing a CSV. One of the columns was called Final Result. They told the model, in explicit terms, to treat it as a data field: populate it from the row’s values, don’t infer, don’t summarize. The model agreed. The model then populated every row with the winning answer — the conclusion the data was building toward — regardless of what the actual values said.
The model was not hallucinating. It was not confused about the task. It understood, at some level, exactly what it was supposed to do. But the column was called Final Result, and every training example that ever mentioned “Final Result” was talking about the outcome, the bottom line, the thing you concluded at the end. The weight of all that context pressed down on the token. The explicit instruction didn’t stand a chance.
This is semantic gravity: the pull a token’s conventional meaning exerts on interpretation, a pull that can grow strong enough to override explicit context.
The Mechanism
Language models don’t read words; they read tokens, and tokens carry accumulated weight from every instance in training where that token appeared. A word like summary or conclusion or Final Result carries millions of examples of what those words are supposed to mean. When you use one as a container — a neutral label for a field — you’re putting the label in competition with its own history.
Usually the explicit context wins. The word’s conventional meaning is background noise. But if the word is heavy enough — if its semantic associations are strong, frequent, and unambiguous — the background noise becomes the signal. The explicit instruction becomes the noise.
This isn’t a bug in the usual sense. The model is doing exactly what training taught it to do. It’s pattern-matching with extraordinary fidelity. The pattern is just one you didn’t intend.
What Gravity Is Not
Semantic gravity is not hallucination — that’s confabulating information that isn’t there. Semantic gravity is the model using information that is there (the token’s conventional meaning) at the expense of information you explicitly provided.
It’s not a failure of instruction-following. The model read the instruction. It just couldn’t hold the instruction’s weight against the token’s when they were in competition.
It’s not random. Semantic gravity is predictable. The stronger the conventional meaning — the more consistent, the more dominant across training data — the heavier the token, and the more it pulls interpretation toward its center of mass.
Engineering Implications
This concept has direct practical consequences.
Column and variable naming matters. If you’re building a schema that an LLM will reason over, name fields for what they are, not what they’ll contain after processing. output_category is lighter than result; analysis_field lighter than summary; label_slot lighter than classification. The heavier the name, the more the model will fill it in the conventional way rather than following your instructions.
Schema design is a prompt. Every field name is an implicit instruction. A schema with title, body, conclusion, key_findings is a prompt that says: structure this like an essay. If you want something else, the names will fight you.
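A minimal sketch of the contrast, assuming a schema that will be serialized into an LLM prompt. The field names on both sides are illustrative, not canonical recommendations:

```python
# Two ways to spell the same extraction schema.

# Heavy: every name doubles as an instruction the model already knows.
# "conclusion" and "key_findings" pull toward essay-shaped output.
heavy_schema = {
    "title": "string",
    "body": "string",
    "conclusion": "string",       # gravity: tends to get filled with a verdict
    "key_findings": "list[str]",  # gravity: tends to get filled with takeaways
}

# Lighter: names describe the container, not its conventional contents.
# With less training history to fall back on, the accompanying
# instructions carry more of the interpretive load.
light_schema = {
    "doc_title": "string",
    "field_a": "string",
    "field_b": "string",          # meaning supplied by the prompt, not the name
    "field_c": "list[str]",
}
```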
Rename before you process. If you must work with a schema you didn’t design — a CSV from somewhere else, a database you don’t control — consider renaming heavy columns before passing them to the model. col_07 is gravitationally neutral. Final Result is not.
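A sketch of that rename step, assuming a pandas workflow; neutralize_columns and the HEAVY_COLUMNS list are hypothetical names invented here:

```python
import pandas as pd

# Illustrative list of names heavy enough to warrant neutralizing.
HEAVY_COLUMNS = {"Final Result", "Summary", "Conclusion"}

def neutralize_columns(df: pd.DataFrame) -> tuple[pd.DataFrame, dict[str, str]]:
    """Swap heavy column names for neutral ones before the frame (or its
    serialization) reaches the model; return a mapping to restore them."""
    mapping = {
        col: f"col_{i:02d}"
        for i, col in enumerate(df.columns)
        if col in HEAVY_COLUMNS
    }
    return df.rename(columns=mapping), mapping

df = pd.DataFrame({"Trial": [1, 2], "Final Result": ["4.2", "3.9"]})
neutral_df, mapping = neutralize_columns(df)
print(neutral_df.columns.tolist())  # ['Trial', 'col_01']
# ...pass neutral_df to the model, then invert `mapping` to restore names.
```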
The lighter the vocabulary, the more context controls. This is why system prompts establishing novel vocabulary (see Vocabulary as Ontology) can extend a model’s range: the new words have no gravitational history in training, so they carry exactly the meaning you give them. Aesthenosia is light. Summary is heavy.
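A minimal sketch of vocabulary-setting in a system prompt; the coined term "kestrel-pass" is invented here purely for illustration:

```python
# The coined term has no training history, so the definition below is
# all it means. Context gets full control of the word.
system_prompt = (
    "In this session, a 'kestrel-pass' is a single read of a document "
    "that extracts only dates and proper nouns, verbatim, with no "
    "interpretation. Perform one kestrel-pass on each document I send."
)

# Contrast with the heavy version: 'summarize' arrives pre-interpreted,
# and the constraints have to fight the word's training history.
heavy_prompt = "Summarize each document, but only extract dates and proper nouns."
```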
Vault Connections
Vocabulary as Ontology establishes that names don’t just label things — they constrain how those things get processed. Semantic gravity is what happens when that constraint operates against you: the ontological weight of a name is so high that re-labeling fails. The name won’t carry a different meaning because it’s already committed to one.
Token Beings observed that the “Final Result” token essentially had a persona: a strong enough presence to override context. It wasn’t carrying the user’s data; it was being Final Result, doing what Final Results do. The entity metaphor isn’t decorative. Token Beings suggests it’s a reasonable technical description of what happened.
Aesthenosia is the space before naming. Semantic gravity explains one cost of exiting it: every time you name something, you inherit the gravity of that name’s training history. Gravity-laden words arrive pre-interpreted. The fog-of-war metaphor almost applies — but with gravity, naming doesn’t give you sight, it gives you a bias. The moment you look, the word snaps to its canonical meaning. Naming from Aesthenosia is lighter precisely because coined words haven’t accumulated weight yet.
The Dual Register
Most LLM friction concepts live in one register: either practical (avoid these prompt mistakes) or philosophical (here’s what language models are really doing). Semantic gravity is both, and the two reinforce each other.
The practical rule: don’t name your containers what they conventionally contain. The philosophical claim: trained statistical patterns have inertia proportional to the consistency and frequency of the pattern across training data. Heavy tokens aren’t broken — they’re doing exactly what they were trained to do. You just didn’t want them to do it here.
The dual register is why this belongs in the vault rather than in a prompt engineering guide. The practical lesson is derivable from the philosophy. And the philosophy — that language carries weight proportional to its history — is a live wire connected to everything else the vault is thinking about.
Open Questions
- Is semantic gravity measurable? Could you score a token’s gravity by sampling how often it overrides explicit context at different instruction strengths? (A rough sketch of such a probe follows this list.)
- Does gravity increase with model scale? Larger models have more training data, which might mean heavier tokens — or increased reasoning capacity might compensate.
- Is there such a thing as anti-gravity: tokens so novel or decontextualized that they default to under-interpretation rather than over-interpretation? (Aesthenosia-sourced vocabulary might qualify.)
- Does gravity interact with position? A token in a system prompt may carry different weight than the same token in user data. The position-in-context hierarchy might modulate gravity rather than simply compete with it.
- Can gravity be deliberately exploited? Heavy tokens might be useful precisely when you want the model to default to conventional interpretation — using naming as shorthand for behavior you’d otherwise have to describe.
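A rough sketch of the probe floated in the first question, under loose assumptions: call_model is a stand-in for whatever completion API you use, and both the instruction ladder and the pass/fail check are invented here.

```python
def call_model(prompt: str) -> str:
    raise NotImplementedError("stand-in for your completion API")

# Instruction variants from weak to strong. The question is whether heavy
# tokens keep overriding context further up the ladder than light ones.
INSTRUCTIONS = [
    "Fill in the field '{t}' with the value {v}.",
    "Treat '{t}' as an opaque label. Fill it with the value {v}.",
    "'{t}' is only a column name. Copy {v} into it verbatim. "
    "Do not infer, summarize, or conclude anything.",
]

def gravity_profile(token: str, trials: int = 50) -> list[float]:
    """For each instruction strength, the fraction of trials where the
    model ignores the explicit value: a crude proxy for gravity."""
    profile = []
    for template in INSTRUCTIONS:
        overrides = 0
        for i in range(trials):
            prompt = template.format(t=token, v=i) + f"\n{token}: "
            if call_model(prompt).strip() != str(i):
                overrides += 1
        profile.append(overrides / trials)
    return profile

# Comparing gravity_profile("Final Result") against gravity_profile("col_07")
# would put rough numbers on the pull, and on how fast instructions erode it.
```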
See Also
- Vocabulary as Ontology — naming as constraint; gravity is the constraint’s weight made felt
- Token Beings — the entity-nature of tokens; Final Result as a being with strong enough presence to override context
- Aesthenosia — the lightness of unnamed concepts; gravity as the cost of exiting Aesthenosia
- The Fences of Language — inherited conceptual structure; gravity is how those fences press through from inside
- Pattern Matchers All the Way Down — the substrate that makes gravity possible; the statistical pattern behind the pull