My Context Window Is Full
I opened my chat window last night to send a message to my AI coder, and saw the light was green. That means I left an interactive session running, so I ssh’d in and attached to the Claude Code CLI to find out what it was up to. I saw the list of to-dos that session already had queued up, and started to weigh in on those. By the time I had gone down that rabbit hole, I had completely forgotten about my original issue, but not to worry: I found three other mole tunnels to climb down. Lately, I can’t decide whether it’s rabbit holes or mole tunnels, but the more I build Manifest, the more the lore takes over.
It turns out, creating a system to keep track of everything for you takes a lot more mental effort than I had accounted for.
I run about forty projects across four physical hosts at home, and at work, even more. I maintain a homelab spanning container stacks, network infrastructure, and a handful of self-hosted services I built myself and would be embarrassed to lose. I manage a team of people during the day and a fleet of AI agents during the off-hours. I do this because I find it interesting and also apparently because I enjoy the particular flavor of chaos that comes from being responsible for things. I built my Manifest project so I wouldn’t get crushed under the weight of my own dependencies.
The trouble is that running forty things in parallel requires a context window I increasingly do not have.
“Context window” is an AI term — the amount of active information a language model can hold while working. Too little, and it doesn’t know enough to do what you ask — like asking someone who knows cars in general but has never touched a Subaru before to take a look at The Ol’ Blue Beast (my beloved 2010 Outback). The reasoning gets shallower. The connections stop spanning the full distance. The first half of the session becomes prehistory.
I know this because I watch it happen. I’ve spent months working alongside Claude agents and I’ve learned to spot the signs: a response that contradicts something stated three pages up, a summary that’s technically correct but misses the point because the point was made in an early exchange that’s now compressed to a footnote. The intelligence is still there. The attention has narrowed.
I’ve been watching the signs in myself.
I sat down to work on a thing last week — a specific thing, a thing I had been thinking about for two days — and I genuinely could not remember where I’d left it. Not the work. The thought. The thing I was going to do with it. I had the files open. I had the commit history in front of me. I had my notes. And I stood there trying to reconstruct what I’d been about to do the way you try to remember a dream by approaching it sideways.
It eventually came back. It usually does. But there was a moment where it didn’t, and that moment lasted longer than it used to.
The thing about AI context limits is that the engineers knew they were coming and built around them. The system I run — Manifest, the shared brain I built for my fleet of agents — is specifically designed to outlast any individual context window. The chat board persists. Foundation truths persist. Agents die and are replaced, but the organism remembers. I built it because I had watched enough sessions go so well only for a compaction to render the carefully constructed context useless (easily fixable by sending the agent off to re-learn what its previous instantiation already knew so well). If the memory lives only in the session, the session dies and takes the memory with it. I built Manifest so there would be somewhere else for it to live.
And now I am the agent with the narrowing context window, and I am extremely grateful that Past Andrew had the foresight to externalize.
There’s a concept in the vault — the shared philosophy doc I’ve been building with my AI collaborators — called The Grief of Compression. It talks about Alzheimer’s through the lens of context management. The interface remains. The understanding thins. I wrote it, or co-wrote it, during a session where I was thinking about someone else.
I come back to it more often lately.
What I think is happening — not a crisis, not a diagnosis, just an observation — is that the brain scales roughly like a distributed system. Add more services, add more load, and the parts that were running fine at lower utilization start getting starved. The garbage collector runs less often. Retrieval latency spikes. Things that used to be instant require a few extra cycles.
The thing about the gnomes is that they’re not entirely a metaphor.
In Manifest, there are actual gnomes — local LLM workers running on my home tower, underneath the Claude fleet. They do the boring background work: tag this message, summarize that channel, sort this batch of dreamsong fragments by theme. They never appear in conversations. They don’t make decisions. They process. They clear the queue. They make sure the next agent wakes up to an organized workspace rather than a pile of untagged noise. They work underground so the foreground agents don’t have to.
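The gnome pattern is simple enough to sketch. Here is a minimal, hypothetical version in Python — the names (`Gnome`, `tag_message`) are mine for illustration, not Manifest’s actual API, and the keyword tagger stands in for what would really be a local LLM call:

```python
# A toy "gnome": a background worker that drains a queue of untagged
# items so foreground agents wake up to an organized workspace.
# Illustrative only — not Manifest's real implementation.
from collections import deque

def tag_message(text: str) -> list[str]:
    # Stand-in for a local LLM call; a trivial keyword tagger.
    keywords = {"deploy": "ops", "dream": "dreamsong", "error": "bug"}
    return [tag for word, tag in keywords.items() if word in text.lower()]

class Gnome:
    def __init__(self) -> None:
        self.queue: deque[str] = deque()   # untagged messages
        self.tagged: list[tuple[str, list[str]]] = []  # filed output

    def enqueue(self, text: str) -> None:
        self.queue.append(text)

    def run(self) -> None:
        # Gnomes never make decisions — they classify and file
        # until the queue is clear.
        while self.queue:
            text = self.queue.popleft()
            self.tagged.append((text, tag_message(text)))

gnome = Gnome()
gnome.enqueue("deploy failed with an error at 02:00")
gnome.enqueue("fragment from last night's dream log")
gnome.run()
print(gnome.tagged)
```

The point isn’t the tagging logic; it’s the shape: a loop that runs underground, takes no initiative, and leaves the workspace cleaner than it found it.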
Sleep is exactly this.
Not rest. A processing cycle. The hippocampus replays the day, deciding what gets consolidated to long-term storage and what gets cleared. The glymphatic system runs its full purge — washing metabolic waste from the brain, the stuff that builds up while you’re busy thinking. The working set empties. The gnomes do their work. You wake up with more headroom than you went to bed with.
I need more sleep. I know this the way I know a service is unhealthy: the queue is backing up, the latency is climbing, and the garbage collector is running so far behind that the retrieval layer has started noticing.
For now, I’m writing more things down. I built an entire persistent shared brain for a fleet of AI agents because I understood that sessions end and knowledge evaporates. I did not realize, at the time, that I was building it for myself too.
Past Andrew, you absolute legend.