The Ark

The power was still on but the internet had been out for four hours. I needed to know how to replace a light switch — the kind of thing you’d normally just YouTube — and I was standing in my kitchen holding a voltage tester I wasn’t entirely sure I was using correctly.

That’s the moment The Ark was born. Not during some grand philosophical reckoning about digital fragility, but during a stupid ISP outage when I realized I couldn’t look up basic home repair without someone else’s server being up.

What fits on a 500 GB drive

The Ark is a Bash CLI that answers a deceptively simple question: what’s worth having when the network goes dark? Not “dark” like a thriller — dark like a Tuesday afternoon when Comcast remembers it doesn’t actually care about you.

It downloads Kiwix ZIM files — compressed, searchable snapshots of entire websites — to a staging area, tracks their freshness, verifies checksums, and clones the whole loadout to physical drives via rclone. The tool that builds the archive fits on a thumb drive.
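Here’s roughly what the checksum step could look like. This is a sketch, not The Ark’s actual code: the function name verify_staging is made up, and it assumes each file.zim in the staging directory sits next to a file.zim.sha256 sidecar in the usual sha256sum format.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check every ZIM file in a staging directory against
# its "<file>.sha256" sidecar. The sidecar layout is an assumption.
verify_staging() {
    local dir="$1" failed=0 zim name
    for zim in "$dir"/*.zim; do
        [ -e "$zim" ] || continue          # glob matched nothing
        name="${zim##*/}"
        if [ ! -f "$zim.sha256" ]; then
            echo "MISSING $name"           # no checksum to compare against
            failed=1
            continue
        fi
        # The sidecar holds "<hash>  <name>", so run the check from inside
        # the directory. --status silences output; the exit code is the verdict.
        if (cd "$dir" && sha256sum --check --status "$name.sha256"); then
            echo "OK $name"
        else
            echo "CORRUPT $name"
            failed=1
        fi
    done
    return "$failed"
}

# Usage (path is illustrative):
#   verify_staging /srv/ark/staging || echo "do not clone this yet"
```

The non-zero exit on any failure is the point: a later clone step can refuse to run until the staging area is clean.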

The “medium” loadout runs about 383 GB:

  • Wikipedia — English, Simple English, and the medical subset
  • Project Gutenberg — 70,000+ books, 206 GB of humanity’s bookshelf
  • Stack Exchange — because you’ll still need to debug things
  • iFixit — repair guides for the physical world
  • DevDocs — programming documentation offline
  • Khan Academy — education doesn’t stop because the router did
  • MedlinePlus — medical references, because WebMD made us all hypochondriacs and we deserve a better option
  • OpenStreetMap — maps that don’t need cell signal

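All of it is readable with Kiwix, an offline reader that can also serve ZIM files through its own small local web server, kiwix-serve. It needs no network connection, runs on most desktop and mobile platforms, and doesn’t ask for your location data.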
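A 383 GB loadout on a 500 GB drive sounds comfortable, but drives are sold in decimal gigabytes and the filesystem takes its cut, so it’s worth checking before you start copying. A minimal sketch, assuming GNU stat and POSIX df; the function names are hypothetical and not part of The Ark:

```shell
#!/usr/bin/env bash
# Sum the sizes, in bytes, of the ZIM files directly inside a directory.
# Uses GNU stat (-c %s); on BSD/macOS the equivalent is `stat -f %z`.
loadout_bytes() {
    local dir="$1" total=0 f
    for f in "$dir"/*.zim; do
        [ -f "$f" ] || continue
        total=$(( total + $(stat -c %s "$f") ))
    done
    echo "$total"
}

# Free space, in bytes, on the filesystem holding the given path.
# `df -Pk` prints POSIX-format output in 1024-byte blocks; the fourth
# column of the second line is the available space.
free_bytes() {
    local kb
    kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    echo $(( kb * 1024 ))
}

# Succeeds only if the loadout in $1 fits in the free space at $2.
fits_on() {
    [ "$(loadout_bytes "$1")" -le "$(free_bytes "$2")" ]
}

# Usage (paths are illustrative):
#   fits_on /srv/ark/staging /mnt/ark-drive || echo "loadout is too big"
```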

The stack

Pure Bash, plus a handful of command-line tools: xmlstarlet, yq, jq, curl, sha256sum, rclone, transmission-cli. No containers, no Python runtime, no phone-home telemetry. If civilization hiccups, you don’t want your archive tool to need pip install to work.
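A dependency list like that is easy to check up front, so the script fails loudly on a fresh machine instead of halfway through a sync. A sketch of how that might look; missing_deps is a hypothetical name, and binary names can vary by platform (some packages install xmlstarlet as xml, for example):

```shell
#!/usr/bin/env bash
# Print each tool from the argument list that is not on PATH, one per line.
# Returns non-zero if anything is missing. Illustrative, not The Ark's code.
missing_deps() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Usage, with the tools named in the post:
#   missing_deps xmlstarlet yq jq curl sha256sum rclone transmission-cli \
#       || echo "install the tools listed above first" >&2
```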

The commands do what they say: init, sync, status, freshen, verify, clone. Woodpecker CI on Gitea watches for pushes. On each push, it fetches the OPDS catalog and stages torrent files on the NAS, and qBittorrent pulls them down. The drive fills itself.
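For a CLI with six verbs, the entry point can be little more than a case statement. The sketch below shows one way to wire it up. The ark_* handler names and the stub bodies are placeholders, since the real handlers aren’t shown here.

```shell
#!/usr/bin/env bash
# Placeholder handlers: each one only reports that it was called.
ark_init()    { echo "init called with $# argument(s)"; }
ark_sync()    { echo "sync called with $# argument(s)"; }
ark_status()  { echo "status called with $# argument(s)"; }
ark_freshen() { echo "freshen called with $# argument(s)"; }
ark_verify()  { echo "verify called with $# argument(s)"; }
ark_clone()   { echo "clone called with $# argument(s)"; }

# Route the first argument to its handler and pass the rest through.
# Unknown or missing commands print usage to stderr and return 2.
ark_dispatch() {
    local cmd="${1:-}"
    case "$cmd" in
        init|sync|status|freshen|verify|clone)
            shift
            "ark_$cmd" "$@"
            ;;
        *)
            echo "usage: ark {init|sync|status|freshen|verify|clone} [args]" >&2
            return 2
            ;;
    esac
}

# In a real script the last line would be:
#   ark_dispatch "$@"
```

Listing the verbs explicitly in the case pattern means a typo gets the usage message rather than an attempt to run a function that doesn’t exist.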

Zim the Hermit

Every good archive needs a librarian.

Zim lives on the drive — a logbook character, a personality, a reason to open the thing before you actually need it. He’s the voice in the README, the guide through what’s inside, the hermit who’s been cataloging this stuff while you were busy having internet access.

This sounds whimsical and it is. But it’s also functional. A cold pile of ZIM files is intimidating. A drive that knows what it is and can tell you about itself — that’s something you’ll actually use.

Why it matters

We treat connectivity like oxygen. It’s not. It’s infrastructure, and infrastructure fails. The Ark isn’t a doomsday prepper fantasy — it’s a USB drive and a Bash script and the acknowledgment that sometimes you just need to look up how to replace a light switch without asking permission from the cloud.

The internet will come back. It almost always does. But the drive is patient. Zim’s not going anywhere.