An ongoing research project

Polylithic.

Pipeline-parallel inference for pretrained dense LLMs with empirically placed cuts, plus a reducibility map of every MLP projection inside.

We slice frozen pretrained dense LLMs (Qwen2.5-3B and 14B in our experiments) at functional layer boundaries — found by detecting CKA (centered kernel alignment) cliffs in the residual stream — and place the resulting tiles across multiple GPUs. The result is pipeline parallelism with measured cut placement, lossless on N=200 prompts.
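The cut-placement step can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual pipeline: it assumes per-layer residual-stream snapshots are available as NumPy arrays, uses linear CKA, and the `find_cliffs` helper and its `drop` threshold are hypothetical names introduced here.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two activation matrices of shape (n_tokens, d)."""
    X = X - X.mean(axis=0, keepdims=True)  # center each feature over tokens
    Y = Y - Y.mean(axis=0, keepdims=True)
    num = np.linalg.norm(X.T @ Y, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

def find_cliffs(resid, drop=0.5):
    """resid: per-layer residual-stream snapshots, each (n_tokens, d_model).
    A 'cliff' is an adjacent-layer pair whose CKA falls by more than `drop`;
    the later layer index is a candidate cut point for a tile boundary."""
    sims = [linear_cka(resid[i], resid[i + 1]) for i in range(len(resid) - 1)]
    return [i + 1 for i, s in enumerate(sims) if 1.0 - s > drop]
```

On synthetic data, layers whose residual streams are near-copies score CKA near 1, while an unrelated layer drops the similarity sharply and is flagged as a cut point.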

Inside each tile, we measure which MLP down_proj matrices can be compressed to a k-NN lookup table without breaking generation. Boundary layers can (at k=1024, 100% same_top1 and a KL of 0.002). Middle layers can't — that's the cocoon, the irreducible computational kernel; the reducible boundary is silk. The assembled tiles are coordinated by small classifier models (~19K params each, which we call notes) — a post-hoc, structural mixture-of-experts gate that runs on frozen weights with no end-to-end training.
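One way to picture the lookup-table test, under assumptions the text does not pin down: the table is built by k-means over the inputs to down_proj, each centroid's projected output is precomputed, and inference replaces the matmul with a 1-NN match into the table. `build_lookup` and `lookup_down_proj` are hypothetical names for this sketch; the actual k=1024 tables may be constructed differently.

```python
import numpy as np

def build_lookup(acts, W_down, k=32, iters=10, seed=0):
    """k-means over down_proj inputs; precompute each centroid's projection.
    acts: (n, d_ff) intermediate activations; W_down: (d_ff, d_model)."""
    rng = np.random.default_rng(seed)
    cent = acts[rng.choice(len(acts), size=k, replace=False)].copy()
    for _ in range(iters):
        d = ((acts[:, None, :] - cent[None, :, :]) ** 2).sum(-1)  # (n, k)
        idx = d.argmin(1)
        for j in range(k):
            members = acts[idx == j]
            if len(members):
                cent[j] = members.mean(0)
            else:
                # Reseed an empty centroid at the worst-covered point.
                cent[j] = acts[d.min(1).argmax()]
    return cent, cent @ W_down  # table keys and precomputed outputs

def lookup_down_proj(x, cent, outs):
    """Replace x @ W_down with the precomputed output of x's 1-NN centroid."""
    d = ((x[:, None, :] - cent[None, :, :]) ** 2).sum(-1)
    return outs[d.argmin(1)]
```

Whether a layer counts as reducible then comes down to comparing generation with `lookup_down_proj` against the exact matmul — e.g. same_top1 agreement and KL divergence on the output logits, as in the numbers above.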

Artemis & Pippin (and a sleepy guest): a cream tabby, a large orange tabby, and a tortoiseshell curled together on a slate-grey couch. Their fur tones provided the site's palette.
The notebook is being moved. The full site — papers, interactive React-Flow diagrams of the dissections, day-to-day findings, and the reducibility heatmap — is being deployed and will live here shortly. While that's in flight: