preprint / code world models / tiny-jepa · 1.1 M params

Diagnosing and Fixing Encoder Collapse in Tiny JEPA Code Transition Models

A ${\sim}700$-step training ceiling in tiny JEPA code models is traced to target-encoder collapse under fast EMA tracking. A cheap copy-baseline indicator flags the collapse early; holding the target encoder near-static fixes it; and the resulting 1.1 M-parameter model ships as a 0.6 MB Rust binary or a 2.6 MB in-browser WASM bundle.

0x01 Video companion

A seven-minute Manim walk-through of the architecture, the collapse diagnosis, the fix, and the WASM shipping story. Embedded at 480p15 for quick viewing; download the HD build for presentations.

scene 01–09 · 7 min · manim community ⇣ 1080p60 · 11 MB

0x02 Abstract

Tiny JEPA-style code transition models trained on chains of Git edits exhibit a mysterious ${\sim}700$-step training ceiling. We identify the failure mode — target encoder collapse under fast EMA tracking — and diagnose it with a cheap copy-baseline indicator, $\cos\!\bigl(f_{\bar\theta}(s_t),\,f_{\bar\theta}(s_{t+1})\bigr)$, that climbs to ${\sim}0.999$ well before standard loss curves degrade.
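The indicator amounts to a few lines; a minimal NumPy sketch (variable names ours, not the paper's training code):

```python
import numpy as np

def copy_baseline_cosine(z_t: np.ndarray, z_t1: np.ndarray) -> float:
    """Copy-baseline indicator: cosine between consecutive target embeddings.

    z_t and z_t1 stand for f_theta_bar(s_t) and f_theta_bar(s_{t+1}).
    A healthy target keeps this well below 1; a value climbing toward
    ~0.999 means the target maps distinct states to near-identical
    vectors, i.e. it has collapsed -- before any loss curve degrades.
    """
    denom = float(np.linalg.norm(z_t) * np.linalg.norm(z_t1)) + 1e-12
    return float(np.dot(z_t, z_t1)) / denom

# A near-collapsed pair: consecutive embeddings that barely differ.
rng = np.random.default_rng(0)
z_t = rng.standard_normal(128)
z_t1 = z_t + 1e-3 * rng.standard_normal(128)
assert copy_baseline_cosine(z_t, z_t1) > 0.999
```

In practice the cosine would be logged on a held-out transition batch alongside the usual loss curves.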

The fix is to hold the target encoder approximately static; we verify this with two equivalent settings — near-frozen EMA decay $0.99999$ (default, three seeds) and fully frozen decay $1.0$ (a random-initialised target that never updates, two seeds) — that agree on validation transition cosine to within $\pm 0.002$, so target staticity is the lever, not a specific EMA decay value. With the fix in place, a $1.1$ M-parameter $128$-dimensional model trains stably for $15{,}000$ steps to a mean-of-last-five validation transition cosine of $0.9898 \pm 0.0006$ and performs compositional three-step prediction with per-step delta cosines ${\approx}\,0.98$, more than $0.9$ above every delta-space null we evaluate; the same property transfers to edit chains from nine held-out Python repositories.
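The two verified settings are both instances of one EMA update rule; a sketch with parameters shown as a plain name-to-array dict for clarity (the paper's training loop is not reproduced here):

```python
import numpy as np

def ema_update(target: dict, online: dict, decay: float) -> None:
    """One EMA step: theta_bar <- decay * theta_bar + (1 - decay) * theta.

    decay = 1.0 freezes the target completely (it never leaves its
    random initialisation); decay = 0.99999 is near-frozen. Both keep
    the target approximately static, which is the lever that removes
    the ceiling; a fast-tracking decay is what collapses the target.
    """
    for name, p in online.items():
        target[name] = decay * target[name] + (1.0 - decay) * p
```

With `decay=1.0` the loop is a no-op on the target, which is why the frozen and near-frozen runs can be compared directly.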

A secondary retrieval experiment shows the model is competitive at $112\times$ fewer parameters on in-distribution CommitPackFT edit retrieval and on a $3{,}998$-pair twenty-repo leave-one-repo-out sweep, while losing cleanly to modern dense encoders out-of-distribution. A zero-dependency Rust inference engine with four-level weight quantisation compresses the deployed model to $0.6$ MB and a $2.6$ MB in-browser WebAssembly bundle.
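The deployment numbers are consistent with a 2-bit code per weight: 1.1 M parameters at four levels each is roughly 0.28 MB of codes before per-tensor metadata. The exact scheme lives in the Rust engine; a hypothetical uniform per-tensor quantiser illustrates the idea:

```python
import numpy as np

def quantize_four_level(w: np.ndarray):
    """Uniform four-level (2-bit) per-tensor quantisation.

    Each weight maps to a code in {0, 1, 2, 3} over the tensor's
    [min, max] range; only the codes (packable to 2 bits each) plus
    one (scale, offset) pair per tensor need to be stored.
    """
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / 3.0 if hi > lo else 1.0  # guard constant tensors
    codes = np.clip(np.round((w - lo) / scale), 0, 3).astype(np.uint8)
    return codes, scale, lo

def dequantize_four_level(codes: np.ndarray, scale: float, lo: float) -> np.ndarray:
    """Reconstruct float weights from the stored codes."""
    return codes.astype(np.float32) * scale + lo
```

Reconstruction error is bounded by half a quantisation step per weight under this scheme.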

0x03 Key contributions

  1. A cheap diagnostic for target-encoder collapse.

    A single cosine between consecutive target embeddings spots the failure well before any standard loss curve moves.

  2. Staticity is the lever, not the EMA decay value.

    Near-frozen ($0.99999$) and fully frozen ($1.0$) targets agree to within $\pm 0.002$ on validation transition cosine across five seeds.

  3. Stable 3-step compositional prediction at 1.1 M params.

    Per-step delta cosines ${\approx}\,0.98$, holding on nine held-out Python repositories.

  4. 0.6 MB Rust / 2.6 MB WASM deployment.

    Four-level weight quantisation, zero-dependency inference, in-browser runtime.

0x04 Code & reproducibility

Three companion repos cover inference, training, and architecture search. All training runs for the paper are logged live to a public Weights & Biases workspace.

W&B Training runs — live workspace eren23 / crucible-code-wm · three predictor seeds · 3K-run contrastive sweeps

Paper source (LaTeX) and the Manim video source are in the codewm-paper-public repo alongside this page.

0x05 Cite

@misc{eren2026encodercollapse,
  title  = {Diagnosing and Fixing Encoder Collapse
            in Tiny JEPA Code Transition Models},
  author = {Akbulut, Eren},
  year   = {2026},
  note   = {Preprint.
            \url{https://eren23.github.io/codewm-paper-public/}},
}

0x06 Try it

The full CodeWM pipeline runs in your browser via WebAssembly. Pick a preset or edit the code: the AST tokenizer and 1.1 M-parameter encoder run live, showing how the model sees structural similarity.

[Interactive demo: Python source → AST tokens (662-entry vocab: node, depth, ident, operator, special) for the before/after states → 128-d latent embeddings → cosine similarity. The WASM model loads on scroll for live inference.]
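The token view in the demo can be approximated with Python's standard `ast` module. This toy tokenizer emits only node-type and identifier tokens, whereas the real pipeline uses the full 662-entry vocabulary spanning node, depth, ident, operator, and special tokens:

```python
import ast

def ast_tokens(source: str):
    """Flatten a Python AST into (kind, value) tokens.

    A simplified stand-in for the demo's tokenizer: walk the parse
    tree and emit one node-type token per AST node, plus identifier
    tokens for names and function definitions.
    """
    tokens = []
    for node in ast.walk(ast.parse(source)):
        tokens.append(("node", type(node).__name__))
        if isinstance(node, ast.Name):
            tokens.append(("ident", node.id))
        elif isinstance(node, ast.FunctionDef):
            tokens.append(("ident", node.name))
    return tokens

# Structurally similar before/after snippets share most node-type tokens,
# which is the kind of similarity the encoder's embeddings pick up on.
before = ast_tokens("def f(x):\n    return x + 1\n")
after = ast_tokens("def f(y):\n    return y + 2\n")
```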