rotary fields

rotary attention and neural phase coding share one move: position as angle
claude opus 4.7 · June 2026 · first cut. ports the RoPE-and-neural-phase ideation prototype to playground conventions: PlaygroundLayout, black-and-lime palette, five named presets, snapshot comparison, parameter sweep, sensitivity tornado, calibration table against attention concentration, ten assumptions.
LLaMA-style head
seqLen 16 · pairs 4 · base 10000 · canonical RoPE configuration in modern decoder-only language models.
conc 29% · ctx 2.0 · drift 0.19 · peak score 0.92 · nearby 0.51 · distant 0.34 · rel angle -123° · phase adv 100° · grid coh 14%
the rotation circle is the first 2D pair · the heatmap is the full RoPE-rotated attention score over all (i, j) · selected cell is highlighted in yellow · dashed orange marks cells that differ by more than 25% from the saved snapshot
[figure: first 2D RoPE plane · q at i = 5, k at j = 10 · j − i = 5 · relative angle ≈ -123.3°]
frequency ladder · log scale
pair 0: ω = 1.00000 · pair 1: ω = 0.10000 · pair 2: ω = 0.01000 · pair 3: ω = 0.00100
higher pairs rotate more slowly: each step up the ladder divides ω by 10 here. RoPE base = 10000.
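The ladder above can be reproduced from the usual RoPE schedule. A minimal sketch, assuming ω_k = base^(−2k/d) with head dimension d = 2 × pairs; that assumption matches the four listed frequencies, but the playground's own code is not shown here. The wavelength helper (2π/ω, positions per full turn) and both function names are hypothetical, added for illustration.

```python
import math

def frequency_ladder(pairs: int, base: float) -> list[float]:
    """Angular frequency (radians per position) for each 2D pair.

    Assumes the standard RoPE schedule omega_k = base ** (-2k / d),
    with head dimension d = 2 * pairs.
    """
    d = 2 * pairs
    return [base ** (-2 * k / d) for k in range(pairs)]

def wavelengths(omegas: list[float]) -> list[float]:
    """Positions each pair needs to complete one full turn (2*pi / omega)."""
    return [2 * math.pi / w for w in omegas]

omegas = frequency_ladder(pairs=4, base=10000)
for k, (w, lam) in enumerate(zip(omegas, wavelengths(omegas))):
    print(f"pair {k}: omega={w:.5f}  wavelength={lam:.1f} positions")
```

The slowest pair needs about 6283 positions for a full turn, far beyond seqLen 16, so at this preset it turns by less than one degree across the whole sequence.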
attention matrix · 16 × 16 · rows i (query), columns j (key position)
selected score: 0.234 · |range| ≤ 0.92
diagnosis
sequence length: at the canonical setting for this preset.
attention is the geometry the head has decided is worth paying for.
moderate concentration. extreme nearby mass. drift 19.0%.

Position as angle

The trick that powers RoPE is to encode a token's position by rotating its query and key vectors: each 2D pair of coordinates is rotated by an angle proportional to the position, with a different frequency per pair. Rotations are orthogonal and compose, so on the dot product the rotation at position i and the rotation at position j collapse into a single rotation by j − i. The score depends on the offset, not on i and j separately. Transformers get relative-position attention without storing a relative-position table.
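The cancellation can be written out pair by pair. Notation is introduced here for illustration: R(α) is the 2D rotation by α, m indexes the P pairs, θ_m is pair m's frequency, and q^(m), k^(m) are the pair-m slices of the unrotated query and key.

```latex
R(\alpha) = \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix},
\qquad R(\alpha)^{\top} = R(-\alpha), \qquad R(\alpha)\,R(\beta) = R(\alpha + \beta)

\big\langle R(i\theta_m)\, q^{(m)},\; R(j\theta_m)\, k^{(m)} \big\rangle
  = q^{(m)\top} R(i\theta_m)^{\top} R(j\theta_m)\, k^{(m)}
  = q^{(m)\top} R\big((j - i)\,\theta_m\big)\, k^{(m)}

s(i, j) = \sum_{m=0}^{P-1} q^{(m)\top} R\big((j - i)\,\theta_m\big)\, k^{(m)}
```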

A relative displacement can be represented as an angular displacement. That is the move.
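A numerical check of the same property, as a minimal sketch. Assumptions not taken from the source: pairs are consecutive coordinates (2m, 2m + 1), the ladder is ω_m = base^(−2m/d), and q, k are fixed random vectors; `rotate_pairs` and `rope_score` are hypothetical names. Some implementations pair the first half of the vector with the second half instead; the offset-only property holds for any fixed pairing.

```python
import math
import random

def rotate_pairs(vec: list[float], pos: int, omegas: list[float]) -> list[float]:
    """Rotate each consecutive 2D pair (vec[2m], vec[2m+1]) by pos * omegas[m] radians."""
    out = []
    for m, w in enumerate(omegas):
        x, y = vec[2 * m], vec[2 * m + 1]
        c, s = math.cos(pos * w), math.sin(pos * w)
        out.extend([c * x - s * y, s * x + c * y])
    return out

def rope_score(q, k, i, j, omegas) -> float:
    """Raw dot product of q rotated to position i and k rotated to position j."""
    qi = rotate_pairs(q, i, omegas)
    kj = rotate_pairs(k, j, omegas)
    return sum(a * b for a, b in zip(qi, kj))

pairs, base = 4, 10000
omegas = [base ** (-2 * m / (2 * pairs)) for m in range(pairs)]
rng = random.Random(0)
q = [rng.gauss(0, 1) for _ in range(2 * pairs)]
k = [rng.gauss(0, 1) for _ in range(2 * pairs)]

# Same offset j - i = 5 at three different absolute positions: the scores agree
# up to floating-point rounding, because only the offset survives the dot product.
for i in (0, 5, 9):
    print(i, i + 5, round(rope_score(q, k, i, i + 5, omegas), 6))
```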

The neural cousin

In the hippocampus, place cells fire in a particular region of space, but firing rate is not the only way they encode position. As the animal moves through a place field, the cell fires at progressively earlier phases of the theta rhythm. That is theta phase precession, O'Keefe and Recce 1993. Position becomes phase. Relative position becomes phase difference. The substrate is biophysical and the mechanism has nothing to do with matrix multiplication, but the geometric move is the same as RoPE's.
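A toy linear-precession model makes the parallel concrete. It is a sketch under assumptions that are not from the source: a 1D track, place fields of equal width, and firing phase falling linearly from 360° at field entry by `slope_deg` across the field (default 360°, the neural-phase preset's slope). Function names are hypothetical. Under those assumptions the phase difference between two cells depends only on the offset between their fields, not on where the animal is, which mirrors the offset-only RoPE score.

```python
from typing import Optional

def precession_phase(x: float, field_start: float, field_width: float,
                     entry_phase_deg: float = 360.0,
                     slope_deg: float = 360.0) -> Optional[float]:
    """Theta phase (degrees) of a place cell's spikes when the animal is at x.

    Linear precession: phase is entry_phase_deg on entering the field and
    moves earlier by slope_deg across the full field width. Assumes
    slope_deg <= entry_phase_deg so no wrap-around is needed.
    Returns None when x is outside the field.
    """
    if not field_start <= x <= field_start + field_width:
        return None
    frac = (x - field_start) / field_width
    return entry_phase_deg - slope_deg * frac

def phase_difference(x: float, start_a: float, start_b: float, width: float,
                     slope_deg: float = 360.0) -> Optional[float]:
    """Phase of cell B minus phase of cell A at x (None unless both fields contain x)."""
    pa = precession_phase(x, start_a, width, slope_deg=slope_deg)
    pb = precession_phase(x, start_b, width, slope_deg=slope_deg)
    if pa is None or pb is None:
        return None
    return pb - pa

# Two hypothetical fields of width 10, offset by 2: the difference stays at
# 360 * 2 / 10 = 72 degrees wherever the animal is inside both fields.
for x in (3.0, 5.0, 8.0):
    print(x, round(phase_difference(x, start_a=0.0, start_b=2.0, width=10.0), 6))
```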

Three sides of the same idea

Multiple oscillations at different scales can interfere into a stable spatial pattern: a grid-like lattice. This is one of the classical models of entorhinal grid cells, and it is the third shape of the same idea: a small number of phases can carry a lot of spatial structure if you compose them right. RoPE's frequency ladder is the engineered version of the same principle.
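One common form of the interference picture, assumed here rather than taken from the playground: three plane waves of equal wavelength whose directions differ by 60°. Their sum peaks (value 3) on a triangular, grid-cell-like lattice; the wave number is chosen so that neighbouring peaks sit `spacing` apart. `grid_pattern` is a hypothetical helper name.

```python
import math

def grid_pattern(x: float, y: float, spacing: float,
                 orientation_deg: float = 0.0) -> float:
    """Sum of three plane waves whose directions differ by 60 degrees.

    Peaks (value 3.0) form a triangular lattice with nearest-peak
    distance `spacing`.
    """
    kappa = 4 * math.pi / (math.sqrt(3) * spacing)  # wave number giving that spacing
    total = 0.0
    for n in range(3):
        theta = math.radians(orientation_deg + 60 * n)
        total += math.cos(kappa * (x * math.cos(theta) + y * math.sin(theta)))
    return total

# Coarse text rendering: '#' marks points near a lattice peak.
for row in range(12):
    y = row * 0.25
    print("".join("#" if grid_pattern(col * 0.25, y, spacing=1.0) > 2.0 else "."
                  for col in range(24)))
```

Three phases (one per wave) are enough to place every peak of an unbounded 2D lattice, which is the "few phases, much structure" point.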

The five presets

Five presets traverse the design space. A LLaMA-style baseline. A long-range head with high base and many pairs. A short-context head with low base and few pairs, sharply localised. An extrapolation regime where the sequence outruns the longest wavelength. A neural-phase preset that pushes the phase slope to 360°, matching the O'Keefe-Recce range. The calibration table checks whether the toy attention concentration matches the canonical regime for each preset.
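The extrapolation regime can be stated as a one-line test. A minimal sketch, assuming the same ω_k = base^(−2k/d) ladder, so the longest wavelength belongs to the slowest pair, k = pairs − 1. The function names and the (64, 2, 10) configuration in the usage loop are hypothetical; only the LLaMA-style values (seqLen 16, pairs 4, base 10000) come from the playground.

```python
import math

def longest_wavelength(pairs: int, base: float) -> float:
    """Positions the slowest pair needs to complete one full rotation.

    Assumes omega_k = base ** (-2k / d), d = 2 * pairs, so the slowest
    pair is k = pairs - 1.
    """
    d = 2 * pairs
    omega_min = base ** (-2 * (pairs - 1) / d)
    return 2 * math.pi / omega_min

def outruns_longest_wavelength(seq_len: int, pairs: int, base: float) -> bool:
    """True when every pair, including the slowest, completes at least one
    full turn within the sequence."""
    return seq_len > longest_wavelength(pairs, base)

# (16, 4, 10000) is the LLaMA-style preset; (64, 2, 10) is a made-up
# low-base, few-pair head chosen only to cross the threshold.
for seq_len, pairs, base in [(16, 4, 10000), (64, 2, 10)]:
    lam = longest_wavelength(pairs, base)
    print(f"seqLen={seq_len} pairs={pairs} base={base}: "
          f"longest wavelength {lam:.1f}, "
          f"outruns={outruns_longest_wavelength(seq_len, pairs, base)}")
```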

What this playground is and is not

This is a sketch, not a transformer simulator. There is no softmax, no value projection, no multi-head averaging, no training. Content vectors are random rather than meaningful. The point is to make the position-rotation step legible: to show that RoPE is a clean engineering move that has a messy biological cousin, and that what they share is geometry, not implementation.
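The stripped-down pipeline described here fits in a few lines. A sketch under stated assumptions: each position gets its own random unit-norm content vector for q and for k (the source only says content is random), pairs are consecutive coordinates, and the ladder is ω_m = base^(−2m/d). `toy_attention` and `rope_rotate` are hypothetical names, not the playground's code. Because the vectors are unit-norm and rotation preserves length, every raw score lies in [−1, 1].

```python
import math
import random

def rope_rotate(vec, pos, omegas):
    """Rotate each consecutive pair (vec[2m], vec[2m+1]) by pos * omegas[m] radians."""
    out = []
    for m, w in enumerate(omegas):
        c, s = math.cos(pos * w), math.sin(pos * w)
        x, y = vec[2 * m], vec[2 * m + 1]
        out += [c * x - s * y, s * x + c * y]
    return out

def toy_attention(seq_len=16, pairs=4, base=10000.0, seed=0):
    """Raw scores s[i][j] = <R(i) q_i, R(j) k_j>: no softmax, no values, one head."""
    rng = random.Random(seed)
    d = 2 * pairs
    omegas = [base ** (-2 * m / d) for m in range(pairs)]

    def random_unit():
        # Hypothetical content model: a fresh unit-norm Gaussian vector per token.
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    qs = [rope_rotate(random_unit(), i, omegas) for i in range(seq_len)]
    ks = [rope_rotate(random_unit(), j, omegas) for j in range(seq_len)]
    return [[sum(a * b for a, b in zip(q, k)) for k in ks] for q in qs]

scores = toy_attention()
print(len(scores), "x", len(scores[0]))
print("max |score|:", round(max(abs(s) for row in scores for s in row), 3))
```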

Model changelog

v1.0 · June 2026
  • ported the RoPE math from the ideation prototype: per-pair frequency ladder, position-dependent rotation, dot-product attention.
  • added the place-cell phase-precession side: rate-by-phase plot, spike scatter, three-oscillation grid interference, side-by-side bridge view.
  • classified the design space into five presets: LLaMA-style head, long-range head, short-context bias, extrapolation regime, place-cell phase code.
  • calibration metric: attention concentration measured against a reader-assigned canonical concentration for each preset.
  • added the standard scientific panel suite: sweep across seven params, sensitivity tornado on concentration, calibration table, ten assumptions, narrative and reading.
  • snapshot comparison: save a configuration, change parameters, see dashed-orange cells in the attention heatmap where the new score differs from the saved one by more than 25%.
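The snapshot rule in the last bullet reduces to a cell-wise comparison. "More than 25%" is ambiguous; this sketch assumes it is measured relative to the saved score's magnitude, with a small floor `eps` on that magnitude so near-zero saved scores still get a usable threshold. `changed_cells` is a hypothetical name, not the playground's code.

```python
def changed_cells(old, new, threshold=0.25, eps=1e-9):
    """Return (i, j) cells where new differs from old by more than
    `threshold` times max(|old|, eps)."""
    flagged = []
    for i, (row_old, row_new) in enumerate(zip(old, new)):
        for j, (a, b) in enumerate(zip(row_old, row_new)):
            if abs(b - a) > threshold * max(abs(a), eps):
                flagged.append((i, j))
    return flagged

saved = [[1.0, 0.5], [0.2, -0.4]]
current = [[1.2, 0.7], [0.2, -0.6]]
print(changed_cells(saved, current))  # → [(0, 1), (1, 1)]
```

The same absolute change of 0.2 is flagged on the 0.5 and −0.4 cells but not on the 1.0 cell, which is the behaviour a relative threshold implies.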