Speccast Relay
A small drafter sprints ahead; the sovereign model verifies in one pass.
Grounded in Speculative decoding (draft
A fast sketch artist drafts several words ahead while the expert checks them all at once, nodding yes or no at each word, so verification that was serial runs in parallel.
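As a minimal sketch of one draft-then-verify round, assuming greedy decoding and toy next-token functions: the names `speculative_step`, `toy_target`, and `toy_draft` are hypothetical, and the verification loop stands in for what a real system would compute in one batched forward pass of the large model.

```python
from typing import Callable, List

# Hypothetical interface: a "model" maps a token context to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_step(draft: NextToken, target: NextToken,
                     context: List[int], k: int) -> List[int]:
    """Run one round: draft k tokens, keep the prefix the target agrees with."""
    # 1. The small model proposes k tokens one after another (cheap, serial).
    proposed: List[int] = []
    ctx = list(context)
    for _ in range(k):
        tok = draft(ctx)
        proposed.append(tok)
        ctx.append(tok)

    # 2. The large model checks every proposed position. A real system scores
    #    all k positions in a single forward pass; this loop only mimics that.
    accepted: List[int] = []
    ctx = list(context)
    for tok in proposed:
        expected = target(ctx)
        if expected != tok:
            # First disagreement: emit the target's token and stop the round.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # Every draft token was accepted, so the target contributes one extra token.
    accepted.append(target(ctx))
    return accepted


def toy_target(ctx: List[int]) -> int:
    """Toy 'large model': counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:
    """Toy 'small model': agrees with the target except after token 3."""
    return 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10


tokens = speculative_step(toy_draft, toy_target, [0], k=4)
```

In this greedy variant the accepted tokens are exactly what the target alone would have produced, so the draft affects speed only. Sampling-based acceptance rules are not shown here.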
Pagewright KV
Virtual-memory paging for the KV cache: no fragmentation, near-zero waste.
Grounded in PagedAttention
OS-style virtual memory for the attention cache: each sequence owns a logical page table that maps to fixed-size physical blocks rather than one contiguous region, so fragmentation drops to near zero.
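The page-table idea can be sketched with a toy block allocator. This is an illustration under stated assumptions, not the layout of any particular serving engine: the class `PagedKVCache`, its methods, and the block sizes are hypothetical, and only slot bookkeeping is modeled (no tensors are stored).

```python
from typing import Dict, List, Tuple


class PagedKVCache:
    """Toy allocator: sequences map logical token positions to physical blocks."""

    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.block_size = block_size
        self.free_blocks: List[int] = list(range(num_blocks))
        self.page_tables: Dict[str, List[int]] = {}  # seq id -> physical block ids
        self.lengths: Dict[str, int] = {}            # seq id -> tokens stored

    def append_token(self, seq_id: str) -> Tuple[int, int]:
        """Reserve a slot for one new token; return (physical_block, offset)."""
        table = self.page_tables.setdefault(seq_id, [])
        length = self.lengths.get(seq_id, 0)
        if length % self.block_size == 0:
            # The last block is full (or the sequence is new): grab any free
            # block. It does not need to be adjacent to the previous one.
            if not self.free_blocks:
                raise MemoryError("no free KV blocks")
            table.append(self.free_blocks.pop())
        self.lengths[seq_id] = length + 1
        return table[length // self.block_size], length % self.block_size

    def free(self, seq_id: str) -> None:
        """Return all of a finished sequence's blocks to the free pool."""
        self.free_blocks.extend(self.page_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)


cache = PagedKVCache(num_blocks=4, block_size=2)
slots_a = [cache.append_token("a") for _ in range(3)]  # spans two blocks
slot_b = cache.append_token("b")                       # takes a third block
blocks_free_before = len(cache.free_blocks)
cache.free("a")
blocks_free_after = len(cache.free_blocks)
```

Because blocks are fixed-size and interchangeable, the only unused space in this sketch is the unfilled tail of each sequence's last block, at most `block_size - 1` slots per sequence.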
Flowbatch Loom
Sequences join and leave the batch every step; the GPU never idles.
Grounded in Continuous (in
Rather than waiting for the whole table to finish before seating new guests, seat one diner the instant any seat opens.
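The seating analogy can be sketched as a small scheduling simulation. The function name `continuous_batching`, the request tuples, and the assumption that every request decodes exactly one token per step are all hypothetical simplifications.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


def continuous_batching(requests: List[Tuple[str, int]],
                        max_batch: int) -> Tuple[int, Dict[str, int]]:
    """Simulate step-level scheduling.

    `requests` holds (name, tokens_to_generate) pairs, each needing at least
    one token. Returns the total number of decode steps and the step at which
    each request finished.
    """
    waiting: Deque[Tuple[str, int]] = deque(requests)
    running: Dict[str, int] = {}      # name -> tokens still to generate
    finished_at: Dict[str, int] = {}
    step = 0
    while waiting or running:
        # Fill every open slot before the step runs, not after the batch drains.
        while waiting and len(running) < max_batch:
            name, remaining = waiting.popleft()
            running[name] = remaining
        step += 1
        # One decode step produces one token for every running sequence.
        for name in list(running):
            running[name] -= 1
            if running[name] == 0:
                del running[name]     # the slot is free for the next step
                finished_at[name] = step
    return step, finished_at


total_steps, finished = continuous_batching(
    [("a", 1), ("b", 4), ("c", 2), ("d", 1)], max_batch=2
)
```

With these toy numbers the run takes 4 steps. Static batching of the same requests in pairs would take 4 + 2 = 6 steps, since each pair waits for its longest member.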
Shardloom Mesh
Split each layer across GPUs, stage the layers in a pipeline — weave both.
Grounded in Combined tensor parallelism (intra
Tensor parallelism splits one wide highway across lanes within a node; pipeline parallelism is a relay-race baton pass between nodes.
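A pure-Python sketch of how the two forms of parallelism compose, assuming bias-free linear layers and column sharding. All function names are hypothetical, list concatenation stands in for an all-gather, and the stages run one after another rather than overlapping microbatches as a real pipeline would.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, w: Matrix) -> Matrix:
    """Plain matrix product, used as the single-device reference."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)]
            for row in x]


def split_columns(w: Matrix, parts: int) -> List[Matrix]:
    """Column-shard one layer's weights across `parts` hypothetical devices."""
    cols = len(w[0])
    assert cols % parts == 0, "columns must divide evenly across devices"
    width = cols // parts
    return [[row[i * width:(i + 1) * width] for row in w]
            for i in range(parts)]


def tensor_parallel_layer(x: Matrix, shards: List[Matrix]) -> Matrix:
    """Each device computes a slice of the output columns; then concatenate."""
    partials = [matmul(x, shard) for shard in shards]
    return [sum((p[r] for p in partials), []) for r in range(len(x))]


def run_pipeline(microbatches: List[Matrix],
                 stages: List[List[List[Matrix]]]) -> List[Matrix]:
    """stages[s] lists the sharded layers owned by pipeline stage s."""
    outputs: List[Matrix] = []
    for batch in microbatches:
        act = batch
        for stage in stages:          # baton pass between stages
            for shards in stage:      # layers inside one stage
                act = tensor_parallel_layer(act, shards)
        outputs.append(act)
    return outputs


w1: Matrix = [[1.0, 2.0], [3.0, 4.0]]
w2: Matrix = [[0.0, 1.0], [1.0, 0.0]]
x: Matrix = [[1.0, 1.0]]

# Two pipeline stages with one layer each; every layer is split over 2 devices.
stages = [[split_columns(w1, 2)], [split_columns(w2, 2)]]
sharded = run_pipeline([x], stages)
reference = matmul(matmul(x, w1), w2)
```

The sharded result matches the single-device reference. The sketch checks only that the arithmetic composes and does not model communication cost or pipeline bubbles.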
Castfuse Kernel
IO-aware attention that never writes the full score matrix to HBM.
Grounded in FlashAttention
Compute attention scores tile by tile in fast scratchpad memory, accumulating on the fly, so the full score matrix never touches slow global memory.
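The tile-by-tile accumulation can be sketched for a single query vector using a running softmax. This is a pure-Python illustration of the arithmetic only, not a GPU kernel: the function names, the tile size, and the example vectors are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def naive_attention(q: Vector, keys: List[Vector],
                    values: List[Vector]) -> Vector:
    """Reference: materializes every score before normalizing."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(len(values[0]))]


def tiled_attention(q: Vector, keys: List[Vector], values: List[Vector],
                    tile: int) -> Vector:
    """Processes keys/values one tile at a time, keeping only running stats."""
    scale = 1.0 / math.sqrt(len(q))
    running_max = float("-inf")
    running_sum = 0.0
    acc = [0.0] * len(values[0])
    for start in range(0, len(keys), tile):
        k_tile = keys[start:start + tile]
        v_tile = values[start:start + tile]
        scores = [dot(q, k) * scale for k in k_tile]
        new_max = max(running_max, max(scores))
        # Rescale earlier partial results to the new maximum (0.0 on tile one).
        rescale = math.exp(running_max - new_max)
        probs = [math.exp(s - new_max) for s in scores]
        running_sum = running_sum * rescale + sum(probs)
        acc = [a * rescale + sum(p * v[d] for p, v in zip(probs, v_tile))
               for d, a in enumerate(acc)]
        running_max = new_max
    return [a / running_sum for a in acc]


q = [1.0, 0.5]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -1.0], [2.0, 0.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]

reference = naive_attention(q, keys, values)
tiled = tiled_attention(q, keys, values, tile=2)
```

Only one tile of scores exists at any time, yet the result agrees with the reference up to floating-point rounding. That rescaling step is what lets the full score matrix stay out of slow memory.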