We have successfully replaced the KV-cache bottleneck inherent in Softmax Attention with Causal Monoid State Compression. By defining the causal history as a monoid recurrence, $S_t = S_{t-1} \otimes u_t$, the entire prefix is lossily compressed into a fixed-size state matrix $S_t$ per head.
The technical core of this architecture relies on the associativity of the monoid operator, $(a \otimes b) \otimes c = a \otimes (b \otimes c)$:

- Training: a parallel prefix scan, run through Triton-accelerated JIT kernels, computes all prefix states simultaneously (see the sketch after this list).
- Inference: true sequential updates, so memory and time complexity per token are decoupled from sequence length.
- Explicit causality: we discard RoPE and attention masks. Causality is a first-class citizen, modeled explicitly through learned, content-dependent decay gates.
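To make the training/inference duality concrete, here is a minimal sketch, not the Spartacus kernels themselves, of a gated linear recurrence $S_t = a_t \odot S_{t-1} + k_t v_t^\top$ written as an associative operator: `jax.lax.associative_scan` then produces every prefix state at once, while inference applies the same update one token at a time. The names (`combine`, `parallel_prefix_states`, `sequential_update`) and the outer-product update are illustrative assumptions, not the released code.

```python
import jax
import jax.numpy as jnp

def combine(left, right):
    # Each element (a, u) represents the affine map S -> a * S + u.
    # Composing `right` after `left` yields another map of the same form,
    # and this composition is associative -- the monoid property.
    a_l, u_l = left
    a_r, u_r = right
    return a_r * a_l, a_r * u_l + u_r

def parallel_prefix_states(a, u):
    # Training path: all T prefix states from one associative (parallel) scan.
    _, states = jax.lax.associative_scan(combine, (a, u), axis=0)
    return states

def sequential_update(S, a_t, u_t):
    # Inference path: O(1) memory and time per token, independent of T.
    return a_t * S + u_t

# Toy shapes: T tokens, one (d_k x d_v) state matrix per head.
T, d_k, d_v = 8, 4, 4
a = jax.nn.sigmoid(jax.random.normal(jax.random.PRNGKey(0), (T, d_k, d_v)))  # content-dependent decay gates
k = jax.random.normal(jax.random.PRNGKey(1), (T, d_k))
v = jax.random.normal(jax.random.PRNGKey(2), (T, d_v))
u = jnp.einsum("tk,tv->tkv", k, v)  # per-token outer-product update

states = parallel_prefix_states(a, u)     # shape (T, d_k, d_v)
S = jnp.zeros((d_k, d_v))
for t in range(T):                        # same numbers, token by token
    S = sequential_update(S, a[t], u[t])
assert jnp.allclose(states[-1], S, atol=1e-5)
```

Because `combine` is associative, the scan is free to split the sequence into blocks and merge partial results in any grouping, which is the property the Triton training kernels exploit to compute all prefix states in parallel.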
Current zero-shot benchmarks show that Spartacus-1B-Instruct (1.3B) already outperforms established sub-quadratic models such as Mamba-1.4B and RWKV-6-1.6B on ARC-Challenge (0.3063). Recent integration of structured Chain-of-Thought (CoT) data has pushed reasoning accuracy further, to 75%.
The "Spartacus" era is about scaling intelligence, not the memory wall βΎοΈ.
KittenTTS Nano is a lightweight, CPU-only text-to-speech model designed to prove that natural, expressive voices don't require massive cloud stacks or GPUs. At roughly 15M parameters, it runs fast on modest hardware, supports multiple expressive voices, and exposes simple controls for pacing and tone. This makes it ideal for edge devices, demos, and anyone who wants full control over TTS without latency, lock-in, or infrastructure overhead.
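For a sense of how small the integration surface is, here is a minimal usage sketch modeled on the project's published examples; the exact model id ("KittenML/kitten-tts-nano-0.2"), voice name, and output sample rate are assumptions that may differ between releases.

```python
# Minimal sketch based on KittenTTS's published examples; the model id,
# voice name, and sample rate below are assumptions and may vary by release.
from kittentts import KittenTTS
import soundfile as sf

model = KittenTTS("KittenML/kitten-tts-nano-0.2")  # ~15M-parameter checkpoint, runs on CPU
audio = model.generate(
    "Kitten TTS runs entirely on the CPU.",
    voice="expr-voice-2-f",                        # one of the bundled expressive voices
)
sf.write("output.wav", audio, 24000)               # write 24 kHz mono audio to disk
```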