Zixi "Oz" Li (OzTianlu)
AI & ML interests
My research focuses on deep reasoning with small language models, Transformer architecture innovation, and knowledge distillation for efficient alignment and transfer.
Recent Activity
🛡️ Meet Spartacus-1B: Shattering the Memory Wall with True O(1) Inference! 🚀
https://huggingface.co/NoesisLab/Spartacus-1B-Instruct
https://huggingface.co/spaces/NoesisLab/ChatSpartacus
At NoesisLab, we've entirely ripped out Softmax Attention and replaced it with Causal Monoid State Compression.
Say hello to Spartacus-1B-Instruct (1.3B parameters) 🛡️.
Instead of maintaining a massive, ever-growing list of past tokens, Spartacus compresses its entire causal history into a fixed-size state matrix per head. The result?
⚡ True O(1) Inference: Memory footprint and generation time per token remain absolutely constant, whether you are on token 10 or token 100,000.
🧠 Explicit Causality: We threw away RoPE and attention masks. The model learns when to forget using dynamic, content-aware vector decay.
🔥 Blazing Fast Training: Full hardware utilization via our custom Triton-accelerated JIT parallel prefix scan.
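The post doesn't include the actual Spartacus code, but the fixed-size-state idea can be sketched in plain NumPy. Everything below is an illustrative assumption (names, shapes, and the sigmoid decay gate are ours, not NoesisLab's): each head keeps one small state matrix, every decode step decays it, writes the new key/value outer product into it, and reads it with the query, so per-token cost never grows with context length.

```python
import numpy as np

# Hypothetical sketch of a fixed-size recurrent state update in the spirit of
# "Causal Monoid State Compression". Each head holds one (d_k, d_v) matrix;
# per-token work is O(d_k * d_v), independent of how many tokens came before.
d_k, d_v = 4, 4
rng = np.random.default_rng(0)

def step(state, k, v, q, decay):
    """One decode step: decay the state, write the (k, v) pair, read with q."""
    state = decay[:, None] * state + np.outer(k, v)  # content-aware vector decay
    out = q @ state                                  # read: (d_k,) @ (d_k, d_v)
    return state, out

state = np.zeros((d_k, d_v))
for t in range(1_000):  # memory stays (d_k, d_v) no matter how long we run
    k, v, q = rng.standard_normal((3, d_k))
    decay = 1.0 / (1.0 + np.exp(-rng.standard_normal(d_k)))  # gate in (0, 1)
    state, out = step(state, k, v, q, decay)

print(state.shape, out.shape)  # state and output sizes are constant in t
```

Contrast this with softmax attention, where the KV cache (and thus memory and per-token time) grows linearly with the number of tokens generated.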
📊 Zero-Shot Benchmarks that Hit Hard:
O(1) architectures usually sacrifice zero-shot accuracy. Not Spartacus. It is punching way above its weight class, beating established sub-quadratic models (like Mamba-1.4B and RWKV-6-1.6B):
📈 ARC-Challenge: 0.3063 (vs Mamba 0.284)
📈 ARC-Easy: 0.5518
📈 PIQA: 0.6915
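Why can this kind of model train fast? A gated recurrence s_t = a_t · s_{t-1} + b_t composes associatively (it is a monoid under composition of affine maps), so the whole sequence can be computed with a parallel prefix scan instead of a token-by-token loop. The NoesisLab Triton kernel isn't public; this is a minimal NumPy sketch of the scan idea using a Hillis-Steele schedule, checked against the sequential recurrence.

```python
import numpy as np

def combine(left, right):
    """Associative combine: compose affine maps x -> A1*x + B1, then A2, B2."""
    A1, B1 = left
    A2, B2 = right
    return A2 * A1, A2 * B1 + B2

def prefix_scan(a, b):
    """Inclusive scan over (a_t, b_t) pairs: O(log T) parallel depth."""
    elems = list(zip(a, b))
    n, shift = len(elems), 1
    while shift < n:
        elems = [elems[i] if i < shift else combine(elems[i - shift], elems[i])
                 for i in range(n)]
        shift *= 2
    # Starting from s_{-1} = 0, the state s_t is the accumulated offset B_t.
    return np.array([B for _, B in elems])

T = 16
rng = np.random.default_rng(0)
a = rng.uniform(0.5, 1.0, T)   # decay gates
b = rng.standard_normal(T)     # per-token inputs

# Reference: the plain sequential recurrence s_t = a_t * s_{t-1} + b_t.
s, seq = 0.0, []
for t in range(T):
    s = a[t] * s + b[t]
    seq.append(s)

assert np.allclose(prefix_scan(a, b), seq)
```

In a real kernel each step of the `while` loop runs in parallel across positions, which is what lets the GPU stay fully utilized during training while inference still runs the O(1) sequential form.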