SpargeAttention2: Trainable Sparse Attention via Hybrid Top-k+Top-p Masking and Distillation Fine-Tuning • Paper • 2602.13515 • Published 12 days ago