SpargeAttention2: Trainable Sparse Attention via Hybrid Top-k+Top-p Masking and Distillation Fine-Tuning Paper • 2602.13515 • Published 8 days ago • 33
SpargeAttention2: Trainable Sparse Attention via Hybrid Top-k+Top-p Masking and Distillation Fine-Tuning Paper • 2602.13515 • Published 8 days ago • 33
Geometry-Aware Rotary Position Embedding for Consistent Video World Model Paper • 2602.07854 • Published 13 days ago • 8
Geometry-Aware Rotary Position Embedding for Consistent Video World Model Paper • 2602.07854 • Published 13 days ago • 8
Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation Paper • 2602.02214 • Published 19 days ago • 24
view article Article From DeepSpeed to FSDP and Back Again with Hugging Face Accelerate +2 Jun 13, 2024 • 62
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention Paper • 2509.24006 • Published Sep 28, 2025 • 118
SageAttention2++: A More Efficient Implementation of SageAttention2 Paper • 2505.21136 • Published May 27, 2025 • 45
SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference Paper • 2502.18137 • Published Feb 25, 2025 • 60
SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference Paper • 2502.18137 • Published Feb 25, 2025 • 60
CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model Paper • 2403.05034 • Published Mar 8, 2024 • 21