Gaussian Adaptive Attention is All You Need: Robust Contextual Representations Across Multiple Modalities (arXiv:2401.11143, published Jan 20, 2024)
RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference (arXiv:2505.02922, published May 5, 2025)
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU Clusters (arXiv:2408.04093, published Aug 7, 2024)
Core Context Aware Attention for Long Context Language Modeling (arXiv:2412.12465, published Dec 17, 2024)