NACL: A General and Effective KV Cache Eviction Framework for LLMs at Inference Time Paper • 2408.03675 • Published Aug 7, 2024
DHA: Learning Decoupled-Head Attention from Transformer Checkpoints via Adaptive Heads Fusion Paper • 2406.06567 • Published Jun 3, 2024
Inner Thinking Transformer: Leveraging Dynamic Depth Scaling to Foster Adaptive Internal Thinking Paper • 2502.13842 • Published Feb 19, 2025