Length-Induced Embedding Collapse in Transformer-based Models
Paper • 2410.24200
Dynamic Long Context Reasoning over Compressed Memory via End-to-End Reinforcement Learning
LycheeDecode: Accelerating Long-Context LLM Inference via Hybrid-Head Sparse Decoding