arxiv:2604.10098

Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation

Published on Apr 11 · Submitted by zunhaisu on Apr 14
#3 Paper of the day

Abstract

Transformers suffer from the Attention Sink phenomenon, in which excessive attention concentrates on uninformative tokens, harming interpretability and performance; this motivates a comprehensive survey covering fundamental utilization, mechanistic interpretation, and strategic mitigation.

AI-generated summary

As the foundational architecture of modern machine learning, Transformers have driven remarkable progress across diverse AI domains. Despite their transformative impact, a persistent challenge across various Transformers is Attention Sink (AS), in which a disproportionate amount of attention is focused on a small subset of specific yet uninformative tokens. AS complicates interpretability, significantly affects training and inference dynamics, and exacerbates issues such as hallucinations. In recent years, substantial research has been dedicated to understanding and harnessing AS. However, a comprehensive survey that systematically consolidates AS-related research and offers guidance for future advancements remains lacking. To address this gap, we present the first survey on AS, structured around three key dimensions that define the current research landscape: Fundamental Utilization, Mechanistic Interpretation, and Strategic Mitigation. Our work provides a pivotal contribution by clarifying key concepts and guiding researchers through the evolution and trends of the field. We envision this survey as a definitive resource, empowering researchers and practitioners to effectively manage AS within the current Transformer paradigm, while simultaneously inspiring innovative advancements for the next generation of Transformers. The paper list of this work is available at https://github.com/ZunhaiSu/Awesome-Attention-Sink.
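To make the phenomenon concrete: attention sink is commonly diagnosed by measuring how much attention mass queries place on a single token (often the first one). The sketch below is an illustrative toy, not the survey's methodology; `sink_mass` and the attention map are hypothetical names and values chosen to show a disproportionate concentration.

```python
import numpy as np

def sink_mass(attn, sink_idx=0):
    """Average fraction of attention mass that queries place on a
    candidate sink token (commonly the first token).
    attn: (num_queries, num_keys) array whose rows sum to 1."""
    return attn[:, sink_idx].mean()

# Hypothetical attention map for 4 queries over 4 keys:
# most rows dump the bulk of their mass on token 0, even though
# that token carries little semantic information.
attn = np.array([
    [1.00, 0.00, 0.00, 0.00],
    [0.90, 0.10, 0.00, 0.00],
    [0.85, 0.05, 0.10, 0.00],
    [0.80, 0.05, 0.05, 0.10],
])
print(sink_mass(attn))  # a value near 1 flags an attention sink
```

A near-uniform attention map would instead give a sink mass close to 1/num_keys, which is the baseline this diagnostic is compared against.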

Community

Paper author Paper submitter

Excited to share our first survey on Attention Sink — Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation 🚀

📌 We systematically reviewed 180+ papers and identified a clear evolutionary trajectory:
1️⃣ Fundamental Utilization (from 2023) – Leveraging inherent sink properties or managing their immediate effects.
2️⃣ Mechanistic Interpretation (from 2024) – Understanding the internal drivers (softmax constraints, outlier circuits, implicit bias, geometric anchoring).
3️⃣ Strategic Mitigation (from 2025) – Direct structural elimination of sinks based on mechanistic insights (gated attention, modified softmax, learnable bias, pre‑training interventions).
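As a toy illustration of one mitigation family named in phase 3 (modified softmax), here is a minimal sketch assuming the "off-by-one" variant, which may differ from the exact formulations surveyed: adding a constant 1 to the softmax denominator lets a head assign near-zero total attention instead of being forced to dump excess mass on a sink token.

```python
import numpy as np

def softmax(scores):
    # Standard softmax: each row is forced to sum to exactly 1,
    # so a head can never "attend to nothing" -- excess mass
    # tends to collect on an uninformative sink token.
    e = np.exp(scores - scores.max())
    return e / e.sum()

def softmax_off_by_one(scores):
    # Modified softmax with an extra +1 in the denominator
    # (computed in a numerically stable way): weights may now sum
    # to less than 1, giving the head an implicit "null" slot.
    m = scores.max()
    e = np.exp(scores - m)
    return e / (np.exp(-m) + e.sum())

scores = np.array([-4.0, -4.0, -4.0])   # nothing worth attending to
print(softmax(scores).sum())             # 1.0 -- mass must go somewhere
print(softmax_off_by_one(scores).sum())  # ~0.05 -- the head can opt out
```

When scores are large and positive, the extra 1 is negligible and the two functions nearly coincide; the difference only matters when a head has no useful key to attend to, which is exactly the regime where sinks form.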

📚 We categorized all 180+ papers by these three phases and by application scenarios, with clear labels for quick navigation.
🎯 We also highlight key challenges and future directions, pushing Transformers from passively accepting sinks to actively mastering them.

Continuously updating — feel free to discuss! 🙌

Paper author Paper submitter

✨ We thank all related works for their contributions to AS-related research. We are currently fine-tuning the paper for journal submission. For any inquiries or to include your work in the paper list, please contact us at zh-su23@mails.tsinghua.edu.cn.


