arxiv:2604.10098

Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation

Published on Apr 11 · Submitted by zunhaisu on Apr 14
#3 Paper of the day

Abstract

Transformers suffer from the Attention Sink phenomenon, in which excessive attention concentrates on uninformative tokens, harming interpretability and performance; this motivates a comprehensive survey covering fundamental utilization, mechanistic interpretation, and strategic mitigation.

AI-generated summary

As the foundational architecture of modern machine learning, Transformers have driven remarkable progress across diverse AI domains. Despite their transformative impact, a persistent challenge across various Transformers is Attention Sink (AS), in which a disproportionate amount of attention is focused on a small subset of specific yet uninformative tokens. AS complicates interpretability, significantly affects training and inference dynamics, and exacerbates issues such as hallucinations. In recent years, substantial research has been dedicated to understanding and harnessing AS. However, a comprehensive survey that systematically consolidates AS-related research and offers guidance for future advancements remains lacking. To address this gap, we present the first survey on AS, structured around three key dimensions that define the current research landscape: Fundamental Utilization, Mechanistic Interpretation, and Strategic Mitigation. Our work provides a pivotal contribution by clarifying key concepts and guiding researchers through the evolution and trends of the field. We envision this survey as a definitive resource, empowering researchers and practitioners to effectively manage AS within the current Transformer paradigm, while simultaneously inspiring innovative advancements for the next generation of Transformers. The paper list of this work is available at https://github.com/ZunhaiSu/Awesome-Attention-Sink.
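To make the phenomenon concrete: attention sink is commonly diagnosed by measuring how much attention mass queries place on a single token (often the first one). The sketch below is an illustrative toy, not the survey's methodology; `sink_mass` and the attention map are hypothetical names and values chosen to show a disproportionate concentration.

```python
import numpy as np

def sink_mass(attn, sink_idx=0):
    """Average fraction of attention mass that queries place on a
    candidate sink token (commonly the first token).
    attn: (num_queries, num_keys) array whose rows sum to 1."""
    return attn[:, sink_idx].mean()

# Hypothetical attention map for 4 queries over 4 keys:
# most rows dump the bulk of their mass on token 0, even though
# that token carries little semantic information.
attn = np.array([
    [1.00, 0.00, 0.00, 0.00],
    [0.90, 0.10, 0.00, 0.00],
    [0.85, 0.05, 0.10, 0.00],
    [0.80, 0.05, 0.05, 0.10],
])
print(sink_mass(attn))  # a value near 1 flags an attention sink
```

A near-uniform attention map would instead give a sink mass close to 1/num_keys, which is the baseline this diagnostic is compared against.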

Community

Paper author Paper submitter

Excited to share our first survey on Attention Sink — Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation 🚀

📌 We systematically reviewed 180+ papers and identified a clear evolutionary trajectory:
1️⃣ Fundamental Utilization (from 2023) – Leveraging inherent sink properties or managing their immediate effects.
2️⃣ Mechanistic Interpretation (from 2024) – Understanding the internal drivers (softmax constraints, outlier circuits, implicit bias, geometric anchoring).
3️⃣ Strategic Mitigation (from 2025) – Direct structural elimination of sinks based on mechanistic insights (gated attention, modified softmax, learnable bias, pre‑training interventions).
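As a toy illustration of one mitigation family named in phase 3 (modified softmax), here is a minimal sketch assuming the "off-by-one" variant, which may differ from the exact formulations surveyed: adding a constant 1 to the softmax denominator lets a head assign near-zero total attention instead of being forced to dump excess mass on a sink token.

```python
import numpy as np

def softmax(scores):
    # Standard softmax: each row is forced to sum to exactly 1,
    # so a head can never "attend to nothing" -- excess mass
    # tends to collect on an uninformative sink token.
    e = np.exp(scores - scores.max())
    return e / e.sum()

def softmax_off_by_one(scores):
    # Modified softmax with an extra +1 in the denominator
    # (computed in a numerically stable way): weights may now sum
    # to less than 1, giving the head an implicit "null" slot.
    m = scores.max()
    e = np.exp(scores - m)
    return e / (np.exp(-m) + e.sum())

scores = np.array([-4.0, -4.0, -4.0])   # nothing worth attending to
print(softmax(scores).sum())             # 1.0 -- mass must go somewhere
print(softmax_off_by_one(scores).sum())  # ~0.05 -- the head can opt out
```

When scores are large and positive, the extra 1 is negligible and the two functions nearly coincide; the difference only matters when a head has no useful key to attend to, which is exactly the regime where sinks form.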

📚 We categorized all 180+ papers by these three phases and by application scenarios, with clear labels for quick navigation.
🎯 We also highlight key challenges and future directions, pushing Transformers from passively accepting sinks to actively mastering them.

Continuously updating — feel free to discuss! 🙌

Paper author Paper submitter

✨ We thank all related works for their contributions to AS-related research. We are currently fine-tuning the paper for journal submission. For any inquiries or to include your work in the paper list, please contact us at zh-su23@mails.tsinghua.edu.cn.


