Active Self-Supervised Learning: A Few Low-Cost Relationships Are All You Need Paper • 2303.15256 • Published Mar 27, 2023
Clustering Head: A Visual Case Study of the Training Dynamics in Transformers Paper • 2410.24050 • Published Oct 31, 2024
Unveiling Simplicities of Attention: Adaptive Long-Context Head Identification Paper • 2502.09647 • Published Feb 11, 2025
Provable Benefits of In-Tool Learning for Large Language Models Paper • 2508.20755 • Published Aug 28, 2025