AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models Paper • 2304.06364 • Published Apr 13, 2023 • 3
Michelangelo: Long Context Evaluations Beyond Haystacks via Latent Structure Queries Paper • 2409.12640 • Published Sep 19, 2024 • 3
WMT24++: Expanding the Language Coverage of WMT24 to 55 Languages & Dialects Paper • 2502.12404 • Published Feb 18, 2025 • 4
EmbeddingGemma: Powerful and Lightweight Text Representations Paper • 2509.20354 • Published Sep 24, 2025 • 47
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer Paper • 1910.10683 • Published Oct 23, 2019 • 16
ForecastPFN: Synthetically-Trained Zero-Shot Forecasting Paper • 2311.01933 • Published Nov 3, 2023 • 1
Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free Paper • 2505.06708 • Published May 10, 2025 • 11
ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training Paper • 2001.04063 • Published Jan 13, 2020 • 1
Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models Paper • 2501.11873 • Published Jan 21, 2025 • 67
Gated Delta Networks: Improving Mamba2 with Delta Rule Paper • 2412.06464 • Published Dec 9, 2024 • 15
SWE-Universe: Scale Real-World Verifiable Environments to Millions Paper • 2602.02361 • Published Feb 2026 • 59
RECALL: A Benchmark for LLMs Robustness against External Counterfactual Knowledge Paper • 2311.08147 • Published Nov 14, 2023 • 1
zELO: ELO-inspired Training Method for Rerankers and Embedding Models Paper • 2509.12541 • Published Sep 16, 2025 • 6
ViDoRe V3: A Comprehensive Evaluation of Retrieval Augmented Generation in Complex Real-World Scenarios Paper • 2601.08620 • Published Jan 2026 • 11