SWE-bench Collection SWE-bench (Lite, Verified, Multimodal, Multilingual) all in one place! • 5 items • Updated Dec 14, 2025 • 5
Scaling Latent Reasoning via Looped Language Models Paper • 2510.25741 • Published Oct 29, 2025 • 226
PromptCoT 2.0: Scaling Prompt Synthesis for Large Language Model Reasoning Paper • 2509.19894 • Published Sep 24, 2025 • 34
Reasoning over Boundaries: Enhancing Specification Alignment via Test-time Delibration Paper • 2509.14760 • Published Sep 18, 2025 • 53