MMLU-Pro Leaderboard 🥇: More advanced and challenging multi-task evaluation
Scaling test-time compute 📈: Run advanced search strategies to boost LLM problem solving