Gym
ServiceNow-AI/EnterpriseOps-Gym • Viewer • Updated 11 days ago • 2.56k • 5.23k • 84
allenai/MolmoWeb-HumanSkills • Viewer • Updated 9 days ago • 116k • 336 • 9
allenai/MolmoWeb-SyntheticSkills • Viewer • Updated 9 days ago • 5.55k • 146 • 7
allenai/MolmoWeb-SyntheticTrajs • Viewer • Updated 10 days ago • 108k • 806 • 8
Bench Datasets
Idavidrein/gpqa • Benchmark • Updated 28 days ago • 1.25k • 115k • 407
openai/gsm8k • Benchmark • Updated 10 days ago • 17.6k • 754k • 1.23k
princeton-nlp/SWE-bench_Verified • Viewer • Updated Feb 18, 2025 • 500 • 693k • 316
ScaleAI/SWE-bench_Pro • Benchmark • Updated Feb 23 • 731 • 854k • 68