LLM Evaluation Benchmarks - an Alanox Collection

updated Apr 7, 2025

This collection gathers references to the evaluation benchmarks commonly seen in traditional LLM papers.

MMLU-Pro Leaderboard 🥇

More advanced and challenging multi-task evaluation

GAIA Leaderboard 🦾

Submit your model's answers to the GAIA benchmark and view the leaderboard