Alanox's Collections
LLM Evaluation Benchmarks

updated Apr 7, 2025

This collection gathers references to the evaluation benchmarks commonly seen in LLM papers.

  • MMLU-Pro Leaderboard 🥇
    More advanced and challenging multi-task evaluation


  • GAIA Leaderboard 🦾
    Submit your model answers to the GAIA benchmark and view the leaderboard
