SimpleQA score?

#6
by phil111 - opened

You repeatedly mention world knowledge in your blog, yet you only list various MMLU and GPQA scores, even though those tests cover only a tiny sliver of popular world knowledge.

Please consider including OpenAI's SimpleQA test next time. It's a true world knowledge test that covers a broad spectrum of popular knowledge domains.

Despite this model's relatively high English MMLU score, its broad English knowledge is apparently very low. I estimate that its English SimpleQA score is only ~5.

phil111 changed discussion status to closed