SimpleQA score?
#6
by phil111 - opened
You repeatedly mention world knowledge in your blog, yet you list only various MMLU and GPQA scores, even though those tests cover only a tiny sliver of popular world knowledge.
Please consider including OpenAI's SimpleQA test next time. It's a true world-knowledge test that covers a broad spectrum of popular domains of knowledge.
Despite this model's relatively high English MMLU score, its broad English knowledge is apparently very low. I estimate that its English SimpleQA score is only ~5.
phil111 changed discussion status to closed