SNU Thunder-LLM Korean Benchmark Suite - a thunder-research-group Collection

SNU Thunder-LLM Korean Benchmark Suite

updated 13 days ago

thunder-research-group/SNU_Ko-LAMBADA

Viewer • Updated Jun 13, 2025 • 2.26k • 168
thunder-research-group/SNU_Ko-WinoGrande

Viewer • Updated Jun 13, 2025 • 1.27k • 104
thunder-research-group/SNU_Ko-ARC

Viewer • Updated Jun 13, 2025 • 3.54k • 61
thunder-research-group/SNU_Ko-GSM8K

Viewer • Updated 4 days ago • 1.32k • 108 • 1
thunder-research-group/SNU_Ko-IFEval

Viewer • Updated Jun 13, 2025 • 841 • 270
thunder-research-group/SNU_Ko-EQ-Bench

Viewer • Updated Jun 13, 2025 • 171 • 23
skt/kobest_v1

Viewer • Updated Mar 28, 2024 • 23.4k • 3.78k • 54

Note: We use the test split of the hellaswag subset for evaluation.
HAERAE-HUB/KMMLU

Viewer • Updated Mar 5, 2024 • 244k • 11k • 97
HYU-NLP/KR-HumanEval

Viewer • Updated Jun 3, 2025 • 328 • 42

Note: We use v1 for evaluation.
LGCNS/KorQuAD_2.0

Viewer • Updated Aug 7, 2025 • 93.7k • 156 • 2
thunder-research-group/SNU_Ko-MuSR

Viewer • Updated Nov 24, 2025 • 750 • 38
thunder-research-group/SNU_Thunder-KoNUBench

Viewer • Updated 11 days ago • 4.78k • 41 • 1
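
The note on skt/kobest_v1 above says that evaluation uses the hellaswag test set. A minimal sketch of selecting that split, assuming the Hugging Face `datasets` library is installed and that `hellaswag` is exposed as a configuration name of the repository with a `test` split. The constant and function names are hypothetical and chosen only for illustration; this is not necessarily how the collection's authors load the data.

```python
# Only the repository id, subset name, and split come from the note;
# every identifier defined here is a hypothetical, illustrative name.
KOBEST_REPO = "skt/kobest_v1"
KOBEST_SUBSET = "hellaswag"  # assumed to be the configuration name on the Hub
KOBEST_SPLIT = "test"        # the split the note says is used for evaluation


def load_kobest_hellaswag_test():
    """Return the hellaswag test split of KoBEST (needs network access)."""
    # Imported lazily so the constants above are usable without the library.
    from datasets import load_dataset

    return load_dataset(KOBEST_REPO, KOBEST_SUBSET, split=KOBEST_SPLIT)
```

Calling `load_kobest_hellaswag_test()` downloads the data on first use and returns a `datasets.Dataset` holding the test examples only, so the train and validation splits are never touched during evaluation.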