A collection of multiple benchmarks for evaluating large reasoning models
datasets-and-models
non-profit
models 65
guanning-ai/countdown_20260225
guanning-ai/20260215_pkpo_T16
guanning-ai/SmolLM-Checkpoints-Final-0124
guanning-ai/SmolLM-Checkpoints-Final-0123
guanning-ai/SmolLM-maclaurin-baseline-T16
guanning-ai/SmolLM-pkpo-T16
guanning-ai/SmolLM-grpo-32rollouts
guanning-ai/SmolLM-p-normalization-32rollouts
guanning-ai/Smollm004
guanning-ai/Smollm002
datasets 138
guanning-ai/gsm8k-platinum • 1.21k • 14
guanning-ai/math500_level5 • 134 • 4
guanning-ai/math500_level4 • 128 • 5
guanning-ai/math500_level3 • 105 • 5
guanning-ai/math500_level2 • 90 • 3
guanning-ai/math500_level1 • 43 • 3
guanning-ai/minervamath • 272 • 3
guanning-ai/smollm-gsm8k-data-1024 • 7.65M • 39
guanning-ai/gsm8k-metamath • 160k • 19
guanning-ai/gsm8k-mumath • 92k • 4