EvalAgent: Discovering Implicit Evaluation Criteria from the Web Paper • 2504.15219 • Published Apr 21, 2025 • 1
QUDsim: Quantifying Discourse Similarities in LLM-Generated Text Paper • 2504.09373 • Published Apr 12, 2025
SkillFactory: Self-Distillation For Learning Cognitive Behaviors Paper • 2512.04072 • Published Dec 3, 2025 • 5
TAUR-dev/D-EVAL__standard_eval_v3__FinEval_16k_fulleval_3arg_OLMO_RLONLY-RL Viewer • Updated Dec 2, 2025 • 11.5k • 16
TAUR-dev/D-ExpTracker__FinEval_16k_fulleval_3arg_OLMO_RLONLY-RL-countdown_6arg__v1 Viewer • Updated Dec 2, 2025 • 1.01k • 27
TAUR-dev/D-EVAL__standard_eval_v3__FinEval_16k_fulleval_3arg_OLMO_RLONLY-RL-countdown_6arg-eval_rl Viewer • Updated Dec 2, 2025 • 1k • 11
TAUR-dev/D-ExpTracker__FinEval_16k_fulleval_3arg_OLMO_RLONLY-RL-countdown_4arg__v1 Viewer • Updated Dec 2, 2025 • 1.01k • 16
TAUR-dev/D-EVAL__standard_eval_v3__FinEval_16k_fulleval_3arg_OLMO_RLONLY-RL-countdown_4arg-eval_rl Viewer • Updated Dec 2, 2025 • 1k • 14
TAUR-dev/D-ExpTracker__FinEval_16k_fulleval_3arg_OLMO_RLONLY-RL-letter_countdown_5o__v1 Viewer • Updated Dec 2, 2025 • 304 • 25
TAUR-dev/D-EVAL__standard_eval_v3__FinEval_16k_fulleval_3arg_OLMO_RLONLY-RL-letter_countdown_5o-eval_rl Viewer • Updated Dec 2, 2025 • 300 • 8
TAUR-dev/D-ExpTracker__FinEval_16k_fulleval_3arg_OLMO_RLONLY-RL-countdown_5arg__v1 Viewer • Updated Dec 2, 2025 • 1.01k • 13
TAUR-dev/D-EVAL__standard_eval_v3__FinEval_16k_fulleval_3arg_OLMO_RLONLY-RL-countdown_5arg-eval_rl Viewer • Updated Dec 2, 2025 • 1k • 13