LorenaYannnnn/general_reward-Olmo-3-7B-Think_7168-baseline_all_tokens_w_kl-seed_2 Updated 11 days ago • 55
LorenaYannnnn/general_reward-Olmo-3-7B-Think_7168-baseline_all_tokens_w_kl-seed_0 Updated 15 days ago • 66
LorenaYannnnn/general_reward-Olmo-3-7B-Think_7168-baseline_all_tokens_w_kl-seed_1 Updated 15 days ago • 61
LorenaYannnnn/bold_formatting-Qwen3-0.6B-OURS_self-seed_0 Text Generation • 0.6B • Updated 8 days ago • 1.27k
LorenaYannnnn/bold_formatting-Qwen3-0.6B-baseline_all_tokens-seed_1 Text Generation • 0.6B • Updated 8 days ago • 224
LorenaYannnnn/bold_formatting-Qwen3-0.6B-baseline_all_tokens-seed_2 Text Generation • 0.6B • Updated 8 days ago • 228
LorenaYannnnn/bold_formatting-Qwen3-0.6B-OURS_self-seed_2 Text Generation • 0.6B • Updated 8 days ago • 176
LorenaYannnnn/bold_formatting-Qwen3-0.6B-OURS_self-seed_1 Text Generation • 0.6B • Updated 8 days ago • 179
LorenaYannnnn/bold_formatting-Qwen3-0.6B-baseline_all_tokens-seed_0 Text Generation • 0.6B • Updated 9 days ago • 1.01k
LorenaYannnnn/general_reward-Qwen3-0.6B_7168-baseline_all_tokens-seed_0 Text Generation • 0.6B • Updated 12 days ago • 335