phuongntc/qwen3_06b_grpo_noSFT_multievalsumviet2_nopenalty • Text Generation • Updated about 1 month ago
phuongntc/qwen3_0.6b_ppo_penalty_multievalsumviet2_fix1000 • Text Generation • 0.6B • Updated Jan 16 • 5