MMR-DR_GRPO-lambda-0.9 / reward_data
183 MB
kangdawei
Training in progress, step 500
eb8457a verified