RLHFlow

university

AI & ML interests

Workflow of Reinforcement Learning from Human Feedback (RLHF). Blog: https://rlhflow.github.io/

Recent Activity

Chenlu123 submitted a paper 9 days ago

Adaptive Layerwise Perturbation: Unifying Off-Policy Corrections for LLM RL

baohao submitted a paper about 2 months ago

Self-Hinting Language Models Enhance Reinforcement Learning

baohao updated a collection 5 months ago

Papers

Reinforce-Ada: An Adaptive Sampling Framework for Reinforce-Style LLM Training

submitted a paper to Daily Papers 9 days ago

Adaptive Layerwise Perturbation: Unifying Off-Policy Corrections for LLM RL

Paper • 2603.19470 • Published 13 days ago • 3

submitted a paper to Daily Papers about 2 months ago

Self-Hinting Language Models Enhance Reinforcement Learning

Paper • 2602.03143 • Published Feb 3 • 31

updated a collection 5 months ago

Reinforce-Ada

Training & test sets and finetuned models • 19 items • Updated Oct 26, 2025 • 3

published and updated 2 models 5 months ago

RLHFlow/Qwen2.5-Math-1.5B-DAPO-easy

2B • Updated Oct 26, 2025 • 2

RLHFlow/Qwen2.5-Math-1.5B-GRPO-n8-easy

2B • Updated Oct 26, 2025 • 11

updated 2 datasets 6 months ago

RLHFlow/reinforce_ada_hard_prompt_1-5b

Updated Oct 16, 2025 • 13.3k • 33

RLHFlow/reinforce_ada_simple_prompt_1-5b

Updated Oct 16, 2025 • 25k • 27

published and updated a model 6 months ago

RLHFlow/Qwen2.5-Math-1-5B-Reinforce-Ada-balance-hard

Updated Oct 15, 2025 • 2

updated a collection 6 months ago

Reinforce-Ada

Training & test sets and finetuned models • 19 items • Updated Oct 26, 2025 • 3

published and updated a model 6 months ago

RLHFlow/Qwen2.5-Math-1-5B-Reinforce-Ada-balance-easy

2B • Updated Oct 11, 2025

published and updated a dataset 6 months ago

RLHFlow/reinforce_ada_simple_prompt_1-5b

Updated Oct 16, 2025 • 25k • 27