naviyaYu
AI & ML interests
None yet
Recent Activity
updated a collection about 2 months ago
RL
updated a collection about 2 months ago
RL
updated a collection about 2 months ago
RL
Organizations
None yet
nvidia
RL
- GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
  Paper • 2601.05242 • Published • 229
- On Predictability of Reinforcement Learning Dynamics for Large Language Models
  Paper • 2510.00553 • Published • 9
- REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
  Paper • 2501.03262 • Published • 104
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
  Paper • 2402.03300 • Published • 141
deepseek
nvidia