arxiv:2502.05609
Taeho Hwang
doubleyyh
AI & ML interests
None yet
Recent Activity
reacted to sergiopaniego's post with 🚀 about 5 hours ago
TRL v0.27.0 is out!! 🥳
It includes GDPO, the latest variant of GRPO for multi-reward RL ✨
GDPO decouples reward normalization to avoid reward collapse and improve per-reward convergence. It was developed by @sliuau, @SimonX, et al.
Explore the paper: https://huggingface.co/papers/2601.05242
Explore the full set of changes here:
https://github.com/huggingface/trl/releases/tag/v0.27.0
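
To make the "decoupled reward normalization" idea in the post concrete, here is a minimal sketch in plain Python. It is not TRL's implementation and may omit steps from the paper. The reward names, the sample scores, the `EPS` constant, and the use of population standard deviation are all assumptions for illustration. It compares normalizing the summed reward once (GRPO-style) with normalizing each reward separately before summing.

```python
from statistics import mean, pstdev

EPS = 1e-6  # assumed small constant to avoid division by zero


def normalize(values):
    """Standardize one group of scores to zero mean and roughly unit std."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / (sigma + EPS) for v in values]


def summed_advantages(reward_columns):
    """GRPO-style baseline: sum the rewards per completion, then normalize once."""
    totals = [sum(per_sample) for per_sample in zip(*reward_columns)]
    return normalize(totals)


def decoupled_advantages(reward_columns):
    """Decoupled variant: normalize each reward on its own, then sum."""
    normalized = [normalize(column) for column in reward_columns]
    return [sum(per_sample) for per_sample in zip(*normalized)]


# Hypothetical group of 4 completions scored by two made-up reward functions.
correctness = [1.0, 0.0, 0.0, 0.0]
formatting = [0.0, 1.0, 0.0, 1.0]

summed = summed_advantages([correctness, formatting])
decoupled = decoupled_advantages([correctness, formatting])

for i, (s, d) in enumerate(zip(summed, decoupled)):
    print(f"completion {i}: summed={s:+.3f} decoupled={d:+.3f}")
```

In this toy group, completions 0 and 1 earn different rewards but the same total, so the summed baseline gives them identical advantages. The decoupled version keeps them apart, because the rarer correctness reward carries more weight after per-reward normalization.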
liked a Space 11 days ago
SamsungResearch/TRUEBench
upvoted a paper 3 months ago
Adaptive Multi-Agent Response Refinement in Conversational Systems