arxiv:2412.14135
Bo Wang
Musicode
AI & ML interests
None yet
Recent Activity
upvoted a paper about 15 hours ago
Implicit Reward as the Bridge: A Unified View of SFT and DPO Connections
submitted a paper about 15 hours ago
The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping
upvoted a paper about 17 hours ago
The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping
Organizations
None yet