The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective Paper • 2407.08583 • Published Jul 11, 2024 • 13
Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models Paper • 2505.17826 • Published May 23, 2025 • 10
Data-Juicer 2.0: Cloud-Scale Adaptive Data Processing for and with Foundation Models Paper • 2501.14755 • Published Dec 23, 2024
Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends Paper • 2509.24203 • Published Sep 29, 2025 • 8
On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models Paper • 2602.03392 • Published 10 days ago • 52
Exploring Selective Layer Fine-Tuning in Federated Learning Paper • 2408.15600 • Published Aug 28, 2024
Enhancing Latent Computation in Transformers with Latent Tokens Paper • 2505.12629 • Published May 19, 2025
On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting Paper • 2508.11408 • Published Aug 15, 2025 • 8
R$^3$L: Reflect-then-Retry Reinforcement Learning with Language-Guided Exploration, Pivotal Credit, and Positive Amplification Paper • 2601.03715 • Published Jan 7 • 1