Adam's Law: Textual Frequency Law on Large Language Models Paper • 2604.02176 • Published 2 days ago • 22
OptiMer: Optimal Distribution Vector Merging Is Better than Data Mixing for Continual Pre-Training Paper • 2603.28858 • Published 4 days ago • 6
FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization Paper • 2603.19835 • Published 15 days ago • 308