More is Less: The Pitfalls of Multi-Model Synthetic Preference Data in DPO Safety Alignment Paper • 2504.02193 • Published Apr 3, 2025 • 1
DRIFT: Learning from Abundant User Dissatisfaction in Real-World Preference Learning Paper • 2510.02341 • Published Sep 27, 2025 • 4
Learning Self-Correction in Vision-Language Models via Rollout Augmentation Paper • 2602.08503 • Published Feb 9 • 3
Why Reasoning Fails to Plan: A Planning-Centric Analysis of Long-Horizon Decision Making in LLM Agents Paper • 2601.22311 • Published Jan 29
Addressing Performance Saturation for LLM RL via Precise Entropy Curve Control Paper • 2604.26326 • Published 4 days ago • 11
Purdue LLM Paper List Collection A collection of LLM-related papers by Purdue researchers. Feel free to add your own. • 5 items • Updated about 19 hours ago • 1
ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time Paper • 2410.06625 • Published Oct 9, 2024 • 1
Cascade Reward Sampling for Efficient Decoding-Time Alignment Paper • 2406.16306 • Published Jun 24, 2024 • 1
Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense Paper • 2510.07242 • Published Oct 8, 2025 • 30