On the Direction of RLVR Updates for LLM Reasoning: Identification and Exploitation Paper • 2603.22117 • Published Mar 2026 • 29
EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning Paper • 2509.22576 • Published Sep 26, 2025 • 137
Aligning Multimodal LLM with Human Preference: A Survey Paper • 2503.14504 • Published Mar 18, 2025 • 26
Robust Preference Optimization via Dynamic Target Margins Paper • 2506.03690 • Published Jun 4, 2025 • 2
Quantile Advantage Estimation for Entropy-Safe Reasoning Paper • 2509.22611 • Published Sep 26, 2025 • 120
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning Paper • 2503.07572 • Published Mar 10, 2025 • 48
Direct Multi-Turn Preference Optimization for Language Agents Paper • 2406.14868 • Published Jun 21, 2024
MM-RLHF: The Next Step Forward in Multimodal LLM Alignment Paper • 2502.10391 • Published Feb 14, 2025 • 34
Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization Paper • 2407.07880 • Published Jul 10, 2024