M Saad Salman

MSS444

AI & ML interests

None yet

Recent Activity

upvoted a paper about 21 hours ago

RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback

upvoted a paper about 21 hours ago

Lost in Backpropagation: The LM Head is a Gradient Bottleneck

upvoted a paper about 21 hours ago

ReMix: Reinforcement routing for mixtures of LoRAs in LLM finetuning

Organizations

None yet

upvoted 8 papers about 21 hours ago

Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning

Paper • 2603.04597 • Published 13 days ago • 195

Automatic Generation of High-Performance RL Environments

Paper • 2603.12145 • Published 5 days ago • 6

The Curse and Blessing of Mean Bias in FP4-Quantized LLM Training

Paper • 2603.10444 • Published 7 days ago • 10

CreativeBench: Benchmarking and Enhancing Machine Creativity via Self-Evolving Challenges

Paper • 2603.11863 • Published 5 days ago • 5

upvoted 2 papers 5 days ago

Understanding by Reconstruction: Reversing the Software Development Process for LLM Pretraining

Paper • 2603.11103 • Published 7 days ago • 8

Examining Reasoning LLMs-as-Judges in Non-Verifiable LLM Post-Training

Paper • 2603.12246 • Published 5 days ago • 4

upvoted 4 papers 6 days ago

ReflexiCoder: Teaching Large Language Models to Self-Reflect on Generated Code and Self-Correct It via Reinforcement Learning

Paper • 2603.05863 • Published 12 days ago • 5

Towards a Neural Debugger for Python

Paper • 2603.09951 • Published 7 days ago • 5

The Reasoning Trap -- Logical Reasoning as a Mechanistic Pathway to Situational Awareness

Paper • 2603.09200 • Published 8 days ago • 5

Thinking to Recall: How Reasoning Unlocks Parametric Knowledge in LLMs

Paper • 2603.09906 • Published 7 days ago • 67

upvoted 6 papers 7 days ago

ByteFlow: Language Modeling through Adaptive Byte Compression without a Tokenizer

Paper • 2603.03583 • Published 14 days ago • 2

Breaking Training Bottlenecks: Effective and Stable Reinforcement Learning for Coding Models

Paper • 2603.07777 • Published 9 days ago • 5

Scaling Data Difficulty: Improving Coding Models via Reinforcement Learning on Fresh and Challenging Problems

Paper • 2603.07779 • Published 9 days ago • 5

Agentic Critical Training

Paper • 2603.08706 • Published 8 days ago • 13

Unlocking Data Value in Finance: A Study on Distillation and Difficulty-Aware Training

Paper • 2603.07223 • Published 10 days ago • 13

NLE: Non-autoregressive LLM-based ASR by Transcript Editing

Paper • 2603.08397 • Published 8 days ago • 21
