Article collection
- EPO: Entropy-regularized Policy Optimization for LLM Agents (arXiv:2509.22576)
- AgentBench: Evaluating LLMs as Agents (arXiv:2308.03688)
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter (arXiv:1910.01108)
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model (arXiv:2305.18290)
- AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration (arXiv:2306.00978)
- GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers (arXiv:2210.17323)
- TPO: Aligning Large Language Models with Multi-branch & Multi-step Preference Trees (arXiv:2410.12854)
- KTO: Model Alignment as Prospect Theoretic Optimization (arXiv:2402.01306)
- Training language models to follow instructions with human feedback (arXiv:2203.02155)