Training
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Paper
• 2403.03507
• Published
• 189
RAFT: Adapting Language Model to Domain Specific RAG
Paper
• 2403.10131
• Published
• 72
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
Paper
• 2403.13372
• Published
• 179
InternLM2 Technical Report
Paper
• 2403.17297
• Published
• 34
sDPO: Don't Use Your Data All at Once
Paper
• 2403.19270
• Published
• 41
ReFT: Representation Finetuning for Language Models
Paper
• 2404.03592
• Published
• 101
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
Paper
• 2402.13753
• Published
• 116
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
Paper
• 2404.06395
• Published
• 24
ORPO: Monolithic Preference Optimization without Reference Model
Paper
• 2403.07691
• Published
• 72
Rho-1: Not All Tokens Are What You Need
Paper
• 2404.07965
• Published
• 94
Learn Your Reference Model for Real Good Alignment
Paper
• 2404.09656
• Published
• 90
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Paper
• 2404.14219
• Published
• 259
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
Paper
• 2404.13208
• Published
• 40
Simple and Scalable Strategies to Continually Pre-train Large Language Models
Paper
• 2403.08763
• Published
• 51
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
Paper
• 2405.12130
• Published
• 50
RLHF Workflow: From Reward Modeling to Online RLHF
Paper
• 2405.07863
• Published
• 71
SimPO: Simple Preference Optimization with a Reference-Free Reward
Paper
• 2405.14734
• Published
• 12
DialSim: A Real-Time Simulator for Evaluating Long-Term Dialogue Understanding of Conversational Agents
Paper
• 2406.13144
• Published
• 11
Paper
• 2407.10671
• Published
• 168
Improving Text Embeddings with Large Language Models
Paper
• 2401.00368
• Published
• 82
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
Paper
• 2404.05961
• Published
• 66
SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages
Paper
• 2407.19672
• Published
• 57
Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning
Paper
• 2407.18248
• Published
• 33
Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge
Paper
• 2407.19594
• Published
• 21
Gemma 2: Improving Open Language Models at a Practical Size
Paper
• 2408.00118
• Published
• 78
The Llama 3 Herd of Models
Paper
• 2407.21783
• Published
• 117
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs
Paper
• 2408.07055
• Published
• 68
DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
Paper
• 2408.08152
• Published
• 61
Controllable Text Generation for Large Language Models: A Survey
Paper
• 2408.12599
• Published
• 65
Training Language Models to Self-Correct via Reinforcement Learning
Paper
• 2409.12917
• Published
• 140
A Survey on the Honesty of Large Language Models
Paper
• 2409.18786
• Published
• 31
Paper
• 2410.05258
• Published
• 180
Addition is All You Need for Energy-efficient Language Models
Paper
• 2410.00907
• Published
• 151
Training Large Language Models to Reason in a Continuous Latent Space
Paper
• 2412.06769
• Published
• 94