Papers: Models
Llemma: An Open Language Model For Mathematics
Paper
• 2310.10631
• Published • 57
Mistral 7B
Paper
• 2310.06825
• Published • 58
Qwen Technical Report
Paper
• 2309.16609
• Published • 38
BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model
Paper
• 2309.11568
• Published • 11
Textbooks Are All You Need II: phi-1.5 technical report
Paper
• 2309.05463
• Published • 89
XGen-7B Technical Report
Paper
• 2309.03450
• Published • 8
Code Llama: Open Foundation Models for Code
Paper
• 2308.12950
• Published • 29
Llama 2: Open Foundation and Fine-Tuned Chat Models
Paper
• 2307.09288
• Published • 250
LLaMA: Open and Efficient Foundation Language Models
Paper
• 2302.13971
• Published • 21
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
Paper
• 2211.05100
• Published • 37
Scaling Instruction-Finetuned Language Models
Paper
• 2210.11416
• Published • 8
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Paper
• 1910.01108
• Published • 22
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Paper
• 1910.10683
• Published • 17
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Paper
• 1907.11692
• Published • 10
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper
• 1810.04805
• Published • 26
Skywork: A More Open Bilingual Foundation Model
Paper
• 2310.19341
• Published • 6
SkyMath: Technical Report
Paper
• 2310.16713
• Published • 2
LaMDA: Language Models for Dialog Applications
Paper
• 2201.08239
• Published • 5
Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
Paper
• 2310.06694
• Published • 3
UT5: Pretraining Non autoregressive T5 with unrolled denoising
Paper
• 2311.08552
• Published • 8
TinyLlama: An Open-Source Small Language Model
Paper
• 2401.02385
• Published • 95
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Paper
• 2401.02954
• Published • 53
Mixtral of Experts
Paper
• 2401.04088
• Published • 160
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Paper
• 2401.04081
• Published • 74
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Paper
• 2312.00752
• Published • 150
H2O-Danube-1.8B Technical Report
Paper
• 2401.16818
• Published • 18
Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model
Paper
• 2402.07827
• Published • 48