DeepSeek R1 (All Versions) Collection DeepSeek-R1-0528 is here! The most powerful reasoning open LLM, available in GGUF, original & 4-bit formats. Includes Llama & Qwen distilled models. • 37 items • Updated 1 day ago • 262
Article Model2Vec: Distill a Small Fast Model from any Sentence Transformer Oct 14, 2024 • 103
Simple linear attention language models balance the recall-throughput tradeoff Paper • 2402.18668 • Published Feb 28, 2024 • 20
SigLIP Collection Contrastive (sigmoid) image-text models from https://arxiv.org/abs/2303.15343 • 10 items • Updated Jul 10, 2025 • 63
Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization Paper • 2311.06243 • Published Nov 10, 2023 • 21
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling Paper • 2311.00430 • Published Nov 1, 2023 • 56
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining Paper • 2308.05734 • Published Aug 10, 2023 • 38
LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models Paper • 2308.16137 • Published Aug 30, 2023 • 41
Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference Using Sorted Fine-Tuning (SoFT) Paper • 2309.08968 • Published Sep 16, 2023 • 24
Contrastive Decoding Improves Reasoning in Large Language Models Paper • 2309.09117 • Published Sep 17, 2023 • 40