Nemotron-Terminal Collection We are releasing Nemotron-Terminal models and training datasets. • 5 items • Updated 2 days ago • 31
TADA Collection TADA: A Generative Framework for Speech Modeling via Text-Acoustic Dual Alignment | https://huggingface.co/papers/2602.23068 • 5 items • Updated 2 days ago • 59
🤏 Smol-Data Collection Tried and tested mixes for strong pretraining. Inspired by https://huggingface.co/blog/codelion/optimal-dataset-mixing • 14 items • Updated 11 days ago • 12
Quantized Qwen3.5 Collection Verified models. Compatible with Transformers v5.3 and vLLM v0.16.1rc1 (nightly). Under evaluation. • 9 items • Updated 1 day ago • 9
pplx-embed Collection Diffusion-Pretrained Dense and Contextual Embeddings • 7 items • Updated 15 days ago • 87
Mobile-Agent-v3.5: Multi-platform Fundamental GUI Agents Paper • 2602.16855 • Published 27 days ago • 48
ColBERT-Zero 🐶 Collection First large-scale fully pre-trained ColBERT model using only public data, outperforming GTE-ModernColBERT and GTE-ModernBERT • 10 items • Updated 10 days ago • 17
jina-embeddings-v5-text Collection Our 5th-gen embeddings: two lightweight multilingual models with SOTA performance in retrieval, matching, clustering, and classification. • 29 items • Updated 14 days ago • 35
OriOn Collection Visual long document VLMs based on Mistral-Small-3.1-24B-Instruct-2503 and Qwen3-VL-32B-Instruct • 4 items • Updated 10 days ago • 4
LateOn-Code 💻 Collection State-of-the-art late interaction code retrieval models • 6 items • Updated 10 days ago • 16
view article Article LateOn-Code & ColGrep: LightOn unveils state-of-the-art code retrieval models and code search tooling 29 days ago • 50
NeuTTS Nano Multilingual Collection Collection NeuTTS Nano is a TTS model, 3x smaller than NeuTTS Air, that runs on CPU in real-time - now in English, Spanish, French, and German versions! • 13 items • Updated 15 days ago • 16