dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model Paper • 2512.02498 • Published Dec 2, 2025 • 2
InternAgent-1.5: A Unified Agentic Framework for Long-Horizon Autonomous Scientific Discovery Paper • 2602.08990 • Published 10 days ago • 69
DFlash: Block Diffusion for Flash Speculative Decoding Paper • 2602.06036 • Published 14 days ago • 41
Privasis: Synthesizing the Largest "Public" Private Dataset from Scratch Paper • 2602.03183 • Published 16 days ago • 11
LightOnOCR-2 🦉 Collection LightOnOCR-2-1B: a lightweight high-performance end-to-end OCR model family • 12 items • Updated about 3 hours ago • 22
Next-Embedding Prediction Makes Strong Vision Learners Paper • 2512.16922 • Published Dec 18, 2025 • 87
Tokenization in Transformers v5: Simpler, Clearer, and More Modular Article • Dec 18, 2025 • 120
On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models Paper • 2512.07783 • Published Dec 8, 2025 • 38
PaperDebugger: A Plugin-Based Multi-Agent System for In-Editor Academic Writing, Review, and Editing Paper • 2512.02589 • Published Dec 2, 2025 • 71
Wikontic: Constructing Wikidata-Aligned, Ontology-Aware Knowledge Graphs with Large Language Models Paper • 2512.00590 • Published Nov 29, 2025 • 48
CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning Paper • 2511.18659 • Published Nov 24, 2025 • 24
O-Mem: Omni Memory System for Personalized, Long Horizon, Self-Evolving Agents Paper • 2511.13593 • Published Nov 17, 2025 • 27
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe Paper • 2511.16334 • Published Nov 20, 2025 • 93
Nemotron-Personas Collection A collection of multilingual, region-specific synthetic persona datasets that support sovereign AI development across many countries and regions. • 5 items • Updated 15 days ago • 22