Article Keep the Tokens Flowing: Lessons from 16 Open-Source RL Libraries • Mar 10 • 124
Manimator: Transforming Research Papers into Visual Explanations Paper • 2507.14306 • Published Jul 18, 2025 • 4
Aryabhata: An exam-focused language model for JEE Math Paper • 2508.08665 • Published Aug 12, 2025 • 17
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment Paper • 2412.19326 • Published Dec 26, 2024 • 18
ChatRex: Taming Multimodal LLM for Joint Perception and Understanding Paper • 2411.18363 • Published Nov 27, 2024 • 10
LEOPARD : A Vision Language Model For Text-Rich Multi-Image Tasks Paper • 2410.01744 • Published Oct 2, 2024 • 27
TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models Paper • 2109.10282 • Published Sep 21, 2021 • 13
Medical Multimodal Datasets Collection Datasets that can be used to train and/or evaluate medical multimodal models. • 3 items • Updated Dec 9, 2023 • 2
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation Paper • 2405.01434 • Published May 2, 2024 • 56
Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model Paper • 2402.07827 • Published Feb 12, 2024 • 48
VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning Paper • 2309.15091 • Published Sep 26, 2023 • 35