MolmoWeb-Data Collection This is the collection of all datasets in MolmoWebMix. • 6 items • Updated 7 days ago • 19
OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data Paper • 2603.15594 • Published 15 days ago • 148
The Synthetic Data Playbook: Generating Trillions of the Finest Tokens • Explore synthetic data experiments as an interactive bookshelf
OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration Paper • 2602.05400 • Published Feb 5 • 349
Green-VLA: Staged Vision-Language-Action Model for Generalist Robots Paper • 2602.00919 • Published Jan 31 • 319