FrenchBench Evaluation datasets Collection These datasets are used to evaluate models on French performance using: https://github.com/EleutherAI/lm-evaluation-harness (from CroissantLLM paper) • 11 items • Updated Jun 7, 2024 • 8
AutoResearch-RL: Perpetual Self-Evaluating Reinforcement Learning Agents for Autonomous Neural Architecture Discovery Paper • 2603.07300 • Published 8 days ago • 15
The Trinity of Consistency as a Defining Principle for General World Models Paper • 2602.23152 • Published 17 days ago • 196
CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation Paper • 2602.24286 • Published 16 days ago • 87
CHIMERA: Compact Synthetic Data for Generalizable LLM Reasoning Paper • 2603.00889 • Published 14 days ago • 50
Heterogeneous Agent Collaborative Reinforcement Learning Paper • 2603.02604 • Published 12 days ago • 174
DARE: Aligning LLM Agents with the R Statistical Ecosystem via Distribution-Aware Retrieval Paper • 2603.04743 • Published 10 days ago • 47