Can LLM Agents Be CFOs? A Benchmark for Resource Allocation in Dynamic Enterprise Environments
Paper • 2603.23638 • Published • 11
ParEVO: Synthesizing Code for Irregular Data: High-Performance Parallelism through Agentic Evolution
QEDBENCH: Quantifying the Alignment Gap in Automated Evaluation of University-Level Mathematical Proofs