SWITCH: Benchmarking Modeling and Handling of Tangible Interfaces in Long-horizon Embodied Scenarios Paper • 2511.17649 • Published Nov 20, 2025 • 4
EgoActor: Grounding Task Planning into Spatial-aware Egocentric Actions for Humanoid Robots via Visual-Language Models Paper • 2602.04515 • Published Feb 4 • 38
AMO-Bench: Large Language Models Still Struggle in High School Math Competitions Paper • 2510.26768 • Published Oct 30, 2025 • 34
DeepPHY: Benchmarking Agentic VLMs on Physical Reasoning Paper • 2508.05405 • Published Aug 7, 2025 • 64
DualTHOR: A Dual-Arm Humanoid Simulation Platform for Contingency-Aware Planning Paper • 2506.16012 • Published Jun 19, 2025 • 22