CU-Benchmarks
visualwebbench/VisualWebBench • 1.54k • 1.33k • 18
rootsautomation/RICO-ScreenQA • 86k • 178 • 11
rootsautomation/ScreenSpot • 1.27k • 1.33k • 44
osunlp/Multimodal-Mind2Web • 14.2k • 3.49k • 91
xlangai/ubuntu_osworld_file_cache • 315k • 3
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale • Paper • 2409.08264 • 48
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents • Paper • 2405.14573