
Open Agent Evaluation Laboratory

university
https://boxiyu.github.io/
BoshCavendish
BoxiYu
boxi-yu-194b63279

AI & ML interests

Code Agent, Benchmark Augmentation

Recent Activity

CWCY  updated a dataset 3 days ago
OpenAgentLab/SWE-ABS
Bertsekas  authored a paper 8 months ago
How Should I Build A Benchmark? Revisiting Code-Related Benchmarks For LLMs
Bertsekas  authored a paper 8 months ago
UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench


CWCY 
updated a dataset 3 days ago

OpenAgentLab/SWE-ABS

Viewer • Updated 3 days ago • 500 • 14
Bertsekas 
authored 2 papers 8 months ago

How Should I Build A Benchmark? Revisiting Code-Related Benchmarks For LLMs

Paper • 2501.10711 • Published Jan 18, 2025

UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench

Paper • 2506.09289 • Published Jun 10, 2025 • 2