Sherlock

eyuansu71
https://scholar.google.com/citations?user=75pkx3YAAAAJ&hl=en

AI & ML interests

None yet

Organizations

  • Beijing Academy of Artificial Intelligence
  • FlagEval
  • The BIRD Team
  • LiveSQLBench

upvoted 2 papers 3 months ago

Beyond Multiple Choice: Verifiable OpenQA for Robust Vision-Language RFT

Paper • 2511.17405 • Published Nov 21, 2025 • 11

Do Vision-Language Models Measure Up? Benchmarking Visual Measurement Reading with MeasureBench

Paper • 2510.26865 • Published Oct 30, 2025 • 12
upvoted 2 papers 5 months ago

SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?

Paper • 2509.16941 • Published Sep 21, 2025 • 21

FlagEval Findings Report: A Preliminary Evaluation of Large Reasoning Models on Automatically Verifiable Textual and Visual Questions

Paper • 2509.17177 • Published Sep 21, 2025 • 13
upvoted a paper 6 months ago

Beyond Solving Math Quiz: Evaluating the Ability of Large Reasoning Models to Ask for Information

Paper • 2508.11252 • Published Aug 15, 2025 • 3
upvoted a paper 8 months ago

SWE-SQL: Illuminating LLM Pathways to Solve User SQL Issues in Real-World Applications

Paper • 2506.18951 • Published Jun 23, 2025 • 21