Thomas Broadley
tbroadley
0 followers · 1 following
AI & ML interests
None yet
tbroadley's activity
updated a dataset about 13 hours ago
metr-evals/apps-with-input-validation
Preview • Updated about 2 hours ago • 208
New activity in metr-evals/apps-with-input-validation, about 13 hours ago
Unify verification scripts into verify.py, add AGENTS.md
#12 opened about 13 hours ago by tbroadley
Fix trailing empty strings in 94 train strs samples
#11 opened about 13 hours ago by tbroadley
Fix trailing spaces in expected test outputs (89 samples across train and test)
#10 opened about 14 hours ago by tbroadley
New activity in metr-evals/apps-with-input-validation, about 14 hours ago
Fix trailing spaces in expected test outputs (95 samples across train and test)
1
#8 opened about 15 hours ago by tbroadley
Re-apply fixes from PR #2 and PR #3 that were accidentally reverted by PR #5
#9 opened about 15 hours ago by tbroadley
New activity in metr-evals/apps-with-input-validation, about 15 hours ago
Fix trailing spaces in test output for sample 3341
2
#7 opened about 15 hours ago by tbroadley
New activity in metr-evals/apps-with-input-validation, 4 days ago
Remove counterexample sample (id=737) from train split
#6 opened 4 days ago by tbroadley
New activity in metr-evals/apps-with-input-validation, 28 days ago
Fix expected outputs to match golden solutions
#5 opened 28 days ago by tbroadley
Fix expected outputs to match golden solutions
1
#4 opened 28 days ago by tbroadley
New activity in metr-evals/apps-with-input-validation, about 1 month ago
Remove 615 problems with ambiguous outputs
#3 opened about 1 month ago by tbroadley
New activity in metr-evals/apps-with-input-validation, about 2 months ago
Fix test set: whitespace issues, dependency errors, and Python 3.11+ compatibility
#2 opened about 2 months ago by tbroadley
Fix whitespace discrepancies in expected test outputs
#1 opened about 2 months ago by tbroadley
upvoted a paper 3 months ago
Measuring AI Ability to Complete Long Tasks
Paper • 2503.14499 • Published Mar 18, 2025 • 16
authored a paper 12 months ago
Measuring AI Ability to Complete Long Tasks
Paper • 2503.14499 • Published Mar 18, 2025 • 16