Ai2 Open Coding Agents - Django, Sphinx, Sympy Data
AI & ML interests
Building breakthrough AI to solve the world's biggest problems.
Papers
How2Everything: Mining the Web for How-To Procedures to Evaluate and Improve LLMs
Recurrent-Depth VLA: Implicit Test-Time Compute Scaling of Vision-Language-Action Models via Latent Iterative Reasoning
Organization Card
spaces
13
🥇 AstaBench Leaderboard • View benchmark leaderboards • pinned • Running • 17
📐 Reward Bench Leaderboard • Explore and compare LLM reward benchmark scores • pinned • Running • 420
📐 HREF Leaderboard • Browse and search HREF leaderboard data • pinned • Running • 2
🦓 Zebra Logic Bench • Display and explore a leaderboard for model evaluations • pinned • Running • 91
🤖 SUPER Leaderboard • Display a static leaderboard from a JSON file • pinned • Running • 3
📊 ZeroEval Leaderboard • Embed ZeroEval for evaluation • pinned • Running • 53
models
851
allenai/Sera-4.5A-Sympy-T2 • 1
allenai/SERA-14B • 425k • 60 • 8
allenai/SERA-8B-GA • 8B • 47 • 13
allenai/SERA-32B-GA • 677k • 39 • 19
allenai/SERA-8B • 8B • 11.8k • 35
allenai/olmo-3-hybrid-tokenizer-think-dev • 3
allenai/SERA-32B • 677k • 1.02k • 97
allenai/Olmo-3-1025-7B • Text Generation • 7B • 60.1k • 48
allenai/HiRO-ACE • 2 • 13
allenai/Molmo2-O-7B • Image-Text-to-Text • 8B • 39.6k • 19
datasets
365
allenai/molmospaces • 35 • 5
allenai/Molmo2-AskModelAnything • Viewer • 129k • 180 • 3
allenai/Molmo2-VideoSubtitleQA • Viewer • 469k • 241 • 2
allenai/Molmo2-VideoCapQA • Viewer • 951k • 230 • 5
allenai/Molmo2-CapEval • Viewer • 693 • 192 • 1
allenai/Sera-4.5A-Sphinx-T1 • Viewer • 16.4k • 43 • 1
allenai/Sera-4.5A-Sympy-T1 • Viewer • 18.2k • 44 • 1
allenai/Sera-4.5A-Django-T1 • Viewer • 16.2k • 47 • 1
allenai/Sera-4.5A-Django-T2 • Viewer • 14.6k • 44 • 1
allenai/Sera-4.5A-Sympy-T2 • Viewer • 25.4k • 41 • 1