Update FINDINGS_PAPER.md with latest benchmarks and standardize DeepSeek-V3 ID · 12b944d · DeepParmar committed about 23 hours ago
Add detailed model performance reasoning across all benchmark documentation · 40ab31f · DeepParmar committed about 23 hours ago
Update docs with latest HF Native and OpenRouter benchmark scores · bd428dc · DeepParmar committed about 23 hours ago
Compliance fix: Move inference.py to repo root strictly enforcing OpenEnv hackathon submission rules · 4e7c1df · DeepParmar committed about 23 hours ago
Refine UI layout and remove raw terminal logs from benchmark records · 5966a06 · DeepParmar committed about 24 hours ago
Untrack AUDIT_RESULTS.md and add to gitignore per user request · 0793608 · DeepParmar committed 1 day ago
Final cleanup: Remove redundant testing scripts, un-track logs, sanitize comments · c43ae5c · DeepParmar committed 1 day ago
Update master record with massive confidence table and exact native module names · 48ab79c · DeepParmar committed 1 day ago
Add compiled benchmark_comparison, HF Native serverless testing logs · 88518e4 · DeepParmar committed 1 day ago
Add final senior review checklist, final test-2last.txt tests with 5 frontier models against live HF Space! · 8cddc5b · DeepParmar committed 1 day ago
Update last-test.txt and final-result.txt with fresh benchmark data · f068648 · DeepParmar committed 1 day ago
Finalize submission: Add final-result.txt, clean up OpenRouter API keys from scripts, remove pycache, update logs · 149378d · DeepParmar committed 1 day ago
Add extreme final submission tests (48 tests: math, load, cross-file, adversarial, compliance) · 4757a2e · DeepParmar committed 1 day ago
fix: clamp inference score strictly to 0.999 to avoid float format rounding to 1.000 · 41aa728 · DeepParmar committed 3 days ago
chore: audit findings fixes, openenv yaml updates, run_benchmark config, and audit results report · d64e3c6 · DeepParmar committed 3 days ago