BigCodeArena (Running, Agents, 37): Compare two AI models by sending them code and seeing their responses
BigCodeBench Evaluator (Running on CPU Upgrade, 19): Evaluate code samples using specified parameters