Submit video model evaluation results to a public benchmark
Evaluate AI model predictions with correctness scores
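
As a minimal sketch of what scoring predictions for correctness could look like, the Python example below assigns each prediction a 0/1 score by exact match against a reference answer and averages the scores into an accuracy. The exact-match rule, the function names `score_predictions` and `accuracy`, and the sample data are all assumptions made for illustration; the source does not say how correctness is actually computed.

```python
from statistics import mean


def score_predictions(predictions, references):
    """Return one correctness score (1.0 or 0.0) per prediction.

    Assumption: a prediction is correct when it equals its reference
    after trimming whitespace and ignoring case.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    return [
        1.0 if pred.strip().lower() == ref.strip().lower() else 0.0
        for pred, ref in zip(predictions, references)
    ]


def accuracy(scores):
    """Average the per-prediction scores; an empty list yields 0.0."""
    return mean(scores) if scores else 0.0


# Hypothetical sample data, for illustration only.
predictions = ["A", "b ", "C"]
references = ["A", "B", "D"]

scores = score_predictions(predictions, references)
overall = accuracy(scores)
```

Keeping the per-prediction scores separate from the aggregate makes it possible to inspect individual failures before reporting a single number.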