Suggestion: Separate Leaderboards for Constrained vs. Unconstrained Training Data

#1
by qwerwenqing - opened

Hi organizers,

I remember that a few days ago the leaderboard description specified that models should be trained only on ASVspoof 2019 in order to accurately evaluate generalization. However, it seems that this restriction was recently removed, and several models trained on large-scale external datasets have now been uploaded and evaluated together with constrained models.

This creates a fairness issue: models trained strictly under the original constraint cannot compete on equal footing with models using significantly larger training data. It also affects the scientific value of the benchmark, since training data size directly influences performance.

To maintain fairness and meaningful comparisons, I strongly suggest creating two separate leaderboards:

  1. Constrained Track: training strictly on ASVspoof 2019
  2. Unconstrained Track: training with any additional data

This is consistent with standard practice in many audio anti-spoofing and deepfake detection challenges (e.g., ASVspoof5), where constrained vs. unconstrained settings are evaluated separately.

Thank you for considering this. I believe that separating the leaderboards will make the benchmark more transparent, fair, and scientifically valuable for everyone.

Hello,

The initial release never stated that it is restricted to systems trained on ASVspoof 2019; the objective was to encourage a proliferation of systems that use multiple datasets for training.

We are not running a challenge, so we do not plan to create a separate constrained track.

In the coming months, we will release an updated version of Speech DF Arena with more checks to ensure that it is transparent, fair, and scientifically valuable.

Regards,
Speech Arena Team

Speech-Arena-2025 changed discussion status to closed