Was training data decontamination applied when evaluating on RealBench?
Hi, thanks for the great work on InCoder!
I am one of the authors of RealBench. We noticed that some of InCoder's outputs show relatively high similarity to the reference solutions in our benchmark. Was the training data decontamination procedure described in our paper applied during evaluation?
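To make "high similarity" concrete, here is a minimal sketch of one way to score overlap between a model output and a reference solution. It assumes a simple token n-gram measure and is not the exact procedure from the paper. The function names `token_ngrams` and `ngram_overlap` and the default `n=5` are illustrative choices only.

```python
import re


def token_ngrams(text, n=5):
    """Return the set of token n-grams in text (words and single punctuation marks)."""
    tokens = re.findall(r"\w+|[^\w\s]", text)
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(output, reference, n=5):
    """Fraction of the reference's n-grams that also appear in the model output.

    Hypothetical metric for illustration: 0.0 means no shared n-grams,
    1.0 means every reference n-gram occurs in the output.
    """
    ref = token_ngrams(reference, n)
    if not ref:
        # Reference shorter than n tokens: nothing to compare against.
        return 0.0
    out = token_ngrams(output, n)
    return len(out & ref) / len(ref)
```

Outputs that score close to 1.0 against a reference would be candidates for a closer manual look. Any threshold would have to be chosen per benchmark.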
We'd be happy to help if needed. Thanks!
Hi, thank you for reaching out and for your work on RealBench!
We appreciate your concern regarding potential data contamination. We'd like to clarify the following:
First, all of our training data has undergone a thorough deduplication process, so the risk of benchmark leakage is already mitigated on our end.
Second, we noticed that the reference solutions in your benchmark appear to be model-generated and are not always correct, because they do not fully pass your own evaluation pipeline. So even if they had been memorized during training, that would not provide a meaningful advantage.
Therefore, we believe data contamination is not a concern in our evaluation. We're happy to discuss further if needed. Thanks again!