VI-CuRL: Stabilizing Verifier-Independent RL Reasoning via Confidence-Guided Variance Reduction • Paper • 2602.12579 • Published 26 days ago
Reinforcement Learning with Verifiable yet Noisy Rewards under Imperfect Verifiers • Paper • 2510.00915 • Published Oct 1, 2025