Gaming the Judge: Unfaithful Chain-of-Thought Can Undermine Agent Evaluation • Paper • 2601.14691 • Published Jan 21