Sleeping Agents MultiHop Reasoning Trainer 🧠

Train and evaluate a Qwen3 model for multi‑hop reasoning