LLM-as-a-Judge & Reward Model: What They Can and Cannot Do Paper • 2409.11239 • Published Sep 17, 2024 • 3
view article Article DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge Feb 7, 2025 • 289
Open LLM Leaderboard best models ❤️🔥 Collection A daily uploaded list of models with best evaluations on the LLM leaderboard: • 50 items • Updated Mar 13 • 680