HriDal/agent-2048-game-qwen-7b-2k-ds Reinforcement Learning • 8B • Updated Apr 1, 2025 • 1 • 1
Alibaba-NLP/Tongyi-DeepResearch-30B-A3B Text Generation • 31B • Updated Oct 10, 2025 • 14.6k • 808
Article Kimina-Prover: Applying Test-time RL Search on Large Formal Reasoning Models Jul 10, 2025 • 54