Ishant06/Qwen3.5-0.8B-Claude-4.6-Opus-Reasoning-Distilled • Text Generation • 0.8B • Updated 1 day ago • 252 • 4
Article • DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge • Feb 7, 2025 • 280
IllusionDiffusion • Featured • Running on Zero • 5.38k • Generate stunning high-quality illusion artwork