Genesis 1B Run 2 Playground: Generate text with a live‑training 1B‑parameter model
Genesis 1B is now public. 🔥

I’m training a 1.003B-parameter model from scratch on 2× RTX 4090s and have opened a public playground for early checkpoints.

The real bottleneck wasn’t training. It was checkpointing: FSDP full-state gather over PCIe = NCCL timeout hell. Switching to DCP sharded checkpoints changed the trajectory of the run.

- Playground: rob-x-ai/genesis-1b-playground
- Write-up: https://kroonen.ai/blog/distributed-checkpoint-failures-rtx4090/
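
To give a rough sense of why the full-state gather hurts, here is a back-of-the-envelope sketch in plain Python. Only the parameter count and GPU count come from the post. The fp32 precision, the Adam-style optimizer with two moment buffers, and the even split across ranks are all assumptions for illustration, so the real numbers for this run may differ.

```python
# Rough checkpoint sizes: full-state gather vs. per-rank sharded save.
# From the post: 1.003B parameters, 2 GPUs.
# ASSUMPTIONS (not from the post): fp32 values, Adam-style optimizer with
# two moment buffers per parameter, state split evenly across ranks.

PARAMS = 1_003_000_000   # from the post
WORLD_SIZE = 2           # from the post (2x RTX 4090)
BYTES_PER_VALUE = 4      # assumption: fp32
STATE_COPIES = 3         # assumption: weights + two Adam moments


def checkpoint_bytes(params, bytes_per_value, state_copies):
    """Total bytes of model plus optimizer state across all ranks."""
    return params * bytes_per_value * state_copies


def gather_bytes_received(total_bytes, world_size):
    """Bytes rank 0 must pull from the other ranks to hold the full state."""
    return total_bytes * (world_size - 1) // world_size


def shard_bytes_per_rank(total_bytes, world_size):
    """Bytes each rank writes when it saves only its own shard."""
    return total_bytes // world_size


total = checkpoint_bytes(PARAMS, BYTES_PER_VALUE, STATE_COPIES)
gathered = gather_bytes_received(total, WORLD_SIZE)
per_rank = shard_bytes_per_rank(total, WORLD_SIZE)

GB = 1e9
print(f"total state:               {total / GB:.2f} GB")
print(f"full gather, sent to rank 0: {gathered / GB:.2f} GB between GPUs")
print(f"full gather, held on rank 0: {total / GB:.2f} GB")
print(f"sharded save, per rank:      {per_rank / GB:.2f} GB, no GPU-to-GPU transfer")
```

Under these assumptions, a full gather has one rank receive about half of the state from its peer and then hold and write all of it, while a sharded save lets each rank write only its own half without any GPU-to-GPU transfer. That is at least consistent with the post's observation that the gather, not the training step, was what ran into NCCL timeouts.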