Kernels
optimizer / build

Commit History

feat: extend QK-Clip to support MLA (MuonClip Algorithm 1) [skip-build] (#28)
e8e2c81
unverified

dongseokmotif, Claude Sonnet 4.6, wyldecat, github-actions[bot] committed on

Revert "fix: disable CUDA graphs in Newton-Schulz for cpu_offload compatibility" (#29)
313d56a
unverified

wyldecat, github-actions[bot] committed on

Add built binary [skip-build]
e2f5aab

github-actions[bot] committed on

Replace cpu_offload constructor param with turn_on/turn_off API (#26)
05a75f1
unverified

wyldecat, Claude Opus 4.6 (1M context), github-actions[bot] committed on

draft commit for cpu_offload (#23)
10848ab
unverified

TaehyunKim, github-actions[bot], wyldecat, Claude Opus 4.6 (1M context) committed on

Refactor pipeline to async generator pattern (#16)
33929c0
unverified

wyldecat, github-actions[bot] committed on

Support mHC (#15)
ae32572
unverified

wyldecat, github-actions[bot] committed on

Support param group with various placements (#13)
e2b41e5
unverified

wyldecat, github-actions[bot] committed on

Add built binary [skip-build]
6ec5093

github-actions[bot] committed on

Add built binary [skip-build]
de5bead

github-actions[bot] committed on

Add built binary [skip-build]
aee4dc0

github-actions[bot] committed on

Add built binary [ci skip]
e93bd1e

github-actions[bot] committed on

Add built binary [ci skip]
678578a

github-actions[bot] committed on

Add built binary [ci skip]
99af74f

github-actions[bot] committed on

Add built binary [ci skip]
f88998f

github-actions[bot] committed on

feat(muon_clip) : add muon clip (#6)
d65066c
unverified

dongseokmotif, dongseokmotif, github-actions[bot] committed on

chore: add nix build workflow (#8)
8997e30
unverified

wyldecat, github-actions[bot] committed on

feat(muon) : add tuned-abc-values & bfloat16 communication
f7faa93

wyldecat committed on

feat: update muon to receive paramgroups, not model (#4)
b0f46c7
unverified

leejunhyeok, junhyeok.lee, wyldecat committed on

fix(muon): add update_p stage and dealloc tensors properly
99e7c0c

wyldecat committed on

Support HSDP (#4)
8447fd1
verified

iamwyldecat committed on

support torch 2.8 (#3)
1e2b528
verified

iamwyldecat committed on

fix(muon): handle un-distributed env
1f13dae

iamwyldecat committed on

refactor(muon): change argument adam_wd to weight_decay and handle params' type
02ac540

iamwyldecat committed on

fix(muon): free tensors that are no longer needed
64757cb

iamwyldecat committed on

chore(muon): update comment
036642a

iamwyldecat committed on

chore(muon): clean build and update doc
febdf5b

iamwyldecat committed on

fix(muon): delete intermediate tensors immediately to lower peak mem usage
bdd2678

iamwyldecat committed on

chore: initial commit
8535e80

iamwyldecat committed on