Keep on Pruning
Can we get this bad boy down to 40B, please 🙏
lol i hear you. 40B params? or 40GB? some details for anyone curious ..
I don't think REAP goes much further than a 40% cut .. that's already very aggressive, and it's what the REAP author did (lkevincc0 -- not me)
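for anyone wondering what REAP actually does: roughly, it scores each expert by how much router-weighted output it contributes on calibration data, then drops the weakest ones. a toy numpy sketch of that scoring idea (my paraphrase, not the official code -- all sizes and tensors are made up):

```python
# Toy sketch of REAP-style expert saliency scoring (not the official code).
# Idea: score each expert by the router-weighted output it actually
# contributes on calibration tokens, then drop the weakest 40%.
import numpy as np

rng = np.random.default_rng(0)
n_experts, n_tokens = 64, 4096                   # made-up sizes

# Pretend these came from a forward pass over calibration text:
gate = rng.random((n_tokens, n_experts))         # router weights per token
out_norm = rng.random((n_tokens, n_experts))     # ||expert output|| per token
topk = np.argsort(gate, axis=1)[:, -8:]          # say, top-8 routing

# Saliency: average (gate weight * output norm) over the tokens
# where the expert was actually selected by the router.
saliency = np.zeros(n_experts)
for j in range(n_experts):
    mask = (topk == j).any(axis=1)
    if mask.any():
        saliency[j] = (gate[mask, j] * out_norm[mask, j]).mean()

keep = np.argsort(saliency)[int(0.4 * n_experts):]  # drop lowest 40%
print(f"keeping {len(keep)}/{n_experts} experts")
```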
40GB is probably feasible with ik_llama.cpp quants (2 or 3-bit?) .. but the 40% REAP cut costs 5-15% quality, and Q4 normally costs another 5-10%, probably worse on top of REAP. going past that .. 🤷. I actually haven't tried this Q4 GGUF yet but it should work ok.
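the file-size math is just params × bits-per-weight / 8. quick sketch (the 130B post-prune count and the bpw figures are rough placeholders, not measurements):

```python
# Back-of-envelope GGUF size: params * bits-per-weight / 8.
# 130e9 is a placeholder for "big MoE after a 40% REAP cut" --
# plug in the real post-prune parameter count. bpw values are rough.
params = 130e9
for name, bpw in [("Q2_K", 2.6), ("IQ3_XXS", 3.1), ("Q4_K_M", 4.8)]:
    gb = params * bpw / 8 / 1e9
    print(f"{name:8s} ~{bpw} bpw -> ~{gb:.0f} GB")
```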
I don't think 40B params is even close to feasible. would be fun to try with some RL healing tho but I don't have a good DPO setup.
40B parameters, lol
I have seen a team drop Qwen 3 Coder Next 80B down to a 20B model (60% pruned), and it is pretty smart for a pruned and then quantized model.
Not saying it wouldn't be usable -- but at that big of a drop it would need some RL healing to be useful, IMO.
but 40B + baby DPO would probably get you a pretty great flash model. point me at a fresh DPO dataset 🙏
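something like trl-lib/ultrafeedback_binarized could work as a first pass. a minimal baby-DPO sketch with TRL, untested -- the model id is a placeholder, and for anything 40B-sized you'd want LoRA/QLoRA on top rather than a full finetune:

```python
# Minimal "baby DPO" sketch with TRL (untested; model id is a placeholder).
# The dataset just needs prompt / chosen / rejected columns.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "your-pruned-model"                    # placeholder
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Example preference dataset with prompt/chosen/rejected fields.
ds = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

args = DPOConfig(output_dir="dpo-heal",
                 per_device_train_batch_size=1,
                 gradient_accumulation_steps=8,
                 beta=0.1)                        # DPO temperature
trainer = DPOTrainer(model=model, args=args, train_dataset=ds,
                     processing_class=tokenizer)  # recent TRL arg name
trainer.train()
```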