Any plans for distilling Opus to 3.6 35B?

#2
by DXBTR74 - opened

MoE models run inference more efficiently on low-VRAM devices. On my 2060 laptop, 3.6 35B Q4 gets nearly the same tokens per second (tps) as 9B Q4.
