Can you release an FP8 version for vLLM as well, please? NVFP4 might not be good enough.

#2
by AQLabs - opened

FP8 is also hardware-accelerated on Ada+ GPUs; NVFP4 might be too lossy.

Working on it. I want the FP8 version to run comfortably on a single 5090, so I am selectively quantizing certain layers to INT4 (which is natively supported on pre-Blackwell cards), with FP8 as the default for most layers. I'll benchmark it, then publish it and let you know.
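The selective-layer idea above can be sketched as a simple per-layer scheme lookup. This is only an illustrative sketch: the layer-name patterns, scheme names, and the choice of which projections get INT4 are assumptions for demonstration, not the actual recipe used for the released model.

```python
# Hypothetical sketch of a mixed FP8/INT4 per-layer recipe.
# The pattern list and scheme names below are illustrative assumptions.

def pick_scheme(layer_name: str,
                int4_patterns=("mlp.down_proj", "mlp.up_proj")) -> str:
    """Return a quantization scheme for one layer: weight-only INT4 for the
    largest MLP projections (to squeeze into a single 5090's VRAM), and
    dynamic FP8 as the default for everything else."""
    if any(p in layer_name for p in int4_patterns):
        return "W4A16"        # weight-only INT4, runs on pre-Blackwell cards
    return "FP8_DYNAMIC"      # FP8 weights with dynamic activation scales

# Example: build a recipe for a few (made-up) layer names.
layers = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.mlp.down_proj",
    "model.layers.0.mlp.up_proj",
]
recipe = {name: pick_scheme(name) for name in layers}
for name, scheme in recipe.items():
    print(name, "->", scheme)
```

In practice a tool such as llm-compressor expresses the same idea declaratively, mapping layer-name patterns to quantization schemes before export.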

Appreciate it. The FP8 version from Qwen runs great on 48 GB of VRAM. If you can publish your FP8 version alongside the FP8-INT4 one, that would be much appreciated.

https://huggingface.co/mconcat/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-FP8-Dynamic

This one is the FP8-BF16 version, not INT4. It should match your use case.

Much appreciated!

AQLabs changed discussion status to closed
