Can you release an FP8 version for vLLM as well, please? NVFP4 might not be good enough.

#2
by AQLabs - opened

FP8 is also hardware-accelerated on Ada+ GPUs; NVFP4 might be too lossy.

Working on it. I want the FP8 version to run comfortably on a single 5090, so I am selectively quantizing certain layers to INT4 (which is natively supported on pre-Blackwell cards), with FP8 as the default for most layers. I'll benchmark it, then publish it and let you know.
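The selective-layer idea above can be sketched as a simple per-layer scheme lookup. This is only an illustrative sketch: the layer-name patterns, scheme names, and the choice of which projections get INT4 are assumptions for demonstration, not the actual recipe used for the released model.

```python
# Hypothetical sketch of a mixed FP8/INT4 per-layer recipe.
# The pattern list and scheme names below are illustrative assumptions.

def pick_scheme(layer_name: str,
                int4_patterns=("mlp.down_proj", "mlp.up_proj")) -> str:
    """Return a quantization scheme for one layer: weight-only INT4 for the
    largest MLP projections (to squeeze into a single 5090's VRAM), and
    dynamic FP8 as the default for everything else."""
    if any(p in layer_name for p in int4_patterns):
        return "W4A16"        # weight-only INT4, runs on pre-Blackwell cards
    return "FP8_DYNAMIC"      # FP8 weights with dynamic activation scales

# Example: build a recipe for a few (made-up) layer names.
layers = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.mlp.down_proj",
    "model.layers.0.mlp.up_proj",
]
recipe = {name: pick_scheme(name) for name in layers}
for name, scheme in recipe.items():
    print(name, "->", scheme)
```

In practice a tool such as llm-compressor expresses the same idea declaratively, mapping layer-name patterns to quantization schemes before export.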

Appreciate it. The FP8 version from Qwen runs great on 48 GB of VRAM. If you can publish your FP8 version alongside the FP8-INT4 one, that would be much appreciated.

https://huggingface.co/mconcat/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-FP8-Dynamic

This one is the FP8-BF16 version, not INT4. It should match your use case.

Much appreciated!

AQLabs changed discussion status to closed
