Quantization command

#1
by utarn - opened

Would you mind sharing how you quantized this model to NVFP4, and guiding me through the process?

This is quantized using NVIDIA ModelOpt:

```shell
python hf_ptq.py --pyt_ckpt_path <gpt-oss-120b> --qformat nvfp4 --export_path <gpt-oss-120b-nvfp4> --trust_remote_code
```

using the latest ModelOpt main branch: https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/examples/llm_ptq
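For anyone trying to reproduce this, a rough setup sketch follows. The repo URL and the `hf_ptq.py` script come from the link above; the install step and checkpoint paths are assumptions on my part, so check the `llm_ptq` example README for the exact requirements:

```shell
# Clone the TensorRT-Model-Optimizer repo, which contains the llm_ptq example
git clone https://github.com/NVIDIA/TensorRT-Model-Optimizer.git
cd TensorRT-Model-Optimizer

# Install ModelOpt from source so the latest main branch is used
# (assumption: a release install via `pip install nvidia-modelopt` may lag behind main)
pip install -e .

# Run post-training quantization to NVFP4
# (paths below are placeholders; point them at your local checkpoint and output dir)
cd examples/llm_ptq
python hf_ptq.py \
    --pyt_ckpt_path /path/to/gpt-oss-120b \
    --qformat nvfp4 \
    --export_path /path/to/gpt-oss-120b-nvfp4 \
    --trust_remote_code
```

Note that quantizing a 120B-parameter model this way requires substantial GPU memory, so this is not something you can run on a typical workstation.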

Would you please share some performance data?
Can you still convert the model to GGUF?
Would the size bloat back to 240 GB if you convert it to GGUF?