Quantization command
by utarn - opened
Would you mind sharing how you quantized this model to NVFP4, or guiding me through the process?
This was quantized using NVIDIA ModelOpt:

```shell
python hf_ptq.py --pyt_ckpt_path <gpt-oss-120b> --qformat nvfp4 --export_path <gpt-oss-120b-nvfp4> --trust_remote_code
```

with the latest ModelOpt main branch: https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/examples/llm_ptq
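For intuition about what `--qformat nvfp4` does: NVFP4 stores weights as 4-bit floats (E2M1) with a shared scale per small block of weights. Here is a toy NumPy sketch of that fake-quantization idea — not ModelOpt's actual implementation; the block size (16) and simple max-based scaling are my illustrative assumptions:

```python
import numpy as np

# Representable magnitudes of the FP4 E2M1 format (1 sign, 2 exponent, 1 mantissa bit).
E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_nvfp4_block(x, block=16):
    """Fake-quantize x to E2M1 values with one scale per block (toy sketch)."""
    x = np.asarray(x, dtype=np.float64)
    out = np.empty_like(x)
    for i in range(0, len(x), block):
        chunk = x[i:i + block]
        m = np.abs(chunk).max()
        scale = m / 6.0 if m > 0 else 1.0  # map the block's max magnitude to E2M1's max (6)
        scaled = chunk / scale
        # Snap each scaled value to the nearest representable E2M1 magnitude, keep the sign.
        idx = np.abs(np.abs(scaled)[:, None] - E2M1[None, :]).argmin(axis=1)
        out[i:i + block] = np.sign(scaled) * E2M1[idx] * scale
    return out
```

With 4-bit values plus one shared scale per 16 weights, the storage cost works out to roughly 4.5 bits per weight, which is where most of the size reduction versus BF16 comes from.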
Could you also share some performance data?
Can you still convert the model to GGUF?
Would the size bloat to 240GB again if you convert to GGUF?
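For what it's worth, 240 GB is roughly what ~120B parameters cost at 16 bits per weight, so the size would only balloon back to that if the GGUF conversion path first dequantizes everything to BF16/FP16. Back-of-the-envelope arithmetic (my assumptions, not measurements; ignores non-quantized layers):

```python
# Rough file-size math for a ~120B-parameter model.
# Assumption: weights dominate the file; NVFP4 costs ~4.5 bits/weight
# (4-bit values plus per-block scale overhead).
params = 120e9

bf16_gb = params * 16 / 8 / 1e9    # 16-bit weights -> 240 GB
nvfp4_gb = params * 4.5 / 8 / 1e9  # ~4.5 bits/weight -> ~67.5 GB

print(f"BF16: {bf16_gb:.0f} GB, NVFP4: {nvfp4_gb:.1f} GB")
```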