Nice model, any info on scripts used to quantize?
#1
by RonanMcGovern - opened
and also commands for running with vLLM? Thanks
Just pass the stub to vLLM and it will run.
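For anyone who wants something to copy, here is a rough sketch of what "pass the stub" can look like. It is an illustration rather than an official recipe: `MODEL_STUB` is a placeholder to replace with this model's Hugging Face ID, and the helper names `serve_command` and `generate_offline` are made up for this example. It assumes a recent vLLM install that provides the `vllm serve` CLI and the `LLM` / `SamplingParams` Python API.

```python
# Placeholder (assumption): replace with the Hugging Face stub of this model.
MODEL_STUB = "<org>/<model-name>"


def serve_command(stub: str, port: int = 8000) -> list[str]:
    """Build the argv for vLLM's OpenAI-compatible server."""
    return ["vllm", "serve", stub, "--port", str(port)]


def generate_offline(stub: str, prompt: str, max_tokens: int = 64) -> str:
    """Run one prompt through vLLM's offline Python API and return the text."""
    # Imported lazily so the helper above works without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(model=stub)  # vLLM reads the quantization config from the checkpoint
    params = SamplingParams(temperature=0.0, max_tokens=max_tokens)
    outputs = llm.generate([prompt], params)
    return outputs[0].outputs[0].text


if __name__ == "__main__":
    # Print the shell command rather than launching the server here.
    print(" ".join(serve_command(MODEL_STUB)))
```

Running the printed command starts the server; `generate_offline(MODEL_STUB, "Hello")` is the in-process alternative. A large model may also need extra settings such as tensor parallelism, which are left out here.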
For the scripts, we have a bunch of FP8 examples in the vllm-project/llm-compressor repo. Just swap in the Llama 3.3 HF stub and you're good to go.
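As a rough idea of what those examples look like, here is a minimal sketch modeled on the dynamic FP8 example in llm-compressor. Several things are assumptions, not confirmed in this thread: the `FP8_DYNAMIC` scheme (the exact scheme used for this model is not stated), the `meta-llama/Llama-3.3-70B-Instruct` base stub, and the hypothetical helper names `save_dir_for` and `quantize_fp8`. The import path for `oneshot` has moved between llm-compressor versions, so the sketch tries both.

```python
# Assumption: the base model is the public Llama 3.3 Instruct checkpoint.
BASE_STUB = "meta-llama/Llama-3.3-70B-Instruct"


def save_dir_for(stub: str, suffix: str = "FP8-dynamic") -> str:
    """Derive a local output directory name from a Hugging Face stub."""
    return stub.rstrip("/").split("/")[-1] + "-" + suffix


def quantize_fp8(stub: str) -> str:
    """Quantize Linear layers to FP8 (dynamic activations) and save the result."""
    # Heavy imports are kept inside the function so the helper above
    # can be used without these libraries installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from llmcompressor.modifiers.quantization import QuantizationModifier

    try:
        from llmcompressor import oneshot  # newer releases
    except ImportError:
        from llmcompressor.transformers import oneshot  # older releases

    model = AutoModelForCausalLM.from_pretrained(
        stub, device_map="auto", torch_dtype="auto"
    )
    tokenizer = AutoTokenizer.from_pretrained(stub)

    # Assumed scheme: FP8 weights with dynamic per-token activation scales,
    # which needs no calibration data. lm_head is left unquantized.
    recipe = QuantizationModifier(
        targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"]
    )
    oneshot(model=model, recipe=recipe)

    out_dir = save_dir_for(stub)
    model.save_pretrained(out_dir)
    tokenizer.save_pretrained(out_dir)
    return out_dir


if __name__ == "__main__":
    print("Would save to:", save_dir_for(BASE_STUB))
```

Calling `quantize_fp8(BASE_STUB)` runs the actual quantization and needs enough memory to load the full-precision model. The saved directory can then be passed to vLLM in place of a Hub stub.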
mgoin changed discussion status to closed