Vision in calibration
Does the lack of image input in calibration dataset make the vision capability worsen?
Vision does not need calibration as the vision layers are not quantized.
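For context, quantization recipes for vision-language models typically exclude the vision tower explicitly, which is why no image calibration data is needed. A hypothetical llm-compressor-style recipe fragment illustrating this (the module-name pattern `re:visual.*` and the exact recipe layout are assumptions, not copied from this model's actual recipe):

```yaml
# Hypothetical recipe sketch: quantize only the language-model Linear layers,
# leaving the vision tower (and lm_head) in full precision.
quant_stage:
  quant_modifiers:
    QuantizationModifier:
      targets: ["Linear"]
      scheme: "NVFP4"
      ignore: ["lm_head", "re:visual.*"]  # vision layers skipped, so calibration never touches them
```

Since the ignored layers stay in full precision, the calibration set only needs to exercise the quantized language-model layers.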
FYI I have updated the model to include MTP just now.
When I tested the NVFP4A4 version with image inputs, the model tended to loop in its chain-of-thought, repeating phrases like "I should figure out how to improve my answer." Disabling thinking mode or setting presence_penalty=1.5 seems to help; in my testing it reduced the failure rate from nearly 100% to around 25%.
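For reference, both knobs map onto standard request fields of an OpenAI-compatible endpoint (e.g. one served by vLLM). This is a minimal sketch, not the uploader's exact setup: the model ID for the A4 variant, the example image URL, and the `chat_template_kwargs`/`enable_thinking` mechanism (common for Qwen3-style chat templates) are assumptions.

```python
# Sketch of a chat-completions request body with the suggested settings.
# Assumptions: an OpenAI-compatible server (e.g. vLLM) and a Qwen3-style
# chat template that honors enable_thinking via chat_template_kwargs.
payload = {
    "model": "YCWTG/Qwen3.5-35B-A3B-Instruct-NVFP4A4",  # hypothetical A4 repo ID
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
            ],
        }
    ],
    "presence_penalty": 1.5,  # penalize tokens already present, curbing repetition loops
    "chat_template_kwargs": {"enable_thinking": False},  # turn off thinking mode
}
```

You would then POST this to `/v1/chat/completions` with `requests` or the `openai` client; either switch alone may be enough, so it is worth trying them separately.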
Alternatively, try the NVFP4A16 version instead. It doesn't have this issue, and on simple prompts it rambles far less than NVFP4A4. Even on heavier coding benchmarks like LiveCodeBench v6, NVFP4A16 is about 20% faster than NVFP4A4.
https://huggingface.co/YCWTG/Qwen3.5-35B-A3B-Instruct-NVFP4A16