Vision in calibration
Does the lack of image input in calibration dataset make the vision capability worsen?
Vision does not need calibration as the vision layers are not quantized.
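For context, quantization recipes for vision-language models typically exclude the vision tower explicitly, which is why no image calibration data is needed. A hypothetical llm-compressor-style recipe fragment illustrating this (the module-name pattern `re:visual.*` and the exact recipe layout are assumptions, not copied from this model's actual recipe):

```yaml
# Hypothetical recipe sketch: quantize only the language-model Linear layers,
# leaving the vision tower (and lm_head) in full precision.
quant_stage:
  quant_modifiers:
    QuantizationModifier:
      targets: ["Linear"]
      scheme: "NVFP4"
      ignore: ["lm_head", "re:visual.*"]  # vision layers skipped, so calibration never touches them
```

Since the ignored layers stay in full precision, the calibration set only needs to exercise the quantized language-model layers.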
FYI I have updated the model to include MTP just now.
When I tested the NVFP4A4 version with image inputs, the model tended to loop in its chain-of-thought, repeating phrases like "I should figure out how to improve my answer." Disabling thinking mode or setting presence_penalty=1.5 seems to help; in my testing it reduced the failure rate from nearly 100% to around 25%.
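For reference, both knobs map onto standard request fields of an OpenAI-compatible endpoint (e.g. one served by vLLM). This is a minimal sketch, not the uploader's exact setup: the model ID for the A4 variant, the example image URL, and the `chat_template_kwargs`/`enable_thinking` mechanism (common for Qwen3-style chat templates) are assumptions.

```python
# Sketch of a chat-completions request body with the suggested settings.
# Assumptions: an OpenAI-compatible server (e.g. vLLM) and a Qwen3-style
# chat template that honors enable_thinking via chat_template_kwargs.
payload = {
    "model": "YCWTG/Qwen3.5-35B-A3B-Instruct-NVFP4A4",  # hypothetical A4 repo ID
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
            ],
        }
    ],
    "presence_penalty": 1.5,  # penalize tokens already present, curbing repetition loops
    "chat_template_kwargs": {"enable_thinking": False},  # turn off thinking mode
}
```

You would then POST this to `/v1/chat/completions` with `requests` or the `openai` client; either switch alone may be enough, so it is worth trying them separately.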
Alternatively, try the NVFP4A16 version instead. It doesn't have this issue, and on simple prompts it rambles far less than NVFP4A4. Even on heavier coding benchmarks like LiveCodeBench v6, NVFP4A16 is about 20% faster than NVFP4A4.
https://huggingface.co/YCWTG/Qwen3.5-35B-A3B-Instruct-NVFP4A16