What should the tool-call-parser be set to when running the model based on vllm?
#6
by wolfzr
I tested it and found that the model's tool calls do not match the common tool call parsing formats.
Hi, you can try setting --tool-call-parser qwen3_xml. That parser should handle the tool call format this model emits.
Example:
vllm serve /path/to/your/model \
--port 8080 \
--tensor-parallel-size 1 \
--data-parallel-size 8 \
--served-model-name InCoder-32B \
--disable-log-requests \
--max-model-len 131072 \
--gpu-memory-utilization 0.9 \
--trust-remote-code \
--enable-auto-tool-choice \
--tool-call-parser qwen3_xml
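Once the server is up with --enable-auto-tool-choice, tools are passed in the standard OpenAI Chat Completions format. A minimal sketch of the request body is below; the model name and port match the serve command above, and get_weather is a hypothetical example tool, not something the model ships with.

```python
import json

# Hypothetical tool definition in the OpenAI-compatible schema that
# vLLM's /v1/chat/completions endpoint accepts when tool calling is enabled.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical example tool
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

# Request body for the server started above: "model" matches
# --served-model-name, and the server listens on --port 8080.
payload = {
    "model": "InCoder-32B",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": tools,
    "tool_choice": "auto",
}

print(json.dumps(payload, indent=2))
```

You can POST this payload to http://localhost:8080/v1/chat/completions (or pass the same arguments to an OpenAI-compatible client pointed at that base URL); with the qwen3_xml parser enabled, the model's tool calls come back parsed in the response's tool_calls field rather than as raw text.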