ValueError: The output_size of gate's and up's weight = 192 is not divisible by weight quantization block_n = 128

#3
by boydcheung - opened

Hello, could you help with this error? It occurs when loading the model directly with vLLM:

(Worker_TP3 pid=2747862) INFO 12-17 20:09:15 [fp8.py:166] Using Marlin backend for FP8 MoE
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] WorkerProc failed to start.
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] Traceback (most recent call last):
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]   File "/mnt/vlm/common/env/vllm/lib/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 722, in worker_main
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]     worker = WorkerProc(*args, **kwargs)
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]   File "/mnt/vlm/common/env/vllm/lib/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 562, in __init__
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]     self.worker.load_model()
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]   File "/mnt/vlm/common/env/vllm/lib/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 273, in load_model
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]     self.model_runner.load_model(eep_scale_up=eep_scale_up)
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]   File "/mnt/vlm/common/env/vllm/lib/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 3484, in load_model
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]     self.model = model_loader.load_model(
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]                  ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]   File "/mnt/vlm/common/env/vllm/lib/python3.11/site-packages/vllm/model_executor/model_loader/base_loader.py", line 49, in load_model
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]     model = initialize_model(
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]             ^^^^^^^^^^^^^^^^^
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]   File "/mnt/vlm/common/env/vllm/lib/python3.11/site-packages/vllm/model_executor/model_loader/utils.py", line 48, in initialize_model
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]     return model_class(vllm_config=vllm_config, prefix=prefix)
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]   File "/mnt/vlm/common/env/vllm/lib/python3.11/site-packages/vllm/model_executor/models/qwen3_vl_moe.py", line 436, in __init__
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]     self.language_model = Qwen3MoeLLMForCausalLM(
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]                           ^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]   File "/mnt/vlm/common/env/vllm/lib/python3.11/site-packages/vllm/model_executor/models/qwen3_vl_moe.py", line 337, in __init__
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]     self.model = Qwen3MoeLLMModel(
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]                  ^^^^^^^^^^^^^^^^^
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]   File "/mnt/vlm/common/env/vllm/lib/python3.11/site-packages/vllm/compilation/decorators.py", line 291, in __init__
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]     old_init(self, **kwargs)
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]   File "/mnt/vlm/common/env/vllm/lib/python3.11/site-packages/vllm/model_executor/models/qwen3_vl_moe.py", line 86, in __init__
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]     super().__init__(vllm_config=vllm_config, prefix=prefix)
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]   File "/mnt/vlm/common/env/vllm/lib/python3.11/site-packages/vllm/compilation/decorators.py", line 291, in __init__
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]     old_init(self, **kwargs)
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]   File "/mnt/vlm/common/env/vllm/lib/python3.11/site-packages/vllm/model_executor/models/qwen3_moe.py", line 412, in __init__
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]     self.start_layer, self.end_layer, self.layers = make_layers(
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]                                                     ^^^^^^^^^^^^
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]   File "/mnt/vlm/common/env/vllm/lib/python3.11/site-packages/vllm/model_executor/models/utils.py", line 605, in make_layers
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]     + [
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]       ^
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]   File "/mnt/vlm/common/env/vllm/lib/python3.11/site-packages/vllm/model_executor/models/utils.py", line 606, in <listcomp>
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]     maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}"))
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]   File "/mnt/vlm/common/env/vllm/lib/python3.11/site-packages/vllm/model_executor/models/qwen3_moe.py", line 414, in <lambda>
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]     lambda prefix: Qwen3MoeDecoderLayer(vllm_config=vllm_config, prefix=prefix),
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]   File "/mnt/vlm/common/env/vllm/lib/python3.11/site-packages/vllm/model_executor/models/qwen3_moe.py", line 353, in __init__
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]     self.mlp = Qwen3MoeSparseMoeBlock(
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]                ^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]   File "/mnt/vlm/common/env/vllm/lib/python3.11/site-packages/vllm/model_executor/models/qwen3_moe.py", line 163, in __init__
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]     self.experts = FusedMoE(
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]                    ^^^^^^^^^
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]   File "/mnt/vlm/common/env/vllm/lib/python3.11/site-packages/vllm/model_executor/layers/fused_moe/layer.py", line 652, in __init__
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]     self.quant_method.create_weights(layer=self, **moe_quant_params)
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]   File "/mnt/vlm/common/env/vllm/lib/python3.11/site-packages/vllm/model_executor/layers/quantization/fp8.py", line 724, in create_weights
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750]     raise ValueError(
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] ValueError: The output_size of gate's and up's weight = 192 is not divisible by weight quantization block_n = 128.

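Decrease your tensor_parallel_size. If you're using TP=16, try TP=8 or TP=4. This worked for me. The 192 in the message is each GPU's slice of the MoE intermediate size after it is split across TP ranks, and that slice has to be a multiple of the FP8 weight quantization block (block_n = 128), so pick a TP size for which it is.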
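In case it helps, here is a rough sketch for checking which TP sizes would pass the divisibility test from the error message. The helper name `valid_tp_sizes` is made up for illustration, and the value 1536 is only an assumed example of `moe_intermediate_size`; read the real value from your model's config.json. It only mirrors the one check in the traceback, so other constraints (for example, attention heads also needing to divide evenly across ranks) may still rule out some of the sizes it returns.

```python
def valid_tp_sizes(moe_intermediate_size: int, block_n: int = 128, max_tp: int = 16) -> list[int]:
    """Return TP sizes whose per-rank MoE slice is a whole multiple of block_n.

    Hypothetical helper: it reproduces only the divisibility rule quoted in
    the ValueError, not every requirement vLLM places on tensor parallelism.
    """
    sizes = []
    for tp in range(1, max_tp + 1):
        # The intermediate dimension must split evenly across ranks.
        if moe_intermediate_size % tp != 0:
            continue
        per_rank = moe_intermediate_size // tp
        # Each rank's slice must line up with the quantization block.
        if per_rank % block_n == 0:
            sizes.append(tp)
    return sizes


if __name__ == "__main__":
    # 1536 is an assumed example value, not taken from this thread.
    print(valid_tp_sizes(1536))  # prints [1, 2, 3, 4, 6, 12]
```

With that assumed value, TP=8 gives a slice of 192, which is not a multiple of 128, while TP=4 gives 384, which is.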
