ValueError: The output_size of gate's and up's weight = 192 is not divisible by weight quantization block_n = 128
#3
by boydcheung - opened
Hello, could you help with this error? It occurs when loading the model directly with vLLM:
(Worker_TP3 pid=2747862) INFO 12-17 20:09:15 [fp8.py:166] Using Marlin backend for FP8 MoE
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] WorkerProc failed to start.
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] Traceback (most recent call last):
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] File "/mnt/vlm/common/env/vllm/lib/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 722, in worker_main
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] worker = WorkerProc(*args, **kwargs)
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] File "/mnt/vlm/common/env/vllm/lib/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 562, in __init__
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] self.worker.load_model()
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] File "/mnt/vlm/common/env/vllm/lib/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 273, in load_model
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] self.model_runner.load_model(eep_scale_up=eep_scale_up)
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] File "/mnt/vlm/common/env/vllm/lib/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 3484, in load_model
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] self.model = model_loader.load_model(
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] File "/mnt/vlm/common/env/vllm/lib/python3.11/site-packages/vllm/model_executor/model_loader/base_loader.py", line 49, in load_model
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] model = initialize_model(
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] ^^^^^^^^^^^^^^^^^
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] File "/mnt/vlm/common/env/vllm/lib/python3.11/site-packages/vllm/model_executor/model_loader/utils.py", line 48, in initialize_model
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] return model_class(vllm_config=vllm_config, prefix=prefix)
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] File "/mnt/vlm/common/env/vllm/lib/python3.11/site-packages/vllm/model_executor/models/qwen3_vl_moe.py", line 436, in __init__
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] self.language_model = Qwen3MoeLLMForCausalLM(
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] ^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] File "/mnt/vlm/common/env/vllm/lib/python3.11/site-packages/vllm/model_executor/models/qwen3_vl_moe.py", line 337, in __init__
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] self.model = Qwen3MoeLLMModel(
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] ^^^^^^^^^^^^^^^^^
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] File "/mnt/vlm/common/env/vllm/lib/python3.11/site-packages/vllm/compilation/decorators.py", line 291, in __init__
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] old_init(self, **kwargs)
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] File "/mnt/vlm/common/env/vllm/lib/python3.11/site-packages/vllm/model_executor/models/qwen3_vl_moe.py", line 86, in __init__
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] super().__init__(vllm_config=vllm_config, prefix=prefix)
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] File "/mnt/vlm/common/env/vllm/lib/python3.11/site-packages/vllm/compilation/decorators.py", line 291, in __init__
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] old_init(self, **kwargs)
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] File "/mnt/vlm/common/env/vllm/lib/python3.11/site-packages/vllm/model_executor/models/qwen3_moe.py", line 412, in __init__
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] self.start_layer, self.end_layer, self.layers = make_layers(
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] ^^^^^^^^^^^^
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] File "/mnt/vlm/common/env/vllm/lib/python3.11/site-packages/vllm/model_executor/models/utils.py", line 605, in make_layers
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] + [
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] ^
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] File "/mnt/vlm/common/env/vllm/lib/python3.11/site-packages/vllm/model_executor/models/utils.py", line 606, in <listcomp>
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}"))
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] File "/mnt/vlm/common/env/vllm/lib/python3.11/site-packages/vllm/model_executor/models/qwen3_moe.py", line 414, in <lambda>
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] lambda prefix: Qwen3MoeDecoderLayer(vllm_config=vllm_config, prefix=prefix),
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] File "/mnt/vlm/common/env/vllm/lib/python3.11/site-packages/vllm/model_executor/models/qwen3_moe.py", line 353, in __init__
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] self.mlp = Qwen3MoeSparseMoeBlock(
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] ^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] File "/mnt/vlm/common/env/vllm/lib/python3.11/site-packages/vllm/model_executor/models/qwen3_moe.py", line 163, in __init__
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] self.experts = FusedMoE(
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] ^^^^^^^^^
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] File "/mnt/vlm/common/env/vllm/lib/python3.11/site-packages/vllm/model_executor/layers/fused_moe/layer.py", line 652, in __init__
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] self.quant_method.create_weights(layer=self, **moe_quant_params)
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] File "/mnt/vlm/common/env/vllm/lib/python3.11/site-packages/vllm/model_executor/layers/quantization/fp8.py", line 724, in create_weights
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] raise ValueError(
(Worker_TP6 pid=2747865) ERROR 12-17 20:09:15 [multiproc_executor.py:750] ValueError: The output_size of gate's and up's weight = 192 is not divisible by weight quantization block_n = 128.
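Decrease your tensor_parallel_size. Going by the error message, the size of the gate/up weight that each worker ends up with (192 here) has to be divisible by the quantization block size block_n = 128, and a smaller TP gives each worker a larger slice. If you're using TP=16, try TP=8 or TP=4. This worked for me.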
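To check which TP values pass the divisibility test before launching, here is a minimal sketch of the arithmetic. It does not call vLLM; `shard_size` and `valid_tp_sizes` are hypothetical helper names, and the value 1536 for the MoE intermediate size is an assumption (it is not stated in this thread, only inferred as 192 x 8 from the `Worker_TP6` lines, which suggest at least 7 ranks). Read the real value from your model's config.json.

```python
# Sketch of the divisibility check implied by the error message.
# ASSUMPTION: moe_intermediate_size = 1536 is inferred, not taken from the thread.
# shard_size / valid_tp_sizes are illustrative helpers, not vLLM APIs.

def shard_size(moe_intermediate_size: int, tp_size: int) -> int:
    """Gate/up output size held by one tensor-parallel rank."""
    return moe_intermediate_size // tp_size


def valid_tp_sizes(moe_intermediate_size: int,
                   block_n: int = 128,
                   candidates=(1, 2, 4, 8, 16)) -> list[int]:
    """Return the candidate TP sizes whose per-rank shard is a multiple of block_n."""
    return [
        tp for tp in candidates
        if moe_intermediate_size % tp == 0
        and shard_size(moe_intermediate_size, tp) % block_n == 0
    ]


if __name__ == "__main__":
    size = 1536  # assumed; check config.json for the actual value
    for tp in (1, 2, 4, 8, 16):
        per_rank = shard_size(size, tp)
        ok = per_rank % 128 == 0
        print(f"TP={tp:2d} -> per-rank size {per_rank:4d}, divisible by 128: {ok}")
    print("usable TP sizes:", valid_tp_sizes(size))
```

Under that assumed size, TP=8 gives exactly the 192 from the error, so TP=4 or lower would be needed; with a different intermediate size the usable set changes.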