Unable to run on SGLang?

#1
by nbaughman - opened

Thanks for creating these quants, it's very much appreciated.

Unfortunately, I've had difficulty running them on SGLang. This applies to both the 8-bit and the 4-bit versions. I've had no issues with vLLM, though; Qwen3-30B-A3B-Instruct-2507-AWQ-4bit works just fine. Any ideas?

services:
    sglang:
        image: lmsysorg/sglang:latest
        ipc: "host"
        shm_size: 16GB
        ports:
            - 8001:8001

        volumes:
            - ./models:/models

        command: >
            --model-path /models/Qwen3-Next-80B-A3B-Instruct-AWQ-8bit
            --host 0.0.0.0
            --port 8001
            --log-level info
            --enable-metrics
            --dtype float16

        entrypoint: ["python3", "-m", "sglang.launch_server"]

        healthcheck:
            test: ["CMD", "curl", "-f", "http://0.0.0.0:8001/v1/models"]
            interval: 30s
            timeout: 5s
            retries: 20

        deploy:
            resources:
                reservations:
                    devices:
                        - driver: nvidia
                          count: all
                          capabilities: [gpu]
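For comparison, an equivalent vLLM service (which, per this thread, loads the model without issue) might look like the sketch below. The image tag and flags are assumptions based on vLLM's standard OpenAI-compatible container, not taken from the thread:

```yaml
services:
    vllm:
        image: vllm/vllm-openai:latest
        ipc: "host"
        ports:
            - 8001:8001
        volumes:
            - ./models:/models
        # The default entrypoint serves an OpenAI-compatible API;
        # only the model path and server flags need to be passed.
        command: >
            --model /models/Qwen3-Next-80B-A3B-Instruct-AWQ-8bit
            --host 0.0.0.0
            --port 8001
            --dtype float16
        deploy:
            resources:
                reservations:
                    devices:
                        - driver: nvidia
                          count: all
                          capabilities: [gpu]
```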
Loading safetensors checkpoint shards:   0% Completed | 0/18 [00:00<?, ?it/s]
sglang-1  | [2025-11-05 00:14:14] Scheduler hit an exception: Traceback (most recent call last):
sglang-1  |   File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2786, in run_scheduler_process
sglang-1  |     scheduler = Scheduler(
sglang-1  |                 ^^^^^^^^^^
sglang-1  |   File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 319, in __init__
sglang-1  |     self.tp_worker = TpModelWorker(
sglang-1  |                      ^^^^^^^^^^^^^^
sglang-1  |   File "/sgl-workspace/sglang/python/sglang/srt/managers/tp_worker.py", line 235, in __init__
sglang-1  |     self._model_runner = ModelRunner(
sglang-1  |                          ^^^^^^^^^^^^
sglang-1  |   File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 319, in __init__
sglang-1  |     self.initialize(min_per_gpu_memory)
sglang-1  |   File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 395, in initialize
sglang-1  |     self.load_model()
sglang-1  |   File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 749, in load_model
sglang-1  |     self.model = get_model(
sglang-1  |                  ^^^^^^^^^^
sglang-1  |   File "/sgl-workspace/sglang/python/sglang/srt/model_loader/__init__.py", line 28, in get_model
sglang-1  |     return loader.load_model(
sglang-1  |            ^^^^^^^^^^^^^^^^^^
sglang-1  |   File "/sgl-workspace/sglang/python/sglang/srt/model_loader/loader.py", line 595, in load_model
sglang-1  |     self.load_weights_and_postprocess(
sglang-1  |   File "/sgl-workspace/sglang/python/sglang/srt/model_loader/loader.py", line 603, in load_weights_and_postprocess
sglang-1  |     model.load_weights(weights)
sglang-1  |   File "/sgl-workspace/sglang/python/sglang/srt/models/qwen3_next.py", line 1054, in load_weights
sglang-1  |     param = params_dict[name]
sglang-1  |             ~~~~~~~~~~~^^^^^^
sglang-1  | KeyError: 'model.layers.5.mlp.shared_expert.gate_gate_up_proj.weight'
sglang-1  | 
sglang-1  | [2025-11-05 00:14:14] Received sigquit from a child process. It usually means the child failed.
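The doubled `gate_` in the missing key hints at a naming mismatch between the checkpoint (or its config.json) and SGLang's Qwen3-Next weight mapping: `load_weights` looks up every checkpoint tensor name in the model's parameter dict, and any name it cannot map raises a `KeyError`. A minimal sketch of that failure mode, with illustrative names rather than the real state dict:

```python
# Illustrative only: a strict weight loader indexes the parameter dict
# by checkpoint tensor name, so an unmapped name fails with KeyError.
params_dict = {
    "model.layers.5.mlp.shared_expert.gate_up_proj.weight": "param tensor",
}

# The checkpoint produced a doubled "gate_" prefix that the loader
# does not know about.
checkpoint_name = "model.layers.5.mlp.shared_expert.gate_gate_up_proj.weight"

try:
    param = params_dict[checkpoint_name]
except KeyError as exc:
    print(f"KeyError: {exc}")
```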
Loading safetensors checkpoint shards:   0% Completed | 0/18 [00:01<?, ?it/s]

The same for me :(

cyankiwi org

Thanks for letting me know. It seems that SGLang and vLLM initialize the model differently, which results in this error. I will look into this and give you an update on any fixes.

Personally, I always use vLLM to run these AWQ models, since llmcompressor is part of vllm-project.

cyankiwi org

Please re-download the config.json file; it should load with SGLang now :)
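For anyone else hitting this: one way to refresh only the config without re-pulling the whole checkpoint is to fetch it directly from the Hub. The local path mirrors the compose file's volume mount, and the repo id is an assumption inferred from this thread; adjust both to match your setup:

```shell
# Re-download just config.json into the local model directory.
# Repo id below is assumed from the thread; replace with the actual one.
curl -L -o ./models/Qwen3-Next-80B-A3B-Instruct-AWQ-8bit/config.json \
  "https://huggingface.co/cyankiwi/Qwen3-Next-80B-A3B-Instruct-AWQ-8bit/resolve/main/config.json"
```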

Hey Cpatonn, thanks mate, that works perfectly! GAFAM => 😁
