QuantTrio/GLM-5-AWQ


vLLM deployment fails

#3

by Yuxin362 - opened 12 days ago

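Has anyone managed to deploy this with the vLLM Docker image? I am using the latest nightly vLLM Docker image in a CUDA 13.0 environment, and it still reports that glm_moe_dsa cannot be found and that Transformers needs to be upgraded. The exact error is the same as in this issue: https://github.com/vllm-project/recipes/issues/246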

Same here. None of sglang:dev, sglang:glm5, vllm:v0.17.0, or vllm:glm5 works.

Does the Docker image built from vllm:glm5 not work either? I was just about to try building the Docker image myself.

I deploy on K8S and could not get it to deploy successfully. kimi2.5 does work, though, so that is what I am using for now.

QuantTrio org 10 days ago

Has anyone observed this issue with the non-quantized version as well?

QuantTrio org 10 days ago

Updating transformers via pip install -U transformers should fix the issue.
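To confirm whether the upgrade took effect inside the container, a quick check along these lines may help. This is only a sketch: it assumes the error comes from the installed transformers not registering the glm_moe_dsa model type, and the helper name supports_model_type is hypothetical, not part of vLLM or transformers.

```python
import importlib.util
from importlib import metadata


def supports_model_type(model_type: str) -> bool:
    """Return True if the installed transformers registers `model_type`.

    Hypothetical helper for illustration; it is not part of any library.
    """
    if importlib.util.find_spec("transformers") is None:
        return False  # transformers is not installed in this environment
    try:
        # CONFIG_MAPPING maps model_type strings to their config classes.
        from transformers.models.auto.configuration_auto import CONFIG_MAPPING
    except ImportError:
        return False  # installation is present but could not be imported
    return model_type in CONFIG_MAPPING


if __name__ == "__main__":
    try:
        version = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        version = "not installed"
    print(f"transformers version: {version}")
    # Assumption: "glm_moe_dsa" is the model type named in the error above.
    print(f"glm_moe_dsa supported: {supports_model_type('glm_moe_dsa')}")
```

If the second line prints False after running pip install -U transformers, the released version available to pip may not include the model type yet, or the upgrade may have landed in a different Python environment than the one vLLM uses.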

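After updating the Docker image with pip install -U, GLM-5 started successfully, but on 8×H100 it averages only 5 tokens/s, which is completely unusable. Has anyone else run into this?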
