vLLM V1 support
#1 opened by vcerny
Hello, is this supposed to work with vLLM v0.11.2? I am getting this error:
```
[port-8003] (EngineCore_DP0 pid=573) ERROR 12-01 05:27:02 [core.py:842] torch.AcceleratorError: CUDA error: the provided PTX was compiled with an unsupported toolchain.
[port-8003] (EngineCore_DP0 pid=573) ERROR 12-01 05:27:02 [core.py:842] Search for `cudaErrorUnsupportedPtxVersion' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
[port-8003] (EngineCore_DP0 pid=573) ERROR 12-01 05:27:02 [core.py:842] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[port-8003] (EngineCore_DP0 pid=573) ERROR 12-01 05:27:02 [core.py:842] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[port-8003] (EngineCore_DP0 pid=573) ERROR 12-01 05:27:02 [core.py:842] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[port-8003] (EngineCore_DP0 pid=573) ERROR 12-01 05:27:02 [core.py:842]
```
@vcerny See https://docs.vllm.ai/en/stable/usage/troubleshooting/?h=provided+ptx#cuda-error-the-provided-ptx-was-compiled-with-an-unsupported-toolchain. The issue is that your NVIDIA driver is older than the CUDA toolchain used to build the vLLM wheels, so the driver cannot load the PTX embedded in the kernels. Updating the driver (or installing a vLLM build targeting a CUDA version your driver supports) resolves it.
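If you want to confirm the mismatch before upgrading, here is a minimal sketch (not a vLLM utility; it assumes PyTorch is installed in the same environment and nvidia-smi is on PATH). It compares the CUDA toolkit your PyTorch wheel was built with against the highest CUDA version your driver supports:

```python
# Quick diagnostic sketch: if the toolkit version PyTorch was built with is
# newer than the driver's supported ceiling, kernel launches fail with
# cudaErrorUnsupportedPtxVersion, as in the log above.
import subprocess

import torch

# CUDA toolkit version the installed PyTorch wheel was compiled against.
print("Built with CUDA:", torch.version.cuda)

# nvidia-smi's header contains "CUDA Version: X.Y", the maximum CUDA
# version the currently installed driver can run.
out = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
for line in out.splitlines():
    if "CUDA Version" in line:
        ceiling = line.split("CUDA Version:")[1].split("|")[0].strip()
        print("Driver supports up to CUDA:", ceiling)
        break
```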
Indeed. Thanks!