vLLM V1 support

#1
by vcerny - opened

Hello, is this supposed to work with vLLM v0.11.2? I am getting this error:

[port-8003] (EngineCore_DP0 pid=573) ERROR 12-01 05:27:02 [core.py:842] torch.AcceleratorError: CUDA error: the provided PTX was compiled with an unsupported toolchain.
[port-8003] (EngineCore_DP0 pid=573) ERROR 12-01 05:27:02 [core.py:842] Search for `cudaErrorUnsupportedPtxVersion' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
[port-8003] (EngineCore_DP0 pid=573) ERROR 12-01 05:27:02 [core.py:842] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[port-8003] (EngineCore_DP0 pid=573) ERROR 12-01 05:27:02 [core.py:842] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[port-8003] (EngineCore_DP0 pid=573) ERROR 12-01 05:27:02 [core.py:842] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[port-8003] (EngineCore_DP0 pid=573) ERROR 12-01 05:27:02 [core.py:842]
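For context: `cudaErrorUnsupportedPtxVersion` generally means the PTX in the compiled kernels targets a newer CUDA toolkit than the installed NVIDIA driver's JIT compiler supports, i.e. the vLLM/PyTorch build and the driver are mismatched. A minimal diagnostic sketch (assuming PyTorch is installed; it only reads version info and degrades gracefully if it is not):

```python
# Compare the CUDA toolkit version the installed PyTorch build (which
# vLLM's precompiled kernels are built against) reports, as a first
# step in diagnosing a PTX/driver mismatch.
try:
    import torch
    # CUDA toolkit version PyTorch was built with, e.g. "12.8";
    # None for CPU-only builds.
    built_cuda = torch.version.cuda
except ImportError:
    # PyTorch not installed in this environment.
    built_cuda = None

print("PyTorch built with CUDA:", built_cuda)
```

If the version printed here is newer than the maximum CUDA version shown in the header of `nvidia-smi`, updating the NVIDIA driver (or installing a vLLM build matching your driver's CUDA version) is the usual fix.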

Indeed. Thanks!
