Update README.md
README.md
CHANGED
@@ -79,6 +79,7 @@ vllm serve Zyphra/ZAYA1-8B --port 8010 \
 --mamba-cache-dtype float32 --dtype bfloat16 \
 --reasoning-parser qwen3 --enable-auto-tool-choice --tool-call-parser zaya_xml
 ```
+For parallel deployment, we recommend using DP with EP, because TP for CCA is not supported in the branch above. When running on 8 GPUs, add the extra flags `-dp 8 -ep` to run with DP=EP=8.
 
 Once the server is up, you can query a model with `curl` like in the following example:
 ```bash
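
As a rough illustration of the added recommendation, the sketch below assembles the serve command from the flags visible in this hunk and appends `-dp 8 -ep`. It is only a sketch: any lines of the original command that fall outside the hunk are not reproduced here, and the `NUM_GPUS` and `RUN` variables are hypothetical helpers introduced for this example, not part of the README.

```shell
#!/bin/sh
# Sketch only: flags are copied from the lines visible in this diff.
# Any part of the README command outside the hunk is NOT included here.

# Hypothetical helper variable: number of GPUs used for DP and EP.
NUM_GPUS="${NUM_GPUS:-8}"

# Build the argument list; -dp/-ep are the extra flags named in the README text.
set -- vllm serve Zyphra/ZAYA1-8B --port 8010 \
  --mamba-cache-dtype float32 --dtype bfloat16 \
  --reasoning-parser qwen3 --enable-auto-tool-choice --tool-call-parser zaya_xml \
  -dp "$NUM_GPUS" -ep

# Hypothetical switch: print the command by default, launch only when RUN=1.
if [ "${RUN:-0}" = "1" ]; then
  exec "$@"
else
  echo "$*"
fi
```

Printing first makes it easy to check the flag list before committing 8 GPUs to a launch; setting `RUN=1` replaces the shell with the server process.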