bad result

#2
by snomile - opened

I have a dgx spark and tried to run the model as followed, but I can not even get a simple result such as "generate a Snake game in single html file".
At the beginning the model's outputs seems good, but after several hundreds words the output get messy gradually. and it always get to repeat itself forever, or even worse, the model generate a lot content doesn't make any sense.

Does anyone tried with this model and get real results? any suggestions please?

docker run -d --name minimax-139b --gpus all --ipc=host
-v /opt/models/MiniMax-M2.5-REAP-139B-A10B-NVFP4-GB10:/models/MiniMax-M2.5-REAP-139B-NVFP4
-p 11402:8000
-e VLLM_NVFP4_GEMM_BACKEND=marlin
-e VLLM_TEST_FORCE_FP8_MARLIN=1
-e VLLM_USE_FLASHINFER_MOE_FP4=0
-e VLLM_MARLIN_USE_ATOMIC_ADD=1
-e MODEL=/models/MiniMax-M2.5-REAP-139B-NVFP4
-e PORT=8000
-e MAX_MODEL_LEN=131072
-e GPU_MEMORY_UTIL=0.93
-e "VLLM_EXTRA_ARGS=--trust-remote-code --kv-cache-dtype fp8 --attention-backend flashinfer --enable-auto-tool-choice --tool-call-parser minimax_m2 --reasoning-parser minimax_m2_append_think"
avarok/dgx-vllm-nvfp4-kernel:v23

snomile changed discussion title from can't finish a simple program to bad result

Sign up or log in to comment