Running MiniMax M2.7 on vLLM: the longer it runs, the dumber the model feels.
I have deployed the MiniMax M2.5 and MiniMax M2.7 models on an H200 using vLLM. From day-to-day use I've noticed a pattern: the more service calls the server handles, the worse the model's accuracy seems to get.
This reminds me of how vLLM 0.17.1 fixed an issue where the Qwen3.5 model's accuracy gradually declined over time. I suspect there might be a similar hidden problem in the inference code for the MiniMax models.
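One way to turn this impression into a measurable signal is to replay the same greedy (temperature-0) prompt against the server at intervals and compare outputs: a stateless server should return identical text for identical greedy requests, so any divergence over uptime hints at accumulating runtime state. A minimal sketch of the comparison side, assuming the responses have already been collected in call order from the vLLM OpenAI-compatible endpoint (the collection step itself is omitted here):

```python
# Sketch: quantify drift across repeated greedy completions of one fixed
# prompt. The strings would come from repeated identical requests to the
# server (temperature=0); this helper only scores the collected outputs.

def drift_rate(responses: list[str]) -> float:
    """Fraction of later responses that differ from the first one.

    With temperature=0 and no hidden server-side state, identical
    prompts should yield identical text, so a nonzero drift rate that
    grows with uptime would support the runtime-state hypothesis.
    """
    if len(responses) < 2:
        return 0.0
    baseline = responses[0]
    later = responses[1:]
    return sum(r != baseline for r in later) / len(later)

# Canned example: the third call diverged from the baseline.
print(drift_rate(["42", "42", "41", "42"]))  # -> 0.3333333333333333
```

This won't distinguish cache corruption from, say, a scheduler bug, but it cheaply confirms whether anything state-dependent is happening at all.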
I'd like to believe the terrible performance comes from the 6 missing safetensor files (everything after 0124 of 130). Luckily, I did not delete M2.5 and have no issue reverting to that version.
That's not the reason. The absence of MTP files does not cause a decrease in accuracy.
This looks like a runtime-state issue. I first noticed it because the model initially called sub-agents frequently to handle certain tasks; but as the service keeps being invoked, it increasingly prefers to solve problems in one go inside the main agent while coding. The frequency of sub-agent calls keeps dropping.
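That declining sub-agent frequency could be tracked directly: log, for each response, whether it invoked a sub-agent (e.g. emitted a tool call), and watch the rate over a sliding window of recent calls. A rough sketch, where the window size and the boolean-per-call logging format are my assumptions for illustration, not anything from the original report:

```python
from collections import deque

def subagent_rate_tracker(window: int = 50):
    """Build a closure that tracks the sub-agent call rate.

    Each call to the returned function records whether the latest
    response invoked a sub-agent and returns the fraction of the last
    `window` responses that did. A steady downward trend over service
    uptime would corroborate the behavior drift described above.
    """
    recent: deque[bool] = deque(maxlen=window)

    def record(used_subagent: bool) -> float:
        recent.append(used_subagent)
        return sum(recent) / len(recent)

    return record

# Toy run with a 3-call window: once the window fills with misses,
# the rate falls to zero.
track = subagent_rate_tracker(window=3)
track(True)
track(False)
track(False)
print(track(False))  # -> 0.0
```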
Hmm... well, whatever the reason, there's no denying in my mind that 2.5 is a much better performer. I haven't deleted the 2.7 model yet, but at this point I think it's safe to say I'll continue using 2.5 for the foreseeable future.