Serving large numbers of models with spiky traffic creates cold starts, poor GPU utilization, and complex scheduling tradeoffs.

052ff1e verified Prashanth velidandi committed on Jan 15