Reduce thinking?
#1
by BingoBird - opened
I managed to ask two questions in one hour. Second question, it kept thinking til out of tokens.
Any tricks to tone down the thinking?
This configuration works for me. You can easily ask gemini-cli to fine-tune the parameters/configuration on your computer to make sure you get the best performance for the model on your hardware, but it works fine for me now.
Start server with MAXIMUM context settings
nohup "$SERVER_PATH"
-m "$MODEL_PATH"
-c 262144
-t 8
-tb 8
-b 512
-ub 512
-ctk q4_0
-ctv q4_0
-ngl 99
-fa auto
-np 1
--temp 0.6
--top-p 0.95
--repeat-penalty 1.0
--no-context-shift
--timeout 1800
--host 0.0.0.0
--port "$PORT" \