Reduce thinking?

by BingoBird - opened Mar 3

Discussion

BingoBird

Mar 3

I managed to ask two questions in one hour. On the second question, it kept thinking until it ran out of tokens.

Any tricks to tone down the thinking?

tobleron900

Owner Mar 31

This configuration works for me. You can ask gemini-cli to fine-tune the parameters on your computer so you get the best performance for the model on your hardware, but these settings work fine for me as they are.

# Start server with MAXIMUM context settings

nohup "$SERVER_PATH" \
  -m "$MODEL_PATH" \
  -c 262144 \
  -t 8 \
  -tb 8 \
  -b 512 \
  -ub 512 \
  -ctk q4_0 \
  -ctv q4_0 \
  -ngl 99 \
  -fa auto \
  -np 1 \
  --temp 0.6 \
  --top-p 0.95 \
  --repeat-penalty 1.0 \
  --no-context-shift \
  --timeout 1800 \
  --host 0.0.0.0 \
  --port "$PORT" \
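For anyone adapting this, here is a rough annotated sketch of the same flags, built up one at a time so each can carry a comment. It assumes the binary is llama.cpp's llama-server (the post does not name it, but the flags match), and the default values for SERVER_PATH, MODEL_PATH and PORT are placeholders I made up, not values from the post. By default it only prints the command. Set RUN=1 to actually start the server.

```shell
#!/bin/sh
# Sketch only: the three defaults below are hypothetical placeholders.
SERVER_PATH="${SERVER_PATH:-llama-server}"
MODEL_PATH="${MODEL_PATH:-model.gguf}"
PORT="${PORT:-8080}"

# Build the argument list in the positional parameters (POSIX sh has no arrays).
set -- -m "$MODEL_PATH"
set -- "$@" -c 262144            # context size in tokens
set -- "$@" -t 8 -tb 8           # CPU threads for generation / batch processing
set -- "$@" -b 512 -ub 512       # logical and physical batch sizes
set -- "$@" -ctk q4_0 -ctv q4_0  # KV cache types for K and V (q4_0 saves memory)
set -- "$@" -ngl 99              # number of layers to offload to the GPU
set -- "$@" -fa auto             # flash attention: let the server decide
set -- "$@" -np 1                # a single parallel slot
set -- "$@" --temp 0.6 --top-p 0.95 --repeat-penalty 1.0   # sampling defaults
set -- "$@" --no-context-shift   # do not shift context when it fills up
set -- "$@" --timeout 1800       # server read/write timeout in seconds
set -- "$@" --host 0.0.0.0 --port "$PORT"

CMD="$SERVER_PATH $*"
if [ "${RUN:-0}" = "1" ]; then
  # Same launch style as the post: detach with nohup and run in the background.
  nohup "$SERVER_PATH" "$@" &
else
  # Dry run: show what would be executed.
  echo "$CMD"
fi
```

If memory is tight, the first things to try lowering are probably -c and -ngl, since the context size and the number of offloaded layers have the biggest effect on memory use.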
