Real-Time Agent Thinking: Verbose Evaluation Guide
Quick Start
Watch agents think in real time as they analyze and debate:
python evaluation/run_evaluation_verbose.py --questions 1
What You'll See
1. Orchestrator Initialization (40 seconds)
INFO:codette_orchestrator | INFO | Loading base model (one-time)...
INFO:codette_orchestrator | INFO | GPU layers: 35 (0=CPU only, 35+=full GPU offload)
INFO:codette_orchestrator | INFO | ✓ GPU acceleration ENABLED (35 layers offloaded)
INFO:codette_orchestrator | INFO | Base model loaded in 8.2s
2. Agent Setup
[AGENT SETUP INSPECTION]
Orchestrator available: True
Available adapters: ['newton', 'davinci', 'empathy', 'philosophy', 'quantum', 'consciousness', 'multi_perspective', 'systems_architecture']
Agent LLM modes:
Newton ✓ LLM (orch=True, adapter=newton)
Quantum ✓ LLM (orch=True, adapter=quantum)
DaVinci ✓ LLM (orch=True, adapter=davinci)
Philosophy ✓ LLM (orch=True, adapter=philosophy)
Empathy ✓ LLM (orch=True, adapter=empathy)
Ethics ✓ LLM (orch=True, adapter=philosophy)
3. Real-Time Agent Thinking (Round 0)
As each agent analyzes the concept:
[Newton] Analyzing 'What is the speed of light in vacuum?...'
Adapter: newton
System prompt: Examining the methodological foundations of this concept through dimen...
Generated: 1247 chars, 342 tokens
Response preview: "Speed of light represents a fundamental velocity constant arising from Maxwell's equations...
[Quantum] Analyzing 'What is the speed of light in vacuum?...'
Adapter: quantum
System prompt: Probing the natural frequencies of 'What is the speed of light in...
Generated: 1089 chars, 298 tokens
Response preview: "Light exists in superposition of possibilities until measurement: it is both wave and partic...
[DaVinci] Analyzing 'What is the speed of light in vacuum?...'
Adapter: davinci
System prompt: Examining 'What is the speed of light in vacuum?...' through symmetry analysis...
Generated: 1345 chars, 378 tokens
Response preview: "Cross-domain insight: light's speed constant connects electromagnetic theory to relativi...
[Philosophy] Analyzing 'What is the speed of light in vacuum?...'
Adapter: philosophy
System prompt: Interrogating the epistemological boundaries of 'What is the speed o...
Generated: 1203 chars, 334 tokens
Response preview: "Epistemologically, light speed represents a boundary between measurable constants and th...
[Empathy] Analyzing 'What is the speed of light in vacuum?...'
Adapter: empathy
System prompt: Mapping the emotional landscape of 'What is the speed of light in...
Generated: 891 chars, 245 tokens
Response preview: "Humans experience light as fundamental to consciousness: vision, warmth, time perception...
Each line shows:
- Agent name (Newton, Quantum, etc.)
- Concept being analyzed (truncated)
- Adapter being used (e.g., "newton", "quantum")
- System prompt preview (first 100 chars)
- Output size: chars generated + tokens consumed
- Response preview: first 150 chars of what the agent generated
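The per-agent fields listed above can be pulled out of a captured log with a few regular expressions. This is a minimal sketch that assumes the log lines look exactly like the sample output in this guide; `AgentRecord` and `parse_agent_blocks` are hypothetical names for illustration, not part of the evaluation script.

```python
import re
from dataclasses import dataclass

# Patterns inferred from the sample output in this guide (an assumption);
# adjust them if the real log format differs.
AGENT_RE = re.compile(r"^\[(?P<agent>\w+)\] Analyzing '(?P<concept>.*)'\s*$")
ADAPTER_RE = re.compile(r"^Adapter:\s*(?P<adapter>\w+)")
GENERATED_RE = re.compile(r"^Generated:\s*(?P<chars>\d+) chars,\s*(?P<tokens>\d+) tokens")


@dataclass
class AgentRecord:
    agent: str
    concept: str
    adapter: str = ""
    chars: int = 0
    tokens: int = 0


def parse_agent_blocks(log_text: str) -> list[AgentRecord]:
    """Collect one record per '[Agent] Analyzing ...' block in the log."""
    records: list[AgentRecord] = []
    current = None
    for raw in log_text.splitlines():
        line = raw.strip()
        m = AGENT_RE.match(line)
        if m:
            # A new agent block starts here.
            current = AgentRecord(m["agent"], m["concept"])
            records.append(current)
            continue
        if current is None:
            continue  # Skip everything before the first agent block.
        if (m := ADAPTER_RE.match(line)):
            current.adapter = m["adapter"]
        elif (m := GENERATED_RE.match(line)):
            current.chars = int(m["chars"])
            current.tokens = int(m["tokens"])
    return records
```

Feeding it the contents of a captured log gives one record per agent, which makes it easy to spot an agent whose output is unusually short.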
4. Conflict Detection (Round 0)
Domain-gated activation: detected 'physics' → 3 agents active
[CONFLICTS DETECTED] Round 0: 42 conflicts found
Top conflicts:
- Newton vs Quantum: 0.68 (Causality vs Probability)
- Newton vs DaVinci: 0.45 (Analytical vs Creative)
- Quantum vs Philosophy: 0.52 (Measurement vs Meaning)
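To rank or filter the conflict lines shown above, something like the following may help. It is a sketch under the assumption that each conflict is printed as `- A vs B: score (label)`, as in the sample; `Conflict` and `parse_conflicts` are hypothetical helper names.

```python
import re
from typing import NamedTuple

# Assumed line shape: "- Newton vs Quantum: 0.68 (Causality vs Probability)"
CONFLICT_RE = re.compile(
    r"^-\s*(?P<a>\w+) vs (?P<b>\w+):\s*(?P<score>\d*\.?\d+)\s*\((?P<label>[^)]*)\)"
)


class Conflict(NamedTuple):
    agent_a: str
    agent_b: str
    score: float
    label: str


def parse_conflicts(log_text: str, min_score: float = 0.0) -> list[Conflict]:
    """Return conflicts with score >= min_score, strongest first."""
    found = []
    for raw in log_text.splitlines():
        m = CONFLICT_RE.match(raw.strip())
        if not m:
            continue
        score = float(m["score"])
        if score >= min_score:
            found.append(Conflict(m["a"], m["b"], score, m["label"]))
    return sorted(found, key=lambda c: c.score, reverse=True)
```

Calling `parse_conflicts(text, min_score=0.5)` keeps only the stronger disagreements, which is usually where the debate rounds concentrate.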
5. Debate Rounds (Round 1+)
[R1] Newton vs Quantum
Challenge: "Where do you agree with Quantum's superposition view? Where is causality essential?"
Newton's response: 1234 chars
Quantum's reply: 1089 chars
[R1] Quantum vs Philosophy
Challenge: "How does the measurement problem relate to epistemology?"
Quantum's response: 945 chars
Philosophy's reply: 1123 chars
6. Final Synthesis
====================================================================================
[FINAL SYNTHESIS] (2847 characters)
The speed of light represents a fundamental constant that emerges from the intersection
of multiple ways of understanding reality. From Newton's causal-analytical perspective,
it's a boundary condition derived from Maxwell's equations and relativistic principles...
[From Quantum perspective: Light exhibits wave-particle duality...]
[From DaVinci's creative lens: Speed-of-light connects to broader patterns...]
[From Philosophy: Epistemologically grounded in measurement and uncertainty...]
[From Empathy: Light as human experience connects consciousness to physics...]
====================================================================================
7. Metadata Summary
[METADATA]
Conflicts detected: 42
Gamma (coherence): 0.784
Debate rounds: 2
GPU time: 2.3 sec total
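When comparing several runs, it can be convenient to turn the metadata block above into typed values. The sketch below assumes the block starts with a `[METADATA]` line followed by `key: value` lines, exactly as in the sample; `parse_metadata` and `summarize` are hypothetical names.

```python
import re


def parse_metadata(log_text: str) -> dict[str, str]:
    """Read 'key: value' lines that follow the [METADATA] marker."""
    meta: dict[str, str] = {}
    in_block = False
    for raw in log_text.splitlines():
        line = raw.strip()
        if line == "[METADATA]":
            in_block = True
            continue
        if not in_block:
            continue
        if ":" not in line:
            break  # The first non key/value line ends the block.
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta


def summarize(meta: dict[str, str]) -> dict:
    """Convert the raw strings to numbers (field names assumed from the sample)."""
    gpu = re.match(r"\d+(?:\.\d+)?", meta.get("GPU time", ""))
    return {
        "conflicts": int(meta["Conflicts detected"]),
        "gamma": float(meta["Gamma (coherence)"]),
        "debate_rounds": int(meta["Debate rounds"]),
        "gpu_seconds": float(gpu.group()) if gpu else None,
    }
```

A usage pattern would be `summarize(parse_metadata(open("debug.log").read()))` on a log captured with the redirect shown under Command Options.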
Command Options
# See 1 question with full thinking (default)
python evaluation/run_evaluation_verbose.py
# See 3 questions
python evaluation/run_evaluation_verbose.py --questions 3
# Pipe to file for analysis
python evaluation/run_evaluation_verbose.py --questions 2 > debug.log 2>&1
What Each Log Line Means
| Log Pattern | Meaning |
|---|---|
| `[Agent] Analyzing 'X'...` | Agent starting to analyze concept |
| `Adapter: newton` | Which trained adapter is being used |
| `System prompt: ...` | The reasoning framework being provided |
| `Generated: 1247 chars, 342 tokens` | Output size and LLM tokens consumed |
| `Response preview: ...` | First 150 chars of actual reasoning |
| `Domain-gated: detected 'physics' → 3 agents` | Only these agents are active for this domain |
| `[R0] Newton → 1247 chars. Preview: ...` | Round 0 initial analysis excerpt |
| `[R1] Newton vs Quantum` | Debate round showing which agents are engaging |
Debugging Tips
If you see "TEMPLATE" instead of LLM output:
Response preview: "Tracing the causal chain within 'gravity': every observable..."
→ This is the template. The agent did not receive the orchestrator.
If you see real reasoning:
Response preview: "Gravity is fundamentally a curvature of spacetime according to..."
→ The agent is using the real LLM. ✓
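The two previews above differ in one visible way: the template output embeds the concept verbatim in quotes near the start, while the LLM output does not. A rough check built on that observation is sketched below. This is only a heuristic inferred from the two examples in this guide, not a rule the agents guarantee, and `looks_like_template` is a hypothetical helper.

```python
def looks_like_template(preview: str, concept: str, window: int = 80) -> bool:
    """Heuristic: flag previews that quote the concept verbatim near the start.

    Assumption: template text substitutes the concept as 'concept' in a fixed
    phrase. A genuine LLM answer could also quote the concept, so treat a True
    result as a hint to inspect the agent setup, not as proof.
    """
    head = preview[:window].lower()
    return f"'{concept.lower()}'" in head


template_preview = "Tracing the causal chain within 'gravity': every observable..."
llm_preview = "Gravity is fundamentally a curvature of spacetime according to..."

print(looks_like_template(template_preview, "gravity"))
print(looks_like_template(llm_preview, "gravity"))
```

If a preview is flagged, the Agent Setup output is the place to confirm whether that agent was created with `orch=True`.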
If GPU isn't being used:
Base model loaded in 42s
CPU mode (GPU disabled)
→ The GPU isn't loaded. Check the n_gpu_layers setting.
If GPU is working:
Base model loaded in 8.2s
✓ GPU acceleration ENABLED (35 layers offloaded)
→ The GPU is accelerating inference. ✓
Performance Metrics to Watch
- Base model load time: <15s = GPU working, >30s = CPU only
- Per-agent inference: <5s = GPU mode, >15s = CPU mode
- Token generation rate: >50 tok/s = GPU, <20 tok/s = CPU
- GPU memory: Should show VRAM usage in task manager
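The thresholds above can be applied mechanically to numbers taken from a run. The sketch below uses exactly the cut-offs listed in this section and reports "unclear" for values that fall between them; the function names are hypothetical, and the timings would need to be measured or read from the log by hand.

```python
def classify_lower_is_gpu(value: float, gpu_below: float, cpu_above: float) -> str:
    """For timings: small values suggest GPU, large values suggest CPU."""
    if value < gpu_below:
        return "gpu"
    if value > cpu_above:
        return "cpu"
    return "unclear"


def classify_higher_is_gpu(value: float, gpu_above: float, cpu_below: float) -> str:
    """For rates: large values suggest GPU, small values suggest CPU."""
    if value > gpu_above:
        return "gpu"
    if value < cpu_below:
        return "cpu"
    return "unclear"


def diagnose(load_seconds: float, inference_seconds: float, tokens: int) -> dict:
    """Apply the thresholds from this guide to one agent's numbers."""
    if inference_seconds <= 0:
        raise ValueError("inference_seconds must be positive")
    tok_per_s = tokens / inference_seconds
    return {
        "load": classify_lower_is_gpu(load_seconds, 15, 30),
        "inference": classify_lower_is_gpu(inference_seconds, 5, 15),
        "throughput": classify_higher_is_gpu(tok_per_s, 50, 20),
        "tokens_per_second": round(tok_per_s, 1),
    }
```

For example, `diagnose(load_seconds=8.2, inference_seconds=4.0, tokens=342)` evaluates the load time from the sample log together with a hypothetical 4-second inference. Mixed results (one "gpu", one "cpu") suggest partial offload or a measurement worth repeating.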
Comparing to Templates
To see the difference, create a test script:
from reasoning_forge.agents.newton_agent import NewtonAgent
from reasoning_forge.forge_engine import ForgeEngine

# View template-based response (no orchestrator, so no LLM)
agent = NewtonAgent(orchestrator=None)
template_response = agent.analyze("gravity")
print("TEMPLATE:", template_response)

# View LLM-based response
forge = ForgeEngine()
llm_response = forge.newton.analyze("gravity")
print("LLM:", llm_response)
Template output is generic text with the concept substituted in. LLM output is domain-specific reasoning.
Ready to see agents thinking! Run it and let me know what you see. 🎯