Jonathan Harrison

Real-Time Agent Thinking: Verbose Evaluation Guide

Quick Start

Watch agents think in real time as they analyze and debate:

python evaluation/run_evaluation_verbose.py --questions 1

What You'll See

1. Orchestrator Initialization (40 seconds)

INFO:codette_orchestrator  | INFO     | Loading base model (one-time)...
INFO:codette_orchestrator  | INFO     |   GPU layers: 35 (0=CPU only, 35+=full GPU offload)
INFO:codette_orchestrator  | INFO     | ✓ GPU acceleration ENABLED (35 layers offloaded)
INFO:codette_orchestrator  | INFO     | Base model loaded in 8.2s

2. Agent Setup

[AGENT SETUP INSPECTION]
  Orchestrator available: True
  Available adapters: ['newton', 'davinci', 'empathy', 'philosophy', 'quantum', 'consciousness', 'multi_perspective', 'systems_architecture']

  Agent LLM modes:
    Newton       ✓ LLM        (orch=True, adapter=newton)
    Quantum      ✓ LLM        (orch=True, adapter=quantum)
    DaVinci      ✓ LLM        (orch=True, adapter=davinci)
    Philosophy   ✓ LLM        (orch=True, adapter=philosophy)
    Empathy      ✓ LLM        (orch=True, adapter=empathy)
    Ethics       ✓ LLM        (orch=True, adapter=philosophy)
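
To check the setup block programmatically, the mode lines can be parsed from a saved log. This is a minimal sketch that assumes every mode line ends with `(orch=<True|False>, adapter=<name>)` exactly as in the sample above; `parse_agent_modes` is an illustrative name, not part of Codette.

```python
import re

# Assumes each line ends with "(orch=<True|False>, adapter=<name>)" as in the
# sample output; whatever sits between the agent name and the parenthesis is ignored.
MODE_RE = re.compile(
    r"^\s*(?P<agent>\w+)\s.*\(orch=(?P<orch>True|False), adapter=(?P<adapter>\w+)\)\s*$"
)


def parse_agent_modes(log_text):
    """Map agent name -> (has_orchestrator, adapter_name)."""
    modes = {}
    for line in log_text.splitlines():
        match = MODE_RE.match(line)
        if match:
            modes[match["agent"]] = (match["orch"] == "True", match["adapter"])
    return modes


sample = """  Agent LLM modes:
    Newton       ✓ LLM        (orch=True, adapter=newton)
    Ethics       ✓ LLM        (orch=True, adapter=philosophy)
"""

modes = parse_agent_modes(sample)
print(modes)
# Agents that did not receive the orchestrator (empty for the sample above)
print([name for name, (has_orch, _) in modes.items() if not has_orch])
```

The mapping also makes adapter reuse visible, such as Ethics running on the philosophy adapter.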

3. Real-Time Agent Thinking (Round 0)

As each agent analyzes the concept:

[Newton] Analyzing 'What is the speed of light in vacuum?...'
  Adapter: newton
  System prompt: Examining the methodological foundations of this concept through dimen...
  Generated: 1247 chars, 342 tokens
  Response preview: "Speed of light represents a fundamental velocity constant arising from Maxwell's equations...

[Quantum] Analyzing 'What is the speed of light in vacuum?...'
  Adapter: quantum
  System prompt: Probing the natural frequencies of 'What is the speed of light in...
  Generated: 1089 chars, 298 tokens
  Response preview: "Light exists in superposition of possibilities until measurement: it is both wave and partic...

[DaVinci] Analyzing 'What is the speed of light in vacuum?...'
  Adapter: davinci
  System prompt: Examining 'What is the speed of light in vacuum?...' through symmetry analysis...
  Generated: 1345 chars, 378 tokens
  Response preview: "Cross-domain insight: light's speed constant connects electromagnetic theory to relativi...

[Philosophy] Analyzing 'What is the speed of light in vacuum?...'
  Adapter: philosophy
  System prompt: Interrogating the epistemological boundaries of 'What is the speed o...
  Generated: 1203 chars, 334 tokens
  Response preview: "Epistemologically, light speed represents a boundary between measurable constants and th...

[Empathy] Analyzing 'What is the speed of light in vacuum?...'
  Adapter: empathy
  System prompt: Mapping the emotional landscape of 'What is the speed of light in...
  Generated: 891 chars, 245 tokens
  Response preview: "Humans experience light as fundamental to consciousness: vision, warmth, time perception...

Each agent block shows:

  • Agent name (Newton, Quantum, etc.)
  • Concept being analyzed (truncated)
  • Adapter being used (e.g., "newton", "quantum")
  • System prompt preview (first 100 chars)
  • Output size: chars generated + tokens consumed
  • Response preview: first 150 chars of what the agent generated
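
The fields listed above can be pulled out of a saved log with a small parser. The sketch below assumes the exact line layout shown in the sample output; `parse_agent_blocks` and the dictionary keys are illustrative names, not part of the Codette codebase.

```python
import re

# Patterns assume the line layout shown in the sample output.
HEADER_RE = re.compile(r"^\[(?P<agent>\w+)\] Analyzing '(?P<concept>.*)'\s*$")
ADAPTER_RE = re.compile(r"^\s+Adapter: (?P<adapter>\S+)")
GENERATED_RE = re.compile(r"^\s+Generated: (?P<chars>\d+) chars, (?P<tokens>\d+) tokens")


def parse_agent_blocks(log_text):
    """Return one dict per '[Agent] Analyzing ...' block found in log_text."""
    blocks = []
    current = None
    for line in log_text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current = {"agent": header["agent"], "concept": header["concept"]}
            blocks.append(current)
            continue
        if current is None:
            continue  # ignore lines before the first agent block
        adapter = ADAPTER_RE.match(line)
        if adapter:
            current["adapter"] = adapter["adapter"]
            continue
        generated = GENERATED_RE.match(line)
        if generated:
            current["chars"] = int(generated["chars"])
            current["tokens"] = int(generated["tokens"])
    return blocks


sample = """[Newton] Analyzing 'What is the speed of light in vacuum?...'
  Adapter: newton
  Generated: 1247 chars, 342 tokens
[Quantum] Analyzing 'What is the speed of light in vacuum?...'
  Adapter: quantum
  Generated: 1089 chars, 298 tokens
"""

for block in parse_agent_blocks(sample):
    print(block["agent"], block["adapter"], block["chars"], block["tokens"])
```

Comparing `chars` and `tokens` across agents is a quick way to spot an agent that produced unusually short output.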

4. Conflict Detection (Round 0)

Domain-gated activation: detected 'physics' → 3 agents active

[CONFLICTS DETECTED] Round 0: 42 conflicts found
  Top conflicts:
  - Newton vs Quantum: 0.68 (Causality vs Probability)
  - Newton vs DaVinci: 0.45 (Analytical vs Creative)
  - Quantum vs Philosophy: 0.52 (Measurement vs Meaning)
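
To rank conflicts from a saved log, the "Top conflicts" lines can be parsed and sorted. This sketch assumes the `- A vs B: score (label)` layout shown above. The guide does not define what the number measures, so it is treated here as an opaque score, and `parse_conflicts` is an illustrative name.

```python
import re

# Assumes lines such as: "  - Newton vs Quantum: 0.68 (Causality vs Probability)"
CONFLICT_RE = re.compile(
    r"^\s*-\s*(?P<a>\w+) vs (?P<b>\w+): (?P<score>\d+(?:\.\d+)?) \((?P<label>.*)\)\s*$"
)


def parse_conflicts(log_text):
    """Return (agent_a, agent_b, score, label) tuples, highest score first."""
    conflicts = []
    for line in log_text.splitlines():
        match = CONFLICT_RE.match(line)
        if match:
            conflicts.append(
                (match["a"], match["b"], float(match["score"]), match["label"])
            )
    return sorted(conflicts, key=lambda item: item[2], reverse=True)


sample = """[CONFLICTS DETECTED] Round 0: 42 conflicts found
  Top conflicts:
  - Newton vs Quantum: 0.68 (Causality vs Probability)
  - Newton vs DaVinci: 0.45 (Analytical vs Creative)
  - Quantum vs Philosophy: 0.52 (Measurement vs Meaning)
"""

for agent_a, agent_b, score, label in parse_conflicts(sample):
    print(f"{score:.2f}  {agent_a} vs {agent_b}  ({label})")
```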

5. Debate Rounds (Round 1+)

[R1] Newton vs Quantum
  Challenge: "Where do you agree with Quantum's superposition view? Where is causality essential?"
  Newton's response: 1234 chars
  Quantum's reply: 1089 chars

[R1] Quantum vs Philosophy
  Challenge: "How does the measurement problem relate to epistemology?"
  Quantum's response: 945 chars
  Philosophy's reply: 1123 chars

6. Final Synthesis

====================================================================================
[FINAL SYNTHESIS] (2847 characters)

The speed of light represents a fundamental constant that emerges from the intersection
of multiple ways of understanding reality. From Newton's causal-analytical perspective,
it's a boundary condition derived from Maxwell's equations and relativistic principles...

[From Quantum perspective: Light exhibits wave-particle duality...]
[From DaVinci's creative lens: Speed-of-light connects to broader patterns...]
[From Philosophy: Epistemologically grounded in measurement and uncertainty...]
[From Empathy: Light as human experience connects consciousness to physics...]
====================================================================================

7. Metadata Summary

[METADATA]
  Conflicts detected: 42
  Gamma (coherence): 0.784
  Debate rounds: 2
  GPU time: 2.3 sec total
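
For comparing runs, the metadata block can be read into a dictionary. This is a minimal sketch assuming the `key: value` layout shown above; `parse_metadata` is an illustrative helper, and values are kept as strings because units such as "sec total" are part of the printed value.

```python
def parse_metadata(log_text):
    """Collect 'key: value' pairs that follow a [METADATA] line."""
    metadata = {}
    in_block = False
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped == "[METADATA]":
            in_block = True
            continue
        if in_block:
            if not stripped or ":" not in stripped:
                break  # a blank or non key/value line ends the block
            key, value = stripped.split(":", 1)
            metadata[key.strip()] = value.strip()
    return metadata


sample = """[METADATA]
  Conflicts detected: 42
  Gamma (coherence): 0.784
  Debate rounds: 2
  GPU time: 2.3 sec total
"""

meta = parse_metadata(sample)
print(meta)
print(float(meta["Gamma (coherence)"]))
```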

Command Options

# See 1 question with full thinking (default)
python evaluation/run_evaluation_verbose.py

# See 3 questions
python evaluation/run_evaluation_verbose.py --questions 3

# Pipe to file for analysis
python evaluation/run_evaluation_verbose.py --questions 2 > debug.log 2>&1
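
Redirecting to a file hides the output while the run is in progress. If you want to watch the log and save it at the same time, a small wrapper can do both. The sketch below uses only the standard library; `run_and_tee` is a hypothetical helper, not a script shipped with Codette.

```python
import subprocess
import sys


def run_and_tee(command, log_path):
    """Run command, echo its combined stdout/stderr, and also write it to log_path."""
    with open(log_path, "w", encoding="utf-8") as log_file:
        process = subprocess.Popen(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout, like 2>&1
            text=True,
            errors="replace",  # avoid crashing on bytes that fail to decode
        )
        for line in process.stdout:
            sys.stdout.write(line)
            log_file.write(line)
        return process.wait()
```

Calling `run_and_tee([sys.executable, "evaluation/run_evaluation_verbose.py", "--questions", "2"], "debug.log")` from the repository root mirrors the last command above while keeping the output visible.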

What Each Log Line Means

| Log Pattern | Meaning |
| --- | --- |
| `[Agent] Analyzing 'X'...` | Agent starting to analyze concept |
| `Adapter: newton` | Which trained adapter is being used |
| `System prompt: ...` | The reasoning framework being provided |
| `Generated: 1247 chars, 342 tokens` | Output size and LLM tokens consumed |
| `Response preview: ...` | First 150 chars of actual reasoning |
| `Domain-gated: detected 'physics' → 3 agents` | Only these agents are active for this domain |
| `[R0] Newton → 1247 chars. Preview: ...` | Round 0 initial analysis excerpt |
| `[R1] Newton vs Quantum` | Debate round showing which agents are engaging |

Debugging Tips

If you see "TEMPLATE" instead of LLM output:

Response preview: "Tracing the causal chain within 'gravity': every observable..."

→ This is template output. The agent did not receive the orchestrator!

If you see real reasoning:

Response preview: "Gravity is fundamentally a curvature of spacetime according to..."

→ The agent is using the real LLM! ✓

If GPU isn't being used:

Base model loaded in 42s
⚠ CPU mode (GPU disabled)

→ The model is running on CPU. Check the n_gpu_layers setting.

If GPU is working:

Base model loaded in 8.2s
✓ GPU acceleration ENABLED (35 layers offloaded)

→ GPU is accelerating inference! ✓
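
The GPU check above can be automated against a saved log. This sketch looks only for the strings that appear in the sample startup output ("GPU acceleration ENABLED", "CPU mode", "Base model loaded in Ns"); `gpu_status` is an illustrative name.

```python
import re

LOAD_TIME_RE = re.compile(r"Base model loaded in (\d+(?:\.\d+)?)s")


def gpu_status(log_text):
    """Summarize GPU state from the startup lines of a verbose run."""
    match = LOAD_TIME_RE.search(log_text)
    return {
        "gpu_enabled": "GPU acceleration ENABLED" in log_text,
        "cpu_mode": "CPU mode" in log_text,
        "load_seconds": float(match.group(1)) if match else None,
    }


gpu_log = "Base model loaded in 8.2s\nGPU acceleration ENABLED (35 layers offloaded)\n"
cpu_log = "Base model loaded in 42s\nCPU mode (GPU disabled)\n"

print(gpu_status(gpu_log))
print(gpu_status(cpu_log))
```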

Performance Metrics to Watch

  • Base model load time: <15s = GPU working, >30s = CPU only
  • Per-agent inference: <5s = GPU mode, >15s = CPU mode
  • Token generation rate: >50 tok/s = GPU, <20 tok/s = CPU
  • GPU memory: VRAM usage should be visible in your GPU monitor (e.g., Task Manager on Windows)
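
The thresholds above translate directly into a few checks. This is a sketch using the guide's numbers; the function names are illustrative, and the "inconclusive" label for values between the two thresholds is an assumption, since the guide does not say how to read them.

```python
def classify_load_time(seconds):
    """Base model load time: <15s suggests GPU, >30s suggests CPU only."""
    if seconds < 15:
        return "gpu"
    if seconds > 30:
        return "cpu"
    return "inconclusive"


def classify_inference_time(seconds):
    """Per-agent inference: <5s suggests GPU mode, >15s suggests CPU mode."""
    if seconds < 5:
        return "gpu"
    if seconds > 15:
        return "cpu"
    return "inconclusive"


def classify_token_rate(tokens, seconds):
    """Token generation rate: >50 tok/s suggests GPU, <20 tok/s suggests CPU."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    rate = tokens / seconds
    if rate > 50:
        return "gpu"
    if rate < 20:
        return "cpu"
    return "inconclusive"


print(classify_load_time(8.2))
print(classify_load_time(42))
print(classify_token_rate(342, 5.0))
```

Here 342 tokens in 5.0 seconds is 68.4 tok/s, which falls on the GPU side of the threshold.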

Comparing to Templates

To see the difference, create a test script:

from reasoning_forge.agents.newton_agent import NewtonAgent
from reasoning_forge.forge_engine import ForgeEngine

# Template-based response: no orchestrator, so no LLM is used
agent = NewtonAgent(orchestrator=None)
template_response = agent.analyze("gravity")
print("TEMPLATE:", template_response)

# LLM-based response: agent obtained through ForgeEngine
forge = ForgeEngine()
llm_response = forge.newton.analyze("gravity")
print("LLM:", llm_response)

Template output is generic text with the concept substituted in. LLM output is domain-specific reasoning.


Ready to see agents thinking! Run it and let me know what you see. 🎯