# Real-Time Agent Thinking — Verbose Evaluation Guide

## Quick Start

Watch agents think in real time as they analyze and debate:

```bash
python evaluation/run_evaluation_verbose.py --questions 1
```

## What You'll See

### 1. **Orchestrator Initialization** (40 seconds)
```
INFO:codette_orchestrator  | INFO     | Loading base model (one-time)...
INFO:codette_orchestrator  | INFO     |   GPU layers: 35 (0=CPU only, 35+=full GPU offload)
INFO:codette_orchestrator  | INFO     | ✓ GPU acceleration ENABLED (35 layers offloaded)
INFO:codette_orchestrator  | INFO     | Base model loaded in 8.2s
```

### 2. **Agent Setup**
```
[AGENT SETUP INSPECTION]
  Orchestrator available: True
  Available adapters: ['newton', 'davinci', 'empathy', 'philosophy', 'quantum', 'consciousness', 'multi_perspective', 'systems_architecture']

  Agent LLM modes:
    Newton       ✓ LLM        (orch=True, adapter=newton)
    Quantum      ✓ LLM        (orch=True, adapter=quantum)
    DaVinci      ✓ LLM        (orch=True, adapter=davinci)
    Philosophy   ✓ LLM        (orch=True, adapter=philosophy)
    Empathy      ✓ LLM        (orch=True, adapter=empathy)
    Ethics       ✓ LLM        (orch=True, adapter=philosophy)
```

### 3. **Real-Time Agent Thinking (Round 0)**

As each agent analyzes the concept:

```
[Newton] Analyzing 'What is the speed of light in vacuum?...'
  Adapter: newton
  System prompt: Examining the methodological foundations of this concept through dimen...
  Generated: 1247 chars, 342 tokens
  Response preview: "Speed of light represents a fundamental velocity constant arising from Maxwell's equations...

[Quantum] Analyzing 'What is the speed of light in vacuum?...'
  Adapter: quantum
  System prompt: Probing the natural frequencies of 'What is the speed of light in...
  Generated: 1089 chars, 298 tokens
  Response preview: "Light exists in superposition of possibilities until measurement: it is both wave and partic...

[DaVinci] Analyzing 'What is the speed of light in vacuum?...'
  Adapter: davinci
  System prompt: Examining 'What is the speed of light in vacuum?...' through symmetry analysis...
  Generated: 1345 chars, 378 tokens
  Response preview: "Cross-domain insight: light's speed constant connects electromagnetic theory to relativi...

[Philosophy] Analyzing 'What is the speed of light in vacuum?...'
  Adapter: philosophy
  System prompt: Interrogating the epistemological boundaries of 'What is the speed o...
  Generated: 1203 chars, 334 tokens
  Response preview: "Epistemologically, light speed represents a boundary between measurable constants and th...

[Empathy] Analyzing 'What is the speed of light in vacuum?...'
  Adapter: empathy
  System prompt: Mapping the emotional landscape of 'What is the speed of light in...
  Generated: 891 chars, 245 tokens
  Response preview: "Humans experience light as fundamental to consciousness: vision, warmth, time perception...
```

Each agent block shows:
- **Agent name** (Newton, Quantum, etc.)
- **Concept being analyzed** (truncated)
- **Adapter being used** (e.g., "newton", "quantum")
- **System prompt preview** (first 100 chars)
- **Output size**: chars generated + tokens consumed
- **Response preview**: first 150 chars of what the agent generated
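
If you save the output to a file, the fields listed above can be pulled out with a few regular expressions. This is a minimal sketch that assumes the line shapes match the sample output exactly; `parse_agent_blocks` and the patterns are illustrative names, not part of the evaluation script.

```python
import re

# Assumed line shapes, copied from the sample output in this guide.
# Adjust the patterns if the script's real format differs.
HEADER = re.compile(r"^\[(?P<agent>\w+)\] Analyzing '(?P<concept>.*)'$")
ADAPTER = re.compile(r"^\s+Adapter: (?P<adapter>\S+)$")
GENERATED = re.compile(r"^\s+Generated: (?P<chars>\d+) chars, (?P<tokens>\d+) tokens$")


def parse_agent_blocks(log_text):
    """Return one dict per agent block found in log_text."""
    blocks = []
    current = None
    for raw in log_text.splitlines():
        line = raw.rstrip()
        header = HEADER.match(line)
        if header:
            current = {"agent": header["agent"], "concept": header["concept"]}
            blocks.append(current)
            continue
        if current is None:
            continue  # ignore everything before the first agent header
        adapter = ADAPTER.match(line)
        generated = GENERATED.match(line)
        if adapter:
            current["adapter"] = adapter["adapter"]
        elif generated:
            current["chars"] = int(generated["chars"])
            current["tokens"] = int(generated["tokens"])
    return blocks


sample = """\
[Newton] Analyzing 'What is the speed of light in vacuum?...'
  Adapter: newton
  Generated: 1247 chars, 342 tokens
[Quantum] Analyzing 'What is the speed of light in vacuum?...'
  Adapter: quantum
  Generated: 1089 chars, 298 tokens
"""

for block in parse_agent_blocks(sample):
    print(block["agent"], block["adapter"], block["chars"], block["tokens"])
```

Keeping the result as a list of dicts makes it easy to compare output sizes across agents or to spot an agent that is missing its `Adapter` line.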

### 4. **Conflict Detection (Round 0)**
```
Domain-gated activation: detected 'physics' → 3 agents active

[CONFLICTS DETECTED] Round 0: 42 conflicts found
  Top conflicts:
  - Newton vs Quantum: 0.68 (Causality vs Probability)
  - Newton vs DaVinci: 0.45 (Analytical vs Creative)
  - Quantum vs Philosophy: 0.52 (Measurement vs Meaning)
```
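
To rank the listed conflicts from a saved log, the `- A vs B: score (label)` lines can be parsed as below. This is a sketch under two assumptions: the line format matches the sample above, and a higher score means a stronger conflict. `parse_conflicts` is a hypothetical helper, not something the script provides.

```python
import re

# Assumed shape of a conflict line, based on the sample output above.
CONFLICT = re.compile(
    r"^\s*- (?P<a>\w+) vs (?P<b>\w+): (?P<score>\d+(?:\.\d+)?) \((?P<label>.*)\)$"
)


def parse_conflicts(log_text):
    """Return (agent_a, agent_b, score, label) tuples, highest score first."""
    conflicts = []
    for raw in log_text.splitlines():
        match = CONFLICT.match(raw.rstrip())
        if match:
            conflicts.append(
                (match["a"], match["b"], float(match["score"]), match["label"])
            )
    return sorted(conflicts, key=lambda item: item[2], reverse=True)


conflict_sample = """\
[CONFLICTS DETECTED] Round 0: 42 conflicts found
  Top conflicts:
  - Newton vs Quantum: 0.68 (Causality vs Probability)
  - Newton vs DaVinci: 0.45 (Analytical vs Creative)
  - Quantum vs Philosophy: 0.52 (Measurement vs Meaning)
"""

for agent_a, agent_b, score, label in parse_conflicts(conflict_sample):
    print(f"{score:.2f}  {agent_a} vs {agent_b}  ({label})")
```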

### 5. **Debate Rounds (Round 1+)**
```
[R1] Newton vs Quantum
  Challenge: "Where do you agree with Quantum's superposition view? Where is causality essential?"
  Newton's response: 1234 chars
  Quantum's reply: 1089 chars

[R1] Quantum vs Philosophy
  Challenge: "How does the measurement problem relate to epistemology?"
  Quantum's response: 945 chars
  Philosophy's reply: 1123 chars
```

### 6. **Final Synthesis**
```
====================================================================================
[FINAL SYNTHESIS] (2847 characters)

The speed of light represents a fundamental constant that emerges from the intersection
of multiple ways of understanding reality. From Newton's causal-analytical perspective,
it's a boundary condition derived from Maxwell's equations and relativistic principles...

[From Quantum perspective: Light exhibits wave-particle duality...]
[From DaVinci's creative lens: Speed-of-light connects to broader patterns...]
[From Philosophy: Epistemologically grounded in measurement and uncertainty...]
[From Empathy: Light as human experience connects consciousness to physics...]
====================================================================================
```

### 7. **Metadata Summary**
```
[METADATA]
  Conflicts detected: 42
  Gamma (coherence): 0.784
  Debate rounds: 2
  GPU time: 2.3 sec total
```
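
When comparing several runs, it can help to read the `[METADATA]` section into a dictionary. The sketch below assumes the section is a `[METADATA]` line followed by indented `key: value` lines, as in the sample above; `parse_metadata` is an illustrative helper and the values are kept as raw strings.

```python
def parse_metadata(log_text):
    """Collect the indented key/value lines that follow a [METADATA] line."""
    metadata = {}
    in_section = False
    for raw in log_text.splitlines():
        if raw.strip() == "[METADATA]":
            in_section = True
            continue
        if not in_section:
            continue
        if not raw.startswith(" ") or ":" not in raw:
            break  # the section ends at the first non-indented or non key/value line
        key, value = raw.split(":", 1)
        metadata[key.strip()] = value.strip()
    return metadata


metadata_sample = """\
[METADATA]
  Conflicts detected: 42
  Gamma (coherence): 0.784
  Debate rounds: 2
  GPU time: 2.3 sec total
"""

meta = parse_metadata(metadata_sample)
print(meta)
print(float(meta["Gamma (coherence)"]))
```

Values stay as strings because units vary per line (for example `2.3 sec total`); convert only the fields you need.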

## Command Options

```bash
# See 1 question with full thinking (default)
python evaluation/run_evaluation_verbose.py

# See 3 questions
python evaluation/run_evaluation_verbose.py --questions 3

# Redirect stdout and stderr to a file for analysis
python evaluation/run_evaluation_verbose.py --questions 2 > debug.log 2>&1
```

## What Each Log Line Means

| Log Pattern | Meaning |
|------------|---------|
| `[Agent] Analyzing 'X'...` | Agent starting to analyze concept |
| `Adapter: newton` | Which trained adapter is being used |
| `System prompt: ...` | The reasoning framework being provided |
| `Generated: 1247 chars, 342 tokens` | Output size and LLM tokens consumed |
| `Response preview: ...` | First 150 chars of actual reasoning |
| `Domain-gated activation: detected 'physics' → 3 agents active` | Only these agents are active for this domain |
| `[R0] Newton → 1247 chars. Preview: ...` | Round 0 initial analysis excerpt |
| `[R1] Newton vs Quantum` | Debate round showing which agents are engaging |

## Debugging Tips

### If you see template text instead of LLM output:
```
Response preview: "Tracing the causal chain within 'gravity': every observable..."
```
→ This is template output. The agent was not given the orchestrator.

### If you see real reasoning:
```
Response preview: "Gravity is fundamentally a curvature of spacetime according to..."
```
→ The agent is using the real LLM. ✓

### If GPU isn't being used:
```
Base model loaded in 42s
⚠ CPU mode (GPU disabled)
```
→ The model is running on the CPU. Check the `n_gpu_layers` setting.

### If GPU is working:
```
Base model loaded in 8.2s
✓ GPU acceleration ENABLED (35 layers offloaded)
```
→ GPU is accelerating inference! ✓
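
The two GPU checks above can be automated against a saved log. This is a small sketch that only looks for the marker strings shown in this guide (`GPU acceleration ENABLED`, `CPU mode`, and `Base model loaded in ...s`); `gpu_status` is a hypothetical helper and the real log wording may differ.

```python
import re

# Matches the load-time line shown in the samples, e.g. "Base model loaded in 8.2s".
LOAD_TIME = re.compile(r"Base model loaded in (?P<seconds>\d+(?:\.\d+)?)s")


def gpu_status(log_text):
    """Return (mode, load_seconds) where mode is 'gpu', 'cpu', or 'unknown'."""
    if "GPU acceleration ENABLED" in log_text:
        mode = "gpu"
    elif "CPU mode" in log_text:
        mode = "cpu"
    else:
        mode = "unknown"
    match = LOAD_TIME.search(log_text)
    load_seconds = float(match["seconds"]) if match else None
    return mode, load_seconds


gpu_log = "Base model loaded in 8.2s\n✓ GPU acceleration ENABLED (35 layers offloaded)"
cpu_log = "Base model loaded in 42s\n⚠ CPU mode (GPU disabled)"

print(gpu_status(gpu_log))
print(gpu_status(cpu_log))
```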

## Performance Metrics to Watch

- **Base model load time**: <15s = GPU working, >30s = CPU only
- **Per-agent inference**: <5s = GPU mode, >15s = CPU mode
- **Token generation rate**: >50 tok/s = GPU, <20 tok/s = CPU
- **GPU memory**: VRAM usage should be visible in your system's GPU monitor (for example, Task Manager on Windows)
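
The timing thresholds in this list can be turned into a quick check. The sketch below uses only the numbers given above and reports `inconclusive` for values that fall between them; the function names are illustrative, and the elapsed time passed to `classify_token_rate` is something you would have to measure yourself.

```python
def classify_load_time(seconds):
    """Base model load time: under 15 s suggests GPU, over 30 s suggests CPU."""
    if seconds < 15:
        return "gpu"
    if seconds > 30:
        return "cpu"
    return "inconclusive"


def classify_inference_time(seconds):
    """Per-agent inference: under 5 s suggests GPU, over 15 s suggests CPU."""
    if seconds < 5:
        return "gpu"
    if seconds > 15:
        return "cpu"
    return "inconclusive"


def classify_token_rate(tokens, seconds):
    """Token rate: over 50 tok/s suggests GPU, under 20 tok/s suggests CPU."""
    rate = tokens / seconds
    if rate > 50:
        return "gpu"
    if rate < 20:
        return "cpu"
    return "inconclusive"


print(classify_load_time(8.2))
print(classify_inference_time(20.0))
# 342 tokens is taken from the sample log; the 5.0 s duration is a made-up timing.
print(classify_token_rate(342, 5.0))
```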

## Comparing to Templates

To see the difference, create a test script:

```python
from reasoning_forge.agents.newton_agent import NewtonAgent
from reasoning_forge.forge_engine import ForgeEngine

# Template-based response: the agent has no orchestrator, so no LLM is used
agent = NewtonAgent(orchestrator=None)
template_response = agent.analyze("gravity")
print("Template:", template_response)

# LLM-based response: use the Newton agent created by ForgeEngine
forge = ForgeEngine()
llm_response = forge.newton.analyze("gravity")
print("LLM:", llm_response)
```

Template output is generic text with the concept substituted in.
LLM output is domain-specific reasoning about the concept.
