Observability and Dashboard
Overview
The observability layer provides deep insight into runtime behavior, model usage, tool execution, memory quality, and reward signals.
Dashboard Sections
1. Live Thought Stream
- chronological reasoning notes
- model/router choice trace
- action confidence timeline
- override events
2. Navigation Map
Graph of visited pages:
- nodes = URLs
- edges = transitions
- node color = relevance/confidence
- revisit highlighting
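As a sketch, the navigation map can be kept as a simple adjacency structure; the record shape below (URL nodes, transition edges, a per-node confidence score) is an assumption for illustration, not a fixed schema:

```python
from collections import defaultdict

class NavigationMap:
    """Tracks visited URLs (nodes) and transitions (edges), flagging revisits."""

    def __init__(self):
        self.visits = defaultdict(int)    # url -> visit count
        self.edges = defaultdict(int)     # (src, dst) -> transition count
        self.confidence = {}              # url -> latest relevance/confidence score

    def record(self, src, dst, confidence):
        """Record a page visit; src is None for the first page of an episode."""
        self.visits[dst] += 1
        if src is not None:
            self.edges[(src, dst)] += 1
        self.confidence[dst] = confidence

    def revisited(self):
        """URLs visited more than once, for revisit highlighting."""
        return [url for url, n in self.visits.items() if n > 1]
```

Node color can then be derived from `confidence`, and `revisited()` drives the revisit highlighting.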
3. MCP Usage Panel
- tool call count by server
- avg latency by tool
- error rate and retries
- top successful tool chains
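A minimal aggregation behind this panel might look like the following; the raw call-record fields (`server`, `tool`, `latency_ms`, `success`, `retries`) are assumptions chosen to match the log example later in this document:

```python
from collections import defaultdict

def summarize_mcp_calls(calls):
    """Aggregate raw tool-call records into per-(server, tool) stats."""
    stats = defaultdict(lambda: {"count": 0, "latency_total": 0, "errors": 0, "retries": 0})
    for c in calls:
        s = stats[(c["server"], c["tool"])]
        s["count"] += 1
        s["latency_total"] += c["latency_ms"]
        s["errors"] += 0 if c["success"] else 1
        s["retries"] += c.get("retries", 0)
    return {
        key: {
            "calls": s["count"],
            "avg_latency_ms": s["latency_total"] / s["count"],
            "error_rate": s["errors"] / s["count"],
            "retries": s["retries"],
        }
        for key, s in stats.items()
    }
```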
4. Memory Viewer
- inspect short/working/long/shared memory
- filter by task/domain/confidence
- edit/delete entries
- prune previews
5. Reward Analytics
- per-step reward breakdown
- component contribution trends
- penalty heatmap
- episode comparison
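The per-step breakdown reduces naturally to per-component totals for the contribution trends; assuming each step carries a dict of component-name to reward delta, a sketch:

```python
def reward_breakdown(steps):
    """Sum each reward component across an episode's steps.

    Each step is assumed to be a dict of component -> delta,
    e.g. {"progress": 0.5, "penalty": -0.25}.
    """
    totals = {}
    for step in steps:
        for component, delta in step.items():
            totals[component] = totals.get(component, 0.0) + delta
    return totals
```

Negative components (penalties) stay separate here, which is what lets the penalty heatmap single them out.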
6. Cost and Token Monitor
- per-provider usage
- per-model token counts
- cumulative cost vs budget
- forecasted burn rate
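The forecasted burn rate can be as simple as a linear projection of spend against the budget; the field names below are assumptions:

```python
def forecast_burn(spent, elapsed_steps, total_steps, budget):
    """Linearly project cumulative cost and compare it against the budget.

    Assumes spend grows roughly linearly with steps; returns the per-step
    rate, the projected total, and whether it would exceed the budget.
    """
    rate = spent / elapsed_steps if elapsed_steps else 0.0
    projected = rate * total_steps
    return {
        "rate_per_step": rate,
        "projected_total": projected,
        "over_budget": projected > budget,
    }
```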
Core Metrics
Agent Metrics
- task completion rate
- avg steps to completion
- recovery score
- generalization score
- exploration ratio
Tool Metrics
- tool success rate
- timeout ratio
- fallback frequency
- schema validation failures
Memory Metrics
- retrieval hit rate
- relevance score distribution
- prune rate
- memory-assisted success ratio
Search Metrics
- query success rate
- multi-hop depth distribution
- credibility score average
- duplicate result ratio
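Most of these metrics are simple aggregates over episode records. As one example, task completion rate and average steps to completion, assuming each episode record carries a `completed` flag and a `steps` count:

```python
def agent_metrics(episodes):
    """Derive completion rate and average steps to completion from episodes."""
    total = len(episodes)
    done = [e for e in episodes if e["completed"]]
    return {
        "completion_rate": len(done) / total if total else 0.0,
        "avg_steps_to_completion": (
            sum(e["steps"] for e in done) / len(done) if done else None
        ),
    }
```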
Logging Model
Structured logs (JSON):
{
  "timestamp": "2026-03-27T00:00:00Z",
  "episode_id": "ep_123",
  "step": 7,
  "event": "tool_call",
  "tool": "beautifulsoup.find_all",
  "latency_ms": 54,
  "success": true,
  "reward_delta": 0.08
}
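A minimal emitter for this log shape could write one JSON object per line to stdout; the function signature is an assumption:

```python
import json
import sys
from datetime import datetime, timezone

def log_event(episode_id, step, event, **fields):
    """Emit one structured JSON log line in the shape shown above."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat().replace("+00:00", "Z"),
        "episode_id": episode_id,
        "step": step,
        "event": event,
        **fields,  # e.g. tool, latency_ms, success, reward_delta
    }
    sys.stdout.write(json.dumps(record) + "\n")
    return record
```

One object per line (JSON Lines) keeps the logs greppable and easy to ship to any log aggregator.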
Tracing
Per-episode trace includes:
- observations
- actions
- rewards
- tool calls
- memory operations
- final submission and grader results
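One way to carry this per-episode trace in code is a single container whose fields mirror the list above; the exact shapes of each entry are assumptions:

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class EpisodeTrace:
    """Per-episode trace; field names mirror the trace contents above."""
    episode_id: str
    observations: list = field(default_factory=list)
    actions: list = field(default_factory=list)
    rewards: list = field(default_factory=list)
    tool_calls: list = field(default_factory=list)
    memory_ops: list = field(default_factory=list)
    final_submission: Any = None
    grader_results: Any = None

    def add_step(self, observation, action, reward):
        """Append one aligned (observation, action, reward) step."""
        self.observations.append(observation)
        self.actions.append(action)
        self.rewards.append(reward)
```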
Alerts
Configurable alerts:
- budget threshold crossed
- error spike
- tool outage
- memory bloat
- anomalous low reward streak
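A sketch of evaluating a few of these alert conditions against current state; the `state` and `thresholds` keys are assumptions for illustration:

```python
def check_alerts(state, thresholds):
    """Return the names of alert conditions currently firing."""
    alerts = []
    if state["cost"] >= thresholds["budget"]:
        alerts.append("budget threshold crossed")
    if state["error_rate"] > thresholds["max_error_rate"]:
        alerts.append("error spike")
    recent = state["recent_rewards"]
    if recent and all(r < thresholds["low_reward"] for r in recent):
        alerts.append("anomalous low reward streak")
    return alerts
```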
APIs
GET /api/metrics/summary
GET /api/metrics/timeseries
GET /api/traces/{episode_id}
GET /api/costs
GET /api/memory/stats
GET /api/tools/stats
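URL helpers for these endpoints might look like the following; the base URL is an assumption, and the paths come from the list above (actual fetching could use any HTTP client, e.g. `urllib.request`):

```python
from urllib.parse import quote

BASE = "http://localhost:8000"  # assumed host/port for the dashboard service

def metrics_url(kind="summary"):
    """URL for /api/metrics/summary or /api/metrics/timeseries."""
    return f"{BASE}/api/metrics/{kind}"

def trace_url(episode_id):
    """URL for /api/traces/{episode_id}, with the id percent-encoded."""
    return f"{BASE}/api/traces/{quote(episode_id, safe='')}"
```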
Recommended Dashboard Layout
- Top row: completion, cost, latency, error rate
- Mid row: thought stream + navigation graph
- Lower row: reward breakdown + MCP usage + memory viewer
- Bottom row: raw trace and export controls
Export and Audit
Exports:
- JSON trace
- CSV metrics
- reward analysis report
- model usage report
All exports include episode and configuration fingerprints for reproducibility.
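One common way to produce such a configuration fingerprint, offered here as a sketch rather than the document's prescribed method, is to hash a canonical (sorted-key) JSON serialization of the config:

```python
import hashlib
import json

def config_fingerprint(config):
    """Stable fingerprint of a config dict: SHA-256 of canonical JSON."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because keys are sorted before hashing, two configs that differ only in key order produce the same fingerprint, which is what makes the exports reproducible.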