Last commit: df47251 by NeerajCodz ("docs: update")

Agents System Design

Overview

The agent runtime is a multi-agent, memory-aware RL orchestration layer for web extraction tasks. It supports:

Single-agent and multi-agent execution modes
Strategy selection (e.g. search-first, direct-extraction, multi-hop-reasoning)
Human-in-the-loop intervention
Explainable decision traces
Self-improvement from past episodes

Agent Roles

1. Planner Agent

Builds a plan before acting:

Goal decomposition
Tool selection plan
Risk assessment and fallback paths

2. Navigator Agent

Explores pages and search results:

URL prioritization
Link traversal policy
Page relevance scoring
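A minimal sketch of URL prioritization combined with a link traversal policy. It assumes relevance scores have already been computed (for example, by the page relevance scorer) and models the traversal policy as a simple depth limit with de-duplication. The class name UrlFrontier and the max_depth parameter are hypothetical and are not part of the documented runtime.

```python
import heapq
from typing import List, Optional, Set, Tuple


class UrlFrontier:
    """Hypothetical Navigator frontier: pops the most relevant unvisited URL first."""

    def __init__(self, max_depth: int = 3) -> None:
        self.max_depth = max_depth  # assumed traversal policy: ignore links deeper than this
        self._heap: List[Tuple[float, int, str, int]] = []
        self._seen: Set[str] = set()
        self._counter = 0  # tie-breaker so equal scores pop in insertion order

    def push(self, url: str, relevance: float, depth: int = 0) -> bool:
        """Queue a URL unless it was already seen or exceeds the depth limit."""
        if url in self._seen or depth > self.max_depth:
            return False
        self._seen.add(url)
        # heapq is a min-heap, so negate relevance to pop the highest score first
        heapq.heappush(self._heap, (-relevance, self._counter, url, depth))
        self._counter += 1
        return True

    def pop(self) -> Optional[Tuple[str, float, int]]:
        """Return (url, relevance, depth) for the best candidate, or None when empty."""
        if not self._heap:
            return None
        neg_relevance, _, url, depth = heapq.heappop(self._heap)
        return url, -neg_relevance, depth


frontier = UrlFrontier(max_depth=2)
frontier.push("https://site.com/search?q=laptop", 0.40)
frontier.push("https://site.com/p/123", 0.91, depth=1)
frontier.push("https://site.com/p/123", 0.95)  # duplicate URL, ignored
print(frontier.pop())  # ('https://site.com/p/123', 0.91, 1)
```

Marking a URL as seen at push time keeps it in the queue at most once. A variant could instead re-prioritize a queued URL when a higher relevance score arrives.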

3. Extractor Agent

Extracts structured fields:

Selector and schema inference
Adaptive chunk extraction
Long-page batch processing
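One plausible reading of adaptive chunk extraction with batch processing for long pages: split the page text into overlapping windows, then send the windows to the extractor in fixed-size batches. For simplicity, the sketch below uses character windows. The function names and default sizes are assumptions, and a real implementation would more likely size chunks by the model's token budget.

```python
from typing import Iterator, List


def chunk_text(text: str, chunk_size: int = 2000, overlap: int = 200) -> List[str]:
    """Split page text into overlapping character windows.

    The overlap keeps a field that straddles a window boundary intact in at
    least one chunk, at the cost of some duplicated work.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def batched(chunks: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of chunks, one extraction call per group."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for i in range(0, len(chunks), batch_size):
        yield chunks[i:i + batch_size]


chunks = chunk_text("0123456789", chunk_size=4, overlap=1)
print(chunks)  # ['0123', '3456', '6789']
print(list(batched(chunks, 2)))  # [['0123', '3456'], ['6789']]
```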

4. Verifier Agent

Checks consistency and trust:

Cross-source verification
Conflict resolution
Confidence calibration
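A confidence-weighted vote can approximate cross-source verification and conflict resolution. Each source reports a value with a confidence, values are whitespace-normalized, and the value with the largest total weight wins. The returned confidence is simply the winner's share of the total weight. This is an agreement score, not a fitted calibration model such as Platt scaling or isotonic regression. The function name and input shape are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def resolve_field(candidates: List[Tuple[str, str, float]]) -> Tuple[str, float]:
    """Resolve one field from (source, value, confidence) triples.

    Values are whitespace-normalized, each value's score is the sum of the
    confidences of the sources reporting it, and the winner's confidence is
    its share of the total score.
    """
    if not candidates:
        raise ValueError("no candidates to verify")
    scores: Dict[str, float] = defaultdict(float)
    for _source, value, confidence in candidates:
        scores[" ".join(value.split())] += confidence
    winner = max(scores, key=scores.get)
    total = sum(scores.values())
    return winner, (scores[winner] / total if total > 0 else 0.0)


value, confidence = resolve_field([
    ("site_a", "$19.99", 0.5),
    ("site_b", " $19.99 ", 0.25),
    ("site_c", "$21.00", 0.25),
])
print(value, confidence)  # $19.99 0.75
```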

5. Memory Agent

Manages memory reads, writes, and searches:

Episode summaries
Pattern persistence
Retrieval ranking and pruning
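Retrieval ranking and pruning could work as follows. Each memory item's relevance is discounted by an exponential recency decay, the top-k items are returned for retrieval, and items whose decayed score falls below a floor are dropped. The scoring formula, half-life, and threshold are illustrative assumptions, and the MemoryItem fields are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MemoryItem:
    key: str          # hypothetical: a selector pattern or episode summary id
    relevance: float  # similarity to the current query, assumed in [0, 1]
    age: int          # episodes since the item was last used


def decayed_score(item: MemoryItem, half_life: float = 50.0) -> float:
    """Relevance halved for every `half_life` episodes since last use."""
    return item.relevance * 0.5 ** (item.age / half_life)


def rank(items: List[MemoryItem], top_k: int = 5, half_life: float = 50.0) -> List[MemoryItem]:
    """Return the top_k items by decayed score, best first."""
    return sorted(items, key=lambda m: decayed_score(m, half_life), reverse=True)[:top_k]


def prune(items: List[MemoryItem], min_score: float = 0.05, half_life: float = 50.0) -> List[MemoryItem]:
    """Drop items whose decayed score has fallen below min_score."""
    return [m for m in items if decayed_score(m, half_life) >= min_score]


memory = [
    MemoryItem("span.product-price", relevance=0.9, age=0),
    MemoryItem(".price-box .value", relevance=1.0, age=50),
    MemoryItem("episode_0042_summary", relevance=0.2, age=200),
]
print([m.key for m in rank(memory, top_k=2)])  # ['span.product-price', '.price-box .value']
print([m.key for m in prune(memory)])  # ['span.product-price', '.price-box .value']
```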

Execution Modes

Single-Agent

One policy handles all actions.

Pros: low overhead, simple. Cons: weaker specialization.

Multi-Agent

A coordinator delegates work to specialized agents:

Planner emits execution graph
Navigator discovers candidate pages
Extractor parses and emits data
Verifier validates outputs
Memory Agent stores reusable patterns

Pros: modular, robust, scalable. Cons: coordination overhead.
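The multi-agent flow above can be sketched as a coordinator that runs the Planner's execution graph in dependency order. The sketch assumes the graph maps each step to the steps it depends on, and it orders the graph with Python's standard graphlib.TopologicalSorter (Python 3.9+). The step names, handler functions, and state dictionary are hypothetical placeholders for real agent calls.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Optional, Set, Tuple

State = Dict[str, Any]

# Hypothetical execution graph a Planner might emit: step -> steps it depends on.
PLAN: Dict[str, Set[str]] = {
    "navigate": set(),
    "extract": {"navigate"},
    "verify": {"extract"},
    "store_patterns": {"verify"},
}

# Hypothetical role handlers; real agents would call tools or models here.
HANDLERS: Dict[str, Callable[[State], State]] = {
    "navigate": lambda s: {**s, "pages": ["https://site.com/p/123"]},
    "extract": lambda s: {**s, "records": [{"url": p} for p in s["pages"]]},
    "verify": lambda s: {**s, "verified": bool(s["records"])},
    "store_patterns": lambda s: {**s, "stored": s["verified"]},
}


def run_plan(plan: Dict[str, Set[str]],
             handlers: Dict[str, Callable[[State], State]],
             state: Optional[State] = None) -> Tuple[State, List[str]]:
    """Run each step after its dependencies; return the final state and step order."""
    state = dict(state or {})
    order = list(TopologicalSorter(plan).static_order())  # raises CycleError on cycles
    for step in order:
        state = handlers[step](state)
    return state, order


final_state, order = run_plan(PLAN, HANDLERS)
print(order)  # ['navigate', 'extract', 'verify', 'store_patterns']
print(final_state["stored"])  # True
```

This sketch runs steps one at a time. Independent steps could run concurrently using TopologicalSorter's prepare(), get_ready(), and done() methods.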

Agent Communication

Shared channels:

agent_messages: async inter-agent messages
task_state: current objective and progress
global_knowledge: reusable facts and patterns

Message schema:

{
  "message_id": "msg_123",
  "from": "navigator",
  "to": "extractor",
  "type": "page_candidate",
  "payload": {
    "url": "https://site.com/p/123",
    "relevance": 0.91
  },
  "timestamp": "2026-03-27T00:00:00Z"
}
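A minimal sketch of the message schema and the agent_messages channel, assuming an in-process asyncio.Queue per recipient role. The AgentMessage and MessageBus names are hypothetical. Because from is a Python keyword, the sketch stores the sender and recipient under other attribute names and maps them back to the schema keys during serialization.

```python
import asyncio
import json
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict


def _utc_now() -> str:
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


@dataclass
class AgentMessage:
    sender: str
    recipient: str
    type: str
    payload: Dict[str, Any]
    message_id: str = field(default_factory=lambda: f"msg_{uuid.uuid4().hex[:8]}")
    timestamp: str = field(default_factory=_utc_now)

    def to_json(self) -> str:
        """Serialize using the schema's keys ("from"/"to" instead of sender/recipient)."""
        return json.dumps({
            "message_id": self.message_id,
            "from": self.sender,
            "to": self.recipient,
            "type": self.type,
            "payload": self.payload,
            "timestamp": self.timestamp,
        })


class MessageBus:
    """Hypothetical agent_messages channel: one asyncio.Queue per recipient role."""

    def __init__(self) -> None:
        self._queues: Dict[str, asyncio.Queue] = {}

    def _queue(self, role: str) -> asyncio.Queue:
        if role not in self._queues:
            self._queues[role] = asyncio.Queue()
        return self._queues[role]

    async def send(self, message: AgentMessage) -> None:
        await self._queue(message.recipient).put(message)

    async def receive(self, role: str) -> AgentMessage:
        return await self._queue(role).get()


async def demo() -> str:
    bus = MessageBus()
    await bus.send(AgentMessage("navigator", "extractor", "page_candidate",
                                {"url": "https://site.com/p/123", "relevance": 0.91}))
    received = await bus.receive("extractor")
    return received.to_json()


print(asyncio.run(demo()))
```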

Decision Policy

Policy inputs include:

Observation
Working memory context
Retrieved long-term memory hits
Tool availability from the tool registry
Budget and constraints
Budget and constraints

Policy outputs include:

Next action
Confidence
Rationale
Fallback action (optional)
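The policy interface can be sketched as typed input and output records. The decide function below is a greedy heuristic stand-in, not the learned RL policy. It ranks available tools by a hypothetical success rate retrieved from long-term memory, stops when the budget is exhausted, and proposes the runner-up as the fallback. The field names and the 0.5 prior for unseen tools are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PolicyInput:
    observation: str               # e.g. a page summary (unused by this heuristic)
    working_memory: List[str]      # recent context (unused by this heuristic)
    memory_hits: Dict[str, float]  # hypothetical: action -> historical success rate
    available_tools: List[str]     # actions currently registered and usable
    budget_remaining: int          # assumed unit: remaining steps


@dataclass
class PolicyOutput:
    action: str
    confidence: float
    rationale: str
    fallback: Optional[str] = None


def decide(inp: PolicyInput, prior: float = 0.5) -> PolicyOutput:
    """Greedy stand-in: pick the available action with the best remembered success rate."""
    if inp.budget_remaining <= 0 or not inp.available_tools:
        return PolicyOutput("STOP", 1.0, "budget exhausted or no tools available")
    ranked = sorted(inp.available_tools,
                    key=lambda a: inp.memory_hits.get(a, prior), reverse=True)
    best = ranked[0]
    confidence = inp.memory_hits.get(best, prior)
    rationale = f"{best} has the highest remembered success rate ({confidence:.2f})"
    fallback = ranked[1] if len(ranked) > 1 else None
    return PolicyOutput(best, confidence, rationale, fallback)


decision = decide(PolicyInput(
    observation="product page",
    working_memory=[],
    memory_hits={"EXTRACT_FIELD": 0.93, "SEARCH": 0.40},
    available_tools=["SEARCH", "EXTRACT_FIELD", "NAVIGATE"],
    budget_remaining=10,
))
print(decision.action, decision.fallback)  # EXTRACT_FIELD NAVIGATE
```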

Strategy Library

Built-in strategy templates:

search-first: broad discovery then narrow extraction
direct-extraction: immediate field extraction from target page
multi-hop-reasoning: iterative search and verification
table-centric: table-first parsing
form-centric: prioritizes forms and input structures

Strategy selection can be:

Manual (user setting)
Automatic (a router chooses based on the task signature)
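A rule-based router is one simple way to implement automatic selection. The sketch assumes a task signature with a few boolean features. These features are hypothetical, since the contents of the signature are not specified. A manual setting overrides the router, as in the manual mode listed above. A learned router could replace the rules by picking the strategy with the best score in the strategy performance table.

```python
from dataclasses import dataclass
from typing import Optional

STRATEGIES = {"search-first", "direct-extraction", "multi-hop-reasoning",
              "table-centric", "form-centric"}


@dataclass
class TaskSignature:
    """Hypothetical task features the router inspects."""
    has_target_url: bool
    expects_table: bool = False
    involves_forms: bool = False
    needs_cross_source: bool = False


def select_strategy(sig: TaskSignature, manual: Optional[str] = None) -> str:
    """Return the manual strategy if set, otherwise apply ordered routing rules."""
    if manual is not None:
        if manual not in STRATEGIES:
            raise ValueError(f"unknown strategy: {manual}")
        return manual
    if sig.needs_cross_source:
        return "multi-hop-reasoning"
    if not sig.has_target_url:
        return "search-first"
    if sig.expects_table:
        return "table-centric"
    if sig.involves_forms:
        return "form-centric"
    return "direct-extraction"


print(select_strategy(TaskSignature(has_target_url=True, expects_table=True)))  # table-centric
print(select_strategy(TaskSignature(has_target_url=False), manual="direct-extraction"))  # direct-extraction
```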

Self-Improving Agent Loop

After each episode:

Compute reward breakdown
Extract failed and successful patterns
Update strategy performance table
Store high-confidence selectors in long-term memory
Penalize redundant navigation patterns
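Three of these steps can be sketched concretely:

- a running-mean strategy performance table,
- a confidence threshold for promoting selectors to long-term memory, and
- a reward penalty for revisiting URLs.

The 0.9 promotion threshold, the 0.1 per-revisit penalty, and all function names are assumptions, not values taken from the runtime.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StrategyStats:
    episodes: int = 0
    mean_reward: float = 0.0

    def update(self, reward: float) -> None:
        # Incremental mean, so past rewards need not be stored.
        self.episodes += 1
        self.mean_reward += (reward - self.mean_reward) / self.episodes


def redundancy_penalty(visited_urls: List[str], per_revisit: float = 0.1) -> float:
    """Penalty proportional to how many visits went to already-visited URLs."""
    revisits = sum(count - 1 for count in Counter(visited_urls).values())
    return per_revisit * revisits


def post_episode_update(table: Dict[str, StrategyStats],
                        selector_store: Dict[str, float],
                        strategy: str,
                        task_reward: float,
                        visited_urls: List[str],
                        selector_confidence: Dict[str, float],
                        min_confidence: float = 0.9) -> float:
    """Apply the navigation penalty, update the strategy table, promote strong selectors."""
    reward = task_reward - redundancy_penalty(visited_urls)
    table.setdefault(strategy, StrategyStats()).update(reward)
    for selector, conf in selector_confidence.items():
        if conf >= min_confidence:
            selector_store[selector] = max(conf, selector_store.get(selector, 0.0))
    return reward


table: Dict[str, StrategyStats] = {}
selectors: Dict[str, float] = {}
post_episode_update(table, selectors, "search-first", 1.0,
                    ["https://site.com/p/1", "https://site.com/p/2", "https://site.com/p/1"],
                    {"span.product-price": 0.93, ".price-box .value": 0.58})
print(table["search-first"], selectors)
```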

Explainable AI Mode

Each action can emit:

Why this action was chosen
Why alternatives were rejected
Which memory/tool evidence was used

Example trace:

Action: EXTRACT_FIELD(price)
Why: Pattern "span.product-price" had 0.93 historical confidence on similar domains.
Alternatives rejected: ".price-box .value" (lower confidence 0.58), regex-only extraction (unstable on this layout).
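A trace like the one above can be generated mechanically from the candidate selectors and their historical confidences. This sketch covers only the confidence-based part of the explanation. Reasons such as "regex-only extraction (unstable on this layout)" would need additional evidence sources. TraceEntry, explain_choice, and render are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TraceEntry:
    action: str
    why: str
    rejected: List[str]
    evidence: List[str]


def explain_choice(field_name: str, candidates: Dict[str, float],
                   evidence_source: str = "long-term memory") -> TraceEntry:
    """Choose the highest-confidence selector and record why the others lost."""
    if not candidates:
        raise ValueError("no candidate selectors")
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    best, best_conf = ranked[0]
    return TraceEntry(
        action=f"EXTRACT_FIELD({field_name})",
        why=f'Pattern "{best}" had {best_conf:.2f} historical confidence.',
        rejected=[f'"{sel}" (lower confidence {conf:.2f})' for sel, conf in ranked[1:]],
        evidence=[f"{evidence_source}: {best}"],
    )


def render(entry: TraceEntry) -> str:
    """Format a trace entry in the Action/Why/Alternatives layout."""
    lines = [f"Action: {entry.action}", f"Why: {entry.why}"]
    if entry.rejected:
        lines.append("Alternatives rejected: " + ", ".join(entry.rejected) + ".")
    return "\n".join(lines)


print(render(explain_choice("price", {"span.product-price": 0.93, ".price-box .value": 0.58})))
```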

Human-in-the-Loop

Optional checkpoints:

Approve/reject planned action
Override selector/tool/model
Force verification before submit

Intervention modes:

off: fully autonomous
review: pause on low-confidence steps
strict: require approval on all submit/fetch/verify actions
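The three intervention modes can be expressed as a single gating check that runs before each action. Two parts of this sketch are assumptions. The first is the 0.7 confidence threshold for review mode. The second is treating strict mode as a superset of review, so that it pauses on low confidence as well as on every submit, fetch, or verify action. The mode description above specifies only the action-type gate.

```python
from enum import Enum


class InterventionMode(str, Enum):
    OFF = "off"
    REVIEW = "review"
    STRICT = "strict"


# Action types that always require approval in strict mode (from the mode description).
STRICT_GATED = {"submit", "fetch", "verify"}


def needs_approval(mode: InterventionMode, action_type: str, confidence: float,
                   threshold: float = 0.7) -> bool:
    """Return True if the agent should pause for a human before this action."""
    if mode is InterventionMode.OFF:
        return False
    low_confidence = confidence < threshold
    if mode is InterventionMode.REVIEW:
        return low_confidence
    # Assumption: strict also pauses on low-confidence steps, like review.
    return low_confidence or action_type in STRICT_GATED


print(needs_approval(InterventionMode("review"), "navigate", 0.55))  # True
print(needs_approval(InterventionMode.STRICT, "fetch", 0.99))  # True
```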

Scenario Simulator Hooks

Agents can be tested against:

Noisy HTML
Missing fields
Broken pagination
Adversarial layouts
Dynamic content with delayed rendering

Simulation metrics:

Completion
Recovery score
Generalization score
Cost and latency
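One possible way to aggregate simulator runs, using assumed metric definitions:

- completion: the share of runs that finished;
- recovery score: the completion rate among runs where the simulator injected a fault;
- generalization score: the completion rate on scenario types not seen during training, divided by the rate on seen types;
- cost and latency: simple averages.

SimRun and its fields are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class SimRun:
    scenario: str           # e.g. "noisy_html", "broken_pagination"
    completed: bool
    fault_injected: bool    # whether the simulator perturbed this run
    seen_in_training: bool  # whether this scenario type was used for training
    cost: float             # assumed unit: USD per run
    latency_s: float


def completion_rate(runs: List[SimRun]) -> float:
    return sum(r.completed for r in runs) / len(runs) if runs else 0.0


def summarize(runs: List[SimRun]) -> Dict[str, float]:
    """Aggregate simulator runs using the assumed metric definitions."""
    seen_rate = completion_rate([r for r in runs if r.seen_in_training])
    unseen_rate = completion_rate([r for r in runs if not r.seen_in_training])
    return {
        "completion": completion_rate(runs),
        "recovery_score": completion_rate([r for r in runs if r.fault_injected]),
        "generalization_score": unseen_rate / seen_rate if seen_rate else 0.0,
        "mean_cost": mean(r.cost for r in runs) if runs else 0.0,
        "mean_latency_s": mean(r.latency_s for r in runs) if runs else 0.0,
    }


runs = [
    SimRun("noisy_html", True, True, True, 0.02, 1.0),
    SimRun("missing_fields", False, True, False, 0.04, 3.0),
    SimRun("adversarial_layouts", True, False, False, 0.02, 2.0),
    SimRun("noisy_html", True, False, True, 0.04, 2.0),
]
print(summarize(runs))
```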

APIs

POST /api/agents/run
POST /api/agents/plan
POST /api/agents/override
GET /api/agents/state/{episode_id}
GET /api/agents/trace/{episode_id}
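A minimal client sketch for these endpoints, using only the Python standard library. The endpoint paths come from the list above. The base URL and the request body fields (task, mode, strategy) are hypothetical, because the request and response schemas are not given here. Building a request separately from sending it keeps the example testable without a running server.

```python
import json
from typing import Any, Dict, Optional
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8000"  # hypothetical deployment address


def build_request(method: str, path: str, body: Optional[Dict[str, Any]] = None,
                  base_url: str = BASE_URL) -> Request:
    """Create a JSON request for one of the agent endpoints."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(base_url.rstrip("/") + path, data=data, headers=headers, method=method)


def run_request(task: Dict[str, Any]) -> Request:
    return build_request("POST", "/api/agents/run", task)


def trace_request(episode_id: str) -> Request:
    # Percent-encode the id so it cannot alter the path.
    return build_request("GET", f"/api/agents/trace/{quote(episode_id, safe='')}")


def send(req: Request, timeout: float = 30.0) -> Any:
    """Send a request and decode the JSON response (requires a running server)."""
    with urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = run_request({"task": "Extract product prices", "mode": "multi-agent",
                   "strategy": "search-first"})
print(req.get_method(), req.full_url)  # POST http://localhost:8000/api/agents/run
```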

Dashboard Widgets

Live thought stream
Agent role timeline
Inter-agent message feed
Strategy performance chart
Confidence and override panel