Advanced Features

Overview

This document describes advanced platform capabilities that go beyond baseline extraction.

1) Self-Improving Agent

Post-episode learning loop:

  • classify failures by root cause
  • update selector/tool strategy priors
  • persist successful patterns with confidence
  • penalize repeated failure paths
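
The loop above can be sketched as a small confidence store. This is a minimal illustration, not the project's implementation: `classify_failure`, `StrategyPriors`, the root-cause labels, the 0.5 starting score, and the 0.2 learning rate are all assumptions made for the example.

```python
from collections import defaultdict


def classify_failure(message: str) -> str:
    """Map an error message to a coarse root cause (hypothetical categories)."""
    text = message.lower()
    if "selector" in text:
        return "selector_miss"
    if "timeout" in text:
        return "timeout"
    return "unknown"


class StrategyPriors:
    """Hypothetical confidence store keyed by (domain, selector)."""

    def __init__(self, initial: float = 0.5, rate: float = 0.2):
        self.rate = rate
        self.scores = defaultdict(lambda: initial)
        self.failures = defaultdict(int)

    def record(self, key, success: bool, root_cause: str = "unknown") -> float:
        rate = self.rate
        if not success:
            self.failures[(key, root_cause)] += 1
            # Repeating the same failure scales the penalty, capped at 1.0.
            rate = min(1.0, rate * self.failures[(key, root_cause)])
        target = 1.0 if success else 0.0
        # Move the stored confidence toward the observed outcome.
        self.scores[key] += rate * (target - self.scores[key])
        return self.scores[key]
```

Calling `record` once per episode keeps successful patterns near 1.0, while a selector that keeps failing for the same reason loses confidence faster on each repeat.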

2) Strategy Library

Built-in strategies:

  • Search-first
  • Direct extraction
  • Multi-hop reasoning
  • Verification-first
  • Table-first

Each strategy tracks:

  • win rate
  • cost per success
  • average latency
  • domain affinity
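
One way to track these four metrics per strategy is a running tally. The sketch below is illustrative only: `StrategyStats` is a hypothetical name, and "domain affinity" is assumed here to mean the share of a strategy's wins that came from a given domain, which the document does not define.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class StrategyStats:
    """Hypothetical per-strategy tally; field names are assumptions."""
    runs: int = 0
    wins: int = 0
    total_cost: float = 0.0
    total_latency: float = 0.0
    domain_wins: Counter = field(default_factory=Counter)

    def record(self, domain: str, success: bool, cost: float, latency: float) -> None:
        self.runs += 1
        self.total_cost += cost
        self.total_latency += latency
        if success:
            self.wins += 1
            self.domain_wins[domain] += 1

    @property
    def win_rate(self) -> float:
        return self.wins / self.runs if self.runs else 0.0

    @property
    def cost_per_success(self) -> float:
        # All spend, including failed runs, is charged against successes.
        return self.total_cost / self.wins if self.wins else float("inf")

    @property
    def average_latency(self) -> float:
        return self.total_latency / self.runs if self.runs else 0.0

    def domain_affinity(self, domain: str) -> float:
        # Assumed definition: fraction of this strategy's wins from `domain`.
        return self.domain_wins[domain] / self.wins if self.wins else 0.0
```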

3) Explainable AI Mode

For every decision, provide:

  • selected action and confidence
  • top alternatives considered
  • evidence from memory/tools/search
  • expected reward impact
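
A decision record carrying these four fields might look like the following. It is a sketch under assumptions: `DecisionExplanation` and `explain` are hypothetical names, candidate scores are assumed to be positive, and confidence is taken as the winner's share of the total score rather than any calibrated probability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionExplanation:
    """Hypothetical explanation record for one agent decision."""
    action: str
    confidence: float
    alternatives: list
    evidence: list
    expected_reward_delta: float


def explain(scores: dict, evidence: list, expected_reward_delta: float, top_k: int = 2):
    """Build an explanation from positive candidate scores (assumed input shape)."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best_action, best_score = ranked[0]
    return DecisionExplanation(
        action=best_action,
        confidence=best_score / sum(scores.values()),
        alternatives=ranked[1:1 + top_k],  # runner-up actions with their scores
        evidence=list(evidence),
        expected_reward_delta=expected_reward_delta,
    )
```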

4) Human-in-the-Loop

Intervention controls:

  • approve/reject action
  • force tool/model switch
  • enforce verification before submit
  • set hard constraints during runtime
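
Three of these controls (approve/reject, verification before submit, hard constraints) can be sketched as a gate checked before each action. `InterventionGate`, the `"submit"` action name, and the reason strings are assumptions for illustration; forcing a tool or model switch is not covered here.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class InterventionGate:
    """Hypothetical pre-action gate; not the project's actual interface."""
    approve: Callable[[str], bool]                     # reviewer callback
    blocked_actions: set = field(default_factory=set)  # hard constraints
    require_verification: bool = False

    def check(self, action: str, verified: bool = False):
        # Hard constraints win over everything else.
        if action in self.blocked_actions:
            return False, "hard constraint"
        # Submissions can be forced through a verification step first.
        if action == "submit" and self.require_verification and not verified:
            return False, "verification required"
        if not self.approve(action):
            return False, "rejected by reviewer"
        return True, "approved"
```

Because `blocked_actions` is a plain set, a reviewer can add or remove constraints while an episode is running.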

5) Scenario Simulator

Stress testing scenarios:

  • noisy HTML
  • broken DOM
  • pagination traps
  • conflicting facts
  • anti-scraping patterns

Outputs:

  • robustness score
  • recovery score
  • strategy suitability map
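
The document does not define how these outputs are computed. The sketch below assumes simple definitions purely for illustration: robustness is the share of runs that succeeded, recovery is the share of runs that hit at least one error and still succeeded, and the suitability map picks the strategy with the best success rate per scenario. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScenarioResult:
    scenario: str
    strategy: str
    succeeded: bool
    errors_hit: int


def robustness_score(results) -> float:
    return sum(r.succeeded for r in results) / len(results) if results else 0.0


def recovery_score(results) -> float:
    troubled = [r for r in results if r.errors_hit > 0]
    if not troubled:
        return 1.0  # assumed convention: nothing to recover from
    return sum(r.succeeded for r in troubled) / len(troubled)


def suitability_map(results) -> dict:
    tally = {}
    for r in results:
        wins, runs = tally.get((r.scenario, r.strategy), (0, 0))
        tally[(r.scenario, r.strategy)] = (wins + r.succeeded, runs + 1)
    best = {}
    for (scenario, strategy), (wins, runs) in tally.items():
        rate = wins / runs
        if scenario not in best or rate > best[scenario][1]:
            best[scenario] = (strategy, rate)
    return {scenario: strategy for scenario, (strategy, _) in best.items()}
```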

6) Context Compression

  • rolling summaries
  • salience-based pruning
  • token-aware context packing
  • differential memory refresh
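
Salience-based pruning and token-aware packing can be combined in a greedy pass. This is a sketch under assumptions: `pack_context` is a hypothetical helper, salience scores are supplied by the caller, and token counts are approximated by whitespace splitting rather than a real tokenizer.

```python
def pack_context(items, budget: int) -> list:
    """Keep the most salient items that fit the token budget.

    items: list of (text, salience) pairs (assumed input shape).
    Returns the kept texts in their original order.
    """
    def count_tokens(text: str) -> int:
        return len(text.split())  # crude stand-in for a tokenizer

    by_salience = sorted(enumerate(items), key=lambda pair: pair[1][1], reverse=True)
    kept, used = set(), 0
    for index, (text, _salience) in by_salience:
        cost = count_tokens(text)
        if used + cost <= budget:
            kept.add(index)
            used += cost
    return [items[i][0] for i in sorted(kept)]
```

Greedy selection is not optimal for every budget, but it is cheap and keeps the original ordering, which matters when the packed context is read sequentially.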

7) Batch + Parallel Runtime

  • task queue with priorities
  • parallel extraction workers
  • bounded concurrency
  • idempotent retry handling
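
The four properties above can be sketched with the standard library's `asyncio`. This is an illustration, not the project's runtime: `run_batch`, its parameters, and the `(priority, task_id, payload)` task shape are assumptions, and retries are immediate with no backoff.

```python
import asyncio


async def run_batch(tasks, handler, max_workers: int = 2, max_retries: int = 2) -> dict:
    """Run (priority, task_id, payload) tasks; lower priority values run first."""
    queue = asyncio.PriorityQueue()
    for priority, task_id, payload in tasks:
        queue.put_nowait((priority, task_id, payload))
    results = {}

    async def worker():
        while True:
            try:
                _priority, task_id, payload = queue.get_nowait()
            except asyncio.QueueEmpty:
                return
            if task_id in results:
                continue  # idempotency: a task id is only completed once
            for attempt in range(max_retries + 1):
                try:
                    results[task_id] = await handler(payload)
                    break
                except Exception:
                    if attempt == max_retries:
                        results[task_id] = None  # give up, record the failure

    # Concurrency is bounded by the fixed number of workers.
    await asyncio.gather(*(worker() for _ in range(max_workers)))
    return results
```

Task ids are assumed to be unique and mutually comparable so that the priority queue never has to compare payloads.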

8) Prompt Versioning and Evaluation

  • versioned prompt templates
  • A/B testing by task type
  • reward/cost comparison dashboards
  • rollout and rollback controls
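
A minimal sketch of versioned templates with rollback, plus deterministic A/B assignment. `PromptRegistry` and `ab_bucket` are hypothetical names, and hashing the task id is one assumed way to get stable buckets; dashboards and per-task-type routing are out of scope here.

```python
import hashlib


class PromptRegistry:
    """Hypothetical in-memory registry of versioned prompt templates."""

    def __init__(self):
        self.versions = {}  # name -> list of templates, oldest first
        self.active = {}    # name -> index of the active version

    def publish(self, name: str, template: str) -> int:
        self.versions.setdefault(name, []).append(template)
        self.active[name] = len(self.versions[name]) - 1  # roll out newest
        return self.active[name]

    def rollback(self, name: str) -> int:
        self.active[name] = max(0, self.active[name] - 1)
        return self.active[name]

    def get(self, name: str) -> str:
        return self.versions[name][self.active[name]]


def ab_bucket(task_id: str, variants=("A", "B")) -> str:
    """Assign a task to a variant deterministically from its id."""
    digest = hashlib.sha256(task_id.encode("utf-8")).digest()
    return variants[digest[0] % len(variants)]
```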

9) MCP Toolchain Composition

Composable flow examples:

  • Browser MCP -> Parser MCP -> Validator MCP -> DB MCP
  • Search MCP -> Fetch MCP -> Extract MCP -> Verify MCP
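
The arrow notation above reads as left-to-right function composition. The sketch below uses plain Python callables as stand-ins for MCP tools; it does not use any real MCP client API, and `compose`, `fetch`, `extract`, and `verify` (including the canned HTML) are hypothetical.

```python
import re
from functools import reduce


def compose(*stages):
    """Chain stages so each one receives the previous stage's output."""
    def run(value):
        return reduce(lambda acc, stage: stage(acc), stages, value)
    return run


# Stand-in stages; a real flow would call the corresponding MCP tools.
def fetch(url: str) -> dict:
    return {"url": url, "html": "<span class='price'>42</span>"}  # canned page


def extract(page: dict) -> dict:
    match = re.search(r"(\d+)", page["html"])
    return {"url": page["url"], "price": int(match.group(1))}


def verify(record: dict) -> dict:
    if record["price"] <= 0:
        raise ValueError("price must be positive")
    return record


pipeline = compose(fetch, extract, verify)
```

Keeping each stage a single-argument callable means flows can be reordered or extended without changing the stages themselves.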

10) Governance and Safety

  • tool allowlist/denylist
  • PII redaction in logs
  • budget and rate guardrails
  • provenance tracking for extracted facts
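
Two of these guardrails are easy to sketch: a tool allowlist/denylist check and PII redaction before logging. The helper names are hypothetical, and the regular expression only catches email-like strings; real PII redaction would need to cover far more than this.

```python
import re

# Deliberately simple pattern; it only targets email-like strings.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact(text: str) -> str:
    """Replace email-like substrings before the text reaches a log."""
    return EMAIL_PATTERN.sub("[REDACTED_EMAIL]", text)


def is_tool_allowed(tool: str, allowlist=frozenset(), denylist=frozenset()) -> bool:
    """Denylist always wins; an empty allowlist is assumed to mean 'allow all'."""
    if tool in denylist:
        return False
    return not allowlist or tool in allowlist
```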

Feature Flags

All advanced features should be toggleable from Settings. Features with a high cost or latency impact should be disabled by default.
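
A flag set with conservative defaults could be sketched as follows. The flag names and the split between cheap and expensive features are assumptions made for the example; the document does not say which features count as high cost.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class FeatureFlags:
    """Hypothetical flag set; names and defaults are illustrative."""
    # Assumed cheap: enabled by default.
    governance: bool = True
    context_compression: bool = True
    # Assumed high cost or latency: disabled by default.
    scenario_simulator: bool = False
    explainable_mode: bool = False
    parallel_runtime: bool = False


def from_settings(settings: dict) -> FeatureFlags:
    """Apply overrides from a settings mapping, ignoring unknown keys."""
    known = {f.name for f in fields(FeatureFlags)}
    overrides = {key: bool(value) for key, value in settings.items() if key in known}
    return replace(FeatureFlags(), **overrides)
```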