yadnyeshkolte committed on
Commit 36dac03 · verified · 1 Parent(s): 5d59bf9

Upload folder using huggingface_hub

Dockerfile ADDED
@@ -0,0 +1,81 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ # Multi-stage build using openenv-base
+ # This Dockerfile is flexible and works for both:
+ # - In-repo environments (with local OpenEnv sources)
+ # - Standalone environments (with openenv from PyPI/Git)
+ # The build script (openenv build) handles context detection and sets appropriate build args.
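+ # A hypothetical manual invocation, for illustration only (the openenv build
+ # script normally supplies these args):
+ #   docker build --build-arg BUILD_MODE=standalone -t api_debug_env .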
+
+ ARG BASE_IMAGE=ghcr.io/meta-pytorch/openenv-base:latest
+ FROM ${BASE_IMAGE} AS builder
+
+ WORKDIR /app
+
+ # Ensure git is available (required for installing dependencies from VCS)
+ RUN apt-get update && \
+     apt-get install -y --no-install-recommends git && \
+     rm -rf /var/lib/apt/lists/*
+
+ # Build argument to control whether we're building standalone or in-repo
+ ARG BUILD_MODE=in-repo
+ ARG ENV_NAME=api_debug_env
+
+ # Copy environment code (always at root of build context)
+ COPY . /app/env
+
+ # For in-repo builds, openenv is already vendored in the build context
+ # For standalone builds, openenv will be installed via pyproject.toml
+ WORKDIR /app/env
+
+ # Ensure uv is available (for local builds where base image lacks it)
+ RUN if ! command -v uv >/dev/null 2>&1; then \
+     curl -LsSf https://astral.sh/uv/install.sh | sh && \
+     mv /root/.local/bin/uv /usr/local/bin/uv && \
+     mv /root/.local/bin/uvx /usr/local/bin/uvx; \
+     fi
+
+ # Install dependencies using uv sync
+ # If uv.lock exists, use it; otherwise resolve on the fly
+ RUN --mount=type=cache,target=/root/.cache/uv \
+     if [ -f uv.lock ]; then \
+     uv sync --frozen --no-install-project --no-editable; \
+     else \
+     uv sync --no-install-project --no-editable; \
+     fi
+
+ RUN --mount=type=cache,target=/root/.cache/uv \
+     if [ -f uv.lock ]; then \
+     uv sync --frozen --no-editable; \
+     else \
+     uv sync --no-editable; \
+     fi
+
+ # Final runtime stage
+ FROM ${BASE_IMAGE}
+
+ WORKDIR /app
+
+ # Copy the virtual environment from builder
+ COPY --from=builder /app/env/.venv /app/.venv
+
+ # Copy the environment code
+ COPY --from=builder /app/env /app/env
+
+ # Set PATH to use the virtual environment
+ ENV PATH="/app/.venv/bin:$PATH"
+
+ # Set PYTHONPATH so imports work correctly
+ ENV PYTHONPATH="/app/env:$PYTHONPATH"
+
+ # Health check
+ HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
+     CMD curl -f http://localhost:8000/health || exit 1
+
+ # Run the FastAPI server
+ # The module path is constructed to work with the /app/env structure
+ ENV ENABLE_WEB_INTERFACE=true
+ CMD ["sh", "-c", "cd /app/env && uvicorn server.app:app --host 0.0.0.0 --port 8000"]
README.md CHANGED
@@ -1,10 +1,177 @@
  ---
  title: Api Debug Env
- emoji: 🦀
- colorFrom: indigo
- colorTo: green
+ emoji: 🛠️
+ colorFrom: blue
+ colorTo: indigo
  sdk: docker
  pinned: false
+ app_port: 8000
+ base_path: /web
+ tags:
+   - openenv
  ---
+ # API Integration Debugging Environment

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ An OpenEnv environment where AI agents diagnose and fix broken API integrations, a real-world task that developers face daily.
+
+ ## Overview
+
+ Agents interact with a simulated multi-service API ecosystem that has various misconfigurations. Through a `step()/reset()/state()` API, the agent must:
+
+ 1. **Inspect error logs** to identify failure patterns
+ 2. **Inspect service configurations** to find misconfigurations
+ 3. **Test endpoints** to observe current behavior
+ 4. **Submit fixes** with corrected configuration payloads
+
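+ A minimal interaction sketch, mirroring the example in `client.py` (the
+ `payment_client` target comes from the easy task):
+
+ ```python
+ from api_debug_env import ApiDebugAction, ApiDebugEnv
+
+ # Connect to a running server and start an episode
+ with ApiDebugEnv(base_url="http://localhost:8000") as client:
+     result = client.reset()
+     print(result.observation.task_description)
+
+     # Read a service's error logs, then inspect the result
+     result = client.step(ApiDebugAction(
+         action_type="inspect_logs",
+         target="payment_client",
+     ))
+     print(result.observation.logs)
+ ```
+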
+ ## Action Space
+
+ ```python
+ class ApiDebugAction(Action):
+     action_type: str   # "inspect_logs" | "inspect_config" | "inspect_endpoint" | "submit_fix"
+     target: str        # Service name (e.g. "payment_client", "webhook_sender")
+     fix_payload: dict  # Required when action_type="submit_fix"
+ ```
+
+ | Action | Description | Reward |
+ |--------|-------------|--------|
+ | `inspect_logs` | Read error logs for a service | +0.05 (relevant) / +0.15 (finds new issue) |
+ | `inspect_config` | View current config of a service | +0.02 to +0.05 |
+ | `inspect_endpoint` | Test-call an endpoint | +0.02 to +0.05 |
+ | `submit_fix` | Submit a configuration fix | +0.25 (correct) / -0.1 (wrong) |
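+
+ For example, a `submit_fix` for the easy task's Content-Type issue could look
+ like this (the payload key mirrors the issue's `fix_key` in `scenarios.py`):
+
+ ```python
+ ApiDebugAction(
+     action_type="submit_fix",
+     target="payment_client",
+     fix_payload={"headers.Content-Type": "application/json"},
+ )
+ ```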
+
+ ## Observation Space
+
+ ```python
+ class ApiDebugObservation(Observation):
+     task_id: str                  # "easy", "medium", or "hard"
+     task_description: str         # Human-readable task description
+     logs: List[str]               # Error log lines from inspected service
+     config_snapshot: dict         # Configuration of inspected service
+     api_response: dict            # Response from endpoint test
+     hints: List[str]              # Progressive hints based on step count
+     remaining_steps: int          # Steps before episode timeout
+     issues_found: int             # Issues identified so far
+     issues_fixed: int             # Issues correctly fixed so far
+     issues_total: int             # Total issues in scenario
+     action_result: str            # Feedback on last action
+     available_targets: List[str]  # Valid service names
+ ```
+
+ ## Tasks
+
+ ### Task 1: Easy - Payment API Auth Fix
+ - **Issues**: 2 (missing `Authorization` header, wrong `Content-Type`)
+ - **Max Steps**: 15
+ - **Services**: `payment_client`, `payment_gateway`
+ - **Scenario**: Payment gateway rejects requests with 401/415 errors
+
+ ### Task 2: Medium - Webhook Chain Debugging
+ - **Issues**: 3 (rate limit too high, insufficient retries, empty webhook signature)
+ - **Max Steps**: 25
+ - **Services**: `webhook_sender`, `webhook_receiver`, `notification_service`
+ - **Scenario**: Events are dropped across a webhook notification pipeline
+
+ ### Task 3: Hard - Microservice Cascade Failure
+ - **Issues**: 5 (wrong endpoint URL, timeout too short, sync mode race condition, expired auth token, missing token refresh)
+ - **Max Steps**: 40
+ - **Services**: `order_service`, `inventory_service`, `shipping_service`, `api_gateway`, `auth_service`
+ - **Scenario**: E-commerce order processing pipeline fails with cascading 500s
+
+ ## Reward Function
+
+ - **Partial progress**: Every useful inspection earns reward (+0.02 to +0.15)
+ - **Fix rewards**: +0.25 per correctly fixed issue
+ - **Completion bonus**: +0.2 when all issues are resolved
+ - **Penalties**: -0.1 for wrong fixes, -0.05 for invalid actions
+
+ ## Grading
+
+ ```
+ Score = (issues_fixed / issues_total) × efficiency_bonus
+ efficiency_bonus = 1.0 + (remaining_steps / max_steps × 0.3)
+ ```
+
+ Faster fixes earn up to 30% bonus. Score capped at 1.0.
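+
+ Worked example: fixing both easy-task issues with 9 of 15 steps remaining gives
+ 1.0 × (1 + 9/15 × 0.3) = 1.18, which the cap reduces to 1.0.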
+
+ ## Baseline Scores
+
+ | Task | Score | Reward | Issues Found | Issues Fixed | Steps |
+ |------|-------|--------|--------------|--------------|-------|
+ | Easy | 0.0000 | 0.34 | 2/2 | 0/2 | 6 |
+ | Medium | 0.0000 | 0.53 | 3/3 | 0/3 | 9 |
+ | Hard | 0.0000 | 0.87 | 5/5 | 0/5 | 15 |
+
+ > The rule-based baseline only explores (inspects) without submitting fixes, establishing a floor. An LLM agent that also fixes issues will score significantly higher.
+
+ ## Setup & Usage
+
+ ### Prerequisites
+ - Python 3.10+
+ - Docker (for containerized deployment)
+
+ ### Local Development
+
+ ```bash
+ cd api_debug_env
+
+ # Install dependencies
+ uv sync
+
+ # Run server
+ uv run server
+ # or
+ uvicorn server.app:app --reload --host 0.0.0.0 --port 8000
+ ```
+
+ ### Docker
+
+ ```bash
+ cd api_debug_env
+ docker build -t api_debug_env:latest -f server/Dockerfile .
+ docker run -p 8000:8000 api_debug_env:latest
+ ```
+
+ ### Run Baseline
+
+ ```bash
+ # Rule-based baseline (no API key needed)
+ python scripts/baseline_inference.py --mode rule
+
+ # LLM-powered baseline
+ export OPENAI_API_KEY=your-key
+ python scripts/baseline_inference.py --mode llm
+ ```
+
+ ### API Endpoints
+
+ | Endpoint | Method | Description |
+ |----------|--------|-------------|
+ | `/reset` | POST | Reset environment, start new episode |
+ | `/step` | POST | Execute an action |
+ | `/state` | GET | Get current state |
+ | `/tasks` | GET | List all tasks with action schemas |
+ | `/grader` | POST | Get grader score for completed episode |
+ | `/baseline` | POST | Run baseline inference on all tasks |
+ | `/schema` | GET | Get action/observation JSON schemas |
+ | `/ws` | WS | WebSocket for persistent sessions |
+
+ ## Project Structure
+
+ ```
+ api_debug_env/
+ ├── models.py                         # Pydantic Action & Observation models
+ ├── scenarios.py                      # 3 task scenarios with issues, logs, configs
+ ├── client.py                         # WebSocket client for the environment
+ ├── openenv.yaml                      # OpenEnv metadata
+ ├── pyproject.toml                    # Dependencies & build config
+ ├── server/
+ │   ├── app.py                        # FastAPI application
+ │   ├── api_debug_env_environment.py  # Core environment logic
+ │   └── Dockerfile                    # Container build
+ └── scripts/
+     └── baseline_inference.py         # Baseline agent script
+ ```
+
+ ## License
+
+ BSD-style license. See LICENSE file.
__init__.py ADDED
@@ -0,0 +1,16 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """Api Debug Env Environment."""
+
+ from .client import ApiDebugEnv
+ from .models import ApiDebugAction, ApiDebugObservation
+
+ __all__ = [
+     "ApiDebugAction",
+     "ApiDebugObservation",
+     "ApiDebugEnv",
+ ]
client.py ADDED
@@ -0,0 +1,80 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """API Integration Debugging Environment Client."""
+
+ from typing import Dict, List, Optional
+
+ from openenv.core import EnvClient
+ from openenv.core.client_types import StepResult
+ from openenv.core.env_server.types import State
+
+ from .models import ApiDebugAction, ApiDebugObservation
+
+
+ class ApiDebugEnv(
+     EnvClient[ApiDebugAction, ApiDebugObservation, State]
+ ):
+     """
+     Client for the API Integration Debugging Environment.
+
+     Maintains a persistent WebSocket connection to the environment server.
+
+     Example:
+         >>> with ApiDebugEnv(base_url="http://localhost:8000") as client:
+         ...     result = client.reset()
+         ...     print(result.observation.task_description)
+         ...
+         ...     result = client.step(ApiDebugAction(
+         ...         action_type="inspect_logs",
+         ...         target="payment_client"
+         ...     ))
+         ...     print(result.observation.logs)
+     """
+
+     def _step_payload(self, action: ApiDebugAction) -> Dict:
+         """Convert ApiDebugAction to JSON payload."""
+         payload = {
+             "action_type": action.action_type,
+             "target": action.target,
+         }
+         if action.fix_payload is not None:
+             payload["fix_payload"] = action.fix_payload
+         return payload
+
+     def _parse_result(self, payload: Dict) -> StepResult[ApiDebugObservation]:
+         """Parse server response into StepResult[ApiDebugObservation]."""
+         obs_data = payload.get("observation", {})
+         observation = ApiDebugObservation(
+             task_id=obs_data.get("task_id", ""),
+             task_description=obs_data.get("task_description", ""),
+             logs=obs_data.get("logs", []),
+             config_snapshot=obs_data.get("config_snapshot", {}),
+             api_response=obs_data.get("api_response"),
+             hints=obs_data.get("hints", []),
+             remaining_steps=obs_data.get("remaining_steps", 0),
+             issues_found=obs_data.get("issues_found", 0),
+             issues_fixed=obs_data.get("issues_fixed", 0),
+             issues_total=obs_data.get("issues_total", 0),
+             action_result=obs_data.get("action_result", ""),
+             available_targets=obs_data.get("available_targets", []),
+             done=payload.get("done", False),
+             reward=payload.get("reward"),
+             metadata=obs_data.get("metadata", {}),
+         )
+
+         return StepResult(
+             observation=observation,
+             reward=payload.get("reward"),
+             done=payload.get("done", False),
+         )
+
+     def _parse_state(self, payload: Dict) -> State:
+         """Parse server response into State object."""
+         return State(
+             episode_id=payload.get("episode_id"),
+             step_count=payload.get("step_count", 0),
+         )
models.py ADDED
@@ -0,0 +1,71 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """
+ Data models for the API Integration Debugging Environment.
+
+ An agent must diagnose and fix broken API integrations by reading error logs,
+ inspecting configurations, and writing corrected API calls.
+ """
+
+ from typing import Dict, List, Optional
+
+ from openenv.core.env_server.types import Action, Observation
+ from pydantic import Field
+
+
+ class ApiDebugAction(Action):
+     """
+     Agent action: what the agent does each step.
+
+     Supported action_type values:
+     - "inspect_logs"     : Read error logs for a specific service
+     - "inspect_config"   : Inspect the config of a specific service/endpoint
+     - "inspect_endpoint" : Test-call an endpoint to see current response
+     - "submit_fix"       : Submit a fix (requires fix_payload)
+     """
+
+     action_type: str = Field(
+         ...,
+         description="One of: 'inspect_logs', 'inspect_config', 'inspect_endpoint', 'submit_fix'",
+     )
+     target: str = Field(
+         ...,
+         description="The service or component to act on (e.g. 'auth_service', 'webhook_handler', 'service_a')",
+     )
+     fix_payload: Optional[Dict] = Field(
+         default=None,
+         description="Required when action_type='submit_fix'. Dict with the corrected configuration.",
+     )
+
+
+ class ApiDebugObservation(Observation):
+     """
+     What the agent sees after each action.
+
+     Provides error logs, configuration snapshots, API responses,
+     and progress tracking for the debugging task.
+     """
+
+     # Environment context
+     task_id: str = Field(default="", description="Current task identifier (easy/medium/hard)")
+     task_description: str = Field(default="", description="Human-readable description of what needs debugging")
+
+     # Inspection results
+     logs: List[str] = Field(default_factory=list, description="Error log lines visible to the agent")
+     config_snapshot: Dict = Field(default_factory=dict, description="Current configuration of the inspected component")
+     api_response: Optional[Dict] = Field(default=None, description="Response from testing the current endpoint config")
+     hints: List[str] = Field(default_factory=list, description="Progressive hints based on step count")
+
+     # Progress tracking
+     remaining_steps: int = Field(default=0, description="Steps remaining before episode timeout")
+     issues_found: int = Field(default=0, description="Issues the agent has correctly identified so far")
+     issues_fixed: int = Field(default=0, description="Issues the agent has correctly fixed so far")
+     issues_total: int = Field(default=0, description="Total issues in the current scenario")
+
+     # Feedback
+     action_result: str = Field(default="", description="Feedback on the last action taken (e.g. 'Fix accepted', 'Wrong fix')")
+     available_targets: List[str] = Field(default_factory=list, description="List of valid targets the agent can inspect/fix")
openenv.yaml ADDED
@@ -0,0 +1,30 @@
+ spec_version: 1
+ name: api_debug_env
+ type: space
+ runtime: fastapi
+ app: server.app:app
+ port: 8000
+
+ description: >
+   API Integration Debugging Environment - an AI agent must diagnose and fix
+   broken API integrations by reading error logs, inspecting configurations,
+   and submitting corrected API calls.
+
+ tasks:
+   - id: easy
+     description: "Fix missing Authorization header and wrong Content-Type in a payment API client"
+     difficulty: easy
+     max_steps: 15
+     issues_count: 2
+
+   - id: medium
+     description: "Debug a webhook chain with rate limiting, retry, and signature validation failures"
+     difficulty: medium
+     max_steps: 25
+     issues_count: 3
+
+   - id: hard
+     description: "Diagnose cascading failures across a 3-service order processing pipeline"
+     difficulty: hard
+     max_steps: 40
+     issues_count: 5
pyproject.toml ADDED
@@ -0,0 +1,45 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ [build-system]
+ requires = ["setuptools>=45", "wheel"]
+ build-backend = "setuptools.build_meta"
+
+ [project]
+ name = "openenv-api_debug_env"
+ version = "0.1.0"
+ description = "Api Debug Env environment for OpenEnv"
+ requires-python = ">=3.10"
+ dependencies = [
+     # Core OpenEnv runtime (provides FastAPI server + HTTP client types)
+     # install from github
+     # "openenv-core[core] @ git+https://github.com/meta-pytorch/OpenEnv.git",
+     "openenv-core[core]>=0.2.1",
+     # Environment-specific dependencies
+     # Add all dependencies needed for your environment here
+     # Examples:
+     # "numpy>=1.19.0",
+     # "torch>=2.0.0",
+     # "gymnasium>=0.29.0",
+     # "openspiel>=1.0.0",
+     # "smolagents>=1.22.0,<2",
+ ]
+
+ [project.optional-dependencies]
+ dev = [
+     "pytest>=8.0.0",
+     "pytest-cov>=4.0.0",
+ ]
+
+ [project.scripts]
+ # Server entry point - enables running via: uv run --project . server
+ # or: python -m api_debug_env.server.app
+ server = "api_debug_env.server.app:main"
+
+ [tool.setuptools]
+ include-package-data = true
+ packages = ["api_debug_env", "api_debug_env.server"]
+ package-dir = { "api_debug_env" = ".", "api_debug_env.server" = "server" }
scenarios.py ADDED
@@ -0,0 +1,375 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """
+ Scenario definitions for the API Integration Debugging Environment.
+
+ Each scenario defines a broken API integration that the agent must diagnose and fix.
+ Scenarios contain: services, their configs, error logs, issues, and expected fixes.
+ """
+
+ from dataclasses import dataclass, field
+ from typing import Any, Dict, List
+
+
+ @dataclass
+ class Issue:
+     """A single issue in an API integration scenario."""
+     issue_id: str
+     service: str
+     description: str
+     expected_fix: Dict[str, Any]
+     fix_key: str   # The key in the config that needs fixing
+     log_hint: str  # Log line that hints at this issue
+
+
+ @dataclass
+ class Scenario:
+     """A complete API debugging scenario."""
+     task_id: str
+     difficulty: str
+     description: str
+     max_steps: int
+     services: List[str]
+     configs: Dict[str, Dict[str, Any]]
+     logs: Dict[str, List[str]]
+     issues: List[Issue]
+
+
+ def get_scenario(task_id: str) -> Scenario:
+     """Load a scenario by task ID."""
+     scenarios = {
+         "easy": _easy_scenario(),
+         "medium": _medium_scenario(),
+         "hard": _hard_scenario(),
+     }
+     if task_id not in scenarios:
+         raise ValueError(f"Unknown task_id: {task_id}. Must be one of: {list(scenarios.keys())}")
+     return scenarios[task_id]
+
+
+ def get_all_task_ids() -> List[str]:
+     """Return all available task IDs."""
+     return ["easy", "medium", "hard"]
+
+
+ # ─── Easy Scenario ───────────────────────────────────────────────────────────
+
+ def _easy_scenario() -> Scenario:
+     """
+     Easy: Missing Authorization header + wrong Content-Type in a payment API.
+     Agent must inspect logs, find the two issues, and submit fixes.
+     """
+     return Scenario(
+         task_id="easy",
+         difficulty="easy",
+         description=(
+             "A payment processing API integration is failing. "
+             "The client is sending requests to the payment gateway but getting 401 and 415 errors. "
+             "Diagnose and fix the API client configuration."
+         ),
+         max_steps=15,
+         services=["payment_client", "payment_gateway"],
+         configs={
+             "payment_client": {
+                 "base_url": "https://api.paymentgateway.com/v2",
+                 "headers": {
+                     "Content-Type": "text/plain",  # BUG: should be application/json
+                     "Accept": "application/json",
+                     # BUG: missing Authorization header
+                 },
+                 "timeout": 30,
+                 "retry_count": 3,
+             },
+             "payment_gateway": {
+                 "endpoint": "/process",
+                 "method": "POST",
+                 "required_headers": ["Authorization", "Content-Type"],
+                 "accepted_content_types": ["application/json"],
+                 "auth_scheme": "Bearer",
+             },
+         },
+         logs={
+             "payment_client": [
+                 "[ERROR] 2026-03-25T10:15:23Z POST /process -> 401 Unauthorized",
+                 "[ERROR] 2026-03-25T10:15:23Z Response: {'error': 'Missing or invalid Authorization header'}",
+                 "[WARN] 2026-03-25T10:15:22Z Request headers: Content-Type=text/plain, Accept=application/json",
+                 "[ERROR] 2026-03-25T10:15:24Z POST /process -> 415 Unsupported Media Type",
+                 "[ERROR] 2026-03-25T10:15:24Z Response: {'error': 'Content-Type must be application/json'}",
+                 "[INFO] 2026-03-25T10:15:20Z Payment client initialized with base_url=https://api.paymentgateway.com/v2",
+             ],
+             "payment_gateway": [
+                 "[WARN] 2026-03-25T10:15:23Z Rejected request: no Authorization header present",
+                 "[WARN] 2026-03-25T10:15:24Z Rejected request: unsupported Content-Type 'text/plain'",
+                 "[INFO] 2026-03-25T10:15:20Z Gateway ready, accepting application/json with Bearer auth",
+             ],
+         },
+         issues=[
+             Issue(
+                 issue_id="easy_auth",
+                 service="payment_client",
+                 description="Missing Authorization header in payment client",
+                 expected_fix={"headers.Authorization": "Bearer <token>"},
+                 fix_key="headers.Authorization",
+                 log_hint="Missing or invalid Authorization header",
+             ),
+             Issue(
+                 issue_id="easy_content_type",
+                 service="payment_client",
+                 description="Wrong Content-Type header (text/plain instead of application/json)",
+                 expected_fix={"headers.Content-Type": "application/json"},
+                 fix_key="headers.Content-Type",
+                 log_hint="Content-Type must be application/json",
+             ),
+         ],
+     )
+
+
+ # ─── Medium Scenario ─────────────────────────────────────────────────────────
+
+ def _medium_scenario() -> Scenario:
+     """
+     Medium: Webhook chain with rate limiting misconfiguration,
+     incorrect retry logic, and missing signature validation.
+     """
+     return Scenario(
+         task_id="medium",
+         difficulty="medium",
+         description=(
+             "A webhook-based notification system is dropping events. "
+             "Service A sends webhooks to Service B, which forwards to Service C. "
+             "Events are being lost with 429, retry exhaustion, and signature validation failures. "
+             "Fix the webhook chain configuration."
+         ),
+         max_steps=25,
+         services=["webhook_sender", "webhook_receiver", "notification_service"],
+         configs={
+             "webhook_sender": {
+                 "target_url": "https://receiver.internal/webhook",
+                 "headers": {
+                     "Content-Type": "application/json",
+                     "X-Webhook-Signature": "",  # BUG: empty signature
+                 },
+                 "rate_limit": {
+                     "requests_per_second": 100,  # BUG: too high, receiver allows 10/s
+                     "burst_size": 200,
+                 },
+                 "retry": {
+                     "max_retries": 1,          # BUG: should be at least 3
+                     "backoff_factor": 0,       # BUG: no backoff
+                     "retry_on_status": [500],  # BUG: should also retry on 429
+                 },
+                 "signing_secret": "whsec_abc123secret",
+             },
+             "webhook_receiver": {
+                 "endpoint": "/webhook",
+                 "rate_limit": {
+                     "requests_per_second": 10,
+                     "burst_size": 20,
+                 },
+                 "signature_validation": True,
+                 "expected_signature_header": "X-Webhook-Signature",
+                 "signing_secret": "whsec_abc123secret",
+                 "forward_to": "https://notifications.internal/notify",
+             },
+             "notification_service": {
+                 "endpoint": "/notify",
+                 "accepts_from": ["webhook_receiver"],
+                 "status": "healthy",
+             },
+         },
+         logs={
+             "webhook_sender": [
+                 "[ERROR] 2026-03-25T11:00:01Z POST /webhook -> 429 Too Many Requests",
+                 "[ERROR] 2026-03-25T11:00:01Z Rate limited. Retry-After: 5s",
+                 "[WARN] 2026-03-25T11:00:02Z Retry attempt 1/1 failed. No more retries.",
+                 "[ERROR] 2026-03-25T11:00:03Z Event evt_12345 dropped after retry exhaustion",
+                 "[WARN] 2026-03-25T11:00:00Z Sending at 100 req/s (burst=200)",
+                 "[INFO] 2026-03-25T10:59:59Z Webhook sender started. Signature header: X-Webhook-Signature",
+             ],
+             "webhook_receiver": [
+                 "[WARN] 2026-03-25T11:00:01Z Rate limit exceeded: 100 req/s > 10 req/s allowed",
+                 "[ERROR] 2026-03-25T11:00:02Z Signature validation FAILED: received empty signature",
+                 "[WARN] 2026-03-25T11:00:02Z Dropping event: invalid signature from webhook_sender",
+                 "[INFO] 2026-03-25T10:59:59Z Receiver ready. Rate limit: 10 req/s. Signature validation: ON",
+             ],
+             "notification_service": [
+                 "[WARN] 2026-03-25T11:00:05Z No events received in last 60s",
+                 "[INFO] 2026-03-25T10:59:59Z Notification service healthy. Waiting for events.",
+             ],
+         },
+         issues=[
+             Issue(
+                 issue_id="medium_rate_limit",
+                 service="webhook_sender",
+                 description="Rate limit too high (100/s vs receiver's 10/s limit)",
+                 expected_fix={"rate_limit.requests_per_second": 10},
+                 fix_key="rate_limit.requests_per_second",
+                 log_hint="Rate limit exceeded: 100 req/s > 10 req/s allowed",
+             ),
+             Issue(
+                 issue_id="medium_retry",
+                 service="webhook_sender",
+                 description="Insufficient retry config: only 1 retry, no backoff, missing 429 in retry_on_status",
+                 expected_fix={
+                     "retry.max_retries": 3,
+                     "retry.backoff_factor": 2,
+                     "retry.retry_on_status": [429, 500],
+                 },
+                 fix_key="retry",
+                 log_hint="Retry attempt 1/1 failed. No more retries.",
+             ),
+             Issue(
+                 issue_id="medium_signature",
+                 service="webhook_sender",
+                 description="Webhook signature header is empty; receiver rejects unsigned events",
+                 expected_fix={"headers.X-Webhook-Signature": "sha256=<computed>"},
+                 fix_key="headers.X-Webhook-Signature",
+                 log_hint="Signature validation FAILED: received empty signature",
+             ),
+         ],
+     )
+
+
+ # ─── Hard Scenario ────────────────────────────────────────────────────────────
+
+ def _hard_scenario() -> Scenario:
+     """
+     Hard: Race condition in a 3-service order processing chain.
+     Service A (order) -> Service B (inventory) -> Service C (shipping).
+     Cascading 500s due to ordering issues, wrong URLs, missing timeouts, and auth failures.
+     """
+     return Scenario(
+         task_id="hard",
+         difficulty="hard",
+         description=(
+             "An e-commerce order processing pipeline is failing with cascading errors. "
+             "Order Service sends to Inventory Service, which sends to Shipping Service. "
+             "Requests are timing out, hitting wrong endpoints, failing auth, and "
+             "the ordering causes race conditions. Fix all 5 issues across the chain."
+         ),
+         max_steps=40,
+         services=["order_service", "inventory_service", "shipping_service", "api_gateway", "auth_service"],
+         configs={
+             "order_service": {
+                 "name": "order_service",
+                 "inventory_url": "https://inventory.internal/v1/check",  # BUG: wrong path, should be /v2/reserve
+                 "headers": {
+                     "Content-Type": "application/json",
+                     "Authorization": "Bearer valid_token_123",
+                 },
+                 "timeout": 2,         # BUG: too short for inventory, which needs 5s
+                 "async_mode": False,  # BUG: should be True to avoid race condition
+                 "callback_url": "https://orders.internal/callback",
+             },
+             "inventory_service": {
+                 "name": "inventory_service",
+                 "endpoint_version": "v2",
+                 "reserve_path": "/v2/reserve",
+                 "check_path": "/v2/check",
+                 "shipping_url": "https://shipping.internal/v1/create",
+                 "headers": {
+                     "Content-Type": "application/json",
+                     "Authorization": "Bearer expired_token_456",  # BUG: expired token
+                 },
+                 "timeout": 10,
+                 "processing_time_avg": 4,  # seconds - this is why order_service's 2s timeout fails
+             },
+             "shipping_service": {
+                 "name": "shipping_service",
+                 "create_path": "/v1/create",
+                 "requires_auth": True,
+                 "accepted_auth": ["Bearer"],
+                 "token_validation_url": "https://auth.internal/validate",
+                 "status": "healthy",
+             },
+             "api_gateway": {
+                 "routes": {
+                     "/v1/check": "DEPRECATED - use /v2/check",
+                     "/v2/reserve": "inventory_service",
+                     "/v2/check": "inventory_service",
+                     "/v1/create": "shipping_service",
+                 },
+                 "timeout": 30,
+             },
+             "auth_service": {
+                 "valid_tokens": ["valid_token_123", "valid_token_789"],
+                 "expired_tokens": ["expired_token_456"],
+                 "token_refresh_endpoint": "/refresh",
+             },
+         },
+         logs={
+             "order_service": [
+                 "[ERROR] 2026-03-25T12:00:05Z POST inventory.internal/v1/check -> 301 Moved Permanently",
+                 "[ERROR] 2026-03-25T12:00:05Z Response: {'error': 'Endpoint deprecated. Use /v2/reserve'}",
+                 "[ERROR] 2026-03-25T12:00:07Z Timeout after 2s waiting for inventory response",
+                 "[ERROR] 2026-03-25T12:00:07Z Order ord_999 failed: inventory check timed out",
+                 "[WARN] 2026-03-25T12:00:08Z Synchronous mode: blocking on inventory response",
+                 "[ERROR] 2026-03-25T12:00:09Z Race condition: order ord_998 processed before ord_997 completed",
+             ],
+             "inventory_service": [
+                 "[INFO] 2026-03-25T12:00:05Z Received request on /v1/check -> redirecting to /v2/check",
+                 "[WARN] 2026-03-25T12:00:06Z Processing reservation... avg time: 4s",
+                 "[ERROR] 2026-03-25T12:00:10Z POST shipping.internal/v1/create -> 401 Unauthorized",
+                 "[ERROR] 2026-03-25T12:00:10Z Auth token expired_token_456 is no longer valid",
+                 "[ERROR] 2026-03-25T12:00:10Z Cannot create shipment: authentication failed",
+             ],
+             "shipping_service": [
+                 "[WARN] 2026-03-25T12:00:10Z Rejected request: token 'expired_token_456' is expired",
+                 "[INFO] 2026-03-25T12:00:00Z Shipping service healthy, awaiting authenticated requests",
+             ],
+             "api_gateway": [
+                 "[WARN] 2026-03-25T12:00:05Z Deprecated endpoint /v1/check accessed by order_service",
+                 "[INFO] 2026-03-25T12:00:05Z Redirecting /v1/check -> /v2/check (301)",
+             ],
+             "auth_service": [
+                 "[WARN] 2026-03-25T12:00:10Z Token validation failed: expired_token_456 expired at 2026-03-24T00:00:00Z",
+                 "[INFO] 2026-03-25T12:00:00Z Auth service ready. Valid tokens: 2, Expired: 1",
+             ],
+         },
+         issues=[
+             Issue(
+                 issue_id="hard_wrong_url",
+                 service="order_service",
+                 description="Order service calling deprecated /v1/check instead of /v2/reserve",
+                 expected_fix={"inventory_url": "https://inventory.internal/v2/reserve"},
+                 fix_key="inventory_url",
+                 log_hint="Endpoint deprecated. Use /v2/reserve",
+             ),
+             Issue(
+                 issue_id="hard_timeout",
+                 service="order_service",
+                 description="Timeout too short (2s) for inventory service that takes ~4s to process",
+                 expected_fix={"timeout": 10},
+                 fix_key="timeout",
+                 log_hint="Timeout after 2s waiting for inventory response",
+             ),
+             Issue(
+                 issue_id="hard_async",
+                 service="order_service",
+                 description="Synchronous mode causes race conditions between concurrent orders",
+                 expected_fix={"async_mode": True},
+                 fix_key="async_mode",
+                 log_hint="Race condition: order ord_998 processed before ord_997 completed",
+             ),
+             Issue(
+                 issue_id="hard_expired_token",
+                 service="inventory_service",
+                 description="Expired auth token used for shipping service requests",
+                 expected_fix={"headers.Authorization": "Bearer valid_token_789"},
+                 fix_key="headers.Authorization",
+                 log_hint="Auth token expired_token_456 is no longer valid",
+             ),
+             Issue(
+                 issue_id="hard_token_refresh",
+                 service="inventory_service",
+                 description="No automatic token refresh mechanism configured",
+                 expected_fix={"token_refresh_url": "https://auth.internal/refresh", "auto_refresh": True},
+                 fix_key="token_refresh_url",
+                 log_hint="Token validation failed: expired_token_456 expired",
+             ),
+         ],
+     )
scripts/baseline_inference.py ADDED
@@ -0,0 +1,223 @@
+ """
+ Baseline inference script for the API Integration Debugging Environment.
+
+ This script demonstrates an LLM-powered agent interacting with the environment
+ using the OpenAI API. It runs all 3 tasks (easy, medium, hard) and reports
+ baseline scores.
+
+ Usage:
+     # Set your OpenAI API key
+     export OPENAI_API_KEY=your-key-here
+
+     # Run the rule-based baseline (default)
+     python scripts/baseline_inference.py
+
+     # Or run the LLM-powered baseline
+     python scripts/baseline_inference.py --mode llm
+ """
+
+ import argparse
+ import json
+ import os
+ import sys
+ from typing import Any, Dict, List, Optional
+
+ # Add parent directory to path
+ sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+
+ from models import ApiDebugAction, ApiDebugObservation
+ from scenarios import get_all_task_ids, get_scenario
+ from server.api_debug_env_environment import ApiDebugEnvironment
+
+
+ def run_rule_based_baseline(task_id: str) -> Dict[str, Any]:
+     """
+     Run a simple rule-based baseline agent (no LLM needed).
+
+     Strategy:
+     1. Inspect all logs
+     2. Inspect all configs
+     3. Test all endpoints
+     (Does not attempt fixes; tests reward signal for exploration-only behavior)
+     """
+     env = ApiDebugEnvironment(task_id=task_id)
+     obs = env.reset()
+     total_reward = 0.0
+
+     # Phase 1: Inspect all logs
+     for service in obs.available_targets:
+         if obs.done:
+             break
+         obs = env.step(ApiDebugAction(action_type="inspect_logs", target=service))
+         total_reward += obs.reward
+
+     # Phase 2: Inspect all configs
+     for service in obs.available_targets:
+         if obs.done:
+             break
+         obs = env.step(ApiDebugAction(action_type="inspect_config", target=service))
+         total_reward += obs.reward
+
+     # Phase 3: Test all endpoints
+     for service in obs.available_targets:
+         if obs.done:
+             break
+         obs = env.step(ApiDebugAction(action_type="inspect_endpoint", target=service))
+         total_reward += obs.reward
+
+     score = env.grade()
+     return {
+         "task_id": task_id,
+         "score": score,
+         "total_reward": round(total_reward, 4),
+         "steps_used": env._state.step_count,
+         "issues_found": len(env._issues_found),
+         "issues_fixed": len(env._issues_fixed),
+         "issues_total": len(env._scenario.issues) if env._scenario else 0,
+     }
+
+
+ def run_llm_baseline(task_id: str, api_key: Optional[str] = None) -> Dict[str, Any]:
+     """
+     Run an LLM-powered baseline agent using OpenAI API.
+
+     The LLM reads observations and decides what to do next.
+     """
+     try:
+         from openai import OpenAI
+     except ImportError:
+         print("OpenAI package not installed. Running rule-based baseline instead.")
+         return run_rule_based_baseline(task_id)
+
+     key = api_key or os.environ.get("OPENAI_API_KEY")
+     if not key:
+         print("No OPENAI_API_KEY set. Running rule-based baseline instead.")
+         return run_rule_based_baseline(task_id)
+
+     client = OpenAI(api_key=key)
+     env = ApiDebugEnvironment(task_id=task_id)
+     obs = env.reset()
+     total_reward = 0.0
+
+     system_prompt = f"""You are an API debugging agent. Your task: {obs.task_description}
+
+ Available actions:
+ - inspect_logs: Read error logs for a service
+ - inspect_config: See the configuration of a service
+ - inspect_endpoint: Test-call an endpoint
+ - submit_fix: Submit a config fix (requires fix_payload dict)
+
+ Available targets: {obs.available_targets}
+ Total issues to fix: {obs.issues_total}
+
+ Respond with JSON: {{"action_type": "...", "target": "...", "fix_payload": {{...}} }}
+ Only include fix_payload when action_type is "submit_fix"."""
+
+     messages = [{"role": "system", "content": system_prompt}]
+
+     while not obs.done:
+         # Build observation message
+         obs_text = f"""Step {env._state.step_count}/{env._scenario.max_steps if env._scenario else '?'}
+ Remaining steps: {obs.remaining_steps}
+ Issues found: {obs.issues_found}/{obs.issues_total}
+ Issues fixed: {obs.issues_fixed}/{obs.issues_total}
+ Last action result: {obs.action_result}"""
+
+         if obs.logs:
+             obs_text += "\nLogs:\n" + "\n".join(obs.logs)
+         if obs.config_snapshot:
+             obs_text += f"\nConfig: {json.dumps(obs.config_snapshot, indent=2)}"
+         if obs.api_response:
+             obs_text += f"\nAPI Response: {json.dumps(obs.api_response, indent=2)}"
+         if obs.hints:
+             obs_text += f"\nHints: {'; '.join(obs.hints)}"
+
+         messages.append({"role": "user", "content": obs_text})
+
+         try:
+             response = client.chat.completions.create(
+                 model="gpt-4o-mini",
+                 messages=messages,
+                 temperature=0.2,
+                 max_tokens=500,
+                 response_format={"type": "json_object"},
+             )
+
+             action_json = json.loads(response.choices[0].message.content)
+             messages.append({"role": "assistant", "content": json.dumps(action_json)})
+
+             action = ApiDebugAction(
+                 action_type=action_json.get("action_type", "inspect_logs"),
+                 target=action_json.get("target", obs.available_targets[0] if obs.available_targets else ""),
+                 fix_payload=action_json.get("fix_payload"),
+             )
+         except Exception as e:
+             print(f"  LLM error: {e}. Falling back to inspect_logs.")
+             action = ApiDebugAction(
+                 action_type="inspect_logs",
+                 target=obs.available_targets[0] if obs.available_targets else "",
+             )
+
+         obs = env.step(action)
+         total_reward += obs.reward
+
+     score = env.grade()
+     return {
+         "task_id": task_id,
+         "score": score,
+         "total_reward": round(total_reward, 4),
+         "steps_used": env._state.step_count,
+         "issues_found": len(env._issues_found),
+         "issues_fixed": len(env._issues_fixed),
+         "issues_total": len(env._scenario.issues) if env._scenario else 0,
+     }
+
+
+ def main():
+     parser = argparse.ArgumentParser(description="Baseline inference for API Debug Env")
+     parser.add_argument("--mode", choices=["rule", "llm"], default="rule",
+                         help="Baseline mode: 'rule' for rule-based, 'llm' for LLM-powered")
+     parser.add_argument("--api-key", type=str, default=None,
+                         help="OpenAI API key (or set OPENAI_API_KEY env var)")
+     parser.add_argument("--task", type=str, default=None,
+                         help="Run specific task only (easy/medium/hard)")
+     args = parser.parse_args()
+
+     print("=" * 60)
+     print("API Integration Debugging - Baseline Inference")
+     print("=" * 60)
+
+     task_ids = [args.task] if args.task else get_all_task_ids()
+     all_results = {}
+
+     for task_id in task_ids:
+         print(f"\n{'─' * 40}")
+         print(f"Task: {task_id}")
+         print(f"{'─' * 40}")
+
+         if args.mode == "llm":
+             result = run_llm_baseline(task_id, args.api_key)
+         else:
+             result = run_rule_based_baseline(task_id)
+
+         all_results[task_id] = result
+         print(f"  Score: {result['score']}")
+         print(f"  Reward: {result['total_reward']}")
+         print(f"  Steps: {result['steps_used']}")
+         print(f"  Issues found: {result['issues_found']}/{result['issues_total']}")
+         print(f"  Issues fixed: {result['issues_fixed']}/{result['issues_total']}")
+
+     print(f"\n{'=' * 60}")
+     print("Summary")
+     print(f"{'=' * 60}")
+     for tid, res in all_results.items():
+         print(f"  {tid:8s} score={res['score']:.4f} fixed={res['issues_fixed']}/{res['issues_total']}")
+
+     avg_score = sum(r["score"] for r in all_results.values()) / len(all_results)
+     print(f"\n  Average score: {avg_score:.4f}")
+
+     return all_results
+
+
+ if __name__ == "__main__":
+     main()
server/__init__.py ADDED
@@ -0,0 +1,11 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """Api Debug Env environment server components."""
+
+ from .api_debug_env_environment import ApiDebugEnvironment
+
+ __all__ = ["ApiDebugEnvironment"]
server/api_debug_env_environment.py ADDED
@@ -0,0 +1,446 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """
+ API Integration Debugging Environment Implementation.
+
+ A real-world environment where an AI agent diagnoses and fixes broken
+ API integrations by reading error logs, inspecting configurations,
+ and submitting corrected configurations.
+ """
+
+ import copy
+ from typing import Any, Dict, List, Optional, Set
+ from uuid import uuid4
+
+ from openenv.core.env_server.interfaces import Environment
+ from openenv.core.env_server.types import State
+
+ try:
+     from ..models import ApiDebugAction, ApiDebugObservation
+     from ..scenarios import Issue, Scenario, get_all_task_ids, get_scenario
+ except ImportError:
+     from models import ApiDebugAction, ApiDebugObservation
+     from scenarios import Issue, Scenario, get_all_task_ids, get_scenario
+
+
+ class ApiDebugEnvironment(Environment):
+     """
+     API Integration Debugging Environment.
+
+     An agent must diagnose and fix broken API integrations by:
+     1. Inspecting error logs to identify issues
+     2. Inspecting service configurations
+     3. Testing endpoints to observe failures
+     4. Submitting configuration fixes
+
+     Supports 3 difficulty levels (easy, medium, hard) with different
+     numbers of issues and complexity.
+     """
+
+     SUPPORTS_CONCURRENT_SESSIONS: bool = True
+
+     def __init__(self, task_id: str = "easy"):
+         """
+         Initialize the environment.
+
+         Args:
+             task_id: One of 'easy', 'medium', 'hard'
+         """
+         self._task_id = task_id
+         self._state = State(episode_id=str(uuid4()), step_count=0)
+         self._scenario: Optional[Scenario] = None
+         self._current_configs: Dict[str, Dict[str, Any]] = {}
+         self._issues_found: Set[str] = set()
+         self._issues_fixed: Set[str] = set()
+         self._inspected_targets: Set[str] = set()
+         self._done = False
+         self._last_action_result = ""
+         self._cumulative_reward = 0.0
+
+     def reset(self, task_id: Optional[str] = None) -> ApiDebugObservation:
+         """
+         Reset the environment, optionally with a new task.
+
+         Args:
+             task_id: Override the task difficulty. One of 'easy', 'medium', 'hard'.
+
+         Returns:
+             Initial observation with task description and available targets.
+         """
+         if task_id is not None:
+             self._task_id = task_id
+
+         self._state = State(episode_id=str(uuid4()), step_count=0)
+         self._scenario = get_scenario(self._task_id)
+         self._current_configs = copy.deepcopy(self._scenario.configs)
+         self._issues_found = set()
+         self._issues_fixed = set()
+         self._inspected_targets = set()
+         self._done = False
+         self._last_action_result = ""
+         self._cumulative_reward = 0.0
+
+         return ApiDebugObservation(
+             task_id=self._task_id,
+             task_description=self._scenario.description,
+             logs=[],
+             config_snapshot={},
+             api_response=None,
+             hints=self._get_hints(),
+             remaining_steps=self._scenario.max_steps,
+             issues_found=0,
+             issues_fixed=0,
+             issues_total=len(self._scenario.issues),
+             action_result="Environment reset. Use 'inspect_logs' or 'inspect_config' to start debugging.",
+             available_targets=self._scenario.services,
+             done=False,
+             reward=0.0,
+         )
+
+     def step(self, action: ApiDebugAction) -> ApiDebugObservation:  # type: ignore[override]
+         """
+         Execute one debugging step.
+
+         Args:
+             action: ApiDebugAction with action_type, target, and optional fix_payload
+
+         Returns:
+             ApiDebugObservation with results of the action
+         """
+         if self._scenario is None:
+             # Auto-reset if not initialized
+             self.reset()
+
+         assert self._scenario is not None  # for type checker
+
+         self._state.step_count += 1
+         reward = 0.0
+         logs: List[str] = []
+         config_snapshot: Dict[str, Any] = {}
+         api_response: Optional[Dict[str, Any]] = None
+
+         # Validate target
+         if action.target not in self._scenario.services:
+             self._last_action_result = (
+                 f"Invalid target '{action.target}'. "
+                 f"Valid targets: {self._scenario.services}"
+             )
+             reward = -0.05
+         elif action.action_type == "inspect_logs":
+             logs, reward = self._handle_inspect_logs(action.target)
+         elif action.action_type == "inspect_config":
+             config_snapshot, reward = self._handle_inspect_config(action.target)
+         elif action.action_type == "inspect_endpoint":
+             api_response, reward = self._handle_inspect_endpoint(action.target)
+         elif action.action_type == "submit_fix":
+             reward = self._handle_submit_fix(action.target, action.fix_payload or {})
+         else:
+             self._last_action_result = (
+                 f"Invalid action_type '{action.action_type}'. "
+                 "Valid types: inspect_logs, inspect_config, inspect_endpoint, submit_fix"
+             )
+             reward = -0.05
+
+         self._cumulative_reward += reward
+
+         # Check episode termination
+         remaining = self._scenario.max_steps - self._state.step_count
+         all_fixed = len(self._issues_fixed) == len(self._scenario.issues)
+
+         if all_fixed:
+             self._done = True
+             reward += 0.2  # completion bonus
+             self._cumulative_reward += 0.2
+             self._last_action_result += " 🎉 All issues fixed! Episode complete."
+
+         if remaining <= 0 and not self._done:
+             self._done = True
+             self._last_action_result += " ⏰ Out of steps. Episode ended."
+
+         return ApiDebugObservation(
+             task_id=self._task_id,
+             task_description=self._scenario.description,
+             logs=logs,
+             config_snapshot=config_snapshot,
+             api_response=api_response,
+             hints=self._get_hints(),
+             remaining_steps=max(0, remaining),
+             issues_found=len(self._issues_found),
+             issues_fixed=len(self._issues_fixed),
+             issues_total=len(self._scenario.issues),
+             action_result=self._last_action_result,
+             available_targets=self._scenario.services,
+             done=self._done,
+             reward=reward,
+             metadata={
+                 "cumulative_reward": self._cumulative_reward,
+                 "step": self._state.step_count,
+                 "issues_found_ids": list(self._issues_found),
+                 "issues_fixed_ids": list(self._issues_fixed),
+             },
+         )
+
+     @property
+     def state(self) -> State:
+         """Get current environment state."""
+         return self._state
+
+     # ─── Action Handlers ──────────────────────────────────────────────────
+
+     def _handle_inspect_logs(self, target: str) -> tuple:
+         """Return logs for a service and reward for relevant inspection."""
+         assert self._scenario is not None
+         logs = self._scenario.logs.get(target, [])
+         self._inspected_targets.add(f"logs:{target}")
+
+         # Check if any unfound issues have log hints in these logs
+         found_new = False
+         for issue in self._scenario.issues:
+             if issue.issue_id not in self._issues_found:
+                 for log_line in logs:
+                     if issue.log_hint in log_line:
+                         self._issues_found.add(issue.issue_id)
+                         found_new = True
+
+         if found_new:
+             reward = 0.15
+             self._last_action_result = f"Inspected logs for '{target}'. Found relevant error patterns!"
+         elif logs:
+             reward = 0.05
+             self._last_action_result = f"Inspected logs for '{target}'. {len(logs)} log entries found."
+         else:
+             reward = 0.0
+             self._last_action_result = f"No logs available for '{target}'."
+
+         return logs, reward
+
+     def _handle_inspect_config(self, target: str) -> tuple:
+         """Return current config for a service."""
+         assert self._scenario is not None
+         config = self._current_configs.get(target, {})
+         self._inspected_targets.add(f"config:{target}")
+
+         # Small reward for inspecting a service that has issues
+         has_issues = any(i.service == target for i in self._scenario.issues if i.issue_id not in self._issues_fixed)
+         reward = 0.05 if has_issues else 0.02
+
+         self._last_action_result = f"Inspected config for '{target}'. Configuration retrieved."
+         return config, reward
+
+     def _handle_inspect_endpoint(self, target: str) -> tuple:
+         """Simulate testing an endpoint and return the response."""
+         assert self._scenario is not None
+
+         # Find unfixed issues for this service
+         unfixed = [
+             i for i in self._scenario.issues
+             if i.service == target and i.issue_id not in self._issues_fixed
+         ]
+
+         if unfixed:
+             # Simulate a failure based on the first unfixed issue
+             issue = unfixed[0]
+             api_response = {
+                 "status": "error",
+                 "status_code": 401 if "auth" in issue.issue_id else 500,
+                 "error": issue.description,
+                 "hint": f"Check the {issue.fix_key} configuration",
+             }
+             reward = 0.05
+             self._last_action_result = f"Tested endpoint on '{target}'. Got error response."
+         else:
+             api_response = {
+                 "status": "success",
+                 "status_code": 200,
+                 "message": f"{target} is working correctly.",
+             }
+             reward = 0.02
+             self._last_action_result = f"Tested endpoint on '{target}'. Service responding OK."
+
+         return api_response, reward
+
+     def _handle_submit_fix(self, target: str, fix_payload: Dict[str, Any]) -> float:
+         """Process a fix submission and score it."""
+         assert self._scenario is not None
+
+         if not fix_payload:
+             self._last_action_result = "Fix rejected: fix_payload cannot be empty."
+             return -0.1
+
+         # Find issues for this target service
+         target_issues = [
+             i for i in self._scenario.issues
+             if i.service == target and i.issue_id not in self._issues_fixed
+         ]
+
+         if not target_issues:
+             self._last_action_result = f"No unfixed issues found for '{target}'."
+             return -0.05
+
+         reward = 0.0
+         fixed_any = False
+
+         for issue in target_issues:
+             if self._check_fix(issue, fix_payload):
+                 self._issues_fixed.add(issue.issue_id)
+                 self._issues_found.add(issue.issue_id)  # finding + fixing counts
+                 self._apply_fix(target, fix_payload)
+                 reward += 0.25
+                 fixed_any = True
+
+         if fixed_any:
+             fixed_count = sum(1 for i in target_issues if i.issue_id in self._issues_fixed)
+             self._last_action_result = (
+                 f"Fix accepted for '{target}'! "
+                 f"Fixed {fixed_count} issue(s). "
+                 f"Total fixed: {len(self._issues_fixed)}/{len(self._scenario.issues)}"
+             )
+         else:
+             self._last_action_result = (
+                 f"Fix rejected for '{target}'. The payload doesn't address any known issues. "
+                 "Try inspecting logs and config to identify the correct fix."
+             )
+             reward = -0.1
+
+         return reward
+
+     # ─── Helper Methods ───────────────────────────────────────────────────
+
+     def _check_fix(self, issue: Issue, fix_payload: Dict[str, Any]) -> bool:
+         """
+         Check if a fix payload correctly addresses an issue.
+
+         Uses fuzzy matching: the fix is accepted if:
+         1. The fix_key is present in the payload, OR
+         2. Any expected_fix key is present in the payload with a reasonable value
+         """
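+         # Illustration: for fix_key "headers.Authorization", a payload containing
+         # either "headers.Authorization" or just "Authorization" is accepted.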
320
+ """
321
+ # Direct key match
322
+ if issue.fix_key in fix_payload:
323
+ return True
324
+
325
+ # Check nested key (e.g., "headers.Authorization" -> check payload for "Authorization")
326
+ if "." in issue.fix_key:
327
+ parts = issue.fix_key.split(".")
328
+ leaf_key = parts[-1]
329
+ if leaf_key in fix_payload:
330
+ return True
331
+
332
+ # Check expected fix keys
333
+ for key in issue.expected_fix:
334
+ if key in fix_payload:
335
+ return True
336
+ if "." in key:
337
+ leaf = key.split(".")[-1]
338
+ if leaf in fix_payload:
339
+ return True
340
+
341
+ return False
342
+
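A quick illustration of what the matcher accepts, assuming a hypothetical issue whose fix_key is the dotted path "headers.Authorization" (all values invented):

```python
# Hypothetical issue with fix_key = "headers.Authorization".
accepted_payloads = [
    {"headers.Authorization": "Bearer abc"},  # direct fix_key match
    {"Authorization": "Bearer abc"},          # leaf segment of the dotted fix_key
]
rejected_payload = {"retry_count": 3}         # shares no key with fix_key or expected_fix
```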
+     def _apply_fix(self, target: str, fix_payload: Dict[str, Any]) -> None:
+         """Apply a fix to the current configuration."""
+         if target not in self._current_configs:
+             return
+
+         config = self._current_configs[target]
+         for key, value in fix_payload.items():
+             if "." in key:
+                 # Nested key: e.g., "headers.Authorization"
+                 parts = key.split(".")
+                 obj = config
+                 for part in parts[:-1]:
+                     if part not in obj:
+                         obj[part] = {}
+                     obj = obj[part]
+                 obj[parts[-1]] = value
+             else:
+                 config[key] = value
+
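A minimal before/after sketch of the dotted-key handling (the config contents are invented):

```python
# Hypothetical service config before the fix.
config = {"base_url": "https://api.example.com", "headers": {"Accept": "application/json"}}
fix_payload = {"headers.Authorization": "Bearer <token>", "timeout": 30}
# After _apply_fix(target, fix_payload) the stored config would become:
# {
#     "base_url": "https://api.example.com",
#     "headers": {"Accept": "application/json", "Authorization": "Bearer <token>"},
#     "timeout": 30,
# }
```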
+     def _get_hints(self) -> List[str]:
+         """Return progressive hints based on step count."""
+         if self._scenario is None:
+             return []
+
+         hints = []
+         step = self._state.step_count
+         total_issues = len(self._scenario.issues)
+         unfixed = total_issues - len(self._issues_fixed)
+
+         if step == 0:
+             hints.append("Start by inspecting error logs for each service to find clues.")
+             hints.append(f"There are {total_issues} issues to find and fix.")
+         elif step > 0 and len(self._issues_found) == 0:
+             hints.append("Try 'inspect_logs' on different services to find error patterns.")
+         elif len(self._issues_found) > 0 and len(self._issues_fixed) == 0:
+             hints.append("You've found issues! Use 'inspect_config' to see current settings, then 'submit_fix'.")
+         elif unfixed > 0:
+             hints.append(f"{unfixed} issue(s) remaining. Check services you haven't inspected yet.")
+
+         # Late-game hints
+         if self._scenario.max_steps - step <= 5 and unfixed > 0:
+             # Give more specific hints when running low on steps
+             for issue in self._scenario.issues:
+                 if issue.issue_id not in self._issues_fixed:
+                     hints.append(f"Hint: Check '{issue.service}' - look for '{issue.fix_key}' in the config.")
+
+         return hints
+
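For orientation, a sketch of how the hint branches above play out over a hypothetical three-issue episode (the counts are invented):

```python
# Hypothetical 3-issue episode and the branch that fires at each point:
# step 0                     -> "Start by inspecting error logs..." + "There are 3 issues to find and fix."
# step 2, nothing found yet  -> "Try 'inspect_logs' on different services to find error patterns."
# issues found, none fixed   -> "You've found issues! Use 'inspect_config' ... then 'submit_fix'."
# 1 of 3 fixed               -> "2 issue(s) remaining. Check services you haven't inspected yet."
# <= 5 steps left, unfixed   -> one extra hint per unfixed issue, naming its service and fix_key.
```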
+     # ─── Grading ──────────────────────────────────────────────────────────
+
+     def grade(self) -> float:
+         """
+         Grade the agent's performance on the current episode.
+
+         Score = (issues_fixed / issues_total) * efficiency_bonus
+         Efficiency bonus = 1.0 + (remaining_steps / max_steps * 0.3)
+
+         Returns:
+             Score between 0.0 and 1.0
+         """
+         if self._scenario is None:
+             return 0.0
+
+         total = len(self._scenario.issues)
+         if total == 0:
+             return 1.0
+
+         fix_ratio = len(self._issues_fixed) / total
+         remaining = max(0, self._scenario.max_steps - self._state.step_count)
+         efficiency_bonus = 1.0 + (remaining / self._scenario.max_steps * 0.3)
+
+         score = fix_ratio * efficiency_bonus
+         return min(1.0, round(score, 4))
+
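A quick worked example of the formula, with invented numbers:

```python
# Hypothetical episode: 2 of 3 issues fixed, 10 of 25 allowed steps used.
fix_ratio = 2 / 3                                          # ~0.6667
remaining = 25 - 10                                        # 15 steps left
efficiency_bonus = 1.0 + (15 / 25 * 0.3)                   # 1.18
score = min(1.0, round(fix_ratio * efficiency_bonus, 4))   # 0.7867
```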
+     def get_task_info(self) -> Dict[str, Any]:
+         """Return information about the current task."""
+         if self._scenario is None:
+             return {"error": "Environment not initialized. Call reset() first."}
+
+         return {
+             "task_id": self._task_id,
+             "difficulty": self._scenario.difficulty,
+             "description": self._scenario.description,
+             "max_steps": self._scenario.max_steps,
+             "issues_total": len(self._scenario.issues),
+             "services": self._scenario.services,
+             "action_schema": {
+                 "action_type": {
+                     "type": "string",
+                     "enum": ["inspect_logs", "inspect_config", "inspect_endpoint", "submit_fix"],
+                     "description": "The type of debugging action to take",
+                 },
+                 "target": {
+                     "type": "string",
+                     "enum": self._scenario.services,
+                     "description": "The service to act on",
+                 },
+                 "fix_payload": {
+                     "type": "object",
+                     "description": "Configuration fix (required for submit_fix action)",
+                     "required": False,
+                 },
+             },
+         }
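For a sense of how this class is driven end to end, a minimal in-process sketch that mirrors the baseline loop in server/app.py below; the "easy" task_id matches the GraderRequest default, and the remaining calls follow that file's usage.

```python
# Minimal driver sketch, mirroring the /baseline endpoint's loop.
env = ApiDebugEnvironment(task_id="easy")
obs = env.reset()
for service in obs.available_targets:
    obs = env.step(ApiDebugAction(action_type="inspect_logs", target=service))
# ...then inspect configs and submit fixes with action_type="submit_fix" and a fix_payload...
print(env.get_task_info()["issues_total"], env.grade())
```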
server/app.py ADDED
@@ -0,0 +1,196 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """
+ FastAPI application for the API Integration Debugging Environment.
+
+ Endpoints:
+ - POST /reset: Reset the environment
+ - POST /step: Execute an action
+ - GET /state: Get current environment state
+ - GET /schema: Get action/observation schemas
+ - WS /ws: WebSocket endpoint for persistent sessions
+ - GET /tasks: List all tasks with action schema
+ - POST /grader: Get grader score for current episode
+ - POST /baseline: Run baseline inference on all tasks
+
+ Usage:
+     uvicorn server.app:app --reload --host 0.0.0.0 --port 8000
+ """
+
+ import os
+ from typing import Dict, Any, Optional
+
+ from fastapi import FastAPI
+ from pydantic import BaseModel
+
+ try:
+     from openenv.core.env_server.http_server import create_app
+ except Exception as e:
+     raise ImportError(
+         "openenv is required. Install with: uv sync"
+     ) from e
+
+ try:
+     from ..models import ApiDebugAction, ApiDebugObservation
+     from .api_debug_env_environment import ApiDebugEnvironment
+ except ModuleNotFoundError:
+     from models import ApiDebugAction, ApiDebugObservation
+     from server.api_debug_env_environment import ApiDebugEnvironment
+
+ try:
+     from ..scenarios import get_all_task_ids, get_scenario
+ except ModuleNotFoundError:
+     from scenarios import get_all_task_ids, get_scenario
+
+
+ # ─── Create the core OpenEnv app ─────────────────────────────────────────────
+
+ app = create_app(
+     ApiDebugEnvironment,
+     ApiDebugAction,
+     ApiDebugObservation,
+     env_name="api_debug_env",
+     max_concurrent_envs=3,
+ )
+
+
+ # ─── Hackathon-required endpoints ─────────────────────────────────────────────
+
+ # Store environment instances per task for grading
+ _grading_envs: Dict[str, ApiDebugEnvironment] = {}
+
+
+ class GraderRequest(BaseModel):
+     task_id: str = "easy"
+
+
+ class BaselineRequest(BaseModel):
+     api_key: Optional[str] = None
+
+
+ @app.get("/tasks")
+ async def list_tasks():
+     """Return list of all tasks with action schema."""
+     tasks = []
+     for task_id in get_all_task_ids():
+         scenario = get_scenario(task_id)
+         tasks.append({
+             "task_id": task_id,
+             "difficulty": scenario.difficulty,
+             "description": scenario.description,
+             "max_steps": scenario.max_steps,
+             "issues_count": len(scenario.issues),
+             "services": scenario.services,
+             "action_schema": {
+                 "action_type": {
+                     "type": "string",
+                     "enum": ["inspect_logs", "inspect_config", "inspect_endpoint", "submit_fix"],
+                 },
+                 "target": {
+                     "type": "string",
+                     "enum": scenario.services,
+                 },
+                 "fix_payload": {
+                     "type": "object",
+                     "required": False,
+                 },
+             },
+         })
+     return {"tasks": tasks}
+
+
+ @app.post("/grader")
+ async def run_grader(request: GraderRequest):
+     """Return grader score for a completed episode."""
+     task_id = request.task_id
+
+     if task_id in _grading_envs:
+         env = _grading_envs[task_id]
+         score = env.grade()
+         return {
+             "task_id": task_id,
+             "score": score,
+             "issues_fixed": len(env._issues_fixed),
+             "issues_total": len(env._scenario.issues) if env._scenario else 0,
+             "steps_used": env._state.step_count,
+         }
+
+     return {
+         "task_id": task_id,
+         "score": 0.0,
+         "message": "No completed episode found. Run the environment first.",
+     }
+
+
+ @app.post("/baseline")
+ async def run_baseline(request: BaselineRequest):
+     """
+     Run a simple rule-based baseline agent on all tasks.
+     Returns baseline scores for each task.
+     """
+     results = {}
+
+     for task_id in get_all_task_ids():
+         env = ApiDebugEnvironment(task_id=task_id)
+         obs = env.reset()
+
+         # Simple baseline strategy: inspect all logs, then all configs, then test all endpoints
+         for service in obs.available_targets:
+             if env._done:
+                 break
+             obs = env.step(ApiDebugAction(
+                 action_type="inspect_logs",
+                 target=service,
+             ))
+
+         for service in obs.available_targets:
+             if env._done:
+                 break
+             obs = env.step(ApiDebugAction(
+                 action_type="inspect_config",
+                 target=service,
+             ))
+
+         for service in obs.available_targets:
+             if env._done:
+                 break
+             obs = env.step(ApiDebugAction(
+                 action_type="inspect_endpoint",
+                 target=service,
+             ))
+
+         # Store for grading
+         _grading_envs[task_id] = env
+         score = env.grade()
+
+         results[task_id] = {
+             "score": score,
+             "steps_used": env._state.step_count,
+             "issues_found": len(env._issues_found),
+             "issues_fixed": len(env._issues_fixed),
+             "issues_total": len(env._scenario.issues) if env._scenario else 0,
+         }
+
+     return {"baseline_scores": results}
+
+
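To make the three endpoints above concrete, a small client sketch follows; the host and port assume the default uvicorn command from the module docstring, and requests is used purely for illustration (it is not listed in server/requirements.txt).

```python
import requests  # illustration only; not part of the declared dependencies

BASE = "http://localhost:8000"  # default host/port from the Usage note above

tasks = requests.get(f"{BASE}/tasks").json()["tasks"]                      # task list + action schema
requests.post(f"{BASE}/baseline", json={})                                 # run the rule-based baseline
grade = requests.post(f"{BASE}/grader", json={"task_id": "easy"}).json()   # score the stored episode
print(grade["score"], grade.get("issues_fixed"))
```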
+ # ─── Entry point ──────────────────────────────────────────────────────────────
+
+ def main(host: str = "0.0.0.0", port: int = 8000):
+     """Run the server directly."""
+     import argparse
+     import uvicorn
+
+     parser = argparse.ArgumentParser()
+     parser.add_argument("--host", type=str, default=host)
+     parser.add_argument("--port", type=int, default=port)
+     args = parser.parse_args()
+     uvicorn.run(app, host=args.host, port=args.port)
+
+
+ if __name__ == "__main__":
+     main()
server/requirements.txt ADDED
@@ -0,0 +1,6 @@
+ openenv[core]>=0.2.0
+ fastapi>=0.115.0
+ uvicorn>=0.24.0
+
+
+
uv.lock ADDED
The diff for this file is too large to render. See raw diff