Project Map (AGENTS.md)

This file is a navigation map for agents. Durable knowledge lives in docs/.

Start Here

System-of-Record Documents

| Category | Location | Type | Purpose |
| --- | --- | --- | --- |
| Guides | docs/guides/README.md | how-to | Practical procedures |
| Design docs | docs/design-docs/index.md | explanation | Feature design, ADRs |
| References | docs/references/README.md | reference | External docs |

Project Structure

This project follows the OpenEnv openenv init convention. The project root is the environment package; there is no envs/ nesting.

sql-env/                       # project root = environment package
├── __init__.py                # exports SQLAction, SQLObservation, SQLEnvClient
├── models.py                  # Pydantic models (action w/ tokens, observation w/ messages, state)
├── client.py                  # SQLEnvClient(EnvClient): WebSocket client w/ tensor serialization
├── conftest.py                # pytest config (ignores __init__.py collection)
├── openenv.yaml               # OpenEnv manifest
├── pyproject.toml             # deps + package config (setuptools, torch, transformers)
├── .python-version            # pins Python 3.12
├── data/
│   ├── databases/
│   │   └── models.py          # SQLAlchemy ORM models (student_assessment)
│   └── questions/
│       └── student_assessment.json  # 30+ Spider Q&A pairs with gold SQL
├── server/
│   ├── app.py                 # FastAPI app (tokenizer factory, MockTokenizer fallback)
│   ├── sql_environment.py     # SQLEnvironment(Environment): core logic + Ollama
│   ├── test_sql_env.py        # MockTokenizer (char-code encoding for dev/test)
│   ├── reward.py              # Reward computation (stub, Phase 3)
│   ├── verifier.py            # Answer comparison (stub, Phase 3)
│   ├── Dockerfile
│   ├── requirements.txt
│   └── install_deps.sh        # Docker setup script
├── scripts/
│   ├── download_spider_data.py       # Download Spider questions from HuggingFace
│   └── generate_models_from_schema.py # Auto-generate SQLAlchemy models
├── tests/
│   └── test_smoke.py          # 21 tests (models, env, actions, client, schema)
├── docs/                      # Design docs, architecture
└── AGENTS.md
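
The tree above mentions a MockTokenizer that uses char-code encoding for dev/test. As a rough illustration of that idea only, the sketch below maps each character to its Unicode code point and back. The class name `CharCodeTokenizer` and its `encode`/`decode` methods are hypothetical and are not taken from the project's code.

```python
class CharCodeTokenizer:
    """Hypothetical stand-in tokenizer: one token id per character."""

    def encode(self, text: str) -> list[int]:
        # Each character becomes its Unicode code point.
        return [ord(ch) for ch in text]

    def decode(self, token_ids: list[int]) -> str:
        # Inverse of encode: code points back to characters.
        return "".join(chr(i) for i in token_ids)


tokenizer = CharCodeTokenizer()
ids = tokenizer.encode("SELECT 1")
round_trip = tokenizer.decode(ids)
```

A tokenizer like this needs no model download, which is presumably why a mock of this kind is convenient for local tests.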

Guardrails

  • Testing: Use the package manager (uv run pytest ...), never bare pytest.
  • Git safety: No destructive commands (reset --hard, push --force) unless explicitly requested.
  • Secrets: Never commit .env or credentials.

Quick Commands

| Task | Command |
| --- | --- |
| Install | uv sync |
| Lint | uv run ruff check --fix . |
| Format | uv run ruff format . |
| Test | uv run pytest tests/ -v |
| Run server | uv run uvicorn server.app:app --reload |
| Validate env | uv run openenv validate --verbose |
| Build Docker | uv run openenv build |
| Push to HF | uv run openenv push |

Development Workflow

  • Run via package manager (uv run ...), never bare commands.
  • List existing files before creating new ones (avoid naming drift).
  • Prefer vertical slices over horizontal refactors.
  • No premature abstraction until multiple use-cases require it.

Delivery Safety (Move Fast Without Breaking Things)

Move fast by taking the smallest responsible step that produces real feedback, while pre-committing to guardrails so being wrong is survivable.

  • Small batches: Prefer vertical slices and small PRs; reduce blast radius and review/debug time.
  • Define "broken" first: Before shipping, write down what you will watch (errors, latency, correctness, cost) and the abort threshold.
  • Design for reversibility: Make changes easy to turn off, roll back, or ignore.
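
One way to make "define broken first" concrete is to write the abort thresholds down as data before shipping and check observed signals against them. The following is a minimal sketch under that assumption; `AbortThresholds`, `breached`, and the example limits are hypothetical names and values, not part of this project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbortThresholds:
    """Hypothetical limits agreed on before shipping."""

    max_error_rate: float      # fraction of failed requests, e.g. 0.02 = 2%
    max_p95_latency_ms: float  # 95th percentile latency in milliseconds


def breached(
    thresholds: AbortThresholds, error_rate: float, p95_latency_ms: float
) -> list[str]:
    """Return the names of the signals that exceed their threshold."""
    problems = []
    if error_rate > thresholds.max_error_rate:
        problems.append("error_rate")
    if p95_latency_ms > thresholds.max_p95_latency_ms:
        problems.append("p95_latency_ms")
    return problems


limits = AbortThresholds(max_error_rate=0.02, max_p95_latency_ms=500.0)
healthy = breached(limits, error_rate=0.01, p95_latency_ms=320.0)
unhealthy = breached(limits, error_rate=0.05, p95_latency_ms=320.0)
```

A non-empty result is the signal to turn the change off or roll it back rather than debate whether it is "really" broken.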

System Boundaries (Avoid Analysis Paralysis)

Systems are continuous webs; plans require artificial boundaries.

  • Boundary rule: Include only variables/components that could change the decision you are making.
  • Clouds: Treat everything else as exogenous inputs; track them as risks/assumptions.
  • Timebox mapping: If the landscape is moving faster than you can model it, run a probe (spike, canary, A/B) instead.

Maturity Modes

Match guardrails to maturity:

  • Exploratory: Learning > durability. Prefer spikes; avoid irreversible state changes; manual verification is OK; expect throwaway code.
  • MVP: Ship a thin end-to-end slice. Manual checks are OK, but you still need a fast rollback path and bounded impact.
  • Production: Build to last. Automated tests, observability, progressive rollout, and explicit rollback/incident posture.

Expect limiting factors to move as you ship: fix the current bottleneck, then re-diagnose the next.

Progressive Delivery

  • Feature flags: Use flags to make risky changes reversible. Categorize flags (release/experiment/ops/permissioning).
  • Flags are inventory: Every flag needs an owner, an expiry, and a removal plan.
  • Canary/ramp when risk is non-trivial: Start small, watch signals, ramp gradually; prefer "flip off" over redeploy.
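
Treating flags as inventory can be sketched as a small record type with the fields named above (category, owner, expiry, removal plan) plus a check for overdue flags. The flag names, owners, and dates below are made up for illustration, and `Flag`, `FlagKind`, and `expired_flags` are hypothetical helpers.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class FlagKind(Enum):
    RELEASE = "release"
    EXPERIMENT = "experiment"
    OPS = "ops"
    PERMISSIONING = "permissioning"


@dataclass(frozen=True)
class Flag:
    name: str
    kind: FlagKind
    owner: str
    expires: date
    removal_plan: str


def expired_flags(flags: list[Flag], today: date) -> list[str]:
    """Return names of flags whose expiry date has passed."""
    return [f.name for f in flags if f.expires < today]


# Illustrative inventory; none of these flags exist in the project.
inventory = [
    Flag("new_reward_fn", FlagKind.RELEASE, "alice", date(2025, 1, 31), "delete after full rollout"),
    Flag("verbose_sql_logs", FlagKind.OPS, "bob", date(2025, 6, 30), "remove once logging is tuned"),
]
stale = expired_flags(inventory, today=date(2025, 3, 1))
```

Running a check like this in CI is one possible way to keep unowned or expired flags from accumulating.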

Reliability Control Loop (If You Run Production)

  • SLO + error budget: If you are within budget, keep shipping; if you burn budget, freeze non-critical changes and pay down reliability.
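
The error-budget rule can be illustrated with a short calculation. This sketch assumes a request-based SLO (the fraction of requests that must succeed); the function name and the example numbers are hypothetical.

```python
def error_budget_remaining(
    slo_target: float, total_requests: int, failed_requests: int
) -> float:
    """Fraction of the error budget left (negative means overspent)."""
    # The budget is the number of failures the SLO tolerates in this window.
    allowed_failures = (1.0 - slo_target) * total_requests
    if allowed_failures <= 0:
        raise ValueError("slo_target must be below 1.0 and total_requests positive")
    return 1.0 - failed_requests / allowed_failures


# A 99% SLO over 10,000 requests tolerates about 100 failures.
remaining = error_budget_remaining(0.99, total_requests=10_000, failed_requests=25)
keep_shipping = remaining > 0

overspent = error_budget_remaining(0.99, total_requests=10_000, failed_requests=150)
freeze_changes = overspent <= 0
```

With 25 failures roughly three quarters of the budget is left, so shipping continues. With 150 failures the budget is overspent, which is the cue to freeze non-critical changes.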

Avoid

  • Big-bang releases, long-lived branches, unowned flags, flaky tests, and alert noise.

Python Guidelines

  • Prefer type hints for public APIs; use typing / collections.abc.
  • Use NumPy-style docstrings; keep them synced with type hints.
  • Error handling: Use specific exceptions; avoid try: ... except Exception: pass.
  • Dependencies: Use uv add <package>; do not manually edit pyproject.toml.
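
The guidelines above can be seen together in a small example: a typed public function with a NumPy-style docstring that matches its signature, and a caller that catches only the specific exception it expects. The function names are illustrative and do not come from this codebase.

```python
from collections.abc import Sequence


def mean_reward(rewards: Sequence[float]) -> float:
    """Compute the arithmetic mean of a sequence of rewards.

    Parameters
    ----------
    rewards : Sequence[float]
        Per-step reward values. Must not be empty.

    Returns
    -------
    float
        The mean of ``rewards``.

    Raises
    ------
    ValueError
        If ``rewards`` is empty.
    """
    if not rewards:
        raise ValueError("rewards must not be empty")
    return sum(rewards) / len(rewards)


def safe_mean_reward(rewards: Sequence[float], default: float = 0.0) -> float:
    """Return the mean reward, or ``default`` for an empty sequence."""
    try:
        return mean_reward(rewards)
    except ValueError:
        # Catch only the specific, expected failure; anything else propagates.
        return default
```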

Docs Expectations

  • Keep durable design/ops knowledge in docs/ (architecture, runbook, decisions). Keep AGENTS.md as a short map, not an encyclopedia.

Testing Standards

  • Always use the project's package manager to run tests. Never invoke test runners directly.
    • Python (uv): uv run pytest tests/ -v (NEVER bare pytest)
    • Python (poetry): poetry run pytest tests/ -v
    • Node: npm test or npm run test
    • Rust: cargo test
  • Rationale: A bare pytest call can resolve to an interpreter outside the project's virtualenv and pick up the wrong Python or dependencies. Running through the package manager ensures the correct environment. Bare invocations also trigger unnecessary permission prompts in automated workflows.
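
When tests are launched from a Python script rather than a shell, the same rule can be applied by always building the command through `uv run`. The helpers below are a hypothetical sketch, not part of this project. `run_tests` assumes `uv` is on PATH and is defined here but not called.

```python
import subprocess
from collections.abc import Sequence


def uv_run_command(tool: str, *args: str) -> list[str]:
    """Build an argv list that runs ``tool`` through ``uv run``."""
    return ["uv", "run", tool, *args]


def run_tests(extra_args: Sequence[str] = ()) -> int:
    """Run pytest via uv and return its exit code (requires uv on PATH)."""
    cmd = uv_run_command("pytest", "tests/", "-v", *extra_args)
    return subprocess.run(cmd, check=False).returncode


cmd = uv_run_command("pytest", "tests/", "-v")
```

Routing every invocation through one helper makes it hard to slip in a bare `pytest` call by accident.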