# Project Map (AGENTS.md)

This file is a navigation map for agents. Durable knowledge lives in `docs/`.

## Start Here

- Docs index: [docs/README.md](docs/README.md)
- Architecture: [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md)
- Operations: [docs/RUNBOOK.md](docs/RUNBOOK.md)
- Test: `uv run pytest tests/ -v`

## System-of-Record Documents

| Category | Location | Type | Purpose |
|----------|----------|------|---------|
| Guides | [docs/guides/README.md](docs/guides/README.md) | how-to | Practical procedures |
| Design docs | [docs/design-docs/index.md](docs/design-docs/index.md) | explanation | Feature design, ADRs |
| References | [docs/references/README.md](docs/references/README.md) | reference | External docs |

## Project Structure

This project follows the [OpenEnv](https://github.com/meta-pytorch/OpenEnv) `openenv init` convention. The project root **is** the environment package: there is no `envs/` nesting.

```
sql-env/                                # project root = environment package
├── __init__.py                         # exports SQLAction, SQLObservation, SQLEnvClient
├── models.py                           # Pydantic models (action w/ tokens, observation w/ messages, state)
├── client.py                           # SQLEnvClient(EnvClient) — WebSocket client w/ tensor serialization
├── conftest.py                         # pytest config (ignores __init__.py collection)
├── openenv.yaml                        # OpenEnv manifest
├── pyproject.toml                      # deps + package config (setuptools, torch, transformers)
├── .python-version                     # pins Python 3.12
├── data/
│   ├── databases/
│   │   └── models.py                   # SQLAlchemy ORM models (student_assessment)
│   └── questions/
│       └── student_assessment.json     # 30+ Spider Q&A pairs with gold SQL
├── server/
│   ├── app.py                          # FastAPI app (tokenizer factory, MockTokenizer fallback)
│   ├── sql_environment.py              # SQLEnvironment(Environment) — core logic + Ollama
│   ├── test_sql_env.py                 # MockTokenizer (char-code encoding for dev/test)
│   ├── reward.py                       # Reward computation (stub — Phase 3)
│   ├── verifier.py                     # Answer comparison (stub — Phase 3)
│   ├── Dockerfile
│   ├── requirements.txt
│   └── install_deps.sh                 # Docker setup script
├── scripts/
│   ├── download_spider_data.py         # Download Spider questions from HuggingFace
│   └── generate_models_from_schema.py  # Auto-generate SQLAlchemy models
├── tests/
│   └── test_smoke.py                   # 21 tests (models, env, actions, client, schema)
├── docs/                               # Design docs, architecture
└── AGENTS.md
```

## Guardrails

- **Testing:** Use the package manager (`uv run pytest ...`), never bare `pytest`.
- **Git safety:** No destructive commands (`reset --hard`, `push --force`) unless explicit.
- **Secrets:** Never commit `.env` or credentials.

## Quick Commands

| Task | Command |
|------|---------|
| Install | `uv sync` |
| Lint | `uv run ruff check --fix .` |
| Format | `uv run ruff format .` |
| Test | `uv run pytest tests/ -v` |
| Run server | `uv run uvicorn server.app:app --reload` |
| Validate env | `uv run openenv validate --verbose` |
| Build Docker | `uv run openenv build` |
| Push to HF | `uv run openenv push` |

## Development Workflow

- Run via package manager (`uv run ...`), never bare commands.
- List existing files before creating new ones (avoid naming drift).
- Prefer vertical slices over horizontal refactors.
- No premature abstraction until multiple use-cases require it.

## Delivery Safety (Move Fast Without Breaking Things)

Move fast by taking the smallest responsible step that produces real feedback, while pre-committing to guardrails so being wrong is survivable.

- **Small batches:** Prefer vertical slices and small PRs; reduce blast radius and review/debug time.
- **Define "broken" first:** Before shipping, write down what you will watch (errors, latency, correctness, cost) and the abort threshold.
- **Design for reversibility:** Make changes easy to turn off, roll back, or ignore.

## System Boundaries (Avoid Analysis Paralysis)

Systems are continuous webs; plans require artificial boundaries.

- **Boundary rule:** Include only variables/components that could change the decision you are making.
- **Clouds:** Treat everything else as exogenous inputs; track them as risks/assumptions.
- **Timebox mapping:** If the landscape is moving faster than you can model it, run a probe (spike, canary, A/B) instead.

## Maturity Modes

Match guardrails to maturity:

- **Exploratory:** Learning > durability. Prefer spikes; avoid irreversible state changes; manual verification is OK; expect throwaway code.
- **MVP:** Ship a thin end-to-end slice. Manual checks are OK, but you still need a fast rollback path and bounded impact.
- **Production:** Build to last. Automated tests, observability, progressive rollout, and explicit rollback/incident posture.

Expect limiting factors to move as you ship: fix the current bottleneck, then re-diagnose the next.

## Progressive Delivery

- **Feature flags:** Use flags to make risky changes reversible. Categorize flags (release/experiment/ops/permissioning).
- **Flags are inventory:** Every flag needs an owner, an expiry, and a removal plan.
- **Canary/ramp when risk is non-trivial:** Start small, watch signals, ramp gradually; prefer "flip off" over redeploy.

## Reliability Control Loop (If You Run Production)

- **SLO + error budget:** If you are within budget, keep shipping; if you burn budget, freeze non-critical changes and pay down reliability.

## Avoid

- Big-bang releases, long-lived branches, unowned flags, flaky tests, and alert noise.

## Python Guidelines

- Prefer type hints for public APIs; use `typing` / `collections.abc`.
- Use NumPy-style docstrings; keep them synced with type hints.
- Error handling: Use specific exceptions; avoid `try: ... except Exception: pass`.
- Dependencies: Use `uv add <package>`; do not manually edit `pyproject.toml`.

## Docs Expectations

- Keep durable design/ops knowledge in `docs/` (architecture, runbook, decisions). Keep AGENTS.md as a short map, not an encyclopedia.

## Testing Standards

- **Always use the project's package manager** to run tests. Never invoke test runners directly.
  - Python (uv): `uv run pytest tests/ -v` (NEVER bare `pytest`)
  - Python (poetry): `poetry run pytest tests/ -v`
  - Node: `npm test` or `npm run test`
  - Rust: `cargo test`
- **Rationale:** Bare `pytest` bypasses the virtualenv and may use the wrong Python/dependencies. Package managers ensure the correct environment. Bare invocations also trigger unnecessary permission prompts in automated workflows.
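
The rule above can be sketched as a small helper that picks a test command from the files present in a project root. This is only an illustration, not part of the repository: `resolve_test_command`, `UnknownProjectTypeError`, and the marker-file names (`uv.lock`, `poetry.lock`, `package.json`, `Cargo.toml`) are assumptions, while the commands themselves are the ones listed in this section. It also follows the Python guidelines in this file (type hints, NumPy-style docstring, a specific exception).

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical mapping from a marker file to the test command.
# The marker-file names are assumptions; the commands come from this document.
# Order matters: the first marker found wins.
TEST_COMMANDS: tuple[tuple[str, tuple[str, ...]], ...] = (
    ("uv.lock", ("uv", "run", "pytest", "tests/", "-v")),
    ("poetry.lock", ("poetry", "run", "pytest", "tests/", "-v")),
    ("package.json", ("npm", "test")),
    ("Cargo.toml", ("cargo", "test")),
)


class UnknownProjectTypeError(LookupError):
    """Raised when no known package-manager marker file is found."""


def resolve_test_command(project_root: Path) -> list[str]:
    """Return a test command that goes through the package manager.

    Parameters
    ----------
    project_root : Path
        Directory to inspect for marker files.

    Returns
    -------
    list[str]
        Argument vector suitable for ``subprocess.run``.

    Raises
    ------
    UnknownProjectTypeError
        If none of the known marker files exists in ``project_root``.
    """
    for marker, command in TEST_COMMANDS:
        if (project_root / marker).is_file():
            return list(command)
    raise UnknownProjectTypeError(
        f"no known package-manager marker in {project_root}"
    )
```

An automated workflow could pass the result to `subprocess.run(resolve_test_command(Path(".")), check=True)`. Raising a dedicated exception rather than falling back to bare `pytest` keeps the failure visible, which matches the intent of the rule.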