# Project Map (AGENTS.md)

This file is a navigation map for agents. Durable knowledge lives in `docs/`.

## Start Here

- Docs index: [docs/README.md](docs/README.md)
- Architecture: [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md)
- Operations: [docs/RUNBOOK.md](docs/RUNBOOK.md)
- Test: `uv run pytest tests/ -v`

## System-of-Record Documents

| Category | Location | Type | Purpose |
|----------|----------|------|---------|
| Guides | [docs/guides/README.md](docs/guides/README.md) | how-to | Practical procedures |
| Design docs | [docs/design-docs/index.md](docs/design-docs/index.md) | explanation | Feature design, ADRs |
| References | [docs/references/README.md](docs/references/README.md) | reference | External docs |

## Project Structure

This project follows the [OpenEnv](https://github.com/meta-pytorch/OpenEnv) `openenv init` convention.
The project root **is** the environment package — no `envs/` nesting.

```
sql-env/                       # project root = environment package
├── __init__.py                # exports SQLAction, SQLObservation, SQLEnvClient
├── models.py                  # Pydantic models (action w/ tokens, observation w/ messages, state)
├── client.py                  # SQLEnvClient(EnvClient) — WebSocket client w/ tensor serialization
├── conftest.py                # pytest config (ignores __init__.py collection)
├── openenv.yaml               # OpenEnv manifest
├── pyproject.toml             # deps + package config (setuptools, torch, transformers)
├── .python-version            # pins Python 3.12
├── data/
│   ├── databases/
│   │   └── models.py          # SQLAlchemy ORM models (student_assessment)
│   └── questions/
│       └── student_assessment.json  # 30+ Spider Q&A pairs with gold SQL
├── server/
│   ├── app.py                 # FastAPI app (tokenizer factory, MockTokenizer fallback)
│   ├── sql_environment.py     # SQLEnvironment(Environment) — core logic + Ollama
│   ├── test_sql_env.py        # MockTokenizer (char-code encoding for dev/test)
│   ├── reward.py              # Reward computation (stub — Phase 3)
│   ├── verifier.py            # Answer comparison (stub — Phase 3)
│   ├── Dockerfile
│   ├── requirements.txt
│   └── install_deps.sh        # Docker setup script
├── scripts/
│   ├── download_spider_data.py       # Download Spider questions from HuggingFace
│   └── generate_models_from_schema.py # Auto-generate SQLAlchemy models
├── tests/
│   └── test_smoke.py          # 21 tests (models, env, actions, client, schema)
├── docs/                      # Design docs, architecture
└── AGENTS.md
```
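
The tree above describes `MockTokenizer` as using char-code encoding. A minimal sketch of that idea, assuming one token per character equal to its Unicode code point; the class name `CharCodeTokenizer` and its method names are hypothetical and do not reflect the project's actual implementation:

```python
class CharCodeTokenizer:
    """Hypothetical stand-in for illustration; not the project's MockTokenizer."""

    def encode(self, text: str) -> list[int]:
        # One token per character: its Unicode code point.
        return [ord(ch) for ch in text]

    def decode(self, token_ids: list[int]) -> str:
        # Inverse mapping: code points back to characters.
        return "".join(chr(token_id) for token_id in token_ids)


tokenizer = CharCodeTokenizer()
ids = tokenizer.encode("SELECT 1")
round_trip = tokenizer.decode(ids)
```

An encoding like this needs no model download, which is why it suits dev and test runs; check `server/` for the real class before relying on any method names.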

## Guardrails

- **Testing:** Use the package manager (`uv run pytest ...`), never bare `pytest`.
- **Git safety:** No destructive commands (`reset --hard`, `push --force`) unless explicitly requested.
- **Secrets:** Never commit `.env` or credentials.

## Quick Commands

| Task | Command |
|------|---------|
| Install | `uv sync` |
| Lint | `uv run ruff check --fix .` |
| Format | `uv run ruff format .` |
| Test | `uv run pytest tests/ -v` |
| Run server | `uv run uvicorn server.app:app --reload` |
| Validate env | `uv run openenv validate --verbose` |
| Build Docker | `uv run openenv build` |
| Push to HF | `uv run openenv push` |

## Development Workflow

- Run via package manager (`uv run ...`), never bare commands.
- List existing files before creating new ones (avoid naming drift).
- Prefer vertical slices over horizontal refactors.
- No premature abstraction until multiple use-cases require it.

<!-- GUIDELINES-BEGIN -->

## Delivery Safety (Move Fast Without Breaking Things)

Move fast by taking the smallest responsible step that produces real feedback, while pre-committing to guardrails so being wrong is survivable.

- **Small batches:** Prefer vertical slices and small PRs; reduce blast radius and review/debug time.
- **Define "broken" first:** Before shipping, write down what you will watch (errors, latency, correctness, cost) and the abort threshold.
- **Design for reversibility:** Make changes easy to turn off, roll back, or ignore.
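
One way to "define broken first" is to write the abort thresholds down as data before shipping. The sketch below is illustrative only: `AbortThreshold`, `breached`, and the signal names and limits are hypothetical, not part of this project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbortThreshold:
    """A signal to watch and the value above which the change is aborted."""

    signal: str
    limit: float


def breached(thresholds: list[AbortThreshold], observed: dict[str, float]) -> list[str]:
    # Return the names of signals whose observed value exceeds its limit.
    # Signals missing from `observed` are treated as not breached here.
    return [
        t.signal
        for t in thresholds
        if t.signal in observed and observed[t.signal] > t.limit
    ]


# Hypothetical thresholds and readings, for illustration only.
thresholds = [
    AbortThreshold("error_rate", 0.02),
    AbortThreshold("p95_latency_ms", 500.0),
]
observed = {"error_rate": 0.05, "p95_latency_ms": 320.0}
to_abort = breached(thresholds, observed)
```

A non-empty `to_abort` list is the cue to turn the change off or roll it back rather than debug it live.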

## System Boundaries (Avoid Analysis Paralysis)

Systems are continuous webs; plans require artificial boundaries.

- **Boundary rule:** Include only variables/components that could change the decision you are making.
- **Clouds:** Treat everything else as exogenous inputs; track them as risks/assumptions.
- **Timebox mapping:** If the landscape is moving faster than you can model it, run a probe (spike, canary, A/B) instead.

## Maturity Modes

Match guardrails to maturity:

- **Exploratory:** Learning > durability. Prefer spikes; avoid irreversible state changes; manual verification is OK; expect throwaway code.
- **MVP:** Ship a thin end-to-end slice. Manual checks are OK, but you still need a fast rollback path and bounded impact.
- **Production:** Build to last. Automated tests, observability, progressive rollout, and explicit rollback/incident posture.

Expect limiting factors to move as you ship: fix the current bottleneck, then re-diagnose the next.

## Progressive Delivery

- **Feature flags:** Use flags to make risky changes reversible. Categorize flags (release/experiment/ops/permissioning).
- **Flags are inventory:** Every flag needs an owner, an expiry, and a removal plan.
- **Canary/ramp when risk is non-trivial:** Start small, watch signals, ramp gradually; prefer "flip off" over redeploy.
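
Treating flags as inventory can be as simple as a record per flag plus a check for overdue ones. This is a sketch under assumptions: the `Flag` dataclass, `expired_flags`, and the sample flag names are hypothetical; only the four categories come from the list above.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class FlagKind(Enum):
    RELEASE = "release"
    EXPERIMENT = "experiment"
    OPS = "ops"
    PERMISSIONING = "permissioning"


@dataclass(frozen=True)
class Flag:
    name: str
    kind: FlagKind
    owner: str          # who is accountable for the flag
    expires: date       # when the flag should be gone
    removal_plan: str   # how it gets removed


def expired_flags(flags: list[Flag], today: date) -> list[str]:
    # Flags past their expiry date are overdue for removal.
    return [flag.name for flag in flags if flag.expires < today]


# Hypothetical inventory, for illustration only.
inventory = [
    Flag("new_reward_fn", FlagKind.RELEASE, "alice", date(2025, 1, 31), "delete old path"),
    Flag("verbose_sql_log", FlagKind.OPS, "bob", date(2025, 12, 31), "drop after audit"),
]
overdue = expired_flags(inventory, today=date(2025, 6, 1))
```

Running a check like this in CI is one way to keep unowned or stale flags from accumulating.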

## Reliability Control Loop (If You Run Production)

- **SLO + error budget:** If you are within budget, keep shipping; if you burn budget, freeze non-critical changes and pay down reliability.
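
The error-budget rule reduces to simple arithmetic. The sketch below assumes a request-based SLO; the function name `error_budget_remaining` and the sample numbers are hypothetical.

```python
def error_budget_remaining(slo_target: float, total: int, failed: int) -> float:
    """Return the fraction of the error budget still unspent (negative if overspent)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total <= 0:
        raise ValueError("total must be positive")
    # The budget is the number of failures the SLO tolerates in this window.
    allowed_failures = (1.0 - slo_target) * total
    return 1.0 - failed / allowed_failures


# Hypothetical window: 99.9% SLO, 1,000,000 requests, 250 failures.
remaining = error_budget_remaining(0.999, total=1_000_000, failed=250)
keep_shipping = remaining > 0.0
```

With these sample numbers about 1,000 failures are allowed, so roughly three quarters of the budget remains and shipping continues; a negative result would signal a freeze on non-critical changes.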

## Avoid

- Big-bang releases, long-lived branches, unowned flags, flaky tests, and alert noise.

## Python Guidelines

- Prefer type hints for public APIs; use `typing` / `collections.abc`.
- Use NumPy-style docstrings; keep them synced with type hints.
- Error handling: Use specific exceptions; avoid `try: ... except Exception: pass`.
- Dependencies: Use `uv add <package>`; do not manually edit `pyproject.toml`.

## Docs Expectations

- Keep durable design/ops knowledge in `docs/` (architecture, runbook, decisions). Keep AGENTS.md as a short map, not an encyclopedia.

## Testing Standards

- **Always use the project's package manager** to run tests. Never invoke test runners directly.
  - Python (uv): `uv run pytest tests/ -v` (NEVER bare `pytest`)
  - Python (poetry): `poetry run pytest tests/ -v`
  - Node: `npm test` or `npm run test`
  - Rust: `cargo test`
- **Rationale:** Bare `pytest` can bypass the project's virtualenv and pick up the wrong Python or dependencies; running through the package manager ensures the correct environment. Bare invocations also trigger unnecessary permission prompts in automated workflows.

<!-- GUIDELINES-END -->