Project Map (AGENTS.md)

This file is a navigation map for agents. Durable knowledge lives in docs/.

Start Here

Docs index: docs/README.md
Architecture: docs/ARCHITECTURE.md
Operations: docs/RUNBOOK.md
Test: uv run pytest tests/ -v

System-of-Record Documents

Category	Location	Type	Purpose
Guides	docs/guides/README.md	how-to	Practical procedures
Design docs	docs/design-docs/index.md	explanation	Feature design, ADRs
References	docs/references/README.md	reference	External docs

Project Structure

This project follows the OpenEnv `openenv init` convention. The project root is the environment package; there is no envs/ nesting.

sql-env/                       # project root = environment package
├── __init__.py                # exports SQLAction, SQLObservation, SQLEnvClient
├── models.py                  # Pydantic models (action w/ tokens, observation w/ messages, state)
├── client.py                  # SQLEnvClient(EnvClient) — WebSocket client w/ tensor serialization
├── conftest.py                # pytest config (ignores __init__.py collection)
├── openenv.yaml               # OpenEnv manifest
├── pyproject.toml             # deps + package config (setuptools, torch, transformers)
├── .python-version            # pins Python 3.12
├── data/
│   ├── databases/
│   │   └── models.py          # SQLAlchemy ORM models (student_assessment)
│   └── questions/
│       └── student_assessment.json  # 30+ Spider Q&A pairs with gold SQL
├── server/
│   ├── app.py                 # FastAPI app (tokenizer factory, MockTokenizer fallback)
│   ├── sql_environment.py     # SQLEnvironment(Environment) — core logic + Ollama
│   ├── test_sql_env.py        # MockTokenizer (char-code encoding for dev/test)
│   ├── reward.py              # Reward computation (stub — Phase 3)
│   ├── verifier.py            # Answer comparison (stub — Phase 3)
│   ├── Dockerfile
│   ├── requirements.txt
│   └── install_deps.sh        # Docker setup script
├── scripts/
│   ├── download_spider_data.py       # Download Spider questions from HuggingFace
│   └── generate_models_from_schema.py # Auto-generate SQLAlchemy models
├── tests/
│   └── test_smoke.py          # 21 tests (models, env, actions, client, schema)
├── docs/                      # Design docs, architecture
└── AGENTS.md
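
The tree describes MockTokenizer only as a char-code encoding for dev/test. As a rough illustration of that idea, the sketch below assumes each character maps to its Unicode code point. The class name `CharCodeTokenizer` and its `encode`/`decode` methods are hypothetical and are not taken from the project's code.

```python
class CharCodeTokenizer:
    """Hypothetical stand-in tokenizer: one token per character."""

    def encode(self, text: str) -> list[int]:
        # Each character becomes its Unicode code point.
        return [ord(ch) for ch in text]

    def decode(self, token_ids: list[int]) -> str:
        # Inverse mapping: code points back to characters.
        return "".join(chr(i) for i in token_ids)


tokenizer = CharCodeTokenizer()
ids = tokenizer.encode("SELECT 1")
round_trip = tokenizer.decode(ids)
```

A tokenizer of this shape needs no model download, which is presumably why a mock like it is useful as a fallback in tests.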

Guardrails

Testing: Use the package manager (uv run pytest ...), never bare pytest.
Git safety: No destructive commands (reset --hard, push --force) unless explicitly requested.
Secrets: Never commit .env or credentials.

Quick Commands

Task	Command
Install	`uv sync`
Lint	`uv run ruff check --fix .`
Format	`uv run ruff format .`
Test	`uv run pytest tests/ -v`
Run server	`uv run uvicorn server.app:app --reload`
Validate env	`uv run openenv validate --verbose`
Build Docker	`uv run openenv build`
Push to HF	`uv run openenv push`

Development Workflow

Run via package manager (uv run ...), never bare commands.
List existing files before creating new ones (avoid naming drift).
Prefer vertical slices over horizontal refactors.
No premature abstraction until multiple use-cases require it.

Delivery Safety (Move Fast Without Breaking Things)

Move fast by taking the smallest responsible step that produces real feedback, while pre-committing to guardrails so being wrong is survivable.

Small batches: Prefer vertical slices and small PRs; reduce blast radius and review/debug time.
Define "broken" first: Before shipping, write down what you will watch (errors, latency, correctness, cost) and the abort threshold.
Design for reversibility: Make changes easy to turn off, roll back, or ignore.
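
One way to make "define broken first" concrete is to write the watched signals and their abort limits down as data before shipping. The sketch below is only an illustration: the `AbortThreshold` type, the `breached` helper, the signal names, and the numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbortThreshold:
    """Hypothetical guardrail: a signal to watch and its abort limit."""

    signal: str
    limit: float


def breached(thresholds: list[AbortThreshold], observed: dict[str, float]) -> list[str]:
    """Return the names of signals whose observed value exceeds its limit."""
    # A signal missing from `observed` counts as breached: no data, no ship.
    return [
        t.signal
        for t in thresholds
        if observed.get(t.signal, float("inf")) > t.limit
    ]


# Illustrative numbers only, written down before the change ships.
thresholds = [
    AbortThreshold("error_rate", 0.01),
    AbortThreshold("p95_latency_ms", 500.0),
]
violations = breached(thresholds, {"error_rate": 0.002, "p95_latency_ms": 730.0})
```

A non-empty `violations` list is the pre-agreed signal to turn the change off or roll it back.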

System Boundaries (Avoid Analysis Paralysis)

Systems are continuous webs; plans require artificial boundaries.

Boundary rule: Include only variables/components that could change the decision you are making.
Clouds: Treat everything else as exogenous inputs; track them as risks/assumptions.
Timebox mapping: If the landscape is moving faster than you can model it, run a probe (spike, canary, A/B) instead.

Maturity Modes

Match guardrails to maturity:

Exploratory: Learning > durability. Prefer spikes; avoid irreversible state changes; manual verification is OK; expect throwaway code.
MVP: Ship a thin end-to-end slice. Manual checks are OK, but you still need a fast rollback path and bounded impact.
Production: Build to last. Automated tests, observability, progressive rollout, and explicit rollback/incident posture.

Expect limiting factors to move as you ship: fix the current bottleneck, then re-diagnose the next.

Progressive Delivery

Feature flags: Use flags to make risky changes reversible. Categorize flags (release/experiment/ops/permissioning).
Flags are inventory: Every flag needs an owner, an expiry, and a removal plan.
Canary/ramp when risk is non-trivial: Start small, watch signals, ramp gradually; prefer "flip off" over redeploy.
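
Treating flags as inventory can be as simple as a record per flag plus a check for overdue ones. The following is a minimal sketch under assumptions: the `Flag` and `FlagKind` types, the flag names, owners, and dates are hypothetical, and a real project might keep this in a config file or a flag service instead.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class FlagKind(Enum):
    RELEASE = "release"
    EXPERIMENT = "experiment"
    OPS = "ops"
    PERMISSIONING = "permissioning"


@dataclass(frozen=True)
class Flag:
    """Hypothetical inventory entry: every flag has an owner, expiry, and removal plan."""

    name: str
    kind: FlagKind
    owner: str
    expires: date
    removal_plan: str


def expired_flags(flags: list[Flag], today: date) -> list[str]:
    """Return names of flags past their expiry, i.e. overdue for removal."""
    return [f.name for f in flags if f.expires < today]


flags = [
    Flag("new_reward_path", FlagKind.RELEASE, "alice", date(2025, 1, 31), "delete old path"),
    Flag("verbose_sql_logging", FlagKind.OPS, "bob", date(2099, 1, 1), "remove after audit"),
]
overdue = expired_flags(flags, today=date(2025, 6, 1))
```

Running a check like `expired_flags` in CI is one possible way to keep unowned or stale flags from accumulating.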

Reliability Control Loop (If You Run Production)

SLO + error budget: If you are within budget, keep shipping; if you burn budget, freeze non-critical changes and pay down reliability.
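
The error-budget rule reduces to a small calculation. The sketch below assumes a request-based SLO (fraction of successful requests over a window) and reads "burn budget" as "the budget is exhausted". The function names and the numbers are illustrative, not part of the project.

```python
def error_budget_remaining(slo: float, total: int, failed: int) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    if total <= 0:
        raise ValueError("total must be positive")
    # The budget is the number of failures the SLO tolerates in this window.
    allowed_failures = (1.0 - slo) * total
    return 1.0 - failed / allowed_failures


def should_freeze(slo: float, total: int, failed: int) -> bool:
    """Freeze non-critical changes once the budget is used up."""
    return error_budget_remaining(slo, total, failed) <= 0.0


# A 99% SLO over 10,000 requests tolerates roughly 100 failures.
remaining = error_budget_remaining(slo=0.99, total=10_000, failed=25)
freeze = should_freeze(slo=0.99, total=10_000, failed=150)
```

With 25 failures about three quarters of the budget is left, so shipping continues. With 150 failures the budget is overspent and the freeze applies.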

Avoid

Big-bang releases, long-lived branches, unowned flags, flaky tests, and alert noise.

Python Guidelines

Prefer type hints for public APIs; use typing / collections.abc.
Use NumPy-style docstrings; keep them synced with type hints.
Error handling: Use specific exceptions; avoid try: ... except Exception: pass.
Dependencies: Use uv add <package>; do not manually edit pyproject.toml.
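
A short example of the first three guidelines together: type hints from collections.abc, a NumPy-style docstring that matches them, and a narrowly scoped except clause. The functions `mean_reward` and `safe_mean` are hypothetical and exist only to illustrate the style.

```python
from collections.abc import Sequence


def mean_reward(rewards: Sequence[float]) -> float:
    """Compute the arithmetic mean of episode rewards.

    Parameters
    ----------
    rewards : Sequence[float]
        Per-episode rewards; must be non-empty.

    Returns
    -------
    float
        The mean of ``rewards``.

    Raises
    ------
    ValueError
        If ``rewards`` is empty.
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    return sum(rewards) / len(rewards)


def safe_mean(rewards: Sequence[float], default: float = 0.0) -> float:
    """Return the mean, falling back to ``default`` only for empty input."""
    try:
        return mean_reward(rewards)
    except ValueError:
        # Catch the one exception we expect; anything else propagates.
        return default


average = mean_reward([1.0, 0.0, 0.5])
fallback = safe_mean([])
```

Catching only `ValueError` keeps unrelated failures, such as a `TypeError` from bad input, visible instead of silently swallowed.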

Docs Expectations

Keep durable design/ops knowledge in docs/ (architecture, runbook, decisions). Keep AGENTS.md as a short map, not an encyclopedia.

Testing Standards

Always use the project's package manager to run tests. Never invoke test runners directly.
- Python (uv): uv run pytest tests/ -v (NEVER bare pytest)
- Python (poetry): poetry run pytest tests/ -v
- Node: npm test or npm run test
- Rust: cargo test
Rationale: A bare pytest invocation can bypass the project's virtualenv and pick up the wrong Python interpreter or dependencies. Running through the package manager ensures the correct environment. Bare invocations also trigger unnecessary permission prompts in automated workflows.