# Architecture > Last updated: 2026-02-28 System map for SQLEnv — an RL environment where agents learn interactive SQL exploration via the OpenEnv framework. **Goals:** - Show how components connect (system map + key flows) - Make hidden state explicit (what lives where) - Define shared interfaces (Pydantic models, WebSocket API) - Keep invariants legible (what must stay true) **Non-goals:** - CLI reference (see `docs/RUNBOOK.md`) - Per-feature implementation details (link to specs) --- ## System Map ```text SQLEnv System ================================================================ RL Training Loop SQLEnv Server (Docker) ---------------- ---------------------- +---------------------+ +------------+ WebSocket (JSON) | server/app.py | | SQLEnv |<=========================>| FastAPI + WS | | Client | SQLAction -> server | | | (client.py)| SQLObs <- server +----------+----------+ +-----+------+ | | v | tensor <-> list +---------------------+ | serialization | SQLEnvironment | | | (sql_environment.py)| +-----v------+ | | | RL Agent | | - reset() / step() | | (external) | | - action detection | | e.g. GRPO | | - message_to_action | +------------+ +--+-------+-------+--+ | | | v v v +------+ +------+ +--------+ |Schema| |Sample| | Query | |Intro-| |Gen | | (Ollama| |spect.| | | | LLM) | +--+---+ +--+---+ +---+----+ | | | v v v +-------------------------+ | SQLAlchemy ORM Models | | (data/databases/ | | models.py) | | 9 tables: | | Address, Person, | | Student, Course, ... 
| +-------------------------+ Data (committed) External (optional) ---------------- ------------------- data/questions/ +----------+ student_assessment.json | Ollama | (53 Spider Q&A pairs) | LLM API | | :11434 | +----------+ ``` --- ## Component Inventory | Component | Owns | Entrypoint | State / Output | |-----------|------|------------|----------------| | **SQLEnvClient** | WebSocket transport, tensor serialization | `client.py` | Stateless (wraps server) | | **FastAPI app** | HTTP/WS endpoints, tokenizer factory | `server/app.py` | In-memory tokenizer | | **SQLEnvironment** | Episode lifecycle, action dispatch, state | `server/sql_environment.py` | `SQLState` (in-memory) | | **Pydantic models** | Type contracts (action, observation, state) | `models.py` | N/A (data classes) | | **ORM models** | Database schema definition | `data/databases/models.py` | SQLAlchemy metadata | | **Spider data** | Question-answer pairs | `data/questions/student_assessment.json` | 53 Q&A entries | | **MockTokenizer** | Dev/test tokenization (no GPU needed) | `server/test_sql_env.py` | Deterministic (ord/chr) | ### External Services | Service | Purpose | Required | Fallback | |---------|---------|----------|----------| | Ollama (`localhost:11434`) | Table selection + SQL generation | No | First table in dict; query returns error string | --- ## Key Flows ### Flow: Episode (Reset + Multi-Turn Steps) ```text Client Server (SQLEnvironment) Ollama | | | |--- reset() ----------------->| | | |-- init state, system prompt | | |-- tokenize system message | |<-- SQLObservation -----------| (MockTokenizer or HF) | | .messages=[system] | | | .tokens=shape([N]) | | | | | |--- message_to_action(msg) -->| | | |-- detect action type | | | (keyword matching) | | |-- append msg to history | | |-- tokenize full conversation | |<-- SQLAction ----------------| | | .action_type="describe" | | | .tokens=shape([1,M]) | | | | | |--- step(action) ------------>| | | |-- select table -------------->| | |<-- 
table name (or fallback) --| | |-- introspect ORM schema | | |-- append assistant msg | | |-- append action tokens | |<-- SQLObservation -----------| | | .messages=[sys,usr,asst] | | | .tokens=shape([N+M+K]) | | | | | (repeat step() for sample, query, answer...) ``` ### Flow: Action Detection ```text User message string | v _detect_action_type(content) | +-- contains "describe"/"schema"/"columns"? --> "describe" | +-- contains "sample"/"example"/"rows"? --> "sample" | +-- default --> "query" ``` ### Flow: Client Serialization (WebSocket Transport) ```text Client Server | | | _step_payload(action): | | tokens: Tensor -> list (JSON-safe) | | {action_type, action_description, | | tokens: [[1,2,3,...]], metadata} | | ---------------------------------------->| | | | _parse_result(data): | | tokens: list -> Tensor | | StepResult(obs, reward, done, info) | | <----------------------------------------| ``` --- ## Shared Data Models These three Pydantic models are used across client, server, and tests. Defined in `models.py`. ### SQLAction ```python class SQLAction(Action): action_type: str # "describe" | "sample" | "query" | "answer" action_description: str # raw user message content tokens: torch.Tensor # tokenized conversation context, shape [1, seq_len] ``` **Used by:** SQLEnvironment.step(), SQLEnvClient._step_payload(), tests ### SQLObservation ```python class SQLObservation(Observation): messages: list[Message] # full conversation history [{role, content}, ...] 
tokens: torch.Tensor # flattened 1D tensor of all turn tokens concatenated ``` **Used by:** SQLEnvironment.reset()/step(), SQLEnvClient._parse_result(), tests ### SQLState ```python class SQLState(State): episode_id: str # UUID per episode step_count: int # turns taken history_messages: list[Message] # accumulates across turns history_tokens: list[torch.Tensor] # one tensor per turn, flattened on output current_action_type: str | None # last detected action type ``` **Used by:** SQLEnvironment (internal), state endpoint **Note:** This is a lightweight summary for logging. The full RL state lives inside SQLEnvironment and is not exposed to the agent. --- ## API Contracts ### WebSocket (OpenEnv Protocol) The server exposes a WebSocket endpoint via FastAPI. The OpenEnv framework handles the protocol — SQLEnv implements `reset()` and `step()` on the server side, and `SQLEnvClient` wraps the client side. | Operation | Client Method | Payload | Response | |-----------|---------------|---------|----------| | Reset | `client.reset()` | `{}` | `SQLObservation` (JSON) | | Step | `client.step(action)` | `{action_type, action_description, tokens: list, metadata}` | `StepResult(obs, reward, done, info)` | | State | `client.state()` | `{}` | `SQLState` (JSON) | ### Ollama (Optional) | Endpoint | Purpose | Payload | |----------|---------|---------| | `POST /api/generate` | Table selection | `{model, prompt, stream: false}` | | `POST /api/generate` | SQL generation | `{model, prompt, stream: false}` | Timeout: 30s. Failure mode: graceful fallback (never crashes). --- ## Cross-Cutting Concerns ### Code Style & Abstraction Philosophy OOP for framework integration (Environment, EnvClient subclasses), plain methods for logic. Extract helpers when they clarify intent, not for DRY. 
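As an example of the "extract helpers when they clarify intent" style, the action-type detection from the Key Flows section can be sketched in a few lines. This is illustrative only — the real logic lives in `server/sql_environment.py`, and its exact keyword handling may differ:

```python
# Illustrative sketch of the keyword-based action detection described in
# "Flow: Action Detection". Keyword lists are taken from that flow diagram;
# the function name mirrors the doc, but the real implementation may differ.
DESCRIBE_KEYWORDS = ("describe", "schema", "columns")
SAMPLE_KEYWORDS = ("sample", "example", "rows")


def detect_action_type(content: str) -> str:
    """Map a raw user message to an action type via keyword matching."""
    lowered = content.lower()
    if any(kw in lowered for kw in DESCRIBE_KEYWORDS):
        return "describe"
    if any(kw in lowered for kw in SAMPLE_KEYWORDS):
        return "sample"
    return "query"  # default when no keyword matches
```

Check order matters: a message mentioning both "schema" and "rows" resolves to `"describe"`, matching the top-down order of the flow diagram.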
- **Structure:** Flat package root with `server/` for server-only code - **Error handling:** Graceful fallbacks (never crash), `ValueError` for invalid inputs - **Imports:** `try: from sql_env.X / except: from X` for dual install/Docker compatibility ### Tokenization Two paths, same interface (`apply_chat_template`): | Mode | Tokenizer | Source | When | |------|-----------|--------|------| | Dev/Test | `MockTokenizer` | `server/test_sql_env.py` | No GPU, no downloads | | Production | HuggingFace | `transformers` library | Real RL training | `MockTokenizer` encodes as `ord(c)` per character, decodes as `chr(t)`. Deterministic and fast. ### Configuration | Variable | Required | Description | Default | |----------|----------|-------------|---------| | `OLLAMA_MODEL` | No | Ollama model name for SQL generation | `qwen2` | | `OLLAMA_BASE_URL` | No | Ollama API endpoint | `http://localhost:11434` | --- ## Data, State, and Storage Locations - **Repo (committed):** - `data/questions/student_assessment.json` — 53 Spider Q&A pairs - `data/databases/models.py` — 9 SQLAlchemy ORM table definitions - **Runtime state (in-memory, per episode):** - `SQLState.history_messages` — conversation messages - `SQLState.history_tokens` — tensor per turn - **Not yet implemented:** - SQLite database files (Phase 3 — queries currently go through Ollama, not executed locally) - Reward/verification state --- ## Invariants and Guardrails - `self.db_models` refers to **database table** models (SQLAlchemy), never RL models - Token tensors grow monotonically across turns (never shrink or reset mid-episode) - `message_to_action()` mutates state — it appends to history before tokenizing - Ollama failures never crash the environment — always graceful fallback - `tests/test_smoke.py` must pass without Ollama, without GPU, without network - Schema column names in `_build_schema_description()` must match `data/databases/models.py` --- ## Glossary | Term | Definition | |------|------------| | Episode | 
One question-answering session: reset -> N steps -> terminal |
| Action type | One of: describe, sample, query, answer |
| MockTokenizer | Deterministic char-code tokenizer for dev/test (no GPU) |
| Spider | Academic text-to-SQL benchmark dataset |
| ORM models | SQLAlchemy class definitions in `data/databases/models.py` |
| OpenEnv | Meta's RL environment framework (Environment, EnvClient, Action, Observation) |

---

## Infrastructure

### Development

**Prerequisites:**

- Python 3.11-3.12 (torch incompatible with 3.13)
- `uv` package manager
- Ollama (optional)

**Setup:**

```bash
git clone <repo-url> && cd sql-env
uv sync
uv run pytest tests/ -v  # 21 tests, ~3.5s, no external deps
```

### Production

**Deployment:** Docker container via OpenEnv CLI (`openenv build` / `openenv push`)

**Runtime:** FastAPI on port 8000 (defined in `openenv.yaml`)

**Status:** Dockerfile is a scaffold stub — not yet validated

---

## Suggested Feature Breakdown

| ID | Feature | Complexity | Dependencies | Notes |
|----|---------|------------|--------------|-------|
| F001 | SQL query execution | standard | - | Execute queries against real SQLite, return results |
| F002 | Reward computation | standard | F001 | 3-layer reward: operational, progress, terminal |
| F003 | Answer verification | standard | F001 | Compare agent answer to gold SQL results |
| F004 | Docker validation | simple | - | Update Dockerfile, test `openenv build` |
| F005 | Multi-database support | complex | F001 | Load any Spider database, not just student_assessment |

### Suggested Implementation Order

1. **F001** — Foundation: wire up SQLite execution so queries return real data
2. **F002 + F003** — Can be done in parallel once F001 is complete
3. **F004** — Independent, can be done anytime
4. **F005** — After the single-database path is solid

---

## Future Considerations

- **Real SQLite execution:** Queries currently go to Ollama for SQL generation but aren't executed against a database.
Phase 3 should execute the generated SQL and return actual results. - **Multi-episode batching:** For RL training, the environment will need to support multiple concurrent episodes efficiently. - **Reward shaping:** The 3-layer reward (operational, progress, terminal) is designed in `models.py` but not implemented. - **Table selection without Ollama:** A lightweight keyword/embedding-based table selector could replace the LLM fallback. --- ## Keeping This Map Current Update this file when you change any of: - System boundaries (new service, new subsystem) - Persistent state locations (new files/dirs written or read) - Shared data models or API contracts - Cross-cutting invariants --- ## References - Docs index: `docs/README.md` - Operations: `docs/RUNBOOK.md` - OpenEnv framework: https://github.com/meta-pytorch/OpenEnv - Spider dataset: https://huggingface.co/datasets/xlangai/spider