# Architecture > Last updated: 2026-02-28 System map for SQLEnv — an RL environment where agents learn interactive SQL exploration via the OpenEnv framework. **Goals:** - Show how components connect (system map + key flows) - Make hidden state explicit (what lives where) - Define shared interfaces (Pydantic models, WebSocket API) - Keep invariants legible (what must stay true) **Non-goals:** - CLI reference (see `docs/RUNBOOK.md`) - Per-feature implementation details (link to specs) --- ## System Map ```text SQLEnv System ================================================================ RL Training Loop SQLEnv Server (Docker) ---------------- ---------------------- +---------------------+ +------------+ WebSocket (JSON) | server/app.py | | SQLEnv |<=========================>| FastAPI + WS | | Client | SQLAction -> server | | | (client.py)| SQLObs <- server +----------+----------+ +-----+------+ | | v | tensor <-> list +---------------------+ | serialization | SQLEnvironment | | | (sql_environment.py)| +-----v------+ | | | RL Agent | | - reset() / step() | | (external) | | - action detection | | e.g. GRPO | | - message_to_action | +------------+ +--+-------+-------+--+ | | | v v v +------+ +------+ +--------+ |Schema| |Sample| | Query | |Intro-| |Gen | | (Ollama| |spect.| | | | LLM) | +--+---+ +--+---+ +---+----+ | | | v v v +-------------------------+ | SQLAlchemy ORM Models | | (data/databases/ | | models.py) | | 9 tables: | | Address, Person, | | Student, Course, ... 
| +-------------------------+ Data (committed) External (optional) ---------------- ------------------- data/questions/ +----------+ student_assessment.json | Ollama | (53 Spider Q&A pairs) | LLM API | | :11434 | +----------+ ``` --- ## Component Inventory | Component | Owns | Entrypoint | State / Output | |-----------|------|------------|----------------| | **SQLEnvClient** | WebSocket transport, tensor serialization | `client.py` | Stateless (wraps server) | | **FastAPI app** | HTTP/WS endpoints, tokenizer factory | `server/app.py` | In-memory tokenizer | | **SQLEnvironment** | Episode lifecycle, action dispatch, state | `server/sql_environment.py` | `SQLState` (in-memory) | | **Pydantic models** | Type contracts (action, observation, state) | `models.py` | N/A (data classes) | | **ORM models** | Database schema definition | `data/databases/models.py` | SQLAlchemy metadata | | **Spider data** | Question-answer pairs | `data/questions/student_assessment.json` | 53 Q&A entries | | **MockTokenizer** | Dev/test tokenization (no GPU needed) | `server/test_sql_env.py` | Deterministic (ord/chr) | ### External Services | Service | Purpose | Required | Fallback | |---------|---------|----------|----------| | Ollama (`localhost:11434`) | Table selection + SQL generation | No | First table in dict; query returns error string | --- ## Key Flows ### Flow: Episode (Reset + Multi-Turn Steps) ```text Client Server (SQLEnvironment) Ollama | | | |--- reset() ----------------->| | | |-- init state, system prompt | | |-- tokenize system message | |<-- SQLObservation -----------| (MockTokenizer or HF) | | .messages=[system] | | | .tokens=shape([N]) | | | | | |--- message_to_action(msg) -->| | | |-- detect action type | | | (keyword matching) | | |-- append msg to history | | |-- tokenize full conversation | |<-- SQLAction ----------------| | | .action_type="describe" | | | .tokens=shape([1,M]) | | | | | |--- step(action) ------------>| | | |-- select table -------------->| | |<-- 
table name (or fallback) --| | |-- introspect ORM schema | | |-- append assistant msg | | |-- append action tokens | |<-- SQLObservation -----------| | | .messages=[sys,usr,asst] | | | .tokens=shape([N+M+K]) | | | | | (repeat step() for sample, query, answer...) ``` ### Flow: Action Detection ```text User message string | v _detect_action_type(content) | +-- contains "describe"/"schema"/"columns"? --> "describe" | +-- contains "sample"/"example"/"rows"? --> "sample" | +-- default --> "query" ``` ### Flow: Client Serialization (WebSocket Transport) ```text Client Server | | | _step_payload(action): | | tokens: Tensor -> list (JSON-safe) | | {action_type, action_description, | | tokens: [[1,2,3,...]], metadata} | | ---------------------------------------->| | | | _parse_result(data): | | tokens: list -> Tensor | | StepResult(obs, reward, done, info) | | <----------------------------------------| ``` --- ## Shared Data Models These three Pydantic models are used across client, server, and tests. Defined in `models.py`. ### SQLAction ```python class SQLAction(Action): action_type: str # "describe" | "sample" | "query" | "answer" action_description: str # raw user message content tokens: torch.Tensor # tokenized conversation context, shape [1, seq_len] ``` **Used by:** SQLEnvironment.step(), SQLEnvClient._step_payload(), tests ### SQLObservation ```python class SQLObservation(Observation): messages: list[Message] # full conversation history [{role, content}, ...] 
tokens: torch.Tensor # flattened 1D tensor of all turn tokens concatenated ``` **Used by:** SQLEnvironment.reset()/step(), SQLEnvClient._parse_result(), tests ### SQLState ```python class SQLState(State): episode_id: str # UUID per episode step_count: int # turns taken history_messages: list[Message] # accumulates across turns history_tokens: list[torch.Tensor] # one tensor per turn, flattened on output current_action_type: str | None # last detected action type ``` **Used by:** SQLEnvironment (internal), state endpoint **Note:** This is a lightweight summary for logging. The full RL state lives inside SQLEnvironment and is not exposed to the agent. --- ## API Contracts ### WebSocket (OpenEnv Protocol) The server exposes a WebSocket endpoint via FastAPI. The OpenEnv framework handles the protocol — SQLEnv implements `reset()` and `step()` on the server side, and `SQLEnvClient` wraps the client side. | Operation | Client Method | Payload | Response | |-----------|---------------|---------|----------| | Reset | `client.reset()` | `{}` | `SQLObservation` (JSON) | | Step | `client.step(action)` | `{action_type, action_description, tokens: list, metadata}` | `StepResult(obs, reward, done, info)` | | State | `client.state()` | `{}` | `SQLState` (JSON) | ### Ollama (Optional) | Endpoint | Purpose | Payload | |----------|---------|---------| | `POST /api/generate` | Table selection | `{model, prompt, stream: false}` | | `POST /api/generate` | SQL generation | `{model, prompt, stream: false}` | Timeout: 30s. Failure mode: graceful fallback (never crashes). --- ## Cross-Cutting Concerns ### Code Style & Abstraction Philosophy OOP for framework integration (Environment, EnvClient subclasses), plain methods for logic. Extract helpers when they clarify intent, not for DRY. 
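As an example of the "extract helpers when they clarify intent" style, the action-type detection from the Key Flows section can be sketched in a few lines. This is illustrative only — the real logic lives in `server/sql_environment.py`, and its exact keyword handling may differ:

```python
# Illustrative sketch of the keyword-based action detection described in
# "Flow: Action Detection". Keyword lists are taken from that flow diagram;
# the function name mirrors the doc, but the real implementation may differ.
DESCRIBE_KEYWORDS = ("describe", "schema", "columns")
SAMPLE_KEYWORDS = ("sample", "example", "rows")


def detect_action_type(content: str) -> str:
    """Map a raw user message to an action type via keyword matching."""
    lowered = content.lower()
    if any(kw in lowered for kw in DESCRIBE_KEYWORDS):
        return "describe"
    if any(kw in lowered for kw in SAMPLE_KEYWORDS):
        return "sample"
    return "query"  # default when no keyword matches
```

Check order matters: a message mentioning both "schema" and "rows" resolves to `"describe"`, matching the top-down order of the flow diagram.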
- **Structure:** Flat package root with `server/` for server-only code - **Error handling:** Graceful fallbacks (never crash), `ValueError` for invalid inputs - **Imports:** `try: from sql_env.X / except: from X` for dual install/Docker compatibility ### Tokenization Two paths, same interface (`apply_chat_template`): | Mode | Tokenizer | Source | When | |------|-----------|--------|------| | Dev/Test | `MockTokenizer` | `server/test_sql_env.py` | No GPU, no downloads | | Production | HuggingFace | `transformers` library | Real RL training | `MockTokenizer` encodes as `ord(c)` per character, decodes as `chr(t)`. Deterministic and fast. ### Configuration | Variable | Required | Description | Default | |----------|----------|-------------|---------| | `OLLAMA_MODEL` | No | Ollama model name for SQL generation | `qwen2` | | `OLLAMA_BASE_URL` | No | Ollama API endpoint | `http://localhost:11434` | --- ## Data, State, and Storage Locations - **Repo (committed):** - `data/questions/student_assessment.json` — 53 Spider Q&A pairs - `data/databases/models.py` — 9 SQLAlchemy ORM table definitions - **Runtime state (in-memory, per episode):** - `SQLState.history_messages` — conversation messages - `SQLState.history_tokens` — tensor per turn - **Not yet implemented:** - SQLite database files (Phase 3 — queries currently go through Ollama, not executed locally) - Reward/verification state --- ## Invariants and Guardrails - `self.db_models` refers to **database table** models (SQLAlchemy), never RL models - Token tensors grow monotonically across turns (never shrink or reset mid-episode) - `message_to_action()` mutates state — it appends to history before tokenizing - Ollama failures never crash the environment — always graceful fallback - `tests/test_smoke.py` must pass without Ollama, without GPU, without network - Schema column names in `_build_schema_description()` must match `data/databases/models.py` --- ## Glossary | Term | Definition | |------|------------| | Episode | 
One question-answering session: reset -> N steps -> terminal |
| Action type | One of: describe, sample, query, answer |
| MockTokenizer | Deterministic char-code tokenizer for dev/test (no GPU) |
| Spider | Academic text-to-SQL benchmark dataset |
| ORM models | SQLAlchemy class definitions in `data/databases/models.py` |
| OpenEnv | Meta's RL environment framework (Environment, EnvClient, Action, Observation) |

---

## Infrastructure

### Development

**Prerequisites:**

- Python 3.11-3.12 (torch incompatible with 3.13)
- `uv` package manager
- Ollama (optional)

**Setup:**

```bash
git clone <repo-url> && cd sql-env
uv sync
uv run pytest tests/ -v  # 21 tests, ~3.5s, no external deps
```

### Production

**Deployment:** Docker container via OpenEnv CLI (`openenv build` / `openenv push`)

**Runtime:** FastAPI on port 8000 (defined in `openenv.yaml`)

**Status:** Dockerfile is a scaffold stub — not yet validated

---

## Suggested Feature Breakdown

| ID | Feature | Complexity | Dependencies | Notes |
|----|---------|------------|--------------|-------|
| F001 | SQL query execution | standard | - | Execute queries against real SQLite, return results |
| F002 | Reward computation | standard | F001 | 3-layer reward: operational, progress, terminal |
| F003 | Answer verification | standard | F001 | Compare agent answer to gold SQL results |
| F004 | Docker validation | simple | - | Update Dockerfile, test `openenv build` |
| F005 | Multi-database support | complex | F001 | Load any Spider database, not just student_assessment |

### Suggested Implementation Order

1. **F001** — Foundation: wire up SQLite execution so queries return real data
2. **F002 + F003** — Can be done in parallel once F001 is complete
3. **F004** — Independent, can be done anytime
4. **F005** — After the single-database path is solid

---

## Future Considerations

- **Real SQLite execution:** Queries currently go to Ollama for SQL generation but aren't executed against a database.
Phase 3 should execute the generated SQL and return actual results. - **Multi-episode batching:** For RL training, the environment will need to support multiple concurrent episodes efficiently. - **Reward shaping:** The 3-layer reward (operational, progress, terminal) is designed in `models.py` but not implemented. - **Table selection without Ollama:** A lightweight keyword/embedding-based table selector could replace the LLM fallback. --- ## Keeping This Map Current Update this file when you change any of: - System boundaries (new service, new subsystem) - Persistent state locations (new files/dirs written or read) - Shared data models or API contracts - Cross-cutting invariants --- ## References - Docs index: `docs/README.md` - Operations: `docs/RUNBOOK.md` - OpenEnv framework: https://github.com/meta-pytorch/OpenEnv - Spider dataset: https://huggingface.co/datasets/xlangai/spider