sql_env / docs /ARCHITECTURE.md
hjerpe's picture
Upload folder using huggingface_hub
5dd1bb4 verified

Architecture

Last updated: 2026-02-28

System map for SQLEnv β€” an RL environment where agents learn interactive SQL exploration via the OpenEnv framework.

Goals:

  • Show how components connect (system map + key flows)
  • Make hidden state explicit (what lives where)
  • Define shared interfaces (Pydantic models, WebSocket API)
  • Keep invariants legible (what must stay true)

Non-goals:

  • CLI reference (see docs/RUNBOOK.md)
  • Per-feature implementation details (link to specs)

System Map

                         SQLEnv System
  ================================================================

  RL Training Loop                          SQLEnv Server (Docker)
  ----------------                          ----------------------
                                           +---------------------+
  +------------+     WebSocket (JSON)      | server/app.py       |
  | SQLEnv     |<=========================>| FastAPI + WS        |
  | Client     |  SQLAction  -> server     |                     |
  | (client.py)|  SQLObs    <- server      +----------+----------+
  +-----+------+                                      |
        |                                             v
        | tensor <-> list                  +---------------------+
        | serialization                    | SQLEnvironment      |
        |                                  | (sql_environment.py)|
  +-----v------+                           |                     |
  | RL Agent   |                           | - reset() / step()  |
  | (external) |                           | - action detection  |
  | e.g. GRPO  |                           | - message_to_action |
  +------------+                           +--+-------+-------+--+
                                              |       |       |
                                              v       v       v
                                         +------+ +------+ +--------+
                                         |Schema| |Sample| | Query  |
                                         |Intro-| |Gen   | | (Ollama|
                                         |spect.| |      | |  LLM)  |
                                         +--+---+ +--+---+ +---+----+
                                            |        |          |
                                            v        v          v
                                         +-------------------------+
                                         | SQLAlchemy ORM Models   |
                                         | (data/databases/        |
                                         |  models.py)             |
                                         | 9 tables:               |
                                         | Address, Person,        |
                                         | Student, Course, ...    |
                                         +-------------------------+

  Data (committed)                         External (optional)
  ----------------                         -------------------
  data/questions/                          +----------+
    student_assessment.json                | Ollama   |
    (53 Spider Q&A pairs)                  | LLM API  |
                                           | :11434   |
                                           +----------+

Component Inventory

Component Owns Entrypoint State / Output
SQLEnvClient WebSocket transport, tensor serialization client.py Stateless (wraps server)
FastAPI app HTTP/WS endpoints, tokenizer factory server/app.py In-memory tokenizer
SQLEnvironment Episode lifecycle, action dispatch, state server/sql_environment.py SQLState (in-memory)
Pydantic models Type contracts (action, observation, state) models.py N/A (data classes)
ORM models Database schema definition data/databases/models.py SQLAlchemy metadata
Spider data Question-answer pairs data/questions/student_assessment.json 53 Q&A entries
MockTokenizer Dev/test tokenization (no GPU needed) server/test_sql_env.py Deterministic (ord/chr)

External Services

Service Purpose Required Fallback
Ollama (localhost:11434) Table selection + SQL generation No First table in dict; query returns error string

Key Flows

Flow: Episode (Reset + Multi-Turn Steps)

Client                    Server (SQLEnvironment)              Ollama
  |                              |                               |
  |--- reset() ----------------->|                               |
  |                              |-- init state, system prompt   |
  |                              |-- tokenize system message     |
  |<-- SQLObservation -----------|   (MockTokenizer or HF)       |
  |    .messages=[system]        |                               |
  |    .tokens=shape([N])        |                               |
  |                              |                               |
  |--- message_to_action(msg) -->|                               |
  |                              |-- detect action type          |
  |                              |   (keyword matching)          |
  |                              |-- append msg to history       |
  |                              |-- tokenize full conversation  |
  |<-- SQLAction ----------------|                               |
  |    .action_type="describe"   |                               |
  |    .tokens=shape([1,M])      |                               |
  |                              |                               |
  |--- step(action) ------------>|                               |
  |                              |-- select table -------------->|
  |                              |<-- table name (or fallback) --|
  |                              |-- introspect ORM schema       |
  |                              |-- append assistant msg        |
  |                              |-- append action tokens        |
  |<-- SQLObservation -----------|                               |
  |    .messages=[sys,usr,asst]  |                               |
  |    .tokens=shape([N+M+K])    |                               |
  |                              |                               |
  (repeat step() for sample, query, answer...)

Flow: Action Detection

User message string
        |
        v
  _detect_action_type(content)
        |
        +-- contains "describe"/"schema"/"columns"?  --> "describe"
        |
        +-- contains "sample"/"example"/"rows"?      --> "sample"
        |
        +-- default                                  --> "query"

Flow: Client Serialization (WebSocket Transport)

  Client                                      Server
    |                                           |
    |  _step_payload(action):                   |
    |    tokens: Tensor -> list (JSON-safe)     |
    |    {action_type, action_description,      |
    |     tokens: [[1,2,3,...]], metadata}       |
    |  ---------------------------------------->|
    |                                           |
    |  _parse_result(data):                     |
    |    tokens: list -> Tensor                 |
    |    StepResult(obs, reward, done, info)     |
    |  <----------------------------------------|

Shared Data Models

These three Pydantic models are used across client, server, and tests. Defined in models.py.

SQLAction

class SQLAction(Action):
    action_type: str         # "describe" | "sample" | "query" | "answer"
    action_description: str  # raw user message content
    tokens: torch.Tensor     # tokenized conversation context, shape [1, seq_len]

Used by: SQLEnvironment.step(), SQLEnvClient._step_payload(), tests

SQLObservation

class SQLObservation(Observation):
    messages: list[Message]  # full conversation history [{role, content}, ...]
    tokens: torch.Tensor     # flattened 1D tensor of all turn tokens concatenated

Used by: SQLEnvironment.reset()/step(), SQLEnvClient._parse_result(), tests

SQLState

class SQLState(State):
    episode_id: str                    # UUID per episode
    step_count: int                    # turns taken
    history_messages: list[Message]    # accumulates across turns
    history_tokens: list[torch.Tensor] # one tensor per turn, flattened on output
    current_action_type: str | None    # last detected action type

Used by: SQLEnvironment (internal), state endpoint Note: This is a lightweight summary for logging. The full RL state lives inside SQLEnvironment and is not exposed to the agent.


API Contracts

WebSocket (OpenEnv Protocol)

The server exposes a WebSocket endpoint via FastAPI. The OpenEnv framework handles the protocol β€” SQLEnv implements reset() and step() on the server side, and SQLEnvClient wraps the client side.

Operation Client Method Payload Response
Reset client.reset() {} SQLObservation (JSON)
Step client.step(action) {action_type, action_description, tokens: list, metadata} StepResult(obs, reward, done, info)
State client.state() {} SQLState (JSON)

Ollama (Optional)

Endpoint Purpose Payload
POST /api/generate Table selection {model, prompt, stream: false}
POST /api/generate SQL generation {model, prompt, stream: false}

Timeout: 30s. Failure mode: graceful fallback (never crashes).


Cross-Cutting Concerns

Code Style & Abstraction Philosophy

OOP for framework integration (Environment, EnvClient subclasses), plain methods for logic. Extract helpers when they clarify intent, not for DRY.

  • Structure: Flat package root with server/ for server-only code
  • Error handling: Graceful fallbacks (never crash), ValueError for invalid inputs
  • Imports: try: from sql_env.X / except: from X for dual install/Docker compatibility

Tokenization

Two paths, same interface (apply_chat_template):

Mode Tokenizer Source When
Dev/Test MockTokenizer server/test_sql_env.py No GPU, no downloads
Production HuggingFace transformers library Real RL training

MockTokenizer encodes as ord(c) per character, decodes as chr(t). Deterministic and fast.

Configuration

Variable Required Description Default
OLLAMA_MODEL No Ollama model name for SQL generation qwen2
OLLAMA_BASE_URL No Ollama API endpoint http://localhost:11434

Data, State, and Storage Locations

  • Repo (committed):
    • data/questions/student_assessment.json β€” 53 Spider Q&A pairs
    • data/databases/models.py β€” 9 SQLAlchemy ORM table definitions
  • Runtime state (in-memory, per episode):
    • SQLState.history_messages β€” conversation messages
    • SQLState.history_tokens β€” tensor per turn
  • Not yet implemented:
    • SQLite database files (Phase 3 β€” queries currently go through Ollama, not executed locally)
    • Reward/verification state

Invariants and Guardrails

  • self.db_models refers to database table models (SQLAlchemy), never RL models
  • Token tensors grow monotonically across turns (never shrink or reset mid-episode)
  • message_to_action() mutates state β€” it appends to history before tokenizing
  • Ollama failures never crash the environment β€” always graceful fallback
  • tests/test_smoke.py must pass without Ollama, without GPU, without network
  • Schema column names in _build_schema_description() must match data/databases/models.py

Glossary

Term Definition
Episode One question-answering session: reset -> N steps -> terminal
Action type One of: describe, sample, query, answer
MockTokenizer Deterministic char-code tokenizer for dev/test (no GPU)
Spider Academic text-to-SQL benchmark dataset
ORM models SQLAlchemy class definitions in data/databases/models.py
OpenEnv Meta's RL environment framework (Environment, EnvClient, Action, Observation)

Infrastructure

Development

Prerequisites:

  • Python 3.11-3.12 (torch incompatible with 3.13)
  • uv package manager
  • Ollama (optional)

Setup:

git clone <repo-url> && cd sql-env
uv sync
uv run pytest tests/ -v    # 21 tests, ~3.5s, no external deps

Production

Deployment: Docker container via OpenEnv CLI (openenv build / openenv push) Runtime: FastAPI on port 8000 (defined in openenv.yaml) Status: Dockerfile is a scaffold stub β€” not yet validated


Suggested Feature Breakdown

ID Feature Complexity Dependencies Notes
F001 SQL query execution standard - Execute queries against real SQLite, return results
F002 Reward computation standard F001 3-layer reward: operational, progress, terminal
F003 Answer verification standard F001 Compare agent answer to gold SQL results
F004 Docker validation simple - Update Dockerfile, test openenv build
F005 Multi-database support complex F001 Load any Spider database, not just student_assessment

Suggested Implementation Order

  1. F001 β€” Foundation: wire up SQLite execution so queries return real data
  2. F002 + F003 β€” Can be done in parallel once F001 is complete
  3. F004 β€” Independent, can be done anytime
  4. F005 β€” After the single-database path is solid

Future Considerations

  • Real SQLite execution: Queries currently go to Ollama for SQL generation but aren't executed against a database. Phase 3 should execute the generated SQL and return actual results.
  • Multi-episode batching: For RL training, the environment will need to support multiple concurrent episodes efficiently.
  • Reward shaping: The 3-layer reward (operational, progress, terminal) is designed in models.py but not implemented.
  • Table selection without Ollama: A lightweight keyword/embedding-based table selector could replace the LLM fallback.

Keeping This Map Current

Update this file when you change any of:

  • System boundaries (new service, new subsystem)
  • Persistent state locations (new files/dirs written or read)
  • Shared data models or API contracts
  • Cross-cutting invariants

References