Architecture
Last updated: 2026-02-28
System map for SQLEnv, an RL environment where agents learn interactive SQL exploration via the OpenEnv framework.
Goals:
- Show how components connect (system map + key flows)
- Make hidden state explicit (what lives where)
- Define shared interfaces (Pydantic models, WebSocket API)
- Keep invariants legible (what must stay true)
Non-goals:
- CLI reference (see docs/RUNBOOK.md)
- Per-feature implementation details (link to specs)
System Map
SQLEnv System
================================================================
RL Training Loop SQLEnv Server (Docker)
---------------- ----------------------
+---------------------+
+------------+ WebSocket (JSON) | server/app.py |
| SQLEnv |<=========================>| FastAPI + WS |
| Client | SQLAction -> server | |
| (client.py)| SQLObs <- server +----------+----------+
+-----+------+ |
| v
| tensor <-> list +---------------------+
| serialization | SQLEnvironment |
| | (sql_environment.py)|
+-----v------+ | |
| RL Agent | | - reset() / step() |
| (external) | | - action detection |
| e.g. GRPO | | - message_to_action |
+------------+ +--+-------+-------+--+
| | |
v v v
+------+ +------+ +--------+
|Schema| |Sample| | Query |
|Intro-| |Gen | | (Ollama|
|spect.| | | | LLM) |
+--+---+ +--+---+ +---+----+
| | |
v v v
+-------------------------+
| SQLAlchemy ORM Models |
| (data/databases/ |
| models.py) |
| 9 tables: |
| Address, Person, |
| Student, Course, ... |
+-------------------------+
Data (committed) External (optional)
---------------- -------------------
data/questions/ +----------+
student_assessment.json | Ollama |
(53 Spider Q&A pairs) | LLM API |
| :11434 |
+----------+
Component Inventory
| Component | Owns | Entrypoint | State / Output |
|---|---|---|---|
| SQLEnvClient | WebSocket transport, tensor serialization | client.py | Stateless (wraps server) |
| FastAPI app | HTTP/WS endpoints, tokenizer factory | server/app.py | In-memory tokenizer |
| SQLEnvironment | Episode lifecycle, action dispatch, state | server/sql_environment.py | SQLState (in-memory) |
| Pydantic models | Type contracts (action, observation, state) | models.py | N/A (data classes) |
| ORM models | Database schema definition | data/databases/models.py | SQLAlchemy metadata |
| Spider data | Question-answer pairs | data/questions/student_assessment.json | 53 Q&A entries |
| MockTokenizer | Dev/test tokenization (no GPU needed) | server/test_sql_env.py | Deterministic (ord/chr) |
External Services
| Service | Purpose | Required | Fallback |
|---|---|---|---|
| Ollama (localhost:11434) | Table selection + SQL generation | No | First table in dict; query returns error string |
Key Flows
Flow: Episode (Reset + Multi-Turn Steps)
Client Server (SQLEnvironment) Ollama
| | |
|--- reset() ----------------->| |
| |-- init state, system prompt |
| |-- tokenize system message |
|<-- SQLObservation -----------| (MockTokenizer or HF) |
| .messages=[system] | |
| .tokens=shape([N]) | |
| | |
|--- message_to_action(msg) -->| |
| |-- detect action type |
| | (keyword matching) |
| |-- append msg to history |
| |-- tokenize full conversation |
|<-- SQLAction ----------------| |
| .action_type="describe" | |
| .tokens=shape([1,M]) | |
| | |
|--- step(action) ------------>| |
| |-- select table -------------->|
| |<-- table name (or fallback) --|
| |-- introspect ORM schema |
| |-- append assistant msg |
| |-- append action tokens |
|<-- SQLObservation -----------| |
| .messages=[sys,usr,asst] | |
| .tokens=shape([N+M+K]) | |
| | |
(repeat step() for sample, query, answer...)
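The per-turn token bookkeeping in this flow (one tensor per turn, flattened into shape [N+M+K] on output) can be simulated without torch. A minimal sketch, with plain lists standing in for tensors and a hypothetical EpisodeTokens helper (not part of the codebase):

```python
class EpisodeTokens:
    """Hypothetical helper mirroring SQLState.history_tokens: one token
    list per turn, flattened into a single sequence for observations."""

    def __init__(self):
        self.history = []  # one token list per turn (a tensor per turn in the real env)

    def append_turn(self, turn_tokens):
        # Turns only accumulate; token history never shrinks mid-episode.
        self.history.append(list(turn_tokens))

    def observation_tokens(self):
        # The observation exposes all turn tokens concatenated.
        return [tok for turn in self.history for tok in turn]
```

After a system prompt of N tokens, a user action of M, and an assistant reply of K, the flattened observation has length N+M+K, matching the shapes in the diagram.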
Flow: Action Detection
User message string
|
v
_detect_action_type(content)
|
+-- contains "describe"/"schema"/"columns"? --> "describe"
|
+-- contains "sample"/"example"/"rows"? --> "sample"
|
+-- default --> "query"
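The matcher above can be sketched in a few lines (keyword sets taken from the flow; the real _detect_action_type may differ in detail):

```python
def detect_action_type(content: str) -> str:
    """Map a raw user message to an action type by keyword matching.
    Checks run in order, so a message mentioning both "schema" and
    "rows" resolves to "describe"."""
    text = content.lower()
    if any(k in text for k in ("describe", "schema", "columns")):
        return "describe"
    if any(k in text for k in ("sample", "example", "rows")):
        return "sample"
    return "query"  # default when no exploration keyword matches
```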
Flow: Client Serialization (WebSocket Transport)
Client Server
| |
| _step_payload(action): |
| tokens: Tensor -> list (JSON-safe) |
| {action_type, action_description, |
| tokens: [[1,2,3,...]], metadata} |
| ---------------------------------------->|
| |
| _parse_result(data): |
| tokens: list -> Tensor |
| StepResult(obs, reward, done, info) |
| <----------------------------------------|
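The round trip above can be sketched with plain nested lists (what tensor.tolist() would produce) so it runs without torch; to_payload is a hypothetical stand-in for _step_payload:

```python
import json

def to_payload(action_type, description, token_rows):
    """Build a JSON-safe step payload. token_rows is the nested-list
    form of a [1, seq_len] tensor (illustrative, not the real method)."""
    return {
        "action_type": action_type,
        "action_description": description,
        "tokens": token_rows,
        "metadata": {},
    }

# Serialize for the wire, then decode as the receiving side would.
wire = json.dumps(to_payload("describe", "show schema", [[1, 2, 3]]))
decoded = json.loads(wire)
# _parse_result would rebuild a tensor here, e.g. torch.tensor(decoded["tokens"])
```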
Shared Data Models
These three Pydantic models are used across client, server, and tests.
Defined in models.py.
SQLAction
class SQLAction(Action):
action_type: str # "describe" | "sample" | "query" | "answer"
action_description: str # raw user message content
tokens: torch.Tensor # tokenized conversation context, shape [1, seq_len]
Used by: SQLEnvironment.step(), SQLEnvClient._step_payload(), tests
SQLObservation
class SQLObservation(Observation):
messages: list[Message] # full conversation history [{role, content}, ...]
tokens: torch.Tensor # flattened 1D tensor of all turn tokens concatenated
Used by: SQLEnvironment.reset()/step(), SQLEnvClient._parse_result(), tests
SQLState
class SQLState(State):
episode_id: str # UUID per episode
step_count: int # turns taken
history_messages: list[Message] # accumulates across turns
history_tokens: list[torch.Tensor] # one tensor per turn, flattened on output
current_action_type: str | None # last detected action type
Used by: SQLEnvironment (internal), state endpoint
Note: This is a lightweight summary for logging. The full RL state lives inside SQLEnvironment and is not exposed to the agent.
API Contracts
WebSocket (OpenEnv Protocol)
The server exposes a WebSocket endpoint via FastAPI. The OpenEnv framework handles the protocol: SQLEnv implements reset() and step() on the server side, and SQLEnvClient wraps the client side.
| Operation | Client Method | Payload | Response |
|---|---|---|---|
| Reset | client.reset() | {} | SQLObservation (JSON) |
| Step | client.step(action) | {action_type, action_description, tokens: list, metadata} | StepResult(obs, reward, done, info) |
| State | client.state() | {} | SQLState (JSON) |
Ollama (Optional)
| Endpoint | Purpose | Payload |
|---|---|---|
| POST /api/generate | Table selection | {model, prompt, stream: false} |
| POST /api/generate | SQL generation | {model, prompt, stream: false} |
Timeout: 30s. Failure mode: graceful fallback (never crashes).
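A hedged sketch of such a call with the graceful-fallback behaviour (stdlib urllib only; the function name and error string are illustrative, not the project's actual implementation):

```python
import json
import urllib.request

def ollama_generate(prompt, model="qwen2",
                    base_url="http://localhost:11434", timeout=30):
    """POST to Ollama's /api/generate; on any failure return an error
    string instead of raising, matching the never-crash invariant."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        f"{base_url}/api/generate", data=body,
        headers={"Content-Type": "application/json"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.loads(resp.read())["response"]
    except Exception as exc:
        return f"Error: Ollama unavailable ({exc})"
```

Callers can treat any returned string as usable output, which is what lets the environment keep stepping when Ollama is down.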
Cross-Cutting Concerns
Code Style & Abstraction Philosophy
OOP for framework integration (Environment, EnvClient subclasses), plain methods for logic. Extract helpers when they clarify intent, not for DRY.
- Structure: Flat package root with server/ for server-only code
- Error handling: Graceful fallbacks (never crash), ValueError for invalid inputs
- Imports: try: from sql_env.X / except: from X for dual install/Docker compatibility
Tokenization
Two paths, same interface (apply_chat_template):
| Mode | Tokenizer | Source | When |
|---|---|---|---|
| Dev/Test | MockTokenizer | server/test_sql_env.py | No GPU, no downloads |
| Production | HuggingFace | transformers library | Real RL training |
MockTokenizer encodes as ord(c) per character, decodes as chr(t). Deterministic and fast.
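That scheme can be sketched as follows (encode/decode follow the ord/chr description above; the apply_chat_template flattening format is an assumption for illustration):

```python
class MockTokenizer:
    """Deterministic char-code tokenizer for dev/test: encode = ord per
    character, decode = chr per token. No GPU, no downloads."""

    def encode(self, text):
        return [ord(c) for c in text]

    def decode(self, tokens):
        return "".join(chr(t) for t in tokens)

    def apply_chat_template(self, messages, **kwargs):
        # Flatten [{role, content}, ...] into one string, then tokenize.
        joined = "\n".join(f"{m['role']}: {m['content']}" for m in messages)
        return self.encode(joined)
```

Because every character maps to exactly one token, round trips are lossless and token counts are trivially predictable in tests.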
Configuration
| Variable | Required | Description | Default |
|---|---|---|---|
| OLLAMA_MODEL | No | Ollama model name for SQL generation | qwen2 |
| OLLAMA_BASE_URL | No | Ollama API endpoint | http://localhost:11434 |
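Resolving these with their defaults is straightforward; a sketch with an injectable mapping for testability (load_config is a hypothetical helper, not the project's API):

```python
import os

def load_config(env=None):
    """Resolve the optional Ollama settings, falling back to the
    defaults listed in the table above."""
    env = os.environ if env is None else env
    return {
        "model": env.get("OLLAMA_MODEL", "qwen2"),
        "base_url": env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
    }
```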
Data, State, and Storage Locations
- Repo (committed):
  - data/questions/student_assessment.json: 53 Spider Q&A pairs
  - data/databases/models.py: 9 SQLAlchemy ORM table definitions
- Runtime state (in-memory, per episode):
  - SQLState.history_messages: conversation messages
  - SQLState.history_tokens: one tensor per turn
- Not yet implemented:
  - SQLite database files (Phase 3; queries currently go through Ollama, not executed locally)
- Reward/verification state
Invariants and Guardrails
- self.db_models refers to database table models (SQLAlchemy), never RL models
- Token tensors grow monotonically across turns (never shrink or reset mid-episode)
- message_to_action() mutates state: it appends to history before tokenizing
- Ollama failures never crash the environment; always graceful fallback
- tests/test_smoke.py must pass without Ollama, without GPU, without network
- Schema column names in _build_schema_description() must match data/databases/models.py
Glossary
| Term | Definition |
|---|---|
| Episode | One question-answering session: reset -> N steps -> terminal |
| Action type | One of: describe, sample, query, answer |
| MockTokenizer | Deterministic char-code tokenizer for dev/test (no GPU) |
| Spider | Academic text-to-SQL benchmark dataset |
| ORM models | SQLAlchemy class definitions in data/databases/models.py |
| OpenEnv | Meta's RL environment framework (Environment, EnvClient, Action, Observation) |
Infrastructure
Development
Prerequisites:
- Python 3.11-3.12 (torch incompatible with 3.13)
- uv package manager
- Ollama (optional)
Setup:
git clone <repo-url> && cd sql-env
uv sync
uv run pytest tests/ -v # 21 tests, ~3.5s, no external deps
Production
Deployment: Docker container via OpenEnv CLI (openenv build / openenv push)
Runtime: FastAPI on port 8000 (defined in openenv.yaml)
Status: Dockerfile is a scaffold stub, not yet validated
Suggested Feature Breakdown
| ID | Feature | Complexity | Dependencies | Notes |
|---|---|---|---|---|
| F001 | SQL query execution | standard | - | Execute queries against real SQLite, return results |
| F002 | Reward computation | standard | F001 | 3-layer reward: operational, progress, terminal |
| F003 | Answer verification | standard | F001 | Compare agent answer to gold SQL results |
| F004 | Docker validation | simple | - | Update Dockerfile, test openenv build |
| F005 | Multi-database support | complex | F001 | Load any Spider database, not just student_assessment |
Suggested Implementation Order
- F001 (foundation): wire up SQLite execution so queries return real data
- F002 + F003: can be done in parallel once F001 is complete
- F004: independent, can be done anytime
- F005: after the single-database path is solid
Future Considerations
- Real SQLite execution: Queries currently go to Ollama for SQL generation but aren't executed against a database. Phase 3 should execute the generated SQL and return actual results.
- Multi-episode batching: For RL training, the environment will need to support multiple concurrent episodes efficiently.
- Reward shaping: The 3-layer reward (operational, progress, terminal) is designed in models.py but not implemented.
- Table selection without Ollama: A lightweight keyword/embedding-based table selector could replace the LLM fallback.
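A minimal sketch of what such an Ollama-free selector could look like (entirely hypothetical: scores each table by keyword overlap between the question and the table-name parts, keeping the same first-table fallback the environment already uses):

```python
def select_table(question, table_names):
    """Pick the table whose name parts overlap most with the question's
    words; fall back to the first table when nothing matches."""
    words = set(question.lower().split())
    best, best_score = table_names[0], 0
    for name in table_names:
        # Split snake_case names like "student_course" into parts.
        score = sum(1 for part in name.lower().split("_") if part in words)
        if score > best_score:
            best, best_score = name, score
    return best
```

An embedding-based variant would replace the overlap count with a cosine similarity, but the fallback contract stays the same.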
Keeping This Map Current
Update this file when you change any of:
- System boundaries (new service, new subsystem)
- Persistent state locations (new files/dirs written or read)
- Shared data models or API contracts
- Cross-cutting invariants
References
- Docs index: docs/README.md
- Operations: docs/RUNBOOK.md
- OpenEnv framework: https://github.com/meta-pytorch/OpenEnv
- Spider dataset: https://huggingface.co/datasets/xlangai/spider