Architecture
Last updated: 2026-02-28
System map for SQLEnv, an RL environment where agents learn interactive SQL exploration via the OpenEnv framework.
Goals:
- Show how components connect (system map + key flows)
- Make hidden state explicit (what lives where)
- Define shared interfaces (Pydantic models, WebSocket API)
- Keep invariants legible (what must stay true)
Non-goals:
- CLI reference (see docs/RUNBOOK.md)
- Per-feature implementation details (link to specs)
System Map
SQLEnv System
================================================================
RL Training Loop SQLEnv Server (Docker)
---------------- ----------------------
+---------------------+
+------------+ WebSocket (JSON) | server/app.py |
| SQLEnv |<=========================>| FastAPI + WS |
| Client | SQLAction -> server | |
| (client.py)| SQLObs <- server +----------+----------+
+-----+------+ |
| v
| tensor <-> list +---------------------+
| serialization | SQLEnvironment |
| | (sql_environment.py)|
+-----v------+ | |
| RL Agent | | - reset() / step() |
| (external) | | - action detection |
| e.g. GRPO | | - message_to_action |
+------------+ +--+-------+-------+--+
| | |
v v v
+------+ +------+ +--------+
|Schema| |Sample| | Query |
|Intro-| |Gen | | (Ollama|
|spect.| | | | LLM) |
+--+---+ +--+---+ +---+----+
| | |
v v v
+-------------------------+
| SQLAlchemy ORM Models |
| (data/databases/ |
| models.py) |
| 9 tables: |
| Address, Person, |
| Student, Course, ... |
+-------------------------+
Data (committed) External (optional)
---------------- -------------------
data/questions/ +----------+
student_assessment.json | Ollama |
(53 Spider Q&A pairs) | LLM API |
| :11434 |
+----------+
Component Inventory
| Component | Owns | Entrypoint | State / Output |
|---|---|---|---|
| SQLEnvClient | WebSocket transport, tensor serialization | client.py | Stateless (wraps server) |
| FastAPI app | HTTP/WS endpoints, tokenizer factory | server/app.py | In-memory tokenizer |
| SQLEnvironment | Episode lifecycle, action dispatch, state | server/sql_environment.py | SQLState (in-memory) |
| Pydantic models | Type contracts (action, observation, state) | models.py | N/A (data classes) |
| ORM models | Database schema definition | data/databases/models.py | SQLAlchemy metadata |
| Spider data | Question-answer pairs | data/questions/student_assessment.json | 53 Q&A entries |
| MockTokenizer | Dev/test tokenization (no GPU needed) | server/test_sql_env.py | Deterministic (ord/chr) |
External Services
| Service | Purpose | Required | Fallback |
|---|---|---|---|
| Ollama (localhost:11434) | Table selection + SQL generation | No | First table in dict; query returns error string |
Key Flows
Flow: Episode (Reset + Multi-Turn Steps)
Client Server (SQLEnvironment) Ollama
| | |
|--- reset() ----------------->| |
| |-- init state, system prompt |
| |-- tokenize system message |
|<-- SQLObservation -----------| (MockTokenizer or HF) |
| .messages=[system] | |
| .tokens=shape([N]) | |
| | |
|--- message_to_action(msg) -->| |
| |-- detect action type |
| | (keyword matching) |
| |-- append msg to history |
| |-- tokenize full conversation |
|<-- SQLAction ----------------| |
| .action_type="describe" | |
| .tokens=shape([1,M]) | |
| | |
|--- step(action) ------------>| |
| |-- select table -------------->|
| |<-- table name (or fallback) --|
| |-- introspect ORM schema |
| |-- append assistant msg |
| |-- append action tokens |
|<-- SQLObservation -----------| |
| .messages=[sys,usr,asst] | |
| .tokens=shape([N+M+K]) | |
| | |
(repeat step() for sample, query, answer...)
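The per-turn token bookkeeping in this flow (one tensor per turn, flattened into shape [N+M+K] on output) can be simulated without torch. A minimal sketch, with plain lists standing in for tensors and a hypothetical EpisodeTokens helper (not part of the codebase):

```python
class EpisodeTokens:
    """Hypothetical helper mirroring SQLState.history_tokens: one token
    list per turn, flattened into a single sequence for observations."""

    def __init__(self):
        self.history = []  # one token list per turn (a tensor per turn in the real env)

    def append_turn(self, turn_tokens):
        # Turns only accumulate; token history never shrinks mid-episode.
        self.history.append(list(turn_tokens))

    def observation_tokens(self):
        # The observation exposes all turn tokens concatenated.
        return [tok for turn in self.history for tok in turn]
```

After a system prompt of N tokens, a user action of M, and an assistant reply of K, the flattened observation has length N+M+K, matching the shapes in the diagram.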
Flow: Action Detection
User message string
|
v
_detect_action_type(content)
|
+-- contains "describe"/"schema"/"columns"? --> "describe"
|
+-- contains "sample"/"example"/"rows"? --> "sample"
|
+-- default --> "query"
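The matcher above can be sketched in a few lines (keyword sets taken from the flow; the real _detect_action_type may differ in detail):

```python
def detect_action_type(content: str) -> str:
    """Map a raw user message to an action type by keyword matching.
    Checks run in order, so a message mentioning both "schema" and
    "rows" resolves to "describe"."""
    text = content.lower()
    if any(k in text for k in ("describe", "schema", "columns")):
        return "describe"
    if any(k in text for k in ("sample", "example", "rows")):
        return "sample"
    return "query"  # default when no exploration keyword matches
```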
Flow: Client Serialization (WebSocket Transport)
Client Server
| |
| _step_payload(action): |
| tokens: Tensor -> list (JSON-safe) |
| {action_type, action_description, |
| tokens: [[1,2,3,...]], metadata} |
| ---------------------------------------->|
| |
| _parse_result(data): |
| tokens: list -> Tensor |
| StepResult(obs, reward, done, info) |
| <----------------------------------------|
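The round trip above can be sketched with plain nested lists (what tensor.tolist() would produce) so it runs without torch; to_payload is a hypothetical stand-in for _step_payload:

```python
import json

def to_payload(action_type, description, token_rows):
    """Build a JSON-safe step payload. token_rows is the nested-list
    form of a [1, seq_len] tensor (illustrative, not the real method)."""
    return {
        "action_type": action_type,
        "action_description": description,
        "tokens": token_rows,
        "metadata": {},
    }

# Serialize for the wire, then decode as the receiving side would.
wire = json.dumps(to_payload("describe", "show schema", [[1, 2, 3]]))
decoded = json.loads(wire)
# _parse_result would rebuild a tensor here, e.g. torch.tensor(decoded["tokens"])
```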
Shared Data Models
These three Pydantic models are used across client, server, and tests.
Defined in models.py.
SQLAction
class SQLAction(Action):
action_type: str # "describe" | "sample" | "query" | "answer"
action_description: str # raw user message content
tokens: torch.Tensor # tokenized conversation context, shape [1, seq_len]
Used by: SQLEnvironment.step(), SQLEnvClient._step_payload(), tests
SQLObservation
class SQLObservation(Observation):
messages: list[Message] # full conversation history [{role, content}, ...]
tokens: torch.Tensor # flattened 1D tensor of all turn tokens concatenated
Used by: SQLEnvironment.reset()/step(), SQLEnvClient._parse_result(), tests
SQLState
class SQLState(State):
episode_id: str # UUID per episode
step_count: int # turns taken
history_messages: list[Message] # accumulates across turns
history_tokens: list[torch.Tensor] # one tensor per turn, flattened on output
current_action_type: str | None # last detected action type
Used by: SQLEnvironment (internal), state endpoint
Note: This is a lightweight summary for logging. The full RL state lives inside SQLEnvironment and is not exposed to the agent.
API Contracts
WebSocket (OpenEnv Protocol)
The server exposes a WebSocket endpoint via FastAPI. The OpenEnv framework handles the protocol: SQLEnv implements reset() and step() on the server side, and SQLEnvClient wraps the client side.
| Operation | Client Method | Payload | Response |
|---|---|---|---|
| Reset | client.reset() | {} | SQLObservation (JSON) |
| Step | client.step(action) | {action_type, action_description, tokens: list, metadata} | StepResult(obs, reward, done, info) |
| State | client.state() | {} | SQLState (JSON) |
Ollama (Optional)
| Endpoint | Purpose | Payload |
|---|---|---|
| POST /api/generate | Table selection | {model, prompt, stream: false} |
| POST /api/generate | SQL generation | {model, prompt, stream: false} |
Timeout: 30s. Failure mode: graceful fallback (never crashes).
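A hedged sketch of such a call with the graceful-fallback behaviour (stdlib urllib only; the function name and error string are illustrative, not the project's actual implementation):

```python
import json
import urllib.request

def ollama_generate(prompt, model="qwen2",
                    base_url="http://localhost:11434", timeout=30):
    """POST to Ollama's /api/generate; on any failure return an error
    string instead of raising, matching the never-crash invariant."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        f"{base_url}/api/generate", data=body,
        headers={"Content-Type": "application/json"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.loads(resp.read())["response"]
    except Exception as exc:
        return f"Error: Ollama unavailable ({exc})"
```

Callers can treat any returned string as usable output, which is what lets the environment keep stepping when Ollama is down.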
Cross-Cutting Concerns
Code Style & Abstraction Philosophy
OOP for framework integration (Environment, EnvClient subclasses), plain methods for logic. Extract helpers when they clarify intent, not for DRY.
- Structure: Flat package root with server/ for server-only code
- Error handling: Graceful fallbacks (never crash), ValueError for invalid inputs
- Imports: try: from sql_env.X / except: from X for dual install/Docker compatibility
Tokenization
Two paths, same interface (apply_chat_template):
| Mode | Tokenizer | Source | When |
|---|---|---|---|
| Dev/Test | MockTokenizer | server/test_sql_env.py | No GPU, no downloads |
| Production | HuggingFace | transformers library | Real RL training |
MockTokenizer encodes as ord(c) per character, decodes as chr(t). Deterministic and fast.
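That scheme can be sketched as follows (encode/decode follow the ord/chr description above; the apply_chat_template flattening format is an assumption for illustration):

```python
class MockTokenizer:
    """Deterministic char-code tokenizer for dev/test: encode = ord per
    character, decode = chr per token. No GPU, no downloads."""

    def encode(self, text):
        return [ord(c) for c in text]

    def decode(self, tokens):
        return "".join(chr(t) for t in tokens)

    def apply_chat_template(self, messages, **kwargs):
        # Flatten [{role, content}, ...] into one string, then tokenize.
        joined = "\n".join(f"{m['role']}: {m['content']}" for m in messages)
        return self.encode(joined)
```

Because every character maps to exactly one token, round trips are lossless and token counts are trivially predictable in tests.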
Configuration
| Variable | Required | Description | Default |
|---|---|---|---|
| OLLAMA_MODEL | No | Ollama model name for SQL generation | qwen2 |
| OLLAMA_BASE_URL | No | Ollama API endpoint | http://localhost:11434 |
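Resolving these with their defaults is straightforward; a sketch with an injectable mapping for testability (load_config is a hypothetical helper, not the project's API):

```python
import os

def load_config(env=None):
    """Resolve the optional Ollama settings, falling back to the
    defaults listed in the table above."""
    env = os.environ if env is None else env
    return {
        "model": env.get("OLLAMA_MODEL", "qwen2"),
        "base_url": env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
    }
```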
Data, State, and Storage Locations
- Repo (committed):
  - data/questions/student_assessment.json: 53 Spider Q&A pairs
  - data/databases/models.py: 9 SQLAlchemy ORM table definitions
- Runtime state (in-memory, per episode):
  - SQLState.history_messages: conversation messages
  - SQLState.history_tokens: one tensor per turn
- Not yet implemented:
  - SQLite database files (Phase 3; queries currently go through Ollama, not executed locally)
- Reward/verification state
Invariants and Guardrails
- self.db_models refers to database table models (SQLAlchemy), never RL models
- Token tensors grow monotonically across turns (never shrink or reset mid-episode)
- message_to_action() mutates state: it appends to history before tokenizing
- Ollama failures never crash the environment; always graceful fallback
- tests/test_smoke.py must pass without Ollama, without GPU, without network
- Schema column names in _build_schema_description() must match data/databases/models.py
Glossary
| Term | Definition |
|---|---|
| Episode | One question-answering session: reset -> N steps -> terminal |
| Action type | One of: describe, sample, query, answer |
| MockTokenizer | Deterministic char-code tokenizer for dev/test (no GPU) |
| Spider | Academic text-to-SQL benchmark dataset |
| ORM models | SQLAlchemy class definitions in data/databases/models.py |
| OpenEnv | Meta's RL environment framework (Environment, EnvClient, Action, Observation) |
Infrastructure
Development
Prerequisites:
- Python 3.11-3.12 (torch incompatible with 3.13)
- uv package manager
- Ollama (optional)
Setup:
git clone <repo-url> && cd sql-env
uv sync
uv run pytest tests/ -v # 21 tests, ~3.5s, no external deps
Production
Deployment: Docker container via OpenEnv CLI (openenv build / openenv push)
Runtime: FastAPI on port 8000 (defined in openenv.yaml)
Status: Dockerfile is a scaffold stub, not yet validated
Suggested Feature Breakdown
| ID | Feature | Complexity | Dependencies | Notes |
|---|---|---|---|---|
| F001 | SQL query execution | standard | - | Execute queries against real SQLite, return results |
| F002 | Reward computation | standard | F001 | 3-layer reward: operational, progress, terminal |
| F003 | Answer verification | standard | F001 | Compare agent answer to gold SQL results |
| F004 | Docker validation | simple | - | Update Dockerfile, test openenv build |
| F005 | Multi-database support | complex | F001 | Load any Spider database, not just student_assessment |
Suggested Implementation Order
- F001 (foundation): wire up SQLite execution so queries return real data
- F002 + F003: can be done in parallel once F001 is complete
- F004: independent, can be done anytime
- F005: after the single-database path is solid
Future Considerations
- Real SQLite execution: Queries currently go to Ollama for SQL generation but aren't executed against a database. Phase 3 should execute the generated SQL and return actual results.
- Multi-episode batching: For RL training, the environment will need to support multiple concurrent episodes efficiently.
- Reward shaping: The 3-layer reward (operational, progress, terminal) is designed in models.py but not implemented.
- Table selection without Ollama: A lightweight keyword/embedding-based table selector could replace the LLM fallback.
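A minimal sketch of what such an Ollama-free selector could look like (entirely hypothetical: scores each table by keyword overlap between the question and the table-name parts, keeping the same first-table fallback the environment already uses):

```python
def select_table(question, table_names):
    """Pick the table whose name parts overlap most with the question's
    words; fall back to the first table when nothing matches."""
    words = set(question.lower().split())
    best, best_score = table_names[0], 0
    for name in table_names:
        # Split snake_case names like "student_course" into parts.
        score = sum(1 for part in name.lower().split("_") if part in words)
        if score > best_score:
            best, best_score = name, score
    return best
```

An embedding-based variant would replace the overlap count with a cosine similarity, but the fallback contract stays the same.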
Keeping This Map Current
Update this file when you change any of:
- System boundaries (new service, new subsystem)
- Persistent state locations (new files/dirs written or read)
- Shared data models or API contracts
- Cross-cutting invariants
References
- Docs index: docs/README.md
- Operations: docs/RUNBOOK.md
- OpenEnv framework: https://github.com/meta-pytorch/OpenEnv
- Spider dataset: https://huggingface.co/datasets/xlangai/spider