File size: 14,880 Bytes
5dd1bb4 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 | # Architecture
> Last updated: 2026-02-28
System map for SQLEnv β an RL environment where agents learn interactive SQL exploration via the OpenEnv framework.
**Goals:**
- Show how components connect (system map + key flows)
- Make hidden state explicit (what lives where)
- Define shared interfaces (Pydantic models, WebSocket API)
- Keep invariants legible (what must stay true)
**Non-goals:**
- CLI reference (see `docs/RUNBOOK.md`)
- Per-feature implementation details (link to specs)
---
## System Map
```text
SQLEnv System
================================================================
RL Training Loop SQLEnv Server (Docker)
---------------- ----------------------
+---------------------+
+------------+ WebSocket (JSON) | server/app.py |
| SQLEnv |<=========================>| FastAPI + WS |
| Client | SQLAction -> server | |
| (client.py)| SQLObs <- server +----------+----------+
+-----+------+ |
| v
| tensor <-> list +---------------------+
| serialization | SQLEnvironment |
| | (sql_environment.py)|
+-----v------+ | |
| RL Agent | | - reset() / step() |
| (external) | | - action detection |
| e.g. GRPO | | - message_to_action |
+------------+ +--+-------+-------+--+
| | |
v v v
+------+ +------+ +--------+
|Schema| |Sample| | Query |
|Intro-| |Gen | | (Ollama|
|spect.| | | | LLM) |
+--+---+ +--+---+ +---+----+
| | |
v v v
+-------------------------+
| SQLAlchemy ORM Models |
| (data/databases/ |
| models.py) |
| 9 tables: |
| Address, Person, |
| Student, Course, ... |
+-------------------------+
Data (committed) External (optional)
---------------- -------------------
data/questions/ +----------+
student_assessment.json | Ollama |
(53 Spider Q&A pairs) | LLM API |
| :11434 |
+----------+
```
---
## Component Inventory
| Component | Owns | Entrypoint | State / Output |
|-----------|------|------------|----------------|
| **SQLEnvClient** | WebSocket transport, tensor serialization | `client.py` | Stateless (wraps server) |
| **FastAPI app** | HTTP/WS endpoints, tokenizer factory | `server/app.py` | In-memory tokenizer |
| **SQLEnvironment** | Episode lifecycle, action dispatch, state | `server/sql_environment.py` | `SQLState` (in-memory) |
| **Pydantic models** | Type contracts (action, observation, state) | `models.py` | N/A (data classes) |
| **ORM models** | Database schema definition | `data/databases/models.py` | SQLAlchemy metadata |
| **Spider data** | Question-answer pairs | `data/questions/student_assessment.json` | 53 Q&A entries |
| **MockTokenizer** | Dev/test tokenization (no GPU needed) | `server/test_sql_env.py` | Deterministic (ord/chr) |
### External Services
| Service | Purpose | Required | Fallback |
|---------|---------|----------|----------|
| Ollama (`localhost:11434`) | Table selection + SQL generation | No | First table in dict; query returns error string |
---
## Key Flows
### Flow: Episode (Reset + Multi-Turn Steps)
```text
Client Server (SQLEnvironment) Ollama
| | |
|--- reset() ----------------->| |
| |-- init state, system prompt |
| |-- tokenize system message |
|<-- SQLObservation -----------| (MockTokenizer or HF) |
| .messages=[system] | |
| .tokens=shape([N]) | |
| | |
|--- message_to_action(msg) -->| |
| |-- detect action type |
| | (keyword matching) |
| |-- append msg to history |
| |-- tokenize full conversation |
|<-- SQLAction ----------------| |
| .action_type="describe" | |
| .tokens=shape([1,M]) | |
| | |
|--- step(action) ------------>| |
| |-- select table -------------->|
| |<-- table name (or fallback) --|
| |-- introspect ORM schema |
| |-- append assistant msg |
| |-- append action tokens |
|<-- SQLObservation -----------| |
| .messages=[sys,usr,asst] | |
| .tokens=shape([N+M+K]) | |
| | |
(repeat step() for sample, query, answer...)
```
### Flow: Action Detection
```text
User message string
|
v
_detect_action_type(content)
|
+-- contains "describe"/"schema"/"columns"? --> "describe"
|
+-- contains "sample"/"example"/"rows"? --> "sample"
|
+-- default --> "query"
```
### Flow: Client Serialization (WebSocket Transport)
```text
Client Server
| |
| _step_payload(action): |
| tokens: Tensor -> list (JSON-safe) |
| {action_type, action_description, |
| tokens: [[1,2,3,...]], metadata} |
| ---------------------------------------->|
| |
| _parse_result(data): |
| tokens: list -> Tensor |
| StepResult(obs, reward, done, info) |
| <----------------------------------------|
```
---
## Shared Data Models
These three Pydantic models are used across client, server, and tests.
Defined in `models.py`.
### SQLAction
```python
class SQLAction(Action):
action_type: str # "describe" | "sample" | "query" | "answer"
action_description: str # raw user message content
tokens: torch.Tensor # tokenized conversation context, shape [1, seq_len]
```
**Used by:** SQLEnvironment.step(), SQLEnvClient._step_payload(), tests
### SQLObservation
```python
class SQLObservation(Observation):
messages: list[Message] # full conversation history [{role, content}, ...]
tokens: torch.Tensor # flattened 1D tensor of all turn tokens concatenated
```
**Used by:** SQLEnvironment.reset()/step(), SQLEnvClient._parse_result(), tests
### SQLState
```python
class SQLState(State):
episode_id: str # UUID per episode
step_count: int # turns taken
history_messages: list[Message] # accumulates across turns
history_tokens: list[torch.Tensor] # one tensor per turn, flattened on output
current_action_type: str | None # last detected action type
```
**Used by:** SQLEnvironment (internal), state endpoint
**Note:** This is a lightweight summary for logging. The full RL state lives inside SQLEnvironment and is not exposed to the agent.
---
## API Contracts
### WebSocket (OpenEnv Protocol)
The server exposes a WebSocket endpoint via FastAPI. The OpenEnv framework handles the protocol β SQLEnv implements `reset()` and `step()` on the server side, and `SQLEnvClient` wraps the client side.
| Operation | Client Method | Payload | Response |
|-----------|---------------|---------|----------|
| Reset | `client.reset()` | `{}` | `SQLObservation` (JSON) |
| Step | `client.step(action)` | `{action_type, action_description, tokens: list, metadata}` | `StepResult(obs, reward, done, info)` |
| State | `client.state()` | `{}` | `SQLState` (JSON) |
### Ollama (Optional)
| Endpoint | Purpose | Payload |
|----------|---------|---------|
| `POST /api/generate` | Table selection | `{model, prompt, stream: false}` |
| `POST /api/generate` | SQL generation | `{model, prompt, stream: false}` |
Timeout: 30s. Failure mode: graceful fallback (never crashes).
---
## Cross-Cutting Concerns
### Code Style & Abstraction Philosophy
OOP for framework integration (Environment, EnvClient subclasses), plain methods for logic. Extract helpers when they clarify intent, not for DRY.
- **Structure:** Flat package root with `server/` for server-only code
- **Error handling:** Graceful fallbacks (never crash), `ValueError` for invalid inputs
- **Imports:** `try: from sql_env.X / except: from X` for dual install/Docker compatibility
### Tokenization
Two paths, same interface (`apply_chat_template`):
| Mode | Tokenizer | Source | When |
|------|-----------|--------|------|
| Dev/Test | `MockTokenizer` | `server/test_sql_env.py` | No GPU, no downloads |
| Production | HuggingFace | `transformers` library | Real RL training |
`MockTokenizer` encodes as `ord(c)` per character, decodes as `chr(t)`. Deterministic and fast.
### Configuration
| Variable | Required | Description | Default |
|----------|----------|-------------|---------|
| `OLLAMA_MODEL` | No | Ollama model name for SQL generation | `qwen2` |
| `OLLAMA_BASE_URL` | No | Ollama API endpoint | `http://localhost:11434` |
---
## Data, State, and Storage Locations
- **Repo (committed):**
- `data/questions/student_assessment.json` β 53 Spider Q&A pairs
- `data/databases/models.py` β 9 SQLAlchemy ORM table definitions
- **Runtime state (in-memory, per episode):**
- `SQLState.history_messages` β conversation messages
- `SQLState.history_tokens` β tensor per turn
- **Not yet implemented:**
- SQLite database files (Phase 3 β queries currently go through Ollama, not executed locally)
- Reward/verification state
---
## Invariants and Guardrails
- `self.db_models` refers to **database table** models (SQLAlchemy), never RL models
- Token tensors grow monotonically across turns (never shrink or reset mid-episode)
- `message_to_action()` mutates state β it appends to history before tokenizing
- Ollama failures never crash the environment β always graceful fallback
- `tests/test_smoke.py` must pass without Ollama, without GPU, without network
- Schema column names in `_build_schema_description()` must match `data/databases/models.py`
---
## Glossary
| Term | Definition |
|------|------------|
| Episode | One question-answering session: reset -> N steps -> terminal |
| Action type | One of: describe, sample, query, answer |
| MockTokenizer | Deterministic char-code tokenizer for dev/test (no GPU) |
| Spider | Academic text-to-SQL benchmark dataset |
| ORM models | SQLAlchemy class definitions in `data/databases/models.py` |
| OpenEnv | Meta's RL environment framework (Environment, EnvClient, Action, Observation) |
---
## Infrastructure
### Development
**Prerequisites:**
- Python 3.11-3.12 (torch incompatible with 3.13)
- `uv` package manager
- Ollama (optional)
**Setup:**
```bash
git clone <repo-url> && cd sql-env
uv sync
uv run pytest tests/ -v # 21 tests, ~3.5s, no external deps
```
### Production
**Deployment:** Docker container via OpenEnv CLI (`openenv build` / `openenv push`)
**Runtime:** FastAPI on port 8000 (defined in `openenv.yaml`)
**Status:** Dockerfile is a scaffold stub β not yet validated
---
## Suggested Feature Breakdown
| ID | Feature | Complexity | Dependencies | Notes |
|----|---------|------------|--------------|-------|
| F001 | SQL query execution | standard | - | Execute queries against real SQLite, return results |
| F002 | Reward computation | standard | F001 | 3-layer reward: operational, progress, terminal |
| F003 | Answer verification | standard | F001 | Compare agent answer to gold SQL results |
| F004 | Docker validation | simple | - | Update Dockerfile, test `openenv build` |
| F005 | Multi-database support | complex | F001 | Load any Spider database, not just student_assessment |
### Suggested Implementation Order
1. **F001** β Foundation: wire up SQLite execution so queries return real data
2. **F002 + F003** β Can be done in parallel once F001 is complete
3. **F004** β Independent, can be done anytime
4. **F005** β After the single-database path is solid
---
## Future Considerations
- **Real SQLite execution:** Queries currently go to Ollama for SQL generation but aren't executed against a database. Phase 3 should execute the generated SQL and return actual results.
- **Multi-episode batching:** For RL training, the environment will need to support multiple concurrent episodes efficiently.
- **Reward shaping:** The 3-layer reward (operational, progress, terminal) is designed in `models.py` but not implemented.
- **Table selection without Ollama:** A lightweight keyword/embedding-based table selector could replace the LLM fallback.
---
## Keeping This Map Current
Update this file when you change any of:
- System boundaries (new service, new subsystem)
- Persistent state locations (new files/dirs written or read)
- Shared data models or API contracts
- Cross-cutting invariants
---
## References
- Docs index: `docs/README.md`
- Operations: `docs/RUNBOOK.md`
- OpenEnv framework: https://github.com/meta-pytorch/OpenEnv
- Spider dataset: https://huggingface.co/datasets/xlangai/spider
|