# Architecture

> Last updated: 2026-02-28

System map for SQLEnv — an RL environment where agents learn interactive SQL exploration via the OpenEnv framework.

**Goals:**
- Show how components connect (system map + key flows)
- Make hidden state explicit (what lives where)
- Define shared interfaces (Pydantic models, WebSocket API)
- Keep invariants legible (what must stay true)

**Non-goals:**
- CLI reference (see `docs/RUNBOOK.md`)
- Per-feature implementation details (link to specs)

---

## System Map

```text
                         SQLEnv System
  ================================================================

  RL Training Loop                          SQLEnv Server (Docker)
  ----------------                          ----------------------
                                           +---------------------+
  +------------+     WebSocket (JSON)      | server/app.py       |
  | SQLEnv     |<=========================>| FastAPI + WS        |
  | Client     |  SQLAction  -> server     |                     |
  | (client.py)|  SQLObs    <- server      +----------+----------+
  +-----+------+                                      |
        |                                             v
        | tensor <-> list                  +---------------------+
        | serialization                    | SQLEnvironment      |
        |                                  | (sql_environment.py)|
  +-----v------+                           |                     |
  | RL Agent   |                           | - reset() / step()  |
  | (external) |                           | - action detection  |
  | e.g. GRPO  |                           | - message_to_action |
  +------------+                           +--+-------+-------+--+
                                              |       |       |
                                              v       v       v
                                         +------+ +------+ +--------+
                                         |Schema| |Sample| | Query  |
                                         |Intro-| |Gen   | | (Ollama|
                                         |spect.| |      | |  LLM)  |
                                         +--+---+ +--+---+ +---+----+
                                            |        |          |
                                            v        v          v
                                         +-------------------------+
                                         | SQLAlchemy ORM Models   |
                                         | (data/databases/        |
                                         |  models.py)             |
                                         | 9 tables:               |
                                         | Address, Person,        |
                                         | Student, Course, ...    |
                                         +-------------------------+

  Data (committed)                         External (optional)
  ----------------                         -------------------
  data/questions/                          +----------+
    student_assessment.json                | Ollama   |
    (53 Spider Q&A pairs)                  | LLM API  |
                                           | :11434   |
                                           +----------+
```

---

## Component Inventory

| Component | Owns | Entrypoint | State / Output |
|-----------|------|------------|----------------|
| **SQLEnvClient** | WebSocket transport, tensor serialization | `client.py` | Stateless (wraps server) |
| **FastAPI app** | HTTP/WS endpoints, tokenizer factory | `server/app.py` | In-memory tokenizer |
| **SQLEnvironment** | Episode lifecycle, action dispatch, state | `server/sql_environment.py` | `SQLState` (in-memory) |
| **Pydantic models** | Type contracts (action, observation, state) | `models.py` | N/A (data classes) |
| **ORM models** | Database schema definition | `data/databases/models.py` | SQLAlchemy metadata |
| **Spider data** | Question-answer pairs | `data/questions/student_assessment.json` | 53 Q&A entries |
| **MockTokenizer** | Dev/test tokenization (no GPU needed) | `server/test_sql_env.py` | Deterministic (ord/chr) |

### External Services

| Service | Purpose | Required | Fallback |
|---------|---------|----------|----------|
| Ollama (`localhost:11434`) | Table selection + SQL generation | No | First table in dict; query returns error string |

---

## Key Flows

### Flow: Episode (Reset + Multi-Turn Steps)

```text
Client                    Server (SQLEnvironment)              Ollama
  |                              |                               |
  |--- reset() ----------------->|                               |
  |                              |-- init state, system prompt   |
  |                              |-- tokenize system message     |
  |<-- SQLObservation -----------|   (MockTokenizer or HF)       |
  |    .messages=[system]        |                               |
  |    .tokens=shape([N])        |                               |
  |                              |                               |
  |--- message_to_action(msg) -->|                               |
  |                              |-- detect action type          |
  |                              |   (keyword matching)          |
  |                              |-- append msg to history       |
  |                              |-- tokenize full conversation  |
  |<-- SQLAction ----------------|                               |
  |    .action_type="describe"   |                               |
  |    .tokens=shape([1,M])      |                               |
  |                              |                               |
  |--- step(action) ------------>|                               |
  |                              |-- select table -------------->|
  |                              |<-- table name (or fallback) --|
  |                              |-- introspect ORM schema       |
  |                              |-- append assistant msg        |
  |                              |-- append action tokens        |
  |<-- SQLObservation -----------|                               |
  |    .messages=[sys,usr,asst]  |                               |
  |    .tokens=shape([N+M+K])    |                               |
  |                              |                               |
  (repeat step() for sample, query, answer...)
```
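The loop above can be driven from the client side with a small skeleton. Everything below is a hedged sketch: `client`, `agent`, and the method names (`reset`, `message_to_action`, `step`) follow the diagram and the API contracts table, not a confirmed client API.

```python
# Hypothetical multi-turn episode driver, following the diagram:
# reset -> (message_to_action -> step) x N -> terminal.
def run_episode(client, agent, max_turns: int = 10) -> float:
    obs = client.reset()                        # SQLObservation with [system]
    for _ in range(max_turns):
        msg = agent.respond(obs.messages)       # agent emits a user message
        action = client.message_to_action(msg)  # server detects the action type
        result = client.step(action)            # StepResult(obs, reward, done, info)
        obs = result.observation
        if result.done:
            return result.reward
    return 0.0                                  # episode truncated without a terminal
```

The driver only touches fields named in this document (`messages`, `observation`, `done`, `reward`), so it should survive internal changes to tokenization or action detection.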

### Flow: Action Detection

```text
User message string
        |
        v
  _detect_action_type(content)
        |
        +-- contains "describe"/"schema"/"columns"?  --> "describe"
        |
        +-- contains "sample"/"example"/"rows"?      --> "sample"
        |
        +-- default                                  --> "query"
```
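The decision tree above amounts to ordered keyword matching. A minimal sketch (the real logic is `_detect_action_type` in `server/sql_environment.py`; the keyword sets are taken from the diagram):

```python
def detect_action_type(content: str) -> str:
    """Ordered keyword dispatch: 'describe' wins over 'sample'; default is 'query'."""
    text = content.lower()
    if any(kw in text for kw in ("describe", "schema", "columns")):
        return "describe"
    if any(kw in text for kw in ("sample", "example", "rows")):
        return "sample"
    return "query"
```

Ordering matters: a message containing both "schema" and "rows" resolves to `"describe"` because that branch is checked first.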

### Flow: Client Serialization (WebSocket Transport)

```text
  Client                                      Server
    |                                           |
    |  _step_payload(action):                   |
    |    tokens: Tensor -> list (JSON-safe)     |
    |    {action_type, action_description,      |
    |     tokens: [[1,2,3,...]], metadata}       |
    |  ---------------------------------------->|
    |                                           |
    |  _parse_result(data):                     |
    |    tokens: list -> Tensor                 |
    |    StepResult(obs, reward, done, info)     |
    |  <----------------------------------------|
```
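The payload shape can be sketched without pulling in torch; assume the real client calls `Tensor.tolist()` on the way out and `torch.tensor(...)` on the way back, as the diagram shows. Field names follow the API contracts table; the function name is illustrative.

```python
import json

def step_payload(action_type: str, description: str, tokens: list[list[int]]) -> dict:
    """JSON-safe step payload; the nested list stands in for tokens.tolist()."""
    return {
        "action_type": action_type,
        "action_description": description,
        "tokens": tokens,   # Tensor -> list on the real client
        "metadata": {},
    }

# Round-trip over the wire: the structure survives JSON intact.
wire = json.dumps(step_payload("query", "count the students", [[72, 101, 121]]))
payload = json.loads(wire)
```

On receipt the server (or `_parse_result` on the way back) can rebuild the tensor from `payload["tokens"]` with no loss, since token ids are plain ints.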

---

## Shared Data Models

These three Pydantic models are used across client, server, and tests.
Defined in `models.py`.

### SQLAction

```python
class SQLAction(Action):
    action_type: str         # "describe" | "sample" | "query" | "answer"
    action_description: str  # raw user message content
    tokens: torch.Tensor     # tokenized conversation context, shape [1, seq_len]
```

**Used by:** SQLEnvironment.step(), SQLEnvClient._step_payload(), tests

### SQLObservation

```python
class SQLObservation(Observation):
    messages: list[Message]  # full conversation history [{role, content}, ...]
    tokens: torch.Tensor     # flattened 1D tensor of all turn tokens concatenated
```

**Used by:** SQLEnvironment.reset()/step(), SQLEnvClient._parse_result(), tests

### SQLState

```python
class SQLState(State):
    episode_id: str                    # UUID per episode
    step_count: int                    # turns taken
    history_messages: list[Message]    # accumulates across turns
    history_tokens: list[torch.Tensor] # one tensor per turn, flattened on output
    current_action_type: str | None    # last detected action type
```

**Used by:** SQLEnvironment (internal), state endpoint
**Note:** This is a lightweight summary for logging. The full RL state lives inside SQLEnvironment and is not exposed to the agent.
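Since `history_tokens` stores one tensor per turn while `SQLObservation.tokens` is a single 1D tensor, flattening happens on output. A dependency-free sketch of that step (lists stand in for 1D tensors; the real code would presumably use `torch.cat`):

```python
def flatten_history(history_tokens: list[list[int]]) -> list[int]:
    """Concatenate per-turn token sequences into one flat sequence."""
    flat: list[int] = []
    for turn in history_tokens:
        flat.extend(turn)   # torch.cat over 1D tensors, modeled as list extension
    return flat
```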

---

## API Contracts

### WebSocket (OpenEnv Protocol)

The server exposes a WebSocket endpoint via FastAPI. The OpenEnv framework handles the protocol — SQLEnv implements `reset()` and `step()` on the server side, and `SQLEnvClient` wraps the client side.

| Operation | Client Method | Payload | Response |
|-----------|---------------|---------|----------|
| Reset | `client.reset()` | `{}` | `SQLObservation` (JSON) |
| Step | `client.step(action)` | `{action_type, action_description, tokens: list, metadata}` | `StepResult(obs, reward, done, info)` |
| State | `client.state()` | `{}` | `SQLState` (JSON) |

### Ollama (Optional)

| Endpoint | Purpose | Payload |
|----------|---------|---------|
| `POST /api/generate` | Table selection | `{model, prompt, stream: false}` |
| `POST /api/generate` | SQL generation | `{model, prompt, stream: false}` |

Timeout: 30s. Failure mode: graceful fallback (never crashes).
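The fallback contract can be sketched with the standard library alone. The endpoint, payload shape, model default, and 30s timeout come from the tables above; the function name and the "first table in dict" fallback follow the External Services table, but this is an illustrative skeleton, not the project's code.

```python
import json
import urllib.request

def select_table(prompt: str, tables: dict,
                 base_url: str = "http://localhost:11434",
                 model: str = "qwen2", timeout: float = 30.0) -> str:
    """Ask Ollama to pick a table; fall back to the first table on any failure."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        f"{base_url}/api/generate", data=body,
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            name = json.loads(resp.read())["response"].strip()
            if name in tables:
                return name
    except Exception:
        pass  # network down, timeout, bad JSON: all fall through
    return next(iter(tables))  # graceful fallback: first table in dict
```

The broad `except` is deliberate here: the invariant is that Ollama failures never crash the environment, so any failure mode degrades to the deterministic fallback.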

---

## Cross-Cutting Concerns

### Code Style & Abstraction Philosophy

OOP for framework integration (Environment, EnvClient subclasses), plain methods for logic. Extract helpers when they clarify intent, not for DRY.

- **Structure:** Flat package root with `server/` for server-only code
- **Error handling:** Graceful fallbacks (never crash), `ValueError` for invalid inputs
- **Imports:** `try: from sql_env.X / except: from X` for dual install/Docker compatibility

### Tokenization

Two paths, same interface (`apply_chat_template`):

| Mode | Tokenizer | Source | When |
|------|-----------|--------|------|
| Dev/Test | `MockTokenizer` | `server/test_sql_env.py` | No GPU, no downloads |
| Production | HuggingFace | `transformers` library | Real RL training |

`MockTokenizer` encodes as `ord(c)` per character, decodes as `chr(t)`. Deterministic and fast.
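The char-code scheme is simple enough to sketch in full. The class shape below, including the `apply_chat_template` signature, is an assumption based on the description here; the real implementation lives in `server/test_sql_env.py`.

```python
class MockTokenizer:
    """Deterministic per-character tokenizer: token id = Unicode code point."""

    def encode(self, text: str) -> list[int]:
        return [ord(c) for c in text]

    def decode(self, tokens: list[int]) -> str:
        return "".join(chr(t) for t in tokens)

    def apply_chat_template(self, messages: list[dict], **kwargs) -> list[int]:
        # Flatten role/content pairs into one string, then encode it.
        flat = "".join(f"{m['role']}: {m['content']}\n" for m in messages)
        return self.encode(flat)
```

Because encode/decode are exact inverses over any text, tests can assert round-trips without a model download or GPU.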

### Configuration

| Variable | Required | Description | Default |
|----------|----------|-------------|---------|
| `OLLAMA_MODEL` | No | Ollama model name for SQL generation | `qwen2` |
| `OLLAMA_BASE_URL` | No | Ollama API endpoint | `http://localhost:11434` |

---

## Data, State, and Storage Locations

- **Repo (committed):**
  - `data/questions/student_assessment.json` — 53 Spider Q&A pairs
  - `data/databases/models.py` — 9 SQLAlchemy ORM table definitions
- **Runtime state (in-memory, per episode):**
  - `SQLState.history_messages` — conversation messages
  - `SQLState.history_tokens` — tensor per turn
- **Not yet implemented:**
  - SQLite database files (Phase 3 — queries currently go through Ollama, not executed locally)
  - Reward/verification state

---

## Invariants and Guardrails

- `self.db_models` refers to **database table** models (SQLAlchemy), never RL models
- Token tensors grow monotonically across turns (never shrink or reset mid-episode)
- `message_to_action()` mutates state — it appends to history before tokenizing
- Ollama failures never crash the environment — always graceful fallback
- `tests/test_smoke.py` must pass without Ollama, without GPU, without network
- Schema column names in `_build_schema_description()` must match `data/databases/models.py`

---

## Glossary

| Term | Definition |
|------|------------|
| Episode | One question-answering session: reset -> N steps -> terminal |
| Action type | One of: describe, sample, query, answer |
| MockTokenizer | Deterministic char-code tokenizer for dev/test (no GPU) |
| Spider | Academic text-to-SQL benchmark dataset |
| ORM models | SQLAlchemy class definitions in `data/databases/models.py` |
| OpenEnv | Meta's RL environment framework (Environment, EnvClient, Action, Observation) |

---

## Infrastructure

### Development

**Prerequisites:**
- Python 3.11-3.12 (torch incompatible with 3.13)
- `uv` package manager
- Ollama (optional)

**Setup:**
```bash
git clone <repo-url> && cd sql-env
uv sync
uv run pytest tests/ -v    # 21 tests, ~3.5s, no external deps
```

### Production

**Deployment:** Docker container via OpenEnv CLI (`openenv build` / `openenv push`)
**Runtime:** FastAPI on port 8000 (defined in `openenv.yaml`)
**Status:** Dockerfile is a scaffold stub — not yet validated

---

## Suggested Feature Breakdown

| ID | Feature | Complexity | Dependencies | Notes |
|----|---------|------------|--------------|-------|
| F001 | SQL query execution | standard | - | Execute queries against real SQLite, return results |
| F002 | Reward computation | standard | F001 | 3-layer reward: operational, progress, terminal |
| F003 | Answer verification | standard | F001 | Compare agent answer to gold SQL results |
| F004 | Docker validation | simple | - | Update Dockerfile, test `openenv build` |
| F005 | Multi-database support | complex | F001 | Load any Spider database, not just student_assessment |

### Suggested Implementation Order

1. **F001** — Foundation: wire up SQLite execution so queries return real data
2. **F002 + F003** — Can be done in parallel once F001 is complete
3. **F004** — Independent, can be done anytime
4. **F005** — After the single-database path is solid

---

## Future Considerations

- **Real SQLite execution:** Queries currently go to Ollama for SQL generation but aren't executed against a database. Phase 3 should execute the generated SQL and return actual results.
- **Multi-episode batching:** For RL training, the environment will need to support multiple concurrent episodes efficiently.
- **Reward shaping:** The 3-layer reward (operational, progress, terminal) is designed in `models.py` but not implemented.
- **Table selection without Ollama:** A lightweight keyword/embedding-based table selector could replace the LLM fallback.

---

## Keeping This Map Current

Update this file when you change any of:
- System boundaries (new service, new subsystem)
- Persistent state locations (new files/dirs written or read)
- Shared data models or API contracts
- Cross-cutting invariants

---

## References

- Docs index: `docs/README.md`
- Operations: `docs/RUNBOOK.md`
- OpenEnv framework: https://github.com/meta-pytorch/OpenEnv
- Spider dataset: https://huggingface.co/datasets/xlangai/spider