---
title: SQLEnv
emoji: 🤖
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
base_path: /web
---
# SQLEnv: Teaching Agents to Explore Databases
SQLEnv is an interactive RL environment for text-to-SQL reasoning. Instead of producing one-shot SQL, agents learn to work like data analysts: inspect the schema, sample rows, run exploratory queries, and submit a final answer with confidence.
Built for the OpenEnv Challenge, this project packages the environment runtime, dense rewards, evaluation, and training hooks so others can reproduce results and iterate quickly.
## Quick Start

Run these three commands to install, validate, and smoke-test the environment:

```bash
uv sync
uv run openenv validate --verbose
uv run pytest tests/ -v
```
Run the server locally:

```bash
uv run uvicorn server.app:app --reload --host 0.0.0.0 --port 8000
```
Or run it in Docker:

```bash
docker build -t sql-env:latest -f server/Dockerfile .
docker run -p 8000:8000 sql-env:latest
```
## Why SQLEnv
Static text-to-SQL benchmarks reward final outputs, not reasoning quality. SQLEnv turns SQL generation into an interactive decision process with feedback at each step, making it suitable for RL training and behavior analysis.
## Architecture

```text
+-------------+      WebSocket       +----------------------+     SQLite
|  RL Agent   | <------------------> |     SQLEnvClient     | <----------------+
| (GRPO/TRL)  |                      |     (client.py)      |                  |
+-------------+                      +----------+-----------+                  |
                                      HTTP/WebSocket                           |
                                                |                              |
                                                v                              |
                                 +--------------------------+                  |
                                 |      FastAPI Server      |                  |
                                 |     (server.app:app)     |                  |
                                 +------------+-------------+                  |
                                              |                                |
                                              v                                |
                                 +--------------------------+                  |
                                 |      SQLEnvironment      |------------------+
                                 | step/reset/reward/verify |
                                 +--------------------------+
```
## How It Works
Each episode begins with a natural language question mapped to a hidden Spider database. The agent acts through four environment actions:
| Action | Purpose | Typical Output |
|---|---|---|
| `DESCRIBE table_name` | Inspect schema and column metadata | Column names, types, row count |
| `SAMPLE table_name` | Inspect representative rows | Small row sample |
| `QUERY sql_string` | Execute read-only SQL in a sandbox | Query result rows or SQL error |
| `ANSWER value` | Submit final answer | Terminal reward and completion |
Episode flow:

- `reset()` returns the question context and available tables.
- `step()` executes one exploration action at a time.
- `ANSWER` ends the episode with a correctness-based terminal reward.
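The exploration actions map naturally onto read-only SQLite operations. The sketch below is a minimal, self-contained illustration against an in-memory database; the real environment (`server/sql_environment.py`) works on hidden Spider databases and adds sandboxing and reward computation, and all names here are illustrative:

```python
import sqlite3

# Toy in-memory database standing in for a hidden Spider database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (name TEXT, population INTEGER)")
conn.executemany("INSERT INTO city VALUES (?, ?)",
                 [("Oslo", 709000), ("Bergen", 286000)])

def describe(table: str) -> list[tuple]:
    # DESCRIBE: column names and declared types from SQLite table metadata.
    return [(r[1], r[2]) for r in conn.execute(f"PRAGMA table_info({table})")]

def sample(table: str, n: int = 3) -> list[tuple]:
    # SAMPLE: a few representative rows.
    return conn.execute(f"SELECT * FROM {table} LIMIT {n}").fetchall()

def query(sql: str):
    # QUERY: read-only execution; SQL errors come back as feedback, not crashes.
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.Error as e:
        return f"SQL error: {e}"

print(describe("city"))                           # [('name', 'TEXT'), ('population', 'INTEGER')]
print(query("SELECT MAX(population) FROM city"))  # [(709000,)]
```

Returning SQL errors as observations rather than raising lets the agent recover from a bad query within the same episode.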
## Train an Agent

Use the GRPO training pipeline artifacts from F006 and run the notebook workflow:

- Notebook: `notebooks/train_grpo.ipynb`
- Training support modules: `training/`
- Evaluation utilities: `evaluation/`
This setup is designed for Colab and local CPU/GPU environments.
## HuggingFace Space

- Live Space: `https://huggingface.co/spaces/<your-org-or-user>/sql-env` (update after push)
- Health check: `curl https://<space-url>/health`
- Deploy command: `uv run openenv push`
## Project Structure

```text
sql-env/
|- __init__.py
|- client.py
|- models.py
|- openenv.yaml
|- server/
|  |- app.py
|  |- sql_environment.py
|  |- reward.py
|  |- verifier.py
|  `- Dockerfile
|- data/
|  |- databases/
|  `- questions/
|- training/
|- evaluation/
|- notebooks/
|  `- train_grpo.ipynb
|- specs/
|- docs/
`- tests/
```
## Deployment Checklist

- `uv run openenv validate --verbose`
- `uv run openenv build`
- `uv run openenv push`
- Verify `/health` and run one full episode through the client.
## Links

- OpenEnv framework: https://github.com/meta-pytorch/OpenEnv
- OpenEnv docs: https://meta-pytorch.org/OpenEnv/
- Spider dataset: https://huggingface.co/datasets/xlangai/spider
- TRL OpenEnv docs: https://huggingface.co/docs/trl/openenv
- Verification plan: `specs/F007-VERIFICATION_SPEC.md`