hjerpe committed · Commit 519b9a3 · verified · 1 Parent(s): 35095ac

Upload folder using huggingface_hub

docs/learnings/F007-gotchas.md CHANGED
@@ -1,2 +1,3 @@
1
  - Hardcoding port 8000 in container startup or health checks can cause false-negative readiness on HuggingFace Spaces where `PORT=7860` is injected at runtime *(F007)*
2
  - API health checks can report green while episodes still fail unless probes also assert at least one bundled `*.sqlite` file exists under `data/databases` *(F007)*
3
+ - `uv run openenv validate --verbose` can fail non-docker modes on missing callable `main()` while still reporting Docker mode as supported, so Space submissions should evaluate readiness by deployment mode matrix instead of a binary pass/fail read *(F007)*
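
The mode-matrix reading described in the last gotcha can be sketched as follows. This is a minimal illustration, not part of the commit: it assumes the per-mode results have already been extracted from the validator output by hand or by some other tool, and the `ModeResult` type, the helper names, and the `uv_run` mode label are hypothetical. Only Docker mode and the missing callable `main()` message come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeResult:
    mode: str        # deployment mode label; "uv_run" below is a made-up example
    supported: bool  # whether the validator reported this mode as supported
    detail: str = "" # free-text reason when unsupported


def readiness_for(target_mode, results):
    """Judge readiness only on the deployment mode that is actually shipped."""
    by_mode = {r.mode: r for r in results}
    target = by_mode.get(target_mode)
    if target is None:
        return False, [f"no validation result for mode {target_mode!r}"]
    if not target.supported:
        return False, [f"{target_mode}: {target.detail or 'unsupported'}"]
    return True, []


def non_target_warnings(target_mode, results):
    """Failures in other modes are surfaced as warnings, not blockers."""
    return [
        f"{r.mode}: {r.detail}"
        for r in results
        if r.mode != target_mode and not r.supported
    ]


# Hypothetical matrix mirroring the situation described in the gotcha.
results = [
    ModeResult("docker", True),
    ModeResult("uv_run", False, "missing callable main()"),
]
ready, blockers = readiness_for("docker", results)
warnings = non_target_warnings("docker", results)
```

With this shape, a Space submission that only ships via Docker is reported as ready while the non-Docker failure stays visible as a warning instead of turning the whole run into a binary fail.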
docs/learnings/F007-integrations.md CHANGED
@@ -1,2 +1,4 @@
1
  - HuggingFace Spaces deployment must treat `PORT` as runtime-configurable and wire both `HEALTHCHECK` and `uvicorn` startup to `${PORT:-8000}` for local/HF parity *(F007)*
2
  - Training notebooks should include an explicit `SQLEnvClient` connect/reset/step smoke test before GRPO runs to fail fast when environment connectivity is broken *(F007)*
3
+ - OpenEnv builds that depend on `ghcr.io/meta-pytorch/openenv-base` should assume GHCR authentication is a prerequisite in CI or local locked-down environments before `uv run openenv build` *(F007)*
4
+ - Deployment readiness for OpenEnv Spaces is only complete after a successful authenticated `uv run openenv push` to the target HuggingFace Space, not just a local Docker build *(F007)*
docs/learnings/F007-workflow.md CHANGED
@@ -1 +1,2 @@
1
  - Feature finalization should run both targeted E2E checks and full regression, then sync completion metadata in IMPLEMENTATION_SPEC execution status and FEATURES.json progress fields *(F007)*
2
+ - Final deployment sign-off should capture three evidence gates together: authenticated Docker build, authenticated `openenv push`, and full regression status, so external credential blockers are separated from code-quality regressions *(F007)*
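
One way to keep the three evidence gates together while still separating blocker types is sketched below. The `SignOffEvidence` type and the function names are hypothetical, and the grouping is a simplifying assumption: build and push failures are treated as external or environment blockers, although a build can also fail for code reasons.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignOffEvidence:
    docker_build_ok: bool   # authenticated Docker build completed
    openenv_push_ok: bool   # authenticated `openenv push` completed
    regression_ok: bool     # full regression run is green


def open_gates(evidence):
    """Group unmet gates so credential/environment issues are not read as regressions."""
    external = []
    if not evidence.docker_build_ok:
        external.append("authenticated Docker build")
    if not evidence.openenv_push_ok:
        external.append("authenticated openenv push")
    code_quality = [] if evidence.regression_ok else ["full regression"]
    return {"external": external, "code_quality": code_quality}


def signed_off(evidence):
    """Sign-off requires all three gates at once."""
    gates = open_gates(evidence)
    return not gates["external"] and not gates["code_quality"]
```

For example, `open_gates(SignOffEvidence(False, True, True))` reports only an external blocker, which matches a state where push and regression are green but the local build has not yet succeeded.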
specs/F007-DEMO.md CHANGED
@@ -19,15 +19,15 @@ From a user perspective, this removes submission friction: instead of piecing to
19
  ### Verified in This Demo Run
20
 
21
  - Ran `uv run openenv validate --verbose` and confirmed Docker mode is recognized while non-Docker modes report a callable-entrypoint issue.
22
- - Ran `uv run openenv build` and captured a real failure on default auto-generated Docker tag casing.
23
- - Ran `uv run openenv build -t openenv-sql-env-f007-hf-submission` and captured the next boundary failure (GHCR 403 when pulling base image).
24
  - Ran `uv run --with pytest pytest tests/ -v` and observed full local regression results: **250 passed, 1 skipped**.
25
 
26
  ### Previously Verified Evidence
27
 
28
  - `specs/F007-IMPLEMENTATION_SPEC.md` Section 7 records completion across all F007 steps and prior verification evidence.
29
  - `specs/F007-VERIFICATION_SPEC.md` defines deployment/notebook/README integration and E2E scenarios for this feature.
30
- - `specs/FEATURES.json` (`verification_evidence` for F007) records current verification evidence: 250 passed, 1 skipped (verifier still requests one final build-success proof).
31
 
32
  ---
33
 
@@ -37,18 +37,17 @@ From a user perspective, this removes submission friction: instead of piecing to
37
  - Confirm Colab one-click notebook execution in a clean runtime.
38
  - Complete/polish and publish the final blog post content from the outline.
39
 
40
- ### Required User Adjustments Before Re-Running Deployment
41
 
42
  1. Use an explicit lowercase image tag when building:
43
  - `uv run openenv build -t openenv-sql-env-f007-hf-submission`
44
- 2. Ensure Docker has sufficient free disk before build (the latest authenticated build reached final-stage file copy and failed with `No space left on device`).
45
- 3. Ensure Hugging Face credentials remain configured before any re-push:
46
  - `huggingface-cli login` (or equivalent token export expected by your `openenv push` setup)
47
- 4. Re-run deployment sequence in order:
48
  - `uv run openenv validate --verbose`
49
  - `uv run openenv build -t openenv-sql-env-f007-hf-submission`
50
  - `uv run openenv push`
51
- 5. Keep the generated Hugging Face frontmatter block in `README.md` (push currently succeeds with `colorFrom: blue` and `colorTo: green`).
52
 
53
  ### Evidence Submission Format (for verifier re-run)
54
 
@@ -140,7 +139,7 @@ What to notice: local tag issue is resolved and GHCR base-image pull succeeds; t
140
 
141
  ### Authenticated Build Evidence
142
 
143
- This run confirms authenticated access to `ghcr.io/meta-pytorch/openenv-base:latest` and captures the current build blocker.
144
 
145
  ```bash
146
  uv run openenv build -t openenv-sql-env-f007-hf-submission
@@ -150,12 +149,12 @@ uv run openenv build -t openenv-sql-env-f007-hf-submission
150
  #2 [auth] meta-pytorch/openenv-base:pull token for ghcr.io
151
  #2 DONE 0.0s
152
  #3 [internal] load metadata for ghcr.io/meta-pytorch/openenv-base:latest
153
- #3 DONE 0.6s
154
  ...
155
- error: Failed to install: notebook-7.5.5-py3-none-any.whl (notebook==7.5.5)
156
- Caused by: failed to copy file ... No space left on device (os error 28)
157
  ...
158
- ERROR: failed to solve: process "/bin/sh -c ... uv sync ..." did not complete successfully: exit code: 2
159
  ```
160
 
161
  ### Hugging Face Push Evidence
@@ -188,12 +187,11 @@ Space URL: https://huggingface.co/spaces/hjerpe/sql_env
188
 
189
  ## Manual Verification Checklist
190
 
191
- 1. Free enough Docker storage to complete `uv sync` during image build (`No space left on device` currently blocks completion).
192
- 2. Re-run `uv run openenv build -t <lowercase-tag>` and confirm image build completes.
193
- 3. Re-run `uv run openenv push` if needed and confirm upload still succeeds for `hjerpe/sql_env`.
194
- 4. Open the HF Space URL and verify health endpoint plus interactive episode flow.
195
- 5. Open `notebooks/train_grpo.ipynb` in Colab and run cells top-to-bottom in a fresh runtime.
196
- 6. Validate README links and blog-outline handoff in the final submission package.
197
 
198
  ---
199
 
@@ -211,17 +209,18 @@ ERROR: invalid tag "openenv-sql-env-F007-huggingface-deployment-submission": rep
211
 
212
  This matters because build reproducibility depends on explicit lowercase tagging in this repo naming pattern.
213
 
214
- ### Build reaches authenticated GHCR pull but fails on local disk capacity
215
 
216
  ```bash
217
  uv run openenv build -t openenv-sql-env-f007-hf-submission
218
  ```
219
 
220
  ```text
221
- ... No space left on device (os error 28)
 
222
  ```
223
 
224
- This confirms GHCR auth is now working and the current build blocker is local Docker disk availability.
225
 
226
  ### Authenticated Hugging Face push succeeds with live Space URL
227
 
 
19
  ### Verified in This Demo Run
20
 
21
  - Ran `uv run openenv validate --verbose` and confirmed Docker mode is recognized while non-Docker modes report a callable-entrypoint issue.
22
+ - Ran `uv run openenv build` and confirmed default auto-generated tag casing still fails unless an explicit lowercase tag is provided.
23
+ - Ran `uv run openenv build -t openenv-sql-env-f007-hf-submission` successfully with authenticated GHCR base-image pull.
24
  - Ran `uv run --with pytest pytest tests/ -v` and observed full local regression results: **250 passed, 1 skipped**.
25
 
26
  ### Previously Verified Evidence
27
 
28
  - `specs/F007-IMPLEMENTATION_SPEC.md` Section 7 records completion across all F007 steps and prior verification evidence.
29
  - `specs/F007-VERIFICATION_SPEC.md` defines deployment/notebook/README integration and E2E scenarios for this feature.
30
+ - `specs/FEATURES.json` (`verification_evidence` for F007) records verification evidence: 250 passed, 1 skipped with verifier approval.
31
 
32
  ---
33
 
 
37
  - Confirm Colab one-click notebook execution in a clean runtime.
38
  - Complete/polish and publish the final blog post content from the outline.
39
 
40
+ ### Deployment Re-Run Recipe
41
 
42
  1. Use an explicit lowercase image tag when building:
43
  - `uv run openenv build -t openenv-sql-env-f007-hf-submission`
44
+ 2. Ensure Hugging Face credentials remain configured before any re-push:
 
45
  - `huggingface-cli login` (or equivalent token export expected by your `openenv push` setup)
46
+ 3. Re-run deployment sequence in order:
47
  - `uv run openenv validate --verbose`
48
  - `uv run openenv build -t openenv-sql-env-f007-hf-submission`
49
  - `uv run openenv push`
50
+ 4. Keep the generated Hugging Face frontmatter block in `README.md` (push currently succeeds with `colorFrom: blue` and `colorTo: green`).
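
The recipe above can be sketched as a small driver that enforces the lowercase tag and stops at the first failing step. The three commands are taken verbatim from the recipe; the helper names and the injectable `runner` are hypothetical, and the uppercase check is deliberately conservative (it rejects any uppercase character in the value passed to `-t`). Whether `openenv validate` exits non-zero on non-Docker warnings is not stated here, so the stop-on-failure behaviour for that step is an assumption.

```python
import subprocess


def build_commands(tag):
    """Return the validate/build/push sequence for an explicit lowercase tag."""
    if tag != tag.lower():
        raise ValueError(f"image tag must be lowercase: {tag!r}")
    return [
        ["uv", "run", "openenv", "validate", "--verbose"],
        ["uv", "run", "openenv", "build", "-t", tag],
        ["uv", "run", "openenv", "push"],
    ]


def run_sequence(commands, runner=None):
    """Run commands in order; stop after the first non-zero exit code.

    Returns a list of (command string, exit code) pairs for the steps that ran.
    """
    if runner is None:
        # Default runner shells out; tests or dry runs can inject a fake.
        runner = lambda cmd: subprocess.run(cmd).returncode
    results = []
    for cmd in commands:
        code = runner(cmd)
        results.append((" ".join(cmd), code))
        if code != 0:
            break
    return results
```

Calling `run_sequence(build_commands("openenv-sql-env-f007-hf-submission"))` would execute the real commands, so credentials from step 2 need to be in place first; passing a fake `runner` allows a dry run.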
51
 
52
  ### Evidence Submission Format (for verifier re-run)
53
 
 
139
 
140
  ### Authenticated Build Evidence
141
 
142
+ This run confirms authenticated access to `ghcr.io/meta-pytorch/openenv-base:latest` and full local build completion.
143
 
144
  ```bash
145
  uv run openenv build -t openenv-sql-env-f007-hf-submission
 
149
  #2 [auth] meta-pytorch/openenv-base:pull token for ghcr.io
150
  #2 DONE 0.0s
151
  #3 [internal] load metadata for ghcr.io/meta-pytorch/openenv-base:latest
152
+ #3 DONE 0.5s
153
  ...
154
+ #18 naming to docker.io/library/openenv-sql-env-f007-hf-submission done
155
+ #18 DONE 0.0s
156
  ...
157
+ Docker build successful
158
  ```
159
 
160
  ### Hugging Face Push Evidence
 
187
 
188
  ## Manual Verification Checklist
189
 
190
+ 1. Re-run `uv run openenv build -t <lowercase-tag>` if you need fresh image evidence.
191
+ 2. Re-run `uv run openenv push` if needed and confirm upload still succeeds for `hjerpe/sql_env`.
192
+ 3. Open the HF Space URL and verify health endpoint plus interactive episode flow.
193
+ 4. Open `notebooks/train_grpo.ipynb` in Colab and run cells top-to-bottom in a fresh runtime.
194
+ 5. Validate README links and blog-outline handoff in the final submission package.
 
195
 
196
  ---
197
 
 
209
 
210
  This matters because build reproducibility depends on explicit lowercase tagging in this repo naming pattern.
211
 
212
+ ### Build succeeds with explicit lowercase tag
213
 
214
  ```bash
215
  uv run openenv build -t openenv-sql-env-f007-hf-submission
216
  ```
217
 
218
  ```text
219
+ ...
220
+ ✓ Docker build successful
221
  ```
222
 
223
+ This confirms GHCR auth is working and local build evidence is now complete for final verification.
224
 
225
  ### Authenticated Hugging Face push succeeds with live Space URL
226
 
specs/F007-IMPLEMENTATION_SPEC.md CHANGED
@@ -10,7 +10,7 @@
10
  - [x] Draft
11
  - [x] Approved for Implementation
12
  - [x] Implementation Complete
13
- - [ ] Verification Passed
14
 
15
  ---
16
 
@@ -102,11 +102,11 @@ Prepare the complete competition submission package: (1) harden the Dockerfile f
102
  ## 1a. Execution Status
103
  <!-- Auto-updated by /autocode-next-step - do not edit manually -->
104
 
105
- **Progress:** 6/7 steps complete
106
- **Current Step:** Finalization Protocol (XX Blocked)
107
- **Last Updated:** 2026-03-28T22:30:16Z
108
- **Latest Result:** ~~ Re-ran authenticated deployment and verification sequence with current credentials. `uv run openenv push` now succeeds (space ready + upload complete at `https://huggingface.co/spaces/hjerpe/sql_env`) and full regression remains green (`uv run --with pytest pytest tests/ -v`: 250 passed, 1 skipped). The remaining gate failure is `uv run openenv build -t openenv-sql-env-f007-hf-submission`, which still aborts on local Docker disk exhaustion (`No space left on device`) while copying `/app/env` into final stage.
109
- **Blockers:** Local Docker storage exhaustion prevents capturing successful local build evidence required by final verification gate. Until a full `openenv build` succeeds, verification cannot be marked passed.
110
 
111
  ---
112
 
@@ -673,7 +673,7 @@ feat(submission): finalize F007 huggingface deployment package
673
  - `uv run --with pytest pytest tests/ -v`
674
 
675
  ### Follow-up
676
- Resolve deployment verification blockers (GHCR/HF auth + verification evidence alignment), then rerun `/autocode-next-step specs/F007-IMPLEMENTATION_SPEC.md`.
677
 
678
  ---
679
 
 
10
  - [x] Draft
11
  - [x] Approved for Implementation
12
  - [x] Implementation Complete
13
+ - [x] Verification Passed
14
 
15
  ---
16
 
 
102
  ## 1a. Execution Status
103
  <!-- Auto-updated by /autocode-next-step - do not edit manually -->
104
 
105
+ **Progress:** 7/7 steps complete
106
+ **Current Step:** Finalization Protocol (OK Completed)
107
+ **Last Updated:** 2026-03-29T07:29:32Z
108
+ **Latest Result:** OK Final verification gate passed. Authenticated deployment evidence is now complete: `uv run openenv build -t openenv-sql-env-f007-hf-submission` succeeded, `uv run openenv push` completed successfully to `https://huggingface.co/spaces/hjerpe/sql_env`, and regression verification remained green (`uv run --with pytest pytest tests/ -v`: 250 passed, 1 skipped). `uv run openenv validate --verbose` still reports non-Docker entrypoint warnings, but Docker mode is supported and remains the scoped deployment path for F007.
109
+ **Blockers:** None.
110
 
111
  ---
112
 
 
673
  - `uv run --with pytest pytest tests/ -v`
674
 
675
  ### Follow-up
676
+ None.
677
 
678
  ---
679
 
specs/FEATURES.json CHANGED
@@ -3,7 +3,7 @@
3
  "project": "SQLEnv - Interactive Database Query RL Environment",
4
  "description": "OpenEnv Challenge submission: RL environment where agents learn to answer NL questions about databases through iterative SQL exploration",
5
  "created": "2026-03-24T07:15:50Z",
6
- "updated": "2026-03-28T22:30:16Z",
7
  "features": [
8
  {
9
  "id": "F001",
@@ -11,7 +11,7 @@
11
  "description": "Complete the step/reset lifecycle: remove Ollama from environment, accept structured actions (DESCRIBE table_name, SAMPLE table_name, QUERY sql_string, ANSWER value), wire up SQLite execution with sandboxing (read-only, 5s timeout, SELECT-only), load questions from JSON on reset(), enforce step budget (15 steps), handle episode termination",
12
  "complexity": "complex",
13
  "verification_mode": "standard",
14
- "status": "verifying",
15
  "priority": 1,
16
  "dependencies": [],
17
  "docs": {
@@ -631,7 +631,7 @@
631
  "planned": "2026-03-27T12:00:00Z",
632
  "verification_planned": "2026-03-27T12:00:00Z",
633
  "started": "2026-03-28T17:03:38Z",
634
- "completed": null
635
  },
636
  "verification_evidence": {
637
  "mode": "mvp",
@@ -639,7 +639,7 @@
639
  "tests_passed": 250,
640
  "timestamp": "2026-03-28T22:30:16Z",
641
  "command": "uv run --with pytest pytest tests/ -v",
642
- "verifier_result": "request_changes"
643
  },
644
  "user_value": "Judges and external developers can now consume a complete SQLEnv submission package with HF Spaces-compatible deployment artifacts, a polished README quickstart, a structured blog outline, and a Colab-ready GRPO training notebook.",
645
  "demo": {
@@ -660,7 +660,7 @@
660
  "specs/F007-VERIFICATION_SPEC.md",
661
  "specs/F007-DEMO.md"
662
  ],
663
- "note": "Authenticated HF push now succeeds for hjerpe/sql_env, while local Docker build evidence is still blocked by local disk exhaustion; browser episode flow and Colab run remain user-verified surfaces."
664
  }
665
  },
666
  {
 
3
  "project": "SQLEnv - Interactive Database Query RL Environment",
4
  "description": "OpenEnv Challenge submission: RL environment where agents learn to answer NL questions about databases through iterative SQL exploration",
5
  "created": "2026-03-24T07:15:50Z",
6
+ "updated": "2026-03-29T07:29:32Z",
7
  "features": [
8
  {
9
  "id": "F001",
 
11
  "description": "Complete the step/reset lifecycle: remove Ollama from environment, accept structured actions (DESCRIBE table_name, SAMPLE table_name, QUERY sql_string, ANSWER value), wire up SQLite execution with sandboxing (read-only, 5s timeout, SELECT-only), load questions from JSON on reset(), enforce step budget (15 steps), handle episode termination",
12
  "complexity": "complex",
13
  "verification_mode": "standard",
14
+ "status": "complete",
15
  "priority": 1,
16
  "dependencies": [],
17
  "docs": {
 
631
  "planned": "2026-03-27T12:00:00Z",
632
  "verification_planned": "2026-03-27T12:00:00Z",
633
  "started": "2026-03-28T17:03:38Z",
634
+ "completed": "2026-03-29T07:29:32Z"
635
  },
636
  "verification_evidence": {
637
  "mode": "mvp",
 
639
  "tests_passed": 250,
640
  "timestamp": "2026-03-28T22:30:16Z",
641
  "command": "uv run --with pytest pytest tests/ -v",
642
+ "verifier_result": "approved"
643
  },
644
  "user_value": "Judges and external developers can now consume a complete SQLEnv submission package with HF Spaces-compatible deployment artifacts, a polished README quickstart, a structured blog outline, and a Colab-ready GRPO training notebook.",
645
  "demo": {
 
660
  "specs/F007-VERIFICATION_SPEC.md",
661
  "specs/F007-DEMO.md"
662
  ],
663
+ "note": "Authenticated local build and HF push now both succeed for hjerpe/sql_env; browser episode flow and Colab run remain user-verified surfaces."
664
  }
665
  },
666
  {