Upload folder using huggingface_hub
- docs/learnings/F007-gotchas.md +1 -0
- docs/learnings/F007-integrations.md +2 -0
- docs/learnings/F007-workflow.md +1 -0
- specs/F007-DEMO.md +21 -22
- specs/F007-IMPLEMENTATION_SPEC.md +7 -7
- specs/FEATURES.json +5 -5
docs/learnings/F007-gotchas.md
CHANGED
|
@@ -1,2 +1,3 @@
| 1 |
- Hardcoding port 8000 in container startup or health checks can cause false-negative readiness on HuggingFace Spaces where `PORT=7860` is injected at runtime *(F007)*
|
| 2 |
- API health checks can report green while episodes still fail unless probes also assert at least one bundled `*.sqlite` file exists under `data/databases` *(F007)*
|
| 3 |
+
- `uv run openenv validate --verbose` can fail non-Docker modes because of a missing callable `main()` while still reporting Docker mode as supported, so Space submissions should evaluate readiness per deployment mode rather than as a single pass/fail result *(F007)*
|
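The second gotcha above (a green API health check while no database is bundled) can be sketched as a small readiness helper. This is a minimal illustration, not the project's actual probe: the function names `bundled_databases` and `is_ready` are hypothetical, and the recursive search under `data/databases` is an assumption about how the `*.sqlite` files are laid out.

```python
from pathlib import Path


def bundled_databases(root: Path) -> list[Path]:
    """Return every *.sqlite file found under root (searched recursively)."""
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*.sqlite") if p.is_file())


def is_ready(api_ok: bool, db_root: Path = Path("data/databases")) -> bool:
    """Ready only when the API probe passed AND at least one database exists."""
    return api_ok and len(bundled_databases(db_root)) > 0
```

Combining both conditions in one function keeps a passing HTTP probe from masking a missing data directory.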
docs/learnings/F007-integrations.md
CHANGED
|
@@ -1,2 +1,4 @@
| 1 |
- HuggingFace Spaces deployment must treat `PORT` as runtime-configurable and wire both `HEALTHCHECK` and `uvicorn` startup to `${PORT:-8000}` for local/HF parity *(F007)*
|
| 2 |
- Training notebooks should include an explicit `SQLEnvClient` connect/reset/step smoke test before GRPO runs to fail fast when environment connectivity is broken *(F007)*
|
| 3 |
+
- OpenEnv builds that depend on `ghcr.io/meta-pytorch/openenv-base` should assume GHCR authentication is a prerequisite in CI or local locked-down environments before `uv run openenv build` *(F007)*
|
| 4 |
+
- Deployment readiness for OpenEnv Spaces is only complete after a successful authenticated `uv run openenv push` to the target HuggingFace Space, not just a local Docker build *(F007)*
|
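The `${PORT:-8000}` wiring in the first integration note has a direct Python analogue, which may help when the same default is needed inside a launcher or a health-check script. The sketch below assumes the same fallback of 8000 named in the note; `resolve_port` is a hypothetical helper, not part of the repository.

```python
import os
from collections.abc import Mapping


def resolve_port(env: Mapping[str, str] = os.environ, default: int = 8000) -> int:
    """Mirror the shell expansion ${PORT:-8000}: unset OR empty falls back."""
    raw = env.get("PORT", "")
    if raw == "":
        return default
    port = int(raw)  # raises ValueError on non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"PORT out of range: {port}")
    return port
```

A launcher could then pass the result to the server, for example `uvicorn.run(app, host="0.0.0.0", port=resolve_port())`, so that local runs and Spaces (where `PORT=7860` is injected) share one code path.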
docs/learnings/F007-workflow.md
CHANGED
|
@@ -1 +1,2 @@
| 1 |
- Feature finalization should run both targeted E2E checks and full regression, then sync completion metadata in IMPLEMENTATION_SPEC execution status and FEATURES.json progress fields *(F007)*
|
| 2 |
+
- Final deployment sign-off should capture three evidence gates together: authenticated Docker build, authenticated `openenv push`, and full regression status, so external credential blockers are separated from code-quality regressions *(F007)*
|
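The three evidence gates in the sign-off note can be modelled as a small record that keeps credential blockers distinct from real failures. This is only a sketch under assumed names: `GateStatus`, `SignOff`, and the verdict strings are hypothetical and do not come from the repository.

```python
from dataclasses import dataclass
from enum import Enum


class GateStatus(Enum):
    PASSED = "passed"
    FAILED = "failed"                      # the gate ran and failed
    BLOCKED_ON_CREDENTIALS = "blocked"     # the gate could not run (auth missing)


@dataclass(frozen=True)
class SignOff:
    docker_build: GateStatus
    openenv_push: GateStatus
    regression: GateStatus

    def verdict(self) -> str:
        statuses = (self.docker_build, self.openenv_push, self.regression)
        if GateStatus.FAILED in statuses:
            return "failed"
        if GateStatus.BLOCKED_ON_CREDENTIALS in statuses:
            return "blocked"
        return "ready"
```

Reporting `blocked` separately from `failed` is the point of the note: a missing GHCR or Hugging Face token should not be read as a code-quality regression.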
specs/F007-DEMO.md
CHANGED
|
@@ -19,15 +19,15 @@ From a user perspective, this removes submission friction: instead of piecing to
|
|
| 19 |
### Verified in This Demo Run
|
| 20 |
|
| 21 |
- Ran `uv run openenv validate --verbose` and confirmed Docker mode is recognized while non-Docker modes report a callable-entrypoint issue.
|
| 22 |
-
- Ran `uv run openenv build` and
|
| 23 |
-
- Ran `uv run openenv build -t openenv-sql-env-f007-hf-submission`
|
| 24 |
- Ran `uv run --with pytest pytest tests/ -v` and observed full local regression results: **250 passed, 1 skipped**.
|
| 25 |
|
| 26 |
### Previously Verified Evidence
|
| 27 |
|
| 28 |
- `specs/F007-IMPLEMENTATION_SPEC.md` Section 7 records completion across all F007 steps and prior verification evidence.
|
| 29 |
- `specs/F007-VERIFICATION_SPEC.md` defines deployment/notebook/README integration and E2E scenarios for this feature.
|
| 30 |
-
- `specs/FEATURES.json` (`verification_evidence` for F007) records
|
| 31 |
|
| 32 |
---
|
| 33 |
|
|
@@ -37,18 +37,17 @@ From a user perspective, this removes submission friction: instead of piecing to
|
|
| 37 |
- Confirm Colab one-click notebook execution in a clean runtime.
|
| 38 |
- Complete/polish and publish the final blog post content from the outline.
|
| 39 |
|
| 40 |
-
###
|
| 41 |
|
| 42 |
1. Use an explicit lowercase image tag when building:
|
| 43 |
- `uv run openenv build -t openenv-sql-env-f007-hf-submission`
|
| 44 |
-
2. Ensure
|
| 45 |
-
3. Ensure Hugging Face credentials remain configured before any re-push:
|
| 46 |
- `huggingface-cli login` (or equivalent token export expected by your `openenv push` setup)
|
| 47 |
-
|
| 48 |
- `uv run openenv validate --verbose`
|
| 49 |
- `uv run openenv build -t openenv-sql-env-f007-hf-submission`
|
| 50 |
- `uv run openenv push`
|
| 51 |
-
|
| 52 |
|
| 53 |
### Evidence Submission Format (for verifier re-run)
|
| 54 |
|
|
@@ -140,7 +139,7 @@ What to notice: local tag issue is resolved and GHCR base-image pull succeeds; t
|
|
| 140 |
|
| 141 |
### Authenticated Build Evidence
|
| 142 |
|
| 143 |
-
This run confirms authenticated access to `ghcr.io/meta-pytorch/openenv-base:latest` and
|
| 144 |
|
| 145 |
```bash
|
| 146 |
uv run openenv build -t openenv-sql-env-f007-hf-submission
|
|
@@ -150,12 +149,12 @@ uv run openenv build -t openenv-sql-env-f007-hf-submission
|
|
| 150 |
#2 [auth] meta-pytorch/openenv-base:pull token for ghcr.io
|
| 151 |
#2 DONE 0.0s
|
| 152 |
#3 [internal] load metadata for ghcr.io/meta-pytorch/openenv-base:latest
|
| 153 |
-
#3 DONE 0.
|
| 154 |
...
|
| 155 |
-
|
| 156 |
-
|
| 157 |
...
|
| 158 |
-
|
| 159 |
```
|
| 160 |
|
| 161 |
### Hugging Face Push Evidence
|
|
@@ -188,12 +187,11 @@ Space URL: https://huggingface.co/spaces/hjerpe/sql_env
|
|
| 188 |
|
| 189 |
## Manual Verification Checklist
|
| 190 |
|
| 191 |
-
1.
|
| 192 |
-
2. Re-run `uv run openenv
|
| 193 |
-
3.
|
| 194 |
-
4. Open
|
| 195 |
-
5.
|
| 196 |
-
6. Validate README links and blog-outline handoff in the final submission package.
|
| 197 |
|
| 198 |
---
|
| 199 |
|
|
@@ -211,17 +209,18 @@ ERROR: invalid tag "openenv-sql-env-F007-huggingface-deployment-submission": rep
|
|
| 211 |
|
| 212 |
This matters because build reproducibility depends on explicit lowercase tagging in this repo naming pattern.
|
| 213 |
|
| 214 |
-
### Build
|
| 215 |
|
| 216 |
```bash
|
| 217 |
uv run openenv build -t openenv-sql-env-f007-hf-submission
|
| 218 |
```
|
| 219 |
|
| 220 |
```text
|
| 221 |
-
...
|
|
|
|
| 222 |
```
|
| 223 |
|
| 224 |
-
This confirms GHCR auth is
|
| 225 |
|
| 226 |
### Authenticated Hugging Face push succeeds with live Space URL
|
| 227 |
|
|
|
|
| 19 |
### Verified in This Demo Run
|
| 20 |
|
| 21 |
- Ran `uv run openenv validate --verbose` and confirmed Docker mode is recognized while non-Docker modes report a callable-entrypoint issue.
|
| 22 |
+
- Ran `uv run openenv build` and confirmed default auto-generated tag casing still fails unless an explicit lowercase tag is provided.
|
| 23 |
+
- Ran `uv run openenv build -t openenv-sql-env-f007-hf-submission` successfully with authenticated GHCR base-image pull.
|
| 24 |
- Ran `uv run --with pytest pytest tests/ -v` and observed full local regression results: **250 passed, 1 skipped**.
|
| 25 |
|
| 26 |
### Previously Verified Evidence
|
| 27 |
|
| 28 |
- `specs/F007-IMPLEMENTATION_SPEC.md` Section 7 records completion across all F007 steps and prior verification evidence.
|
| 29 |
- `specs/F007-VERIFICATION_SPEC.md` defines deployment/notebook/README integration and E2E scenarios for this feature.
|
| 30 |
+
- `specs/FEATURES.json` (`verification_evidence` for F007) records verification evidence: 250 passed, 1 skipped with verifier approval.
|
| 31 |
|
| 32 |
---
|
| 33 |
|
|
|
|
| 37 |
- Confirm Colab one-click notebook execution in a clean runtime.
|
| 38 |
- Complete/polish and publish the final blog post content from the outline.
|
| 39 |
|
| 40 |
+
### Deployment Re-Run Recipe
|
| 41 |
|
| 42 |
1. Use an explicit lowercase image tag when building:
|
| 43 |
- `uv run openenv build -t openenv-sql-env-f007-hf-submission`
|
| 44 |
+
2. Ensure Hugging Face credentials remain configured before any re-push:
|
|
|
|
| 45 |
- `huggingface-cli login` (or equivalent token export expected by your `openenv push` setup)
|
| 46 |
+
3. Re-run deployment sequence in order:
|
| 47 |
- `uv run openenv validate --verbose`
|
| 48 |
- `uv run openenv build -t openenv-sql-env-f007-hf-submission`
|
| 49 |
- `uv run openenv push`
|
| 50 |
+
4. Keep the generated Hugging Face frontmatter block in `README.md` (push currently succeeds with `colorFrom: blue` and `colorTo: green`).
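The ordered re-run in step 3 can be scripted so that a later command never runs after an earlier one fails. The sketch below uses only the three commands listed in the recipe; `run_in_order` and the injectable `runner` are hypothetical, and it assumes a non-zero exit code means failure. The source notes that `validate` still reports non-Docker entrypoint warnings, and whether that produces a non-zero exit is not confirmed here.

```python
import subprocess
from collections.abc import Callable, Sequence

# Commands copied from the recipe above, in the order it prescribes.
DEPLOY_STEPS: tuple[tuple[str, ...], ...] = (
    ("uv", "run", "openenv", "validate", "--verbose"),
    ("uv", "run", "openenv", "build", "-t", "openenv-sql-env-f007-hf-submission"),
    ("uv", "run", "openenv", "push"),
)


def _run(cmd: Sequence[str]) -> int:
    """Run one command and return its exit code without raising."""
    return subprocess.run(list(cmd), check=False).returncode


def run_in_order(
    steps: Sequence[Sequence[str]] = DEPLOY_STEPS,
    runner: Callable[[Sequence[str]], int] = _run,
) -> list[tuple[str, int]]:
    """Run steps sequentially and stop at the first non-zero exit code."""
    results: list[tuple[str, int]] = []
    for cmd in steps:
        code = runner(cmd)
        results.append((" ".join(cmd), code))
        if code != 0:
            break
    return results
```

Passing a fake `runner` makes the ordering logic checkable without Docker, GHCR, or Hugging Face credentials.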
|
| 51 |
|
| 52 |
### Evidence Submission Format (for verifier re-run)
|
| 53 |
|
|
|
|
| 139 |
|
| 140 |
### Authenticated Build Evidence
|
| 141 |
|
| 142 |
+
This run confirms authenticated access to `ghcr.io/meta-pytorch/openenv-base:latest` and full local build completion.
|
| 143 |
|
| 144 |
```bash
|
| 145 |
uv run openenv build -t openenv-sql-env-f007-hf-submission
|
|
|
|
| 149 |
#2 [auth] meta-pytorch/openenv-base:pull token for ghcr.io
|
| 150 |
#2 DONE 0.0s
|
| 151 |
#3 [internal] load metadata for ghcr.io/meta-pytorch/openenv-base:latest
|
| 152 |
+
#3 DONE 0.5s
|
| 153 |
...
|
| 154 |
+
#18 naming to docker.io/library/openenv-sql-env-f007-hf-submission done
|
| 155 |
+
#18 DONE 0.0s
|
| 156 |
...
|
| 157 |
+
✓ Docker build successful
|
| 158 |
```
|
| 159 |
|
| 160 |
### Hugging Face Push Evidence
|
|
|
|
| 187 |
|
| 188 |
## Manual Verification Checklist
|
| 189 |
|
| 190 |
+
1. Re-run `uv run openenv build -t <lowercase-tag>` if you need fresh image evidence.
|
| 191 |
+
2. Re-run `uv run openenv push` if needed and confirm upload still succeeds for `hjerpe/sql_env`.
|
| 192 |
+
3. Open the HF Space URL and verify health endpoint plus interactive episode flow.
|
| 193 |
+
4. Open `notebooks/train_grpo.ipynb` in Colab and run cells top-to-bottom in a fresh runtime.
|
| 194 |
+
5. Validate README links and blog-outline handoff in the final submission package.
|
|
|
|
| 195 |
|
| 196 |
---
|
| 197 |
|
|
|
|
| 209 |
|
| 210 |
This matters because build reproducibility depends on explicit lowercase tagging in this repo naming pattern.
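To make the lowercase requirement concrete, a tag can be normalized and checked before it is handed to the build command. The sketch below is an assumption-laden approximation: `to_docker_tag` is a hypothetical helper, and the pattern covers only a single lowercase name component (no registry host, path, or `:tag` suffix), not the full image reference grammar.

```python
import re

# Simplified approximation of one repository-name component:
# lowercase alphanumerics separated by ".", "_", "__", or one or more "-".
_COMPONENT = re.compile(r"[a-z0-9]+(?:(?:\.|_|__|-+)[a-z0-9]+)*")


def to_docker_tag(name: str) -> str:
    """Lowercase a candidate image name and reject anything still invalid."""
    candidate = name.strip().lower()
    if not _COMPONENT.fullmatch(candidate):
        raise ValueError(f"not a valid lowercase image name: {name!r}")
    return candidate
```

For instance, a mixed-case name containing `F007` would be lowered to `f007` before being passed to `uv run openenv build -t ...`.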
|
| 211 |
|
| 212 |
+
### Build succeeds with explicit lowercase tag
|
| 213 |
|
| 214 |
```bash
|
| 215 |
uv run openenv build -t openenv-sql-env-f007-hf-submission
|
| 216 |
```
|
| 217 |
|
| 218 |
```text
|
| 219 |
+
...
|
| 220 |
+
✓ Docker build successful
|
| 221 |
```
|
| 222 |
|
| 223 |
+
This confirms GHCR auth is working and local build evidence is now complete for final verification.
|
| 224 |
|
| 225 |
### Authenticated Hugging Face push succeeds with live Space URL
|
| 226 |
|
specs/F007-IMPLEMENTATION_SPEC.md
CHANGED
|
@@ -10,7 +10,7 @@
|
|
| 10 |
- [x] Draft
|
| 11 |
- [x] Approved for Implementation
|
| 12 |
- [x] Implementation Complete
|
| 13 |
-
- [
|
| 14 |
|
| 15 |
---
|
| 16 |
|
|
@@ -102,11 +102,11 @@ Prepare the complete competition submission package: (1) harden the Dockerfile f
|
|
| 102 |
## 1a. Execution Status
|
| 103 |
<!-- Auto-updated by /autocode-next-step - do not edit manually -->
|
| 104 |
|
| 105 |
-
**Progress:**
|
| 106 |
-
**Current Step:** Finalization Protocol (
|
| 107 |
-
**Last Updated:** 2026-03-
|
| 108 |
-
**Latest Result:**
|
| 109 |
-
**Blockers:**
|
| 110 |
|
| 111 |
---
|
| 112 |
|
|
@@ -673,7 +673,7 @@ feat(submission): finalize F007 huggingface deployment package
|
|
| 673 |
- `uv run --with pytest pytest tests/ -v`
|
| 674 |
|
| 675 |
### Follow-up
|
| 676 |
-
|
| 677 |
|
| 678 |
---
|
| 679 |
|
|
|
|
| 10 |
- [x] Draft
|
| 11 |
- [x] Approved for Implementation
|
| 12 |
- [x] Implementation Complete
|
| 13 |
+
- [x] Verification Passed
|
| 14 |
|
| 15 |
---
|
| 16 |
|
|
|
|
| 102 |
## 1a. Execution Status
|
| 103 |
<!-- Auto-updated by /autocode-next-step - do not edit manually -->
|
| 104 |
|
| 105 |
+
**Progress:** 7/7 steps complete
|
| 106 |
+
**Current Step:** Finalization Protocol (OK Completed)
|
| 107 |
+
**Last Updated:** 2026-03-29T07:29:32Z
|
| 108 |
+
**Latest Result:** OK Final verification gate passed. Authenticated deployment evidence is now complete: `uv run openenv build -t openenv-sql-env-f007-hf-submission` succeeded, `uv run openenv push` completed successfully to `https://huggingface.co/spaces/hjerpe/sql_env`, and regression verification remained green (`uv run --with pytest pytest tests/ -v`: 250 passed, 1 skipped). `uv run openenv validate --verbose` still reports non-Docker entrypoint warnings, but Docker mode is supported and remains the scoped deployment path for F007.
|
| 109 |
+
**Blockers:** None.
|
| 110 |
|
| 111 |
---
|
| 112 |
|
|
|
|
| 673 |
- `uv run --with pytest pytest tests/ -v`
|
| 674 |
|
| 675 |
### Follow-up
|
| 676 |
+
None.
|
| 677 |
|
| 678 |
---
|
| 679 |
|
specs/FEATURES.json
CHANGED
|
@@ -3,7 +3,7 @@
|
|
| 3 |
"project": "SQLEnv - Interactive Database Query RL Environment",
|
| 4 |
"description": "OpenEnv Challenge submission: RL environment where agents learn to answer NL questions about databases through iterative SQL exploration",
|
| 5 |
"created": "2026-03-24T07:15:50Z",
|
| 6 |
-
"updated": "2026-03-
|
| 7 |
"features": [
|
| 8 |
{
|
| 9 |
"id": "F001",
|
|
@@ -11,7 +11,7 @@
|
|
| 11 |
"description": "Complete the step/reset lifecycle: remove Ollama from environment, accept structured actions (DESCRIBE table_name, SAMPLE table_name, QUERY sql_string, ANSWER value), wire up SQLite execution with sandboxing (read-only, 5s timeout, SELECT-only), load questions from JSON on reset(), enforce step budget (15 steps), handle episode termination",
|
| 12 |
"complexity": "complex",
|
| 13 |
"verification_mode": "standard",
|
| 14 |
-
"status": "
|
| 15 |
"priority": 1,
|
| 16 |
"dependencies": [],
|
| 17 |
"docs": {
|
|
@@ -631,7 +631,7 @@
|
|
| 631 |
"planned": "2026-03-27T12:00:00Z",
|
| 632 |
"verification_planned": "2026-03-27T12:00:00Z",
|
| 633 |
"started": "2026-03-28T17:03:38Z",
|
| 634 |
-
"completed":
|
| 635 |
},
|
| 636 |
"verification_evidence": {
|
| 637 |
"mode": "mvp",
|
|
@@ -639,7 +639,7 @@
|
|
| 639 |
"tests_passed": 250,
|
| 640 |
"timestamp": "2026-03-28T22:30:16Z",
|
| 641 |
"command": "uv run --with pytest pytest tests/ -v",
|
| 642 |
-
"verifier_result": "
|
| 643 |
},
|
| 644 |
"user_value": "Judges and external developers can now consume a complete SQLEnv submission package with HF Spaces-compatible deployment artifacts, a polished README quickstart, a structured blog outline, and a Colab-ready GRPO training notebook.",
|
| 645 |
"demo": {
|
|
@@ -660,7 +660,7 @@
|
|
| 660 |
"specs/F007-VERIFICATION_SPEC.md",
|
| 661 |
"specs/F007-DEMO.md"
|
| 662 |
],
|
| 663 |
-
"note": "Authenticated HF push now
|
| 664 |
}
|
| 665 |
},
|
| 666 |
{
|
|
|
|
| 3 |
"project": "SQLEnv - Interactive Database Query RL Environment",
|
| 4 |
"description": "OpenEnv Challenge submission: RL environment where agents learn to answer NL questions about databases through iterative SQL exploration",
|
| 5 |
"created": "2026-03-24T07:15:50Z",
|
| 6 |
+
"updated": "2026-03-29T07:29:32Z",
|
| 7 |
"features": [
|
| 8 |
{
|
| 9 |
"id": "F001",
|
|
|
|
| 11 |
"description": "Complete the step/reset lifecycle: remove Ollama from environment, accept structured actions (DESCRIBE table_name, SAMPLE table_name, QUERY sql_string, ANSWER value), wire up SQLite execution with sandboxing (read-only, 5s timeout, SELECT-only), load questions from JSON on reset(), enforce step budget (15 steps), handle episode termination",
|
| 12 |
"complexity": "complex",
|
| 13 |
"verification_mode": "standard",
|
| 14 |
+
"status": "complete",
|
| 15 |
"priority": 1,
|
| 16 |
"dependencies": [],
|
| 17 |
"docs": {
|
|
|
|
| 631 |
"planned": "2026-03-27T12:00:00Z",
|
| 632 |
"verification_planned": "2026-03-27T12:00:00Z",
|
| 633 |
"started": "2026-03-28T17:03:38Z",
|
| 634 |
+
"completed": "2026-03-29T07:29:32Z"
|
| 635 |
},
|
| 636 |
"verification_evidence": {
|
| 637 |
"mode": "mvp",
|
|
|
|
| 639 |
"tests_passed": 250,
|
| 640 |
"timestamp": "2026-03-28T22:30:16Z",
|
| 641 |
"command": "uv run --with pytest pytest tests/ -v",
|
| 642 |
+
"verifier_result": "approved"
|
| 643 |
},
|
| 644 |
"user_value": "Judges and external developers can now consume a complete SQLEnv submission package with HF Spaces-compatible deployment artifacts, a polished README quickstart, a structured blog outline, and a Colab-ready GRPO training notebook.",
|
| 645 |
"demo": {
|
|
|
|
| 660 |
"specs/F007-VERIFICATION_SPEC.md",
|
| 661 |
"specs/F007-DEMO.md"
|
| 662 |
],
|
| 663 |
+
"note": "Authenticated local build and HF push now both succeed for hjerpe/sql_env; browser episode flow and Colab run remain user-verified surfaces."
|
| 664 |
}
|
| 665 |
},
|
| 666 |
{
|
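A consistency check on the `verification_evidence` block shown in this diff could catch a feature marked complete without approved evidence. The sketch below relies only on the keys visible in the hunk (`tests_passed`, `command`, `verifier_result`); the helper name `evidence_problems` is hypothetical and the rest of the `FEATURES.json` schema is not assumed.

```python
def evidence_problems(evidence: dict) -> list[str]:
    """Return human-readable problems found in one verification_evidence dict."""
    problems: list[str] = []
    if evidence.get("verifier_result") != "approved":
        problems.append("verifier_result is not 'approved'")
    tests_passed = evidence.get("tests_passed")
    if not isinstance(tests_passed, int) or tests_passed <= 0:
        problems.append("tests_passed is missing or not a positive integer")
    if not evidence.get("command"):
        problems.append("command is missing")
    return problems
```

An empty list means the evidence block is internally consistent; it does not prove that the recorded command was actually run.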