Skills: how to make changes in this project

Process rules and habits for AI assistants working on this repo. Companion to CLAUDE.md (which is what & why); this file is how.

Default rule when in doubt: stop and ask the user. The user prefers a question over wrong work.


Investigation before fix

Reproduce the bug visually before patching CSS / UI

When the user reports a layout, color, click, or visibility issue, the first action is Playwright + screenshot, not code. The user has called this out explicitly:

"Make sure to check playwright with screenshot to verify issues before making fix."

Skipping the visual repro twice in a row produced patches that addressed a different symptom than what the user was seeing. Reproduce, then fix, then re-screenshot to verify the fix.

Tools: local dev server (port 7860, see "Running locally" below) + mcp__playwright__browser_* tools. Resize to the affected viewport (typically 380 px / 900 px / 1280 px). browser_evaluate is the most reliable way to inspect DOM state: getBoundingClientRect, getComputedStyle, elementFromPoint.
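
The MCP tools above are the default path. If the same repro is ever needed outside them, a minimal Playwright-Python sketch looks like this (the selector, viewport, and screenshot path are placeholders, not values from this repo):

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page(viewport={"width": 380, "height": 800})  # viewport the user reported
    page.goto("http://localhost:7860")
    # Inspect the element the user says is broken (placeholder selector)
    box = page.evaluate(
        "() => { const r = document.querySelector('#generate-btn')?.getBoundingClientRect();"
        "  return r ? {x: r.x, y: r.y, width: r.width, height: r.height} : null; }"
    )
    print(box)
    page.screenshot(path="/tmp/repro_before.png", full_page=True)
    browser.close()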

Pull HF Space logs first when something runs there

For Spaces failures, the run logs are the source of truth. Pull and search:

HF_TOKEN=$(cat ~/.cache/huggingface/token)
curl -s -H "Authorization: Bearer ${HF_TOKEN}" \
  "https://huggingface.co/api/spaces/techfreakworm/LTX2.3-Studio/logs/run" \
  -o /tmp/hf_run.log

# Find last submit and tail from there
python3 << 'PY'
import json
events = []
for line in open('/tmp/hf_run.log'):
    line = line.strip()
    if line.startswith('data: '):
        try: events.append(json.loads(line[6:]))
        except Exception: pass
last = max((i for i, e in enumerate(events) if 'submitting workflow' in e.get('data', '')), default=0)
for ev in events[last:]:
    print(ev.get('timestamp', '')[:19], ev.get('data', '').rstrip()[:240])
PY

/logs/build is the other endpoint. Build logs show preload, image-build, pip; run logs show container output.
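
The same pattern works for build logs, sketched in Python instead of curl (assumes the same token file; the endpoint streams, so interrupt once you have what you need):

import urllib.request, pathlib

token = (pathlib.Path.home() / ".cache/huggingface/token").read_text().strip()
req = urllib.request.Request(
    "https://huggingface.co/api/spaces/techfreakworm/LTX2.3-Studio/logs/build",
    headers={"Authorization": f"Bearer {token}"},
)
with urllib.request.urlopen(req) as resp:
    for raw in resp:                       # server-sent events, one "data: ..." line each
        line = raw.decode("utf-8", "replace").strip()
        if line.startswith("data: "):
            print(line[6:])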

Stage check before action

HF_TOKEN=$(cat ~/.cache/huggingface/token)
curl -s -H "Authorization: Bearer ${HF_TOKEN}" \
  "https://huggingface.co/api/spaces/techfreakworm/LTX2.3-Studio" | jq -r '.runtime'

Stages: BUILDING (image), APP_STARTING (boot), RUNNING, RUNTIME_ERROR, RUNNING_BUILDING (live serving + new build queued). If the stage is RUNTIME_ERROR, that's your headline.
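
A Python alternative to polling with curl, assuming huggingface_hub is installed in the venv; get_space_runtime exposes the same runtime info the endpoint above returns under .runtime:

import time, pathlib
from huggingface_hub import HfApi

token = (pathlib.Path.home() / ".cache/huggingface/token").read_text().strip()
api = HfApi(token=token)
while True:
    stage = api.get_space_runtime("techfreakworm/LTX2.3-Studio").stage
    print(stage)
    if stage in ("RUNNING", "RUNTIME_ERROR"):   # terminal enough for a deploy watch
        break
    time.sleep(15)                              # BUILDING / APP_STARTING can take a while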

Sequential thinking for repeated failures

The user has called this out:

"On 2nd failed fix, stop patching; use sequential-thinking MCP + brainstorming skill"

If your first fix didn't land, stop patching. Use mcp__sequential-thinking__sequentialthinking to think through the failure mode end-to-end, plus web search for canonical solutions. Do not loop on speculative one-line patches.

Web-search for HF / Gradio errors with the literal message

HF docs change. The Spaces Configuration Reference and Spaces ZeroGPU pages often have undocumented behavior captured in forum threads. When you hit a Gradio/Spaces error, web-search the literal exception message. Examples that paid off:

  • gradio.exceptions.InvalidPathError → fix was allowed_paths= (Gradio 5 file-access policy; see the sketch after this list)
  • 'Workload evicted, storage limit exceeded (150G)' → 150 GB ephemeral cap
  • 'No @spaces.GPU function detected during startup' → must be module-level decorator
  • 'GPU task aborted' → @spaces.GPU(duration=...) cap
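
For the InvalidPathError item, the shape of the fix, assuming the build_app() / launch() pattern this repo already uses; the whitelisted directory is a placeholder:

import gradio as gr

def build_app():
    with gr.Blocks() as demo:
        gr.Markdown("placeholder UI")
    return demo

# allowed_paths whitelists directories Gradio may serve files from outside its temp dir
build_app().launch(allowed_paths=["assets/seed_inputs"])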

Verification

Run the full repro in Playwright before declaring done

After a UI fix, re-run the same Playwright sequence that exposed the bug. Take a screenshot. Read the DOM state. Don't trust "it should work now"; show that it does.

Local before push

When iterating on app behavior, the local dev server gives instant feedback. The user explicitly asks for this; they do most testing on the WiFi-accessible local URL. Never push during HF testing windows. When the user is testing on the live Space, hold local commits until they say push.

# In repo root
source .venv/bin/activate
python app.py  # or background it; see "Running locally"

The user has stated:

"DO NOT PUSH since testing is happening on HF"

When in doubt, hold and ask.

Smoke import + build_app after backend/app changes

python -c "import app; b = app.build_app(); print(type(b).__name__)"

Should print Blocks. Catches most syntax / import-cycle issues without spinning up the full server.

Sanity-test isolated functions when changing logic

For workflow walkers, the model registry, and duration estimators, write a tiny python3 -c '...' or heredoc to feed synthetic inputs and verify outputs. It's faster than running the full app and catches regressions that the full app would mask.
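
Example shape, calling walk_workflow_for_models from models.py on a synthetic two-node workflow (node contents are invented for illustration; only the call pattern matches the one-liners at the bottom of this file):

# Run from the repo root (so `import models` resolves), e.g. inside python3 << 'PY' ... PY
import models

fake_workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",        # invented node for illustration
          "inputs": {"ckpt_name": "some-model.safetensors"}},
    "2": {"class_type": "KSampler", "inputs": {"seed": 0}},
}
needed = models.walk_workflow_for_models(fake_workflow)
print("models needed:", sorted(needed))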


Running locally

Standard launch (port 7860)

cd /Users/techfreakworm/Projects/llm/ltx2.3-AIO-generator
source .venv/bin/activate
nohup python app.py > /tmp/ltx_studio_run.log 2>&1 &
echo $! > /tmp/ltx_studio.pid

Wait ~18 seconds for ComfyUI to import + Gradio to bind, then check:

lsof -nP -iTCP:7860 -sTCP:LISTEN
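
If a fixed sleep feels flaky, a small port poll does the same job (sketch; the timeout is arbitrary):

import socket, sys, time

deadline = time.time() + 60          # generous upper bound for ComfyUI import + Gradio bind
while time.time() < deadline:
    try:
        with socket.create_connection(("127.0.0.1", 7860), timeout=1):
            print("Gradio is listening on 7860")
            sys.exit(0)
    except OSError:
        time.sleep(2)
print("7860 never came up; check /tmp/ltx_studio_run.log")
sys.exit(1)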

LAN-accessible URL

Bound to 0.0.0.0:7860 by default. Get the LAN IP:

ipconfig getifaddr en0 || ipconfig getifaddr en1

Open http://<LAN_IP>:7860 on phone/tablet on the same WiFi. macOS firewall: allow inbound for python if connection refused.

Stop

PID=$(cat /tmp/ltx_studio.pid)
kill -9 $PID
lsof -nP -iTCP:7860 -sTCP:LISTEN | awk 'NR>1 {print $2}' | xargs -r kill -9

Pushing changes

Two remotes

git push origin master                                                 # GitHub
HF_TOKEN=$(cat ~/.cache/huggingface/token)                             # HF auth (the `hf auth token` CLI was removed, so read the token file)
git push "https://techfreakworm:${HF_TOKEN}@huggingface.co/spaces/techfreakworm/LTX2.3-Studio" master:main

GitHub: master. HF Space: main. The Space accepts force-push only with explicit user consent.

When to push

  • Default: hold all commits locally, ask the user before pushing.
  • The user usually says "push" or "push them" when ready.
  • During the user's HF testing windows, NEVER push.
  • After a successful local Playwright verification of a fix, summarize the queued commits and ask.

Spaces deploy lifecycle

Each push triggers a Docker image rebuild. Most layers are cached unless requirements.txt or README YAML changes. The first push that adds or changes preload_from_hub: triggers a long preload step (downloading all listed files into ~/.cache/huggingface/hub).

Container start sequence (after image push):

  1. HF brings up the container as user 1000
  2. Our _bootstrap() runs:
    • clones ComfyUI + custom nodes (cold-start only; frozen ZeroGPU containers retain them)
    • pip installs each custom node's requirements
    • _mirror_preload_hf_cache() builds writable cache mirror
    • copies seed inputs
    • sets HF_HOME / HF_HUB_CACHE env vars (see the sketch after this list)
  3. gr.Blocks(...).launch() binds 7860
  4. Stage transitions to RUNNING
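
The ordering in step 2 matters: libraries like huggingface_hub read HF_HOME / HF_HUB_CACHE when they are imported. A sketch of the shape of that step, not the actual _bootstrap code (the mirror path is borrowed from the freeze list below; the hub subdirectory is an assumption):

import os, pathlib

cache_rw = pathlib.Path.home() / "hf-cache-rw"       # writable mirror built by _mirror_preload_hf_cache()
os.environ["HF_HOME"] = str(cache_rw)
os.environ["HF_HUB_CACHE"] = str(cache_rw / "hub")   # assumed layout

# Only import consumers of these variables after they are set
from huggingface_hub import snapshot_download  # noqa: E402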

ZeroGPU container freeze on idle: keeps ~/comfyui, ~/hf-cache-rw, etc. Wake on next request restores in seconds. Push or rebuild loses everything.


When the user says "deep think"

The user explicitly invokes deeper investigation when stuck:

"Use deep thinking using sequential thinking and web search and code exploration."

Use mcp__sequential-thinking__sequentialthinking to lay out the problem end-to-end. Web-search literal error messages. Read code beyond the immediate failure site. Avoid speculative one-line patches when in this mode.


What never to do

  • Push without explicit permission during HF test windows.
  • Add Co-Authored-By or any agent attribution to commit messages.
  • Hand-edit workflows/*.json; the user re-exports from the ComfyUI editor.
  • chmod the HF preload cache; we don't own it. See the cache-mirror approach in CLAUDE.md.
  • Switch sdk: gradio → sdk: docker in README; it loses ZeroGPU.
  • Move models into the repo via git LFS without asking. Pro has 1 TB LFS but bandwidth is finite.
  • Implement out-of-scope v1.1+ features without asking. See "Out of scope" in CLAUDE.md.
  • Eagerly load models at module import. _bootstrap() only ensures clones + cache mirroring. Model load happens when ComfyUI's executor evaluates a node (see the sketch after this list).
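
The last point as a shape (illustration of the principle only, not the repo's code; the loader is a stand-in for ComfyUI node execution):

def load_heavy_model(path):
    # Stand-in for the expensive part; in the real app this is a ComfyUI node executing
    return object()

_pipe = None

def get_pipe():
    global _pipe
    if _pipe is None:                 # first call pays the cost; module import never does
        _pipe = load_heavy_model("checkpoint.safetensors")
    return _pipe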

Memory (cross-session)

The user's preferences live at ~/.claude/projects/-Users-techfreakworm-Projects/memory/. Key entries:

  • Git authorship: sole author, no co-author footers
  • Verify before fix: Playwright + screenshot first
  • Don't push during HF testing: hold local commits
  • Autonomous execution: prefer scripts over notebooks, report results
  • No conda: python3.11 -m venv, brew for system bins
  • Tests folder: keep ~/Projects/tests/ separate from ~/Projects/

When the user asks to remember something new, save it as a memory file and update MEMORY.md index.


When stuck for too long

Three escalation steps:

  1. mcp__sequential-thinking__sequentialthinking: think the whole flow through, identify the unknown.
  2. WebSearch + WebFetch: find a canonical fix or known issue.
  3. Ask the user: describe what's been tried, what's still unknown, propose options.

Do not loop on patches when you've patched twice and it's still broken.


Repo structure (high level)

.
├── app.py               # Gradio entry, _bootstrap, _on_generate, build_app
├── backend.py           # ComfyUILibraryBackend, _execute_workflow, _GPU
├── modes.py             # MODE_REGISTRY + per-mode parameterize_fn + node-id constants
├── models.py            # MODEL_REGISTRY, walk_workflow_for_models, ensure_models
├── ui.py                # render_status, _render_idle, mode-form layout primitives
├── workflow.py          # load_template, set_input
├── workflows/           # API-format mode JSONs (do not hand-edit)
│   ├── t2v.json
│   ├── i2v.json
│   ├── a2v.json
│   ├── lipsync.json
│   ├── keyframe.json
│   └── style.json
├── assets/seed_inputs/  # placeholder image/audio/video for cold-start (gitignored except this dir)
├── docs/
│   ├── superpowers/specs/    # design specs (per-feature)
│   ├── superpowers/plans/    # implementation plans (per-feature)
│   └── future_improvements.md
├── tools/extract_modes.py    # regenerate workflows/ from master
├── tests/
├── README.md            # HF Space YAML + project description
├── CLAUDE.md            # what & why (this project's facts)
├── SKILLS.md            # how (this file)
├── requirements.txt
└── comfyui/             # git submodule (local) / runtime clone target (Spaces)

Useful one-liners

# What's the Space's current SHA vs local HEAD
hf_sha=$(curl -s -H "Authorization: Bearer $(cat ~/.cache/huggingface/token)" \
  "https://huggingface.co/api/spaces/techfreakworm/LTX2.3-Studio" \
  | jq -r '.sha')
echo "HF: ${hf_sha:0:8}  local: $(git rev-parse HEAD | cut -c1-8)"

# Local commits ahead of origin
git log origin/master..HEAD --oneline

# All class_types referenced by workflows (cross-check against custom_nodes)
python3 -c "import json, glob, sys
seen = set()
for p in glob.glob('workflows/*.json'):
    seen |= {n.get('class_type','') for n in json.load(open(p)).values()}
for c in sorted(seen): print(c)"

# Models referenced by workflows but not in registry
python3 -c "import json, glob, models
needed = set()
for p in glob.glob('workflows/*.json'):
    needed |= models.walk_workflow_for_models(json.load(open(p)))
unmapped = needed - set(models.MODEL_REGISTRY)
print('unmapped:', sorted(unmapped) or 'none')"