Ace-Step-v1.5

ChuxiJ committed on Dec 24, 2025

Commit

0228d48

1 Parent(s): 4670365

add api

Files changed (9)

.gitignore +3 -1
API.md +192 -0
acestep/__init__.py +1 -0
acestep/api_server.py +599 -0
acestep/third_parts/nano-vllm/nanovllm/engine/model_runner.py +28 -6
close_api_server.sh +140 -0
pyproject.toml +4 -1
requirements.txt +3 -1
run_api_server.sh +27 -0

.gitignore CHANGED

@@ -215,4 +215,6 @@ playground.ipynb
 .history/
 upload_checkpoints.sh
 checkpoints.7z
-README_old.md
+README_old.md
+discord_bot/
+feishu_bot/

API.md ADDED

	@@ -0,0 +1,192 @@

+# ACE-Step API Client Documentation
+This service provides an HTTP-based asynchronous music generation API.
+**Basic Workflow**:
+1. Call `POST /v1/music/generate` to submit a task and obtain a `job_id`.
+2. Call `GET /v1/jobs/{job_id}` to poll the task status until `status` is `succeeded` or `failed`.
+---
+## 1. Task Status Description
+Task status (`status`) includes the following types:
+- `queued`: Task has entered the queue and is waiting to be executed. You can check `queue_position` and `eta_seconds` at this time.
+- `running`: Generation is in progress.
+- `succeeded`: Generation succeeded, results are in the `result` field.
+- `failed`: Generation failed, error information is in the `error` field.
+---
+## 2. Create Generation Task
+### 2.1 API Definition
+- **URL**: `/v1/music/generate`
+- **Method**: `POST`
+- **Content-Type**: `application/json`, `application/x-www-form-urlencoded`, or `multipart/form-data`
+### 2.2 Request Parameters
+#### Method A: JSON Request (application/json)
+Suitable for passing only text parameters, or referencing audio file paths that already exist on the server.
+**Basic Parameters**:
+| Parameter Name | Type | Default | Description |
+| :--- | :--- | :--- | :--- |
+| `caption` | string | `""` | Music description prompt |
+| `lyrics` | string | `""` | Lyrics content |
+| `vocal_language` | string | `"en"` | Lyrics language (en, zh, ja, etc.) |
+| `audio_format` | string | `"mp3"` | Output format (mp3, wav, flac) |
+**Music Attribute Parameters**:
+| Parameter Name | Type | Default | Description |
+| :--- | :--- | :--- | :--- |
+| `bpm` | int | null | Specify tempo (BPM) |
+| `key_scale` | string | `""` | Key/scale (e.g., "C Major") |
+| `time_signature` | string | `""` | Time signature (e.g., "4/4") |
+| `audio_duration` | float | null | Generation duration (seconds) |
+**Generation Control Parameters**:
+| Parameter Name | Type | Default | Description |
+| :--- | :--- | :--- | :--- |
+| `inference_steps` | int | `8` | Number of inference steps |
+| `guidance_scale` | float | `7.0` | Prompt guidance coefficient |
+| `use_random_seed` | bool | `true` | Whether to use random seed |
+| `seed` | int | `-1` | Specify seed (when use_random_seed=false) |
+| `batch_size` | int | null | Batch generation count |
+**Edit/Reference Audio Parameters** (requires absolute path on server):
+| Parameter Name | Type | Default | Description |
+| :--- | :--- | :--- | :--- |
+| `reference_audio_path` | string | null | Reference audio path (Style Transfer) |
+| `src_audio_path` | string | null | Source audio path (Repainting/Cover) |
+| `task_type` | string | `"text2music"` | Task type (text2music, cover, repaint) |
+| `instruction` | string | `"Fill..."` | Edit instruction |
+| `repainting_start` | float | `0.0` | Repainting start time |
+| `repainting_end` | float | null | Repainting end time |
+| `audio_cover_strength` | float | `1.0` | Cover strength |
+#### Method B: File Upload (multipart/form-data)
+Use this when you need to upload local audio files as reference or source audio.
+In addition to supporting all the above fields as Form Fields, the following file fields are also supported:
+- `reference_audio`: (File) Upload reference audio file
+- `src_audio`: (File) Upload source audio file
+> **Note**: After uploading files, the corresponding `_path` parameters will be automatically ignored, and the system will use the temporary file path after upload.
+### 2.3 Response Example
+```json
+{
+  "job_id": "550e8400-e29b-41d4-a716-446655440000",
+  "status": "queued",
+  "queue_position": 1
+}
+```
+### 2.4 Usage Examples (cURL)
+**JSON Method**:
+```bash
+curl -X POST http://localhost:8001/v1/music/generate \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "caption": "upbeat pop song",
+    "lyrics": "Hello world",
+    "inference_steps": 16
+  }'
+```
+> Note: If you use `curl -d` but **forget** to add `-H 'Content-Type: application/json'`, curl will default to sending `application/x-www-form-urlencoded`, and older server versions will return 415.
+**Form Method (no file upload, application/x-www-form-urlencoded)**:
+```bash
+curl -X POST http://localhost:8001/v1/music/generate \
+  -H 'Content-Type: application/x-www-form-urlencoded' \
+  --data-urlencode 'caption=upbeat pop song' \
+  --data-urlencode 'lyrics=Hello world' \
+  --data-urlencode 'inference_steps=16'
+```
+**File Upload Method**:
+```bash
+curl -X POST http://localhost:8001/v1/music/generate \
+  -F "caption=remix this song" \
+  -F "src_audio=@/path/to/local/song.mp3" \
+  -F "task_type=repaint"
+```
+---
+## 3. Query Task Results
+### 3.1 API Definition
+- **URL**: `/v1/jobs/{job_id}`
+- **Method**: `GET`
+### 3.2 Response Parameters
+The response contains basic task information, queue status, and final results.
+**Main Fields**:
+- `status`: Current status
+- `queue_position`: Current queue position (0 means running or completed)
+- `eta_seconds`: Estimated remaining wait time (seconds)
+- `result`: Result object when successful
+  - `audio_paths`: List of generated audio file URLs/paths
+  - `first_audio_path`: Preferred audio path
+  - `generation_info`: Generation parameter details
+  - `status_message`: Brief result description
+- `error`: Error information when failed
+### 3.3 Response Examples
+**Queued**:
+```json
+{
+  "job_id": "...",
+  "status": "queued",
+  "created_at": 1700000000.0,
+  "queue_position": 5,
+  "eta_seconds": 25.0,
+  "result": null,
+  "error": null
+}
+```
+**Execution Successful**:
+```json
+{
+  "job_id": "...",
+  "status": "succeeded",
+  "created_at": 1700000000.0,
+  "finished_at": 1700000010.0,
+  "queue_position": 0,
+  "result": {
+    "first_audio_path": "/tmp/generated_1.mp3",
+    "second_audio_path": "/tmp/generated_2.mp3",
+    "audio_paths": ["/tmp/generated_1.mp3", "/tmp/generated_2.mp3"],
+    "generation_info": "Steps: 8, Scale: 7.0 ...",
+    "status_message": "✅ Generation completed successfully!",
+    "seed_value": "12345"
+  },
+  "error": null
+}
+```
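The submit-then-poll workflow documented in API.md can be sketched as a small Python client. This is a minimal sketch under stated assumptions, not part of the commit: `BASE_URL` reuses the host and port from the cURL examples, and the helper names `http_json`, `submit_job`, and `wait_for_job` are hypothetical. The transport is passed in as a parameter so the polling logic can be exercised without a running server.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict, Optional

BASE_URL = "http://localhost:8001"  # assumption: host and port from the cURL examples


def http_json(method: str, url: str, payload: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Send a request (JSON body if a payload is given) and decode the JSON response."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data is not None else {}
    req = urllib.request.Request(url, data=data, headers=headers, method=method)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def submit_job(payload: Dict[str, Any], request: Callable[..., Dict[str, Any]] = http_json) -> str:
    """POST /v1/music/generate and return the job_id from the response."""
    return request("POST", f"{BASE_URL}/v1/music/generate", payload)["job_id"]


def wait_for_job(
    job_id: str,
    request: Callable[..., Dict[str, Any]] = http_json,
    interval: float = 1.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll GET /v1/jobs/{job_id} until status is 'succeeded' or 'failed'."""
    deadline = time.monotonic() + timeout
    while True:
        job = request("GET", f"{BASE_URL}/v1/jobs/{job_id}")
        if job["status"] in ("succeeded", "failed"):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job['status']} after {timeout}s")
        sleep(interval)


# Usage against a running server (not executed here):
#   job_id = submit_job({"caption": "upbeat pop song", "lyrics": "Hello world"})
#   job = wait_for_job(job_id)
#   print(job["result"]["audio_paths"] if job["status"] == "succeeded" else job["error"])
```

The poll interval and timeout are illustrative defaults; a real client might instead derive its wait from the `eta_seconds` field returned while the job is queued.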

acestep/__init__.py ADDED

	@@ -0,0 +1 @@


+"""ACE-Step package."""

acestep/api_server.py ADDED

	@@ -0,0 +1,599 @@

+"""FastAPI server for ACE-Step V1.5.
+Endpoints:
+- POST /v1/music/generate  Create an async music generation job (queued)
+    - Supports application/json and multipart/form-data (with file upload)
+- GET  /v1/jobs/{job_id}   Poll job status/result (+ queue position/eta when queued)
+NOTE:
+- In-memory queue and job store -> run uvicorn with workers=1.
+"""
+from __future__ import annotations
+import asyncio
+import json
+import os
+import sys
+import time
+import traceback
+import tempfile
+import urllib.parse
+from collections import deque
+from concurrent.futures import ThreadPoolExecutor
+from contextlib import asynccontextmanager
+from dataclasses import dataclass
+from pathlib import Path
+from threading import Lock
+from typing import Any, Dict, Literal, Optional
+from uuid import uuid4
+from fastapi import FastAPI, HTTPException, Request
+from pydantic import BaseModel, Field
+from starlette.datastructures import UploadFile as StarletteUploadFile
+from .handler import AceStepHandler
+JobStatus = Literal["queued", "running", "succeeded", "failed"]
+class GenerateMusicRequest(BaseModel):
+    caption: str = Field(default="", description="Text caption describing the music")
+    lyrics: str = Field(default="", description="Lyric text")
+    bpm: Optional[int] = None
+    key_scale: str = ""
+    time_signature: str = ""
+    vocal_language: str = "en"
+    inference_steps: int = 8
+    guidance_scale: float = 7.0
+    use_random_seed: bool = True
+    seed: int = -1
+    reference_audio_path: Optional[str] = None
+    src_audio_path: Optional[str] = None
+    audio_duration: Optional[float] = None
+    batch_size: Optional[int] = None
+    audio_code_string: str = ""
+    repainting_start: float = 0.0
+    repainting_end: Optional[float] = None
+    instruction: str = "Fill the audio semantic mask based on the given conditions:"
+    audio_cover_strength: float = 1.0
+    task_type: str = "text2music"
+    use_adg: bool = False
+    cfg_interval_start: float = 0.0
+    cfg_interval_end: float = 1.0
+    audio_format: str = "mp3"
+    use_tiled_decode: bool = True
+class CreateJobResponse(BaseModel):
+    job_id: str
+    status: JobStatus
+    queue_position: int = 0  # 1-based best-effort position when queued
+class JobResult(BaseModel):
+    first_audio_path: Optional[str] = None
+    second_audio_path: Optional[str] = None
+    audio_paths: list[str] = Field(default_factory=list)
+    generation_info: str = ""
+    status_message: str = ""
+    seed_value: str = ""
+class JobResponse(BaseModel):
+    job_id: str
+    status: JobStatus
+    created_at: float
+    started_at: Optional[float] = None
+    finished_at: Optional[float] = None
+    # queue observability
+    queue_position: int = 0
+    eta_seconds: Optional[float] = None
+    avg_job_seconds: Optional[float] = None
+    result: Optional[JobResult] = None
+    error: Optional[str] = None
+@dataclass
+class _JobRecord:
+    job_id: str
+    status: JobStatus
+    created_at: float
+    started_at: Optional[float] = None
+    finished_at: Optional[float] = None
+    result: Optional[Dict[str, Any]] = None
+    error: Optional[str] = None
+class _JobStore:
+    def __init__(self) -> None:
+        self._lock = Lock()
+        self._jobs: Dict[str, _JobRecord] = {}
+    def create(self) -> _JobRecord:
+        job_id = str(uuid4())
+        rec = _JobRecord(job_id=job_id, status="queued", created_at=time.time())
+        with self._lock:
+            self._jobs[job_id] = rec
+        return rec
+    def get(self, job_id: str) -> Optional[_JobRecord]:
+        with self._lock:
+            return self._jobs.get(job_id)
+    def mark_running(self, job_id: str) -> None:
+        with self._lock:
+            rec = self._jobs[job_id]
+            rec.status = "running"
+            rec.started_at = time.time()
+    def mark_succeeded(self, job_id: str, result: Dict[str, Any]) -> None:
+        with self._lock:
+            rec = self._jobs[job_id]
+            rec.status = "succeeded"
+            rec.finished_at = time.time()
+            rec.result = result
+            rec.error = None
+    def mark_failed(self, job_id: str, error: str) -> None:
+        with self._lock:
+            rec = self._jobs[job_id]
+            rec.status = "failed"
+            rec.finished_at = time.time()
+            rec.result = None
+            rec.error = error
+def _env_bool(name: str, default: bool) -> bool:
+    v = os.getenv(name)
+    if v is None:
+        return default
+    return v.strip().lower() in {"1", "true", "yes", "y", "on"}
+def _get_project_root() -> str:
+    current_file = os.path.abspath(__file__)
+    return os.path.dirname(os.path.dirname(current_file))
+def _to_int(v: Any, default: Optional[int] = None) -> Optional[int]:
+    if v is None:
+        return default
+    if isinstance(v, int):
+        return v
+    s = str(v).strip()
+    if s == "":
+        return default
+    return int(s)
+def _to_float(v: Any, default: Optional[float] = None) -> Optional[float]:
+    if v is None:
+        return default
+    if isinstance(v, float):
+        return v
+    s = str(v).strip()
+    if s == "":
+        return default
+    return float(s)
+def _to_bool(v: Any, default: bool = False) -> bool:
+    if v is None:
+        return default
+    if isinstance(v, bool):
+        return v
+    s = str(v).strip().lower()
+    if s == "":
+        return default
+    return s in {"1", "true", "yes", "y", "on"}
+async def _save_upload_to_temp(upload: StarletteUploadFile, *, prefix: str) -> str:
+    suffix = Path(upload.filename or "").suffix
+    fd, path = tempfile.mkstemp(prefix=f"{prefix}_", suffix=suffix)
+    os.close(fd)
+    try:
+        with open(path, "wb") as f:
+            while True:
+                chunk = await upload.read(1024 * 1024)
+                if not chunk:
+                    break
+                f.write(chunk)
+    except Exception:
+        try:
+            os.remove(path)
+        except Exception:
+            pass
+        raise
+    finally:
+        try:
+            await upload.close()
+        except Exception:
+            pass
+    return path
+def create_app() -> FastAPI:
+    store = _JobStore()
+    QUEUE_MAXSIZE = int(os.getenv("ACESTEP_QUEUE_MAXSIZE", "200"))
+    WORKER_COUNT = int(os.getenv("ACESTEP_QUEUE_WORKERS", "1"))  # 1 is recommended for a single GPU
+    INITIAL_AVG_JOB_SECONDS = float(os.getenv("ACESTEP_AVG_JOB_SECONDS", "5.0"))
+    AVG_WINDOW = int(os.getenv("ACESTEP_AVG_WINDOW", "50"))
+    @asynccontextmanager
+    async def lifespan(app: FastAPI):
+        # Clear proxy env that may affect downstream libs
+        for proxy_var in ["http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY", "ALL_PROXY"]:
+            os.environ.pop(proxy_var, None)
+        handler = AceStepHandler()
+        init_lock = asyncio.Lock()
+        app.state._initialized = False
+        app.state._init_error = None
+        app.state._init_lock = init_lock
+        max_workers = int(os.getenv("ACESTEP_API_WORKERS", "1"))
+        executor = ThreadPoolExecutor(max_workers=max_workers)
+        # Queue & observability
+        app.state.job_queue = asyncio.Queue(maxsize=QUEUE_MAXSIZE)  # (job_id, req)
+        app.state.pending_ids = deque()  # queued job_ids
+        app.state.pending_lock = asyncio.Lock()
+        # temp files per job (from multipart uploads)
+        app.state.job_temp_files = {}  # job_id -> list[path]
+        app.state.job_temp_files_lock = asyncio.Lock()
+        # stats
+        app.state.stats_lock = asyncio.Lock()
+        app.state.recent_durations = deque(maxlen=AVG_WINDOW)
+        app.state.avg_job_seconds = INITIAL_AVG_JOB_SECONDS
+        app.state.handler = handler
+        app.state.executor = executor
+        app.state.job_store = store
+        app.state._python_executable = sys.executable
+        async def _ensure_initialized() -> None:
+            h: AceStepHandler = app.state.handler
+            if getattr(app.state, "_initialized", False):
+                return
+            if getattr(app.state, "_init_error", None):
+                raise RuntimeError(app.state._init_error)
+            async with app.state._init_lock:
+                if getattr(app.state, "_initialized", False):
+                    return
+                if getattr(app.state, "_init_error", None):
+                    raise RuntimeError(app.state._init_error)
+                project_root = _get_project_root()
+                config_path = os.getenv("ACESTEP_CONFIG_PATH", "acestep-v15-turbo")
+                device = os.getenv("ACESTEP_DEVICE", "auto")
+                use_flash_attention = _env_bool("ACESTEP_USE_FLASH_ATTENTION", True)
+                offload_to_cpu = _env_bool("ACESTEP_OFFLOAD_TO_CPU", False)
+                offload_dit_to_cpu = _env_bool("ACESTEP_OFFLOAD_DIT_TO_CPU", False)
+                status_msg, ok = h.initialize_service(
+                    project_root=project_root,
+                    config_path=config_path,
+                    device=device,
+                    use_flash_attention=use_flash_attention,
+                    compile_model=False,
+                    offload_to_cpu=offload_to_cpu,
+                    offload_dit_to_cpu=offload_dit_to_cpu,
+                )
+                if not ok:
+                    app.state._init_error = status_msg
+                    raise RuntimeError(status_msg)
+                app.state._initialized = True
+        async def _cleanup_job_temp_files(job_id: str) -> None:
+            async with app.state.job_temp_files_lock:
+                paths = app.state.job_temp_files.pop(job_id, [])
+            for p in paths:
+                try:
+                    os.remove(p)
+                except Exception:
+                    pass
+        async def _run_one_job(job_id: str, req: GenerateMusicRequest) -> None:
+            job_store: _JobStore = app.state.job_store
+            h: AceStepHandler = app.state.handler
+            executor: ThreadPoolExecutor = app.state.executor
+            await _ensure_initialized()
+            job_store.mark_running(job_id)
+            def _blocking_generate() -> Dict[str, Any]:
+                first, second, paths, gen_info, status_msg, seed_value, *_ = h.generate_music(
+                    captions=req.caption,
+                    lyrics=req.lyrics,
+                    bpm=req.bpm,
+                    key_scale=req.key_scale,
+                    time_signature=req.time_signature,
+                    vocal_language=req.vocal_language,
+                    inference_steps=req.inference_steps,
+                    guidance_scale=req.guidance_scale,
+                    use_random_seed=req.use_random_seed,
+                    seed=req.seed,
+                    reference_audio=req.reference_audio_path,
+                    audio_duration=req.audio_duration,
+                    batch_size=req.batch_size,
+                    src_audio=req.src_audio_path,
+                    audio_code_string=req.audio_code_string,
+                    repainting_start=req.repainting_start,
+                    repainting_end=req.repainting_end,
+                    instruction=req.instruction,
+                    audio_cover_strength=req.audio_cover_strength,
+                    task_type=req.task_type,
+                    use_adg=req.use_adg,
+                    cfg_interval_start=req.cfg_interval_start,
+                    cfg_interval_end=req.cfg_interval_end,
+                    audio_format=req.audio_format,
+                    use_tiled_decode=req.use_tiled_decode,
+                    progress=None,
+                )
+                return {
+                    "first_audio_path": first,
+                    "second_audio_path": second,
+                    "audio_paths": paths,
+                    "generation_info": gen_info,
+                    "status_message": status_msg,
+                    "seed_value": seed_value,
+                }
+            t0 = time.time()
+            try:
+                loop = asyncio.get_running_loop()
+                result = await loop.run_in_executor(executor, _blocking_generate)
+                job_store.mark_succeeded(job_id, result)
+            except Exception:
+                job_store.mark_failed(job_id, traceback.format_exc())
+            finally:
+                dt = max(0.0, time.time() - t0)
+                async with app.state.stats_lock:
+                    app.state.recent_durations.append(dt)
+                    if app.state.recent_durations:
+                        app.state.avg_job_seconds = sum(app.state.recent_durations) / len(app.state.recent_durations)
+        async def _queue_worker(worker_idx: int) -> None:
+            while True:
+                job_id, req = await app.state.job_queue.get()
+                try:
+                    async with app.state.pending_lock:
+                        try:
+                            app.state.pending_ids.remove(job_id)
+                        except ValueError:
+                            pass
+                    await _run_one_job(job_id, req)
+                finally:
+                    await _cleanup_job_temp_files(job_id)
+                    app.state.job_queue.task_done()
+        worker_count = max(1, WORKER_COUNT)
+        workers = [asyncio.create_task(_queue_worker(i)) for i in range(worker_count)]
+        app.state.worker_tasks = workers
+        try:
+            yield
+        finally:
+            for t in workers:
+                t.cancel()
+            executor.shutdown(wait=False, cancel_futures=True)
+    app = FastAPI(title="ACE-Step API", version="1.0", lifespan=lifespan)
+    async def _queue_position(job_id: str) -> int:
+        async with app.state.pending_lock:
+            try:
+                return list(app.state.pending_ids).index(job_id) + 1
+            except ValueError:
+                return 0
+    async def _eta_seconds_for_position(pos: int) -> Optional[float]:
+        if pos <= 0:
+            return None
+        async with app.state.stats_lock:
+            avg = float(getattr(app.state, "avg_job_seconds", INITIAL_AVG_JOB_SECONDS))
+        return pos * avg
+    @app.post("/v1/music/generate", response_model=CreateJobResponse)
+    async def create_music_generate_job(request: Request) -> CreateJobResponse:
+        content_type = (request.headers.get("content-type") or "").lower()
+        temp_files: list[str] = []
+        def _build_req_from_mapping(mapping: Any, *, reference_audio_path: Optional[str], src_audio_path: Optional[str]) -> GenerateMusicRequest:
+            get = getattr(mapping, "get", None)
+            if not callable(get):
+                raise HTTPException(status_code=400, detail="Invalid request payload")
+            return GenerateMusicRequest(
+                caption=str(get("caption", "") or ""),
+                lyrics=str(get("lyrics", "") or ""),
+                bpm=_to_int(get("bpm"), None),
+                key_scale=str(get("key_scale", "") or ""),
+                time_signature=str(get("time_signature", "") or ""),
+                vocal_language=str(get("vocal_language", "en") or "en"),
+                inference_steps=_to_int(get("inference_steps"), 8) or 8,
+                guidance_scale=_to_float(get("guidance_scale"), 7.0) or 7.0,
+                use_random_seed=_to_bool(get("use_random_seed"), True),
+                seed=_to_int(get("seed"), -1) or -1,
+                reference_audio_path=reference_audio_path,
+                src_audio_path=src_audio_path,
+                audio_duration=_to_float(get("audio_duration"), None),
+                batch_size=_to_int(get("batch_size"), None),
+                audio_code_string=str(get("audio_code_string", "") or ""),
+                repainting_start=_to_float(get("repainting_start"), 0.0) or 0.0,
+                repainting_end=_to_float(get("repainting_end"), None),
+                instruction=str(get("instruction", "Fill the audio semantic mask based on the given conditions:") or ""),
+                audio_cover_strength=_to_float(get("audio_cover_strength"), 1.0) or 1.0,
+                task_type=str(get("task_type", "text2music") or "text2music"),
+                use_adg=_to_bool(get("use_adg"), False),
+                cfg_interval_start=_to_float(get("cfg_interval_start"), 0.0) or 0.0,
+                cfg_interval_end=_to_float(get("cfg_interval_end"), 1.0) or 1.0,
+                audio_format=str(get("audio_format", "mp3") or "mp3"),
+                use_tiled_decode=_to_bool(get("use_tiled_decode"), True),
+            )
+        def _first_value(v: Any) -> Any:
+            if isinstance(v, list) and v:
+                return v[0]
+            return v
+        if content_type.startswith("application/json") or content_type.endswith("+json"):
+            body = await request.json()
+            req = GenerateMusicRequest(**body)
+        elif content_type.startswith("multipart/form-data"):
+            form = await request.form()
+            ref_up = form.get("reference_audio")
+            src_up = form.get("src_audio")
+            reference_audio_path = None
+            src_audio_path = None
+            if isinstance(ref_up, StarletteUploadFile):
+                reference_audio_path = await _save_upload_to_temp(ref_up, prefix="reference_audio")
+                temp_files.append(reference_audio_path)
+            else:
+                reference_audio_path = str(form.get("reference_audio_path") or "").strip() or None
+            if isinstance(src_up, StarletteUploadFile):
+                src_audio_path = await _save_upload_to_temp(src_up, prefix="src_audio")
+                temp_files.append(src_audio_path)
+            else:
+                src_audio_path = str(form.get("src_audio_path") or "").strip() or None
+            req = _build_req_from_mapping(form, reference_audio_path=reference_audio_path, src_audio_path=src_audio_path)
+        elif content_type.startswith("application/x-www-form-urlencoded"):
+            form = await request.form()
+            reference_audio_path = str(form.get("reference_audio_path") or "").strip() or None
+            src_audio_path = str(form.get("src_audio_path") or "").strip() or None
+            req = _build_req_from_mapping(form, reference_audio_path=reference_audio_path, src_audio_path=src_audio_path)
+        else:
+            raw = await request.body()
+            raw_stripped = raw.lstrip()
+            # Best-effort: accept missing/incorrect Content-Type if payload is valid JSON.
+            if raw_stripped.startswith(b"{") or raw_stripped.startswith(b"["):
+                try:
+                    body = json.loads(raw.decode("utf-8"))
+                    if isinstance(body, dict):
+                        req = GenerateMusicRequest(**body)
+                    else:
+                        raise HTTPException(status_code=400, detail="JSON payload must be an object")
+                except HTTPException:
+                    raise
+                except Exception:
+                    raise HTTPException(
+                        status_code=400,
+                        detail="Invalid JSON body (hint: set 'Content-Type: application/json')",
+                    )
+            # Best-effort: parse key=value bodies even if Content-Type is missing.
+            elif raw_stripped and b"=" in raw:
+                parsed = urllib.parse.parse_qs(raw.decode("utf-8"), keep_blank_values=True)
+                flat = {k: _first_value(v) for k, v in parsed.items()}
+                reference_audio_path = str(flat.get("reference_audio_path") or "").strip() or None
+                src_audio_path = str(flat.get("src_audio_path") or "").strip() or None
+                req = _build_req_from_mapping(flat, reference_audio_path=reference_audio_path, src_audio_path=src_audio_path)
+            else:
+                raise HTTPException(
+                    status_code=415,
+                    detail=(
+                        f"Unsupported Content-Type: {content_type or '(missing)'}; "
+                        "use application/json, application/x-www-form-urlencoded, or multipart/form-data"
+                    ),
+                )
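+        q: asyncio.Queue = app.state.job_queue
+        if q.full():
+            for p in temp_files:
+                try:
+                    os.remove(p)
+                except Exception:
+                    pass
+            raise HTTPException(status_code=429, detail="Server busy: queue is full")
+        # Create the job record only after the capacity check, so a rejected
+        # request does not leave an orphaned "queued" record in the store.
+        rec = store.create()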
+        if temp_files:
+            async with app.state.job_temp_files_lock:
+                app.state.job_temp_files[rec.job_id] = temp_files
+        async with app.state.pending_lock:
+            app.state.pending_ids.append(rec.job_id)
+            position = len(app.state.pending_ids)
+        await q.put((rec.job_id, req))
+        return CreateJobResponse(job_id=rec.job_id, status="queued", queue_position=position)
+    @app.get("/v1/jobs/{job_id}", response_model=JobResponse)
+    async def get_job(job_id: str) -> JobResponse:
+        rec = store.get(job_id)
+        if rec is None:
+            raise HTTPException(status_code=404, detail="Job not found")
+        pos = 0
+        eta = None
+        async with app.state.stats_lock:
+            avg = float(getattr(app.state, "avg_job_seconds", INITIAL_AVG_JOB_SECONDS))
+        if rec.status == "queued":
+            pos = await _queue_position(job_id)
+            eta = await _eta_seconds_for_position(pos)
+        return JobResponse(
+            job_id=rec.job_id,
+            status=rec.status,
+            created_at=rec.created_at,
+            started_at=rec.started_at,
+            finished_at=rec.finished_at,
+            queue_position=pos,
+            eta_seconds=eta,
+            avg_job_seconds=avg,
+            result=JobResult(**rec.result) if rec.result else None,
+            error=rec.error,
+        )
+    return app
+app = create_app()
+def main() -> None:
+    import uvicorn
+    host = os.getenv("ACESTEP_API_HOST", "127.0.0.1")
+    port = int(os.getenv("ACESTEP_API_PORT", "8001"))
+    # IMPORTANT: in-memory queue/store -> workers MUST be 1
+    uvicorn.run("acestep.api_server:app", host=host, port=port, reload=False, workers=1)
+if __name__ == "__main__":
+    main()
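The queue observability in `api_server.py` keeps a bounded window of recent job durations and reports `eta_seconds` as queue position times the rolling average. The standalone sketch below isolates that arithmetic; the `EtaEstimator` class is a hypothetical name used for illustration, and its defaults mirror the `ACESTEP_AVG_JOB_SECONDS` and `ACESTEP_AVG_WINDOW` defaults in the diff.

```python
from collections import deque
from typing import Deque, Optional


class EtaEstimator:
    """Rolling-average ETA, mirroring the avg_job_seconds bookkeeping in the diff."""

    def __init__(self, initial_avg: float = 5.0, window: int = 50) -> None:
        self.avg = initial_avg  # used until the first job finishes
        self.recent: Deque[float] = deque(maxlen=window)  # oldest samples fall off

    def record(self, seconds: float) -> None:
        """Add one finished job's duration and refresh the average."""
        self.recent.append(max(0.0, seconds))
        self.avg = sum(self.recent) / len(self.recent)

    def eta(self, position: int) -> Optional[float]:
        """Return position * average, or None when the job is not queued."""
        if position <= 0:
            return None
        return position * self.avg


est = EtaEstimator(initial_avg=5.0, window=2)
print(est.eta(5))   # 25.0, matching the queued example in API.md
est.record(10.0)
est.record(20.0)
est.record(30.0)    # window of 2 keeps only 20.0 and 30.0
print(est.eta(2))   # 50.0
```

The estimate is best-effort: it does not account for how far along the currently running job is, and with more than one queue worker it would tend to overestimate the wait.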

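One detail in `_build_req_from_mapping` may deserve a second look: several fields are coerced with the pattern `_to_int(...) or default` or `_to_float(...) or default`. Because the helper already substitutes the default for missing and empty values, the trailing `or` appears to change the outcome only when the client sends a zero, which Python treats as falsy. The sketch below reproduces the helper's rules in a local `to_int` function (a copy for illustration, not the module's own symbol) to show the effect on `seed=0`.

```python
from typing import Any, Optional


def to_int(v: Any, default: Optional[int] = None) -> Optional[int]:
    """Same coercion rules as `_to_int` in the diff."""
    if v is None:
        return default
    if isinstance(v, int):
        return v
    s = str(v).strip()
    return default if s == "" else int(s)


form = {"seed": "0"}  # hypothetical form payload with an explicit zero seed

seed_with_or = to_int(form.get("seed"), -1) or -1  # 0 is falsy, so this becomes -1
seed_plain = to_int(form.get("seed"), -1)          # keeps the explicit 0

print(seed_with_or, seed_plain)  # -1 0
```

The same reasoning would apply to `guidance_scale=0.0`, `cfg_interval_end=0.0`, and `audio_cover_strength=0.0` on the form and urlencoded paths. The JSON path builds `GenerateMusicRequest(**body)` directly and is not affected.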
acestep/third_parts/nano-vllm/nanovllm/engine/model_runner.py CHANGED

@@ -43,6 +43,9 @@ def find_available_port(start_port: int = 2333, max_attempts: int = 100) -> int:
 class ModelRunner:
     def __init__(self, config: Config, rank: int, event: Event | list[Event]):
         self.config = config
         hf_config = config.hf_config
         self.block_size = config.kvcache_block_size
@@ -55,7 +58,9 @@ class ModelRunner:
         dist.init_process_group("nccl", f"tcp://localhost:{dist_port}", world_size=self.world_size, rank=rank)
         torch.cuda.set_device(rank)
         default_dtype = torch.get_default_dtype()
-        torch.set_default_dtype(hf_config.torch_dtype)
         torch.set_default_device("cuda")
         self.model = Qwen3ForCausalLM(hf_config)
         load_model(self.model, config.model)
@@ -130,14 +135,31 @@ class ModelRunner:
         config = self.config
         hf_config = config.hf_config
         free, total = torch.cuda.mem_get_info()
-        used = total - free
-        peak = torch.cuda.memory_stats()["allocated_bytes.all.peak"]
         current = torch.cuda.memory_stats()["allocated_bytes.all.current"]
         num_kv_heads = hf_config.num_key_value_heads // self.world_size
         head_dim = getattr(hf_config, "head_dim", hf_config.hidden_size // hf_config.num_attention_heads)
-        block_bytes = 2 * hf_config.num_hidden_layers * self.block_size * num_kv_heads * head_dim * hf_config.torch_dtype.itemsize
-        config.num_kvcache_blocks = int(total * config.gpu_memory_utilization - used - peak + current) // block_bytes
-        assert config.num_kvcache_blocks > 0
         self.kv_cache = torch.empty(2, hf_config.num_hidden_layers, config.num_kvcache_blocks, self.block_size, num_kv_heads, head_dim)
         layer_id = 0
         for module in self.model.modules():

 class ModelRunner:
     def __init__(self, config: Config, rank: int, event: Event | list[Event]):
+        # Enable capturing scalar outputs to avoid graph breaks from Tensor.item() calls
+        torch._dynamo.config.capture_scalar_outputs = True
         self.config = config
         hf_config = config.hf_config
         self.block_size = config.kvcache_block_size
         dist.init_process_group("nccl", f"tcp://localhost:{dist_port}", world_size=self.world_size, rank=rank)
         torch.cuda.set_device(rank)
         default_dtype = torch.get_default_dtype()
+        # Use dtype instead of deprecated torch_dtype
+        config_dtype = getattr(hf_config, 'dtype', getattr(hf_config, 'torch_dtype', torch.float32))
+        torch.set_default_dtype(config_dtype)
         torch.set_default_device("cuda")
         self.model = Qwen3ForCausalLM(hf_config)
         load_model(self.model, config.model)
         config = self.config
         hf_config = config.hf_config
         free, total = torch.cuda.mem_get_info()
         current = torch.cuda.memory_stats()["allocated_bytes.all.current"]
         num_kv_heads = hf_config.num_key_value_heads // self.world_size
         head_dim = getattr(hf_config, "head_dim", hf_config.hidden_size // hf_config.num_attention_heads)
+        # Use dtype instead of deprecated torch_dtype
+        config_dtype = getattr(hf_config, 'dtype', getattr(hf_config, 'torch_dtype', torch.float32))
+        block_bytes = 2 * hf_config.num_hidden_layers * self.block_size * num_kv_heads * head_dim * config_dtype.itemsize
+        # Calculate available memory for KV cache
+        # After warmup_model, empty_cache has been called, so current represents model memory only
+        # Use free memory but respect the gpu_memory_utilization limit
+        target_total_usage = total * config.gpu_memory_utilization
+        available_for_kv_cache = min(free * 0.9, target_total_usage - current)
+        # Ensure we have positive memory available
+        if available_for_kv_cache <= 0:
+            available_for_kv_cache = free * 0.5  # Fallback to 50% of free memory
+        config.num_kvcache_blocks = int(available_for_kv_cache) // block_bytes
+        if config.num_kvcache_blocks <= 0:
+            raise RuntimeError(
+                f"Insufficient GPU memory for KV cache. "
+                f"Free: {free / 1024**3:.2f} GB, Current: {current / 1024**3:.2f} GB, "
+                f"Available for KV: {available_for_kv_cache / 1024**3:.2f} GB, "
+                f"Block size: {block_bytes / 1024**2:.2f} MB"
+            )
         self.kv_cache = torch.empty(2, hf_config.num_hidden_layers, config.num_kvcache_blocks, self.block_size, num_kv_heads, head_dim)
         layer_id = 0
         for module in self.model.modules():
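
Note on the KV-cache sizing in the hunk above: the arithmetic can be checked without a GPU. The following is a minimal sketch, not the project's code. The function names `kv_block_bytes` and `kv_cache_blocks` are hypothetical, and the inputs are plain integers standing in for the values the hunk reads from `torch.cuda.mem_get_info()` and `torch.cuda.memory_stats()`. The sketch raises when not even one whole block fits.

```python
def kv_block_bytes(num_layers: int, block_size: int, num_kv_heads: int,
                   head_dim: int, itemsize: int) -> int:
    """Bytes for one KV-cache block: a key and a value tensor for every layer."""
    return 2 * num_layers * block_size * num_kv_heads * head_dim * itemsize


def kv_cache_blocks(free: int, total: int, current: int,
                    gpu_memory_utilization: float, block_bytes: int) -> int:
    """Number of whole KV-cache blocks that fit in the memory budget.

    free, total: bytes, as reported by the device.
    current: bytes currently allocated (model weights after warmup).
    """
    # Budget implied by the utilization limit, minus what the model already uses.
    target_total_usage = total * gpu_memory_utilization
    # Never plan for more than 90% of the memory that is actually free.
    available = min(free * 0.9, target_total_usage - current)
    if available <= 0:
        # Same fallback as the hunk: half of the free memory.
        available = free * 0.5
    blocks = int(available) // block_bytes
    if blocks <= 0:
        raise RuntimeError(
            f"Insufficient GPU memory for KV cache. "
            f"Free: {free / 1024**3:.2f} GB, "
            f"Available for KV: {available / 1024**3:.2f} GB, "
            f"Block size: {block_bytes / 1024**2:.2f} MB"
        )
    return blocks
```

As an illustration with assumed values (28 layers, block size 256, 8 KV heads, head dimension 128, 2-byte dtype), `kv_block_bytes(28, 256, 8, 128, 2)` comes to 28 MiB per block. Note that the fallback branch can plan for more memory than the utilization limit allows, which may be worth a second look in review.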

close_api_server.sh ADDED Viewed

	@@ -0,0 +1,140 @@

+#!/usr/bin/env bash
+set -euo pipefail
+usage() {
+	cat <<'EOF'
+Usage:
+	./close_api_server.sh [--port PORT] [--pid PID] [--force]
+Defaults:
+	PORT: 8001
+Behavior:
+	- If --pid is provided, stops that PID.
+	- Otherwise, finds the listening PID(s) on --port and stops them.
+	- By default, only stops processes whose cmdline contains "uvicorn" or "acestep.api_server".
+		Use --force to skip this safety check.
+EOF
+}
+PORT="8001"
+PID=""
+FORCE="0"
+while [[ $# -gt 0 ]]; do
+	case "$1" in
+		--port)
+			PORT="${2:-}"; shift 2 ;;
+		--pid)
+			PID="${2:-}"; shift 2 ;;
+		--force)
+			FORCE="1"; shift ;;
+		-h|--help)
+			usage; exit 0 ;;
+		*)
+			echo "Unknown argument: $1" >&2
+			usage
+			exit 2
+			;;
+	esac
+done
+if [[ -n "$PORT" ]] && ! [[ "$PORT" =~ ^[0-9]+$ ]]; then
+	echo "Invalid --port: $PORT" >&2
+	exit 2
+fi
+if [[ -n "$PID" ]] && ! [[ "$PID" =~ ^[0-9]+$ ]]; then
+	echo "Invalid --pid: $PID" >&2
+	exit 2
+fi
+_cmdline() {
+	local pid="$1"
+	if [[ -r "/proc/${pid}/cmdline" ]]; then
+		tr '\0' ' ' < "/proc/${pid}/cmdline" | sed 's/[[:space:]]\+/ /g' || true
+	else
+		echo ""
+	fi
+}
+_is_target_process() {
+	local pid="$1"
+	local cmd
+	cmd="$(_cmdline "$pid")"
+	[[ "$cmd" == *"uvicorn"* || "$cmd" == *"acestep.api_server"* ]]
+}
+_find_pids_by_port() {
+	local port="$1"
+	local pids=""
+	if command -v lsof >/dev/null 2>&1; then
+		pids="$(lsof -nP -t -iTCP:"$port" -sTCP:LISTEN 2>/dev/null | tr '\n' ' ' || true)"
+	elif command -v ss >/dev/null 2>&1; then
+		# Example output: LISTEN 0 4096 127.0.0.1:8001 ... users:("python",pid=12345,fd=3)
+		pids="$(ss -lptn "sport = :$port" 2>/dev/null | sed -n 's/.*pid=\([0-9]\+\).*/\1/p' | sort -u | tr '\n' ' ' || true)"
+	elif command -v netstat >/dev/null 2>&1; then
+		# Example output: tcp ... LISTEN 12345/python
+		pids="$(netstat -lntp 2>/dev/null | awk -v p=":${port}" '$4 ~ p && $6=="LISTEN" {split($7,a,"/"); if (a[1] ~ /^[0-9]+$/) print a[1]}' | sort -u | tr '\n' ' ' || true)"
+	elif command -v fuser >/dev/null 2>&1; then
+		pids="$(fuser -n tcp "$port" 2>/dev/null | tr '\n' ' ' || true)"
+	fi
+	echo "$pids"
+}
+_stop_pid() {
+	local pid="$1"
+	if ! kill -0 "$pid" 2>/dev/null; then
+		echo "PID $pid not running."
+		return 0
+	fi
+	if [[ "$FORCE" != "1" ]] && ! _is_target_process "$pid"; then
+		echo "Skip PID $pid (cmdline does not look like uvicorn/acestep.api_server). Use --force to stop anyway." >&2
+		echo "cmdline: $(_cmdline "$pid")" >&2
+		return 3
+	fi
+	echo "Stopping PID $pid..."
+	kill -TERM "$pid" 2>/dev/null || true
+	for _ in $(seq 1 30); do
+		if ! kill -0 "$pid" 2>/dev/null; then
+			echo "Stopped PID $pid."
+			return 0
+		fi
+		sleep 0.2
+	done
+	echo "PID $pid did not exit; sending SIGKILL..." >&2
+	kill -KILL "$pid" 2>/dev/null || true
+	sleep 0.1
+	if kill -0 "$pid" 2>/dev/null; then
+		echo "Failed to kill PID $pid." >&2
+		return 1
+	fi
+	echo "Killed PID $pid."
+	return 0
+}
+if [[ -n "$PID" ]]; then
+	_stop_pid "$PID"
+	exit $?
+fi
+pids="$(_find_pids_by_port "$PORT")"
+if [[ -z "${pids// }" ]]; then
+	echo "No listening process found on port $PORT."
+	exit 0
+fi
+rc=0
+for pid in $pids; do
+	if [[ -n "$pid" ]]; then
+		_stop_pid "$pid" || rc=$?
+	fi
+done
+exit "$rc"
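
Note on `_stop_pid`: the escalation it implements (SIGTERM, poll up to 30 times at 0.2 s intervals, then SIGKILL) can be sketched in Python as below. This is an illustrative sketch under assumptions, not part of the commit: the names `pid_alive` and `stop_pid` are hypothetical, the injectable `alive`, `send`, and `sleep` parameters are an addition so the logic can be exercised without real processes, and the cmdline safety check is omitted. It is Unix-only, like the script.

```python
import os
import signal
import time
from contextlib import suppress
from typing import Callable


def pid_alive(pid: int) -> bool:
    """Like `kill -0`: any failure (no such process, no permission) counts as not running."""
    try:
        os.kill(pid, 0)
    except OSError:
        return False
    return True


def stop_pid(pid: int, attempts: int = 30, interval: float = 0.2,
             alive: Callable[[int], bool] = pid_alive,
             send: Callable[[int, int], None] = os.kill,
             sleep: Callable[[float], None] = time.sleep) -> str:
    """Ask a process to exit, then force it if it does not comply.

    Returns "not running", "stopped", "killed", or "failed".
    """
    if not alive(pid):
        return "not running"
    # Polite request first; ignore errors, as the script does with `|| true`.
    with suppress(OSError):
        send(pid, signal.SIGTERM)
    for _ in range(attempts):
        if not alive(pid):
            return "stopped"
        sleep(interval)
    # Grace period is over: escalate to SIGKILL.
    with suppress(OSError):
        send(pid, signal.SIGKILL)
    sleep(0.1)
    return "failed" if alive(pid) else "killed"
```

With the defaults, the grace period before SIGKILL is about 6 seconds (30 polls at 0.2 s), matching the loop in the script.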

pyproject.toml CHANGED Viewed

@@ -18,10 +18,13 @@ dependencies = [
     "loguru>=0.7.3",
     "einops>=0.8.1",
     "accelerate>=1.12.0",
 ]
 [project.scripts]
 acestep = "acestep.acestep_v15_pipeline:main"
 [build-system]
 requires = ["hatchling"]
@@ -32,7 +35,7 @@ dev-dependencies = []
 [[tool.uv.index]]
 name = "pytorch"
-url = "https://download.pytorch.org/whl/cu130"
 [tool.hatch.build.targets.wheel]
 packages = ["acestep"]

     "loguru>=0.7.3",
     "einops>=0.8.1",
     "accelerate>=1.12.0",
+    "fastapi>=0.110.0",
+    "uvicorn[standard]>=0.27.0",
 ]
 [project.scripts]
 acestep = "acestep.acestep_v15_pipeline:main"
+acestep-api = "acestep.api_server:main"
 [build-system]
 requires = ["hatchling"]
 [[tool.uv.index]]
 name = "pytorch"
+url = "https://download.pytorch.org/whl/cu128"
 [tool.hatch.build.targets.wheel]
 packages = ["acestep"]

requirements.txt CHANGED Viewed

@@ -7,4 +7,6 @@ loguru
 einops
 accelerator
 vector-quantize-pytorch
-psutil

 einops
 accelerator
 vector-quantize-pytorch
+psutil
+fastapi
+uvicorn

run_api_server.sh ADDED Viewed

	@@ -0,0 +1,27 @@

+#!/usr/bin/env bash
+set -euo pipefail
+ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+CONDA_ACTIVATE="${CONDA_ACTIVATE:-/root/data/repo/gongjunmin/miniconda3/bin/activate}"
+CONDA_ENV_NAME="${ACESTEP_CONDA_ENV:-acestep_v15_train}"
+HOST="${ACESTEP_API_HOST:-0.0.0.0}"
+PORT="${ACESTEP_API_PORT:-8001}"
+LOG_LEVEL="${ACESTEP_API_LOG_LEVEL:-debug}"
+cd "$ROOT_DIR"
+# Temporarily disable nounset to avoid unbound-variable errors in conda activate.d scripts
+set +u
+# shellcheck disable=SC1090
+source "$CONDA_ACTIVATE" "$CONDA_ENV_NAME"
+set -u
+# NOTE: api_server uses an in-memory queue/job store, so it requires workers=1.
+nohup python -m uvicorn acestep.api_server:app \
+	--host "$HOST" \
+	--port "$PORT" \
+	--workers 1 \
+	--log-level "$LOG_LEVEL" > server.log 2>&1 &
+echo "Server started in background with PID $!. Logs in server.log"
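
Note on the launch configuration: the script declares `ACESTEP_API_HOST`, `ACESTEP_API_PORT`, and `ACESTEP_API_LOG_LEVEL` with defaults. A minimal Python sketch of how those settings could be turned into the uvicorn command line follows. The function name `build_uvicorn_argv` and the port validation are assumptions added for illustration; the environment variable names, defaults, and uvicorn flags are the ones used in the script.

```python
import os
import sys
from typing import Mapping


def build_uvicorn_argv(env: Mapping[str, str] = os.environ) -> list[str]:
    """Build the uvicorn command line from environment settings.

    `env.get(name) or default` mirrors the shell's ${VAR:-default},
    which falls back when the variable is unset or empty.
    """
    host = env.get("ACESTEP_API_HOST") or "0.0.0.0"
    port = env.get("ACESTEP_API_PORT") or "8001"
    log_level = env.get("ACESTEP_API_LOG_LEVEL") or "debug"
    if not port.isdigit():
        raise ValueError(f"Invalid port: {port!r}")
    return [
        sys.executable, "-m", "uvicorn", "acestep.api_server:app",
        "--host", host,
        "--port", port,
        # The server keeps its queue and job store in memory, so one worker only.
        "--workers", "1",
        "--log-level", log_level,
    ]
```

The resulting list can be passed to `subprocess.Popen` with stdout and stderr redirected to a log file, which would correspond to the `nohup ... > server.log 2>&1 &` form used in the script.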