Commit · 6446441
Parent(s): b94c46b
Free GPU memory between HunyuanFoley segments to prevent OOM
After each segment's denoise_process, explicitly del audio_batch and
visual_feats, then call torch.cuda.empty_cache(). The 15-s audio latent
tensor is several GB; without explicit deletion, the loop variables keep
the CUDA allocation alive until they are rebound or collected, causing OOM
when the second segment allocates its own latent. This is why segment 1
completed successfully but segment 2 failed silently (ZeroGPU kills the
worker on OOM with no Python traceback).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
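
Because the failure leaves no traceback, logging CUDA memory around each segment is one way to confirm the diagnosis above. The helper below is a minimal sketch, not part of this commit: `cuda_memory_report` is a hypothetical name, and the optional import is only there so the sketch also runs without PyTorch installed. It uses `torch.cuda.memory_allocated()` (bytes held by live tensors) and `torch.cuda.memory_reserved()` (bytes held by the caching allocator).

```python
try:
    import torch
except ImportError:  # lets the sketch run on machines without PyTorch
    torch = None


def cuda_memory_report(tag):
    """Return a one-line summary of live vs. cached CUDA memory, in MiB.

    Hypothetical helper for illustration; it is not defined in app.py.
    """
    if torch is None or not torch.cuda.is_available():
        return f"[{tag}] CUDA not available"
    allocated = torch.cuda.memory_allocated() / 2**20  # held by live tensors
    reserved = torch.cuda.memory_reserved() / 2**20    # held by the caching allocator
    return f"[{tag}] allocated={allocated:.0f} MiB reserved={reserved:.0f} MiB"


if __name__ == "__main__":
    print(cuda_memory_report("before segment"))
```

If `allocated` stays high after a segment finishes, something still references that segment's tensors; if only `reserved` stays high, the memory is cached and `torch.cuda.empty_cache()` can hand it back to the driver.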
app.py CHANGED
@@ -1301,6 +1301,12 @@ def _hunyuan_gpu_infer(video_file, prompt, negative_prompt, seed_val,
                 batch_size=1,
             )
             seg_wavs.append(audio_batch[0].float().cpu().numpy())
+            # Free GPU memory between segments — latents/visual_feats from denoise_process
+            # stay allocated until GC runs; explicit deletion + cache clear prevents OOM
+            # when processing a second segment (the 15-s latent tensor is ~several GB).
+            del audio_batch, visual_feats
+            if torch.cuda.is_available():
+                torch.cuda.empty_cache()

     _log_inference_timing("HunyuanFoley", time.perf_counter() - _t0,
                           len(segments), int(num_steps), HUNYUAN_SECS_PER_STEP)
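
The pattern in this hunk can be sketched in isolation as follows. This is an illustration under stated assumptions, not the code in app.py: `run_segments`, `infer_segment`, `to_cpu`, and `free_gpu_memory` are hypothetical names, and the `gc.collect()` call is an extra step that the commit itself does not make. Note that `torch.cuda.empty_cache()` only releases cached blocks that no tensor uses, so the `del` has to come first.

```python
import gc

try:
    import torch
except ImportError:  # lets the sketch run on machines without PyTorch
    torch = None


def to_cpu(result):
    """Copy a tensor result to host memory; pass other values through."""
    if torch is not None and isinstance(result, torch.Tensor):
        return result.detach().float().cpu().numpy()
    return result


def free_gpu_memory():
    """Collect unreachable objects, then release unused cached CUDA blocks."""
    gc.collect()  # extra step, not in the commit: breaks reference cycles
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()


def run_segments(segments, infer_segment):
    """Run infer_segment per segment, keeping only host-side copies."""
    outputs = []
    for seg in segments:
        result = infer_segment(seg)   # hypothetical per-segment inference call
        outputs.append(to_cpu(result))
        del result                    # drop the last reference before the next segment
        free_gpu_memory()
    return outputs


if __name__ == "__main__":
    # Stand-in inference function so the sketch runs without a model or GPU.
    print(run_segments([1, 2, 3], lambda s: s * 2))
```

Keeping only the host-side copy in `outputs` means the peak GPU footprint is that of a single segment, which appears to be the goal of the change above.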