BoxOfColors Claude Sonnet 4.6 committed on
Commit 6446441 · 1 Parent(s): b94c46b

Free GPU memory between HunyuanFoley segments to prevent OOM

After each segment's denoise_process, explicitly del audio_batch and
visual_feats then call torch.cuda.empty_cache(). The 15-s audio latent
tensor is several GB; without explicit deletion PyTorch holds the CUDA
allocation until GC runs, causing OOM when the second segment allocates
its own latent. This is why seg 1 completed successfully but seg 2 failed
silently (ZeroGPU kills worker on OOM with no Python traceback).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
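
The pattern described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code in app.py: `run_segments`, `infer_fn`, `to_host_fn`, and `release_cuda_cache` are hypothetical names, and the PyTorch import is guarded so the sketch also runs on a machine without PyTorch or a GPU.

```python
try:
    import torch  # optional: without PyTorch the cache clear becomes a no-op
except ImportError:
    torch = None


def release_cuda_cache():
    """Return cached, unused CUDA blocks to the driver.

    Returns True if a cache clear actually ran, False otherwise.
    """
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()
        return True
    return False


def run_segments(segments, infer_fn, to_host_fn):
    """Process segments one at a time, dropping device-side results in between.

    infer_fn and to_host_fn are hypothetical stand-ins for the model call
    and the device-to-host copy; they are not names taken from app.py.
    """
    host_results = []
    for segment in segments:
        device_out = infer_fn(segment)
        host_results.append(to_host_fn(device_out))
        # Drop the last reference before the next iteration allocates again;
        # otherwise the name keeps the old result alive during the next call.
        del device_out
        release_cuda_cache()
    return host_results
```

With tensors, `to_host_fn` would be something like `lambda t: t.float().cpu().numpy()`, so that only the host copy survives the iteration.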

Files changed (1)
  1. app.py +6 -0
app.py CHANGED
@@ -1301,6 +1301,12 @@ def _hunyuan_gpu_infer(video_file, prompt, negative_prompt, seed_val,
             batch_size=1,
         )
         seg_wavs.append(audio_batch[0].float().cpu().numpy())
+        # Free GPU memory between segments — latents/visual_feats from denoise_process
+        # stay allocated until GC runs; explicit deletion + cache clear prevents OOM
+        # when processing a second segment (the 15-s latent tensor is ~several GB).
+        del audio_batch, visual_feats
+        if torch.cuda.is_available():
+            torch.cuda.empty_cache()
 
     _log_inference_timing("HunyuanFoley", time.perf_counter() - _t0,
                           len(segments), int(num_steps), HUNYUAN_SECS_PER_STEP)
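
One way to check whether a change like this has the intended effect is to log CUDA memory before and after each segment. The helper below is a hedged sketch, not part of this commit: `cuda_memory_mib` is a hypothetical name, and it reports zeros when PyTorch or a CUDA device is unavailable. Note that `torch.cuda.empty_cache()` lowers the reserved figure by releasing cached blocks; the allocated figure only drops once the tensors themselves are no longer referenced.

```python
try:
    import torch  # optional: the helper reports zeros without PyTorch
except ImportError:
    torch = None


def cuda_memory_mib():
    """Return allocated and reserved CUDA memory on the current device, in MiB."""
    if torch is None or not torch.cuda.is_available():
        return {"allocated": 0.0, "reserved": 0.0}
    scale = 1024 ** 2
    return {
        # Memory occupied by live tensors.
        "allocated": torch.cuda.memory_allocated() / scale,
        # Memory held by PyTorch's caching allocator (live tensors plus cache).
        "reserved": torch.cuda.memory_reserved() / scale,
    }
```

Calling `cuda_memory_mib()` right after the `del` and again after the cache clear shows how much each step contributes.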