# stereo2spatial-v1

Part of the **Stereo2Spatial** collection of flow matching DiT models for converting stereo audio to spatial audio.
stereo2spatial-v1 is a DiT model for converting mono or
stereo audio into 12-channel 7.1.4 spatial audio at 48 kHz.
The model is intended to be used with the stereo2spatial codebase. The bundle contains the model weights, the runtime config, and the EAR-VAE assets needed for inference.
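The model renders 12 output channels in the standard 7.1.4 ordering (FL, FR, FC, LFE, BL, BR, SL, SR, TFL, TFR, TBL, TBR). A small sketch of a hypothetical helper (not part of the repo) for mapping output-channel indices in the rendered WAV to speaker labels:

```python
# Hypothetical helper (not part of the repo): map output-channel indices
# to the 7.1.4 speaker labels used by this model.
CHANNEL_ORDER = [
    "FL", "FR", "FC", "LFE", "BL", "BR",
    "SL", "SR", "TFL", "TFR", "TBL", "TBR",
]

def channel_name(index: int) -> str:
    """Speaker label for a given channel index in the rendered 12-channel WAV."""
    return CHANNEL_ORDER[index]

print(channel_name(0), channel_name(3), channel_name(8))  # FL LFE TFL
```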
## Model details

| Field | Value |
| --- | --- |
| Architecture | SpatialDiT |
| Sample rate | 48,000 Hz |
| Output layout | 7.1.4 |
| Output channels | 12 |
| Channel order | FL, FR, FC, LFE, BL, BR, SL, SR, TFL, TFR, TBL, TBR |

## Training

This v1 release was trained for 440,000 total steps:
- 200,000 steps without GAN
- 200,000 additional steps with GAN enabled
- 40,000 steps with GAN enabled

## Intended use

This model is intended for:
- 7.1.4 spatial audio rendering

## Limitations

- This model is not a drop-in replacement for professional mastering, QC, or broadcast authoring workflows.
- Fixed 7.1.4 output layout; do not expect other layouts to work without retraining or exporting a different target-channel setup.

## Quickstart

From a local checkout of the stereo2spatial code repository:
```shell
python -m venv .venv
. .venv/bin/activate  # Windows PowerShell: .\.venv\Scripts\Activate.ps1
pip install -e .
python -m pip install -U "huggingface_hub[cli]"
hf download francislabounty/stereo2spatial-v1 --local-dir checkpoints/stereo2spatial-v1
python infer.py --checkpoint checkpoints/stereo2spatial-v1 --input-audio path/to/input.wav --output-audio path/to/output_spatial.wav --device cuda --show-progress
```
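Before launching inference, it can help to confirm the download completed. A minimal sketch (not part of the repo) that checks for the bundle entries this card describes, i.e. `config.json`, `model.safetensors`, and a `vae/` directory:

```python
# Sketch (not part of the repo): verify the downloaded checkpoint bundle
# contains the entries the inference CLI expects.
from pathlib import Path

EXPECTED = ["config.json", "model.safetensors", "vae"]

def missing_bundle_entries(bundle_dir: str) -> list[str]:
    """Return the expected bundle entries that are absent from bundle_dir."""
    bundle = Path(bundle_dir)
    return [name for name in EXPECTED if not (bundle / name).exists()]

missing = missing_bundle_entries("checkpoints/stereo2spatial-v1")
print("bundle ok" if not missing else f"missing: {missing}")
```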
The recommended usage is to point `--checkpoint` at the downloaded bundle directory. The inference CLI will load the following from it:

- `config.json`
- `model.safetensors`
- `vae/`

To also write a run report:

```shell
python infer.py --checkpoint checkpoints/stereo2spatial-v1 --input-audio path/to/input.wav --output-audio path/to/output_spatial.wav --device cuda --show-progress --report-json outputs/report.json
```
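The schema of the `--report-json` file is not documented in this card; assuming it is plain JSON, a quick way to see what it contains is to list its top-level fields:

```python
# Sketch: inspect the --report-json output. The report schema is an
# assumption here; we only rely on the file being valid JSON.
import json
from pathlib import Path

def report_fields(path: str) -> list[str]:
    """Top-level keys of the inference report, sorted for stable display."""
    report = json.loads(Path(path).read_text())
    return sorted(report)

# Example: print(report_fields("outputs/report.json"))
```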
Useful flags:
- `--device cpu` to run on CPU
- `--solver auto|heun|euler|unipc|...` to change the latent solver
- `--normalize-peak` to normalize the rendered WAV before writing

## License

This model is released under the Apache 2.0 license.
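For intuition about the `--normalize-peak` flag above: peak normalization typically rescales a signal so its largest absolute sample hits a target level. A sketch of that idea (an assumption about the flag's behavior, not the actual infer.py code):

```python
# Sketch of peak normalization; the real --normalize-peak implementation
# may pick a different target level or handle silence differently.
def normalize_peak(samples: list[float], target_peak: float = 1.0) -> list[float]:
    """Scale samples so the maximum absolute value equals target_peak."""
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        return list(samples)  # all-silent input: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]

print(normalize_peak([0.1, -0.5, 0.25]))  # [0.2, -1.0, 0.5]
```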