VibeVoice - MLX
VibeVoice converted and quantized for native MLX inference on Apple Silicon.
A hybrid LLM + diffusion architecture built for long-form speech and voice-conditioned generation. Works in greedy or sampled mode, and produces natural-sounding output at scale.
Variants
| Path | Precision |
|---|---|
| mlx-int8/ | int8 quantized weights |
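
For readers unfamiliar with what "int8 quantized weights" means in practice, here is a minimal pure-Python sketch of group-wise affine quantization, the general family of schemes that MLX's quantization belongs to. The function names and the sample values are hypothetical, and whether this checkpoint uses exactly this layout or group size is an assumption, not something stated in this card.

```python
from typing import List, Tuple


def quantize_group(weights: List[float], bits: int = 8) -> Tuple[List[int], float, float]:
    """Affine-quantize one group of weights to unsigned `bits`-bit codes.

    Returns (codes, scale, bias) such that w is approximately code * scale + bias.
    Illustrative only; not the mlx-speech conversion code.
    """
    levels = (1 << bits) - 1                 # 255 distinct steps for 8 bits
    lo, hi = min(weights), max(weights)
    # Guard against a constant group, where hi == lo would give a zero scale.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize_group(codes: List[int], scale: float, bias: float) -> List[float]:
    """Reconstruct approximate float weights from integer codes."""
    return [c * scale + bias for c in codes]


# Hypothetical sample group of weights.
sample = [-0.5, -0.1, 0.0, 0.25, 0.7]
codes, scale, bias = quantize_group(sample)
restored = dequantize_group(codes, scale, bias)
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), which is the accuracy cost paid for storing one byte per weight plus a scale and bias per group.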
How to Get Started
Via mlx-speech, from the command line:

    python scripts/generate_vibevoice.py \
        --text "Hello from VibeVoice." \
        --output outputs/vibevoice.wav

Or load the model from Python:

    from mlx_speech.generation import VibeVoiceModel
    model = VibeVoiceModel.from_path("mlx-int8")
Model Details
VibeVoice uses a 9B-parameter hybrid architecture combining a Qwen2 language model backbone with a continuous diffusion acoustic decoder. Converted to MLX with explicit weight remapping; no PyTorch at inference time.
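
To illustrate what "explicit weight remapping" can look like, here is a minimal sketch that renames checkpoint keys with an ordered list of regex rules and fails loudly on collisions. The rule patterns and key names below are hypothetical placeholders chosen for illustration; they are not the actual VibeVoice or mlx-speech key names.

```python
import re
from typing import Any, Dict, List, Tuple

# Hypothetical rename rules (regex pattern -> replacement), applied in order.
# These are NOT the real VibeVoice key names; they only illustrate the idea.
RENAME_RULES: List[Tuple[str, str]] = [
    (r"^model\.language_model\.", "language_model."),
    (r"\.self_attn\.", ".attention."),
]


def remap_key(key: str) -> str:
    """Apply every rename rule, in order, to a single parameter name."""
    for pattern, replacement in RENAME_RULES:
        key = re.sub(pattern, replacement, key)
    return key


def remap_state_dict(state: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict with remapped keys; raise if two keys collide."""
    remapped: Dict[str, Any] = {}
    for old_key, tensor in state.items():
        new_key = remap_key(old_key)
        if new_key in remapped:
            raise ValueError(f"key collision after remapping: {new_key!r}")
        remapped[new_key] = tensor
    return remapped


# Usage with placeholder values standing in for tensors.
source = {"model.language_model.layers.0.self_attn.q_proj.weight": "tensor-0"}
converted = remap_state_dict(source)
```

Keeping the rules in an explicit table makes the conversion auditable: every renamed parameter can be traced to one rule, and a collision check catches rules that accidentally merge two distinct weights.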
See mlx-speech for the full runtime and conversion code.
License
Apache 2.0.