VieNeu-Codec: The Heart of VieNeu-TTS v2

VieNeu-Codec is the audio engine built specifically for the upcoming VieNeu-TTS v2. It is a neural audio codec trained on over 20,000 hours of diverse Vietnamese and English speech data, with the aim of robust synthesis, natural prosody, and clear audio reconstruction.

This repository provides optimized ONNX versions of the VieNeu-Codec decoder for production use.

🚀 Key Features

24 kHz High-Fidelity: Clear audio reconstruction at 24 kHz, optimized for the Vietnamese language.
Zero-Shot Voice Cloning: Clone any voice with just 5 seconds of reference audio.
Optimized for VieNeu-TTS v2: Seamlessly integrates with the next-generation LLM backbone of VieNeu-TTS.
Two Deployment Modes: Includes both FP32 (High Quality) and INT8 (High Speed) decoders.

📦 Model Components

vieneu_decoder.onnx: (FP32) High-fidelity audio decoder for maximum quality.
vieneu_decoder_int8.onnx: (INT8) Quantized decoder for fast CPU inference.
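
Choosing between the two decoders can be reduced to a small helper. The sketch below is illustrative only: the helper names (`decoder_path`, `load_decoder`, `describe_inputs`) and the `model_dir` parameter are hypothetical, and it assumes both files were downloaded into one local directory. Only the two file names come from this repository.

```python
from pathlib import Path

# File names as listed under Model Components.
DECODER_FILES = {
    "fp32": "vieneu_decoder.onnx",       # maximum quality
    "int8": "vieneu_decoder_int8.onnx",  # faster CPU inference
}


def decoder_path(precision: str = "fp32", model_dir: str = ".") -> Path:
    """Return the path of the decoder file for the requested precision."""
    key = precision.lower()
    if key not in DECODER_FILES:
        raise ValueError(
            f"unknown precision {precision!r}; expected one of {sorted(DECODER_FILES)}"
        )
    return Path(model_dir) / DECODER_FILES[key]


def load_decoder(precision: str = "fp32", model_dir: str = "."):
    """Create an ONNX Runtime CPU session for the chosen decoder."""
    # Imported lazily so decoder_path() works without onnxruntime installed.
    import onnxruntime as ort

    return ort.InferenceSession(
        str(decoder_path(precision, model_dir)),
        providers=["CPUExecutionProvider"],
    )


def describe_inputs(session) -> list:
    """List (name, shape, type) for each model input of a loaded session."""
    return [(i.name, i.shape, i.type) for i in session.get_inputs()]
```

Calling `describe_inputs(load_decoder("int8"))` is a quick way to confirm the input names, shapes, and dtypes the decoder expects before wiring it to the LLM output.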

🛠️ Usage

Synthesize Speech

Combine the speaker embedding with content tokens from your LLM (VieNeu-TTS v2):

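import onnxruntime as ort

# ids: content tokens produced by the VieNeu-TTS v2 LLM
# embedding: speaker embedding for the target voice
sess_dec = ort.InferenceSession("vieneu_decoder.onnx")
audio = sess_dec.run(None, {
    "content_ids": ids,
    "voice": embedding
})[0]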
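
To listen to the result, the decoded samples can be written to a WAV file. This is a minimal sketch using only the Python standard library. It assumes that the decoder output is mono floating-point audio in the range [-1, 1] and that "24kHz" refers to the output sample rate; neither is confirmed here, so check the actual output shape and range first. `write_wav` is a hypothetical helper, not part of this repository.

```python
import struct
import wave

SAMPLE_RATE = 24_000  # assumption: the "24kHz" in Key Features is the sample rate


def write_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono float samples (assumed to lie in [-1, 1]) as 16-bit PCM.

    Returns the number of frames written.
    """
    pcm = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip so the int16 conversion cannot overflow
        pcm += struct.pack("<h", int(round(s * 32767)))
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(pcm))
    return len(pcm) // 2
```

Since `sess_dec.run(...)[0]` returns a NumPy array, a call such as `write_wav("output.wav", audio.reshape(-1).tolist())` flattens it to a plain list of samples first.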

📄 License & Attribution

Author: Pham Nguyen Ngoc Bao
Project: VieNeu-Codec (for VieNeu-TTS v2)
Version: 2.0
