VieNeu-Codec: The Heart of VieNeu-TTS v2

VieNeu-Codec is the audio engine built specifically for the upcoming VieNeu-TTS v2. It is a neural audio codec trained on more than 20,000 hours of diverse Vietnamese and English speech, with the goal of robust synthesis, natural prosody, and clear audio reconstruction.

This repository provides optimized ONNX exports of VieNeu-Codec for production use.

🚀 Key Features

  • 24 kHz High-Fidelity: Clear audio reconstruction optimized for the Vietnamese language.
  • Zero-Shot Voice Cloning: Clone any voice with just 5 seconds of reference audio.
  • Optimized for VieNeu-TTS v2: Seamlessly integrates with the next-generation LLM backbone of VieNeu-TTS.
  • Two Deployment Modes: Includes both FP32 (High Quality) and INT8 (High Speed) decoders.

📦 Model Components

  • vieneu_decoder.onnx: (FP32) High-fidelity audio decoder for maximum quality.
  • vieneu_decoder_int8.onnx: (INT8) Quantized decoder for fast CPU inference.
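
Choosing between the two decoder files can be wrapped in a small helper. The sketch below is illustrative only: the `decoder_path` function, the `"fp32"`/`"int8"` keys, and the idea of a local `model_dir` folder are assumptions made for this example, not part of the VieNeu-Codec release. Only the two file names come from the list above.

```python
from pathlib import Path

# File names as listed under "Model Components".
# The dictionary keys are a hypothetical convention for this sketch.
DECODERS = {
    "fp32": "vieneu_decoder.onnx",       # maximum quality
    "int8": "vieneu_decoder_int8.onnx",  # fast CPU inference
}


def decoder_path(model_dir, precision="fp32"):
    """Return the path of the decoder file for the requested precision."""
    try:
        name = DECODERS[precision]
    except KeyError:
        raise ValueError(
            f"precision must be one of {sorted(DECODERS)}, got {precision!r}"
        ) from None
    return Path(model_dir) / name
```

The result can then be passed as `str(decoder_path("models", "int8"))` wherever a decoder file name is expected, for example when creating the ONNX Runtime session.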

🛠️ Usage

Synthesize Speech

Combine the speaker embedding with content tokens from your LLM (VieNeu-TTS v2):

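import onnxruntime as ort

# `ids`: content tokens from VieNeu-TTS v2; `embedding`: speaker embedding.
# Both are assumed to be prepared beforehand as NumPy arrays.
sess_dec = ort.InferenceSession("vieneu_decoder.onnx")
# Passing None as output names returns all outputs; the first is the audio.
audio = sess_dec.run(None, {
    "content_ids": ids,
    "voice": embedding
})[0]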
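
To listen to the result, the decoded samples can be written to a WAV file. The following is a minimal sketch using only the Python standard library. It assumes that the decoder output is a mono floating-point waveform in the range [-1, 1] and that "24kHz" refers to the sample rate; neither is stated explicitly in this card, so check both against your own output. The `float_to_wav_bytes` helper is hypothetical and not part of VieNeu-Codec.

```python
import io
import struct
import wave

SAMPLE_RATE = 24_000  # assumption: the "24kHz" feature is the sample rate


def float_to_wav_bytes(samples, sample_rate=SAMPLE_RATE):
    """Encode mono float samples in [-1, 1] as 16-bit PCM WAV bytes."""
    pcm = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip out-of-range values
        pcm += struct.pack("<h", int(round(s * 32767)))

    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 2 bytes per sample = 16-bit PCM
        wf.setframerate(sample_rate)
        wf.writeframes(bytes(pcm))
    return buf.getvalue()
```

With a NumPy array from the decoder, something like `open("out.wav", "wb").write(float_to_wav_bytes(audio.reshape(-1).tolist()))` would save the file, provided the output really is a single mono waveform.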

📄 License & Attribution

Author: Pham Nguyen Ngoc Bao
Project: VieNeu-Codec (for VieNeu-TTS v2)
Version: 2.0
