LightBlue TTS 🇮🇱

Model Description

LightBlue is a state-of-the-art, fast Text-to-Speech (TTS) model built from scratch for Hebrew, with English support. It is designed to produce native Israeli-sounding speech, with correct handling of nikud (vowel diacritics) and complex homographs, without compromising inference speed.

At 1260x real-time on an NVIDIA RTX 3090, it can generate a 1-hour audiobook in about 3 seconds.

Key Features

  • Blazing Fast Inference:
    • 1260x real-time on an NVIDIA RTX 3090 (21 minutes of audio generated per second).
    • 35x real-time on standard CPUs.
    • 20x real-time on Apple M1 chips.
  • Native Hebrew Quality: Features a real Israeli accent, correct stress placement, and native-level flow.
  • Advanced Contextual Understanding: Passes the "Homograph Test" (e.g., correctly distinguishing between צפה as "watched" vs "floated", or תרד as "spinach" vs "go down").
  • Multiple Voices: Includes high-quality voices like Yonatan (Hebrew only) and Rotem.
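
The "x real-time" figures above are the ratio of audio duration produced to wall-clock time spent generating it. The following is a minimal sketch of how such a ratio could be measured. It assumes a hypothetical `synthesize` callable that returns mono audio samples and an assumed sample rate of 22,050 Hz; neither is taken from the LightBlue repository, and a stand-in synthesizer is used so the sketch runs without the model.

```python
import time
from typing import Callable, Sequence


def realtime_speedup(
    synthesize: Callable[[str], Sequence[float]],
    text: str,
    sample_rate: int,
) -> float:
    """Return seconds of audio produced per second of wall-clock time."""
    start = time.perf_counter()
    samples = synthesize(text)  # hypothetical: returns mono audio samples
    # Guard against a zero reading from the timer on very fast calls.
    elapsed = max(time.perf_counter() - start, 1e-9)
    audio_seconds = len(samples) / sample_rate
    return audio_seconds / elapsed


def fake_synthesize(text: str) -> list[float]:
    """Stand-in for a real model: one second of silence at 22,050 Hz."""
    return [0.0] * 22050


speedup = realtime_speedup(fake_synthesize, "shalom", sample_rate=22050)
print(f"{speedup:.0f}x real-time")
```

With a real model, averaging over several runs and longer texts would give a steadier figure than a single call.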

Uses

Direct Use

  • Generating high-quality Hebrew audio from text.
  • Real-time TTS applications running on standard CPUs or edge devices.
  • Audiobooks, accessibility tools, virtual assistants, and automated broadcasting.

Speed Benchmarks

LightBlue is optimized for extreme speed without sacrificing naturalness:

| Hardware        | Speed           | Time for 1 Hour of Audio |
|-----------------|-----------------|--------------------------|
| NVIDIA RTX 3090 | 1260x real-time | ~3 seconds               |
| Standard CPU    | 35x real-time   | ~1.7 minutes             |
| Apple M1        | 20x real-time   | ~3 minutes               |
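
The times in the last column follow from dividing the audio duration by the real-time speedup. As a quick check of that arithmetic, here is a short sketch; the function name `generation_seconds` is illustrative only and not part of any LightBlue API.

```python
def generation_seconds(audio_seconds: float, speedup: float) -> float:
    """Wall-clock time needed to generate `audio_seconds` of audio."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


ONE_HOUR = 3600.0  # seconds of audio

# Speedup figures as reported in the benchmark table.
benchmarks = [("NVIDIA RTX 3090", 1260), ("Standard CPU", 35), ("Apple M1", 20)]

for hardware, speedup in benchmarks:
    seconds = generation_seconds(ONE_HOUR, speedup)
    print(f"{hardware}: {seconds:.1f} s ({seconds / 60:.2f} min)")

# NVIDIA RTX 3090: 2.9 s (0.05 min)
# Standard CPU: 102.9 s (1.71 min)
# Apple M1: 180.0 s (3.00 min)
```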

How to Get Started

To use this model, clone the official GitHub repository and install its requirements.

