LightBlue TTS 🇮🇱
Model Description
LightBlue is a state-of-the-art, lightning-fast Text-to-Speech (TTS) model built from scratch specifically for Hebrew (with English support). It is designed to produce 100% native Israeli-sounding speech with perfect handling of Nikud (vowels) and complex homographs, without compromising on inference speed.
It is fast enough to generate an entire 1-hour audiobook in about 3 seconds on an NVIDIA RTX 3090 GPU.
Key Features
- Blazing Fast Inference:
  - 1260x real-time on an NVIDIA RTX 3090 (21 minutes of audio generated per second).
  - 35x real-time on standard CPUs.
  - 20x real-time on Apple M1 chips.
- Native Hebrew Quality: Features a real Israeli accent, correct stress placements, and native-level flow.
- Advanced Contextual Understanding: Passes the "Homograph Test" (e.g., correctly distinguishing between צפה as "watched" vs. "floated", or תרד as "spinach" vs. "go down").
- Multiple Voices: Includes high-quality voices like Yonatan (Hebrew only) and Rotem.
Uses
Direct Use
- Generating high-quality Hebrew audio from text.
- Real-time TTS applications running on standard CPUs or edge devices.
- Audiobooks, accessibility tools, virtual assistants, and automated broadcasting.
Speed Benchmarks
LightBlue is optimized for extreme speed without sacrificing naturalness:
| Hardware | Speed | Time for 1 Hour of Audio |
|---|---|---|
| NVIDIA RTX 3090 | 1260x real-time | ~3 seconds |
| Standard CPU | 35x real-time | ~1.7 minutes |
| Apple M1 | 20x real-time | ~3 minutes |
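The generation times above follow directly from the real-time factors: wall-clock time is the audio duration divided by the speed multiplier. The snippet below is a minimal sketch of that arithmetic only. The function name `seconds_to_generate` and the `BENCHMARKS` dictionary are illustrative and are not part of the LightBlue codebase.

```python
def seconds_to_generate(audio_seconds: float, realtime_factor: float) -> float:
    """Wall-clock seconds needed to synthesize `audio_seconds` of audio.

    A real-time factor of N means N seconds of audio are produced
    per second of compute, so the time is simply duration / N.
    """
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


# Real-time factors quoted in the benchmark figures above.
BENCHMARKS = {
    "NVIDIA RTX 3090": 1260,
    "Standard CPU": 35,
    "Apple M1": 20,
}

ONE_HOUR = 3600  # seconds of audio in one hour

for hardware, factor in BENCHMARKS.items():
    t = seconds_to_generate(ONE_HOUR, factor)
    print(f"{hardware}: {t:.1f} s ({t / 60:.2f} min) per hour of audio")
```

The same function can be used to estimate latency for shorter inputs, for example `seconds_to_generate(10, 35)` for a 10-second utterance on a CPU at the quoted 35x factor. Actual throughput will depend on hardware, batch size, and text length.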
How to Get Started
To use this model, you can clone the official GitHub repository and install the requirements: