ChatterBox Desi
A fine-tuned version of ResembleAI/chatterbox for text-to-speech synthesis in six Indic languages: Bengali, Hindi, Marathi, Gujarati, Tamil, and Telugu.
Zero-shot TTS Output
| Language | Text |
|---|---|
| Bengali (বাংলা) | আমরা কেউ মাষ্টার হতে চেয়েছিলাম, কেউ ডাক্তার, কেউ উকিল। অমলকান্তি সে-সব কিছু হতে চায়নি। সে রোদ্দুর হতে চেয়েছিল! |
| Hindi (हिंदी) | हम में से कुछ मास्टर बनना चाहते थे, कुछ डॉक्टर, कुछ वकील। अमलकांति उन सब कुछ बनना नहीं चाहता था। वह धूप बनना चाहता था! |
| Marathi (मराठी) | आम्ही कोणीतरी मास्टर होऊ इच्छित होतो, कोणीतरी डॉक्टर, कोणीतरी वकील. अमलकांती त्या सगळ्या काही होऊ इच्छित नव्हता. तो सूर्य होऊ इच्छित होता! |
| Gujarati (ગુજરાતી) | અમલકાંતિ તે બધું બનવું નથી માંગતો હતો. તે ધૂપ બનવું માંગતો હતો! |
| Tamil (தமிழ்) | நாங்கள் யாரும் மாஸ்டர் ஆக விரும்பவில்லை, யாரும் டாக்டர் ஆக விரும்பவில்லை, யாரும் வக்கீல் ஆக விரும்பவில்லை. அமல்காந்தி அந்த எல்லாவற்றையும் ஆக விரும்பவில்லை. அவன் வெயிலாக இருக்க விரும்பினான்! |
| Telugu (తెలుగు) | మనలో కొందరు మాస్టర్ కావాలని కోరుకున్నారు, కొందరు డాక్టర్ కావాలని కోరుకున్నారు, కొందరు వకీల్ కావాలని కోరుకున్నారు. అమల్కాంతి ఆ అన్ని కావాలని కోరుకోలేదు. అతను సూర్యుడిగా ఉండాలని కోరుకున్నాడు! |
Model Details
- Base model: ResembleAI/chatterbox — multilingual ChatterBox (supports 23 languages)
- Fine-tuned on: a speech corpus covering 6 Indic languages (~424 hours, 216,819 samples), drawn from:
  - ai4bharat/Shrutilipi (Bengali, Hindi splits)
  - ai4bharat/Rasa (Bengali, Hindi, Marathi, Gujarati, Tamil, Telugu splits)
  - SPRINGLab/IndicTTS (Bengali, Gujarati, Marathi, Tamil, Telugu)
- Training steps: 10,000
- Architecture: T3 (Text-to-Token Transformer) + HiFT-GAN vocoder
- Vocabulary: Extended from 2,530 to 2,820 tokens (290 new tokens) to cover all 6 Indic scripts
- Language tagging: Text must be prefixed with a language tag: [bn], [hi], [mr], [gu], [ta], or [te]
Training Data by Language
| Language | Code | Samples | Hours |
|---|---|---|---|
| Bengali | bn | 58,820 | 99.95 |
| Gujarati | gu | 32,604 | 73.17 |
| Hindi | hi | 12,116 | 21.55 |
| Marathi | mr | 37,899 | 72.70 |
| Tamil | ta | 39,437 | 72.74 |
| Telugu | te | 35,943 | 84.04 |
| Total | | 216,819 | 424.15 |
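The table can be checked and summarised with a few lines of arithmetic. The sketch below copies the sample counts and hours from the table above and derives two figures that the table does not list: each language's share of the total hours, and the mean clip length. The variable names are illustrative only.

```python
# Per-language (samples, hours), copied from the table above.
TRAINING_DATA = {
    "bn": (58_820, 99.95),
    "gu": (32_604, 73.17),
    "hi": (12_116, 21.55),
    "mr": (37_899, 72.70),
    "ta": (39_437, 72.74),
    "te": (35_943, 84.04),
}

total_samples = sum(samples for samples, _ in TRAINING_DATA.values())
total_hours = sum(hours for _, hours in TRAINING_DATA.values())

for code, (samples, hours) in TRAINING_DATA.items():
    share = 100 * hours / total_hours        # percentage of all training hours
    mean_clip_s = hours * 3600 / samples     # mean clip length in seconds
    print(f"{code}: {share:5.1f}% of hours, {mean_clip_s:4.1f} s per sample")

print(f"total: {total_samples:,} samples, {total_hours:.2f} hours")
```

The totals match the last row of the table. Hindi accounts for roughly 5% of the training hours, which may be worth keeping in mind when comparing output quality across languages.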
Requirements
git clone https://github.com/gokhaneraslan/chatterbox-finetuning
cd chatterbox-finetuning
pip install -r requirements.txt
One-time patch (upstream vocab resize fix)
The upstream chatterbox-finetuning repo initialises T3 with a hard-coded 704-token vocabulary, which causes a size mismatch when loading this model (vocab=2820). Apply this patch once before running inference:
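# Run from inside the cloned chatterbox-finetuning directory
python - <<'EOF'
import pathlib
import re
f = pathlib.Path("src/chatterbox_/tts.py")
txt = f.read_text()
# Match the two original lines at any indentation level.
old = re.compile(
    r'^([ \t]*)t3 = T3\(\)\n[ \t]*t3_state = load_file\(ckpt_dir / "t3_cfg\.safetensors"\)',
    re.MULTILINE,
)
def new(m):
    indent = m.group(1)
    # Load the checkpoint first, then size the vocabulary from its text embedding table.
    return (
        f'{indent}t3_state = load_file(ckpt_dir / "t3_cfg.safetensors")\n'
        f'{indent}from .models.t3.modules.t3_config import T3Config\n'
        f'{indent}t3 = T3(hp=T3Config(text_tokens_dict_size=t3_state["text_emb.weight"].shape[0]))'
    )
txt, count = old.subn(new, txt, count=1)
if count != 1:
    # Fail loudly instead of reporting success when nothing was replaced.
    raise SystemExit("Pattern not found: tts.py is already patched or differs from the expected version")
f.write_text(txt)
print("Patched tts.py")
EOF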
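To confirm which vocabulary size a checkpoint expects before or after patching, the shape of `text_emb.weight` can be read from the file header without loading any weights. The sketch below relies only on the published safetensors layout (an 8-byte little-endian header length followed by a JSON header); the function names are hypothetical and not part of chatterbox-finetuning.

```python
import json
import struct
from pathlib import Path


def read_safetensors_shapes(path):
    """Return {tensor_name: shape} from a .safetensors header without loading weights."""
    with Path(path).open("rb") as fh:
        # First 8 bytes: unsigned 64-bit little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", fh.read(8))
        header = json.loads(fh.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return {
        name: tuple(meta["shape"])
        for name, meta in header.items()
        if name != "__metadata__"
    }


def text_vocab_size(path, key="text_emb.weight"):
    """Number of rows in the text embedding table, i.e. the text vocabulary size."""
    return read_safetensors_shapes(path)[key][0]
```

Called on the downloaded `t3_cfg.safetensors`, `text_vocab_size(...)` should report 2820 if the file matches the description in this card.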
Usage
Important: Text must be prefixed with a language tag: [bn], [hi], [mr], [gu], [ta], or [te].
import sys
sys.path.insert(0, "/path/to/chatterbox-finetuning")
from huggingface_hub import snapshot_download
from src.chatterbox_.tts import ChatterboxTTS
import torchaudio
model_dir = snapshot_download("BosonLab/chatterbox-desi")
model = ChatterboxTTS.from_local(model_dir, device="cuda")
# Bengali
text = "[bn] আমি বাংলায় কথা বলতে পারি। এটি একটি পরীক্ষামূলক বাক্য।"
wav = model.generate(text)
torchaudio.save("output_bn.wav", wav, model.sr)
# Hindi
text = "[hi] मैं हिंदी में बोल सकता हूँ। यह एक परीक्षण वाक्य है।"
wav = model.generate(text)
torchaudio.save("output_hi.wav", wav, model.sr)
# Marathi
text = "[mr] मी मराठीत बोलू शकतो. हे एक चाचणी वाक्य आहे."
wav = model.generate(text)
torchaudio.save("output_mr.wav", wav, model.sr)
# Gujarati
text = "[gu] હું ગુજરાતીમાં બોલી શકું છું. આ એક પ્રાયોગિક વાક્ય છે."
wav = model.generate(text)
torchaudio.save("output_gu.wav", wav, model.sr)
# Tamil
text = "[ta] நான் தமிழில் பேச முடியும். இது ஒரு சோதனை வாக்கியம்."
wav = model.generate(text)
torchaudio.save("output_ta.wav", wav, model.sr)
# Telugu
text = "[te] నేను తెలుగులో మాట్లాడగలను. ఇది ఒక పరీక్ష వాక్యం."
wav = model.generate(text)
torchaudio.save("output_te.wav", wav, model.sr)
With Voice Cloning
wav = model.generate(text, audio_prompt_path="reference.wav")
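The per-language calls above all follow the same pattern, so they can be folded into a loop. The following is a minimal sketch: `synthesize_batch` is a hypothetical helper, and it assumes only what the examples above already use, namely `model.generate(text)`, the optional `audio_prompt_path` argument, and `model.sr`.

```python
def synthesize_batch(model, sentences, save_fn, audio_prompt_path=None):
    """Generate one file per language.

    sentences: {language_code: untagged text}, e.g. {"hi": "..."}.
    save_fn:   callable(path, wav, sample_rate), e.g. torchaudio.save.
    Returns {language_code: output path}.
    """
    outputs = {}
    for lang, text in sentences.items():
        tagged = f"[{lang}] {text}"  # the model expects a language tag prefix
        if audio_prompt_path is None:
            wav = model.generate(tagged)
        else:
            # Voice cloning: condition on a reference recording.
            wav = model.generate(tagged, audio_prompt_path=audio_prompt_path)
        out_path = f"output_{lang}.wav"
        save_fn(out_path, wav, model.sr)
        outputs[lang] = out_path
    return outputs
```

With the model loaded as shown earlier, a call such as `synthesize_batch(model, {"bn": "...", "hi": "..."}, torchaudio.save)` would write `output_bn.wav` and `output_hi.wav`.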
Files
| File | Description |
|---|---|
| t3_cfg.safetensors | Fine-tuned T3 text-to-token transformer (6 Indic langs, vocab=2820) |
| s3gen.safetensors | Speech codec decoder (unchanged from base) |
| ve.safetensors | Voice encoder (unchanged from base) |
| conds.pt | Conditioning embeddings (unchanged from base) |
| tokenizer.json | Tokenizer extended with 6 Indic scripts |
Training Data
All audio was resampled to 16 kHz. Text was cleaned, normalized, and prefixed with language tags. The data comes from public AI4Bharat and SPRINGLab datasets (CC BY 4.0).
Language Tags
Prefix your text with the tag that matches its language:
| Language | Tag | Script |
|---|---|---|
| Bengali | [bn] | Bengali |
| Hindi | [hi] | Devanagari |
| Marathi | [mr] | Devanagari |
| Gujarati | [gu] | Gujarati |
| Tamil | [ta] | Tamil |
| Telugu | [te] | Telugu |
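A mismatch between tag and script (for example Devanagari text under `[ta]`) is easy to catch before synthesis. The sketch below is one possible pre-flight check, not part of the model or the finetuning kit: it maps each tag to the script in the table above and reports characters from any of the other four scripts, using the standard Unicode block ranges. The danda characters (U+0964, U+0965) sit in the Devanagari block but are used by several scripts, so they are ignored.

```python
import re

# Unicode block ranges for the five scripts in the table above.
SCRIPT_RANGES = {
    "Devanagari": (0x0900, 0x097F),
    "Bengali": (0x0980, 0x09FF),
    "Gujarati": (0x0A80, 0x0AFF),
    "Tamil": (0x0B80, 0x0BFF),
    "Telugu": (0x0C00, 0x0C7F),
}
TAG_TO_SCRIPT = {
    "bn": "Bengali", "hi": "Devanagari", "mr": "Devanagari",
    "gu": "Gujarati", "ta": "Tamil", "te": "Telugu",
}
SHARED_PUNCTUATION = {0x0964, 0x0965}  # danda and double danda
TAG_RE = re.compile(r"^\[([a-z]{2})\]\s*")


def foreign_scripts(tagged_text):
    """Return the set of scripts in the text that do not match its language tag."""
    match = TAG_RE.match(tagged_text)
    if match is None or match.group(1) not in TAG_TO_SCRIPT:
        raise ValueError("text must start with [bn], [hi], [mr], [gu], [ta], or [te]")
    expected = TAG_TO_SCRIPT[match.group(1)]
    found = set()
    for ch in tagged_text[match.end():]:
        cp = ord(ch)
        if cp in SHARED_PUNCTUATION:
            continue
        for script, (lo, hi) in SCRIPT_RANGES.items():
            if lo <= cp <= hi and script != expected:
                found.add(script)
    return found
```

An empty result means the text is consistent with its tag. Because Hindi and Marathi share Devanagari, this check cannot tell `[hi]` and `[mr]` apart; only the tag carries that distinction.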
Limitations
- Optimized for the 6 Indic languages listed above; quality in other languages may degrade
- A language tag prefix is required for correct language identification
- Works best with clear, well-punctuated text
- Emotion control is inherited from the base multilingual ChatterBox model
- Requires the chatterbox-finetuning kit because of the extended vocabulary