pocket-tts-web / README.md
Kevin Knoedler
metadata
title: Pocket TTS ONNX Web Demo
emoji: 🌖
colorFrom: yellow
colorTo: pink
sdk: static
app_file: index.html
pinned: false
license: cc-by-4.0
short_description: Multilingual Pocket TTS voice cloning in the browser (CPU)
models:
  - KevinAHM/pocket-tts-onnx
custom_headers:
  cross-origin-embedder-policy: require-corp
  cross-origin-opener-policy: same-origin
  cross-origin-resource-policy: cross-origin

Pocket TTS Web Demo

Browser-only Pocket TTS inference with multilingual INT8 ONNX bundles and voice cloning.

Supported Bundles

  • english_2026-04
  • german
  • italian
  • portuguese
  • spanish

The web demo intentionally skips the 24l variants, and for English it ships only the current April 2026 checkpoint.

Features

  • Multilingual bundle selector in the UI
  • Built-in voices for every shipped language bundle
  • Custom voice cloning from uploaded audio
  • INT8 ONNX inference in the browser
  • Streaming playback with low latency

Bundle Layout

Each language lives under onnx/<language>/ and includes:

  • bundle.json
  • tokenizer.model
  • bos_before_voice.npy
  • voices.bin
  • mimi_encoder_int8.onnx
  • text_conditioner_int8.onnx
  • flow_lm_main_int8.onnx
  • flow_lm_flow_int8.onnx
  • mimi_decoder_int8.onnx

voices.bin is a local browser asset containing the compact built-in voice states for that language bundle.
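
As an illustration of the layout above, the sketch below builds the relative URL of every file in one language bundle. The helper name `bundleFileUrls` and the `baseUrl` parameter are hypothetical and are not taken from the demo's source; only the file names and the onnx/<language>/ convention come from this README.

```javascript
// File names listed in the Bundle Layout section.
const BUNDLE_FILES = [
  "bundle.json",
  "tokenizer.model",
  "bos_before_voice.npy",
  "voices.bin",
  "mimi_encoder_int8.onnx",
  "text_conditioner_int8.onnx",
  "flow_lm_main_int8.onnx",
  "flow_lm_flow_int8.onnx",
  "mimi_decoder_int8.onnx",
];

// Hypothetical helper: maps each bundle file name to its URL under
// <baseUrl>/<language>/. The default base matches the onnx/ folder.
function bundleFileUrls(language, baseUrl = "onnx") {
  const root = `${baseUrl.replace(/\/+$/, "")}/${encodeURIComponent(language)}`;
  return Object.fromEntries(
    BUNDLE_FILES.map((name) => [name, `${root}/${name}`])
  );
}
```

For example, `bundleFileUrls("german")["voices.bin"]` evaluates to `onnx/german/voices.bin`.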

Browser Requirements

  • Modern browser with WebAssembly support
  • Chrome, Edge, Firefox, or Safari
  • Secure context (https:// or localhost)
  • Cross-origin isolation headers for threaded ONNX Runtime Web
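
The requirements above can be checked at startup. The following is a minimal sketch, not the demo's actual code: `checkBrowserRequirements` and `pickThreadCount` are hypothetical helpers, and the cap of 4 threads is an arbitrary assumption. The properties they read (`WebAssembly`, `isSecureContext`, `crossOriginIsolated`, `SharedArrayBuffer`) are standard browser globals.

```javascript
// Hypothetical helper: reports which requirements a global-like object
// satisfies. In a page or worker, pass `globalThis`.
function checkBrowserRequirements(env) {
  const hasWasm =
    typeof env.WebAssembly === "object" &&
    typeof env.WebAssembly.instantiate === "function";
  const secureContext = env.isSecureContext === true;
  const crossOriginIsolated = env.crossOriginIsolated === true;
  const hasSharedMemory = typeof env.SharedArrayBuffer === "function";
  return {
    hasWasm,
    secureContext,
    crossOriginIsolated,
    // Threaded WebAssembly needs shared memory, which browsers only
    // expose on cross-origin isolated pages.
    canUseThreads: crossOriginIsolated && hasSharedMemory,
  };
}

// Hypothetical policy: fall back to one thread when isolation is missing,
// otherwise use the available cores up to an assumed cap.
function pickThreadCount(report, hardwareConcurrency = 1, maxThreads = 4) {
  if (!report.canUseThreads) return 1;
  return Math.max(1, Math.min(maxThreads, hardwareConcurrency));
}
```

With ONNX Runtime Web, the chosen value could be assigned to `ort.env.wasm.numThreads` before creating any inference session.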

Voice Cloning

  1. Select a language bundle.
  2. Choose a built-in voice or upload your own sample.
  3. Use a short clean reference clip for best results.
  4. Generate directly in the browser.
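
To illustrate what a "short clean reference clip" could mean in practice, the sketch below mixes an uploaded clip down to mono and trims it to a maximum duration. It is an assumption-laden example, not the demo's preprocessing: the function name `downmixAndTrim` and the 10 second default are hypothetical, and the demo may resample or normalize differently.

```javascript
// Hypothetical helper: averages all channels into one mono Float32Array and
// keeps at most `maxSeconds` of audio. `channels` is an array of
// equal-length Float32Array (or plain number array) channel buffers.
function downmixAndTrim(channels, sampleRate, maxSeconds = 10) {
  if (channels.length === 0) {
    throw new Error("no audio channels");
  }
  const maxSamples = Math.floor(sampleRate * maxSeconds);
  const length = Math.min(channels[0].length, maxSamples);
  const mono = new Float32Array(length);
  for (const channel of channels) {
    for (let i = 0; i < length; i++) {
      mono[i] += channel[i] / channels.length;
    }
  }
  return mono;
}
```

In a browser, the channel buffers could come from `AudioBuffer.getChannelData(i)` after decoding the uploaded file with `AudioContext.decodeAudioData`.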

File Structure

pocket-tts-web/
├── index.html
├── onnx-streaming.js
├── inference-worker.js
├── PCMPlayerWorklet.js
├── EventEmitter.js
├── sentencepiece.js
├── style.css
└── onnx/
    ├── english_2026-04/
    ├── german/
    ├── italian/
    ├── portuguese/
    └── spanish/
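
For local testing, this folder has to be served with the same cross-origin headers that the Space metadata declares. The following Node.js sketch shows one possible way to do that with only built-in modules. It is not part of the repository: `createDevServer`, `resolveRequestPath`, the MIME table, and port 8080 are illustrative assumptions.

```javascript
const http = require("http");
const fs = require("fs");
const path = require("path");

// Same values as the custom_headers entries in the Space metadata.
const ISOLATION_HEADERS = {
  "Cross-Origin-Embedder-Policy": "require-corp",
  "Cross-Origin-Opener-Policy": "same-origin",
  "Cross-Origin-Resource-Policy": "cross-origin",
};

// Minimal content-type table (assumed); anything else is sent as binary.
const MIME = {
  ".html": "text/html",
  ".js": "text/javascript",
  ".css": "text/css",
  ".json": "application/json",
  ".wasm": "application/wasm",
};

// Maps a request URL to a file under `root`, or null if the URL is
// malformed or would escape the root directory.
function resolveRequestPath(root, urlPath) {
  let decoded;
  try {
    decoded = decodeURIComponent(urlPath.split("?")[0]);
  } catch (err) {
    return null;
  }
  const relative = decoded.endsWith("/") ? decoded + "index.html" : decoded;
  const base = path.resolve(root);
  const full = path.join(base, path.normalize(relative));
  return full === base || full.startsWith(base + path.sep) ? full : null;
}

// Creates (but does not start) a static file server for `root`.
function createDevServer(root) {
  return http.createServer((req, res) => {
    const filePath = resolveRequestPath(root, req.url || "/");
    if (!filePath) {
      res.writeHead(400, ISOLATION_HEADERS);
      res.end("Bad request");
      return;
    }
    fs.readFile(filePath, (err, data) => {
      if (err) {
        res.writeHead(404, ISOLATION_HEADERS);
        res.end("Not found");
        return;
      }
      const type = MIME[path.extname(filePath)] || "application/octet-stream";
      res.writeHead(200, { ...ISOLATION_HEADERS, "Content-Type": type });
      res.end(data);
    });
  });
}

// Usage (run from the pocket-tts-web/ folder):
// createDevServer(".").listen(8080);
```

Opening http://localhost:8080/ then counts as a secure context, and `crossOriginIsolated` should report `true` in the page.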

License

  • Models and bundled voice assets inherit the Pocket TTS licensing terms from kyutai/pocket-tts.
  • Code is Apache 2.0.