---
title: Ringg Parrot STT V1
emoji: 🦜
colorFrom: pink
colorTo: blue
sdk: static
app_file: index.html
pinned: false
license: other
license_name: Proprietary
short_description: Proprietary Hindi-English code-mixed ASR system
tags:
  - speech-to-text
  - asr
  - bilingual
  - english
  - hindi
  - code-mixed
  - audio
  - transcription
  - ringg
  - real-time
---

# Ringg Parrot STT V1

High-accuracy Hindi-English code-mixed speech-to-text for business voice applications.

[![Hugging Face Space](https://img.shields.io/badge/Hugging%20Face-Space-blue)](https://huggingface.co/spaces/RinggAI/STT) [![Model Access](https://img.shields.io/badge/Model-Proprietary-lightgrey)](#access-and-availability)

## Overview

Ringg Parrot STT V1 is a proprietary automatic speech recognition (ASR) system for Hindi, English, and Hindi-English code-mixed speech. It is designed for real-time voice products, business workflows, and production-grade speech-to-text use cases.

This Hugging Face Space hosts release information and is intended for product evaluation. The model weights, training code, and internal implementation are not open source.

## Access and Availability

- Model weights: Not available for download from this repository.
- Source code: The internal implementation is not open source.
- Production and commercial access: Contact RinggAI at `sales@ringg.ai`.

## Playground Usage

1. Open the [Ringg STT Playground](https://ringg.ai/dashboard/stt).
2. Upload an audio file or use the available streaming interface.
3. Submit the audio for transcription.
4. Review the generated Hindi, English, or code-mixed transcript.

## SDK and Integration

- Python SDK: [ringglabs on PyPI](https://pypi.org/project/ringglabs/)
- Pipecat: Works with the Pipecat framework, including its built-in voice activity detection (VAD) events.
- For integration details, see the SDK documentation.

## Streaming Latency

Typical streaming latency is 60-80 ms.

## Benchmark Results

WER stands for Word Error Rate, reported here as a percentage. Lower values indicate better transcription accuracy. Bold marks the lowest (best) WER in each row.
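For readers unfamiliar with the metric, WER is the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the system output, divided by the number of reference words. The Python sketch below illustrates this. It is not RinggAI's evaluation code. The `normalize` function is a hypothetical normalization (Unicode NFC, lowercasing, replacing punctuation such as `?` and the Devanagari danda `।` with spaces). The exact normalization behind the "Normalized WER" table is not specified in this document. Likewise, `corpus_wer` shows one common way to compute an overall score, pooling errors across utterances. It is an assumption, not necessarily how the "Overall WER" rows were computed.

```python
import re
import unicodedata


def normalize(text):
    """Hypothetical normalization (assumption): NFC, lowercase, strip punctuation."""
    text = unicodedata.normalize("NFC", text).lower()
    # Replace every Unicode punctuation character (category P*), e.g. "?", ",", "।", with a space.
    text = "".join(" " if unicodedata.category(ch).startswith("P") else ch for ch in text)
    # Collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


def word_errors(reference, hypothesis):
    """Return (word-level edit distance, number of reference words)."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the previous ref prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1], len(ref)


def wer(reference, hypothesis, normalized=False):
    """Utterance-level WER as a percentage."""
    if normalized:
        reference, hypothesis = normalize(reference), normalize(hypothesis)
    errors, n = word_errors(reference, hypothesis)
    if n == 0:
        raise ValueError("WER is undefined for an empty reference")
    return 100.0 * errors / n


def corpus_wer(pairs, normalized=False):
    """Pooled WER (assumption): total errors / total reference words, as a percentage."""
    total_errors = total_words = 0
    for reference, hypothesis in pairs:
        if normalized:
            reference, hypothesis = normalize(reference), normalize(hypothesis)
        errors, n = word_errors(reference, hypothesis)
        total_errors += errors
        total_words += n
    return 100.0 * total_errors / total_words


# Code-mixed example: "order" in Latin script vs. Devanagari, plus a trailing "?".
ref = "मेरा order कब deliver होगा?"
hyp = "मेरा ऑर्डर कब deliver होगा"
print(wer(ref, hyp))                   # 40.0 (2 of 5 words differ)
print(wer(ref, hyp, normalized=True))  # 20.0 (punctuation removed; script mismatch remains)
```

In this sketch, normalization removes the punctuation difference but not the Latin vs. Devanagari spelling of "order". Script choice is one reason code-mixed WER depends heavily on how references and hypotheses are normalized.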
### Original WER

| Dataset | Ringg | ElevenLabs | Deepgram | Sarvam |
|---|---:|---:|---:|---:|
| indictts | **11.58** | 16.06 | 13.65 | 15.37 |
| commonvoice | **14.30** | 16.59 | 20.04 | 18.21 |
| fleurs | 15.20 | **11.99** | 17.14 | 16.00 |
| kathbath | **11.78** | 13.24 | 15.93 | 17.53 |
| kathbath_noisy | **13.09** | 13.14 | 17.44 | 16.19 |
| mucs | 14.55 | **11.69** | 21.97 | 16.72 |
| Overall WER | 13.79 | **13.00** | 19.23 | 16.72 |

### Normalized WER

| Dataset | Ringg | ElevenLabs | Deepgram | Sarvam |
|---|---:|---:|---:|---:|
| indictts | **3.94** | 8.52 | 6.93 | 7.84 |
| commonvoice | **6.37** | 13.02 | 14.88 | 13.06 |
| fleurs | 9.73 | **7.67** | 11.35 | 9.54 |
| kathbath | **7.15** | 10.15 | 11.38 | 10.41 |
| kathbath_noisy | **8.37** | 10.01 | 12.98 | 11.78 |
| mucs | **6.28** | 6.75 | 12.07 | 7.58 |
| Overall WER | **7.27** | 8.94 | 12.36 | 9.76 |

## Features

- Hindi-English code-mixed speech recognition.
- Real-time streaming transcription.
- File-based transcription for common audio formats.
- Low-latency inference for voice products.
- Business-focused deployment and integration support.
- Compatibility with modern voice-agent pipelines.

## Supported Inputs

- Languages: Hindi, English, and Hindi-English code-mixed speech.
- Recommended audio: Clear speech with minimal background noise.
- Sample rate: 16 kHz or higher recommended.
- Formats: WAV, MP3, FLAC, M4A, OGG, and OPUS.

## Best Practices

- Use clear audio captured close to the speaker.
- Reduce background noise, echo, and overlapping speech where possible.
- Use 16 kHz or higher audio for best results.
- Test representative production audio before deployment.

## Use Cases

- Voice assistants and AI agents.
- Contact center transcription.
- Meeting and conversation intelligence.
- Voice search and commands.
- Subtitling and content workflows.
- Accessibility and documentation workflows.
## Limitations

- Accuracy may vary with noisy audio, overlapping speakers, strong accents, dialect variation, or low-quality recordings.
- Performance can vary across domains, speaker profiles, and audio capture setups.
- Very long files or unsupported encodings may require preprocessing.
- The hosted demo is intended for evaluation and may not reflect every production deployment configuration.

## Privacy and Data Notice

Audio handling may depend on the selected deployment, integration, and commercial terms. Review the RinggAI privacy terms and deployment documentation before using the service with sensitive, regulated, or personally identifiable data.

## Benchmark Dataset

RinggAI has released the [ASR Benchmarking Open-Source Dataset](TODO-add-dataset-URL), which includes the benchmark audio and data along with the transcriptions generated by Ringg, ElevenLabs, Deepgram, and Sarvam.

## Links

- Playground: [Ringg STT Playground](https://ringg.ai/dashboard/stt)
- SDK: [ringglabs on PyPI](https://pypi.org/project/ringglabs/)
- Organization: [RinggAI on Hugging Face](https://huggingface.co/RinggAI)

## Team

Built by the RinggAI Team.