Fabio Augusto Suizu
fabiosuizu
Recent Activity
Posted an update 1 day ago
Open Pronunciation Assessment API — 17MB model, sub-300ms, phoneme-level scoring
Hi everyone!
I've been working on a pronunciation assessment engine optimized for edge deployment and real-time feedback. Wanted to share it with the community and get feedback.
**What it does**: Scores English pronunciation at 4 levels of granularity — phoneme, word, sentence, and overall (0-100 each). Returns IPA and ARPAbet notation for every phoneme.
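To illustrate how the two notations relate, here is a minimal sketch of an ARPAbet-to-IPA lookup. It uses the CMUdict-style symbol set and is an assumption for illustration only, not the engine's own table; the function name `arpabet_to_ipa` is hypothetical, and only unstressed `AH0` is special-cased as schwa.

```python
# Illustrative ARPAbet -> IPA table (CMUdict-style symbols).
# Assumption: this is NOT taken from the engine; it only shows the two notations side by side.
ARPABET_TO_IPA = {
    "AA": "ɑ", "AE": "æ", "AH": "ʌ", "AO": "ɔ", "AW": "aʊ", "AY": "aɪ",
    "B": "b", "CH": "tʃ", "D": "d", "DH": "ð", "EH": "ɛ", "ER": "ɝ",
    "EY": "eɪ", "F": "f", "G": "ɡ", "HH": "h", "IH": "ɪ", "IY": "i",
    "JH": "dʒ", "K": "k", "L": "l", "M": "m", "N": "n", "NG": "ŋ",
    "OW": "oʊ", "OY": "ɔɪ", "P": "p", "R": "ɹ", "S": "s", "SH": "ʃ",
    "T": "t", "TH": "θ", "UH": "ʊ", "UW": "u", "V": "v", "W": "w",
    "Y": "j", "Z": "z", "ZH": "ʒ",
}


def arpabet_to_ipa(phones):
    """Convert ARPAbet symbols (optionally with stress digits 0/1/2) to IPA."""
    ipa = []
    for phone in phones:
        if phone == "AH0":
            # Unstressed AH is conventionally written as schwa.
            ipa.append("ə")
        else:
            # Drop the trailing stress digit before the lookup.
            ipa.append(ARPABET_TO_IPA[phone.rstrip("012")])
    return ipa


print(arpabet_to_ipa(["HH", "AH0", "L", "OW1"]))  # ['h', 'ə', 'l', 'oʊ']
```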
**Key specs**:
- 17MB total model size — runs entirely on CPU
- 257ms median inference latency
- Exceeds human inter-annotator agreement at the phone level (+4.5%) and the sentence level (+5.2%)
- Benchmarked on standard academic datasets (2,500+ test utterances)
- Validated across 7 L1 backgrounds (Chinese, Japanese, Korean, Arabic, Spanish, Vietnamese, Russian)
**Architecture**: Proprietary ML pipeline optimized for pronunciation assessment. The entire engine runs in 17MB — no GPU required, no large foundation models needed.
**Try it**: https://huggingface.co/spaces/fabiosuizu/pronunciation-assessment
The demo lets you record audio or upload a file, enter the expected text, and get instant scoring down to individual phonemes.
**API access**: Available via REST API, MCP servers (for AI agents), and Azure Marketplace. Details in the Space description.
Would love feedback on:
1. Use cases you'd find this useful for
2. Languages you'd want supported next
3. Whether the scoring feels calibrated for your experience level
Thanks!
Updated a Space 2 days ago: fabiosuizu/pronunciation-assessment
Posted an update 5 days ago
Hi everyone!
I've been working on a pronunciation assessment engine optimized for edge deployment and real-time feedback. Wanted to share it with the community and get feedback.
**What it does**: Scores English pronunciation at 4 levels of granularity — phoneme, word, sentence, and overall (0-100 each). Returns IPA and ARPAbet notation for every phoneme.
**Key specs**:
- 17MB total model size (NeMo Citrinet-256, INT4 quantized)
- 257ms median inference on CPU
- Exceeds human inter-annotator agreement at the phone level (+4.5%) and the sentence level (+5.2%)
- Benchmarked on speechocean762 (2,500 test utterances)
- Tested across 7 L1 backgrounds (Chinese, Japanese, Korean, Arabic, Spanish, Vietnamese, Russian)
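For readers unfamiliar with the "INT4 quantized" spec above, here is a minimal sketch of symmetric per-tensor 4-bit quantization. The post does not say which quantization scheme or tooling was used, so everything here (the function names, the per-tensor scale, the nibble packing order) is an assumption meant only to show why 4-bit weights take half a byte each.

```python
def quantize_int4(weights):
    """Symmetric per-tensor quantization to signed 4-bit integers in [-8, 7]."""
    max_abs = max(abs(w) for w in weights)
    # Map the largest magnitude to 7; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int4(q, scale):
    """Recover approximate float weights from the 4-bit integers."""
    return [v * scale for v in q]


def pack_int4(q):
    """Pack two signed 4-bit values into each byte (low nibble first)."""
    if len(q) % 2:
        q = q + [0]  # pad to an even count
    return bytes((q[i] & 0xF) | ((q[i + 1] & 0xF) << 4) for i in range(0, len(q), 2))


weights = [0.7, -0.2, 0.0, 0.1]
q, scale = quantize_int4(weights)
packed = pack_int4(q)
print(q, len(packed))  # four weights fit in two bytes
```

Production toolchains typically use per-channel or per-group scales rather than one scale per tensor; the single scale here just keeps the sketch short.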
**Architecture**: CTC forced alignment + Viterbi decoding + GOP (Goodness of Pronunciation) scoring + MLP/XGBoost ensemble heads. No wav2vec2 dependency — the entire pipeline runs in 17MB.
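The first three stages named above can be sketched in a few lines. This is a generic textbook-style illustration, not the engine's actual code: it assumes frame-level log-posteriors from some CTC acoustic model, class index 0 as the blank, and one common GOP variant (mean log-posterior of the target phone minus the best competing non-blank phone). The function names are hypothetical, and the MLP/XGBoost heads are omitted.

```python
import math

NEG_INF = float("-inf")


def ctc_forced_align(log_probs, targets, blank=0):
    """Viterbi alignment of a non-empty target phone sequence to CTC frames.

    Returns, for each target phone, the list of frame indices assigned to it.
    """
    ext = [blank]
    for p in targets:
        ext += [p, blank]  # blank-interleaved sequence: b p1 b p2 b ...
    T, S = len(log_probs), len(ext)
    score = [[NEG_INF] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    score[0][0] = log_probs[0][blank]
    score[0][1] = log_probs[0][ext[1]]
    for t in range(1, T):
        for s in range(S):
            cands = [(score[t - 1][s], s)]
            if s >= 1:
                cands.append((score[t - 1][s - 1], s - 1))
            # Skipping a blank is allowed only between two different phones.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append((score[t - 1][s - 2], s - 2))
            best, prev = max(cands)
            score[t][s] = best + log_probs[t][ext[s]]
            back[t][s] = prev
    # A valid path ends on the last phone or the trailing blank.
    s = S - 1 if score[T - 1][S - 1] >= score[T - 1][S - 2] else S - 2
    if score[T - 1][s] == NEG_INF:
        raise ValueError("not enough frames to align the target sequence")
    frames = [[] for _ in targets]
    for t in range(T - 1, -1, -1):
        if s % 2 == 1:  # odd states are phones, even states are blanks
            frames[s // 2].append(t)
        s = back[t][s]
    return [sorted(f) for f in frames]


def gop_scores(log_probs, targets, blank=0):
    """Per-phone GOP: mean over aligned frames of log P(target) - max log P(any phone)."""
    scores = []
    for phone, frames in zip(targets, ctc_forced_align(log_probs, targets, blank)):
        diffs = []
        for t in frames:
            best = max(lp for k, lp in enumerate(log_probs[t]) if k != blank)
            diffs.append(log_probs[t][phone] - best)
        scores.append(sum(diffs) / len(diffs))
    return scores


# Toy posteriors over [blank, phone 1, phone 2, phone 3] for four frames.
probs = [[0.1, 0.8, 0.05, 0.05], [0.6, 0.3, 0.05, 0.05],
         [0.1, 0.05, 0.25, 0.6], [0.7, 0.1, 0.1, 0.1]]
log_probs = [[math.log(p) for p in row] for row in probs]
print(gop_scores(log_probs, targets=[1, 2]))
```

A score of 0 means the expected phone was also the most likely one on its frames; more negative values mean a competing phone fit the audio better, which is the signal a downstream scoring head would turn into a 0-100 value.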
**Try it**: https://huggingface.co/spaces/fabiosuizu/pronunciation-assessment
The demo lets you record audio or upload a file, enter the expected text, and get instant scoring down to individual phonemes.
**API access**: Available via REST API, MCP servers (for AI agents), and Azure Marketplace. Details in the Space description.
Would love feedback on:
1. Use cases you'd find this useful for
2. Languages you'd want supported next
3. Whether the scoring feels calibrated for your experience level
Thanks!