Fabio Augusto Suizu PRO

fabiosuizu

Recent Activity

posted an update about 2 months ago

Open Pronunciation Assessment API: 17MB model, sub-300ms, phoneme-level scoring

Hi everyone! I've been working on a pronunciation assessment engine optimized for edge deployment and real-time feedback. Wanted to share it with the community and get feedback.

**What it does**: Scores English pronunciation at 4 levels of granularity (phoneme, word, sentence, and overall), each on a 0-100 scale. Returns IPA and ARPAbet notation for every phoneme.

**Key specs**:
- 17MB total model size, runs entirely on CPU
- 257ms median inference latency
- Exceeds human inter-annotator agreement at the phone level (+4.5%) and sentence level (+5.2%)
- Benchmarked on standard academic datasets (2,500+ test utterances)
- Validated across 7 L1 backgrounds (Chinese, Japanese, Korean, Arabic, Spanish, Vietnamese, Russian)

**Architecture**: Proprietary ML pipeline optimized for pronunciation assessment. The entire engine runs in 17MB, with no GPU and no large foundation models required.

**Try it**: https://huggingface.co/spaces/fabiosuizu/pronunciation-assessment

The demo lets you record audio or upload a file, enter the expected text, and get instant scoring down to individual phonemes.

**API access**: Available via REST API, MCP servers (for AI agents), and Azure Marketplace. Details in the Space description.

Would love feedback on:
1. Use cases you'd find this useful for
2. Languages you'd want supported next
3. Whether the scoring feels calibrated for your experience level

Thanks!
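To make the multi-level output concrete, here is a minimal Python sketch of how phoneme scores with ARPAbet and IPA labels could roll up into word and sentence scores. Everything in it is an assumption for illustration: the class names, the tiny ARPAbet-to-IPA table, and especially the plain averaging, which stands in for whatever the engine actually does. How the "overall" score differs from the sentence score is not specified in the post, so the sketch stops at sentence level.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List

# Small subset of the usual ARPAbet-to-IPA correspondence (illustrative only).
# "AH0" (unstressed) is listed separately because it maps to schwa.
ARPABET_TO_IPA = {
    "HH": "h", "AH0": "ə", "AH": "ʌ", "L": "l", "OW": "oʊ",
    "W": "w", "ER": "ɝ", "D": "d",
}


def arpabet_to_ipa(symbol: str) -> str:
    """Look up the full symbol first, then retry without the stress digit."""
    if symbol in ARPABET_TO_IPA:
        return ARPABET_TO_IPA[symbol]
    return ARPABET_TO_IPA[symbol.rstrip("012")]


@dataclass
class PhonemeScore:  # hypothetical container, not the API's real schema
    arpabet: str
    score: float  # 0-100

    @property
    def ipa(self) -> str:
        return arpabet_to_ipa(self.arpabet)


@dataclass
class WordScore:  # hypothetical container
    text: str
    phonemes: List[PhonemeScore]

    @property
    def score(self) -> float:
        # Assumption: unweighted mean of phoneme scores.
        return mean(p.score for p in self.phonemes)


def sentence_score(words: List[WordScore]) -> float:
    # Assumption: unweighted mean of word scores.
    return mean(w.score for w in words)


hello = WordScore("hello", [
    PhonemeScore("HH", 90.0), PhonemeScore("AH0", 70.0),
    PhonemeScore("L", 80.0), PhonemeScore("OW1", 100.0),
])
print(hello.text, "".join(p.ipa for p in hello.phonemes), hello.score)
```

A learned scoring head would replace both averaging steps; the nesting of phonemes inside words inside a sentence is the only part meant to carry over.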

updated a Space about 2 months ago

fabiosuizu/pronunciation-assessment

posted an update about 2 months ago

Hi everyone! I've been working on a pronunciation assessment engine optimized for edge deployment and real-time feedback. Wanted to share it with the community and get feedback.

**What it does**: Scores English pronunciation at 4 levels of granularity (phoneme, word, sentence, and overall), each on a 0-100 scale. Returns IPA and ARPAbet notation for every phoneme.

**Key specs**:
- 17MB total model size (NeMo Citrinet-256, INT4 quantized)
- 257ms median inference on CPU
- Exceeds human inter-annotator agreement at the phone level (+4.5%) and sentence level (+5.2%)
- Benchmarked on speechocean762 (2,500 test utterances)
- Tested across 7 L1 backgrounds (Chinese, Japanese, Korean, Arabic, Spanish, Vietnamese, Russian)

**Architecture**: CTC forced alignment + Viterbi decoding + GOP (Goodness of Pronunciation) scoring + MLP/XGBoost ensemble heads. No wav2vec2 dependency: the entire pipeline runs in 17MB.

**Try it**: https://huggingface.co/spaces/fabiosuizu/pronunciation-assessment

The demo lets you record audio or upload a file, enter the expected text, and get instant scoring down to individual phonemes.

**API access**: Available via REST API, MCP servers (for AI agents), and Azure Marketplace. Details in the Space description.

Would love feedback on:
1. Use cases you'd find this useful for
2. Languages you'd want supported next
3. Whether the scoring feels calibrated for your experience level

Thanks!
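For readers unfamiliar with the alignment and GOP steps named in the architecture, the following pure-Python sketch shows the general idea. It is not the engine's implementation. It assumes frame-level log posteriors over phone classes are already available, it drops the CTC blank symbol to keep the Viterbi pass short, and it uses one common posterior-based GOP variant (mean log posterior of the expected phone minus the per-frame best phone). The MLP/XGBoost heads that turn such features into 0-100 scores are omitted, and all function names are hypothetical.

```python
import math
from typing import List, Tuple

NEG_INF = float("-inf")


def forced_align(log_post: List[List[float]], target: List[int]) -> List[Tuple[int, int]]:
    """Viterbi forced alignment (simplified: no CTC blank).

    Each frame is assigned to one target phone; the path may only stay on the
    current phone or advance to the next. Returns one (start, end) frame span
    per target phone, with end exclusive.
    """
    T, N = len(log_post), len(target)
    if N == 0 or T < N:
        raise ValueError("need at least one frame per target phone")
    score = [[NEG_INF] * N for _ in range(T)]
    back = [[0] * N for _ in range(T)]
    score[0][0] = log_post[0][target[0]]
    for t in range(1, T):
        for s in range(min(t + 1, N)):
            stay = score[t - 1][s]
            move = score[t - 1][s - 1] if s > 0 else NEG_INF
            if move > stay:
                best, back[t][s] = move, s - 1
            else:
                best, back[t][s] = stay, s
            score[t][s] = best + log_post[t][target[s]]
    # Backtrace from the last phone at the last frame.
    path = [N - 1]
    s = N - 1
    for t in range(T - 1, 0, -1):
        s = back[t][s]
        path.append(s)
    path.reverse()
    # Collapse the frame-level path into per-phone spans.
    segments, start = [], 0
    for t in range(1, T + 1):
        if t == T or path[t] != path[start]:
            segments.append((start, t))
            start = t
    return segments


def gop_scores(log_post, target, segments) -> List[float]:
    """Per-phone GOP: mean over frames of log P(expected) - max_q log P(q).

    Values are <= 0; 0 means the expected phone was the top class on every frame.
    """
    scores = []
    for phone, (start, end) in zip(target, segments):
        diffs = [log_post[t][phone] - max(log_post[t]) for t in range(start, end)]
        scores.append(sum(diffs) / len(diffs))
    return scores


# Toy input: 4 frames, 3 phone classes, expected phone sequence [0, 2].
posteriors = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.1, 0.1, 0.8], [0.2, 0.5, 0.3]]
log_post = [[math.log(p) for p in frame] for frame in posteriors]
target = [0, 2]
segments = forced_align(log_post, target)
print(segments, gop_scores(log_post, target, segments))
```

In the toy input the second phone scores below zero because class 1 outranks the expected class 2 on the last frame, which is the kind of signal a downstream scoring head could learn from.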

fabiosuizu's models

None public yet