Qwen3-ASR Arabic: UAE Emirati Dialect

A fine-tune of Qwen/Qwen3-ASR-1.7B for speech recognition in the UAE Emirati dialect of Arabic.

Results

| Metric | Zero-shot (base) | Fine-tuned | Relative change |
|--------|------------------|------------|-----------------|
| WER    | 13.53%           | 9.98%      | -26%            |
| CER    | 3.33%            | 2.55%      | -23%            |

Evaluated on 2,497 UAE Arabic validation samples.
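
For readers who want to reproduce this kind of comparison, here is a minimal sketch of how WER, CER and the relative change can be computed. The card does not say which scoring tool was used, so the helper names (`edit_distance`, `error_rate`, `relative_change`) and the corpus-level aggregation (total edits divided by total reference units) are assumptions for illustration only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit="word"):
    """Corpus-level error rate: total edits / total reference units.

    unit="word" gives WER, unit="char" gives CER (assumption: spaces
    count as characters for CER).
    """
    edits = total = 0
    for ref, hyp in zip(refs, hyps):
        r = ref.split() if unit == "word" else list(ref)
        h = hyp.split() if unit == "word" else list(hyp)
        edits += edit_distance(r, h)
        total += len(r)
    return edits / total


def relative_change(base, tuned):
    """Relative change of the fine-tuned score with respect to the base score."""
    return (tuned - base) / base


# Relative changes for the scores reported in the table above
print(round(relative_change(13.53, 9.98) * 100))  # WER
print(round(relative_change(3.33, 2.55) * 100))   # CER
```

Both error rates are "lower is better", so a negative relative change is an improvement.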

What improved

  • Matches informal Emirati dialect style (شي vs شيء, الاماكن vs الأماكن)
  • Removes spurious punctuation that the base model adds
  • Better handling of dialect-specific words and expressions

Training Details

  • Base model: Qwen/Qwen3-ASR-1.7B (2B params, audio encoder + 1.7B LLM decoder)
  • Training data: ~22,500 UAE Emirati Arabic dialect samples from vadimbelsky/UAE_Arabic_English_Bilingual_Dataset_40k
  • Strategy: Audio encoder frozen, only LLM decoder fine-tuned (84.4% of params)
  • Precision: bfloat16
  • Epochs: 3
  • Effective batch size: 32 (batch size 2 × 16 gradient accumulation steps)
  • Learning rate: 2e-5 with linear schedule
  • Gradient checkpointing: enabled
  • Text normalization: Diacritics removed, alef/teh marbuta normalized, punctuation stripped
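
The text normalization step listed above can be sketched roughly as follows. This is an illustrative guess at the rules, not the training script: the function name `normalize_arabic`, the exact set of diacritics, the alef variants folded to bare alef, and the direction of the teh marbuta mapping (ة to ه) are all assumptions.

```python
import re
import unicodedata

# Assumed diacritic set: tashkeel U+064B..U+0652 plus superscript alef U+0670
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")
# Assumed alef variants: alef with madda, hamza above, hamza below
ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")


def normalize_arabic(text):
    """Normalize a transcript before training or scoring (hypothetical rules)."""
    text = DIACRITICS.sub("", text)            # remove diacritics
    text = ALEF_VARIANTS.sub("\u0627", text)   # fold alef variants to bare alef
    text = text.replace("\u0629", "\u0647")    # teh marbuta -> heh (assumed direction)
    # Strip punctuation: drop every character in a Unicode "P*" category,
    # which covers both ASCII and Arabic marks such as the Arabic comma.
    text = "".join(ch for ch in text
                   if not unicodedata.category(ch).startswith("P"))
    return " ".join(text.split())              # collapse runs of whitespace


print(normalize_arabic("الأماكن"))
```

Applying the same function to references and hypotheses before scoring keeps WER and CER from penalizing differences that the normalization is meant to ignore.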

Usage

from qwen_asr import Qwen3ASRModel

model = Qwen3ASRModel.from_pretrained("vadimbelsky/qwen3-asr-arabic-uae")
result = model.transcribe("audio.wav", language="Arabic")
print(result)

Or with transformers directly:

from transformers import AutoModelForCausalLM, AutoProcessor

model = AutoModelForCausalLM.from_pretrained("vadimbelsky/qwen3-asr-arabic-uae")
processor = AutoProcessor.from_pretrained("vadimbelsky/qwen3-asr-arabic-uae")

Limitations

  • Trained on synthetic/generated Arabic speech data
  • Optimized for UAE Emirati dialect; may not generalize to other Arabic dialects
  • Short utterances only (training clips are mostly under 20 s)
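
Given the short-utterance limitation, one possible workaround for longer recordings is to cut them into pieces of at most 20 seconds and transcribe each piece separately. The sketch below uses only the standard-library `wave` module; the function name `split_wav`, the fixed-length cut, and the output naming scheme are assumptions, and this approach has not been validated with this model. Cutting at fixed boundaries can split a word in half, so splitting on silence would likely work better in practice.

```python
import wave


def split_wav(path, max_seconds=20.0, prefix="chunk"):
    """Split a PCM WAV file into consecutive pieces of at most max_seconds.

    Returns the list of written file paths, named <prefix>_000.wav, ...
    """
    out_paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(max_seconds * src.getframerate())
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break  # end of input reached
            out_path = f"{prefix}_{index:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                dst.setparams(params)    # same channels, sample width, rate
                dst.writeframes(frames)  # header frame count is patched on close
            out_paths.append(out_path)
            index += 1
    return out_paths
```

Each returned path can then be passed to the transcription call shown under Usage, and the resulting texts joined in order.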

License

Apache 2.0 (same as base model)
