# Qwen3-ASR Bengali LoRA Fine-Tuning
This repository contains a fine-tuned LoRA adapter and a complete set of scripts to train, evaluate, and deploy Qwen3-ASR (specifically for Bengali Automatic Speech Recognition).
By leveraging Parameter-Efficient Fine-Tuning (PEFT) via LoRA, this project optimizes the 1.7B parameter Qwen3-ASR model to perform highly accurate transcriptions for Bengali audio while keeping VRAM usage extremely efficient.
## Key Features
- Efficient LoRA Training: Optimized for an A100 (80GB) using Flash Attention 2, bfloat16, TF32, and deep integration with PyTorch 2.9.
- Robust Pipeline: End-to-end scripts for data preparation, training, merging, and evaluation.
- Web UI & Inference: Includes a customized Gradio app (`web_asr.py`) for quick streaming and offline inference testing.
## Repository Structure
### Data & Checkpoints
- `data_local/`: Where your processed JSONL data (train, validation, test) and 16 kHz WAVs live.
- `models/`: Original base model weights (`Qwen3-ASR-1.7B` or `0.6B`).
- `checkpoints_bangla_lora/`: The output directory for LoRA adapters during training.
### Pipeline Scripts
- `prep_data.py`: Handles dataset downloading, preprocessing, resampling to 16 kHz (mono), and strict normalization of Bengali Unicode strings via `bnunicodenormalizer`. It generates the `train.jsonl` and `validation.jsonl` files.
- `train_qwen3_lora.py`: The core training script. It patches the internal Qwen3 thinker forward pass, automatically manages LoRA wrapping, uses custom data collators (preventing audio duplication in the cache), and avoids OOM issues with smart batching/cleanup callbacks.
- `merge_lora.py`: Fuses the trained LoRA adapter weights back into the base `Qwen3-ASR-1.7B` model, creating a standalone, fully capable inference model that does not rely on dynamically loading adapters.
- `evaluate_lora.py`: Computes metrics (Word Error Rate / Character Error Rate) against your test dataset to measure how well the fine-tuning performed.
- `test_inference.py`: A lightweight script for running quick, pure-Python terminal transcriptions to test adapters on the fly.
- `web_asr.py`: A responsive web application designed to load your model and expose an interactive GUI for microphone recordings and file uploads.
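The JSONL files consumed by the training script hold one utterance per line. The exact field names used by `prep_data.py` are not shown in this README, so the layout below is an illustrative assumption (`audio`, `text`, `duration` are hypothetical keys):

```python
import json

# Hypothetical record layout for train.jsonl / validation.jsonl -- the exact
# field names used by prep_data.py are an assumption for illustration only.
record = {
    "audio": "data_local/wavs/sample_0001.wav",  # path to a 16 kHz mono WAV
    "text": "আমি বাংলায় কথা বলি",                # normalized Bengali transcript
    "duration": 3.42,                            # seconds, useful for batching
}

# One JSON object per line; ensure_ascii=False keeps Bengali text readable.
line = json.dumps(record, ensure_ascii=False)
parsed = json.loads(line)
print(parsed["audio"])
```

Keeping `ensure_ascii=False` matters here: without it, every Bengali character is escaped to `\uXXXX` sequences, which bloats the files and makes spot-checking transcripts by eye impossible.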
## Usage & Quick Start
### 1. Environment Setup
We recommend using uv or a dedicated .venv. Important dependencies are listed in pyproject.toml.
```bash
uv sync  # Install dependencies (Transformers, PEFT, Flash Attention)
```
### 2. Prepare Data
Put your raw data into place and run the prep script. This ensures audio is strictly 16kHz and text is normalized.
```bash
uv run prep_data.py
```
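If you want to sanity-check that your processed WAVs really are 16 kHz mono before training, the standard-library `wave` module is enough. The `check_wav` helper below is not part of this repo, just a self-contained sketch (it writes a small test tone so it runs without any external files):

```python
import math
import struct
import wave

def check_wav(path, expect_rate=16000, expect_channels=1):
    """Return (sample_rate, channels); files that don't match would need resampling."""
    with wave.open(path, "rb") as wf:
        return wf.getframerate(), wf.getnchannels()

# Create a 0.1 s, 16 kHz mono 440 Hz tone so this sketch is self-contained.
with wave.open("tone.wav", "wb") as wf:
    wf.setnchannels(1)       # mono
    wf.setsampwidth(2)       # 16-bit PCM
    wf.setframerate(16000)   # 16 kHz
    frames = b"".join(
        struct.pack("<h", int(32767 * 0.2 * math.sin(2 * math.pi * 440 * t / 16000)))
        for t in range(1600)
    )
    wf.writeframes(frames)

rate, channels = check_wav("tone.wav")
print(rate, channels)
```

Note that `wave` only reads PCM WAV containers; for MP3/FLAC sources you would need something like `torchaudio` or `soundfile`, which is what a prep script typically uses for the actual resampling.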
### 3. Training
If you want to train on an A100 setup, simply run:
```bash
uv run train_qwen3_lora.py
```
Note: Configs like `BATCH_SIZE`, `LORA_RANK` (16), and `LR` (2e-4) can be adjusted right at the top of the file depending on your VRAM limits.
### 4. Evaluation & Merging
Once training completes:
```bash
# Evaluate the adapter's performance (WER/CER)
uv run evaluate_lora.py

# Merge the adapter permanently if you are satisfied
uv run merge_lora.py
```
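For reference, WER is the word-level edit distance between the hypothesis and the reference divided by the reference word count, and CER is the same at the character level. `evaluate_lora.py` likely uses a library such as `jiwer` for this (an assumption); a minimal pure-Python sketch of the metric itself:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (one-row DP)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            # prev holds dp[i-1][j-1]; dp[j] still holds dp[i-1][j] here.
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (r != h))
    return dp[-1]

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)

def cer(reference: str, hypothesis: str) -> float:
    return edit_distance(list(reference), list(hypothesis)) / len(reference)

# One inserted word against a 3-word reference -> WER of 1/3.
print(wer("the cat sat", "the cat sat down"))
```

CER is usually the more forgiving metric for Bengali, since a single conjunct-character substitution counts as one character error but can flip an entire word for WER.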
### 5. Web UI
Instantly spin up the browser-based transcription interface:
```bash
uv run web_asr.py
```
## Evaluation Benchmarks
This fine-tuned adapter (1.7B) was evaluated against the original base model on a held-out test set of 2,000 randomized Bengali audio samples.
| Model | WER (%) | CER (%) | RTFx (native HF `generate`) | Notes |
|---|---|---|---|---|
| Qwen3-Bengali-LoRA (ours) | 20.70 | 7.61 | 15.60x | Highly accurate transcriber across local dialects. |
| Base Qwen3-ASR (1.7B) | 72.25 | 41.79 | 28.44x | Failed to transcribe properly; often hallucinated Hindi instead of Bengali. |
| Whisper Large-v3 | Pending | Pending | - | - |
(Note: the LoRA adapter achieved this 50%+ relative error reduction without breaking the model's fundamental chat template structure. Evaluation was performed with the native `transformers` backend on a single NVIDIA A100 80 GB GPU. RTFx was measured with dynamic, un-fused PEFT generation, which inherently carries adapter-routing overhead; fused vLLM inference should yield 200+ RTFx.)
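RTFx (inverse real-time factor) is conventionally defined as seconds of audio transcribed per second of wall-clock compute, so the table's 15.60x means roughly 15.6 seconds of speech are decoded every second. Assuming this repo uses the same convention, the arithmetic is simply:

```python
def rtfx(audio_seconds: float, wall_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio transcribed per second of compute."""
    return audio_seconds / wall_seconds

# e.g. a 60 s clip decoded in about 3.85 s of wall-clock time:
value = round(rtfx(60.0, 3.85), 2)
print(value)  # roughly matches the table's 15.60x
```

Note that RTFx and accuracy trade off here: the base model posts a higher RTFx partly because its (often wrong) outputs are cheaper to generate than the adapter's correct Bengali transcriptions.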
## Citation
```bibtex
@article{Qwen3ASR,
  title={Qwen3-ASR: A Unified Speech Recognition and Forced Alignment Model},
  author={Qwen Team},
  year={2026}
}
```