Qwen3-ASR Bengali LoRA Fine-Tuning

This repository contains a fine-tuned LoRA adapter and a complete set of scripts to train, evaluate, and deploy Qwen3-ASR for Bengali Automatic Speech Recognition (ASR).

By leveraging Parameter-Efficient Fine-Tuning (PEFT) via LoRA, this project adapts the 1.7B-parameter Qwen3-ASR model to produce highly accurate transcriptions of Bengali audio while keeping VRAM usage low.

πŸš€ Key Features

  • Efficient LoRA Training: Optimized for an A100 (80GB) with Flash Attention 2, bfloat16, TF32, and PyTorch 2.9.
  • Robust Pipeline: End-to-end scripts for data preparation, training, merging, and evaluation.
  • Web UI & Inference: A customized Gradio app (web_asr.py) for quick streaming and offline inference testing.

πŸ“ Repository Structure

Data & Checkpoints

  • data_local/ β€” Where your processed JSONL data (train, validation, test) and 16kHz WAVs live.
  • models/ β€” Original base model weights (Qwen3-ASR-1.7B or 0.6B).
  • checkpoints_bangla_lora/ β€” The output directory for LoRA adapters during training.
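Each JSONL file in data_local/ holds one utterance per line. As a rough sketch of what an entry might look like (the actual field names written by prep_data.py may differ, e.g. "audio" vs. "path"):

```python
import json

# Hypothetical manifest entry; field names here are assumptions,
# not necessarily what prep_data.py emits.
sample = {
    "audio": "data_local/wavs/utt_00001.wav",  # 16 kHz mono WAV
    "text": "আমি বাংলায় কথা বলি",              # normalized Bengali transcript
}

# Each line of train.jsonl / validation.jsonl is one JSON object.
line = json.dumps(sample, ensure_ascii=False)
restored = json.loads(line)
print(restored["audio"])
```

Using `ensure_ascii=False` keeps the Bengali text human-readable in the JSONL file instead of escaping it to `\uXXXX` sequences.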

Pipeline Scripts

  1. prep_data.py Handles dataset downloading, preprocessing, resampling to 16kHz (mono), and strictly normalizing Bengali Unicode strings via bnunicodenormalizer. It generates the train.jsonl and validation.jsonl files.
  2. train_qwen3_lora.py The core training script. It patches the internal Qwen3 thinker forward pass, automatically manages LoRA wrapping, utilizes custom data collators (preventing audio duplication in cache), and avoids OOM issues with smart batching/cleanup callbacks.
  3. merge_lora.py Fuses the trained LoRA adapter weights back into the base Qwen3-ASR-1.7B model, creating a standalone, fully capable inference model that doesn't rely on dynamically loading adapters.
  4. evaluate_lora.py Extracts metrics (Word Error Rate / Character Error Rate) against your test dataset to measure how well the fine-tuning performed.
  5. test_inference.py A lightweight script for running quick, pure-python terminal transcriptions to test adapters on the fly.
  6. web_asr.py A responsive web application designed to load your model and expose an interactive GUI for microphone recordings and file uploads.
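The merge step (merge_lora.py) is conceptually simple: each adapted layer's frozen weight W is replaced by W + (alpha / r) · B · A, where A and B are the trained low-rank matrices. A toy pure-Python illustration of that arithmetic (not the actual script, which operates on the model's state dict):

```python
# Toy illustration of a LoRA merge: W' = W + (alpha / r) * B @ A.
# Shapes are tiny for clarity; real layers have thousands of dimensions.

def matmul(B, A):
    """Multiply an (m x r) matrix by an (r x n) matrix."""
    m, r, n = len(B), len(A), len(A[0])
    return [[sum(B[i][k] * A[k][j] for k in range(r)) for j in range(n)]
            for i in range(m)]

def merge(W, A, B, alpha, r):
    """Fuse the low-rank update into the frozen base weight."""
    delta = matmul(B, A)
    scale = alpha / r
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight (2x2)
A = [[1.0, 2.0]]               # A: r x n = 1x2
B = [[0.5], [0.25]]            # B: m x r = 2x1
merged = merge(W, A, B, alpha=2.0, r=1)
print(merged)  # [[2.0, 2.0], [0.5, 2.0]]
```

After this fusion the adapter matrices are no longer needed at inference time, which is why the merged model "doesn't rely on dynamically loading adapters."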

πŸ› οΈ Usage & Quick Start

1. Environment Setup

We recommend using uv or a dedicated .venv. Core dependencies are listed in pyproject.toml.

uv sync  # Install dependencies (Transformers, PEFT, Flash Attention)

2. Prepare Data

Put your raw data into place and run the prep script. This ensures audio is strictly 16kHz and text is normalized.

uv run prep_data.py
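As a rough idea of what text normalization buys you: Bengali vowel signs can be stored in decomposed form that looks identical on screen but breaks string matching and WER scoring. The sketch below uses only the stdlib's NFC composition as a simplified stand-in; the actual bnunicodenormalizer library additionally fixes Bengali-specific glyph-order and broken-diacritic issues that plain NFC does not cover.

```python
import unicodedata

def normalize_text(text: str) -> str:
    # Simplified stand-in for bnunicodenormalizer: canonical NFC composition.
    return unicodedata.normalize("NFC", text)

# U+09C7 + U+09BE compose into the single vowel sign U+09CB under NFC,
# so two visually identical strings become byte-identical.
decomposed = "ক" + "\u09c7" + "\u09be"
composed = normalize_text(decomposed)
print(len(decomposed), len(composed))  # 3 2
```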

3. Training

If you want to train on an A100 setup, simply run:

uv run train_qwen3_lora.py

Note: Hyperparameters such as BATCH_SIZE, LORA_RANK (16), and LR (2e-4) can be adjusted at the top of the file to suit your VRAM limits.
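For orientation, a LoRA setup with those values might look like the following PEFT config. This is a sketch only: the target_modules, lora_alpha, and lora_dropout values are assumptions, not necessarily what train_qwen3_lora.py actually uses.

```python
from peft import LoraConfig

# Sketch of a rank-16 LoRA config; everything except r=16 is an assumption.
lora_config = LoraConfig(
    r=16,                # LORA_RANK from the script header
    lora_alpha=32,       # assumed; commonly 2x the rank
    lora_dropout=0.05,   # assumed
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)
# The learning rate (2e-4) is passed to the trainer, not to LoraConfig.
```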

4. Evaluation & Merging

Once training completes:

# Evaluate the adapter's performance (WER/CER)
uv run evaluate_lora.py

# Merge the adapter permanently if you are satisfied
uv run merge_lora.py
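WER and CER are both edit-distance metrics: WER counts word-level substitutions, insertions, and deletions against the reference length, while CER does the same at the character level. A minimal pure-Python sketch (evaluate_lora.py likely uses a metrics library rather than this hand-rolled version):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (one-row DP)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev = dp[0]
        dp[0] = i
        for j, h in enumerate(hyp, 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,        # deletion
                        dp[j - 1] + 1,    # insertion
                        prev + (r != h))  # substitution or match
            prev = cur
    return dp[-1]

def wer(ref: str, hyp: str) -> float:
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / max(len(ref_words), 1)

def cer(ref: str, hyp: str) -> float:
    return edit_distance(list(ref), list(hyp)) / max(len(ref), 1)

print(wer("আমি ভাত খাই", "আমি ভাত চাই"))  # 1 substitution over 3 words
```

A WER of 20.70% therefore means roughly one word-level error for every five reference words.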

5. Web UI

Instantly spin up the browser-based transcription interface:

uv run web_asr.py

πŸ“Š Evaluation Benchmarks

This fine-tuned adapter (1.7B) was evaluated on a held-out test set of 2,000 randomized Bengali audio samples and compared against the original base model.

| Model | WER (%) | CER (%) | RTFx (native HF `generate`) | Notes |
|---|---|---|---|---|
| Qwen3-Bengali-LoRA (ours) | 20.70 | 7.61 | 15.60x | Highly accurate across local dialects. |
| Base Qwen3-ASR (1.7B) | 72.25 | 41.79 | 28.44x | Failed to transcribe properly; often hallucinated Hindi instead of Bengali. |
| Whisper Large-v3 | Pending | Pending | - | - |

(Note: The LoRA adapter achieves this 50%+ relative error reduction without breaking the base model's chat template structure. Evaluation used the native transformers backend on a single NVIDIA A100 80GB GPU. RTFx throughput was measured with the un-fused PEFT adapter, which carries dynamic-routing overhead; fused vLLM inference is expected to yield 200+ RTFx.)


Citation

@article{Qwen3ASR,
  title={Qwen3-ASR: A Unified Speech Recognition and Forced Alignment Model},
  author={Qwen Team},
  year={2026}
}