# Nepali Automatic Speech Recognition (ASR)
## Overview
Fine-tuning and inference for Nepali language speech recognition using Wav2Vec2 and Whisper models.
## Model Details
| Property | Value |
|----------|-------|
| **Model ID** | `Saugat212/ASR_MODEL` |
| **Base Model** | `facebook/wav2vec2-base` |
| **Architecture** | wav2vec2 |
| **Parameters** | 0.3B |
| **Language** | Nepali |
## Purpose
- Convert Nepali speech audio to text
- Fine-tune Wav2Vec2 on Nepali datasets
- Evaluate ASR performance using the word error rate (WER) metric
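
The WER metric used for evaluation is word-level edit distance divided by the number of reference words (the project lists `jiwer` in its requirements for this). As a minimal illustration, the computation can be sketched in plain Python — the Nepali sample strings below are illustrative only:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words, classic dynamic programming
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
    return d[len(ref)][len(hyp)] / len(ref)

# One word dropped out of four -> WER 0.25
print(wer("मेरो नाम राम हो", "मेरो नाम हो"))
```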
## Contents
| File | Description |
|------|-------------|
| `whisper_transcription.ipynb` | Whisper model for Nepali speech-to-text transcription |
| `wav2vec2_finetuning.ipynb` | Wav2Vec2 fine-tuning recipe for Nepali ASR |
| `wav2vec2_finetune.py` | Python script for Wav2Vec2 fine-tuning |
| `finetune.py` | ASR fine-tuning script |
| `Dataset/` | Training datasets (CSV files with audio paths and transcriptions) |
| `Phase 1/Finetuning/` | Phase 1 training data, checkpoints, and inference notebooks |
## Usage
### Load Model
```python
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

model_name = "Saugat212/ASR_MODEL"
processor = Wav2Vec2Processor.from_pretrained(model_name)
model = Wav2Vec2ForCTC.from_pretrained(model_name)
```
### Inference
```python
import torch
import torchaudio

# Load audio
waveform, sample_rate = torchaudio.load("audio.wav")

# Wav2Vec2 models expect 16 kHz input; resample if the file uses another rate
if sample_rate != 16000:
    waveform = torchaudio.transforms.Resample(sample_rate, 16000)(waveform)
    sample_rate = 16000

# Process
input_values = processor(
    waveform.squeeze(), return_tensors="pt", sampling_rate=sample_rate
).input_values

# Infer
with torch.no_grad():
    logits = model(input_values).logits
predicted_ids = torch.argmax(logits, dim=-1)

# Decode
transcription = processor.batch_decode(predicted_ids)[0]
print(transcription)
```
## Models Available
- **Wav2Vec2**: `Saugat212/ASR_MODEL` - Fine-tuned Nepali ASR
- **Whisper**: OpenAI Whisper for alternative transcription
## Dataset
- Located in `Dataset/`
- Contains `final_transcriptions.csv` with audio paths and transcriptions
- Cleaned data in `cleaned_data.csv`
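
As an illustration of the expected CSV layout, the sketch below reads a file with one audio path and one transcription per row using only the standard library. The column names `audio_path` and `transcription` are assumptions for this example — check the actual headers in `final_transcriptions.csv` before relying on them:

```python
import csv
import io

# Hypothetical two-column layout; the real headers may differ.
sample = (
    "audio_path,transcription\n"
    "clips/0001.wav,नमस्ते\n"
    "clips/0002.wav,धन्यवाद\n"
)

rows = list(csv.DictReader(io.StringIO(sample)))
for row in rows:
    print(row["audio_path"], "->", row["transcription"])
```

Replace the in-memory `io.StringIO(sample)` with `open("Dataset/final_transcriptions.csv", encoding="utf-8")` to read the real file.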
## Requirements
- transformers
- torchaudio
- datasets
- evaluate
- jiwer
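
A quick, dependency-free way to verify that the packages above are importable in the current environment (prints `missing` for anything not installed):

```python
import importlib.util

for pkg in ["transformers", "torchaudio", "datasets", "evaluate", "jiwer"]:
    status = "ok" if importlib.util.find_spec(pkg) else "missing"
    print(f"{pkg}: {status}")
```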
## Fine-tuning
See `wav2vec2_finetuning.ipynb` for the complete fine-tuning pipeline.