Saugat212 committed (verified) · Commit 0dea481 · Parent: db77171

Add model documentation

Files changed: README.md (+86 −0)
# Nepali Automatic Speech Recognition (ASR)

## Overview
Fine-tuning and inference for Nepali-language speech recognition using Wav2Vec2 and Whisper models.

## Model Details

| Property | Value |
|----------|-------|
| **Model ID** | `Saugat212/ASR_MODEL` |
| **Base Model** | `facebook/wav2vec2-base` |
| **Architecture** | wav2vec2 |
| **Parameters** | 0.3B |
| **Language** | Nepali |

## Purpose

- Convert Nepali speech audio to text
- Fine-tune Wav2Vec2 on Nepali datasets
- Evaluate ASR performance with the word error rate (WER) metric

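WER is the word-level edit distance between a hypothesis and a reference transcript, divided by the number of reference words. In practice the `evaluate`/`jiwer` packages compute it; a minimal pure-Python sketch of the metric:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("नमस्ते संसार", "नमस्ते शहर"))  # one substitution over two words -> 0.5
```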
## Contents

| File | Description |
|------|-------------|
| `whisper_transcription.ipynb` | Whisper model for Nepali speech-to-text transcription |
| `wav2vec2_finetuning.ipynb` | Wav2Vec2 fine-tuning recipe for Nepali ASR |
| `wav2vec2_finetune.py` | Python script for Wav2Vec2 fine-tuning |
| `finetune.py` | ASR fine-tuning script |
| `Dataset/` | Training datasets (CSV files with audio paths and transcriptions) |
| `Phase 1/Finetuning/` | Phase 1 training data, checkpoints, and inference notebooks |

## Usage

### Load Model
```python
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

model_name = "Saugat212/ASR_MODEL"
processor = Wav2Vec2Processor.from_pretrained(model_name)
model = Wav2Vec2ForCTC.from_pretrained(model_name)
```

### Inference
```python
import torch
import torchaudio

# Load audio
waveform, sample_rate = torchaudio.load("audio.wav")

# Wav2Vec2 expects mono 16 kHz audio; resample if needed
if sample_rate != 16000:
    waveform = torchaudio.functional.resample(waveform, orig_freq=sample_rate, new_freq=16000)
    sample_rate = 16000

# Process (squeeze assumes a single-channel recording)
input_values = processor(waveform.squeeze(), return_tensors="pt", sampling_rate=sample_rate).input_values

# Infer
with torch.no_grad():
    logits = model(input_values).logits
predicted_ids = torch.argmax(logits, dim=-1)

# Decode
transcription = processor.batch_decode(predicted_ids)[0]
print(transcription)
```

## Models Available

- **Wav2Vec2**: `Saugat212/ASR_MODEL` - fine-tuned Nepali ASR
- **Whisper**: OpenAI Whisper as an alternative transcription model

## Dataset

- Located in `Dataset/`
- `final_transcriptions.csv` contains audio paths and transcriptions
- Cleaned data in `cleaned_data.csv`

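The CSV manifests can be inspected with pandas. The column names below (`audio_path`, `transcription`) are an assumption about the schema, not confirmed from the actual files:

```python
import pandas as pd

# Hypothetical rows mirroring Dataset/final_transcriptions.csv
# (column names are assumed; check the real header)
df = pd.DataFrame(
    {
        "audio_path": ["clips/0001.wav", "clips/0002.wav"],
        "transcription": ["नमस्ते", "धन्यवाद"],
    }
)

# Drop rows with missing transcriptions before training
df = df.dropna(subset=["transcription"])
print(len(df))  # 2
```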
## Requirements

- torch
- torchaudio
- transformers
- datasets
- evaluate
- jiwer

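These could be captured in a `requirements.txt` (unpinned here, since the repo does not specify versions):

```text
torch
torchaudio
transformers
datasets
evaluate
jiwer
```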
## Fine-tuning

See `wav2vec2_finetuning.ipynb` for the complete fine-tuning pipeline.
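Wav2Vec2 fine-tuning with the `ForCTC` head minimizes a CTC loss between frame-level logits and character targets. A toy PyTorch sketch of that objective with random tensors (shapes are illustrative, not the real model's):

```python
import torch

T, N, C = 50, 2, 30  # time steps, batch size, vocab size (index 0 = CTC blank)
log_probs = torch.randn(T, N, C).log_softmax(dim=-1)  # frame-level log-probabilities
targets = torch.randint(1, C, (N, 10))                # character targets (no blanks)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 10, dtype=torch.long)

ctc = torch.nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
print(loss.item())  # non-negative scalar
```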