# malper/abjadsr-he
Fine-tuned from `outputs/pretrain` on ILSpeech (~2 h of studio-quality Hebrew). Given audio, it outputs Hebrew text paired with an IPA transcription.
Stage 2 of 2: use this model for inference.
## Training
- Dataset: ILSpeech (~2h Hebrew, studio quality), train split with 10% held out as dev
- Base model: `outputs/pretrain`
- Checkpoint: step 100 (best by dev token accuracy)
- Dev token accuracy: 97.1%
- Dev loss: 0.627
- Learning rate: 5e-06, warmup 100 steps
- Batch size: 1 × 64 grad-accum steps × 4 GPUs (effective 256)
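The effective batch size above is the product of the three factors; a quick sanity check (variable names here are illustrative, not from the training code):

```python
# Effective batch size = per-device batch x grad-accum steps x number of GPUs.
per_device_batch = 1
grad_accum_steps = 64
num_gpus = 4
effective_batch = per_device_batch * grad_accum_steps * num_gpus
print(effective_batch)  # 256
```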
## Output format
Hebrew text paired with ASCII IPA transcription:
`החליט יוז = hexl'it j'uz`
IPA special characters are mapped to ASCII: ʃ→S, ʒ→Z, dʒ→dZ, tʃ→tS, ʔ→q, ˈ→', ʁ→r, χ→x, ɡ→g.
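The substitution table above can be applied with plain string replacement. This is a hypothetical helper for working with the mapping, not code shipped with the model:

```python
# ASCII substitutes for the IPA symbols listed above (illustrative helper).
IPA_TO_ASCII = {
    "ʃ": "S", "ʒ": "Z", "dʒ": "dZ", "tʃ": "tS",
    "ʔ": "q", "ˈ": "'", "ʁ": "r", "χ": "x", "ɡ": "g",
}

def ipa_to_ascii(text: str) -> str:
    # Replace longer sequences first so "dʒ"/"tʃ" are handled as units.
    for ipa in sorted(IPA_TO_ASCII, key=len, reverse=True):
        text = text.replace(ipa, IPA_TO_ASCII[ipa])
    return text

print(ipa_to_ascii("ʃaˈlom"))  # Sa'lom
```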
## Usage
```python
import torch
import soundfile as sf
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model_id = "malper/abjadsr-he"
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)
model.eval()

# Load audio (must be 16 kHz mono float32)
audio, sr = sf.read("audio.wav", dtype="float32", always_2d=False)
# resample if needed: torchaudio.functional.resample(torch.from_numpy(audio), sr, 16000).numpy()

inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
forced_ids = processor.get_decoder_prompt_ids(language="he", task="transcribe")

with torch.no_grad():
    generated = model.generate(
        inputs.input_features,
        forced_decoder_ids=forced_ids,
        max_new_tokens=444,
    )

output = processor.batch_decode(generated, skip_special_tokens=True)[0].strip()
print(output)
# e.g. "hex'lit j'uzem lena'tsel"
```
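Assuming the `Hebrew text = ascii-ipa` pairing described under "Output format", the two halves of a decoded line can be separated with a simple split (an illustrative helper, not part of the model's code):

```python
def split_output(line: str) -> tuple[str, str]:
    # Split a "Hebrew text = ascii-ipa" line on the first " = "
    # (the pairing format described under "Output format").
    hebrew, _, ipa = line.partition(" = ")
    return hebrew.strip(), ipa.strip()

print(split_output("shalom = Sal'om"))  # ('shalom', "Sal'om")
```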