Pankaj8922/better-opus-mt-en-hi

Fine-tuned MarianMT model for English → Hindi translation. This model is trained on AI4Bharat's Samanantar dataset, which contains over 10 million high-quality parallel sentences.

🔍 Model Details

  • Base model: Helsinki-NLP/opus-mt-en-hi
  • Fine-tuned on: ai4bharat/samanantar English–Hindi subset
  • Total params: ~77M (MarianMT)
  • Framework: Hugging Face Transformers
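
A minimal usage sketch, assuming the checkpoint loads through the standard `MarianMTModel` and `MarianTokenizer` classes in Transformers, as the base model does. The helper names `load_translator` and `translate` and the decoding settings (`num_beams=4`, `max_new_tokens=128`) are illustrative choices, not values taken from this model card.

```python
def load_translator(model_id="Pankaj8922/better-opus-mt-en-hi"):
    """Download (or read from cache) the tokenizer and model weights."""
    # Imported lazily so the helpers below can be defined without
    # transformers/torch being installed yet.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_id)
    model = MarianMTModel.from_pretrained(model_id)
    return tokenizer, model


def translate(texts, tokenizer, model, max_new_tokens=128, num_beams=4):
    """Translate a list of English sentences to Hindi with beam search."""
    # Pad to the longest sentence in the batch and truncate overlong inputs.
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    generated = model.generate(
        **batch, max_new_tokens=max_new_tokens, num_beams=num_beams
    )
    # Drop padding and end-of-sequence tokens when converting ids to text.
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

Typical use would be `tokenizer, model = load_translator()` followed by `translate(["How are you?"], tokenizer, model)`. The first call needs network access unless the files are already cached.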

📊 Performance (BLEU / chrF on 500 samples from Namratap/En-Hindi)

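| Domain          | Base BLEU | Fine-tuned BLEU | Base chrF | Fine-tuned chrF |
|-----------------|-----------|-----------------|-----------|-----------------|
| Healthcare      | 15.54     | 27.95           | 38.06     | 54.09           |
| Gen News        | 14.11     | 26.31           | 39.07     | 52.98           |
| Culture/Tourism | 12.76     | 18.49           | 35.07     | 41.32           |
| Education       | 20.28     | 28.82           | 43.84     | 49.68           |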

✅ BLEU improvements of +5.7 to +12.4 points across domains
✅ chrF gains of +5.8 to +16.0 points, indicating closer character-level overlap with the reference translations
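
The per-domain gains can be recomputed from the reported scores. The short sketch below only does the arithmetic on the four rows of the table; the names `scores` and `gains` are illustrative, and it does not re-run the evaluation.

```python
# Scores copied from the table:
# (base BLEU, fine-tuned BLEU, base chrF, fine-tuned chrF)
scores = {
    "Healthcare": (15.54, 27.95, 38.06, 54.09),
    "Gen News": (14.11, 26.31, 39.07, 52.98),
    "Culture/Tourism": (12.76, 18.49, 35.07, 41.32),
    "Education": (20.28, 28.82, 43.84, 49.68),
}


def gains(table):
    """Return {domain: (BLEU gain, chrF gain)}, rounded to two decimals."""
    return {
        domain: (round(ft_bleu - base_bleu, 2), round(ft_chrf - base_chrf, 2))
        for domain, (base_bleu, ft_bleu, base_chrf, ft_chrf) in table.items()
    }


for domain, (bleu_gain, chrf_gain) in gains(scores).items():
    print(f"{domain:<16} BLEU +{bleu_gain:.2f}   chrF +{chrf_gain:.2f}")
```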

🧠 Use Cases

  • Book and news translation (Hindi)
  • Offline/secure translation pipelines
  • Domain-adapted fine-tuning
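
For the offline use case, one possible setup is to download the repository once on a connected machine and then load it strictly from disk. This is a sketch under assumptions: the helper names and the target directory are hypothetical, and it relies on `snapshot_download` from `huggingface_hub`, the `local_files_only` argument of `from_pretrained`, and the `HF_HUB_OFFLINE` / `TRANSFORMERS_OFFLINE` environment variables.

```python
import os


def enable_offline_mode():
    """Ask the Hugging Face libraries not to make network calls.

    These variables are read when the libraries are imported, so call this
    before importing transformers or huggingface_hub.
    """
    os.environ["HF_HUB_OFFLINE"] = "1"
    os.environ["TRANSFORMERS_OFFLINE"] = "1"


def fetch_snapshot(repo_id, target_dir):
    """Run once on a connected machine: copy all repo files into target_dir."""
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=target_dir)


def load_from_disk(model_dir):
    """Load tokenizer and weights from a local directory only."""
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_dir, local_files_only=True)
    model = MarianMTModel.from_pretrained(model_dir, local_files_only=True)
    return tokenizer, model
```

On the air-gapped machine, call `enable_offline_mode()` first, then `load_from_disk(...)` with the directory that was copied over.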

📁 Files Included

  • pytorch_model.bin: fine-tuned model weights
  • config.json: model architecture
  • tokenizer_config.json, vocab.json, source.spm, target.spm: tokenizer
  • generation_config.json: default decoding setup

⚖️ License

Apache 2.0 (Same as original model and Samanantar dataset)
