Catalan, English, Spanish -> Esperanto MT Model

Model description

This repository contains a multilingual MarianMT model for (English, Spanish, Catalan) → Esperanto translation with a tiny architecture.

This model is not intended for direct inference through the Hugging Face transformers library.

Use Marian for inference instead.

The repository includes the following files:

  • model.npz.best-chrf.npz: trained Marian model checkpoint
  • tiny.decoder.yml: decoder configuration
  • vocab.spm: SentencePiece vocabulary
  • run_model.sh: example script showing how to run the model
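
Before decoding, it can help to confirm that the files listed above are all present. The following is a minimal sketch, not something shipped with the repository: the function name check_model_files is hypothetical, and the file names are taken from the list above.

```shell
# Hypothetical helper (not part of the repository): verify that the files
# named in the list above exist in a given directory.
check_model_files() {
  local dir="${1:-.}"   # model directory, defaults to the current one
  local missing=0
  local f
  for f in model.npz.best-chrf.npz tiny.decoder.yml vocab.spm run_model.sh; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  # Exit status 0 when everything is present, 1 otherwise.
  return "$missing"
}
```

Calling check_model_files with no argument checks the current directory; each absent file is reported on standard error.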

Training data

The model was trained using Tatoeba parallel data, with FLORES-200 used as the development set.

Training sentence-pair counts:

  • ca-eo: 672,931
  • es-eo: 4,677,945
  • eo-en: 5,000,000
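
To see how the three language pairs are balanced, the counts above can be totalled and expressed as shares. This is only illustrative arithmetic on the listed figures; the function name pair_shares is hypothetical.

```shell
# Illustrative arithmetic on the sentence-pair counts from the list above.
# pair_shares is a hypothetical helper name, not part of the repository.
pair_shares() {
  awk 'BEGIN {
    n["ca-eo"] = 672931
    n["es-eo"] = 4677945
    n["eo-en"] = 5000000
    total = n["ca-eo"] + n["es-eo"] + n["eo-en"]
    printf "total: %d\n", total
    # Iterate in a fixed order so the output is deterministic.
    split("ca-eo es-eo eo-en", order, " ")
    for (i = 1; i <= 3; i++)
      printf "%s: %.1f%%\n", order[i], 100 * n[order[i]] / total
  }'
}

pair_shares
```

The total comes to 10,350,876 pairs, of which ca-eo accounts for about 6.5%, es-eo for about 45.2%, and eo-en for about 48.3%.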

Inference

Run decoding from inside the model directory:

cat input.spa | \
  marian-decoder \
  -c tiny.decoder.yml \
  --output output.epo \
  --normalize \
  -m model.npz.best-chrf.npz \
  --vocabs vocab.spm vocab.spm \
  --log decode.log \
  --devices 0
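
When several input files need translating, the command above can be wrapped in a small function. The sketch below assumes marian-decoder is on the PATH and that it is called from inside the model directory. The function name translate_to_epo is hypothetical; the options are only those already shown in the command above.

```shell
# Hypothetical wrapper around the decoding command shown above.
# Usage: translate_to_epo INPUT_FILE OUTPUT_FILE
translate_to_epo() {
  if [ "$#" -ne 2 ]; then
    echo "usage: translate_to_epo INPUT_FILE OUTPUT_FILE" >&2
    return 2
  fi
  if [ ! -r "$1" ]; then
    echo "cannot read input file: $1" >&2
    return 1
  fi
  if ! command -v marian-decoder >/dev/null 2>&1; then
    echo "marian-decoder not found on PATH" >&2
    return 127
  fi
  # Same options as the command above; only the input and output paths vary.
  # The input file is fed to the decoder on standard input.
  marian-decoder \
    -c tiny.decoder.yml \
    --output "$2" \
    --normalize \
    -m model.npz.best-chrf.npz \
    --vocabs vocab.spm vocab.spm \
    --log decode.log \
    --devices 0 \
    < "$1"
}
```

For example, translate_to_epo input.spa output.epo reproduces the command above, and the same call can be repeated for other input files.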
