Esperanto -> Catalan, English, Spanish MT Model

Model description

This repository contains a multilingual MarianMT model that translates from Esperanto into English, Spanish, and Catalan. It uses the tiny architecture, and the target language is selected with a language tag.

This model is not intended for direct inference through the Hugging Face transformers library.

Use Marian for inference instead.

The repository includes the following files:

model.npz.best-chrf.npz — trained Marian model checkpoint
tiny.decoder.yml — decoder configuration
vocab.spm — SentencePiece vocabulary
run_model.sh — Example script on how to run the model

Supported target languages (via tags)

You control the target language by prefixing the source sentence with one of the following tags:

>>eng<< → English
>>spa<< → Spanish
>>cat<< → Catalan
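As a minimal sketch of the tagging step, the shell function below prefixes every line read from standard input with one of the three tags and rejects any other language code. The name `add_tag` is hypothetical and is not part of this repository.

```shell
# Hypothetical helper (not shipped with the model): prefix each line on stdin
# with the target-language tag the model expects.
add_tag() {
  case "$1" in
    eng|spa|cat)
      # Insert ">>LANG<< " at the start of every line.
      sed "s/^/>>$1<< /"
      ;;
    *)
      echo "unsupported target language: $1" >&2
      return 1
      ;;
  esac
}

# Example: tag one Esperanto sentence for translation into English.
printf 'Saluton, mondo!\n' | add_tag eng
# prints: >>eng<< Saluton, mondo!
```

The tag and the sentence are separated by a single space, matching the `sed` expression used in the Inference section.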

Training data

The model was trained using Tatoeba parallel data, with FLORES-200 used as the development set.

Training sentence-pair counts:

ca-eo: 672,931
es-eo: 4,677,945
eo-en: 5,000,000

Inference

Run decoding from inside the model directory:

cat input.epo | sed "s/^/>>cat<< /" | \
  marian-decoder \
  -c tiny.decoder.yml \
  --output output.cat \
  --normalize \
  -m model.npz.best-chrf.npz \
  --vocabs vocab.spm vocab.spm \
  --log decode.log \
  --devices 0
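The same command can be repeated for each supported tag. The function below is a sketch under stated assumptions: `translate_all` and the `MARIAN_DECODER` variable are hypothetical names introduced here (the variable only lets the loop be dry-run with a stand-in command), and the per-language output and log file names are illustrative. The decoder options are the ones from the command above.

```shell
# Sketch: translate one Esperanto file into all three target languages.
# translate_all and MARIAN_DECODER are hypothetical, not part of this repository.
translate_all() {
  input="$1"
  # Falls back to the real marian-decoder binary when no override is set.
  decoder="${MARIAN_DECODER:-marian-decoder}"
  for lang in eng spa cat; do
    # Tag every source line, then decode with the same options as above.
    sed "s/^/>>${lang}<< /" "$input" |
      "$decoder" \
        -c tiny.decoder.yml \
        --output "output.${lang}" \
        --normalize \
        -m model.npz.best-chrf.npz \
        --vocabs vocab.spm vocab.spm \
        --log "decode.${lang}.log" \
        --devices 0 || return 1
  done
}
```

Run it from inside the model directory, for example `translate_all input.epo`. As in the original command, `--devices 0` assumes a GPU with ID 0 is available.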
