Open Machine Translation for Esperanto
Collection: open-source models, datasets, and code for machine translation to and from Esperanto.
This repository contains a multilingual MarianMT model with a tiny architecture for translating from Esperanto into English, Spanish, and Catalan. The target language is selected with a language tag.
This model is not intended for direct inference through the Hugging Face transformers library.
Use the Marian NMT toolkit (marian-decoder) for inference instead.
The repository includes the following files:

- model.npz.best-chrf.npz: trained Marian model checkpoint
- tiny.decoder.yml: decoder configuration
- vocab.spm: SentencePiece vocabulary
- run_model.sh: example script showing how to run the model

You control the target language by prefixing the source sentence with one of the following tags:

- >>eng<< : English
- >>spa<< : Spanish
- >>cat<< : Catalan

The model was trained on Tatoeba parallel data, with FLORES-200 used as the development set.
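To make the tag convention concrete, the following Bash sketch defines a hypothetical helper, `tag_lines`, that prefixes every line read from standard input with the tag for a chosen target language. The function name and its validation step are illustrative assumptions and not part of this repository; the tag format itself (the tag followed by a single space) mirrors the `sed` command in the decoding example below.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with this model): prefix each line on stdin
# with the target-language tag the model expects, e.g. ">>spa<< ".
tag_lines() {
  local lang="$1"
  # Only the three target languages listed above are accepted.
  case "$lang" in
    eng|spa|cat) ;;
    *)
      echo "tag_lines: unsupported target language '$lang' (use eng, spa or cat)" >&2
      return 1
      ;;
  esac
  # Insert the tag at the start of every line.
  sed "s/^/>>${lang}<< /"
}
```

For example, `printf 'Saluton, mondo!\n' | tag_lines spa` prints `>>spa<< Saluton, mondo!`, which is the form of input the decoder should receive for Spanish output.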
Training sentence-pair counts:
To translate an Esperanto file (input.epo) into Catalan, run the following from inside the model directory:
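```bash
# Prefix every Esperanto line with the Catalan tag and pipe it into the decoder.
# --devices 0 selects GPU 0.
cat input.epo | sed "s/^/>>cat<< /" \
  | marian-decoder \
      -c tiny.decoder.yml \
      --output output.cat \
      --normalize \
      -m model.npz.best-chrf.npz \
      --vocabs vocab.spm vocab.spm \
      --log decode.log \
      --devices 0
```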
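To produce all three translations of the same input, the decoding command can be wrapped in a loop over the target tags. This is a minimal sketch, assuming `marian-decoder` is on `PATH` and the script runs inside the model directory. The function name `translate_all` and the per-language file names (`output.eng`, `output.spa`, `output.cat`, `decode.<lang>.log`) are hypothetical choices; the decoder flags are copied unchanged from the command above.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not shipped with this model): translate one Esperanto
# file into English, Spanish and Catalan using the flags from the example above.
translate_all() {
  local input="$1"
  local lang
  for lang in eng spa cat; do
    # Prefix every source line with the target-language tag, then decode.
    sed "s/^/>>${lang}<< /" "$input" \
      | marian-decoder \
          -c tiny.decoder.yml \
          --output "output.${lang}" \
          --normalize \
          -m model.npz.best-chrf.npz \
          --vocabs vocab.spm vocab.spm \
          --log "decode.${lang}.log" \
          --devices 0 \
      || return 1   # stop at the first language whose decoding fails
  done
}
```

Usage: `translate_all input.epo`. Each language runs as a separate `marian-decoder` invocation, so the model is loaded once per language, and each output file stays aligned line by line with `input.epo`.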