---
license: mit
language:
- yua
metrics:
- spearman rho
tags:
- maya
- yucatec maya
- embeddings
- low-resource languages
- indigenous languages
---

# maya2vec

maya2vec is a word-embedding model for Yucatec Maya.

The embeddings have 512 dimensions and were trained with the Skip-gram with Negative Sampling (SGNS) algorithm on data from La Jornada Maya (under a collaboration agreement), CENTROGEO - SEDECULTA phrases (referenced in Agreement SEDECULTA-DASJ-149-04-2024), and the [T'aantsil corpus project](https://taantsil.com.mx/info).
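
To make the SGNS objective concrete, here is a minimal sketch of the loss for a single (center, context) pair in plain Python. The toy vectors, the helper names (`sigmoid`, `dot`, `sgns_pair_loss`), and the choice of two negative samples are illustrative assumptions, not values or code from the maya2vec training run.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def sgns_pair_loss(center, context, negatives):
    """SGNS loss for one (center, context) pair with sampled negatives.

    loss = -log sigmoid(context . center)
           - sum over negatives of log sigmoid(-negative . center)
    """
    loss = -math.log(sigmoid(dot(context, center)))
    for neg in negatives:
        loss -= math.log(sigmoid(-dot(neg, center)))
    return loss


# Hypothetical 3-dimensional toy vectors (the real embeddings have 512 dimensions).
center = [0.5, -0.2, 0.1]
context = [0.4, -0.1, 0.3]
negatives = [[-0.3, 0.6, 0.0], [0.1, 0.2, -0.5]]

loss = sgns_pair_loss(center, context, negatives)
print(f"SGNS loss for the toy pair: {loss:.4f}")
```

The loss shrinks as the true context vector aligns with the center vector and as the sampled negatives point away from it. In gensim 4.x, passing `sg=1` together with a positive `negative` value to `Word2Vec` selects this objective.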

## Dependencies

Install gensim version 4.0 or later.

```
$ pip install "gensim>=4.0"
```
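
If it is unclear which gensim version is installed, a small check like the one below can help. This is only a sketch using the standard library; `parse_version` and `meets_minimum` are hypothetical helper names introduced here, and the parsing deliberately ignores suffixes such as `rc1`.

```python
from importlib import metadata


def parse_version(version: str) -> tuple:
    """Turn '4.3.2' into (4, 3, 2), ignoring non-numeric suffixes such as 'rc1'."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(version: str, minimum: tuple = (4, 0)) -> bool:
    """Return True if `version` is at least `minimum` (default: 4.0)."""
    parsed = parse_version(version)
    # Pad short versions such as '4' with zeros so tuple comparison behaves.
    padded = parsed + (0,) * max(0, len(minimum) - len(parsed))
    return padded >= minimum


try:
    installed = metadata.version("gensim")
    print(f"gensim {installed} installed; meets 4.0 minimum: {meets_minimum(installed)}")
except metadata.PackageNotFoundError:
    print("gensim is not installed")
```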

## Usage

See `usage.py`:

```
from gensim.models import Word2Vec

# Path to the trained model file
maya2vec = './model_512_60_5_-0.25_0.7308_3.35E-05'

# Load the trained model
model = Word2Vec.load(maya2vec)

# Cosine similarity between peek' (dog) and waalak' (standing)
sim = model.wv.similarity("peek'", "waalak'")
print(f"Similarity between 'peek'' and 'waalak'': {sim:.4f}")

# Expected output:
# Similarity between 'peek'' and 'waalak'': 0.9583
```
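
`model.wv.similarity` returns the cosine similarity between the two word vectors. As a rough illustration of what that number measures, the sketch below computes cosine similarity by hand. The function name `cosine_similarity` and the three toy vectors are assumptions made for this example; they are not maya2vec vectors.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors; real maya2vec vectors come from model.wv[word].
vec_a = [1.0, 2.0, 3.0]
vec_b = [2.0, 4.0, 6.0]   # same direction as vec_a
vec_c = [3.0, 0.0, -1.0]  # orthogonal to vec_a

print(cosine_similarity(vec_a, vec_b))  # close to 1: same direction
print(cosine_similarity(vec_a, vec_c))  # close to 0: unrelated directions
```

With a loaded model, the same function can be applied to `model.wv["peek'"]` and `model.wv["waalak'"]`.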

If you use maya2vec, please cite the paper: https://journal.iberamia.org/index.php/intartif/article/view/2119

```
Molina-Villegas, A., et al. (2025). Generating a Culturally and Linguistically Adapted Word Similarity Benchmark for Yucatec Maya. Inteligencia Artificial, 28(76), 283–300. https://doi.org/10.4114/intartif.vol28iss76pp283-300

@article{maya2vec,
  title={Generating a Culturally and Linguistically Adapted Word Similarity Benchmark for Yucatec Maya},
  author={Molina-Villegas, Alejandro and Suro-Villalobos, Joel and Reyes-Magaña, Jorge and Fernandez-Sabido, Silvia},
  journal={Inteligencia Artificial},
  volume={28},
  number={76},
  pages={283--300},
  year={2025},
  publisher={IBERAMIA},
  doi={10.4114/intartif.vol28iss76pp283-300}
}
```

## License

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.