## Introduction

LLMTrad-IBE is a research project that addresses the digital divide affecting the minority Romance languages of the Iberian Peninsula. Using state-of-the-art Natural Language Processing (NLP), we aim to ensure that these languages are not left behind in the era of Artificial Intelligence.

The project is part of the AI-TraLow coordinated framework (AI-Driven Translation for Low-Resource Languages and Cultures), supported by the Spanish Ministry of Science, Innovation, and Universities (MCIU/AEI/10.13039/501100011033/FEDER, UE) under reference PID2024-158157OB-C33.
## Mission and Scope

Our research focuses on the development, adaptation, and evaluation of Large Language Models (LLMs) for four language varieties with limited digital resources:

* Asturian
* Aragonese
* Aranese
* Eonavian
## Strategic Research Areas

We use a hybrid methodology that combines the structural precision of symbolic systems with the generative power of neural architectures:

* LLM Specialization: Fine-tuning decoder-only architectures and exploring parameter-efficient fine-tuning (PEFT) strategies for translation.
* Knowledge Distillation: Developing compact, efficient models that can be deployed sustainably in standard computing environments.
* Resource Synthesis: Expanding Apertium-based lexical resources and curating high-quality benchmarks, including FLORES+ and NTREX adaptations.
* Ethical AI: Implementing rigorous evaluation frameworks to detect and mitigate gender bias and ensure linguistic authenticity.
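
As a rough illustration of the knowledge distillation item above, the sketch below computes a generic soft-target distillation loss: the KL divergence between temperature-softened teacher and student distributions, scaled by the squared temperature. It is not the project's training code. The function names, the temperature value, and the toy logits are all assumptions made for this example, and it uses only the Python standard library.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Hypothetical helper for illustration only; a real setup would
    usually combine this term with a standard cross-entropy loss.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Toy logits over a three-token vocabulary (made-up numbers).
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]
far_student = [0.5, 1.0, 4.0]

print("close student loss:", distillation_loss(close_student, teacher))
print("far student loss:  ", distillation_loss(far_student, teacher))
```

A student whose predictions resemble the teacher's gets a lower loss than one that disagrees with it, which is the signal a compact model is trained on during distillation.
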
## Collaborative Network

LLMTrad-IBE is a collaboration between the following academic institutions:

* Universitat Oberta de Catalunya (UOC): Coordinating Institution
* Universitat Autònoma de Barcelona (UAB)
* Universidad de Oviedo
* Universidad de Zaragoza
## Commitment to Open Science

As part of our commitment to the scientific community and linguistic heritage, all models, datasets, and tools developed within this project are released under permissive open-source licenses.