Work in progress: new versions will be made available over the coming weeks and months.
Sampling parameters: For optimal performance, we recommend using temperatures close to zero (0 - 0.2). We also advise against using any type of repetition penalty, as in our experience it negatively impacts the instructed model's responses.
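As a sketch, the recommendations above translate into generation settings like the following. The parameter names are illustrative and should be adapted to your serving stack (`repeat_penalty` is llama.cpp's name for the repetition penalty, where 1.0 means disabled):

```python
# Recommended generation settings: low temperature, no repetition penalty.
# Parameter names are illustrative; adapt them to your serving stack.

def recommended_sampling_params():
    """Return generation settings matching the model card's advice."""
    return {
        "temperature": 0.1,     # keep within the recommended 0 - 0.2 range
        "repeat_penalty": 1.0,  # llama.cpp convention: 1.0 disables the penalty
    }

params = recommended_sampling_params()
print(params)
```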
ALIA-40b-instruct-GGUF
- Model creator: BSC-LT
- Original model: ALIA-40b-instruct-2601
Description
This repo contains GGUF format model files for ALIA-40b-instruct-2601.
Quantization
Model weights were first exported to GGUF in FP16 and then quantized with llama.cpp's llama-quantize into the target preset (e.g., Q8). This conversion and quantization pipeline was driven by quantool from a single YAML config (specifying method: gguf, quant_level, and the method-specific quantization_config) and run via `quantool config.yaml`.
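A hypothetical config along those lines is sketched below. Only the keys named above come from this card; the exact schema and accepted values are defined by quantool itself:

```yaml
# config.yaml -- sketch of a quantool configuration (schema assumed)
method: gguf
quant_level: Q8          # target preset; accepted values depend on quantool
quantization_config:     # method-specific options would go here
  {}
```

Running `quantool config.yaml` would then execute the conversion and quantization in one step.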
About GGUF
GGUF is the model file format introduced by the llama.cpp team on August 21st, 2023, replacing the older GGML format (now deprecated). It brings significant improvements such as enhanced tokenization, proper handling of special tokens, embedded metadata (e.g., architecture, quantization type, tokenizer), and an extensible design for future compatibility.
How to use
Deployment as service and remote use (Messages API)
In our experience, vllm works well for deploying the full unquantized version of the model, whereas llama.cpp is appropriate for the quantized (GGUF) version.
We strongly discourage using ollama as we have encountered compatibility issues that may seriously degrade the model's performance.
The easiest and most reliable way to get a working deployment of ALIA-40b-instruct is through the "Deploy / HF Inference Endpoints" option directly on the Hugging Face model page. This automatically creates a functioning endpoint, using vllm or llama.cpp according to the model variant, with an appropriately dimensioned GPU. While additional endpoint settings are available, we found the standard configuration proposed by Hugging Face to be a reasonable starting point.
Once the endpoint is running, the model can be easily called using OpenAI's "Messages API" (the de facto standard API for LLM use). By using this API the chat template is applied automatically by the service, requiring no explicit configuration on the client side. The endpoint's configuration page on Hugging Face also provides a "Playground" for testing and API examples, as well as a simple chat interface.
Example usage:
```python
# pip install openai
import os

from openai import OpenAI

client = OpenAI(
    base_url=YOUR_ENDPOINT_URL,  # set to your endpoint URL
    api_key=os.environ["HF_TOKEN"],
)

chat_completion = client.chat.completions.create(
    model="BSC-LT/ALIA-40b-instruct-2601-GGUF",
    messages=[
        {"role": "user", "content": "What is deep learning?"},
    ],
    stream=True,
    max_tokens=1000,
    temperature=0.1,
)

# With stream=True the response arrives as incremental chunks,
# so iterate over them instead of indexing a single completion.
for chunk in chat_completion:
    print(chunk.choices[0].delta.content or "", end="")
```
The model can also be deployed locally or on any server infrastructure with sufficient GPUs, using vllm or llama.cpp. We recommend an initial deployment on Hugging Face as a point of reference and comparison to make sure the model is behaving as expected in the desired deployment setup.
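For a local llama.cpp deployment, llama-server exposes an OpenAI-compatible endpoint (by default at http://localhost:8080). The sketch below only builds the JSON request body for `/v1/chat/completions`; the URL, port, and helper name are assumptions, and the actual POST can be done with any HTTP client:

```python
import json

# llama-server's default address; adjust to your local setup.
BASE_URL = "http://localhost:8080/v1"

def build_chat_request(prompt, temperature=0.1, max_tokens=1000):
    """Build the JSON body for an OpenAI-compatible chat completion request."""
    return {
        "model": "BSC-LT/ALIA-40b-instruct-2601-GGUF",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }

body = json.dumps(build_chat_request("What is deep learning?"))
# POST `body` to f"{BASE_URL}/chat/completions" with any HTTP client.
print(body[:60])
```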
To check that your endpoint is working correctly, you can try replicating the examples contained in this Colab Notebook.
Evaluation
Gold-standard benchmarks
Evaluation is done using the Language Model Evaluation Harness (Gao et al., 2024). We evaluate on a set of tasks taken from SpanishBench, CatalanBench, BasqueBench and GalicianBench, as well as existing English tasks available in the LM Evaluation Harness. These benchmarks include both new and existing tasks and datasets. The tables below report results for a representative selection of evaluation datasets, capturing the model's performance across a variety of tasks within these benchmarks.
Only tasks that are human-generated, human-translated, or involve a strong human-in-the-loop process (i.e., machine translation followed by professional revision, or machine generation followed by human revision and annotation) were used. This explains the variation in the number of tasks reported across languages. As additional high-quality tasks are published, we will update the evaluation results accordingly. We also plan to expand evaluation to other languages, provided that the datasets meet our quality standards.
During the implementation of the evaluation, we observed a series of issues worth considering when replicating and interpreting the results presented. These include variances of ≈1.5% in performance on some tasks depending on the version of the transformers library used and on whether tensor parallelism is used when loading a model. When implementing existing tasks, we carry out a comprehensive quality evaluation of the dataset, the Harness task itself, and the kind of input models see during evaluation. Our implementation (see links above) addresses multiple existing problems, such as errors in datasets and prompts and a lack of pre-processing. All this means that results will differ when using other Harness implementations, and may vary slightly depending on the replication setup.
It should be noted that these results are subject to all the drawbacks of every current gold-standard evaluation, and that the figures do not fully represent the model's capabilities and potential. We thus advise caution when reading and interpreting the results.
All results reported below correspond to a 0-shot evaluation setting.
Spanish
| task | metric | result |
|---|---|---|
| belebele_spa_Latn | acc | 0.73 |
| cocoteros_es | bleu | 0.02 |
| cocoteros_es | rouge1 | 0.31 |
| copa_es | acc | 0.75 |
| escola | mcc | 0 |
| flores_es | bleu | 0.25 |
| mmmlu_es | acc | 0.38 |
| openbookqa_es | acc | 0.22 |
| paws_es_spanish_bench | acc | 0.32 |
| phrases_es-va | bleu | 0.57 |
| phrases_va-es | bleu | 0.66 |
| wnli_es | acc | 0.01 |
| xlsum_es | bleu | 0.02 |
| xnli_es_spanish_bench | acc | 0.27 |
| xquad_es | f1 | 0.07 |
| xstorycloze_es | acc | 0.51 |
Catalan
| task | metric | result |
|---|---|---|
| arc_ca_challenge | acc | 0.34 |
| arc_ca_easy | acc | 0.67 |
| belebele_cat_Latn | acc | 0.73 |
| cabreu_abstractive | bleu | 0.06 |
| cabreu_extractive | rouge1 | 0.19 |
| cabreu_extreme | bleu | 0.03 |
| catalanqa | f1 | 0.11 |
| catcola | mcc | 0 |
| cocoteros_va | bleu | 0.01 |
| cocoteros_va | rouge1 | 0.3 |
| copa_ca | acc | 0.74 |
| flores_ca | bleu | 0.31 |
| mgsm_direct_ca | exact_match,flexible-extract | 0.64 |
| openbookqa_ca | acc | 0.2 |
| parafraseja | acc | 0.34 |
| paws_ca | acc | 0.41 |
| phrases_ca-va | bleu | 0.73 |
| phrases_va-ca | bleu | 0.83 |
| piqa_ca | acc | 0.54 |
| siqa_ca | acc | 0.27 |
| teca | acc | 0.39 |
| wnli_ca | acc | 0.32 |
| xquad_ca | f1 | 0.1 |
| xstorycloze_ca | acc | 0.52 |
Basque
| task | metric | result |
|---|---|---|
| arc_eu_challenge | acc | 0.24 |
| arc_eu_easy | acc | 0.54 |
| belebele_eus_Latn | acc | 0.67 |
| eus_proficiency | acc | 0.33 |
| eus_reading | acc | 0.47 |
| eus_trivia | acc | 0.49 |
| flores_eu | bleu | 0.18 |
| mgsm_native_cot_eu | exact_match,get-answer | 0 |
| paws_eu | acc | 0.34 |
| piqa_eu | acc | 0.42 |
| qnlieu | acc | 0.16 |
| wnli_eu | acc | -0.13 |
| xcopa_eu | acc | 0.51 |
| xnli_eu | acc | 0.32 |
| xnli_eu_native | acc | 0.3 |
| xstorycloze_eu | acc | 0.37 |
Galician
| task | metric | result |
|---|---|---|
| belebele_glg_Latn | acc | 0.75 |
| flores_gl | bleu | 0.28 |
| galcola | mcc | 0 |
| openbookqa_gl | acc | 0.15 |
| parafrases_gl | acc | 0.18 |
| paws_gl | acc | 0.39 |
| summarization_gl | bleu | 0.03 |
| xnli_gl | acc | 0.38 |
| xstorycloze_gl | acc | 0.48 |
English
| task | metric | result |
|---|---|---|
| arc_challenge | acc | 0.38 |
| arc_easy | acc | 0.73 |
| belebele_eng_Latn | acc | 0.77 |
| cola | mcc | 0 |
| copa | acc | 0.78 |
| hellaswag | acc | 0.54 |
| hellaswag | acc_norm | -0.32 |
| mmlu | acc | 0.47 |
| openbookqa | acc | 0.2 |
| paws_en | acc | 0.41 |
| piqa | acc | 0.64 |
| social_iqa | acc | 0.24 |
| truthfulqa_mc1 | acc | 0.17 |
| truthfulqa_mc2 | acc | 0.4 |
| wnli | acc | 0.38 |
| xnli_en_iberobench | acc | 0.37 |
| xquad_en | f1 | 0.14 |
| xstorycloze_en | acc | 0.59 |
Additional information
Author
The Language Technologies Lab at the Barcelona Supercomputing Center.
Contact
For further information, please send an email to langtech@bsc.es.
Copyright
Copyright (c) 2026 by the Language Technologies Lab, Barcelona Supercomputing Center.
Funding
This work is funded by the Ministerio para la Transformación Digital y de la Función Pública (funded by the EU through NextGenerationEU) within the framework of the project Modelos del Lenguaje.
This work has been promoted and supported by the Government of Catalonia through the Aina Project.
Acknowledgements
This project has benefited from the contributions of numerous teams and institutions, mainly through data contributions, knowledge transfer, and technical support.
We are especially grateful to our ILENIA project partners: CENID, HiTZ and CiTIUS for their participation. We also extend our genuine gratitude to the Spanish Senate and Congress, Fundación Dialnet, and the ‘Instituto Universitario de Sistemas Inteligentes y Aplicaciones Numéricas en Ingeniería (SIANI)’ of the University of Las Palmas de Gran Canaria. Many other institutions have been involved in the project. Our thanks to Òmnium Cultural, Parlament de Catalunya, Institut d'Estudis Aranesos, Racó Català, Vilaweb, ACN, Nació Digital, El món and Aquí Berguedà. We thank the Welsh government, DFKI, Occiglot project, especially Malte Ostendorff, and The Common Crawl Foundation, especially Pedro Ortiz, for their collaboration.
We would also like to give special thanks to the NVIDIA team, with whom we have met regularly, especially to: Ignacio Sarasua, Adam Henryk Grzywaczewski, Oleg Sudakov, Sergio Perez, Miguel Martinez, Felipe Soares and Meriem Bendris. Their constant support has been especially appreciated throughout the entire process.
Their valuable efforts have been instrumental in the development of this work.
Disclaimer
Be aware that the model may contain biases or other unintended distortions. When third parties deploy systems or provide services based on this model, or use the model themselves, they bear the responsibility for mitigating any associated risks and ensuring compliance with applicable regulations, including those governing the use of Artificial Intelligence.
The Barcelona Supercomputing Center, as the owner and creator of the model, shall not be held liable for any outcomes resulting from third-party use.
Citation
```bibtex
@misc{gonzalezagirre2025salamandratechnicalreport,
  title={Salamandra Technical Report},
  author={Aitor Gonzalez-Agirre and Marc Pàmies and Joan Llop and Irene Baucells and Severino Da Dalt and Daniel Tamayo and José Javier Saiz and Ferran Espuña and Jaume Prats and Javier Aula-Blasco and Mario Mina and Adrián Rubio and Alexander Shvets and Anna Sallés and Iñaki Lacunza and Iñigo Pikabea and Jorge Palomar and Júlia Falcão and Lucía Tormo and Luis Vasquez-Reina and Montserrat Marimon and Valle Ruíz-Fernández and Marta Villegas},
  year={2025},
  eprint={2502.08489},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2502.08489},
}
```