
Work in progress: new versions will be made available over the coming weeks and months.

Sampling parameters: for optimal performance, we recommend temperatures close to zero (0–0.2). We also advise against using any type of repetition penalty, as in our experience it negatively impacts the instructed model's responses.

ALIA-40b-instruct-GGUF

Description

This repo contains GGUF format model files for ALIA-40b-instruct-2601.

Quantization

Model weights were first exported to GGUF in FP16 and then quantized with llama.cpp's llama-quantize into the target preset (e.g., Q8). The same conversion and quantization pipeline was executed through quantool from a single YAML config (e.g., method: gguf, quant_level, and a method-specific quantization_config) and run via quantool config.yaml.
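The pipeline above can be sketched as a quantool config. The field names follow the description above, but the exact schema is an assumption and may differ:

```yaml
# Hypothetical quantool config for GGUF export + quantization.
# Field names follow the description above; the exact schema may differ.
method: gguf                 # conversion backend
quant_level: Q8_0            # target llama.cpp quantization preset
quantization_config:         # method-specific options (illustrative)
  intermediate_dtype: f16    # export to FP16 before quantizing
```

The config would then be executed with `quantool config.yaml`.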

About GGUF

GGUF is the model file format introduced by the llama.cpp team on August 21st, 2023, replacing the older GGML format (now deprecated). It brings significant improvements such as enhanced tokenization, proper handling of special tokens, embedded metadata (e.g., architecture, quantization type, tokenizer), and an extensible design for future compatibility.

How to use

Deployment as a service and remote use (Messages API)

In our experience, vLLM works well for deploying the full unquantized version of the model, while llama.cpp is appropriate for the quantized (GGUF) version. We strongly discourage using Ollama, as we have encountered compatibility issues that may seriously degrade the model's performance.

The easiest and most reliable way to get a working deployment of ALIA-40b-instruct is through the "Deploy / HF Inference Endpoints" option directly on the Hugging Face model page. This automatically creates a functioning endpoint, using vLLM or llama.cpp according to the model variant, with an appropriately sized GPU. While additional settings are available for the endpoint, we found the standard configuration proposed by Hugging Face to be a reasonable starting point.

Once the endpoint is running, the model can be easily called using OpenAI's "Messages API" (the de facto standard API for LLM use). By using this API the chat template is applied automatically by the service, requiring no explicit configuration on the client side. The endpoint's configuration page on Hugging Face also provides a "Playground" for testing and API examples, as well as a simple chat interface.

Example usage:

# pip install openai

from openai import OpenAI

client = OpenAI(
    base_url=YOUR_ENDPOINT_URL,
    api_key="$HF_TOKEN"
)

chat_completion = client.chat.completions.create(
    model="BSC-LT/ALIA-40b-instruct-2601-GGUF",
    messages=[
        {
            "role": "user",
            "content": "What is deep learning?"
        }
    ],
    stream=True,
    max_tokens=1000,
    temperature=0.1
)

# With stream=True the response is an iterator of chunks rather than a single
# message, so the text is read from each chunk's delta as it arrives.
for chunk in chat_completion:
    print(chunk.choices[0].delta.content or "", end="")

The model can also be deployed locally, or on any server infrastructure with sufficient GPUs, using vLLM or llama.cpp. We recommend an initial deployment on Hugging Face as a point of reference to verify that the model behaves as expected in the desired deployment setup.
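For a local llama.cpp deployment, a minimal sketch looks like the following. The model file name and the `-ngl` (GPU layer offload) value are illustrative and should be adjusted to your files and hardware:

```shell
# Serve the quantized model with llama.cpp's OpenAI-compatible server.
# File name and -ngl value are illustrative.
./llama-server -m alia-40b-instruct-2601.Q8_0.gguf --port 8080 -ngl 99

# The server then accepts OpenAI-style requests at http://localhost:8080/v1,
# so the Messages API example above works with base_url pointed there.
```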

To check that your endpoint is working correctly, you can try to replicate the examples contained in this Colab Notebook.


Evaluation

Gold-standard benchmarks

Evaluation is done using the Language Model Evaluation Harness (Gao et al., 2024). We evaluate on a set of tasks taken from SpanishBench, CatalanBench, BasqueBench and GalicianBench, as well as existing English tasks available in the LM Evaluation Harness. These benchmarks include both new and existing tasks and datasets. The tables below report results for a representative selection of evaluation datasets, capturing the model's performance across a variety of tasks within these benchmarks.

Only tasks that are human-generated, human-translated, or involve a strong human-in-the-loop process (i.e., machine translation followed by professional revision, or machine generation followed by human revision and annotation) were used. This explains the variation in the number of tasks reported across languages. As additional high-quality tasks are published, we will update the evaluation results accordingly. We also plan to expand evaluation to other languages, provided that the datasets meet our quality standards.

During the implementation of the evaluation, we observed a number of issues worth considering when replicating and interpreting the results. These include variances of ≈1.5% in performance on some tasks depending on the version of the transformers library used, and on whether tensor parallelism is enabled when loading the model. When implementing existing tasks, we carry out a comprehensive quality evaluation of the dataset, of the Harness task itself, and of the inputs models actually see during evaluation. Our implementation (see links above) addresses multiple existing problems, such as errors in datasets and prompts and missing pre-processing. As a result, figures will vary if other Harness implementations are used, and may vary slightly depending on the replication setup.

It should be noted that these results are subject to all the drawbacks of every current gold-standard evaluation, and that the figures do not fully represent the model's capabilities and potential. We thus advise caution when reading and interpreting the results.

All results reported below correspond to a 0-shot evaluation setting.
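As a sketch, a 0-shot run of one of the tasks below with the Harness CLI might look like the following. This assumes the lm-evaluation-harness is installed together with the corresponding task implementations; the task name and model identifier are taken from this card:

```shell
# Hypothetical 0-shot evaluation of one SpanishBench task with the Harness.
# Requires lm-evaluation-harness with the belebele_spa_Latn task available.
lm_eval --model hf \
  --model_args pretrained=BSC-LT/ALIA-40b-instruct-2601 \
  --tasks belebele_spa_Latn \
  --num_fewshot 0 \
  --batch_size auto
```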

Spanish

| Task | Metric | Result |
|---|---|---|
| belebele_spa_Latn | acc | 0.73 |
| cocoteros_es | bleu | 0.02 |
| cocoteros_es | rouge1 | 0.31 |
| copa_es | acc | 0.75 |
| escola | mcc | 0 |
| flores_es | bleu | 0.25 |
| mmmlu_es | acc | 0.38 |
| openbookqa_es | acc | 0.22 |
| paws_es_spanish_bench | acc | 0.32 |
| phrases_es-va | bleu | 0.57 |
| phrases_va-es | bleu | 0.66 |
| wnli_es | acc | 0.01 |
| xlsum_es | bleu | 0.02 |
| xnli_es_spanish_bench | acc | 0.27 |
| xquad_es | f1 | 0.07 |
| xstorycloze_es | acc | 0.51 |

Catalan

| Task | Metric | Result |
|---|---|---|
| arc_ca_challenge | acc | 0.34 |
| arc_ca_easy | acc | 0.67 |
| belebele_cat_Latn | acc | 0.73 |
| cabreu_abstractive | bleu | 0.06 |
| cabreu_extractive | rouge1 | 0.19 |
| cabreu_extreme | bleu | 0.03 |
| catalanqa | f1 | 0.11 |
| catcola | mcc | 0 |
| cocoteros_va | bleu | 0.01 |
| cocoteros_va | rouge1 | 0.3 |
| copa_ca | acc | 0.74 |
| flores_ca | bleu | 0.31 |
| mgsm_direct_ca | exact_match,flexible-extract | 0.64 |
| openbookqa_ca | acc | 0.2 |
| parafraseja | acc | 0.34 |
| paws_ca | acc | 0.41 |
| phrases_ca-va | bleu | 0.73 |
| phrases_va-ca | bleu | 0.83 |
| piqa_ca | acc | 0.54 |
| siqa_ca | acc | 0.27 |
| teca | acc | 0.39 |
| wnli_ca | acc | 0.32 |
| xquad_ca | f1 | 0.1 |
| xstorycloze_ca | acc | 0.52 |

Basque

| Task | Metric | Result |
|---|---|---|
| arc_eu_challenge | acc | 0.24 |
| arc_eu_easy | acc | 0.54 |
| belebele_eus_Latn | acc | 0.67 |
| eus_proficiency | acc | 0.33 |
| eus_reading | acc | 0.47 |
| eus_trivia | acc | 0.49 |
| flores_eu | bleu | 0.18 |
| mgsm_native_cot_eu | exact_match,get-answer | 0 |
| paws_eu | acc | 0.34 |
| piqa_eu | acc | 0.42 |
| qnlieu | acc | 0.16 |
| wnli_eu | acc | -0.13 |
| xcopa_eu | acc | 0.51 |
| xnli_eu | acc | 0.32 |
| xnli_eu_native | acc | 0.3 |
| xstorycloze_eu | acc | 0.37 |

Galician

| Task | Metric | Result |
|---|---|---|
| belebele_glg_Latn | acc | 0.75 |
| flores_gl | bleu | 0.28 |
| galcola | mcc | 0 |
| openbookqa_gl | acc | 0.15 |
| parafrases_gl | acc | 0.18 |
| paws_gl | acc | 0.39 |
| summarization_gl | bleu | 0.03 |
| xnli_gl | acc | 0.38 |
| xstorycloze_gl | acc | 0.48 |

English

| Task | Metric | Result |
|---|---|---|
| arc_challenge | acc | 0.38 |
| arc_easy | acc | 0.73 |
| belebele_eng_Latn | acc | 0.77 |
| cola | mcc | 0 |
| copa | acc | 0.78 |
| hellaswag | acc | 0.54 |
| hellaswag | acc_norm | -0.32 |
| mmlu | acc | 0.47 |
| openbookqa | acc | 0.2 |
| paws_en | acc | 0.41 |
| piqa | acc | 0.64 |
| social_iqa | acc | 0.24 |
| truthfulqa_mc1 | acc | 0.17 |
| truthfulqa_mc2 | acc | 0.4 |
| wnli | acc | 0.38 |
| xnli_en_iberobench | acc | 0.37 |
| xquad_en | f1 | 0.14 |
| xstorycloze_en | acc | 0.59 |

Additional information

Author

The Language Technologies Lab from Barcelona Supercomputing Center.

Contact

For further information, please send an email to langtech@bsc.es.

Copyright

Copyright (c) 2026 by the Language Technologies Lab, Barcelona Supercomputing Center.

Funding

This work is funded by the Ministerio para la Transformación Digital y de la Función Pública - Funded by EU – NextGenerationEU within the framework of the project Modelos del Lenguaje.

This work has been promoted and supported by the Government of Catalonia through the Aina Project.

Acknowledgements

This project has benefited from the contributions of numerous teams and institutions, mainly through data contributions, knowledge transfer or technical support.

We are especially grateful to our ILENIA project partners: CENID, HiTZ and CiTIUS for their participation. We also extend our genuine gratitude to the Spanish Senate and Congress, Fundación Dialnet, and the ‘Instituto Universitario de Sistemas Inteligentes y Aplicaciones Numéricas en Ingeniería (SIANI)’ of the University of Las Palmas de Gran Canaria. Many other institutions have been involved in the project. Our thanks to Òmnium Cultural, Parlament de Catalunya, Institut d'Estudis Aranesos, Racó Català, Vilaweb, ACN, Nació Digital, El món and Aquí Berguedà. We thank the Welsh government, DFKI, Occiglot project, especially Malte Ostendorff, and The Common Crawl Foundation, especially Pedro Ortiz, for their collaboration.

We would also like to give special thanks to the NVIDIA team, with whom we have met regularly, especially to: Ignacio Sarasua, Adam Henryk Grzywaczewski, Oleg Sudakov, Sergio Perez, Miguel Martinez, Felipe Soares and Meriem Bendris. Their constant support has been especially appreciated throughout the entire process.

Their valuable efforts have been instrumental in the development of this work.

Disclaimer

Be aware that the model may contain biases or other unintended distortions. When third parties deploy systems or provide services based on this model, or use the model themselves, they bear the responsibility for mitigating any associated risks and ensuring compliance with applicable regulations, including those governing the use of Artificial Intelligence.

The Barcelona Supercomputing Center, as the owner and creator of the model, shall not be held liable for any outcomes resulting from third-party use.

Citation

@misc{gonzalezagirre2025salamandratechnicalreport,
      title={Salamandra Technical Report}, 
      author={Aitor Gonzalez-Agirre and Marc Pàmies and Joan Llop and Irene Baucells and Severino Da Dalt and Daniel Tamayo and José Javier Saiz and Ferran Espuña and Jaume Prats and Javier Aula-Blasco and Mario Mina and Adrián Rubio and Alexander Shvets and Anna Sallés and Iñaki Lacunza and Iñigo Pikabea and Jorge Palomar and Júlia Falcão and Lucía Tormo and Luis Vasquez-Reina and Montserrat Marimon and Valle Ruíz-Fernández and Marta Villegas},
      year={2025},
      eprint={2502.08489},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.08489}, 
}

License

Apache License, Version 2.0

Model Index

| Model | Base | Instruct |
|---|---|---|
| 2b | Link | Link |
| 7b | Link | Link |
| 40b | Link | Link |