
Work in progress: new versions will be made available over the coming weeks and months.

Sampling parameters: for optimal performance, we recommend temperatures close to zero (0–0.2). We also advise against using any type of repetition penalty, as in our experience it negatively impacts the instructed model's responses.

ALIA-40b-instruct-GGUF

Description

This repo contains GGUF format model files for ALIA-40b-instruct-2601.

Quantization

Model weights were first exported to GGUF in FP16 and then quantized with llama.cpp's llama-quantize into the target preset (e.g., Q8). The same conversion and quantization pipeline was executed through quantool from a single YAML config (e.g., method: gguf, quant_level, and a method-specific quantization_config) and run via quantool config.yaml.
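The pipeline above can be sketched as a quantool config. The field names follow the description above, but the exact schema is an assumption and may differ:

```yaml
# Hypothetical quantool config for GGUF export + quantization.
# Field names follow the description above; the exact schema may differ.
method: gguf                 # conversion backend
quant_level: Q8_0            # target llama.cpp quantization preset
quantization_config:         # method-specific options (illustrative)
  intermediate_dtype: f16    # export to FP16 before quantizing
```

The config would then be executed with `quantool config.yaml`.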

About GGUF

GGUF is the model file format introduced by the llama.cpp team on August 21st, 2023, replacing the older GGML format (now deprecated). It brings significant improvements such as enhanced tokenization, proper handling of special tokens, embedded metadata (e.g., architecture, quantization type, tokenizer), and an extensible design for future compatibility.

How to use

Deployment as a service and remote use (Messages API)

In our experience, vLLM works well for deploying the full unquantized version of the model, while llama.cpp is appropriate for the quantized (GGUF) version. We strongly discourage using Ollama, as we have encountered compatibility issues that may seriously degrade the model's performance.

The easiest and most reliable way to get a working deployment of ALIA-40b-instruct is through the "Deploy / HF Inference Endpoints" option directly on the Hugging Face model page. This automatically creates a functioning endpoint, using vLLM or llama.cpp according to the model variant, with an appropriately sized GPU. While additional settings are available for the endpoint, we found the standard configuration proposed by Hugging Face to be a reasonable starting point.

Once the endpoint is running, the model can be easily called using OpenAI's "Messages API" (the de facto standard API for LLM use). By using this API the chat template is applied automatically by the service, requiring no explicit configuration on the client side. The endpoint's configuration page on Hugging Face also provides a "Playground" for testing and API examples, as well as a simple chat interface.

Example usage:

# pip install openai

from openai import OpenAI

client = OpenAI(
    base_url=YOUR_ENDPOINT_URL,
    api_key="$HF_TOKEN"
)

chat_completion = client.chat.completions.create(
    model="BSC-LT/ALIA-40b-instruct-2601-GGUF",
    messages=[
        {
            "role": "user",
            "content": "What is deep learning?"
        }
    ],
    stream=True,
    max_tokens=1000,
    temperature=0.1
)

# With stream=True the response is an iterator of chunks rather than a single
# message, so the text is read from each chunk's delta as it arrives.
for chunk in chat_completion:
    print(chunk.choices[0].delta.content or "", end="")

The model can also be deployed locally, or on any server infrastructure with sufficient GPUs, using vLLM or llama.cpp. We recommend an initial deployment on Hugging Face as a point of reference to verify that the model behaves as expected in the desired deployment setup.
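For a local llama.cpp deployment, a minimal sketch looks like the following. The model file name and the `-ngl` (GPU layer offload) value are illustrative and should be adjusted to your files and hardware:

```shell
# Serve the quantized model with llama.cpp's OpenAI-compatible server.
# File name and -ngl value are illustrative.
./llama-server -m alia-40b-instruct-2601.Q8_0.gguf --port 8080 -ngl 99

# The server then accepts OpenAI-style requests at http://localhost:8080/v1,
# so the Messages API example above works with base_url pointed there.
```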

To check that your endpoint is working correctly, you can try to replicate the examples contained in this Colab Notebook.


Evaluation

Gold-standard benchmarks

Evaluation is done using the Language Model Evaluation Harness (Gao et al., 2024). We evaluate on a set of tasks taken from SpanishBench, CatalanBench, BasqueBench and GalicianBench, as well as existing English tasks available in the LM Evaluation Harness. These benchmarks include both new and existing tasks and datasets. The tables below report results for a representative selection of evaluation datasets, capturing the model's performance across a variety of tasks within these benchmarks.

Only tasks that are human-generated, human-translated, or involve a strong human-in-the-loop process (i.e., machine translation followed by professional revision, or machine generation followed by human revision and annotation) were used. This explains the variation in the number of tasks reported across languages. As additional high-quality tasks are published, we will update the evaluation results accordingly. We also plan to expand evaluation to other languages, provided that the datasets meet our quality standards.

During the implementation of the evaluation, we observed a number of issues worth considering when replicating and interpreting the results. These include variances of ≈1.5% in performance on some tasks depending on the version of the transformers library used, and on whether tensor parallelism is enabled when loading the model. When implementing existing tasks, we carry out a comprehensive quality evaluation of the dataset, of the Harness task itself, and of the inputs models actually see during evaluation. Our implementation (see links above) addresses multiple existing problems, such as errors in datasets and prompts and missing pre-processing. As a result, figures will vary if other Harness implementations are used, and may vary slightly depending on the replication setup.

It should be noted that these results are subject to all the drawbacks of every current gold-standard evaluation, and that the figures do not fully represent the model's capabilities and potential. We thus advise caution when reading and interpreting the results.

All results reported below correspond to a 0-shot evaluation setting.
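As a sketch, a 0-shot run of one of the tasks below with the Harness CLI might look like the following. This assumes the lm-evaluation-harness is installed together with the corresponding task implementations; the task name and model identifier are taken from this card:

```shell
# Hypothetical 0-shot evaluation of one SpanishBench task with the Harness.
# Requires lm-evaluation-harness with the belebele_spa_Latn task available.
lm_eval --model hf \
  --model_args pretrained=BSC-LT/ALIA-40b-instruct-2601 \
  --tasks belebele_spa_Latn \
  --num_fewshot 0 \
  --batch_size auto
```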

Spanish

| Task | Metric | Result |
|---|---|---|
| belebele_spa_Latn | acc | 0.73 |
| cocoteros_es | bleu | 0.02 |
| cocoteros_es | rouge1 | 0.31 |
| copa_es | acc | 0.75 |
| escola | mcc | 0 |
| flores_es | bleu | 0.25 |
| mmmlu_es | acc | 0.38 |
| openbookqa_es | acc | 0.22 |
| paws_es_spanish_bench | acc | 0.32 |
| phrases_es-va | bleu | 0.57 |
| phrases_va-es | bleu | 0.66 |
| wnli_es | acc | 0.01 |
| xlsum_es | bleu | 0.02 |
| xnli_es_spanish_bench | acc | 0.27 |
| xquad_es | f1 | 0.07 |
| xstorycloze_es | acc | 0.51 |

Catalan

| Task | Metric | Result |
|---|---|---|
| arc_ca_challenge | acc | 0.34 |
| arc_ca_easy | acc | 0.67 |
| belebele_cat_Latn | acc | 0.73 |
| cabreu_abstractive | bleu | 0.06 |
| cabreu_extractive | rouge1 | 0.19 |
| cabreu_extreme | bleu | 0.03 |
| catalanqa | f1 | 0.11 |
| catcola | mcc | 0 |
| cocoteros_va | bleu | 0.01 |
| cocoteros_va | rouge1 | 0.3 |
| copa_ca | acc | 0.74 |
| flores_ca | bleu | 0.31 |
| mgsm_direct_ca | exact_match,flexible-extract | 0.64 |
| openbookqa_ca | acc | 0.2 |
| parafraseja | acc | 0.34 |
| paws_ca | acc | 0.41 |
| phrases_ca-va | bleu | 0.73 |
| phrases_va-ca | bleu | 0.83 |
| piqa_ca | acc | 0.54 |
| siqa_ca | acc | 0.27 |
| teca | acc | 0.39 |
| wnli_ca | acc | 0.32 |
| xquad_ca | f1 | 0.1 |
| xstorycloze_ca | acc | 0.52 |

Basque

| Task | Metric | Result |
|---|---|---|
| arc_eu_challenge | acc | 0.24 |
| arc_eu_easy | acc | 0.54 |
| belebele_eus_Latn | acc | 0.67 |
| eus_proficiency | acc | 0.33 |
| eus_reading | acc | 0.47 |
| eus_trivia | acc | 0.49 |
| flores_eu | bleu | 0.18 |
| mgsm_native_cot_eu | exact_match,get-answer | 0 |
| paws_eu | acc | 0.34 |
| piqa_eu | acc | 0.42 |
| qnlieu | acc | 0.16 |
| wnli_eu | acc | -0.13 |
| xcopa_eu | acc | 0.51 |
| xnli_eu | acc | 0.32 |
| xnli_eu_native | acc | 0.3 |
| xstorycloze_eu | acc | 0.37 |

Galician

| Task | Metric | Result |
|---|---|---|
| belebele_glg_Latn | acc | 0.75 |
| flores_gl | bleu | 0.28 |
| galcola | mcc | 0 |
| openbookqa_gl | acc | 0.15 |
| parafrases_gl | acc | 0.18 |
| paws_gl | acc | 0.39 |
| summarization_gl | bleu | 0.03 |
| xnli_gl | acc | 0.38 |
| xstorycloze_gl | acc | 0.48 |

English

| Task | Metric | Result |
|---|---|---|
| arc_challenge | acc | 0.38 |
| arc_easy | acc | 0.73 |
| belebele_eng_Latn | acc | 0.77 |
| cola | mcc | 0 |
| copa | acc | 0.78 |
| hellaswag | acc | 0.54 |
| hellaswag | acc_norm | -0.32 |
| mmlu | acc | 0.47 |
| openbookqa | acc | 0.2 |
| paws_en | acc | 0.41 |
| piqa | acc | 0.64 |
| social_iqa | acc | 0.24 |
| truthfulqa_mc1 | acc | 0.17 |
| truthfulqa_mc2 | acc | 0.4 |
| wnli | acc | 0.38 |
| xnli_en_iberobench | acc | 0.37 |
| xquad_en | f1 | 0.14 |
| xstorycloze_en | acc | 0.59 |

Additional information

Author

The Language Technologies Lab from Barcelona Supercomputing Center.

Contact

For further information, please send an email to langtech@bsc.es.

Copyright

Copyright (c) 2026 by the Language Technologies Lab, Barcelona Supercomputing Center.

Funding

This work is funded by the Ministerio para la Transformación Digital y de la Función Pública - Funded by EU – NextGenerationEU within the framework of the project Modelos del Lenguaje.

This work has been promoted and supported by the Government of Catalonia through the Aina Project.

Acknowledgements

This project has benefited from the contributions of numerous teams and institutions, mainly through data contributions, knowledge transfer or technical support.

We are especially grateful to our ILENIA project partners: CENID, HiTZ and CiTIUS for their participation. We also extend our genuine gratitude to the Spanish Senate and Congress, Fundación Dialnet, and the ‘Instituto Universitario de Sistemas Inteligentes y Aplicaciones Numéricas en Ingeniería (SIANI)’ of the University of Las Palmas de Gran Canaria. Many other institutions have been involved in the project. Our thanks to Òmnium Cultural, Parlament de Catalunya, Institut d'Estudis Aranesos, Racó Català, Vilaweb, ACN, Nació Digital, El món and Aquí Berguedà. We thank the Welsh government, DFKI, Occiglot project, especially Malte Ostendorff, and The Common Crawl Foundation, especially Pedro Ortiz, for their collaboration.

We would also like to give special thanks to the NVIDIA team, with whom we have met regularly, especially to: Ignacio Sarasua, Adam Henryk Grzywaczewski, Oleg Sudakov, Sergio Perez, Miguel Martinez, Felipe Soares and Meriem Bendris. Their constant support has been especially appreciated throughout the entire process.

Their valuable efforts have been instrumental in the development of this work.

Disclaimer

Be aware that the model may contain biases or other unintended distortions. When third parties deploy systems or provide services based on this model, or use the model themselves, they bear the responsibility for mitigating any associated risks and ensuring compliance with applicable regulations, including those governing the use of Artificial Intelligence.

The Barcelona Supercomputing Center, as the owner and creator of the model, shall not be held liable for any outcomes resulting from third-party use.

Citation

@misc{gonzalezagirre2025salamandratechnicalreport,
      title={Salamandra Technical Report}, 
      author={Aitor Gonzalez-Agirre and Marc Pàmies and Joan Llop and Irene Baucells and Severino Da Dalt and Daniel Tamayo and José Javier Saiz and Ferran Espuña and Jaume Prats and Javier Aula-Blasco and Mario Mina and Adrián Rubio and Alexander Shvets and Anna Sallés and Iñaki Lacunza and Iñigo Pikabea and Jorge Palomar and Júlia Falcão and Lucía Tormo and Luis Vasquez-Reina and Montserrat Marimon and Valle Ruíz-Fernández and Marta Villegas},
      year={2025},
      eprint={2502.08489},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.08489}, 
}

License

Apache License, Version 2.0

Model Index

| Model | Base | Instruct |
|---|---|---|
| 2b | Link | Link |
| 7b | Link | Link |
| 40b | Link | Link |