Granite-4.1-8B

Granite-4.1-8B is an instruction-tuned large language model designed for conversational AI, reasoning, coding assistance, and structured text generation. This repository contains GGUF quantized variants of the model optimized for efficient local inference using llama.cpp.

The quantized formats significantly reduce memory requirements while maintaining strong instruction-following and reasoning performance, enabling practical deployment across consumer hardware and edge environments.


Model Overview

  • Model Name: Granite-4.1-8B
  • Base Model: ibm-granite/granite-4.1-8b
  • Architecture: Decoder-only Transformer
  • Parameter Count: 8 Billion
  • Modalities: Text
  • Primary Languages: English
  • Developer: IBM Granite
  • License: Apache 2.0

Quantization Formats

This repository provides various GGUF quantized versions of the Granite-4.1-8B model, optimized for efficient local inference using llama.cpp. Below are the details of the available I-Matrix (IQ) formats.

IQ3_M

  • Approximately 76.74% smaller (3.81 GB) than the 16-bit baseline (16.38 GB)
  • Aggressive 3-bit quantization optimized for maximum memory efficiency
  • Suitable for CPU-only inference and low-memory deployment environments
  • Maintains lightweight conversational and instruction-following capability
  • Output quality may degrade on complex reasoning, coding, and long-context tasks

IQ4_NL

  • Approximately 70.94% smaller (4.76 GB) than the 16-bit baseline (16.38 GB)
  • Advanced 4-bit non-linear quantization designed to better preserve output quality
  • More suitable for structured reasoning, coding assistance, and analytical tasks
  • Typically provides stronger consistency compared to lower-bit formats
  • Slightly increased computational overhead during inference

IQ4_XS

  • Approximately 72.34% smaller (4.53 GB) than the 16-bit baseline (16.38 GB)
  • Balanced 4-bit quantization focused on efficiency and stable inference performance
  • Good trade-off between model size, speed, and response quality
  • Suitable for conversational AI, summarization, and general-purpose local deployment
  • Maintains reliable generation behavior across most practical workloads
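The reduction percentages quoted above follow directly from the listed file sizes. A quick sanity check, assuming the 16.38 GB 16-bit baseline stated in each format description:

```python
# Verify the size-reduction percentages quoted for each quant format.
FP16_GB = 16.38  # 16-bit baseline size from the model card

quants = {
    "IQ3_M": 3.81,   # GB
    "IQ4_NL": 4.76,  # GB
    "IQ4_XS": 4.53,  # GB
}

for name, size_gb in quants.items():
    reduction = (1 - size_gb / FP16_GB) * 100
    print(f"{name}: {size_gb} GB -> {reduction:.2f}% smaller than FP16")
```

Running this reproduces the 76.74%, 70.94%, and 72.34% figures listed for IQ3_M, IQ4_NL, and IQ4_XS respectively.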

Training Background (Original Model)

Granite-4.1-8B is trained with an emphasis on instruction comprehension, reasoning performance, and reliable text generation across a wide variety of tasks.

Pretraining

  • Large-scale language pretraining across diverse textual datasets
  • Focus on contextual understanding and robust language representation
  • Optimized for downstream conversational and reasoning workloads

Instruction Tuning

  • Refined using instruction-following datasets and conversational objectives
  • Enhanced for structured responses and multi-step reasoning
  • Improved consistency for coding, analysis, and text generation tasks

Key Capabilities

  • Instruction Following: Handles diverse prompts and produces structured, context-aware responses.

  • Reasoning and Analysis: Performs well on multi-step logical and analytical tasks.

  • Coding Assistance: Supports code generation, explanation, and debugging workflows.

  • Efficient Local Deployment: Quantized variants enable practical offline inference on consumer hardware.

  • Flexible Text Generation: Suitable for summarization, Q&A, conversational AI, and structured outputs.


Usage Example

Using llama.cpp

./llama-cli \
  -m SandLogicTechnologies/granite-4.1-8b_IQ4_NL.gguf \
  -p "Explain the concept of knowledge distillation in detail"

Recommended Use Cases

  • Conversational AI Systems: Build local assistants and chat applications without cloud dependencies.

  • Coding and Development Workflows: Support debugging, code explanation, and lightweight programming assistance.

  • Reasoning and Analysis Tasks: Generate structured outputs for analytical and multi-step problem-solving tasks.

  • Research and Experimentation: Evaluate prompts, workflows, and local inference strategies.


Acknowledgments

These quantized models are based on the original work by the IBM Granite development team.

Special thanks to:

  • The IBM Granite team for developing and releasing the Granite-4.1-8B model.

  • Georgi Gerganov and the llama.cpp open-source community for enabling efficient quantization and inference via the GGUF format.


Contact

For questions, feedback, or support, please reach out at support@sandlogic.com or visit https://www.sandlogic.com/
