Gemma-3-27b-it Quantized Model

This repository provides a quantized version of the google/gemma-3-27b-it model. The quantization reduces VRAM usage and improves inference efficiency while maintaining high output quality, especially on Korean language tasks.

πŸ›  Quantization Details

The model was quantized using Generative Pre-trained Transformer Quantization (GPTQ), with a focus on preserving the linguistic nuances of Korean.

  • Hardware Used: 1 x NVIDIA A100 80GB
  • Calibration Dataset: maywell/ko-calibration
  • Number of Calibration Samples: 512 (randomly sampled)
  • Calibration Strategy: Used a Korean-centric dataset (which also includes English) to minimize accuracy degradation in multilingual contexts.
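
In outline, the calibration set was built by drawing 512 examples uniformly at random. A minimal sketch of that sampling step, using a placeholder corpus in place of the actual maywell/ko-calibration dataset (loading it would normally go through the `datasets` library):

```python
import random

NUM_CALIBRATION_SAMPLES = 512

# Placeholder corpus standing in for maywell/ko-calibration.
corpus = [f"example text {i}" for i in range(10_000)]

random.seed(0)  # fix the seed so the draw is reproducible
calibration_samples = random.sample(corpus, NUM_CALIBRATION_SAMPLES)

print(len(calibration_samples))  # 512
```

The specific corpus and seed here are illustrative only; the key point is that calibration uses a small random subset rather than the full dataset.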

πŸ“Š Loss Metrics (Last Layer)

| Layer | Module            | Loss         | Damp    | Time (s) |
|-------|-------------------|--------------|---------|----------|
| 61    | self_attn.k_proj  | 0.0000883157 | 0.05000 | 13.501   |
| 61    | self_attn.q_proj  | 0.0001169325 | 0.05000 | 13.603   |
| 61    | self_attn.v_proj  | 0.0000929558 | 0.05000 | 13.684   |
| 61    | self_attn.o_proj  | 0.0000056512 | 0.05000 | 1.266    |
| 61    | mlp.up_proj       | 0.0004954723 | 0.05000 | 3.865    |
| 61    | mlp.gate_proj     | 0.0004981519 | 0.05000 | 3.886    |
| 61    | mlp.down_proj     | 0.0000036350 | 0.05000 | 8.101    |
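
As a quick way to read the table, the snippet below (using the loss values copied from above) finds the module with the largest quantization loss; the MLP up/gate projections dominate, while o_proj and down_proj are nearly lossless:

```python
# Last-layer (layer 61) per-module GPTQ losses from the table above.
losses = {
    "self_attn.k_proj": 0.0000883157,
    "self_attn.q_proj": 0.0001169325,
    "self_attn.v_proj": 0.0000929558,
    "self_attn.o_proj": 0.0000056512,
    "mlp.up_proj":      0.0004954723,
    "mlp.gate_proj":    0.0004981519,
    "mlp.down_proj":    0.0000036350,
}

worst = max(losses, key=losses.get)
print(worst, losses[worst])  # mlp.gate_proj 0.0004981519
```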

πŸš€ Deployment with vLLM

You can easily deploy this model using the vllm-openai Docker image to serve an OpenAI-compatible API.

Run with Docker

docker run --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    -p 8000:8000 \
    --ipc=host \
    vllm/vllm-openai:latest \
    --model nagix999/gemma-3-27b-it-gptq-ko-calibration \
    --served-model-name gemma3-27b-it-gptq \
    --quantization gptq_marlin \
    --max-model-len 8192 \
    --dtype bfloat16

Python API Example

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="gemma3-27b-it-gptq",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "ν•œκ΅­μ–΄μ™€ μ˜μ–΄λ‘œ LLM에 λŒ€ν•΄ μ„€λͺ…ν•΄μ€˜."}
    ]
)

print(response.choices[0].message.content)

License

This model is a derivative of google/gemma-3-27b-it and is subject to the Gemma Terms of Use. By downloading or using this model, you agree to the terms and conditions specified by Google.

For more details, please visit the Gemma License Agreement.
