Gemma-3-27b-it Quantized Model

This repository provides a quantized version of the google/gemma-3-27b-it model. The quantization reduces VRAM usage and improves inference efficiency while maintaining high output quality, especially on Korean language tasks.

πŸ›  Quantization Details

The model was quantized using Generative Pre-trained Transformer Quantization (GPTQ), with a focus on preserving the linguistic nuances of Korean.

  • Hardware Used: 1 x NVIDIA A100 80GB
  • Calibration Dataset: maywell/ko-calibration
  • Number of Calibration Samples: 512 (randomly sampled)
  • Calibration Strategy: Used a Korean-centric dataset (which also includes English) to minimize accuracy degradation in multilingual contexts.
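
In outline, the calibration set was built by drawing 512 examples uniformly at random. A minimal sketch of that sampling step, using a placeholder corpus in place of the actual maywell/ko-calibration dataset (loading it would normally go through the `datasets` library):

```python
import random

NUM_CALIBRATION_SAMPLES = 512

# Placeholder corpus standing in for maywell/ko-calibration.
corpus = [f"example text {i}" for i in range(10_000)]

random.seed(0)  # fix the seed so the draw is reproducible
calibration_samples = random.sample(corpus, NUM_CALIBRATION_SAMPLES)

print(len(calibration_samples))  # 512
```

The specific corpus and seed here are illustrative only; the key point is that calibration uses a small random subset rather than the full dataset.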

πŸ“Š Loss Metrics (Last Layer)

| Layer | Module            | Loss         | Damp    | Time (s) |
|-------|-------------------|--------------|---------|----------|
| 61    | self_attn.k_proj  | 0.0000883157 | 0.05000 | 13.501   |
| 61    | self_attn.q_proj  | 0.0001169325 | 0.05000 | 13.603   |
| 61    | self_attn.v_proj  | 0.0000929558 | 0.05000 | 13.684   |
| 61    | self_attn.o_proj  | 0.0000056512 | 0.05000 | 1.266    |
| 61    | mlp.up_proj       | 0.0004954723 | 0.05000 | 3.865    |
| 61    | mlp.gate_proj     | 0.0004981519 | 0.05000 | 3.886    |
| 61    | mlp.down_proj     | 0.0000036350 | 0.05000 | 8.101    |
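
As a quick way to read the table, the snippet below (using the loss values copied from above) finds the module with the largest quantization loss; the MLP up/gate projections dominate, while o_proj and down_proj are nearly lossless:

```python
# Last-layer (layer 61) per-module GPTQ losses from the table above.
losses = {
    "self_attn.k_proj": 0.0000883157,
    "self_attn.q_proj": 0.0001169325,
    "self_attn.v_proj": 0.0000929558,
    "self_attn.o_proj": 0.0000056512,
    "mlp.up_proj":      0.0004954723,
    "mlp.gate_proj":    0.0004981519,
    "mlp.down_proj":    0.0000036350,
}

worst = max(losses, key=losses.get)
print(worst, losses[worst])  # mlp.gate_proj 0.0004981519
```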

πŸš€ Deployment with vLLM

You can easily deploy this model using the vllm-openai Docker image to serve an OpenAI-compatible API.

Run with Docker

docker run --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    -p 8000:8000 \
    --ipc=host \
    vllm/vllm-openai:latest \
    --model nagix999/gemma-3-27b-it-gptq-ko-calibration \
    --served-model-name gemma3-27b-it-gptq \
    --quantization gptq_marlin \
    --max-model-len 8192 \
    --dtype bfloat16

Python API Example

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="gemma3-27b-it-gptq",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "ν•œκ΅­μ–΄μ™€ μ˜μ–΄λ‘œ LLM에 λŒ€ν•΄ μ„€λͺ…ν•΄μ€˜."}
    ]
)

print(response.choices[0].message.content)

License

This model is a derivative of google/gemma-3-27b-it and is subject to the Gemma Terms of Use. By downloading or using this model, you agree to the terms and conditions specified by Google.

For more details, please visit the Gemma License Agreement.
