# Gemma-3-27b-it Quantized Model
This repository provides a quantized version of the google/gemma-3-27b-it model. This quantization was performed to optimize inference efficiency and reduce VRAM usage while maintaining high performance, especially for Korean language tasks.
## Quantization Details

The model was quantized with GPTQ (Generative Pre-trained Transformer Quantization), with a focus on preserving the linguistic nuances of Korean.
- Hardware Used: 1 x NVIDIA A100 80GB
- Calibration Dataset: maywell/ko-calibration
- Number of Calibration Samples: 512 (Randomly sampled)
- Calibration Strategy: Used a primarily Korean dataset (with some English included) to minimize accuracy degradation in multilingual contexts.
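The random-sampling step above can be sketched as follows. This is a minimal illustration only: the helper name `sample_calibration`, the seed, and the stand-in corpus are assumptions; in practice the corpus would come from the maywell/ko-calibration dataset.

```python
import random

def sample_calibration(corpus, n_samples=512, seed=42):
    """Draw calibration examples uniformly without replacement (seeded for reproducibility)."""
    rng = random.Random(seed)
    n = min(n_samples, len(corpus))
    return rng.sample(corpus, n)

# Stand-in corpus; a real run would load maywell/ko-calibration instead.
corpus = [f"example sentence {i}" for i in range(10_000)]
calib = sample_calibration(corpus, n_samples=512)
print(len(calib))  # 512
```

Seeding the sampler makes the calibration subset reproducible, so repeated quantization runs see the same 512 examples.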
## Loss Metrics (Last Layer)
| Layer | Module | Loss | Damp | Time (s) |
|---|---|---|---|---|
| 61 | self_attn.k_proj | 0.0000883157 | 0.05000 | 13.501 |
| 61 | self_attn.q_proj | 0.0001169325 | 0.05000 | 13.603 |
| 61 | self_attn.v_proj | 0.0000929558 | 0.05000 | 13.684 |
| 61 | self_attn.o_proj | 0.0000056512 | 0.05000 | 1.266 |
| 61 | mlp.up_proj | 0.0004954723 | 0.05000 | 3.865 |
| 61 | mlp.gate_proj | 0.0004981519 | 0.05000 | 3.886 |
| 61 | mlp.down_proj | 0.0000036350 | 0.05000 | 8.101 |
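As a quick sanity check, the per-module losses in the table can be summarized programmatically; the numbers below are copied verbatim from the table, and the dictionary layout is just one way to hold them.

```python
# Per-module quantization loss for layer 61, copied from the table above.
losses = {
    "self_attn.k_proj": 0.0000883157,
    "self_attn.q_proj": 0.0001169325,
    "self_attn.v_proj": 0.0000929558,
    "self_attn.o_proj": 0.0000056512,
    "mlp.up_proj":      0.0004954723,
    "mlp.gate_proj":    0.0004981519,
    "mlp.down_proj":    0.0000036350,
}

# The MLP up/gate projections carry the largest error, but every module
# stays below 5e-4, consistent with minimal accuracy degradation.
worst = max(losses, key=losses.get)
print(worst, losses[worst])
```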
## Deployment with vLLM
You can easily deploy this model using the vllm-openai Docker image to serve an OpenAI-compatible API.
### Run with Docker

```bash
docker run --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -p 8000:8000 \
  --ipc=host \
  vllm/vllm-openai:latest \
  --model nagix999/gemma-3-27b-it-gptq-ko-calibration \
  --served-model-name gemma3-27b-it-gptq \
  --quantization gptq_marlin \
  --max-model-len 8192 \
  --dtype bfloat16
```
### Python API Example

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="gemma3-27b-it-gptq",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        # "Explain LLMs in Korean and English."
        {"role": "user", "content": "한국어와 영어로 LLM에 대해 설명해줘."},
    ],
)
print(response.choices[0].message.content)
```
## License
This model is a derivative of google/gemma-3-27b-it and is subject to the Gemma Terms of Use. By downloading or using this model, you agree to the terms and conditions specified by Google.
For more details, please visit the Gemma License Agreement.