# gemma-3-270M-Swahili-llm
Fine-tuned Gemma3-270M model specifically adapted for Swahili language instruction-following and conversation tasks.
## Model Description
This model is a fine-tuned version of `google/gemma-3-270m-it`, trained on ~67,000 Swahili instruction-response pairs. Fine-tuning used LoRA (Low-Rank Adaptation) for parameter-efficient training, which keeps memory usage low and training fast.
- Model Size: 270M parameters
- Language: Swahili
- Task: Instruction following and conversation
## Training Details
- Training Method: LoRA (Low-Rank Adaptation)
- LoRA Rank: 128
- Max Sequence Length: 2048
- Batch Size: 4 per device
- Learning Rate: 5e-5
- Optimizer: AdamW 8-bit
- Dataset: Swahili Instructions Dataset by alfaxadeyembe (~67,000 pairs)
This model was trained 2x faster with Unsloth.
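For reference, a training setup matching the hyperparameters above would look roughly like this in Unsloth. This is a sketch, not the exact training script: the target modules and `lora_alpha` are illustrative assumptions, since the card only states the rank, sequence length, batch size, learning rate, and optimizer.

```python
from unsloth import FastLanguageModel

# Load the base instruction-tuned checkpoint
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="google/gemma-3-270m-it",
    max_seq_length=2048,
)

# Attach LoRA adapters; r=128 matches this card, while target_modules
# and lora_alpha below are illustrative guesses, not documented values
model = FastLanguageModel.get_peft_model(
    model,
    r=128,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=128,
    lora_dropout=0,
    bias="none",
)
```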
## Usage
### Using Transformers
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "ngusadeep/gemma-3-270M-Swahili-llm"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Build a chat-formatted prompt ("Explain what leadership means.")
messages = [{"role": "user", "content": "Eleza nini maana ya uongozi."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=1.0,
    top_p=0.95,
    top_k=64,
    do_sample=True,
)

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```
### Using Unsloth (Recommended)
```python
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template
from transformers import TextStreamer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="ngusadeep/gemma-3-270M-Swahili-llm",
    max_seq_length=2048,
    load_in_4bit=False,
)
tokenizer = get_chat_template(tokenizer, chat_template="gemma3")

messages = [{"role": "user", "content": "Eleza nini maana ya uongozi."}]
# apply_chat_template prepends <bos>, and the tokenizer call below adds it
# again, so strip it here to avoid a duplicated BOS token
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True).removeprefix("<bos>")

_ = model.generate(
    **tokenizer(text, return_tensors="pt").to("cuda"),
    max_new_tokens=256,
    temperature=1.0,
    top_p=0.95,
    top_k=64,
    streamer=TextStreamer(tokenizer, skip_prompt=True),  # stream tokens as they are generated
)
```
### Example Swahili Prompts

```python
"Eleza nini maana ya uongozi."              # Explanation: "Explain what leadership means."
"Tunga hadithi fupi kuhusu safari."         # Creative writing: "Write a short story about a journey."
"Ni nini tofauti kati ya mchana na usiku?"  # Q&A: "What is the difference between day and night?"
"Andika sentensi tano kuhusu elimu."        # Instruction following: "Write five sentences about education."
```
## Recommended Generation Parameters
- `temperature`: 1.0
- `top_p`: 0.95
- `top_k`: 64
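To avoid repeating these values in every call, they can be bundled into a `transformers.GenerationConfig`. This is a minimal sketch using the values above, not something shipped with the model; `model` and `inputs` refer to the Transformers example earlier in this card.

```python
from transformers import GenerationConfig

# Bundle the recommended sampling defaults from this card
gen_config = GenerationConfig(
    max_new_tokens=256,
    do_sample=True,
    temperature=1.0,
    top_p=0.95,
    top_k=64,
)

# Reuse the same defaults across calls
outputs = model.generate(**inputs, generation_config=gen_config)
```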
## GGUF Format
This model has been converted to GGUF format for use with llama.cpp.
### Available GGUF Files
- `gemma-3-270m-it.Q8_0.gguf`: 8-bit quantization
### Using with llama.cpp
This is a text-only model, so use `llama-cli`:

```bash
./llama.cpp/llama-cli -hf ngusadeep/gemma-3-270M-Swahili-llm --jinja
```

(The `llama-mtmd-cli` entry point is only needed for multimodal GGUFs and does not apply here.)
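If you would rather call the GGUF file from Python, the `llama-cpp-python` bindings can fetch it straight from the Hub. A minimal sketch, assuming the wildcard below matches the Q8_0 file listed above:

```python
from llama_cpp import Llama

# Download the Q8_0 GGUF from the Hub and load it
llm = Llama.from_pretrained(
    repo_id="ngusadeep/gemma-3-270M-Swahili-llm",
    filename="*Q8_0.gguf",  # assumed to match gemma-3-270m-it.Q8_0.gguf
    n_ctx=2048,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Eleza nini maana ya uongozi."}],
    max_tokens=256,
    temperature=1.0,
    top_p=0.95,
)
print(out["choices"][0]["message"]["content"])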
### Ollama
An Ollama Modelfile is included for easy deployment.
Note: The model's BOS token behavior was adjusted for GGUF compatibility.
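The bundled Modelfile is not reproduced here, but a minimal one for the Q8_0 GGUF above would look roughly like this; the local filename and parameter choices are assumptions based on this card:

```
FROM ./gemma-3-270m-it.Q8_0.gguf
PARAMETER temperature 1.0
PARAMETER top_p 0.95
PARAMETER top_k 64
```

Create and run the model with:

```bash
ollama create gemma3-swahili -f Modelfile
ollama run gemma3-swahili "Eleza nini maana ya uongozi."
```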
## Model Capabilities
After fine-tuning, the model demonstrates improved capability to:
- Understand Swahili instructions
- Generate appropriate responses in Swahili
- Follow conversational patterns
- Handle various instruction types (explanations, creative writing, Q&A, etc.)
## Limitations
- The model is fine-tuned on Swahili instruction-following tasks and may not perform as well on other languages or tasks
- As a 270M parameter model, it has limitations in complex reasoning tasks
- The model may occasionally generate responses that are not factually accurate
## Dataset Citation
The model was fine-tuned on the Swahili Instructions dataset:
```bibtex
@misc{swahili-instructions-dataset,
  title={Swahili Instructions Dataset},
  author={alfaxadeyembe},
  year={2024},
  publisher={Kaggle},
  howpublished={\url{https://www.kaggle.com/datasets/alfaxadeyembe/swahili-instructions}}
}
```
## Model Citation
If you use this model, please cite it as follows:

```bibtex
@misc{gemma3-270m-swahili-llm,
  title={Gemma3-270M Swahili Fine-tuned Model},
  author={ngusadeep},
  year={2024},
  howpublished={\url{https://huggingface.co/ngusadeep/gemma-3-270M-Swahili-llm}}
}
```
## Acknowledgments
- Google's Gemma3 model
- Unsloth team for the efficient fine-tuning framework
- alfaxadeyembe for creating and sharing the Swahili Instructions Dataset on Kaggle
## License
This model is licensed under Apache 2.0. See the LICENSE file for details.
