Hebrew-GPT: Specialized 1B Hebrew Instruction Model

Hebrew-GPT is a state-of-the-art, instruction-tuned Small Language Model (SLM) based on the Llama-3.2-1B architecture. It has been engineered to bridge the gap in low-parameter Hebrew linguistic performance, providing a compact yet powerful solution for Hebrew natural language understanding and generation.


💎 Model Highlights

  • Linguistic Specialization: Specifically tuned to handle the Morphologically Rich Language (MRL) features of Hebrew, including prefix-suffix handling and correct right-to-left (RTL) context awareness.
  • 16-bit Precision: Unlike many quantized small models, this release ships full merged BFloat16 weights, so no quality is lost to post-training quantization.
  • Instruction Optimized: Trained specifically to follow complex prompts, summarize documents, and engage in dialogue, rather than just basic text completion.
  • Efficiency: At 1 billion parameters, it is optimized for edge deployment, providing high-speed inference on standard consumer hardware.

🛠️ Technical Specifications

Architecture

  • Base Architecture: Llama 3.2
  • Parameters: 1.23 Billion
  • Context Length: 128k tokens (native support)
  • Weight Format: Safetensors (Standalone)
  • Precision: BFloat16 (BF16)
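
A useful rule of thumb for the specs above: the raw weight footprint is parameter count × bytes per parameter (2 bytes for BF16), with activations and the KV cache on top. A quick sanity check using the 1.23B figure:

```python
# Estimate the raw weight footprint of the model in BF16.
# Activations, KV cache, and framework overhead come on top of this.
PARAMS = 1.23e9          # parameter count from the spec above
BYTES_PER_PARAM = 2      # bfloat16 = 16 bits = 2 bytes

weight_bytes = PARAMS * BYTES_PER_PARAM
weight_gib = weight_bytes / 2**30

print(f"~{weight_gib:.2f} GiB of VRAM for weights alone")  # ~2.29 GiB
```

This is why the model fits comfortably on consumer GPUs and even CPU-only machines.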

Training Methodology

The model underwent Supervised Fine-Tuning (SFT) using a curated multi-source dataset strategy to ensure high-quality Hebrew output without compromising logical reasoning:

  • Hebrew Instruction Set (70%): Extensive Alpaca-formatted datasets translated and corrected for Hebrew grammar.
  • Hebrew Contextual Knowledge (20%): Fact-based data from Hebrew wikis and structured Q&A.
  • Logic Preservation (10%): High-quality English instructional data to maintain cross-lingual reasoning and mathematical stability.
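
A 70/20/10 mixture like the one above is typically realized with per-source sampling weights. A minimal sketch of that idea (the source names here are placeholders, not the actual training files):

```python
import random

# Hypothetical source pools mirroring the mixture described above.
sources = {
    "hebrew_instructions": 0.70,  # Alpaca-style Hebrew SFT data
    "hebrew_knowledge":    0.20,  # wiki-derived facts and Q&A
    "english_logic":       0.10,  # English reasoning data
}

def sample_source(rng: random.Random) -> str:
    """Pick a data stream according to the 70/20/10 mixture weights."""
    names = list(sources)
    weights = list(sources.values())
    return rng.choices(names, weights=weights, k=1)[0]

rng = random.Random(0)
draws = [sample_source(rng) for _ in range(10_000)]
print({name: round(draws.count(name) / len(draws), 3) for name in sources})
```

Over many draws the empirical fractions converge to the configured proportions, so every batch reflects the intended mix.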

📈 Performance & Monitoring

During the development phase, the model was monitored via detailed telemetry to ensure stable convergence. Key metrics tracked included:

  • Gradient Norm Stability: Monitored to prevent exploding gradients in RTL text generation.
  • VRAM Optimization: Efficiently managed to maximize batch size and learning stability.
  • Loss Decay: Consistent downward trend in cross-entropy loss across all three data streams.
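
Gradient-norm monitoring usually pairs with clipping by global norm (the operation `torch.nn.utils.clip_grad_norm_` performs). A framework-free sketch of the underlying math:

```python
import math

def clip_by_global_norm(grads: list[list[float]], max_norm: float) -> list[list[float]]:
    """Scale all gradients down uniformly if their global L2 norm exceeds max_norm."""
    global_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    if global_norm <= max_norm:
        return grads
    scale = max_norm / global_norm
    return [[g * scale for g in vec] for vec in grads]

# A gradient with global norm 5.0, clipped to max_norm 1.0,
# is scaled by 0.2 so its direction is preserved.
clipped = clip_by_global_norm([[3.0, 4.0]], 1.0)
print(clipped)
```

Logging `global_norm` each step is what makes exploding gradients visible before they destabilize the loss.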

🚀 Quick Start Guide

Installation

```bash
pip install transformers torch accelerate
```

Basic Usage (Python)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "XythicK/Hebrew-GPT"

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Standard Llama-3.2 chat template
messages = [
    # "You are a smart and professional assistant in Hebrew."
    {"role": "system", "content": "אתה עוזר חכם ומקצועי בעברית."},
    # "Write me a short recipe for challah for Shabbat."
    {"role": "user", "content": "כתוב לי מתכון קצר לחלה לשבת."},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
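
`apply_chat_template` above expands the message list into the Llama 3 prompt format. Building that string by hand shows what the model actually sees (a sketch assuming the base model's standard template is unchanged):

```python
def format_llama3_chat(messages: list[dict]) -> str:
    """Render messages in the Llama 3 chat format, ending with the
    assistant header so the model continues as the assistant."""
    prompt = "<|begin_of_text|>"
    for msg in messages:
        prompt += (
            f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
            f"{msg['content']}<|eot_id|>"
        )
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

demo = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
print(format_llama3_chat(demo))
```

Understanding this layout helps when debugging prompts: a missing `add_generation_prompt=True` means the trailing assistant header is absent and the model may not respond in-role.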

⚖️ Ethics and Limitations

While Hebrew-GPT is highly capable for its size, users should note:

  • Hallucination: Like all LLMs, it can generate incorrect facts. Verify critical information.
  • Bias: The model reflects the biases present in its training data.
  • Parameter Constraints: As a 1B model, it may struggle with highly technical or academic subjects compared to 70B+ models.
