Hebrew-GPT: Specialized 1B Hebrew Instruction Model

Hebrew-GPT is a state-of-the-art, instruction-tuned Small Language Model (SLM) based on the Llama-3.2-1B architecture. It has been engineered to bridge the gap in low-parameter Hebrew linguistic performance, providing a compact yet powerful solution for Hebrew natural language understanding and generation.


💎 Model Highlights

  • Linguistic Specialization: Specifically tuned to handle the Morphologically Rich Language (MRL) features of Hebrew, including prefix-suffix handling and correct right-to-left (RTL) context awareness.
  • 16-bit Precision: Unlike many quantized small models, this release ships full merged BFloat16 weights, so no quality is lost to post-training quantization.
  • Instruction Optimized: Trained specifically to follow complex prompts, summarize documents, and engage in dialogue, rather than just basic text completion.
  • Efficiency: At 1 billion parameters, it is optimized for edge deployment, providing high-speed inference on standard consumer hardware.

🛠️ Technical Specifications

Architecture

  • Base Architecture: Llama 3.2
  • Parameters: 1.23 Billion
  • Context Length: 128k tokens (native support)
  • Weight Format: Safetensors (Standalone)
  • Precision: BFloat16 (BF16)
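
A useful rule of thumb for the specs above: the raw weight footprint is parameter count × bytes per parameter (2 bytes for BF16), with activations and the KV cache on top. A quick sanity check using the 1.23B figure:

```python
# Estimate the raw weight footprint of the model in BF16.
# Activations, KV cache, and framework overhead come on top of this.
PARAMS = 1.23e9          # parameter count from the spec above
BYTES_PER_PARAM = 2      # bfloat16 = 16 bits = 2 bytes

weight_bytes = PARAMS * BYTES_PER_PARAM
weight_gib = weight_bytes / 2**30

print(f"~{weight_gib:.2f} GiB of VRAM for weights alone")  # ~2.29 GiB
```

This is why the model fits comfortably on consumer GPUs and even CPU-only machines.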

Training Methodology

The model underwent Supervised Fine-Tuning (SFT) using a curated multi-source dataset strategy to ensure high-quality Hebrew output without compromising logical reasoning:

  • Hebrew Instruction Set (70%): Extensive Alpaca-formatted datasets translated and corrected for Hebrew grammar.
  • Hebrew Contextual Knowledge (20%): Fact-based data from Hebrew wikis and structured Q&A.
  • Logic Preservation (10%): High-quality English instructional data to maintain cross-lingual reasoning and mathematical stability.
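
A 70/20/10 mixture like the one above is typically realized with per-source sampling weights. A minimal sketch of that idea (the source names here are placeholders, not the actual training files):

```python
import random

# Hypothetical source pools mirroring the mixture described above.
sources = {
    "hebrew_instructions": 0.70,  # Alpaca-style Hebrew SFT data
    "hebrew_knowledge":    0.20,  # wiki-derived facts and Q&A
    "english_logic":       0.10,  # English reasoning data
}

def sample_source(rng: random.Random) -> str:
    """Pick a data stream according to the 70/20/10 mixture weights."""
    names = list(sources)
    weights = list(sources.values())
    return rng.choices(names, weights=weights, k=1)[0]

rng = random.Random(0)
draws = [sample_source(rng) for _ in range(10_000)]
print({name: round(draws.count(name) / len(draws), 3) for name in sources})
```

Over many draws the empirical fractions converge to the configured proportions, so every batch reflects the intended mix.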

📈 Performance & Monitoring

During the development phase, the model was monitored via detailed telemetry to ensure stable convergence. Key metrics tracked included:

  • Gradient Norm Stability: Monitored to prevent exploding gradients in RTL text generation.
  • VRAM Optimization: Efficiently managed to maximize batch size and learning stability.
  • Loss Decay: Consistent downward trend in cross-entropy loss across all three data streams.
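
Gradient-norm monitoring usually pairs with clipping by global norm (the operation `torch.nn.utils.clip_grad_norm_` performs). A framework-free sketch of the underlying math:

```python
import math

def clip_by_global_norm(grads: list[list[float]], max_norm: float) -> list[list[float]]:
    """Scale all gradients down uniformly if their global L2 norm exceeds max_norm."""
    global_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    if global_norm <= max_norm:
        return grads
    scale = max_norm / global_norm
    return [[g * scale for g in vec] for vec in grads]

# A gradient with global norm 5.0, clipped to max_norm 1.0,
# is scaled by 0.2 so its direction is preserved.
clipped = clip_by_global_norm([[3.0, 4.0]], 1.0)
print(clipped)
```

Logging `global_norm` each step is what makes exploding gradients visible before they destabilize the loss.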

🚀 Quick Start Guide

Installation

```bash
pip install transformers torch accelerate
```

Basic Usage (Python)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "XythicK/Hebrew-GPT"

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Standard Llama-3.2 chat template
messages = [
    # "You are a smart and professional assistant in Hebrew."
    {"role": "system", "content": "אתה עוזר חכם ומקצועי בעברית."},
    # "Write me a short recipe for challah for Shabbat."
    {"role": "user", "content": "כתוב לי מתכון קצר לחלה לשבת."},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
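
`apply_chat_template` above expands the message list into the Llama 3 prompt format. Building that string by hand shows what the model actually sees (a sketch assuming the base model's standard template is unchanged):

```python
def format_llama3_chat(messages: list[dict]) -> str:
    """Render messages in the Llama 3 chat format, ending with the
    assistant header so the model continues as the assistant."""
    prompt = "<|begin_of_text|>"
    for msg in messages:
        prompt += (
            f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
            f"{msg['content']}<|eot_id|>"
        )
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

demo = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
print(format_llama3_chat(demo))
```

Understanding this layout helps when debugging prompts: a missing `add_generation_prompt=True` means the trailing assistant header is absent and the model may not respond in-role.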

⚖️ Ethics and Limitations

While Hebrew-GPT is highly capable for its size, users should note:

  • Hallucination: Like all LLMs, it can generate incorrect facts. Verify critical information.
  • Bias: The model reflects the biases present in its training data.
  • Parameter Constraints: As a 1B model, it may struggle with highly technical or academic subjects compared to 70B+ models.
