ModernBERT Dutch Base Wide

A ModernBERT model pretrained on the Dutch mc4_nl_cleaned dataset. It keeps the 22 layers of ModernBERT-base but uses a wider hidden dimension (1024 instead of 768), placing its 230M parameters between the base and large variants.

Model Details

  • Architecture: ModernBERT (Answer.AI/LightOn)
  • Layers: 22
  • Hidden size: 1024
  • Attention heads: 16
  • Intermediate size: 1536
  • Vocab size: 32,128
  • Parameters: 230M
  • Tokenizer: yhavinga/dutch-llama-tokenizer (SentencePiece, Dutch-optimized)
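
The values above can be verified against the published config. A minimal sketch, assuming the config file exposes the standard ModernBertConfig field names:

from transformers import AutoConfig

config = AutoConfig.from_pretrained("yhavinga/dmbert-dutchl-1024h-22l-2000000")
print(config.num_hidden_layers)    # 22
print(config.hidden_size)          # 1024
print(config.num_attention_heads)  # 16
print(config.intermediate_size)    # 1536
print(config.vocab_size)           # 32128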

Training

  • Dataset: yhavinga/mc4_nl_cleaned (full config)
  • Steps: 2,000,000
  • Batch size: 8 per device (multi-host TPU v4)
  • Learning rate: 3e-5 with cosine decay to 1e-6
  • Warmup steps: 20,000
  • Weight decay: 0.01
  • Sequence length: 1024
  • Precision: bfloat16
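
For reference, the stated schedule (linear warmup to 3e-5, cosine decay to 1e-6 over 2,000,000 steps) can be written out as below; this is a sketch of the schedule shape, not the exact trainer code:

import math

PEAK_LR, FINAL_LR = 3e-5, 1e-6
WARMUP, TOTAL = 20_000, 2_000_000

def learning_rate(step: int) -> float:
    if step < WARMUP:
        return PEAK_LR * step / WARMUP                 # linear warmup
    progress = (step - WARMUP) / (TOTAL - WARMUP)      # 0 -> 1 after warmup
    cosine = 0.5 * (1 + math.cos(math.pi * progress))  # 1 -> 0
    return FINAL_LR + (PEAK_LR - FINAL_LR) * cosine

print(learning_rate(20_000))     # 3e-05 at the end of warmup
print(learning_rate(2_000_000))  # 1e-06 at the final step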

Usage

from transformers import AutoTokenizer, ModernBertForMaskedLM

model = ModernBertForMaskedLM.from_pretrained("yhavinga/dmbert-dutchl-1024h-22l-2000000")
tokenizer = AutoTokenizer.from_pretrained("yhavinga/dmbert-dutchl-1024h-22l-2000000")

# Masked language modeling: predict the token hidden behind <mask>
inputs = tokenizer("Amsterdam is de <mask> van Nederland.", return_tensors="pt")
outputs = model(**inputs)

# Locate the mask position instead of hardcoding its index
mask_index = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
prediction = tokenizer.decode(outputs.logits[0, mask_index].argmax(-1))
# Expected: "hoofdstad" (capital)
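
For quick experiments, the fill-mask pipeline wraps the same steps. A short sketch, assuming the tokenizer registers <mask> as its mask token:

from transformers import pipeline

fill = pipeline("fill-mask", model="yhavinga/dmbert-dutchl-1024h-22l-2000000")
for pred in fill("Amsterdam is de <mask> van Nederland.", top_k=5):
    print(pred["token_str"], round(pred["score"], 3))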

Model Architecture Differences

This model (1024h-22L-2) differs from the earlier 1024h-22L variant:

Parameter           1024h-22L              1024h-22L-2 (this model)
intermediate_size   4096                   1536
tokenizer           jhu-clsp/mmBERT-small  yhavinga/dutch-llama-tokenizer
vocab_size          256,000                32,128

The smaller MLP intermediate size reduces per-layer compute, and the Dutch-specific tokenizer shrinks the embedding matrix from 256,000 to 32,128 rows, making this model substantially more efficient while maintaining strong Dutch language understanding.
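
A rough back-of-the-envelope comparison of where those savings come from; this sketch ignores biases, norms, and ModernBERT's gated-MLP details, so treat the numbers as order-of-magnitude only:

hidden, layers = 1024, 22

def mlp_params(intermediate):  # up- and down-projection per layer
    return 2 * hidden * intermediate * layers

def embedding_params(vocab):   # embedding matrix rows x hidden size
    return vocab * hidden

old = mlp_params(4096) + embedding_params(256_000)
new = mlp_params(1536) + embedding_params(32_128)
print(f"~{(old - new) / 1e6:.0f}M fewer parameters in MLPs + embeddings")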

Citation

If you use this model, please cite:

@misc{dmbert_dutchl_1024h,
  title={Dutch ModernBERT 1024h-22L},
  author={Yeb Havinga},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/yhavinga/dmbert-dutchl-1024h-22l-2000000}
}