Boldt-DC-1B German IT 16K
A 1.25B-parameter German instruction-tuned Llama model with a 16,384-token context window, extended from the base model's native 2,048-token context via YaRN RoPE scaling (factor 8.0).
The model was produced in two stages on top of Boldt/Boldt-DC-1B:
- Continued pretraining (CPT 16K) on long German documents (RAG contexts) to adapt the model to the extended context window.
- Supervised fine-tuning (SFT 16K) on curated German instruction data.
Both stages used Unsloth + TRL SFTTrainer with a LoRA adapter (r=64, α=64) over q/k/v/o/gate/up/down_proj modules; the released checkpoint is the merged base + adapter in bfloat16.
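A quick back-of-the-envelope check on what r=64 over those seven projection modules implies in trainable parameters. This sketch assumes full-size k/v projections (no grouped-query attention) and no adapter on embeddings or the LM head — assumptions, since the card does not state the KV head count:

```python
# LoRA adds A (d_in x r) and B (r x d_out) per adapted linear layer.
# Dimensions from the architecture table: hidden 2048, intermediate 8192, 16 layers.
r = 64
hidden, inter, layers = 2048, 8192, 16

def lora_params(d_in, d_out, r):
    return r * (d_in + d_out)

per_layer = (
    4 * lora_params(hidden, hidden, r)   # q/k/v/o_proj
    + 2 * lora_params(hidden, inter, r)  # gate/up_proj
    + lora_params(inter, hidden, r)      # down_proj
)
total = per_layer * layers
print(f"~{total / 1e6:.1f}M trainable LoRA parameters")  # ~48.2M
```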
Quick start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "mayflowergmbh/boldt-dc-1b-german-it-16k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

prompt = (
    "<|system|>\nDu bist ein präziser deutschsprachiger Assistent.\n"
    "<|user|>\nFasse die wichtigsten Punkte der DSGVO in drei Sätzen zusammen.\n"
    "<|assistant|>\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=False))
```
The chat template uses four special tokens: `<|system|>`, `<|user|>`, `<|assistant|>`, and `<|end|>`.
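For multi-turn use, a minimal prompt builder matching the format of the quick-start snippet might look like the sketch below. Note that the snippet above does not show where (or whether) `<|end|>` terminates each turn, so this sketch omits it — check the tokenizer's chat template (`tokenizer.apply_chat_template`) for the authoritative format:

```python
# Assemble a prompt from a list of {"role", "content"} messages,
# using the <|system|>/<|user|>/<|assistant|> tokens from the card.
def build_prompt(messages):
    parts = [f"<|{m['role']}|>\n{m['content']}\n" for m in messages]
    parts.append("<|assistant|>\n")  # generation prompt for the next turn
    return "".join(parts)

msgs = [
    {"role": "system", "content": "Du bist ein präziser deutschsprachiger Assistent."},
    {"role": "user", "content": "Fasse die DSGVO in drei Sätzen zusammen."},
]
print(build_prompt(msgs))
```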
Architecture
| Property | Value |
|---|---|
| Base model | Boldt/Boldt-DC-1B (Llama, 1.25B parameters) |
| Hidden size / layers / heads | 2,048 / 16 / 16 |
| Intermediate size | 8,192 |
| Vocabulary | 32,004 (4 added chat tokens) |
| Native context | 2,048 |
| Extended context | 16,384 (YaRN, factor 8.0) |
| Released precision | bfloat16, merged single shard |
| Tensor format | safetensors |
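The table's dimensions can be sanity-checked against the stated parameter count. This rough estimate assumes untied input/output embeddings and ignores norm weights — both assumptions, not stated on the card:

```python
# Rough dense-parameter estimate for a Llama-style model with the
# dimensions above; lands close to the stated 1.25B.
vocab, hidden, inter, layers = 32_004, 2048, 8192, 16

embed = vocab * hidden            # input embeddings
head = vocab * hidden             # lm_head (assuming untied)
attn = 4 * hidden * hidden        # q/k/v/o_proj
mlp = 3 * hidden * inter          # gate/up/down_proj
total = embed + head + layers * (attn + mlp)
print(f"~{total / 1e9:.2f}B parameters")  # ~1.20B
```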
The RoPE configuration in config.json:
"rope_parameters": {
"rope_type": "yarn",
"factor": 8.0,
"original_max_position_embeddings": 2048,
"rope_theta": 10000.0
}
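The extended window follows directly from these two values — the scaling factor multiplies the original position range:

```python
# 16,384-token window = native window x YaRN factor.
original_max_position_embeddings = 2048
factor = 8.0
extended = int(original_max_position_embeddings * factor)
print(extended)  # 16384
```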
Training
| | CPT 16K | SFT 16K |
|---|---|---|
| Steps | 5,000 | 7,000 |
| Effective batch size | 1 × 32 (grad-accum) | 1 × 32 (grad-accum) |
| max_seq_length | 16,384 | 16,384 |
| Packing | enabled | disabled |
| LR / schedule | 7e-5, cosine, 3 % warmup | 5e-5, cosine, 3 % warmup |
| LoRA r / α | 64 / 64 | 64 / 64 (continued) |
| Target modules | q/k/v/o + gate/up/down | q/k/v/o + gate/up/down |
| Optimizer | adamw_torch_fused | adamw_torch_fused |
| Gradient checkpointing | Unsloth | Unsloth |
| Wall-clock (A6000, sm_8.6) | ~17 h (resumed) | ~32.8 h |
| Final eval loss | 1.787 | 1.117 |
| Final train loss | 1.751 (last 40-step window) | 1.235 |
| Epochs (over training data) | ~20.8 | ~3.2 |
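From these hyperparameters one can bound the token budget per stage. The figure is exact only if every sequence is full-length — true for CPT (packing enabled), an upper bound for SFT (packing disabled, so shorter examples are padded or truncated):

```python
# Token budget per stage: steps x effective batch size x max_seq_length.
def token_budget(steps, batch, seq_len):
    return steps * batch * seq_len

cpt = token_budget(5_000, 32, 16_384)
sft = token_budget(7_000, 32, 16_384)
print(f"CPT ~{cpt / 1e9:.2f}B tokens, SFT <= {sft / 1e9:.2f}B tokens")
```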
The CPT run was OOM-killed once by the host at step 4,480/5,000 (swap pressure from unrelated services) and was resumed from checkpoint-4000 after pinning the trainer to a single A6000 with `MALLOC_ARENA_MAX=2` and `nice -n 19 ionice -c 3`. The loss curve is continuous across the resume: eval loss decreased monotonically from 1.7878 (step 4,000) to 1.7873 (4,250), 1.7871 (4,500), and 1.7787 (5,000).
Datasets
CPT and SFT data were assembled from publicly available German corpora. Each dataset retains its original license; users of this model must review the upstream licenses before commercial use.
| Dataset | Stage | Rows ingested | Purpose |
|---|---|---|---|
| DiscoResearch/germanrag | CPT + SFT | 3,362 + 3,362 | Long German RAG contexts |
| CausalLM/GPT-4-Self-Instruct-German | SFT | 10,006 | German self-instruct |
| seedboxai/multitask_german_examples_32k | SFT | 50,000 | German multitask instructions |
| maxidl/Capybara-de | SFT | 15,991 | German conversational instructions |
| AgentWaller/german-oasst1-qa-format | SFT | 9,843 | OASST-style German QA |
| dennlinger/klexikon | SFT | 2,346 | German simplification/summarization |
| flozi00/german-function-calling | SFT | 1,327 | German function-calling |
After deduplication, packing, and length filtering, the processed splits used in training were:
- CPT: 65,168 train / 3,501 validation rows
- SFT: 70,029 train / 3,671 validation rows
- Needle eval: 512 synthetic German long-context retrieval examples
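The exact preprocessing code is not published; a hypothetical sketch of the length-filtering step might look like this, where `count_tokens` stands in for a real tokenizer call (`tokenizer(text)["input_ids"]`) and the 4-chars-per-token heuristic is an assumption for illustration only:

```python
# Keep only rows whose token count fits the 16,384-token window.
MAX_LEN = 16_384

def count_tokens(text):
    # Placeholder heuristic: ~4 characters per token.
    return max(1, len(text) // 4)

def length_filter(rows, max_len=MAX_LEN):
    return [r for r in rows if count_tokens(r["text"]) <= max_len]

rows = [{"text": "kurzer Beispieltext"}, {"text": "x" * 100_000}]
print(len(length_filter(rows)))  # 1 -- the oversized row is dropped
```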
Sources excluded because they failed to load: GEM/mlsum (its dataset scripts are deprecated in the `datasets` library) and PleIAs/German-PD (intentionally disabled in this run).
Evaluation
Long-context needle-in-haystack (German)
Five attempts per context length, across three needle depths and three context lengths. The model must recover a 6-digit "geheime Prüfnummer" (secret check number) injected into a long German document.
| Context length | Attempts | Successes | Success rate |
|---|---|---|---|
| 4,096 | 5 | 5 | 100 % |
| 8,192 | 5 | 5 | 100 % |
| 16,384 | 5 | 5 | 100 % |
This is the gate that validates the YaRN extension actually works in practice; the model retrieves the needle reliably at the full 16K context.
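The 512 synthetic examples themselves are not published; a sketch of how such an example can be constructed is shown below. The filler text and injection scheme are illustrative assumptions, not the actual eval generator:

```python
# Inject a 6-digit "geheime Prüfnummer" at a chosen relative depth into
# German filler text; a model passes if its answer contains the number.
import random

def make_needle_example(context_chars, depth, seed=0):
    rng = random.Random(seed)
    number = f"{rng.randint(0, 999_999):06d}"
    needle = f" Die geheime Prüfnummer lautet {number}. "
    sentence = "Dies ist ein langer deutscher Beispieltext. "
    filler = sentence * (context_chars // len(sentence) + 1)
    pos = int(len(filler) * depth)  # depth in [0, 1]
    haystack = filler[:pos] + needle + filler[pos:]
    return haystack, number

haystack, number = make_needle_example(context_chars=50_000, depth=0.5)
print(number in haystack)  # True
```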
Loss curves
SFT eval loss (evaluated every 250 steps; selected points shown)

| Step | Eval loss |
|---|---|
| 250 | 1.277 |
| 500 | 1.256 |
| 1,000 | 1.247 |
| 2,500 | 1.195 |
| 4,000 | 1.128 |
| 5,000 | 1.120 |
| 6,000 | 1.118 |
| 7,000 | 1.117 |
A smooth, monotonic decrease; the final evaluations differ by only a few thousandths, indicating the model has converged on the SFT distribution.
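The convergence claim can be read off directly from the per-interval improvements in the trace above, which shrink to a few thousandths by the end of training:

```python
# Consecutive eval-loss deltas from the SFT trace above.
evals = [(250, 1.277), (500, 1.256), (1000, 1.247), (2500, 1.195),
         (4000, 1.128), (5000, 1.120), (6000, 1.118), (7000, 1.117)]
deltas = [round(a[1] - b[1], 3) for a, b in zip(evals, evals[1:])]
print(deltas)  # [0.021, 0.009, 0.052, 0.067, 0.008, 0.002, 0.001]
```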
Intended use and limitations
Intended use. German-language assistant tasks: summarization, question answering, instruction following, simple function-calling templates, and long-document Q&A up to 16K tokens.
Capability ceiling. This is a 1.25B-parameter model. Free-form generation can drift off-topic, hallucinate facts, or produce generic prose, especially on open-ended prompts. The strong long-context needle results indicate the model can retrieve information injected into a long German document, not that it can reason over arbitrarily long inputs.
Languages. Trained primarily for German. The tokenizer is shared with the Llama base, so English is supported as a side effect but quality is not the project's target.
Safety. No RLHF or DPO alignment was applied. Outputs are not guaranteed to be safe, polite, or non-toxic. Apply your own safety layer before deploying.
Licensing. This card is released under Apache 2.0, but the model weights inherit constraints from the base model (Boldt/Boldt-DC-1B) and from each contributing dataset's license. Review those before commercial deployment.
How it was trained
The full pipeline (system check → install → smoke test → CPT 16K → SFT 16K → long-context eval → merge → final report) is reproducible from a Makefile. Key hardware: a single NVIDIA RTX A6000 (sm_8.6, 48 GB), CUDA 12.8 wheels, Unsloth 2026.5.2, Transformers 5.5.0, TRL 0.24.0, PEFT 0.19.1, PyTorch 2.10.0+cu128.
Acknowledgments
- Base model: Boldt/Boldt-DC-1B.
- Training stack: Unsloth, TRL, PEFT.
- Dataset authors listed above — thank you for publishing high-quality German data.