Tori v2 — Name Splitter
ModernBERT-base (149M params) fine-tuned for splitting full name strings into forenames and surnames using BIO token classification.
Evaluation Results
| Variant | F1 | Precision | Recall | Eval Dataset |
|---|---|---|---|---|
| bilineal (default) | 0.9948 | 0.9948 | 0.9949 | MX/CO/ES/PE/CL names |
| unilineal | 0.9927 | 0.9927 | 0.9927 | AR/US/BR/PT names |
Metrics are entity-level (seqeval) — a name span is only correct if all its tokens match.
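To make the entity-level criterion concrete, here is a minimal pure-Python sketch of span-based scoring for a single tagged sequence. It is not seqeval itself: the helper names `extract_spans` and `entity_scores` are hypothetical, and the handling of a stray `I-` tag (treated as opening a new span) is an assumption.

```python
from typing import List, Set, Tuple

def extract_spans(tags: List[str]) -> Set[Tuple[str, int, int]]:
    """Turn a BIO tag sequence into a set of (type, start, end) spans."""
    spans = set()
    start, current = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes the last span
        prefix, _, etype = tag.partition("-")
        # A span starts on B-, or on an I- whose type differs from the open span.
        starts_new = prefix == "B" or (prefix == "I" and etype != current)
        if current is not None and (prefix == "O" or starts_new):
            spans.add((current, start, i))
            current = None
        if starts_new:
            start, current = i, etype
    return spans

def entity_scores(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Precision, recall and F1 where only exact span matches count."""
    g, p = extract_spans(gold), extract_spans(pred)
    tp = len(g & p)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1

gold = ["B-forenames", "I-forenames", "B-surnames", "I-surnames"]
pred = ["B-forenames", "I-forenames", "B-surnames", "B-surnames"]
precision, recall, f1 = entity_scores(gold, pred)
```

In this example only one tag differs, but the predicted surname is split into two spans, so neither matches the gold surname span and the whole entity counts as wrong.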
Variants
| Variant | Countries | Surname Pattern | Subfolder |
|---|---|---|---|
| bilineal (default) | MX, CO, ES, PE, CL | Double surname (paternal + maternal) | / (root) |
| unilineal | AR, US, BR, PT | Single surname | unilineal/ |
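If the country of origin is known, the table above can be turned into a small lookup. This is only a sketch: `variant_for_country` is a hypothetical helper, not part of the `tori` package, and it covers only the countries listed in the table.

```python
# Country codes taken from the Variants table.
BILINEAL = {"MX", "CO", "ES", "PE", "CL"}
UNILINEAL = {"AR", "US", "BR", "PT"}

def variant_for_country(country_code: str) -> str:
    """Return the variant name for a two-letter country code (hypothetical helper)."""
    code = country_code.strip().upper()
    if code in BILINEAL:
        return "bilineal"
    if code in UNILINEAL:
        return "unilineal"
    raise ValueError(f"No variant listed for country code {country_code!r}")
```

For countries outside both lists, neither variant was evaluated, so the sketch raises an error rather than guessing.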
Usage
```python
from tori.inference import load_pipeline, split_name

# Default: bilineal model (double-surname countries)
pipe = load_pipeline("ittailup/tori2")
result = split_name(pipe, "Juan Carlos García López")
print(result.forenames)  # ['Juan', 'Carlos']
print(result.surnames)   # ['García', 'López']

# Unilineal model (single-surname countries)
pipe = load_pipeline("ittailup/tori2", variant="unilineal")
result = split_name(pipe, "John Michael Smith")
print(result.forenames)  # ['John', 'Michael']
print(result.surnames)   # ['Smith']
```
Labels
- O: Outside any name entity
- B-forenames: Beginning of forename
- I-forenames: Inside forename (continuation)
- B-surnames: Beginning of surname
- I-surnames: Inside surname (continuation)
Important: Custom Aggregation Required
This model uses ModernBERT's GPT-style BPE tokenizer, which encodes a leading
space as a Ġ prefix. This scheme is not compatible with Hugging Face's built-in
aggregation_strategy="simple". Use the tori.inference module, which handles
subword aggregation correctly, or set aggregation_strategy="none" and aggregate
tokens yourself using character offsets.
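As a rough illustration of the do-it-yourself route, the sketch below groups token-level predictions into whitespace-delimited words by character offsets. It assumes each prediction is a dict with `entity`, `start`, and `end` keys, which is the shape the Hugging Face token-classification pipeline returns with aggregation_strategy="none". The function name `aggregate_by_offsets`, the choice to label each word by its first subword, and the sample tokens are all assumptions; this is not the implementation in tori.inference.

```python
import re
from typing import Dict, List

def aggregate_by_offsets(text: str, tokens: List[dict]) -> Dict[str, List[str]]:
    """Group per-token predictions into whole words using character offsets."""
    result: Dict[str, List[str]] = {"forenames": [], "surnames": []}
    for match in re.finditer(r"\S+", text):
        w_start, w_end = match.span()
        # Keep tokens whose character span overlaps this word.
        overlapping = [t for t in tokens if t["start"] < w_end and t["end"] > w_start]
        if not overlapping:
            continue
        # Assumption: the first subword's label decides the label of the word.
        label = min(overlapping, key=lambda t: t["start"])["entity"]
        if label == "O":
            continue
        entity_type = label.split("-", 1)[1]  # drop the B-/I- prefix
        if entity_type in result:
            result[entity_type].append(match.group())
    return result

# Hypothetical token-level output; "García" is split into two subwords.
text = "Juan García López"
tokens = [
    {"entity": "B-forenames", "start": 0, "end": 4},
    {"entity": "B-surnames", "start": 5, "end": 8},
    {"entity": "I-surnames", "start": 8, "end": 11},
    {"entity": "I-surnames", "start": 12, "end": 17},
]
names = aggregate_by_offsets(text, tokens)
```

Because words are recovered from the original string rather than from the token text, the Ġ marker never appears in the output, and the overlap test still works if a token's span includes the preceding space.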
Training
- Base model: answerdotai/ModernBERT-base (149M params)
- Training data: philipperemy/name-dataset, Mexico SEP, RENAPER (AR), datos.gob.ar
- Batch size: 256 effective (128 x 2 gradient accumulation)
- Learning rate: 5e-5, cosine schedule with 10% warmup
- Epochs: 3
- Precision: bf16
- Hardware: NVIDIA A10G (AWS g5.xlarge)
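
For reference, the schedule and batch settings above can be sketched in a few lines. This assumes linear warmup followed by a half-cosine decay to zero, which is a common form of this schedule; the source does not state the exact variant. The `lr_at_step` helper and the total step count are hypothetical.

```python
import math

PER_DEVICE_BATCH = 128
GRAD_ACCUM_STEPS = 2
EFFECTIVE_BATCH = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS  # 256

def lr_at_step(step: int, total_steps: int,
               peak_lr: float = 5e-5, warmup_frac: float = 0.10) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay (assumed form)."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

TOTAL_STEPS = 1000  # hypothetical; the real step count is not given in the source
halfway_warmup = lr_at_step(50, TOTAL_STEPS)
peak = lr_at_step(100, TOTAL_STEPS)
final = lr_at_step(TOTAL_STEPS, TOTAL_STEPS)
```

With these hypothetical numbers, the rate climbs linearly over the first 100 steps, reaches 5e-5, and then decays along a cosine curve toward zero.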