# Legal-BERT Base Entity Classifier

## Overview
A fine-tuned BERT model for classifying legal entities (such as locations and dates) in the context of legal decision texts. The model is based on `nlpaueb/legal-bert-base-uncased` and predicts the type of a marked entity span, given its context, using the special entity markers `[E] ... [/E]`.
## Model Details
- Model Name: `legal-bert-base-classifier-refinement-abb`
- Architecture: BERT (`nlpaueb/legal-bert-base-uncased`)
- Task: Entity Classification (NER-style, entity-in-context classification)
- Framework: PyTorch, Hugging Face Transformers
- Author: S. Vercoutere
## Intended Use
- Purpose: Automatic classification of legal entities (e.g., location, date) in municipal or governmental decision documents.
- Not Intended For: General-purpose NER, non-legal domains, or tasks outside entity classification.
## Training Data
- Source: Annotated legal decision texts from Ghent/Freiburg/Bamberg.
- Entity Types:
  - Locations: `impact_location`, `context_location`
  - Dates: `publication_date`, `session_date`, `entry_date`, `expiry_date`, `legal_date`, `context_date`, `validity_period`, `context_period`
- Preprocessing:
  - Entities in the source texts are wrapped in XML-like tags: `<entity_type>...</entity_type>`.
  - For training, one entity per sample is marked with `[E] ... [/E]` in context.
  - The dataset is balanced to at most 5000 samples per label.
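The preprocessing step above can be sketched in a few lines of pure Python: pick one XML-tagged entity, replace it with the `[E] ... [/E]` markers, and strip the remaining entity tags. This is an illustrative sketch, assuming tag names match the entity labels; the helper name `mark_entity` is not from the original pipeline.

```python
import re

def mark_entity(xml_text: str, target_tag: str) -> tuple[str, str]:
    """Mark one <target_tag>...</target_tag> span with [E] ... [/E]
    and strip all other XML-like entity tags from the context."""
    # Mark the first occurrence of the target entity.
    marked = re.sub(
        rf"<{target_tag}>(.*?)</{target_tag}>",
        r"[E] \1 [/E]",
        xml_text,
        count=1,
    )
    # Drop the remaining entity tags, keeping their text content.
    marked = re.sub(r"</?\w+>", "", marked)
    return marked, target_tag  # (training text, gold label) pair

example = "Approved in <session_date>12 May 2023</session_date> for <impact_location>Ghent</impact_location>."
text, label = mark_entity(example, "impact_location")
# text == "Approved in 12 May 2023 for [E] Ghent [/E]."
```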
## Training Procedure
- Model: `nlpaueb/legal-bert-base-uncased`
- Tokenization: Hugging Face AutoTokenizer, with `[E]` and `[/E]` as additional special tokens
- Max Sequence Length: 512
- Batch Size: 4
- Optimizer: AdamW
- Learning Rate: 2e-5
- Epochs: 10
- Mixed Precision: Yes (AMP)
- Validation Split: 20%
- Evaluation Metrics: Accuracy, F1, confusion matrix
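The per-label balancing (max 5000 samples per label) and the 20% validation split described above can be sketched in plain Python. This is an illustrative sketch, not the original training script; the function name and seed are assumptions.

```python
import random

def balance_and_split(samples, max_per_label=5000, val_frac=0.2, seed=42):
    """Cap each label at max_per_label samples, then split into train/validation.

    `samples` is a list of (text, label) pairs.
    """
    rng = random.Random(seed)
    by_label = {}
    for text, label in samples:
        by_label.setdefault(label, []).append((text, label))

    balanced = []
    for label, items in by_label.items():
        rng.shuffle(items)
        balanced.extend(items[:max_per_label])  # keep at most max_per_label per label

    rng.shuffle(balanced)
    n_val = int(len(balanced) * val_frac)
    return balanced[n_val:], balanced[:n_val]  # (train, validation)
```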
## Evaluation
Validation Accuracy: 0.8454 (on held-out validation set)
Detailed Entity-Level Evaluation:
| Entity Label | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| context_date | 0.7298 | 0.7867 | 0.7572 | 975 |
| context_location | 0.8623 | 0.8992 | 0.8804 | 843 |
| context_period | 0.6443 | 0.7007 | 0.6713 | 137 |
| entry_date | 0.8806 | 0.6860 | 0.7712 | 484 |
| expiry_date | 0.7826 | 0.5180 | 0.6234 | 139 |
| impact_location | 0.7616 | 0.8205 | 0.7900 | 997 |
| legal_date | 0.9685 | 0.9777 | 0.9731 | 943 |
| publication_date | 0.9626 | 1.0000 | 0.9809 | 386 |
| session_date | 0.9680 | 0.9597 | 0.9638 | 347 |
| validity_period | 0.8772 | 0.7495 | 0.8083 | 467 |
| accuracy | | | 0.8454 | 5718 |
| macro avg | 0.8438 | 0.8098 | 0.8220 | 5718 |
| weighted avg | 0.8485 | 0.8454 | 0.8444 | 5718 |
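As a quick consistency check on the table, the macro and weighted averages can be recomputed from the per-label rows (a small sketch; the values are copied from the table above):

```python
# Per-label F1 scores and supports from the table above, in row order.
f1_scores = [0.7572, 0.8804, 0.6713, 0.7712, 0.6234,
             0.7900, 0.9731, 0.9809, 0.9638, 0.8083]
supports  = [975, 843, 137, 484, 139, 997, 943, 386, 347, 467]

# Macro avg: unweighted mean over labels (matches the 0.8220 row).
macro_f1 = sum(f1_scores) / len(f1_scores)

# Weighted avg: support-weighted mean (matches the 0.8444 row).
weighted_f1 = sum(f * s for f, s in zip(f1_scores, supports)) / sum(supports)
```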
## Usage Example

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("svercoutere/legal-bert-base-classifier-refinement-abb")
model = AutoModelForSequenceClassification.from_pretrained("svercoutere/legal-bert-base-classifier-refinement-abb")
model.eval()

def classify_entity(entity_text, context_text):
    # Mark the first occurrence of the entity span with the special tokens.
    marked_text = context_text.replace(entity_text, f"[E] {entity_text} [/E]", 1)
    inputs = tokenizer(marked_text, return_tensors="pt", truncation=True,
                       max_length=512, padding="max_length")
    with torch.no_grad():
        outputs = model(**inputs)
    pred = torch.argmax(outputs.logits, dim=-1).item()
    return pred  # Map to a label name using label_encoder.classes_
```
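The function returns a class index; the card references `label_encoder.classes_` for the index-to-label mapping. If that encoder is not shipped with the model, one plausible reconstruction, assuming scikit-learn's `LabelEncoder` was used (it orders classes alphabetically), is sketched below. This mapping is an assumption and should be verified against `model.config.id2label` if it is populated.

```python
# Assumption: labels were encoded with scikit-learn's LabelEncoder,
# which sorts class names alphabetically.
LABELS = sorted([
    "impact_location", "context_location",
    "publication_date", "session_date", "entry_date", "expiry_date",
    "legal_date", "context_date", "validity_period", "context_period",
])

def id_to_label(pred: int) -> str:
    return LABELS[pred]
```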
## Limitations & Bias
- The model is trained on legal texts from specific municipalities and may not generalize to other domains or languages.
- Only entity types present in the training data are supported.
- The model expects entities to be marked with `[E] ... [/E]` in the input.
## Citation
If you use this model, please cite:
```bibtex
@misc{legal-bert-classifier-refinement-abb,
  author = {S. Vercoutere},
  title = {Legal-BERT Entity Refinement},
  year = {2026},
  howpublished = {\url{https://huggingface.co/svercoutere/legal-bert-base-classifier-refinement-abb}}
}
```