# Legal-BERT Base Entity Classifier

## Overview
A fine-tuned BERT model for classifying legal entities (such as locations and dates) in the context of legal decision texts. The model is based on `nlpaueb/legal-bert-base-uncased` and predicts the type of a marked entity span, given its context, using the special entity markers `[E] ... [/E]`.
## Model Details
- Model Name: `legal-bert-base-classifier-refinement-abb`
- Architecture: BERT (`nlpaueb/legal-bert-base-uncased`)
- Task: Entity Classification (NER-style, entity-in-context classification)
- Framework: PyTorch, Hugging Face Transformers
- Author: S. Vercoutere
## Intended Use
- Purpose: Automatic classification of legal entities (e.g., location, date) in municipal or governmental decision documents.
- Not Intended For: General-purpose NER, non-legal domains, or tasks outside entity classification.
## Training Data
- Source: Annotated legal decision texts from Ghent/Freiburg/Bamberg.
- Entity Types:
  - Locations: `impact_location`, `context_location`
  - Dates: `publication_date`, `session_date`, `entry_date`, `expiry_date`, `legal_date`, `context_date`, `validity_period`, `context_period`
- Preprocessing:
  - Entities in the source texts are wrapped in XML-like tags: `<entity_type>...</entity_type>`.
  - For training, one entity per sample is marked with `[E] ... [/E]` in context.
  - The dataset is balanced to at most 5000 samples per label.
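The preprocessing step above can be sketched in a few lines of pure Python: pick one XML-tagged entity, replace it with the `[E] ... [/E]` markers, and strip the remaining entity tags. This is an illustrative sketch, assuming tag names match the entity labels; the helper name `mark_entity` is not from the original pipeline.

```python
import re

def mark_entity(xml_text: str, target_tag: str) -> tuple[str, str]:
    """Mark one <target_tag>...</target_tag> span with [E] ... [/E]
    and strip all other XML-like entity tags from the context."""
    # Mark the first occurrence of the target entity.
    marked = re.sub(
        rf"<{target_tag}>(.*?)</{target_tag}>",
        r"[E] \1 [/E]",
        xml_text,
        count=1,
    )
    # Drop the remaining entity tags, keeping their text content.
    marked = re.sub(r"</?\w+>", "", marked)
    return marked, target_tag  # (training text, gold label) pair

example = "Approved in <session_date>12 May 2023</session_date> for <impact_location>Ghent</impact_location>."
text, label = mark_entity(example, "impact_location")
# text == "Approved in 12 May 2023 for [E] Ghent [/E]."
```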
## Training Procedure
- Model: `nlpaueb/legal-bert-base-uncased`
- Tokenization: Hugging Face AutoTokenizer, with `[E]` and `[/E]` as additional special tokens
- Max Sequence Length: 512
- Batch Size: 4
- Optimizer: AdamW
- Learning Rate: 2e-5
- Epochs: 10
- Mixed Precision: Yes (AMP)
- Validation Split: 20%
- Evaluation Metrics: Accuracy, F1, confusion matrix
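The per-label balancing (max 5000 samples per label) and the 20% validation split described above can be sketched in plain Python. This is an illustrative sketch, not the original training script; the function name and seed are assumptions.

```python
import random

def balance_and_split(samples, max_per_label=5000, val_frac=0.2, seed=42):
    """Cap each label at max_per_label samples, then split into train/validation.

    `samples` is a list of (text, label) pairs.
    """
    rng = random.Random(seed)
    by_label = {}
    for text, label in samples:
        by_label.setdefault(label, []).append((text, label))

    balanced = []
    for label, items in by_label.items():
        rng.shuffle(items)
        balanced.extend(items[:max_per_label])  # keep at most max_per_label per label

    rng.shuffle(balanced)
    n_val = int(len(balanced) * val_frac)
    return balanced[n_val:], balanced[:n_val]  # (train, validation)
```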
## Evaluation
Validation Accuracy: 0.8454 (on held-out validation set)
Detailed Entity-Level Evaluation:
| Entity Label | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| context_date | 0.7298 | 0.7867 | 0.7572 | 975 |
| context_location | 0.8623 | 0.8992 | 0.8804 | 843 |
| context_period | 0.6443 | 0.7007 | 0.6713 | 137 |
| entry_date | 0.8806 | 0.6860 | 0.7712 | 484 |
| expiry_date | 0.7826 | 0.5180 | 0.6234 | 139 |
| impact_location | 0.7616 | 0.8205 | 0.7900 | 997 |
| legal_date | 0.9685 | 0.9777 | 0.9731 | 943 |
| publication_date | 0.9626 | 1.0000 | 0.9809 | 386 |
| session_date | 0.9680 | 0.9597 | 0.9638 | 347 |
| validity_period | 0.8772 | 0.7495 | 0.8083 | 467 |
| accuracy | | | 0.8454 | 5718 |
| macro avg | 0.8438 | 0.8098 | 0.8220 | 5718 |
| weighted avg | 0.8485 | 0.8454 | 0.8444 | 5718 |
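As a quick consistency check on the table, the macro and weighted averages can be recomputed from the per-label rows (a small sketch; the values are copied from the table above):

```python
# Per-label F1 scores and supports from the table above, in row order.
f1_scores = [0.7572, 0.8804, 0.6713, 0.7712, 0.6234,
             0.7900, 0.9731, 0.9809, 0.9638, 0.8083]
supports  = [975, 843, 137, 484, 139, 997, 943, 386, 347, 467]

# Macro avg: unweighted mean over labels (matches the 0.8220 row).
macro_f1 = sum(f1_scores) / len(f1_scores)

# Weighted avg: support-weighted mean (matches the 0.8444 row).
weighted_f1 = sum(f * s for f, s in zip(f1_scores, supports)) / sum(supports)
```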
## Usage Example

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("svercoutere/legal-bert-base-classifier-refinement-abb")
model = AutoModelForSequenceClassification.from_pretrained("svercoutere/legal-bert-base-classifier-refinement-abb")
model.eval()

def classify_entity(entity_text, context_text):
    # Mark the first occurrence of the entity span with the special tokens.
    marked_text = context_text.replace(entity_text, f"[E] {entity_text} [/E]", 1)
    inputs = tokenizer(marked_text, return_tensors="pt", truncation=True,
                       max_length=512, padding="max_length")
    with torch.no_grad():
        outputs = model(**inputs)
    pred = torch.argmax(outputs.logits, dim=-1).item()
    return pred  # Map to a label name using label_encoder.classes_
```
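The function returns a class index; the card references `label_encoder.classes_` for the index-to-label mapping. If that encoder is not shipped with the model, one plausible reconstruction, assuming scikit-learn's `LabelEncoder` was used (it orders classes alphabetically), is sketched below. This mapping is an assumption and should be verified against `model.config.id2label` if it is populated.

```python
# Assumption: labels were encoded with scikit-learn's LabelEncoder,
# which sorts class names alphabetically.
LABELS = sorted([
    "impact_location", "context_location",
    "publication_date", "session_date", "entry_date", "expiry_date",
    "legal_date", "context_date", "validity_period", "context_period",
])

def id_to_label(pred: int) -> str:
    return LABELS[pred]
```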
## Limitations & Bias
- The model is trained on legal texts from specific municipalities and may not generalize to other domains or languages.
- Only entity types present in the training data are supported.
- The model expects entities to be marked with `[E] ... [/E]` in the input.
## Citation
If you use this model, please cite:
```bibtex
@misc{legal-bert-classifier-refinement-abb,
  author = {S. Vercoutere},
  title = {Legal-BERT Entity Refinement},
  year = {2026},
  howpublished = {\url{https://huggingface.co/svercoutere/legal-bert-base-classifier-refinement-abb}}
}
```