Legal-BERT Base Entity Classifier

Overview

A fine-tuned BERT model for classifying legal entity spans (such as locations and dates) in the context of legal decision texts. Based on nlpaueb/legal-bert-base-uncased, the model predicts the type of a single marked entity span from its surrounding context, using the special entity markers [E] ... [/E].

Model Details

  • Model Name: legal-bert-base-classifier-refinement-abb
  • Architecture: BERT (nlpaueb/legal-bert-base-uncased)
  • Task: Entity Classification (NER-style, entity-in-context classification)
  • Framework: PyTorch, Hugging Face Transformers
  • Author: S. Vercoutere

Intended Use

  • Purpose: Automatic classification of legal entities (e.g., location, date) in municipal or governmental decision documents.
  • Not Intended For: General-purpose NER, non-legal domains, or tasks outside entity classification.

Training Data

  • Source: Annotated legal decision texts from Ghent/Freiburg/Bamberg.
  • Entity Types:
    • Locations: impact_location, context_location
    • Dates: publication_date, session_date, entry_date, expiry_date, legal_date, context_date, validity_period, context_period
  • Preprocessing:
    • XML-like tags in text, with entities wrapped in <entity_type>...</entity_type>.
    • For training, one entity per sample is marked with [E] ... [/E] in context.
    • Dataset balanced to max 5000 samples per label.
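The preprocessing steps above can be sketched as follows. This is an illustrative stdlib-only sketch, not the project's actual pipeline: the helper names, the regex, and the first-occurrence marking strategy are assumptions.

```python
import re
from collections import defaultdict

def make_marked_samples(tagged_text):
    """Turn XML-style entity tags into one [E] ... [/E] sample per entity.

    Illustrative sketch: the published card does not include the real code.
    """
    pattern = re.compile(r"<(\w+)>(.*?)</\1>")
    # Context with all entity tags stripped.
    plain = pattern.sub(lambda m: m.group(2), tagged_text)
    samples = []
    for match in pattern.finditer(tagged_text):
        label, span = match.group(1), match.group(2)
        # Simplification: marks the first occurrence of the span text.
        marked = plain.replace(span, f"[E] {span} [/E]", 1)
        samples.append({"text": marked, "label": label})
    return samples

def cap_per_label(samples, max_per_label=5000):
    """Keep at most max_per_label samples per label (the card's balancing cap)."""
    buckets = defaultdict(list)
    for s in samples:
        if len(buckets[s["label"]]) < max_per_label:
            buckets[s["label"]].append(s)
    return [s for bucket in buckets.values() for s in bucket]
```

For example, a text with two tagged entities yields two training samples, each marking a different span in the same untagged context.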

Training Procedure

  • Model: nlpaueb/legal-bert-base-uncased
  • Tokenization: Hugging Face AutoTokenizer, with [E] and [/E] as additional special tokens.
  • Max Sequence Length: 512
  • Batch Size: 4
  • Optimizer: AdamW
  • Learning Rate: 2e-5
  • Epochs: 10
  • Mixed Precision: Yes (AMP)
  • Validation Split: 20%
  • Evaluation Metrics: Accuracy, F1, confusion matrix
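The 20% validation split can be sketched as a simple stratified shuffle. This is a stdlib illustration of the setup described above, not the project's actual training script; stratification by label and the seed are assumptions.

```python
import random
from collections import defaultdict

def stratified_split(samples, val_fraction=0.2, seed=42):
    """Split samples into train/validation, keeping label proportions.

    Sketch of the 20% validation split (assumed stratified by label).
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for s in samples:
        by_label[s["label"]].append(s)
    train, val = [], []
    for bucket in by_label.values():
        rng.shuffle(bucket)
        n_val = int(len(bucket) * val_fraction)
        val.extend(bucket[:n_val])
        train.extend(bucket[n_val:])
    return train, val
```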

Evaluation

Validation Accuracy: 0.8454 (on held-out validation set)

Detailed Entity-Level Evaluation:

Entity Label        Precision   Recall   F1-score   Support
context_date           0.7298   0.7867     0.7572       975
context_location       0.8623   0.8992     0.8804       843
context_period         0.6443   0.7007     0.6713       137
entry_date             0.8806   0.6860     0.7712       484
expiry_date            0.7826   0.5180     0.6234       139
impact_location        0.7616   0.8205     0.7900       997
legal_date             0.9685   0.9777     0.9731       943
publication_date       0.9626   1.0000     0.9809       386
session_date           0.9680   0.9597     0.9638       347
validity_period        0.8772   0.7495     0.8083       467
accuracy                                   0.8454      5718
macro avg              0.8438   0.8098     0.8220      5718
weighted avg           0.8485   0.8454     0.8444      5718
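As a sanity check, the macro and weighted averages follow directly from the per-label scores; this snippet (illustrative, using only the numbers from the table) recomputes them:

```python
# Per-label F1 and support, copied from the evaluation table above.
f1 = {
    "context_date": 0.7572, "context_location": 0.8804, "context_period": 0.6713,
    "entry_date": 0.7712, "expiry_date": 0.6234, "impact_location": 0.7900,
    "legal_date": 0.9731, "publication_date": 0.9809, "session_date": 0.9638,
    "validity_period": 0.8083,
}
support = {
    "context_date": 975, "context_location": 843, "context_period": 137,
    "entry_date": 484, "expiry_date": 139, "impact_location": 997,
    "legal_date": 943, "publication_date": 386, "session_date": 347,
    "validity_period": 467,
}

total = sum(support.values())                               # 5718
macro_f1 = sum(f1.values()) / len(f1)                       # unweighted mean, ~0.8220
weighted_f1 = sum(f1[k] * support[k] for k in f1) / total   # support-weighted, ~0.8444
```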

Usage Example

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("svercoutere/legal-bert-base-classifier-refinement-abb")
model = AutoModelForSequenceClassification.from_pretrained("svercoutere/legal-bert-base-classifier-refinement-abb")
model.eval()  # inference mode: disables dropout

def classify_entity(entity_text, context_text):
    # Mark the first occurrence of the entity with the special [E] ... [/E] tokens.
    marked_text = context_text.replace(entity_text, f"[E] {entity_text} [/E]", 1)
    inputs = tokenizer(marked_text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    pred = torch.argmax(outputs.logits, dim=-1).item()
    return pred  # Class index; map to a label name via label_encoder.classes_[pred]

Limitations & Bias

  • The model is trained on legal texts from specific municipalities and may not generalize to other domains or languages.
  • Only entity types present in the training data are supported.
  • The model expects entities to be marked with [E] ... [/E] in the input.

Citation

If you use this model, please cite:

@misc{legal-bert-classifier-refinement-abb,
  author = {S. Vercoutere},
  title = {Legal-BERT Entity Refinement},
  year = {2026},
  howpublished = {\url{https://huggingface.co/svercoutere/legal-bert-base-classifier-refinement-abb}}
}