MultiEvalVietSum

MultiEvalVietSum is a Vietnamese summary evaluation model released under the Hugging Face account phuongntc.

It is a criterion-specific cross-encoder evaluator that takes a source document and a candidate summary as input and outputs three scalar scores:

  • Faithfulness
  • Coherence
  • Relevance

Model description

This model is built on top of the multilingual long-context encoder jhu-clsp/mmBERT-base and fine-tuned as a custom evaluator for Vietnamese summarization research.

Architecture summary:

  • Backbone: jhu-clsp/mmBERT-base
  • Input format: (document, summary) pair
  • Pooling: CLS + mean pooling
  • Prediction heads: three scalar regression heads
  • Criteria: faithfulness, coherence, relevance
  • Training objective: MSE regression + pairwise margin ranking loss
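The architecture described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the released implementation: the hidden size, how CLS and mean pooling are combined (concatenation is assumed here), the head shapes, and the ranking-loss margin are all guesses.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CriterionEvaluator(nn.Module):
    """Sketch of a cross-encoder evaluator: a shared encoder, CLS + mean
    pooling, and three scalar regression heads (one per criterion)."""

    def __init__(self, encoder, hidden_size=768):
        super().__init__()
        self.encoder = encoder  # e.g. a loaded mmBERT-base backbone
        # One linear head per criterion; input is the [CLS ; mean] concatenation.
        self.heads = nn.ModuleDict({
            name: nn.Linear(2 * hidden_size, 1)
            for name in ("faithfulness", "coherence", "relevance")
        })

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids, attention_mask)   # (B, T, H)
        cls = hidden[:, 0]                                 # CLS token state
        mask = attention_mask.unsqueeze(-1).float()
        mean = (hidden * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
        pooled = torch.cat([cls, mean], dim=-1)            # (B, 2H)
        return {name: head(pooled).squeeze(-1) for name, head in self.heads.items()}

def training_loss(pred, target, pred_pos=None, pred_neg=None, margin=0.1):
    """MSE regression loss, optionally combined with a pairwise margin
    ranking loss when a (better, worse) summary pair is available.
    The margin value and the equal weighting are assumptions."""
    loss = F.mse_loss(pred, target)
    if pred_pos is not None and pred_neg is not None:
        loss = loss + F.margin_ranking_loss(
            pred_pos, pred_neg, torch.ones_like(pred_pos), margin=margin)
    return loss
```

The real checkpoint ships its own loader (modeling_multievalvietsum.py); this sketch only shows the shape of the computation.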

Intended use

This model is intended for:

  • research on automatic summary evaluation in Vietnamese
  • system comparison for Vietnamese summarization
  • criterion-specific scoring of candidate summaries against a source document

This model is not intended to replace human judgment in high-stakes evaluation settings.

Input processing

The evaluator uses a pairwise input construction strategy:

  • the summary is truncated first, to at most SUM_MAX_LEN = 192 tokens
  • the remaining token budget is assigned to the source document
  • the total pair length is capped at MAX_LEN = 2048

This design prioritizes source-document evidence during evaluation.
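The budget rule above can be illustrated as follows. This is a simplified sketch, not the repository's preprocessing code: the number of reserved special tokens and the exact truncation order within the tokenizer are assumptions.

```python
SUM_MAX_LEN = 192   # cap on summary tokens (from the model card)
MAX_LEN = 2048      # cap on the whole (document, summary) pair

def build_pair(doc_tokens, sum_tokens,
               max_len=MAX_LEN, sum_max_len=SUM_MAX_LEN, num_special=3):
    """Truncate the summary first, then give the remaining token budget
    to the source document. `num_special` reserves room for special
    tokens such as [CLS]/[SEP] (the exact count is an assumption)."""
    summary = sum_tokens[:sum_max_len]
    doc_budget = max_len - len(summary) - num_special
    document = doc_tokens[:doc_budget]
    return document, summary
```

Because the summary is capped at 192 tokens, at least 2048 − 192 − (a few special tokens) of the budget always remains for the source document.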

Reported setup

  • model_name: MultiEvalVietSum
  • repo_id: phuongntc/MultiEvalVietSum
  • backbone: jhu-clsp/mmBERT-base
  • task: Vietnamese summary evaluation
  • max_len: 2048
  • summary_max_len: 192
  • pooling: CLS + mean pooling
  • outputs: faithfulness, coherence, relevance

Validation metrics (not reported in this release):

  • val_pearson_faith: None
  • val_pearson_coh: None
  • val_pearson_rel: None
  • val_pearson_mean: None
  • val_spearman_faith: None
  • val_spearman_coh: None
  • val_spearman_rel: None
  • val_spearman_mean: None

Output format

The model outputs three scalar scores:

  1. faithfulness
  2. coherence
  3. relevance

Users may optionally combine them into an overall score using a weighting scheme appropriate for their study.
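For example, a simple weighted average can serve as an overall score. The weights below are arbitrary placeholders, not values recommended by the model authors:

```python
def overall_score(scores, weights=None):
    """Combine per-criterion scores into one number via a weighted average.
    `scores` maps criterion names to model outputs; the default weights
    are illustrative only and should be chosen per study."""
    if weights is None:
        weights = {"faithfulness": 0.5, "coherence": 0.25, "relevance": 0.25}
    total = sum(weights.values())
    return sum(weights[name] * scores[name] for name in weights) / total
```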

Limitations

  • The model only sees the truncated (document, summary) pair defined by the preprocessing pipeline
  • Very long documents are truncated, so part of their content is never seen by the evaluator
  • If a candidate summary is longer than the summary cap, only the visible portion is evaluated
  • Performance may vary across domains outside the training or evaluation distribution

Transparency and reproducibility notes

To reproduce scores as closely as possible, users should keep the following consistent:

  • backbone model
  • tokenizer
  • MAX_LEN
  • SUM_MAX_LEN
  • pair construction rule
  • model architecture and checkpoint

The repository includes:

  • tokenizer files
  • evaluator weights
  • a custom loader file
  • an inference example
  • a training summary file

How to use

After downloading the repo, use the included files:

  • modeling_multievalvietsum.py
  • inference_example.py

Example:

  1. Download or clone the repository
  2. Open Python in that folder
  3. Run:

       from inference_example import predict_scores

       # "Văn bản gốc" = source document, "Bản tóm tắt" = summary
       scores = predict_scores("Văn bản gốc", "Bản tóm tắt", model_dir=".")
       print(scores)

Citation

@misc{phuong2026multievalvietsum,
  title={MultiEvalVietSum: A Vietnamese Criterion-Specific Evaluator for Summary Assessment},
  author={Phuong N. T. and collaborators},
  year={2026},
  note={Model card and code release on Hugging Face},
  howpublished={\url{https://huggingface.co/phuongntc/MultiEvalVietSum}}
}
