MultiEvalVietSum
MultiEvalVietSum is a Vietnamese summary evaluation model released under the Hugging Face account phuongntc.
It is a criterion-specific cross-encoder evaluator that takes a source document and a candidate summary as input and outputs three scalar scores:
- Faithfulness
- Coherence
- Relevance
Model description
This model is built on top of the multilingual long-context encoder jhu-clsp/mmBERT-base and fine-tuned as a custom evaluator for Vietnamese summarization research.
Architecture summary:
- Backbone: jhu-clsp/mmBERT-base
- Input format: (document, summary) pair
- Pooling: CLS + mean pooling
- Prediction heads: three scalar regression heads
- Criteria: faithfulness, coherence, relevance
- Training objective: MSE regression + pairwise margin ranking loss
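The pooling, head layout, and training objective listed above can be sketched as follows. This is a minimal illustrative PyTorch sketch, not the released implementation: the hidden size, the concatenation of CLS and mean vectors, and the margin/weighting values are assumptions; the actual checkpoint defines the real architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CriterionHeads(nn.Module):
    """Sketch: CLS + mean pooling feeding three scalar regression heads."""

    def __init__(self, hidden=768):  # hidden size assumed, not confirmed
        super().__init__()
        self.heads = nn.ModuleDict({
            c: nn.Linear(hidden * 2, 1)
            for c in ("faithfulness", "coherence", "relevance")
        })

    def forward(self, hidden_states, attention_mask):
        # CLS pooling: hidden state of the first token
        cls = hidden_states[:, 0]
        # mean pooling over non-padding tokens
        mask = attention_mask.unsqueeze(-1).float()
        mean = (hidden_states * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
        pooled = torch.cat([cls, mean], dim=-1)  # concatenation is an assumption
        return {c: h(pooled).squeeze(-1) for c, h in self.heads.items()}

def combined_loss(pred_a, pred_b, gold_a, gold_b, margin=0.1, rank_weight=1.0):
    """Sketch of MSE regression + pairwise margin ranking (values assumed)."""
    mse = F.mse_loss(pred_a, gold_a) + F.mse_loss(pred_b, gold_b)
    target = torch.sign(gold_a - gold_b)  # +1 if A should outrank B, else -1
    rank = F.margin_ranking_loss(pred_a, pred_b, target, margin=margin)
    return mse + rank_weight * rank
```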
Intended use
This model is intended for:
- research on automatic summary evaluation in Vietnamese
- system comparison for Vietnamese summarization
- criterion-specific scoring of candidate summaries against a source document
This model is not intended to replace human judgment in high-stakes evaluation settings.
Input processing
The evaluator uses a pairwise input construction strategy:
- the summary is truncated first, to at most SUM_MAX_LEN = 192 tokens
- the remaining token budget is assigned to the source document
- the total pair length is capped at MAX_LEN = 2048 tokens
This design prioritizes source-document evidence during evaluation.
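The truncation rule above can be sketched in a few lines. This is an illustrative simplification: real inputs are token IDs from the mmBERT tokenizer, and the actual pipeline also reserves room for special tokens, which is omitted here.

```python
MAX_LEN = 2048      # total cap on the (document, summary) pair
SUM_MAX_LEN = 192   # summary budget, applied first

def build_pair(doc_tokens, sum_tokens, max_len=MAX_LEN, sum_max_len=SUM_MAX_LEN):
    """Truncate the summary first, then give the remaining budget to the document."""
    summary = sum_tokens[:sum_max_len]
    doc_budget = max_len - len(summary)   # leftover budget goes to the source
    document = doc_tokens[:doc_budget]
    return document, summary

# A 3000-token document paired with a 300-token summary:
doc, summ = build_pair(list(range(3000)), list(range(300)))
# summary keeps 192 tokens; the document receives the remaining 1856
```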
Reported setup
- model_name: MultiEvalVietSum
- repo_id: phuongntc/MultiEvalVietSum
- backbone: jhu-clsp/mmBERT-base
- task: Vietnamese summary evaluation
- max_len: 2048
- summary_max_len: 192
- pooling: CLS + mean pooling
- outputs: faithfulness, coherence, relevance
Validation metrics (Pearson and Spearman correlations for faithfulness, coherence, and relevance, and their means) are not reported in this release.
Output format
The model outputs three scalar scores:
- faithfulness
- coherence
- relevance
Users may optionally combine them into an overall score using a weighting scheme appropriate for their study.
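One possible weighting scheme is sketched below. The weights are purely illustrative, not an official recommendation; studies that emphasize factual accuracy might, for example, weight faithfulness more heavily.

```python
def overall_score(scores, weights=None):
    """Combine the three criterion scores with a (hypothetical) weighting."""
    if weights is None:
        # illustrative default: faithfulness weighted highest
        weights = {"faithfulness": 0.5, "coherence": 0.25, "relevance": 0.25}
    return sum(scores[c] * w for c, w in weights.items())

s = overall_score({"faithfulness": 0.8, "coherence": 0.6, "relevance": 0.7})
# s == 0.725 with the illustrative weights above
```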
Limitations
- The model only sees the truncated (document, summary) pair defined by the preprocessing pipeline
- Very long documents may be only partially visible to the evaluator
- If a candidate summary is longer than the summary cap, only the visible portion is evaluated
- Performance may vary across domains outside the training or evaluation distribution
Transparency and reproducibility notes
To reproduce scores as closely as possible, users should keep the following consistent:
- backbone model
- tokenizer
- MAX_LEN
- SUM_MAX_LEN
- pair construction rule
- model architecture and checkpoint
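The settings to hold fixed can be captured in a single configuration snapshot, assembled here from the "Reported setup" section above (the dictionary itself is an illustrative convenience, not a file shipped with the repository):

```python
# Settings that must match the release for reproducible scores.
EVAL_CONFIG = {
    "repo_id": "phuongntc/MultiEvalVietSum",
    "backbone": "jhu-clsp/mmBERT-base",
    "max_len": 2048,          # total cap on the (document, summary) pair
    "summary_max_len": 192,   # summary truncated first to this budget
    "pooling": "CLS + mean",
    "criteria": ["faithfulness", "coherence", "relevance"],
}
```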
The repository includes:
- tokenizer files
- evaluator weights
- a custom loader file
- an inference example
- a training summary file
How to use
After downloading the repo, use the included files:
- modeling_multievalvietsum.py
- inference_example.py
Example:
- Download or clone the repository
- Open Python in that folder
- Run the following in Python:

      from inference_example import predict_scores

      # "Văn bản gốc" = "Source document", "Bản tóm tắt" = "Summary"
      scores = predict_scores("Văn bản gốc", "Bản tóm tắt", model_dir=".")
      print(scores)
Citation
@misc{phuong2026multievalvietsum,
  title={MultiEvalVietSum: A Vietnamese Criterion-Specific Evaluator for Summary Assessment},
  author={Phuong N. T. and collaborators},
  year={2026},
  note={Model card and code release on Hugging Face},
  howpublished={\url{https://huggingface.co/phuongntc/MultiEvalVietSum}}
}