MultiEvalVietSum
MultiEvalVietSum is a Vietnamese summary evaluation model released under the Hugging Face account phuongntc.
It is a criterion-specific cross-encoder evaluator that takes a source document and a candidate summary as input and outputs three scalar scores:
- Faithfulness
- Coherence
- Relevance
Model description
This model is built on top of the multilingual long-context encoder jhu-clsp/mmBERT-base and fine-tuned as a custom evaluator for Vietnamese summarization research.
Architecture summary:
- Backbone: jhu-clsp/mmBERT-base
- Input format: (document, summary) pair
- Pooling: CLS + mean pooling
- Prediction heads: three scalar regression heads
- Criteria: faithfulness, coherence, relevance
- Training objective: MSE regression + pairwise margin ranking loss
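The pooling, head layout, and training objective listed above can be sketched as follows. This is a minimal illustrative PyTorch sketch, not the released implementation: the hidden size, the concatenation of CLS and mean vectors, and the margin/weighting values are assumptions; the actual checkpoint defines the real architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CriterionHeads(nn.Module):
    """Sketch: CLS + mean pooling feeding three scalar regression heads."""

    def __init__(self, hidden=768):  # hidden size assumed, not confirmed
        super().__init__()
        self.heads = nn.ModuleDict({
            c: nn.Linear(hidden * 2, 1)
            for c in ("faithfulness", "coherence", "relevance")
        })

    def forward(self, hidden_states, attention_mask):
        # CLS pooling: hidden state of the first token
        cls = hidden_states[:, 0]
        # mean pooling over non-padding tokens
        mask = attention_mask.unsqueeze(-1).float()
        mean = (hidden_states * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
        pooled = torch.cat([cls, mean], dim=-1)  # concatenation is an assumption
        return {c: h(pooled).squeeze(-1) for c, h in self.heads.items()}

def combined_loss(pred_a, pred_b, gold_a, gold_b, margin=0.1, rank_weight=1.0):
    """Sketch of MSE regression + pairwise margin ranking (values assumed)."""
    mse = F.mse_loss(pred_a, gold_a) + F.mse_loss(pred_b, gold_b)
    target = torch.sign(gold_a - gold_b)  # +1 if A should outrank B, else -1
    rank = F.margin_ranking_loss(pred_a, pred_b, target, margin=margin)
    return mse + rank_weight * rank
```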
Intended use
This model is intended for:
- research on automatic summary evaluation in Vietnamese
- system comparison for Vietnamese summarization
- criterion-specific scoring of candidate summaries against a source document
This model is not intended to replace human judgment in high-stakes evaluation settings.
Input processing
The evaluator uses a pairwise input construction strategy:
- the summary is truncated first, to at most SUM_MAX_LEN = 192 tokens
- the remaining token budget is assigned to the source document
- the total pair length is capped at MAX_LEN = 2048 tokens
This design prioritizes source-document evidence during evaluation.
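The truncation rule above can be sketched in a few lines. This is an illustrative simplification: real inputs are token IDs from the mmBERT tokenizer, and the actual pipeline also reserves room for special tokens, which is omitted here.

```python
MAX_LEN = 2048      # total cap on the (document, summary) pair
SUM_MAX_LEN = 192   # summary budget, applied first

def build_pair(doc_tokens, sum_tokens, max_len=MAX_LEN, sum_max_len=SUM_MAX_LEN):
    """Truncate the summary first, then give the remaining budget to the document."""
    summary = sum_tokens[:sum_max_len]
    doc_budget = max_len - len(summary)   # leftover budget goes to the source
    document = doc_tokens[:doc_budget]
    return document, summary

# A 3000-token document paired with a 300-token summary:
doc, summ = build_pair(list(range(3000)), list(range(300)))
# summary keeps 192 tokens; the document receives the remaining 1856
```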
Reported setup
- model_name: MultiEvalVietSum
- repo_id: phuongntc/MultiEvalVietSum
- backbone: jhu-clsp/mmBERT-base
- task: Vietnamese summary evaluation
- max_len: 2048
- summary_max_len: 192
- pooling: CLS + mean pooling
- outputs: faithfulness, coherence, relevance
Validation metrics (Pearson and Spearman correlations for faithfulness, coherence, and relevance, and their means) are not reported in this release.
Output format
The model outputs three scalar scores:
- faithfulness
- coherence
- relevance
Users may optionally combine them into an overall score using a weighting scheme appropriate for their study.
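One possible weighting scheme is sketched below. The weights are purely illustrative, not an official recommendation; studies that emphasize factual accuracy might, for example, weight faithfulness more heavily.

```python
def overall_score(scores, weights=None):
    """Combine the three criterion scores with a (hypothetical) weighting."""
    if weights is None:
        # illustrative default: faithfulness weighted highest
        weights = {"faithfulness": 0.5, "coherence": 0.25, "relevance": 0.25}
    return sum(scores[c] * w for c, w in weights.items())

s = overall_score({"faithfulness": 0.8, "coherence": 0.6, "relevance": 0.7})
# s == 0.725 with the illustrative weights above
```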
Limitations
- The model only sees the truncated (document, summary) pair defined by the preprocessing pipeline
- Very long documents may be only partially visible to the evaluator
- If a candidate summary is longer than the summary cap, only the visible portion is evaluated
- Performance may vary across domains outside the training or evaluation distribution
Transparency and reproducibility notes
To reproduce scores as closely as possible, users should keep the following consistent:
- backbone model
- tokenizer
- MAX_LEN
- SUM_MAX_LEN
- pair construction rule
- model architecture and checkpoint
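The settings to hold fixed can be captured in a single configuration snapshot, assembled here from the "Reported setup" section above (the dictionary itself is an illustrative convenience, not a file shipped with the repository):

```python
# Settings that must match the release for reproducible scores.
EVAL_CONFIG = {
    "repo_id": "phuongntc/MultiEvalVietSum",
    "backbone": "jhu-clsp/mmBERT-base",
    "max_len": 2048,          # total cap on the (document, summary) pair
    "summary_max_len": 192,   # summary truncated first to this budget
    "pooling": "CLS + mean",
    "criteria": ["faithfulness", "coherence", "relevance"],
}
```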
The repository includes:
- tokenizer files
- evaluator weights
- a custom loader file
- an inference example
- a training summary file
How to use
After downloading the repo, use the included files:
- modeling_multievalvietsum.py
- inference_example.py
Example:
- Download or clone the repository
- Open Python in that folder
- Run the following in Python:

      from inference_example import predict_scores

      # "Văn bản gốc" = "Source document", "Bản tóm tắt" = "Summary"
      scores = predict_scores("Văn bản gốc", "Bản tóm tắt", model_dir=".")
      print(scores)
Citation
@misc{phuong2026multievalvietsum,
  title={MultiEvalVietSum: A Vietnamese Criterion-Specific Evaluator for Summary Assessment},
  author={Phuong N. T. and collaborators},
  year={2026},
  note={Model card and code release on Hugging Face},
  howpublished={\url{https://huggingface.co/phuongntc/MultiEvalVietSum}}
}