---
license: cc-by-nc-4.0
---

# SentiV

**A Benchmark for Low-Resource Vietnamese Speech–Text Emotion Understanding**

This repository releases datasets, code, and pretrained checkpoints for **SentiV**, a benchmark for Vietnamese emotion understanding across **text**, **speech**, and **multimodal** settings, as described in our paper.

📄 **Paper**: *SentiV: A Benchmark for Low-Resource Vietnamese Speech–Text Emotion Understanding*

---

## 1. Overview

SentiV focuses on realistic low-resource evaluation of Vietnamese emotion recognition under:

* Label imbalance
* Limited supervision (1–100% label budgets)
* Cross-dataset and cross-modal generalization
* Explicit label-space alignment between text and speech

We release:

* Text emotion dataset (data + code + checkpoints)
* Speech emotion annotations (labels + code + checkpoints)
* Reproducible training and evaluation scripts

---

## 2. Repository Structure

```
sentiv/
├── text-training/
│   ├── model/                 # Text model checkpoints
│   ├── train_PhoBERT.py       # Training script (PhoBERT)
│   ├── train.xlsx             # Labeled text data
│   └── readme.MD
│
├── voice-training/
│   ├── hubert-large-ls960/    # Speech model checkpoints
│   ├── label/                 # Emotion labels and split manifests (speech)
│   ├── label-text/            # Text samples paired with the speech data, annotated with the emotion labels
│   ├── train_hubert.py        # HuBERT fine-tuning script
│   └── readme.MD
│
└── README.md                  # This file
```

---

## 3. Tasks and Label Space

### Task A: Text Emotion Classification

* Labels (7): `Anger, Disgust, Enjoyment, Fear, Neutral, Sadness, Surprise`
* Dataset: social media text (comments, posts)
* Evaluation: Macro-F1, Accuracy

### Task B: Speech Emotion Classification

* Labels (6): `Anger (includes Disgust), Enjoyment, Fear, Neutral, Sadness, Surprise`
* Disgust is merged into Anger because it is extremely scarce in the speech data

### Task C: Multimodal Speech–Text Classification

* Same 6-label space as Task B (speech)
* Late fusion over text and speech logits

---

## 4. Text Modality (text-training)

### Data

* Source: public Vietnamese social media posts
* Size: 265,011 labeled samples
* Average length: ~20 words
* Labels: 7 emotions
* Anonymized and released strictly for research use

### Model

* Backbone: **PhoBERT (vinai/phobert-base)**
* Loss: Focal Loss with class reweighting
* Max sequence length: 256 tokens
* Metric: Macro-F1

### Training

```bash
python train_PhoBERT.py
```

The script supports:

* Class imbalance handling
* Oversampling
* Low-resource label budgets
* Fixed train/dev/test splits

---

## 5. Speech Modality (voice-training)

### Data

* Source audio: VietSpeech dataset (batches 0–10)
* We release:
  * Emotion labels
  * Split manifests
  * Training code
* Raw audio must be obtained from the original VietSpeech source under its license

### Label Mapping

* Disgust is merged into Anger for training stability
* Final label space: 6 emotions

### Model

* Backbone: **HuBERT Large (ls960)**
* Input: 16 kHz audio, max 8 seconds
* Loss: Weighted Cross-Entropy
* Sampler: WeightedRandomSampler
* Metric: Macro-F1

### Training

```bash
python train_hubert.py
```

---

## 6. Multimodal Fusion

We adopt **late fusion at the logit level** for reproducibility.

### Fusion Strategy

* Average fusion
* Concatenation + MLP
* **Uncertainty-aware late fusion** (main method)

In uncertainty-aware fusion, each modality's confidence is estimated per sample from the entropy or the maximum probability of its predicted distribution, and the fusion weights are adjusted dynamically to down-weight the less reliable modality.

---

## 7. Low-Resource Evaluation Protocol

* Label budgets: 1%, 5%, 10%, 25%, 50%, 100%
* Fixed test set
* Only the training data is subsampled
* 3–5 random seeds per setting
* Results reported as mean ± std across seeds

This protocol is designed to reflect realistic variance under limited supervision.

---

## 8. Ethics and Licensing

### Text Data

* Collected from publicly available social media
* All user-identifying information removed
* Research-only use
* Takedown requests supported

### Speech Data

* Based on VietSpeech
* Speakers provided research consent
* We release labels and derived artifacts only

Users must comply with the original dataset licenses.

---

## 9. Access Policy

This repository is released via Hugging Face with **access control enabled**.

* Users must request access
* Access is granted manually for research purposes
* Redistribution without permission is not allowed

---

## 10. Citation

If you use SentiV, please cite our paper:

```bibtex
@misc{pham_duc_dat_2026,
  author    = { Pham Duc Dat and Ngoc Tram Huynh Thi and Vo Ngoc Minh Anh and Nhan Le Thanh Pham and Le Anh Tien and Tan Duy Le and Kha Tu Huynh },
  title     = { sentiv (Revision 6b39b15) },
  year      = 2026,
  url       = { https://huggingface.co/ducdatit2002/sentiv },
  doi       = { 10.57967/hf/7805 },
  publisher = { Hugging Face }
}
```
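Section 4 (Text Modality) lists Focal Loss with class reweighting as the text training objective. The following is a minimal, framework-free sketch of the standard class-weighted focal loss for a single example, assuming the usual form FL = -α_t (1 - p_t)^γ log p_t, where α_t is the weight of the true class. The `class_weights` values and the default `gamma=2.0` are illustrative assumptions, not settings read from `train_PhoBERT.py`.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over one logit vector."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(
    logits: Sequence[float],
    target: int,
    class_weights: Sequence[float],
    gamma: float = 2.0,  # assumed focusing parameter, not taken from the repo
) -> float:
    """Class-weighted focal loss for a single example:
    FL = -alpha_t * (1 - p_t) ** gamma * log(p_t).
    """
    p_t = softmax(logits)[target]      # probability of the true class
    alpha_t = class_weights[target]    # per-class reweighting factor (assumed given)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With `gamma=0` and all weights equal to 1 this reduces to ordinary cross-entropy, which is a quick sanity check; increasing `gamma` shrinks the loss on examples the model already classifies confidently, leaving more of the gradient to rare or hard classes.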
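Section 5 (Speech Modality) folds Disgust into Anger and trains with weighted cross-entropy plus a `WeightedRandomSampler`. The sketch below shows one plausible way to produce the 6-way labels and derive both sets of weights, assuming inverse-frequency weighting of the form n / (k · count). The helper names are hypothetical, and `train_hubert.py` may compute its weights differently.

```python
from collections import Counter

# 6-way speech label space: Disgust is folded into Anger.
SPEECH_LABELS = ["Anger", "Enjoyment", "Fear", "Neutral", "Sadness", "Surprise"]
LABEL_MAP = {"Disgust": "Anger"}  # every other label maps to itself


def to_speech_label(raw: str) -> str:
    """Map a raw emotion label onto the 6-way speech label space (hypothetical helper)."""
    label = LABEL_MAP.get(raw, raw)
    if label not in SPEECH_LABELS:
        raise ValueError(f"unknown emotion label: {raw!r}")
    return label


def inverse_frequency_weights(labels: list[str]) -> tuple[dict[str, float], list[float]]:
    """Assumed weighting scheme: class weight = n / (k * count).

    Returns per-class weights (for weighted cross-entropy) and per-sample
    weights (for a weighted random sampler). The per-sample weights average
    to 1, and each class receives the same total sampling mass n / k.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    class_w = {c: n / (k * cnt) for c, cnt in counts.items()}
    sample_w = [class_w[y] for y in labels]
    return class_w, sample_w
```

The per-sample list `sample_w` is the kind of input PyTorch's `torch.utils.data.WeightedRandomSampler(weights, num_samples, replacement=True)` expects, and `class_w`, ordered by `SPEECH_LABELS`, can be converted into the `weight` tensor of `torch.nn.CrossEntropyLoss`.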
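Section 6 (Multimodal Fusion) describes uncertainty-aware late fusion at the logit level, with confidence taken from entropy or maximum probability. The following is a minimal sketch under stated assumptions, not the paper's exact formulation. It assumes that text logits follow the Task A label order and are aligned to the 6-label space by merging Disgust into Anger with log-sum-exp, and that confidence is 1 minus the normalized entropy (or the max probability). The fused logits are then a confidence-weighted sum. All function names and the normalization are hypothetical.

```python
import math
from typing import Sequence


def softmax(z: Sequence[float]) -> list[float]:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def align_text_logits(z7: Sequence[float]) -> list[float]:
    """Assumed order: Anger, Disgust, Enjoyment, Fear, Neutral, Sadness, Surprise.
    Merge Disgust into Anger with log-sum-exp so the merged class gets
    p(Anger) + p(Disgust) after softmax."""
    a, d = z7[0], z7[1]
    m = max(a, d)
    merged = m + math.log(math.exp(a - m) + math.exp(d - m))
    return [merged, *z7[2:]]


def confidence(z: Sequence[float], mode: str = "entropy") -> float:
    """Confidence in [0, 1]: 1 - normalized entropy, or the max probability."""
    p = softmax(z)
    if mode == "maxprob":
        return max(p)
    h = -sum(q * math.log(q) for q in p if q > 0.0)
    return 1.0 - h / math.log(len(p))


def uncertainty_fusion(
    z_text: Sequence[float],
    z_speech: Sequence[float],
    mode: str = "entropy",
    eps: float = 1e-8,
) -> list[float]:
    """Confidence-weighted sum of two 6-way logit vectors (hypothetical scheme)."""
    c_t, c_s = confidence(z_text, mode), confidence(z_speech, mode)
    w_t = (c_t + eps) / (c_t + c_s + 2 * eps)  # weights sum to 1
    w_s = 1.0 - w_t
    return [w_t * a + w_s * b for a, b in zip(z_text, z_speech)]
```

For example, `uncertainty_fusion(align_text_logits(text_logits), speech_logits)` returns 6-way fused logits. Because the softmax of the merged logit equals p(Anger) + p(Disgust), the alignment step leaves the text model's probabilities for the other five classes unchanged.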
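Section 7 (Low-Resource Evaluation Protocol) subsamples only the training split at each label budget and reports mean ± std over seeds. The sketch below assumes uniform (non-stratified) sampling without replacement and the sample standard deviation. `subsample_train` and `summarize` are hypothetical helpers, and the per-seed training and evaluation step is left to the repository scripts.

```python
import random
import statistics
from typing import Sequence

BUDGETS = [0.01, 0.05, 0.10, 0.25, 0.50, 1.00]  # label budgets listed in Section 7


def subsample_train(train_ids: Sequence[int], budget: float, seed: int) -> list[int]:
    """Draw a label budget from the training split only; dev/test stay fixed.

    Uses a seeded, uniform sample without replacement (an assumption) and
    keeps at least one example so very small budgets are never empty.
    """
    k = max(1, round(budget * len(train_ids)))
    return sorted(random.Random(seed).sample(list(train_ids), k))


def summarize(scores: Sequence[float]) -> str:
    """Format per-seed Macro-F1 scores as mean ± sample standard deviation."""
    mean = statistics.mean(scores)
    std = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return f"{mean:.4f} ± {std:.4f}"
```

A full sweep would call `subsample_train(train_ids, budget, seed)` for every budget in `BUDGETS` and each of the 3–5 seeds, train on the returned subset, score Macro-F1 on the fixed test set, and pass the per-seed scores to `summarize`. Under heavy label imbalance, a stratified draw is a reasonable alternative so that rare classes still appear at the 1% budget.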