# BAREC Strict Track Sentence-Level Readability Model

## Overview
This model performs fine-grained Arabic readability assessment at the sentence level and was developed for the BAREC Shared Task 2025 (Strict Track). It is based on AraBERTv2 and fine-tuned on the BAREC corpus with a 19-level readability classification scheme. The model takes D3Tok input variants and is trained with a combination of Cross-Entropy (CE) loss and a Quadratic Weighted Kappa-based loss (WKL).
## Intended Uses & Limitations
- Intended use: predicting the readability of Arabic sentences on a 1–19 scale; since the model is trained at the sentence level, document-level use requires aggregating sentence predictions
- Domain: Modern Standard Arabic, educational content
## Model Details
- Base model: CAMeL-Lab/readability-arabertv2-d3tok-CE
- Input variant: D3Tok (token-level)
- Labels: 19 readability levels (1 = easiest, 19 = hardest)
- Losses: CE first, then WKL fine-tuning (best configuration)
- Track: Strict (sentence level)
- Validation QWK: 82.0%
## Training Data
- Corpus: BAREC Corpus v1.0
- Split: Train (80%), Dev (10%), Test (10%)
- Preprocessing: Input variant generated using the official scripts (D3Tok)
- Cleaning: No additional cleaning, only official preprocessing
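For reference, an 80/10/10 split like the one above can be reproduced on custom data with a seeded shuffle. This is an illustrative sketch only (`split_80_10_10` is a hypothetical helper); for the shared task itself, use BAREC's official splits:

```python
import random

def split_80_10_10(items, seed=42):
    """Shuffle and split a dataset into 80% train / 10% dev / 10% test."""
    rng = random.Random(seed)
    items = list(items)
    rng.shuffle(items)
    n = len(items)
    n_train = int(0.8 * n)
    n_dev = int(0.1 * n)
    return (items[:n_train],
            items[n_train:n_train + n_dev],
            items[n_train + n_dev:])
```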
## Training Procedure
- Loss functions: Cross-Entropy, then Quadratic Weighted Kappa (WKL)
- Hyperparameters:
  - Learning rate: 2e-5
  - Batch size: 32
  - Epochs: 8
  - Scheduler: cosine_with_restarts
  - Weight decay: 0.05
  - fp16: enabled
- Metrics: QWK (Quadratic Weighted Kappa), macro F1, accuracy
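The exact WKL implementation used in training is not specified here. As a rough sketch, a differentiable loss based on Quadratic Weighted Kappa can be built from the soft confusion between predicted probabilities and gold labels (a common "soft kappa" formulation; `weighted_kappa_loss` and its details are assumptions, not the model's actual code):

```python
import torch
import torch.nn.functional as F

def weighted_kappa_loss(logits, labels, num_classes=19):
    """Differentiable QWK-style loss: ratio of observed to expected
    quadratic disagreement (lower is better, ~0 at perfect agreement)."""
    probs = F.softmax(logits, dim=1)                       # (B, C)
    classes = torch.arange(num_classes, dtype=torch.float32,
                           device=logits.device)
    # Quadratic penalty matrix between every pair of classes
    weights = (classes.view(-1, 1) - classes.view(1, -1)) ** 2
    weights = weights / (num_classes - 1) ** 2             # (C, C)
    # Observed disagreement: penalty between gold label and predicted dist.
    observed = (weights[labels] * probs).sum()
    # Expected disagreement if labels and predictions were independent
    label_hist = F.one_hot(labels, num_classes).float().mean(dim=0)
    pred_hist = probs.mean(dim=0)
    expected = (weights * torch.outer(label_hist, pred_hist)).sum()
    return observed / (labels.numel() * expected + 1e-8)
```

In a CE-then-WKL schedule, the model is first trained with plain cross-entropy and then fine-tuned with this ordinal-aware objective so that large level errors are penalized more than off-by-one errors.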
## Evaluation Results
| Split | QWK |
|---|---|
| Validation | 82.0% |
| Test (Public) | 84.2% |
| Blind Test* | 84.1% |
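The QWK scores above can be reproduced for your own predictions with scikit-learn's weighted kappa (assuming predictions and gold labels are integer levels on the same 1–19 scale):

```python
from sklearn.metrics import cohen_kappa_score

def qwk(y_true, y_pred):
    """Quadratic Weighted Kappa, the primary BAREC evaluation metric."""
    return cohen_kappa_score(y_true, y_pred, weights="quadratic")
```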
## Usage Example

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the fine-tuned model and tokenizer
model_name = "shymaa25/barec-readability-sent-arabertv2-d3tok-ce-wkl-strict"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Predict readability for a single sentence.
# NOTE: the model was trained on the D3Tok input variant; for best results,
# preprocess inputs with the official BAREC scripts first.
sentence = "هذه الجملة تتطلب مستوى قراءة متقدم."  # "This sentence requires an advanced reading level."
inputs = tokenizer(sentence, return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():
    outputs = model(**inputs)
pred = torch.argmax(outputs.logits, dim=1).item() + 1  # class ids 0-18 map to levels 1-19
print(f"Sentence readability level: {pred}")
```