# SkillScout Reranker - Job-Skill Cross-Encoder
SkillScout Reranker is a cross-encoder that re-ranks candidate skills for a given job title, predicting graded relevance (0=irrelevant, 1=contextual, 2=core).
This is Stage 2 of the TalentGuide two-stage job-skill matching pipeline, trained for TalentCLEF 2026 Task B.
Best pipeline result (TalentCLEF 2026 validation set, server-side): nDCG@10 graded = 0.6896, nDCG@10 binary = 0.7330, using SkillScout Large (bi-encoder) + SkillScout Reranker at blend alpha = 0.7.
## Model Summary
| Property | Value |
|---|---|
| Base model | cross-encoder/ms-marco-MiniLM-L-12-v2 |
| Architecture | BERT (MiniLM-L12) + 3-class classification head |
| Hidden size | 384 |
| Output classes | 0 = non-relevant, 1 = contextual, 2 = core |
| Training triples | ~130k (job_title, skill, label) |
| Hard negatives | 5 per job, mined from bi-encoder top-K |
| Epochs | 3 |
| Hardware | NVIDIA RTX 3070 8GB, fp16 AMP |
## Usage

### Installation

```bash
pip install transformers torch
```
### Score a single (job, skill) pair
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("talentguide/skillscout-reranker")
model = AutoModelForSequenceClassification.from_pretrained("talentguide/skillscout-reranker")
model.eval()

job = "Data Scientist"
skill = "data science"

enc = tokenizer(job, skill, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    logits = model(**enc).logits              # shape [1, 3]

probs = logits.softmax(-1)[0].tolist()        # [P(irrelevant), P(contextual), P(core)]
relevance = logits.argmax(-1).item()          # 0, 1, or 2

print(f"Relevance class: {relevance} (0=none, 1=contextual, 2=core)")
print(f"Probs: none={probs[0]:.3f} contextual={probs[1]:.3f} core={probs[2]:.3f}")
# Relevance class: 2 (0=none, 1=contextual, 2=core)
# Probs: none=0.031 contextual=0.142 core=0.827
```
### Re-rank a candidate list
```python
# candidates: list of skill texts from bi-encoder (e.g. top-200)
pairs = [(job, skill) for skill in candidates]
encs = tokenizer(pairs, return_tensors="pt", truncation=True,
                 padding=True, max_length=128)
with torch.no_grad():
    logits = model(**encs).logits  # [N, 3]

# Use the class-2 (core) logit as the ranking score
scores = logits[:, 2].tolist()
ranked = sorted(zip(candidates, scores), key=lambda x: -x[1])
for rank, (skill, score) in enumerate(ranked[:10], 1):
    print(f"{rank:3d}. [{score:.3f}] {skill}")
```
### Blend with bi-encoder (recommended, alpha=0.7)
```python
# bi_scores: cosine scores from SkillScout Large (normalised to [0, 1])
# ce_scores: class-2 logits from this model (normalised to [0, 1])
alpha = 0.7
final_scores = [alpha * b + (1 - alpha) * c for b, c in zip(bi_scores, ce_scores)]
```
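Since cosine similarities and raw logits live on different scales, both score lists need normalising before blending. A minimal self-contained sketch, assuming per-query min-max normalisation (the `minmax` and `blend_scores` helpers are illustrative, not part of the model's API):

```python
def minmax(xs):
    """Min-max normalise a list of scores to [0, 1]."""
    lo, hi = min(xs), max(xs)
    if hi == lo:  # degenerate case: all scores equal
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]

def blend_scores(bi_scores, ce_scores, alpha=0.7):
    """Convex blend of normalised bi-encoder and cross-encoder scores."""
    bi_n, ce_n = minmax(bi_scores), minmax(ce_scores)
    return [alpha * b + (1 - alpha) * c for b, c in zip(bi_n, ce_n)]

# Toy example: three candidates scored by both stages.
bi = [0.92, 0.40, 0.75]   # cosine similarities
ce = [3.1, -1.2, 0.5]     # class-2 logits
print(blend_scores(bi, ce))
```

Normalising per query keeps alpha meaningful across queries regardless of the logit range the cross-encoder produces.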
## Two-Stage Pipeline Integration

```
Job title
    |
    v
[SkillScout Large]      <- talentguide/skillscout-large
    |  top-200 candidates via FAISS ANN
    v
[SkillScout Reranker]   <- this model
    |  3-class graded scoring (core=2, contextual=1, irrelevant=0)
    v
Final ranked list
```
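The flow above can be sketched end-to-end with stub scorers standing in for the two models (`retrieve_topk` and `rerank` are hypothetical placeholders, not part of either model's API):

```python
def retrieve_topk(job, corpus, bi_score, k=3):
    """Stage 1: keep the k skills with the highest bi-encoder score."""
    return sorted(corpus, key=lambda s: -bi_score(job, s))[:k]

def rerank(job, candidates, ce_score):
    """Stage 2: re-order the surviving candidates by cross-encoder score."""
    return sorted(candidates, key=lambda s: -ce_score(job, s))

# Toy scorers standing in for SkillScout Large / SkillScout Reranker.
bi = {"python": 0.9, "sql": 0.8, "welding": 0.1, "statistics": 0.7}
ce = {"python": 0.6, "sql": 0.4, "statistics": 0.95}

job = "Data Scientist"
candidates = retrieve_topk(job, list(bi), lambda j, s: bi[s])
final = rerank(job, candidates, lambda j, s: ce[s])
print(final)  # the cross-encoder promotes 'statistics' above 'sql'
```

The key property: stage 2 never sees skills stage 1 discarded, which is why retrieval recall (top-200) bounds the pipeline's ceiling.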
## Training Details

### Data
| Data | Count |
|---|---|
| Positive triples (essential, label=2) | ~57,500 |
| Positive triples (optional, label=1) | ~28,600 |
| Hard negatives (label=0, from bi-encoder top-K) | ~15,200 |
| Random negatives (label=0) | ~30,000 |
| Total training triples | ~130,000 |
| Validation queries | 304 |
Hard negatives are mined by running the fine-tuned bi-encoder (SkillScout Large) over all training jobs, collecting the top-K retrieved skills that are NOT in the positive set. This teaches the cross-encoder to distinguish near-miss retrievals from true positives.
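The mining step reduces to a rank-ordered set difference per job; a minimal sketch, assuming each job has a bi-encoder-retrieved list and a gold positive set, with the 5-per-job cap from the summary table (`mine_hard_negatives` is an illustrative helper, not the actual training code):

```python
def mine_hard_negatives(retrieved, positives, per_job=5):
    """Pick the top-ranked retrieved skills that are not true positives.

    retrieved: skills in bi-encoder rank order (best first)
    positives: set of gold skills for this job
    """
    negatives = []
    for skill in retrieved:
        if skill not in positives:
            negatives.append(skill)
        if len(negatives) == per_job:
            break
    return negatives

retrieved = ["python", "sql", "excel", "statistics", "r", "java", "scala"]
positives = {"python", "statistics"}
print(mine_hard_negatives(retrieved, positives))
# ['sql', 'excel', 'r', 'java', 'scala']
```

Taking negatives from the *top* of the retrieved list is what makes them hard: they are exactly the near-misses the bi-encoder already confuses with positives.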
### Hyperparameters
```
Base model    : cross-encoder/ms-marco-MiniLM-L-12-v2
Task          : 3-class sequence classification (BERT + linear head)
Loss          : CrossEntropyLoss
Batch size    : 32
Epochs        : 3
Learning rate : 2e-5, linear warmup 10%
Optimizer     : AdamW
Precision     : fp16 AMP
Max seq len   : 128 tokens
Input format  : [CLS] job_title [SEP] skill_name [SEP]
```
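"Linear warmup 10%" means the learning rate ramps from 0 to the peak over the first 10% of steps, then decays linearly back to 0. A small sketch of that schedule (the `lr_at` helper is illustrative, not taken from the training code):

```python
def lr_at(step, total_steps, peak_lr=2e-5, warmup_frac=0.1):
    """Linear warmup to peak_lr, then linear decay to 0."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

total = 1000
print(lr_at(50, total))    # halfway through warmup -> half of peak
print(lr_at(100, total))   # end of warmup -> peak (2e-5)
print(lr_at(1000, total))  # final step -> 0
```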
## Pipeline Results (graded relevance, full 9,052-skill ranking)
| Run | nDCG@10 graded | nDCG@10 binary | MAP |
|---|---|---|---|
| Bi-encoder only (SkillScout Large) | 0.3621 | 0.4830 | 0.4545 |
| + CE bad negatives (v1) | 0.3226 | 0.4025 | 0.4195 |
| + CE fixed negatives (v2) | 0.3315 | 0.4075 | 0.4228 |
| + CE blend alpha=0.7 (local, top-100) | 0.3816 | 0.4973 | 0.4632 |
| + CE blend alpha=0.7 (server, full ranking) | 0.6896 | 0.7330 | 0.2481 |
Local metrics use a top-100 retrieval cutoff; server metrics use the full 9,052-skill ranking.
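For reference, graded nDCG@10 scores a ranking by the 0/1/2 labels of its top 10 items against the ideal ordering. A minimal sketch using the common exponential-gain DCG formulation, gain = 2^rel - 1 (one standard convention; the official TalentCLEF scorer may define gain differently):

```python
import math

def dcg(rels):
    """Discounted cumulative gain with gain = 2^rel - 1."""
    return sum((2**r - 1) / math.log2(i + 2) for i, r in enumerate(rels))

def ndcg_at_k(ranked_rels, all_rels, k=10):
    """nDCG@k: DCG of the ranking divided by DCG of the ideal ordering."""
    ideal = sorted(all_rels, reverse=True)[:k]
    denom = dcg(ideal)
    return dcg(ranked_rels[:k]) / denom if denom > 0 else 0.0

# Toy query: four ranked skills with graded labels (2=core, 1=contextual, 0=none).
print(ndcg_at_k([2, 0, 1, 2], [2, 2, 1, 0], k=10))
```

Binary nDCG is the same computation with labels 1 and 2 collapsed to 1.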
## Limitations

- **Must be paired with a retriever**: scores (job, skill) pairs, not full-corpus rankings. Use with SkillScout Large for efficient retrieval.
- **English only**: trained on ESCO EN labels.
- **ESCO-domain optimised**: transfer to other taxonomies may require fine-tuning.
- **Speed**: re-ranking top-200 candidates takes ~1-2 s per query on GPU; not suitable for full-corpus scoring at inference time.
## Citation

```bibtex
@misc{talentguide-skillscout-reranker-2026,
  title  = {SkillScout Reranker: Graded Job-Skill Cross-Encoder for TalentCLEF 2026},
  author = {TalentGuide},
  year   = {2026},
  url    = {https://huggingface.co/talentguide/skillscout-reranker}
}

@misc{talentclef2026taskb,
  title  = {TalentCLEF 2026 Task B: Job-Skill Matching},
  author = {TalentCLEF Organizers},
  year   = {2026},
  url    = {https://talentclef.github.io/}
}
```
## Framework Versions
- Python 3.12.10 | Transformers 5.5.0 | PyTorch 2.11.0+cu128
## Model Tree

Base model: microsoft/MiniLM-L12-H384-uncased

## Evaluation Results (self-reported)

- nDCG@10 graded (pipeline, server) on TalentCLEF 2026 Task B validation: 0.690
- nDCG@10 binary (pipeline, server) on TalentCLEF 2026 Task B validation: 0.733