SkillScout Reranker - Job-Skill Cross-Encoder

SkillScout Reranker is a cross-encoder that re-ranks candidate skills for a given job title, predicting graded relevance (0=irrelevant, 1=contextual, 2=core).

This is Stage 2 of the TalentGuide two-stage job-skill matching pipeline, trained for TalentCLEF 2026 Task B.

Best pipeline result (TalentCLEF 2026 validation set, server-side): nDCG@10 graded = 0.6896, nDCG@10 binary = 0.7330, achieved by SkillScout Large (bi-encoder) + SkillScout Reranker at blend alpha = 0.7.


Model Summary

Property          Value
Base model        cross-encoder/ms-marco-MiniLM-L-12-v2
Architecture      BERT (MiniLM-L12) + 3-class classification head
Hidden size       384
Output classes    0 = non-relevant, 1 = contextual, 2 = core
Training triples  ~130k (job_title, skill, label)
Hard negatives    5 per job, mined from bi-encoder top-K
Epochs            3
Hardware          NVIDIA RTX 3070 8GB, fp16 AMP

Usage

Installation

pip install transformers torch

Score a single (job, skill) pair

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("talentguide/skillscout-reranker")
model     = AutoModelForSequenceClassification.from_pretrained("talentguide/skillscout-reranker")
model.eval()

job   = "Data Scientist"
skill = "data science"

enc = tokenizer(job, skill, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    logits = model(**enc).logits          # shape [1, 3]
probs = logits.softmax(-1)[0].tolist()   # [P(irrelevant), P(contextual), P(core)]
relevance = logits.argmax(-1).item()     # 0, 1, or 2

print(f"Relevance class: {relevance}  (0=none, 1=contextual, 2=core)")
print(f"Probs: none={probs[0]:.3f}  contextual={probs[1]:.3f}  core={probs[2]:.3f}")
# Relevance class: 2  (0=none, 1=contextual, 2=core)
# Probs: none=0.031  contextual=0.142  core=0.827

Re-rank a candidate list

# candidates: list of skill texts from bi-encoder (e.g. top-200)
pairs = [(job, skill) for skill in candidates]
encs  = tokenizer(pairs, return_tensors="pt", truncation=True,
                  padding=True, max_length=128)

with torch.no_grad():
    logits = model(**encs).logits   # [N, 3]

# Use the class-2 ("core") logit as the ranking score
scores = logits[:, 2].tolist()
ranked = sorted(zip(candidates, scores), key=lambda x: -x[1])

for rank, (skill, score) in enumerate(ranked[:10], 1):
    print(f"{rank:3d}. [{score:.3f}]  {skill}")

Blend with bi-encoder (recommended, alpha=0.7)

# bi_scores: cosine scores from SkillScout Large (normalised to [0, 1])
# ce_scores: class-2 logits from this model (normalised to [0, 1])
alpha = 0.7
final_scores = [alpha * b + (1 - alpha) * c for b, c in zip(bi_scores, ce_scores)]
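The card states both score lists are normalised to [0, 1] but does not say how; a minimal sketch using min-max normalisation (an assumption, not necessarily the pipeline's exact method) is:

```python
def minmax(xs):
    """Min-max normalise a list of scores to [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]

def blend(bi_scores, ce_scores, alpha=0.7):
    """Blend normalised bi-encoder and cross-encoder scores per candidate."""
    b, c = minmax(bi_scores), minmax(ce_scores)
    return [alpha * x + (1 - alpha) * y for x, y in zip(b, c)]
```

Candidates are then re-sorted by the blended score, exactly as in the re-ranking snippet above.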

Two-Stage Pipeline Integration

Job title
   |
   v
[SkillScout Large]          <- talentguide/skillscout-large
   |  top-200 candidates via FAISS ANN
   v
[SkillScout Reranker]       <- this model
   |  3-class graded scoring (core=2, contextual=1, irrelevant=0)
   v
Final ranked list
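The flow above can be sketched with stubbed scoring functions (the function names here are illustrative; real code would call SkillScout Large for retrieval and this reranker for scoring):

```python
def retrieve_top_k(job, corpus, bi_score_fn, k=200):
    """Stage 1: keep the k skills with the highest bi-encoder score."""
    return sorted(corpus, key=lambda s: -bi_score_fn(job, s))[:k]

def rerank(job, candidates, ce_score_fn):
    """Stage 2: re-order the candidates by cross-encoder score."""
    return sorted(candidates, key=lambda s: -ce_score_fn(job, s))
```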

Training Details

Data

Data                                              Count
Positive triples (essential, label=2)             ~57,500
Positive triples (optional, label=1)              ~28,600
Hard negatives (label=0, from bi-encoder top-K)   ~15,200
Random negatives (label=0)                        ~30,000
Total training triples                            ~130,000
Validation queries                                304

Hard negatives are mined by running the fine-tuned bi-encoder (SkillScout Large) over all training jobs, collecting the top-K retrieved skills that are NOT in the positive set. This teaches the cross-encoder to distinguish near-miss retrievals from true positives.
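The mining step described above can be sketched as follows, assuming precomputed job and skill embeddings (the function and argument names are hypothetical, not the training repository's API):

```python
import numpy as np

def mine_hard_negatives(job_vec, skill_vecs, skill_names, positives, k=5):
    """Return the top-k retrieved skills that are NOT in the positive set."""
    # cosine similarity between the job embedding and every skill embedding
    sims = skill_vecs @ job_vec / (
        np.linalg.norm(skill_vecs, axis=1) * np.linalg.norm(job_vec) + 1e-9)
    negatives = []
    for idx in np.argsort(-sims):          # descending similarity
        if skill_names[idx] not in positives:
            negatives.append(skill_names[idx])
        if len(negatives) == k:
            break
    return negatives
```

These near-miss retrievals are then labelled 0 and added to the training triples.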

Hyperparameters

Base model     : cross-encoder/ms-marco-MiniLM-L-12-v2
Task           : 3-class sequence classification (BERT + linear head)
Loss           : CrossEntropyLoss
Batch size     : 32
Epochs         : 3
Learning rate  : 2e-5, linear warmup 10%
Optimizer      : AdamW
Precision      : fp16 AMP
Max seq len    : 128 tokens
Input format   : [CLS] job_title [SEP] skill_name [SEP]

Pipeline Results (graded relevance, full 9052-skill ranking)

Run                                           nDCG@10 graded   nDCG@10 binary   MAP
Bi-encoder only (SkillScout Large)            0.3621           0.4830           0.4545
+ CE bad negatives (v1)                       0.3226           0.4025           0.4195
+ CE fixed negatives (v2)                     0.3315           0.4075           0.4228
+ CE blend alpha=0.7 (local, top-100)         0.3816           0.4973           0.4632
+ CE blend alpha=0.7 (server, full ranking)   0.6896           0.7330           0.2481

Local metrics use top-100 retrieval cutoff; server metrics use full 9,052-skill ranking.
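For reference, graded nDCG@10 as reported above can be computed from per-rank gains (0/1/2 relevance labels) with the standard log2 discount; this is a generic sketch, not necessarily the evaluation server's exact implementation:

```python
import math

def dcg_at_k(gains, k=10):
    """Discounted cumulative gain over the first k ranks (linear gains)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg_at_k(gains, k=10):
    """DCG normalised by the ideal (descending-sorted) ordering."""
    idcg = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0
```

Binary nDCG@10 is the same computation with labels 1 and 2 both collapsed to gain 1.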


Limitations

  • Must be paired with a retriever: the model scores (job, skill) pairs and cannot rank a full corpus on its own. Use with SkillScout Large for efficient retrieval.
  • English only: trained on ESCO EN labels.
  • ESCO-domain optimised: transfer to other taxonomies may require fine-tuning.
  • Speed: re-ranking the top-200 candidates takes ~1-2 s per query on GPU. Not suitable for full-corpus scoring at inference time.

Citation

@misc{talentguide-skillscout-reranker-2026,
  title  = {SkillScout Reranker: Graded Job-Skill Cross-Encoder for TalentCLEF 2026},
  author = {TalentGuide},
  year   = {2026},
  url    = {https://huggingface.co/talentguide/skillscout-reranker}
}

@misc{talentclef2026taskb,
  title  = {TalentCLEF 2026 Task B: Job-Skill Matching},
  author = {TalentCLEF Organizers},
  year   = {2026},
  url    = {https://talentclef.github.io/}
}

Framework Versions

  • Python 3.12.10 | Transformers 5.5.0 | PyTorch 2.11.0+cu128