SkillScout Reranker - Job-Skill Cross-Encoder

SkillScout Reranker is a cross-encoder that re-ranks candidate skills for a given job title, predicting graded relevance (0=irrelevant, 1=contextual, 2=core).

This is Stage 2 of the TalentGuide two-stage job-skill matching pipeline, trained for TalentCLEF 2026 Task B.

Best pipeline result (TalentCLEF 2026 validation set, server-side): nDCG@10 graded = 0.6896, nDCG@10 binary = 0.7330, achieved by SkillScout Large (bi-encoder) + SkillScout Reranker at blend alpha = 0.7.


Model Summary

Property          Value
Base model        cross-encoder/ms-marco-MiniLM-L-12-v2
Architecture      BERT (MiniLM-L12) + 3-class classification head
Hidden size       384
Output classes    0 = non-relevant, 1 = contextual, 2 = core
Training triples  ~130k (job_title, skill, label)
Hard negatives    5 per job, mined from bi-encoder top-K
Epochs            3
Hardware          NVIDIA RTX 3070 8GB, fp16 AMP

Usage

Installation

pip install transformers torch

Score a single (job, skill) pair

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("talentguide/skillscout-reranker")
model     = AutoModelForSequenceClassification.from_pretrained("talentguide/skillscout-reranker")
model.eval()

job   = "Data Scientist"
skill = "data science"

enc = tokenizer(job, skill, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    logits = model(**enc).logits          # shape [1, 3]
probs = logits.softmax(-1)[0].tolist()   # [P(irrelevant), P(contextual), P(core)]
relevance = logits.argmax(-1).item()     # 0, 1, or 2

print(f"Relevance class: {relevance}  (0=none, 1=contextual, 2=core)")
print(f"Probs: none={probs[0]:.3f}  contextual={probs[1]:.3f}  core={probs[2]:.3f}")
# Relevance class: 2  (0=none, 1=contextual, 2=core)
# Probs: none=0.031  contextual=0.142  core=0.827

Re-rank a candidate list

# candidates: list of skill texts from bi-encoder (e.g. top-200)
pairs = [(job, skill) for skill in candidates]
encs  = tokenizer(pairs, return_tensors="pt", truncation=True,
                  padding=True, max_length=128)

with torch.no_grad():
    logits = model(**encs).logits   # [N, 3]

# Use the class-2 ("core") logit as the ranking score
scores = logits[:, 2].tolist()
ranked = sorted(zip(candidates, scores), key=lambda x: -x[1])

for rank, (skill, score) in enumerate(ranked[:10], 1):
    print(f"{rank:3d}. [{score:.3f}]  {skill}")

Blend with bi-encoder (recommended, alpha=0.7)

# bi_scores: cosine scores from SkillScout Large (normalised to [0, 1])
# ce_scores: class-2 logits from this model (normalised to [0, 1])
alpha = 0.7
final_scores = [alpha * b + (1 - alpha) * c for b, c in zip(bi_scores, ce_scores)]
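The card states both score lists are normalised to [0, 1] but does not say how; a minimal sketch using min-max normalisation (an assumption, not necessarily the pipeline's exact method) is:

```python
def minmax(xs):
    """Min-max normalise a list of scores to [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]

def blend(bi_scores, ce_scores, alpha=0.7):
    """Blend normalised bi-encoder and cross-encoder scores per candidate."""
    b, c = minmax(bi_scores), minmax(ce_scores)
    return [alpha * x + (1 - alpha) * y for x, y in zip(b, c)]
```

Candidates are then re-sorted by the blended score, exactly as in the re-ranking snippet above.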

Two-Stage Pipeline Integration

Job title
   |
   v
[SkillScout Large]          <- talentguide/skillscout-large
   |  top-200 candidates via FAISS ANN
   v
[SkillScout Reranker]       <- this model
   |  3-class graded scoring (core=2, contextual=1, irrelevant=0)
   v
Final ranked list
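The flow above can be sketched with stubbed scoring functions (the function names here are illustrative; real code would call SkillScout Large for retrieval and this reranker for scoring):

```python
def retrieve_top_k(job, corpus, bi_score_fn, k=200):
    """Stage 1: keep the k skills with the highest bi-encoder score."""
    return sorted(corpus, key=lambda s: -bi_score_fn(job, s))[:k]

def rerank(job, candidates, ce_score_fn):
    """Stage 2: re-order the candidates by cross-encoder score."""
    return sorted(candidates, key=lambda s: -ce_score_fn(job, s))
```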

Training Details

Data

Data                                              Count
Positive triples (essential, label=2)             ~57,500
Positive triples (optional, label=1)              ~28,600
Hard negatives (label=0, from bi-encoder top-K)   ~15,200
Random negatives (label=0)                        ~30,000
Total training triples                            ~130,000
Validation queries                                304

Hard negatives are mined by running the fine-tuned bi-encoder (SkillScout Large) over all training jobs, collecting the top-K retrieved skills that are NOT in the positive set. This teaches the cross-encoder to distinguish near-miss retrievals from true positives.
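The mining step described above can be sketched as follows, assuming precomputed job and skill embeddings (the function and argument names are hypothetical, not the training repository's API):

```python
import numpy as np

def mine_hard_negatives(job_vec, skill_vecs, skill_names, positives, k=5):
    """Return the top-k retrieved skills that are NOT in the positive set."""
    # cosine similarity between the job embedding and every skill embedding
    sims = skill_vecs @ job_vec / (
        np.linalg.norm(skill_vecs, axis=1) * np.linalg.norm(job_vec) + 1e-9)
    negatives = []
    for idx in np.argsort(-sims):          # descending similarity
        if skill_names[idx] not in positives:
            negatives.append(skill_names[idx])
        if len(negatives) == k:
            break
    return negatives
```

These near-miss retrievals are then labelled 0 and added to the training triples.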

Hyperparameters

Base model     : cross-encoder/ms-marco-MiniLM-L-12-v2
Task           : 3-class sequence classification (BERT + linear head)
Loss           : CrossEntropyLoss
Batch size     : 32
Epochs         : 3
Learning rate  : 2e-5, linear warmup 10%
Optimizer      : AdamW
Precision      : fp16 AMP
Max seq len    : 128 tokens
Input format   : [CLS] job_title [SEP] skill_name [SEP]

Pipeline Results (graded relevance, full 9052-skill ranking)

Run                                           nDCG@10 graded   nDCG@10 binary   MAP
Bi-encoder only (SkillScout Large)            0.3621           0.4830           0.4545
+ CE bad negatives (v1)                       0.3226           0.4025           0.4195
+ CE fixed negatives (v2)                     0.3315           0.4075           0.4228
+ CE blend alpha=0.7 (local, top-100)         0.3816           0.4973           0.4632
+ CE blend alpha=0.7 (server, full ranking)   0.6896           0.7330           0.2481

Local metrics use top-100 retrieval cutoff; server metrics use full 9,052-skill ranking.
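For reference, graded nDCG@10 as reported above can be computed from per-rank gains (0/1/2 relevance labels) with the standard log2 discount; this is a generic sketch, not necessarily the evaluation server's exact implementation:

```python
import math

def dcg_at_k(gains, k=10):
    """Discounted cumulative gain over the first k ranks (linear gains)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg_at_k(gains, k=10):
    """DCG normalised by the ideal (descending-sorted) ordering."""
    idcg = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0
```

Binary nDCG@10 is the same computation with labels 1 and 2 both collapsed to gain 1.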


Limitations

  • Must be paired with a retriever: the model scores (job, skill) pairs and cannot rank a full corpus on its own. Use with SkillScout Large for efficient retrieval.
  • English only: trained on ESCO EN labels.
  • ESCO-domain optimised: transfer to other taxonomies may require fine-tuning.
  • Speed: re-ranking the top-200 candidates takes ~1-2 s per query on GPU. Not suitable for full-corpus scoring at inference time.

Citation

@misc{talentguide-skillscout-reranker-2026,
  title  = {SkillScout Reranker: Graded Job-Skill Cross-Encoder for TalentCLEF 2026},
  author = {TalentGuide},
  year   = {2026},
  url    = {https://huggingface.co/talentguide/skillscout-reranker}
}

@misc{talentclef2026taskb,
  title  = {TalentCLEF 2026 Task B: Job-Skill Matching},
  author = {TalentCLEF Organizers},
  year   = {2026},
  url    = {https://talentclef.github.io/}
}

Framework Versions

  • Python 3.12.10 | Transformers 5.5.0 | PyTorch 2.11.0+cu128