SkillScout Large — Job-to-Skill Dense Retriever
SkillScout Large is a dense bi-encoder that retrieves relevant skills for a given job title.
Given a job title (e.g., "Data Scientist"), the model encodes it into a 1024-dimensional embedding and retrieves the most semantically relevant skills from the ESCO skill gazetteer (9,052 skills) via cosine similarity.
This is Stage 1 of the TalentGuide two-stage job-skill matching pipeline, trained for TalentCLEF 2026 Task B.
Best pipeline result (TalentCLEF 2026 validation set):
nDCG@10 graded = 0.6896 · nDCG@10 binary = 0.7330
when combined with a fine-tuned cross-encoder re-ranker at blend α = 0.7.
Bi-encoder alone: nDCG@10 graded = 0.3621 · MAP = 0.4545
Model Summary
| Property | Value |
|---|---|
| Base model | jjzha/esco-xlm-roberta-large |
| Architecture | XLM-RoBERTa-large + mean pooling |
| Embedding dimension | 1024 |
| Max sequence length | 64 tokens |
| Training loss | Multiple Negatives Ranking (MNR) |
| Training pairs | 93,720 (ESCO job–skill pairs, essential + optional) |
| Epochs | 3 |
| Best checkpoint | Step 3500 (saved by validation nDCG@10) |
| Hardware | NVIDIA RTX 3070 8GB · fp16 AMP |
What is TalentCLEF Task B?
TalentCLEF 2026 Task B is a graded information-retrieval shared task:
- Query: a job title (e.g., "Electrician")
- Corpus: 9,052 ESCO skills (e.g., "install electric switches", "comply with electrical safety regulations")
- Relevance levels:
  - 2 = Core skill (essential regardless of context)
  - 1 = Contextual skill (depends on employer / industry)
  - 0 = Non-relevant
Primary metric: nDCG with graded relevance (core=2, contextual=1)
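The graded metric can be sketched in a few lines of NumPy. This is a minimal illustration, not the official TalentCLEF scorer; it uses the common linear-gain variant, rel / log2(rank + 1):

```python
import numpy as np

def dcg_at_k(rels, k=10):
    """Discounted cumulative gain over the top-k ranked relevance labels."""
    rels = np.asarray(rels, dtype=float)[:k]
    return float(np.sum(rels / np.log2(np.arange(2, rels.size + 2))))

def ndcg_at_k(rels, k=10):
    """DCG normalized by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0

# Ranked relevance labels for one query: 2=core, 1=contextual, 0=non-relevant
ranked = [2, 0, 1, 2, 0]
print(round(ndcg_at_k(ranked, k=10), 4))  # → 0.8935
```

A perfect ranking (all core skills first, then contextual) scores 1.0; the binary variant simply maps both 2 and 1 to relevance 1 before computing the same quantity.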
Usage
Installation
```bash
pip install sentence-transformers faiss-cpu  # or faiss-gpu
```
Encode & Compare
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("talentguide/skillscout-large")

job = "Data Scientist"
skills = ["data science", "machine learning", "install electric switches"]

embs = model.encode([job] + skills, normalize_embeddings=True)
scores = embs[0] @ embs[1:].T

for skill, score in zip(skills, scores):
    print(f"{score:.3f}  {skill}")
# 0.872  data science
# 0.731  machine learning
# 0.112  install electric switches
```
Full Retrieval with FAISS (Recommended)
```python
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np

model = SentenceTransformer("talentguide/skillscout-large")

# --- Build index once over your skill corpus ---
skill_texts = [...]  # list of skill names / descriptions
embs = model.encode(
    skill_texts,
    batch_size=128,
    normalize_embeddings=True,
    show_progress_bar=True,
).astype(np.float32)

index = faiss.IndexFlatIP(embs.shape[1])  # inner product on L2-normed vectors = cosine
index.add(embs)

# --- Query at inference time ---
job_title = "Software Engineer"
q = model.encode([job_title], normalize_embeddings=True).astype(np.float32)
scores, idxs = index.search(q, k=50)

for rank, (idx, score) in enumerate(zip(idxs[0], scores[0]), 1):
    print(f"{rank:3d}. [{score:.4f}] {skill_texts[idx]}")
```
Demo Output
```
Software Engineer
  1. [0.942] define software architecture
  2. [0.938] software frameworks
  3. [0.935] create software design

Data Scientist
  1. [0.951] data science
  2. [0.921] establish data processes
  3. [0.919] create data models

Electrician
  1. [0.944] install electric switches
  2. [0.938] install electricity sockets
  3. [0.930] use electrical wire tools
```
Two-Stage Pipeline Integration
SkillScout Large is designed as Stage 1 — fast ANN retrieval.
For maximum ranking quality, pair it with a cross-encoder re-ranker:
```
Job title
    │
    ▼
[SkillScout Large]            ← this model
    │  top-200 candidates (FAISS ANN, ~40 ms)
    ▼
[Cross-encoder re-ranker]
    │  fine-grained re-scoring of the top-200
    ▼
Final ranked list (graded: core > contextual > irrelevant)
```
Score blending (best result at α = 0.7):
```
final_score = alpha * biencoder_score + (1 - alpha) * crossencoder_score
```
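A minimal sketch of the blending step, assuming both score lists cover the same top-200 candidates and are min-max normalized before mixing. The normalization is an assumption for illustration: the card does not specify how the two score scales are aligned, and `minmax` is a hypothetical helper:

```python
import numpy as np

# Hypothetical scores for the top-200 Stage-1 candidates of one query:
# bi-encoder cosine similarities and cross-encoder re-ranker scores.
rng = np.random.default_rng(0)
biencoder_scores = rng.random(200)
crossencoder_scores = rng.random(200)

def minmax(x):
    # Scale-matching step (an assumption; cosine similarities and
    # cross-encoder logits live on different ranges).
    return (x - x.min()) / (x.max() - x.min() + 1e-9)

alpha = 0.7  # best-performing blend weight reported above
final = alpha * minmax(biencoder_scores) + (1 - alpha) * minmax(crossencoder_scores)
reranked = np.argsort(-final)  # candidate indices, best first
```

With α = 0.7 the bi-encoder still carries most of the weight; the cross-encoder adjusts the ordering rather than replacing it.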
Training Details
Data
Source: ESCO occupational ontology, TalentCLEF 2026 training split.
| Statistic | Count |
|---|---|
| Raw job–skill pairs (essential + optional) | 114,699 |
| ESCO jobs with aliases | 3,039 |
| ESCO skills with aliases | 13,939 |
| Training InputExamples (after canonical-pair inclusion) | 93,720 |
| Validation queries | 304 |
| Validation corpus (skills) | 9,052 |
| Validation relevance judgments | 56,417 |
Essential pairs are included in full; optional skill pairs are downsampled to 50% of the essential count to maintain class balance.
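The downsampling step described above can be sketched as follows; `balance_pairs` is a hypothetical helper name, and the 42 seed matches the training seed reported below:

```python
import random

def balance_pairs(essential, optional, ratio=0.5, seed=42):
    """Keep every essential job-skill pair; randomly sample optional pairs
    down to `ratio` of the essential count (the card's 50% downsampling)."""
    rng = random.Random(seed)
    k = min(len(optional), int(len(essential) * ratio))
    return essential + rng.sample(optional, k)
```

Applied to the raw counts in the table (with ratio 0.5), this is what reduces 114,699 raw pairs to the 93,720 training examples.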
Hyperparameters
```
Loss            : MultipleNegativesRankingLoss (scale=20, cos_sim)
Batch size      : 64 → 63 in-batch negatives per anchor
Epochs          : 3
Warmup          : 10% of total steps (~440 steps)
Optimizer       : AdamW (fused), lr=5e-5, linear decay
Precision       : fp16 (AMP)
Max seq length  : 64 tokens
Best checkpoint : by cosine-nDCG@10 on validation (eval every 500 steps)
Seed            : 42
```
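The MNR loss can be sketched in plain NumPy to show where the 63 in-batch negatives come from: each job embedding is scored against every skill embedding in the batch, and the matching (diagonal) pair is the softmax target. `mnr_loss` is an illustrative re-derivation, not the sentence-transformers implementation:

```python
import numpy as np

def mnr_loss(job_embs, skill_embs, scale=20.0):
    """Multiple Negatives Ranking loss over one batch of aligned
    (job_i, skill_i) pairs; the other rows act as in-batch negatives."""
    # Cosine similarity matrix (embeddings assumed L2-normalized),
    # sharpened by the scale factor (20, as in the training config).
    sims = job_embs @ skill_embs.T * scale
    # Numerically stable softmax cross-entropy, diagonal = target class.
    logits = sims - sims.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(42)
embs = rng.normal(size=(64, 1024))          # batch of 64, dim 1024
embs /= np.linalg.norm(embs, axis=1, keepdims=True)
# Identical job/skill embeddings -> positives dominate, loss near zero.
print(f"loss on aligned pairs: {mnr_loss(embs, embs):.6f}")
```

With batch size 64, each anchor sees exactly one positive and 63 negatives per step, which is why larger batches tend to make this loss more informative.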
Training Curve
| Epoch | Step | Train Loss | nDCG@10 (val) | MAP@100 (val) |
|---|---|---|---|---|
| 0.34 | 500 | 2.9232 | 0.3430 | — |
| 0.68 | 1000 | 2.1179 | 0.3424 | — |
| 1.00 | 1465 | — | 0.3676 | 0.1758 |
| 1.37 | 2000 | 1.7070 | 0.3692 | — |
| 1.71 | 2500 | 1.6366 | 0.3744 | — |
| 2.00 | 2930 | — | 0.3717 | 0.1780 |
| 2.39 | 3500 ✓ | 1.4540 | 0.3769 | 0.1808 |
Best checkpoint saved at step 3500.
Validation Metrics (best checkpoint, binary relevance)
| Metric | Value |
|---|---|
| nDCG@10 | 0.4830 |
| nDCG@50 | 0.4240 |
| nDCG@100 | 0.3769 |
| MAP@100 | 0.1825 |
| MRR@10 | 0.6657 |
| Accuracy@1 | 0.5099 |
| Accuracy@3 | 0.7993 |
| Accuracy@5 | 0.8914 |
| Accuracy@10 | 0.9474 |
Evaluated with sentence_transformers.evaluation.InformationRetrievalEvaluator (binary: any qrel > 0 = relevant).
Pipeline Results (graded nDCG, full 9052-skill ranking, server-side)
| Run | nDCG@10 graded | nDCG@10 binary | MAP |
|---|---|---|---|
| Zero-shot jjzha/esco-xlm-roberta-large | 0.2039 | 0.2853 | 0.2663 |
| SkillScout Large (bi-encoder only) | 0.3621 | 0.4830 | 0.4545 |
| SkillScout Large + cross-encoder (α=0.7) | 0.6896 | 0.7330 | 0.2481 |
Competitive Context (TalentCLEF 2025 Task B)
| Team | MAP (test) | Approach |
|---|---|---|
| pjmathematician (winner 2025) | 0.36 | GTE 7B + contrastive + LLM-augmented data |
| NLPnorth (3rd of 14, 2025) | 0.29 | 3-class discriminative classification |
| SkillScout Large (2026 val) | 0.4545 | MNR fine-tuned bi-encoder (Stage 1 only) |
Limitations
- English only — trained on ESCO EN labels.
- ESCO-domain — optimised for the ESCO skill taxonomy; performance on other taxonomies (O*NET, custom) may vary without fine-tuning.
- 64-token cap — long job descriptions should be reduced to a concise title before encoding.
- Graded distinction — the bi-encoder alone does not reliably separate core (2) from contextual (1) skills; a cross-encoder re-ranker is needed for strong graded nDCG.
Citation
```bibtex
@misc{talentguide-skillscout-2026,
  title  = {SkillScout Large: Dense Job-to-Skill Retrieval for TalentCLEF 2026},
  author = {TalentGuide},
  year   = {2026},
  url    = {https://huggingface.co/talentguide/skillscout-large}
}

@misc{talentclef2026taskb,
  title  = {TalentCLEF 2026 Task B: Job-Skill Matching},
  author = {TalentCLEF Organizers},
  year   = {2026},
  url    = {https://talentclef.github.io/}
}
```
Framework Versions
| Package | Version |
|---|---|
| Python | 3.12.10 |
| sentence-transformers | 5.3.0 |
| transformers | 5.5.0 |
| PyTorch | 2.11.0+cu128 |
| Accelerate | 1.13.0 |
| Tokenizers | 0.22.2 |