SkillScout Large — Job-to-Skill Dense Retriever
SkillScout Large is a dense bi-encoder that retrieves relevant skills for a given job title.
Given a job title (e.g., "Data Scientist"), the model encodes it into a 1024-dimensional embedding and retrieves the most semantically relevant skills from the ESCO skill gazetteer (9,052 skills) via cosine similarity.
This is Stage 1 of the TalentGuide two-stage job-skill matching pipeline, trained for TalentCLEF 2026 Task B.
Best pipeline result (TalentCLEF 2026 validation set):
nDCG@10 graded = 0.6896 · nDCG@10 binary = 0.7330
when combined with a fine-tuned cross-encoder re-ranker at blend α = 0.7.
Bi-encoder alone: nDCG@10 graded = 0.3621 · MAP = 0.4545
Model Summary
| Property | Value |
|---|---|
| Base model | jjzha/esco-xlm-roberta-large |
| Architecture | XLM-RoBERTa-large + mean pooling |
| Embedding dimension | 1024 |
| Max sequence length | 64 tokens |
| Training loss | Multiple Negatives Ranking (MNR) |
| Training pairs | 93,720 (ESCO job–skill pairs, essential + optional) |
| Epochs | 3 |
| Best checkpoint | Step 3500 (saved by validation nDCG@10) |
| Hardware | NVIDIA RTX 3070 8GB · fp16 AMP |
What is TalentCLEF Task B?
TalentCLEF 2026 Task B is a graded information-retrieval shared task:
- Query: a job title (e.g., "Electrician")
- Corpus: 9,052 ESCO skills (e.g., "install electric switches", "comply with electrical safety regulations")
- Relevance levels:
  - 2 = Core skill (essential regardless of context)
  - 1 = Contextual skill (depends on employer / industry)
  - 0 = Non-relevant
Primary metric: nDCG with graded relevance (core=2, contextual=1)
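The graded metric can be sketched in a few lines of NumPy. This is a minimal illustration, not the official TalentCLEF scorer; it uses the common linear-gain variant, rel / log2(rank + 1):

```python
import numpy as np

def dcg_at_k(rels, k=10):
    """Discounted cumulative gain over the top-k ranked relevance labels."""
    rels = np.asarray(rels, dtype=float)[:k]
    return float(np.sum(rels / np.log2(np.arange(2, rels.size + 2))))

def ndcg_at_k(rels, k=10):
    """DCG normalized by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0

# Ranked relevance labels for one query: 2=core, 1=contextual, 0=non-relevant
ranked = [2, 0, 1, 2, 0]
print(round(ndcg_at_k(ranked, k=10), 4))  # → 0.8935
```

A perfect ranking (all core skills first, then contextual) scores 1.0; the binary variant simply maps both 2 and 1 to relevance 1 before computing the same quantity.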
Usage
Installation
```bash
pip install sentence-transformers faiss-cpu  # or faiss-gpu
```
Encode & Compare
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("talentguide/skillscout-large")

job = "Data Scientist"
skills = ["data science", "machine learning", "install electric switches"]

embs = model.encode([job] + skills, normalize_embeddings=True)
scores = embs[0] @ embs[1:].T

for skill, score in zip(skills, scores):
    print(f"{score:.3f}  {skill}")
# 0.872  data science
# 0.731  machine learning
# 0.112  install electric switches
```
Full Retrieval with FAISS (Recommended)
```python
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np

model = SentenceTransformer("talentguide/skillscout-large")

# --- Build index once over your skill corpus ---
skill_texts = [...]  # list of skill names / descriptions
embs = model.encode(
    skill_texts,
    batch_size=128,
    normalize_embeddings=True,
    show_progress_bar=True,
).astype(np.float32)

index = faiss.IndexFlatIP(embs.shape[1])  # inner product on L2-normed vectors = cosine
index.add(embs)

# --- Query at inference time ---
job_title = "Software Engineer"
q = model.encode([job_title], normalize_embeddings=True).astype(np.float32)
scores, idxs = index.search(q, k=50)

for rank, (idx, score) in enumerate(zip(idxs[0], scores[0]), 1):
    print(f"{rank:3d}. [{score:.4f}] {skill_texts[idx]}")
```
Demo Output
```
Software Engineer
  1. [0.942] define software architecture
  2. [0.938] software frameworks
  3. [0.935] create software design

Data Scientist
  1. [0.951] data science
  2. [0.921] establish data processes
  3. [0.919] create data models

Electrician
  1. [0.944] install electric switches
  2. [0.938] install electricity sockets
  3. [0.930] use electrical wire tools
```
Two-Stage Pipeline Integration
SkillScout Large is designed as Stage 1 — fast ANN retrieval.
For maximum ranking quality, pair it with a cross-encoder re-ranker:
```
Job title
    │
    ▼
[SkillScout Large]            ← this model
    │  top-200 candidates (FAISS ANN, ~40 ms)
    ▼
[Cross-encoder re-ranker]
    │  fine-grained re-scoring of the top-200
    ▼
Final ranked list (graded: core > contextual > irrelevant)
```
Score blending (best result at α = 0.7):
```
final_score = alpha * biencoder_score + (1 - alpha) * crossencoder_score
```
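A minimal sketch of the blending step, assuming both score lists cover the same top-200 candidates and are min-max normalized before mixing. The normalization is an assumption for illustration: the card does not specify how the two score scales are aligned, and `minmax` is a hypothetical helper:

```python
import numpy as np

# Hypothetical scores for the top-200 Stage-1 candidates of one query:
# bi-encoder cosine similarities and cross-encoder re-ranker scores.
rng = np.random.default_rng(0)
biencoder_scores = rng.random(200)
crossencoder_scores = rng.random(200)

def minmax(x):
    # Scale-matching step (an assumption; cosine similarities and
    # cross-encoder logits live on different ranges).
    return (x - x.min()) / (x.max() - x.min() + 1e-9)

alpha = 0.7  # best-performing blend weight reported above
final = alpha * minmax(biencoder_scores) + (1 - alpha) * minmax(crossencoder_scores)
reranked = np.argsort(-final)  # candidate indices, best first
```

With α = 0.7 the bi-encoder still carries most of the weight; the cross-encoder adjusts the ordering rather than replacing it.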
Training Details
Data
Source: ESCO occupational ontology, TalentCLEF 2026 training split.
| Statistic | Count |
|---|---|
| Raw job–skill pairs (essential + optional) | 114,699 |
| ESCO jobs with aliases | 3,039 |
| ESCO skills with aliases | 13,939 |
| Training InputExamples (after canonical-pair inclusion) | 93,720 |
| Validation queries | 304 |
| Validation corpus (skills) | 9,052 |
| Validation relevance judgments | 56,417 |
Essential pairs are included in full; optional skill pairs are downsampled to 50% of the essential count to maintain class balance.
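The downsampling step described above can be sketched as follows; `balance_pairs` is a hypothetical helper name, and the 42 seed matches the training seed reported below:

```python
import random

def balance_pairs(essential, optional, ratio=0.5, seed=42):
    """Keep every essential job-skill pair; randomly sample optional pairs
    down to `ratio` of the essential count (the card's 50% downsampling)."""
    rng = random.Random(seed)
    k = min(len(optional), int(len(essential) * ratio))
    return essential + rng.sample(optional, k)
```

Applied to the raw counts in the table (with ratio 0.5), this is what reduces 114,699 raw pairs to the 93,720 training examples.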
Hyperparameters
```
Loss            : MultipleNegativesRankingLoss (scale=20, cos_sim)
Batch size      : 64 → 63 in-batch negatives per anchor
Epochs          : 3
Warmup          : 10% of total steps (~440 steps)
Optimizer       : AdamW (fused), lr=5e-5, linear decay
Precision       : fp16 (AMP)
Max seq length  : 64 tokens
Best checkpoint : by cosine-nDCG@10 on validation (eval every 500 steps)
Seed            : 42
```
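The MNR loss can be sketched in plain NumPy to show where the 63 in-batch negatives come from: each job embedding is scored against every skill embedding in the batch, and the matching (diagonal) pair is the softmax target. `mnr_loss` is an illustrative re-derivation, not the sentence-transformers implementation:

```python
import numpy as np

def mnr_loss(job_embs, skill_embs, scale=20.0):
    """Multiple Negatives Ranking loss over one batch of aligned
    (job_i, skill_i) pairs; the other rows act as in-batch negatives."""
    # Cosine similarity matrix (embeddings assumed L2-normalized),
    # sharpened by the scale factor (20, as in the training config).
    sims = job_embs @ skill_embs.T * scale
    # Numerically stable softmax cross-entropy, diagonal = target class.
    logits = sims - sims.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(42)
embs = rng.normal(size=(64, 1024))          # batch of 64, dim 1024
embs /= np.linalg.norm(embs, axis=1, keepdims=True)
# Identical job/skill embeddings -> positives dominate, loss near zero.
print(f"loss on aligned pairs: {mnr_loss(embs, embs):.6f}")
```

With batch size 64, each anchor sees exactly one positive and 63 negatives per step, which is why larger batches tend to make this loss more informative.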
Training Curve
| Epoch | Step | Train Loss | nDCG@10 (val) | MAP@100 (val) |
|---|---|---|---|---|
| 0.34 | 500 | 2.9232 | 0.3430 | — |
| 0.68 | 1000 | 2.1179 | 0.3424 | — |
| 1.00 | 1465 | — | 0.3676 | 0.1758 |
| 1.37 | 2000 | 1.7070 | 0.3692 | — |
| 1.71 | 2500 | 1.6366 | 0.3744 | — |
| 2.00 | 2930 | — | 0.3717 | 0.1780 |
| 2.39 | 3500 ✓ | 1.4540 | 0.3769 | 0.1808 |
Best checkpoint saved at step 3500.
Validation Metrics (best checkpoint, binary relevance)
| Metric | Value |
|---|---|
| nDCG@10 | 0.4830 |
| nDCG@50 | 0.4240 |
| nDCG@100 | 0.3769 |
| MAP@100 | 0.1825 |
| MRR@10 | 0.6657 |
| Accuracy@1 | 0.5099 |
| Accuracy@3 | 0.7993 |
| Accuracy@5 | 0.8914 |
| Accuracy@10 | 0.9474 |
Evaluated with sentence_transformers.evaluation.InformationRetrievalEvaluator (binary: any qrel > 0 = relevant).
Pipeline Results (graded nDCG, full 9052-skill ranking, server-side)
| Run | nDCG@10 graded | nDCG@10 binary | MAP |
|---|---|---|---|
| Zero-shot jjzha/esco-xlm-roberta-large | 0.2039 | 0.2853 | 0.2663 |
| SkillScout Large (bi-encoder only) | 0.3621 | 0.4830 | 0.4545 |
| SkillScout Large + cross-encoder (α=0.7) | 0.6896 | 0.7330 | 0.2481 |
Competitive Context (TalentCLEF 2025 Task B)
| Team | MAP (test) | Approach |
|---|---|---|
| pjmathematician (winner 2025) | 0.36 | GTE 7B + contrastive + LLM-augmented data |
| NLPnorth (3rd of 14, 2025) | 0.29 | 3-class discriminative classification |
| SkillScout Large (2026 val) | 0.4545 | MNR fine-tuned bi-encoder (Stage 1 only) |
Limitations
- English only — trained on ESCO EN labels.
- ESCO-domain — optimised for the ESCO skill taxonomy; performance on other taxonomies (O*NET, custom) may vary without fine-tuning.
- 64-token cap — long job descriptions should be reduced to a concise title before encoding.
- Graded distinction — the bi-encoder alone does not reliably separate core (2) from contextual (1) skills; a cross-encoder re-ranker is needed for strong graded nDCG.
Citation
```bibtex
@misc{talentguide-skillscout-2026,
  title  = {SkillScout Large: Dense Job-to-Skill Retrieval for TalentCLEF 2026},
  author = {TalentGuide},
  year   = {2026},
  url    = {https://huggingface.co/talentguide/skillscout-large}
}

@misc{talentclef2026taskb,
  title  = {TalentCLEF 2026 Task B: Job-Skill Matching},
  author = {TalentCLEF Organizers},
  year   = {2026},
  url    = {https://talentclef.github.io/}
}
```
Framework Versions
| Package | Version |
|---|---|
| Python | 3.12.10 |
| sentence-transformers | 5.3.0 |
| transformers | 5.5.0 |
| PyTorch | 2.11.0+cu128 |
| Accelerate | 1.13.0 |
| Tokenizers | 0.22.2 |