SkillScout Large — Job-to-Skill Dense Retriever

SkillScout Large is a dense bi-encoder that retrieves relevant skills for a job title.
Given a job title (e.g., "Data Scientist"), the model encodes it into a 1024-dimensional embedding and retrieves the most semantically similar skills from the ESCO skill gazetteer (9,052 skills) by cosine similarity.

This is Stage 1 of the TalentGuide two-stage job-skill matching pipeline, trained for TalentCLEF 2026 Task B.

Best pipeline result (TalentCLEF 2026 validation set):
nDCG@10 graded = 0.6896 · nDCG@10 binary = 0.7330
when combined with a fine-tuned cross-encoder re-ranker at blend α = 0.7.
Bi-encoder alone: nDCG@10 graded = 0.3621 · MAP = 0.4545


Model Summary

Property             Value
Base model           jjzha/esco-xlm-roberta-large
Architecture         XLM-RoBERTa-large + mean pooling
Embedding dimension  1024
Max sequence length  64 tokens
Training loss        Multiple Negatives Ranking (MNR)
Training pairs       93,720 (ESCO job–skill pairs, essential + optional)
Epochs               3
Best checkpoint      Step 3500 (saved by validation nDCG@10)
Hardware             NVIDIA RTX 3070 8GB · fp16 AMP

What is TalentCLEF Task B?

TalentCLEF 2026 Task B is a graded information-retrieval shared task:

  • Query: a job title (e.g., "Electrician")
  • Corpus: 9,052 ESCO skills (e.g., "install electric switches", "comply with electrical safety regulations")
  • Relevance levels:
    • 2 — Core skill (essential regardless of context)
    • 1 — Contextual skill (depends on employer / industry)
    • 0 — Non-relevant

Primary metric: nDCG with graded relevance (core=2, contextual=1)
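
As a rough illustration of how graded relevance feeds into nDCG, the sketch below computes nDCG@k for a single query using linear gain (the grade itself) and a log2 rank discount. The function names and the toy ranking are made up for this example, and the official TalentCLEF scorer may use a different gain (for instance 2^rel - 1), so treat it as a sketch and not as the task's evaluation code.

```python
import math

def dcg_at_k(grades, k):
    """Discounted cumulative gain over the first k grades (linear gain)."""
    return sum(g / math.log2(rank + 1)
               for rank, g in enumerate(grades[:k], start=1))

def ndcg_at_k(ranked_grades, all_grades, k=10):
    """nDCG@k for one query.

    ranked_grades: grades (0, 1 or 2) of the retrieved skills, in ranked order.
    all_grades:    grades of every judged skill for the query, used to build
                   the ideal ranking.
    """
    ideal_dcg = dcg_at_k(sorted(all_grades, reverse=True), k)
    return dcg_at_k(ranked_grades, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Toy example (hypothetical): a contextual skill (1) is ranked above a core
# skill (2), and a non-relevant skill (0) sits at rank 3.
ranking = [1, 2, 0, 2]
judged = [2, 2, 1]
score = ndcg_at_k(ranking, judged, k=10)
perfect = ndcg_at_k([2, 2, 1], judged, k=10)
```

Because core skills carry twice the gain of contextual ones, placing a contextual skill first lowers the score even though every relevant skill is retrieved.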


Usage

Installation

pip install sentence-transformers faiss-cpu  # or faiss-gpu

Encode & Compare

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("talentguide/skillscout-large")

job    = "Data Scientist"
skills = ["data science", "machine learning", "install electric switches"]

embs   = model.encode([job] + skills, normalize_embeddings=True)
scores = embs[0] @ embs[1:].T

for skill, score in zip(skills, scores):
    print(f"{score:.3f}  {skill}")
# 0.872  data science
# 0.731  machine learning
# 0.112  install electric switches

Full Retrieval with FAISS (Recommended)

import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("talentguide/skillscout-large")

# --- Build index once over your skill corpus ---
skill_texts = [...]   # placeholder: replace with your list of skill names / descriptions

embs = model.encode(skill_texts, batch_size=128,
                    normalize_embeddings=True,
                    show_progress_bar=True).astype(np.float32)

index = faiss.IndexFlatIP(embs.shape[1])  # exact inner-product search; equals cosine on L2-normalised vectors
index.add(embs)

# --- Query at inference time ---
job_title = "Software Engineer"
q = model.encode([job_title], normalize_embeddings=True).astype(np.float32)

scores, idxs = index.search(q, k=50)
for rank, (idx, score) in enumerate(zip(idxs[0], scores[0]), 1):
    print(f"{rank:3d}. [{score:.4f}]  {skill_texts[idx]}")

Demo Output

Software Engineer
   1. [0.942]  define software architecture
   2. [0.938]  software frameworks
   3. [0.935]  create software design

Data Scientist
   1. [0.951]  data science
   2. [0.921]  establish data processes
   3. [0.919]  create data models

Electrician
   1. [0.944]  install electric switches
   2. [0.938]  install electricity sockets
   3. [0.930]  use electrical wire tools

Two-Stage Pipeline Integration

SkillScout Large is designed as Stage 1 — fast ANN retrieval.
For maximum ranking quality, pair it with a cross-encoder re-ranker:

Job title
   │
   ▼
[SkillScout Large]              ← this model
   │  top-200 candidates (FAISS ANN, ~40ms)
   ▼
[Cross-encoder re-ranker]
   │  fine-grained re-scoring of top-200
   ▼
Final ranked list  (graded: core > contextual > irrelevant)

Score blending (best result at α = 0.7):

final_score = alpha * biencoder_score + (1 - alpha) * crossencoder_score
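
A minimal sketch of the blending step is shown below. It assumes both scorers return one score per candidate skill and that the two score ranges are comparable; the card does not say whether any normalisation was applied before blending. The function name and the example scores are hypothetical.

```python
def blend_scores(biencoder_scores, crossencoder_scores, alpha=0.7):
    """Blend per-skill scores: alpha * bi-encoder + (1 - alpha) * cross-encoder.

    Both arguments map a skill identifier to a score. Only skills present in
    both dicts (the re-ranked candidates) are kept.
    """
    blended = {
        skill: alpha * biencoder_scores[skill]
               + (1 - alpha) * crossencoder_scores[skill]
        for skill in biencoder_scores
        if skill in crossencoder_scores
    }
    # Highest blended score first.
    return sorted(blended.items(), key=lambda item: item[1], reverse=True)

# Hypothetical scores for three candidate skills (not real model output).
bi = {"data science": 0.90, "create data models": 0.85,
      "install electric switches": 0.10}
ce = {"data science": 0.95, "create data models": 0.40,
      "install electric switches": 0.05}
ranked = blend_scores(bi, ce, alpha=0.7)
```

With α = 0.7 the bi-encoder score dominates, so the cross-encoder acts mainly as a correction on closely scored candidates.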

Training Details

Data

Source: ESCO occupational ontology, TalentCLEF 2026 training split.

Item                                                      Count
Raw job–skill pairs (essential + optional)                114,699
ESCO jobs with aliases                                    3,039
ESCO skills with aliases                                  13,939
Training InputExamples (after canonical-pair inclusion)   93,720
Validation queries                                        304
Validation corpus (skills)                                9,052
Validation relevance judgments                            56,417

Essential pairs are included in full; optional skill pairs are downsampled to 50% of the essential count to maintain class balance.
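
The sampling rule can be sketched as follows. This assumes pairs are stored as (job, skill) tuples and that optional pairs are drawn uniformly at random; the card only states the 50% ratio, so the function name, the sampling method, and the toy data are illustrative assumptions.

```python
import random

def build_training_pairs(essential, optional, optional_ratio=0.5, seed=42):
    """Keep every essential pair and sample optional pairs up to
    optional_ratio * len(essential)."""
    rng = random.Random(seed)
    n_optional = min(len(optional), int(optional_ratio * len(essential)))
    return list(essential) + rng.sample(list(optional), n_optional)

# Hypothetical toy data: 10 essential and 30 optional pairs for one job.
essential = [("electrician", f"essential skill {i}") for i in range(10)]
optional = [("electrician", f"optional skill {i}") for i in range(30)]
pairs = build_training_pairs(essential, optional)
n_optional_kept = sum(1 for _, skill in pairs if skill.startswith("optional"))
```

In this toy case all 10 essential pairs are kept and 5 of the 30 optional pairs are sampled.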

Hyperparameters

Loss              : MultipleNegativesRankingLoss (scale=20, cos_sim)
Batch size        : 64  →  63 in-batch negatives per anchor
Epochs            : 3
Warmup            : 10% of total steps (~440 steps)
Optimizer         : AdamW (fused), lr=5e-5, linear decay
Precision         : fp16 (AMP)
Max seq length    : 64 tokens
Best model saved  : by cosine-nDCG@10 on validation (eval every 500 steps)
Seed              : 42
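
To make the loss setting concrete, here is a dependency-free sketch of what a Multiple Negatives Ranking loss computes for one batch: scaled cosine similarities between every anchor and every positive, followed by cross-entropy with the matching positive as the target. It is an illustration in plain Python, not the sentence-transformers implementation, and the 2-dimensional vectors are toy values. The last lines reproduce the step counts implied by the settings above.

```python
import math

def cos_sim(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where, for anchor i, positives[i] is the target and
    every other positives[j] in the batch acts as an in-batch negative."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)

# Toy "embeddings" (hypothetical): two job titles and their paired skills.
jobs = [[1.0, 0.0], [0.0, 1.0]]
skills_aligned = [[0.9, 0.1], [0.1, 0.9]]   # each skill close to its own job
skills_swapped = [[0.1, 0.9], [0.9, 0.1]]   # each skill close to the other job
loss_good = mnr_loss(jobs, skills_aligned)
loss_bad = mnr_loss(jobs, skills_swapped)

# Step arithmetic: 93,720 pairs at batch size 64 for 3 epochs.
steps_per_epoch = math.ceil(93_720 / 64)
total_steps = 3 * steps_per_epoch
```

With a batch of 64, each anchor is contrasted against its own positive and the 63 positives of the other anchors, which is where the in-batch negative count comes from.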

Training Curve

Epoch  Step  Train Loss  nDCG@10 (val)  MAP@100 (val)
0.34    500  2.9232      0.3430         -
0.68   1000  2.1179      0.3424         -
1.00   1465  -           0.3676         0.1758
1.37   2000  1.7070      0.3692         -
1.71   2500  1.6366      0.3744         -
2.00   2930  -           0.3717         0.1780
2.39   3500  1.4540      0.3769         0.1808

Best checkpoint saved at step 3500.

Validation Metrics (best checkpoint, binary relevance)

Metric Value
nDCG@10 0.4830
nDCG@50 0.4240
nDCG@100 0.3769
MAP@100 0.1825
MRR@10 0.6657
Accuracy@1 0.5099
Accuracy@3 0.7993
Accuracy@5 0.8914
Accuracy@10 0.9474

Evaluated with sentence_transformers.evaluation.InformationRetrievalEvaluator (binary: any qrel > 0 = relevant).
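
The binarisation rule can be sketched as below, assuming the graded judgments are held as a nested dict of query id to skill id to grade. The result is a mapping from query id to a set of relevant skill ids, which is the shape the evaluator's relevant_docs argument expects. The function name and identifiers are hypothetical.

```python
def to_binary_relevant_docs(graded_qrels):
    """Map {query_id: {skill_id: grade}} to {query_id: set(skill_id)},
    keeping every skill whose grade is greater than 0."""
    return {
        qid: {sid for sid, grade in judgments.items() if grade > 0}
        for qid, judgments in graded_qrels.items()
    }

# Hypothetical identifiers and grades (2 = core, 1 = contextual, 0 = non-relevant).
graded = {
    "q1": {"s1": 2, "s2": 1, "s3": 0},
    "q2": {"s4": 0},
}
relevant_docs = to_binary_relevant_docs(graded)
```

Core and contextual skills become indistinguishable after this step, which is why the binary figures here are not comparable with the graded nDCG reported for the pipeline.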

Pipeline Results (graded nDCG, full 9,052-skill ranking, server-side)

Run                                        nDCG@10 graded  nDCG@10 binary  MAP
Zero-shot jjzha/esco-xlm-roberta-large     0.2039          0.2853          0.2663
SkillScout Large (bi-encoder only)         0.3621          0.4830          0.4545
SkillScout Large + cross-encoder (α=0.7)   0.6896          0.7330          0.2481

Competitive Context (TalentCLEF 2025 Task B)

Team                            MAP (test)  Approach
pjmathematician (winner 2025)   0.36        GTE 7B + contrastive + LLM-augmented data
NLPnorth (3rd of 14, 2025)      0.29        3-class discriminative classification
SkillScout Large (2026 val)     0.4545      MNR fine-tuned bi-encoder (Stage 1 only)

Limitations

  • English only — trained on ESCO EN labels.
  • ESCO-domain — optimised for the ESCO skill taxonomy; performance on other taxonomies (O*NET, custom) may vary without fine-tuning.
  • 64-token cap — long job descriptions should be reduced to a concise title before encoding.
  • Graded distinction — the bi-encoder alone does not reliably separate core (2) from contextual (1) skills; a cross-encoder re-ranker is needed for strong graded nDCG.

Citation

@misc{talentguide-skillscout-2026,
  title   = {SkillScout Large: Dense Job-to-Skill Retrieval for TalentCLEF 2026},
  author  = {TalentGuide},
  year    = {2026},
  url     = {https://huggingface.co/talentguide/skillscout-large}
}

@misc{talentclef2026taskb,
  title   = {TalentCLEF 2026 Task B: Job-Skill Matching},
  author  = {TalentCLEF Organizers},
  year    = {2026},
  url     = {https://talentclef.github.io/}
}

Framework Versions

Package Version
Python 3.12.10
sentence-transformers 5.3.0
transformers 5.5.0
PyTorch 2.11.0+cu128
Accelerate 1.13.0
Tokenizers 0.22.2

License

Apache 2.0

