Envision Eye Imaging Classifier

A SetFit few-shot classifier that identifies eye imaging datasets from scientific metadata.

Developed by: FAIR Data Innovations Hub in collaboration with the EyeACT Study

Model Description

The model uses sentence-transformers/all-mpnet-base-v2 as its backbone and assigns each record to one of four classes (integer label in parentheses):

  • EYE_IMAGING (3): Actual ophthalmic imaging datasets (fundus, OCT, OCTA, cornea)
  • EYE_SOFTWARE (2): Code, tools, models for eye imaging
  • EDGE_CASE (1): Eye research papers, reviews, non-imaging data
  • NEGATIVE (0): Not eye-related
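The label scheme above can be written down as a small enumeration so that integer predictions are easy to turn back into class names. This is a minimal sketch: the names `EyeLabel` and `decode` are hypothetical helpers introduced here for illustration and are not part of the released model.

```python
from enum import IntEnum


class EyeLabel(IntEnum):
    """Integer labels as listed in the model description."""
    NEGATIVE = 0
    EDGE_CASE = 1
    EYE_SOFTWARE = 2
    EYE_IMAGING = 3


def decode(label_ids):
    """Map integer predictions (e.g. from head.predict) to class names."""
    return [EyeLabel(int(i)).name for i in label_ids]


print(decode([3, 0, 2]))  # ['EYE_IMAGING', 'NEGATIVE', 'EYE_SOFTWARE']
```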

Results on Zenodo

Tested on 515 Zenodo datasets (filtered to resource_type=dataset only):

Class         Count
EYE_IMAGING   127
EYE_SOFTWARE  24
EDGE_CASE     32
NEGATIVE      331

Confidence Distribution (EYE_IMAGING)

Confidence              Count  Share
High (>=0.95)           49     38.6%
Medium (0.80 to <0.95)  70     55.1%
Lower (<0.80)           8      6.3%
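The confidence bands in the table can be reproduced with a simple threshold function. The sketch below takes the 0.95 and 0.80 cut-offs from the table; the function name `confidence_band` and the sample probabilities are made up for illustration, and how the original analysis obtained its confidence scores is not stated here.

```python
from collections import Counter


def confidence_band(p: float) -> str:
    """Assign a probability to the High / Medium / Lower bands from the table."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"probability out of range: {p}")
    if p >= 0.95:
        return "High"
    if p >= 0.80:
        return "Medium"
    return "Lower"


# Hypothetical confidences for five EYE_IMAGING predictions.
sample = [0.99, 0.95, 0.87, 0.80, 0.42]
bands = Counter(confidence_band(p) for p in sample)
print(dict(bands))  # {'High': 2, 'Medium': 2, 'Lower': 1}
```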

Data Pipeline

  • Zenodo records scraped with a datasets-only filter (resource_type=dataset)
  • ZIP contents inspected via HTTP Range requests (31,958 files catalogued)
  • Genomics files excluded (.fasta, .h5ad, .vcf, etc.)
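One way to inspect ZIP contents over HTTP Range requests without downloading the whole archive is to hand Python's zipfile module a file-like object that fetches only the byte ranges it asks for. The sketch below is an illustration under stated assumptions, not the project's actual pipeline code: `RangeReader`, `http_fetcher`, `list_zip_names`, and `drop_genomics` are hypothetical names, the server is assumed to report Content-Length and honour Range requests, and the extension tuple contains only the three suffixes named above (the source list ends in "etc.").

```python
import io
import urllib.request
import zipfile

# Only the suffixes named in the pipeline description; the real list is longer.
GENOMICS_EXTENSIONS = (".fasta", ".h5ad", ".vcf")


class RangeReader(io.RawIOBase):
    """Seekable read-only file object backed by a fetch(start, end) callable."""

    def __init__(self, size, fetch):
        super().__init__()
        self._size = size
        self._fetch = fetch  # fetch(start, end) -> bytes, end is inclusive
        self._pos = 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        base = {io.SEEK_SET: 0, io.SEEK_CUR: self._pos, io.SEEK_END: self._size}[whence]
        self._pos = max(0, base + offset)
        return self._pos

    def read(self, n=-1):
        if n is None or n < 0:
            n = self._size - self._pos
        end = min(self._pos + n, self._size)
        if end <= self._pos:
            return b""
        data = self._fetch(self._pos, end - 1)
        self._pos += len(data)
        return data


def http_fetcher(url):
    """Return (size, fetch) for a URL, using HEAD for the size and Range for reads."""
    with urllib.request.urlopen(urllib.request.Request(url, method="HEAD")) as resp:
        size = int(resp.headers["Content-Length"])

    def fetch(start, end):
        req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
        with urllib.request.urlopen(req) as resp:
            if resp.status != 206:  # server ignored the Range header
                raise OSError("server did not return partial content")
            return resp.read()

    return size, fetch


def list_zip_names(size, fetch):
    """List member names; zipfile only reads the end-of-archive directory."""
    with zipfile.ZipFile(RangeReader(size, fetch)) as archive:
        return archive.namelist()


def drop_genomics(names):
    """Filter out files whose suffix marks them as genomics data."""
    return [n for n in names if not n.lower().endswith(GENOMICS_EXTENSIONS)]


# Offline demonstration: an in-memory archive stands in for a remote file.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("fundus/scan_001.png", b"not a real image")
    z.writestr("cells.h5ad", b"placeholder")
blob = buf.getvalue()
names = list_zip_names(len(blob), lambda s, e: blob[s:e + 1])
print(drop_genomics(names))  # ['fundus/scan_001.png']
```

For a real archive, `list_zip_names(*http_fetcher(url))` would issue one HEAD request plus a handful of small ranged GETs, since the file listing lives in the central directory at the end of the ZIP.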

Training

  • Base Model: sentence-transformers/all-mpnet-base-v2 (768-dimensional embeddings)
  • Examples: 474 (77 EYE_IMAGING, 48 EYE_SOFTWARE, 79 EDGE_CASE, 270 NEGATIVE)
  • Epochs: 2
  • Batch Size: 16

Validation

Held-out Test Set

Metric    Value
Accuracy  0.937
Macro F1  0.902

Spot-Check Validation (33 records)

Metric    Value
Accuracy  29/33 (87.9%)
Macro F1  0.828

Per-Class F1 (Spot-Check)

Class         F1
EYE_IMAGING   0.947
EDGE_CASE     0.889
NEGATIVE      0.903
EYE_SOFTWARE  0.571
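Macro F1 is the unweighted mean of the per-class F1 scores, so the spot-check figure can be checked directly from the table. The short calculation below uses only the rounded values reported above; the variable names are illustrative. The mean comes out at about 0.8275, consistent with the reported 0.828, and shows how much the low EYE_SOFTWARE score pulls the average down.

```python
# Per-class F1 values from the spot-check table (already rounded to 3 decimals).
spot_check_f1 = {
    "EYE_IMAGING": 0.947,
    "EDGE_CASE": 0.889,
    "NEGATIVE": 0.903,
    "EYE_SOFTWARE": 0.571,
}

# Macro F1: every class counts equally, regardless of how many records it has.
macro_f1 = sum(spot_check_f1.values()) / len(spot_check_f1)

# Spot-check accuracy: 29 correct out of 33 records.
accuracy = 29 / 33

print(f"macro F1 from rounded per-class values: {macro_f1:.4f}")
print(f"accuracy: {accuracy:.1%}")  # accuracy: 87.9%
```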

Usage

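from sentence_transformers import SentenceTransformer
import joblib

# Load the sentence-transformer body from the Hugging Face Hub.
model = SentenceTransformer("fairdataihub/envision-eye-imaging-classifier")
# Load the classification head; model_head.pkl must be present in the working directory.
head = joblib.load("model_head.pkl")

# Embed the metadata text, then predict an integer class label:
# 0 = NEGATIVE, 1 = EDGE_CASE, 2 = EYE_SOFTWARE, 3 = EYE_IMAGING.
embeddings = model.encode(["Retinal OCT dataset for diabetic retinopathy"])
predictions = head.predict(embeddings)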
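To get a confidence score alongside each label, the head's class probabilities can be used, provided the loaded head follows the scikit-learn classifier convention of exposing `predict_proba` and `classes_`. That is an assumption about `model_head.pkl`, not something stated in this card. The sketch below uses stub objects in place of the real encoder and head so it runs offline; `predict_with_confidence`, `StubEncoder`, and `StubHead` are hypothetical names.

```python
def predict_with_confidence(encoder, head, texts):
    """Return (label_id, confidence) per text.

    Assumes head.predict_proba(embeddings) yields one probability row per
    text and head.classes_ gives the label id for each column.
    """
    embeddings = encoder.encode(texts)
    results = []
    for row in head.predict_proba(embeddings):
        probs = [float(p) for p in row]
        best = max(range(len(probs)), key=probs.__getitem__)
        results.append((int(head.classes_[best]), probs[best]))
    return results


class StubEncoder:
    """Stand-in for SentenceTransformer; returns dummy embeddings."""

    def encode(self, texts):
        return [[0.0] for _ in texts]


class StubHead:
    """Stand-in for the pickled head; always favours label 3."""

    classes_ = [0, 1, 2, 3]

    def predict_proba(self, embeddings):
        return [[0.01, 0.02, 0.01, 0.96] for _ in embeddings]


print(predict_with_confidence(StubEncoder(), StubHead(), ["Retinal OCT dataset"]))
# [(3, 0.96)]
```

With the real objects, the call would be `predict_with_confidence(model, head, texts)`, again assuming the head supports probability output.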

Citation

  • EyeACT Envision project
  • FAIR Data Innovations Hub (fairdataihub.org)
  • sentence-transformers/all-mpnet-base-v2

Contact

EyeACT team: eyeactstudy.org
