SentenceTransformer based on BAAI/bge-m3
This is a sentence-transformers model fine-tuned from BAAI/bge-m3. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Deposium Benchmark Results (2026-02-25)
ONNX INT8 version available: tss-deposium/bge-m3-matryoshka-1024d-onnx-int8 — CPU-optimized, ~571 MB, same quality.
Why Matryoshka?
This model was fine-tuned with MatryoshkaLoss to allow safe dimension truncation. Instead of training separate models for each dimension, a single model produces embeddings where the first N dimensions form a valid, high-quality embedding space.
```python
embedding = model.encode("Bonjour")  # [1024] full resolution
embedding_512 = embedding[:512]      # [512] still good! (-5.4% quality, -50% storage)
```
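One caveat: the model L2-normalizes its full 1024-dimensional output, so a truncated slice is no longer unit length and should be re-normalized before cosine similarity. A minimal numpy sketch (the helper name is illustrative):

```python
import numpy as np

def truncate(embedding, dim):
    """Keep the first `dim` components of a Matryoshka embedding and
    re-normalize to unit length so cosine similarity stays well-scaled."""
    v = np.asarray(embedding, dtype=np.float32)[:dim]
    return v / np.linalg.norm(v)
```

Recent sentence-transformers releases can also do this at load time via the `truncate_dim` argument of the `SentenceTransformer` constructor.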
Benchmark: Discrimination by Dimension
Tested on 4 semantic pairs (FR/EN cross-lingual) + 2 negative pairs. Discrimination = avg_positive_similarity - avg_negative_similarity (higher = better).
| Model | Dim | Discrim | vs M2V baseline | Storage (1M vec) |
|---|---|---|---|---|
| m2v-bge-m3-1024d (GPU) | 1024 | 0.312 | baseline | 4 GB |
| gpahal/bge-m3-onnx-int8 | 1024 | 0.377 | +20.8% | 4 GB |
| This model | 1024 | 0.403 | +29.2% | 4 GB |
| This model | 768 | 0.397 | +27.2% | 3 GB |
| This model | 512 | 0.381 | +22.1% | 2 GB |
| This model | 256 | anomalous | — | 1 GB |
Key finding: at 512D, this model still outperforms full-resolution bge-m3-onnx-int8 at 1024D (+1.1% discrimination), with half the storage.
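The discrimination score used above reduces to a few lines. The pair indices in the sketch below are illustrative, not the benchmark's actual FR/EN pairs:

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def discrimination(embeddings, pos_pairs, neg_pairs, dim=None):
    """avg positive similarity minus avg negative similarity (higher = better),
    optionally truncating embeddings to their first `dim` components."""
    def sim(i, j):
        u, v = embeddings[i], embeddings[j]
        if dim is not None:
            u, v = u[:dim], v[:dim]
        return cosine(u, v)
    pos = [sim(i, j) for i, j in pos_pairs]
    neg = [sim(i, j) for i, j in neg_pairs]
    return float(np.mean(pos) - np.mean(neg))
```

Passing `dim=512` reproduces the truncated-dimension rows of the table, since truncation followed by cosine similarity is exactly what the benchmark measures.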
Per-Pair Cosine Similarity
| Pair | onnx-1024 | matr-1024 | matr-768 | matr-512 | matr-256 |
|---|---|---|---|---|---|
| couple_serrage (FR/EN) | 0.351 | 0.284 | 0.321 | 0.371 | 0.197 |
| fogg_depart (FR/FR) | 0.837 | 0.866 | 0.873 | 0.878 | 0.825 |
| revenue_q2 (EN/FR) | 0.645 | 0.686 | 0.697 | 0.711 | 0.658 |
| moteur_spec (FR/EN) | 0.946 | 0.954 | 0.954 | 0.955 | 0.950 |
Recommended Dimensions
- 1024D: Maximum quality, use when storage is not a concern
- 768D: Safe (-1.4%), -25% storage. Good default.
- 512D: Best cost/quality ratio for cloud CPU. Still beats bge-m3-onnx at 1024D.
- 256D: Not recommended — loses cross-lingual understanding.
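The storage figures in the tables follow directly from float32 size (4 bytes per value); a quick sanity check, assuming raw vectors with no index overhead:

```python
def storage_gb(n_vectors, dim, bytes_per_value=4):
    """Raw float32 storage for n_vectors dense embeddings of dimension dim."""
    return n_vectors * dim * bytes_per_value / 1e9

# 1M vectors: ~4.1 GB at 1024D, ~2.0 GB at 512D (the tables round to 4 / 2 GB)
```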
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: BAAI/bge-m3
- Maximum Sequence Length: 8192 tokens
- Output Dimensionality: 1024 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False, 'architecture': 'XLMRobertaModel'})
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'Four hikers are walking up stairs on a small hill.',
    'The people are outside.',
    "Four people are shown in a gritty basement setting with blue walls and a white door on the ceiling; two of the people wear black t-shirts with a skull-and-crossbones and the words 'starve poverty'.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.6442, 0.2925],
#         [0.6442, 1.0000, 0.2358],
#         [0.2925, 0.2358, 1.0000]])
```
Training Details
Training Dataset
Unnamed Dataset
- Size: 672,676 training samples
- Columns: anchor and positive
- Approximate statistics based on the first 1000 samples:

| | anchor | positive |
|---|---|---|
| type | string | string |
| details | min: 5 tokens, mean: 19.75 tokens, max: 254 tokens | min: 3 tokens, mean: 34.59 tokens, max: 499 tokens |

- Samples:

| anchor | positive |
|---|---|
| As the brown dog looks the other way, a large black and white dog plays with a smaller black dog. | Three dogs are together somewhere. |
| Two young men in black punk rock clothing are sitting on the floor at a playground. | two men are sitting in a playground |
| The man is wearing a shirt. | A man wearing a black t-shirt is playing seven string bass a stage. |

- Loss: MatryoshkaLoss with these parameters:

```json
{
    "loss": "MultipleNegativesRankingLoss",
    "matryoshka_dims": [1024, 768, 512, 256],
    "matryoshka_weights": [1, 1, 1, 1],
    "n_dims_per_step": -1
}
```
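To make the loss configuration concrete, here is a plain-numpy sketch of what this combination computes: the wrapped MultipleNegativesRankingLoss (in-batch negatives, cross-entropy over scaled cosine scores) evaluated at each truncated prefix and summed with the given weights. This illustrates the math only, not the sentence-transformers implementation:

```python
import numpy as np

def mnrl(anchors, positives, scale=20.0):
    """MultipleNegativesRankingLoss: each anchor's matching positive is the
    target class; all other in-batch positives act as negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    scores = scale * (a @ p.T)                  # (batch, batch) scaled cosine scores
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))  # cross-entropy, diagonal targets

def matryoshka(anchors, positives, dims=(1024, 768, 512, 256), weights=(1, 1, 1, 1)):
    """Sum the base loss over truncated prefixes of the embeddings."""
    return sum(w * mnrl(anchors[:, :d], positives[:, :d])
               for d, w in zip(dims, weights))
```

Because every prefix contributes to the objective, the first N dimensions are pushed to be a useful embedding on their own, which is what makes the truncation in the benchmark safe.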
Evaluation Dataset
Unnamed Dataset
- Size: 35,405 evaluation samples
- Columns: anchor and positive
- Approximate statistics based on the first 1000 samples:

| | anchor | positive |
|---|---|---|
| type | string | string |
| details | min: 5 tokens, mean: 19.06 tokens, max: 122 tokens | min: 4 tokens, mean: 36.56 tokens, max: 477 tokens |

- Samples:

| anchor | positive |
|---|---|
| A woman in a red dress and a man in a white suite are engaging in a ballet performance on a purple lit stage. | A man and woman are dancing. |
| A man wearing reflective sunglasses in a crowd. | A man is wearing sunshades. |
| Man with red shoes, white shirt and gray pants climbing. | Man is climbing. |

- Loss: MatryoshkaLoss with these parameters:

```json
{
    "loss": "MultipleNegativesRankingLoss",
    "matryoshka_dims": [1024, 768, 512, 256],
    "matryoshka_weights": [1, 1, 1, 1],
    "n_dims_per_step": -1
}
```
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 32
- learning_rate: 2e-05
- warmup_ratio: 0.1
- bf16: True
- load_best_model_at_end: True
- dataloader_pin_memory: False
- gradient_checkpointing: True
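These settings imply the step counts seen in the training logs; a quick check, assuming the dataloader keeps the final partial batch:

```python
import math

samples, batch_size, epochs, warmup_ratio = 672_676, 32, 3, 0.1
steps_per_epoch = math.ceil(samples / batch_size)   # ~21,022 steps per epoch
total_steps = steps_per_epoch * epochs              # ~63,066 steps in total
warmup_steps = int(total_steps * warmup_ratio)      # ~6,306 linear warmup steps
```

This matches the roughly 21,000 steps per epoch visible in the training logs.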
All Hyperparameters
Click to expand
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 32
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 3
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: None
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- bf16: True
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- project: huggingface
- trackio_space_id: trackio
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: False
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: True
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: no
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: True
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional
- router_mapping: {}
- learning_rate_mapping: {}
Training Logs
Click to expand
| Epoch | Step | Training Loss | Validation Loss |
|---|---|---|---|
| 0.0048 | 100 | 1.2365 | - |
| 0.0095 | 200 | 1.1962 | - |
| 0.0143 | 300 | 0.8874 | - |
| 0.0190 | 400 | 0.907 | - |
| 0.0238 | 500 | 0.8883 | 0.7493 |
| 0.0285 | 600 | 0.7945 | - |
| 0.0333 | 700 | 0.7591 | - |
| 0.0381 | 800 | 0.7265 | - |
| 0.0428 | 900 | 0.782 | - |
| 0.0476 | 1000 | 0.6898 | 0.6750 |
| 0.0523 | 1100 | 0.7668 | - |
| 0.0571 | 1200 | 0.839 | - |
| 0.0618 | 1300 | 0.6697 | - |
| 0.0666 | 1400 | 0.7014 | - |
| 0.0714 | 1500 | 0.741 | 0.6327 |
| 0.0761 | 1600 | 0.7033 | - |
| 0.0809 | 1700 | 0.7595 | - |
| 0.0856 | 1800 | 0.6845 | - |
| 0.0904 | 1900 | 0.7591 | - |
| 0.0951 | 2000 | 0.7153 | 0.6154 |
| 0.0999 | 2100 | 0.6861 | - |
| 0.1047 | 2200 | 0.7151 | - |
| 0.1094 | 2300 | 0.5997 | - |
| 0.1142 | 2400 | 0.633 | - |
| 0.1189 | 2500 | 0.6318 | 0.6044 |
| 0.1237 | 2600 | 0.6289 | - |
| 0.1284 | 2700 | 0.631 | - |
| 0.1332 | 2800 | 0.6986 | - |
| 0.1380 | 2900 | 0.6594 | - |
| 0.1427 | 3000 | 0.6295 | 0.5899 |
| 0.1475 | 3100 | 0.6494 | - |
| 0.1522 | 3200 | 0.6393 | - |
| 0.1570 | 3300 | 0.5855 | - |
| 0.1617 | 3400 | 0.6057 | - |
| 0.1665 | 3500 | 0.6009 | 0.5922 |
| 0.1712 | 3600 | 0.6307 | - |
| 0.1760 | 3700 | 0.6032 | - |
| 0.1808 | 3800 | 0.5862 | - |
| 0.1855 | 3900 | 0.6514 | - |
| 0.1903 | 4000 | 0.5814 | 0.5605 |
| 0.1950 | 4100 | 0.7021 | - |
| 0.1998 | 4200 | 0.5975 | - |
| 0.2045 | 4300 | 0.6037 | - |
| 0.2093 | 4400 | 0.5936 | - |
| 0.2141 | 4500 | 0.6214 | 0.5786 |
| 0.2188 | 4600 | 0.6136 | - |
| 0.2236 | 4700 | 0.5722 | - |
| 0.2283 | 4800 | 0.6056 | - |
| 0.2331 | 4900 | 0.5931 | - |
| 0.2378 | 5000 | 0.666 | 0.5665 |
| 0.2426 | 5100 | 0.5996 | - |
| 0.2474 | 5200 | 0.6105 | - |
| 0.2521 | 5300 | 0.6273 | - |
| 0.2569 | 5400 | 0.6868 | - |
| 0.2616 | 5500 | 0.5339 | 0.5799 |
| 0.2664 | 5600 | 0.6471 | - |
| 0.2711 | 5700 | 0.5705 | - |
| 0.2759 | 5800 | 0.6521 | - |
| 0.2807 | 5900 | 0.6084 | - |
| 0.2854 | 6000 | 0.616 | 0.5630 |
| 0.2902 | 6100 | 0.6128 | - |
| 0.2949 | 6200 | 0.5838 | - |
| 0.2997 | 6300 | 0.5029 | - |
| 0.3044 | 6400 | 0.623 | - |
| 0.3092 | 6500 | 0.5841 | 0.5566 |
| 0.3140 | 6600 | 0.5746 | - |
| 0.3187 | 6700 | 0.5202 | - |
| 0.3235 | 6800 | 0.5921 | - |
| 0.3282 | 6900 | 0.5642 | - |
| 0.3330 | 7000 | 0.6183 | 0.5364 |
| 0.3377 | 7100 | 0.5632 | - |
| 0.3425 | 7200 | 0.5062 | - |
| 0.3473 | 7300 | 0.4998 | - |
| 0.3520 | 7400 | 0.5703 | - |
| 0.3568 | 7500 | 0.5544 | 0.5469 |
| 0.3615 | 7600 | 0.5461 | - |
| 0.3663 | 7700 | 0.5716 | - |
| 0.3710 | 7800 | 0.5733 | - |
| 0.3758 | 7900 | 0.5549 | - |
| 0.3806 | 8000 | 0.5658 | 0.5347 |
| 0.3853 | 8100 | 0.5841 | - |
| 0.3901 | 8200 | 0.5051 | - |
| 0.3948 | 8300 | 0.4764 | - |
| 0.3996 | 8400 | 0.626 | - |
| 0.4043 | 8500 | 0.5284 | 0.5159 |
| 0.4091 | 8600 | 0.5733 | - |
| 0.4139 | 8700 | 0.5064 | - |
| 0.4186 | 8800 | 0.5758 | - |
| 0.4234 | 8900 | 0.5735 | - |
| 0.4281 | 9000 | 0.5811 | 0.4957 |
| 0.4329 | 9100 | 0.4942 | - |
| 0.4376 | 9200 | 0.5554 | - |
| 0.4424 | 9300 | 0.5678 | - |
| 0.4472 | 9400 | 0.529 | - |
| 0.4519 | 9500 | 0.4851 | 0.4828 |
| 0.4567 | 9600 | 0.4621 | - |
| 0.4614 | 9700 | 0.5172 | - |
| 0.4662 | 9800 | 0.4862 | - |
| 0.4709 | 9900 | 0.4796 | - |
| 0.4757 | 10000 | 0.4548 | 0.4830 |
| 0.4804 | 10100 | 0.492 | - |
| 0.4852 | 10200 | 0.4963 | - |
| 0.4900 | 10300 | 0.4963 | - |
| 0.4947 | 10400 | 0.4664 | - |
| 0.4995 | 10500 | 0.4786 | 0.4889 |
| 0.5042 | 10600 | 0.4631 | - |
| 0.5090 | 10700 | 0.4932 | - |
| 0.5137 | 10800 | 0.5028 | - |
| 0.5185 | 10900 | 0.4905 | - |
| 0.5233 | 11000 | 0.4544 | 0.4653 |
| 0.5280 | 11100 | 0.4681 | - |
| 0.5328 | 11200 | 0.5148 | - |
| 0.5375 | 11300 | 0.4606 | - |
| 0.5423 | 11400 | 0.4743 | - |
| 0.5470 | 11500 | 0.4904 | 0.4522 |
| 0.5518 | 11600 | 0.526 | - |
| 0.5566 | 11700 | 0.4677 | - |
| 0.5613 | 11800 | 0.4964 | - |
| 0.5661 | 11900 | 0.5397 | - |
| 0.5708 | 12000 | 0.5114 | 0.4529 |
| 0.5756 | 12100 | 0.4969 | - |
| 0.5803 | 12200 | 0.4959 | - |
| 0.5851 | 12300 | 0.4258 | - |
| 0.5899 | 12400 | 0.4875 | - |
| 0.5946 | 12500 | 0.4807 | 0.4374 |
| 0.5994 | 12600 | 0.4994 | - |
| 0.6041 | 12700 | 0.3952 | - |
| 0.6089 | 12800 | 0.4229 | - |
| 0.6136 | 12900 | 0.466 | - |
| 0.6184 | 13000 | 0.4637 | 0.4240 |
| 0.6232 | 13100 | 0.4637 | - |
| 0.6279 | 13200 | 0.4177 | - |
| 0.6327 | 13300 | 0.4338 | - |
| 0.6374 | 13400 | 0.4296 | - |
| 0.6422 | 13500 | 0.4401 | 0.4241 |
| 0.6469 | 13600 | 0.4643 | - |
| 0.6517 | 13700 | 0.3955 | - |
| 0.6565 | 13800 | 0.4819 | - |
| 0.6612 | 13900 | 0.4793 | - |
| 0.6660 | 14000 | 0.458 | 0.4321 |
| 0.6707 | 14100 | 0.4382 | - |
| 0.6755 | 14200 | 0.4201 | - |
| 0.6802 | 14300 | 0.4567 | - |
| 0.6850 | 14400 | 0.4488 | - |
| 0.6898 | 14500 | 0.399 | 0.4185 |
| 0.6945 | 14600 | 0.3928 | - |
| 0.6993 | 14700 | 0.4477 | - |
| 0.7040 | 14800 | 0.4592 | - |
| 0.7088 | 14900 | 0.393 | - |
| 0.7135 | 15000 | 0.4394 | 0.4024 |
| 0.7183 | 15100 | 0.4117 | - |
| 0.7231 | 15200 | 0.3872 | - |
| 0.7278 | 15300 | 0.4194 | - |
| 0.7326 | 15400 | 0.3954 | - |
| 0.7373 | 15500 | 0.4439 | 0.3979 |
| 0.7421 | 15600 | 0.3534 | - |
| 0.7468 | 15700 | 0.4407 | - |
| 0.7516 | 15800 | 0.4586 | - |
| 0.7564 | 15900 | 0.3718 | - |
| 0.7611 | 16000 | 0.449 | 0.3999 |
| 0.7659 | 16100 | 0.4213 | - |
| 0.7706 | 16200 | 0.4192 | - |
| 0.7754 | 16300 | 0.4121 | - |
| 0.7801 | 16400 | 0.3409 | - |
| 0.7849 | 16500 | 0.388 | 0.3905 |
| 0.7896 | 16600 | 0.3648 | - |
| 0.7944 | 16700 | 0.4352 | - |
| 0.7992 | 16800 | 0.424 | - |
| 0.8039 | 16900 | 0.4363 | - |
| 0.8087 | 17000 | 0.426 | 0.3969 |
| 0.8134 | 17100 | 0.5142 | - |
| 0.8182 | 17200 | 0.3944 | - |
| 0.8229 | 17300 | 0.4604 | - |
| 0.8277 | 17400 | 0.3765 | - |
| 0.8325 | 17500 | 0.4707 | 0.3765 |
| 0.8372 | 17600 | 0.3848 | - |
| 0.8420 | 17700 | 0.3869 | - |
| 0.8467 | 17800 | 0.4391 | - |
| 0.8515 | 17900 | 0.4037 | - |
| 0.8562 | 18000 | 0.3394 | 0.3758 |
| 0.8610 | 18100 | 0.3987 | - |
| 0.8658 | 18200 | 0.3238 | - |
| 0.8705 | 18300 | 0.4504 | - |
| 0.8753 | 18400 | 0.4041 | - |
| 0.8800 | 18500 | 0.3812 | 0.3778 |
| 0.8848 | 18600 | 0.3602 | - |
| 0.8895 | 18700 | 0.3782 | - |
| 0.8943 | 18800 | 0.3781 | - |
| 0.8991 | 18900 | 0.4069 | - |
| 0.9038 | 19000 | 0.3682 | 0.3691 |
| 0.9086 | 19100 | 0.4038 | - |
| 0.9133 | 19200 | 0.3652 | - |
| 0.9181 | 19300 | 0.3383 | - |
| 0.9228 | 19400 | 0.4312 | - |
| 0.9276 | 19500 | 0.3837 | 0.3660 |
| 0.9324 | 19600 | 0.3733 | - |
| 0.9371 | 19700 | 0.3542 | - |
| 0.9419 | 19800 | 0.406 | - |
| 0.9466 | 19900 | 0.3632 | - |
| 0.9514 | 20000 | 0.3984 | 0.3783 |
| 0.9561 | 20100 | 0.3984 | - |
| 0.9609 | 20200 | 0.38 | - |
| 0.9657 | 20300 | 0.388 | - |
| 0.9704 | 20400 | 0.3766 | - |
| 0.9752 | 20500 | 0.3298 | 0.3498 |
| 0.9799 | 20600 | 0.3308 | - |
| 0.9847 | 20700 | 0.3884 | - |
| 0.9894 | 20800 | 0.3674 | - |
| 0.9942 | 20900 | 0.4107 | - |
| 0.9990 | 21000 | 0.3739 | 0.3513 |
| 1.0037 | 21100 | 0.3398 | - |
| 1.0085 | 21200 | 0.3711 | - |
| 1.0132 | 21300 | 0.265 | - |
| 1.0180 | 21400 | 0.3464 | - |
| 1.0227 | 21500 | 0.3265 | 0.3463 |
| 1.0275 | 21600 | 0.274 | - |
| 1.0323 | 21700 | 0.3063 | - |
| 1.0370 | 21800 | 0.2679 | - |
| 1.0418 | 21900 | 0.3099 | - |
| 1.0465 | 22000 | 0.35 | 0.3533 |
| 1.0513 | 22100 | 0.3021 | - |
| 1.0560 | 22200 | 0.3505 | - |
| 1.0608 | 22300 | 0.2589 | - |
| 1.0656 | 22400 | 0.3791 | - |
| 1.0703 | 22500 | 0.3113 | 0.3460 |
| 1.0751 | 22600 | 0.3624 | - |
| 1.0798 | 22700 | 0.3676 | - |
| 1.0846 | 22800 | 0.3194 | - |
| 1.0893 | 22900 | 0.343 | - |
| 1.0941 | 23000 | 0.3446 | 0.3370 |
| 1.0988 | 23100 | 0.4403 | - |
| 1.1036 | 23200 | 0.2646 | - |
| 1.1084 | 23300 | 0.3115 | - |
| 1.1131 | 23400 | 0.3024 | - |
| 1.1179 | 23500 | 0.3613 | 0.3407 |
| 1.1226 | 23600 | 0.306 | - |
| 1.1274 | 23700 | 0.298 | - |
| 1.1321 | 23800 | 0.3751 | - |
| 1.1369 | 23900 | 0.288 | - |
| 1.1417 | 24000 | 0.2877 | 0.3472 |
| 1.1464 | 24100 | 0.2986 | - |
| 1.1512 | 24200 | 0.3018 | - |
| 1.1559 | 24300 | 0.3603 | - |
| 1.1607 | 24400 | 0.3413 | - |
| 1.1654 | 24500 | 0.3171 | 0.3266 |
| 1.1702 | 24600 | 0.3096 | - |
| 1.1750 | 24700 | 0.348 | - |
| 1.1797 | 24800 | 0.3971 | - |
| 1.1845 | 24900 | 0.3127 | - |
| 1.1892 | 25000 | 0.3162 | 0.3256 |
| 1.1940 | 25100 | 0.3035 | - |
| 1.1987 | 25200 | 0.2711 | - |
| 1.2035 | 25300 | 0.2615 | - |
| 1.2083 | 25400 | 0.301 | - |
| 1.2130 | 25500 | 0.3146 | 0.3297 |
| 1.2178 | 25600 | 0.3389 | - |
| 1.2225 | 25700 | 0.3027 | - |
| 1.2273 | 25800 | 0.329 | - |
| 1.2320 | 25900 | 0.3478 | - |
| 1.2368 | 26000 | 0.2924 | 0.3179 |
| 1.2416 | 26100 | 0.331 | - |
| 1.2463 | 26200 | 0.3109 | - |
| 1.2511 | 26300 | 0.3033 | - |
| 1.2558 | 26400 | 0.2905 | - |
| 1.2606 | 26500 | 0.2989 | 0.3228 |
| 1.2653 | 26600 | 0.3156 | - |
| 1.2701 | 26700 | 0.3124 | - |
| 1.2749 | 26800 | 0.3052 | - |
| 1.2796 | 26900 | 0.272 | - |
| 1.2844 | 27000 | 0.3114 | 0.3213 |
| 1.2891 | 27100 | 0.3205 | - |
| 1.2939 | 27200 | 0.279 | - |
| 1.2986 | 27300 | 0.2678 | - |
| 1.3034 | 27400 | 0.2663 | - |
| 1.3082 | 27500 | 0.2885 | 0.3159 |
| 1.3129 | 27600 | 0.2889 | - |
| 1.3177 | 27700 | 0.327 | - |
| 1.3224 | 27800 | 0.3169 | - |
| 1.3272 | 27900 | 0.3398 | - |
| 1.3319 | 28000 | 0.2835 | 0.3097 |
| 1.3367 | 28100 | 0.3434 | - |
| 1.3415 | 28200 | 0.2885 | - |
| 1.3462 | 28300 | 0.3164 | - |
| 1.3510 | 28400 | 0.3618 | - |
| 1.3557 | 28500 | 0.26 | 0.3106 |
| 1.3605 | 28600 | 0.2671 | - |
| 1.3652 | 28700 | 0.2745 | - |
| 1.3700 | 28800 | 0.2531 | - |
| 1.3748 | 28900 | 0.2954 | - |
| 1.3795 | 29000 | 0.2679 | 0.3106 |
| 1.3843 | 29100 | 0.3344 | - |
| 1.3890 | 29200 | 0.3315 | - |
| 1.3938 | 29300 | 0.2603 | - |
| 1.3985 | 29400 | 0.2822 | - |
| 1.4033 | 29500 | 0.3416 | 0.3012 |
| 1.4080 | 29600 | 0.3274 | - |
| 1.4128 | 29700 | 0.3179 | - |
| 1.4176 | 29800 | 0.2861 | - |
| 1.4223 | 29900 | 0.2574 | - |
| 1.4271 | 30000 | 0.2261 | 0.3072 |
| 1.4318 | 30100 | 0.318 | - |
| 1.4366 | 30200 | 0.2942 | - |
| 1.4413 | 30300 | 0.2831 | - |
| 1.4461 | 30400 | 0.2801 | - |
| 1.4509 | 30500 | 0.2433 | 0.2977 |
| 1.4556 | 30600 | 0.2805 | - |
| 1.4604 | 30700 | 0.2909 | - |
| 1.4651 | 30800 | 0.2996 | - |
| 1.4699 | 30900 | 0.2945 | - |
| 1.4746 | 31000 | 0.2686 | 0.2890 |
| 1.4794 | 31100 | 0.2498 | - |
| 1.4842 | 31200 | 0.3214 | - |
| 1.4889 | 31300 | 0.281 | - |
| 1.4937 | 31400 | 0.25 | - |
| 1.4984 | 31500 | 0.2648 | 0.2862 |
| 1.5032 | 31600 | 0.297 | - |
| 1.5079 | 31700 | 0.298 | - |
| 1.5127 | 31800 | 0.2675 | - |
| 1.5175 | 31900 | 0.268 | - |
| 1.5222 | 32000 | 0.2662 | 0.2827 |
| 1.5270 | 32100 | 0.2227 | - |
| 1.5317 | 32200 | 0.2764 | - |
| 1.5365 | 32300 | 0.2499 | - |
| 1.5412 | 32400 | 0.2789 | - |
| 1.5460 | 32500 | 0.2522 | 0.2806 |
| 1.5508 | 32600 | 0.3053 | - |
| 1.5555 | 32700 | 0.2367 | - |
| 1.5603 | 32800 | 0.3354 | - |
| 1.5650 | 32900 | 0.2504 | - |
| 1.5698 | 33000 | 0.2766 | 0.2782 |
| 1.5745 | 33100 | 0.2338 | - |
| 1.5793 | 33200 | 0.2539 | - |
| 1.5841 | 33300 | 0.3004 | - |
| 1.5888 | 33400 | 0.2705 | - |
| 1.5936 | 33500 | 0.2613 | 0.2759 |
| 1.5983 | 33600 | 0.2618 | - |
| 1.6031 | 33700 | 0.2459 | - |
| 1.6078 | 33800 | 0.2349 | - |
| 1.6126 | 33900 | 0.3481 | - |
| 1.6174 | 34000 | 0.2243 | 0.2793 |
| 1.6221 | 34100 | 0.2564 | - |
| 1.6269 | 34200 | 0.2643 | - |
| 1.6316 | 34300 | 0.356 | - |
| 1.6364 | 34400 | 0.2273 | - |
| 1.6411 | 34500 | 0.2577 | 0.2724 |
| 1.6459 | 34600 | 0.2958 | - |
| 1.6507 | 34700 | 0.2778 | - |
| 1.6554 | 34800 | 0.2608 | - |
| 1.6602 | 34900 | 0.2667 | - |
| 1.6649 | 35000 | 0.2377 | 0.2736 |
| 1.6697 | 35100 | 0.2787 | - |
| 1.6744 | 35200 | 0.2698 | - |
| 1.6792 | 35300 | 0.2505 | - |
| 1.6840 | 35400 | 0.2794 | - |
| 1.6887 | 35500 | 0.2382 | 0.2656 |
| 1.6935 | 35600 | 0.2373 | - |
| 1.6982 | 35700 | 0.2303 | - |
| 1.7030 | 35800 | 0.1983 | - |
| 1.7077 | 35900 | 0.29 | - |
| 1.7125 | 36000 | 0.2608 | 0.2707 |
| 1.7172 | 36100 | 0.2978 | - |
| 1.7220 | 36200 | 0.3158 | - |
| 1.7268 | 36300 | 0.2679 | - |
| 1.7315 | 36400 | 0.245 | - |
| 1.7363 | 36500 | 0.2423 | 0.2768 |
| 1.7410 | 36600 | 0.2301 | - |
| 1.7458 | 36700 | 0.2189 | - |
| 1.7505 | 36800 | 0.2335 | - |
| 1.7553 | 36900 | 0.2773 | - |
| 1.7601 | 37000 | 0.2448 | 0.2723 |
| 1.7648 | 37100 | 0.2404 | - |
| 1.7696 | 37200 | 0.2733 | - |
| 1.7743 | 37300 | 0.2075 | - |
| 1.7791 | 37400 | 0.2489 | - |
| 1.7838 | 37500 | 0.2678 | 0.2564 |
| 1.7886 | 37600 | 0.2473 | - |
| 1.7934 | 37700 | 0.2401 | - |
| 1.7981 | 37800 | 0.2334 | - |
| 1.8029 | 37900 | 0.2712 | - |
| 1.8076 | 38000 | 0.2631 | 0.2622 |
| 1.8124 | 38100 | 0.2375 | - |
| 1.8171 | 38200 | 0.2644 | - |
| 1.8219 | 38300 | 0.2028 | - |
| 1.8267 | 38400 | 0.2653 | - |
| 1.8314 | 38500 | 0.2161 | 0.2590 |
| 1.8362 | 38600 | 0.2494 | - |
| 1.8409 | 38700 | 0.2457 | - |
| 1.8457 | 38800 | 0.2316 | - |
| 1.8504 | 38900 | 0.1991 | - |
| 1.8552 | 39000 | 0.2342 | 0.2565 |
| 1.8600 | 39100 | 0.2326 | - |
| 1.8647 | 39200 | 0.25 | - |
| 1.8695 | 39300 | 0.237 | - |
| 1.8742 | 39400 | 0.2329 | - |
| 1.8790 | 39500 | 0.2613 | 0.2566 |
| 1.8837 | 39600 | 0.2363 | - |
| 1.8885 | 39700 | 0.2362 | - |
| 1.8933 | 39800 | 0.2354 | - |
| 1.8980 | 39900 | 0.2374 | - |
| 1.9028 | 40000 | 0.2586 | 0.2545 |
| 1.9075 | 40100 | 0.2231 | - |
| 1.9123 | 40200 | 0.2653 | - |
| 1.9170 | 40300 | 0.2537 | - |
| 1.9218 | 40400 | 0.206 | - |
| 1.9266 | 40500 | 0.2342 | 0.2506 |
| 1.9313 | 40600 | 0.2343 | - |
| 1.9361 | 40700 | 0.2034 | - |
| 1.9408 | 40800 | 0.2383 | - |
| 1.9456 | 40900 | 0.2805 | - |
| 1.9503 | 41000 | 0.2499 | 0.2474 |
| 1.9551 | 41100 | 0.3116 | - |
| 1.9599 | 41200 | 0.2522 | - |
| 1.9646 | 41300 | 0.2264 | - |
| 1.9694 | 41400 | 0.2398 | - |
| 1.9741 | 41500 | 0.2239 | 0.2487 |
| 1.9789 | 41600 | 0.2299 | - |
| 1.9836 | 41700 | 0.2262 | - |
| 1.9884 | 41800 | 0.2522 | - |
| 1.9932 | 41900 | 0.2332 | - |
| 1.9979 | 42000 | 0.2221 | 0.2487 |
| 2.0027 | 42100 | 0.245 | - |
| 2.0074 | 42200 | 0.1898 | - |
| 2.0122 | 42300 | 0.2015 | - |
| 2.0169 | 42400 | 0.2135 | - |
| 2.0217 | 42500 | 0.2153 | 0.2395 |
| 2.0264 | 42600 | 0.1568 | - |
| 2.0312 | 42700 | 0.2178 | - |
| 2.0360 | 42800 | 0.1757 | - |
| 2.0407 | 42900 | 0.239 | - |
| 2.0455 | 43000 | 0.1538 | 0.2430 |
| 2.0502 | 43100 | 0.1727 | - |
| 2.0550 | 43200 | 0.153 | - |
| 2.0597 | 43300 | 0.1773 | - |
| 2.0645 | 43400 | 0.1752 | - |
| 2.0693 | 43500 | 0.1586 | 0.2416 |
| 2.0740 | 43600 | 0.2497 | - |
| 2.0788 | 43700 | 0.217 | - |
| 2.0835 | 43800 | 0.2227 | - |
| 2.0883 | 43900 | 0.1811 | - |
| 2.0930 | 44000 | 0.2125 | 0.2422 |
| 2.0978 | 44100 | 0.2005 | - |
| 2.1026 | 44200 | 0.1776 | - |
| 2.1073 | 44300 | 0.186 | - |
| 2.1121 | 44400 | 0.2546 | - |
| 2.1168 | 44500 | 0.1598 | 0.2377 |
| 2.1216 | 44600 | 0.2231 | - |
| 2.1263 | 44700 | 0.1524 | - |
| 2.1311 | 44800 | 0.1786 | - |
| 2.1359 | 44900 | 0.1788 | - |
| 2.1406 | 45000 | 0.2073 | 0.2372 |
| 2.1454 | 45100 | 0.1347 | - |
| 2.1501 | 45200 | 0.1523 | - |
| 2.1549 | 45300 | 0.2168 | - |
| 2.1596 | 45400 | 0.1498 | - |
| 2.1644 | 45500 | 0.2213 | 0.2299 |
| 2.1692 | 45600 | 0.1809 | - |
| 2.1739 | 45700 | 0.1969 | - |
| 2.1787 | 45800 | 0.2001 | - |
| 2.1834 | 45900 | 0.2014 | - |
| 2.1882 | 46000 | 0.1711 | 0.2328 |
| 2.1929 | 46100 | 0.2257 | - |
| 2.1977 | 46200 | 0.1634 | - |
| 2.2025 | 46300 | 0.1698 | - |
| 2.2072 | 46400 | 0.1837 | - |
| 2.2120 | 46500 | 0.1665 | 0.2330 |
| 2.2167 | 46600 | 0.197 | - |
| 2.2215 | 46700 | 0.1567 | - |
| 2.2262 | 46800 | 0.1762 | - |
| 2.2310 | 46900 | 0.1646 | - |
| 2.2358 | 47000 | 0.2108 | 0.2322 |
| 2.2405 | 47100 | 0.2234 | - |
| 2.2453 | 47200 | 0.2163 | - |
| 2.2500 | 47300 | 0.188 | - |
| 2.2548 | 47400 | 0.1846 | - |
| 2.2595 | 47500 | 0.1794 | 0.2273 |
| 2.2643 | 47600 | 0.2637 | - |
| 2.2691 | 47700 | 0.1596 | - |
| 2.2738 | 47800 | 0.1676 | - |
| 2.2786 | 47900 | 0.2099 | - |
| 2.2833 | 48000 | 0.2002 | 0.2298 |
| 2.2881 | 48100 | 0.153 | - |
| 2.2928 | 48200 | 0.2079 | - |
| 2.2976 | 48300 | 0.2117 | - |
| 2.3023 | 48400 | 0.2472 | - |
| 2.3071 | 48500 | 0.1786 | 0.2223 |
| 2.3119 | 48600 | 0.1416 | - |
| 2.3166 | 48700 | 0.1869 | - |
| 2.3214 | 48800 | 0.185 | - |
| 2.3261 | 48900 | 0.1763 | - |
| 2.3309 | 49000 | 0.1533 | 0.2245 |
| 2.3356 | 49100 | 0.1856 | - |
| 2.3404 | 49200 | 0.2195 | - |
| 2.3452 | 49300 | 0.1748 | - |
| 2.3499 | 49400 | 0.1773 | - |
| 2.3547 | 49500 | 0.1546 | 0.2228 |
| 2.3594 | 49600 | 0.1543 | - |
| 2.3642 | 49700 | 0.208 | - |
| 2.3689 | 49800 | 0.1735 | - |
| 2.3737 | 49900 | 0.1463 | - |
| 2.3785 | 50000 | 0.2065 | 0.2225 |
| 2.3832 | 50100 | 0.1651 | - |
| 2.3880 | 50200 | 0.2091 | - |
| 2.3927 | 50300 | 0.1427 | - |
| 2.3975 | 50400 | 0.2033 | - |
| 2.4022 | 50500 | 0.1541 | 0.2206 |
| 2.4070 | 50600 | 0.1508 | - |
| 2.4118 | 50700 | 0.1693 | - |
| 2.4165 | 50800 | 0.2133 | - |
| 2.4213 | 50900 | 0.1709 | - |
| 2.4260 | 51000 | 0.1339 | 0.2209 |
| 2.4308 | 51100 | 0.1961 | - |
| 2.4355 | 51200 | 0.1569 | - |
| 2.4403 | 51300 | 0.1595 | - |
| 2.4451 | 51400 | 0.2285 | - |
| 2.4498 | 51500 | 0.1765 | 0.2175 |
| 2.4546 | 51600 | 0.1913 | - |
| 2.4593 | 51700 | 0.2017 | - |
| 2.4641 | 51800 | 0.158 | - |
| 2.4688 | 51900 | 0.2082 | - |
| 2.4736 | 52000 | 0.244 | 0.2116 |
| 2.4784 | 52100 | 0.1674 | - |
| 2.4831 | 52200 | 0.192 | - |
| 2.4879 | 52300 | 0.1793 | - |
| 2.4926 | 52400 | 0.1776 | - |
| 2.4974 | 52500 | 0.1644 | 0.2125 |
| 2.5021 | 52600 | 0.1668 | - |
| 2.5069 | 52700 | 0.2223 | - |
| 2.5117 | 52800 | 0.1969 | - |
| 2.5164 | 52900 | 0.2236 | - |
| 2.5212 | 53000 | 0.1869 | 0.2099 |
| 2.5259 | 53100 | 0.1664 | - |
| 2.5307 | 53200 | 0.1799 | - |
| 2.5354 | 53300 | 0.177 | - |
| 2.5402 | 53400 | 0.1515 | - |
| 2.5450 | 53500 | 0.1993 | 0.2111 |
| 2.5497 | 53600 | 0.163 | - |
| 2.5545 | 53700 | 0.1992 | - |
| 2.5592 | 53800 | 0.1932 | - |
| 2.5640 | 53900 | 0.1957 | - |
| 2.5687 | 54000 | 0.1464 | 0.2107 |
| 2.5735 | 54100 | 0.1961 | - |
| 2.5783 | 54200 | 0.2057 | - |
| 2.5830 | 54300 | 0.1703 | - |
| 2.5878 | 54400 | 0.1883 | - |
| 2.5925 | 54500 | 0.2052 | 0.2103 |
| 2.5973 | 54600 | 0.1601 | - |
| 2.6020 | 54700 | 0.1901 | - |
| 2.6068 | 54800 | 0.162 | - |
| 2.6115 | 54900 | 0.1765 | - |
| 2.6163 | 55000 | 0.1397 | 0.2103 |
| 2.6211 | 55100 | 0.1881 | - |
| 2.6258 | 55200 | 0.1562 | - |
| 2.6306 | 55300 | 0.1752 | - |
| 2.6353 | 55400 | 0.2074 | - |
| 2.6401 | 55500 | 0.1504 | 0.2098 |
| 2.6448 | 55600 | 0.1816 | - |
| 2.6496 | 55700 | 0.1811 | - |
| 2.6544 | 55800 | 0.1881 | - |
| 2.6591 | 55900 | 0.2019 | - |
| 2.6639 | 56000 | 0.2076 | 0.2097 |
| 2.6686 | 56100 | 0.2108 | - |
| 2.6734 | 56200 | 0.2011 | - |
| 2.6781 | 56300 | 0.1642 | - |
| 2.6829 | 56400 | 0.2325 | - |
| 2.6877 | 56500 | 0.1844 | 0.2069 |
| 2.6924 | 56600 | 0.1617 | - |
| 2.6972 | 56700 | 0.1693 | - |
| 2.7019 | 56800 | 0.1617 | - |
| 2.7067 | 56900 | 0.197 | - |
| 2.7114 | 57000 | 0.2182 | 0.2066 |
| 2.7162 | 57100 | 0.1724 | - |
| 2.7210 | 57200 | 0.1773 | - |
| 2.7257 | 57300 | 0.1532 | - |
| 2.7305 | 57400 | 0.2125 | - |
| 2.7352 | 57500 | 0.1384 | 0.2056 |
| 2.7400 | 57600 | 0.1366 | - |
| 2.7447 | 57700 | 0.1943 | - |
| 2.7495 | 57800 | 0.1869 | - |
| 2.7543 | 57900 | 0.1785 | - |
| 2.7590 | 58000 | 0.1752 | 0.2059 |
| 2.7638 | 58100 | 0.1643 | - |
| 2.7685 | 58200 | 0.2154 | - |
| 2.7733 | 58300 | 0.2041 | - |
| 2.7780 | 58400 | 0.1911 | - |
| 2.7828 | 58500 | 0.1547 | 0.2060 |
| 2.7876 | 58600 | 0.1314 | - |
| 2.7923 | 58700 | 0.1906 | - |
| 2.7971 | 58800 | 0.226 | - |
| 2.8018 | 58900 | 0.1612 | - |
| 2.8066 | 59000 | 0.1823 | 0.2045 |
| 2.8113 | 59100 | 0.1688 | - |
| 2.8161 | 59200 | 0.1754 | - |
| 2.8209 | 59300 | 0.1451 | - |
| 2.8256 | 59400 | 0.1564 | - |
| 2.8304 | 59500 | 0.2103 | 0.2042 |
| 2.8351 | 59600 | 0.1653 | - |
| 2.8399 | 59700 | 0.1812 | - |
| 2.8446 | 59800 | 0.1992 | - |
| 2.8494 | 59900 | 0.1727 | - |
| 2.8542 | 60000 | 0.1489 | 0.2036 |
| 2.8589 | 60100 | 0.2228 | - |
| 2.8637 | 60200 | 0.1926 | - |
| 2.8684 | 60300 | 0.2053 | - |
| 2.8732 | 60400 | 0.1613 | - |
| 2.8779 | 60500 | 0.1553 | 0.2027 |
| 2.8827 | 60600 | 0.1684 | - |
| 2.8875 | 60700 | 0.1974 | - |
| 2.8922 | 60800 | 0.1759 | - |
| 2.8970 | 60900 | 0.1824 | - |
| 2.9017 | 61000 | 0.1449 | 0.2020 |
| 2.9065 | 61100 | 0.1558 | - |
| 2.9112 | 61200 | 0.1811 | - |
| 2.9160 | 61300 | 0.2124 | - |
| 2.9207 | 61400 | 0.1776 | - |
| 2.9255 | 61500 | 0.1921 | 0.2009 |
| 2.9303 | 61600 | 0.2143 | - |
| 2.9350 | 61700 | 0.2309 | - |
| 2.9398 | 61800 | 0.1468 | - |
| 2.9445 | 61900 | 0.134 | - |
| 2.9493 | 62000 | 0.1477 | 0.2009 |
| 2.9540 | 62100 | 0.1731 | - |
| 2.9588 | 62200 | 0.1427 | - |
| 2.9636 | 62300 | 0.1554 | - |
| 2.9683 | 62400 | 0.1566 | - |
| 2.9731 | 62500 | 0.1616 | 0.2011 |
| 2.9778 | 62600 | 0.1648 | - |
| 2.9826 | 62700 | 0.2204 | - |
| 2.9873 | 62800 | 0.149 | - |
| 2.9921 | 62900 | 0.2051 | - |
| 2.9969 | 63000 | 0.151 | 0.2008 |
- The bold row denotes the saved checkpoint.
Framework Versions
- Python: 3.12.12
- Sentence Transformers: 5.2.3
- Transformers: 4.57.6
- PyTorch: 2.10.0+cu128
- Accelerate: 1.12.0
- Datasets: 4.0.0
- Tokenizers: 0.22.2
Citation
BibTeX
Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
MatryoshkaLoss
```bibtex
@misc{kusupati2024matryoshka,
    title = {Matryoshka Representation Learning},
    author = {Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year = {2024},
    eprint = {2205.13147},
    archivePrefix = {arXiv},
    primaryClass = {cs.LG}
}
```
MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title = {Efficient Natural Language Response Suggestion for Smart Reply},
    author = {Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year = {2017},
    eprint = {1705.00652},
    archivePrefix = {arXiv},
    primaryClass = {cs.CL}
}
```