llama-150M-20260205-original

This model was initialized from the llama_small_config.json configuration and trained on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 4.8044
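Assuming the reported loss is mean token-level cross-entropy in nats (the usual convention for causal LMs trained with the 🤗 Trainer), this corresponds to a perplexity of exp(4.8044) ≈ 122:

```python
# Hedged: converts the reported eval loss to perplexity, assuming it is
# mean cross-entropy per token in nats (the usual causal-LM convention).
import math

eval_loss = 4.8044
print(f"perplexity ≈ {math.exp(eval_loss):.1f}")  # ≈ 122.0
```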

Model description

More information needed

Intended uses & limitations

More information needed
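The card does not document intended usage, but a checkpoint in this format can typically be loaded with the standard 🤗 Transformers causal-LM API. A minimal sketch, assuming the repository ships a tokenizer; the namespace below is a placeholder, not the actual repo id:

```python
# Hedged sketch: load the checkpoint as a causal LM and generate a short
# continuation. "your-namespace/..." is a placeholder repo id.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-namespace/llama-150M-20260205-original"  # placeholder namespace
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Hello, world", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```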

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an illustrative TrainingArguments sketch follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 512
  • optimizer: adamw_torch (AdamW) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 1000
  • num_epochs: 1
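These settings map directly onto 🤗 Transformers TrainingArguments. A minimal sketch under that assumption; the evaluation cadence is inferred from the results table below, and everything not listed above (dataset, model initialization) is omitted:

```python
# Hedged sketch: the card's hyperparameters expressed as TrainingArguments.
# output_dir reuses the model name; eval cadence inferred from the results table.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="llama-150M-20260205-original",
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    gradient_accumulation_steps=32,  # 16 * 32 = 512 total train batch size
    optim="adamw_torch",             # AdamW, betas=(0.9, 0.999), eps=1e-8
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=1000,
    num_train_epochs=1,
    eval_strategy="steps",           # assumption: eval every 500 steps (see table)
    eval_steps=500,
)
```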

Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 7.3861        | 0.0174 | 500   | 7.2470          |
| 6.6276        | 0.0347 | 1000  | 6.5153          |
| 6.1992        | 0.0521 | 1500  | 6.1106          |
| 5.9348        | 0.0695 | 2000  | 5.8573          |
| 5.7589        | 0.0868 | 2500  | 5.6832          |
| 5.6333        | 0.1042 | 3000  | 5.5625          |
| 5.539         | 0.1216 | 3500  | 5.4692          |
| 5.4692        | 0.1390 | 4000  | 5.3959          |
| 5.4019        | 0.1563 | 4500  | 5.3360          |
| 5.3538        | 0.1737 | 5000  | 5.2857          |
| 5.2978        | 0.1911 | 5500  | 5.2404          |
| 5.2713        | 0.2084 | 6000  | 5.2031          |
| 5.2348        | 0.2258 | 6500  | 5.1679          |
| 5.2008        | 0.2432 | 7000  | 5.1379          |
| 5.1738        | 0.2605 | 7500  | 5.1117          |
| 5.1508        | 0.2779 | 8000  | 5.0885          |
| 5.126         | 0.2953 | 8500  | 5.0637          |
| 5.1046        | 0.3126 | 9000  | 5.0443          |
| 5.0889        | 0.3300 | 9500  | 5.0262          |
| 5.0709        | 0.3474 | 10000 | 5.0100          |
| 5.0485        | 0.3647 | 10500 | 4.9933          |
| 5.036         | 0.3821 | 11000 | 4.9774          |
| 5.0213        | 0.3995 | 11500 | 4.9657          |
| 5.013         | 0.4169 | 12000 | 4.9505          |
| 4.9999        | 0.4342 | 12500 | 4.9409          |
| 4.9832        | 0.4516 | 13000 | 4.9282          |
| 4.972         | 0.4690 | 13500 | 4.9185          |
| 4.9657        | 0.4863 | 14000 | 4.9091          |
| 4.9626        | 0.5037 | 14500 | 4.9017          |
| 4.95          | 0.5211 | 15000 | 4.8932          |
| 4.9378        | 0.5384 | 15500 | 4.8848          |
| 4.9291        | 0.5558 | 16000 | 4.8785          |
| 4.9203        | 0.5732 | 16500 | 4.8714          |
| 4.916         | 0.5905 | 17000 | 4.8648          |
| 4.9126        | 0.6079 | 17500 | 4.8587          |
| 4.9044        | 0.6253 | 18000 | 4.8534          |
| 4.8978        | 0.6426 | 18500 | 4.8463          |
| 4.892         | 0.6600 | 19000 | 4.8434          |
| 4.889         | 0.6774 | 19500 | 4.8379          |
| 4.8845        | 0.6948 | 20000 | 4.8328          |
| 4.8827        | 0.7121 | 20500 | 4.8296          |
| 4.8752        | 0.7295 | 21000 | 4.8264          |
| 4.8756        | 0.7469 | 21500 | 4.8233          |
| 4.8726        | 0.7642 | 22000 | 4.8199          |
| 4.8639        | 0.7816 | 22500 | 4.8178          |
| 4.8673        | 0.7990 | 23000 | 4.8149          |
| 4.8619        | 0.8163 | 23500 | 4.8130          |
| 4.8619        | 0.8337 | 24000 | 4.8113          |
| 4.856         | 0.8511 | 24500 | 4.8094          |
| 4.853         | 0.8684 | 25000 | 4.8084          |
| 4.8569        | 0.8858 | 25500 | 4.8070          |
| 4.8537        | 0.9032 | 26000 | 4.8062          |
| 4.8551        | 0.9206 | 26500 | 4.8055          |
| 4.8551        | 0.9379 | 27000 | 4.8052          |
| 4.8507        | 0.9553 | 27500 | 4.8047          |
| 4.8568        | 0.9727 | 28000 | 4.8045          |
| 4.854         | 0.9900 | 28500 | 4.8044          |
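
Evaluation ran every 500 steps, and the validation loss plateaus near 4.80 over the final few thousand steps. As a hedged back-of-envelope check, the logged step/epoch ratio also pins down the training-set size (assuming the epoch column is exact to four decimals):

```python
# Hedged estimate: 28,500 steps cover 0.9900 of an epoch, so one epoch is
# ~28.8k optimizer steps; with an effective batch of 512 sequences, the
# dataset holds roughly 14.7M sequences.
steps_per_epoch = 28500 / 0.9900   # ≈ 28,788 steps
effective_batch = 512              # from the hyperparameters above
print(f"≈ {steps_per_epoch * effective_batch:,.0f} training sequences")
```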

Framework versions

  • Transformers 4.53.3
  • Pytorch 2.6.0+cu126
  • Datasets 3.5.0
  • Tokenizers 0.21.1
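
A quick way to compare a local environment against these pins; a minimal sketch, assuming all four packages are installed:

```python
# Hedged environment check: prints installed versions next to the versions
# listed above for comparison.
import transformers, torch, datasets, tokenizers

for mod, expected in [(transformers, "4.53.3"), (torch, "2.6.0+cu126"),
                      (datasets, "3.5.0"), (tokenizers, "0.21.1")]:
    print(f"{mod.__name__}: {mod.__version__} (card: {expected})")
```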