# llama-150M-20260205-original

This is a 150M-parameter LLaMA-style model trained from the `llama_small_config.json` configuration (i.e., from scratch rather than fine-tuned from an existing checkpoint) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 4.8044
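
A minimal sketch of how the checkpoint could be loaded and sampled with Transformers; the Hub repository id below is assumed from the model name above and may need a user or organization prefix:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id (taken from the model name above); adjust as needed.
model_id = "llama-150M-20260205-original"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```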
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 32
- total_train_batch_size: 512
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 1000
- num_epochs: 1
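
These map directly onto `TrainingArguments`; a minimal sketch, assuming a standard `Trainer` setup on a single device with the model and datasets constructed separately (the output directory is an assumption):

```python
from transformers import TrainingArguments

# Sketch of TrainingArguments matching the hyperparameters listed above.
args = TrainingArguments(
    output_dir="llama-150M-20260205-original",  # assumed
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    gradient_accumulation_steps=32,  # 16 * 32 = effective batch size of 512
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=1000,
    num_train_epochs=1,
    eval_strategy="steps",  # the results table below evaluates every 500 steps
    eval_steps=500,
)
```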
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 7.3861 | 0.0174 | 500 | 7.2470 |
| 6.6276 | 0.0347 | 1000 | 6.5153 |
| 6.1992 | 0.0521 | 1500 | 6.1106 |
| 5.9348 | 0.0695 | 2000 | 5.8573 |
| 5.7589 | 0.0868 | 2500 | 5.6832 |
| 5.6333 | 0.1042 | 3000 | 5.5625 |
| 5.539 | 0.1216 | 3500 | 5.4692 |
| 5.4692 | 0.1390 | 4000 | 5.3959 |
| 5.4019 | 0.1563 | 4500 | 5.3360 |
| 5.3538 | 0.1737 | 5000 | 5.2857 |
| 5.2978 | 0.1911 | 5500 | 5.2404 |
| 5.2713 | 0.2084 | 6000 | 5.2031 |
| 5.2348 | 0.2258 | 6500 | 5.1679 |
| 5.2008 | 0.2432 | 7000 | 5.1379 |
| 5.1738 | 0.2605 | 7500 | 5.1117 |
| 5.1508 | 0.2779 | 8000 | 5.0885 |
| 5.126 | 0.2953 | 8500 | 5.0637 |
| 5.1046 | 0.3126 | 9000 | 5.0443 |
| 5.0889 | 0.3300 | 9500 | 5.0262 |
| 5.0709 | 0.3474 | 10000 | 5.0100 |
| 5.0485 | 0.3647 | 10500 | 4.9933 |
| 5.036 | 0.3821 | 11000 | 4.9774 |
| 5.0213 | 0.3995 | 11500 | 4.9657 |
| 5.013 | 0.4169 | 12000 | 4.9505 |
| 4.9999 | 0.4342 | 12500 | 4.9409 |
| 4.9832 | 0.4516 | 13000 | 4.9282 |
| 4.972 | 0.4690 | 13500 | 4.9185 |
| 4.9657 | 0.4863 | 14000 | 4.9091 |
| 4.9626 | 0.5037 | 14500 | 4.9017 |
| 4.95 | 0.5211 | 15000 | 4.8932 |
| 4.9378 | 0.5384 | 15500 | 4.8848 |
| 4.9291 | 0.5558 | 16000 | 4.8785 |
| 4.9203 | 0.5732 | 16500 | 4.8714 |
| 4.916 | 0.5905 | 17000 | 4.8648 |
| 4.9126 | 0.6079 | 17500 | 4.8587 |
| 4.9044 | 0.6253 | 18000 | 4.8534 |
| 4.8978 | 0.6426 | 18500 | 4.8463 |
| 4.892 | 0.6600 | 19000 | 4.8434 |
| 4.889 | 0.6774 | 19500 | 4.8379 |
| 4.8845 | 0.6948 | 20000 | 4.8328 |
| 4.8827 | 0.7121 | 20500 | 4.8296 |
| 4.8752 | 0.7295 | 21000 | 4.8264 |
| 4.8756 | 0.7469 | 21500 | 4.8233 |
| 4.8726 | 0.7642 | 22000 | 4.8199 |
| 4.8639 | 0.7816 | 22500 | 4.8178 |
| 4.8673 | 0.7990 | 23000 | 4.8149 |
| 4.8619 | 0.8163 | 23500 | 4.8130 |
| 4.8619 | 0.8337 | 24000 | 4.8113 |
| 4.856 | 0.8511 | 24500 | 4.8094 |
| 4.853 | 0.8684 | 25000 | 4.8084 |
| 4.8569 | 0.8858 | 25500 | 4.8070 |
| 4.8537 | 0.9032 | 26000 | 4.8062 |
| 4.8551 | 0.9206 | 26500 | 4.8055 |
| 4.8551 | 0.9379 | 27000 | 4.8052 |
| 4.8507 | 0.9553 | 27500 | 4.8047 |
| 4.8568 | 0.9727 | 28000 | 4.8045 |
| 4.854 | 0.9900 | 28500 | 4.8044 |
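
Assuming the reported losses are mean per-token cross-entropy in nats, the final validation loss of 4.8044 corresponds to a perplexity of exp(4.8044) ≈ 122.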
### Framework versions
- Transformers 4.53.3
- Pytorch 2.6.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1