# opt-babylm1-randomremoval_seed-211_5e-6
This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 2.9412
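Since the reported loss is the mean cross-entropy per token (in nats), the corresponding perplexity is simply its exponential; a quick sanity check:

```python
import math

# Eval loss from this card, assumed to be mean cross-entropy in nats per token.
eval_loss = 2.9412

# Perplexity of a causal language model is exp(loss).
perplexity = math.exp(eval_loss)
print(f"perplexity = {perplexity:.2f}")  # ~ 18.94
```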
## Model description
More information needed
## Intended uses & limitations
More information needed
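Pending fuller documentation, a minimal generation sketch using the `transformers` API is shown below. The repo id is assumed to match this card's title and may need the owning namespace prepended.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub id -- prepend the owning namespace if needed.
model_id = "opt-babylm1-randomremoval_seed-211_5e-6"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

This downloads the checkpoint from the Hub on first use; it is a usage sketch, not part of the original card.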
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 32
- eval_batch_size: 64
- seed: 211
- gradient_accumulation_steps: 8
- total_train_batch_size: 256
- optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 20.0
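Two of the values above are derived rather than set independently: the total train batch size is the per-device batch times the gradient-accumulation steps, and the linear scheduler warms up over `warmup_ratio` of total steps before decaying to zero. A small sketch (plain Python, mirroring but not importing transformers' linear schedule; the ~47,000 total steps are taken from the results table below):

```python
# Effective batch: the optimizer steps once per 8 micro-batches of 32.
train_batch_size = 32
gradient_accumulation_steps = 8
total_train_batch_size = train_batch_size * gradient_accumulation_steps
assert total_train_batch_size == 256  # matches the value reported above

def linear_lr(step, total_steps, base_lr=5e-6, warmup_ratio=0.05):
    """Linear warmup to base_lr, then linear decay to 0 (the 'linear' scheduler)."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

# LR at start, at the end of warmup (peak), and at the final step.
print(linear_lr(0, 47000), linear_lr(2350, 47000), linear_lr(47000, 47000))
```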
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 5.2735 | 0.4236 | 1000 | 5.2398 |
| 4.6857 | 0.8472 | 2000 | 4.6730 |
| 4.2047 | 1.2707 | 3000 | 4.1978 |
| 3.823 | 1.6943 | 4000 | 3.8052 |
| 3.5292 | 2.1178 | 5000 | 3.5258 |
| 3.4022 | 2.5414 | 6000 | 3.3899 |
| 3.2996 | 2.9649 | 7000 | 3.2950 |
| 3.2152 | 3.3884 | 8000 | 3.2340 |
| 3.1702 | 3.8120 | 9000 | 3.1864 |
| 3.108 | 4.2355 | 10000 | 3.1606 |
| 3.0887 | 4.6591 | 11000 | 3.1228 |
| 3.0226 | 5.0826 | 12000 | 3.1032 |
| 3.0248 | 5.5062 | 13000 | 3.0780 |
| 3.0111 | 5.9298 | 14000 | 3.0611 |
| 2.9725 | 6.3533 | 15000 | 3.0511 |
| 2.9622 | 6.7769 | 16000 | 3.0339 |
| 2.9145 | 7.2004 | 17000 | 3.0308 |
| 2.9295 | 7.6240 | 18000 | 3.0180 |
| 2.8761 | 8.0474 | 19000 | 3.0116 |
| 2.8902 | 8.4710 | 20000 | 3.0033 |
| 2.888 | 8.8946 | 21000 | 2.9931 |
| 2.8518 | 9.3181 | 22000 | 2.9897 |
| 2.8576 | 9.7417 | 23000 | 2.9821 |
| 2.8181 | 10.1652 | 24000 | 2.9829 |
| 2.8314 | 10.5888 | 25000 | 2.9756 |
| 2.8233 | 11.0123 | 26000 | 2.9747 |
| 2.8083 | 11.4359 | 27000 | 2.9684 |
| 2.8122 | 11.8595 | 28000 | 2.9641 |
| 2.7739 | 12.2830 | 29000 | 2.9651 |
| 2.7861 | 12.7066 | 30000 | 2.9566 |
| 2.7509 | 13.1300 | 31000 | 2.9615 |
| 2.7678 | 13.5536 | 32000 | 2.9534 |
| 2.7649 | 13.9772 | 33000 | 2.9496 |
| 2.7413 | 14.4007 | 34000 | 2.9514 |
| 2.7492 | 14.8243 | 35000 | 2.9475 |
| 2.7143 | 15.2478 | 36000 | 2.9495 |
| 2.7271 | 15.6714 | 37000 | 2.9467 |
| 2.6955 | 16.0949 | 38000 | 2.9482 |
| 2.7101 | 16.5185 | 39000 | 2.9445 |
| 2.7154 | 16.9421 | 40000 | 2.9417 |
| 2.689 | 17.3656 | 41000 | 2.9452 |
| 2.6937 | 17.7892 | 42000 | 2.9416 |
| 2.6809 | 18.2126 | 43000 | 2.9444 |
| 2.6875 | 18.6362 | 44000 | 2.9414 |
| 2.674 | 19.0597 | 45000 | 2.9421 |
| 2.6718 | 19.4833 | 46000 | 2.9416 |
| 2.6715 | 19.9069 | 47000 | 2.9412 |
### Framework versions

- Transformers 4.54.0
- PyTorch 2.10.0+cu128
- Datasets 3.2.0
- Tokenizers 0.21.4