llama-150M-20260205-original

This model was initialized from the llama_small_config.json configuration and trained on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 4.8044
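Assuming the reported loss is mean token-level cross-entropy in nats (the usual convention for causal LMs trained with the 🤗 Trainer), this corresponds to a perplexity of exp(4.8044) ≈ 122:

```python
# Hedged: converts the reported eval loss to perplexity, assuming it is
# mean cross-entropy per token in nats (the usual causal-LM convention).
import math

eval_loss = 4.8044
print(f"perplexity ≈ {math.exp(eval_loss):.1f}")  # ≈ 122.0
```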

Model description

More information needed

Intended uses & limitations

More information needed
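The card does not document intended usage, but a checkpoint in this format can typically be loaded with the standard 🤗 Transformers causal-LM API. A minimal sketch, assuming the repository ships a tokenizer; the namespace below is a placeholder, not the actual repo id:

```python
# Hedged sketch: load the checkpoint as a causal LM and generate a short
# continuation. "your-namespace/..." is a placeholder repo id.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-namespace/llama-150M-20260205-original"  # placeholder namespace
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Hello, world", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```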

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an illustrative TrainingArguments sketch follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 512
  • optimizer: adamw_torch (AdamW) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 1000
  • num_epochs: 1
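These settings map directly onto 🤗 Transformers TrainingArguments. A minimal sketch under that assumption; the evaluation cadence is inferred from the results table below, and everything not listed above (dataset, model initialization) is omitted:

```python
# Hedged sketch: the card's hyperparameters expressed as TrainingArguments.
# output_dir reuses the model name; eval cadence inferred from the results table.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="llama-150M-20260205-original",
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    gradient_accumulation_steps=32,  # 16 * 32 = 512 total train batch size
    optim="adamw_torch",             # AdamW, betas=(0.9, 0.999), eps=1e-8
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=1000,
    num_train_epochs=1,
    eval_strategy="steps",           # assumption: eval every 500 steps (see table)
    eval_steps=500,
)
```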

Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 7.3861        | 0.0174 | 500   | 7.2470          |
| 6.6276        | 0.0347 | 1000  | 6.5153          |
| 6.1992        | 0.0521 | 1500  | 6.1106          |
| 5.9348        | 0.0695 | 2000  | 5.8573          |
| 5.7589        | 0.0868 | 2500  | 5.6832          |
| 5.6333        | 0.1042 | 3000  | 5.5625          |
| 5.539         | 0.1216 | 3500  | 5.4692          |
| 5.4692        | 0.1390 | 4000  | 5.3959          |
| 5.4019        | 0.1563 | 4500  | 5.3360          |
| 5.3538        | 0.1737 | 5000  | 5.2857          |
| 5.2978        | 0.1911 | 5500  | 5.2404          |
| 5.2713        | 0.2084 | 6000  | 5.2031          |
| 5.2348        | 0.2258 | 6500  | 5.1679          |
| 5.2008        | 0.2432 | 7000  | 5.1379          |
| 5.1738        | 0.2605 | 7500  | 5.1117          |
| 5.1508        | 0.2779 | 8000  | 5.0885          |
| 5.126         | 0.2953 | 8500  | 5.0637          |
| 5.1046        | 0.3126 | 9000  | 5.0443          |
| 5.0889        | 0.3300 | 9500  | 5.0262          |
| 5.0709        | 0.3474 | 10000 | 5.0100          |
| 5.0485        | 0.3647 | 10500 | 4.9933          |
| 5.036         | 0.3821 | 11000 | 4.9774          |
| 5.0213        | 0.3995 | 11500 | 4.9657          |
| 5.013         | 0.4169 | 12000 | 4.9505          |
| 4.9999        | 0.4342 | 12500 | 4.9409          |
| 4.9832        | 0.4516 | 13000 | 4.9282          |
| 4.972         | 0.4690 | 13500 | 4.9185          |
| 4.9657        | 0.4863 | 14000 | 4.9091          |
| 4.9626        | 0.5037 | 14500 | 4.9017          |
| 4.95          | 0.5211 | 15000 | 4.8932          |
| 4.9378        | 0.5384 | 15500 | 4.8848          |
| 4.9291        | 0.5558 | 16000 | 4.8785          |
| 4.9203        | 0.5732 | 16500 | 4.8714          |
| 4.916         | 0.5905 | 17000 | 4.8648          |
| 4.9126        | 0.6079 | 17500 | 4.8587          |
| 4.9044        | 0.6253 | 18000 | 4.8534          |
| 4.8978        | 0.6426 | 18500 | 4.8463          |
| 4.892         | 0.6600 | 19000 | 4.8434          |
| 4.889         | 0.6774 | 19500 | 4.8379          |
| 4.8845        | 0.6948 | 20000 | 4.8328          |
| 4.8827        | 0.7121 | 20500 | 4.8296          |
| 4.8752        | 0.7295 | 21000 | 4.8264          |
| 4.8756        | 0.7469 | 21500 | 4.8233          |
| 4.8726        | 0.7642 | 22000 | 4.8199          |
| 4.8639        | 0.7816 | 22500 | 4.8178          |
| 4.8673        | 0.7990 | 23000 | 4.8149          |
| 4.8619        | 0.8163 | 23500 | 4.8130          |
| 4.8619        | 0.8337 | 24000 | 4.8113          |
| 4.856         | 0.8511 | 24500 | 4.8094          |
| 4.853         | 0.8684 | 25000 | 4.8084          |
| 4.8569        | 0.8858 | 25500 | 4.8070          |
| 4.8537        | 0.9032 | 26000 | 4.8062          |
| 4.8551        | 0.9206 | 26500 | 4.8055          |
| 4.8551        | 0.9379 | 27000 | 4.8052          |
| 4.8507        | 0.9553 | 27500 | 4.8047          |
| 4.8568        | 0.9727 | 28000 | 4.8045          |
| 4.854         | 0.9900 | 28500 | 4.8044          |
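
Evaluation ran every 500 steps, and the validation loss plateaus near 4.80 over the final few thousand steps. As a hedged back-of-envelope check, the logged step/epoch ratio also pins down the training-set size (assuming the epoch column is exact to four decimals):

```python
# Hedged estimate: 28,500 steps cover 0.9900 of an epoch, so one epoch is
# ~28.8k optimizer steps; with an effective batch of 512 sequences, the
# dataset holds roughly 14.7M sequences.
steps_per_epoch = 28500 / 0.9900   # ≈ 28,788 steps
effective_batch = 512              # from the hyperparameters above
print(f"≈ {steps_per_epoch * effective_batch:,.0f} training sequences")
```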

Framework versions

  • Transformers 4.53.3
  • Pytorch 2.6.0+cu126
  • Datasets 3.5.0
  • Tokenizers 0.21.1
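
A quick way to compare a local environment against these pins; a minimal sketch, assuming all four packages are installed:

```python
# Hedged environment check: prints installed versions next to the versions
# listed above for comparison.
import transformers, torch, datasets, tokenizers

for mod, expected in [(transformers, "4.53.3"), (torch, "2.6.0+cu126"),
                      (datasets, "3.5.0"), (tokenizers, "0.21.1")]:
    print(f"{mod.__name__}: {mod.__version__} (card: {expected})")
```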