2025-08-29 17:46:50 - pico-train - INFO - Step 0 -- 📊 Evaluation Results
2025-08-29 17:46:50 - pico-train - INFO - └── paloma: inf
2025-08-29 17:46:50 - pico-train - INFO - ==================================================
2025-08-29 17:46:50 - pico-train - INFO - ✨ Training Configuration
2025-08-29 17:46:50 - pico-train - INFO - ==================================================
2025-08-29 17:46:50 - pico-train - INFO - ╭─────────────────────────────────────────────────────╮
2025-08-29 17:46:50 - pico-train - INFO - │ checkpointing:                                      │
2025-08-29 17:46:50 - pico-train - INFO - │   checkpoints_dir: checkpoints                      │
2025-08-29 17:46:50 - pico-train - INFO - │   evaluation:                                       │
2025-08-29 17:46:50 - pico-train - INFO - │     eval_results_dir: eval_results                  │
2025-08-29 17:46:50 - pico-train - INFO - │   fabric_checkpoint_dir: fabric_state               │
2025-08-29 17:46:50 - pico-train - INFO - │   fabric_checkpoint_filename: checkpoint.pt         │
2025-08-29 17:46:50 - pico-train - INFO - │   hf_checkpoint:                                    │
2025-08-29 17:46:50 - pico-train - INFO - │     collection_slug: null                           │
2025-08-29 17:46:50 - pico-train - INFO - │     repo_id: ThomasTheMaker/pico-decoder-tiny       │
2025-08-29 17:46:50 - pico-train - INFO - │   learning_dynamics:                                │
2025-08-29 17:46:50 - pico-train - INFO - │     batch_size: 1                                   │
2025-08-29 17:46:50 - pico-train - INFO - │     eval_data: null                                 │
2025-08-29 17:46:50 - pico-train - INFO - │     layer_suffixes:                                 │
2025-08-29 17:46:50 - pico-train - INFO - │     - attention.v_proj                              │
2025-08-29 17:46:50 - pico-train - INFO - │     - attention.o_proj                              │
2025-08-29 17:46:50 - pico-train - INFO - │     - swiglu.w_2                                    │
2025-08-29 17:46:50 - pico-train - INFO - │     sequence_idx: -1                                │
2025-08-29 17:46:50 - pico-train - INFO - │   learning_dynamics_dir: learning_dynamics          │
2025-08-29 17:46:50 - pico-train - INFO - │   logs_dir: logs                                    │
2025-08-29 17:46:50 - pico-train - INFO - │   run_name: pico-decoder-tiny-dolma5M-v1            │
2025-08-29 17:46:50 - pico-train - INFO - │   runs_dir: runs                                    │
2025-08-29 17:46:50 - pico-train - INFO - │   save_every_n_steps: 500                           │
2025-08-29 17:46:50 - pico-train - INFO - │   save_to_hf: true                                  │
2025-08-29 17:46:50 - pico-train - INFO - │   training:                                         │
2025-08-29 17:46:50 - pico-train - INFO - │     auto_resume: true                               │
2025-08-29 17:46:50 - pico-train - INFO - │ data:                                               │
2025-08-29 17:46:50 - pico-train - INFO - │   dataloader:                                       │
2025-08-29 17:46:50 - pico-train - INFO - │     batch_size: 4                                   │
2025-08-29 17:46:50 - pico-train - INFO - │   dataset:                                          │
2025-08-29 17:46:50 - pico-train - INFO - │     name: ThomasTheMaker/pretokenized-dolma-5M      │
2025-08-29 17:46:50 - pico-train - INFO - │   tokenizer:                                        │
2025-08-29 17:46:50 - pico-train - INFO - │     name: allenai/OLMo-7B-0724-hf                   │
2025-08-29 17:46:50 - pico-train - INFO - │     vocab_size: 50304                               │
2025-08-29 17:46:50 - pico-train - INFO - │ evaluation:                                         │
2025-08-29 17:46:50 - pico-train - INFO - │   metrics:                                          │
2025-08-29 17:46:50 - pico-train - INFO - │   - paloma                                          │
2025-08-29 17:46:50 - pico-train - INFO - │   paloma:                                           │
2025-08-29 17:46:50 - pico-train - INFO - │     batch_size: 1                                   │
2025-08-29 17:46:50 - pico-train - INFO - │     dataset_name: pico-lm/pretokenized-paloma-tinsy │
2025-08-29 17:46:50 - pico-train - INFO - │     dataset_split: val                              │
2025-08-29 17:46:50 - pico-train - INFO - │     max_length: 2048                                │
2025-08-29 17:46:50 - pico-train - INFO - │ model:                                              │
2025-08-29 17:46:50 - pico-train - INFO - │   activation_hidden_dim: 384                        │
2025-08-29 17:46:50 - pico-train - INFO - │   attention_n_heads: 12                             │
2025-08-29 17:46:50 - pico-train - INFO - │   attention_n_kv_heads: 4                           │
2025-08-29 17:46:50 - pico-train - INFO - │   batch_size: 1024                                  │
2025-08-29 17:46:50 - pico-train - INFO - │   d_model: 96                                       │
2025-08-29 17:46:50 - pico-train - INFO - │   max_seq_len: 2048                                 │
2025-08-29 17:46:50 - pico-train - INFO - │   model_type: pico_decoder                          │
2025-08-29 17:46:50 - pico-train - INFO - │   n_layers: 12                                      │
2025-08-29 17:46:50 - pico-train - INFO - │   norm_eps: 1.0e-06                                 │
2025-08-29 17:46:50 - pico-train - INFO - │   position_emb_theta: 10000.0                       │
2025-08-29 17:46:50 - pico-train - INFO - │   vocab_size: 50304                                 │
2025-08-29 17:46:50 - pico-train - INFO - │ monitoring:                                         │
2025-08-29 17:46:50 - pico-train - INFO - │   logging:                                          │
2025-08-29 17:46:50 - pico-train - INFO - │     log_every_n_steps: 25                           │
2025-08-29 17:46:50 - pico-train - INFO - │     log_level: INFO                                 │
2025-08-29 17:46:50 - pico-train - INFO - │   save_to_wandb: false                              │
2025-08-29 17:46:50 - pico-train - INFO - │   wandb:                                            │
2025-08-29 17:46:50 - pico-train - INFO - │     entity: boymyc                                  │
2025-08-29 17:46:50 - pico-train - INFO - │     project: pico-decoder-tiny                      │
2025-08-29 17:46:50 - pico-train - INFO - │ training:                                           │
2025-08-29 17:46:50 - pico-train - INFO - │   fabric:                                           │
2025-08-29 17:46:50 - pico-train - INFO - │     accelerator: cuda                               │
2025-08-29 17:46:50 - pico-train - INFO - │     num_devices: 1                                  │
2025-08-29 17:46:50 - pico-train - INFO - │     num_nodes: 1                                    │
2025-08-29 17:46:50 - pico-train - INFO - │     precision: bf16-mixed                           │
2025-08-29 17:46:50 - pico-train - INFO - │   max_steps: 20000                                  │
2025-08-29 17:46:50 - pico-train - INFO - │   optimization:                                     │
2025-08-29 17:46:50 - pico-train - INFO - │     gradient_accumulation_steps: 4                  │
2025-08-29 17:46:50 - pico-train - INFO - │     lr: 5.0e-05                                     │
2025-08-29 17:46:50 - pico-train - INFO - │     lr_scheduler: cosine                            │
2025-08-29 17:46:50 - pico-train - INFO - │     lr_warmup_steps: 8000                           │
2025-08-29 17:46:50 - pico-train - INFO - │     optimizer: adamw                                │
2025-08-29 17:46:50 - pico-train - INFO - │                                                     │
2025-08-29 17:46:50 - pico-train - INFO - ╰─────────────────────────────────────────────────────╯
2025-08-29 17:46:50 - pico-train - INFO - ==================================================
2025-08-29 17:46:50 - pico-train - INFO - ⛭ Runtime Summary:
2025-08-29 17:46:50 - pico-train - INFO - ==================================================
2025-08-29 17:46:50 - pico-train - INFO - Starting from step: 0
2025-08-29 17:46:50 - pico-train - INFO - Model Setup:
2025-08-29 17:46:50 - pico-train - INFO - └─ Total Parameters: 11,282,784
2025-08-29 17:46:50 - pico-train - INFO - └─ Trainable Parameters: 11,282,784
2025-08-29 17:46:50 - pico-train - INFO - Distributed Setup:
2025-08-29 17:46:50 - pico-train - INFO - └─ Number of Devices: 1
2025-08-29 17:46:50 - pico-train - INFO - └─ Device Type: NVIDIA GeForce RTX 5090
2025-08-29 17:46:50 - pico-train - INFO - └─ Available Memory: 33.68 GB
2025-08-29 17:46:50 - pico-train - INFO - Software Setup:
2025-08-29 17:46:50 - pico-train - INFO - └─ Python Version: 3.10.12
2025-08-29 17:46:50 - pico-train - INFO - └─ PyTorch Version: 2.8.0+cu128
2025-08-29 17:46:50 - pico-train - INFO - └─ CUDA Version: 12.8
2025-08-29 17:46:50 - pico-train - INFO - └─ Operating System: Linux 6.8.0-63-generic
2025-08-29 17:46:50 - pico-train - INFO - Batch Size Configuration:
2025-08-29 17:46:50 - pico-train - INFO - └─ Global Batch Size: 4
2025-08-29 17:46:50 - pico-train - INFO - └─ Per Device Batch Size: 1
2025-08-29 17:46:50 - pico-train - INFO - └─ Gradient Accumulation Steps: 4
2025-08-29 17:46:50 - pico-train - INFO - ==================================================
2025-08-29 17:46:52 - pico-train - INFO - Step 0 -- 🔄 Training Metrics
2025-08-29 17:46:52 - pico-train - INFO - ├── Loss: 10.9975
2025-08-29 17:46:52 - pico-train - INFO - ├── Learning Rate: 0.00e+00
2025-08-29 17:46:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 17:46:52 - pico-train - INFO - Step 0 -- 📈 Saving Learning Dynamics
2025-08-29 17:47:06 - pico-train - INFO - Step 25 -- 🔄 Training Metrics
2025-08-29 17:47:06 - pico-train - INFO - ├── Loss: 10.9972
2025-08-29 17:47:06 - pico-train - INFO - ├── Learning Rate: 1.56e-07
2025-08-29 17:47:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 17:47:18 - pico-train - INFO - Step 50 -- 🔄 Training Metrics
2025-08-29 17:47:18 - pico-train - INFO - ├── Loss: 11.0030
2025-08-29 17:47:18 - pico-train - INFO - ├── Learning Rate: 3.13e-07
2025-08-29 17:47:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 17:47:31 - pico-train - INFO - Step 75 -- 🔄 Training Metrics
2025-08-29 17:47:31 - pico-train - INFO - ├── Loss: 11.0034
2025-08-29 17:47:31 - pico-train - INFO - ├── Learning Rate: 4.69e-07
2025-08-29 17:47:31 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 17:47:43 - pico-train - INFO - Step 100 -- 🔄 Training Metrics
2025-08-29 17:47:43 - pico-train - INFO - ├── Loss: 10.9962
2025-08-29 17:47:43 - pico-train - INFO - ├── Learning Rate: 6.25e-07
2025-08-29 17:47:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 17:47:56 - pico-train - INFO - Step 125 -- 🔄 Training Metrics
2025-08-29 17:47:56 - pico-train - INFO - ├── Loss: 10.9973
2025-08-29 17:47:56 - pico-train - INFO - ├── Learning Rate: 7.81e-07
2025-08-29 17:47:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 17:48:08 - pico-train - INFO - Step 150 -- 🔄 Training Metrics
2025-08-29 17:48:08 - pico-train - INFO - ├── Loss: 10.9943
2025-08-29 17:48:08 - pico-train - INFO - ├── Learning Rate: 9.38e-07
2025-08-29 17:48:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 17:48:21 - pico-train - INFO - Step 175 -- 🔄 Training Metrics
2025-08-29 17:48:21 - pico-train - INFO - ├── Loss: 10.9860
2025-08-29 17:48:21 - pico-train - INFO - ├── Learning Rate: 1.09e-06
2025-08-29 17:48:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 17:48:33 - pico-train - INFO - Step 200 -- 🔄 Training Metrics
2025-08-29 17:48:33 - pico-train - INFO - ├── Loss: 10.9885
2025-08-29 17:48:33 - pico-train - INFO - ├── Learning Rate: 1.25e-06
2025-08-29 17:48:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 17:48:46 - pico-train - INFO - Step 225 -- 🔄 Training Metrics
2025-08-29 17:48:46 - pico-train - INFO - ├── Loss: 10.9816
2025-08-29 17:48:46 - pico-train - INFO - ├── Learning Rate: 1.41e-06
2025-08-29 17:48:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 17:48:59 - pico-train - INFO - Step 250 -- 🔄 Training Metrics
2025-08-29 17:48:59 - pico-train - INFO - ├── Loss: 10.9786
2025-08-29 17:48:59 - pico-train - INFO - ├── Learning Rate: 1.56e-06
2025-08-29 17:48:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 17:49:11 - pico-train - INFO - Step 275 -- 🔄 Training Metrics
2025-08-29 17:49:11 - pico-train - INFO - ├── Loss: 10.9707
2025-08-29 17:49:11 - pico-train - INFO - ├── Learning Rate: 1.72e-06
2025-08-29 17:49:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 17:49:23 - pico-train - INFO - Step 300 -- 🔄 Training Metrics
2025-08-29 17:49:23 - pico-train - INFO - ├── Loss: 10.9700
2025-08-29 17:49:23 - pico-train - INFO - ├── Learning Rate: 1.88e-06
2025-08-29 17:49:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 17:49:36 - pico-train - INFO - Step 325 -- 🔄 Training Metrics
2025-08-29 17:49:36 - pico-train - INFO - ├── Loss: 10.9626
2025-08-29 17:49:36 - pico-train - INFO - ├── Learning Rate: 2.03e-06
2025-08-29 17:49:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 17:49:48 - pico-train - INFO - Step 350 -- 🔄 Training Metrics
2025-08-29 17:49:48 - pico-train - INFO - ├── Loss: 10.9580
2025-08-29 17:49:48 - pico-train - INFO - ├── Learning Rate: 2.19e-06
2025-08-29 17:49:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 17:50:01 - pico-train - INFO - Step 375 -- 🔄 Training Metrics
2025-08-29 17:50:01 - pico-train - INFO - ├── Loss: 10.9486
2025-08-29 17:50:01 - pico-train - INFO - ├── Learning Rate: 2.34e-06
2025-08-29 17:50:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 17:50:13 - pico-train - INFO - Step 400 -- 🔄 Training Metrics
2025-08-29 17:50:13 - pico-train - INFO - ├── Loss: 10.9417
2025-08-29 17:50:13 - pico-train - INFO - ├── Learning Rate: 2.50e-06
2025-08-29 17:50:13 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 17:50:26 - pico-train - INFO - Step 425 -- 🔄 Training Metrics
2025-08-29 17:50:26 - pico-train - INFO - ├── Loss: 10.9328
2025-08-29 17:50:26 - pico-train - INFO - ├── Learning Rate: 2.66e-06
2025-08-29 17:50:26 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 17:50:39 - pico-train - INFO - Step 450 -- 🔄 Training Metrics
2025-08-29 17:50:39 - pico-train - INFO - ├── Loss: 10.9242
2025-08-29 17:50:39 - pico-train - INFO - ├── Learning Rate: 2.81e-06
2025-08-29 17:50:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 17:50:51 - pico-train - INFO - Step 475 -- 🔄 Training Metrics
2025-08-29 17:50:51 - pico-train - INFO - ├── Loss: 10.9170
2025-08-29 17:50:51 - pico-train - INFO - ├── Learning Rate: 2.97e-06
2025-08-29 17:50:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 17:51:03 - pico-train - INFO - Step 500 -- 💾 Saving Checkpoint
2025-08-29 17:52:53 - pico-train - INFO - Step 500 -- 📊 Evaluation Results
2025-08-29 17:52:53 - pico-train - INFO - └── paloma: inf
2025-08-29 17:52:55 - pico-train - INFO - Step 500 -- 🔄 Training Metrics
2025-08-29 17:52:55 - pico-train - INFO - ├── Loss: 10.8979
2025-08-29 17:52:55 - pico-train - INFO - ├── Learning Rate: 3.13e-06
2025-08-29 17:52:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 17:52:55 - pico-train - INFO - Step 500 -- 📈 Saving Learning Dynamics
2025-08-29 17:53:09 - pico-train - INFO - Step 525 -- 🔄 Training Metrics
2025-08-29 17:53:09 - pico-train - INFO - ├── Loss: 10.8890
2025-08-29 17:53:09 - pico-train - INFO - ├── Learning Rate: 3.28e-06
2025-08-29 17:53:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 17:53:22 - pico-train - INFO - Step 550 -- 🔄 Training Metrics
2025-08-29 17:53:22 - pico-train - INFO - ├── Loss: 10.8846
2025-08-29 17:53:22 - pico-train - INFO - ├── Learning Rate: 3.44e-06
2025-08-29 17:53:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 17:53:35 - pico-train - INFO - Step 575 -- 🔄 Training Metrics
2025-08-29 17:53:35 - pico-train - INFO - ├── Loss: 10.8657
2025-08-29 17:53:35 - pico-train - INFO - ├── Learning Rate: 3.59e-06
2025-08-29 17:53:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 17:53:47 - pico-train - INFO - Step 600 -- 🔄 Training Metrics
2025-08-29 17:53:47 - pico-train - INFO - ├── Loss: 10.8590
2025-08-29 17:53:47 - pico-train - INFO - ├── Learning Rate: 3.75e-06
2025-08-29 17:53:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 17:54:00 - pico-train - INFO - Step 625 -- 🔄 Training Metrics
2025-08-29 17:54:00 - pico-train - INFO - ├── Loss: 10.8328
2025-08-29 17:54:00 - pico-train - INFO - ├── Learning Rate: 3.91e-06
2025-08-29 17:54:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 17:54:12 - pico-train - INFO - Step 650 -- 🔄 Training Metrics
2025-08-29 17:54:12 - pico-train - INFO - ├── Loss: 10.8166
2025-08-29 17:54:12 - pico-train - INFO - ├── Learning Rate: 4.06e-06
2025-08-29 17:54:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 17:54:25 - pico-train - INFO - Step 675 -- 🔄 Training Metrics
2025-08-29 17:54:25 - pico-train - INFO - ├── Loss: 10.7913
2025-08-29 17:54:25 - pico-train - INFO - ├── Learning Rate: 4.22e-06
2025-08-29 17:54:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 17:54:38 - pico-train - INFO - Step 700 -- 🔄 Training Metrics
2025-08-29 17:54:38 - pico-train - INFO - ├── Loss: 10.7609
2025-08-29 17:54:38 - pico-train - INFO - ├── Learning Rate: 4.37e-06
2025-08-29 17:54:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 17:54:50 - pico-train - INFO - Step 725 -- 🔄 Training Metrics
2025-08-29 17:54:50 - pico-train - INFO - ├── Loss: 10.7322
2025-08-29 17:54:50 - pico-train - INFO - ├── Learning Rate: 4.53e-06
2025-08-29 17:54:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 17:55:03 - pico-train - INFO - Step 750 -- 🔄 Training Metrics
2025-08-29 17:55:03 - pico-train - INFO - ├── Loss: 10.7121
2025-08-29 17:55:03 - pico-train - INFO - ├── Learning Rate: 4.69e-06
2025-08-29 17:55:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 17:55:15 - pico-train - INFO - Step 775 -- 🔄 Training Metrics
2025-08-29 17:55:15 - pico-train - INFO - ├── Loss: 10.6877
2025-08-29 17:55:15 - pico-train - INFO - ├── Learning Rate: 4.84e-06
2025-08-29 17:55:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 17:55:28 - pico-train - INFO - Step 800 -- 🔄 Training Metrics
2025-08-29 17:55:28 - pico-train - INFO - ├── Loss: 10.6436
2025-08-29 17:55:28 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 17:55:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 17:55:41 - pico-train - INFO - Step 825 -- 🔄 Training Metrics
2025-08-29 17:55:41 - pico-train - INFO - ├── Loss: 10.6256
2025-08-29 17:55:41 - pico-train - INFO - ├── Learning Rate: 5.16e-06
2025-08-29 17:55:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 17:55:53 - pico-train - INFO - Step 850 -- 🔄 Training Metrics
2025-08-29 17:55:53 - pico-train - INFO - ├── Loss: 10.5961
2025-08-29 17:55:53 - pico-train - INFO - ├── Learning Rate: 5.31e-06
2025-08-29 17:55:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 17:56:06 - pico-train - INFO - Step 875 -- 🔄 Training Metrics
2025-08-29 17:56:06 - pico-train - INFO - ├── Loss: 10.5443
2025-08-29 17:56:06 - pico-train - INFO - ├── Learning Rate: 5.47e-06
2025-08-29 17:56:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 17:56:18 - pico-train - INFO - Step 900 -- 🔄 Training Metrics
2025-08-29 17:56:18 - pico-train - INFO - ├── Loss: 10.5197
2025-08-29 17:56:18 - pico-train - INFO - ├── Learning Rate: 5.63e-06
2025-08-29 17:56:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 17:56:31 - pico-train - INFO - Step 925 -- 🔄 Training Metrics
2025-08-29 17:56:31 - pico-train - INFO - ├── Loss: 10.4854
2025-08-29 17:56:31 - pico-train - INFO - ├── Learning Rate: 5.78e-06
2025-08-29 17:56:31 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 17:56:44 - pico-train - INFO - Step 950 -- 🔄 Training Metrics
2025-08-29 17:56:44 - pico-train - INFO - ├── Loss: 10.4826
2025-08-29 17:56:44 - pico-train - INFO - ├── Learning Rate: 5.94e-06
2025-08-29 17:56:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 17:56:56 - pico-train - INFO - Step 975 -- 🔄 Training Metrics
2025-08-29 17:56:56 - pico-train - INFO - ├── Loss: 10.4557
2025-08-29 17:56:56 - pico-train - INFO - ├── Learning Rate: 6.09e-06
2025-08-29 17:56:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 17:57:09 - pico-train - INFO - Step 1000 -- 💾 Saving Checkpoint
2025-08-29 17:59:05 - pico-train - INFO - Step 1000 -- 📊 Evaluation Results
2025-08-29 17:59:05 - pico-train - INFO - └── paloma: 7.125172406420199e+27
2025-08-29 17:59:08 - pico-train - INFO - Step 1000 -- 🔄 Training Metrics
2025-08-29 17:59:08 - pico-train - INFO - ├── Loss: 10.4142
2025-08-29 17:59:08 - pico-train - INFO - ├── Learning Rate: 6.25e-06
2025-08-29 17:59:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 17:59:08 - pico-train - INFO - Step 1000 -- 📈 Saving Learning Dynamics
2025-08-29 17:59:22 - pico-train - INFO - Step 1025 -- 🔄 Training Metrics
2025-08-29 17:59:22 - pico-train - INFO - ├── Loss: 10.3885
2025-08-29 17:59:22 - pico-train - INFO - ├── Learning Rate: 6.41e-06
2025-08-29 17:59:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 17:59:35 - pico-train - INFO - Step 1050 -- 🔄 Training Metrics
2025-08-29 17:59:35 - pico-train - INFO - ├── Loss: 10.3737
2025-08-29 17:59:35 - pico-train - INFO - ├── Learning Rate: 6.56e-06
2025-08-29 17:59:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 17:59:47 - pico-train - INFO - Step 1075 -- 🔄 Training Metrics
2025-08-29 17:59:47 - pico-train - INFO - ├── Loss: 10.3534
2025-08-29 17:59:47 - pico-train - INFO - ├── Learning Rate: 6.72e-06
2025-08-29 17:59:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:00:00 - pico-train - INFO - Step 1100 -- 🔄 Training Metrics
2025-08-29 18:00:00 - pico-train - INFO - ├── Loss: 10.3219
2025-08-29 18:00:00 - pico-train - INFO - ├── Learning Rate: 6.88e-06
2025-08-29 18:00:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:00:13 - pico-train - INFO - Step 1125 -- 🔄 Training Metrics
2025-08-29 18:00:13 - pico-train - INFO - ├── Loss: 10.3064
2025-08-29 18:00:13 - pico-train - INFO - ├── Learning Rate: 7.03e-06
2025-08-29 18:00:13 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:00:25 - pico-train - INFO - Step 1150 -- 🔄 Training Metrics
2025-08-29 18:00:25 - pico-train - INFO - ├── Loss: 10.2761
2025-08-29 18:00:25 - pico-train - INFO - ├── Learning Rate: 7.19e-06
2025-08-29 18:00:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:00:38 - pico-train - INFO - Step 1175 -- 🔄 Training Metrics
2025-08-29 18:00:38 - pico-train - INFO - ├── Loss: 10.2592
2025-08-29 18:00:38 - pico-train - INFO - ├── Learning Rate: 7.34e-06
2025-08-29 18:00:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:00:50 - pico-train - INFO - Step 1200 -- 🔄 Training Metrics
2025-08-29 18:00:50 - pico-train - INFO - ├── Loss: 10.2420
2025-08-29 18:00:50 - pico-train - INFO - ├── Learning Rate: 7.50e-06
2025-08-29 18:00:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:01:03 - pico-train - INFO - Step 1225 -- 🔄 Training Metrics
2025-08-29 18:01:03 - pico-train - INFO - ├── Loss: 10.2141
2025-08-29 18:01:03 - pico-train - INFO - ├── Learning Rate: 7.66e-06
2025-08-29 18:01:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:01:15 - pico-train - INFO - Step 1250 -- 🔄 Training Metrics
2025-08-29 18:01:15 - pico-train - INFO - ├── Loss: 10.1882
2025-08-29 18:01:15 - pico-train - INFO - ├── Learning Rate: 7.81e-06
2025-08-29 18:01:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:01:28 - pico-train - INFO - Step 1275 -- 🔄 Training Metrics
2025-08-29 18:01:28 - pico-train - INFO - ├── Loss: 10.1608
2025-08-29 18:01:28 - pico-train - INFO - ├── Learning Rate: 7.97e-06
2025-08-29 18:01:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:01:40 - pico-train - INFO - Step 1300 -- 🔄 Training Metrics
2025-08-29 18:01:40 - pico-train - INFO - ├── Loss: 10.1460
2025-08-29 18:01:40 - pico-train - INFO - ├── Learning Rate: 8.13e-06
2025-08-29 18:01:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:01:53 - pico-train - INFO - Step 1325 -- 🔄 Training Metrics
2025-08-29 18:01:53 - pico-train - INFO - ├── Loss: 10.0944
2025-08-29 18:01:53 - pico-train - INFO - ├── Learning Rate: 8.28e-06
2025-08-29 18:01:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:02:05 - pico-train - INFO - Step 1350 -- 🔄 Training Metrics
2025-08-29 18:02:05 - pico-train - INFO - ├── Loss: 10.0885
2025-08-29 18:02:05 - pico-train - INFO - ├── Learning Rate: 8.44e-06
2025-08-29 18:02:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:02:18 - pico-train - INFO - Step 1375 -- 🔄 Training Metrics
2025-08-29 18:02:18 - pico-train - INFO - ├── Loss: 10.0748
2025-08-29 18:02:18 - pico-train - INFO - ├── Learning Rate: 8.59e-06
2025-08-29 18:02:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:02:31 - pico-train - INFO - Step 1400 -- 🔄 Training Metrics
2025-08-29 18:02:31 - pico-train - INFO - ├── Loss: 10.0425
2025-08-29 18:02:31 - pico-train - INFO - ├── Learning Rate: 8.75e-06
2025-08-29 18:02:31 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:02:43 - pico-train - INFO - Step 1425 -- 🔄 Training Metrics
2025-08-29 18:02:43 - pico-train - INFO - ├── Loss: 10.0422
2025-08-29 18:02:43 - pico-train - INFO - ├── Learning Rate: 8.91e-06
2025-08-29 18:02:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:02:56 - pico-train - INFO - Step 1450 -- 🔄 Training Metrics
2025-08-29 18:02:56 - pico-train - INFO - ├── Loss: 10.0039
2025-08-29 18:02:56 - pico-train - INFO - ├── Learning Rate: 9.06e-06
2025-08-29 18:02:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:03:08 - pico-train - INFO - Step 1475 -- 🔄 Training Metrics
2025-08-29 18:03:08 - pico-train - INFO - ├── Loss: 9.9736
2025-08-29 18:03:08 - pico-train - INFO - ├── Learning Rate: 9.22e-06
2025-08-29 18:03:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:03:20 - pico-train - INFO - Step 1500 -- 💾 Saving Checkpoint
2025-08-29 18:05:15 - pico-train - INFO - Step 1500 -- 📊 Evaluation Results
2025-08-29 18:05:15 - pico-train - INFO - └── paloma: 6.5469212698356e+18
2025-08-29 18:05:17 - pico-train - INFO - Step 1500 -- 🔄 Training Metrics
2025-08-29 18:05:17 - pico-train - INFO - ├── Loss: 9.9729
2025-08-29 18:05:17 - pico-train - INFO - ├── Learning Rate: 9.38e-06
2025-08-29 18:05:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:05:17 - pico-train - INFO - Step 1500 -- 📈 Saving Learning Dynamics
2025-08-29 18:05:32 - pico-train - INFO - Step 1525 -- 🔄 Training Metrics
2025-08-29 18:05:32 - pico-train - INFO - ├── Loss: 9.9379
2025-08-29 18:05:32 - pico-train - INFO - ├── Learning Rate: 9.53e-06
2025-08-29 18:05:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:05:45 - pico-train - INFO - Step 1550 -- 🔄 Training Metrics
2025-08-29 18:05:45 - pico-train - INFO - ├── Loss: 9.8819
2025-08-29 18:05:45 - pico-train - INFO - ├── Learning Rate: 9.69e-06
2025-08-29 18:05:45 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:05:57 - pico-train - INFO - Step 1575 -- 🔄 Training Metrics
2025-08-29 18:05:57 - pico-train - INFO - ├── Loss: 9.8702
2025-08-29 18:05:57 - pico-train - INFO - ├── Learning Rate: 9.84e-06
2025-08-29 18:05:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:06:10 - pico-train - INFO - Step 1600 -- 🔄 Training Metrics
2025-08-29 18:06:10 - pico-train - INFO - ├── Loss: 9.8571
2025-08-29 18:06:10 - pico-train - INFO - ├── Learning Rate: 1.00e-05
2025-08-29 18:06:10 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:06:23 - pico-train - INFO - Step 1625 -- 🔄 Training Metrics
2025-08-29 18:06:23 - pico-train - INFO - ├── Loss: 9.8356
2025-08-29 18:06:23 - pico-train - INFO - ├── Learning Rate: 1.02e-05
2025-08-29 18:06:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:06:35 - pico-train - INFO - Step 1650 -- 🔄 Training Metrics
2025-08-29 18:06:35 - pico-train - INFO - ├── Loss: 9.7973
2025-08-29 18:06:35 - pico-train - INFO - ├── Learning Rate: 1.03e-05
2025-08-29 18:06:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:06:48 - pico-train - INFO - Step 1675 -- 🔄 Training Metrics
2025-08-29 18:06:48 - pico-train - INFO - ├── Loss: 9.7745
2025-08-29 18:06:48 - pico-train - INFO - ├── Learning Rate: 1.05e-05
2025-08-29 18:06:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:07:01 - pico-train - INFO - Step 1700 -- 🔄 Training Metrics
2025-08-29 18:07:01 - pico-train - INFO - ├── Loss: 9.7673
2025-08-29 18:07:01 - pico-train - INFO - ├── Learning Rate: 1.06e-05
2025-08-29 18:07:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:07:14 - pico-train - INFO - Step 1725 -- 🔄 Training Metrics
2025-08-29 18:07:14 - pico-train - INFO - ├── Loss: 9.7406
2025-08-29 18:07:14 - pico-train - INFO - ├── Learning Rate: 1.08e-05
2025-08-29 18:07:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:07:26 - pico-train - INFO - Step 1750 -- 🔄 Training Metrics
2025-08-29 18:07:26 - pico-train - INFO - ├── Loss: 9.7312
2025-08-29 18:07:26 - pico-train - INFO - ├── Learning Rate: 1.09e-05
2025-08-29 18:07:26 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:07:39 - pico-train - INFO - Step 1775 -- 🔄 Training Metrics
2025-08-29 18:07:39 - pico-train - INFO - ├── Loss: 9.6563
2025-08-29 18:07:39 - pico-train - INFO - ├── Learning Rate: 1.11e-05
2025-08-29 18:07:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:07:51 - pico-train - INFO - Step 1800 -- 🔄 Training Metrics
2025-08-29 18:07:51 - pico-train - INFO - ├── Loss: 9.6515
2025-08-29 18:07:51 - pico-train - INFO - ├── Learning Rate: 1.13e-05
2025-08-29 18:07:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:08:04 - pico-train - INFO - Step 1825 -- 🔄 Training Metrics
2025-08-29 18:08:04 - pico-train - INFO - ├── Loss: 9.6241
2025-08-29 18:08:04 - pico-train - INFO - ├── Learning Rate: 1.14e-05
2025-08-29 18:08:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:08:17 - pico-train - INFO - Step 1850 -- 🔄 Training Metrics
2025-08-29 18:08:17 - pico-train - INFO - ├── Loss: 9.6015
2025-08-29 18:08:17 - pico-train - INFO - ├── Learning Rate: 1.16e-05
2025-08-29 18:08:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:08:29 - pico-train - INFO - Step 1875 -- 🔄 Training Metrics
2025-08-29 18:08:29 - pico-train - INFO - ├── Loss: 9.5933
2025-08-29 18:08:29 - pico-train - INFO - ├── Learning Rate: 1.17e-05
2025-08-29 18:08:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:08:42 - pico-train - INFO - Step 1900 -- 🔄 Training Metrics
2025-08-29 18:08:42 - pico-train - INFO - ├── Loss: 9.5544
2025-08-29 18:08:42 - pico-train - INFO - ├── Learning Rate: 1.19e-05
2025-08-29 18:08:42 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:08:54 - pico-train - INFO - Step 1925 -- 🔄 Training Metrics
2025-08-29 18:08:54 - pico-train - INFO - ├── Loss: 9.5407
2025-08-29 18:08:54 - pico-train - INFO - ├── Learning Rate: 1.20e-05
2025-08-29 18:08:54 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:09:07 - pico-train - INFO - Step 1950 -- 🔄 Training Metrics
2025-08-29 18:09:07 - pico-train - INFO - ├── Loss: 9.5431
2025-08-29 18:09:07 - pico-train - INFO - ├── Learning Rate: 1.22e-05
2025-08-29 18:09:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:09:19 - pico-train - INFO - Step 1975 -- 🔄 Training Metrics
2025-08-29 18:09:19 - pico-train - INFO - ├── Loss: 9.4853
2025-08-29 18:09:19 - pico-train - INFO - ├── Learning Rate: 1.23e-05
2025-08-29 18:09:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:09:31 - pico-train - INFO - Step 2000 -- 💾 Saving Checkpoint
2025-08-29 18:11:25 - pico-train - INFO - Step 2000 -- 📊 Evaluation Results
2025-08-29 18:11:25 - pico-train - INFO - └── paloma: 5.118641309912889e+18
2025-08-29 18:11:29 - pico-train - INFO - Step 2000 -- 🔄 Training Metrics
2025-08-29 18:11:29 - pico-train - INFO - ├── Loss: 9.4665
2025-08-29 18:11:29 - pico-train - INFO - ├── Learning Rate: 1.25e-05
2025-08-29 18:11:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:11:29 - pico-train - INFO - Step 2000 -- 📈 Saving Learning Dynamics
2025-08-29 18:11:43 - pico-train - INFO - Step 2025 -- 🔄 Training Metrics
2025-08-29 18:11:43 - pico-train - INFO - ├── Loss: 9.4621
2025-08-29 18:11:43 - pico-train - INFO - ├── Learning Rate: 1.27e-05
2025-08-29 18:11:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:11:56 - pico-train - INFO - Step 2050 -- 🔄 Training Metrics
2025-08-29 18:11:56 - pico-train - INFO - ├── Loss: 9.4031
2025-08-29 18:11:56 - pico-train - INFO - ├── Learning Rate: 1.28e-05
2025-08-29 18:11:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:12:08 - pico-train - INFO - Step 2075 -- 🔄 Training Metrics
2025-08-29 18:12:08 - pico-train - INFO - ├── Loss: 9.3699
2025-08-29 18:12:08 - pico-train - INFO - ├── Learning Rate: 1.30e-05
2025-08-29 18:12:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:12:21 - pico-train - INFO - Step 2100 -- 🔄 Training Metrics
2025-08-29 18:12:21 - pico-train - INFO - ├── Loss: 9.3422
2025-08-29 18:12:21 - pico-train - INFO - ├── Learning Rate: 1.31e-05
2025-08-29 18:12:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:12:33 - pico-train - INFO - Step 2125 -- 🔄 Training Metrics
2025-08-29 18:12:33 - pico-train - INFO - ├── Loss: 9.3129
2025-08-29 18:12:33 - pico-train - INFO - ├── Learning Rate: 1.33e-05
2025-08-29 18:12:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:12:46 - pico-train - INFO - Step 2150 -- 🔄 Training Metrics
2025-08-29 18:12:46 - pico-train - INFO - ├── Loss: 9.2917
2025-08-29 18:12:46 - pico-train - INFO - ├── Learning Rate: 1.34e-05
2025-08-29 18:12:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:12:58 - pico-train - INFO - Step 2175 -- 🔄 Training Metrics
2025-08-29 18:12:58 - pico-train - INFO - ├── Loss: 9.2670
2025-08-29 18:12:58 - pico-train - INFO - ├── Learning Rate: 1.36e-05
2025-08-29 18:12:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:13:11 - pico-train - INFO - Step 2200 -- 🔄 Training Metrics
2025-08-29 18:13:11 - pico-train - INFO - ├── Loss: 9.2512
2025-08-29 18:13:11 - pico-train - INFO - ├── Learning Rate: 1.38e-05
2025-08-29 18:13:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:13:23 - pico-train - INFO - Step 2225 -- 🔄 Training Metrics
2025-08-29 18:13:23 - pico-train - INFO - ├── Loss: 9.2737
2025-08-29 18:13:23 - pico-train - INFO - ├── Learning Rate: 1.39e-05
2025-08-29 18:13:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:13:36 - pico-train - INFO - Step 2250 -- 🔄 Training Metrics
2025-08-29 18:13:36 - pico-train - INFO - ├── Loss: 9.2357
2025-08-29 18:13:36 - pico-train - INFO - ├── Learning Rate: 1.41e-05
2025-08-29 18:13:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:13:49 - pico-train - INFO - Step 2275 -- 🔄 Training Metrics
2025-08-29 18:13:49 - pico-train - INFO - ├── Loss: 9.1471
2025-08-29 18:13:49 - pico-train - INFO - ├── Learning Rate: 1.42e-05
2025-08-29 18:13:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:14:01 - pico-train - INFO - Step 2300 -- 🔄 Training Metrics
2025-08-29 18:14:01 - pico-train - INFO - ├── Loss: 9.1305
2025-08-29 18:14:01 - pico-train - INFO - ├── Learning Rate: 1.44e-05
2025-08-29 18:14:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:14:14 - pico-train - INFO - Step 2325 -- 🔄 Training Metrics
2025-08-29 18:14:14 - pico-train - INFO - ├── Loss: 9.1430
2025-08-29 18:14:14 - pico-train - INFO - ├── Learning Rate: 1.45e-05
2025-08-29 18:14:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:14:26 - pico-train - INFO - Step 2350 -- 🔄 Training Metrics
2025-08-29 18:14:26 - pico-train - INFO - ├── Loss: 9.0948
2025-08-29 18:14:26 - pico-train - INFO - ├── Learning Rate: 1.47e-05
2025-08-29 18:14:26 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:14:39 - pico-train - INFO - Step 2375 -- 🔄 Training Metrics
2025-08-29 18:14:39 - pico-train - INFO - ├── Loss: 9.0256
2025-08-29 18:14:39 - pico-train - INFO - ├── Learning Rate: 1.48e-05
2025-08-29 18:14:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:14:52 - pico-train - INFO - Step 2400 -- 🔄 Training Metrics
2025-08-29 18:14:52 - pico-train - INFO - ├── Loss: 9.0664
2025-08-29 18:14:52 - pico-train - INFO - ├── Learning Rate: 1.50e-05
2025-08-29 18:14:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:15:04 - pico-train - INFO - Step 2425 -- 🔄 Training Metrics
2025-08-29 18:15:04 - pico-train - INFO - ├── Loss: 9.0020
2025-08-29 18:15:04 - pico-train - INFO - ├── Learning Rate: 1.52e-05
2025-08-29 18:15:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:15:17 - pico-train - INFO - Step 2450 -- 🔄 Training Metrics
2025-08-29 18:15:17 - pico-train - INFO - ├── Loss: 8.9518
2025-08-29 18:15:17 - pico-train - INFO - ├── Learning Rate: 1.53e-05
2025-08-29 18:15:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:15:29 - pico-train - INFO - Step 2475 -- 🔄 Training Metrics
2025-08-29 18:15:29 - pico-train - INFO - ├── Loss: 8.9717
2025-08-29 18:15:29 - pico-train - INFO - ├── Learning Rate: 1.55e-05
2025-08-29 18:15:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:15:41 - pico-train - INFO - Step 2500 -- 💾 Saving Checkpoint
2025-08-29 18:17:36 - pico-train - INFO - Step 2500 -- 📊 Evaluation Results
2025-08-29 18:17:36 - pico-train - INFO - └── paloma: 3.37924315167126e+18
2025-08-29 18:17:38 - pico-train - INFO - Step 2500 -- 🔄 Training Metrics
2025-08-29 18:17:38 - pico-train - INFO - ├── Loss: 8.9536
2025-08-29 18:17:38 - pico-train - INFO - ├── Learning Rate: 1.56e-05
2025-08-29 18:17:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:17:38 - pico-train - INFO - Step 2500 -- 📈 Saving Learning Dynamics
2025-08-29 18:17:52 - pico-train - INFO - Step 2525 -- 🔄 Training Metrics
2025-08-29 18:17:52 - pico-train - INFO - ├── Loss: 8.8812
2025-08-29 18:17:52 - pico-train - INFO - ├── Learning Rate: 1.58e-05
2025-08-29 18:17:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:18:05 - pico-train - INFO - Step 2550 -- 🔄 Training Metrics
2025-08-29 18:18:05 - pico-train - INFO - ├── Loss: 8.8824
2025-08-29 18:18:05 - pico-train - INFO - ├── Learning Rate: 1.59e-05
2025-08-29 18:18:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:18:17 - pico-train - INFO - Step 2575 -- 🔄 Training Metrics
2025-08-29 18:18:17 - pico-train - INFO - ├── Loss: 8.8564
2025-08-29 18:18:17 - pico-train - INFO - ├── Learning Rate: 1.61e-05
2025-08-29 18:18:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:18:30 - pico-train - INFO - Step 2600 -- 🔄 Training Metrics
2025-08-29 18:18:30 - pico-train - INFO - ├── Loss: 8.8419
2025-08-29 18:18:30 - pico-train - INFO - ├── Learning Rate: 1.63e-05
2025-08-29 18:18:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:18:43 - pico-train - INFO - Step 2625 -- 🔄 Training Metrics
2025-08-29 18:18:43 - pico-train - INFO - ├── Loss: 8.7865
2025-08-29 18:18:43 - pico-train - INFO - ├── Learning Rate: 1.64e-05
2025-08-29 18:18:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:18:55 - pico-train - INFO - Step 2650 -- 🔄 Training Metrics
2025-08-29 18:18:55 - pico-train - INFO - ├── Loss: 8.7493
2025-08-29 18:18:55 - pico-train - INFO - ├── Learning Rate: 1.66e-05
2025-08-29 18:18:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:19:08 - pico-train - INFO - Step 2675 -- 🔄 Training Metrics
2025-08-29 18:19:08 - pico-train - INFO - ├── Loss: 8.7255
2025-08-29 18:19:08 - pico-train - INFO - ├── Learning Rate: 1.67e-05
2025-08-29 18:19:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:19:20 - pico-train - INFO - Step 2700 -- 🔄 Training Metrics
2025-08-29 18:19:20 - pico-train - INFO - ├── Loss: 8.6469
2025-08-29 18:19:20 - pico-train - INFO - ├── Learning Rate: 1.69e-05
2025-08-29 18:19:20 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:19:33 - pico-train - INFO - Step 2725 -- 🔄 Training Metrics
2025-08-29 18:19:33 - pico-train - INFO - ├── Loss: 8.6799
2025-08-29 18:19:33 - pico-train - INFO - ├── Learning Rate: 1.70e-05
2025-08-29 18:19:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:19:45 - pico-train - INFO - Step 2750 -- 🔄 Training Metrics
2025-08-29 18:19:45 - pico-train - INFO - ├── Loss: 8.6974
2025-08-29 18:19:45 - pico-train - INFO - ├── Learning Rate: 1.72e-05
2025-08-29 18:19:45 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:19:58 - pico-train - INFO - Step 2775 -- 🔄 Training Metrics
2025-08-29 18:19:58 - pico-train - INFO - ├── Loss: 8.6441
2025-08-29 18:19:58 - pico-train - INFO - ├── Learning Rate: 1.73e-05
2025-08-29 18:19:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:20:10 - pico-train - INFO - Step 2800 -- 🔄 Training Metrics
2025-08-29 18:20:10 - pico-train - INFO - ├── Loss: 8.6689
2025-08-29 18:20:10 - pico-train - INFO - ├── Learning Rate: 1.75e-05
2025-08-29 18:20:10 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:20:23 - pico-train - INFO - Step 2825 -- 🔄 Training Metrics
2025-08-29 18:20:23 - pico-train - INFO - ├── Loss: 8.5732
2025-08-29 18:20:23 - pico-train - INFO - ├── Learning Rate: 1.77e-05
2025-08-29 18:20:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:20:35 - pico-train - INFO - Step 2850 -- 🔄 Training Metrics
2025-08-29 18:20:35 - pico-train - INFO - ├── Loss: 8.5955
2025-08-29 18:20:35 - pico-train - INFO - ├── Learning Rate: 1.78e-05
2025-08-29 18:20:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:20:48 - pico-train - INFO - Step 2875 -- 🔄 Training Metrics
2025-08-29 18:20:48 - pico-train - INFO - ├── Loss: 8.5823
2025-08-29 18:20:48 - pico-train - INFO - ├── Learning Rate: 1.80e-05
2025-08-29 18:20:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:21:00 - pico-train - INFO - Step 2900 -- 🔄 Training Metrics
2025-08-29 18:21:00 - pico-train - INFO - ├── Loss: 8.5968
2025-08-29 18:21:00 - pico-train - INFO - ├── Learning Rate: 1.81e-05
2025-08-29 18:21:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:21:13 - pico-train - INFO - Step 2925 -- 🔄 Training Metrics
2025-08-29 18:21:13 - pico-train - INFO - ├── Loss: 8.4721
2025-08-29 18:21:13 - pico-train - INFO - ├── Learning Rate: 1.83e-05
2025-08-29 18:21:13 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:21:26 - pico-train - INFO - Step 2950 -- 🔄 Training Metrics
2025-08-29 18:21:26 - pico-train - INFO - ├── Loss: 8.4672
2025-08-29 18:21:26 - pico-train - INFO - ├── Learning Rate: 1.84e-05
2025-08-29 18:21:26 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:21:38 - pico-train - INFO - Step 2975 -- 🔄 Training Metrics
2025-08-29 18:21:38 - pico-train - INFO - ├── Loss: 8.4033
2025-08-29 18:21:38 - pico-train - INFO - ├── Learning Rate: 1.86e-05
2025-08-29 18:21:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:21:50 - pico-train - INFO - Step 3000 -- 💾 Saving Checkpoint
2025-08-29 18:24:11 - pico-train - INFO - Step 3000 -- 📊 Evaluation Results
2025-08-29 18:24:11 - pico-train - INFO - └── paloma: 6.892747900243237e+18
2025-08-29 18:24:14 - pico-train - INFO - Step 3000 -- 🔄 Training Metrics
2025-08-29 18:24:14 - pico-train - INFO - ├── Loss: 8.4947
2025-08-29 18:24:14 - pico-train - INFO - ├── Learning Rate: 1.88e-05
2025-08-29 18:24:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:24:14 - pico-train - INFO - Step 3000 -- 📈 Saving Learning Dynamics
2025-08-29 18:24:28 - pico-train - INFO - Step 3025 -- 🔄 Training Metrics
2025-08-29 18:24:28 - pico-train - INFO - ├── Loss: 8.3780
2025-08-29 18:24:28 - pico-train - INFO - ├── Learning Rate: 1.89e-05
2025-08-29 18:24:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:24:41 - pico-train - INFO - Step 3050 -- 🔄 Training Metrics
2025-08-29 18:24:41 - pico-train - INFO - ├── Loss: 8.3581
2025-08-29 18:24:41 - pico-train - INFO - ├── Learning Rate: 1.91e-05
2025-08-29 18:24:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:24:54 - pico-train - INFO - Step 3075 -- 🔄 Training Metrics
2025-08-29 18:24:54 - pico-train - INFO - ├── Loss: 8.3341
2025-08-29 18:24:54 - pico-train - INFO - ├── Learning Rate: 1.92e-05
2025-08-29 18:24:54 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:25:06 - pico-train - INFO - Step 3100 -- 🔄 Training Metrics
2025-08-29 18:25:06 - pico-train - INFO - ├── Loss: 8.3391
2025-08-29 18:25:06 - pico-train - INFO - ├── Learning Rate: 1.94e-05
2025-08-29 18:25:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:25:19 - pico-train - INFO - Step 3125 -- 🔄 Training Metrics
2025-08-29 18:25:19 - pico-train - INFO - ├── Loss: 8.3670
2025-08-29 18:25:19 - pico-train - INFO - ├── Learning Rate: 1.95e-05
2025-08-29 18:25:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:25:32 - pico-train - INFO - Step 3150 -- 🔄 Training Metrics
2025-08-29 18:25:32 - pico-train - INFO - ├── Loss: 8.2370
2025-08-29 18:25:32 - pico-train - INFO - ├── Learning Rate: 1.97e-05
2025-08-29 18:25:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:25:44 - pico-train - INFO - Step 3175 -- 🔄 Training Metrics
2025-08-29 18:25:44 - pico-train - INFO - ├── Loss: 8.2879
2025-08-29 18:25:44 - pico-train - INFO - ├── Learning Rate: 1.98e-05
2025-08-29 18:25:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:25:57 - pico-train - INFO - Step 3200 -- 🔄 Training Metrics
2025-08-29 18:25:57 - pico-train - INFO - ├── Loss: 8.2706
2025-08-29 18:25:57 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-29 18:25:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:26:09 - pico-train - INFO - Step 3225 -- 🔄 Training Metrics
2025-08-29 18:26:09 - pico-train - INFO - ├── Loss: 8.1983
2025-08-29 18:26:09 - pico-train - INFO - ├── Learning Rate: 2.02e-05
2025-08-29 18:26:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:26:22 - pico-train - INFO - Step 3250 -- 🔄 Training Metrics
2025-08-29 18:26:22 - pico-train - INFO - ├── Loss: 8.2174
2025-08-29 18:26:22 - pico-train - INFO - ├── Learning Rate: 2.03e-05
2025-08-29 18:26:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:26:34 - pico-train - INFO - Step 3275 -- 🔄 Training Metrics
2025-08-29 18:26:34 - pico-train - INFO - ├── Loss: 8.2229
2025-08-29 18:26:34 - pico-train - INFO - ├── Learning Rate: 2.05e-05
2025-08-29 18:26:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:26:47 - pico-train - INFO - Step 3300 -- 🔄 Training Metrics
2025-08-29 18:26:47 - pico-train - INFO - ├── Loss: 8.1398
2025-08-29 18:26:47 - pico-train - INFO - ├── Learning Rate: 2.06e-05
2025-08-29 18:26:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:26:59 - pico-train - INFO - Step 3325 -- 🔄 Training Metrics
2025-08-29 18:26:59 - pico-train - INFO - ├── Loss: 8.1430
2025-08-29 18:26:59 - pico-train - INFO - ├── Learning Rate: 2.08e-05
2025-08-29 18:26:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:27:12 - pico-train - INFO - Step 3350 -- 🔄 Training Metrics
2025-08-29 18:27:12 - pico-train - INFO - ├── Loss: 8.1471
2025-08-29 18:27:12 - pico-train - INFO - ├── Learning Rate: 2.09e-05
2025-08-29 18:27:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:27:25 - pico-train - INFO - Step 3375 -- 🔄 Training Metrics
2025-08-29 18:27:25 - pico-train - INFO - ├── Loss: 8.0908
2025-08-29 18:27:25 - pico-train - INFO - ├── Learning Rate: 2.11e-05
2025-08-29 18:27:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:27:37 - pico-train - INFO - Step 3400 -- 🔄 Training Metrics
2025-08-29 18:27:37 - pico-train - INFO - ├── Loss: 8.1165
2025-08-29 18:27:37 - pico-train - INFO - ├── Learning Rate: 2.13e-05
2025-08-29 18:27:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:27:50 - pico-train - INFO - Step 3425 -- 🔄 Training Metrics
2025-08-29 18:27:50 - pico-train - INFO - ├── Loss: 8.0957
2025-08-29 18:27:50 - pico-train - INFO - ├── Learning Rate: 2.14e-05
2025-08-29 18:27:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:28:02 - pico-train - INFO - Step 3450 -- 🔄 Training Metrics
2025-08-29 18:28:02 - pico-train - INFO - ├── Loss: 8.1115
2025-08-29 18:28:02 - pico-train - INFO - ├── Learning Rate: 2.16e-05
2025-08-29 18:28:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:28:15 - pico-train - INFO - Step 3475 -- 🔄 Training Metrics
2025-08-29 18:28:15 - pico-train - INFO - ├── Loss: 8.0623
2025-08-29 18:28:15 - pico-train - INFO - ├── Learning Rate: 2.17e-05
2025-08-29 18:28:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:28:27 - pico-train - INFO - Step 3500 -- 💾 Saving Checkpoint
2025-08-29 18:30:20 - pico-train - INFO - Step 3500 -- 📊 Evaluation Results
2025-08-29 18:30:20 - pico-train - INFO - └── paloma: 2.0436832271954907e+19
2025-08-29 18:30:23 - pico-train - INFO - Step 3500 -- 🔄 Training Metrics
2025-08-29 18:30:23 - pico-train - INFO - ├── Loss: 8.0527
2025-08-29 18:30:23 - pico-train - INFO - ├── Learning Rate: 2.19e-05
2025-08-29 18:30:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:30:23 - pico-train - INFO - Step 3500 -- 📈 Saving Learning Dynamics
2025-08-29 18:30:37 - pico-train - INFO - Step 3525 -- 🔄 Training Metrics
2025-08-29 18:30:37 - pico-train - INFO - ├── Loss: 7.9975
2025-08-29 18:30:37 - pico-train - INFO - ├── Learning Rate: 2.20e-05
2025-08-29 18:30:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:30:50 - pico-train - INFO - Step 3550 -- 🔄 Training Metrics
2025-08-29 18:30:50 - pico-train - INFO - ├── Loss: 7.9881
2025-08-29 18:30:50 - pico-train - INFO - ├── Learning Rate: 2.22e-05
2025-08-29 18:30:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:31:03 - pico-train - INFO - Step 3575 -- 🔄 Training Metrics
2025-08-29 18:31:03 - pico-train - INFO - ├── Loss: 8.0060
2025-08-29 18:31:03 - pico-train - INFO - ├── Learning Rate: 2.23e-05
2025-08-29 18:31:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:31:15 - pico-train - INFO - Step 3600 -- 🔄 Training Metrics
2025-08-29 18:31:15 - pico-train - INFO - ├── Loss: 7.9366
2025-08-29 18:31:15 - pico-train - INFO - ├── Learning Rate: 2.25e-05
2025-08-29 18:31:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:31:28 - pico-train - INFO - Step 3625 -- 🔄 Training Metrics
2025-08-29 18:31:28 - pico-train - INFO - ├── Loss: 8.0252
2025-08-29 18:31:28 - pico-train - INFO - ├── Learning Rate: 2.27e-05
2025-08-29 18:31:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:31:40 - pico-train - INFO - Step 3650 -- 🔄 Training Metrics
2025-08-29 18:31:40 - pico-train - INFO - ├── Loss: 7.9160
2025-08-29 18:31:40 - pico-train - INFO - ├── Learning Rate: 2.28e-05
2025-08-29 18:31:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:31:53 - pico-train - INFO - Step 3675 -- 🔄 Training Metrics
2025-08-29 18:31:53 - pico-train - INFO - ├── Loss: 7.9470
2025-08-29 18:31:53 - pico-train - INFO - ├── Learning Rate: 2.30e-05
2025-08-29 18:31:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:32:05 - pico-train - INFO - Step 3700 -- 🔄 Training Metrics
2025-08-29 18:32:05 - pico-train - INFO - ├── Loss: 7.8943
2025-08-29 18:32:05 - pico-train - INFO - ├── Learning Rate: 2.31e-05
2025-08-29 18:32:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:32:18 - pico-train - INFO - Step 3725 -- 🔄 Training Metrics
2025-08-29 18:32:18 - pico-train - INFO - ├── Loss: 7.8951
2025-08-29 18:32:18 - pico-train - INFO - ├── Learning Rate: 2.33e-05
2025-08-29 18:32:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:32:31 - pico-train - INFO - Step 3750 -- 🔄 Training Metrics
2025-08-29 18:32:31 - pico-train - INFO - ├── Loss: 7.9316
2025-08-29 18:32:31 - pico-train - INFO - ├── Learning Rate: 2.34e-05
2025-08-29 18:32:31 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:32:43 - pico-train - INFO - Step 3775 -- 🔄 Training Metrics
2025-08-29 18:32:43 - pico-train - INFO - ├── Loss: 7.9407
2025-08-29 18:32:43 - pico-train - INFO - ├── Learning Rate: 2.36e-05
2025-08-29 18:32:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:32:56 - pico-train - INFO - Step 3800 -- 🔄 Training Metrics
2025-08-29 18:32:56 - pico-train - INFO - ├── Loss: 7.9385
2025-08-29 18:32:56 - pico-train - INFO - ├── Learning Rate: 2.38e-05
2025-08-29 18:32:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:33:08 - pico-train - INFO - Step 3825 -- 🔄 Training Metrics
2025-08-29 18:33:08 - pico-train - INFO - ├── Loss: 7.8800
2025-08-29 18:33:08 - pico-train - INFO - ├── Learning Rate: 2.39e-05
2025-08-29 18:33:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:33:21 - pico-train - INFO - Step 3850 -- 🔄 Training Metrics
2025-08-29 18:33:21 - pico-train - INFO - ├── Loss: 7.9207
2025-08-29 18:33:21 - pico-train - INFO - ├── Learning Rate: 2.41e-05
2025-08-29 18:33:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:33:33 - pico-train - INFO - Step 3875 -- 🔄 Training Metrics
2025-08-29 18:33:33 - pico-train - INFO - ├── Loss: 7.8258
2025-08-29 18:33:33 - pico-train - INFO - ├── Learning Rate: 2.42e-05
2025-08-29 18:33:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:33:46 - pico-train - INFO - Step 3900 -- 🔄 Training Metrics
2025-08-29 18:33:46 - pico-train - INFO - ├── Loss: 7.9005
2025-08-29 18:33:46 - pico-train - INFO - ├── Learning Rate: 2.44e-05
2025-08-29 18:33:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:33:59 - pico-train - INFO - Step 3925 -- 🔄 Training Metrics
2025-08-29 18:33:59 - pico-train - INFO - ├── Loss: 7.8232
2025-08-29 18:33:59 - pico-train - INFO - ├── Learning Rate: 2.45e-05
2025-08-29 18:33:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:34:11 - pico-train - INFO - Step 3950 -- 🔄 Training Metrics
2025-08-29 18:34:11 - pico-train - INFO - ├── Loss: 7.7847
2025-08-29 18:34:11 - pico-train - INFO - ├── Learning Rate: 2.47e-05
2025-08-29 18:34:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:34:24 - pico-train - INFO - Step 3975 -- 🔄 Training Metrics
2025-08-29 18:34:24 - pico-train - INFO - ├── Loss: 7.7909
2025-08-29 18:34:24 - pico-train - INFO - ├── Learning Rate: 2.48e-05
2025-08-29 18:34:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:34:36 - pico-train - INFO - Step 4000 -- 💾 Saving Checkpoint
2025-08-29 18:36:31 - pico-train - INFO - Step 4000 -- 📊 Evaluation Results
2025-08-29 18:36:31 - pico-train - INFO - └── paloma: 4.1410268232311005e+19
2025-08-29 18:36:34 - pico-train - INFO - Step 4000 -- 🔄 Training Metrics
2025-08-29 18:36:34 - pico-train - INFO - ├── Loss: 7.7419
2025-08-29 18:36:34 - pico-train - INFO - ├── Learning Rate: 2.50e-05
2025-08-29 18:36:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:36:34 - pico-train - INFO - Step 4000 -- 📈 Saving Learning Dynamics
2025-08-29 18:36:48 - pico-train - INFO - Step 4025 -- 🔄 Training Metrics
2025-08-29 18:36:48 - pico-train - INFO - ├── Loss: 7.8031
2025-08-29 18:36:48 - pico-train - INFO - ├── Learning Rate: 2.52e-05
2025-08-29 18:36:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:37:01 - pico-train - INFO - Step 4050 -- 🔄 Training Metrics
2025-08-29 18:37:01 - pico-train - INFO - ├── Loss: 7.7948
2025-08-29 18:37:01 - pico-train - INFO - ├── Learning Rate: 2.53e-05
2025-08-29 18:37:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:37:14 - pico-train - INFO - Step 4075 -- 🔄 Training Metrics
2025-08-29 18:37:14 - pico-train - INFO - ├── Loss: 7.7259
2025-08-29 18:37:14 - pico-train - INFO - ├── Learning Rate: 2.55e-05
2025-08-29 18:37:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:37:26 - pico-train - INFO - Step 4100 -- 🔄 Training Metrics
2025-08-29 18:37:26 - pico-train - INFO - ├── Loss: 7.8406
2025-08-29 18:37:26 - pico-train - INFO - ├── Learning Rate: 2.56e-05
2025-08-29 18:37:26 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:37:39 - pico-train - INFO - Step 4125 -- 🔄 Training Metrics
2025-08-29 18:37:39 - pico-train - INFO - ├── Loss: 7.7938
2025-08-29 18:37:39 - pico-train - INFO - ├── Learning Rate: 2.58e-05
2025-08-29 18:37:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:37:51 - pico-train - INFO - Step 4150 -- 🔄 Training Metrics
2025-08-29 18:37:51 - pico-train - INFO - ├── Loss: 7.7101
2025-08-29 18:37:51 - pico-train - INFO - ├── Learning Rate: 2.59e-05
2025-08-29 18:37:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:38:04 - pico-train - INFO - Step 4175 -- 🔄 Training Metrics
2025-08-29 18:38:04 - pico-train - INFO - ├── Loss: 7.6633
2025-08-29 18:38:04 - pico-train - INFO - ├── Learning Rate: 2.61e-05
2025-08-29 18:38:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:38:17 - pico-train - INFO - Step 4200 -- 🔄 Training Metrics
2025-08-29 18:38:17 - pico-train - INFO - ├── Loss: 7.6830
2025-08-29 18:38:17 - pico-train - INFO - ├── Learning Rate: 2.63e-05
2025-08-29 18:38:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:38:29 - pico-train - INFO - Step 4225 -- 🔄 Training Metrics
2025-08-29 18:38:29 - pico-train - INFO - ├── Loss: 7.7106
2025-08-29 18:38:29 - pico-train - INFO - ├── Learning Rate: 2.64e-05
2025-08-29 18:38:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:38:42 - pico-train - INFO - Step 4250 -- 🔄 Training Metrics
2025-08-29 18:38:42 - pico-train - INFO - ├── Loss: 7.7174
2025-08-29 18:38:42 - pico-train - INFO - ├── Learning Rate: 2.66e-05
2025-08-29 18:38:42 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:38:54 - pico-train - INFO - Step 4275 -- 🔄 Training Metrics
2025-08-29 18:38:54 - pico-train - INFO - ├── Loss: 7.7508
2025-08-29 18:38:54 - pico-train - INFO - ├── Learning Rate: 2.67e-05
2025-08-29 18:38:54 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:39:07 - pico-train - INFO - Step 4300 -- 🔄 Training Metrics
2025-08-29 18:39:07 - pico-train - INFO - ├── Loss: 7.6831
2025-08-29 18:39:07 - pico-train - INFO - ├── Learning Rate: 2.69e-05
2025-08-29 18:39:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:39:19 - pico-train - INFO - Step 4325 -- 🔄 Training Metrics
2025-08-29 18:39:19 - pico-train - INFO - ├── Loss: 7.6498
2025-08-29 18:39:19 - pico-train - INFO - ├── Learning Rate: 2.70e-05
2025-08-29 18:39:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:39:32 - pico-train - INFO - Step 4350 -- 🔄 Training Metrics
2025-08-29 18:39:32 - pico-train - INFO - ├── Loss: 7.6668
2025-08-29 18:39:32 - pico-train - INFO - ├── Learning Rate: 2.72e-05
2025-08-29 18:39:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:39:45 - pico-train - INFO - Step 4375 -- 🔄 Training Metrics
2025-08-29 18:39:45 - pico-train - INFO - ├── Loss: 7.6852
2025-08-29 18:39:45 - pico-train - INFO - ├── Learning Rate: 2.73e-05
2025-08-29 18:39:45 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:39:57 - pico-train - INFO - Step 4400 -- 🔄 Training Metrics
2025-08-29 18:39:57 - pico-train - INFO - ├── Loss: 7.6469
2025-08-29 18:39:57 - pico-train - INFO - ├── Learning Rate: 2.75e-05
2025-08-29 18:39:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:40:10 - pico-train - INFO - Step 4425 -- 🔄 Training Metrics
2025-08-29 18:40:10 - pico-train - INFO - ├── Loss: 7.7448
2025-08-29 18:40:10 - pico-train - INFO - ├── Learning Rate: 2.77e-05
2025-08-29 18:40:10 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:40:22 - pico-train - INFO - Step 4450 -- 🔄 Training Metrics
2025-08-29 18:40:22 - pico-train - INFO - ├── Loss: 7.7422
2025-08-29 18:40:22 - pico-train - INFO - ├── Learning Rate: 2.78e-05
2025-08-29 18:40:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:40:35 - pico-train - INFO - Step 4475 -- 🔄 Training Metrics
2025-08-29 18:40:35 - pico-train - INFO - ├── Loss: 7.6918
2025-08-29 18:40:35 - pico-train - INFO - ├── Learning Rate: 2.80e-05
2025-08-29 18:40:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:40:47 - pico-train - INFO - Step 4500 -- 💾 Saving Checkpoint
2025-08-29 18:42:40 - pico-train - INFO - Step 4500 -- 📊 Evaluation Results
2025-08-29 18:42:40 - pico-train - INFO - └── paloma: 3.4524340411684053e+19
2025-08-29 18:42:43 - pico-train - INFO - Step 4500 -- 🔄 Training Metrics
2025-08-29 18:42:43 - pico-train - INFO - ├── Loss: 7.7084
2025-08-29 18:42:43 - pico-train - INFO - ├── Learning Rate: 2.81e-05
2025-08-29 18:42:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:42:43 - pico-train - INFO - Step 4500 -- 📈 Saving Learning Dynamics
2025-08-29 18:42:57 - pico-train - INFO - Step 4525 -- 🔄 Training Metrics
2025-08-29 18:42:57 - pico-train - INFO - ├── Loss: 7.7220
2025-08-29 18:42:57 - pico-train - INFO - ├── Learning Rate: 2.83e-05
2025-08-29 18:42:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:43:10 - pico-train - INFO - Step 4550 -- 🔄 Training Metrics
2025-08-29 18:43:10 - pico-train - INFO - ├── Loss: 7.6893
2025-08-29 18:43:10 - pico-train - INFO - ├── Learning Rate: 2.84e-05
2025-08-29 18:43:10 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:43:22 - pico-train - INFO - Step 4575 -- 🔄 Training Metrics
2025-08-29 18:43:22 - pico-train - INFO - ├── Loss: 7.6454
2025-08-29 18:43:22 - pico-train - INFO - ├── Learning Rate: 2.86e-05
2025-08-29 18:43:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:43:35 - pico-train - INFO - Step 4600 -- 🔄 Training Metrics
2025-08-29 18:43:35 - pico-train - INFO - ├── Loss: 7.6298
2025-08-29 18:43:35 - pico-train - INFO - ├── Learning Rate: 2.87e-05
2025-08-29 18:43:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:43:48 - pico-train - INFO - Step 4625 -- 🔄 Training Metrics
2025-08-29 18:43:48 - pico-train - INFO - ├── Loss: 7.6420
2025-08-29 18:43:48 - pico-train - INFO - ├── Learning Rate: 2.89e-05
2025-08-29 18:43:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:44:00 - pico-train - INFO - Step 4650 -- 🔄 Training Metrics
2025-08-29 18:44:00 - pico-train - INFO - ├── Loss: 7.6247
2025-08-29 18:44:00 - pico-train - INFO - ├── Learning Rate: 2.91e-05
2025-08-29 18:44:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:44:13 - pico-train - INFO - Step 4675 -- 🔄 Training Metrics
2025-08-29 18:44:13 - pico-train - INFO - ├── Loss: 7.6448
2025-08-29 18:44:13 - pico-train - INFO - ├── Learning Rate: 2.92e-05
2025-08-29 18:44:13 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:44:25 - pico-train - INFO - Step 4700 -- 🔄 Training Metrics
2025-08-29 18:44:25 - pico-train - INFO - ├── Loss: 7.6506
2025-08-29 18:44:25 - pico-train - INFO - ├── Learning Rate: 2.94e-05
2025-08-29 18:44:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:44:38 - pico-train - INFO - Step 4725 -- 🔄 Training Metrics
2025-08-29 18:44:38 - pico-train - INFO - ├── Loss: 7.6356
2025-08-29 18:44:38 - pico-train - INFO - ├── Learning Rate: 2.95e-05
2025-08-29 18:44:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:44:51 - pico-train - INFO - Step 4750 -- 🔄 Training Metrics
2025-08-29 18:44:51 - pico-train - INFO - ├── Loss: 7.6426
2025-08-29 18:44:51 - pico-train - INFO - ├── Learning Rate: 2.97e-05
2025-08-29 18:44:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:45:03 - pico-train - INFO - Step 4775 -- 🔄 Training Metrics
2025-08-29 18:45:03 - pico-train - INFO - ├── Loss: 7.6388
2025-08-29 18:45:03 - pico-train - INFO - ├── Learning Rate: 2.98e-05
2025-08-29 18:45:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:45:16 - pico-train - INFO - Step 4800 -- 🔄 Training Metrics
2025-08-29 18:45:16 - pico-train - INFO - ├── Loss: 7.5216
2025-08-29 18:45:16 - pico-train - INFO - ├── Learning Rate: 3.00e-05
2025-08-29 18:45:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:45:28 - pico-train - INFO - Step 4825 -- 🔄 Training Metrics
2025-08-29 18:45:28 - pico-train - INFO - โ”œโ”€โ”€ Loss: 7.5367
2025-08-29 18:45:28 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 3.02e-05
2025-08-29 18:45:28 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-29 18:45:41 - pico-train - INFO - Step 4850 -- ๐Ÿ”„ Training Metrics
2025-08-29 18:45:41 - pico-train - INFO - โ”œโ”€โ”€ Loss: 7.5084
2025-08-29 18:45:41 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 3.03e-05
2025-08-29 18:45:41 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-29 18:45:54 - pico-train - INFO - Step 4875 -- ๐Ÿ”„ Training Metrics
2025-08-29 18:45:54 - pico-train - INFO - โ”œโ”€โ”€ Loss: 7.6092
2025-08-29 18:45:54 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 3.05e-05
2025-08-29 18:45:54 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-29 18:46:06 - pico-train - INFO - Step 4900 -- ๐Ÿ”„ Training Metrics
2025-08-29 18:46:06 - pico-train - INFO - โ”œโ”€โ”€ Loss: 7.5760
2025-08-29 18:46:06 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 3.06e-05
2025-08-29 18:46:06 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-29 18:46:19 - pico-train - INFO - Step 4925 -- ๐Ÿ”„ Training Metrics
2025-08-29 18:46:19 - pico-train - INFO - โ”œโ”€โ”€ Loss: 7.5686
2025-08-29 18:46:19 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 3.08e-05
2025-08-29 18:46:19 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-29 18:46:31 - pico-train - INFO - Step 4950 -- ๐Ÿ”„ Training Metrics
2025-08-29 18:46:31 - pico-train - INFO - โ”œโ”€โ”€ Loss: 7.5583
2025-08-29 18:46:31 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 3.09e-05
2025-08-29 18:46:31 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-29 18:46:44 - pico-train - INFO - Step 4975 -- ๐Ÿ”„ Training Metrics
2025-08-29 18:46:44 - pico-train - INFO - โ”œโ”€โ”€ Loss: 7.5818
2025-08-29 18:46:44 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 3.11e-05
2025-08-29 18:46:44 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-29 18:46:56 - pico-train - INFO - Step 5000 -- 💾 Saving Checkpoint
2025-08-29 18:49:02 - pico-train - INFO - Step 5000 -- 📊 Evaluation Results
2025-08-29 18:49:02 - pico-train - INFO - └── paloma: 2.320698426399461e+19
2025-08-29 18:49:04 - pico-train - INFO - Step 5000 -- 🔄 Training Metrics
2025-08-29 18:49:04 - pico-train - INFO - ├── Loss: 7.6004
2025-08-29 18:49:04 - pico-train - INFO - ├── Learning Rate: 3.13e-05
2025-08-29 18:49:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:49:04 - pico-train - INFO - Step 5000 -- 📈 Saving Learning Dynamics
2025-08-29 18:49:18 - pico-train - INFO - Step 5025 -- 🔄 Training Metrics
2025-08-29 18:49:18 - pico-train - INFO - ├── Loss: 7.5371
2025-08-29 18:49:18 - pico-train - INFO - ├── Learning Rate: 3.14e-05
2025-08-29 18:49:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:49:31 - pico-train - INFO - Step 5050 -- 🔄 Training Metrics
2025-08-29 18:49:31 - pico-train - INFO - ├── Loss: 7.5179
2025-08-29 18:49:31 - pico-train - INFO - ├── Learning Rate: 3.16e-05
2025-08-29 18:49:31 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:49:44 - pico-train - INFO - Step 5075 -- 🔄 Training Metrics
2025-08-29 18:49:44 - pico-train - INFO - ├── Loss: 7.5255
2025-08-29 18:49:44 - pico-train - INFO - ├── Learning Rate: 3.17e-05
2025-08-29 18:49:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:49:56 - pico-train - INFO - Step 5100 -- 🔄 Training Metrics
2025-08-29 18:49:56 - pico-train - INFO - ├── Loss: 7.5155
2025-08-29 18:49:56 - pico-train - INFO - ├── Learning Rate: 3.19e-05
2025-08-29 18:49:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:50:10 - pico-train - INFO - Step 5125 -- 🔄 Training Metrics
2025-08-29 18:50:10 - pico-train - INFO - ├── Loss: 7.5660
2025-08-29 18:50:10 - pico-train - INFO - ├── Learning Rate: 3.20e-05
2025-08-29 18:50:10 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:50:22 - pico-train - INFO - Step 5150 -- 🔄 Training Metrics
2025-08-29 18:50:22 - pico-train - INFO - ├── Loss: 7.4797
2025-08-29 18:50:22 - pico-train - INFO - ├── Learning Rate: 3.22e-05
2025-08-29 18:50:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:50:35 - pico-train - INFO - Step 5175 -- 🔄 Training Metrics
2025-08-29 18:50:35 - pico-train - INFO - ├── Loss: 7.6224
2025-08-29 18:50:35 - pico-train - INFO - ├── Learning Rate: 3.23e-05
2025-08-29 18:50:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:50:48 - pico-train - INFO - Step 5200 -- 🔄 Training Metrics
2025-08-29 18:50:48 - pico-train - INFO - ├── Loss: 7.4821
2025-08-29 18:50:48 - pico-train - INFO - ├── Learning Rate: 3.25e-05
2025-08-29 18:50:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:51:00 - pico-train - INFO - Step 5225 -- 🔄 Training Metrics
2025-08-29 18:51:00 - pico-train - INFO - ├── Loss: 7.4765
2025-08-29 18:51:00 - pico-train - INFO - ├── Learning Rate: 3.27e-05
2025-08-29 18:51:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:51:13 - pico-train - INFO - Step 5250 -- 🔄 Training Metrics
2025-08-29 18:51:13 - pico-train - INFO - ├── Loss: 7.4680
2025-08-29 18:51:13 - pico-train - INFO - ├── Learning Rate: 3.28e-05
2025-08-29 18:51:13 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:51:26 - pico-train - INFO - Step 5275 -- 🔄 Training Metrics
2025-08-29 18:51:26 - pico-train - INFO - ├── Loss: 7.5165
2025-08-29 18:51:26 - pico-train - INFO - ├── Learning Rate: 3.30e-05
2025-08-29 18:51:26 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:51:38 - pico-train - INFO - Step 5300 -- 🔄 Training Metrics
2025-08-29 18:51:38 - pico-train - INFO - ├── Loss: 7.5334
2025-08-29 18:51:38 - pico-train - INFO - ├── Learning Rate: 3.31e-05
2025-08-29 18:51:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:51:51 - pico-train - INFO - Step 5325 -- 🔄 Training Metrics
2025-08-29 18:51:51 - pico-train - INFO - ├── Loss: 7.5053
2025-08-29 18:51:51 - pico-train - INFO - ├── Learning Rate: 3.33e-05
2025-08-29 18:51:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:52:04 - pico-train - INFO - Step 5350 -- 🔄 Training Metrics
2025-08-29 18:52:04 - pico-train - INFO - ├── Loss: 7.5115
2025-08-29 18:52:04 - pico-train - INFO - ├── Learning Rate: 3.34e-05
2025-08-29 18:52:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:52:16 - pico-train - INFO - Step 5375 -- 🔄 Training Metrics
2025-08-29 18:52:16 - pico-train - INFO - ├── Loss: 7.4736
2025-08-29 18:52:16 - pico-train - INFO - ├── Learning Rate: 3.36e-05
2025-08-29 18:52:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:52:29 - pico-train - INFO - Step 5400 -- 🔄 Training Metrics
2025-08-29 18:52:29 - pico-train - INFO - ├── Loss: 7.4520
2025-08-29 18:52:29 - pico-train - INFO - ├── Learning Rate: 3.38e-05
2025-08-29 18:52:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:52:41 - pico-train - INFO - Step 5425 -- 🔄 Training Metrics
2025-08-29 18:52:41 - pico-train - INFO - ├── Loss: 7.4596
2025-08-29 18:52:41 - pico-train - INFO - ├── Learning Rate: 3.39e-05
2025-08-29 18:52:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:52:54 - pico-train - INFO - Step 5450 -- 🔄 Training Metrics
2025-08-29 18:52:54 - pico-train - INFO - ├── Loss: 7.4518
2025-08-29 18:52:54 - pico-train - INFO - ├── Learning Rate: 3.41e-05
2025-08-29 18:52:54 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:53:06 - pico-train - INFO - Step 5475 -- 🔄 Training Metrics
2025-08-29 18:53:06 - pico-train - INFO - ├── Loss: 7.4308
2025-08-29 18:53:06 - pico-train - INFO - ├── Learning Rate: 3.42e-05
2025-08-29 18:53:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:53:18 - pico-train - INFO - Step 5500 -- 💾 Saving Checkpoint
2025-08-29 18:55:23 - pico-train - INFO - Step 5500 -- 📊 Evaluation Results
2025-08-29 18:55:23 - pico-train - INFO - └── paloma: 3.1834097890526753e+19
2025-08-29 18:55:25 - pico-train - INFO - Step 5500 -- 🔄 Training Metrics
2025-08-29 18:55:25 - pico-train - INFO - ├── Loss: 7.4627
2025-08-29 18:55:25 - pico-train - INFO - ├── Learning Rate: 3.44e-05
2025-08-29 18:55:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:55:25 - pico-train - INFO - Step 5500 -- 📈 Saving Learning Dynamics
2025-08-29 18:55:39 - pico-train - INFO - Step 5525 -- 🔄 Training Metrics
2025-08-29 18:55:39 - pico-train - INFO - ├── Loss: 7.4095
2025-08-29 18:55:39 - pico-train - INFO - ├── Learning Rate: 3.45e-05
2025-08-29 18:55:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:55:52 - pico-train - INFO - Step 5550 -- 🔄 Training Metrics
2025-08-29 18:55:52 - pico-train - INFO - ├── Loss: 7.4423
2025-08-29 18:55:52 - pico-train - INFO - ├── Learning Rate: 3.47e-05
2025-08-29 18:55:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:56:04 - pico-train - INFO - Step 5575 -- 🔄 Training Metrics
2025-08-29 18:56:04 - pico-train - INFO - ├── Loss: 7.4600
2025-08-29 18:56:04 - pico-train - INFO - ├── Learning Rate: 3.48e-05
2025-08-29 18:56:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:56:17 - pico-train - INFO - Step 5600 -- 🔄 Training Metrics
2025-08-29 18:56:17 - pico-train - INFO - ├── Loss: 7.3457
2025-08-29 18:56:17 - pico-train - INFO - ├── Learning Rate: 3.50e-05
2025-08-29 18:56:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:56:31 - pico-train - INFO - Step 5625 -- 🔄 Training Metrics
2025-08-29 18:56:31 - pico-train - INFO - ├── Loss: 7.4838
2025-08-29 18:56:31 - pico-train - INFO - ├── Learning Rate: 3.52e-05
2025-08-29 18:56:31 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:56:43 - pico-train - INFO - Step 5650 -- 🔄 Training Metrics
2025-08-29 18:56:43 - pico-train - INFO - ├── Loss: 7.4556
2025-08-29 18:56:43 - pico-train - INFO - ├── Learning Rate: 3.53e-05
2025-08-29 18:56:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:56:56 - pico-train - INFO - Step 5675 -- 🔄 Training Metrics
2025-08-29 18:56:56 - pico-train - INFO - ├── Loss: 7.4220
2025-08-29 18:56:56 - pico-train - INFO - ├── Learning Rate: 3.55e-05
2025-08-29 18:56:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:57:09 - pico-train - INFO - Step 5700 -- 🔄 Training Metrics
2025-08-29 18:57:09 - pico-train - INFO - ├── Loss: 7.4307
2025-08-29 18:57:09 - pico-train - INFO - ├── Learning Rate: 3.56e-05
2025-08-29 18:57:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:57:21 - pico-train - INFO - Step 5725 -- 🔄 Training Metrics
2025-08-29 18:57:21 - pico-train - INFO - ├── Loss: 7.3795
2025-08-29 18:57:21 - pico-train - INFO - ├── Learning Rate: 3.58e-05
2025-08-29 18:57:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:57:34 - pico-train - INFO - Step 5750 -- 🔄 Training Metrics
2025-08-29 18:57:34 - pico-train - INFO - ├── Loss: 7.3855
2025-08-29 18:57:34 - pico-train - INFO - ├── Learning Rate: 3.59e-05
2025-08-29 18:57:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:57:47 - pico-train - INFO - Step 5775 -- 🔄 Training Metrics
2025-08-29 18:57:47 - pico-train - INFO - ├── Loss: 7.3518
2025-08-29 18:57:47 - pico-train - INFO - ├── Learning Rate: 3.61e-05
2025-08-29 18:57:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:57:59 - pico-train - INFO - Step 5800 -- 🔄 Training Metrics
2025-08-29 18:57:59 - pico-train - INFO - ├── Loss: 7.3794
2025-08-29 18:57:59 - pico-train - INFO - ├── Learning Rate: 3.63e-05
2025-08-29 18:57:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:58:12 - pico-train - INFO - Step 5825 -- 🔄 Training Metrics
2025-08-29 18:58:12 - pico-train - INFO - ├── Loss: 7.3591
2025-08-29 18:58:12 - pico-train - INFO - ├── Learning Rate: 3.64e-05
2025-08-29 18:58:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:58:24 - pico-train - INFO - Step 5850 -- 🔄 Training Metrics
2025-08-29 18:58:24 - pico-train - INFO - ├── Loss: 7.3489
2025-08-29 18:58:24 - pico-train - INFO - ├── Learning Rate: 3.66e-05
2025-08-29 18:58:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:58:37 - pico-train - INFO - Step 5875 -- 🔄 Training Metrics
2025-08-29 18:58:37 - pico-train - INFO - ├── Loss: 7.4108
2025-08-29 18:58:37 - pico-train - INFO - ├── Learning Rate: 3.67e-05
2025-08-29 18:58:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:58:50 - pico-train - INFO - Step 5900 -- 🔄 Training Metrics
2025-08-29 18:58:50 - pico-train - INFO - ├── Loss: 7.3580
2025-08-29 18:58:50 - pico-train - INFO - ├── Learning Rate: 3.69e-05
2025-08-29 18:58:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:59:02 - pico-train - INFO - Step 5925 -- 🔄 Training Metrics
2025-08-29 18:59:02 - pico-train - INFO - ├── Loss: 7.3131
2025-08-29 18:59:02 - pico-train - INFO - ├── Learning Rate: 3.70e-05
2025-08-29 18:59:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:59:15 - pico-train - INFO - Step 5950 -- 🔄 Training Metrics
2025-08-29 18:59:15 - pico-train - INFO - ├── Loss: 7.2905
2025-08-29 18:59:15 - pico-train - INFO - ├── Learning Rate: 3.72e-05
2025-08-29 18:59:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:59:27 - pico-train - INFO - Step 5975 -- 🔄 Training Metrics
2025-08-29 18:59:27 - pico-train - INFO - ├── Loss: 7.3466
2025-08-29 18:59:27 - pico-train - INFO - ├── Learning Rate: 3.73e-05
2025-08-29 18:59:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 18:59:40 - pico-train - INFO - Step 6000 -- 💾 Saving Checkpoint
2025-08-29 19:01:34 - pico-train - INFO - Step 6000 -- 📊 Evaluation Results
2025-08-29 19:01:34 - pico-train - INFO - └── paloma: 4.457139025979801e+19
2025-08-29 19:01:35 - pico-train - INFO - Step 6000 -- 🔄 Training Metrics
2025-08-29 19:01:35 - pico-train - INFO - ├── Loss: 7.3765
2025-08-29 19:01:35 - pico-train - INFO - ├── Learning Rate: 3.75e-05
2025-08-29 19:01:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:01:35 - pico-train - INFO - Step 6000 -- 📈 Saving Learning Dynamics
2025-08-29 19:01:49 - pico-train - INFO - Step 6025 -- 🔄 Training Metrics
2025-08-29 19:01:49 - pico-train - INFO - ├── Loss: 7.2870
2025-08-29 19:01:49 - pico-train - INFO - ├── Learning Rate: 3.77e-05
2025-08-29 19:01:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:02:02 - pico-train - INFO - Step 6050 -- 🔄 Training Metrics
2025-08-29 19:02:02 - pico-train - INFO - ├── Loss: 7.3333
2025-08-29 19:02:02 - pico-train - INFO - ├── Learning Rate: 3.78e-05
2025-08-29 19:02:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:02:14 - pico-train - INFO - Step 6075 -- 🔄 Training Metrics
2025-08-29 19:02:14 - pico-train - INFO - ├── Loss: 7.3098
2025-08-29 19:02:14 - pico-train - INFO - ├── Learning Rate: 3.80e-05
2025-08-29 19:02:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:02:27 - pico-train - INFO - Step 6100 -- 🔄 Training Metrics
2025-08-29 19:02:27 - pico-train - INFO - ├── Loss: 7.2594
2025-08-29 19:02:27 - pico-train - INFO - ├── Learning Rate: 3.81e-05
2025-08-29 19:02:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:02:40 - pico-train - INFO - Step 6125 -- 🔄 Training Metrics
2025-08-29 19:02:40 - pico-train - INFO - ├── Loss: 7.3327
2025-08-29 19:02:40 - pico-train - INFO - ├── Learning Rate: 3.83e-05
2025-08-29 19:02:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:02:53 - pico-train - INFO - Step 6150 -- 🔄 Training Metrics
2025-08-29 19:02:53 - pico-train - INFO - ├── Loss: 7.3030
2025-08-29 19:02:53 - pico-train - INFO - ├── Learning Rate: 3.84e-05
2025-08-29 19:02:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:03:05 - pico-train - INFO - Step 6175 -- 🔄 Training Metrics
2025-08-29 19:03:05 - pico-train - INFO - ├── Loss: 7.2523
2025-08-29 19:03:05 - pico-train - INFO - ├── Learning Rate: 3.86e-05
2025-08-29 19:03:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:03:18 - pico-train - INFO - Step 6200 -- 🔄 Training Metrics
2025-08-29 19:03:18 - pico-train - INFO - ├── Loss: 7.2546
2025-08-29 19:03:18 - pico-train - INFO - ├── Learning Rate: 3.87e-05
2025-08-29 19:03:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:03:30 - pico-train - INFO - Step 6225 -- 🔄 Training Metrics
2025-08-29 19:03:30 - pico-train - INFO - ├── Loss: 7.3242
2025-08-29 19:03:30 - pico-train - INFO - ├── Learning Rate: 3.89e-05
2025-08-29 19:03:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:03:43 - pico-train - INFO - Step 6250 -- 🔄 Training Metrics
2025-08-29 19:03:43 - pico-train - INFO - ├── Loss: 7.2035
2025-08-29 19:03:43 - pico-train - INFO - ├── Learning Rate: 3.91e-05
2025-08-29 19:03:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:03:56 - pico-train - INFO - Step 6275 -- 🔄 Training Metrics
2025-08-29 19:03:56 - pico-train - INFO - ├── Loss: 7.2334
2025-08-29 19:03:56 - pico-train - INFO - ├── Learning Rate: 3.92e-05
2025-08-29 19:03:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:04:08 - pico-train - INFO - Step 6300 -- 🔄 Training Metrics
2025-08-29 19:04:08 - pico-train - INFO - ├── Loss: 7.2295
2025-08-29 19:04:08 - pico-train - INFO - ├── Learning Rate: 3.94e-05
2025-08-29 19:04:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:04:21 - pico-train - INFO - Step 6325 -- 🔄 Training Metrics
2025-08-29 19:04:21 - pico-train - INFO - ├── Loss: 7.3051
2025-08-29 19:04:21 - pico-train - INFO - ├── Learning Rate: 3.95e-05
2025-08-29 19:04:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:04:34 - pico-train - INFO - Step 6350 -- 🔄 Training Metrics
2025-08-29 19:04:34 - pico-train - INFO - ├── Loss: 7.3188
2025-08-29 19:04:34 - pico-train - INFO - ├── Learning Rate: 3.97e-05
2025-08-29 19:04:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:04:46 - pico-train - INFO - Step 6375 -- 🔄 Training Metrics
2025-08-29 19:04:46 - pico-train - INFO - ├── Loss: 7.3212
2025-08-29 19:04:46 - pico-train - INFO - ├── Learning Rate: 3.98e-05
2025-08-29 19:04:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:04:59 - pico-train - INFO - Step 6400 -- 🔄 Training Metrics
2025-08-29 19:04:59 - pico-train - INFO - ├── Loss: 7.2465
2025-08-29 19:04:59 - pico-train - INFO - ├── Learning Rate: 4.00e-05
2025-08-29 19:04:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:05:12 - pico-train - INFO - Step 6425 -- 🔄 Training Metrics
2025-08-29 19:05:12 - pico-train - INFO - ├── Loss: 7.2081
2025-08-29 19:05:12 - pico-train - INFO - ├── Learning Rate: 4.02e-05
2025-08-29 19:05:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:05:24 - pico-train - INFO - Step 6450 -- 🔄 Training Metrics
2025-08-29 19:05:24 - pico-train - INFO - ├── Loss: 7.2852
2025-08-29 19:05:24 - pico-train - INFO - ├── Learning Rate: 4.03e-05
2025-08-29 19:05:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:05:37 - pico-train - INFO - Step 6475 -- 🔄 Training Metrics
2025-08-29 19:05:37 - pico-train - INFO - ├── Loss: 7.2074
2025-08-29 19:05:37 - pico-train - INFO - ├── Learning Rate: 4.05e-05
2025-08-29 19:05:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:05:49 - pico-train - INFO - Step 6500 -- 💾 Saving Checkpoint
2025-08-29 19:07:49 - pico-train - INFO - Step 6500 -- 📊 Evaluation Results
2025-08-29 19:07:49 - pico-train - INFO - └── paloma: 7.3062353841856406e+19
2025-08-29 19:07:50 - pico-train - INFO - Step 6500 -- 🔄 Training Metrics
2025-08-29 19:07:50 - pico-train - INFO - ├── Loss: 7.2520
2025-08-29 19:07:50 - pico-train - INFO - ├── Learning Rate: 4.06e-05
2025-08-29 19:07:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:07:50 - pico-train - INFO - Step 6500 -- 📈 Saving Learning Dynamics
2025-08-29 19:08:05 - pico-train - INFO - Step 6525 -- 🔄 Training Metrics
2025-08-29 19:08:05 - pico-train - INFO - ├── Loss: 7.2115
2025-08-29 19:08:05 - pico-train - INFO - ├── Learning Rate: 4.08e-05
2025-08-29 19:08:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:08:18 - pico-train - INFO - Step 6550 -- 🔄 Training Metrics
2025-08-29 19:08:18 - pico-train - INFO - ├── Loss: 7.2435
2025-08-29 19:08:18 - pico-train - INFO - ├── Learning Rate: 4.09e-05
2025-08-29 19:08:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:08:30 - pico-train - INFO - Step 6575 -- 🔄 Training Metrics
2025-08-29 19:08:30 - pico-train - INFO - ├── Loss: 7.1962
2025-08-29 19:08:30 - pico-train - INFO - ├── Learning Rate: 4.11e-05
2025-08-29 19:08:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:08:43 - pico-train - INFO - Step 6600 -- 🔄 Training Metrics
2025-08-29 19:08:43 - pico-train - INFO - ├── Loss: 7.1631
2025-08-29 19:08:43 - pico-train - INFO - ├── Learning Rate: 4.12e-05
2025-08-29 19:08:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:08:56 - pico-train - INFO - Step 6625 -- 🔄 Training Metrics
2025-08-29 19:08:56 - pico-train - INFO - ├── Loss: 7.2525
2025-08-29 19:08:56 - pico-train - INFO - ├── Learning Rate: 4.14e-05
2025-08-29 19:08:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:09:09 - pico-train - INFO - Step 6650 -- 🔄 Training Metrics
2025-08-29 19:09:09 - pico-train - INFO - ├── Loss: 7.2133
2025-08-29 19:09:09 - pico-train - INFO - ├── Learning Rate: 4.16e-05
2025-08-29 19:09:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:09:21 - pico-train - INFO - Step 6675 -- 🔄 Training Metrics
2025-08-29 19:09:21 - pico-train - INFO - ├── Loss: 7.2248
2025-08-29 19:09:21 - pico-train - INFO - ├── Learning Rate: 4.17e-05
2025-08-29 19:09:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:09:34 - pico-train - INFO - Step 6700 -- 🔄 Training Metrics
2025-08-29 19:09:34 - pico-train - INFO - ├── Loss: 7.1928
2025-08-29 19:09:34 - pico-train - INFO - ├── Learning Rate: 4.19e-05
2025-08-29 19:09:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:09:46 - pico-train - INFO - Step 6725 -- 🔄 Training Metrics
2025-08-29 19:09:46 - pico-train - INFO - ├── Loss: 7.1698
2025-08-29 19:09:46 - pico-train - INFO - ├── Learning Rate: 4.20e-05
2025-08-29 19:09:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:09:59 - pico-train - INFO - Step 6750 -- 🔄 Training Metrics
2025-08-29 19:09:59 - pico-train - INFO - ├── Loss: 7.3037
2025-08-29 19:09:59 - pico-train - INFO - ├── Learning Rate: 4.22e-05
2025-08-29 19:09:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:10:11 - pico-train - INFO - Step 6775 -- 🔄 Training Metrics
2025-08-29 19:10:11 - pico-train - INFO - ├── Loss: 7.2451
2025-08-29 19:10:11 - pico-train - INFO - ├── Learning Rate: 4.23e-05
2025-08-29 19:10:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:10:24 - pico-train - INFO - Step 6800 -- 🔄 Training Metrics
2025-08-29 19:10:24 - pico-train - INFO - ├── Loss: 7.1373
2025-08-29 19:10:24 - pico-train - INFO - ├── Learning Rate: 4.25e-05
2025-08-29 19:10:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:10:37 - pico-train - INFO - Step 6825 -- 🔄 Training Metrics
2025-08-29 19:10:37 - pico-train - INFO - ├── Loss: 7.1390
2025-08-29 19:10:37 - pico-train - INFO - ├── Learning Rate: 4.27e-05
2025-08-29 19:10:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:10:49 - pico-train - INFO - Step 6850 -- 🔄 Training Metrics
2025-08-29 19:10:49 - pico-train - INFO - ├── Loss: 7.1296
2025-08-29 19:10:49 - pico-train - INFO - ├── Learning Rate: 4.28e-05
2025-08-29 19:10:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:11:02 - pico-train - INFO - Step 6875 -- 🔄 Training Metrics
2025-08-29 19:11:02 - pico-train - INFO - ├── Loss: 7.0961
2025-08-29 19:11:02 - pico-train - INFO - ├── Learning Rate: 4.30e-05
2025-08-29 19:11:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:11:14 - pico-train - INFO - Step 6900 -- 🔄 Training Metrics
2025-08-29 19:11:14 - pico-train - INFO - ├── Loss: 7.1408
2025-08-29 19:11:14 - pico-train - INFO - ├── Learning Rate: 4.31e-05
2025-08-29 19:11:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:11:27 - pico-train - INFO - Step 6925 -- 🔄 Training Metrics
2025-08-29 19:11:27 - pico-train - INFO - ├── Loss: 7.1852
2025-08-29 19:11:27 - pico-train - INFO - ├── Learning Rate: 4.33e-05
2025-08-29 19:11:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:11:39 - pico-train - INFO - Step 6950 -- 🔄 Training Metrics
2025-08-29 19:11:39 - pico-train - INFO - ├── Loss: 7.2067
2025-08-29 19:11:39 - pico-train - INFO - ├── Learning Rate: 4.34e-05
2025-08-29 19:11:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:11:52 - pico-train - INFO - Step 6975 -- 🔄 Training Metrics
2025-08-29 19:11:52 - pico-train - INFO - ├── Loss: 7.0681
2025-08-29 19:11:52 - pico-train - INFO - ├── Learning Rate: 4.36e-05
2025-08-29 19:11:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:12:04 - pico-train - INFO - Step 7000 -- 💾 Saving Checkpoint
2025-08-29 19:14:06 - pico-train - INFO - Step 7000 -- 📊 Evaluation Results
2025-08-29 19:14:06 - pico-train - INFO - └── paloma: 1.2357969480287024e+20
2025-08-29 19:14:08 - pico-train - INFO - Step 7000 -- 🔄 Training Metrics
2025-08-29 19:14:08 - pico-train - INFO - ├── Loss: 7.1813
2025-08-29 19:14:08 - pico-train - INFO - ├── Learning Rate: 4.37e-05
2025-08-29 19:14:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:14:08 - pico-train - INFO - Step 7000 -- 📈 Saving Learning Dynamics
2025-08-29 19:14:22 - pico-train - INFO - Step 7025 -- 🔄 Training Metrics
2025-08-29 19:14:22 - pico-train - INFO - ├── Loss: 7.1992
2025-08-29 19:14:22 - pico-train - INFO - ├── Learning Rate: 4.39e-05
2025-08-29 19:14:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:14:34 - pico-train - INFO - Step 7050 -- 🔄 Training Metrics
2025-08-29 19:14:34 - pico-train - INFO - ├── Loss: 7.1409
2025-08-29 19:14:34 - pico-train - INFO - ├── Learning Rate: 4.41e-05
2025-08-29 19:14:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:14:47 - pico-train - INFO - Step 7075 -- 🔄 Training Metrics
2025-08-29 19:14:47 - pico-train - INFO - ├── Loss: 7.1271
2025-08-29 19:14:47 - pico-train - INFO - ├── Learning Rate: 4.42e-05
2025-08-29 19:14:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:15:00 - pico-train - INFO - Step 7100 -- 🔄 Training Metrics
2025-08-29 19:15:00 - pico-train - INFO - ├── Loss: 7.1720
2025-08-29 19:15:00 - pico-train - INFO - ├── Learning Rate: 4.44e-05
2025-08-29 19:15:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:15:17 - pico-train - INFO - Step 7125 -- 🔄 Training Metrics
2025-08-29 19:15:17 - pico-train - INFO - ├── Loss: 7.1515
2025-08-29 19:15:17 - pico-train - INFO - ├── Learning Rate: 4.45e-05
2025-08-29 19:15:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:15:30 - pico-train - INFO - Step 7150 -- 🔄 Training Metrics
2025-08-29 19:15:30 - pico-train - INFO - ├── Loss: 7.0898
2025-08-29 19:15:30 - pico-train - INFO - ├── Learning Rate: 4.47e-05
2025-08-29 19:15:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:15:42 - pico-train - INFO - Step 7175 -- 🔄 Training Metrics
2025-08-29 19:15:42 - pico-train - INFO - ├── Loss: 7.0996
2025-08-29 19:15:42 - pico-train - INFO - ├── Learning Rate: 4.48e-05
2025-08-29 19:15:42 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:15:55 - pico-train - INFO - Step 7200 -- 🔄 Training Metrics
2025-08-29 19:15:55 - pico-train - INFO - ├── Loss: 7.0610
2025-08-29 19:15:55 - pico-train - INFO - ├── Learning Rate: 4.50e-05
2025-08-29 19:15:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:16:07 - pico-train - INFO - Step 7225 -- 🔄 Training Metrics
2025-08-29 19:16:07 - pico-train - INFO - ├── Loss: 7.1939
2025-08-29 19:16:07 - pico-train - INFO - ├── Learning Rate: 4.52e-05
2025-08-29 19:16:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:16:20 - pico-train - INFO - Step 7250 -- 🔄 Training Metrics
2025-08-29 19:16:20 - pico-train - INFO - ├── Loss: 7.0355
2025-08-29 19:16:20 - pico-train - INFO - ├── Learning Rate: 4.53e-05
2025-08-29 19:16:20 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:16:32 - pico-train - INFO - Step 7275 -- 🔄 Training Metrics
2025-08-29 19:16:32 - pico-train - INFO - ├── Loss: 7.0935
2025-08-29 19:16:32 - pico-train - INFO - ├── Learning Rate: 4.55e-05
2025-08-29 19:16:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:16:45 - pico-train - INFO - Step 7300 -- 🔄 Training Metrics
2025-08-29 19:16:45 - pico-train - INFO - ├── Loss: 7.0689
2025-08-29 19:16:45 - pico-train - INFO - ├── Learning Rate: 4.56e-05
2025-08-29 19:16:45 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:16:57 - pico-train - INFO - Step 7325 -- 🔄 Training Metrics
2025-08-29 19:16:57 - pico-train - INFO - ├── Loss: 7.0265
2025-08-29 19:16:57 - pico-train - INFO - ├── Learning Rate: 4.58e-05
2025-08-29 19:16:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:17:10 - pico-train - INFO - Step 7350 -- 🔄 Training Metrics
2025-08-29 19:17:10 - pico-train - INFO - ├── Loss: 7.0963
2025-08-29 19:17:10 - pico-train - INFO - ├── Learning Rate: 4.59e-05
2025-08-29 19:17:10 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:17:23 - pico-train - INFO - Step 7375 -- 🔄 Training Metrics
2025-08-29 19:17:23 - pico-train - INFO - ├── Loss: 7.1138
2025-08-29 19:17:23 - pico-train - INFO - ├── Learning Rate: 4.61e-05
2025-08-29 19:17:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:17:35 - pico-train - INFO - Step 7400 -- 🔄 Training Metrics
2025-08-29 19:17:35 - pico-train - INFO - ├── Loss: 7.0414
2025-08-29 19:17:35 - pico-train - INFO - ├── Learning Rate: 4.63e-05
2025-08-29 19:17:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:17:48 - pico-train - INFO - Step 7425 -- 🔄 Training Metrics
2025-08-29 19:17:48 - pico-train - INFO - ├── Loss: 7.0753
2025-08-29 19:17:48 - pico-train - INFO - ├── Learning Rate: 4.64e-05
2025-08-29 19:17:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:18:00 - pico-train - INFO - Step 7450 -- 🔄 Training Metrics
2025-08-29 19:18:00 - pico-train - INFO - ├── Loss: 7.0603
2025-08-29 19:18:00 - pico-train - INFO - ├── Learning Rate: 4.66e-05
2025-08-29 19:18:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:18:13 - pico-train - INFO - Step 7475 -- 🔄 Training Metrics
2025-08-29 19:18:13 - pico-train - INFO - โ”œโ”€โ”€ Loss: 7.0818
2025-08-29 19:18:13 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 4.67e-05
2025-08-29 19:18:13 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-29 19:18:25 - pico-train - INFO - Step 7500 -- ๐Ÿ’พ Saving Checkpoint
2025-08-29 19:21:05 - pico-train - INFO - Step 7500 -- ๐Ÿ“Š Evaluation Results
2025-08-29 19:21:05 - pico-train - INFO - โ””โ”€โ”€ paloma: 2.7199371732053928e+20
2025-08-29 19:21:07 - pico-train - INFO - Step 7500 -- ๐Ÿ”„ Training Metrics
2025-08-29 19:21:07 - pico-train - INFO - โ”œโ”€โ”€ Loss: 7.0788
2025-08-29 19:21:07 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 4.69e-05
2025-08-29 19:21:07 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-29 19:21:07 - pico-train - INFO - Step 7500 -- ๐Ÿ“ˆ Saving Learning Dynamics
2025-08-29 19:21:42 - pico-train - INFO - Step 7525 -- 🔄 Training Metrics
2025-08-29 19:21:42 - pico-train - INFO - ├── Loss: 6.9952
2025-08-29 19:21:42 - pico-train - INFO - ├── Learning Rate: 4.70e-05
2025-08-29 19:21:42 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:22:00 - pico-train - INFO - Step 7550 -- 🔄 Training Metrics
2025-08-29 19:22:00 - pico-train - INFO - ├── Loss: 7.0114
2025-08-29 19:22:00 - pico-train - INFO - ├── Learning Rate: 4.72e-05
2025-08-29 19:22:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:22:12 - pico-train - INFO - Step 7575 -- 🔄 Training Metrics
2025-08-29 19:22:12 - pico-train - INFO - ├── Loss: 7.0611
2025-08-29 19:22:12 - pico-train - INFO - ├── Learning Rate: 4.73e-05
2025-08-29 19:22:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:22:25 - pico-train - INFO - Step 7600 -- 🔄 Training Metrics
2025-08-29 19:22:25 - pico-train - INFO - ├── Loss: 7.0057
2025-08-29 19:22:25 - pico-train - INFO - ├── Learning Rate: 4.75e-05
2025-08-29 19:22:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:22:38 - pico-train - INFO - Step 7625 -- 🔄 Training Metrics
2025-08-29 19:22:38 - pico-train - INFO - ├── Loss: 7.0182
2025-08-29 19:22:38 - pico-train - INFO - ├── Learning Rate: 4.77e-05
2025-08-29 19:22:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:22:51 - pico-train - INFO - Step 7650 -- 🔄 Training Metrics
2025-08-29 19:22:51 - pico-train - INFO - ├── Loss: 7.0271
2025-08-29 19:22:51 - pico-train - INFO - ├── Learning Rate: 4.78e-05
2025-08-29 19:22:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:23:03 - pico-train - INFO - Step 7675 -- 🔄 Training Metrics
2025-08-29 19:23:03 - pico-train - INFO - ├── Loss: 7.0817
2025-08-29 19:23:03 - pico-train - INFO - ├── Learning Rate: 4.80e-05
2025-08-29 19:23:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:23:16 - pico-train - INFO - Step 7700 -- 🔄 Training Metrics
2025-08-29 19:23:16 - pico-train - INFO - ├── Loss: 7.0859
2025-08-29 19:23:16 - pico-train - INFO - ├── Learning Rate: 4.81e-05
2025-08-29 19:23:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:23:28 - pico-train - INFO - Step 7725 -- 🔄 Training Metrics
2025-08-29 19:23:28 - pico-train - INFO - ├── Loss: 6.9859
2025-08-29 19:23:28 - pico-train - INFO - ├── Learning Rate: 4.83e-05
2025-08-29 19:23:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:23:41 - pico-train - INFO - Step 7750 -- 🔄 Training Metrics
2025-08-29 19:23:41 - pico-train - INFO - ├── Loss: 7.0380
2025-08-29 19:23:41 - pico-train - INFO - ├── Learning Rate: 4.84e-05
2025-08-29 19:23:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:23:54 - pico-train - INFO - Step 7775 -- 🔄 Training Metrics
2025-08-29 19:23:54 - pico-train - INFO - ├── Loss: 6.9784
2025-08-29 19:23:54 - pico-train - INFO - ├── Learning Rate: 4.86e-05
2025-08-29 19:23:54 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:24:06 - pico-train - INFO - Step 7800 -- 🔄 Training Metrics
2025-08-29 19:24:06 - pico-train - INFO - ├── Loss: 7.0304
2025-08-29 19:24:06 - pico-train - INFO - ├── Learning Rate: 4.87e-05
2025-08-29 19:24:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:24:19 - pico-train - INFO - Step 7825 -- 🔄 Training Metrics
2025-08-29 19:24:19 - pico-train - INFO - ├── Loss: 7.0000
2025-08-29 19:24:19 - pico-train - INFO - ├── Learning Rate: 4.89e-05
2025-08-29 19:24:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:24:31 - pico-train - INFO - Step 7850 -- 🔄 Training Metrics
2025-08-29 19:24:31 - pico-train - INFO - ├── Loss: 7.0159
2025-08-29 19:24:31 - pico-train - INFO - ├── Learning Rate: 4.91e-05
2025-08-29 19:24:31 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:24:44 - pico-train - INFO - Step 7875 -- 🔄 Training Metrics
2025-08-29 19:24:44 - pico-train - INFO - ├── Loss: 6.9859
2025-08-29 19:24:44 - pico-train - INFO - ├── Learning Rate: 4.92e-05
2025-08-29 19:24:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:24:56 - pico-train - INFO - Step 7900 -- 🔄 Training Metrics
2025-08-29 19:24:56 - pico-train - INFO - ├── Loss: 6.9348
2025-08-29 19:24:56 - pico-train - INFO - ├── Learning Rate: 4.94e-05
2025-08-29 19:24:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:25:09 - pico-train - INFO - Step 7925 -- 🔄 Training Metrics
2025-08-29 19:25:09 - pico-train - INFO - ├── Loss: 6.9541
2025-08-29 19:25:09 - pico-train - INFO - ├── Learning Rate: 4.95e-05
2025-08-29 19:25:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:25:22 - pico-train - INFO - Step 7950 -- 🔄 Training Metrics
2025-08-29 19:25:22 - pico-train - INFO - ├── Loss: 6.9342
2025-08-29 19:25:22 - pico-train - INFO - ├── Learning Rate: 4.97e-05
2025-08-29 19:25:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:25:34 - pico-train - INFO - Step 7975 -- 🔄 Training Metrics
2025-08-29 19:25:34 - pico-train - INFO - ├── Loss: 7.0294
2025-08-29 19:25:34 - pico-train - INFO - ├── Learning Rate: 4.98e-05
2025-08-29 19:25:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:25:46 - pico-train - INFO - Step 8000 -- 💾 Saving Checkpoint
2025-08-29 19:29:05 - pico-train - INFO - Step 8000 -- 📊 Evaluation Results
2025-08-29 19:29:05 - pico-train - INFO - └── paloma: 7.181862506006892e+20
2025-08-29 19:29:08 - pico-train - INFO - Step 8000 -- 🔄 Training Metrics
2025-08-29 19:29:08 - pico-train - INFO - ├── Loss: 7.0412
2025-08-29 19:29:08 - pico-train - INFO - ├── Learning Rate: 5.00e-05
2025-08-29 19:29:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:29:08 - pico-train - INFO - Step 8000 -- 📈 Saving Learning Dynamics
2025-08-29 19:29:25 - pico-train - INFO - Step 8025 -- 🔄 Training Metrics
2025-08-29 19:29:25 - pico-train - INFO - ├── Loss: 6.9111
2025-08-29 19:29:25 - pico-train - INFO - ├── Learning Rate: 5.00e-05
2025-08-29 19:29:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:29:54 - pico-train - INFO - Step 8050 -- 🔄 Training Metrics
2025-08-29 19:29:54 - pico-train - INFO - ├── Loss: 7.0142
2025-08-29 19:29:54 - pico-train - INFO - ├── Learning Rate: 5.00e-05
2025-08-29 19:29:54 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:30:26 - pico-train - INFO - Step 8075 -- 🔄 Training Metrics
2025-08-29 19:30:26 - pico-train - INFO - ├── Loss: 6.9201
2025-08-29 19:30:26 - pico-train - INFO - ├── Learning Rate: 5.00e-05
2025-08-29 19:30:26 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:30:56 - pico-train - INFO - Step 8100 -- 🔄 Training Metrics
2025-08-29 19:30:56 - pico-train - INFO - ├── Loss: 6.9100
2025-08-29 19:30:56 - pico-train - INFO - ├── Learning Rate: 5.00e-05
2025-08-29 19:30:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:31:32 - pico-train - INFO - Step 8125 -- 🔄 Training Metrics
2025-08-29 19:31:32 - pico-train - INFO - ├── Loss: 6.9728
2025-08-29 19:31:32 - pico-train - INFO - ├── Learning Rate: 5.00e-05
2025-08-29 19:31:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:32:06 - pico-train - INFO - Step 8150 -- 🔄 Training Metrics
2025-08-29 19:32:06 - pico-train - INFO - ├── Loss: 6.9962
2025-08-29 19:32:06 - pico-train - INFO - ├── Learning Rate: 5.00e-05
2025-08-29 19:32:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:32:40 - pico-train - INFO - Step 8175 -- 🔄 Training Metrics
2025-08-29 19:32:40 - pico-train - INFO - ├── Loss: 7.0076
2025-08-29 19:32:40 - pico-train - INFO - ├── Learning Rate: 5.00e-05
2025-08-29 19:32:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:33:12 - pico-train - INFO - Step 8200 -- 🔄 Training Metrics
2025-08-29 19:33:12 - pico-train - INFO - ├── Loss: 6.8807
2025-08-29 19:33:12 - pico-train - INFO - ├── Learning Rate: 5.00e-05
2025-08-29 19:33:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:33:46 - pico-train - INFO - Step 8225 -- 🔄 Training Metrics
2025-08-29 19:33:46 - pico-train - INFO - ├── Loss: 6.8498
2025-08-29 19:33:46 - pico-train - INFO - ├── Learning Rate: 5.00e-05
2025-08-29 19:33:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:34:19 - pico-train - INFO - Step 8250 -- 🔄 Training Metrics
2025-08-29 19:34:19 - pico-train - INFO - ├── Loss: 6.9326
2025-08-29 19:34:19 - pico-train - INFO - ├── Learning Rate: 4.99e-05
2025-08-29 19:34:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:34:52 - pico-train - INFO - Step 8275 -- 🔄 Training Metrics
2025-08-29 19:34:52 - pico-train - INFO - ├── Loss: 6.8967
2025-08-29 19:34:52 - pico-train - INFO - ├── Learning Rate: 4.99e-05
2025-08-29 19:34:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:35:25 - pico-train - INFO - Step 8300 -- 🔄 Training Metrics
2025-08-29 19:35:25 - pico-train - INFO - ├── Loss: 6.9631
2025-08-29 19:35:25 - pico-train - INFO - ├── Learning Rate: 4.99e-05
2025-08-29 19:35:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:35:57 - pico-train - INFO - Step 8325 -- 🔄 Training Metrics
2025-08-29 19:35:57 - pico-train - INFO - ├── Loss: 6.8934
2025-08-29 19:35:57 - pico-train - INFO - ├── Learning Rate: 4.99e-05
2025-08-29 19:35:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:36:10 - pico-train - INFO - Step 8350 -- 🔄 Training Metrics
2025-08-29 19:36:10 - pico-train - INFO - ├── Loss: 6.8573
2025-08-29 19:36:10 - pico-train - INFO - ├── Learning Rate: 4.99e-05
2025-08-29 19:36:10 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:36:23 - pico-train - INFO - Step 8375 -- 🔄 Training Metrics
2025-08-29 19:36:23 - pico-train - INFO - ├── Loss: 6.9487
2025-08-29 19:36:23 - pico-train - INFO - ├── Learning Rate: 4.99e-05
2025-08-29 19:36:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:36:35 - pico-train - INFO - Step 8400 -- 🔄 Training Metrics
2025-08-29 19:36:35 - pico-train - INFO - ├── Loss: 6.8890
2025-08-29 19:36:35 - pico-train - INFO - ├── Learning Rate: 4.99e-05
2025-08-29 19:36:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:36:53 - pico-train - INFO - Step 8425 -- 🔄 Training Metrics
2025-08-29 19:36:53 - pico-train - INFO - ├── Loss: 6.9669
2025-08-29 19:36:53 - pico-train - INFO - ├── Learning Rate: 4.98e-05
2025-08-29 19:36:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:37:06 - pico-train - INFO - Step 8450 -- 🔄 Training Metrics
2025-08-29 19:37:06 - pico-train - INFO - ├── Loss: 6.9062
2025-08-29 19:37:06 - pico-train - INFO - ├── Learning Rate: 4.98e-05
2025-08-29 19:37:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:37:18 - pico-train - INFO - Step 8475 -- 🔄 Training Metrics
2025-08-29 19:37:18 - pico-train - INFO - ├── Loss: 6.8962
2025-08-29 19:37:18 - pico-train - INFO - ├── Learning Rate: 4.98e-05
2025-08-29 19:37:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:37:31 - pico-train - INFO - Step 8500 -- 💾 Saving Checkpoint
2025-08-29 19:39:47 - pico-train - INFO - Step 8500 -- 📊 Evaluation Results
2025-08-29 19:39:47 - pico-train - INFO - └── paloma: 1.544414097062705e+21
2025-08-29 19:39:48 - pico-train - INFO - Step 8500 -- 🔄 Training Metrics
2025-08-29 19:39:48 - pico-train - INFO - ├── Loss: 6.9129
2025-08-29 19:39:48 - pico-train - INFO - ├── Learning Rate: 4.98e-05
2025-08-29 19:39:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:39:48 - pico-train - INFO - Step 8500 -- 📈 Saving Learning Dynamics
2025-08-29 19:40:04 - pico-train - INFO - Step 8525 -- 🔄 Training Metrics
2025-08-29 19:40:04 - pico-train - INFO - ├── Loss: 6.8971
2025-08-29 19:40:04 - pico-train - INFO - ├── Learning Rate: 4.98e-05
2025-08-29 19:40:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:40:17 - pico-train - INFO - Step 8550 -- 🔄 Training Metrics
2025-08-29 19:40:17 - pico-train - INFO - ├── Loss: 6.8434
2025-08-29 19:40:17 - pico-train - INFO - ├── Learning Rate: 4.97e-05
2025-08-29 19:40:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:40:29 - pico-train - INFO - Step 8575 -- 🔄 Training Metrics
2025-08-29 19:40:29 - pico-train - INFO - ├── Loss: 6.8232
2025-08-29 19:40:29 - pico-train - INFO - ├── Learning Rate: 4.97e-05
2025-08-29 19:40:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:40:43 - pico-train - INFO - Step 8600 -- 🔄 Training Metrics
2025-08-29 19:40:43 - pico-train - INFO - ├── Loss: 6.9620
2025-08-29 19:40:43 - pico-train - INFO - ├── Learning Rate: 4.97e-05
2025-08-29 19:40:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:40:57 - pico-train - INFO - Step 8625 -- 🔄 Training Metrics
2025-08-29 19:40:57 - pico-train - INFO - ├── Loss: 6.8810
2025-08-29 19:40:57 - pico-train - INFO - ├── Learning Rate: 4.97e-05
2025-08-29 19:40:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:41:09 - pico-train - INFO - Step 8650 -- 🔄 Training Metrics
2025-08-29 19:41:09 - pico-train - INFO - ├── Loss: 6.8215
2025-08-29 19:41:09 - pico-train - INFO - ├── Learning Rate: 4.96e-05
2025-08-29 19:41:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:41:22 - pico-train - INFO - Step 8675 -- 🔄 Training Metrics
2025-08-29 19:41:22 - pico-train - INFO - ├── Loss: 6.8250
2025-08-29 19:41:22 - pico-train - INFO - ├── Learning Rate: 4.96e-05
2025-08-29 19:41:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:41:35 - pico-train - INFO - Step 8700 -- 🔄 Training Metrics
2025-08-29 19:41:35 - pico-train - INFO - ├── Loss: 6.9534
2025-08-29 19:41:35 - pico-train - INFO - ├── Learning Rate: 4.96e-05
2025-08-29 19:41:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:41:48 - pico-train - INFO - Step 8725 -- 🔄 Training Metrics
2025-08-29 19:41:48 - pico-train - INFO - ├── Loss: 6.8382
2025-08-29 19:41:48 - pico-train - INFO - ├── Learning Rate: 4.96e-05
2025-08-29 19:41:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:42:00 - pico-train - INFO - Step 8750 -- 🔄 Training Metrics
2025-08-29 19:42:00 - pico-train - INFO - ├── Loss: 6.8307
2025-08-29 19:42:00 - pico-train - INFO - ├── Learning Rate: 4.95e-05
2025-08-29 19:42:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:42:13 - pico-train - INFO - Step 8775 -- 🔄 Training Metrics
2025-08-29 19:42:13 - pico-train - INFO - ├── Loss: 6.8339
2025-08-29 19:42:13 - pico-train - INFO - ├── Learning Rate: 4.95e-05
2025-08-29 19:42:13 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:42:26 - pico-train - INFO - Step 8800 -- 🔄 Training Metrics
2025-08-29 19:42:26 - pico-train - INFO - ├── Loss: 6.8389
2025-08-29 19:42:26 - pico-train - INFO - ├── Learning Rate: 4.95e-05
2025-08-29 19:42:26 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:42:38 - pico-train - INFO - Step 8825 -- 🔄 Training Metrics
2025-08-29 19:42:38 - pico-train - INFO - ├── Loss: 6.8220
2025-08-29 19:42:38 - pico-train - INFO - ├── Learning Rate: 4.94e-05
2025-08-29 19:42:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:42:51 - pico-train - INFO - Step 8850 -- 🔄 Training Metrics
2025-08-29 19:42:51 - pico-train - INFO - ├── Loss: 6.7964
2025-08-29 19:42:51 - pico-train - INFO - ├── Learning Rate: 4.94e-05
2025-08-29 19:42:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:43:04 - pico-train - INFO - Step 8875 -- 🔄 Training Metrics
2025-08-29 19:43:04 - pico-train - INFO - ├── Loss: 6.7773
2025-08-29 19:43:04 - pico-train - INFO - ├── Learning Rate: 4.93e-05
2025-08-29 19:43:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:43:16 - pico-train - INFO - Step 8900 -- 🔄 Training Metrics
2025-08-29 19:43:16 - pico-train - INFO - ├── Loss: 6.8768
2025-08-29 19:43:16 - pico-train - INFO - ├── Learning Rate: 4.93e-05
2025-08-29 19:43:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:43:29 - pico-train - INFO - Step 8925 -- 🔄 Training Metrics
2025-08-29 19:43:29 - pico-train - INFO - ├── Loss: 6.8547
2025-08-29 19:43:29 - pico-train - INFO - ├── Learning Rate: 4.93e-05
2025-08-29 19:43:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:43:42 - pico-train - INFO - Step 8950 -- 🔄 Training Metrics
2025-08-29 19:43:42 - pico-train - INFO - ├── Loss: 6.8451
2025-08-29 19:43:42 - pico-train - INFO - ├── Learning Rate: 4.92e-05
2025-08-29 19:43:42 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:43:55 - pico-train - INFO - Step 8975 -- 🔄 Training Metrics
2025-08-29 19:43:55 - pico-train - INFO - ├── Loss: 6.8555
2025-08-29 19:43:55 - pico-train - INFO - ├── Learning Rate: 4.92e-05
2025-08-29 19:43:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:44:07 - pico-train - INFO - Step 9000 -- 💾 Saving Checkpoint
2025-08-29 19:46:25 - pico-train - INFO - Step 9000 -- 📊 Evaluation Results
2025-08-29 19:46:25 - pico-train - INFO - └── paloma: 3.7651081241501e+21
2025-08-29 19:46:27 - pico-train - INFO - Step 9000 -- 🔄 Training Metrics
2025-08-29 19:46:27 - pico-train - INFO - ├── Loss: 6.8261
2025-08-29 19:46:27 - pico-train - INFO - ├── Learning Rate: 4.91e-05
2025-08-29 19:46:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:46:27 - pico-train - INFO - Step 9000 -- 📈 Saving Learning Dynamics
2025-08-29 19:46:43 - pico-train - INFO - Step 9025 -- 🔄 Training Metrics
2025-08-29 19:46:43 - pico-train - INFO - ├── Loss: 6.8269
2025-08-29 19:46:43 - pico-train - INFO - ├── Learning Rate: 4.91e-05
2025-08-29 19:46:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:46:55 - pico-train - INFO - Step 9050 -- 🔄 Training Metrics
2025-08-29 19:46:55 - pico-train - INFO - ├── Loss: 6.7945
2025-08-29 19:46:55 - pico-train - INFO - ├── Learning Rate: 4.91e-05
2025-08-29 19:46:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:47:08 - pico-train - INFO - Step 9075 -- 🔄 Training Metrics
2025-08-29 19:47:08 - pico-train - INFO - ├── Loss: 6.8263
2025-08-29 19:47:08 - pico-train - INFO - ├── Learning Rate: 4.90e-05
2025-08-29 19:47:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:47:21 - pico-train - INFO - Step 9100 -- 🔄 Training Metrics
2025-08-29 19:47:21 - pico-train - INFO - ├── Loss: 6.7599
2025-08-29 19:47:21 - pico-train - INFO - ├── Learning Rate: 4.90e-05
2025-08-29 19:47:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:47:35 - pico-train - INFO - Step 9125 -- 🔄 Training Metrics
2025-08-29 19:47:35 - pico-train - INFO - ├── Loss: 6.8176
2025-08-29 19:47:35 - pico-train - INFO - ├── Learning Rate: 4.89e-05
2025-08-29 19:47:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:47:47 - pico-train - INFO - Step 9150 -- 🔄 Training Metrics
2025-08-29 19:47:47 - pico-train - INFO - ├── Loss: 6.7519
2025-08-29 19:47:47 - pico-train - INFO - ├── Learning Rate: 4.89e-05
2025-08-29 19:47:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:48:00 - pico-train - INFO - Step 9175 -- 🔄 Training Metrics
2025-08-29 19:48:00 - pico-train - INFO - ├── Loss: 6.8001
2025-08-29 19:48:00 - pico-train - INFO - ├── Learning Rate: 4.88e-05
2025-08-29 19:48:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:48:12 - pico-train - INFO - Step 9200 -- 🔄 Training Metrics
2025-08-29 19:48:12 - pico-train - INFO - ├── Loss: 6.8289
2025-08-29 19:48:12 - pico-train - INFO - ├── Learning Rate: 4.88e-05
2025-08-29 19:48:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:48:25 - pico-train - INFO - Step 9225 -- 🔄 Training Metrics
2025-08-29 19:48:25 - pico-train - INFO - ├── Loss: 6.9078
2025-08-29 19:48:25 - pico-train - INFO - ├── Learning Rate: 4.87e-05
2025-08-29 19:48:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:48:37 - pico-train - INFO - Step 9250 -- 🔄 Training Metrics
2025-08-29 19:48:37 - pico-train - INFO - ├── Loss: 6.7747
2025-08-29 19:48:37 - pico-train - INFO - ├── Learning Rate: 4.87e-05
2025-08-29 19:48:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:48:50 - pico-train - INFO - Step 9275 -- 🔄 Training Metrics
2025-08-29 19:48:50 - pico-train - INFO - ├── Loss: 6.7717
2025-08-29 19:48:50 - pico-train - INFO - ├── Learning Rate: 4.86e-05
2025-08-29 19:48:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:49:02 - pico-train - INFO - Step 9300 -- 🔄 Training Metrics
2025-08-29 19:49:02 - pico-train - INFO - ├── Loss: 6.7096
2025-08-29 19:49:02 - pico-train - INFO - ├── Learning Rate: 4.86e-05
2025-08-29 19:49:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:49:15 - pico-train - INFO - Step 9325 -- 🔄 Training Metrics
2025-08-29 19:49:15 - pico-train - INFO - ├── Loss: 6.8895
2025-08-29 19:49:15 - pico-train - INFO - ├── Learning Rate: 4.85e-05
2025-08-29 19:49:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:49:28 - pico-train - INFO - Step 9350 -- 🔄 Training Metrics
2025-08-29 19:49:28 - pico-train - INFO - ├── Loss: 6.8665
2025-08-29 19:49:28 - pico-train - INFO - ├── Learning Rate: 4.85e-05
2025-08-29 19:49:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:49:40 - pico-train - INFO - Step 9375 -- 🔄 Training Metrics
2025-08-29 19:49:40 - pico-train - INFO - ├── Loss: 6.7131
2025-08-29 19:49:40 - pico-train - INFO - ├── Learning Rate: 4.84e-05
2025-08-29 19:49:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:49:53 - pico-train - INFO - Step 9400 -- 🔄 Training Metrics
2025-08-29 19:49:53 - pico-train - INFO - ├── Loss: 6.7064
2025-08-29 19:49:53 - pico-train - INFO - ├── Learning Rate: 4.83e-05
2025-08-29 19:49:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:50:06 - pico-train - INFO - Step 9425 -- 🔄 Training Metrics
2025-08-29 19:50:06 - pico-train - INFO - ├── Loss: 6.8320
2025-08-29 19:50:06 - pico-train - INFO - ├── Learning Rate: 4.83e-05
2025-08-29 19:50:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:50:18 - pico-train - INFO - Step 9450 -- 🔄 Training Metrics
2025-08-29 19:50:18 - pico-train - INFO - ├── Loss: 6.7871
2025-08-29 19:50:18 - pico-train - INFO - ├── Learning Rate: 4.82e-05
2025-08-29 19:50:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:50:31 - pico-train - INFO - Step 9475 -- 🔄 Training Metrics
2025-08-29 19:50:31 - pico-train - INFO - ├── Loss: 6.7753
2025-08-29 19:50:31 - pico-train - INFO - ├── Learning Rate: 4.82e-05
2025-08-29 19:50:31 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:50:43 - pico-train - INFO - Step 9500 -- 💾 Saving Checkpoint
2025-08-29 19:52:47 - pico-train - INFO - Step 9500 -- 📊 Evaluation Results
2025-08-29 19:52:47 - pico-train - INFO - └── paloma: 8.42703670029247e+21
2025-08-29 19:52:48 - pico-train - INFO - Step 9500 -- 🔄 Training Metrics
2025-08-29 19:52:48 - pico-train - INFO - ├── Loss: 6.7261
2025-08-29 19:52:48 - pico-train - INFO - ├── Learning Rate: 4.81e-05
2025-08-29 19:52:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:52:48 - pico-train - INFO - Step 9500 -- 📈 Saving Learning Dynamics
2025-08-29 19:53:04 - pico-train - INFO - Step 9525 -- ๐Ÿ”„ Training Metrics
2025-08-29 19:53:04 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.7447
2025-08-29 19:53:04 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 4.80e-05
2025-08-29 19:53:04 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-29 19:53:16 - pico-train - INFO - Step 9550 -- ๐Ÿ”„ Training Metrics
2025-08-29 19:53:16 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.6755
2025-08-29 19:53:16 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 4.80e-05
2025-08-29 19:53:16 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-29 19:53:29 - pico-train - INFO - Step 9575 -- ๐Ÿ”„ Training Metrics
2025-08-29 19:53:29 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.7466
2025-08-29 19:53:29 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 4.79e-05
2025-08-29 19:53:29 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-29 19:53:42 - pico-train - INFO - Step 9600 -- ๐Ÿ”„ Training Metrics
2025-08-29 19:53:42 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.8755
2025-08-29 19:53:42 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 4.78e-05
2025-08-29 19:53:42 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-29 19:53:55 - pico-train - INFO - Step 9625 -- ๐Ÿ”„ Training Metrics
2025-08-29 19:53:55 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.8187
2025-08-29 19:53:55 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 4.78e-05
2025-08-29 19:53:55 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-29 19:54:08 - pico-train - INFO - Step 9650 -- ๐Ÿ”„ Training Metrics
2025-08-29 19:54:08 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.7354
2025-08-29 19:54:08 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 4.77e-05
2025-08-29 19:54:08 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-29 19:54:21 - pico-train - INFO - Step 9675 -- ๐Ÿ”„ Training Metrics
2025-08-29 19:54:21 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.6726
2025-08-29 19:54:21 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 4.76e-05
2025-08-29 19:54:21 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-29 19:54:34 - pico-train - INFO - Step 9700 -- ๐Ÿ”„ Training Metrics
2025-08-29 19:54:34 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.7547
2025-08-29 19:54:34 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 4.76e-05
2025-08-29 19:54:34 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-29 19:54:46 - pico-train - INFO - Step 9725 -- ๐Ÿ”„ Training Metrics
2025-08-29 19:54:46 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.6679
2025-08-29 19:54:46 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 4.75e-05
2025-08-29 19:54:46 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-29 19:54:59 - pico-train - INFO - Step 9750 -- ๐Ÿ”„ Training Metrics
2025-08-29 19:54:59 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.7332
2025-08-29 19:54:59 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 4.74e-05
2025-08-29 19:54:59 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-29 19:55:12 - pico-train - INFO - Step 9775 -- ๐Ÿ”„ Training Metrics
2025-08-29 19:55:12 - pico-train - INFO - ├── Loss: 6.7595
2025-08-29 19:55:12 - pico-train - INFO - ├── Learning Rate: 4.73e-05
2025-08-29 19:55:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:55:25 - pico-train - INFO - Step 9800 -- 🔄 Training Metrics
2025-08-29 19:55:25 - pico-train - INFO - ├── Loss: 6.7183
2025-08-29 19:55:25 - pico-train - INFO - ├── Learning Rate: 4.73e-05
2025-08-29 19:55:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:55:38 - pico-train - INFO - Step 9825 -- 🔄 Training Metrics
2025-08-29 19:55:38 - pico-train - INFO - ├── Loss: 6.7211
2025-08-29 19:55:38 - pico-train - INFO - ├── Learning Rate: 4.72e-05
2025-08-29 19:55:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:55:51 - pico-train - INFO - Step 9850 -- 🔄 Training Metrics
2025-08-29 19:55:51 - pico-train - INFO - ├── Loss: 6.7175
2025-08-29 19:55:51 - pico-train - INFO - ├── Learning Rate: 4.71e-05
2025-08-29 19:55:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:56:04 - pico-train - INFO - Step 9875 -- 🔄 Training Metrics
2025-08-29 19:56:04 - pico-train - INFO - ├── Loss: 6.8444
2025-08-29 19:56:04 - pico-train - INFO - ├── Learning Rate: 4.70e-05
2025-08-29 19:56:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:56:17 - pico-train - INFO - Step 9900 -- 🔄 Training Metrics
2025-08-29 19:56:17 - pico-train - INFO - ├── Loss: 6.7806
2025-08-29 19:56:17 - pico-train - INFO - ├── Learning Rate: 4.70e-05
2025-08-29 19:56:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:56:30 - pico-train - INFO - Step 9925 -- 🔄 Training Metrics
2025-08-29 19:56:30 - pico-train - INFO - ├── Loss: 6.7188
2025-08-29 19:56:30 - pico-train - INFO - ├── Learning Rate: 4.69e-05
2025-08-29 19:56:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:56:43 - pico-train - INFO - Step 9950 -- 🔄 Training Metrics
2025-08-29 19:56:43 - pico-train - INFO - ├── Loss: 6.7352
2025-08-29 19:56:43 - pico-train - INFO - ├── Learning Rate: 4.68e-05
2025-08-29 19:56:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:56:56 - pico-train - INFO - Step 9975 -- 🔄 Training Metrics
2025-08-29 19:56:56 - pico-train - INFO - ├── Loss: 6.6457
2025-08-29 19:56:56 - pico-train - INFO - ├── Learning Rate: 4.67e-05
2025-08-29 19:56:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:57:08 - pico-train - INFO - Step 10000 -- 💾 Saving Checkpoint
2025-08-29 19:59:31 - pico-train - INFO - Step 10000 -- 📊 Evaluation Results
2025-08-29 19:59:31 - pico-train - INFO - └── paloma: 1.2602927078762317e+22
2025-08-29 19:59:32 - pico-train - INFO - Step 10000 -- 🔄 Training Metrics
2025-08-29 19:59:32 - pico-train - INFO - ├── Loss: 6.6998
2025-08-29 19:59:32 - pico-train - INFO - ├── Learning Rate: 4.67e-05
2025-08-29 19:59:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 19:59:32 - pico-train - INFO - Step 10000 -- 📈 Saving Learning Dynamics
2025-08-29 19:59:47 - pico-train - INFO - Step 10025 -- 🔄 Training Metrics
2025-08-29 19:59:47 - pico-train - INFO - ├── Loss: 6.7628
2025-08-29 19:59:47 - pico-train - INFO - ├── Learning Rate: 4.66e-05
2025-08-29 19:59:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:00:00 - pico-train - INFO - Step 10050 -- 🔄 Training Metrics
2025-08-29 20:00:00 - pico-train - INFO - ├── Loss: 6.6822
2025-08-29 20:00:00 - pico-train - INFO - ├── Learning Rate: 4.65e-05
2025-08-29 20:00:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:00:12 - pico-train - INFO - Step 10075 -- 🔄 Training Metrics
2025-08-29 20:00:12 - pico-train - INFO - ├── Loss: 6.6664
2025-08-29 20:00:12 - pico-train - INFO - ├── Learning Rate: 4.64e-05
2025-08-29 20:00:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:00:25 - pico-train - INFO - Step 10100 -- 🔄 Training Metrics
2025-08-29 20:00:25 - pico-train - INFO - ├── Loss: 6.6829
2025-08-29 20:00:25 - pico-train - INFO - ├── Learning Rate: 4.63e-05
2025-08-29 20:00:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:00:39 - pico-train - INFO - Step 10125 -- 🔄 Training Metrics
2025-08-29 20:00:39 - pico-train - INFO - ├── Loss: 6.6947
2025-08-29 20:00:39 - pico-train - INFO - ├── Learning Rate: 4.62e-05
2025-08-29 20:00:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:00:51 - pico-train - INFO - Step 10150 -- 🔄 Training Metrics
2025-08-29 20:00:51 - pico-train - INFO - ├── Loss: 6.7610
2025-08-29 20:00:51 - pico-train - INFO - ├── Learning Rate: 4.61e-05
2025-08-29 20:00:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:01:04 - pico-train - INFO - Step 10175 -- 🔄 Training Metrics
2025-08-29 20:01:04 - pico-train - INFO - ├── Loss: 6.7909
2025-08-29 20:01:04 - pico-train - INFO - ├── Learning Rate: 4.61e-05
2025-08-29 20:01:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:01:17 - pico-train - INFO - Step 10200 -- 🔄 Training Metrics
2025-08-29 20:01:17 - pico-train - INFO - ├── Loss: 6.7476
2025-08-29 20:01:17 - pico-train - INFO - ├── Learning Rate: 4.60e-05
2025-08-29 20:01:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:01:29 - pico-train - INFO - Step 10225 -- 🔄 Training Metrics
2025-08-29 20:01:29 - pico-train - INFO - ├── Loss: 6.6857
2025-08-29 20:01:29 - pico-train - INFO - ├── Learning Rate: 4.59e-05
2025-08-29 20:01:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:01:43 - pico-train - INFO - Step 10250 -- 🔄 Training Metrics
2025-08-29 20:01:43 - pico-train - INFO - ├── Loss: 6.6632
2025-08-29 20:01:43 - pico-train - INFO - ├── Learning Rate: 4.58e-05
2025-08-29 20:01:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:01:56 - pico-train - INFO - Step 10275 -- 🔄 Training Metrics
2025-08-29 20:01:56 - pico-train - INFO - ├── Loss: 6.7323
2025-08-29 20:01:56 - pico-train - INFO - ├── Learning Rate: 4.57e-05
2025-08-29 20:01:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:02:08 - pico-train - INFO - Step 10300 -- 🔄 Training Metrics
2025-08-29 20:02:08 - pico-train - INFO - ├── Loss: 6.6958
2025-08-29 20:02:08 - pico-train - INFO - ├── Learning Rate: 4.56e-05
2025-08-29 20:02:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:02:21 - pico-train - INFO - Step 10325 -- 🔄 Training Metrics
2025-08-29 20:02:21 - pico-train - INFO - ├── Loss: 6.6858
2025-08-29 20:02:21 - pico-train - INFO - ├── Learning Rate: 4.55e-05
2025-08-29 20:02:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:02:34 - pico-train - INFO - Step 10350 -- 🔄 Training Metrics
2025-08-29 20:02:34 - pico-train - INFO - ├── Loss: 6.7033
2025-08-29 20:02:34 - pico-train - INFO - ├── Learning Rate: 4.54e-05
2025-08-29 20:02:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:02:46 - pico-train - INFO - Step 10375 -- 🔄 Training Metrics
2025-08-29 20:02:46 - pico-train - INFO - ├── Loss: 6.6819
2025-08-29 20:02:46 - pico-train - INFO - ├── Learning Rate: 4.53e-05
2025-08-29 20:02:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:02:59 - pico-train - INFO - Step 10400 -- 🔄 Training Metrics
2025-08-29 20:02:59 - pico-train - INFO - ├── Loss: 6.7418
2025-08-29 20:02:59 - pico-train - INFO - ├── Learning Rate: 4.52e-05
2025-08-29 20:02:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:03:12 - pico-train - INFO - Step 10425 -- 🔄 Training Metrics
2025-08-29 20:03:12 - pico-train - INFO - ├── Loss: 6.7313
2025-08-29 20:03:12 - pico-train - INFO - ├── Learning Rate: 4.51e-05
2025-08-29 20:03:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:03:24 - pico-train - INFO - Step 10450 -- 🔄 Training Metrics
2025-08-29 20:03:24 - pico-train - INFO - ├── Loss: 6.6978
2025-08-29 20:03:24 - pico-train - INFO - ├── Learning Rate: 4.50e-05
2025-08-29 20:03:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:03:37 - pico-train - INFO - Step 10475 -- 🔄 Training Metrics
2025-08-29 20:03:37 - pico-train - INFO - ├── Loss: 6.6200
2025-08-29 20:03:37 - pico-train - INFO - ├── Learning Rate: 4.49e-05
2025-08-29 20:03:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:03:49 - pico-train - INFO - Step 10500 -- 💾 Saving Checkpoint
2025-08-29 20:06:30 - pico-train - INFO - Step 10500 -- 📊 Evaluation Results
2025-08-29 20:06:30 - pico-train - INFO - └── paloma: 2.548096053613213e+22
2025-08-29 20:06:32 - pico-train - INFO - Step 10500 -- 🔄 Training Metrics
2025-08-29 20:06:32 - pico-train - INFO - ├── Loss: 6.6817
2025-08-29 20:06:32 - pico-train - INFO - ├── Learning Rate: 4.48e-05
2025-08-29 20:06:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:06:32 - pico-train - INFO - Step 10500 -- 📈 Saving Learning Dynamics
2025-08-29 20:06:46 - pico-train - INFO - Step 10525 -- 🔄 Training Metrics
2025-08-29 20:06:46 - pico-train - INFO - ├── Loss: 6.6343
2025-08-29 20:06:46 - pico-train - INFO - ├── Learning Rate: 4.47e-05
2025-08-29 20:06:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:06:59 - pico-train - INFO - Step 10550 -- 🔄 Training Metrics
2025-08-29 20:06:59 - pico-train - INFO - ├── Loss: 6.5885
2025-08-29 20:06:59 - pico-train - INFO - ├── Learning Rate: 4.46e-05
2025-08-29 20:06:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:07:12 - pico-train - INFO - Step 10575 -- 🔄 Training Metrics
2025-08-29 20:07:12 - pico-train - INFO - ├── Loss: 6.6060
2025-08-29 20:07:12 - pico-train - INFO - ├── Learning Rate: 4.45e-05
2025-08-29 20:07:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:07:24 - pico-train - INFO - Step 10600 -- 🔄 Training Metrics
2025-08-29 20:07:24 - pico-train - INFO - ├── Loss: 6.6924
2025-08-29 20:07:24 - pico-train - INFO - ├── Learning Rate: 4.44e-05
2025-08-29 20:07:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:07:37 - pico-train - INFO - Step 10625 -- 🔄 Training Metrics
2025-08-29 20:07:37 - pico-train - INFO - ├── Loss: 6.6004
2025-08-29 20:07:37 - pico-train - INFO - ├── Learning Rate: 4.43e-05
2025-08-29 20:07:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:07:50 - pico-train - INFO - Step 10650 -- 🔄 Training Metrics
2025-08-29 20:07:50 - pico-train - INFO - ├── Loss: 6.6561
2025-08-29 20:07:50 - pico-train - INFO - ├── Learning Rate: 4.42e-05
2025-08-29 20:07:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:08:02 - pico-train - INFO - Step 10675 -- 🔄 Training Metrics
2025-08-29 20:08:02 - pico-train - INFO - ├── Loss: 6.6306
2025-08-29 20:08:02 - pico-train - INFO - ├── Learning Rate: 4.41e-05
2025-08-29 20:08:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:08:15 - pico-train - INFO - Step 10700 -- 🔄 Training Metrics
2025-08-29 20:08:15 - pico-train - INFO - ├── Loss: 6.6713
2025-08-29 20:08:15 - pico-train - INFO - ├── Learning Rate: 4.40e-05
2025-08-29 20:08:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:08:28 - pico-train - INFO - Step 10725 -- 🔄 Training Metrics
2025-08-29 20:08:28 - pico-train - INFO - ├── Loss: 6.5606
2025-08-29 20:08:28 - pico-train - INFO - ├── Learning Rate: 4.39e-05
2025-08-29 20:08:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:08:41 - pico-train - INFO - Step 10750 -- 🔄 Training Metrics
2025-08-29 20:08:41 - pico-train - INFO - ├── Loss: 6.6740
2025-08-29 20:08:41 - pico-train - INFO - ├── Learning Rate: 4.38e-05
2025-08-29 20:08:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:08:54 - pico-train - INFO - Step 10775 -- 🔄 Training Metrics
2025-08-29 20:08:54 - pico-train - INFO - ├── Loss: 6.6308
2025-08-29 20:08:54 - pico-train - INFO - ├── Learning Rate: 4.37e-05
2025-08-29 20:08:54 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:09:06 - pico-train - INFO - Step 10800 -- 🔄 Training Metrics
2025-08-29 20:09:06 - pico-train - INFO - ├── Loss: 6.7103
2025-08-29 20:09:06 - pico-train - INFO - ├── Learning Rate: 4.36e-05
2025-08-29 20:09:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:09:19 - pico-train - INFO - Step 10825 -- 🔄 Training Metrics
2025-08-29 20:09:19 - pico-train - INFO - ├── Loss: 6.6176
2025-08-29 20:09:19 - pico-train - INFO - ├── Learning Rate: 4.35e-05
2025-08-29 20:09:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:09:32 - pico-train - INFO - Step 10850 -- 🔄 Training Metrics
2025-08-29 20:09:32 - pico-train - INFO - ├── Loss: 6.6927
2025-08-29 20:09:32 - pico-train - INFO - ├── Learning Rate: 4.34e-05
2025-08-29 20:09:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:09:44 - pico-train - INFO - Step 10875 -- 🔄 Training Metrics
2025-08-29 20:09:44 - pico-train - INFO - ├── Loss: 6.7048
2025-08-29 20:09:44 - pico-train - INFO - ├── Learning Rate: 4.32e-05
2025-08-29 20:09:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:09:57 - pico-train - INFO - Step 10900 -- 🔄 Training Metrics
2025-08-29 20:09:57 - pico-train - INFO - ├── Loss: 6.6776
2025-08-29 20:09:57 - pico-train - INFO - ├── Learning Rate: 4.31e-05
2025-08-29 20:09:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:10:10 - pico-train - INFO - Step 10925 -- 🔄 Training Metrics
2025-08-29 20:10:10 - pico-train - INFO - ├── Loss: 6.6591
2025-08-29 20:10:10 - pico-train - INFO - ├── Learning Rate: 4.30e-05
2025-08-29 20:10:10 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:10:22 - pico-train - INFO - Step 10950 -- 🔄 Training Metrics
2025-08-29 20:10:22 - pico-train - INFO - ├── Loss: 6.6584
2025-08-29 20:10:22 - pico-train - INFO - ├── Learning Rate: 4.29e-05
2025-08-29 20:10:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:10:35 - pico-train - INFO - Step 10975 -- 🔄 Training Metrics
2025-08-29 20:10:35 - pico-train - INFO - ├── Loss: 6.6623
2025-08-29 20:10:35 - pico-train - INFO - ├── Learning Rate: 4.28e-05
2025-08-29 20:10:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:10:47 - pico-train - INFO - Step 11000 -- 💾 Saving Checkpoint
2025-08-29 20:12:49 - pico-train - INFO - Step 11000 -- 📊 Evaluation Results
2025-08-29 20:12:49 - pico-train - INFO - └── paloma: 3.4687567457405914e+22
2025-08-29 20:12:50 - pico-train - INFO - Step 11000 -- 🔄 Training Metrics
2025-08-29 20:12:50 - pico-train - INFO - ├── Loss: 6.5963
2025-08-29 20:12:50 - pico-train - INFO - ├── Learning Rate: 4.27e-05
2025-08-29 20:12:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:12:50 - pico-train - INFO - Step 11000 -- 📈 Saving Learning Dynamics
2025-08-29 20:13:05 - pico-train - INFO - Step 11025 -- 🔄 Training Metrics
2025-08-29 20:13:05 - pico-train - INFO - ├── Loss: 6.6530
2025-08-29 20:13:05 - pico-train - INFO - ├── Learning Rate: 4.26e-05
2025-08-29 20:13:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:13:17 - pico-train - INFO - Step 11050 -- 🔄 Training Metrics
2025-08-29 20:13:17 - pico-train - INFO - ├── Loss: 6.6117
2025-08-29 20:13:17 - pico-train - INFO - ├── Learning Rate: 4.24e-05
2025-08-29 20:13:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:13:30 - pico-train - INFO - Step 11075 -- 🔄 Training Metrics
2025-08-29 20:13:30 - pico-train - INFO - ├── Loss: 6.6225
2025-08-29 20:13:30 - pico-train - INFO - ├── Learning Rate: 4.23e-05
2025-08-29 20:13:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:13:43 - pico-train - INFO - Step 11100 -- 🔄 Training Metrics
2025-08-29 20:13:43 - pico-train - INFO - ├── Loss: 6.6265
2025-08-29 20:13:43 - pico-train - INFO - ├── Learning Rate: 4.22e-05
2025-08-29 20:13:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:13:55 - pico-train - INFO - Step 11125 -- 🔄 Training Metrics
2025-08-29 20:13:55 - pico-train - INFO - ├── Loss: 6.6243
2025-08-29 20:13:55 - pico-train - INFO - ├── Learning Rate: 4.21e-05
2025-08-29 20:13:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:14:08 - pico-train - INFO - Step 11150 -- 🔄 Training Metrics
2025-08-29 20:14:08 - pico-train - INFO - ├── Loss: 6.6638
2025-08-29 20:14:08 - pico-train - INFO - ├── Learning Rate: 4.20e-05
2025-08-29 20:14:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:14:21 - pico-train - INFO - Step 11175 -- 🔄 Training Metrics
2025-08-29 20:14:21 - pico-train - INFO - ├── Loss: 6.5506
2025-08-29 20:14:21 - pico-train - INFO - ├── Learning Rate: 4.18e-05
2025-08-29 20:14:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:14:33 - pico-train - INFO - Step 11200 -- 🔄 Training Metrics
2025-08-29 20:14:33 - pico-train - INFO - ├── Loss: 6.6098
2025-08-29 20:14:33 - pico-train - INFO - ├── Learning Rate: 4.17e-05
2025-08-29 20:14:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:14:46 - pico-train - INFO - Step 11225 -- 🔄 Training Metrics
2025-08-29 20:14:46 - pico-train - INFO - ├── Loss: 6.6271
2025-08-29 20:14:46 - pico-train - INFO - ├── Learning Rate: 4.16e-05
2025-08-29 20:14:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:15:02 - pico-train - INFO - Step 11250 -- 🔄 Training Metrics
2025-08-29 20:15:02 - pico-train - INFO - ├── Loss: 6.5983
2025-08-29 20:15:02 - pico-train - INFO - ├── Learning Rate: 4.15e-05
2025-08-29 20:15:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:15:14 - pico-train - INFO - Step 11275 -- 🔄 Training Metrics
2025-08-29 20:15:14 - pico-train - INFO - ├── Loss: 6.6346
2025-08-29 20:15:14 - pico-train - INFO - ├── Learning Rate: 4.14e-05
2025-08-29 20:15:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:15:27 - pico-train - INFO - Step 11300 -- 🔄 Training Metrics
2025-08-29 20:15:27 - pico-train - INFO - ├── Loss: 6.5797
2025-08-29 20:15:27 - pico-train - INFO - ├── Learning Rate: 4.12e-05
2025-08-29 20:15:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:15:40 - pico-train - INFO - Step 11325 -- 🔄 Training Metrics
2025-08-29 20:15:40 - pico-train - INFO - ├── Loss: 6.6250
2025-08-29 20:15:40 - pico-train - INFO - ├── Learning Rate: 4.11e-05
2025-08-29 20:15:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:15:52 - pico-train - INFO - Step 11350 -- 🔄 Training Metrics
2025-08-29 20:15:52 - pico-train - INFO - ├── Loss: 6.6715
2025-08-29 20:15:52 - pico-train - INFO - ├── Learning Rate: 4.10e-05
2025-08-29 20:15:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:16:05 - pico-train - INFO - Step 11375 -- 🔄 Training Metrics
2025-08-29 20:16:05 - pico-train - INFO - ├── Loss: 6.5902
2025-08-29 20:16:05 - pico-train - INFO - ├── Learning Rate: 4.09e-05
2025-08-29 20:16:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:16:18 - pico-train - INFO - Step 11400 -- 🔄 Training Metrics
2025-08-29 20:16:18 - pico-train - INFO - ├── Loss: 6.6557
2025-08-29 20:16:18 - pico-train - INFO - ├── Learning Rate: 4.07e-05
2025-08-29 20:16:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:16:30 - pico-train - INFO - Step 11425 -- 🔄 Training Metrics
2025-08-29 20:16:30 - pico-train - INFO - ├── Loss: 6.6095
2025-08-29 20:16:30 - pico-train - INFO - ├── Learning Rate: 4.06e-05
2025-08-29 20:16:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:16:43 - pico-train - INFO - Step 11450 -- 🔄 Training Metrics
2025-08-29 20:16:43 - pico-train - INFO - ├── Loss: 6.5971
2025-08-29 20:16:43 - pico-train - INFO - ├── Learning Rate: 4.05e-05
2025-08-29 20:16:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:16:56 - pico-train - INFO - Step 11475 -- 🔄 Training Metrics
2025-08-29 20:16:56 - pico-train - INFO - ├── Loss: 6.5727
2025-08-29 20:16:56 - pico-train - INFO - ├── Learning Rate: 4.03e-05
2025-08-29 20:16:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:17:08 - pico-train - INFO - Step 11500 -- 💾 Saving Checkpoint
2025-08-29 20:19:11 - pico-train - INFO - Step 11500 -- 📊 Evaluation Results
2025-08-29 20:19:11 - pico-train - INFO - └── paloma: 6.754268354433947e+22
2025-08-29 20:19:12 - pico-train - INFO - Step 11500 -- 🔄 Training Metrics
2025-08-29 20:19:12 - pico-train - INFO - ├── Loss: 6.5578
2025-08-29 20:19:12 - pico-train - INFO - ├── Learning Rate: 4.02e-05
2025-08-29 20:19:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:19:12 - pico-train - INFO - Step 11500 -- 📈 Saving Learning Dynamics
2025-08-29 20:19:27 - pico-train - INFO - Step 11525 -- 🔄 Training Metrics
2025-08-29 20:19:27 - pico-train - INFO - ├── Loss: 6.6098
2025-08-29 20:19:27 - pico-train - INFO - ├── Learning Rate: 4.01e-05
2025-08-29 20:19:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:19:40 - pico-train - INFO - Step 11550 -- 🔄 Training Metrics
2025-08-29 20:19:40 - pico-train - INFO - ├── Loss: 6.5940
2025-08-29 20:19:40 - pico-train - INFO - ├── Learning Rate: 4.00e-05
2025-08-29 20:19:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:19:52 - pico-train - INFO - Step 11575 -- 🔄 Training Metrics
2025-08-29 20:19:52 - pico-train - INFO - ├── Loss: 6.6258
2025-08-29 20:19:52 - pico-train - INFO - ├── Learning Rate: 3.98e-05
2025-08-29 20:19:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:20:05 - pico-train - INFO - Step 11600 -- 🔄 Training Metrics
2025-08-29 20:20:05 - pico-train - INFO - ├── Loss: 6.5818
2025-08-29 20:20:05 - pico-train - INFO - ├── Learning Rate: 3.97e-05
2025-08-29 20:20:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:20:18 - pico-train - INFO - Step 11625 -- 🔄 Training Metrics
2025-08-29 20:20:18 - pico-train - INFO - ├── Loss: 6.5947
2025-08-29 20:20:18 - pico-train - INFO - ├── Learning Rate: 3.96e-05
2025-08-29 20:20:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:20:30 - pico-train - INFO - Step 11650 -- 🔄 Training Metrics
2025-08-29 20:20:30 - pico-train - INFO - ├── Loss: 6.5353
2025-08-29 20:20:30 - pico-train - INFO - ├── Learning Rate: 3.94e-05
2025-08-29 20:20:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:20:43 - pico-train - INFO - Step 11675 -- 🔄 Training Metrics
2025-08-29 20:20:43 - pico-train - INFO - ├── Loss: 6.6060
2025-08-29 20:20:43 - pico-train - INFO - ├── Learning Rate: 3.93e-05
2025-08-29 20:20:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:20:56 - pico-train - INFO - Step 11700 -- 🔄 Training Metrics
2025-08-29 20:20:56 - pico-train - INFO - ├── Loss: 6.4960
2025-08-29 20:20:56 - pico-train - INFO - ├── Learning Rate: 3.92e-05
2025-08-29 20:20:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:21:08 - pico-train - INFO - Step 11725 -- 🔄 Training Metrics
2025-08-29 20:21:08 - pico-train - INFO - ├── Loss: 6.5251
2025-08-29 20:21:08 - pico-train - INFO - ├── Learning Rate: 3.90e-05
2025-08-29 20:21:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:21:24 - pico-train - INFO - Step 11750 -- 🔄 Training Metrics
2025-08-29 20:21:24 - pico-train - INFO - ├── Loss: 6.6466
2025-08-29 20:21:24 - pico-train - INFO - ├── Learning Rate: 3.89e-05
2025-08-29 20:21:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:21:37 - pico-train - INFO - Step 11775 -- 🔄 Training Metrics
2025-08-29 20:21:37 - pico-train - INFO - ├── Loss: 6.6170
2025-08-29 20:21:37 - pico-train - INFO - ├── Learning Rate: 3.88e-05
2025-08-29 20:21:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:21:50 - pico-train - INFO - Step 11800 -- 🔄 Training Metrics
2025-08-29 20:21:50 - pico-train - INFO - ├── Loss: 6.5495
2025-08-29 20:21:50 - pico-train - INFO - ├── Learning Rate: 3.86e-05
2025-08-29 20:21:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:22:02 - pico-train - INFO - Step 11825 -- 🔄 Training Metrics
2025-08-29 20:22:02 - pico-train - INFO - ├── Loss: 6.5836
2025-08-29 20:22:02 - pico-train - INFO - ├── Learning Rate: 3.85e-05
2025-08-29 20:22:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:22:15 - pico-train - INFO - Step 11850 -- 🔄 Training Metrics
2025-08-29 20:22:15 - pico-train - INFO - ├── Loss: 6.6164
2025-08-29 20:22:15 - pico-train - INFO - ├── Learning Rate: 3.83e-05
2025-08-29 20:22:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:22:27 - pico-train - INFO - Step 11875 -- 🔄 Training Metrics
2025-08-29 20:22:27 - pico-train - INFO - ├── Loss: 6.6049
2025-08-29 20:22:27 - pico-train - INFO - ├── Learning Rate: 3.82e-05
2025-08-29 20:22:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:22:40 - pico-train - INFO - Step 11900 -- 🔄 Training Metrics
2025-08-29 20:22:40 - pico-train - INFO - ├── Loss: 6.5526
2025-08-29 20:22:40 - pico-train - INFO - ├── Learning Rate: 3.81e-05
2025-08-29 20:22:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:22:53 - pico-train - INFO - Step 11925 -- 🔄 Training Metrics
2025-08-29 20:22:53 - pico-train - INFO - ├── Loss: 6.4726
2025-08-29 20:22:53 - pico-train - INFO - ├── Learning Rate: 3.79e-05
2025-08-29 20:22:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:23:05 - pico-train - INFO - Step 11950 -- 🔄 Training Metrics
2025-08-29 20:23:05 - pico-train - INFO - ├── Loss: 6.6102
2025-08-29 20:23:05 - pico-train - INFO - ├── Learning Rate: 3.78e-05
2025-08-29 20:23:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:23:18 - pico-train - INFO - Step 11975 -- 🔄 Training Metrics
2025-08-29 20:23:18 - pico-train - INFO - ├── Loss: 6.5932
2025-08-29 20:23:18 - pico-train - INFO - ├── Learning Rate: 3.76e-05
2025-08-29 20:23:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:23:30 - pico-train - INFO - Step 12000 -- 💾 Saving Checkpoint
2025-08-29 20:25:41 - pico-train - INFO - Step 12000 -- 📊 Evaluation Results
2025-08-29 20:25:41 - pico-train - INFO - └── paloma: 9.960108515700423e+22
2025-08-29 20:25:42 - pico-train - INFO - Step 12000 -- 🔄 Training Metrics
2025-08-29 20:25:42 - pico-train - INFO - ├── Loss: 6.6366
2025-08-29 20:25:42 - pico-train - INFO - ├── Learning Rate: 3.75e-05
2025-08-29 20:25:42 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:25:42 - pico-train - INFO - Step 12000 -- 📈 Saving Learning Dynamics
2025-08-29 20:25:57 - pico-train - INFO - Step 12025 -- 🔄 Training Metrics
2025-08-29 20:25:57 - pico-train - INFO - ├── Loss: 6.5240
2025-08-29 20:25:57 - pico-train - INFO - ├── Learning Rate: 3.74e-05
2025-08-29 20:25:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:26:09 - pico-train - INFO - Step 12050 -- 🔄 Training Metrics
2025-08-29 20:26:09 - pico-train - INFO - ├── Loss: 6.5622
2025-08-29 20:26:09 - pico-train - INFO - ├── Learning Rate: 3.72e-05
2025-08-29 20:26:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:26:22 - pico-train - INFO - Step 12075 -- 🔄 Training Metrics
2025-08-29 20:26:22 - pico-train - INFO - ├── Loss: 6.4987
2025-08-29 20:26:22 - pico-train - INFO - ├── Learning Rate: 3.71e-05
2025-08-29 20:26:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:26:35 - pico-train - INFO - Step 12100 -- 🔄 Training Metrics
2025-08-29 20:26:35 - pico-train - INFO - ├── Loss: 6.5377
2025-08-29 20:26:35 - pico-train - INFO - ├── Learning Rate: 3.69e-05
2025-08-29 20:26:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:26:48 - pico-train - INFO - Step 12125 -- 🔄 Training Metrics
2025-08-29 20:26:48 - pico-train - INFO - ├── Loss: 6.5937
2025-08-29 20:26:48 - pico-train - INFO - ├── Learning Rate: 3.68e-05
2025-08-29 20:26:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:27:00 - pico-train - INFO - Step 12150 -- 🔄 Training Metrics
2025-08-29 20:27:00 - pico-train - INFO - ├── Loss: 6.5536
2025-08-29 20:27:00 - pico-train - INFO - ├── Learning Rate: 3.66e-05
2025-08-29 20:27:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:27:13 - pico-train - INFO - Step 12175 -- 🔄 Training Metrics
2025-08-29 20:27:13 - pico-train - INFO - ├── Loss: 6.5493
2025-08-29 20:27:13 - pico-train - INFO - ├── Learning Rate: 3.65e-05
2025-08-29 20:27:13 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:27:26 - pico-train - INFO - Step 12200 -- 🔄 Training Metrics
2025-08-29 20:27:26 - pico-train - INFO - ├── Loss: 6.5813
2025-08-29 20:27:26 - pico-train - INFO - ├── Learning Rate: 3.63e-05
2025-08-29 20:27:26 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:27:38 - pico-train - INFO - Step 12225 -- 🔄 Training Metrics
2025-08-29 20:27:38 - pico-train - INFO - ├── Loss: 6.5618
2025-08-29 20:27:38 - pico-train - INFO - ├── Learning Rate: 3.62e-05
2025-08-29 20:27:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:27:52 - pico-train - INFO - Step 12250 -- 🔄 Training Metrics
2025-08-29 20:27:52 - pico-train - INFO - ├── Loss: 6.5158
2025-08-29 20:27:52 - pico-train - INFO - ├── Learning Rate: 3.61e-05
2025-08-29 20:27:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:28:05 - pico-train - INFO - Step 12275 -- 🔄 Training Metrics
2025-08-29 20:28:05 - pico-train - INFO - ├── Loss: 6.6372
2025-08-29 20:28:05 - pico-train - INFO - ├── Learning Rate: 3.59e-05
2025-08-29 20:28:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:28:17 - pico-train - INFO - Step 12300 -- 🔄 Training Metrics
2025-08-29 20:28:17 - pico-train - INFO - ├── Loss: 6.6048
2025-08-29 20:28:17 - pico-train - INFO - ├── Learning Rate: 3.58e-05
2025-08-29 20:28:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:28:30 - pico-train - INFO - Step 12325 -- 🔄 Training Metrics
2025-08-29 20:28:30 - pico-train - INFO - ├── Loss: 6.4816
2025-08-29 20:28:30 - pico-train - INFO - ├── Learning Rate: 3.56e-05
2025-08-29 20:28:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:28:43 - pico-train - INFO - Step 12350 -- 🔄 Training Metrics
2025-08-29 20:28:43 - pico-train - INFO - ├── Loss: 6.5455
2025-08-29 20:28:43 - pico-train - INFO - ├── Learning Rate: 3.55e-05
2025-08-29 20:28:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:28:55 - pico-train - INFO - Step 12375 -- 🔄 Training Metrics
2025-08-29 20:28:55 - pico-train - INFO - ├── Loss: 6.6032
2025-08-29 20:28:55 - pico-train - INFO - ├── Learning Rate: 3.53e-05
2025-08-29 20:28:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:29:08 - pico-train - INFO - Step 12400 -- 🔄 Training Metrics
2025-08-29 20:29:08 - pico-train - INFO - ├── Loss: 6.5575
2025-08-29 20:29:08 - pico-train - INFO - ├── Learning Rate: 3.52e-05
2025-08-29 20:29:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:29:21 - pico-train - INFO - Step 12425 -- 🔄 Training Metrics
2025-08-29 20:29:21 - pico-train - INFO - ├── Loss: 6.5083
2025-08-29 20:29:21 - pico-train - INFO - ├── Learning Rate: 3.50e-05
2025-08-29 20:29:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:29:33 - pico-train - INFO - Step 12450 -- 🔄 Training Metrics
2025-08-29 20:29:33 - pico-train - INFO - ├── Loss: 6.5872
2025-08-29 20:29:33 - pico-train - INFO - ├── Learning Rate: 3.49e-05
2025-08-29 20:29:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:29:46 - pico-train - INFO - Step 12475 -- 🔄 Training Metrics
2025-08-29 20:29:46 - pico-train - INFO - ├── Loss: 6.5212
2025-08-29 20:29:46 - pico-train - INFO - ├── Learning Rate: 3.47e-05
2025-08-29 20:29:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:29:58 - pico-train - INFO - Step 12500 -- 💾 Saving Checkpoint
2025-08-29 20:32:08 - pico-train - INFO - Step 12500 -- 📊 Evaluation Results
2025-08-29 20:32:08 - pico-train - INFO - └── paloma: 1.6834857831717288e+23
2025-08-29 20:32:09 - pico-train - INFO - Step 12500 -- 🔄 Training Metrics
2025-08-29 20:32:09 - pico-train - INFO - ├── Loss: 6.5582
2025-08-29 20:32:09 - pico-train - INFO - ├── Learning Rate: 3.46e-05
2025-08-29 20:32:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:32:09 - pico-train - INFO - Step 12500 -- 📈 Saving Learning Dynamics
2025-08-29 20:32:24 - pico-train - INFO - Step 12525 -- 🔄 Training Metrics
2025-08-29 20:32:24 - pico-train - INFO - ├── Loss: 6.5608
2025-08-29 20:32:24 - pico-train - INFO - ├── Learning Rate: 3.44e-05
2025-08-29 20:32:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:32:37 - pico-train - INFO - Step 12550 -- 🔄 Training Metrics
2025-08-29 20:32:37 - pico-train - INFO - ├── Loss: 6.5180
2025-08-29 20:32:37 - pico-train - INFO - ├── Learning Rate: 3.43e-05
2025-08-29 20:32:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:32:49 - pico-train - INFO - Step 12575 -- 🔄 Training Metrics
2025-08-29 20:32:49 - pico-train - INFO - ├── Loss: 6.5553
2025-08-29 20:32:49 - pico-train - INFO - ├── Learning Rate: 3.41e-05
2025-08-29 20:32:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:33:02 - pico-train - INFO - Step 12600 -- 🔄 Training Metrics
2025-08-29 20:33:02 - pico-train - INFO - ├── Loss: 6.5012
2025-08-29 20:33:02 - pico-train - INFO - ├── Learning Rate: 3.40e-05
2025-08-29 20:33:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:33:14 - pico-train - INFO - Step 12625 -- 🔄 Training Metrics
2025-08-29 20:33:14 - pico-train - INFO - ├── Loss: 6.4864
2025-08-29 20:33:14 - pico-train - INFO - ├── Learning Rate: 3.38e-05
2025-08-29 20:33:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:33:27 - pico-train - INFO - Step 12650 -- 🔄 Training Metrics
2025-08-29 20:33:27 - pico-train - INFO - ├── Loss: 6.5061
2025-08-29 20:33:27 - pico-train - INFO - ├── Learning Rate: 3.37e-05
2025-08-29 20:33:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:33:40 - pico-train - INFO - Step 12675 -- 🔄 Training Metrics
2025-08-29 20:33:40 - pico-train - INFO - ├── Loss: 6.5794
2025-08-29 20:33:40 - pico-train - INFO - ├── Learning Rate: 3.35e-05
2025-08-29 20:33:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:33:52 - pico-train - INFO - Step 12700 -- 🔄 Training Metrics
2025-08-29 20:33:52 - pico-train - INFO - ├── Loss: 6.5345
2025-08-29 20:33:52 - pico-train - INFO - ├── Learning Rate: 3.33e-05
2025-08-29 20:33:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:34:05 - pico-train - INFO - Step 12725 -- 🔄 Training Metrics
2025-08-29 20:34:05 - pico-train - INFO - ├── Loss: 6.4304
2025-08-29 20:34:05 - pico-train - INFO - ├── Learning Rate: 3.32e-05
2025-08-29 20:34:05 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-29 20:34:18 - pico-train - INFO - Step 12750 -- ๐Ÿ”„ Training Metrics
2025-08-29 20:34:18 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.5336
2025-08-29 20:34:18 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 3.30e-05
2025-08-29 20:34:18 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-29 20:34:31 - pico-train - INFO - Step 12775 -- ๐Ÿ”„ Training Metrics
2025-08-29 20:34:31 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.4534
2025-08-29 20:34:31 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 3.29e-05
2025-08-29 20:34:31 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-29 20:34:43 - pico-train - INFO - Step 12800 -- ๐Ÿ”„ Training Metrics
2025-08-29 20:34:43 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.5394
2025-08-29 20:34:43 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 3.27e-05
2025-08-29 20:34:43 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-29 20:34:56 - pico-train - INFO - Step 12825 -- ๐Ÿ”„ Training Metrics
2025-08-29 20:34:56 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.5892
2025-08-29 20:34:56 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 3.26e-05
2025-08-29 20:34:56 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-29 20:35:09 - pico-train - INFO - Step 12850 -- ๐Ÿ”„ Training Metrics
2025-08-29 20:35:09 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.5827
2025-08-29 20:35:09 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 3.24e-05
2025-08-29 20:35:09 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-29 20:35:22 - pico-train - INFO - Step 12875 -- ๐Ÿ”„ Training Metrics
2025-08-29 20:35:22 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.4602
2025-08-29 20:35:22 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 3.23e-05
2025-08-29 20:35:22 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-29 20:35:35 - pico-train - INFO - Step 12900 -- ๐Ÿ”„ Training Metrics
2025-08-29 20:35:35 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.4629
2025-08-29 20:35:35 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 3.21e-05
2025-08-29 20:35:35 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-29 20:35:47 - pico-train - INFO - Step 12925 -- ๐Ÿ”„ Training Metrics
2025-08-29 20:35:47 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.5758
2025-08-29 20:35:47 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 3.19e-05
2025-08-29 20:35:47 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-29 20:36:00 - pico-train - INFO - Step 12950 -- ๐Ÿ”„ Training Metrics
2025-08-29 20:36:00 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.5234
2025-08-29 20:36:00 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 3.18e-05
2025-08-29 20:36:00 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-29 20:36:12 - pico-train - INFO - Step 12975 -- ๐Ÿ”„ Training Metrics
2025-08-29 20:36:13 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.5186
2025-08-29 20:36:13 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 3.16e-05
2025-08-29 20:36:13 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-29 20:36:25 - pico-train - INFO - Step 13000 -- 💾 Saving Checkpoint
2025-08-29 20:38:32 - pico-train - INFO - Step 13000 -- 📊 Evaluation Results
2025-08-29 20:38:32 - pico-train - INFO - └── paloma: 2.530184410087534e+23
2025-08-29 20:38:34 - pico-train - INFO - Step 13000 -- 🔄 Training Metrics
2025-08-29 20:38:34 - pico-train - INFO - ├── Loss: 6.5317
2025-08-29 20:38:34 - pico-train - INFO - ├── Learning Rate: 3.15e-05
2025-08-29 20:38:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:38:34 - pico-train - INFO - Step 13000 -- 📈 Saving Learning Dynamics
2025-08-29 20:38:49 - pico-train - INFO - Step 13025 -- 🔄 Training Metrics
2025-08-29 20:38:49 - pico-train - INFO - ├── Loss: 6.5595
2025-08-29 20:38:49 - pico-train - INFO - ├── Learning Rate: 3.13e-05
2025-08-29 20:38:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:39:01 - pico-train - INFO - Step 13050 -- 🔄 Training Metrics
2025-08-29 20:39:01 - pico-train - INFO - ├── Loss: 6.5374
2025-08-29 20:39:01 - pico-train - INFO - ├── Learning Rate: 3.12e-05
2025-08-29 20:39:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:39:14 - pico-train - INFO - Step 13075 -- 🔄 Training Metrics
2025-08-29 20:39:14 - pico-train - INFO - ├── Loss: 6.4943
2025-08-29 20:39:14 - pico-train - INFO - ├── Learning Rate: 3.10e-05
2025-08-29 20:39:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:39:26 - pico-train - INFO - Step 13100 -- 🔄 Training Metrics
2025-08-29 20:39:26 - pico-train - INFO - ├── Loss: 6.4770
2025-08-29 20:39:26 - pico-train - INFO - ├── Learning Rate: 3.08e-05
2025-08-29 20:39:26 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:39:39 - pico-train - INFO - Step 13125 -- 🔄 Training Metrics
2025-08-29 20:39:39 - pico-train - INFO - ├── Loss: 6.6083
2025-08-29 20:39:39 - pico-train - INFO - ├── Learning Rate: 3.07e-05
2025-08-29 20:39:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:39:52 - pico-train - INFO - Step 13150 -- 🔄 Training Metrics
2025-08-29 20:39:52 - pico-train - INFO - ├── Loss: 6.4500
2025-08-29 20:39:52 - pico-train - INFO - ├── Learning Rate: 3.05e-05
2025-08-29 20:39:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:40:04 - pico-train - INFO - Step 13175 -- 🔄 Training Metrics
2025-08-29 20:40:04 - pico-train - INFO - ├── Loss: 6.5230
2025-08-29 20:40:04 - pico-train - INFO - ├── Learning Rate: 3.04e-05
2025-08-29 20:40:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:40:17 - pico-train - INFO - Step 13200 -- 🔄 Training Metrics
2025-08-29 20:40:17 - pico-train - INFO - ├── Loss: 6.5150
2025-08-29 20:40:17 - pico-train - INFO - ├── Learning Rate: 3.02e-05
2025-08-29 20:40:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:40:29 - pico-train - INFO - Step 13225 -- 🔄 Training Metrics
2025-08-29 20:40:29 - pico-train - INFO - ├── Loss: 6.4379
2025-08-29 20:40:29 - pico-train - INFO - ├── Learning Rate: 3.00e-05
2025-08-29 20:40:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:40:43 - pico-train - INFO - Step 13250 -- 🔄 Training Metrics
2025-08-29 20:40:43 - pico-train - INFO - ├── Loss: 6.5261
2025-08-29 20:40:43 - pico-train - INFO - ├── Learning Rate: 2.99e-05
2025-08-29 20:40:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:40:56 - pico-train - INFO - Step 13275 -- 🔄 Training Metrics
2025-08-29 20:40:56 - pico-train - INFO - ├── Loss: 6.4582
2025-08-29 20:40:56 - pico-train - INFO - ├── Learning Rate: 2.97e-05
2025-08-29 20:40:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:41:08 - pico-train - INFO - Step 13300 -- 🔄 Training Metrics
2025-08-29 20:41:08 - pico-train - INFO - ├── Loss: 6.4420
2025-08-29 20:41:08 - pico-train - INFO - ├── Learning Rate: 2.96e-05
2025-08-29 20:41:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:41:21 - pico-train - INFO - Step 13325 -- 🔄 Training Metrics
2025-08-29 20:41:21 - pico-train - INFO - ├── Loss: 6.5268
2025-08-29 20:41:21 - pico-train - INFO - ├── Learning Rate: 2.94e-05
2025-08-29 20:41:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:41:34 - pico-train - INFO - Step 13350 -- 🔄 Training Metrics
2025-08-29 20:41:34 - pico-train - INFO - ├── Loss: 6.4300
2025-08-29 20:41:34 - pico-train - INFO - ├── Learning Rate: 2.92e-05
2025-08-29 20:41:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:41:46 - pico-train - INFO - Step 13375 -- 🔄 Training Metrics
2025-08-29 20:41:46 - pico-train - INFO - ├── Loss: 6.4366
2025-08-29 20:41:46 - pico-train - INFO - ├── Learning Rate: 2.91e-05
2025-08-29 20:41:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:41:59 - pico-train - INFO - Step 13400 -- 🔄 Training Metrics
2025-08-29 20:41:59 - pico-train - INFO - ├── Loss: 6.5664
2025-08-29 20:41:59 - pico-train - INFO - ├── Learning Rate: 2.89e-05
2025-08-29 20:41:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:42:11 - pico-train - INFO - Step 13425 -- 🔄 Training Metrics
2025-08-29 20:42:11 - pico-train - INFO - ├── Loss: 6.5522
2025-08-29 20:42:11 - pico-train - INFO - ├── Learning Rate: 2.87e-05
2025-08-29 20:42:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:42:24 - pico-train - INFO - Step 13450 -- 🔄 Training Metrics
2025-08-29 20:42:24 - pico-train - INFO - ├── Loss: 6.4326
2025-08-29 20:42:24 - pico-train - INFO - ├── Learning Rate: 2.86e-05
2025-08-29 20:42:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:42:36 - pico-train - INFO - Step 13475 -- 🔄 Training Metrics
2025-08-29 20:42:36 - pico-train - INFO - ├── Loss: 6.4898
2025-08-29 20:42:36 - pico-train - INFO - ├── Learning Rate: 2.84e-05
2025-08-29 20:42:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:42:49 - pico-train - INFO - Step 13500 -- 💾 Saving Checkpoint
2025-08-29 20:44:58 - pico-train - INFO - Step 13500 -- 📊 Evaluation Results
2025-08-29 20:44:58 - pico-train - INFO - └── paloma: 3.495795801566584e+23
2025-08-29 20:45:00 - pico-train - INFO - Step 13500 -- 🔄 Training Metrics
2025-08-29 20:45:00 - pico-train - INFO - ├── Loss: 6.4517
2025-08-29 20:45:00 - pico-train - INFO - ├── Learning Rate: 2.83e-05
2025-08-29 20:45:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:45:00 - pico-train - INFO - Step 13500 -- 📈 Saving Learning Dynamics
2025-08-29 20:45:16 - pico-train - INFO - Step 13525 -- 🔄 Training Metrics
2025-08-29 20:45:16 - pico-train - INFO - ├── Loss: 6.4512
2025-08-29 20:45:16 - pico-train - INFO - ├── Learning Rate: 2.81e-05
2025-08-29 20:45:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:45:28 - pico-train - INFO - Step 13550 -- 🔄 Training Metrics
2025-08-29 20:45:28 - pico-train - INFO - ├── Loss: 6.4535
2025-08-29 20:45:28 - pico-train - INFO - ├── Learning Rate: 2.79e-05
2025-08-29 20:45:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:45:41 - pico-train - INFO - Step 13575 -- 🔄 Training Metrics
2025-08-29 20:45:41 - pico-train - INFO - ├── Loss: 6.5099
2025-08-29 20:45:41 - pico-train - INFO - ├── Learning Rate: 2.78e-05
2025-08-29 20:45:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:45:53 - pico-train - INFO - Step 13600 -- 🔄 Training Metrics
2025-08-29 20:45:53 - pico-train - INFO - ├── Loss: 6.4340
2025-08-29 20:45:53 - pico-train - INFO - ├── Learning Rate: 2.76e-05
2025-08-29 20:45:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:46:06 - pico-train - INFO - Step 13625 -- 🔄 Training Metrics
2025-08-29 20:46:06 - pico-train - INFO - ├── Loss: 6.4295
2025-08-29 20:46:06 - pico-train - INFO - ├── Learning Rate: 2.75e-05
2025-08-29 20:46:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:46:18 - pico-train - INFO - Step 13650 -- 🔄 Training Metrics
2025-08-29 20:46:18 - pico-train - INFO - ├── Loss: 6.3784
2025-08-29 20:46:18 - pico-train - INFO - ├── Learning Rate: 2.73e-05
2025-08-29 20:46:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:46:31 - pico-train - INFO - Step 13675 -- 🔄 Training Metrics
2025-08-29 20:46:31 - pico-train - INFO - ├── Loss: 6.5333
2025-08-29 20:46:31 - pico-train - INFO - ├── Learning Rate: 2.71e-05
2025-08-29 20:46:31 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:46:44 - pico-train - INFO - Step 13700 -- 🔄 Training Metrics
2025-08-29 20:46:44 - pico-train - INFO - ├── Loss: 6.5142
2025-08-29 20:46:44 - pico-train - INFO - ├── Learning Rate: 2.70e-05
2025-08-29 20:46:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:46:56 - pico-train - INFO - Step 13725 -- 🔄 Training Metrics
2025-08-29 20:46:56 - pico-train - INFO - ├── Loss: 6.4840
2025-08-29 20:46:56 - pico-train - INFO - ├── Learning Rate: 2.68e-05
2025-08-29 20:46:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:47:10 - pico-train - INFO - Step 13750 -- 🔄 Training Metrics
2025-08-29 20:47:10 - pico-train - INFO - ├── Loss: 6.4268
2025-08-29 20:47:10 - pico-train - INFO - ├── Learning Rate: 2.66e-05
2025-08-29 20:47:10 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:47:22 - pico-train - INFO - Step 13775 -- 🔄 Training Metrics
2025-08-29 20:47:22 - pico-train - INFO - ├── Loss: 6.4121
2025-08-29 20:47:22 - pico-train - INFO - ├── Learning Rate: 2.65e-05
2025-08-29 20:47:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:47:35 - pico-train - INFO - Step 13800 -- 🔄 Training Metrics
2025-08-29 20:47:35 - pico-train - INFO - ├── Loss: 6.4483
2025-08-29 20:47:35 - pico-train - INFO - ├── Learning Rate: 2.63e-05
2025-08-29 20:47:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:47:48 - pico-train - INFO - Step 13825 -- 🔄 Training Metrics
2025-08-29 20:47:48 - pico-train - INFO - ├── Loss: 6.4498
2025-08-29 20:47:48 - pico-train - INFO - ├── Learning Rate: 2.61e-05
2025-08-29 20:47:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:48:00 - pico-train - INFO - Step 13850 -- 🔄 Training Metrics
2025-08-29 20:48:00 - pico-train - INFO - ├── Loss: 6.4725
2025-08-29 20:48:00 - pico-train - INFO - ├── Learning Rate: 2.60e-05
2025-08-29 20:48:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:48:13 - pico-train - INFO - Step 13875 -- 🔄 Training Metrics
2025-08-29 20:48:13 - pico-train - INFO - ├── Loss: 6.4732
2025-08-29 20:48:13 - pico-train - INFO - ├── Learning Rate: 2.58e-05
2025-08-29 20:48:13 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:48:26 - pico-train - INFO - Step 13900 -- 🔄 Training Metrics
2025-08-29 20:48:26 - pico-train - INFO - ├── Loss: 6.3980
2025-08-29 20:48:26 - pico-train - INFO - ├── Learning Rate: 2.57e-05
2025-08-29 20:48:26 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:48:38 - pico-train - INFO - Step 13925 -- 🔄 Training Metrics
2025-08-29 20:48:38 - pico-train - INFO - ├── Loss: 6.4906
2025-08-29 20:48:38 - pico-train - INFO - ├── Learning Rate: 2.55e-05
2025-08-29 20:48:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:48:51 - pico-train - INFO - Step 13950 -- 🔄 Training Metrics
2025-08-29 20:48:51 - pico-train - INFO - ├── Loss: 6.4096
2025-08-29 20:48:51 - pico-train - INFO - ├── Learning Rate: 2.53e-05
2025-08-29 20:48:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:49:04 - pico-train - INFO - Step 13975 -- 🔄 Training Metrics
2025-08-29 20:49:04 - pico-train - INFO - ├── Loss: 6.4280
2025-08-29 20:49:04 - pico-train - INFO - ├── Learning Rate: 2.52e-05
2025-08-29 20:49:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:49:16 - pico-train - INFO - Step 14000 -- 💾 Saving Checkpoint
2025-08-29 20:51:14 - pico-train - INFO - Step 14000 -- 📊 Evaluation Results
2025-08-29 20:51:14 - pico-train - INFO - └── paloma: 4.7779579346961524e+23
2025-08-29 20:51:17 - pico-train - INFO - Step 14000 -- 🔄 Training Metrics
2025-08-29 20:51:17 - pico-train - INFO - ├── Loss: 6.3189
2025-08-29 20:51:17 - pico-train - INFO - ├── Learning Rate: 2.50e-05
2025-08-29 20:51:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:51:17 - pico-train - INFO - Step 14000 -- 📈 Saving Learning Dynamics
2025-08-29 20:51:33 - pico-train - INFO - Step 14025 -- 🔄 Training Metrics
2025-08-29 20:51:33 - pico-train - INFO - ├── Loss: 6.4994
2025-08-29 20:51:33 - pico-train - INFO - ├── Learning Rate: 2.48e-05
2025-08-29 20:51:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:51:45 - pico-train - INFO - Step 14050 -- 🔄 Training Metrics
2025-08-29 20:51:45 - pico-train - INFO - ├── Loss: 6.4802
2025-08-29 20:51:45 - pico-train - INFO - ├── Learning Rate: 2.47e-05
2025-08-29 20:51:45 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:51:58 - pico-train - INFO - Step 14075 -- 🔄 Training Metrics
2025-08-29 20:51:58 - pico-train - INFO - ├── Loss: 6.5178
2025-08-29 20:51:58 - pico-train - INFO - ├── Learning Rate: 2.45e-05
2025-08-29 20:51:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:52:11 - pico-train - INFO - Step 14100 -- 🔄 Training Metrics
2025-08-29 20:52:11 - pico-train - INFO - ├── Loss: 6.4472
2025-08-29 20:52:11 - pico-train - INFO - ├── Learning Rate: 2.43e-05
2025-08-29 20:52:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:52:23 - pico-train - INFO - Step 14125 -- 🔄 Training Metrics
2025-08-29 20:52:23 - pico-train - INFO - ├── Loss: 6.5737
2025-08-29 20:52:23 - pico-train - INFO - ├── Learning Rate: 2.42e-05
2025-08-29 20:52:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:52:36 - pico-train - INFO - Step 14150 -- 🔄 Training Metrics
2025-08-29 20:52:36 - pico-train - INFO - ├── Loss: 6.4612
2025-08-29 20:52:36 - pico-train - INFO - ├── Learning Rate: 2.40e-05
2025-08-29 20:52:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:52:48 - pico-train - INFO - Step 14175 -- 🔄 Training Metrics
2025-08-29 20:52:48 - pico-train - INFO - ├── Loss: 6.4352
2025-08-29 20:52:48 - pico-train - INFO - ├── Learning Rate: 2.39e-05
2025-08-29 20:52:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:53:01 - pico-train - INFO - Step 14200 -- 🔄 Training Metrics
2025-08-29 20:53:01 - pico-train - INFO - ├── Loss: 6.5225
2025-08-29 20:53:01 - pico-train - INFO - ├── Learning Rate: 2.37e-05
2025-08-29 20:53:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:53:13 - pico-train - INFO - Step 14225 -- 🔄 Training Metrics
2025-08-29 20:53:13 - pico-train - INFO - ├── Loss: 6.4075
2025-08-29 20:53:13 - pico-train - INFO - ├── Learning Rate: 2.35e-05
2025-08-29 20:53:13 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:53:27 - pico-train - INFO - Step 14250 -- 🔄 Training Metrics
2025-08-29 20:53:27 - pico-train - INFO - ├── Loss: 6.4283
2025-08-29 20:53:27 - pico-train - INFO - ├── Learning Rate: 2.34e-05
2025-08-29 20:53:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:53:40 - pico-train - INFO - Step 14275 -- 🔄 Training Metrics
2025-08-29 20:53:40 - pico-train - INFO - ├── Loss: 6.4336
2025-08-29 20:53:40 - pico-train - INFO - ├── Learning Rate: 2.32e-05
2025-08-29 20:53:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:53:53 - pico-train - INFO - Step 14300 -- 🔄 Training Metrics
2025-08-29 20:53:53 - pico-train - INFO - ├── Loss: 6.4139
2025-08-29 20:53:53 - pico-train - INFO - ├── Learning Rate: 2.30e-05
2025-08-29 20:53:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:54:05 - pico-train - INFO - Step 14325 -- 🔄 Training Metrics
2025-08-29 20:54:05 - pico-train - INFO - ├── Loss: 6.5305
2025-08-29 20:54:05 - pico-train - INFO - ├── Learning Rate: 2.29e-05
2025-08-29 20:54:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:54:18 - pico-train - INFO - Step 14350 -- 🔄 Training Metrics
2025-08-29 20:54:18 - pico-train - INFO - ├── Loss: 6.4196
2025-08-29 20:54:18 - pico-train - INFO - ├── Learning Rate: 2.27e-05
2025-08-29 20:54:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:54:30 - pico-train - INFO - Step 14375 -- 🔄 Training Metrics
2025-08-29 20:54:30 - pico-train - INFO - ├── Loss: 6.5622
2025-08-29 20:54:30 - pico-train - INFO - ├── Learning Rate: 2.25e-05
2025-08-29 20:54:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:54:43 - pico-train - INFO - Step 14400 -- 🔄 Training Metrics
2025-08-29 20:54:43 - pico-train - INFO - ├── Loss: 6.4562
2025-08-29 20:54:43 - pico-train - INFO - ├── Learning Rate: 2.24e-05
2025-08-29 20:54:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:54:55 - pico-train - INFO - Step 14425 -- 🔄 Training Metrics
2025-08-29 20:54:55 - pico-train - INFO - ├── Loss: 6.3897
2025-08-29 20:54:55 - pico-train - INFO - ├── Learning Rate: 2.22e-05
2025-08-29 20:54:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:55:08 - pico-train - INFO - Step 14450 -- 🔄 Training Metrics
2025-08-29 20:55:08 - pico-train - INFO - ├── Loss: 6.4705
2025-08-29 20:55:08 - pico-train - INFO - ├── Learning Rate: 2.21e-05
2025-08-29 20:55:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:55:21 - pico-train - INFO - Step 14475 -- 🔄 Training Metrics
2025-08-29 20:55:21 - pico-train - INFO - ├── Loss: 6.4615
2025-08-29 20:55:21 - pico-train - INFO - ├── Learning Rate: 2.19e-05
2025-08-29 20:55:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:55:33 - pico-train - INFO - Step 14500 -- 💾 Saving Checkpoint
2025-08-29 20:58:09 - pico-train - INFO - Step 14500 -- 📊 Evaluation Results
2025-08-29 20:58:09 - pico-train - INFO - └── paloma: 6.36811357145983e+23
2025-08-29 20:58:11 - pico-train - INFO - Step 14500 -- 🔄 Training Metrics
2025-08-29 20:58:11 - pico-train - INFO - ├── Loss: 6.4280
2025-08-29 20:58:11 - pico-train - INFO - ├── Learning Rate: 2.17e-05
2025-08-29 20:58:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:58:11 - pico-train - INFO - Step 14500 -- 📈 Saving Learning Dynamics
2025-08-29 20:58:27 - pico-train - INFO - Step 14525 -- 🔄 Training Metrics
2025-08-29 20:58:27 - pico-train - INFO - ├── Loss: 6.4418
2025-08-29 20:58:27 - pico-train - INFO - ├── Learning Rate: 2.16e-05
2025-08-29 20:58:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:58:39 - pico-train - INFO - Step 14550 -- 🔄 Training Metrics
2025-08-29 20:58:39 - pico-train - INFO - ├── Loss: 6.5223
2025-08-29 20:58:39 - pico-train - INFO - ├── Learning Rate: 2.14e-05
2025-08-29 20:58:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:58:52 - pico-train - INFO - Step 14575 -- 🔄 Training Metrics
2025-08-29 20:58:52 - pico-train - INFO - ├── Loss: 6.4231
2025-08-29 20:58:52 - pico-train - INFO - ├── Learning Rate: 2.13e-05
2025-08-29 20:58:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:59:04 - pico-train - INFO - Step 14600 -- 🔄 Training Metrics
2025-08-29 20:59:04 - pico-train - INFO - ├── Loss: 6.4879
2025-08-29 20:59:04 - pico-train - INFO - ├── Learning Rate: 2.11e-05
2025-08-29 20:59:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:59:17 - pico-train - INFO - Step 14625 -- 🔄 Training Metrics
2025-08-29 20:59:17 - pico-train - INFO - ├── Loss: 6.4622
2025-08-29 20:59:17 - pico-train - INFO - ├── Learning Rate: 2.09e-05
2025-08-29 20:59:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:59:30 - pico-train - INFO - Step 14650 -- 🔄 Training Metrics
2025-08-29 20:59:30 - pico-train - INFO - ├── Loss: 6.3997
2025-08-29 20:59:30 - pico-train - INFO - ├── Learning Rate: 2.08e-05
2025-08-29 20:59:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:59:42 - pico-train - INFO - Step 14675 -- 🔄 Training Metrics
2025-08-29 20:59:42 - pico-train - INFO - ├── Loss: 6.5009
2025-08-29 20:59:42 - pico-train - INFO - ├── Learning Rate: 2.06e-05
2025-08-29 20:59:42 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 20:59:55 - pico-train - INFO - Step 14700 -- 🔄 Training Metrics
2025-08-29 20:59:55 - pico-train - INFO - ├── Loss: 6.4203
2025-08-29 20:59:55 - pico-train - INFO - ├── Learning Rate: 2.04e-05
2025-08-29 20:59:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:00:08 - pico-train - INFO - Step 14725 -- 🔄 Training Metrics
2025-08-29 21:00:08 - pico-train - INFO - ├── Loss: 6.4063
2025-08-29 21:00:08 - pico-train - INFO - ├── Learning Rate: 2.03e-05
2025-08-29 21:00:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:00:21 - pico-train - INFO - Step 14750 -- 🔄 Training Metrics
2025-08-29 21:00:21 - pico-train - INFO - ├── Loss: 6.4672
2025-08-29 21:00:21 - pico-train - INFO - ├── Learning Rate: 2.01e-05
2025-08-29 21:00:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:00:34 - pico-train - INFO - Step 14775 -- 🔄 Training Metrics
2025-08-29 21:00:34 - pico-train - INFO - ├── Loss: 6.5252
2025-08-29 21:00:34 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-29 21:00:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:00:47 - pico-train - INFO - Step 14800 -- 🔄 Training Metrics
2025-08-29 21:00:47 - pico-train - INFO - ├── Loss: 6.5220
2025-08-29 21:00:47 - pico-train - INFO - ├── Learning Rate: 1.98e-05
2025-08-29 21:00:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:00:59 - pico-train - INFO - Step 14825 -- 🔄 Training Metrics
2025-08-29 21:00:59 - pico-train - INFO - ├── Loss: 6.4118
2025-08-29 21:00:59 - pico-train - INFO - ├── Learning Rate: 1.96e-05
2025-08-29 21:00:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:01:12 - pico-train - INFO - Step 14850 -- 🔄 Training Metrics
2025-08-29 21:01:12 - pico-train - INFO - ├── Loss: 6.4979
2025-08-29 21:01:12 - pico-train - INFO - ├── Learning Rate: 1.95e-05
2025-08-29 21:01:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:01:24 - pico-train - INFO - Step 14875 -- 🔄 Training Metrics
2025-08-29 21:01:24 - pico-train - INFO - ├── Loss: 6.4114
2025-08-29 21:01:24 - pico-train - INFO - ├── Learning Rate: 1.93e-05
2025-08-29 21:01:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:01:37 - pico-train - INFO - Step 14900 -- 🔄 Training Metrics
2025-08-29 21:01:37 - pico-train - INFO - ├── Loss: 6.4437
2025-08-29 21:01:37 - pico-train - INFO - ├── Learning Rate: 1.92e-05
2025-08-29 21:01:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:01:49 - pico-train - INFO - Step 14925 -- 🔄 Training Metrics
2025-08-29 21:01:49 - pico-train - INFO - ├── Loss: 6.5580
2025-08-29 21:01:49 - pico-train - INFO - ├── Learning Rate: 1.90e-05
2025-08-29 21:01:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:02:02 - pico-train - INFO - Step 14950 -- 🔄 Training Metrics
2025-08-29 21:02:02 - pico-train - INFO - ├── Loss: 6.4578
2025-08-29 21:02:02 - pico-train - INFO - ├── Learning Rate: 1.88e-05
2025-08-29 21:02:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:02:15 - pico-train - INFO - Step 14975 -- 🔄 Training Metrics
2025-08-29 21:02:15 - pico-train - INFO - ├── Loss: 6.4092
2025-08-29 21:02:15 - pico-train - INFO - ├── Learning Rate: 1.87e-05
2025-08-29 21:02:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:02:27 - pico-train - INFO - Step 15000 -- 💾 Saving Checkpoint
2025-08-29 21:06:10 - pico-train - INFO - Step 15000 -- 📊 Evaluation Results
2025-08-29 21:06:10 - pico-train - INFO - └── paloma: 7.425894520816826e+23
2025-08-29 21:06:12 - pico-train - INFO - Step 15000 -- 🔄 Training Metrics
2025-08-29 21:06:12 - pico-train - INFO - ├── Loss: 6.5556
2025-08-29 21:06:12 - pico-train - INFO - ├── Learning Rate: 1.85e-05
2025-08-29 21:06:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:06:12 - pico-train - INFO - Step 15000 -- 📈 Saving Learning Dynamics
2025-08-29 21:06:28 - pico-train - INFO - Step 15025 -- 🔄 Training Metrics
2025-08-29 21:06:28 - pico-train - INFO - ├── Loss: 6.5084
2025-08-29 21:06:28 - pico-train - INFO - ├── Learning Rate: 1.84e-05
2025-08-29 21:06:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:06:41 - pico-train - INFO - Step 15050 -- 🔄 Training Metrics
2025-08-29 21:06:41 - pico-train - INFO - ├── Loss: 6.4875
2025-08-29 21:06:41 - pico-train - INFO - ├── Learning Rate: 1.82e-05
2025-08-29 21:06:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:06:53 - pico-train - INFO - Step 15075 -- 🔄 Training Metrics
2025-08-29 21:06:53 - pico-train - INFO - ├── Loss: 6.4415
2025-08-29 21:06:53 - pico-train - INFO - ├── Learning Rate: 1.81e-05
2025-08-29 21:06:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:07:06 - pico-train - INFO - Step 15100 -- 🔄 Training Metrics
2025-08-29 21:07:06 - pico-train - INFO - ├── Loss: 6.4890
2025-08-29 21:07:06 - pico-train - INFO - ├── Learning Rate: 1.79e-05
2025-08-29 21:07:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:07:18 - pico-train - INFO - Step 15125 -- 🔄 Training Metrics
2025-08-29 21:07:18 - pico-train - INFO - ├── Loss: 6.4124
2025-08-29 21:07:18 - pico-train - INFO - ├── Learning Rate: 1.77e-05
2025-08-29 21:07:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:07:31 - pico-train - INFO - Step 15150 -- 🔄 Training Metrics
2025-08-29 21:07:31 - pico-train - INFO - ├── Loss: 6.3762
2025-08-29 21:07:31 - pico-train - INFO - ├── Learning Rate: 1.76e-05
2025-08-29 21:07:31 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:07:43 - pico-train - INFO - Step 15175 -- 🔄 Training Metrics
2025-08-29 21:07:43 - pico-train - INFO - ├── Loss: 6.4269
2025-08-29 21:07:43 - pico-train - INFO - ├── Learning Rate: 1.74e-05
2025-08-29 21:07:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:07:56 - pico-train - INFO - Step 15200 -- 🔄 Training Metrics
2025-08-29 21:07:56 - pico-train - INFO - ├── Loss: 6.4523
2025-08-29 21:07:56 - pico-train - INFO - ├── Learning Rate: 1.73e-05
2025-08-29 21:07:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:08:09 - pico-train - INFO - Step 15225 -- 🔄 Training Metrics
2025-08-29 21:08:09 - pico-train - INFO - ├── Loss: 6.4065
2025-08-29 21:08:09 - pico-train - INFO - ├── Learning Rate: 1.71e-05
2025-08-29 21:08:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:08:22 - pico-train - INFO - Step 15250 -- 🔄 Training Metrics
2025-08-29 21:08:22 - pico-train - INFO - ├── Loss: 6.4635
2025-08-29 21:08:22 - pico-train - INFO - ├── Learning Rate: 1.70e-05
2025-08-29 21:08:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:08:35 - pico-train - INFO - Step 15275 -- 🔄 Training Metrics
2025-08-29 21:08:35 - pico-train - INFO - ├── Loss: 6.4213
2025-08-29 21:08:35 - pico-train - INFO - ├── Learning Rate: 1.68e-05
2025-08-29 21:08:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:08:47 - pico-train - INFO - Step 15300 -- 🔄 Training Metrics
2025-08-29 21:08:47 - pico-train - INFO - ├── Loss: 6.4003
2025-08-29 21:08:47 - pico-train - INFO - ├── Learning Rate: 1.67e-05
2025-08-29 21:08:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:09:00 - pico-train - INFO - Step 15325 -- 🔄 Training Metrics
2025-08-29 21:09:00 - pico-train - INFO - ├── Loss: 6.3599
2025-08-29 21:09:00 - pico-train - INFO - ├── Learning Rate: 1.65e-05
2025-08-29 21:09:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:09:12 - pico-train - INFO - Step 15350 -- 🔄 Training Metrics
2025-08-29 21:09:12 - pico-train - INFO - ├── Loss: 6.4026
2025-08-29 21:09:12 - pico-train - INFO - ├── Learning Rate: 1.63e-05
2025-08-29 21:09:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:09:26 - pico-train - INFO - Step 15375 -- 🔄 Training Metrics
2025-08-29 21:09:26 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.4583
2025-08-29 21:09:26 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 1.62e-05
2025-08-29 21:09:26 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-29 21:09:39 - pico-train - INFO - Step 15400 -- ๐Ÿ”„ Training Metrics
2025-08-29 21:09:39 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.3811
2025-08-29 21:09:39 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 1.60e-05
2025-08-29 21:09:39 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-29 21:09:51 - pico-train - INFO - Step 15425 -- ๐Ÿ”„ Training Metrics
2025-08-29 21:09:51 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.4031
2025-08-29 21:09:51 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 1.59e-05
2025-08-29 21:09:51 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-29 21:10:04 - pico-train - INFO - Step 15450 -- ๐Ÿ”„ Training Metrics
2025-08-29 21:10:04 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.4722
2025-08-29 21:10:04 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 1.57e-05
2025-08-29 21:10:04 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-29 21:10:17 - pico-train - INFO - Step 15475 -- ๐Ÿ”„ Training Metrics
2025-08-29 21:10:17 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.4515
2025-08-29 21:10:17 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 1.56e-05
2025-08-29 21:10:17 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-29 21:10:29 - pico-train - INFO - Step 15500 -- ๐Ÿ’พ Saving Checkpoint
2025-08-29 21:12:31 - pico-train - INFO - Step 15500 -- ๐Ÿ“Š Evaluation Results
2025-08-29 21:12:31 - pico-train - INFO - โ””โ”€โ”€ paloma: 8.24811762866625e+23
2025-08-29 21:12:32 - pico-train - INFO - Step 15500 -- ๐Ÿ”„ Training Metrics
2025-08-29 21:12:32 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.5211
2025-08-29 21:12:32 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 1.54e-05
2025-08-29 21:12:32 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-29 21:12:32 - pico-train - INFO - Step 15500 -- ๐Ÿ“ˆ Saving Learning Dynamics
2025-08-29 21:12:47 - pico-train - INFO - Step 15525 -- 🔄 Training Metrics
2025-08-29 21:12:47 - pico-train - INFO - ├── Loss: 6.3531
2025-08-29 21:12:47 - pico-train - INFO - ├── Learning Rate: 1.53e-05
2025-08-29 21:12:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:13:00 - pico-train - INFO - Step 15550 -- 🔄 Training Metrics
2025-08-29 21:13:00 - pico-train - INFO - ├── Loss: 6.4191
2025-08-29 21:13:00 - pico-train - INFO - ├── Learning Rate: 1.51e-05
2025-08-29 21:13:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:13:12 - pico-train - INFO - Step 15575 -- 🔄 Training Metrics
2025-08-29 21:13:12 - pico-train - INFO - ├── Loss: 6.3696
2025-08-29 21:13:12 - pico-train - INFO - ├── Learning Rate: 1.50e-05
2025-08-29 21:13:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:13:25 - pico-train - INFO - Step 15600 -- 🔄 Training Metrics
2025-08-29 21:13:25 - pico-train - INFO - ├── Loss: 6.3854
2025-08-29 21:13:25 - pico-train - INFO - ├── Learning Rate: 1.48e-05
2025-08-29 21:13:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:13:38 - pico-train - INFO - Step 15625 -- 🔄 Training Metrics
2025-08-29 21:13:38 - pico-train - INFO - ├── Loss: 6.4512
2025-08-29 21:13:38 - pico-train - INFO - ├── Learning Rate: 1.47e-05
2025-08-29 21:13:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:13:50 - pico-train - INFO - Step 15650 -- 🔄 Training Metrics
2025-08-29 21:13:50 - pico-train - INFO - ├── Loss: 6.4353
2025-08-29 21:13:50 - pico-train - INFO - ├── Learning Rate: 1.45e-05
2025-08-29 21:13:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:14:03 - pico-train - INFO - Step 15675 -- 🔄 Training Metrics
2025-08-29 21:14:03 - pico-train - INFO - ├── Loss: 6.4748
2025-08-29 21:14:03 - pico-train - INFO - ├── Learning Rate: 1.44e-05
2025-08-29 21:14:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:14:15 - pico-train - INFO - Step 15700 -- 🔄 Training Metrics
2025-08-29 21:14:15 - pico-train - INFO - ├── Loss: 6.4150
2025-08-29 21:14:15 - pico-train - INFO - ├── Learning Rate: 1.42e-05
2025-08-29 21:14:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:14:28 - pico-train - INFO - Step 15725 -- 🔄 Training Metrics
2025-08-29 21:14:28 - pico-train - INFO - ├── Loss: 6.4840
2025-08-29 21:14:28 - pico-train - INFO - ├── Learning Rate: 1.41e-05
2025-08-29 21:14:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:14:40 - pico-train - INFO - Step 15750 -- 🔄 Training Metrics
2025-08-29 21:14:40 - pico-train - INFO - ├── Loss: 6.3685
2025-08-29 21:14:40 - pico-train - INFO - ├── Learning Rate: 1.39e-05
2025-08-29 21:14:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:14:53 - pico-train - INFO - Step 15775 -- 🔄 Training Metrics
2025-08-29 21:14:53 - pico-train - INFO - ├── Loss: 6.4182
2025-08-29 21:14:53 - pico-train - INFO - ├── Learning Rate: 1.38e-05
2025-08-29 21:14:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:15:06 - pico-train - INFO - Step 15800 -- 🔄 Training Metrics
2025-08-29 21:15:06 - pico-train - INFO - ├── Loss: 6.4032
2025-08-29 21:15:06 - pico-train - INFO - ├── Learning Rate: 1.37e-05
2025-08-29 21:15:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:15:18 - pico-train - INFO - Step 15825 -- 🔄 Training Metrics
2025-08-29 21:15:18 - pico-train - INFO - ├── Loss: 6.5306
2025-08-29 21:15:18 - pico-train - INFO - ├── Learning Rate: 1.35e-05
2025-08-29 21:15:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:15:31 - pico-train - INFO - Step 15850 -- 🔄 Training Metrics
2025-08-29 21:15:31 - pico-train - INFO - ├── Loss: 6.4199
2025-08-29 21:15:31 - pico-train - INFO - ├── Learning Rate: 1.34e-05
2025-08-29 21:15:31 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:15:44 - pico-train - INFO - Step 15875 -- 🔄 Training Metrics
2025-08-29 21:15:44 - pico-train - INFO - ├── Loss: 6.3861
2025-08-29 21:15:44 - pico-train - INFO - ├── Learning Rate: 1.32e-05
2025-08-29 21:15:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:15:57 - pico-train - INFO - Step 15900 -- 🔄 Training Metrics
2025-08-29 21:15:57 - pico-train - INFO - ├── Loss: 6.4234
2025-08-29 21:15:57 - pico-train - INFO - ├── Learning Rate: 1.31e-05
2025-08-29 21:15:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:16:09 - pico-train - INFO - Step 15925 -- 🔄 Training Metrics
2025-08-29 21:16:09 - pico-train - INFO - ├── Loss: 6.3634
2025-08-29 21:16:09 - pico-train - INFO - ├── Learning Rate: 1.29e-05
2025-08-29 21:16:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:16:22 - pico-train - INFO - Step 15950 -- 🔄 Training Metrics
2025-08-29 21:16:22 - pico-train - INFO - ├── Loss: 6.3444
2025-08-29 21:16:22 - pico-train - INFO - ├── Learning Rate: 1.28e-05
2025-08-29 21:16:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:16:35 - pico-train - INFO - Step 15975 -- 🔄 Training Metrics
2025-08-29 21:16:35 - pico-train - INFO - ├── Loss: 6.5188
2025-08-29 21:16:35 - pico-train - INFO - ├── Learning Rate: 1.26e-05
2025-08-29 21:16:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:16:47 - pico-train - INFO - Step 16000 -- 💾 Saving Checkpoint
2025-08-29 21:19:08 - pico-train - INFO - Step 16000 -- 📊 Evaluation Results
2025-08-29 21:19:08 - pico-train - INFO - └── paloma: 1.1206091008496824e+24
2025-08-29 21:19:11 - pico-train - INFO - Step 16000 -- 🔄 Training Metrics
2025-08-29 21:19:11 - pico-train - INFO - ├── Loss: 6.4775
2025-08-29 21:19:11 - pico-train - INFO - ├── Learning Rate: 1.25e-05
2025-08-29 21:19:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:19:11 - pico-train - INFO - Step 16000 -- 📈 Saving Learning Dynamics
2025-08-29 21:19:27 - pico-train - INFO - Step 16025 -- 🔄 Training Metrics
2025-08-29 21:19:27 - pico-train - INFO - ├── Loss: 6.4575
2025-08-29 21:19:27 - pico-train - INFO - ├── Learning Rate: 1.24e-05
2025-08-29 21:19:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:19:39 - pico-train - INFO - Step 16050 -- 🔄 Training Metrics
2025-08-29 21:19:39 - pico-train - INFO - ├── Loss: 6.4128
2025-08-29 21:19:39 - pico-train - INFO - ├── Learning Rate: 1.22e-05
2025-08-29 21:19:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:19:52 - pico-train - INFO - Step 16075 -- 🔄 Training Metrics
2025-08-29 21:19:52 - pico-train - INFO - ├── Loss: 6.3496
2025-08-29 21:19:52 - pico-train - INFO - ├── Learning Rate: 1.21e-05
2025-08-29 21:19:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:20:05 - pico-train - INFO - Step 16100 -- 🔄 Training Metrics
2025-08-29 21:20:05 - pico-train - INFO - ├── Loss: 6.3495
2025-08-29 21:20:05 - pico-train - INFO - ├── Learning Rate: 1.19e-05
2025-08-29 21:20:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:20:17 - pico-train - INFO - Step 16125 -- 🔄 Training Metrics
2025-08-29 21:20:17 - pico-train - INFO - ├── Loss: 6.4011
2025-08-29 21:20:17 - pico-train - INFO - ├── Learning Rate: 1.18e-05
2025-08-29 21:20:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:20:30 - pico-train - INFO - Step 16150 -- 🔄 Training Metrics
2025-08-29 21:20:30 - pico-train - INFO - ├── Loss: 6.3754
2025-08-29 21:20:30 - pico-train - INFO - ├── Learning Rate: 1.17e-05
2025-08-29 21:20:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:20:43 - pico-train - INFO - Step 16175 -- 🔄 Training Metrics
2025-08-29 21:20:43 - pico-train - INFO - ├── Loss: 6.4419
2025-08-29 21:20:43 - pico-train - INFO - ├── Learning Rate: 1.15e-05
2025-08-29 21:20:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:20:55 - pico-train - INFO - Step 16200 -- 🔄 Training Metrics
2025-08-29 21:20:55 - pico-train - INFO - ├── Loss: 6.5383
2025-08-29 21:20:55 - pico-train - INFO - ├── Learning Rate: 1.14e-05
2025-08-29 21:20:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:21:08 - pico-train - INFO - Step 16225 -- 🔄 Training Metrics
2025-08-29 21:21:08 - pico-train - INFO - ├── Loss: 6.4051
2025-08-29 21:21:08 - pico-train - INFO - ├── Learning Rate: 1.12e-05
2025-08-29 21:21:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:21:21 - pico-train - INFO - Step 16250 -- 🔄 Training Metrics
2025-08-29 21:21:21 - pico-train - INFO - ├── Loss: 6.4001
2025-08-29 21:21:21 - pico-train - INFO - ├── Learning Rate: 1.11e-05
2025-08-29 21:21:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:21:34 - pico-train - INFO - Step 16275 -- 🔄 Training Metrics
2025-08-29 21:21:34 - pico-train - INFO - ├── Loss: 6.4433
2025-08-29 21:21:34 - pico-train - INFO - ├── Learning Rate: 1.10e-05
2025-08-29 21:21:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:21:47 - pico-train - INFO - Step 16300 -- 🔄 Training Metrics
2025-08-29 21:21:47 - pico-train - INFO - ├── Loss: 6.4289
2025-08-29 21:21:47 - pico-train - INFO - ├── Learning Rate: 1.08e-05
2025-08-29 21:21:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:21:59 - pico-train - INFO - Step 16325 -- 🔄 Training Metrics
2025-08-29 21:21:59 - pico-train - INFO - ├── Loss: 6.4217
2025-08-29 21:21:59 - pico-train - INFO - ├── Learning Rate: 1.07e-05
2025-08-29 21:21:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:22:12 - pico-train - INFO - Step 16350 -- 🔄 Training Metrics
2025-08-29 21:22:12 - pico-train - INFO - ├── Loss: 6.3954
2025-08-29 21:22:12 - pico-train - INFO - ├── Learning Rate: 1.06e-05
2025-08-29 21:22:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:22:34 - pico-train - INFO - Step 16375 -- 🔄 Training Metrics
2025-08-29 21:22:34 - pico-train - INFO - ├── Loss: 6.4462
2025-08-29 21:22:34 - pico-train - INFO - ├── Learning Rate: 1.04e-05
2025-08-29 21:22:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:22:47 - pico-train - INFO - Step 16400 -- 🔄 Training Metrics
2025-08-29 21:22:47 - pico-train - INFO - ├── Loss: 6.4041
2025-08-29 21:22:47 - pico-train - INFO - ├── Learning Rate: 1.03e-05
2025-08-29 21:22:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:23:00 - pico-train - INFO - Step 16425 -- 🔄 Training Metrics
2025-08-29 21:23:00 - pico-train - INFO - ├── Loss: 6.4381
2025-08-29 21:23:00 - pico-train - INFO - ├── Learning Rate: 1.02e-05
2025-08-29 21:23:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:23:13 - pico-train - INFO - Step 16450 -- 🔄 Training Metrics
2025-08-29 21:23:13 - pico-train - INFO - ├── Loss: 6.3713
2025-08-29 21:23:13 - pico-train - INFO - ├── Learning Rate: 1.00e-05
2025-08-29 21:23:13 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:23:25 - pico-train - INFO - Step 16475 -- 🔄 Training Metrics
2025-08-29 21:23:25 - pico-train - INFO - ├── Loss: 6.3954
2025-08-29 21:23:25 - pico-train - INFO - ├── Learning Rate: 9.91e-06
2025-08-29 21:23:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:23:38 - pico-train - INFO - Step 16500 -- 💾 Saving Checkpoint
2025-08-29 21:25:45 - pico-train - INFO - Step 16500 -- 📊 Evaluation Results
2025-08-29 21:25:45 - pico-train - INFO - └── paloma: 1.237561338460672e+24
2025-08-29 21:25:47 - pico-train - INFO - Step 16500 -- 🔄 Training Metrics
2025-08-29 21:25:47 - pico-train - INFO - ├── Loss: 6.4280
2025-08-29 21:25:47 - pico-train - INFO - ├── Learning Rate: 9.78e-06
2025-08-29 21:25:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:25:47 - pico-train - INFO - Step 16500 -- 📈 Saving Learning Dynamics
2025-08-29 21:26:02 - pico-train - INFO - Step 16525 -- 🔄 Training Metrics
2025-08-29 21:26:02 - pico-train - INFO - ├── Loss: 6.3385
2025-08-29 21:26:02 - pico-train - INFO - ├── Learning Rate: 9.65e-06
2025-08-29 21:26:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:26:15 - pico-train - INFO - Step 16550 -- 🔄 Training Metrics
2025-08-29 21:26:15 - pico-train - INFO - ├── Loss: 6.3865
2025-08-29 21:26:15 - pico-train - INFO - ├── Learning Rate: 9.52e-06
2025-08-29 21:26:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:26:27 - pico-train - INFO - Step 16575 -- 🔄 Training Metrics
2025-08-29 21:26:27 - pico-train - INFO - ├── Loss: 6.3920
2025-08-29 21:26:27 - pico-train - INFO - ├── Learning Rate: 9.39e-06
2025-08-29 21:26:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:26:40 - pico-train - INFO - Step 16600 -- 🔄 Training Metrics
2025-08-29 21:26:40 - pico-train - INFO - ├── Loss: 6.4103
2025-08-29 21:26:40 - pico-train - INFO - ├── Learning Rate: 9.27e-06
2025-08-29 21:26:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:26:53 - pico-train - INFO - Step 16625 -- 🔄 Training Metrics
2025-08-29 21:26:53 - pico-train - INFO - ├── Loss: 6.5227
2025-08-29 21:26:53 - pico-train - INFO - ├── Learning Rate: 9.14e-06
2025-08-29 21:26:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:27:05 - pico-train - INFO - Step 16650 -- 🔄 Training Metrics
2025-08-29 21:27:05 - pico-train - INFO - ├── Loss: 6.3691
2025-08-29 21:27:05 - pico-train - INFO - ├── Learning Rate: 9.01e-06
2025-08-29 21:27:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:27:18 - pico-train - INFO - Step 16675 -- 🔄 Training Metrics
2025-08-29 21:27:18 - pico-train - INFO - ├── Loss: 6.4122
2025-08-29 21:27:18 - pico-train - INFO - ├── Learning Rate: 8.89e-06
2025-08-29 21:27:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:27:31 - pico-train - INFO - Step 16700 -- 🔄 Training Metrics
2025-08-29 21:27:31 - pico-train - INFO - ├── Loss: 6.4154
2025-08-29 21:27:31 - pico-train - INFO - ├── Learning Rate: 8.76e-06
2025-08-29 21:27:31 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:27:43 - pico-train - INFO - Step 16725 -- 🔄 Training Metrics
2025-08-29 21:27:43 - pico-train - INFO - ├── Loss: 6.3769
2025-08-29 21:27:43 - pico-train - INFO - ├── Learning Rate: 8.64e-06
2025-08-29 21:27:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:27:56 - pico-train - INFO - Step 16750 -- 🔄 Training Metrics
2025-08-29 21:27:56 - pico-train - INFO - ├── Loss: 6.3878
2025-08-29 21:27:56 - pico-train - INFO - ├── Learning Rate: 8.52e-06
2025-08-29 21:27:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:28:08 - pico-train - INFO - Step 16775 -- 🔄 Training Metrics
2025-08-29 21:28:08 - pico-train - INFO - ├── Loss: 6.4015
2025-08-29 21:28:08 - pico-train - INFO - ├── Learning Rate: 8.39e-06
2025-08-29 21:28:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:28:21 - pico-train - INFO - Step 16800 -- 🔄 Training Metrics
2025-08-29 21:28:21 - pico-train - INFO - ├── Loss: 6.4290
2025-08-29 21:28:21 - pico-train - INFO - ├── Learning Rate: 8.27e-06
2025-08-29 21:28:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:28:33 - pico-train - INFO - Step 16825 -- 🔄 Training Metrics
2025-08-29 21:28:33 - pico-train - INFO - ├── Loss: 6.4070
2025-08-29 21:28:33 - pico-train - INFO - ├── Learning Rate: 8.15e-06
2025-08-29 21:28:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:28:46 - pico-train - INFO - Step 16850 -- 🔄 Training Metrics
2025-08-29 21:28:46 - pico-train - INFO - ├── Loss: 6.4331
2025-08-29 21:28:46 - pico-train - INFO - ├── Learning Rate: 8.03e-06
2025-08-29 21:28:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:28:59 - pico-train - INFO - Step 16875 -- 🔄 Training Metrics
2025-08-29 21:28:59 - pico-train - INFO - ├── Loss: 6.3833
2025-08-29 21:28:59 - pico-train - INFO - ├── Learning Rate: 7.91e-06
2025-08-29 21:28:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:29:12 - pico-train - INFO - Step 16900 -- 🔄 Training Metrics
2025-08-29 21:29:12 - pico-train - INFO - ├── Loss: 6.4244
2025-08-29 21:29:12 - pico-train - INFO - ├── Learning Rate: 7.79e-06
2025-08-29 21:29:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:29:24 - pico-train - INFO - Step 16925 -- 🔄 Training Metrics
2025-08-29 21:29:24 - pico-train - INFO - ├── Loss: 6.3542
2025-08-29 21:29:24 - pico-train - INFO - ├── Learning Rate: 7.67e-06
2025-08-29 21:29:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:29:37 - pico-train - INFO - Step 16950 -- 🔄 Training Metrics
2025-08-29 21:29:37 - pico-train - INFO - ├── Loss: 6.4249
2025-08-29 21:29:37 - pico-train - INFO - ├── Learning Rate: 7.56e-06
2025-08-29 21:29:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:29:50 - pico-train - INFO - Step 16975 -- 🔄 Training Metrics
2025-08-29 21:29:50 - pico-train - INFO - ├── Loss: 6.4345
2025-08-29 21:29:50 - pico-train - INFO - ├── Learning Rate: 7.44e-06
2025-08-29 21:29:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:30:02 - pico-train - INFO - Step 17000 -- 💾 Saving Checkpoint
2025-08-29 21:32:11 - pico-train - INFO - Step 17000 -- 📊 Evaluation Results
2025-08-29 21:32:11 - pico-train - INFO - └── paloma: 1.304531243949835e+24
2025-08-29 21:32:12 - pico-train - INFO - Step 17000 -- 🔄 Training Metrics
2025-08-29 21:32:12 - pico-train - INFO - ├── Loss: 6.3667
2025-08-29 21:32:12 - pico-train - INFO - ├── Learning Rate: 7.32e-06
2025-08-29 21:32:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:32:12 - pico-train - INFO - Step 17000 -- 📈 Saving Learning Dynamics
2025-08-29 21:32:27 - pico-train - INFO - Step 17025 -- 🔄 Training Metrics
2025-08-29 21:32:27 - pico-train - INFO - ├── Loss: 6.4075
2025-08-29 21:32:27 - pico-train - INFO - ├── Learning Rate: 7.21e-06
2025-08-29 21:32:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:32:39 - pico-train - INFO - Step 17050 -- 🔄 Training Metrics
2025-08-29 21:32:39 - pico-train - INFO - ├── Loss: 6.4075
2025-08-29 21:32:39 - pico-train - INFO - ├── Learning Rate: 7.09e-06
2025-08-29 21:32:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:32:52 - pico-train - INFO - Step 17075 -- 🔄 Training Metrics
2025-08-29 21:32:52 - pico-train - INFO - ├── Loss: 6.4692
2025-08-29 21:32:52 - pico-train - INFO - ├── Learning Rate: 6.98e-06
2025-08-29 21:32:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:33:05 - pico-train - INFO - Step 17100 -- 🔄 Training Metrics
2025-08-29 21:33:05 - pico-train - INFO - ├── Loss: 6.4737
2025-08-29 21:33:05 - pico-train - INFO - ├── Learning Rate: 6.87e-06
2025-08-29 21:33:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:33:17 - pico-train - INFO - Step 17125 -- 🔄 Training Metrics
2025-08-29 21:33:17 - pico-train - INFO - ├── Loss: 6.3527
2025-08-29 21:33:17 - pico-train - INFO - ├── Learning Rate: 6.75e-06
2025-08-29 21:33:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:33:30 - pico-train - INFO - Step 17150 -- 🔄 Training Metrics
2025-08-29 21:33:30 - pico-train - INFO - ├── Loss: 6.4079
2025-08-29 21:33:30 - pico-train - INFO - ├── Learning Rate: 6.64e-06
2025-08-29 21:33:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:33:43 - pico-train - INFO - Step 17175 -- 🔄 Training Metrics
2025-08-29 21:33:43 - pico-train - INFO - ├── Loss: 6.2427
2025-08-29 21:33:43 - pico-train - INFO - ├── Learning Rate: 6.53e-06
2025-08-29 21:33:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:33:55 - pico-train - INFO - Step 17200 -- 🔄 Training Metrics
2025-08-29 21:33:55 - pico-train - INFO - ├── Loss: 6.3806
2025-08-29 21:33:55 - pico-train - INFO - ├── Learning Rate: 6.42e-06
2025-08-29 21:33:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:34:08 - pico-train - INFO - Step 17225 -- 🔄 Training Metrics
2025-08-29 21:34:08 - pico-train - INFO - ├── Loss: 6.4464
2025-08-29 21:34:08 - pico-train - INFO - ├── Learning Rate: 6.31e-06
2025-08-29 21:34:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:34:21 - pico-train - INFO - Step 17250 -- 🔄 Training Metrics
2025-08-29 21:34:21 - pico-train - INFO - ├── Loss: 6.4264
2025-08-29 21:34:21 - pico-train - INFO - ├── Learning Rate: 6.20e-06
2025-08-29 21:34:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:34:33 - pico-train - INFO - Step 17275 -- 🔄 Training Metrics
2025-08-29 21:34:33 - pico-train - INFO - ├── Loss: 6.4196
2025-08-29 21:34:33 - pico-train - INFO - ├── Learning Rate: 6.10e-06
2025-08-29 21:34:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:34:46 - pico-train - INFO - Step 17300 -- 🔄 Training Metrics
2025-08-29 21:34:46 - pico-train - INFO - ├── Loss: 6.3639
2025-08-29 21:34:46 - pico-train - INFO - ├── Learning Rate: 5.99e-06
2025-08-29 21:34:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:34:58 - pico-train - INFO - Step 17325 -- 🔄 Training Metrics
2025-08-29 21:34:58 - pico-train - INFO - ├── Loss: 6.3534
2025-08-29 21:34:58 - pico-train - INFO - ├── Learning Rate: 5.88e-06
2025-08-29 21:34:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:35:11 - pico-train - INFO - Step 17350 -- 🔄 Training Metrics
2025-08-29 21:35:11 - pico-train - INFO - ├── Loss: 6.4196
2025-08-29 21:35:11 - pico-train - INFO - ├── Learning Rate: 5.78e-06
2025-08-29 21:35:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:35:25 - pico-train - INFO - Step 17375 -- 🔄 Training Metrics
2025-08-29 21:35:25 - pico-train - INFO - ├── Loss: 6.4977
2025-08-29 21:35:25 - pico-train - INFO - ├── Learning Rate: 5.67e-06
2025-08-29 21:35:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:35:38 - pico-train - INFO - Step 17400 -- 🔄 Training Metrics
2025-08-29 21:35:38 - pico-train - INFO - ├── Loss: 6.4950
2025-08-29 21:35:38 - pico-train - INFO - ├── Learning Rate: 5.57e-06
2025-08-29 21:35:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:35:50 - pico-train - INFO - Step 17425 -- 🔄 Training Metrics
2025-08-29 21:35:50 - pico-train - INFO - ├── Loss: 6.4814
2025-08-29 21:35:50 - pico-train - INFO - ├── Learning Rate: 5.47e-06
2025-08-29 21:35:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:36:03 - pico-train - INFO - Step 17450 -- 🔄 Training Metrics
2025-08-29 21:36:03 - pico-train - INFO - ├── Loss: 6.3986
2025-08-29 21:36:03 - pico-train - INFO - ├── Learning Rate: 5.37e-06
2025-08-29 21:36:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:36:16 - pico-train - INFO - Step 17475 -- 🔄 Training Metrics
2025-08-29 21:36:16 - pico-train - INFO - ├── Loss: 6.3909
2025-08-29 21:36:16 - pico-train - INFO - ├── Learning Rate: 5.27e-06
2025-08-29 21:36:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:36:28 - pico-train - INFO - Step 17500 -- 💾 Saving Checkpoint
2025-08-29 21:38:26 - pico-train - INFO - Step 17500 -- 📊 Evaluation Results
2025-08-29 21:38:26 - pico-train - INFO - └── paloma: 1.263308985203994e+24
2025-08-29 21:38:27 - pico-train - INFO - Step 17500 -- 🔄 Training Metrics
2025-08-29 21:38:27 - pico-train - INFO - ├── Loss: 6.3834
2025-08-29 21:38:27 - pico-train - INFO - ├── Learning Rate: 5.17e-06
2025-08-29 21:38:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:38:27 - pico-train - INFO - Step 17500 -- 📈 Saving Learning Dynamics
2025-08-29 21:38:42 - pico-train - INFO - Step 17525 -- ๐Ÿ”„ Training Metrics
2025-08-29 21:38:42 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.4161
2025-08-29 21:38:42 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 5.07e-06
2025-08-29 21:38:42 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-29 21:38:55 - pico-train - INFO - Step 17550 -- ๐Ÿ”„ Training Metrics
2025-08-29 21:38:55 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.3901
2025-08-29 21:38:55 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 5.00e-06
2025-08-29 21:38:55 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-29 21:39:07 - pico-train - INFO - Step 17575 -- ๐Ÿ”„ Training Metrics
2025-08-29 21:39:07 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.3559
2025-08-29 21:39:07 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 5.00e-06
2025-08-29 21:39:07 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-29 21:39:20 - pico-train - INFO - Step 17600 -- ๐Ÿ”„ Training Metrics
2025-08-29 21:39:20 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.3792
2025-08-29 21:39:20 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 5.00e-06
2025-08-29 21:39:20 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-29 21:39:33 - pico-train - INFO - Step 17625 -- ๐Ÿ”„ Training Metrics
2025-08-29 21:39:33 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.4937
2025-08-29 21:39:33 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 5.00e-06
2025-08-29 21:39:33 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-29 21:39:45 - pico-train - INFO - Step 17650 -- ๐Ÿ”„ Training Metrics
2025-08-29 21:39:45 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.4112
2025-08-29 21:39:45 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 5.00e-06
2025-08-29 21:39:45 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:39:58 - pico-train - INFO - Step 17675 -- 🔄 Training Metrics
2025-08-29 21:39:58 - pico-train - INFO - ├── Loss: 6.4311
2025-08-29 21:39:58 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:39:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:40:11 - pico-train - INFO - Step 17700 -- 🔄 Training Metrics
2025-08-29 21:40:11 - pico-train - INFO - ├── Loss: 6.4455
2025-08-29 21:40:11 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:40:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:40:23 - pico-train - INFO - Step 17725 -- 🔄 Training Metrics
2025-08-29 21:40:23 - pico-train - INFO - ├── Loss: 6.4370
2025-08-29 21:40:23 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:40:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:40:36 - pico-train - INFO - Step 17750 -- 🔄 Training Metrics
2025-08-29 21:40:36 - pico-train - INFO - ├── Loss: 6.3680
2025-08-29 21:40:36 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:40:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:40:49 - pico-train - INFO - Step 17775 -- 🔄 Training Metrics
2025-08-29 21:40:49 - pico-train - INFO - ├── Loss: 6.4037
2025-08-29 21:40:49 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:40:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:41:01 - pico-train - INFO - Step 17800 -- 🔄 Training Metrics
2025-08-29 21:41:01 - pico-train - INFO - ├── Loss: 6.3438
2025-08-29 21:41:01 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:41:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:41:14 - pico-train - INFO - Step 17825 -- 🔄 Training Metrics
2025-08-29 21:41:14 - pico-train - INFO - ├── Loss: 6.5015
2025-08-29 21:41:14 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:41:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:41:27 - pico-train - INFO - Step 17850 -- 🔄 Training Metrics
2025-08-29 21:41:27 - pico-train - INFO - ├── Loss: 6.3998
2025-08-29 21:41:27 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:41:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:41:40 - pico-train - INFO - Step 17875 -- 🔄 Training Metrics
2025-08-29 21:41:40 - pico-train - INFO - ├── Loss: 6.3919
2025-08-29 21:41:40 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:41:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:41:53 - pico-train - INFO - Step 17900 -- 🔄 Training Metrics
2025-08-29 21:41:53 - pico-train - INFO - ├── Loss: 6.4118
2025-08-29 21:41:53 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:41:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:42:06 - pico-train - INFO - Step 17925 -- 🔄 Training Metrics
2025-08-29 21:42:06 - pico-train - INFO - ├── Loss: 6.4447
2025-08-29 21:42:06 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:42:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:42:18 - pico-train - INFO - Step 17950 -- 🔄 Training Metrics
2025-08-29 21:42:18 - pico-train - INFO - ├── Loss: 6.4113
2025-08-29 21:42:18 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:42:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:42:31 - pico-train - INFO - Step 17975 -- 🔄 Training Metrics
2025-08-29 21:42:31 - pico-train - INFO - ├── Loss: 6.3771
2025-08-29 21:42:31 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:42:31 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:42:43 - pico-train - INFO - Step 18000 -- 💾 Saving Checkpoint
2025-08-29 21:44:54 - pico-train - INFO - Step 18000 -- 📊 Evaluation Results
2025-08-29 21:44:54 - pico-train - INFO - └── paloma: 1.4237252986808802e+24
2025-08-29 21:44:55 - pico-train - INFO - Step 18000 -- 🔄 Training Metrics
2025-08-29 21:44:55 - pico-train - INFO - ├── Loss: 6.4250
2025-08-29 21:44:55 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:44:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:44:55 - pico-train - INFO - Step 18000 -- 📈 Saving Learning Dynamics
2025-08-29 21:45:09 - pico-train - INFO - Step 18025 -- 🔄 Training Metrics
2025-08-29 21:45:09 - pico-train - INFO - ├── Loss: 6.4218
2025-08-29 21:45:09 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:45:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:45:22 - pico-train - INFO - Step 18050 -- 🔄 Training Metrics
2025-08-29 21:45:22 - pico-train - INFO - ├── Loss: 6.3776
2025-08-29 21:45:22 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:45:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:45:35 - pico-train - INFO - Step 18075 -- 🔄 Training Metrics
2025-08-29 21:45:35 - pico-train - INFO - ├── Loss: 6.3610
2025-08-29 21:45:35 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:45:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:45:47 - pico-train - INFO - Step 18100 -- 🔄 Training Metrics
2025-08-29 21:45:47 - pico-train - INFO - ├── Loss: 6.3657
2025-08-29 21:45:47 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:45:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:46:00 - pico-train - INFO - Step 18125 -- 🔄 Training Metrics
2025-08-29 21:46:00 - pico-train - INFO - ├── Loss: 6.4142
2025-08-29 21:46:00 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:46:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:46:13 - pico-train - INFO - Step 18150 -- 🔄 Training Metrics
2025-08-29 21:46:13 - pico-train - INFO - ├── Loss: 6.3405
2025-08-29 21:46:13 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:46:13 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:46:25 - pico-train - INFO - Step 18175 -- 🔄 Training Metrics
2025-08-29 21:46:25 - pico-train - INFO - ├── Loss: 6.4436
2025-08-29 21:46:25 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:46:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:46:38 - pico-train - INFO - Step 18200 -- 🔄 Training Metrics
2025-08-29 21:46:38 - pico-train - INFO - ├── Loss: 6.3272
2025-08-29 21:46:38 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:46:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:46:50 - pico-train - INFO - Step 18225 -- 🔄 Training Metrics
2025-08-29 21:46:50 - pico-train - INFO - ├── Loss: 6.4581
2025-08-29 21:46:50 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:46:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:47:03 - pico-train - INFO - Step 18250 -- 🔄 Training Metrics
2025-08-29 21:47:03 - pico-train - INFO - ├── Loss: 6.4066
2025-08-29 21:47:03 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:47:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:47:15 - pico-train - INFO - Step 18275 -- 🔄 Training Metrics
2025-08-29 21:47:15 - pico-train - INFO - ├── Loss: 6.4504
2025-08-29 21:47:15 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:47:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:47:28 - pico-train - INFO - Step 18300 -- 🔄 Training Metrics
2025-08-29 21:47:28 - pico-train - INFO - ├── Loss: 6.3557
2025-08-29 21:47:28 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:47:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:47:41 - pico-train - INFO - Step 18325 -- 🔄 Training Metrics
2025-08-29 21:47:41 - pico-train - INFO - ├── Loss: 6.3500
2025-08-29 21:47:41 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:47:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:47:53 - pico-train - INFO - Step 18350 -- 🔄 Training Metrics
2025-08-29 21:47:53 - pico-train - INFO - ├── Loss: 6.4390
2025-08-29 21:47:53 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:47:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:48:06 - pico-train - INFO - Step 18375 -- 🔄 Training Metrics
2025-08-29 21:48:06 - pico-train - INFO - ├── Loss: 6.3690
2025-08-29 21:48:06 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:48:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:48:19 - pico-train - INFO - Step 18400 -- 🔄 Training Metrics
2025-08-29 21:48:19 - pico-train - INFO - ├── Loss: 6.3101
2025-08-29 21:48:19 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:48:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:48:32 - pico-train - INFO - Step 18425 -- 🔄 Training Metrics
2025-08-29 21:48:32 - pico-train - INFO - ├── Loss: 6.3969
2025-08-29 21:48:32 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:48:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:48:44 - pico-train - INFO - Step 18450 -- 🔄 Training Metrics
2025-08-29 21:48:44 - pico-train - INFO - ├── Loss: 6.3362
2025-08-29 21:48:44 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:48:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:48:57 - pico-train - INFO - Step 18475 -- 🔄 Training Metrics
2025-08-29 21:48:57 - pico-train - INFO - ├── Loss: 6.3378
2025-08-29 21:48:57 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:48:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:49:09 - pico-train - INFO - Step 18500 -- 💾 Saving Checkpoint
2025-08-29 21:51:03 - pico-train - INFO - Step 18500 -- 📊 Evaluation Results
2025-08-29 21:51:03 - pico-train - INFO - └── paloma: 1.4819541114529576e+24
2025-08-29 21:51:05 - pico-train - INFO - Step 18500 -- 🔄 Training Metrics
2025-08-29 21:51:05 - pico-train - INFO - ├── Loss: 6.3848
2025-08-29 21:51:05 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:51:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:51:05 - pico-train - INFO - Step 18500 -- 📈 Saving Learning Dynamics
2025-08-29 21:51:19 - pico-train - INFO - Step 18525 -- 🔄 Training Metrics
2025-08-29 21:51:19 - pico-train - INFO - ├── Loss: 6.3383
2025-08-29 21:51:19 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:51:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:51:32 - pico-train - INFO - Step 18550 -- 🔄 Training Metrics
2025-08-29 21:51:32 - pico-train - INFO - ├── Loss: 6.3507
2025-08-29 21:51:32 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:51:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:51:44 - pico-train - INFO - Step 18575 -- 🔄 Training Metrics
2025-08-29 21:51:44 - pico-train - INFO - ├── Loss: 6.4269
2025-08-29 21:51:44 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:51:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:51:57 - pico-train - INFO - Step 18600 -- 🔄 Training Metrics
2025-08-29 21:51:57 - pico-train - INFO - ├── Loss: 6.3890
2025-08-29 21:51:57 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:51:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:52:10 - pico-train - INFO - Step 18625 -- 🔄 Training Metrics
2025-08-29 21:52:10 - pico-train - INFO - ├── Loss: 6.3352
2025-08-29 21:52:10 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:52:10 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:52:22 - pico-train - INFO - Step 18650 -- 🔄 Training Metrics
2025-08-29 21:52:22 - pico-train - INFO - ├── Loss: 6.3953
2025-08-29 21:52:22 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:52:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:52:35 - pico-train - INFO - Step 18675 -- 🔄 Training Metrics
2025-08-29 21:52:35 - pico-train - INFO - ├── Loss: 6.4438
2025-08-29 21:52:35 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:52:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:52:47 - pico-train - INFO - Step 18700 -- 🔄 Training Metrics
2025-08-29 21:52:47 - pico-train - INFO - ├── Loss: 6.4212
2025-08-29 21:52:47 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:52:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:53:00 - pico-train - INFO - Step 18725 -- 🔄 Training Metrics
2025-08-29 21:53:00 - pico-train - INFO - ├── Loss: 6.3805
2025-08-29 21:53:00 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:53:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:53:12 - pico-train - INFO - Step 18750 -- 🔄 Training Metrics
2025-08-29 21:53:12 - pico-train - INFO - ├── Loss: 6.3933
2025-08-29 21:53:12 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:53:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:53:25 - pico-train - INFO - Step 18775 -- 🔄 Training Metrics
2025-08-29 21:53:25 - pico-train - INFO - ├── Loss: 6.3642
2025-08-29 21:53:25 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:53:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:53:38 - pico-train - INFO - Step 18800 -- 🔄 Training Metrics
2025-08-29 21:53:38 - pico-train - INFO - ├── Loss: 6.3683
2025-08-29 21:53:38 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:53:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:53:50 - pico-train - INFO - Step 18825 -- 🔄 Training Metrics
2025-08-29 21:53:50 - pico-train - INFO - ├── Loss: 6.4197
2025-08-29 21:53:50 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:53:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:54:03 - pico-train - INFO - Step 18850 -- 🔄 Training Metrics
2025-08-29 21:54:03 - pico-train - INFO - ├── Loss: 6.3640
2025-08-29 21:54:03 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:54:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:54:16 - pico-train - INFO - Step 18875 -- 🔄 Training Metrics
2025-08-29 21:54:16 - pico-train - INFO - ├── Loss: 6.3246
2025-08-29 21:54:16 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:54:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:54:29 - pico-train - INFO - Step 18900 -- 🔄 Training Metrics
2025-08-29 21:54:29 - pico-train - INFO - ├── Loss: 6.4408
2025-08-29 21:54:29 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:54:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:54:41 - pico-train - INFO - Step 18925 -- 🔄 Training Metrics
2025-08-29 21:54:41 - pico-train - INFO - ├── Loss: 6.4179
2025-08-29 21:54:41 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:54:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:54:54 - pico-train - INFO - Step 18950 -- 🔄 Training Metrics
2025-08-29 21:54:54 - pico-train - INFO - ├── Loss: 6.4054
2025-08-29 21:54:54 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:54:54 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:55:06 - pico-train - INFO - Step 18975 -- 🔄 Training Metrics
2025-08-29 21:55:06 - pico-train - INFO - ├── Loss: 6.3416
2025-08-29 21:55:06 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:55:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:55:18 - pico-train - INFO - Step 19000 -- 💾 Saving Checkpoint
2025-08-29 21:57:12 - pico-train - INFO - Step 19000 -- 📊 Evaluation Results
2025-08-29 21:57:12 - pico-train - INFO - └── paloma: 1.5800319204327158e+24
2025-08-29 21:57:14 - pico-train - INFO - Step 19000 -- 🔄 Training Metrics
2025-08-29 21:57:14 - pico-train - INFO - ├── Loss: 6.3180
2025-08-29 21:57:14 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:57:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:57:14 - pico-train - INFO - Step 19000 -- 📈 Saving Learning Dynamics
2025-08-29 21:57:28 - pico-train - INFO - Step 19025 -- 🔄 Training Metrics
2025-08-29 21:57:28 - pico-train - INFO - ├── Loss: 6.3237
2025-08-29 21:57:28 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:57:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:57:41 - pico-train - INFO - Step 19050 -- 🔄 Training Metrics
2025-08-29 21:57:41 - pico-train - INFO - ├── Loss: 6.3803
2025-08-29 21:57:41 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:57:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:57:53 - pico-train - INFO - Step 19075 -- 🔄 Training Metrics
2025-08-29 21:57:53 - pico-train - INFO - ├── Loss: 6.4063
2025-08-29 21:57:53 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:57:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:58:06 - pico-train - INFO - Step 19100 -- 🔄 Training Metrics
2025-08-29 21:58:06 - pico-train - INFO - ├── Loss: 6.3269
2025-08-29 21:58:06 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:58:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:58:19 - pico-train - INFO - Step 19125 -- 🔄 Training Metrics
2025-08-29 21:58:19 - pico-train - INFO - ├── Loss: 6.2963
2025-08-29 21:58:19 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:58:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:58:32 - pico-train - INFO - Step 19150 -- 🔄 Training Metrics
2025-08-29 21:58:32 - pico-train - INFO - ├── Loss: 6.4027
2025-08-29 21:58:32 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:58:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:58:44 - pico-train - INFO - Step 19175 -- 🔄 Training Metrics
2025-08-29 21:58:44 - pico-train - INFO - ├── Loss: 6.4356
2025-08-29 21:58:44 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:58:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:58:57 - pico-train - INFO - Step 19200 -- 🔄 Training Metrics
2025-08-29 21:58:57 - pico-train - INFO - ├── Loss: 6.3680
2025-08-29 21:58:57 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:58:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:59:10 - pico-train - INFO - Step 19225 -- 🔄 Training Metrics
2025-08-29 21:59:10 - pico-train - INFO - ├── Loss: 6.3971
2025-08-29 21:59:10 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:59:10 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:59:22 - pico-train - INFO - Step 19250 -- 🔄 Training Metrics
2025-08-29 21:59:22 - pico-train - INFO - ├── Loss: 6.3903
2025-08-29 21:59:22 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:59:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:59:35 - pico-train - INFO - Step 19275 -- 🔄 Training Metrics
2025-08-29 21:59:35 - pico-train - INFO - ├── Loss: 6.3859
2025-08-29 21:59:35 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:59:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 21:59:47 - pico-train - INFO - Step 19300 -- 🔄 Training Metrics
2025-08-29 21:59:47 - pico-train - INFO - ├── Loss: 6.3196
2025-08-29 21:59:47 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 21:59:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:00:00 - pico-train - INFO - Step 19325 -- 🔄 Training Metrics
2025-08-29 22:00:00 - pico-train - INFO - ├── Loss: 6.3837
2025-08-29 22:00:00 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 22:00:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:00:13 - pico-train - INFO - Step 19350 -- 🔄 Training Metrics
2025-08-29 22:00:13 - pico-train - INFO - ├── Loss: 6.3884
2025-08-29 22:00:13 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 22:00:13 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:00:26 - pico-train - INFO - Step 19375 -- 🔄 Training Metrics
2025-08-29 22:00:26 - pico-train - INFO - ├── Loss: 6.3896
2025-08-29 22:00:26 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 22:00:26 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:00:39 - pico-train - INFO - Step 19400 -- 🔄 Training Metrics
2025-08-29 22:00:39 - pico-train - INFO - ├── Loss: 6.3826
2025-08-29 22:00:39 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 22:00:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:00:51 - pico-train - INFO - Step 19425 -- 🔄 Training Metrics
2025-08-29 22:00:51 - pico-train - INFO - ├── Loss: 6.3350
2025-08-29 22:00:51 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 22:00:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:01:04 - pico-train - INFO - Step 19450 -- 🔄 Training Metrics
2025-08-29 22:01:04 - pico-train - INFO - ├── Loss: 6.3507
2025-08-29 22:01:04 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 22:01:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:01:17 - pico-train - INFO - Step 19475 -- 🔄 Training Metrics
2025-08-29 22:01:17 - pico-train - INFO - ├── Loss: 6.4104
2025-08-29 22:01:17 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 22:01:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:01:29 - pico-train - INFO - Step 19500 -- 💾 Saving Checkpoint
2025-08-29 22:03:25 - pico-train - INFO - Step 19500 -- 📊 Evaluation Results
2025-08-29 22:03:25 - pico-train - INFO - └── paloma: 1.7237442602041698e+24
2025-08-29 22:03:26 - pico-train - INFO - Step 19500 -- 🔄 Training Metrics
2025-08-29 22:03:26 - pico-train - INFO - ├── Loss: 6.4120
2025-08-29 22:03:26 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 22:03:26 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:03:26 - pico-train - INFO - Step 19500 -- 📈 Saving Learning Dynamics
2025-08-29 22:03:41 - pico-train - INFO - Step 19525 -- 🔄 Training Metrics
2025-08-29 22:03:41 - pico-train - INFO - ├── Loss: 6.4176
2025-08-29 22:03:41 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 22:03:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:03:54 - pico-train - INFO - Step 19550 -- 🔄 Training Metrics
2025-08-29 22:03:54 - pico-train - INFO - ├── Loss: 6.3271
2025-08-29 22:03:54 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 22:03:54 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:04:06 - pico-train - INFO - Step 19575 -- 🔄 Training Metrics
2025-08-29 22:04:06 - pico-train - INFO - ├── Loss: 6.3965
2025-08-29 22:04:06 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 22:04:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:04:19 - pico-train - INFO - Step 19600 -- 🔄 Training Metrics
2025-08-29 22:04:19 - pico-train - INFO - ├── Loss: 6.3484
2025-08-29 22:04:19 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 22:04:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:04:31 - pico-train - INFO - Step 19625 -- 🔄 Training Metrics
2025-08-29 22:04:31 - pico-train - INFO - ├── Loss: 6.3829
2025-08-29 22:04:31 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 22:04:31 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:04:44 - pico-train - INFO - Step 19650 -- 🔄 Training Metrics
2025-08-29 22:04:44 - pico-train - INFO - ├── Loss: 6.3032
2025-08-29 22:04:44 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 22:04:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:04:57 - pico-train - INFO - Step 19675 -- 🔄 Training Metrics
2025-08-29 22:04:57 - pico-train - INFO - ├── Loss: 6.3678
2025-08-29 22:04:57 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 22:04:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:05:09 - pico-train - INFO - Step 19700 -- 🔄 Training Metrics
2025-08-29 22:05:09 - pico-train - INFO - ├── Loss: 6.3765
2025-08-29 22:05:09 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 22:05:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:05:22 - pico-train - INFO - Step 19725 -- 🔄 Training Metrics
2025-08-29 22:05:22 - pico-train - INFO - ├── Loss: 6.3547
2025-08-29 22:05:22 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 22:05:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:05:34 - pico-train - INFO - Step 19750 -- 🔄 Training Metrics
2025-08-29 22:05:34 - pico-train - INFO - ├── Loss: 6.4484
2025-08-29 22:05:34 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 22:05:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:05:47 - pico-train - INFO - Step 19775 -- 🔄 Training Metrics
2025-08-29 22:05:47 - pico-train - INFO - ├── Loss: 6.3166
2025-08-29 22:05:47 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 22:05:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:06:00 - pico-train - INFO - Step 19800 -- 🔄 Training Metrics
2025-08-29 22:06:00 - pico-train - INFO - ├── Loss: 6.4591
2025-08-29 22:06:00 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 22:06:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:06:12 - pico-train - INFO - Step 19825 -- 🔄 Training Metrics
2025-08-29 22:06:12 - pico-train - INFO - ├── Loss: 6.4335
2025-08-29 22:06:12 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 22:06:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:06:25 - pico-train - INFO - Step 19850 -- 🔄 Training Metrics
2025-08-29 22:06:25 - pico-train - INFO - ├── Loss: 6.4255
2025-08-29 22:06:25 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 22:06:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:06:38 - pico-train - INFO - Step 19875 -- 🔄 Training Metrics
2025-08-29 22:06:38 - pico-train - INFO - ├── Loss: 6.4091
2025-08-29 22:06:38 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 22:06:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:06:51 - pico-train - INFO - Step 19900 -- 🔄 Training Metrics
2025-08-29 22:06:51 - pico-train - INFO - ├── Loss: 6.3115
2025-08-29 22:06:51 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 22:06:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:07:03 - pico-train - INFO - Step 19925 -- 🔄 Training Metrics
2025-08-29 22:07:03 - pico-train - INFO - ├── Loss: 6.4247
2025-08-29 22:07:03 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 22:07:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:07:16 - pico-train - INFO - Step 19950 -- 🔄 Training Metrics
2025-08-29 22:07:16 - pico-train - INFO - ├── Loss: 6.4009
2025-08-29 22:07:16 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 22:07:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:07:28 - pico-train - INFO - Step 19975 -- 🔄 Training Metrics
2025-08-29 22:07:28 - pico-train - INFO - ├── Loss: 6.4714
2025-08-29 22:07:28 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 22:07:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:07:40 - pico-train - INFO - Step 20000 -- 💾 Saving Checkpoint
2025-08-29 22:11:41 - pico-train - INFO - Step 20000 -- 📊 Evaluation Results
2025-08-29 22:11:41 - pico-train - INFO - └── paloma: 1.8399778163273925e+24
2025-08-29 22:11:42 - pico-train - INFO - 🎉 Training complete! Final step: 20000