/usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
  import pynvml  # type: ignore[import]
W0129 13:15:47.092000 283685 torch/distributed/run.py:803]
W0129 13:15:47.092000 283685 torch/distributed/run.py:803] *****************************************
W0129 13:15:47.092000 283685 torch/distributed/run.py:803] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0129 13:15:47.092000 283685 torch/distributed/run.py:803] *****************************************
Trainer._get_train_sampler replaced with custom implementation.
[2026-01-29 13:15:56,852] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect)
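Two of the warnings above are actionable. The pynvml deprecation is fixed by swapping packages (`pip uninstall pynvml && pip install nvidia-ml-py`, as the FutureWarning itself suggests). The torchrun warning means each worker was pinned to a single OpenMP thread; a minimal sketch of tuning it explicitly before heavy imports — the value 4 is an assumption, benchmark for your own workload:

```python
# torchrun defaults OMP_NUM_THREADS to 1 per worker to avoid oversubscription.
# Set it before importing torch so intra-op thread pools pick it up.
import os

# setdefault keeps any value torchrun (or the user) already exported.
os.environ.setdefault("OMP_NUM_THREADS", "4")  # 4 is an assumed value, not a recommendation
print(os.environ["OMP_NUM_THREADS"])
```

Setting the variable in the launching shell (`OMP_NUM_THREADS=4 torchrun ...`) has the same effect and also silences the warning.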
[2026-01-29 13:15:57,953] [INFO] [comm.py:658:init_distributed] cdb=None
[2026-01-29 13:15:58,254] [INFO] [comm.py:689:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
Warning: FlashAttention 3 is not available, falling back to PyTorch's scaled_dot_product_attention
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
Loading checkpoint shards: 0%| | 0/2 [00:00
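The two attention warnings above are benign but worth understanding: without FlashAttention 3 kernels, attention runs through PyTorch's built-in `scaled_dot_product_attention`, and the Flash Attention 2.0 message just asks that the CPU-initialized model be moved to the GPU before use. A minimal sketch of both, using a stand-in `Linear` module since the actual checkpoint in this run is not shown:

```python
# Sketch for the two warnings above (stand-in model, shapes chosen arbitrarily).
import torch
import torch.nn.functional as F

model = torch.nn.Linear(8, 8)                       # placeholder for the real model
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)                            # the move the FA2 warning asks for

# SDPA fallback path: (batch, heads, seq_len, head_dim)
q = k = v = torch.randn(1, 2, 4, 8, device=device)
out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)                                    # torch.Size([1, 2, 4, 8])
```

`scaled_dot_product_attention` automatically dispatches to the fastest available backend (flash, memory-efficient, or math), so the fallback usually costs little on recent GPUs.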