/home/junrushao/micromamba/envs/python311/bin/python -m mlc_chat gen_config /opt/scratch/assets/gpt_bigcode-santacoder --quantization q4f16_1 --conv-template LM --output /home/junrushao/tmp/tmp50xzwpqb --context-window-size 2048
[2024-01-08 19:18:37] INFO auto_config.py:115: Found model configuration: /opt/scratch/assets/gpt_bigcode-santacoder/config.json
[2024-01-08 19:18:37] INFO auto_config.py:151: Found model type: gpt_bigcode. Use `--model-type` to override.
[2024-01-08 19:18:37] INFO gpt_bigcode_model.py:41: context_window_size not found in config.json. Falling back to n_positions (2048)
[2024-01-08 19:18:37] INFO gpt_bigcode_model.py:54: prefill_chunk_size defaults to context_window_size (2048)
[2024-01-08 19:18:37] INFO compiler_flags.py:118: Overriding context_window_size from 2048 to 2048
[2024-01-08 19:18:37] INFO gen_config.py:119: Not found generation_config.json: /opt/scratch/assets/gpt_bigcode-santacoder/generation_config.json
[2024-01-08 19:18:37] INFO gen_config.py:117: [config.json] Setting bos_token_id: 49152
[2024-01-08 19:18:37] INFO gen_config.py:117: [config.json] Setting eos_token_id: 49152
[2024-01-08 19:18:37] INFO gen_config.py:131: Not found tokenizer config: /opt/scratch/assets/gpt_bigcode-santacoder/tokenizer.model
[2024-01-08 19:18:37] INFO gen_config.py:129: Found tokenizer config: /opt/scratch/assets/gpt_bigcode-santacoder/tokenizer.json. Copying to /home/junrushao/tmp/tmp50xzwpqb/tokenizer.json
[2024-01-08 19:18:37] INFO gen_config.py:131: Not found tokenizer config: /opt/scratch/assets/gpt_bigcode-santacoder/vocab.json
[2024-01-08 19:18:37] INFO gen_config.py:131: Not found tokenizer config: /opt/scratch/assets/gpt_bigcode-santacoder/merges.txt
[2024-01-08 19:18:37] INFO gen_config.py:131: Not found tokenizer config: /opt/scratch/assets/gpt_bigcode-santacoder/added_tokens.json
[2024-01-08 19:18:37] INFO gen_config.py:129: Found tokenizer config: /opt/scratch/assets/gpt_bigcode-santacoder/tokenizer_config.json. Copying to /home/junrushao/tmp/tmp50xzwpqb/tokenizer_config.json
[2024-01-08 19:18:37] INFO gen_config.py:70: [System default] Setting pad_token_id: 0
[2024-01-08 19:18:37] INFO gen_config.py:70: [System default] Setting temperature: 0.7
[2024-01-08 19:18:37] INFO gen_config.py:70: [System default] Setting repetition_penalty: 1.0
[2024-01-08 19:18:37] INFO gen_config.py:70: [System default] Setting top_p: 0.95
[2024-01-08 19:18:37] INFO gen_config.py:70: [System default] Setting mean_gen_len: 128
[2024-01-08 19:18:37] INFO gen_config.py:70: [System default] Setting max_gen_len: 512
[2024-01-08 19:18:37] INFO gen_config.py:70: [System default] Setting shift_fill_factor: 0.3
[2024-01-08 19:18:37] INFO gen_config.py:159: Dumping configuration file to: /home/junrushao/tmp/tmp50xzwpqb/mlc-chat-config.json
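The gen_config lines above fill each field from a chain of sources: `context_window_size` falls back to `n_positions` because config.json lacks it, `prefill_chunk_size` defaults to that window, the explicit `--context-window-size 2048` flag is applied as an override, token ids come from config.json because generation_config.json was not found, and the remaining generation parameters take system defaults. The following is a minimal sketch of that resolution order, reconstructed only from what this log prints; `resolve_gen_config`, `MODEL_KEYS` and the exact precedence are assumptions for illustration, not mlc_chat's actual code.

```python
from typing import Any, Dict, Optional

# Values printed by the "[System default]" lines of this log.
SYSTEM_DEFAULTS: Dict[str, Any] = {
    "pad_token_id": 0,
    "temperature": 0.7,
    "repetition_penalty": 1.0,
    "top_p": 0.95,
    "mean_gen_len": 128,
    "max_gen_len": 512,
    "shift_fill_factor": 0.3,
}

# Keys this log copies from the model's own files (hypothetical, non-exhaustive list).
MODEL_KEYS = ("bos_token_id", "eos_token_id")


def resolve_gen_config(
    config: Dict[str, Any],
    generation_config: Optional[Dict[str, Any]] = None,
    context_window_override: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical resolver mirroring the order of the gen_config log lines."""
    out: Dict[str, Any] = {}

    # 1. context_window_size from config.json, else fall back to n_positions.
    window = config.get("context_window_size", config.get("n_positions"))
    # 2. prefill_chunk_size defaults to that window when config.json lacks it.
    out["prefill_chunk_size"] = config.get("prefill_chunk_size", window)
    # 3. An explicit --context-window-size flag overrides the window last.
    if context_window_override is not None:
        window = context_window_override
    out["context_window_size"] = window

    # 4. Token ids: generation_config.json first (absent in this run), then config.json.
    for key in MODEL_KEYS:
        for source in (generation_config or {}, config):
            if key in source:
                out[key] = source[key]
                break

    # 5. Whatever is still unset takes the system default.
    for key, value in SYSTEM_DEFAULTS.items():
        out.setdefault(key, value)
    return out


santacoder = {"n_positions": 2048, "bos_token_id": 49152, "eos_token_id": 49152}
cfg = resolve_gen_config(santacoder, context_window_override=2048)
print(cfg["context_window_size"], cfg["prefill_chunk_size"], cfg["eos_token_id"], cfg["temperature"])
# 2048 2048 49152 0.7
```

With the override equal to the fallback value, as in this run, the "Overriding context_window_size from 2048 to 2048" line is a no-op.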
/home/junrushao/micromamba/envs/python311/bin/python -m mlc_chat convert_weight /opt/scratch/assets/gpt_bigcode-santacoder --quantization q4f16_1 --source-format auto --output /home/junrushao/tmp/tmp50xzwpqb
[2024-01-08 19:18:38] INFO auto_config.py:115: Found model configuration: /opt/scratch/assets/gpt_bigcode-santacoder/config.json
[2024-01-08 19:18:38] INFO auto_device.py:76: Found device: cuda:0
[2024-01-08 19:18:38] INFO auto_device.py:76: Found device: cuda:1
[2024-01-08 19:18:38] INFO auto_device.py:76: Found device: cuda:2
[2024-01-08 19:18:38] INFO auto_device.py:76: Found device: cuda:3
[2024-01-08 19:18:39] INFO auto_device.py:85: Not found device: rocm:0
[2024-01-08 19:18:39] INFO auto_device.py:85: Not found device: metal:0
[2024-01-08 19:18:40] INFO auto_device.py:85: Not found device: vulkan:0
[2024-01-08 19:18:40] INFO auto_device.py:85: Not found device: opencl:0
[2024-01-08 19:18:40] INFO auto_device.py:33: Using device: cuda:0
[2024-01-08 19:18:40] INFO auto_weight.py:70: Finding weights in: /opt/scratch/assets/gpt_bigcode-santacoder
[2024-01-08 19:18:40] INFO auto_weight.py:129: Found source weight format: huggingface-torch. Source configuration: /opt/scratch/assets/gpt_bigcode-santacoder/pytorch_model.bin
[2024-01-08 19:18:40] INFO auto_weight.py:143: Found source weight format: huggingface-safetensor. Source configuration: /opt/scratch/assets/gpt_bigcode-santacoder/model.safetensors.index.json
[2024-01-08 19:18:40] INFO auto_weight.py:106: Using source weight configuration: /opt/scratch/assets/gpt_bigcode-santacoder/pytorch_model.bin. Use `--source` to override.
[2024-01-08 19:18:40] INFO auto_weight.py:110: Using source weight format: huggingface-torch. Use `--source-format` to override.
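Both automatic choices above behave like "the first detected candidate in a priority list wins": four CUDA devices are found, so cuda:0 is used, and of the two weight formats found, huggingface-torch is picked over huggingface-safetensor (the log points to `--source` and `--source-format` for overriding). The sketch below illustrates that rule; the helper `pick_first` is hypothetical, and treating the order in which the log probes candidates as the priority order is an assumption, not documented behavior.

```python
from typing import Iterable, Optional, Sequence


def pick_first(candidates: Sequence[str], available: Iterable[str]) -> Optional[str]:
    """Return the highest-priority candidate that was detected, or None."""
    found = set(available)
    for candidate in candidates:  # candidates are listed in priority order
        if candidate in found:
            return candidate
    return None


# Probe order as it appears in the log (assumed to double as the priority order).
DEVICE_ORDER = ["cuda:0", "rocm:0", "metal:0", "vulkan:0", "opencl:0"]
FORMAT_ORDER = ["huggingface-torch", "huggingface-safetensor"]

detected_devices = ["cuda:0", "cuda:1", "cuda:2", "cuda:3"]
detected_formats = ["huggingface-torch", "huggingface-safetensor"]

print(pick_first(DEVICE_ORDER, detected_devices))  # cuda:0
print(pick_first(FORMAT_ORDER, detected_formats))  # huggingface-torch
```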
[2024-01-08 19:18:40] INFO auto_config.py:151: Found model type: gpt_bigcode. Use `--model-type` to override.
[2024-01-08 19:18:40] INFO gpt_bigcode_model.py:41: context_window_size not found in config.json. Falling back to n_positions (2048)
[2024-01-08 19:18:40] INFO gpt_bigcode_model.py:54: prefill_chunk_size defaults to context_window_size (2048)
[2024-01-08 19:18:43] INFO huggingface_loader.py:169: Loading HF parameters from: /opt/scratch/assets/gpt_bigcode-santacoder/pytorch_model.bin
Weight conversion with arguments:
  --config /opt/scratch/assets/gpt_bigcode-santacoder/config.json
  --quantization GroupQuantize(name='q4f16_1', kind='group-quant', group_size=32, quantize_dtype='int4', storage_dtype='uint32', model_dtype='float16', num_elem_per_storage=8, num_storage_per_group=4, max_int_value=7)
  --model-type gpt_bigcode
  --device cuda:0
  --source /opt/scratch/assets/gpt_bigcode-santacoder/pytorch_model.bin
  --source-format huggingface-torch
  --output /home/junrushao/tmp/tmp50xzwpqb
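Three fields of the q4f16_1 descriptor printed above are derived from the others: eight 4-bit values fit in one 32-bit storage word, a group of 32 values therefore spans four words, and the largest magnitude of a signed 4-bit value is 2^3 - 1 = 7. A worked sketch of that arithmetic; the helper `derived_fields` and the `BITS` table are hypothetical names used only for illustration.

```python
from typing import Dict

# Bit widths for the dtype names that appear in the GroupQuantize line.
BITS = {"int4": 4, "int8": 8, "uint32": 32, "float16": 16}


def derived_fields(group_size: int, quantize_dtype: str, storage_dtype: str) -> Dict[str, int]:
    q_bits = BITS[quantize_dtype]
    s_bits = BITS[storage_dtype]
    num_elem_per_storage = s_bits // q_bits                      # 32 // 4 = 8 values per uint32
    num_storage_per_group = group_size // num_elem_per_storage   # 32 // 8 = 4 words per group
    max_int_value = 2 ** (q_bits - 1) - 1                        # 2**3 - 1 = 7
    return {
        "num_elem_per_storage": num_elem_per_storage,
        "num_storage_per_group": num_storage_per_group,
        "max_int_value": max_int_value,
    }


print(derived_fields(32, "int4", "uint32"))
# {'num_elem_per_storage': 8, 'num_storage_per_group': 4, 'max_int_value': 7}
```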
  0%| 0/293 [00:00<?, ?it/s]
[2024-01-08 19:18:46] INFO group_quantization.py:212: Compiling quantize function for key: (49280, 2048, 'float16', 'cuda')
[2024-01-08 19:18:47] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.wte.q_weight", shape: (49280, 256), dtype: uint32
[2024-01-08 19:18:47] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.wte.q_scale", shape: (49280, 64), dtype: float16
  0%| 1/293 [00:01<07:35, 1.56s/it]
[2024-01-08 19:18:47] INFO group_quantization.py:212: Compiling quantize function for key: (2048, 2048, 'float16', 'cuda')
[2024-01-08 19:18:48] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.wpe.q_weight", shape: (2048, 256), dtype: uint32
[2024-01-08 19:18:48] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.wpe.q_scale", shape: (2048, 64), dtype: float16
  1%| 2/293 [00:02<04:32, 1.07it/s]
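Every quantized parameter in this log follows one shape rule: a float16 weight of shape (rows, cols) becomes a uint32 `q_weight` of shape (rows, cols / 8) and a float16 `q_scale` of shape (rows, cols / 32), while 1-D tensors (biases and LayerNorm parameters) stay float16 and keep their shape. The sketch below reproduces the logged shapes from the source shapes. The hypothetical `plan_param` decides what to quantize by name and rank, a simplification inferred from this log rather than mlc_chat's actual selection logic.

```python
from typing import List, Tuple

Shape = Tuple[int, ...]
GROUP_SIZE = 32          # group_size from the q4f16_1 descriptor
ELEMS_PER_STORAGE = 8    # 4-bit values packed per uint32


def plan_param(name: str, shape: Shape) -> List[Tuple[str, Shape, str]]:
    """Predict the converted tensors for one source parameter (simplified rule)."""
    if name.endswith(".weight") and len(shape) == 2:
        rows, cols = shape
        base = name[: -len(".weight")]
        return [
            (f"{base}.q_weight", (rows, cols // ELEMS_PER_STORAGE), "uint32"),
            (f"{base}.q_scale", (rows, cols // GROUP_SIZE), "float16"),
        ]
    # 1-D tensors (biases, LayerNorm parameters) are kept as float16.
    return [(name, shape, "float16")]


# Source shapes; c_attn is 2048 + 2 * 128 = 2304 wide because SantaCoder uses
# multi-query attention (a single shared 128-dim key/value head).
for name, shape in [
    ("transformer.wte.weight", (49280, 2048)),
    ("transformer.h.0.attn.c_attn.weight", (2304, 2048)),
    ("transformer.h.0.mlp.c_proj.weight", (2048, 8192)),
    ("transformer.h.0.ln_1.bias", (2048,)),
]:
    for out in plan_param(name, shape):
        print(out)
```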
[2024-01-08 19:18:48] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.0.ln_1.weight", shape: (2048,), dtype: float16
[2024-01-08 19:18:48] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.0.ln_1.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:48] INFO group_quantization.py:212: Compiling quantize function for key: (2304, 2048, 'float16', 'cuda')
[2024-01-08 19:18:48] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.0.attn.c_attn.q_weight", shape: (2304, 256), dtype: uint32
[2024-01-08 19:18:48] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.0.attn.c_attn.q_scale", shape: (2304, 64), dtype: float16
  2%| 5/293 [00:02<01:51, 2.58it/s]
[2024-01-08 19:18:48] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.0.attn.c_attn.bias", shape: (2304,), dtype: float16
[2024-01-08 19:18:48] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.0.attn.c_proj.q_weight", shape: (2048, 256), dtype: uint32
[2024-01-08 19:18:48] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.0.attn.c_proj.q_scale", shape: (2048, 64), dtype: float16
[2024-01-08 19:18:48] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.0.attn.c_proj.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:48] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.0.ln_2.weight", shape: (2048,), dtype: float16
[2024-01-08 19:18:48] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.0.ln_2.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:48] INFO group_quantization.py:212: Compiling quantize function for key: (8192, 2048, 'float16', 'cuda')
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.0.mlp.c_fc.q_weight", shape: (8192, 256), dtype: uint32
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.0.mlp.c_fc.q_scale", shape: (8192, 64), dtype: float16
  4%| 11/293 [00:03<00:52, 5.32it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.0.mlp.c_fc.bias", shape: (8192,), dtype: float16
[2024-01-08 19:18:49] INFO group_quantization.py:212: Compiling quantize function for key: (2048, 8192, 'float16', 'cuda')
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.0.mlp.c_proj.q_weight", shape: (2048, 1024), dtype: uint32
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.0.mlp.c_proj.q_scale", shape: (2048, 256), dtype: float16
  4%| 13/293 [00:03<00:57, 4.88it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.0.mlp.c_proj.bias", shape: (2048,), dtype: float16
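Layer 0 above shows the full pattern: each 2-D weight turns into a packed `q_weight` plus a `q_scale`, and a quantize function is compiled once per new weight shape. The pure-Python sketch below shows what a group quantizer with these parameters plausibly does for one group of 32 values: one scale per group equal to the group's max absolute value divided by `max_int_value` (7), values rounded into [-7, 7], shifted by +7 so they fit an unsigned nibble, and packed eight per uint32 word. The max-abs symmetric scaling and the low-nibble-first packing order are assumptions; the compiled kernels may differ in detail.

```python
from typing import List, Tuple

GROUP_SIZE = 32
MAX_INT = 7            # max_int_value for int4
BITS = 4
ELEMS_PER_WORD = 8     # 32 // 4


def quantize_group(values: List[float]) -> Tuple[List[int], float]:
    """Quantize one group to packed uint32 words plus one scale (assumed scheme)."""
    assert len(values) == GROUP_SIZE
    max_abs = max(abs(v) for v in values)
    scale = max_abs / MAX_INT if max_abs > 0 else 1.0
    # Round into [-7, 7], then shift by +7 so each code is an unsigned nibble in [0, 14].
    codes = [min(max(round(v / scale), -MAX_INT), MAX_INT) + MAX_INT for v in values]
    words = []
    for start in range(0, GROUP_SIZE, ELEMS_PER_WORD):
        word = 0
        for i, code in enumerate(codes[start:start + ELEMS_PER_WORD]):
            word |= code << (BITS * i)  # element i occupies bits [4i, 4i + 4)
        words.append(word)
    return words, scale


def dequantize_group(words: List[int], scale: float) -> List[float]:
    """Unpack the nibbles and map them back to floats."""
    out = []
    for word in words:
        for i in range(ELEMS_PER_WORD):
            code = (word >> (BITS * i)) & 0xF
            out.append((code - MAX_INT) * scale)
    return out


group = [((i * 37) % 17 - 8) / 10 for i in range(GROUP_SIZE)]
words, scale = quantize_group(group)
restored = dequantize_group(words, scale)
print(len(words), round(scale, 4))  # 4 0.1143
print(max(abs(a - b) for a, b in zip(group, restored)) <= scale / 2 + 1e-9)  # True
```

Four words per group matches `num_storage_per_group=4`, and the reconstruction error stays within half a quantization step.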
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.1.ln_1.weight", shape: (2048,), dtype: float16
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.1.ln_1.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.1.attn.c_attn.q_weight", shape: (2304, 256), dtype: uint32
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.1.attn.c_attn.q_scale", shape: (2304, 64), dtype: float16
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.1.attn.c_attn.bias", shape: (2304,), dtype: float16
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.1.attn.c_proj.q_weight", shape: (2048, 256), dtype: uint32
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.1.attn.c_proj.q_scale", shape: (2048, 64), dtype: float16
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.1.attn.c_proj.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.1.ln_2.weight", shape: (2048,), dtype: float16
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.1.ln_2.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.1.mlp.c_fc.q_weight", shape: (8192, 256), dtype: uint32
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.1.mlp.c_fc.q_scale", shape: (8192, 64), dtype: float16
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.1.mlp.c_fc.bias", shape: (8192,), dtype: float16
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.1.mlp.c_proj.q_weight", shape: (2048, 1024), dtype: uint32
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.1.mlp.c_proj.q_scale", shape: (2048, 256), dtype: float16
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.1.mlp.c_proj.bias", shape: (2048,), dtype: float16
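The logged shapes also make the memory saving easy to check. For an `mlp.c_fc` weight (8192 × 2048 in float16), the quantized form is a (8192, 256) uint32 `q_weight` plus a (8192, 64) float16 `q_scale`, which works out to 4.5 bits per weight: 4 bits for the code plus one 16-bit scale shared by 32 values. The helper `quantized_bytes` is a hypothetical name for this arithmetic; the count ignores the unquantized biases and LayerNorm parameters.

```python
def quantized_bytes(rows: int, cols: int, group_size: int = 32, elems_per_word: int = 8) -> int:
    """Bytes used by q_weight (uint32) plus q_scale (float16) for one weight matrix."""
    q_weight = rows * (cols // elems_per_word) * 4   # uint32 words
    q_scale = rows * (cols // group_size) * 2        # float16 scales
    return q_weight + q_scale


rows, cols = 8192, 2048  # transformer.h.*.mlp.c_fc weight
fp16 = rows * cols * 2
q4 = quantized_bytes(rows, cols)
print(fp16, q4, q4 / fp16)          # 33554432 9437184 0.28125
print(8 * q4 / (rows * cols))       # 4.5
```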
|
4%|βββββββββ | 13/293 [00:03<00:57, 4.88it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.2.ln_1.weight[0m", shape: (2048,), dtype: float16 |
|
4%|βββββββββ | 13/293 [00:03<00:57, 4.88it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.2.ln_1.bias[0m", shape: (2048,), dtype: float16 |
|
4%|βββββββββ | 13/293 [00:03<00:57, 4.88it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.2.attn.c_attn.q_weight[0m", shape: (2304, 256), dtype: uint32 |
|
4%|βββββββββ | 13/293 [00:03<00:57, 4.88it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.2.attn.c_attn.q_scale[0m", shape: (2304, 64), dtype: float16 |
|
4%|βββββββββ | 13/293 [00:03<00:57, 4.88it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.2.attn.c_attn.bias[0m", shape: (2304,), dtype: float16 |
|
4%|βββββββββ | 13/293 [00:03<00:57, 4.88it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.2.attn.c_proj.q_weight[0m", shape: (2048, 256), dtype: uint32 |
|
4%|βββββββββ | 13/293 [00:03<00:57, 4.88it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.2.attn.c_proj.q_scale[0m", shape: (2048, 64), dtype: float16 |
|
4%|βββββββββ | 13/293 [00:03<00:57, 4.88it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.2.attn.c_proj.bias[0m", shape: (2048,), dtype: float16 |
|
4%|βββββββββ | 13/293 [00:03<00:57, 4.88it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.2.ln_2.weight[0m", shape: (2048,), dtype: float16 |
|
4%|βββββββββ | 13/293 [00:03<00:57, 4.88it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.2.ln_2.bias[0m", shape: (2048,), dtype: float16 |
|
4%|βββββββββ | 13/293 [00:03<00:57, 4.88it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.2.mlp.c_fc.q_weight[0m", shape: (8192, 256), dtype: uint32 |
|
4%|βββββββββ | 13/293 [00:03<00:57, 4.88it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.2.mlp.c_fc.q_scale[0m", shape: (8192, 64), dtype: float16 |
|
4%|βββββββββ | 13/293 [00:03<00:57, 4.88it/s]
12%|ββββββββββββββββββββββββ | 35/293 [00:03<00:12, 21.43it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.2.mlp.c_fc.bias[0m", shape: (8192,), dtype: float16 |
|
12%|ββββββββββββββββββββββββ | 35/293 [00:03<00:12, 21.43it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.2.mlp.c_proj.q_weight[0m", shape: (2048, 1024), dtype: uint32 |
|
12%|ββββββββββββββββββββββββ | 35/293 [00:03<00:12, 21.43it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.2.mlp.c_proj.q_scale[0m", shape: (2048, 256), dtype: float16 |
|
12%|ββββββββββββββββββββββββ | 35/293 [00:03<00:12, 21.43it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.2.mlp.c_proj.bias[0m", shape: (2048,), dtype: float16 |
|
12%|ββββββββββββββββββββββββ | 35/293 [00:03<00:12, 21.43it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.3.ln_1.weight[0m", shape: (2048,), dtype: float16 |
|
12%|ββββββββββββββββββββββββ | 35/293 [00:03<00:12, 21.43it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.3.ln_1.bias[0m", shape: (2048,), dtype: float16 |
|
12%|ββββββββββββββββββββββββ | 35/293 [00:03<00:12, 21.43it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.3.attn.c_attn.q_weight[0m", shape: (2304, 256), dtype: uint32 |
|
12%|ββββββββββββββββββββββββ | 35/293 [00:03<00:12, 21.43it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.3.attn.c_attn.q_scale[0m", shape: (2304, 64), dtype: float16 |
|
12%|ββββββββββββββββββββββββ | 35/293 [00:03<00:12, 21.43it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.3.attn.c_attn.bias[0m", shape: (2304,), dtype: float16 |
|
12%|ββββββββββββββββββββββββ | 35/293 [00:03<00:12, 21.43it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.3.attn.c_proj.q_weight[0m", shape: (2048, 256), dtype: uint32 |
|
12%|ββββββββββββββββββββββββ | 35/293 [00:03<00:12, 21.43it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.3.attn.c_proj.q_scale[0m", shape: (2048, 64), dtype: float16 |
|
12%|ββββββββββββββββββββββββ | 35/293 [00:03<00:12, 21.43it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.3.attn.c_proj.bias[0m", shape: (2048,), dtype: float16 |
|
12%|ββββββββββββββββββββββββ | 35/293 [00:03<00:12, 21.43it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.3.ln_2.weight[0m", shape: (2048,), dtype: float16 |
|
12%|ββββββββββββββββββββββββ | 35/293 [00:03<00:12, 21.43it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.3.ln_2.bias[0m", shape: (2048,), dtype: float16 |
|
12%|ββββββββββββββββββββββββ | 35/293 [00:03<00:12, 21.43it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.3.mlp.c_fc.q_weight[0m", shape: (8192, 256), dtype: uint32 |
|
12%|ββββββββββββββββββββββββ | 35/293 [00:03<00:12, 21.43it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.3.mlp.c_fc.q_scale[0m", shape: (8192, 64), dtype: float16 |
|
12%|ββββββββββββββββββββββββ | 35/293 [00:03<00:12, 21.43it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.3.mlp.c_fc.bias[0m", shape: (8192,), dtype: float16 |
|
12%|ββββββββββββββββββββββββ | 35/293 [00:03<00:12, 21.43it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.3.mlp.c_proj.q_weight[0m", shape: (2048, 1024), dtype: uint32 |
|
12%|ββββββββββββββββββββββββ | 35/293 [00:03<00:12, 21.43it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.3.mlp.c_proj.q_scale[0m", shape: (2048, 256), dtype: float16 |
|
12%|ββββββββββββββββββββββββ | 35/293 [00:03<00:12, 21.43it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.3.mlp.c_proj.bias[0m", shape: (2048,), dtype: float16 |
|
12%|ββββββββββββββββββββββββ | 35/293 [00:03<00:12, 21.43it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.4.ln_1.weight[0m", shape: (2048,), dtype: float16 |
|
12%|ββββββββββββββββββββββββ | 35/293 [00:03<00:12, 21.43it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.4.ln_1.bias[0m", shape: (2048,), dtype: float16 |
|
12%|ββββββββββββββββββββββββ | 35/293 [00:03<00:12, 21.43it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.4.attn.c_attn.q_weight[0m", shape: (2304, 256), dtype: uint32 |
|
12%|ββββββββββββββββββββββββ | 35/293 [00:03<00:12, 21.43it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.4.attn.c_attn.q_scale[0m", shape: (2304, 64), dtype: float16 |
|
12%|ββββββββββββββββββββββββ | 35/293 [00:03<00:12, 21.43it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.4.attn.c_attn.bias[0m", shape: (2304,), dtype: float16 |
|
12%|ββββββββββββββββββββββββ | 35/293 [00:03<00:12, 21.43it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.4.attn.c_proj.q_weight[0m", shape: (2048, 256), dtype: uint32 |
|
12%|ββββββββββββββββββββββββ | 35/293 [00:03<00:12, 21.43it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.4.attn.c_proj.q_scale[0m", shape: (2048, 64), dtype: float16 |
|
12%|ββββββββββββββββββββββββ | 35/293 [00:03<00:12, 21.43it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.4.attn.c_proj.bias[0m", shape: (2048,), dtype: float16 |
|
12%|ββββββββββββββββββββββββ | 35/293 [00:03<00:12, 21.43it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.4.ln_2.weight[0m", shape: (2048,), dtype: float16 |
|
12%|ββββββββββββββββββββββββ | 35/293 [00:03<00:12, 21.43it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.4.ln_2.bias[0m", shape: (2048,), dtype: float16 |
|
12%|ββββββββββββββββββββββββ | 35/293 [00:03<00:12, 21.43it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.4.mlp.c_fc.q_weight[0m", shape: (8192, 256), dtype: uint32 |
|
12%|ββββββββββββββββββββββββ | 35/293 [00:03<00:12, 21.43it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.4.mlp.c_fc.q_scale[0m", shape: (8192, 64), dtype: float16 |
|
12%|ββββββββββββββββββββββββ | 35/293 [00:03<00:12, 21.43it/s]
20%|ββββββββββββββββββββββββββββββββββββββββ | 59/293 [00:03<00:05, 42.79it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.4.mlp.c_fc.bias[0m", shape: (8192,), dtype: float16 |
|
20%|ββββββββββββββββββββββββββββββββββββββββ | 59/293 [00:03<00:05, 42.79it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.4.mlp.c_proj.q_weight[0m", shape: (2048, 1024), dtype: uint32 |
|
20%|ββββββββββββββββββββββββββββββββββββββββ | 59/293 [00:03<00:05, 42.79it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.4.mlp.c_proj.q_scale[0m", shape: (2048, 256), dtype: float16 |
|
20%|ββββββββββββββββββββββββββββββββββββββββ | 59/293 [00:03<00:05, 42.79it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.4.mlp.c_proj.bias[0m", shape: (2048,), dtype: float16 |
|
20%|ββββββββββββββββββββββββββββββββββββββββ | 59/293 [00:03<00:05, 42.79it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.5.ln_1.weight[0m", shape: (2048,), dtype: float16 |
|
20%|ββββββββββββββββββββββββββββββββββββββββ | 59/293 [00:03<00:05, 42.79it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.5.ln_1.bias[0m", shape: (2048,), dtype: float16 |
|
20%|ββββββββββββββββββββββββββββββββββββββββ | 59/293 [00:03<00:05, 42.79it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.5.attn.c_attn.q_weight[0m", shape: (2304, 256), dtype: uint32 |
|
20%|ββββββββββββββββββββββββββββββββββββββββ | 59/293 [00:03<00:05, 42.79it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.5.attn.c_attn.q_scale[0m", shape: (2304, 64), dtype: float16 |
|
20%|ββββββββββββββββββββββββββββββββββββββββ | 59/293 [00:03<00:05, 42.79it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.5.attn.c_attn.bias[0m", shape: (2304,), dtype: float16 |
|
20%|ββββββββββββββββββββββββββββββββββββββββ | 59/293 [00:03<00:05, 42.79it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.5.attn.c_proj.q_weight[0m", shape: (2048, 256), dtype: uint32 |
|
20%|ββββββββββββββββββββββββββββββββββββββββ | 59/293 [00:03<00:05, 42.79it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.5.attn.c_proj.q_scale[0m", shape: (2048, 64), dtype: float16 |
|
20%|ββββββββββββββββββββββββββββββββββββββββ | 59/293 [00:03<00:05, 42.79it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.5.attn.c_proj.bias[0m", shape: (2048,), dtype: float16 |
|
20%|ββββββββββββββββββββββββββββββββββββββββ | 59/293 [00:03<00:05, 42.79it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.5.ln_2.weight[0m", shape: (2048,), dtype: float16 |
|
20%|ββββββββββββββββββββββββββββββββββββββββ | 59/293 [00:03<00:05, 42.79it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.5.ln_2.bias[0m", shape: (2048,), dtype: float16 |
|
20%|ββββββββββββββββββββββββββββββββββββββββ | 59/293 [00:03<00:05, 42.79it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.5.mlp.c_fc.q_weight[0m", shape: (8192, 256), dtype: uint32 |
|
20%|ββββββββββββββββββββββββββββββββββββββββ | 59/293 [00:03<00:05, 42.79it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.5.mlp.c_fc.q_scale[0m", shape: (8192, 64), dtype: float16 |
|
20%|ββββββββββββββββββββββββββββββββββββββββ | 59/293 [00:03<00:05, 42.79it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.5.mlp.c_fc.bias[0m", shape: (8192,), dtype: float16 |
|
20%|ββββββββββββββββββββββββββββββββββββββββ | 59/293 [00:03<00:05, 42.79it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.5.mlp.c_proj.q_weight[0m", shape: (2048, 1024), dtype: uint32 |
|
20%|ββββββββββββββββββββββββββββββββββββββββ | 59/293 [00:03<00:05, 42.79it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.5.mlp.c_proj.q_scale[0m", shape: (2048, 256), dtype: float16 |
|
20%|ββββββββββββββββββββββββββββββββββββββββ | 59/293 [00:03<00:05, 42.79it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.5.mlp.c_proj.bias[0m", shape: (2048,), dtype: float16 |
|
20%|ββββββββββββββββββββββββββββββββββββββββ | 59/293 [00:03<00:05, 42.79it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.6.ln_1.weight[0m", shape: (2048,), dtype: float16 |
|
20%|ββββββββββββββββββββββββββββββββββββββββ | 59/293 [00:03<00:05, 42.79it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.6.ln_1.bias[0m", shape: (2048,), dtype: float16 |
|
20%|ββββββββββββββββββββββββββββββββββββββββ | 59/293 [00:03<00:05, 42.79it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.6.attn.c_attn.q_weight[0m", shape: (2304, 256), dtype: uint32 |
|
20%|ββββββββββββββββββββββββββββββββββββββββ | 59/293 [00:03<00:05, 42.79it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.6.attn.c_attn.q_scale[0m", shape: (2304, 64), dtype: float16 |
|
20%|ββββββββββββββββββββββββββββββββββββββββ | 59/293 [00:03<00:05, 42.79it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.6.attn.c_attn.bias[0m", shape: (2304,), dtype: float16 |
|
20%|ββββββββββββββββββββββββββββββββββββββββ | 59/293 [00:03<00:05, 42.79it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.6.attn.c_proj.q_weight[0m", shape: (2048, 256), dtype: uint32 |
|
20%|ββββββββββββββββββββββββββββββββββββββββ | 59/293 [00:03<00:05, 42.79it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.6.attn.c_proj.q_scale[0m", shape: (2048, 64), dtype: float16 |
|
20%|ββββββββββββββββββββββββββββββββββββββββ | 59/293 [00:03<00:05, 42.79it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.6.attn.c_proj.bias[0m", shape: (2048,), dtype: float16 |
|
20%|ββββββββββββββββββββββββββββββββββββββββ | 59/293 [00:03<00:05, 42.79it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.6.ln_2.weight[0m", shape: (2048,), dtype: float16 |
|
20%|ββββββββββββββββββββββββββββββββββββββββ | 59/293 [00:03<00:05, 42.79it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.6.ln_2.bias[0m", shape: (2048,), dtype: float16 |
|
20%|ββββββββββββββββββββββββββββββββββββββββ | 59/293 [00:03<00:05, 42.79it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.6.mlp.c_fc.q_weight[0m", shape: (8192, 256), dtype: uint32 |
|
20%|ββββββββββββββββββββββββββββββββββββββββ | 59/293 [00:03<00:05, 42.79it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.6.mlp.c_fc.q_scale[0m", shape: (8192, 64), dtype: float16 |
|
20%|ββββββββββββββββββββββββββββββββββββββββ | 59/293 [00:03<00:05, 42.79it/s]
28%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 83/293 [00:03<00:03, 65.98it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.6.mlp.c_fc.bias", shape: (8192,), dtype: float16
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.6.mlp.c_proj.q_weight", shape: (2048, 1024), dtype: uint32
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.6.mlp.c_proj.q_scale", shape: (2048, 256), dtype: float16
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.6.mlp.c_proj.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.7.ln_1.weight", shape: (2048,), dtype: float16
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.7.ln_1.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.7.attn.c_attn.q_weight", shape: (2304, 256), dtype: uint32
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.7.attn.c_attn.q_scale", shape: (2304, 64), dtype: float16
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.7.attn.c_attn.bias", shape: (2304,), dtype: float16
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.7.attn.c_proj.q_weight", shape: (2048, 256), dtype: uint32
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.7.attn.c_proj.q_scale", shape: (2048, 64), dtype: float16
 28%|█████████████████████████████████████████████████████████ | 83/293 [00:04<00:03, 65.98it/s]
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.7.attn.c_proj.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.7.ln_2.weight", shape: (2048,), dtype: float16
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.7.ln_2.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.7.mlp.c_fc.q_weight", shape: (8192, 256), dtype: uint32
[2024-01-08 19:18:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.7.mlp.c_fc.q_scale", shape: (8192, 64), dtype: float16
[2024-01-08 19:18:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.7.mlp.c_fc.bias", shape: (8192,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.7.mlp.c_proj.q_weight", shape: (2048, 1024), dtype: uint32
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.7.mlp.c_proj.q_scale", shape: (2048, 256), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.7.mlp.c_proj.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.8.ln_1.weight", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.8.ln_1.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.8.attn.c_attn.q_weight", shape: (2304, 256), dtype: uint32
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.8.attn.c_attn.q_scale", shape: (2304, 64), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.8.attn.c_attn.bias", shape: (2304,), dtype: float16
 35%|█████████████████████████████████████████████████████████████████████ | 102/293 [00:04<00:02, 83.84it/s]
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.8.attn.c_proj.q_weight", shape: (2048, 256), dtype: uint32
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.8.attn.c_proj.q_scale", shape: (2048, 64), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.8.attn.c_proj.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.8.ln_2.weight", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.8.ln_2.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.8.mlp.c_fc.q_weight", shape: (8192, 256), dtype: uint32
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.8.mlp.c_fc.q_scale", shape: (8192, 64), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.8.mlp.c_fc.bias", shape: (8192,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.8.mlp.c_proj.q_weight", shape: (2048, 1024), dtype: uint32
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.8.mlp.c_proj.q_scale", shape: (2048, 256), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.8.mlp.c_proj.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.9.ln_1.weight", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.9.ln_1.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.9.attn.c_attn.q_weight", shape: (2304, 256), dtype: uint32
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.9.attn.c_attn.q_scale", shape: (2304, 64), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.9.attn.c_attn.bias", shape: (2304,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.9.attn.c_proj.q_weight", shape: (2048, 256), dtype: uint32
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.9.attn.c_proj.q_scale", shape: (2048, 64), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.9.attn.c_proj.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.9.ln_2.weight", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.9.ln_2.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.9.mlp.c_fc.q_weight", shape: (8192, 256), dtype: uint32
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.9.mlp.c_fc.q_scale", shape: (8192, 64), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.9.mlp.c_fc.bias", shape: (8192,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.9.mlp.c_proj.q_weight", shape: (2048, 1024), dtype: uint32
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.9.mlp.c_proj.q_scale", shape: (2048, 256), dtype: float16
 41%|██████████████████████████████████████████████████████████████████████████████████ | 121/293 [00:04<00:01, 98.77it/s]
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.9.mlp.c_proj.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.10.ln_1.weight", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.10.ln_1.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.10.attn.c_attn.q_weight", shape: (2304, 256), dtype: uint32
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.10.attn.c_attn.q_scale", shape: (2304, 64), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.10.attn.c_attn.bias", shape: (2304,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.10.attn.c_proj.q_weight", shape: (2048, 256), dtype: uint32
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.10.attn.c_proj.q_scale", shape: (2048, 64), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.10.attn.c_proj.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.10.ln_2.weight", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.10.ln_2.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.10.mlp.c_fc.q_weight", shape: (8192, 256), dtype: uint32
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.10.mlp.c_fc.q_scale", shape: (8192, 64), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.10.mlp.c_fc.bias", shape: (8192,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.10.mlp.c_proj.q_weight", shape: (2048, 1024), dtype: uint32
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.10.mlp.c_proj.q_scale", shape: (2048, 256), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.10.mlp.c_proj.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.11.ln_1.weight", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.11.ln_1.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.11.attn.c_attn.q_weight", shape: (2304, 256), dtype: uint32
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.11.attn.c_attn.q_scale", shape: (2304, 64), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.11.attn.c_attn.bias", shape: (2304,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.11.attn.c_proj.q_weight", shape: (2048, 256), dtype: uint32
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.11.attn.c_proj.q_scale", shape: (2048, 64), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.11.attn.c_proj.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.11.ln_2.weight", shape: (2048,), dtype: float16
 48%|███████████████████████████████████████████████████████████████████████████████████████████████ | 141/293 [00:04<00:01, 118.21it/s]
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.11.ln_2.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.11.mlp.c_fc.q_weight", shape: (8192, 256), dtype: uint32
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.11.mlp.c_fc.q_scale", shape: (8192, 64), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.11.mlp.c_fc.bias", shape: (8192,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.11.mlp.c_proj.q_weight", shape: (2048, 1024), dtype: uint32
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.11.mlp.c_proj.q_scale", shape: (2048, 256), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.11.mlp.c_proj.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.12.ln_1.weight", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.12.ln_1.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.12.attn.c_attn.q_weight", shape: (2304, 256), dtype: uint32
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.12.attn.c_attn.q_scale", shape: (2304, 64), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.12.attn.c_attn.bias", shape: (2304,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.12.attn.c_proj.q_weight", shape: (2048, 256), dtype: uint32
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.12.attn.c_proj.q_scale", shape: (2048, 64), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.12.attn.c_proj.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.12.ln_2.weight", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.12.ln_2.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.12.mlp.c_fc.q_weight", shape: (8192, 256), dtype: uint32
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.12.mlp.c_fc.q_scale", shape: (8192, 64), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.12.mlp.c_fc.bias", shape: (8192,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.12.mlp.c_proj.q_weight", shape: (2048, 1024), dtype: uint32
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.12.mlp.c_proj.q_scale", shape: (2048, 256), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.12.mlp.c_proj.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.13.ln_1.weight", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.13.ln_1.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.13.attn.c_attn.q_weight", shape: (2304, 256), dtype: uint32
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.13.attn.c_attn.q_scale", shape: (2304, 64), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.13.attn.c_attn.bias", shape: (2304,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.13.attn.c_proj.q_weight", shape: (2048, 256), dtype: uint32
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.13.attn.c_proj.q_scale", shape: (2048, 64), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.13.attn.c_proj.bias", shape: (2048,), dtype: float16
 56%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████ | 164/293 [00:04<00:00, 142.32it/s]
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.13.ln_2.weight", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.13.ln_2.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.13.mlp.c_fc.q_weight", shape: (8192, 256), dtype: uint32
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.13.mlp.c_fc.q_scale", shape: (8192, 64), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.13.mlp.c_fc.bias", shape: (8192,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.13.mlp.c_proj.q_weight", shape: (2048, 1024), dtype: uint32
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.13.mlp.c_proj.q_scale", shape: (2048, 256), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.13.mlp.c_proj.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.14.ln_1.weight[0m", shape: (2048,), dtype: float16 |
|
56%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 164/293 [00:04<00:00, 142.32it/s]
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.14.ln_1.bias[0m", shape: (2048,), dtype: float16 |
|
56%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 164/293 [00:04<00:00, 142.32it/s]
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.14.attn.c_attn.q_weight[0m", shape: (2304, 256), dtype: uint32 |
|
56%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 164/293 [00:04<00:00, 142.32it/s]
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.14.attn.c_attn.q_scale[0m", shape: (2304, 64), dtype: float16 |
|
56%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 164/293 [00:04<00:00, 142.32it/s]
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.14.attn.c_attn.bias[0m", shape: (2304,), dtype: float16 |
|
56%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 164/293 [00:04<00:00, 142.32it/s]
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.14.attn.c_proj.q_weight[0m", shape: (2048, 256), dtype: uint32 |
|
56%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 164/293 [00:04<00:00, 142.32it/s]
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.14.attn.c_proj.q_scale[0m", shape: (2048, 64), dtype: float16 |
|
56%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 164/293 [00:04<00:00, 142.32it/s]
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.14.attn.c_proj.bias[0m", shape: (2048,), dtype: float16 |
|
56%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 164/293 [00:04<00:00, 142.32it/s]
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.14.ln_2.weight[0m", shape: (2048,), dtype: float16 |
|
56%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 164/293 [00:04<00:00, 142.32it/s]
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.14.ln_2.bias[0m", shape: (2048,), dtype: float16 |
|
56%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 164/293 [00:04<00:00, 142.32it/s]
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.14.mlp.c_fc.q_weight[0m", shape: (8192, 256), dtype: uint32 |
|
56%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 164/293 [00:04<00:00, 142.32it/s]
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.14.mlp.c_fc.q_scale[0m", shape: (8192, 64), dtype: float16 |
|
56%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 164/293 [00:04<00:00, 142.32it/s]
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.14.mlp.c_fc.bias[0m", shape: (8192,), dtype: float16 |
|
56%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 164/293 [00:04<00:00, 142.32it/s]
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.14.mlp.c_proj.q_weight[0m", shape: (2048, 1024), dtype: uint32 |
|
56%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 164/293 [00:04<00:00, 142.32it/s]
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.14.mlp.c_proj.q_scale[0m", shape: (2048, 256), dtype: float16 |
|
56%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 164/293 [00:04<00:00, 142.32it/s]
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.14.mlp.c_proj.bias[0m", shape: (2048,), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.15.ln_1.weight", shape: (2048,), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.15.ln_1.bias", shape: (2048,), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.15.attn.c_attn.q_weight", shape: (2304, 256), dtype: uint32 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.15.attn.c_attn.q_scale", shape: (2304, 64), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.15.attn.c_attn.bias", shape: (2304,), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.15.attn.c_proj.q_weight", shape: (2048, 256), dtype: uint32 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.15.attn.c_proj.q_scale", shape: (2048, 64), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.15.attn.c_proj.bias", shape: (2048,), dtype: float16 |
|
 64%| 188/293 [00:04<00:00, 165.33it/s]
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.15.ln_2.weight", shape: (2048,), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.15.ln_2.bias", shape: (2048,), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.15.mlp.c_fc.q_weight", shape: (8192, 256), dtype: uint32 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.15.mlp.c_fc.q_scale", shape: (8192, 64), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.15.mlp.c_fc.bias", shape: (8192,), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.15.mlp.c_proj.q_weight", shape: (2048, 1024), dtype: uint32 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.15.mlp.c_proj.q_scale", shape: (2048, 256), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.15.mlp.c_proj.bias", shape: (2048,), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.16.ln_1.weight", shape: (2048,), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.16.ln_1.bias", shape: (2048,), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.16.attn.c_attn.q_weight", shape: (2304, 256), dtype: uint32 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.16.attn.c_attn.q_scale", shape: (2304, 64), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.16.attn.c_attn.bias", shape: (2304,), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.16.attn.c_proj.q_weight", shape: (2048, 256), dtype: uint32 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.16.attn.c_proj.q_scale", shape: (2048, 64), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.16.attn.c_proj.bias", shape: (2048,), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.16.ln_2.weight", shape: (2048,), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.16.ln_2.bias", shape: (2048,), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.16.mlp.c_fc.q_weight", shape: (8192, 256), dtype: uint32 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.16.mlp.c_fc.q_scale", shape: (8192, 64), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.16.mlp.c_fc.bias", shape: (8192,), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.16.mlp.c_proj.q_weight", shape: (2048, 1024), dtype: uint32 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.16.mlp.c_proj.q_scale", shape: (2048, 256), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.16.mlp.c_proj.bias", shape: (2048,), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.17.ln_1.weight", shape: (2048,), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.17.ln_1.bias", shape: (2048,), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.17.attn.c_attn.q_weight", shape: (2304, 256), dtype: uint32 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.17.attn.c_attn.q_scale", shape: (2304, 64), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.17.attn.c_attn.bias", shape: (2304,), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.17.attn.c_proj.q_weight", shape: (2048, 256), dtype: uint32 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.17.attn.c_proj.q_scale", shape: (2048, 64), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.17.attn.c_proj.bias", shape: (2048,), dtype: float16 |
|
 72%| 212/293 [00:04<00:00, 183.95it/s]
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.17.ln_2.weight", shape: (2048,), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.17.ln_2.bias", shape: (2048,), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.17.mlp.c_fc.q_weight", shape: (8192, 256), dtype: uint32 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.17.mlp.c_fc.q_scale", shape: (8192, 64), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.17.mlp.c_fc.bias", shape: (8192,), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.17.mlp.c_proj.q_weight", shape: (2048, 1024), dtype: uint32 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.17.mlp.c_proj.q_scale", shape: (2048, 256), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.17.mlp.c_proj.bias", shape: (2048,), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.18.ln_1.weight", shape: (2048,), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.18.ln_1.bias", shape: (2048,), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.18.attn.c_attn.q_weight", shape: (2304, 256), dtype: uint32 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.18.attn.c_attn.q_scale", shape: (2304, 64), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.18.attn.c_attn.bias", shape: (2304,), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.18.attn.c_proj.q_weight", shape: (2048, 256), dtype: uint32 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.18.attn.c_proj.q_scale", shape: (2048, 64), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.18.attn.c_proj.bias", shape: (2048,), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.18.ln_2.weight", shape: (2048,), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.18.ln_2.bias", shape: (2048,), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.18.mlp.c_fc.q_weight", shape: (8192, 256), dtype: uint32 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.18.mlp.c_fc.q_scale", shape: (8192, 64), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.18.mlp.c_fc.bias", shape: (8192,), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.18.mlp.c_proj.q_weight", shape: (2048, 1024), dtype: uint32 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.18.mlp.c_proj.q_scale", shape: (2048, 256), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.18.mlp.c_proj.bias", shape: (2048,), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.19.ln_1.weight", shape: (2048,), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.19.ln_1.bias", shape: (2048,), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.19.attn.c_attn.q_weight", shape: (2304, 256), dtype: uint32 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.19.attn.c_attn.q_scale", shape: (2304, 64), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.19.attn.c_attn.bias", shape: (2304,), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.19.attn.c_proj.q_weight", shape: (2048, 256), dtype: uint32 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.19.attn.c_proj.q_scale", shape: (2048, 64), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.19.attn.c_proj.bias", shape: (2048,), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.19.ln_2.weight", shape: (2048,), dtype: float16 |
|
 81%| 237/293 [00:04<00:00, 201.38it/s]
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.19.ln_2.bias", shape: (2048,), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.19.mlp.c_fc.q_weight", shape: (8192, 256), dtype: uint32 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.19.mlp.c_fc.q_scale", shape: (8192, 64), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.19.mlp.c_fc.bias", shape: (8192,), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.19.mlp.c_proj.q_weight", shape: (2048, 1024), dtype: uint32 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.19.mlp.c_proj.q_scale", shape: (2048, 256), dtype: float16 |
|
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.19.mlp.c_proj.bias", shape: (2048,), dtype: float16 |
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.20.ln_1.weight", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.20.ln_1.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.20.attn.c_attn.q_weight", shape: (2304, 256), dtype: uint32
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.20.attn.c_attn.q_scale", shape: (2304, 64), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.20.attn.c_attn.bias", shape: (2304,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.20.attn.c_proj.q_weight", shape: (2048, 256), dtype: uint32
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.20.attn.c_proj.q_scale", shape: (2048, 64), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.20.attn.c_proj.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.20.ln_2.weight", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.20.ln_2.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.20.mlp.c_fc.q_weight", shape: (8192, 256), dtype: uint32
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.20.mlp.c_fc.q_scale", shape: (8192, 64), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.20.mlp.c_fc.bias", shape: (8192,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.20.mlp.c_proj.q_weight", shape: (2048, 1024), dtype: uint32
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.20.mlp.c_proj.q_scale", shape: (2048, 256), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.20.mlp.c_proj.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.21.ln_1.weight", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.21.ln_1.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.21.attn.c_attn.q_weight", shape: (2304, 256), dtype: uint32
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.21.attn.c_attn.q_scale", shape: (2304, 64), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.21.attn.c_attn.bias", shape: (2304,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.21.attn.c_proj.q_weight", shape: (2048, 256), dtype: uint32
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.21.attn.c_proj.q_scale", shape: (2048, 64), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.21.attn.c_proj.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.21.ln_2.weight", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.21.ln_2.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.21.mlp.c_fc.q_weight", shape: (8192, 256), dtype: uint32
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.21.mlp.c_fc.q_scale", shape: (8192, 64), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.21.mlp.c_fc.bias", shape: (8192,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.21.mlp.c_proj.q_weight", shape: (2048, 1024), dtype: uint32
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.21.mlp.c_proj.q_scale", shape: (2048, 256), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.21.mlp.c_proj.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.22.ln_1.weight", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.22.ln_1.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.22.attn.c_attn.q_weight", shape: (2304, 256), dtype: uint32
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.22.attn.c_attn.q_scale", shape: (2304, 64), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.22.attn.c_attn.bias", shape: (2304,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.22.attn.c_proj.q_weight", shape: (2048, 256), dtype: uint32
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.22.attn.c_proj.q_scale", shape: (2048, 64), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.22.attn.c_proj.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.22.ln_2.weight", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.22.ln_2.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.22.mlp.c_fc.q_weight", shape: (8192, 256), dtype: uint32
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.22.mlp.c_fc.q_scale", shape: (8192, 64), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.22.mlp.c_fc.bias", shape: (8192,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.22.mlp.c_proj.q_weight", shape: (2048, 1024), dtype: uint32
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.22.mlp.c_proj.q_scale", shape: (2048, 256), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.22.mlp.c_proj.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.23.ln_1.weight", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.23.ln_1.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.23.attn.c_attn.q_weight", shape: (2304, 256), dtype: uint32
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.23.attn.c_attn.q_scale", shape: (2304, 64), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.23.attn.c_attn.bias", shape: (2304,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.23.attn.c_proj.q_weight", shape: (2048, 256), dtype: uint32
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.23.attn.c_proj.q_scale", shape: (2048, 64), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.23.attn.c_proj.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.23.ln_2.weight", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.23.ln_2.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.23.mlp.c_fc.q_weight", shape: (8192, 256), dtype: uint32
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.23.mlp.c_fc.q_scale", shape: (8192, 64), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.23.mlp.c_fc.bias", shape: (8192,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.23.mlp.c_proj.q_weight", shape: (2048, 1024), dtype: uint32
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.23.mlp.c_proj.q_scale", shape: (2048, 256), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.23.mlp.c_proj.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.ln_f.weight", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.ln_f.bias", shape: (2048,), dtype: float16
[2024-01-08 19:18:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "lm_head.q_weight", shape: (49280, 256), dtype: uint32
[2024-01-08 19:18:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "lm_head.q_scale", shape: (49280, 64), dtype: float16
100%|██████████████████████████████████████████████████| 293/293 [00:05<00:00, 58.12it/s]
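The packed shapes logged above are consistent with a group-quantization layout in which eight 4-bit values share one uint32 and each group of 32 input weights gets one float16 scale. The sketch below checks that reading. The packing factor, the group size of 32, the dense (out_features, in_features) shapes, and the helper name `packed_shapes` are assumptions inferred from the log, not taken from the mlc_chat source.

```python
# Hypothetical check of the q_weight / q_scale shapes printed in the log.
# Assumptions: 8 four-bit values packed per uint32 along the input dimension,
# and one float16 scale per group of 32 input weights.
ELEMS_PER_UINT32 = 32 // 4  # 8 four-bit values fit in one 32-bit word
GROUP_SIZE = 32             # assumed quantization group size


def packed_shapes(out_features: int, in_features: int) -> tuple[tuple[int, int], tuple[int, int]]:
    """Return the (q_weight, q_scale) shapes for a dense (out, in) weight."""
    assert in_features % GROUP_SIZE == 0, "input dim must be a multiple of the group size"
    q_weight = (out_features, in_features // ELEMS_PER_UINT32)
    q_scale = (out_features, in_features // GROUP_SIZE)
    return q_weight, q_scale


# Assumed dense (out_features, in_features) shapes, consistent with the
# 2048-wide hidden size and 8192-wide MLP visible in the log.
layers = {
    "attn.c_attn": (2304, 2048),
    "attn.c_proj": (2048, 2048),
    "mlp.c_fc": (8192, 2048),
    "mlp.c_proj": (2048, 8192),
    "lm_head": (49280, 2048),
}

for name, (out_f, in_f) in layers.items():
    q_weight, q_scale = packed_shapes(out_f, in_f)
    print(f"{name}: q_weight={q_weight}, q_scale={q_scale}")
```

Each printed pair matches the corresponding entries in the log, for example (8192, 256) and (8192, 64) for `mlp.c_fc`, and (2048, 1024) and (2048, 256) for `mlp.c_proj`.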
[2024-01-08 19:18:51] INFO huggingface_loader.py:179: Unloading HF weight file: /opt/scratch/assets/gpt_bigcode-santacoder/pytorch_model.bin
[2024-01-08 19:18:51] INFO stats.py:71: Time usage: HF loading: 2.582 sec; Pre-quantization mapping: 0.797 sec; Quantization: 3.251 sec
[2024-01-08 19:18:51] INFO stats.py:85: RAM usage: Peak RAM: 2.283 GB. Total bytes loaded from disk: 2.283 GB
[2024-01-08 19:18:51] INFO convert_weight.py:119: Parameter size after quantization: 0.643 GB
[2024-01-08 19:18:51] INFO convert_weight.py:124: Total parameters: 1,225,811,968
[2024-01-08 19:18:51] INFO convert_weight.py:125: Bits per parameter: 4.505
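The summary figures can be reproduced from the tensor shapes in this log, assuming 4.5 bits per quantized weight (4 bits plus a float16 scale shared by 32 weights), 16 bits for every unquantized bias and LayerNorm tensor, and "GB" meaning 2^30 bytes. The model dimensions below are read off the logged shapes; this is a back-of-the-envelope sketch, not MLC's own accounting code.

```python
# Back-of-the-envelope reproduction of the convert_weight summary.
# Assumptions: dimensions inferred from the logged shapes (24 layers,
# hidden 2048, MLP 8192, fused QKV 2304, vocab 49280, 2048 positions);
# quantized tensors cost 4 bits per weight plus one float16 scale per 32 weights.
HIDDEN, MLP, QKV, VOCAB, POSITIONS, LAYERS = 2048, 8192, 2304, 49280, 2048, 24
BITS_QUANT = 4 + 16 / 32  # 4.5 bits per quantized weight
BITS_FP16 = 16

# Per-layer weight matrices stored quantized: c_attn, attn.c_proj, c_fc, mlp.c_proj
quant_per_layer = QKV * HIDDEN + HIDDEN * HIDDEN + MLP * HIDDEN + HIDDEN * MLP
# Per-layer float16 tensors: ln_1 and ln_2 (weight + bias) plus four biases
fp16_per_layer = 2 * 2 * HIDDEN + QKV + HIDDEN + MLP + HIDDEN

# wte, wpe and lm_head are quantized; ln_f (weight + bias) stays float16
quant = LAYERS * quant_per_layer + VOCAB * HIDDEN + POSITIONS * HIDDEN + VOCAB * HIDDEN
fp16 = LAYERS * fp16_per_layer + 2 * HIDDEN

total_params = quant + fp16
total_bits = quant * BITS_QUANT + fp16 * BITS_FP16

print(f"Total parameters: {total_params:,}")
print(f"Bits per parameter: {total_bits / total_params:.3f}")
print(f"Parameter size: {total_bits / 8 / 2**30:.3f} GiB")
```

Under these assumptions the script prints 1,225,811,968 parameters, 4.505 bits per parameter and 0.643 GiB, matching the log. The 0.005 bits above 4.5 come from the roughly 0.55 million float16 parameters that are left unquantized.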
Start storing to cache /home/junrushao/tmp/tmp50xzwpqb
|
[0001/0392] saving transformer.wte.q_weight
[0002/0392] saving transformer.wte.q_scale
[0003/0392] saving transformer.wpe.q_weight
[0004/0392] saving transformer.wpe.q_scale
[0005/0392] saving transformer.h.0.ln_1.weight
[0006/0392] saving transformer.h.0.ln_1.bias
[0007/0392] saving transformer.h.0.attn.c_attn.q_weight
[0008/0392] saving transformer.h.0.attn.c_attn.q_scale
[0009/0392] saving transformer.h.0.attn.c_attn.bias
[0010/0392] saving transformer.h.0.attn.c_proj.q_weight
[0011/0392] saving transformer.h.0.attn.c_proj.q_scale
[0012/0392] saving transformer.h.0.attn.c_proj.bias
[0013/0392] saving transformer.h.0.ln_2.weight
[0014/0392] saving transformer.h.0.ln_2.bias
[0015/0392] saving transformer.h.0.mlp.c_fc.q_weight
[0016/0392] saving transformer.h.0.mlp.c_fc.q_scale
[0017/0392] saving transformer.h.0.mlp.c_fc.bias
[0018/0392] saving transformer.h.0.mlp.c_proj.q_weight
[0019/0392] saving transformer.h.0.mlp.c_proj.q_scale
[0020/0392] saving transformer.h.0.mlp.c_proj.bias
[0021/0392] saving transformer.h.1.ln_1.weight
[0022/0392] saving transformer.h.1.ln_1.bias
[0023/0392] saving transformer.h.1.attn.c_attn.q_weight
[0024/0392] saving transformer.h.1.attn.c_attn.q_scale
[0025/0392] saving transformer.h.1.attn.c_attn.bias
[0026/0392] saving transformer.h.1.attn.c_proj.q_weight
[0027/0392] saving transformer.h.1.attn.c_proj.q_scale
[0028/0392] saving transformer.h.1.attn.c_proj.bias
[0029/0392] saving transformer.h.1.ln_2.weight
[0030/0392] saving transformer.h.1.ln_2.bias
[0031/0392] saving transformer.h.1.mlp.c_fc.q_weight
[0032/0392] saving transformer.h.1.mlp.c_fc.q_scale
[0033/0392] saving transformer.h.1.mlp.c_fc.bias
[0034/0392] saving transformer.h.1.mlp.c_proj.q_weight
[0035/0392] saving transformer.h.1.mlp.c_proj.q_scale
[0036/0392] saving transformer.h.1.mlp.c_proj.bias
[0037/0392] saving transformer.h.2.ln_1.weight
[0038/0392] saving transformer.h.2.ln_1.bias
[0039/0392] saving transformer.h.2.attn.c_attn.q_weight
[0040/0392] saving transformer.h.2.attn.c_attn.q_scale
[0041/0392] saving transformer.h.2.attn.c_attn.bias
[0042/0392] saving transformer.h.2.attn.c_proj.q_weight
[0043/0392] saving transformer.h.2.attn.c_proj.q_scale
[0044/0392] saving transformer.h.2.attn.c_proj.bias
[0045/0392] saving transformer.h.2.ln_2.weight
[0046/0392] saving transformer.h.2.ln_2.bias
[0047/0392] saving transformer.h.2.mlp.c_fc.q_weight
[0048/0392] saving transformer.h.2.mlp.c_fc.q_scale
[0049/0392] saving transformer.h.2.mlp.c_fc.bias
[0050/0392] saving transformer.h.2.mlp.c_proj.q_weight
[0051/0392] saving transformer.h.2.mlp.c_proj.q_scale
[0052/0392] saving transformer.h.2.mlp.c_proj.bias
[0053/0392] saving transformer.h.3.ln_1.weight
[0054/0392] saving transformer.h.3.ln_1.bias
[0055/0392] saving transformer.h.3.attn.c_attn.q_weight
[0056/0392] saving transformer.h.3.attn.c_attn.q_scale
[0057/0392] saving transformer.h.3.attn.c_attn.bias
[0058/0392] saving transformer.h.3.attn.c_proj.q_weight
[0059/0392] saving transformer.h.3.attn.c_proj.q_scale
[0060/0392] saving transformer.h.3.attn.c_proj.bias
[0061/0392] saving transformer.h.3.ln_2.weight
[0062/0392] saving transformer.h.3.ln_2.bias
[0063/0392] saving transformer.h.3.mlp.c_fc.q_weight
[0064/0392] saving transformer.h.3.mlp.c_fc.q_scale
[0065/0392] saving transformer.h.3.mlp.c_fc.bias
[0066/0392] saving transformer.h.3.mlp.c_proj.q_weight
[0067/0392] saving transformer.h.3.mlp.c_proj.q_scale
[0068/0392] saving transformer.h.3.mlp.c_proj.bias
[0069/0392] saving transformer.h.4.ln_1.weight
[0070/0392] saving transformer.h.4.ln_1.bias
[0071/0392] saving transformer.h.4.attn.c_attn.q_weight
[0072/0392] saving transformer.h.4.attn.c_attn.q_scale
[0073/0392] saving transformer.h.4.attn.c_attn.bias
[0074/0392] saving transformer.h.4.attn.c_proj.q_weight
[0075/0392] saving transformer.h.4.attn.c_proj.q_scale
[0076/0392] saving transformer.h.4.attn.c_proj.bias
[0077/0392] saving transformer.h.4.ln_2.weight
[0078/0392] saving transformer.h.4.ln_2.bias
[0079/0392] saving transformer.h.4.mlp.c_fc.q_weight
[0080/0392] saving transformer.h.4.mlp.c_fc.q_scale
[0081/0392] saving transformer.h.4.mlp.c_fc.bias
[0082/0392] saving transformer.h.4.mlp.c_proj.q_weight
[0083/0392] saving transformer.h.4.mlp.c_proj.q_scale
[0084/0392] saving transformer.h.4.mlp.c_proj.bias
[0085/0392] saving transformer.h.5.ln_1.weight
[0086/0392] saving transformer.h.5.ln_1.bias
[0087/0392] saving transformer.h.5.attn.c_attn.q_weight
[0088/0392] saving transformer.h.5.attn.c_attn.q_scale
[0089/0392] saving transformer.h.5.attn.c_attn.bias
[0090/0392] saving transformer.h.5.attn.c_proj.q_weight
[0091/0392] saving transformer.h.5.attn.c_proj.q_scale
[0092/0392] saving transformer.h.5.attn.c_proj.bias
[0093/0392] saving transformer.h.5.ln_2.weight
[0094/0392] saving transformer.h.5.ln_2.bias
[0095/0392] saving transformer.h.5.mlp.c_fc.q_weight
[0096/0392] saving transformer.h.5.mlp.c_fc.q_scale
[0097/0392] saving transformer.h.5.mlp.c_fc.bias
[0098/0392] saving transformer.h.5.mlp.c_proj.q_weight
[0099/0392] saving transformer.h.5.mlp.c_proj.q_scale
[0100/0392] saving transformer.h.5.mlp.c_proj.bias
[0101/0392] saving transformer.h.6.ln_1.weight
[0102/0392] saving transformer.h.6.ln_1.bias
[0103/0392] saving transformer.h.6.attn.c_attn.q_weight
[0104/0392] saving transformer.h.6.attn.c_attn.q_scale
[0105/0392] saving transformer.h.6.attn.c_attn.bias
[0106/0392] saving transformer.h.6.attn.c_proj.q_weight
[0107/0392] saving transformer.h.6.attn.c_proj.q_scale
[0108/0392] saving transformer.h.6.attn.c_proj.bias
[0109/0392] saving transformer.h.6.ln_2.weight
[0110/0392] saving transformer.h.6.ln_2.bias
[0111/0392] saving transformer.h.6.mlp.c_fc.q_weight
[0112/0392] saving transformer.h.6.mlp.c_fc.q_scale
[0113/0392] saving transformer.h.6.mlp.c_fc.bias
[0114/0392] saving transformer.h.6.mlp.c_proj.q_weight
[0115/0392] saving transformer.h.6.mlp.c_proj.q_scale
[0116/0392] saving transformer.h.6.mlp.c_proj.bias
[0117/0392] saving transformer.h.7.ln_1.weight
[0118/0392] saving transformer.h.7.ln_1.bias
[0119/0392] saving transformer.h.7.attn.c_attn.q_weight
[0120/0392] saving transformer.h.7.attn.c_attn.q_scale
[0121/0392] saving transformer.h.7.attn.c_attn.bias
[0122/0392] saving transformer.h.7.attn.c_proj.q_weight
[0123/0392] saving transformer.h.7.attn.c_proj.q_scale
[0124/0392] saving transformer.h.7.attn.c_proj.bias
[0125/0392] saving transformer.h.7.ln_2.weight
[0126/0392] saving transformer.h.7.ln_2.bias
[0127/0392] saving transformer.h.7.mlp.c_fc.q_weight
[0128/0392] saving transformer.h.7.mlp.c_fc.q_scale
[0129/0392] saving transformer.h.7.mlp.c_fc.bias
[0130/0392] saving transformer.h.7.mlp.c_proj.q_weight
[0131/0392] saving transformer.h.7.mlp.c_proj.q_scale
[0132/0392] saving transformer.h.7.mlp.c_proj.bias
[0133/0392] saving transformer.h.8.ln_1.weight
[0134/0392] saving transformer.h.8.ln_1.bias
[0135/0392] saving transformer.h.8.attn.c_attn.q_weight
[0136/0392] saving transformer.h.8.attn.c_attn.q_scale
[0137/0392] saving transformer.h.8.attn.c_attn.bias
[0138/0392] saving transformer.h.8.attn.c_proj.q_weight
[0139/0392] saving transformer.h.8.attn.c_proj.q_scale
[0140/0392] saving transformer.h.8.attn.c_proj.bias
[0141/0392] saving transformer.h.8.ln_2.weight
[0142/0392] saving transformer.h.8.ln_2.bias
[0143/0392] saving transformer.h.8.mlp.c_fc.q_weight
[0144/0392] saving transformer.h.8.mlp.c_fc.q_scale
[0145/0392] saving transformer.h.8.mlp.c_fc.bias
[0146/0392] saving transformer.h.8.mlp.c_proj.q_weight
[0147/0392] saving transformer.h.8.mlp.c_proj.q_scale
[0148/0392] saving transformer.h.8.mlp.c_proj.bias
[0149/0392] saving transformer.h.9.ln_1.weight
[0150/0392] saving transformer.h.9.ln_1.bias
[0151/0392] saving transformer.h.9.attn.c_attn.q_weight
[0152/0392] saving transformer.h.9.attn.c_attn.q_scale
[0153/0392] saving transformer.h.9.attn.c_attn.bias
[0154/0392] saving transformer.h.9.attn.c_proj.q_weight
[0155/0392] saving transformer.h.9.attn.c_proj.q_scale
[0156/0392] saving transformer.h.9.attn.c_proj.bias
[0157/0392] saving transformer.h.9.ln_2.weight
[0158/0392] saving transformer.h.9.ln_2.bias
[0159/0392] saving transformer.h.9.mlp.c_fc.q_weight
[0160/0392] saving transformer.h.9.mlp.c_fc.q_scale
[0161/0392] saving transformer.h.9.mlp.c_fc.bias
[0162/0392] saving transformer.h.9.mlp.c_proj.q_weight
[0163/0392] saving transformer.h.9.mlp.c_proj.q_scale
[0164/0392] saving transformer.h.9.mlp.c_proj.bias
[0165/0392] saving transformer.h.10.ln_1.weight
[0166/0392] saving transformer.h.10.ln_1.bias
[0167/0392] saving transformer.h.10.attn.c_attn.q_weight
[0168/0392] saving transformer.h.10.attn.c_attn.q_scale
[0169/0392] saving transformer.h.10.attn.c_attn.bias
[0170/0392] saving transformer.h.10.attn.c_proj.q_weight
[0171/0392] saving transformer.h.10.attn.c_proj.q_scale
[0172/0392] saving transformer.h.10.attn.c_proj.bias
[0173/0392] saving transformer.h.10.ln_2.weight
[0174/0392] saving transformer.h.10.ln_2.bias
[0175/0392] saving transformer.h.10.mlp.c_fc.q_weight
[0176/0392] saving transformer.h.10.mlp.c_fc.q_scale
[0177/0392] saving transformer.h.10.mlp.c_fc.bias
[0178/0392] saving transformer.h.10.mlp.c_proj.q_weight
[0179/0392] saving transformer.h.10.mlp.c_proj.q_scale
[0180/0392] saving transformer.h.10.mlp.c_proj.bias
[0181/0392] saving transformer.h.11.ln_1.weight
[0182/0392] saving transformer.h.11.ln_1.bias
[0183/0392] saving transformer.h.11.attn.c_attn.q_weight
[0184/0392] saving transformer.h.11.attn.c_attn.q_scale
[0185/0392] saving transformer.h.11.attn.c_attn.bias
[0186/0392] saving transformer.h.11.attn.c_proj.q_weight
[0187/0392] saving transformer.h.11.attn.c_proj.q_scale
[0188/0392] saving transformer.h.11.attn.c_proj.bias
[0189/0392] saving transformer.h.11.ln_2.weight
[0190/0392] saving transformer.h.11.ln_2.bias
[0191/0392] saving transformer.h.11.mlp.c_fc.q_weight
[0192/0392] saving transformer.h.11.mlp.c_fc.q_scale
[0193/0392] saving transformer.h.11.mlp.c_fc.bias
[0194/0392] saving transformer.h.11.mlp.c_proj.q_weight
[0195/0392] saving transformer.h.11.mlp.c_proj.q_scale
[0196/0392] saving transformer.h.11.mlp.c_proj.bias
[0197/0392] saving transformer.h.12.ln_1.weight
[0198/0392] saving transformer.h.12.ln_1.bias
[0199/0392] saving transformer.h.12.attn.c_attn.q_weight
[0200/0392] saving transformer.h.12.attn.c_attn.q_scale
[0201/0392] saving transformer.h.12.attn.c_attn.bias
[0202/0392] saving transformer.h.12.attn.c_proj.q_weight
[0203/0392] saving transformer.h.12.attn.c_proj.q_scale
[0204/0392] saving transformer.h.12.attn.c_proj.bias
[0205/0392] saving transformer.h.12.ln_2.weight
[0206/0392] saving transformer.h.12.ln_2.bias
[0207/0392] saving transformer.h.12.mlp.c_fc.q_weight
[0208/0392] saving transformer.h.12.mlp.c_fc.q_scale
[0209/0392] saving transformer.h.12.mlp.c_fc.bias
[0210/0392] saving transformer.h.12.mlp.c_proj.q_weight
[0211/0392] saving transformer.h.12.mlp.c_proj.q_scale
[0212/0392] saving transformer.h.12.mlp.c_proj.bias
[0213/0392] saving transformer.h.13.ln_1.weight
[0214/0392] saving transformer.h.13.ln_1.bias
[0215/0392] saving transformer.h.13.attn.c_attn.q_weight
[0216/0392] saving transformer.h.13.attn.c_attn.q_scale
[0217/0392] saving transformer.h.13.attn.c_attn.bias
[0218/0392] saving transformer.h.13.attn.c_proj.q_weight
[0219/0392] saving transformer.h.13.attn.c_proj.q_scale
[0220/0392] saving transformer.h.13.attn.c_proj.bias
[0221/0392] saving transformer.h.13.ln_2.weight
[0222/0392] saving transformer.h.13.ln_2.bias
[0223/0392] saving transformer.h.13.mlp.c_fc.q_weight
[0224/0392] saving transformer.h.13.mlp.c_fc.q_scale
[0225/0392] saving transformer.h.13.mlp.c_fc.bias
[0226/0392] saving transformer.h.13.mlp.c_proj.q_weight
[0227/0392] saving transformer.h.13.mlp.c_proj.q_scale
[0228/0392] saving transformer.h.13.mlp.c_proj.bias
[0229/0392] saving transformer.h.14.ln_1.weight
[0230/0392] saving transformer.h.14.ln_1.bias
[0231/0392] saving transformer.h.14.attn.c_attn.q_weight
[0232/0392] saving transformer.h.14.attn.c_attn.q_scale
[0233/0392] saving transformer.h.14.attn.c_attn.bias
[0234/0392] saving transformer.h.14.attn.c_proj.q_weight
[0235/0392] saving transformer.h.14.attn.c_proj.q_scale
[0236/0392] saving transformer.h.14.attn.c_proj.bias
[0237/0392] saving transformer.h.14.ln_2.weight
[0238/0392] saving transformer.h.14.ln_2.bias
[0239/0392] saving transformer.h.14.mlp.c_fc.q_weight
[0240/0392] saving transformer.h.14.mlp.c_fc.q_scale
[0241/0392] saving transformer.h.14.mlp.c_fc.bias
[0242/0392] saving transformer.h.14.mlp.c_proj.q_weight
[0243/0392] saving transformer.h.14.mlp.c_proj.q_scale
[0244/0392] saving transformer.h.14.mlp.c_proj.bias
[0245/0392] saving transformer.h.15.ln_1.weight
[0246/0392] saving transformer.h.15.ln_1.bias
[0247/0392] saving transformer.h.15.attn.c_attn.q_weight
[0248/0392] saving transformer.h.15.attn.c_attn.q_scale
[0249/0392] saving transformer.h.15.attn.c_attn.bias
[0250/0392] saving transformer.h.15.attn.c_proj.q_weight
[0251/0392] saving transformer.h.15.attn.c_proj.q_scale
[0252/0392] saving transformer.h.15.attn.c_proj.bias
[0253/0392] saving transformer.h.15.ln_2.weight
[0254/0392] saving transformer.h.15.ln_2.bias
[0255/0392] saving transformer.h.15.mlp.c_fc.q_weight
[0256/0392] saving transformer.h.15.mlp.c_fc.q_scale
[0257/0392] saving transformer.h.15.mlp.c_fc.bias
[0258/0392] saving transformer.h.15.mlp.c_proj.q_weight
[0259/0392] saving transformer.h.15.mlp.c_proj.q_scale
[0260/0392] saving transformer.h.15.mlp.c_proj.bias
[0261/0392] saving transformer.h.16.ln_1.weight
[0262/0392] saving transformer.h.16.ln_1.bias
[0263/0392] saving transformer.h.16.attn.c_attn.q_weight
[0264/0392] saving transformer.h.16.attn.c_attn.q_scale
[0265/0392] saving transformer.h.16.attn.c_attn.bias
[0266/0392] saving transformer.h.16.attn.c_proj.q_weight
[0267/0392] saving transformer.h.16.attn.c_proj.q_scale
[0268/0392] saving transformer.h.16.attn.c_proj.bias
[0269/0392] saving transformer.h.16.ln_2.weight
[0270/0392] saving transformer.h.16.ln_2.bias
[0271/0392] saving transformer.h.16.mlp.c_fc.q_weight
[0272/0392] saving transformer.h.16.mlp.c_fc.q_scale
[0273/0392] saving transformer.h.16.mlp.c_fc.bias
[0274/0392] saving transformer.h.16.mlp.c_proj.q_weight
[0275/0392] saving transformer.h.16.mlp.c_proj.q_scale
[0276/0392] saving transformer.h.16.mlp.c_proj.bias
[0277/0392] saving transformer.h.17.ln_1.weight
[0278/0392] saving transformer.h.17.ln_1.bias
[0279/0392] saving transformer.h.17.attn.c_attn.q_weight
[0280/0392] saving transformer.h.17.attn.c_attn.q_scale
[0281/0392] saving transformer.h.17.attn.c_attn.bias
[0282/0392] saving transformer.h.17.attn.c_proj.q_weight
[0283/0392] saving transformer.h.17.attn.c_proj.q_scale
[0284/0392] saving transformer.h.17.attn.c_proj.bias
[0285/0392] saving transformer.h.17.ln_2.weight
[0286/0392] saving transformer.h.17.ln_2.bias
[0287/0392] saving transformer.h.17.mlp.c_fc.q_weight
[0288/0392] saving transformer.h.17.mlp.c_fc.q_scale
[0289/0392] saving transformer.h.17.mlp.c_fc.bias
[0290/0392] saving transformer.h.17.mlp.c_proj.q_weight
[0291/0392] saving transformer.h.17.mlp.c_proj.q_scale
[0292/0392] saving transformer.h.17.mlp.c_proj.bias
[0293/0392] saving transformer.h.18.ln_1.weight
[0294/0392] saving transformer.h.18.ln_1.bias
[0295/0392] saving transformer.h.18.attn.c_attn.q_weight
[0296/0392] saving transformer.h.18.attn.c_attn.q_scale
[0297/0392] saving transformer.h.18.attn.c_attn.bias
[0298/0392] saving transformer.h.18.attn.c_proj.q_weight
[0299/0392] saving transformer.h.18.attn.c_proj.q_scale
[0300/0392] saving transformer.h.18.attn.c_proj.bias
[0301/0392] saving transformer.h.18.ln_2.weight
[0302/0392] saving transformer.h.18.ln_2.bias
[0303/0392] saving transformer.h.18.mlp.c_fc.q_weight
[0304/0392] saving transformer.h.18.mlp.c_fc.q_scale
[0305/0392] saving transformer.h.18.mlp.c_fc.bias
[0306/0392] saving transformer.h.18.mlp.c_proj.q_weight
[0307/0392] saving transformer.h.18.mlp.c_proj.q_scale
[0308/0392] saving transformer.h.18.mlp.c_proj.bias
[0309/0392] saving transformer.h.19.ln_1.weight
[0310/0392] saving transformer.h.19.ln_1.bias
[0311/0392] saving transformer.h.19.attn.c_attn.q_weight
[0312/0392] saving transformer.h.19.attn.c_attn.q_scale
[0313/0392] saving transformer.h.19.attn.c_attn.bias
[0314/0392] saving transformer.h.19.attn.c_proj.q_weight
[0315/0392] saving transformer.h.19.attn.c_proj.q_scale
[0316/0392] saving transformer.h.19.attn.c_proj.bias
[0317/0392] saving transformer.h.19.ln_2.weight
[0318/0392] saving transformer.h.19.ln_2.bias
[0319/0392] saving transformer.h.19.mlp.c_fc.q_weight
[0320/0392] saving transformer.h.19.mlp.c_fc.q_scale
[0321/0392] saving transformer.h.19.mlp.c_fc.bias
[0322/0392] saving transformer.h.19.mlp.c_proj.q_weight
[0323/0392] saving transformer.h.19.mlp.c_proj.q_scale
[0324/0392] saving transformer.h.19.mlp.c_proj.bias
[0325/0392] saving transformer.h.20.ln_1.weight
[0326/0392] saving transformer.h.20.ln_1.bias
[0327/0392] saving transformer.h.20.attn.c_attn.q_weight
[0328/0392] saving transformer.h.20.attn.c_attn.q_scale
[0329/0392] saving transformer.h.20.attn.c_attn.bias
[0330/0392] saving transformer.h.20.attn.c_proj.q_weight
[0331/0392] saving transformer.h.20.attn.c_proj.q_scale
[0332/0392] saving transformer.h.20.attn.c_proj.bias
[0333/0392] saving transformer.h.20.ln_2.weight
[0334/0392] saving transformer.h.20.ln_2.bias
[0335/0392] saving transformer.h.20.mlp.c_fc.q_weight
[0336/0392] saving transformer.h.20.mlp.c_fc.q_scale
[0337/0392] saving transformer.h.20.mlp.c_fc.bias
[0338/0392] saving transformer.h.20.mlp.c_proj.q_weight
[0339/0392] saving transformer.h.20.mlp.c_proj.q_scale
[0340/0392] saving transformer.h.20.mlp.c_proj.bias
[0341/0392] saving transformer.h.21.ln_1.weight
[0342/0392] saving transformer.h.21.ln_1.bias
[0343/0392] saving transformer.h.21.attn.c_attn.q_weight
[0344/0392] saving transformer.h.21.attn.c_attn.q_scale
[0345/0392] saving transformer.h.21.attn.c_attn.bias
[0346/0392] saving transformer.h.21.attn.c_proj.q_weight
[0347/0392] saving transformer.h.21.attn.c_proj.q_scale
[0348/0392] saving transformer.h.21.attn.c_proj.bias
[0349/0392] saving transformer.h.21.ln_2.weight
[0350/0392] saving transformer.h.21.ln_2.bias
[0351/0392] saving transformer.h.21.mlp.c_fc.q_weight
[0352/0392] saving transformer.h.21.mlp.c_fc.q_scale
[0353/0392] saving transformer.h.21.mlp.c_fc.bias
[0354/0392] saving transformer.h.21.mlp.c_proj.q_weight
[0355/0392] saving transformer.h.21.mlp.c_proj.q_scale
[0356/0392] saving transformer.h.21.mlp.c_proj.bias
[0357/0392] saving transformer.h.22.ln_1.weight
[0358/0392] saving transformer.h.22.ln_1.bias
[0359/0392] saving transformer.h.22.attn.c_attn.q_weight
[0360/0392] saving transformer.h.22.attn.c_attn.q_scale
[0361/0392] saving transformer.h.22.attn.c_attn.bias
[0362/0392] saving transformer.h.22.attn.c_proj.q_weight
[0363/0392] saving transformer.h.22.attn.c_proj.q_scale
[0364/0392] saving transformer.h.22.attn.c_proj.bias
[0365/0392] saving transformer.h.22.ln_2.weight
[0366/0392] saving transformer.h.22.ln_2.bias
[0367/0392] saving transformer.h.22.mlp.c_fc.q_weight
[0368/0392] saving transformer.h.22.mlp.c_fc.q_scale
[0369/0392] saving transformer.h.22.mlp.c_fc.bias
[0370/0392] saving transformer.h.22.mlp.c_proj.q_weight
[0371/0392] saving transformer.h.22.mlp.c_proj.q_scale
[0372/0392] saving transformer.h.22.mlp.c_proj.bias
[0373/0392] saving transformer.h.23.ln_1.weight
[0374/0392] saving transformer.h.23.ln_1.bias
[0375/0392] saving transformer.h.23.attn.c_attn.q_weight
[0376/0392] saving transformer.h.23.attn.c_attn.q_scale
[2024-01-08 19:18:53] INFO convert_weight.py:141: Saved to directory: /home/junrushao/tmp/tmp50xzwpqb |
|
[0377/0392] saving transformer.h.23.attn.c_attn.bias
[0378/0392] saving transformer.h.23.attn.c_proj.q_weight
[0379/0392] saving transformer.h.23.attn.c_proj.q_scale
[0380/0392] saving transformer.h.23.attn.c_proj.bias
[0381/0392] saving transformer.h.23.ln_2.weight
[0382/0392] saving transformer.h.23.ln_2.bias
[0383/0392] saving transformer.h.23.mlp.c_fc.q_weight
[0384/0392] saving transformer.h.23.mlp.c_fc.q_scale
[0385/0392] saving transformer.h.23.mlp.c_fc.bias
[0386/0392] saving transformer.h.23.mlp.c_proj.q_weight
[0387/0392] saving transformer.h.23.mlp.c_proj.q_scale
[0388/0392] saving transformer.h.23.mlp.c_proj.bias
[0389/0392] saving transformer.ln_f.weight
[0390/0392] saving transformer.ln_f.bias
[0391/0392] saving lm_head.q_weight
[0392/0392] saving lm_head.q_scale |
| All finished, 21 total shards committed, record saved to /home/junrushao/tmp/tmp50xzwpqb/ndarray-cache.json |
|
|