nm-testing's collection: Models in CI
nm-testing/Meta-Llama-3-8B-Instruct-W8A8-FP8-Channelwise-compressed-tensors
  Text Generation • 8B • Updated • 20 downloads • 2 likes
nm-testing/Meta-Llama-3-8B-Instruct-FBGEMM-nonuniform
  Text Generation • 8B • Updated • 1 download
nm-testing/Meta-Llama-3-8B-FP8-compressed-tensors-test
  Text Generation • 8B • Updated • 6.26k downloads
nm-testing/Meta-Llama-3-8B-Instruct-W8-Channel-A8-Dynamic-Asym-Per-Token-Test
  8B • Updated • 7 downloads • 1 like
nm-testing/Meta-Llama-3-8B-Instruct-W8-Channel-A8-Dynamic-Per-Token-Test
  Text Generation • 8B • Updated • 76 downloads
nm-testing/Meta-Llama-3-8B-Instruct-nonuniform-test
  Text Generation • 8B • Updated • 11.5k downloads
nm-testing/Meta-Llama-3-70B-Instruct-FBGEMM-nonuniform
  Text Generation • 71B • Updated • 2.04k downloads • 1 like
nm-testing/Qwen1.5-MoE-A2.7B-Chat-quantized.w4a16
  2B • Updated • 33.1k downloads • 1 like
nm-testing/Qwen2-1.5B-Instruct-FP8W8
  Text Generation • 2B • Updated • 3 downloads
nm-testing/SparseLlama-3.1-8B-gsm8k-pruned.2of4-chnl_wts_per_tok_dyn_act_fp8-BitM
  5B • Updated
nm-testing/tinyllama-oneshot-w8w8-test-static-shape-change
  Text Generation • 1B • Updated • 27.9k downloads
nm-testing/pixtral-12b-FP8-dynamic
  Image-Text-to-Text • Updated • 715 downloads • 1 like
RedHatAI/Mistral-Small-3.1-24B-Instruct-2503-FP8-dynamic
  Image-Text-to-Text • 24B • Updated • 164k downloads • 9 likes
nm-testing/Llama-3.2-1B-Instruct-FP8-KV
nm-testing/tinyllama-oneshot-w8a8-channel-dynamic-token-v2
  Text Generation • 1B • Updated • 9.52k downloads
nm-testing/tinyllama-oneshot-w8-channel-a8-tensor
  Text Generation • 1B • Updated • 647 downloads
RedHatAI/Llama-3.2-1B-quantized.w8a8
  1B • Updated • 21.9k downloads • 1 like
nm-testing/tinyllama-oneshot-w8a8-dynamic-token-v2
  Text Generation • 1B • Updated • 6.28k downloads
nm-testing/asym-w8w8-int8-static-per-tensor-tiny-llama
  1B • Updated • 2.51k downloads
nm-testing/Meta-Llama-3-8B-Instruct-W8A8-Static-Per-Tensor-Sym
  8B • Updated • 1 download
nm-testing/Meta-Llama-3-8B-Instruct-W8A8-Static-Per-Tensor-Asym
  8B • Updated • 7 downloads
nm-testing/TinyLlama-1.1B-Chat-v1.0-gsm8k-pruned.2of4-chnl_wts_per_tok_dyn_act_int8-BitM
  0.7B • Updated • 14 downloads
nm-testing/TinyLlama-1.1B-Chat-v1.0-gsm8k-pruned.2of4-chnl_wts_tensor_act_int8-BitM
  0.7B • Updated • 14 downloads
nm-testing/TinyLlama-1.1B-Chat-v1.0-gsm8k-pruned.2of4-tensor_wts_per_tok_dyn_act_int8-BitM
  0.7B • Updated • 16 downloads
nm-testing/TinyLlama-1.1B-Chat-v1.0-gsm8k-pruned.2of4-tensor_wts_tensor_act_int8-BitM
  0.7B • Updated • 17 downloads
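The `pruned.2of4` checkpoints above use 2:4 semi-structured sparsity: in every contiguous group of four weights, at most two are nonzero. A minimal pure-Python sketch of magnitude-based 2:4 pruning (an illustrative helper, not the actual pruning code behind these repos):

```python
def prune_2of4(weights):
    """Magnitude-based 2:4 pruning on a flat list of weights.

    Each contiguous group of four keeps its two largest-magnitude
    entries and zeroes the other two. Assumes len(weights) is a
    multiple of 4.
    """
    pruned = []
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        # Indices of the two largest-|w| entries in this group of four.
        keep = sorted(range(4), key=lambda j: abs(group[j]), reverse=True)[:2]
        pruned.extend(v if j in keep else 0.0 for j, v in enumerate(group))
    return pruned

row = [0.9, -0.1, 0.05, -1.2, 0.3, 0.2, -0.7, 0.0]
print(prune_2of4(row))  # → [0.9, 0.0, 0.0, -1.2, 0.3, 0.0, -0.7, 0.0]
```

The fixed 2-in-4 pattern is what lets sparse tensor hardware skip the zeroed weights with a compact index, unlike unstructured pruning.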
nm-testing/TinyLlama-1.1B-Chat-v1.0-INT8-Dynamic-IA-Per-Channel-Weight-testing
  1B • Updated • 17 downloads
nm-testing/TinyLlama-1.1B-Chat-v1.0-INT8-Static-testing
  1B • Updated • 15 downloads
nm-testing/TinyLlama-1.1B-Chat-v1.0-INT8-Dynamic-IA-Per-Tensor-Weight-testing
  1B • Updated • 16 downloads
nm-testing/TinyLlama-1.1B-Chat-v1.0-2of4-Sparse-Dense-Compressor
  1B • Updated • 22 downloads
nm-testing/llama2.c-stories42M-pruned2.4-compressed
  48.6M • Updated • 18 downloads
nm-testing/TinyLlama-1.1B-Chat-v1.0-NVFP4
  0.7B • Updated • 2.42k downloads
nm-testing/Llama-3.2-1B-Instruct-spinquantR1R2R4-w4a16
  0.7B • Updated • 2.43k downloads
nm-testing/Llama-3.2-1B-Instruct-quip-w4a16
  0.8B • Updated • 2.42k downloads
nm-testing/tinyllama-oneshot-w4a16-channel-v2
  Text Generation • 0.3B • Updated • 3.28k downloads • 1 like
nm-testing/test-w4a16-mixtral-actorder-group
  6B • Updated • 1.38k downloads
nm-testing/TinyLlama-1.1B-Chat-v1.0-kvcache-fp8-attn_head
nm-testing/TinyLlama-1.1B-Chat-v1.0-kvcache-fp8-tensor
  1B • Updated • 2.41k downloads
nm-testing/Qwen3-30B-A3B-MXFP4A16
  17B • Updated • 4.46k downloads
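Many of the repo names above encode a channelwise int8 weight scheme (`W8-Channel`, `w8a8`, `chnl_wts`): one quantization scale per output channel (row) of a weight matrix, rather than one scale for the whole tensor. A rough pure-Python sketch of symmetric per-channel int8 quantization, illustrative only and not the compressed-tensors implementation:

```python
def quantize_per_channel_int8(weight_rows):
    """Symmetric per-channel int8 quantization.

    One scale per row (output channel): scale = max|w| / 127,
    q = round(w / scale) clamped to [-127, 127].
    Returns (quantized rows, per-row scales).
    """
    q_rows, scales = [], []
    for row in weight_rows:
        # Guard against an all-zero row, which would give scale 0.
        scale = max(abs(w) for w in row) / 127 or 1.0
        q_rows.append([max(-127, min(127, round(w / scale))) for w in row])
        scales.append(scale)
    return q_rows, scales

def dequantize(q_rows, scales):
    """Reconstruct approximate fp weights: w ~ q * scale."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]

W = [[4.0, 1.0, -2.0], [0.5, -0.25, 0.125]]
q, s = quantize_per_channel_int8(W)
W_hat = dequantize(q, s)
# Rounding error is at most half a quantization step per element.
assert all(abs(w - wh) <= s_i / 2 + 1e-9
           for row, row_hat, s_i in zip(W, W_hat, s)
           for w, wh in zip(row, row_hat))
```

Per-channel scales track the dynamic range of each output channel independently, which is why the channelwise variants above generally lose less accuracy than their per-tensor counterparts.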