# n24q02m/Qwen3-Reranker-0.6B-ONNX

ONNX-optimized version of Qwen/Qwen3-Reranker-0.6B for use with qwen3-embed and fastembed.

## Available Variants

| Variant | File | Size | Description |
|---|---|---|---|
| INT8 | `onnx/model_quantized.onnx` | 573 MB | Dynamic INT8 quantization (default) |
| Q4F16 | `onnx/model_q4f16.onnx` | 517 MB | INT4 weights + FP16 activations |
| YesNo INT8 | `onnx/model_yesno_quantized.onnx` | 572 MB | YES/NO-only logits (last token, 150x smaller output) |
| YesNo Q4F16 | `onnx/model_yesno_q4f16.onnx` | 517 MB | YES/NO-only logits + INT4 weights |
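The YesNo variants shrink the model output from full-vocabulary logits to just the "yes" and "no" token logits at the last position. A minimal sketch of how such a two-way logit pair is typically turned into a relevance score (softmax over the pair, probability of "yes", following the upstream Qwen3-Reranker scoring convention); the helper name is illustrative:

```python
import math

def yes_probability(yes_logit: float, no_logit: float) -> float:
    """Relevance score: softmax probability of the YES token
    over the two-way (yes, no) logits at the last position."""
    m = max(yes_logit, no_logit)       # subtract max for numerical stability
    e_yes = math.exp(yes_logit - m)
    e_no = math.exp(no_logit - m)
    return e_yes / (e_yes + e_no)

print(yes_probability(0.0, 0.0))   # equal logits -> 0.5
print(yes_probability(4.0, -2.0))  # larger YES logit -> score near 1
```

Because only two logits are needed per query-document pair, truncating the output head to these two tokens is what yields the roughly 150x smaller output tensor.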

## Usage

```python
from qwen3_embed import TextCrossEncoder

# INT8 (default)
model = TextCrossEncoder("Qwen/Qwen3-Reranker-0.6B")

# Q4F16 (smaller, slightly less accurate)
model = TextCrossEncoder("Qwen/Qwen3-Reranker-0.6B-Q4F16")
```

## Conversion Details

- Source: Qwen/Qwen3-Reranker-0.6B
- ONNX opset: 21
- INT8: `onnxruntime.quantization.quantize_dynamic` (QInt8)
- Q4F16: `MatMulNBitsQuantizer` (block_size=128, symmetric) + FP16 cast
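To illustrate the Q4F16 conversion above, here is a simplified sketch of block-wise symmetric INT4 weight quantization as used by `MatMulNBitsQuantizer`: each block of 128 weights shares one floating-point scale, and weights are rounded to small integers. The real quantizer additionally packs two INT4 values per byte and emits ONNX `MatMulNBits` nodes, which this sketch omits; the function names are illustrative, not part of any library API.

```python
def quantize_int4_blocks(weights, block_size=128):
    """Block-wise symmetric INT4 quantization sketch: each block of
    `block_size` weights shares one scale; integers lie in [-7, 7]."""
    blocks = []
    for i in range(0, len(weights), block_size):
        block = weights[i:i + block_size]
        absmax = max(abs(w) for w in block)
        scale = absmax / 7.0 if absmax > 0 else 1.0  # symmetric range
        q = [round(w / scale) for w in block]
        blocks.append((q, scale))
    return blocks

def dequantize(blocks):
    """Reconstruct approximate FP weights from (ints, scale) blocks."""
    return [qi * scale for q, scale in blocks for qi in q]

weights = [0.02 * i - 0.5 for i in range(256)]  # toy 2-block weight vector
restored = dequantize(quantize_int4_blocks(weights))
max_err = max(abs(w - r) for w, r in zip(weights, restored))
# per-block error is bounded by half the block's quantization step (scale / 2)
```

The per-block scale is why a larger `block_size` trades accuracy for compression: one outlier weight inflates the scale, and hence the rounding error, for its entire block.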
