# n24q02m/Qwen3-Reranker-0.6B-ONNX
ONNX-optimized version of Qwen/Qwen3-Reranker-0.6B for use with qwen3-embed and fastembed.
## Available Variants

| Variant | File | Size | Description |
|---|---|---|---|
| INT8 | `onnx/model_quantized.onnx` | 573 MB | Dynamic INT8 quantization (default) |
| Q4F16 | `onnx/model_q4f16.onnx` | 517 MB | INT4 weights + FP16 activations |
| YesNo INT8 | `onnx/model_yesno_quantized.onnx` | 572 MB | YES/NO-only logits (last token, 150x smaller output) |
| YesNo Q4F16 | `onnx/model_yesno_q4f16.onnx` | 517 MB | YES/NO-only logits + INT4 weights |
## Usage

```python
# INT8 (default)
from qwen3_embed import TextCrossEncoder

model = TextCrossEncoder("Qwen/Qwen3-Reranker-0.6B")

# Q4F16 (smaller, slightly less accurate)
model = TextCrossEncoder("Qwen/Qwen3-Reranker-0.6B-Q4F16")
```
## Conversion Details

- Source: Qwen/Qwen3-Reranker-0.6B
- ONNX opset: 21
- INT8: `onnxruntime.quantization.quantize_dynamic(QInt8)`
- Q4F16: `MatMulNBitsQuantizer(block_size=128, symmetric)` + FP16 cast (see the sketch after this list)
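A minimal sketch of the two quantization paths, assuming the FP32 export sits at `onnx/model.onnx` (a hypothetical path) and that the FP16 cast uses `onnxconverter_common`, which this card does not specify:

```python
# Sketch of the INT8 and Q4F16 conversion paths listed above.
import onnx
from onnxruntime.quantization import QuantType, quantize_dynamic
# Module path varies across onnxruntime versions; older releases expose
# MatMul4BitsQuantizer in onnxruntime.quantization.matmul_4bits_quantizer.
from onnxruntime.quantization.matmul_nbits_quantizer import MatMulNBitsQuantizer

FP32_PATH = "onnx/model.onnx"  # hypothetical path to the FP32 export

# INT8: dynamic quantization with QInt8 weights.
quantize_dynamic(FP32_PATH, "onnx/model_quantized.onnx", weight_type=QuantType.QInt8)

# Q4F16, step 1: INT4 weight-only quantization of MatMul nodes.
model = onnx.load(FP32_PATH)
quantizer = MatMulNBitsQuantizer(model, block_size=128, is_symmetric=True)
quantizer.process()

# Q4F16, step 2: cast remaining float tensors to FP16. Using
# onnxconverter_common here is an assumption, not confirmed by this card.
from onnxconverter_common import float16

model_q4f16 = float16.convert_float_to_float16(quantizer.model.model, keep_io_types=True)
onnx.save(model_q4f16, "onnx/model_q4f16.onnx")
```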
## Related
- GGUF variants: n24q02m/Qwen3-Reranker-0.6B-GGUF