Qwen3-Reranker-4B · OpenVINO IR (INT8 weight-only, asymmetric)
This is a redistribution. For the model's intended use, instruction format, full evaluation (MTEB-R / CMTEB-R / MMTEB-R / MLDR / MTEB-Code / FollowIR), and citation, please see the upstream card: Qwen/Qwen3-Reranker-4B.
OpenVINO IR conversion of
Qwen/Qwen3-Reranker-4B,
weight-only quantized to asymmetric INT8 via NNCF. Intended for the
OpenArc reranker engine and
optimum-intel pipelines targeting Intel CPUs / iGPUs / dGPUs / NPUs.
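For direct use outside OpenArc, the IR can be loaded through optimum-intel. A minimal sketch (the local path and device string are placeholders; imports are deferred so the snippet stays importable without optimum-intel installed):

```python
def load_reranker(model_dir: str, device: str = "CPU"):
    """Load the INT8 IR and its tokenizer with optimum-intel.

    Imports are inside the function so this sketch can be defined
    without optimum-intel present; install via `pip install optimum[openvino]`.
    """
    from optimum.intel import OVModelForCausalLM
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    # device can be "CPU", "GPU", "GPU.1", "NPU", ... per OpenVINO naming
    model = OVModelForCausalLM.from_pretrained(model_dir, device=device)
    return tokenizer, model
```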
Files
- `openvino_model.{xml,bin}` – Qwen3 (4B) decoder, INT8 weights (~4.0 GB)
- `openvino_tokenizer.{xml,bin}` / `openvino_detokenizer.{xml,bin}` – OpenVINO Tokenizers IR
- `chat_template.jinja`, `generation_config.json`
- Standard HF tokenizer files: `tokenizer.json`, `tokenizer_config.json`, `special_tokens_map.json`, `vocab.json`, `merges.txt`
- `LICENSE`, `NOTICE` – Apache-2.0 with attribution to the upstream Qwen Team
Architecture
| Property | Value |
|---|---|
| Base model | Qwen3ForCausalLM (Qwen3-4B-Base) |
| Hidden size | 2560 |
| Layers | 36 |
| Attention heads / KV heads | 32 / 8 |
| Max position embeddings | 40,960 |
| Vocabulary size | 151,669 |
| Source dtype | bfloat16 |
| Quantization | NNCF INT8 weight-only, asymmetric |
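The ~4.0 GB INT8 file size is consistent with a back-of-the-envelope parameter count from the table above. Note that the head dim (128), MLP intermediate size (9728), and tied embeddings are assumptions about Qwen3-4B-Base not stated in the table:

```python
hidden, layers, vocab = 2560, 36, 151_669
q_heads, kv_heads, head_dim = 32, 8, 128  # head_dim assumed
intermediate = 9_728                      # assumed Qwen3-4B MLP width

attn = hidden * q_heads * head_dim * 2    # q_proj + o_proj
attn += hidden * kv_heads * head_dim * 2  # k_proj + v_proj
mlp = hidden * intermediate * 3           # gate, up, down projections
embed = vocab * hidden                    # assumed tied with lm_head

params = layers * (attn + mlp) + embed
print(f"~{params / 1e9:.2f}B params -> ~{params / 1e9:.1f} GB at 1 byte/weight")
```

At 8-bit (1 byte) per weight this lands at roughly 4 GB, matching the shipped `openvino_model.bin`.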
Usage with OpenArc
openarc add qwen3-4b-reranker \
--model-path /path/to/Qwen3-Reranker-4B-int8-ov \
--model-type rerank \
--engine optimum \
--device GPU
openarc serve
# POST /v1/rerank {"model": "qwen3-4b-reranker", "query": "...", "documents": [...]}
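The request body follows the shape shown in the comment above. A minimal client-side sketch using only the standard library (the endpoint URL is a placeholder; field names beyond those in the comment are not assumed):

```python
import json
import urllib.request


def build_rerank_request(query, documents, model="qwen3-4b-reranker"):
    """Serialize a /v1/rerank payload matching the shape documented above."""
    body = {"model": model, "query": query, "documents": documents}
    return json.dumps(body).encode("utf-8")


def rerank(url, query, documents):
    """POST the payload to e.g. http://localhost:<port>/v1/rerank."""
    req = urllib.request.Request(
        url,
        data=build_rerank_request(query, documents),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```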
Conversion notes
The standard CLI route currently fails silently for this model on
optimum-intel @ HEAD:
optimum-cli export openvino --weight-format int8 \
--model Qwen/Qwen3-Reranker-4B ./out
# exits 0; openvino_model.xml is 0 bytes, openvino_model.bin is a ~13 MB stub
The Python API path produces a usable model:
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

# INT8 weight-only, asymmetric — matches the shipped IR
quant = OVWeightQuantizationConfig(bits=8, sym=False)
m = OVModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-Reranker-4B",
    export=True,                    # convert to OpenVINO IR on the fly
    quantization_config=quant,
    trust_remote_code=True,
)
m.save_pretrained("./out")
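Because the CLI failure mode above is silent (exit 0, zero-byte `.xml`, stub `.bin`), a quick size check after any export catches it; the thresholds below are heuristics, not exact values:

```python
from pathlib import Path


def export_looks_sane(out_dir, min_bin_bytes=1 << 30):
    """Reject exports where the IR graph is empty or the weights are a stub.

    The broken CLI path leaves a 0-byte openvino_model.xml and a ~13 MB
    openvino_model.bin; a healthy INT8 export of a 4B model has a non-empty
    .xml and a multi-GB .bin (default threshold: 1 GiB).
    """
    xml = Path(out_dir) / "openvino_model.xml"
    bin_ = Path(out_dir) / "openvino_model.bin"
    if not xml.is_file() or not bin_.is_file():
        return False
    return xml.stat().st_size > 0 and bin_.stat().st_size >= min_bin_bytes
```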
The tokenizer / detokenizer IR was generated separately via
`openvino_tokenizers.convert_tokenizer(..., with_detokenizer=True)`.
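That conversion step can be sketched as follows (imports are deferred so the snippet stays importable without `openvino-tokenizers` installed; the output paths are placeholders):

```python
def convert_reranker_tokenizer(model_id, out_dir):
    """Convert an HF tokenizer into OpenVINO tokenizer/detokenizer IR."""
    import openvino as ov
    from openvino_tokenizers import convert_tokenizer  # pip install openvino-tokenizers
    from transformers import AutoTokenizer

    hf_tok = AutoTokenizer.from_pretrained(model_id)
    ov_tok, ov_detok = convert_tokenizer(hf_tok, with_detokenizer=True)
    ov.save_model(ov_tok, f"{out_dir}/openvino_tokenizer.xml")
    ov.save_model(ov_detok, f"{out_dir}/openvino_detokenizer.xml")
```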
A more detailed walkthrough lives in OpenArc/docs/openvino_qwen3.md.
License
Apache-2.0, inherited from
Qwen/Qwen3-Reranker-4B.
See LICENSE and NOTICE in this repo.
Citation
From the upstream model card:
@article{qwen3embedding,
title={Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models},
author={Zhang, Yanzhao and Li, Mingxin and Long, Dingkun and Zhang, Xin and Lin, Huan and Yang, Baosong and Xie, Pengjun and Yang, An and Liu, Dayiheng and Lin, Junyang and Huang, Fei and Zhou, Jingren},
journal={arXiv preprint arXiv:2506.05176},
year={2025}
}