# PIXIE-Rune-v1.0 – ONNX Quantized Variants
ONNX-quantized derivatives of telepix/PIXIE-Rune-v1.0, an encoder-based multilingual embedding model developed by TelePIX Co., Ltd. It is optimized for semantic retrieval across 74 languages, with a specialization in Korean/English aerospace-domain applications.
Original model: telepix/PIXIE-Rune-v1.0 – safetensors weights plus FP32 ONNX (`onnx/model.onnx` + `onnx/model.onnx_data`). This repo adds INT8 and INT4 quantized ONNX variants for CPU-efficient deployment.
## Model Description
| Property | Value |
|---|---|
| Base model | telepix/PIXIE-Rune-v1.0 (XLM-RoBERTa-large) |
| Architecture | Transformer encoder |
| Output dimensionality | 1024 |
| Pooling | Mean pooling + L2 normalize |
| Max sequence length | 6,000 tokens |
| Languages | 74 (XLM-RoBERTa vocabulary: 250,002 tokens) |
| Domain | General multilingual + aerospace specialization |
| License | Apache 2.0 |
## ONNX Variants
| File | Quantization | Size | Avg cos vs FP32 | Pearson r | MRR | Notes |
|---|---|---|---|---|---|---|
| `onnx/model_quantized.onnx` | INT8 dynamic | 542 MB | 0.969 | 0.998 | 1.00 | `quantize_dynamic`, all weights |
| `onnx/model_int4.onnx` | INT4 + INT8 emb | 434 MB | 0.941 | 0.998 | 1.00 | MatMulNBits + INT8 Gather |
| `onnx/model_int4_full.onnx` | INT4 full | 337 MB | 0.941 | 0.998 | 1.00 | MatMulNBits + INT4 Gather (opset 21) |
Metrics were measured on 8 semantically diverse sentences against the FP32 reference. Pearson r is the correlation between the pairwise cosine-similarity matrices (structure preservation). MRR is Mean Reciprocal Rank on a retrieval probe; 1.00 means the retrieval ranking is perfectly preserved.
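The three comparison metrics above are straightforward to reproduce. The following is a hedged NumPy sketch using random stand-ins for the FP32 and quantized embedding matrices (8 sentences × 1024 dims; the repo's actual probe sentences are not reproduced here):

```python
import numpy as np

# Random stand-ins: "quant" is the FP32 matrix plus mock quantization noise.
rng = np.random.default_rng(0)
fp32 = rng.normal(size=(8, 1024))
quant = fp32 + rng.normal(scale=0.05, size=fp32.shape)

def l2norm(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

fp32, quant = l2norm(fp32), l2norm(quant)

# 1) Avg cos vs FP32: mean cosine between matching rows
avg_cos = float((fp32 * quant).sum(-1).mean())

# 2) Pearson r: correlation of the two pairwise cosine-similarity matrices
s_ref, s_q = fp32 @ fp32.T, quant @ quant.T
iu = np.triu_indices(len(s_ref), k=1)           # off-diagonal pairs only
pearson = float(np.corrcoef(s_ref[iu], s_q[iu])[0, 1])

# 3) MRR: rank of each sentence's own FP32 vector in the quantized ranking
order = np.argsort(-(quant @ fp32.T), axis=-1)  # descending similarity
mrr = float(np.mean([1.0 / (np.where(row == i)[0][0] + 1)
                     for i, row in enumerate(order)]))
```

With real embeddings, `avg_cos` near 1.0 together with `mrr` = 1.00 indicates the quantized model preserves both the individual embedding directions and the retrieval ordering.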
## Quantization methodology
The XLM-RoBERTa vocabulary has 250,002 tokens × 1024 dimensions, making the word embedding table the dominant weight (~977 MB in FP32). Each variant handles it differently:
- INT8 (`model_quantized.onnx`): `onnxruntime.quantization.quantize_dynamic(weight_type=QInt8)` quantizes all weight tensors, including the embedding Gather, to INT8. Compact, maximum compatibility.
- INT4 + INT8 emb (`model_int4.onnx`): two passes. Pass 1: `MatMulNBitsQuantizer(block_size=32, symmetric=True)` packs the transformer MatMul weights into 4-bit nibbles. Pass 2: `quantize_dynamic(op_types=["Gather"], weight_type=QInt8)` shrinks the embedding table from 977 MB FP32 to 244 MB INT8.
- INT4 full (`model_int4_full.onnx`): the same MatMulNBits pass, followed by manual `DequantizeLinear(axis=0)` node insertion that packs the embedding table as per-row symmetric INT4 nibbles (scale = max(|row|)/7). Requires an opset upgrade from 14 to 21. Embedding: 977 MB → 122 MB.
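The per-row symmetric INT4 scheme described above is simple enough to sketch directly. This is an illustrative NumPy mock-up of the pack/dequantize round trip on a toy 16 × 1024 table rather than the real 250,002 × 1024 matrix; it is not the actual conversion script, only the scale = max(|row|)/7 rule it states:

```python
import numpy as np

# Toy stand-in for the embedding table.
rng = np.random.default_rng(0)
emb = rng.normal(scale=0.02, size=(16, 1024)).astype(np.float32)

scale = np.abs(emb).max(axis=1, keepdims=True) / 7.0        # one FP32 scale per row
q = np.clip(np.round(emb / scale), -7, 7).astype(np.int8)   # symmetric INT4 range

# Pack two signed nibbles per byte (offset to 0..15 first): half of INT8's size.
u = (q + 8).astype(np.uint8)
packed = u[:, 0::2] | (u[:, 1::2] << 4)

# Dequantize: what the inserted DequantizeLinear(axis=0) node computes.
lo = (packed & 0x0F).astype(np.int8) - 8
hi = (packed >> 4).astype(np.int8) - 8
deq = np.empty_like(emb)
deq[:, 0::2] = lo * scale
deq[:, 1::2] = hi * scale

max_err = float(np.abs(emb - deq).max())  # bounded by scale/2 per element
```

At 4 bits per value plus one FP32 scale per row, the full 250,002 × 1024 table comes to roughly 122 MB, consistent with the INT4-full figure above.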
## Usage
### fastembed (Rust)
This repo is integrated into fastembed-rs:
```rust
use fastembed::{EmbeddingModel, InitOptions, TextEmbedding};

// INT8: most compatible, 542 MB
let model = TextEmbedding::try_new(InitOptions::new(EmbeddingModel::PixieRuneV1Q))?;
// INT4 + INT8 embedding: 434 MB
let model = TextEmbedding::try_new(InitOptions::new(EmbeddingModel::PixieRuneV1Int4))?;
// INT4 full: smallest, 337 MB
let model = TextEmbedding::try_new(InitOptions::new(EmbeddingModel::PixieRuneV1Int4Full))?;

let embeddings = model.embed(vec!["안녕하세요", "Hello world"], None)?;
```
### ONNX Runtime (Python)
```python
import onnxruntime as ort
import numpy as np
from tokenizers import Tokenizer

tokenizer = Tokenizer.from_file("tokenizer.json")
tokenizer.enable_truncation(max_length=512)
tokenizer.enable_padding(pad_token="<pad>", pad_id=1)

session = ort.InferenceSession("onnx/model_quantized.onnx",
                               providers=["CPUExecutionProvider"])

# "In which industries does TelePIX use satellite data?" /
# "TelePIX analyzes satellite data across fields such as maritime, resources,
#  and agriculture to provide services."
texts = ["텔레픽스는 어떤 산업 분야에서 위성 데이터를 활용하나요?",
         "텔레픽스는 해양, 자원, 농업 등 다양한 분야에서 위성 데이터를 분석하여 서비스를 제공합니다."]
enc = tokenizer.encode_batch(texts)
ids = np.array([e.ids for e in enc], dtype=np.int64)
mask = np.array([e.attention_mask for e in enc], dtype=np.int64)
out = session.run(None, {"input_ids": ids, "attention_mask": mask})[0]  # (batch, seq, 1024)

# Mean pooling + L2 normalize
pooled = (out * mask[..., None]).sum(1) / mask.sum(1, keepdims=True).clip(1e-9)
norms = np.linalg.norm(pooled, axis=-1, keepdims=True)
embeddings = pooled / norms.clip(1e-12)

# cosine similarity
scores = embeddings @ embeddings.T
print(scores)
```
### sentence-transformers (original FP32 weights)
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("telepix/PIXIE-Rune-v1.0")

queries = ["텔레픽스는 어떤 산업 분야에서 위성 데이터를 활용하나요?",
           "국방 분야의 어떤 위성 서비스가 제공되나요?"]
documents = ["텔레픽스는 해양, 자원, 농업 등 다양한 분야에서 위성 데이터를 분석하여 서비스를 제공합니다.",
             "정찰 및 감시 목적의 위성 영상을 통해 국방 관련 정밀 분석 서비스를 제공합니다."]

q_emb = model.encode(queries, prompt_name="query")
d_emb = model.encode(documents)
scores = model.similarity(q_emb, d_emb)
print(scores)
```
## Quality Benchmarks (original model)
Results from telepix/PIXIE-Rune-v1.0, evaluated using Korean-MTEB-Retrieval-Evaluators.
### 6 Datasets of MTEB (Korean)
| Model | # params | Avg. NDCG | NDCG@1 | NDCG@3 | NDCG@5 | NDCG@10 |
|---|---|---|---|---|---|---|
| telepix/PIXIE-Spell-Preview-1.7B | 1.7B | 0.7567 | 0.7149 | 0.7541 | 0.7696 | 0.7882 |
| telepix/PIXIE-Spell-Preview-0.6B | 0.6B | 0.7280 | 0.6804 | 0.7258 | 0.7448 | 0.7612 |
| telepix/PIXIE-Rune-v1.0 | 0.5B | 0.7383 | 0.6936 | 0.7356 | 0.7545 | 0.7698 |
| telepix/PIXIE-Splade-Preview | 0.1B | 0.7253 | 0.6799 | 0.7217 | 0.7416 | 0.7579 |
| nlpai-lab/KURE-v1 | 0.5B | 0.7312 | 0.6826 | 0.7303 | 0.7478 | 0.7642 |
| BAAI/bge-m3 | 0.5B | 0.7126 | 0.6613 | 0.7107 | 0.7301 | 0.7483 |
| Snowflake/snowflake-arctic-embed-l-v2.0 | 0.5B | 0.7050 | 0.6570 | 0.7015 | 0.7226 | 0.7390 |
| Qwen/Qwen3-Embedding-0.6B | 0.6B | 0.6872 | 0.6423 | 0.6833 | 0.7017 | 0.7215 |
| jinaai/jina-embeddings-v3 | 0.5B | 0.6731 | 0.6224 | 0.6715 | 0.6899 | 0.7088 |
| openai/text-embedding-3-large | N/A | 0.6465 | 0.5895 | 0.6467 | 0.6646 | 0.6853 |
Benchmarks: Ko-StrategyQA, AutoRAGRetrieval, MIRACLRetrieval, PublicHealthQA, BelebeleRetrieval, MultiLongDocRetrieval.
### 7 Datasets of BEIR (English)
| Model | # params | Avg. NDCG | NDCG@1 | NDCG@3 | NDCG@5 | NDCG@10 |
|---|---|---|---|---|---|---|
| Snowflake/snowflake-arctic-embed-l-v2.0 | 0.5B | 0.5812 | 0.5725 | 0.5705 | 0.5811 | 0.6006 |
| telepix/PIXIE-Rune-v1.0 | 0.5B | 0.5781 | 0.5691 | 0.5663 | 0.5791 | 0.5979 |
| telepix/PIXIE-Spell-Preview-1.7B | 1.7B | 0.5630 | 0.5446 | 0.5529 | 0.5660 | 0.5885 |
| Qwen/Qwen3-Embedding-0.6B | 0.6B | 0.5558 | 0.5321 | 0.5451 | 0.5620 | 0.5839 |
| Alibaba-NLP/gte-multilingual-base | 0.3B | 0.5541 | 0.5446 | 0.5426 | 0.5574 | 0.5746 |
| BAAI/bge-m3 | 0.5B | 0.5318 | 0.5078 | 0.5231 | 0.5389 | 0.5573 |
| jinaai/jina-embeddings-v3 | 0.6B | 0.4482 | 0.4116 | 0.4379 | 0.4573 | 0.4861 |
Benchmarks: ArguAna, FEVER, FiQA-2018, HotpotQA, MSMARCO, NQ, SCIDOCS.
## License
Apache 2.0, the same license as the original telepix/PIXIE-Rune-v1.0.
## Citation
```bibtex
@software{TelePIX-PIXIE-Rune-v1,
  title  = {PIXIE-Rune-v1.0},
  author = {TelePIX AI Research Team and Bongmin Kim},
  year   = {2025},
  url    = {https://huggingface.co/telepix/PIXIE-Rune-v1.0}
}
```
## Contact
Original model authors: bmkim@telepix.net. ONNX quantization: cstr; open an issue on this repo for questions.