PIXIE-Rune-v1.0 – ONNX Quantized Variants

ONNX-quantized derivatives of telepix/PIXIE-Rune-v1.0, an encoder-based multilingual embedding model developed by TelePIX Co., Ltd. The base model is optimized for semantic retrieval across 74 languages, with a specialization in Korean/English aerospace-domain applications.

Original model: telepix/PIXIE-Rune-v1.0, distributed as safetensors weights plus FP32 ONNX (onnx/model.onnx + onnx/model.onnx_data). This repo adds INT8 and INT4 quantized ONNX variants for CPU-efficient deployment.


Model Description

| Property | Value |
|---|---|
| Base model | telepix/PIXIE-Rune-v1.0 (XLM-RoBERTa-large) |
| Architecture | Transformer encoder |
| Output dimensionality | 1024 |
| Pooling | Mean pooling + L2 normalize |
| Max sequence length | 6,000 tokens |
| Languages | 74 (XLM-RoBERTa vocabulary: 250,002 tokens) |
| Domain | General multilingual + aerospace specialization |
| License | Apache 2.0 |

ONNX Variants

| File | Quantization | Size | Avg cos vs FP32 | Pearson r | MRR | Notes |
|---|---|---|---|---|---|---|
| onnx/model_quantized.onnx | INT8 dynamic | 542 MB | 0.969 | 0.998 | 1.00 | quantize_dynamic, all weights |
| onnx/model_int4.onnx | INT4 + INT8 emb | 434 MB | 0.941 | 0.998 | 1.00 | MatMulNBits + INT8 Gather |
| onnx/model_int4_full.onnx | INT4 full | 337 MB | 0.941 | 0.998 | 1.00 | MatMulNBits + INT4 Gather (opset 21) |

Metrics were measured on 8 semantically diverse sentences against the FP32 reference. Pearson r is the correlation of the pairwise cosine-similarity matrices (structure preservation). MRR is Mean Reciprocal Rank on a retrieval probe; 1.00 means the retrieval ranking is perfectly preserved.
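These three metrics are cheap to reproduce. The sketch below uses random vectors in place of the actual probe-sentence embeddings (the noise level and all names are illustrative, not the evaluation code used for this repo):

```python
import numpy as np

# Toy stand-ins for FP32 vs quantized embeddings of the 8 probe sentences;
# the "quantized" set is the reference plus small perturbations.
rng = np.random.default_rng(1)
fp32 = rng.standard_normal((8, 1024))
quant = fp32 + 0.05 * rng.standard_normal((8, 1024))

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

fp32, quant = normalize(fp32), normalize(quant)

# Avg cos vs FP32: mean cosine between matching rows
avg_cos = float((fp32 * quant).sum(-1).mean())

# Pearson r between the two pairwise cosine-similarity matrices
# (off-diagonal entries): how well the similarity *structure* survives
iu = np.triu_indices(8, k=1)
r = float(np.corrcoef((fp32 @ fp32.T)[iu], (quant @ quant.T)[iu])[0, 1])

# MRR on a self-retrieval probe: query with FP32, retrieve from quantized;
# the matching sentence should rank first
sims = fp32 @ quant.T
ranks = (np.argsort(-sims, axis=1) == np.arange(8)[:, None]).argmax(1) + 1
mrr = float((1.0 / ranks).mean())
```

With small perturbations, avg_cos stays near 1, r stays near 1, and MRR reaches 1.00 exactly, mirroring the table above.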

Quantization methodology

The XLM-RoBERTa vocabulary has 250,002 tokens × 1024 dimensions, making the word-embedding table the dominant weight (~977 MB in FP32). Each variant handles it differently:

- INT8 (model_quantized.onnx): onnxruntime.quantization.quantize_dynamic(weight_type=QInt8) quantizes all weight tensors, including the embedding Gather, to INT8. Compact, maximum compatibility.
- INT4 + INT8 emb (model_int4.onnx): Two passes. Pass 1: MatMulNBitsQuantizer(block_size=32, symmetric=True) packs transformer MatMul weights into 4-bit nibbles. Pass 2: quantize_dynamic(op_types=["Gather"], weight_type=QInt8) brings the embedding table from 977 MB FP32 down to 244 MB INT8.
- INT4 full (model_int4_full.onnx): Same MatMulNBits pass, then manual DequantizeLinear(axis=0) node insertion packs the embedding table as per-row symmetric INT4 nibbles (scale = max(|row|)/7). Requires an opset upgrade from 14 to 21. Embedding: 977 MB → 122 MB.
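The per-row symmetric INT4 arithmetic (scale = max(|row|)/7, two nibbles per byte) can be sketched in NumPy on a toy-sized table. This is an illustration of the scheme only; the low-nibble-first byte order is an assumption here, not a claim about the exact on-disk layout of the repo's models:

```python
import numpy as np

# Hypothetical small embedding table standing in for the real
# 250,002 x 1024 FP32 matrix; only the per-row scheme matters.
rng = np.random.default_rng(0)
emb = rng.standard_normal((16, 8)).astype(np.float32)

# Per-row symmetric INT4: scale = max(|row|) / 7, codes in [-7, 7]
scales = np.abs(emb).max(axis=1, keepdims=True) / 7.0
q = np.clip(np.round(emb / scales), -7, 7).astype(np.int8)

# Pack two signed nibbles per byte (low nibble first): half the bytes of INT8
u = (q & 0x0F).astype(np.uint8).reshape(-1, 2)
packed = (u[:, 0] | (u[:, 1] << 4)).astype(np.uint8)

# DequantizeLinear(axis=0) equivalent: unpack, sign-extend, rescale per row
lo = (packed & 0x0F).astype(np.int8)
hi = (packed >> 4).astype(np.int8)
lo = np.where(lo > 7, lo - 16, lo)
hi = np.where(hi > 7, hi - 16, hi)
deq = np.stack([lo, hi], axis=1).reshape(emb.shape).astype(np.float32) * scales

print(float(np.abs(emb - deq).max()))  # worst-case per-element rounding error
```

The maximum reconstruction error is bounded by half a quantization step per row, i.e. 0.5 × scale, which is why the variant still scores 0.998 Pearson r despite an 8× smaller embedding table.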

Usage

fastembed (Rust)

This repo is integrated in fastembed-rs:

```rust
use fastembed::{EmbeddingModel, InitOptions, TextEmbedding};

// INT8: most compatible, 542 MB
let model = TextEmbedding::try_new(InitOptions::new(EmbeddingModel::PixieRuneV1Q))?;

// INT4 + INT8 embedding: 434 MB
let model = TextEmbedding::try_new(InitOptions::new(EmbeddingModel::PixieRuneV1Int4))?;

// INT4 full: smallest, 337 MB
let model = TextEmbedding::try_new(InitOptions::new(EmbeddingModel::PixieRuneV1Int4Full))?;

let embeddings = model.embed(vec!["안녕하세요", "Hello world"], None)?;
```

ONNX Runtime (Python)

```python
import onnxruntime as ort
import numpy as np
from tokenizers import Tokenizer

tokenizer = Tokenizer.from_file("tokenizer.json")
# The model accepts up to 6,000 tokens; 512 keeps this demo light.
tokenizer.enable_truncation(max_length=512)
tokenizer.enable_padding(pad_token="<pad>", pad_id=1)

session = ort.InferenceSession("onnx/model_quantized.onnx",
                               providers=["CPUExecutionProvider"])

# "In which industries does TelePIX use satellite data?" /
# "TelePIX analyzes satellite data to provide services in fields such as
#  maritime, resources, and agriculture."
texts = ["텔레픽스는 어떤 산업 분야에서 위성 데이터를 활용하나요?",
         "텔레픽스는 해양, 자원, 농업 등 다양한 분야에서 위성 데이터를 분석하여 서비스를 제공합니다."]

enc  = tokenizer.encode_batch(texts)
ids  = np.array([e.ids            for e in enc], dtype=np.int64)
mask = np.array([e.attention_mask for e in enc], dtype=np.int64)

out = session.run(None, {"input_ids": ids, "attention_mask": mask})[0]  # (batch, seq, 1024)

# Mean pooling + L2 normalize
pooled = (out * mask[..., None]).sum(1) / mask.sum(1, keepdims=True).clip(1e-9)
norms  = np.linalg.norm(pooled, axis=-1, keepdims=True)
embeddings = pooled / norms.clip(1e-12)

# Cosine similarity
scores = embeddings @ embeddings.T
print(scores)
```

sentence-transformers (original FP32 weights)

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("telepix/PIXIE-Rune-v1.0")

# Queries: "In which industries does TelePIX use satellite data?" /
#          "What satellite services are provided in the defense sector?"
queries   = ["텔레픽스는 어떤 산업 분야에서 위성 데이터를 활용하나요?",
             "국방 분야에 어떤 위성 서비스가 제공되나요?"]
# Documents: services in maritime/resources/agriculture; defense-related
# precision analysis from reconnaissance and surveillance imagery.
documents = ["텔레픽스는 해양, 자원, 농업 등 다양한 분야에서 위성 데이터를 분석하여 서비스를 제공합니다.",
             "정찰 및 감시 목적의 위성 영상을 통해 국방 관련 정밀 분석 서비스를 제공합니다."]

q_emb = model.encode(queries,   prompt_name="query")
d_emb = model.encode(documents)
scores = model.similarity(q_emb, d_emb)
print(scores)
```

Quality Benchmarks (original model)

Results from telepix/PIXIE-Rune-v1.0, evaluated using Korean-MTEB-Retrieval-Evaluators.

6 Datasets of MTEB (Korean)

| Model | # params | Avg. NDCG | NDCG@1 | NDCG@3 | NDCG@5 | NDCG@10 |
|---|---|---|---|---|---|---|
| telepix/PIXIE-Spell-Preview-1.7B | 1.7B | 0.7567 | 0.7149 | 0.7541 | 0.7696 | 0.7882 |
| telepix/PIXIE-Spell-Preview-0.6B | 0.6B | 0.7280 | 0.6804 | 0.7258 | 0.7448 | 0.7612 |
| telepix/PIXIE-Rune-v1.0 | 0.5B | 0.7383 | 0.6936 | 0.7356 | 0.7545 | 0.7698 |
| telepix/PIXIE-Splade-Preview | 0.1B | 0.7253 | 0.6799 | 0.7217 | 0.7416 | 0.7579 |
| nlpai-lab/KURE-v1 | 0.5B | 0.7312 | 0.6826 | 0.7303 | 0.7478 | 0.7642 |
| BAAI/bge-m3 | 0.5B | 0.7126 | 0.6613 | 0.7107 | 0.7301 | 0.7483 |
| Snowflake/snowflake-arctic-embed-l-v2.0 | 0.5B | 0.7050 | 0.6570 | 0.7015 | 0.7226 | 0.7390 |
| Qwen/Qwen3-Embedding-0.6B | 0.6B | 0.6872 | 0.6423 | 0.6833 | 0.7017 | 0.7215 |
| jinaai/jina-embeddings-v3 | 0.5B | 0.6731 | 0.6224 | 0.6715 | 0.6899 | 0.7088 |
| openai/text-embedding-3-large | N/A | 0.6465 | 0.5895 | 0.6467 | 0.6646 | 0.6853 |

Benchmarks: Ko-StrategyQA, AutoRAGRetrieval, MIRACLRetrieval, PublicHealthQA, BelebeleRetrieval, MultiLongDocRetrieval.

7 Datasets of BEIR (English)

| Model | # params | Avg. NDCG | NDCG@1 | NDCG@3 | NDCG@5 | NDCG@10 |
|---|---|---|---|---|---|---|
| Snowflake/snowflake-arctic-embed-l-v2.0 | 0.5B | 0.5812 | 0.5725 | 0.5705 | 0.5811 | 0.6006 |
| telepix/PIXIE-Rune-v1.0 | 0.5B | 0.5781 | 0.5691 | 0.5663 | 0.5791 | 0.5979 |
| telepix/PIXIE-Spell-Preview-1.7B | 1.7B | 0.5630 | 0.5446 | 0.5529 | 0.5660 | 0.5885 |
| Qwen/Qwen3-Embedding-0.6B | 0.6B | 0.5558 | 0.5321 | 0.5451 | 0.5620 | 0.5839 |
| Alibaba-NLP/gte-multilingual-base | 0.3B | 0.5541 | 0.5446 | 0.5426 | 0.5574 | 0.5746 |
| BAAI/bge-m3 | 0.5B | 0.5318 | 0.5078 | 0.5231 | 0.5389 | 0.5573 |
| jinaai/jina-embeddings-v3 | 0.6B | 0.4482 | 0.4116 | 0.4379 | 0.4573 | 0.4861 |

Benchmarks: ArguAna, FEVER, FiQA-2018, HotpotQA, MSMARCO, NQ, SCIDOCS.


License

Apache 2.0, the same as the original telepix/PIXIE-Rune-v1.0.

Citation

```bibtex
@software{TelePIX-PIXIE-Rune-v1,
  title  = {PIXIE-Rune-v1.0},
  author = {TelePIX AI Research Team and Bongmin Kim},
  year   = {2025},
  url    = {https://huggingface.co/telepix/PIXIE-Rune-v1.0}
}
```

Contact

Original model authors: bmkim@telepix.net
ONNX quantization: cstr (open an issue on this repo for questions).
