PipeOwl-1.0 (Geometric Embedding)

PipeOwl is a transformer-free geometric embedding package built on a static embedding field stored as NumPy arrays.

This repo provides:

  • L1_base_embeddings.npy: float32 (V, 1024) embedding table (unit-normalized)
  • L1_base_vocab.json: list of vocab strings aligned to embedding rows
  • delta_base_scalar.npy: float32 (V,) optional scalar bias field
  • minimal inference engine (engine.py) and usage script (quickstart.py)
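The three data files can also be inspected directly, without going through engine.py. The sketch below is only an illustration: the helper names `load_field` and `is_unit_normalized` are hypothetical and are not part of this repo. It assumes the three files sit together in one directory and checks the row alignment and unit normalization described above.

```python
import json
from pathlib import Path

import numpy as np


def load_field(data_dir):
    """Load the embedding table, vocab list, and scalar field from data_dir.

    Hypothetical helper for illustration; not part of engine.py.
    """
    data_dir = Path(data_dir)
    emb = np.load(data_dir / "L1_base_embeddings.npy")    # expected (V, 1024) float32
    delta = np.load(data_dir / "delta_base_scalar.npy")   # expected (V,) float32
    with open(data_dir / "L1_base_vocab.json", encoding="utf-8") as f:
        vocab = json.load(f)                              # expected list of V strings

    # Row i of the table, vocab[i], and delta[i] should describe the same entry.
    if not (emb.shape[0] == len(vocab) == delta.shape[0]):
        raise ValueError("embedding rows, vocab, and scalar field are misaligned")
    return emb, vocab, delta


def is_unit_normalized(emb, tol=1e-3):
    """Return True if every row has an L2 norm within tol of 1."""
    norms = np.linalg.norm(emb, axis=1)
    return bool(np.all(np.abs(norms - 1.0) <= tol))
```

Calling `load_field("data")` from the repo root should return the table, vocab, and scalar field, assuming the layout listed under Files.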

Attribution

The base embedding vectors were generated using BGE (Apache-2.0) via inference (model outputs). This repository does not redistribute any original BGE model weights.


Quickstart

pip install numpy
python quickstart.py

Or minimal usage:

from engine import PipeOwlEngine, PipeOwlConfig

engine = PipeOwlEngine(PipeOwlConfig())
q = engine.encode("雪鴞好可愛")  # "Snowy owls are so cute"
# use q for similarity / retrieval
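One way to use the query vector for retrieval is a cosine-similarity lookup against the embedding table. The sketch below assumes that `engine.encode` returns a 1-D vector with the same dimension as the table rows, which this README does not state explicitly. The function `top_k` and the toy data are hypothetical and stand in for the real table so the example runs on its own.

```python
import numpy as np


def top_k(query, table, vocab, k=5):
    """Return the k vocab entries whose rows are most similar to query.

    Hypothetical helper; assumes the rows of `table` are unit-normalized.
    """
    query = np.asarray(query, dtype=np.float32)
    query = query / np.linalg.norm(query)   # normalize in case the query is not unit length
    scores = table @ query                  # unit-length rows, so this is cosine similarity
    k = min(k, len(vocab))
    best = np.argsort(-scores)[:k]          # indices of the k highest scores, best first
    return [(vocab[i], float(scores[i])) for i in best]


# Toy stand-in for the real table so the sketch runs without the data files.
toy_vocab = ["owl", "snow", "cat"]
toy_table = np.eye(3, dtype=np.float32)
print(top_k(np.array([0.9, 0.1, 0.0]), toy_table, toy_vocab, k=2))
```

With the real data, `table` would be the array loaded from data/L1_base_embeddings.npy and `vocab` the list from data/L1_base_vocab.json.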

Files

  • data/L1_base_embeddings.npy : embedding table (float32, V×1024)
  • data/L1_base_vocab.json : vocab aligned with rows
  • data/delta_base_scalar.npy : scalar bias (float32, V)
  • engine.py : minimal runtime
  • quickstart.py : example script

Notes

No safetensors or pytorch_model.bin file is included because this model is distributed as a static NumPy embedding field.


Parameter Size

~165M embedding parameters (static matrix)
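Since the table has shape (V, 1024), the parameter count is simply V × 1024. Taking the approximate figure above at face value, that implies roughly 161,000 rows and, at 4 bytes per float32 value, about 660 MB on disk. The exact numbers can be read from the file itself. The sketch below uses NumPy's memory-mapped loading so the array is not read into RAM; `field_size` is a hypothetical helper name.

```python
import numpy as np


def field_size(path):
    """Return (parameter count, size in MiB) of a .npy array.

    Hypothetical helper; memory-maps the file instead of loading it fully.
    """
    arr = np.load(path, mmap_mode="r")   # data stays on disk until accessed
    return int(arr.size), arr.nbytes / 2**20
```

For example, `field_size("data/L1_base_embeddings.npy")` would report the values for the shipped table.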

Intended Use

  • Semantic similarity
  • Lightweight retrieval
  • Geometric experimentation

Limitations

  • No contextual modeling
  • No token interaction modeling
  • Domain performance varies

Stress Test Results (Hard Retrieval Setting)

  • corpus size = 1200
  • eval size = 200
  • ood ratio = 0.28

Model     in-domain MRR@10   OOD MRR@10
MiniLM    0.019              0.026
BGE       0.026              0.009
PipeOwl   0.013              0.023

Note: This test uses a harder corpus and adversarial-style queries, so absolute scores are low for all three models.
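For readers unfamiliar with the metric, MRR@10 averages the reciprocal rank of the first relevant result over all queries, counting a query as 0 when no relevant result appears in the top 10. The sketch below is a generic implementation of that definition, not the evaluation script used for the table. It assumes exactly one relevant document per query, and the name `mrr_at_k` is hypothetical.

```python
def mrr_at_k(ranked_ids, relevant_ids, k=10):
    """Mean reciprocal rank at k.

    ranked_ids: one ranked list of document ids per query, best first.
    relevant_ids: the single relevant document id for each query (an assumption).
    """
    total = 0.0
    for ranking, target in zip(ranked_ids, relevant_ids):
        for rank, doc_id in enumerate(ranking[:k], start=1):
            if doc_id == target:
                total += 1.0 / rank
                break   # only the first hit counts
    return total / len(ranked_ids)
```

A query whose relevant document is ranked first contributes 1, one ranked second contributes 0.5, and a miss contributes 0.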

See full experimental notes here: https://hackmd.io/@galaxy4552/BkpUEnTwbl


pipeowl/
│
├─ README.md
├─ LICENSE
│
├─ engine.py
├─ quickstart.py
│   
└─ data/
    ├─ L1_base_embeddings.npy
    ├─ delta_base_scalar.npy
    └─ L1_base_vocab.json