PipeOwl-1.2 (Geometric Embedding)

A transformer-free semantic retrieval engine.

PipeOwl performs deterministic vocabulary scoring over a static embedding field:

score = α⋅base + β⋅Δfield

where:

  • base = cosine similarity in embedding space
  • Δfield = static scalar field bias
  • α, β = tunable weights
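The formula above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code in engine.py: the function name `score_vocabulary`, the default values of `alpha` and `beta`, and the toy 2-dimensional table are all assumptions made for the example.

```python
import numpy as np

def score_vocabulary(query, table, delta, alpha=1.0, beta=0.1):
    """Return alpha * cosine(query, row) + beta * delta[row] for every row."""
    # Normalize both sides so that a dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    rows = table / np.linalg.norm(table, axis=1, keepdims=True)
    base = rows @ q                      # shape (V,)
    return alpha * base + beta * delta   # shape (V,)

# Toy data (hypothetical): three vocabulary entries in a 2-D space.
table = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
delta = np.array([0.0, 0.5, 0.0])
scores = score_vocabulary(np.array([1.0, 0.0]), table, delta)
```

The result has one score per vocabulary row, so the cost is a single matrix-vector product plus one vector addition.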

Features:

  • O(n) scoring, where n is the vocabulary size.
  • No attention.
  • No transformer weights.
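If the caller only needs the best few candidates, the ranking step can also stay linear in the vocabulary size. The sketch below assumes a precomputed score vector and uses `np.argpartition`; the helper name `top_k` is hypothetical, and the source does not state how PipeOwl itself selects candidates.

```python
import numpy as np

def top_k(scores, k):
    """Indices of the k highest scores, best first."""
    k = min(k, scores.shape[0])
    if k <= 0:
        return np.empty(0, dtype=np.intp)
    # argpartition selects the k largest entries without a full sort;
    # only those k entries are then sorted.
    candidates = np.argpartition(scores, -k)[-k:]
    return candidates[np.argsort(scores[candidates])[::-1]]

example_scores = np.array([0.2, 0.9, 0.1, 0.7, 0.4])
best = top_k(example_scores, 3)
```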

Patch Notes

1.1

  • fix OOV
  • symbolic fallback
  • English fallback
  • Japanese fallback
  • PipeOwlConfig improvement
  • Tokenizer: max_len cap
  • load_assets: contiguous + row-normalize
  • small benchmark
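The "contiguous + row-normalize" step in the 1.1 notes can be pictured roughly as follows. This is a sketch of the general technique only: the function name `prepare_table`, the float32 dtype, and the handling of all-zero rows are assumptions, not the behavior of the actual load_assets.

```python
import numpy as np

def prepare_table(raw):
    """Return a C-contiguous float32 table whose non-zero rows have unit length."""
    table = np.ascontiguousarray(raw, dtype=np.float32)
    norms = np.linalg.norm(table, axis=1, keepdims=True)
    # Leave all-zero rows untouched instead of dividing by zero.
    norms[norms == 0.0] = 1.0
    return np.ascontiguousarray(table / norms)
```

With unit-length rows, cosine similarity against a unit-length query reduces to a plain dot product, and a contiguous layout keeps the row scan cache-friendly.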

1.2

  • safetensors support

Architecture

  • Static embedding table (V × D)
  • Aligned vocabulary index
  • Optional scalar bias field
  • Linear scoring
  • Pluggable decoder stage
  • Targets CPU environments and low-latency systems (e.g. IME).
  • Single static field (~635 MB), no runtime model weights.
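To show how these pieces could fit together, here is a small sketch with an aligned vocabulary, an optional bias field, linear scoring, and a decoder passed in as a callable. Everything named here (`retrieve`, `greedy_decoder`, the parameter names, the toy vocabulary) is hypothetical and does not describe the real engine.py interface. The table rows and the query are assumed to be unit length already.

```python
import numpy as np

def greedy_decoder(ranked):
    """Hypothetical default decoder: keep the tokens in ranked order."""
    return [token for token, _ in ranked]

def retrieve(query, table, vocab, bias=None, alpha=1.0, beta=1.0,
             k=2, decoder=greedy_decoder):
    # Row i of the table belongs to vocab[i]; the two must stay aligned.
    assert table.shape[0] == len(vocab)
    scores = alpha * (table @ query)       # linear scoring over all rows
    if bias is not None:                   # the scalar bias field is optional
        scores = scores + beta * bias
    # A full sort is used here for brevity; it is not linear in V.
    order = np.argsort(scores)[::-1][:k]
    return decoder([(vocab[i], float(scores[i])) for i in order])

vocab = ["owl", "pipe", "field"]
table = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]])
query = np.array([1.0, 0.0])

plain = retrieve(query, table, vocab)
biased = retrieve(query, table, vocab, bias=np.array([0.0, 0.0, 0.5]))
```

Because the decoder is just a function over ranked (token, score) pairs, it can be swapped without touching the scoring code.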

Attribution

The base embedding vectors were generated using BGE-M3 (Apache-2.0) via inference. This repository does not redistribute any original BGE weights.

Quickstart

pip install numpy safetensors
python quickstart.py

See full experimental notes here:

https://hackmd.io/@galaxy4552/SJ5DatsuZx

Repository Structure

pipeowl1.2/
 ├ README.md
 ├ config.json
 ├ LICENSE
 ├ quickstart.py
 ├ pipeowl.safetensors
 ├ vocabulary.json
 └ engine.py

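PipeOwl is a geometric retrieval system built on a static semantic field.

Core formula:

score = α⋅base + β⋅Δfield

where:

  • base = embedding cosine similarity
  • Δfield = static field offset
  • α / β are tunable weights

It provides a lightweight O(n) semantic scoring method suited to low-latency environments such as input method editors.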

LICENSE

MIT
