Qwen3-Embedding-8B T5 Inverter

A T5-base model trained to invert Qwen3-Embedding-8B embeddings back into text. Given a 4096-dimensional embedding vector, the model autoregressively decodes an approximation of the text that produced it.

Built on the vec2text framework.

Architecture

The model consists of three components:

  1. Embedding transform — A learned MLP that projects the 4096-dim Qwen3 embedding into a sequence of 16 pseudo-tokens in T5's hidden space (768-dim), which are fed as encoder input.
  2. T5-base encoder — Processes the projected embedding sequence.
  3. T5-base decoder — Autoregressively generates the reconstructed text.

The Qwen3-Embedding-8B embedder itself is not included in this checkpoint — only the T5 encoder-decoder and the embedding transform are saved. At inference time, you need to produce Qwen3-Embedding-8B embeddings separately and pass them as input.
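The shape flow of the embedding transform can be sketched as follows. This is an illustrative NumPy stand-in with random weights, not the learned checkpoint; the real vec2text transform is a small MLP rather than a single linear layer:

```python
import numpy as np

EMB_DIM, NUM_TOKENS, D_MODEL = 4096, 16, 768

rng = np.random.default_rng(0)
# Illustrative random weights; the checkpoint learns these during training.
W = rng.standard_normal((EMB_DIM, NUM_TOKENS * D_MODEL)) * 0.01

def to_pseudo_tokens(embedding: np.ndarray) -> np.ndarray:
    """Project a [batch, 4096] embedding to [batch, 16, 768] encoder inputs."""
    flat = embedding @ W                      # [batch, 16 * 768]
    return flat.reshape(-1, NUM_TOKENS, D_MODEL)

emb = rng.standard_normal((1, EMB_DIM))
print(to_pseudo_tokens(emb).shape)  # (1, 16, 768)
```

The resulting 16 pseudo-token vectors play the role of an input sequence for the T5 encoder.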

Training Details

Parameter Value
Base model t5-base (220M params)
Embedder Qwen/Qwen3-Embedding-8B (frozen)
Embedding dim 4096
Num repeat tokens 16
Max sequence length 128 tokens
Optimizer AdamW (fused)
Learning rate 1e-4
LR schedule Constant with warmup (2500 steps)
Batch size 128
Training steps 230,500
Epochs ~29.5
Precision FP32
Freeze strategy None (all params trainable)
Training data ~1M English sentences with precomputed Qwen3-Embedding-8B embeddings

Final training loss: ~1.61
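The epoch count in the table follows directly from the other rows: 230,500 steps at batch size 128 over roughly 1M examples:

```python
steps, batch_size, dataset_size = 230_500, 128, 1_000_000

examples_seen = steps * batch_size       # 29,504,000 examples processed
epochs = examples_seen / dataset_size    # passes over the training set
print(round(epochs, 1))  # 29.5
```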

Usage

This model uses a custom InversionModel architecture from vec2text. To load and run inference:

# 1. Produce embeddings with Qwen3-Embedding-8B
from transformers import AutoModel, AutoTokenizer
import torch

qwen_tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-Embedding-8B")
qwen_model = AutoModel.from_pretrained("Qwen/Qwen3-Embedding-8B", torch_dtype=torch.float16).cuda()

text = "The quick brown fox jumps over the lazy dog."
inputs = qwen_tokenizer(text, return_tensors="pt", padding=True, truncation=True).to("cuda")
with torch.no_grad():
    # Mean-pool token states into one vector. Whatever pooling was used when
    # the training embeddings were precomputed must be reproduced here
    # (Qwen3-Embedding's own documented usage pools the final token instead).
    embedding = qwen_model(**inputs).last_hidden_state.mean(dim=1)  # [1, 4096]

# 2. Load the inverter and decode
# Requires the vec2text library: https://github.com/jxmorris12/vec2text
from vec2text.models.inversion import InversionModel

inverter = InversionModel.from_pretrained("kennethge123/qwen3-8b-t5-inverter").cuda().eval()

# Decode by passing the precomputed ("frozen") embedding to generate
with torch.no_grad():
    gen_ids = inverter.generate(
        inputs={"frozen_embeddings": embedding.float()},
        generation_kwargs={"max_length": 128, "do_sample": False},
    )
print(inverter.tokenizer.batch_decode(gen_ids, skip_special_tokens=True))

Intended Use

  • Embedding interpretability: Understanding what information is captured in Qwen3 embeddings.
  • Research: Studying the invertibility of text embedding models.
  • Debugging: Inspecting what a retrieval system "sees" for a given document embedding.

Limitations

  • Trained on English text only.
  • Max output length is 128 tokens.
  • This is the first-stage inverter (hypothesis generator). For higher-quality reconstruction, it can be paired with a corrector model in an iterative refinement loop (see vec2text).
  • Reconstruction quality degrades on out-of-distribution text (e.g., code, heavily formatted text).
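The hypothesis-then-correct loop mentioned above works roughly as follows. This is a schematic sketch with toy numeric stand-ins for the real inverter and corrector models, not the actual vec2text API:

```python
import numpy as np

rng = np.random.default_rng(1)
target = rng.standard_normal(8)  # embedding of the unknown text (toy dimension)

def hypothesize(emb):
    # Stand-in for the first-stage inverter: produces a noisy initial guess.
    return emb + rng.standard_normal(emb.shape) * 0.5

def correct(hypothesis, target_emb):
    # Stand-in for the corrector: nudges the guess toward the target embedding.
    return hypothesis + 0.5 * (target_emb - hypothesis)

guess = hypothesize(target)
err0 = np.linalg.norm(guess - target)   # residual after the first stage
for _ in range(5):                      # iterative refinement
    guess = correct(guess, target)
err = np.linalg.norm(guess - target)

print(err < err0)  # True: each correction step shrinks the residual
```

In the real system the corrector conditions on the hypothesis text, its re-embedding, and the target embedding, but the shape of the loop is the same: re-embed, compare, refine.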

Citation

This model builds on the vec2text framework:

@article{morris2023text,
  title={Text Embeddings Reveal (Almost) As Much As Text},
  author={Morris, John X and Kuleshov, Volodymyr and Shmatikov, Vitaly and Rush, Alexander M},
  journal={arXiv preprint arXiv:2310.06816},
  year={2023}
}