
LatentLens: Qwen2-VL Contextual Text Embeddings

Pre-computed contextual text embeddings from the Qwen2-VL-7B-Instruct LLM backbone, extracted at 8 transformer layers. Used by the LatentLens quickstart for interpreting visual token representations.

What is this?

LatentLens interprets continuous token representations (e.g., visual tokens in a VLM) by finding their nearest neighbors in contextual text embedding space, the same space the LLM uses internally. The embeddings in this repository are that text embedding bank.

Each layer directory contains an embeddings_cache.pt file with:

  • embeddings: [300836, 3584] float16 tensor of contextual embeddings covering ~26K unique text tokens, each with up to 20 contextual variants drawn from Visual Genome captions
  • token_to_indices: dict mapping token string → list of embedding row indices
  • metadata: list of dicts with token string, token ID, source caption, and position
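The nearest-neighbor lookup LatentLens performs can be sketched against this structure. The snippet below is a minimal, self-contained sketch: the toy cache and its field names (e.g., the "token" key inside metadata) mirror the description above but are illustrative, not the exact on-disk schema.

```python
import torch

torch.manual_seed(0)

# Toy stand-in for one layer's embeddings_cache.pt (the real bank is [300836, 3584]).
cache = {
    "embeddings": torch.randn(6, 8, dtype=torch.float16),
    "token_to_indices": {"cat": [0, 1], "dog": [2, 3], "sky": [4, 5]},
    "metadata": [{"token": t} for t in ["cat", "cat", "dog", "dog", "sky", "sky"]],
}

def nearest_tokens(query: torch.Tensor, cache: dict, k: int = 3):
    """Return the k text tokens whose contextual embeddings are most
    cosine-similar to a continuous query vector (e.g., a visual token)."""
    bank = cache["embeddings"].float()
    bank = bank / bank.norm(dim=-1, keepdim=True)   # row-normalize the bank
    q = query / query.norm()                        # normalize the query
    sims = bank @ q                                 # cosine similarity per row
    scores, rows = sims.topk(k)
    return [(cache["metadata"][r]["token"], s.item())
            for r, s in zip(rows.tolist(), scores)]

# A query pointing exactly along row 0 should retrieve "cat" first.
query = cache["embeddings"][0].float()
print(nearest_tokens(query, cache))
```

Because each token has multiple contextual variants, several of the top-k rows can belong to the same token string; aggregating hits per token (e.g., max or mean similarity over its rows) is a natural next step.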

Layers

Layer  Stage       Size
1      Very early  ~2.1 GB
2      Early       ~2.1 GB
4      Early-mid   ~2.1 GB
8      Middle      ~2.1 GB
16     Mid-late    ~2.1 GB
24     Late        ~2.1 GB
26     Near-final  ~2.1 GB
27     Final       ~2.1 GB

Usage

from huggingface_hub import hf_hub_download
import torch

path = hf_hub_download(
    repo_id="McGill-NLP/latentlens-qwen2vl-embeddings",
    filename="layer_27/embeddings_cache.pt",
)
cache = torch.load(path, map_location="cpu", weights_only=False)  # cache holds dicts/lists alongside tensors
embeddings = cache["embeddings"].float()  # [300836, 3584]
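The index structures in the cache let you pull every contextual variant of a given token along with its provenance. A self-contained sketch with toy data mirroring the documented keys (the specific token IDs, captions, and field names here are illustrative assumptions, not the real cache contents):

```python
import torch

# Toy cache mirroring the documented structure of embeddings_cache.pt.
cache = {
    "embeddings": torch.randn(4, 8, dtype=torch.float16),
    "token_to_indices": {"tree": [0, 2], "road": [1, 3]},
    "metadata": [
        {"token": "tree", "token_id": 5440, "caption": "a tree by the road", "position": 1},
        {"token": "road", "token_id": 5917, "caption": "a tree by the road", "position": 4},
        {"token": "tree", "token_id": 5440, "caption": "tall tree in a park", "position": 1},
        {"token": "road", "token_id": 5917, "caption": "empty road at dawn", "position": 1},
    ],
}

# Gather all contextual variants of "tree" and the caption each came from.
rows = cache["token_to_indices"]["tree"]
variants = cache["embeddings"][rows].float()   # [2, 8] here; [n_variants, 3584] in the real bank
captions = [cache["metadata"][r]["caption"] for r in rows]
print(variants.shape, captions)
```

Averaging a token's variants gives a single context-free direction, while keeping them separate preserves the sense distinctions the contextual bank exists to capture.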

Or use the full quickstart script:

pip install latentlens
python examples/quickstart.py --image your_image.jpg

Citation

@article{krojer2026latentlens,
  title={LatentLens: Revealing Highly Interpretable Visual Tokens in LLMs},
  author={Krojer, Benno and Nayak, Shravan and Ma{\~n}as, Oscar and Adlakha, Vaibhav and Elliott, Desmond and Reddy, Siva and Mosbach, Marius},
  journal={arXiv preprint arXiv:2506.XXXXX},
  year={2026}
}

License

Apache License 2.0
