Error in sample code
I am trying to run the model using the sample code in the model description page, and I received this error:
RuntimeError: Expected attn_mask dtype to be bool or float or to match query dtype, but got attn_mask.dtype: struct c10::BFloat16 and query.dtype: float instead.
- The error is raised by this part of the code, after the model has loaded successfully:
# Forward pass
with torch.no_grad():
    image_embeddings = model(**batch_images)  # <- the error is raised on this line
    query_embeddings = model(**batch_queries)
- I am loading the model as shown on the model page:
import torch
from transformers import ColPaliForRetrieval, ColPaliProcessor

model_name = "vidore/colpali-v1.3-hf"
model = ColPaliForRetrieval.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="cuda:0",  # "cuda:0" for an NVIDIA GPU, or "mps" on Apple Silicon
).eval()
processor = ColPaliProcessor.from_pretrained(model_name)
- I have an RTX 5090 and PyTorch 2.7.0 built for CUDA 12.8:
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Fri_Feb_21_20:42:46_Pacific_Standard_Time_2025
Cuda compilation tools, release 12.8, V12.8.93
Build cuda_12.8.r12.8/compiler.35583870_0
pip show torch
Name: torch
Version: 2.7.0+cu128
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3-Clause
Location: E:\Python312\Lib\site-packages
Requires: filelock, fsspec, jinja2, networkx, setuptools, sympy, typing-extensions
Required-by: accelerate, compressed-tensors, outlines, sentence-transformers, torchaudio, torchvision
Hey @martineden, this seems to be an issue with recent versions of transformers: the script runs fine with transformers==4.53.3 but not with 4.54 or later.
I'll investigate, but while waiting for the fix, the simplest workaround is to downgrade the transformers version in your environment.
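In case it helps anyone landing here, below is a minimal sketch for checking whether an environment falls in the affected range. It only uses the standard library. The helper names `parse_release` and `is_affected` are made up for illustration, and the 4.53.3 boundary is taken from the report above rather than from a confirmed root cause, so treat the result as a hint and not a diagnosis.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Last transformers release reported as working in this thread (assumption:
# everything newer is affected until a fix is released).
LAST_KNOWN_GOOD = (4, 53, 3)


def parse_release(version_string: str) -> tuple[int, ...]:
    """Return the leading numeric components, e.g. '4.54.0.dev0' -> (4, 54, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version_string!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_affected(version_string: str) -> bool:
    """True if the version is newer than the last release reported as working."""
    return parse_release(version_string) > LAST_KNOWN_GOOD


try:
    installed = version("transformers")
except PackageNotFoundError:
    installed = None

if installed is None:
    print("transformers is not installed in this environment.")
elif is_affected(installed):
    print(f"transformers {installed} is in the range reported as failing.")
    print("Possible workaround: pip install transformers==4.53.3")
else:
    print(f"transformers {installed} is not in the range reported as failing.")
```

If the check reports an affected version, pinning with `pip install transformers==4.53.3` is the workaround suggested above. The pin can be removed once a fixed release is available.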