Inference
#1 by mkuzmanov - opened
Hi,
I noticed there is a typo in your inference example:
We recommend enabling flash_attention_2 for better acceleration and memory saving, especially in multi-image and video scenarios.
```python
model = Qwen3VLForConditionalGeneration.from_pretrained(
    "Qwen/FireRed-OCR-2B ---> missing closing ",
    dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
```
Also, shouldn't the model id be "FireRedTeam/FireRed-OCR" rather than "Qwen/FireRed-OCR-2B"?
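For reference, here is a sketch of what the corrected arguments might look like, assuming the repo id is indeed "FireRedTeam/FireRed-OCR" as suggested above and that the `Qwen3VLForConditionalGeneration` class from the original example is the right one; the actual load call is left commented out since it would download the model:

```python
import torch

# Hypothetical corrected repo id (note the restored closing quote).
model_id = "FireRedTeam/FireRed-OCR"

# Same keyword arguments as the original example.
load_kwargs = dict(
    dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)

# from transformers import Qwen3VLForConditionalGeneration
# model = Qwen3VLForConditionalGeneration.from_pretrained(model_id, **load_kwargs)
```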
Best,
Mario