Inference
#1 by mkuzmanov - opened
Hi,
I noticed there is a typo in your inference example:
We recommend enabling flash_attention_2 for better acceleration and memory saving, especially in multi-image and video scenarios.
```python
model = Qwen3VLForConditionalGeneration.from_pretrained(
    "Qwen/FireRed-OCR-2B ---> missing closing ",
    dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
```
Also, shouldn't the model id be "FireRedTeam/FireRed-OCR" rather than "Qwen/FireRed-OCR-2B"?
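For reference, here is a sketch of what the corrected arguments might look like, assuming the repo id is indeed "FireRedTeam/FireRed-OCR" as suggested above and that the `Qwen3VLForConditionalGeneration` class from the original example is the right one; the actual load call is left commented out since it would download the model:

```python
import torch

# Hypothetical corrected repo id (note the restored closing quote).
model_id = "FireRedTeam/FireRed-OCR"

# Same keyword arguments as the original example.
load_kwargs = dict(
    dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)

# from transformers import Qwen3VLForConditionalGeneration
# model = Qwen3VLForConditionalGeneration.from_pretrained(model_id, **load_kwargs)
```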
Best,
Mario