Inference

#1
by mkuzmanov - opened

Hi,
I noticed a typo in your inference example:

We recommend enabling flash_attention_2 for better acceleration and memory saving, especially in multi-image and video scenarios.

model = Qwen3VLForConditionalGeneration.from_pretrained(
    "Qwen/FireRed-OCR-2B  ---> missing closing ",
    dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
Also, shouldn't the model id be "FireRedTeam/FireRed-OCR" rather than "Qwen/FireRed-OCR-2B"?
Best,
Mario
