Running Qwen3-VL-2B-Instruct on real security camera feeds β€” impressive results at IQ2 quantization

#10
by SharpAI - opened

Sharing some real-world results from running this model on live security camera footage via SharpAI Aegis + llama-server.

Setup: UD-IQ2_M quantization (0.7 GB) + mmproj-F16 (781 MB) on MacBook Air M3 24GB.

Input: A Blink battery camera mount at front door.

Output: "A mailman is delivering mail to a suburban house. The mailman is wearing a blue uniform and carrying a white mail bag. The house is white with a brown roof, and there's a driveway with a black car parked in front. The mailman is walking on a brick path surrounded by green bushes and trees."

For a 2B model at aggressive quantization, the scene comprehension is remarkably detailed β€” it correctly identifies the person's role (mailman), clothing, objects, the environment, and spatial relationships.

This is being used in a real product for continuous security camera analysis. The model runs comfortably on a Mac Mini with 8 GB RAM alongside other system tasks.

Great work on the GGUF conversion β€” the Unsloth chat template fixes are appreciated!

App: https://www.sharpai.org (free, Mac/Windows/Linux)

How does Qwen3-VL-2B-Instruct handle real-time video processing latency at IQ2 quantization? Are there any notable differences in tool call reliability between warm and cold cache states when dealing with continuous security feed inputs?

Sign up or log in to comment