Hyogun Lee

Haawron

AI & ML interests

Video understanding, multi-modal LLMs

Recent Activity

upvoted a paper 2 days ago

Grounding World Simulation Models in a Real-World Metropolis

authored a paper 10 months ago

Flashback: Memory-Driven Zero-shot, Real-time Video Anomaly Detection

upvoted a paper 10 months ago

Flashback: Memory-Driven Zero-shot, Real-time Video Anomaly Detection

Organizations

None yet

upvoted a paper 2 days ago

Grounding World Simulation Models in a Real-World Metropolis

Paper • 2603.15583 • Published 3 days ago • 125

authored a paper 10 months ago

Flashback: Memory-Driven Zero-shot, Real-time Video Anomaly Detection

Paper • 2505.15205 • Published May 21, 2025 • 2

upvoted a paper 10 months ago

Flashback: Memory-Driven Zero-shot, Real-time Video Anomaly Detection

Paper • 2505.15205 • Published May 21, 2025 • 2

upvoted a paper 11 months ago

COMPACT: COMPositional Atomic-to-Complex Visual Capability Tuning

Paper • 2504.21850 • Published Apr 30, 2025 • 27

liked a model 11 months ago

facebook/PE-Core-L14-336

Zero-Shot Image Classification • Updated Apr 30, 2025 • 366k • 50

upvoted a collection 11 months ago

InternVideo2

Collection • 21 items • Updated Sep 28, 2025 • 26

upvoted a paper about 1 year ago

START: Self-taught Reasoner with Tools

Paper • 2503.04625 • Published Mar 6, 2025 • 113

liked 2 models about 1 year ago

google/gemma-3-27b-it

Image-Text-to-Text • 27B • Updated Mar 21, 2025 • 1.2M • 1.92k

lmms-lab/llava-onevision-qwen2-7b-ov-chat

Text Generation • 8B • Updated Oct 23, 2024 • 1.04k • 23

upvoted a paper about 1 year ago

MM-RLHF: The Next Step Forward in Multimodal LLM Alignment

Paper • 2502.10391 • Published Feb 14, 2025 • 34

upvoted 4 papers over 1 year ago

Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion

Paper • 2412.04424 • Published Dec 5, 2024 • 62

NVILA: Efficient Frontier Visual Language Models

Paper • 2412.04468 • Published Dec 5, 2024 • 60

Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding

Paper • 2412.00493 • Published Nov 30, 2024 • 17

Byte Latent Transformer: Patches Scale Better Than Tokens

Paper • 2412.09871 • Published Dec 13, 2024 • 108

commented a paper over 1 year ago

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published Dec 13, 2024 • 147

upvoted 5 papers over 1 year ago
