Video Understanding - a Zmu Collection

Video Understanding

MM-VID: Advancing Video Understanding with GPT-4V(ision)

Paper • 2310.19773 • Published Oct 30, 2023
Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models

Paper • 2310.05863 • Published Oct 9, 2023
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

Paper • 2311.06242 • Published Nov 10, 2023
I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization

Paper • 2311.10126 • Published Nov 16, 2023
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

Paper • 2311.10122 • Published Nov 16, 2023
Retrieval-Enhanced Contrastive Vision-Text Models

Paper • 2306.07196 • Published Jun 12, 2023
Text-Conditioned Resampler For Long Form Video Understanding

Paper • 2312.11897 • Published Dec 19, 2023
Vamos: Versatile Action Models for Video Understanding

Paper • 2311.13627 • Published Nov 22, 2023
kiyoonkim/kinetics-400-targz

Dataset • Updated May 8, 2023
K-frames: Scene-Driven Any-k Keyframe Selection for long video understanding

Paper • 2510.13891 • Published Oct 14, 2025