Video Understanding
MM-VID: Advancing Video Understanding with GPT-4V(ision)
Paper
• 2310.19773
• Published • 20
Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models
Paper
• 2310.05863
• Published • 2
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
Paper
• 2311.06242
• Published • 96
I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization
Paper
• 2311.10126
• Published • 9
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Paper
• 2311.10122
• Published • 28
Retrieval-Enhanced Contrastive Vision-Text Models
Paper
• 2306.07196
• Published • 8
Text-Conditioned Resampler For Long Form Video Understanding
Paper
• 2312.11897
• Published • 6
Vamos: Versatile Action Models for Video Understanding
Paper
• 2311.13627
• Published • 2
kiyoonkim/kinetics-400-targz
Dataset
• Updated • 306k • 1.18k • 2
K-frames: Scene-Driven Any-k Keyframe Selection for long video understanding
Paper
• 2510.13891
• Published