VLFM
Kosmos-2.5: A Multimodal Literate Model
Paper
• 2309.11419
• Published • 56
Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities
Paper
• 2311.05698
• Published • 11
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
Paper
• 2311.06242
• Published • 95
PolyMaX: General Dense Prediction with Mask Transformer
Paper
• 2311.05770
• Published • 8
Learning Vision from Models Rivals Learning Vision from Data
Paper
• 2312.17742
• Published • 16
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities
Paper
• 2401.12168
• Published • 29
A Survey of Resource-efficient LLM and Multimodal Foundation Models
Paper
• 2401.08092
• Published • 3
From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities
Paper
• 2401.15071
• Published • 37
Overcoming the Pitfalls of Vision-Language Model Finetuning for OOD Generalization
Paper
• 2401.15914
• Published • 7
MouSi: Poly-Visual-Expert Vision-Language Models
Paper
• 2401.17221
• Published • 9
StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis
Paper
• 2401.17093
• Published • 20
DataComp: In search of the next generation of multimodal datasets
Paper
• 2304.14108
• Published • 2
Question Aware Vision Transformer for Multimodal Reasoning
Paper
• 2402.05472
• Published • 10
Paper
• 2309.16671
• Published • 20
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
Paper
• 2402.12226
• Published • 45
CLoVe: Encoding Compositional Language in Contrastive Vision-Language Models
Paper
• 2402.15021
• Published • 12
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Paper
• 2403.05525
• Published • 49
InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding
Paper
• 2403.01487
• Published • 16
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
Paper
• 2404.07973
• Published • 32
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models
Paper
• 2404.13013
• Published • 31
An Introduction to Vision-Language Modeling
Paper
• 2405.17247
• Published • 90
Dense Connector for MLLMs
Paper
• 2405.13800
• Published • 24