VoladorLuYu's Collections: Generative Multiple Modality
Random Field Augmentations for Self-Supervised Representation Learning
Paper
• 2311.03629
• Published
• 9
TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models
Paper
• 2311.04589
• Published
• 21
GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs
Paper
• 2311.04901
• Published
• 9
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models
Paper
• 2311.06783
• Published
• 28
Trusted Source Alignment in Large Language Models
Paper
• 2311.06697
• Published
• 12
SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models
Paper
• 2311.07575
• Published
• 15
MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning
Paper
• 2309.07915
• Published
• 4
LMDX: Language Model-based Document Information Extraction and Localization
Paper
• 2309.10952
• Published
• 67
Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration
Paper
• 2309.01131
• Published
• 1
Multimodal Graph Learning for Generative Tasks
Paper
• 2310.07478
• Published
• 1
Language-Informed Visual Concept Learning
Paper
• 2312.03587
• Published
• 8
OneLLM: One Framework to Align All Modalities with Language
Paper
• 2312.03700
• Published
• 24
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs
Paper
• 2401.11708
• Published
• 30
MM-LLMs: Recent Advances in MultiModal Large Language Models
Paper
• 2401.13601
• Published
• 48
Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers
Paper
• 2401.11605
• Published
• 23
Scalable Diffusion Models with Transformers
Paper
• 2212.09748
• Published
• 18
λ-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space
Paper
• 2402.05195
• Published
• 19
LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video Editing
Paper
• 2402.10294
• Published
• 27
Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition
Paper
• 2402.15504
• Published
• 21
Robust Gaussian Splatting
Paper
• 2404.04211
• Published
• 9
Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models
Paper
• 2404.04478
• Published
• 13
Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies
Paper
• 2404.08197
• Published
• 29
Factorized Diffusion: Perceptual Illusions by Noise Decomposition
Paper
• 2404.11615
• Published
• 2
Dynamic Typography: Bringing Words to Life
Paper
• 2404.11614
• Published
• 46
MultiBooth: Towards Generating All Your Concepts in an Image from Text
Paper
• 2404.14239
• Published
• 9
Adding Conditional Control to Text-to-Image Diffusion Models
Paper
• 2302.05543
• Published
• 58
Probabilistic Conceptual Explainers: Trustworthy Conceptual Explanations for Vision Foundation Models
Paper
• 2406.12649
• Published
• 16