multimodal
DreamLLM: Synergistic Multimodal Comprehension and Creation
Paper
• 2309.11499
• Published • 60
FoleyGen: Visually-Guided Audio Generation
Paper
• 2309.10537
• Published • 8
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V
Paper
• 2310.11441
• Published • 29
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
Paper
• 2311.10093
• Published • 58
Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2
Paper
• 2311.10702
• Published • 19
AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort
Paper
• 2311.11243
• Published • 16
Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression
Paper
• 2311.10794
• Published • 27
Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models
Paper
• 2311.12092
• Published • 22
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs
Paper
• 2311.13600
• Published • 47
Orthogonal Adaptation for Modular Customization of Diffusion Models
Paper
• 2312.02432
• Published • 14
FaceStudio: Put Your Face Everywhere in Seconds
Paper
• 2312.02663
• Published • 32
Fine-grained Controllable Video Generation via Object Appearance and Context
Paper
• 2312.02919
• Published • 13
Generating Illustrated Instructions
Paper
• 2312.04552
• Published • 9
PALP: Prompt Aligned Personalization of Text-to-Image Models
Paper
• 2401.06105
• Published • 49