Mark Redito
markredito
AI & ML interests
Generative AI, Multimodal AI, Deep Learning
Audio
- Retrieval-Augmented Text-to-Audio Generation
  Paper • 2309.08051 • Published • 7
- A Large-scale Dataset for Audio-Language Representation Learning
  Paper • 2309.11500 • Published • 9
- End-to-End Speech Recognition Contextualization with Large Language Models
  Paper • 2309.10917 • Published • 9
- FoleyGen: Visually-Guided Audio Generation
  Paper • 2309.10537 • Published • 8
Multimodal
- Compositional Foundation Models for Hierarchical Planning
  Paper • 2309.08587 • Published • 11
- DreamLLM: Synergistic Multimodal Comprehension and Creation
  Paper • 2309.11499 • Published • 60
- VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning
  Paper • 2309.15091 • Published • 35
- Context-Aware Meta-Learning
  Paper • 2310.10971 • Published • 17
experiments
3D
LLMs
- Agents: An Open-source Framework for Autonomous Language Agents
  Paper • 2309.07870 • Published • 43
- Clinical Text Summarization: Adapting Large Language Models Can Outperform Human Experts
  Paper • 2309.07430 • Published • 28
- Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers
  Paper • 2309.08532 • Published • 54
- Investigating Answerability of LLMs for Long-Form Question Answering
  Paper • 2309.08210 • Published • 15
Interpretability
Music Generation
robotics
Image Generation
- InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation
  Paper • 2309.06380 • Published • 33
- PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models
  Paper • 2309.05793 • Published • 51
- Generative Image Dynamics
  Paper • 2309.07906 • Published • 55
- Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
  Paper • 2309.15818 • Published • 19