Generate realistic talking heads from an image and audio
Co-Speech Gesture Video Generation (ICLR 2025 Oral)