Generate spoken audio from text with custom or cloned voices
Remove the background from an image and get a transparent PNG