How to customize emotion in a cloned voice?
Hello everyone, I have been using Qwen TTS for my app, and it's been good.
I generated voices in different styles (American, British, and so on) using the VoiceDesign model, and then passed the results to the Base model to clone them and produce the output. Now, just as the CustomVoice model can be steered with instructions for emotion (angry, sad, etc.), I want to do the same for the cloned voices. However, even though I passed the "instruction" parameter to the Base model, it has no effect: the model only clones the voice as it is.
I want to give users sliders for things like emotion, pacing, and strength, so that they can adjust the voices I created to suit their needs.
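As a rough illustration of the slider idea, slider values could be turned into an instruction string along these lines. This is only a sketch: the function name `build_instruction`, the 0.0 to 1.0 slider ranges, the thresholds, and the wording of the sentence are all hypothetical and not part of Qwen TTS, and the result only helps with a model that actually follows instructions.

```python
def build_instruction(emotion: str, intensity: float, pace: float) -> str:
    """Turn UI slider values into a natural-language instruction string.

    emotion:   label picked by the user, e.g. "angry" or "sad"
    intensity: 0.0 (barely noticeable) to 1.0 (very strong)
    pace:      0.0 (slow) to 1.0 (fast)

    All names, ranges, and thresholds here are assumptions for illustration.
    """
    # Clamp both sliders into the expected 0.0..1.0 range.
    intensity = min(max(intensity, 0.0), 1.0)
    pace = min(max(pace, 0.0), 1.0)

    # Bucket the continuous intensity value into three adverbs.
    if intensity < 0.34:
        strength = "slightly"
    elif intensity < 0.67:
        strength = "moderately"
    else:
        strength = "very"

    # Bucket the continuous pace value into three speeds.
    if pace < 0.34:
        speed = "slow"
    elif pace < 0.67:
        speed = "normal"
    else:
        speed = "fast"

    return f"Speak in a {strength} {emotion} tone at a {speed} pace."


if __name__ == "__main__":
    print(build_instruction("angry", 0.9, 0.1))
```

The returned string would then be passed wherever the model accepts its instruction text.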
Any guidance on this would be greatly appreciated.
As per the information in the repo, this is only possible by fine-tuning the Base model on a dataset of the voice.
I have tried it myself, but the results are mediocre at best, though that also depends on how large a dataset you have.
https://www.youtube.com/watch?v=PMzO7N8sIHY&t=1s is a great video that teaches fine-tuning Qwen3-TTS on an existing dataset.
If you don't have a large dataset, you can try cloning the voice, generating as many samples as possible, removing the samples that deviate too much from the benchmark, and fine-tuning on the rest to get a custom voice that you can instruct.
I am trying the above and will let you know if I see a big difference. I am very unsure about this approach, though, mainly because we are feeding AI-generated data back into the AI and hoping for the best results. Then again, that is in a way what DeepSeek did initially (SFT with rejection sampling), and that changed the industry forever.
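The filtering step in the generate, filter, fine-tune loop described above could look roughly like this. It is a minimal sketch, not something from the Qwen3-TTS repo: it assumes you have already computed a speaker embedding (a list of floats) for the benchmark clip and for each generated sample with whatever embedding model you prefer. The function names and the 0.8 threshold are hypothetical and would need tuning.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def filter_samples(reference, candidates, threshold=0.8):
    """Keep only the generated samples that stay close to the benchmark.

    reference:  embedding of the benchmark clip (assumed precomputed)
    candidates: dict mapping sample name -> embedding (assumed precomputed)
    threshold:  minimum similarity to keep a sample (hypothetical value)
    """
    kept = []
    for name, embedding in candidates.items():
        if cosine_similarity(reference, embedding) >= threshold:
            kept.append(name)
    return kept


if __name__ == "__main__":
    # Toy 2-D embeddings, only to show the call shape.
    reference = [1.0, 0.0]
    candidates = {
        "take_01.wav": [1.0, 0.0],
        "take_02.wav": [0.0, 1.0],
        "take_03.wav": [1.0, 0.1],
    }
    print(filter_samples(reference, candidates))
```

Only the samples that pass the filter would go into the fine-tuning dataset. A stricter threshold keeps the voice more consistent but leaves fewer samples to train on.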
Thank you for the comment,
The voices in the CustomVoice model are all Chinese except for one English voice. Is there any way to add more voices to it using fine-tuning?
I think all the voices are multilingual. That is, they can speak English or any of the other 9 languages. Some of them do come with an accent, though, which is an interesting and surprising touch.