Few-shots Voice Cloning in Noisy Acoustic Conditions with Domain Adversarial Training
Authors: Jian Cong, Shan Yang, Lei Xie
Abstract: Data-efficient voice cloning can synthesize speech in the voice of a target speaker from only a few seconds of clean reference audio. In real-world scenarios, however, users often record in public places, so the recordings are usually accompanied by background noise. We propose using a domain adversarial training algorithm for robust voice cloning from few samples. Results indicate that, whether the reference audio is clean or noisy, the proposed method achieves similar performance in terms of naturalness and similarity. It also outperforms pre-denoising the reference audio with an external speech enhancement module.
1. Examples of clean audio, noisy audio, and denoised audio: