Abstract
At present, specific voice control has gradually become an important means for 5G-IoT-aided industrial control systems. However, the security of specific voice control system needs to be improved, because the voice cloning technology may lead to industrial accidents and other potential security risks. In this paper, we propose a transductive voice transfer learning method to learn the predictive function from the source domain and fine-tune in the target domain adaptively. The target learning task and source learning task are both synthesizing speech signals from the given audio while the data sets of both domains are different. By adding different penalty values to each instances and minimizing the expected risk, an optimal precise model can be learned. Many details of the experimental results show that our method can effectively synthesize the speech of the target speaker with small samples.