Chinese Journal of Acoustics, Institute of Acoustics, Chinese Academy of Sciences

Title: Fusion of deep shallow features and models for speaker recognition

Author(s): ZHONG Weifeng; FANG Xiang; FAN Cunhang; WEN Zhengqi; TAO Jianhua

Affiliation(s): School of Automation, Harbin University of Science and Technology; National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; et al.

Abstract: To further improve speaker recognition performance, feature fusion and model fusion methods are proposed. Feature fusion combines deep and shallow features. Because features at different levels are complementary, the fused feature describes speaker characteristics more comprehensively than any single feature. Model fusion combines i-vectors extracted from different speaker recognition systems, so that the fused model draws on the strengths of each system. Experimental results demonstrate the effectiveness of the proposed methods: compared with the state-of-the-art system on the CASIA North and South dialect corpus, the proposed feature fusion and model fusion systems achieved relative improvements in equal error rate (EER) of about 54.8% and 69.5%, respectively.
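The abstract does not say how the features or i-vectors are combined, so the following is only a minimal Python sketch under stated assumptions. It assumes fusion means length-normalizing each vector and concatenating the results, a common baseline that is not necessarily the authors' rule. It also assumes trials are scored with cosine similarity and that EER is approximated by sweeping the threshold over the observed scores. All function names are hypothetical. The last helper shows how a relative EER improvement, like the 54.8% and 69.5% figures, is computed.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale v to unit Euclidean length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse_vectors(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Hypothetical fusion rule (an assumption, not the paper's method):
    length-normalize each input, then concatenate.

    `a` and `b` could be a deep and a shallow feature vector, or i-vectors
    produced by two different speaker recognition systems.
    """
    return l2_normalize(a) + l2_normalize(b)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity, assumed here as the scoring back-end."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def equal_error_rate(target: Sequence[float], nontarget: Sequence[float]) -> float:
    """Approximate EER: use every observed score as a threshold and return the
    mean of FAR and FRR where the two rates are closest."""
    best_gap, eer = math.inf, 1.0
    for t in sorted(set(target) | set(nontarget)):
        frr = sum(s < t for s in target) / len(target)         # genuine trials rejected
        far = sum(s >= t for s in nontarget) / len(nontarget)  # impostor trials accepted
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


def relative_improvement(baseline_eer: float, new_eer: float) -> float:
    """Relative EER reduction: (baseline - new) / baseline."""
    return (baseline_eer - new_eer) / baseline_eer


if __name__ == "__main__":
    # Toy numbers for illustration only; they are not results from the paper.
    print(fuse_vectors([3.0, 4.0], [0.0, 2.0]))  # [0.6, 0.8, 0.0, 1.0]
    print(equal_error_rate([0.9, 0.8, 0.7, 0.3], [0.6, 0.4, 0.2, 0.1]))  # 0.25
    print(relative_improvement(0.10, 0.05))  # 0.5
```

One consequence of this particular choice: each half of the fused vector has unit length, so the cosine score between two fused vectors equals the average of the two per-system cosine scores. Under these assumptions, this form of model fusion behaves like equal-weight score averaging. A learned weighting or a trained back-end would be needed to favor the stronger system.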