Application of Span Prosodic Features on Speaker Recognition


At present, classical speaker recognition systems are generally built on low-level spectral parameters such as MFCC and PLP. Apart from these spectral parameters, however, speech also contains high-level characteristic parameters that reflect the state of the speaker.

In recent years, some researchers have introduced high-level characteristic parameters such as rhythm, vocabulary, and phonemes into speaker recognition. First, these parameters are comparatively stable in varying acoustic environments and are not easily affected by the signal channel or noise. Second, these time-frequency characteristics can reflect the personal traits of the speaker.

Because these characteristic parameters are complementary, researchers at the Institute of Acoustics, Chinese Academy of Sciences, proposed a novel speaker verification method based on long-span prosodic features, which can be combined with the low-level parameters to improve the performance of the recognition system.

They pre-process the speech with a voice activity detection module and extract basic prosodic features for each speech unit. They then approximate the pitch, formant, time-domain energy, and harmonic energy contours by taking the leading terms of a Legendre polynomial expansion. Heteroscedastic linear discriminant analysis (HLDA) is used to reduce the feature dimension, and a supervector of the means of the individual Gaussians is used to represent the distribution of the time-frequency features. Experiments on NIST06 show that fusing the proposed method with a traditional MFCC-based system reduces the equal error rate (EER) from 4.9% to 4.6%.
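
The contour approximation step can be illustrated with a minimal Python sketch. It is not the researchers' implementation: the function name legendre_contour_features, the polynomial order, and the synthetic pitch contour are assumptions made for illustration only. The sketch fits a contour with a least-squares Legendre series using NumPy and keeps the leading coefficients as a fixed-length feature vector, regardless of how many frames the speech unit contains.

```python
import numpy as np
from numpy.polynomial import legendre


def legendre_contour_features(contour, order=5):
    """Return the leading Legendre coefficients of a 1-D contour.

    Hypothetical helper: `order` is an assumed setting, not a value
    taken from the article.
    """
    contour = np.asarray(contour, dtype=float)
    if contour.size <= order:
        raise ValueError("contour needs more samples than the polynomial order")
    # Map the frame indices onto [-1, 1], the interval on which
    # Legendre polynomials are orthogonal.
    x = np.linspace(-1.0, 1.0, contour.size)
    # Least-squares fit; coefficient k weights the Legendre polynomial P_k.
    return legendre.legfit(x, contour, deg=order)


# Synthetic pitch contour in Hz (assumed data): 120*P0 + 15*P1 - 10*P2.
t = np.linspace(-1.0, 1.0, 50)
pitch = 120.0 + 15.0 * t - 10.0 * (1.5 * t**2 - 0.5)

coeffs = legendre_contour_features(pitch, order=3)
print(np.round(coeffs, 3))
```

In this sketch the first coefficient captures the mean level of the contour, the second its overall slope, and the third its curvature, so a few numbers summarize the shape of the contour over the whole unit.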
