Researchers from the Institute of Acoustics (IOA) of the Chinese Academy of Sciences have proposed a perceptually motivated linear prediction (LP) residual estimator for single-channel speech enhancement in noisy and reverberant environments. According to the researchers, the estimator can greatly improve the quality of the processed speech.
The paper, entitled “A Perceptually Motivated LP Residual Estimator in Noisy and Reverberant Environments”, was published in Speech Communication.
Although microphone arrays have become more common in recent years, single-channel speech acquisition with only one microphone is still the most widely used approach in low-cost equipment because of its robustness and adequate performance.
Traditional single-channel speech enhancement methods estimate the noise power spectral density and the late reverberation spectral variance in the frequency domain, and then apply the well-known spectral subtraction (SS) algorithm to enhance the recorded speech signal.
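As a rough illustration of the spectral subtraction idea, the following minimal Python sketch computes a per-bin gain from a noisy power spectrum and an estimate of the interference power (noise plus late reverberation). It is not the implementation used in the paper: the function name `spectral_subtraction_gain`, the `floor` parameter, and the sample values are assumptions made for this example only.

```python
import math

def spectral_subtraction_gain(noisy_power, interference_power, floor=0.01):
    """Return one amplitude gain per frequency bin (hypothetical helper).

    noisy_power        -- power spectrum |Y(k)|^2 of one noisy frame
    interference_power -- estimated noise (plus late reverberation) power per bin
    floor              -- smallest amplitude gain allowed (assumed value)
    """
    gains = []
    for y, n in zip(noisy_power, interference_power):
        if y <= 0.0:
            # Empty bin: nothing to subtract from, so fall back to the floor.
            gains.append(floor)
            continue
        # Power subtraction: keep the fraction of power not explained by n.
        squared_gain = 1.0 - n / y
        # Clamp to the floor so the gain never becomes negative or zero.
        gains.append(math.sqrt(max(squared_gain, floor * floor)))
    return gains

# Illustrative values only: three bins with the same interference estimate.
noisy = [4.0, 1.0, 0.5]
interference = [1.0, 1.0, 1.0]
gains = spectral_subtraction_gain(noisy, interference)
# Enhanced magnitude per bin is the gain times the noisy magnitude.
enhanced_magnitude = [g * math.sqrt(y) for g, y in zip(gains, noisy)]
```

Bins whose power barely exceeds the interference estimate are clamped to the floor. Because the estimates fluctuate from frame to frame, which bins get clamped also changes over time, and this is one common explanation for isolated tonal artifacts in the output.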
“Using the generalized singular value decomposition (GSVD) to handle the noise and the late reverberation is feasible,” said LI Xiaodong, the leader of the IOA research group, who has also shown that noise and late reverberation can be handled in the linear prediction residual domain within a unified processing framework.
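For readers unfamiliar with the linear prediction residual domain, the sketch below shows how an LP residual can be computed for a short signal using the autocorrelation method and the Levinson-Durbin recursion. It illustrates only the residual itself, not the GSVD-based estimator from the paper; the function names, the prediction order, and the test signal are assumptions chosen for this example.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite sequence (no normalization)."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LP normal equations; returns predictor coefficients a[1..order].

    The predictor is x_hat[n] = sum_k a[k] * x[n - k].
    """
    a = [0.0] * (order + 1)
    err = r[0]  # prediction error power of the order-0 predictor
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate signal: stop early
        acc = r[i] - sum(a[k] * r[i - k] for k in range(1, i))
        reflection = acc / err
        updated = a[:]
        updated[i] = reflection
        for k in range(1, i):
            updated[k] = a[k] - reflection * a[i - k]
        a = updated
        err *= 1.0 - reflection * reflection
    return a[1:]

def lp_residual(x, coeffs):
    """Residual e[n] = x[n] - sum_k coeffs[k-1] * x[n-k], with zero history."""
    residual = []
    for n in range(len(x)):
        prediction = sum(c * x[n - k]
                         for k, c in enumerate(coeffs, start=1) if n - k >= 0)
        residual.append(x[n] - prediction)
    return residual

# Assumed test signal: impulse response of a first-order recursive filter.
signal = [0.5 ** n for n in range(50)]
coeffs = levinson_durbin(autocorrelation(signal, 2), order=2)
residual = lp_residual(signal, coeffs)
```

For this test signal the predictor recovers a first coefficient close to 0.5 and the residual is close to a single impulse at the first sample. This shows the general appeal of the residual domain: the predictable structure of the signal is removed, leaving a much sparser sequence to work with.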
However, at low signal-to-noise ratios (SNRs), both the SS-type and the GSVD-type algorithms leave audible, unnatural “musical noise”, which consists of tones at randomly distributed frequencies. “The property of human ear may be helpful to further suppress the residual noise,” said ZHENG Chengshi, drawing on his experience.
The research team used the auditory masking threshold (AMT) curve to guide the residual noise suppression process. Because the AMT is well defined only in the frequency domain and cannot be applied to the GSVD directly, the researchers formulated the relationship between the AMT and the generalized singular values in the linear prediction residual domain. They then derived the optimal LP filter under perceptual constraints and applied it to enhance the recorded speech signal.
The proposed algorithm performed well in both simulated and real environments on objective measures such as segSNR, PESQ, and SRMR. In addition, MUSHRA listening tests showed better speech quality than conventional competing algorithms.
Figure: The perceptually constrained generalized singular value. (Image by IOA)
Previously, the IOA researchers had proposed a GSVD-based speech enhancement algorithm in the linear prediction residual domain. The results of this earlier research were published in IEEE Signal Processing Letters (Volume 21, Issue 12, December 2014).
Reference:
PENG Renhua, TAN Zhenghua, LI Xiaodong, ZHENG Chengshi. A Perceptually Motivated LP Residual Estimator in Noisy and Reverberant Environments. Speech Communication (Volume 96, February 2018, Pages 129–141). DOI: 10.1016/j.specom.2017.12.004.
Contact:
WANG Rongquan
Institute of Acoustics, Chinese Academy of Sciences, 100190 Beijing, China
E-mail: wangrongquan@mail.ioa.ac.cn