Use a Binaural rather than Monaural Masking Model to Cancel Stereo Acoustic Echo

For stereo acoustic echo cancellation, traditional methods often use only monaural masking models as the rule for decorrelating stereo signals. A binaural masking model seems more reasonable, for two reasons. First, stereo signals are heard with two ears rather than one. Second, psychoacoustic research has already shown that masking levels differ markedly between binaural and monaural masking models.

By studying binaural masking level difference models, researchers from the Institute of Acoustics of the Chinese Academy of Sciences, the Shanghai Advanced Research Institute of the Chinese Academy of Sciences, and the Institute of Acoustics and Lighting Technology of Guangzhou University have introduced, for the first time, a simplified binaural masking model for stereo acoustic echo cancellation. The proposed method mitigates the non-uniqueness problem (when the two channels are strongly correlated, the adaptive filters have no unique solution) while retaining good speech quality.

Considering that the interaural time difference is the dominant cue at low frequencies (below about 1.5 kHz) and the interaural level difference is the major cue at higher frequencies, the researchers propose a different signal decorrelation scheme for each of these two frequency bands. In the low-frequency band, a pitch-driven sinusoidal injection scheme maintains the interaural time difference, with the amount of injection determined by the proposed binaural masking model. In the high-frequency band, a modified sinusoidal phase modulation scheme strikes a trade-off between preserving the interaural level difference and decorrelating the stereophonic input signals.
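As a rough illustration of the high-band idea, the sketch below applies a generic sinusoidal phase modulation to one short-time spectrum frame. Bins above the crossover receive equal and opposite time-varying phase offsets in the two channels, so per-bin magnitudes (and hence level differences) stay unchanged while the interchannel phase varies. This is not the modified scheme from the paper; the function name, the 0.75 Hz modulation rate and the 0.5 rad peak offset are assumptions chosen for the example.

```python
import cmath
import math


def modulate_high_band(left, right, fs, n_fft, frame_time,
                       cutoff_hz=1500.0, mod_hz=0.75, max_phase=0.5):
    """Apply opposite-sign sinusoidal phase offsets to bins above cutoff_hz.

    left, right: one-sided spectra of one frame (complex, bins 0..n_fft/2).
    frame_time:  start time of the frame in seconds.
    mod_hz, max_phase: illustrative modulation rate (Hz) and peak offset (rad).
    """
    # Phase offset for this frame; it varies slowly from frame to frame.
    phi = max_phase * math.sin(2.0 * math.pi * mod_hz * frame_time)
    rot = cmath.exp(1j * phi)  # unit-magnitude rotation
    out_left, out_right = [], []
    for k, (xl, xr) in enumerate(zip(left, right)):
        if k * fs / n_fft > cutoff_hz:
            out_left.append(xl * rot)               # +phi on the left channel
            out_right.append(xr * rot.conjugate())  # -phi on the right channel
        else:
            out_left.append(xl)                     # low band left untouched
            out_right.append(xr)
    return out_left, out_right
```

Because the rotation has unit magnitude, only the phase relation between the channels changes in the high band; the low band is passed through for a separate scheme to handle.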

The method was evaluated in terms of decorrelation, filter misalignment, speech distortion and stereophonic perception. The results show that it effectively mitigates the non-uniqueness problem while retaining good speech quality.
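Filter misalignment is commonly reported as the normalized distance, in dB, between the true echo-path impulse response and the adaptive filter's estimate. The article does not state the exact definition used in the evaluation, so the following is only a minimal sketch of that common measure; the function name is hypothetical.

```python
import math


def normalized_misalignment_db(h_true, h_est):
    """Return 10*log10(||h_true - h_est||^2 / ||h_true||^2) in dB."""
    if len(h_true) != len(h_est):
        raise ValueError("impulse responses must have the same length")
    ref = sum(a * a for a in h_true)
    if ref == 0.0:
        raise ValueError("true impulse response must be non-zero")
    err = sum((a - b) ** 2 for a, b in zip(h_true, h_est))
    if err == 0.0:
        return float("-inf")  # perfect identification
    return 10.0 * math.log10(err / ref)
```

Lower (more negative) values mean the estimated filter is closer to the true echo path.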

The proposed method has several advantages. The binaural masking model helps retain good speech quality as well as stereophonic perception, since it takes into account the contribution of spatial cues in binaural listening, which is the common case in stereo acoustic echo cancellation systems. Furthermore, higher decorrelation can be obtained in the low-frequency band than with the wideband noise injection method, because the pitch-driven sinusoidal injection technique concentrates the injection energy.
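To illustrate what concentrating the injection energy means, the sketch below builds a low-band injection signal made only of sinusoids at the harmonics of a given fundamental frequency, instead of spreading the same energy over wideband noise. It is a generic illustration, not the paper's scheme: equal harmonic amplitudes and the `target_rms` parameter stand in for the levels that the binaural masking model would set, and the function name is hypothetical.

```python
import math


def harmonic_injection(f0_hz, fs, n_samples, cutoff_hz=1500.0, target_rms=0.01):
    """Sum equal-amplitude sinusoids at the harmonics of f0_hz below cutoff_hz.

    target_rms is a placeholder for the level a masking model would allow.
    Returns (signal, harmonic_frequencies).
    """
    if f0_hz <= 0.0:
        raise ValueError("f0_hz must be positive")
    harmonics = []
    k = 1
    while k * f0_hz < cutoff_hz:
        harmonics.append(k * f0_hz)
        k += 1
    if not harmonics:
        return [0.0] * n_samples, harmonics
    # Sinusoids at distinct frequencies add in power, so split the power evenly.
    amp = target_rms * math.sqrt(2.0 / len(harmonics))
    signal = [
        sum(amp * math.sin(2.0 * math.pi * f * n / fs) for f in harmonics)
        for n in range(n_samples)
    ]
    return signal, harmonics
```

For a 200 Hz fundamental, all of the injected energy sits at seven frequencies (200 Hz to 1400 Hz) rather than being spread across the whole low band.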

In addition, combining this scheme with sinusoidal phase modulation in the high-frequency band yields high decorrelation together with low filter misalignment over the whole spectrum.

Because the proposed algorithm needs to estimate the fundamental frequency of the far-end speech signal, it somewhat increases the computational load. In practice, however, an efficient fundamental-frequency estimator can be used to limit this cost. Another limitation is that the proposed algorithm applies only to speech signals: if the far-end signal is not speech, the algorithm cannot be used directly and some extensions are necessary, which are left for future work.
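The article does not say which fundamental-frequency estimator is meant. Purely as an illustration of the extra step involved, here is a plain autocorrelation peak search over one frame; it is a textbook approach, not an optimised one, and the function name and the 60 to 400 Hz search range are assumptions.

```python
import math


def estimate_f0(frame, fs, fmin_hz=60.0, fmax_hz=400.0):
    """Return the frequency (Hz) of the autocorrelation peak, or None if none is found."""
    lag_min = max(1, int(fs / fmax_hz))
    lag_max = min(int(fs / fmin_hz), len(frame) - 1)
    best_lag, best_value = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Autocorrelation of the frame with itself shifted by `lag` samples.
        value = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return fs / best_lag if best_lag else None


# Usage: a synthetic voiced-like frame with a 200 Hz fundamental.
fs = 8000
frame = [math.sin(2 * math.pi * 200 * n / fs)
         + 0.5 * math.sin(2 * math.pi * 400 * n / fs) for n in range(400)]
f0 = estimate_f0(frame, fs)
```

The nested loop makes the cost per frame explicit: it grows with both the frame length and the number of candidate lags, which is the kind of overhead an efficient estimator would need to reduce.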

For speech applications, the proposed algorithm is robust to double-talk, because the adaptive filter coefficients stop updating automatically when a double-talk detection scheme is used.

Reference:

YANG Hefei, WANG Jie, ZHENG Chengshi, LI Xiaodong. Stereophonic Channel Decorrelation Using a Binaural Masking Model. Applied Acoustics (Vol. 110, September 2016, pp. 128-136). DOI: 10.1016/j.apacoust.2016.03.016

Contact:

ZHENG Chengshi

Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences, 100190 Beijing, China

Email: cszheng@mail.ioa.ac.cn

Appendix: