Researchers Propose a Deep Learning Based Binaural Speech Enhancement Approach with Spatial Cues Preservation


Because speech in real environments is usually contaminated by background noise and interference, speech enhancement techniques have been studied extensively over the past several decades. In recent years, deep-learning-based speech enhancement methods have shown outstanding performance in dealing with non-stationary noise and can significantly improve speech intelligibility.

Studies of binaural hearing show that speech understanding in noise benefits greatly from the spatial information of target signals. However, most speech enhancement methods generate single-channel speech signals and cannot preserve the original spatial information.

Recently, researchers from the Institute of Acoustics of the Chinese Academy of Sciences (IACAS) studied speech enhancement together with the preservation of spatial information in binaural signals. They proposed a deep-learning-based binaural speech enhancement approach that preserves the binaural cues while reducing interfering noise, helping listeners better understand the enhanced speech.

The study was published online in May 2019 at the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2019).

In this approach, the binaural speech signals were first combined into a single complex signal whose real part corresponds to the left channel and whose imaginary part corresponds to the right channel. In this way, the binaural speech enhancement problem was transformed into a monaural one in the complex domain. A complex time-frequency mask was then defined and estimated using a complex deep neural network. Finally, the estimated mask was applied to the complex input signal to enhance the target signal, from which the binaural signal was rebuilt.
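The signal flow described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the helper names (`stft`, `istft`, `enhance_binaural`, `oracle_mask`), the frame length, the hop size and the test signals are all assumptions made for illustration. In particular, the complex deep neural network is replaced by a hypothetical oracle mask computed from the known clean signal, so the sketch only shows how a complex mask acts on the combined signal and how the two channels are recovered.

```python
import numpy as np

def stft(z, frame_len=256, hop=128):
    """Two-sided STFT of a (possibly complex) signal with a periodic Hann window."""
    # Periodic Hann: windows shifted by half a frame sum to exactly 1.
    window = np.hanning(frame_len + 1)[:-1]
    n_frames = 1 + (len(z) - frame_len) // hop
    frames = np.stack([z[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    # A complex input has no conjugate symmetry, so all FFT bins are kept.
    return np.fft.fft(frames, axis=1)

def istft(Z, frame_len=256, hop=128):
    """Inverse of stft() by plain overlap-add (exact away from the signal edges)."""
    frames = np.fft.ifft(Z, axis=1)
    out = np.zeros((Z.shape[0] - 1) * hop + frame_len, dtype=complex)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + frame_len] += frame
    return out

def enhance_binaural(left, right, mask_fn):
    """Combine the channels, apply a complex T-F mask, and split them again."""
    z = left + 1j * right            # real part: left, imaginary part: right
    Z = stft(z)
    enhanced = istft(mask_fn(Z) * Z)  # mask_fn stands in for the network
    return enhanced.real, enhanced.imag

# Synthetic example (assumed data): a tone that is quieter and phase-shifted
# in the right channel, plus independent noise in each channel.
rng = np.random.default_rng(0)
n = 16000
t = np.arange(n) / 16000.0
clean_left = np.sin(2 * np.pi * 440 * t)
clean_right = 0.6 * np.sin(2 * np.pi * 440 * t - 0.3)
noisy_left = clean_left + 0.5 * rng.standard_normal(n)
noisy_right = clean_right + 0.5 * rng.standard_normal(n)

# Hypothetical oracle mask: the complex ratio of clean to noisy spectra.
S = stft(clean_left + 1j * clean_right)

def oracle_mask(Y, eps=1e-12):
    return S * np.conj(Y) / (np.abs(Y) ** 2 + eps)

out_left, out_right = enhance_binaural(noisy_left, noisy_right, oracle_mask)
```

Because a single complex mask multiplies the combined spectrum, both channels are processed jointly, and the level and phase differences between `out_left` and `out_right` come directly from the real and imaginary parts of the masked signal. In a real system the mask would be predicted by the trained network from the noisy input alone.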

Experimental results showed that this complex deep neural network-based binaural speech enhancement method raised the signal-to-noise ratio by more than 10 dB and preserved the binaural cues well.

This binaural speech enhancement method can be applied to devices such as hearing aids, helping people better understand speech.

This work was partially supported by the National Natural Science Foundation of China (Nos. 11590770-4, 61650202, 11722437, U1536117, 61671442, 11674352, 11504406, 61601453), the National Key Research and Development Program (Nos. 2016YFB0801203, 2016YFC0800503, 2017YFB1002803) and the Key Science and Technology Project of the Xinjiang Uygur Autonomous Region (No. 2016A03007-1).

Block diagram of the proposed binaural speech enhancement system (Image by IACAS)

Reference:

SUN Xingwei, XIA Risheng, LI Junfeng, YAN Yonghong. A Deep Learning Based Binaural Speech Enhancement Approach with Spatial Cues Preservation. ICASSP 2019, pp. 5766-5770. DOI: 10.1109/ICASSP.2019.8683589

Contact:

ZHOU Wenjia

Institute of Acoustics, Chinese Academy of Sciences, 100190 Beijing, China

E-mail: media@mail.ioa.ac.cn

Appendix: