Researchers Proposed a New Monaural Progressive Speech Enhancement Approach


In complex acoustic application scenarios, speech signals are often contaminated by environmental noise and room reverberation, which significantly degrades robust automatic speech recognition (ASR) and speech communication. Although deep-learning-based monaural speech enhancement methods can effectively suppress these distortion components, they are difficult to deploy on today's resource-limited microcontrollers because of their large number of trainable parameters and high computational complexity.

Recently, researchers from the Institute of Acoustics of the Chinese Academy of Sciences (IACAS) proposed a monaural progressive speech enhancement approach based on a convolutional recurrent neural network (CRNN). It significantly reduces the number of trainable parameters and the computational complexity while maintaining enhancement performance.

Building on the CRNN, the researchers decomposed the whole enhancement process into multiple stages. They modeled each stage with a lightweight module and set its training target to a version of the speech with a higher signal-to-noise ratio (SNR) than the original noisy input. As a consequence, the estimates produced in earlier stages can be exploited as prior information to progressively improve the results of subsequent stages. In addition, the researchers reused the same long short-term memory (LSTM) module across the stages, which notably reduces the number of trainable parameters.
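
To make the structure concrete, the sketch below shows one possible way to organize such a progressive network in PyTorch. The class and layer names, the layer sizes, and the use of magnitude masking are illustrative assumptions rather than the authors' implementation; the two points it demonstrates are that each stage receives the outputs of earlier stages as prior information, and that a single LSTM module is reused by every stage to keep the parameter count low.

```python
# Minimal sketch of the progressive idea (not the authors' released code).
# Each stage refines the magnitude spectrum, and one LSTM is shared across
# all stages to limit the number of trainable parameters.

import torch
import torch.nn as nn


class ProgressiveCRNN(nn.Module):
    def __init__(self, n_freq=161, hidden=128, n_stages=3):
        super().__init__()
        self.n_stages = n_stages
        # Lightweight per-stage encoders; stage s sees the noisy input plus
        # all previous stage estimates concatenated along the feature axis.
        self.encoders = nn.ModuleList(
            nn.Sequential(
                nn.Conv1d((s + 1) * n_freq, hidden, kernel_size=3, padding=1),
                nn.ReLU(),
            )
            for s in range(n_stages)
        )
        # One LSTM shared by every stage (the parameter-reuse trick).
        self.shared_lstm = nn.LSTM(hidden, hidden, batch_first=True)
        # Per-stage output layers mapping back to a spectral mask.
        self.decoders = nn.ModuleList(
            nn.Linear(hidden, n_freq) for _ in range(n_stages)
        )

    def forward(self, noisy_mag):
        # noisy_mag: (batch, time, n_freq) magnitude spectrogram
        outputs = []
        features = [noisy_mag]
        for s in range(self.n_stages):
            x = torch.cat(features, dim=-1)            # prior stages as input
            x = self.encoders[s](x.transpose(1, 2)).transpose(1, 2)
            x, _ = self.shared_lstm(x)                 # reused LSTM module
            mask = torch.sigmoid(self.decoders[s](x))  # stage-s mask estimate
            est = mask * noisy_mag                     # progressively cleaner
            outputs.append(est)
            features.append(est)
        return outputs  # one estimate per stage; each stage gets its own loss


if __name__ == "__main__":
    model = ProgressiveCRNN(n_freq=161, hidden=128, n_stages=3)
    dummy = torch.rand(2, 100, 161)  # (batch, frames, frequency bins)
    stage_estimates = model(dummy)
    print([e.shape for e in stage_estimates])
```

In the paper, each stage is supervised with a progressively higher-SNR version of the clean speech rather than the fully clean target; the training targets and loss weighting are omitted from this sketch.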

Experimental results showed that with three stages the performance was comparable to that of a more complex CRNN model, and that it could be further improved by increasing the number of stages.

This work could be used for noise suppression and speech information retrieval on resource-limited microcontrollers.

The research, published in Applied Acoustics, was supported by the National Natural Science Foundation of China (No. 61571435, 61801468 and 11974086).

Algorithm System Flowchart (Image by IACAS)

Reference:

LI Andong, YUAN Mingming, ZHENG Chengshi and LI Xiaodong, 2020. Speech enhancement using progressive learning-based convolutional recurrent neural network. Applied Acoustics, 166, p.107347. DOI: 10.1016/j.apacoust.2020.107347

Contact:

ZHOU Wenjia

Institute of Acoustics, Chinese Academy of Sciences, 100190 Beijing, China

E-mail: media@mail.ioa.ac.cn
