Single-microphone noise reduction algorithms are widely used to mitigate the effect of noise on speech processing systems. A noise estimator is an indispensable component of most of these algorithms and strongly affects their performance in adverse environments. Noise estimation is challenging, however, because noise signals are acoustically coupled with non-stationary speech signals.
Noise estimation is generally based on discriminating speech presence from speech absence. The temporal correlation of speech presence/absence is therefore a valuable cue, which noise estimators exploit to varying degrees. The most popular technique for exploiting this temporal correlation is time-recursive averaging. However, this method is not unified into a theoretical framework that enables optimal noise estimation, and it lacks an intrinsic mechanism for modeling temporal correlation. Moreover, the noise is estimated by heuristics. For these reasons, the performance of noise estimation can still be improved.
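As a rough illustration of time-recursive averaging, the following minimal sketch tracks the noise power in a single frequency band. The function name, the smoothing factor `alpha`, the detection ratio `threshold`, and the assumption that the first frame contains only noise are all choices made for this example and are not taken from the published work.

```python
import numpy as np

def recursive_noise_estimate(power, alpha=0.95, threshold=2.0):
    """Track noise power in one frequency band by time-recursive averaging.

    power:     noisy power values, one per frame.
    alpha:     hypothetical smoothing factor (closer to 1 = slower tracking).
    threshold: hypothetical ratio above which a frame is treated as speech.
    """
    power = np.asarray(power, dtype=float)
    noise = np.empty_like(power)
    noise[0] = power[0]  # assumption: the first frame is noise only
    for t in range(1, len(power)):
        # Heuristic decision: power well above the current estimate means speech.
        speech_present = power[t] > threshold * noise[t - 1]
        if speech_present:
            noise[t] = noise[t - 1]  # freeze the estimate during speech
        else:
            noise[t] = alpha * noise[t - 1] + (1.0 - alpha) * power[t]
    return noise
```

The hard threshold and the fixed smoothing factor are exactly the kind of heuristics referred to above: neither follows from a statistical model of how speech presence evolves over time.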
A hidden Markov model (HMM) is therefore preferred, because it models temporal correlation intrinsically. Since noise is usually assumed to be piecewise stationary, the logarithmic power of speech-absent signals in a frequency band can be modeled as the non-speech state of an HMM. Accordingly, the logarithmic power of speech-present signals can be modeled as the speech state. The result is a binary-state HMM that represents the log-power sequence as a dynamic process of transitions between speech and non-speech states.
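To make the binary-state model concrete, the sketch below runs standard forward filtering on a two-state HMM for the log-power sequence of one frequency band. The Gaussian emission densities, the function name, and every parameter value in the usage note are assumptions for illustration; the text above does not specify the emission model used in the published work.

```python
import numpy as np

def speech_presence_posterior(log_power, means, variances, trans, prior):
    """Forward-filtered state probabilities for a two-state HMM.

    State 0 = non-speech, state 1 = speech.
    Emissions are assumed Gaussian in log-power (an illustrative choice).
    trans[i, j] = P(next state = j | current state = i).
    Returns an array of shape (frames, 2) whose rows sum to 1.
    """
    log_power = np.asarray(log_power, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    trans = np.asarray(trans, dtype=float)
    belief = np.asarray(prior, dtype=float)
    posteriors = np.empty((len(log_power), 2))
    for t, x in enumerate(log_power):
        if t > 0:
            belief = belief @ trans  # predict through the Markov chain
        log_lik = -0.5 * ((x - means) ** 2 / variances
                          + np.log(2.0 * np.pi * variances))
        # Subtract the maximum before exponentiating to avoid underflow.
        belief = belief * np.exp(log_lik - log_lik.max())
        belief = belief / belief.sum()
        posteriors[t] = belief
    return posteriors
```

For example, calling `speech_presence_posterior([0, 0, 5, 5, 0], means=[0, 5], variances=[1, 1], trans=[[0.9, 0.1], [0.1, 0.9]], prior=[0.5, 0.5])` assigns most of the probability to the non-speech state in the first two frames and the last frame, and to the speech state in the two middle frames. The transition matrix is what encodes the temporal correlation of speech presence/absence.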
Modeling a log-power sequence with an HMM presents two problems. The first is the computational complexity of updating the Markov chain within a causal sliding window. To address it, the window-based adaptation process is approximated as a first-order sequential process, based on maximum likelihood and the assumption that noise is piecewise stationary. This approximation clearly reduces the computational complexity. In addition, the sequential process treats the noise estimate as a function of recent observations and gradually discounts older observations over time. The second problem is that speech signals are often absent for long periods, which makes the speech state difficult to model. To solve this problem, a number of constraints are introduced into the HMM. Incorporating these constraints into the sequential HMM yields a constrained sequential hidden Markov model (CSHMM), whose parameter set is estimated sequentially from frame to frame on the basis of maximum likelihood.
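The general idea of a frame-by-frame update that gradually forgets old observations can be sketched as follows. This is not the update derived in the published work: the forgetting factor `forget`, the two constraints (`min_gap` between the state means and a variance floor `var_floor`), the Gaussian emissions, and the function name are all hypothetical stand-ins chosen to show the mechanism.

```python
import numpy as np

def sequential_step(x, means, variances, belief, trans,
                    forget=0.95, min_gap=2.0, var_floor=1e-3):
    """One frame of a forgetting-factor update for a two-state HMM.

    State 0 = non-speech, state 1 = speech; x is the current log-power.
    Returns updated (means, variances, belief) as new arrays.
    """
    # Filter the state probabilities for the current frame.
    pred = belief @ trans
    log_lik = -0.5 * ((x - means) ** 2 / variances
                      + np.log(2.0 * np.pi * variances))
    post = pred * np.exp(log_lik - log_lik.max())
    post = post / post.sum()

    # First-order recursive update: each state moves toward x in proportion
    # to its posterior, so the weight of old frames decays geometrically.
    step = (1.0 - forget) * post
    new_means = means + step * (x - means)
    new_vars = variances + step * ((x - means) ** 2 - variances)

    # Hypothetical constraints: keep the speech mean above the noise mean by
    # at least min_gap, and keep the variances away from zero, so the speech
    # state stays usable during long speech-absent stretches.
    new_means[1] = max(new_means[1], new_means[0] + min_gap)
    new_vars = np.maximum(new_vars, var_floor)
    return new_means, new_vars, post
```

Each call costs a fixed amount of work per frame, in contrast to re-estimating the model over a whole sliding window. When only noise frames arrive, the non-speech mean converges to the observed log-power while the constraints stop the rarely visited speech state from collapsing onto it.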
The proposed method is compared with well-established algorithms in various experiments. It delivers more accurate results and, unlike most algorithms, does not rely on the assumption of a “non-speech signal onset”.
The work, entitled “Noise Estimation Using a Constrained Sequential Hidden Markov Model in the Log-Spectral Domain”, has been published online at http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6451162&tag=1 and in IEEE Transactions on Audio, Speech, and Language Processing (Vol. 21, Issue 6, June 2013, pp. 1145-1157).