Multiple sound source localization is an indispensable component of many systems, such as speech separation, speech enhancement, and automatic camera monitoring. Direction-of-arrival (DOA) estimation using data collected from a microphone array is a primary technique for sound source localization. Recently, there has been an increasing need to locate broadband sound sources.
In real applications, heavy computational load and acoustic interference are two major problems for speech source localization. Over the past two decades, conventional methods have been able to mitigate one of these problems only by aggravating the other. In effect, they merely trade robustness against computational efficiency.
To resolve this conflict, YING Dongwen and YAN Yonghong from the Institute of Acoustics, Chinese Academy of Sciences present a robust and fast DOA estimator based on a concave cost function, from which the optimal DOA estimate is given by a closed-form solution. This avoids the global search, which is the major cause of the heavy computational load in conventional algorithms. Meanwhile, special attention is given to robustness.
Experiments were conducted in real environments, one adverse and one friendly. A planar array of commercial omni-directional silicon microphones was arranged on a circle with a radius of 0.08 m. Performance was assessed in terms of error rate, i.e. the percentage of frames with an incorrectly estimated DOA (an azimuth error greater than a given threshold) among all frames containing speech. The relationship between the error rate and the error threshold is illustrated by a curve.
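The error-rate metric described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`azimuth_error_deg`, `error_rate`, `error_rate_curve`) and the sample angles are hypothetical, azimuths are assumed to be in degrees, and every frame passed in is assumed to already be a speech frame.

```python
def azimuth_error_deg(estimate_deg, truth_deg):
    """Smallest absolute angular difference in degrees, wrapped to [0, 180]."""
    diff = abs(estimate_deg - truth_deg) % 360.0
    return min(diff, 360.0 - diff)


def error_rate(estimates_deg, truths_deg, threshold_deg):
    """Fraction of speech frames whose azimuth error exceeds the threshold."""
    if len(estimates_deg) != len(truths_deg):
        raise ValueError("estimates and ground truth must have the same length")
    if len(estimates_deg) == 0:
        raise ValueError("at least one speech frame is required")
    wrong = sum(
        1
        for est, ref in zip(estimates_deg, truths_deg)
        if azimuth_error_deg(est, ref) > threshold_deg
    )
    return wrong / len(estimates_deg)


def error_rate_curve(estimates_deg, truths_deg, thresholds_deg):
    """Pairs of (threshold, error rate), i.e. the points of the curve."""
    return [(th, error_rate(estimates_deg, truths_deg, th)) for th in thresholds_deg]


# Hypothetical per-frame azimuths (degrees), for illustration only.
estimates = [10.0, 350.0, 95.0, 180.0]
truths = [12.0, 5.0, 90.0, 170.0]
print(error_rate(estimates, truths, 5.0))  # prints 0.5
```

Wrapping the difference around 360 degrees matters for a circular array: an estimate of 350 degrees against a true azimuth of 5 degrees is counted as a 15 degree error, not 345.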
The Steered Response Power PHAse Transform (SRP-PHAT), a typical algorithm that represents the beamformer power as a sum of pairwise cross-correlations, is a benchmark algorithm for sound source localization and was used for comparison. In the adverse environment, Fig. 1(a) shows that the proposed algorithm outperformed SRP-PHAT. In the friendly environment, Fig. 1(b) shows that the proposed algorithm still outperformed SRP-PHAT.
Fig. 1 Error rate vs. error threshold under real environments: (a) adverse environment; (b) friendly environment (Image by YING).
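To make the SRP-PHAT benchmark more concrete, the following is a minimal sketch of the textbook idea: the steered power for each candidate azimuth is the sum, over all microphone pairs, of their phase-normalized cross-spectra steered to that direction. It illustrates the baseline's global grid search, not the proposed closed-form estimator. The far-field model, the speed of sound, the four-microphone layout, and all function names are assumptions made here for illustration; only the 0.08 m radius comes from the text.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def circular_array(num_mics, radius):
    """Microphone (x, y) positions evenly spaced on a circle, in metres."""
    angles = 2.0 * np.pi * np.arange(num_mics) / num_mics
    return radius * np.stack([np.cos(angles), np.sin(angles)], axis=1)


def srp_phat_azimuth(frame, mic_xy, fs, grid_deg):
    """Return the grid azimuth (degrees) with the highest SRP-PHAT power.

    frame    : array (num_mics, num_samples), one time-domain frame per microphone
    mic_xy   : array (num_mics, 2), microphone positions in metres
    fs       : sampling rate in Hz
    grid_deg : candidate azimuths in degrees (far-field, in the array plane)
    """
    frame = np.asarray(frame, dtype=float)
    grid_deg = np.asarray(grid_deg, dtype=float)
    num_mics, num_samples = frame.shape
    spectra = np.fft.rfft(frame, axis=1)
    freqs = np.fft.rfftfreq(num_samples, d=1.0 / fs)

    az = np.deg2rad(grid_deg)
    # Unit vectors pointing from the array toward each candidate source direction.
    directions = np.stack([np.cos(az), np.sin(az)], axis=1)

    power = np.zeros(len(grid_deg))
    for i in range(num_mics):
        for j in range(i + 1, num_mics):
            cross = spectra[i] * np.conj(spectra[j])
            cross = cross / (np.abs(cross) + 1e-12)  # PHAT weighting: keep phase only
            # Far-field arrival-time difference t_i - t_j for every candidate azimuth.
            tau = -(directions @ (mic_xy[i] - mic_xy[j])) / SPEED_OF_SOUND
            steering = np.exp(2j * np.pi * np.outer(tau, freqs))
            power += np.real(steering @ cross)  # pairwise cross-correlation at lag tau
    return float(grid_deg[np.argmax(power)])


# Hypothetical usage: a 4-microphone array of radius 0.08 m and a 1-degree grid.
mics = circular_array(4, 0.08)
grid = np.arange(0.0, 360.0, 1.0)
```

The cost of this baseline grows with the number of grid points times the number of microphone pairs times the number of frequency bins, which is the kind of global search the text identifies as the main source of computational load.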
The experiments show that the proposed algorithm runs about ten times faster than SRP-PHAT and also outperforms it in terms of robustness.
The research, entitled “Robust and Fast Localization of Single Speech Source Using a Planar Array”, has been published online at http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6525322 and in IEEE Signal Processing Letters (Vol. 20, No. 9, September 2013).
Contact:
YING Dongwen
Institute of Acoustics, Chinese Academy of Sciences
Email: yingdongwen@hccl.ioa.ac.cn