Localization of multiple speech sources is used in many applications, such as speech enhancement, speech separation, and speech recognition, and a large number of methods have been proposed for it. However, spatial aliasing remains a challenging issue for most speech source localization methods.
To cope with spatial aliasing, researchers HUANG Zhaoqiong et al. from the Institute of Acoustics of the Chinese Academy of Sciences have recently proposed a robust method for localizing multiple speech sources. The method uses phase difference regression, which is designed specifically for periodic variables: spatial aliasing is avoided simply by restricting the phase difference error to within one period.
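A minimal sketch of what restricting the error to one period means, assuming phases in radians and NumPy: the difference between an observed and a predicted phase difference is wrapped into [-π, π), so adding any whole number of periods (2πk) to the observation leaves the error unchanged. The function names are hypothetical and not taken from the paper.

```python
import numpy as np


def wrap_to_pi(phase):
    """Map any phase (radians) into the interval [-pi, pi)."""
    return (np.asarray(phase) + np.pi) % (2 * np.pi) - np.pi


def wrapped_phase_error(observed, predicted):
    """Phase difference error restricted to one period.

    Adding 2*pi*k to `observed` does not change the result, so the
    period number k never has to be estimated explicitly.
    """
    return wrap_to_pi(np.asarray(observed) - np.asarray(predicted))


if __name__ == "__main__":
    predicted = 0.3
    for k in (-2, 0, 1, 3):
        observed = predicted + 0.1 + 2 * np.pi * k   # same error, different period number
        print(k, round(float(wrapped_phase_error(observed, predicted)), 6))
```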
The new method significantly simplifies the regression, especially for large planar arrays. Because the ambiguity in the period number of the phase difference is resolved jointly by the histogram analysis and the regression, the method can be applied to planar arrays of any size.
Histogram analysis is used to determine the number of sources and to estimate their initial directions of arrival. For each microphone pair, a time delay histogram is constructed from all time delay candidates of that pair over all time frames and all frequencies, and the time delays of the pair are obtained by picking the peaks of this histogram. The delays of all pairs are then combined to estimate the initial azimuths and elevations of the speech sources, and an azimuth histogram is constructed to determine the number of sources.
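The histogram stage for one microphone pair, plus the step that combines pair delays into an azimuth and elevation, might look like the minimal NumPy sketch below. It assumes far-field sources, a planar array in the z = 0 plane, the delay convention τ_ij = (r_i - r_j)·u / c, a speed of sound of 343 m/s, and sources above the array plane. All function names, the bin count, and the naive peak picker are hypothetical choices, and the azimuth histogram used to count sources is not shown.

```python
import numpy as np

C = 343.0  # speed of sound in m/s (assumed)


def wrap_to_pi(x):
    """Map phases (radians) into [-pi, pi)."""
    return (np.asarray(x) + np.pi) % (2 * np.pi) - np.pi


def delay_candidates(phase_diff, freq, spacing, c=C):
    """All delays (phi + 2*pi*k) / (2*pi*f), k integer, with |delay| <= spacing / c."""
    periods_max = freq * spacing / c          # largest physical delay, in periods of f
    offset = phase_diff / (2 * np.pi)
    k = np.arange(int(np.ceil(-periods_max - offset)), int(np.floor(periods_max - offset)) + 1)
    return (phase_diff + 2 * np.pi * k) / (2 * np.pi * freq)


def delay_histogram(phase_diffs, freqs, spacing, n_bins=81, c=C):
    """Pool the candidates of one pair over every time-frequency bin."""
    tau_max = spacing / c
    cands = np.concatenate([delay_candidates(p, f, spacing, c)
                            for p, f in zip(phase_diffs, freqs)])
    counts, edges = np.histogram(cands, bins=n_bins, range=(-tau_max, tau_max))
    return counts, 0.5 * (edges[:-1] + edges[1:])


def pick_peaks(counts, centers, rel_height=0.5):
    """Naive picker: interior local maxima at least rel_height * the tallest bin."""
    h = np.asarray(counts)
    mid = h[1:-1]
    keep = (mid > h[:-2]) & (mid >= h[2:]) & (mid >= rel_height * h.max())
    return centers[1:-1][keep]


def delays_to_doa(pair_vectors, delays, c=C):
    """Least-squares azimuth/elevation (radians) from the delays of several pairs.

    For a z = 0 array, c * tau_ij = (r_i - r_j)_xy . p with
    p = cos(el) * [cos(az), sin(az)]; el is taken as non-negative.
    """
    d = np.asarray(pair_vectors)[:, :2]
    p, *_ = np.linalg.lstsq(d, c * np.asarray(delays), rcond=None)
    return np.arctan2(p[1], p[0]), np.arccos(min(np.linalg.norm(p), 1.0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spacing, true_tau = 0.2, 3.0e-4            # 20 cm pair: aliases above ~858 Hz
    freqs = np.tile(np.linspace(500.0, 4000.0, 200), 20)   # 20 frames x 200 bins
    phases = wrap_to_pi(2 * np.pi * freqs * true_tau + rng.normal(0.0, 0.05, freqs.size))
    counts, centers = delay_histogram(phases, freqs, spacing)
    print("histogram peaks (s):", pick_peaks(counts, centers))
```

Enumerating every admissible period number k at every frequency is what lets the histogram cope with aliasing: only the true delay lands in the same bin at all frequencies, while the aliased candidates (τ + k/f) drift with frequency and spread out.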
The initial directions of arrival, expressed as azimuths and elevations, then serve as supervision for classifying the time-frequency bins. The researchers calculate the distance from each time-frequency bin to each initial direction of arrival, and each bin is assigned to the class with the smallest distance. This yields one class of phase differences for each speech source.
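The classification step can be sketched as follows, assuming (hypothetically) that the distance between a time-frequency bin and a direction is the sum over microphone pairs of the squared wrapped difference between the observed phase difference and the one that direction would produce at the bin's frequency. The paper's actual distance measure may differ; `predicted_phase`, `classify_bins`, the sign convention τ_ij = (r_i - r_j)·u / c, and the 4-microphone demo array are illustrative assumptions.

```python
import numpy as np

C = 343.0  # speed of sound in m/s (assumed)


def wrap_to_pi(x):
    """Map phases (radians) into [-pi, pi)."""
    return (np.asarray(x) + np.pi) % (2 * np.pi) - np.pi


def predicted_phase(pair_vectors, az, el, freqs, c=C):
    """Phase differences, shape (n_pairs, n_bins), of a far-field source at (az, el)."""
    u = np.array([np.cos(el) * np.cos(az), np.cos(el) * np.sin(az), np.sin(el)])
    tau = np.asarray(pair_vectors) @ u / c      # one delay per pair: (r_i - r_j) . u / c
    return 2 * np.pi * np.outer(tau, freqs)


def classify_bins(observed, freqs, pair_vectors, doas, c=C):
    """Index of the nearest direction for every time-frequency bin.

    observed: (n_pairs, n_bins) wrapped phase differences; freqs: (n_bins,)
    doas: sequence of (azimuth, elevation) pairs in radians
    """
    dists = np.stack([
        np.sum(wrap_to_pi(observed - predicted_phase(pair_vectors, az, el, freqs, c)) ** 2, axis=0)
        for az, el in doas
    ])                                          # (n_sources, n_bins)
    return np.argmin(dists, axis=0)


if __name__ == "__main__":
    angles = np.deg2rad([0, 90, 180, 270])      # hypothetical 4-mic circle, radius 10 cm
    mics = np.stack([0.1 * np.cos(angles), 0.1 * np.sin(angles), np.zeros(4)], axis=1)
    pairs = np.array([mics[i] - mics[j] for i in range(4) for j in range(i + 1, 4)])
    freqs = np.linspace(300.0, 3000.0, 50)
    doas = [(0.5, 0.3), (2.5, 0.6)]
    observed = np.concatenate([wrap_to_pi(predicted_phase(pairs, az, el, freqs))
                               for az, el in doas], axis=1)
    print(classify_bins(observed, np.concatenate([freqs, freqs]), pairs, doas))
```

The phase difference class of source s is then simply `observed[:, labels == s]`.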
Finally, the direction of arrival of each source is estimated by regression over the phase differences in its class. Because the regression is performed on a periodic variable, the new method avoids spatial aliasing.
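One way to picture the regression step, as a hedged sketch: for a planar array the predicted phase difference is linear in the horizontal direction vector p = cos(el)·[cos(az), sin(az)], so p can be refined by Gauss-Newton least squares on the wrapped residual, starting from the initial direction found by the histogram stage. This formulation (the design matrix, the iteration count, the sign convention τ_ij = (r_i - r_j)·u / c, and the name `regress_doa`) is an assumption for illustration and is not claimed to be the paper's estimator.

```python
import numpy as np

C = 343.0  # speed of sound in m/s (assumed)


def wrap_to_pi(x):
    """Map phases (radians) into [-pi, pi)."""
    return (np.asarray(x) + np.pi) % (2 * np.pi) - np.pi


def regress_doa(observed, freqs, pair_vectors, az0, el0, n_iter=10, c=C):
    """Refine one source's azimuth/elevation from the phase differences of its class.

    observed: (n_pairs, n_bins) wrapped phase differences; freqs: (n_bins,)
    Model for a z = 0 array: phi = 2*pi*f * (d_xy . p) / c, linear in p.
    """
    d = np.asarray(pair_vectors)[:, :2]
    # One design-matrix row per (pair, bin), in the same order as observed.ravel().
    a = (2 * np.pi / c) * (np.asarray(freqs)[None, :, None] * d[:, None, :]).reshape(-1, 2)
    y = np.asarray(observed).ravel()
    p = np.cos(el0) * np.array([np.cos(az0), np.sin(az0)])   # initial estimate
    for _ in range(n_iter):
        resid = wrap_to_pi(y - a @ p)           # error restricted to one period
        step, *_ = np.linalg.lstsq(a, resid, rcond=None)
        p = p + step
    return np.arctan2(p[1], p[0]), np.arccos(min(np.linalg.norm(p), 1.0))


if __name__ == "__main__":
    angles = np.deg2rad([0, 90, 180, 270])      # hypothetical 4-mic circle, radius 10 cm
    mics = np.stack([0.1 * np.cos(angles), 0.1 * np.sin(angles), np.zeros(4)], axis=1)
    pairs = np.array([mics[i] - mics[j] for i in range(4) for j in range(i + 1, 4)])
    freqs = np.linspace(300.0, 3000.0, 50)
    az, el = 1.0, 0.5
    u = np.array([np.cos(el) * np.cos(az), np.cos(el) * np.sin(az), np.sin(el)])
    rng = np.random.default_rng(1)
    observed = wrap_to_pi(2 * np.pi * np.outer(pairs @ u / C, freqs)
                          + rng.normal(0.0, 0.1, (len(pairs), freqs.size)))
    print(regress_doa(observed, freqs, pairs, az + 0.05, el - 0.05))
```

Because only the wrapped residual enters the fit, shifting any observation by a whole period gives the same estimate; the initial direction only needs to be close enough that the true residuals lie within one period.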
The method is evaluated in both simulated and real environments. The reverberation time and the array radius are varied to assess the robustness of all competing methods, and the experimental results show the superiority of the new method in resisting spatial aliasing and under reverberant conditions. Real recordings from the publicly available AV16.3 corpus are also used, and on these data the new method again outperforms the competing methods.
In this research, phase difference regression is used to localize multiple speech sources, including with large array radii, where spatial aliasing occurs. Because the phase difference error is restricted to within one period, the regression no longer needs to consider the period number of the phase.
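Why a large radius causes aliasing can be checked with the standard spatial sampling condition: a pair's phase difference 2πfτ stays within one period for every physical delay |τ| ≤ d/c only when f ≤ c/(2d). For a circular array of radius R the widest pair is a diameter (2R) apart, giving c/(4R). The speed of sound of 343 m/s and the radii below are illustrative assumptions.

```python
C = 343.0  # speed of sound in m/s (assumed)


def alias_free_max_frequency(spacing, c=C):
    """Highest frequency at which a pair's phase difference cannot wrap: c / (2 d)."""
    return c / (2.0 * spacing)


def circular_array_alias_frequency(radius, c=C):
    """The widest pair of a circular array is a diameter (2 * radius) apart."""
    return alias_free_max_frequency(2.0 * radius, c)


if __name__ == "__main__":
    for radius in (0.05, 0.1, 0.2):
        f_max = circular_array_alias_frequency(radius)
        print(f"R = {radius} m -> aliasing above {f_max:.1f} Hz")
```

Since speech spectra extend to several kilohertz, even a 10 cm radius leaves much of the band above this limit, which is why methods that assume a single period struggle as the radius grows.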
Reference:
HUANG Zhaoqiong, ZHAN Ge, YING Dongwen, ZHOU Ruohua, PAN Jielin and YAN Yonghong. Robust Multiple Speech Source Localization Based on Phase Difference Regression. ISCSLP 2016, the 10th International Symposium on Chinese Spoken Language Processing.
Contact:
HUANG Zhaoqiong
Key Laboratory of Speech Acoustics and Content Understanding, Chinese Academy of Sciences
Email: huangzhaoqiong@hccl.ioa.ac.cn