In recent years, bilingual communication has become common as a result of globalization, which presents a new challenge for real-world applications of speech recognition technology. Bilingual speech recognition in real-world applications faces two main difficulties: the first is to balance performance on inter- and intra-sentential language switching while reducing the complexity of the bilingual speech recognition system; the second is to deal effectively with matrix-language accents in the embedded language.
To handle intra-sentential language switching and reduce the amount of data required to robustly estimate statistical models, ZHANG Qingqing, PAN Jielin and YAN Yonghong of ThinkIT Lab, Chinese Academy of Sciences, conducted a series of studies and developed a single compact bilingual acoustic model, derived by phone set merging and clustering, instead of using a separate monolingual model for each language.
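The general idea of phone set merging can be illustrated with a minimal sketch. It is not the researchers' actual procedure, which is not detailed here. It assumes each phone is summarized by a single diagonal Gaussian, uses the Bhattacharyya distance as the similarity measure, and merges an embedded-language phone into its nearest matrix-language phone when the distance falls below a threshold. The phone names, feature values, threshold and the function `merge_phone_sets` are all hypothetical.

```python
import math


def bhattacharyya(p, q):
    """Bhattacharyya distance between two diagonal Gaussians.

    Each argument is a (means, variances) pair of equal-length lists.
    """
    (m1, v1), (m2, v2) = p, q
    dist = 0.0
    for a, va, b, vb in zip(m1, v1, m2, v2):
        avg = (va + vb) / 2.0
        dist += 0.125 * (a - b) ** 2 / avg
        dist += 0.5 * math.log(avg / math.sqrt(va * vb))
    return dist


def merge_phone_sets(matrix_phones, embedded_phones, threshold):
    """Greedy nearest-neighbour merging (an illustrative assumption).

    Returns a dict mapping each unit of the merged phone set to the
    list of original phones that share it.
    """
    merged = {name: [name] for name in matrix_phones}
    for name, model in embedded_phones.items():
        nearest = min(
            matrix_phones,
            key=lambda k: bhattacharyya(matrix_phones[k], model),
        )
        if bhattacharyya(matrix_phones[nearest], model) <= threshold:
            merged[nearest].append(name)  # share one acoustic unit
        else:
            merged[name] = [name]  # keep as a language-specific unit
    return merged


# Hypothetical toy models: (means, variances) per phone.
matrix_phones = {
    "mat:a": ([1.0, 0.0], [1.0, 1.0]),
    "mat:i": ([-2.0, 1.5], [1.0, 1.0]),
}
embedded_phones = {
    "emb:aa": ([1.1, 0.1], [1.0, 1.0]),
    "emb:th": ([5.0, -4.0], [1.0, 1.0]),
}

merged = merge_phone_sets(matrix_phones, embedded_phones, threshold=0.5)
for unit, members in merged.items():
    print(unit, members)
```

With these toy values, the two acoustically close vowels share one unit while the distant phone stays separate, so the merged set has three units instead of the four that two monolingual sets would need. Fewer units means more training data per unit, which is the motivation stated above for a compact model.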