For many practical recognition applications, the rejection of out-of-vocabulary (OOV) words is an important issue. In this research, automatic speech recognition technology is used to recognize the user's speech against a word list. A user unfamiliar with the system may utter OOV words, which are included neither in the system's lexicon nor in the specific word list. If no measures are taken, the system will always output a recognition result for any input, which may cause incorrect reactions or incur a high cost. To avoid this problem, confidence measures are computed to make the rejection decision. When the conditions of the test data match those of the training data, the speech recognition system performs well, as do the posterior-based confidence measures. However, the training and test data sometimes differ, and recognition performance may decline owing to the mismatch. This is why robust confidence measures are needed in this research.
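To make the rejection decision concrete, the following is a minimal sketch of how a posterior-based confidence score can gate a recognition result. The function names, the length-normalized geometric mean of frame posteriors, and the threshold value are assumptions for illustration, not the system described in the paper.

import numpy as np

def posterior_confidence(frame_posteriors):
    """Confidence of a hypothesis from its per-frame posteriors.

    frame_posteriors: posterior probabilities of the recognized
    (1-best) units at each frame, e.g. from a forward-backward pass.
    The geometric mean (mean log posterior) normalizes for length.
    """
    p = np.clip(frame_posteriors, 1e-10, 1.0)  # guard against log(0)
    return float(np.exp(np.mean(np.log(p))))

REJECT_THRESHOLD = 0.6  # assumption: would be tuned on held-out data

def accept(frame_posteriors, threshold=REJECT_THRESHOLD):
    """Reject the hypothesis (likely OOV) when confidence is too low."""
    return posterior_confidence(frame_posteriors) >= threshold

A low score signals that the acoustics fit the best hypothesis poorly, which is the typical symptom of an OOV utterance forced onto in-vocabulary words.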
To improve the robustness of posterior-based confidence measures, SUN Yanqing, ZHOU Yu, ZHAO Qingwei, YAN Yonghong, et al. carried out a series of studies to overcome the above problems.
In the study, they improve the robustness of posterior-based confidence measures by utilizing entropy information, which is calculated for speech-unit-level posteriors using only the best recognition result, without requiring a larger computational load than conventional methods. Using different normalization methods, two posterior-based entropy confidence measures are proposed. They discuss the practical details of two typical levels of hidden Markov model (HMM)-based posterior confidence measures, and the two levels are compared in terms of performance. Experimental results show that the entropy information yields significant improvements in the posterior-based confidence measures: the absolute improvement in the OOV rejection rate is more than 20% for both the phoneme-level and the state-level confidence measures on their embedded test sets, without a significant decline in in-vocabulary accuracy.
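The paper's exact formulations are not reproduced here, but the general idea of an entropy-based confidence can be sketched as follows: a peaked posterior distribution over competing units (low entropy) indicates a confident decision, while a flat distribution (high entropy) suggests mismatch or an OOV input. The sketch below is a minimal illustration under assumptions; the function name and the division by the maximum entropy log(classes), shown as one possible normalization, are not the authors' two proposed normalizations.

import numpy as np

def entropy_confidence(posterior_matrix, normalize=True):
    """Entropy-based confidence for one recognized speech unit.

    posterior_matrix: (frames, classes) posteriors over competing
    units (e.g. phonemes or HMM states) for the frames aligned to
    the 1-best unit; only the best recognition result is needed.
    normalize: divide by log(classes), the maximum possible entropy,
    to scale the score into [0, 1] (an assumed normalization).
    """
    p = np.clip(posterior_matrix, 1e-10, 1.0)
    p = p / p.sum(axis=1, keepdims=True)           # renormalize each frame
    frame_entropy = -(p * np.log(p)).sum(axis=1)   # Shannon entropy per frame
    h = frame_entropy.mean()                       # average over the unit's frames
    if normalize:
        h /= np.log(p.shape[1])
    # Low entropy -> peaked posterior -> high confidence.
    return float(1.0 - h)

Because the entropy is computed only over the posteriors already produced for the best recognition result, this kind of measure adds little work on top of a conventional posterior-based confidence, which is consistent with the paper's claim of no larger computational load.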