Study on automatic prediction of sentential stress for Chinese Putonghua Text-to-Speech system with natural style (2007 No.1)
Update time: 2007/06/28
SHAO Yanqiu  HAN Jiqing  ZHAO Yongzhen  LIU Ting

(School of Computer Science and Technology, Harbin Institute of Technology  Harbin  150001)

Received Feb. 21, 2006

Revised Apr. 5, 2006

Abstract  Stress is an important parameter for prosody processing in speech synthesis. In this paper, we compare the acoustic features of neutral-tone syllables and strongly stressed syllables with those of moderately stressed syllables, including pitch, syllable duration, intensity, and the length of the pause following the syllable. The relations between duration and pitch, and between the Third Tone (T3) and pitch, are also studied. Three ANN-based stress prediction models, i.e., the acoustic model, the linguistic model, and the mixed model, are presented for predicting Chinese sentential stress. The results show that the mixed model performs better than the other two models. To address the diversity of manual labeling, an evaluation index called the support ratio is proposed.

PACS number: 43.70


