A syllable based Chinese synthesis system
The automatic conversion of written text into spoken language (text-to-speech or TTS synthesis) is of growing importance in human-computer interaction. At the Dresden University of Technology, a multilingual TTS system called DreSS (Dresden Speech Synthesis) was developed. This thesis describes the development of the Mandarin Chinese component of DreSS. In this component, syllables are excised from spoken utterances and concatenated to form the desired speech signal. This involves two major research topics: the definition of the inventory structure and the control of the prosodic parameters. To achieve high naturalness in the synthesis, two new approaches have been applied. The first is the construction of a coarticulation-oriented inventory that models the coarticulatory effects crossing syllable boundaries. The second is the superposition of a rule-based sentence intonation on the inherent tone structures, which maintains the natural tone contours of Chinese syllables. In this way, the number of concatenation points in the synthesis process is minimized and the necessary modification of tonal contours is reduced; together with the concatenation technique, this produces Mandarin Chinese synthesis with high naturalness. Additionally, the two major components of prosody control in Mandarin speech synthesis - duration modeling and tonal concatenation - have been studied in this thesis. A natural speech database was employed for the investigation. The statistics resulting from a careful analysis of this database, combined with appropriate models for duration and tonal coarticulation, lead to further improved performance of the synthesis system.
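The superposition idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the method used in DreSS, whose rules are not given here; it only assumes that each stored syllable keeps its own F0 contour and that a sentence-level rule supplies one target pitch per syllable. The function names, the linear log-F0 declination, and all F0 values are hypothetical.

```python
import math


def declination(n_syllables, start_hz=220.0, end_hz=180.0):
    """Hypothetical sentence-level rule: one target F0 (Hz) per syllable,
    falling linearly in the log-F0 domain from start_hz to end_hz."""
    if n_syllables == 1:
        return [start_hz]
    log_start, log_end = math.log(start_hz), math.log(end_hz)
    step = (log_end - log_start) / (n_syllables - 1)
    return [math.exp(log_start + i * step) for i in range(n_syllables)]


def superpose(tone_contours, baseline):
    """Shift each syllable contour so that its mean log-F0 equals the
    sentence-level target. Only the register moves; the tone shape
    (the F0 ratios inside the syllable) is left unchanged."""
    result = []
    for contour, target in zip(tone_contours, baseline):
        logs = [math.log(f) for f in contour]
        offset = math.log(target) - sum(logs) / len(logs)
        result.append([math.exp(v + offset) for v in logs])
    return result


# Made-up contours: a level shape and a rising shape (values in Hz).
contours = [[200.0, 200.0, 200.0], [180.0, 200.0, 220.0]]
shaped = superpose(contours, declination(len(contours)))
```

Working in the log-F0 domain means the sentence intonation acts as a multiplicative scaling of each contour, so a rising tone stays rising by the same ratio wherever it lands in the sentence.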